Performance results for fp-bench:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
sqrt-single: 42.30 MFlops
sqrt-double: 22.97 MFlops
- after:
sqrt-single: 311.42 MFlops
sqrt-double: 311.08 MFlops
Here USE_FP makes a huge difference for f64's, with throughput
going from ~200 MFlops without it to ~300 MFlops with it.
Backports commit f131bae8a7b7ed1928cc94c69df291db609c316a from qemu
This patch paves the way for leveraging the host FPU for a subset
of guest FP operations. For most guest workloads (e.g. FP flags
aren't ever cleared, inexact occurs often and rounding is set to the
default [to nearest]) this will yield sizable performance speedups.
The approach followed here avoids checking the FP exception flags register.
See the added comment for details.
This assumes that QEMU is running on an IEEE754-compliant FPU and
that the rounding is set to the default (to nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and snan representation are still dealt with in
soft-fp. However, this approach will break on most hosts if we compile
QEMU with flags that break IEEE compatibility. There is no way to detect
all of these flags at compilation time, but at least we check for
-ffast-math (which defines __FAST_MATH__) and disable hardfloat
(plus emit a #warning) when it is set.
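The conditions described above can be sketched roughly as follows. This is an illustration only, not the code added by the patch: every `demo_`/`DEMO_` name is hypothetical, and the only identifier taken from the text is `__FAST_MATH__`. The fast path is attempted only when inexact is already set, rounding is to nearest, and the inputs are zero or normal; anything that might overflow or underflow is handed back to the soft-float path.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical switch; the macro name used by the real code may differ. */
#if defined(__FAST_MATH__)
# warning "-ffast-math breaks IEEE semantics: disabling the host-FPU fast path"
# define DEMO_NO_HARDFLOAT 1
#else
# define DEMO_NO_HARDFLOAT 0
#endif

enum { DEMO_FLAG_INEXACT = 1 << 0 };
enum { DEMO_ROUND_NEAREST_EVEN = 0, DEMO_ROUND_TO_ZERO = 1 };

typedef struct {
    int rounding_mode;
    int exception_flags;   /* sticky guest flags */
} demo_status;

/* The host FPU is usable only if inexact is already set (so we never need
 * to detect it) and the guest rounds to nearest, as the host is assumed to. */
static inline bool demo_can_use_fpu(const demo_status *s)
{
    return !DEMO_NO_HARDFLOAT
        && (s->exception_flags & DEMO_FLAG_INEXACT)
        && s->rounding_mode == DEMO_ROUND_NEAREST_EVEN;
}

static inline bool demo_zero_or_normal(double x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

/* Returns true and stores the sum when the fast path applies; returns false
 * when the caller must fall back to the soft-float implementation. */
static inline bool demo_hard_add(double a, double b,
                                 const demo_status *s, double *out)
{
    assert(s && out);
    if (!demo_can_use_fpu(s) ||
        !demo_zero_or_normal(a) || !demo_zero_or_normal(b)) {
        return false;
    }
    double r = a + b;
    double mag = r < 0 ? -r : r;
    if (isinf(r) || mag <= DBL_MIN) {
        return false;   /* possible overflow/underflow: let soft-float set flags */
    }
    *out = r;
    return true;
}
```

Note that the sketch never reads the host's exception flags register, which is the point of requiring inexact to be set beforehand.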
This patch just adds common code. Some operations will be migrated
to hardfloat in subsequent patches to ease bisection.
Note: some architectures (at least PPC, there might be others) clear
the status flags passed to softfloat before most FP operations. This
precludes the use of hardfloat, so to avoid introducing a performance
regression for those targets, we add a flag to disable hardfloat.
In the long run though it would be good to fix the targets so that
at least the inexact flag passed to softfloat is indeed sticky.
Backports commit a94b783952cc493cb241aabb1da8c7a830385baa from qemu
glibc >= 2.25 defines canonicalize in commit eaf5ad0
(Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).
Given that we'll be including <math.h> soon, prepare
for this by prefixing our canonicalize() with sf_ to avoid
clashing with the libc's canonicalize().
Backports commit f9943c7f766678af36d31076b78e466256f4871b from qemu
The __udiv_qrnnd primitive that we nicked from gmp requires its
inputs to be normalized. We were not doing that. Because the
inputs are nearly normalized already, finishing that is trivial.
Replace div128to64 with a "proper" udiv_qrnnd, so that this
remains a reusable primitive.
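As a rough illustration of the normalization requirement, the sketch below pairs a schoolbook 128-by-64 division (in the style of the gmp primitive) with a wrapper that normalizes the divisor first. The wrapper name `demo_div128by64` is hypothetical, and `__builtin_clzll` is assumed to be available (GCC or Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Divide (n1:n0) by d. Requires d normalized (top bit set) and n1 < d. */
static inline uint64_t demo_udiv_qrnnd(uint64_t *r, uint64_t n1,
                                       uint64_t n0, uint64_t d)
{
    uint64_t d0, d1, q0, q1, r1, r0, m;

    assert((d >> 63) == 1 && n1 < d);
    d0 = (uint32_t)d;
    d1 = d >> 32;

    /* High half of the quotient. */
    r1 = n1 % d1;
    q1 = n1 / d1;
    m = q1 * d0;
    r1 = (r1 << 32) | (n0 >> 32);
    if (r1 < m) {
        q1 -= 1;
        r1 += d;
        if (r1 >= d && r1 < m) {
            q1 -= 1;
            r1 += d;
        }
    }
    r1 -= m;

    /* Low half of the quotient. */
    r0 = r1 % d1;
    q0 = r1 / d1;
    m = q0 * d0;
    r0 = (r0 << 32) | (uint32_t)n0;
    if (r0 < m) {
        q0 -= 1;
        r0 += d;
        if (r0 >= d && r0 < m) {
            q0 -= 1;
            r0 += d;
        }
    }
    r0 -= m;

    *r = r0;
    return (q1 << 32) | q0;
}

/* Hypothetical wrapper: normalize d, scale the dividend to match, then
 * undo the scaling on the remainder. Requires d != 0 and n1 < d. */
static inline uint64_t demo_div128by64(uint64_t *r, uint64_t n1,
                                       uint64_t n0, uint64_t d)
{
    assert(d != 0 && n1 < d);
    int shift = __builtin_clzll(d);
    if (shift) {
        d <<= shift;
        n1 = (n1 << shift) | (n0 >> (64 - shift));
        n0 <<= shift;
    }
    uint64_t q = demo_udiv_qrnnd(r, n1, n0, d);
    *r >>= shift;
    return q;
}
```

For example, dividing 2^64 (n1 = 1, n0 = 0) by 3 through the wrapper gives a quotient of 0x5555555555555555 and a remainder of 1.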
Fixes: cf07323d494
Fixes: https://bugs.launchpad.net/qemu/+bug/1793119
Backports commit 5dfbc9e4903c0121140f2945f05df48cea72dd82 from qemu
Our minimum required compiler for compiling QEMU is GCC 4.1 these days,
so we can drop the support for compilers which do not provide the
__builtin_clz*() functions yet. Since the countLeadingZeros32/64 are
then identical to the clz32/64 functions, and we do not have to sync
the softfloat 2 codebase with upstream anymore (softfloat 3 is a complete
rewrite) we can simply replace the functions with our QEMU versions.
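A minimal sketch of what builtin-based helpers look like, assuming GCC or Clang. The `demo_` names are placeholders; the zero guard is needed because the builtins are undefined for a zero argument.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value; returns 32 for zero. */
static inline int demo_clz32(uint32_t val)
{
    return val ? __builtin_clz(val) : 32;
}

/* Count leading zeros of a 64-bit value; returns 64 for zero. */
static inline int demo_clz64(uint64_t val)
{
    return val ? __builtin_clzll(val) : 64;
}
```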
Backports commit 0019d5c3a18c31604fb55f9cec3ceb13999c4866 from qemu
It has not had users since f83311e476 ("target-m68k: use floatx80
internally", 2017-06-21).
Note that no other bit-width has floatX_trunc_to_int.
Backports commit c953da8f0be5e026d1c9128660736d72294feb3e from qemu
For 0x1.0000000000003p+0 + 0x1.ffffffep+14 = 0x1.0001fffp+15
we dropped the sticky bit and so failed to raise inexact.
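To illustrate why the sticky bit matters, here is a small sketch of a "jamming" right shift, the usual way to keep it when aligning the smaller addend: any bits shifted out are ORed into the least significant bit of the result, so later rounding still sees that the value was inexact. The function name is hypothetical and this is not the patched code itself.

```c
#include <stdint.h>

/* Shift right by count, ORing any lost bits into bit 0 (the sticky bit). */
static inline uint64_t demo_shift_right_jam64(uint64_t a, unsigned count)
{
    if (count == 0) {
        return a;
    }
    if (count < 64) {
        return (a >> count) | ((a << (64 - count)) != 0);
    }
    return a != 0;
}
```

A plain `a >> count` would return 0x0800000000000000 for an input of 0x8000000000000001 shifted by 4, silently discarding the low bit; the jamming variant returns 0x0800000000000001.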
Backports commit 64d450a0eaad5f02f9d6bba1dd451446297bb4dc from qemu
Isolate the target-specific choice to 3 functions instead of 6.
The code in floatx80_default_nan tried to be over-general. There are
only two targets that support this format: x86 and m68k. Thus there
is no point in inventing a mechanism for snan_bit_is_one.
Move routines that no longer have ifdefs out of softfloat-specialize.h.
Backports commit 377ed92679a2a5f838bc0a095112ea5020720fff from qemu
Isolate the target-specific choice to 2 functions instead of 6.
The code in float16_default_nan was only correct for ARM, MIPS, and X86,
though float16 support is rare among our targets.
The code in float128_default_nan was arguably wrong for Sparc. While
QEMU supports the Sparc 128-bit insns, no real cpu enables it.
The code in floatx80_default_nan tried to be over-general. There are
only two targets that support this format: x86 and m68k. Thus there
is no point in inventing a value for snan_bit_is_one.
Move routines that no longer have ifdefs out of softfloat-specialize.h.
Backports commit 0218a16e540ad416683e19dfbd52f75092507b27 from qemu
For each operand, pass a single enumeration instead of a pair of booleans.
The commit also merges multiple different ifdef-selected implementations
of pickNaNMulAdd into a single function whose body is ifdef-selected.
Backports commit 3bd2dec1a1e8fadb49e3ff2e2633f79e01a25c41 from qemu
For each operand, pass a single enumeration instead of a pair of booleans.
The commit also merges multiple different ifdef-selected implementations
of pickNaN into a single function whose body is ifdef-selected.
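The shape of the interface change can be sketched as follows. The enumeration and function names are hypothetical, and the selection rule shown (prefer a signalling NaN, first operand first) is only one example of a target policy, modelled on an Arm-style rule rather than copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* One enumeration per operand replaces the old (is_qnan, is_snan) pair. */
typedef enum {
    DEMO_CLASS_NORMAL,
    DEMO_CLASS_QNAN,
    DEMO_CLASS_SNAN
} demo_class;

static inline bool demo_is_snan(demo_class c) { return c == DEMO_CLASS_SNAN; }
static inline bool demo_is_qnan(demo_class c) { return c == DEMO_CLASS_QNAN; }

/* Returns 0 to propagate operand a, 1 to propagate operand b.
 * At least one operand must be a NaN. */
static inline int demo_pick_nan(demo_class a, demo_class b)
{
    assert(demo_is_snan(a) || demo_is_qnan(a) ||
           demo_is_snan(b) || demo_is_qnan(b));
    if (demo_is_snan(a)) {
        return 0;
    } else if (demo_is_snan(b)) {
        return 1;
    } else if (demo_is_qnan(a)) {
        return 0;
    } else {
        return 1;
    }
}
```

With a single function, a target-specific policy would be chosen by an ifdef inside the body rather than by providing several complete definitions.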
Backports commit 4f251cfd52c7945ebd6ab0d86518b1a9aa51b10c from qemu
We will need these helpers within softfloat-specialize.h, so move
the definitions above the include. After specialization, they will
not always be used so mark them to avoid the Werror.
Backports commit 247d1f2190c5530fd18fe92a145d0a1985fca4e4 from qemu
This allows us to delete a lot of additional boilerplate
code which is no longer needed.
Backports commit 6fed16b265a4fcc810895bbca4d67e1ae7a89f07 from qemu
For float16 ARM supports an alternative half-precision format which
sacrifices the ability to represent NaN/Inf in return for a higher
dynamic range. The new FloatFmt flag, arm_althp, is then used to
modify the behaviour of canonicalize and round_canonical with respect
to representation and exception raising.
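The trade-off can be seen in a small decoder sketch. All names here are hypothetical and the code is independent of the FloatFmt machinery; it only shows that, in the alternative format, the all-ones exponent encodes ordinary numbers instead of NaN/Inf, which extends the largest finite value from 65504 to 131008.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Multiply x by 2^e exactly, without needing libm. */
static inline double demo_scale_pow2(double x, int e)
{
    while (e > 0) { x *= 2.0; e--; }
    while (e < 0) { x *= 0.5; e++; }
    return x;
}

/* Decode a 16-bit half-precision pattern. With althp set, exponent 0x1f
 * is treated as a normal exponent rather than as NaN/Inf. */
static inline double demo_f16_to_double(uint16_t h, bool althp)
{
    int sign = h >> 15;
    int exp = (h >> 10) & 0x1f;
    int frac = h & 0x3ff;
    double mag;

    if (exp == 0) {
        mag = demo_scale_pow2(frac, -24);              /* zero or subnormal */
    } else if (exp == 0x1f && !althp) {
        mag = frac ? NAN : INFINITY;                   /* IEEE only */
    } else {
        mag = demo_scale_pow2(0x400 + frac, exp - 25); /* implicit leading 1 */
    }
    return sign ? -mag : mag;
}
```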
Usage of this new flag waits until we re-factor float-to-float conversions.
Backports commit ca3a3d5a3141d44aa717dc11e4d33a834a85e1f6 from qemu
With a canonical representation of NaNs, we can silence an SNaN
immediately rather than delay until the final format is known.
Backports commit 0bcfbcbea548656ff930394f296589728c2a0c5d from qemu
With a canonical representation of NaNs, we can return the
default nan directly rather than delay the expansion until
the final format is known.
Note one case where we uselessly assigned to a.sign, which was
overwritten/ignored later when expanding float_class_dnan.
Backports commit f7e598e264b94d0982e647ac303108781d5eb4fa from qemu
Shift the NaN fraction to a canonical position, much like we
do for the fraction of normal numbers. This will facilitate
manipulation of NaNs within the shared code paths.
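A sketch of the idea, assuming a decomposed binary point at bit 62 and the common convention that a set top fraction bit means "quiet" (both are assumptions of this example, as are all the names). Once the fraction is shifted to the canonical position, the quiet bit sits at the same place for every format, so silencing needs no knowledge of the final format.

```c
#include <stdint.h>

#define DEMO_BINARY_POINT 62   /* assumed position of the implicit bit */

/* Move a format-specific NaN fraction to the canonical position. */
static inline uint64_t demo_canon_nan_frac(uint64_t frac, int frac_bits)
{
    return frac << (DEMO_BINARY_POINT - frac_bits);
}

/* Silence a NaN: the quiet bit is just below the binary point. */
static inline uint64_t demo_silence_nan_frac(uint64_t canon_frac)
{
    return canon_frac | (1ULL << (DEMO_BINARY_POINT - 1));
}

/* Move a canonical fraction back to its format-specific position. */
static inline uint64_t demo_uncanon_nan_frac(uint64_t canon_frac, int frac_bits)
{
    return canon_frac >> (DEMO_BINARY_POINT - frac_bits);
}
```

For instance, the float32 signalling NaN 0x7fa00000 has fraction 0x200000; canonicalizing, silencing, and converting back yields fraction 0x600000, i.e. the quiet NaN 0x7fe00000.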
Backports commit 94933df0e5c34d1a50fc950553f9c9649cae5320 from qemu
The significand is passed to normalizeRoundAndPackFloat128() as high
first, low second. The current code passes the integer first, so the
result is incorrectly shifted left by 64 bits.
This bug affects the emulation of s390x instruction CXLGBR (convert
from logical 64-bit binary-integer operand to extended BFP result).
Backports commit 6603d50648901e8b9e6d66ec1142accf0b1df1e6 from qemu
In float-to-integer conversion, if the floating point input
converts exactly to the largest or smallest integer that
fits into the result type, this is not an overflow.
In this situation we were producing the correct result value,
but were incorrectly setting the Invalid flag.
For example for Arm A64, "FCVTAS w0, d0" on an input of
0x41dfffffffc00000 should produce 0x7fffffff and set no flags.
Fix the boundary case to take the right half of the if()
statements.
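The boundary condition can be illustrated with a host-double sketch. It is not the softfloat code: names are hypothetical, it truncates toward zero rather than implementing the round-to-nearest-ties-away mode of FCVTAS, and returning the maximum integer for NaN is an assumption of this example. The comparisons are written so that an input exactly equal to INT32_MAX or INT32_MIN is in range.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_FLAG_INVALID = 1 << 0, DEMO_FLAG_INEXACT = 1 << 1 };

/* Reinterpret a 64-bit pattern as a double. */
static inline double demo_f64_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Truncating, saturating double -> int32 conversion that accumulates flags. */
static inline int32_t demo_f64_to_i32_trunc(double x, int *flags)
{
    assert(flags);
    if (isnan(x) || x >= 2147483648.0) {      /* strictly above INT32_MAX */
        *flags |= DEMO_FLAG_INVALID;
        return INT32_MAX;
    }
    if (x <= -2147483649.0) {                 /* truncates below INT32_MIN */
        *flags |= DEMO_FLAG_INVALID;
        return INT32_MIN;
    }
    int32_t r = (int32_t)x;
    if ((double)r != x) {
        *flags |= DEMO_FLAG_INEXACT;
    }
    return r;
}
```

The bit pattern 0x41dfffffffc00000 from the example above is exactly 2147483647.0, so it converts to 0x7fffffff with no flags set.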
This fixes a regression from 2.11 introduced by the softfloat
refactoring.
Backports commit 333583757c5e910b040bef793974773635ce1918 from qemu
Reported by Coverity (CID1390635). We ensure this for uint_to_float
later on so we might as well mirror that.
Backports commit a5a5f5e2e437db6c19164b734f838a7bf9e0c5ec from qemu
It is implementation defined whether a multiply-add of
(0,inf,qnan) or (inf,0,qnan) raises InvalidOperation or
not, so we let the target-specific pickNaNMulAdd function
handle this. This means that we must do the "return the
default NaN in default NaN mode" check after the call,
not before. Correct the ordering, and restore the comment
from the old propagateFloat64MulAddNaN() that warned about
this corner case.
This fixes a regression from 2.11 for Arm guests where we would
incorrectly fail to set the Invalid flag for these cases.
Backports commit 1839189bbf89889076aadf0c793c1b57977b28d7 from qemu
Without bounding the increment, we can overflow exp either here
in scalbn_decomposed or when adding the bias in round_canonical.
This can result in e.g. underflowing to 0 instead of overflowing
to infinity.
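A minimal sketch of the bounding, with a hypothetical function name and an assumed bound of 0x10000 (chosen here only because it is far larger than any exponent range of interest while leaving ample headroom in an int):

```c
#include <assert.h>
#include <limits.h>

/* Add the scalbn increment n to an unbiased exponent without risking
 * signed overflow: clamp n first, then add. */
static inline int demo_scalbn_exp(int exp, int n)
{
    const int bound = 0x10000;   /* assumed bound for this sketch */

    assert(exp > -bound && exp < bound);
    if (n > bound) {
        n = bound;
    } else if (n < -bound) {
        n = -bound;
    }
    return exp + n;
}
```

After clamping, a huge positive increment still produces an exponent far above the format's maximum, so rounding overflows to infinity as expected instead of wrapping around.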
The old softfloat code did bound the increment.
Backports commit ce8d4082054519f2eaac39958edde502860a7fc6 from qemu
The re-factoring of div_floats changed the order of checking, meaning
an operation like -inf/0 erroneously raises the divbyzero flag.
IEEE-754 (2008) specifies this should only occur for operations on
finite operands.
We fix this by moving the check for an Inf or zero dividend ahead of
the check for a zero divisor.
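The ordering can be sketched on operand classes alone. The names are hypothetical and NaN operands are assumed to have been handled before this point; the function only reports which flags the special-case checks would raise.

```c
typedef enum {
    DEMO_CLS_ZERO,
    DEMO_CLS_NORMAL,
    DEMO_CLS_INF
} demo_cls;

enum { DEMO_FLAG_INVALID = 1 << 0, DEMO_FLAG_DIVBYZERO = 1 << 1 };

/* Flags raised by the special cases of a / b (NaNs already filtered out). */
static inline int demo_div_special_flags(demo_cls a, demo_cls b)
{
    /* 0/0 or Inf/Inf */
    if (a == b && (a == DEMO_CLS_INF || a == DEMO_CLS_ZERO)) {
        return DEMO_FLAG_INVALID;
    }
    /* Inf/x or 0/x: the result is Inf or 0, and no flag is raised.
     * This must come before the zero-divisor check. */
    if (a == DEMO_CLS_INF || a == DEMO_CLS_ZERO) {
        return 0;
    }
    /* finite/0 */
    if (b == DEMO_CLS_ZERO) {
        return DEMO_FLAG_DIVBYZERO;
    }
    return 0;
}
```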
Backports commit 9cb4e398c2f95c1e837fe9c570e124a55259f725 from qemu
The re-factor broke the raising of INVALID when NaN/Inf is passed to
the float_to_int conversion functions. round_to_uint_and_pack got this
right for NaN but also missed out the Inf handling.
Fixes https://bugs.launchpad.net/qemu/+bug/1759264
Backports commit 801bc56336a127d9b351b3a2cc0336e4d0cb2686 from qemu
Before 8936006 ("fpu/softfloat: re-factor minmax", 2018-02-21),
we used to return +Zero for maxnummag(-Zero,+Zero); after that
commit, we return -Zero.
Fix it by making {min,max}nummag consistent with {min,max}num,
deferring to the latter when the absolute value of the operands
is the same.
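The deferral can be sketched on host doubles. Function names are hypothetical and NaN handling is deliberately omitted; the point is only the tie-break when the magnitudes are equal.

```c
#include <math.h>

/* max with +0 ordered above -0 (NaNs not handled in this sketch). */
static inline double demo_maxnum(double a, double b)
{
    if (a == b) {
        return signbit(a) ? b : a;   /* covers the -0 / +0 case */
    }
    return a > b ? a : b;
}

/* Return the operand with the larger magnitude; on equal magnitudes,
 * defer to demo_maxnum so that both functions agree. */
static inline double demo_maxnummag(double a, double b)
{
    double abs_a = a < 0 ? -a : a;
    double abs_b = b < 0 ? -b : b;

    if (abs_a == abs_b) {
        return demo_maxnum(a, b);
    }
    return abs_a > abs_b ? a : b;
}
```

With this tie-break, `demo_maxnummag(-0.0, 0.0)` yields +0, matching the behaviour described as correct above.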
With this fix we now pass fp-test.
Backports commit 6245327a367292b354489c54e965646823023919 from qemu
Since f3218a8 ("softfloat: add floatx80 constants")
floatx80_infinity is defined but never used.
This patch updates floatx80 functions to use
this definition.
This allows defining a different default Infinity
value on m68k: the m68k FPU defines infinity with
all bits set to zero in the mantissa.
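As an illustration of the two encodings, the sketch below builds an 80-bit infinity with either mantissa convention. The struct and function are hypothetical stand-ins, and the selection is shown as a run-time parameter purely so that both variants can be exercised; a per-target compile-time definition would be the natural form in practice.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t low;    /* 64-bit mantissa, including the explicit integer bit */
    uint16_t high;   /* sign and 15-bit exponent */
} demo_floatx80;

/* Build +/-Infinity. x86 sets the explicit integer bit of the mantissa;
 * the m68k convention described above leaves the mantissa all zero. */
static inline demo_floatx80 demo_floatx80_infinity(bool negative, bool m68k_style)
{
    demo_floatx80 r;
    r.high = (uint16_t)((negative ? 0x8000 : 0) | 0x7fff);
    r.low = m68k_style ? 0 : 0x8000000000000000ULL;
    return r;
}
```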
Backports commit 0f605c889ca3fe9744166ad4149d0dff6dacb696 from qemu
Move fpu/softfloat-macros.h to include/fpu/
Export floatx80 functions to be used by target floatx80
specific implementations.
Exports:
propagateFloatx80NaN(), extractFloatx80Frac(),
extractFloatx80Exp(), extractFloatx80Sign(),
normalizeFloatx80Subnormal(), packFloatx80(),
roundAndPackFloatx80(), normalizeRoundAndPackFloatx80()
Also exports packFloat32() that will be used to implement
m68k fsinh, fcos, fsin, ftan operations.
Backports commit 88857aca93f6ec8f372fb9c8201394b0e5582034 from qemu
This is a little bit of a departure from softfloat's original approach
as we skip the estimate step in favour of a straight iteration. There
is a minor optimisation to avoid calculating more bits of precision
than we need; however, this still brings a performance drop, especially
for float64 operations.
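The "straight iteration" style can be illustrated with the classic bit-by-bit integer square root, which produces one result bit per step and needs no initial estimate. This is an integer analogue written for this note (with a hypothetical name), not the fraction-based loop from the patch.

```c
#include <stdint.h>

/* Floor of the square root of n, computed one result bit per iteration. */
static inline uint64_t demo_isqrt64(uint64_t n)
{
    uint64_t res = 0;
    uint64_t bit = 1ULL << 62;   /* highest power of four in 64 bits */

    /* Skip powers of four larger than n: those result bits are zero. */
    while (bit > n) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}
```

Skipping the leading zero bits in the first loop plays a role similar to the optimisation mentioned above: the iteration count depends on how many result bits are actually needed.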
Backports commit c13bb2da9eedfbc5886c8048df1bc1114b285fb0 from qemu
The compare function was already expanded from a macro. I keep the
macro expansion but move most of the logic into a compare_decomposed.
Backports commit 0c4c90929143a530730e2879204a55a30bf63758 from qemu
Let's do the same re-factor treatment for minmax functions. I still
use the MACRO trick to expand but now all the checking code is common.
Backports commit 89360067071b1844bf745682e18db7dde74cdb8d from qemu
This is one of the simpler manipulations you could make to a floating
point number.
Backports commit 0bfc9f195209593e91a98cf2233753f56a2e5c02 from qemu
These are considerably simpler as the lower order integers can just
use the higher order conversion function. Because the decomposed
fractional part is a full 64 bits, rounding and inexact handling come
from the pack functions.
Backports commit c02e1fb80b553d47420f7492de4bc590c2461a86 from qemu
We share the common int64/uint64_pack_decomposed function across all
the helpers and simply limit the final result depending on the final
size.
Backports commit ab52f973a504f8de0c5df64631ba4caea70a7d9e from qemu
We can now add float16_round_to_int and use the common round_decomposed and
canonicalize functions to have a single implementation for
float16/32/64 round_to_int functions.
Backports commit dbe4d53a590f5689772b683984588b3cf6df163e from qemu
We can now add float16_muladd and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 muladd functions.
Backports commit d446830a3aac33e7221e361dad3ab1e1892646cb from qemu
We can now add float16_div and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.
Backports commit cf07323d494f4bc225e405688c2e455c3423cc40 from qemu
We can now add float16_mul and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.
Backports commit 74d707e2cc1e406068acad8e5559cd2584b1073a from qemu
We can now add float16_add/sub and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 add and sub functions.
Backports commit 6fff216769cf7eaa3961c85dee7a72838696d365 from qemu