unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-09-17 23:07:09 +00:00

Author	SHA1	Message	Date
LIU Zhiwei	d26cd63ad6	softfloat: Define misc operations for bfloat16 Backports 5ebf5f4be66c378fd5f3dee85f54dd4942171d57	2021-02-27 16:41:46 -05:00
LIU Zhiwei	d8168a8142	softfloat: Define convert operations for bfloat16 Backports 34f0c0a98a5f3bb6706088c0384f937f7a294d3e	2021-02-27 16:37:11 -05:00
LIU Zhiwei	b0be0d28cc	softfloat: Define operations for bfloat16 Backports 8282310d8535cc2a8431c516e907da79f92df6eb	2021-02-26 15:20:30 -05:00
Frank Chang	d97454eb63	softfloat: Add fp16 and uint8/int8 conversion functions Backports 0d93d8ec632154dea2627a9e989972ee09721187	2021-02-26 15:11:57 -05:00
Max Filippov	d9e561ab2a	softfloat: add xtensa specialization for pickNaNMulAdd pickNaNMulAdd logic on Xtensa is to apply pickNaN to the inputs of the expression (a * b) + c. However, if the default NaN is produced as a result of the (a * b) calculation, it is not considered when c is NaN. So with two pickNaN variants there must be two pickNaNMulAdd variants. In addition, the invalid flag is always set when (a * b) produces NaN. Backports commit fbcc38e4cb1b539b8615ec9b0adc285351d77628 from qemu	2021-02-26 12:16:51 -05:00
Max Filippov	fee4c62fe4	softfloat: pass float_status pointer to pickNaN Pass float_status structure pointer to the pickNaN so that machine-specific settings are available to NaN selection code. Add use_first_nan property to float_status and use it in Xtensa-specific pickNaN. Backports commit 913602e3ffe6bf50b869a14028a55cb267645ba3	2021-02-26 12:16:05 -05:00
Max Filippov	db780eff66	softfloat: make NO_SIGNALING_NANS runtime property target/xtensa, the only user of the NO_SIGNALING_NANS macro, has FPU implementations with and without the corresponding property. With NO_SIGNALING_NANS being a macro they cannot be a part of the same QEMU executable. Replace the macro with a new property in float_status to allow cores with different FPU implementations to coexist. Backports cc43c6925113c5bc8f1a0205375931d2e4807c99	2021-02-26 12:11:40 -05:00
Joseph Myers	8d0bf2d6e1	softfloat: return low bits of quotient from floatx80_modrem Both x87 and m68k need the low parts of the quotient for their remainder operations. Arrange for floatx80_modrem to track those bits and return them via a pointer. The architectures using float32_rem and float64_rem do not appear to need this information, so the *_rem interface is left unchanged and the information returned only from floatx80_modrem. The logic used to determine the low 7 bits of the quotient for m68k (target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it looks at the result of converting the remainder to integer, the quotient having been discarded by that point); this patch does not change that, but the m68k maintainers may wish to do so. Backports commit 445810ec915687d37b8ae0ef8d7340ab4a153efa from qemu	2021-02-25 13:39:10 -05:00
Joseph Myers	e4cfbc1f06	softfloat: do not set denominator high bit for floatx80 remainder The floatx80 remainder implementation unnecessarily sets the high bit of bSig explicitly. By that point in the function, arguments that are invalid, zero, infinity or NaN have already been handled and subnormals have been through normalizeFloatx80Subnormal, so the high bit will already be set. Remove the unnecessary code. Backports commit 566601f1f9d972e44214696d3cb320e6c18880aa from qemu	2021-02-25 13:37:13 -05:00
Joseph Myers	2d50384633	softfloat: do not return pseudo-denormal from floatx80 remainder The floatx80 remainder implementation sometimes returns the numerator unchanged when the denominator is sufficiently larger than the numerator. But if the value to be returned unchanged is a pseudo-denormal, that is incorrect. Fix it to normalize the numerator in that case. Backports commit b662495dca0a2a36008cf8def91e2566519ed3f2 from qemu	2021-02-25 13:36:42 -05:00
Joseph Myers	6b63555a00	softfloat: fix floatx80 remainder pseudo-denormal check for zero The floatx80 remainder implementation ignores the high bit of the significand when checking whether an operand (numerator) with zero exponent is zero. This means it mishandles a pseudo-denormal representation of 0x1p-16382L by treating it as zero. Fix this by checking the whole significand instead. Backports commit 499a2f7b554a295cfc10f8cd026d9b20a38fe664 from qemu	2021-02-25 13:35:17 -05:00
Joseph Myers	b08d204a37	softfloat: merge floatx80_mod and floatx80_rem The m68k-specific softfloat code includes a function floatx80_mod that is extremely similar to floatx80_rem, but computing the remainder based on truncating the quotient toward zero rather than rounding it to nearest integer. This is also useful for emulating the x87 fprem and fprem1 instructions. Change the floatx80_rem implementation into floatx80_modrem that can perform either operation, with both floatx80_rem and floatx80_mod as thin wrappers available for all targets. There does not appear to be any use for the _mod operation for other floating-point formats in QEMU (the only other architectures using _rem at all are linux-user/arm/nwfpe, for FPA emulation, and openrisc, for instructions that have been removed in the latest version of the architecture), so no change is made to the code for other formats. Backports commit 6b8b0136ab3018e4b552b485f808bf66bcf19ead from qemu	2021-02-25 13:34:05 -05:00
Philippe Mathieu-Daudé	4465ff9c93	fpu/softfloat: Silence 'bitwise negation of boolean expression' warning When building with clang version 10.0.0-4ubuntu1, we get: CC lm32-softmmu/fpu/softfloat.o fpu/softfloat.c:3365:13: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation] absZ &= ~ ( ( ( roundBits ^ 0x40 ) == 0 ) & roundNearestEven ); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fpu/softfloat.c:3423:18: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation] absZ0 &= ~ ( ( (uint64_t) ( absZ1<<1 ) == 0 ) & roundNearestEven ); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... fpu/softfloat.c:4273:18: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation] zSig1 &= ~ ( ( zSig2 + zSig2 == 0 ) & roundNearestEven ); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix by rewriting the fishy bitwise AND of two bools as an int. Backports commit 4066288694c3bdd175df813cad675a3b5191956b from qemu	2020-06-18 23:56:27 -04:00
Richard Henderson	22004b8106	softfloat: Return bool from all classification predicates This includes _is_any_nan, _is_neg, *_is_inf, etc. Backports commit 150c7a91ce7862bcaf7422f6038dcf0ba4a7eee3 from qemu	2020-05-21 18:23:11 -04:00
Richard Henderson	afd8d05aa2	softfloat: Inline floatx80 compare specializations Replace the floatx80 compare specializations with inline functions that call the standard floatx80_compare{,_quiet} functions. Use bool as the return type. Backports commit c6baf65000f826a713e8d9b5b35e617b0ca9ab5d from qemu	2020-05-21 18:17:53 -04:00
Richard Henderson	57d2419cd3	softfloat: Inline float128 compare specializations Replace the float128 compare specializations with inline functions that call the standard float128_compare{,_quiet} functions. Use bool as the return type. Backports commit b7b1ac684fea49c6bfe1ad8b706aed7b09116d15 from qemu	2020-05-21 18:15:55 -04:00
Richard Henderson	18a46c4d79	softfloat: Inline float64 compare specializations Replace the float64 compare specializations with inline functions that call the standard float64_compare{,_quiet} functions. Use bool as the return type. Backports commit 0673ecdf6cb2b1445a85283db8cbacb251c46516 from qemu	2020-05-21 18:13:44 -04:00
Richard Henderson	a35333741a	softfloat: Inline float32 compare specializations Replace the float32 compare specializations with inline functions that call the standard float32_compare{,_quiet} functions. Use bool as the return type. Backports commit 5da2d2d8e53d80e92a61720ea995c86b33cbf25d from qemu	2020-05-21 18:11:25 -04:00
Richard Henderson	d960523cbd	softfloat: Name compare relation enum Give the previously unnamed enum a typedef name. Use it in the prototypes of compare functions. Use it to hold the results of the compare functions. Backports commit 71bfd65c5fcd72f8af2735905415c7ce4220f6dc from qemu	2020-05-21 18:08:52 -04:00
Richard Henderson	8adc704058	softfloat: Name rounding mode enum Give the previously unnamed enum a typedef name. Use the packed attribute so that we do not affect the layout of the float_status struct. Use it in the prototypes of relevant functions. Adjust switch statements as necessary to avoid compiler warnings. Backports commit 3dede407cc61b64997f0c30f6dbf4df09949abc9 from qemu	2020-05-21 18:02:05 -04:00
Richard Henderson	a5c8178e35	softfloat: Change tininess_before_rounding to bool Slightly tidies the usage within softfloat.c and the representation in float_status. Backports commit a828b373bdabc7e53d1e218e3fc76f85b6674688 from qemu	2020-05-21 17:52:50 -04:00
Richard Henderson	a417227674	softfloat: Replace flag with bool We have had this on the to-do list for quite some time. Backports commit c120391c0090d9c40425c92cdb00f38ea8588ff6 from qemu	2020-05-21 17:48:12 -04:00
Richard Henderson	6530d6342f	softfloat: Use post test for floatN_mul The existing f{32,64}_addsub_post test, which checks for zero inputs, is identical to f{32,64}_mul_fast_test. Which means we can eliminate the fast_test/fast_op hooks in favor of reusing the same post hook. This means we have one fewer test along the fast path for multiply. Backports commit b240c9c497b9880ac0ba29465907d5ebecd48083 from qemu	2020-05-21 17:24:00 -04:00
Joseph Myers	c675454b27	softfloat: fix floatx80 pseudo-denormal round to integer The softfloat function floatx80_round_to_int incorrectly handles the case of a pseudo-denormal where only the high bit of the significand is set, ignoring that bit (treating the number as an exact zero) rather than treating the number as an alternative representation of +/- 2^-16382 (which may round to +/- 1 depending on the rounding mode) as hardware does. Fix this check (simplifying the code in the process). Backports commit 9ecaf5ccec13ff2e8fe1e72f6e0f3367d2169c1c from qemu	2020-05-15 23:59:23 -04:00
Joseph Myers	3d4a7e34e1	softfloat: fix floatx80 pseudo-denormal comparisons The softfloat floatx80 comparisons fail to allow for pseudo-denormals, which should compare equal to corresponding values with biased exponent 1 rather than 0. Add an adjustment for that case when comparing numbers with the same sign. Backports commit be53fa785ab766d2722628403edee75b3e6ab599 from qemu	2020-05-15 23:58:49 -04:00
Joseph Myers	85964d48d2	softfloat: fix floatx80 pseudo-denormal addition / subtraction The softfloat function addFloatx80Sigs, used for addition of values with the same sign and subtraction of values with opposite sign, fails to handle the case where the two values both have biased exponent zero and there is a carry resulting from adding the significands, which can occur if one or both values are pseudo-denormals (biased exponent zero, explicit integer bit 1). Add a check for that case, so making the results match those seen on x86 hardware for pseudo-denormals. Backports commit 41602807766e253ccb6fb761f3ff12767f786e2c from qemu	2020-05-15 23:56:24 -04:00
Joseph Myers	2ea23a5bbd	softfloat: silence sNaN for conversions to/from floatx80 Conversions between IEEE floating-point formats should convert signaling NaNs to quiet NaNs. Most of those in QEMU's softfloat code do so, but those for floatx80 fail to. Fix those conversions to silence signaling NaNs as well. Backports commit 7537c2b4a363237534c96d089a02b0712b49d890 from qemu	2020-05-15 23:54:32 -04:00
Richard Henderson	3e934b99c8	softfloat: Fix BAD_SHIFT from normalizeFloatx80Subnormal All other calls to normalize*Subnormal detect zero input before the call -- this is the only outlier. This case can happen with +0.0 + +0.0 = +0.0 or -0.0 + -0.0 = -0.0, so return a zero of the correct sign. Reported-by: Coverity (CID 1421991) Backports commit 2f311075b7a74124098effc72290767b02869561 from qemu	2020-04-30 07:22:57 -04:00
Alex Bennée	9d83300f3e	fpu: rename softfloat-specialize.h -> .inc.c This is not a normal header and should only be included in the main softfloat.c file to bring in the various target specific specialisations. Indeed as it contains non-inlined C functions it is not even a legal header. Rename it to match our included C convention. Backports commit 00f43279a3e5e7ea3a0fa853157863663e838e2e from qemu	2019-11-18 21:12:30 -05:00
Alex Bennée	dbddafe2df	fpu: replace LIT64 with UINT64_C macros In our quest to eliminate the home rolled LIT64 macro we fixup usage inside the softfloat code. While we are at it we remove some of the extraneous spaces to closer fit the house style. Backports commit e932112420f063776f2b9d9e5512830cd6890a7a from qemu	2019-11-18 20:57:12 -05:00
Alex Bennée	6eb3c9ee79	fpu: use min/max values from stdint.h for integral overflow Remove some more use of LIT64 while making the meaning more clear. We also avoid the need of casts as the results by definition fit into the return type. Backports commit 2c217da0fc9f1127bda804e2a500b8138b02c581 from qemu	2019-11-18 20:45:40 -05:00
Alex Bennée	0d573763c9	fpu: convert float[16/32/64]_squash_denormal to new modern style This also allows us to remove the extractFloat16exp/frac helpers. We avoid using the floatXX_pack_raw functions as they are slight overkill for masking out all but the top bit of the number. The generated code is almost exactly the same as makes no difference to the pre-conversion code. Backports commit e6b405fe00d8e6424a58492b37a1656d1ef0929b from qemu	2019-11-18 20:42:06 -05:00
Alex Bennée	e5c799cd3c	fpu: replace LIT64 usage with UINT64_C for specialize constants We have a wrapper that does the right thing from stdint.h, so let's use it for our constants in softfloat-specialize.h Backports commit f7e81a945737631c19405a39d510d2284257c3ff from qemu	2019-11-18 20:39:03 -05:00
Lioncash	d6b706a296	qemu/fpu: Synchronize with Qemu Resolves a few formatting discrepancies	2019-03-09 18:27:31 -05:00
Richard Henderson	11679ff3cf	softfloat: Support float_round_to_odd more places Previously this was only supported for roundAndPackFloat64. New support in round_canonical, round_to_int, float128_round_to_int, roundAndPackFloat32, roundAndPackInt32, roundAndPackInt64, roundAndPackUint64. This does not include any of the floatx80 routines, as we do not have users for that rounding mode there. Backports commit 5d64abb32ffe558e616545819f3e53dd66335994 from qemu	2019-02-28 15:17:38 -05:00
David Hildenbrand	7373819b1a	softfloat: Implement float128_to_uint32 Handling it just like float128_to_uint32_round_to_zero, that hopefully is free of bugs :) Documentation basically copied from float128_to_uint64 Backports commit e45de9922e43c1ce4f4739b62142314a13029d5c from qemu	2019-02-28 15:13:09 -05:00
Thomas Huth	a7c8939b0d	include/fpu/softfloat: Fix compilation with Clang on s390x Clang v7.0.1 does not like the __int128 variable type for inline assembly on s390x: In file included from fpu/softfloat.c:97: include/fpu/softfloat-macros.h:647:9: error: inline asm error: This value type register class is not natively supported! asm("dlgr %0, %1" : "+r"(n) : "r"(d)); ^ Disable this code part there now when compiling with Clang, so that the generic code gets used instead. Backports commit 2c00542c70b9cbd6da510c97cd3d46adcf9e3efc from qemu	2019-01-24 18:37:51 -05:00
Emilio G. Cota	8276a4dc66	hardfloat: implement float32/64 comparison Performance results for fp-bench: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: cmp-single: 110.98 MFlops cmp-double: 107.12 MFlops - after: cmp-single: 506.28 MFlops cmp-double: 524.77 MFlops Note that flattening both eq and eq_signaling versions would give us extra performance (695v506, 615v524 Mflops for single/double, respectively) but this would emit two essentially identical functions for each eq/signaling pair, which is a waste. Aggregate performance improvement for the last few patches: [ all charts in png: https://imgur.com/a/4yV8p ] 1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz [chart: qemu-aarch64 NBench score; higher is better] [chart: qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905; error bars: 95% confidence interval] 2. Host: ARM Aarch64 A57 @ 2.4GHz [chart: qemu-aarch64 NBench score; higher is better; Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz]	2018-12-19 10:45:22 -05:00
Emilio G. Cota	f7549fc13e	hardfloat: implement float32/64 square root Performance results for fp-bench: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: sqrt-single: 42.30 MFlops sqrt-double: 22.97 MFlops - after: sqrt-single: 311.42 MFlops sqrt-double: 311.08 MFlops Here USE_FP makes a huge difference for f64's, with throughput going from ~200 MFlops to ~300 MFlops. Backports commit f131bae8a7b7ed1928cc94c69df291db609c316a from qemu	2018-12-19 10:43:23 -05:00
Emilio G. Cota	3cf836ca83	hardfloat: implement float32/64 fused multiply-add Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: fma-single: 74.73 MFlops fma-double: 74.54 MFlops - after: fma-single: 203.37 MFlops fma-double: 169.37 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: fma-single: 23.24 MFlops fma-double: 23.70 MFlops - after: fma-single: 66.14 MFlops fma-double: 63.10 MFlops 3. IBM POWER8E @ 2.1 GHz - before: fma-single: 37.26 MFlops fma-double: 37.29 MFlops - after: fma-single: 48.90 MFlops fma-double: 59.51 MFlops Here having 3FP64 set to 1 pays off for x86_64: [1] 170.15 vs [0] 153.12 MFlops Backports commit ccf770ba7396c240ca8a1564740083742dd04c08 from qemu	2018-12-19 10:42:00 -05:00
Emilio G. Cota	95781d2bb5	hardfloat: implement float32/64 division Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: div-single: 34.84 MFlops div-double: 34.04 MFlops - after: div-single: 275.23 MFlops div-double: 216.38 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: div-single: 9.33 MFlops div-double: 9.30 MFlops - after: div-single: 51.55 MFlops div-double: 15.09 MFlops 3. IBM POWER8E @ 2.1 GHz - before: div-single: 25.65 MFlops div-double: 24.91 MFlops - after: div-single: 96.83 MFlops div-double: 31.01 MFlops Here setting 2FP64_USE_FP to 1 pays off for x86_64: [1] 215.97 vs [0] 62.15 MFlops Backports commit 4a6295613f533a6841de5968c50e1ca36748807e from qemu	2018-12-19 10:40:00 -05:00
Emilio G. Cota	93991714fb	hardfloat: implement float32/64 multiplication Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: mul-single: 126.91 MFlops mul-double: 118.28 MFlops - after: mul-single: 258.02 MFlops mul-double: 197.96 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: mul-single: 37.42 MFlops mul-double: 38.77 MFlops - after: mul-single: 73.41 MFlops mul-double: 76.93 MFlops 3. IBM POWER8E @ 2.1 GHz - before: mul-single: 58.40 MFlops mul-double: 59.33 MFlops - after: mul-single: 60.25 MFlops mul-double: 94.79 MFlops Backports commit 2dfabc86e656e835c67954c60e143ecd33e15817 from qemu	2018-12-19 10:38:33 -05:00
Emilio G. Cota	0862d9c462	hardfloat: implement float32/64 addition and subtraction Performance results (single and double precision) for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: add-single: 135.07 MFlops add-double: 131.60 MFlops sub-single: 130.04 MFlops sub-double: 133.01 MFlops - after: add-single: 443.04 MFlops add-double: 301.95 MFlops sub-single: 411.36 MFlops sub-double: 293.15 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: add-single: 44.79 MFlops add-double: 49.20 MFlops sub-single: 44.55 MFlops sub-double: 49.06 MFlops - after: add-single: 93.28 MFlops add-double: 88.27 MFlops sub-single: 91.47 MFlops sub-double: 88.27 MFlops 3. IBM POWER8E @ 2.1 GHz - before: add-single: 72.59 MFlops add-double: 72.27 MFlops sub-single: 75.33 MFlops sub-double: 70.54 MFlops - after: add-single: 112.95 MFlops add-double: 201.11 MFlops sub-single: 116.80 MFlops sub-double: 188.72 MFlops Note that the IBM and ARM machines benefit from having HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance can suffer significantly: - IBM Power8: add-single: [1] 54.94 vs [0] 116.37 MFlops add-double: [1] 58.92 vs [0] 201.44 MFlops - Aarch64 A57: add-single: [1] 80.72 vs [0] 93.24 MFlops add-double: [1] 82.10 vs [0] 88.18 MFlops On the Intel machine, having 2F64 set to 1 pays off, but it doesn't for 2F32: - Intel i7-6700K: add-single: [1] 285.79 vs [0] 426.70 MFlops add-double: [1] 302.15 vs [0] 278.82 MFlops Backports commit 1b615d482094e0123d187f0ad3c676ba8eb9d0a3 from qemu	2018-12-19 10:36:55 -05:00
Emilio G. Cota	bca8e39e3c	fpu: introduce hardfloat The appended paves the way for leveraging the host FPU for a subset of guest FP operations. For most guest workloads (e.g. FP flags aren't ever cleared, inexact occurs often and rounding is set to the default [to nearest]) this will yield sizable performance speedups. The approach followed here avoids checking the FP exception flags register. See the added comment for details. This assumes that QEMU is running on an IEEE754-compliant FPU and that the rounding is set to the default (to nearest). The implementation-dependent specifics of the FPU should not matter; things like tininess detection and snan representation are still dealt with in soft-fp. However, this approach will break on most hosts if we compile QEMU with flags that break IEEE compatibility. There is no way to detect all of these flags at compilation time, but at least we check for -ffast-math (which defines __FAST_MATH__) and disable hardfloat (plus emit a #warning) when it is set. This patch just adds common code. Some operations will be migrated to hardfloat in subsequent patches to ease bisection. Note: some architectures (at least PPC, there might be others) clear the status flags passed to softfloat before most FP operations. This precludes the use of hardfloat, so to avoid introducing a performance regression for those targets, we add a flag to disable hardfloat. In the long run though it would be good to fix the targets so that at least the inexact flag passed to softfloat is indeed sticky. Backports commit a94b783952cc493cb241aabb1da8c7a830385baa from qemu	2018-12-19 10:32:32 -05:00
Emilio G. Cota	a9d9005399	softfloat: rename canonicalize to sf_canonicalize glibc >= 2.25 defines canonicalize in commit eaf5ad0 (Add canonicalize, canonicalizef, canonicalizel., 2016-10-26). Given that we'll be including <math.h> soon, prepare for this by prefixing our canonicalize() with sf_ to avoid clashing with the libc's canonicalize(). Backports commit f9943c7f766678af36d31076b78e466256f4871b from qemu	2018-12-19 10:30:38 -05:00
Richard Henderson	834514c676	softfloat: Don't execute divdeu without power7 The divdeu instruction was added to ISA 2.06 (Power7). Exclude this block from older cpus. Fixes: 27ae5109a2ba (softfloat: Specialize udiv_qrnnd for ppc64) Backports commit 7370981bd1ef58b3c20ba8b83cc342d1c61bc773 from qemu	2018-11-11 08:33:46 -05:00
Richard Henderson	e54a2b65c0	softfloat: Specialize udiv_qrnnd for ppc64 The ISA has a 128/64-bit division instruction, though it assumes the low 64-bits of the numerator are 0, and so requires a bit more fixup than a full 128-bit division insn. Backports commit 27ae5109a2ba8b6b679cce3e03e16570a34390a0 from qemu	2018-10-08 14:15:15 -04:00
Richard Henderson	0fd568871e	softfloat: Specialize udiv_qrnnd for s390x The ISA has a 128/64-bit division instruction. Backports commit 739df333dc8853ae6578492675a26a601d6be077 from qemu	2018-10-08 14:15:15 -04:00
Richard Henderson	8de4da1475	softfloat: Specialize udiv_qrnnd for x86_64 The ISA has a 128/64-bit division instruction. Backports commit b299e88d4261b0af30190e74005ad930e04f3a11 from qemu	2018-10-08 14:15:15 -04:00
Richard Henderson	90fdf9b598	softfloat: Fix division The __udiv_qrnnd primitive that we nicked from gmp requires its inputs to be normalized. We were not doing that. Because the inputs are nearly normalized already, finishing that is trivial. Replace div128to64 with a "proper" udiv_qrnnd, so that this remains a reusable primitive. Fixes: cf07323d494 Fixes: https://bugs.launchpad.net/qemu/+bug/1793119 Backports commit 5dfbc9e4903c0121140f2945f05df48cea72dd82 from qemu	2018-10-08 14:15:15 -04:00


140 commits