unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-08-05 08:51:10 +00:00

Author	SHA1	Message	Date
Richard Henderson	8ef39cc2d5	target/arm: Add state for the ARMv8.3-PAuth extension Add storage space for the 5 encryption keys. Backports commit 991ad91b6a1f09a6ad62b6e6da78d83b548daec7 from qemu	2019-01-22 15:11:39 -05:00
Alexander Graf	f2682ff309	target/arm: Allow Aarch32 exception return to switch from Mon->Hyp In U-boot, we switch from S-SVC -> Mon -> Hyp mode when we want to enter Hyp mode. The change into Hyp mode is done by doing an exception return from Mon. This doesn't work with current QEMU. The problem is that in bad_mode_switch() we refuse to allow the change of mode. Note that bad_mode_switch() is used to do validation for two situations: (1) changes to mode by instructions writing to CPSR.M (ie not exception take/return) -- this corresponds to the Armv8 Arm ARM pseudocode Arch32.WriteModeByInstr (2) changes to mode by exception return Attempting to enter or leave Hyp mode via case (1) is forbidden in v8 and UNPREDICTABLE in v7, and QEMU is correct to disallow it there. However, we're already doing that check at the top of the bad_mode_switch() function, so if that passes then we should allow the case (2) exception return mode changes to switch into Hyp mode. We want to test whether we're trying to return to the nonexistent "secure Hyp" mode, so we need to look at arm_is_secure_below_el3() rather than arm_is_secure(), since the latter is always true if we're in Mon (EL3). Backports commit 2d2a4549cc29850aab891495685a7b31f5254b12 from qemu	2019-01-22 15:09:30 -05:00
Cleber Rosa	38ca341aeb	configure: keep track of Python version Some functionality is dependent on the Python version detected/configured on configure. While it's possible to run the Python version later and check for the version, doing it once is preferable. Also, it's a relevant information to keep in build logs, as the overall behavior of the build can be affected by it. Backports commit 755ee70ff758584b8b6190b2cab4b480402af201 from qemu	2019-01-22 15:07:59 -05:00
Philippe Mathieu-Daudé	24c56c65a3	qemu/compiler: Define QEMU_NONSTRING GCC 8 introduced the -Wstringop-truncation checker to detect truncation by the strncat and strncpy functions (closely related to -Wstringop-overflow, which detects buffer overflow by string-modifying functions declared in <string.h>). In tandem with -Wstringop-truncation, the "nonstring" attribute was added: The nonstring variable attribute specifies that an object or member declaration with type array of char, signed char, or unsigned char, or pointer to such a type is intended to store character arrays that do not necessarily contain a terminating NUL. This is useful in detecting uses of such arrays or pointers with functions that expect NUL-terminated strings, and to avoid warnings when such an array or pointer is used as an argument to a bounded string manipulation function such as strncpy. From the GCC manual: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-nonstring-variable-attribute Add the QEMU_NONSTRING macro which checks if the compiler supports this attribute. Backports commit 1daff2f8193496b0e5e0ab56dc48c570c81f804e from qemu	2019-01-22 15:06:09 -05:00
Vitaly Kuznetsov	b6cc2c4e06	i386/kvm: add a comment explaining why .feat_names are commented out for Hyper-V feature bits Hyper-V .feat_names are, unlike hardware features, commented out and it is not obvious why we do that. Document the current status quo. Backports commit abd5fc4c862d033a989552914149f01c9476bb16 from qemu	2019-01-14 15:02:35 -05:00
Vitaly Kuznetsov	2873612479	i386/kvm: expose HV_CPUID_ENLIGHTMENT_INFO.EAX and HV_CPUID_NESTED_FEATURES.EAX as feature words It was found that QMP users of QEMU (e.g. libvirt) may need HV_CPUID_ENLIGHTMENT_INFO.EAX/HV_CPUID_NESTED_FEATURES.EAX information. In particular, 'hv_tlbflush' and 'hv_evmcs' enlightenments are only exposed in HV_CPUID_ENLIGHTMENT_INFO.EAX. HV_CPUID_NESTED_FEATURES.EAX is exposed for two reasons: convenience (we don't need to export it from hyperv_handle_properties()) and future-proofing for Enlightened MSR-Bitmap, PV EPT invalidation and direct virtual flush features. Backports commit a2b107dbbd342ff2077aa5af705efaf68c375459 from qemu	2019-01-14 15:01:13 -05:00
Eduardo Habkost	3eb700bec7	x86: host-phys-bits-limit option Backports part of commit 258fe08bd341d2e230676228307294e41f33002c from qemu. Namely, just adding the struct member.	2019-01-14 14:56:57 -05:00
Paolo Bonzini	bf6192276b	target/i386: Disable MPX support on named CPU models MPX support is being phased out by Intel; GCC has dropped it, Linux is also going to do that. Even though KVM will have special code to support MPX after the kernel proper stops enabling it in XCR0, we probably also want to deprecate that in a few years. As a start, do not enable it by default for any named CPU model starting with the 4.0 machine types; this includes Skylake, Icelake and Cascadelake. Backports commit ecb85fe48cacb2f8740186e81f2f38a2e02bd963 from qemu	2019-01-14 14:54:40 -05:00
Borislav Petkov	152fdb49de	target-i386: Reenable RDTSCP support on Opteron_G[345] CPU models The missing functionality was added ~3 years ago with the Linux commit 46896c73c1a4 ("KVM: svm: add support for RDTSCP") so reenable RDTSCP support on those CPU models. Opteron_G2 - being family 15, model 6, doesn't have RDTSCP support (the real hardware doesn't have it. K8 got RDTSCP support with the NPT models, i.e., models >= 0x40). Document the host's minimum required kernel version, while at it. Backports commit 483c6ad426dbab72d912fe4793d7d558671aa727 from qemu	2019-01-14 14:50:22 -05:00
Marc-André Lureau	585ebf50f7	build-sys: build with Vista API by default Both qemu & qga build with Vista API by default already, by defining _WIN32_WINNT 0x0600. Set it globally in osdep.h instead. This replaces WINVER by _WIN32_WINNT in osdep.h. WINVER doesn't seem to be really useful these days. (see also https://blogs.msdn.microsoft.com/oldnewthing/20070411-00/?p=27283) Backports commit 56cdca1d7a6a9c8ce28287b8c986ac9ea87ba603 from qemu	2019-01-13 20:28:51 -05:00
Marc-André Lureau	9ff8b70682	build-sys: move windows defines in osdep.h header This removes some clutter in compilation logging, and allows some easier tweaking per compilation unit/CFLAGS overriding. Note that we can't move those define in os-win32.h, since they must be set before the first system headers are included. Backports commit 007e722c349839f430f10639ba8c94fe43acfe50 from qemu	2019-01-13 20:27:27 -05:00
Roman Bolshakov	03beb4f15a	qemu-thread: Don't block SEGV, ILL and FPE If any of these signals happen on macOS, they are not delivered to other threads and signalfd_compat receives nothing. Indeed, POSIX reference and sigprocmask(2) note that an attempt to block the signals results in undefined behaviour. SEGV and FPE also can't be received by signalfd(2) on Linux. An ability to retrieve SIGBUS via signalfd(2) is used by QEMU for memory preallocation therefore we can't unblock it without consequences. But it's important to leave a remark that the signal is lost on macOS. Backports commit 21a43af0f18335af4abb1959aa28ee9d159a2d43 from qemu	2019-01-13 19:50:32 -05:00
Peter Maydell	55bc017af4	target/arm: Emit barriers for A32/T32 load-acquire/store-release insns Now that MTTCG is here, the comment in the 32-bit Arm decoder that "Since the emulation does not have barriers, the acquire/release semantics need no special handling" is no longer true. Emit the correct barriers for the load-acquire/store-release insns, as we already do in the A64 decoder. Backports commit 96c552958dbb63453b5f02bea6e704006d50e39a from qemu	2019-01-13 19:48:27 -05:00
Richard Henderson	254f882efc	target/arm: SVE brk[ab] merging does not have s bit While brk[ab] zeroing has a flags setting option, the merging variant does not. Retain the same argument structure, to share expansion but force the flag zero and do not decode bit 22. Backports commit 407e6ce7f1f428cb242d424cd35381a77b5b2071 from qemu	2019-01-13 19:39:34 -05:00
Richard Henderson	4d8b7a9967	target/arm: Convert ARM_TBFLAG_* to FIELDs Use "register" TBFLAG_ANY to indicate shared state between A32 and A64, and "registers" TBFLAG_A32 & TBFLAG_A64 for fields that are specific to the given cpu state. Move ARM_TBFLAG_BE_DATA to shared state, instead of its current placement within "Bit usage when in AArch32 state". Backports commit aad821ac4faad369fad8941d25e59edf2514246b from qemu	2019-01-13 19:21:18 -05:00
Fredrik Noring	ee4b59e981	target/mips: Support R5900 three-operand MADD1 and MADDU1 instructions The three-operand MADD and MADDU are specific to R5900 cores. Backports commit a95c4c26f1dc233987350e7cb1cf62d46ade5ce5 from qemu	2019-01-05 08:07:56 -05:00
Philippe Mathieu-Daudé	76bc93690f	target/mips: Support R5900 three-operand MADD and MADDU instructions The three-operand MADD and MADDU are specific to Sony R5900 core, and Toshiba TX19/TX39/TX79 cores as well. The "32-Bit TX System RISC TX39 Family Architecture manual" is available at https://wiki.qemu.org/File:DSAE0022432.pdf Backports commit 3b948f053fc588154d95228da8a6561c61c66104 from qemu	2019-01-05 08:03:43 -05:00
Aleksandar Markovic	5729c803a7	target/mips: MXU: Add handler for an align instruction Add translation handler for S32ALNI MXU instruction. Backports commit 79f5fee7a3c53494c7ca4bc18c72944f5e2d5c2f from qemu	2019-01-05 08:00:09 -05:00
Aleksandar Markovic	94956d81f6	target/mips: MXU: Add handlers for max/min instructions Add translation handlers for six max/min MXU instructions. Backports commit bb84cbf38505bd1d800fdddcd81407a99e5c2142 from qemu	2019-01-05 07:55:39 -05:00
Aleksandar Markovic	bf7da7bf57	target/mips: MXU: Add handlers for logic instructions Add translation handlers for four logic MXU instructions. It should be noted that there is an error in MXU documentation (dated June 2017) regarding opcodes for this group of instructions. This was confirmed by running tests on hardware, and also by looking up other related public source trees (binutils, Android NDK). In initial MXU patches to QEMU, opcodes for MXU logic instructions were created to be in accordance with the MXU documentation, therefore the error from the documentation was propagated. This patch corrects that, changing the involved code. Besides that, as MXU was designed and implemented only for 32-bit CPUs, corresponding preprocessor conditions were added around MXU code, which allows more flexible implementation of MXU handlers. Backports commit b621f0187ef789aeef733cf79e5ac83984752394 from qemu	2019-01-05 07:48:08 -05:00
Aleksandar Markovic	ba253dd0d3	target/mips: MXU: Improve the comment containing MXU overview Improve textual description of MXU extension. These are mostly comment formatting changes. Backports commit 84e2c895b12fb7056daeb7e5094656eae7b50d3d from qemu	2019-01-05 07:39:47 -05:00
Aleksandar Markovic	57bb979ce8	target/mips: MXU: Add generic naming for optn2 constants Add generic naming involving generic suffixes OPTN0, OPTN1, OPTN2, OPTN3 for four optn2 constants. Existing suffixes WW, LW, HW, XW are not quite appropriate for some instructions using optn2.	2019-01-05 07:35:49 -05:00
Aleksandar Markovic	b5e1ea2e08	target/mips: MXU: Add missing opcodes/decoding for LX* instructions Add missing opcodes and decoding engine for LXB, LXH, LXW, LXBU, and LXHU instructions. They were for some reason forgotten in previous commits. The MXU opcode list and decoding engine should be now complete. Backports commit c233bf07af7cf2358b69c38150dbd2e3e4a399b6 from qemu	2019-01-05 07:34:07 -05:00
Paul Burton	1c6732b053	atomics: Set ATOMIC_REG_SIZE=8 for MIPS n32 ATOMIC_REG_SIZE is currently defined as the default sizeof(void *) for all MIPS host builds, including those using the n32 ABI. n32 is the MIPS64 ILP32 ABI and as such tcg/mips/tcg-target.h defines TCG_TARGET_REG_BITS as 64 for n32 builds. If we attempt to build QEMU for an n32 host with support for a 64b target architecture then TCG_OVERSIZED_GUEST is 0 and accel/tcg/cputlb.c attempts to use atomic_* functions. This fails because ATOMIC_REG_SIZE is 4, causing the calls to QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE) in the various atomic_* functions to generate errors. Fix this by defining ATOMIC_REG_SIZE as 8 for all MIPS64 builds, which will cover both n32 (ILP32) & n64 (LP64) ABIs in much the same way as we already do for x86_64/x32. Backports commit c5b00c1684f3317e887c7401b58dde54c2b05354 from qemu	2019-01-05 07:26:14 -05:00
Richard Henderson	6ed82f77b4	tcg: Improve call argument loading Free the argument register only after we have verified that the temporary is not already in that register. This case is likely now that we are back propagating the preferred register. Backports commit 4250da10923347c9ee907f8d72bd93dfa5ee8742 from qemu	2019-01-05 07:24:08 -05:00
Richard Henderson	64843e8c09	tcg: Record register preferences during liveness With these preferences, we can arrange for function call arguments to be computed into the proper registers instead of requiring extra moves. Backports commit 25f49c5f1508ddf081ce89fa6bbfd87a51eea37b from qemu	2019-01-05 07:22:57 -05:00
Richard Henderson	c2be1cee79	tcg: Add TCG_OPF_BB_EXIT Use this to notice the opcodes that exit the TB, which implies that local temps are really dead and need not be synced. Previously we so marked the true end of the TB, but that was immediately overwritten by the la_bb_end invoked by any TCG_OPF_BB_END opcode, like exit_tb. Backports commit ae36a246ed1a0e96c6c4f478f03d047dfa3a8898 from qemu	2019-01-05 07:09:38 -05:00
Richard Henderson	63cf164724	tcg: Split out more subroutines from liveness_pass_1 Backports commit f65a061c39cc4f9d088201031050e42eb23d5b2a from qemu	2019-01-05 07:07:49 -05:00
Richard Henderson	c348ceba56	tcg: Rename and adjust liveness_pass_1 helpers No need for a "tcg_" prefix for a static function; we already have another "la_" prefix for indicating liveness analysis. Pass in nb_globals and nb_temps, as we will already have them in registers for other loops within the parent function. Backports commit 2616c8082143373e794b62444bf81754f50dbf6b from qemu	2019-01-05 07:05:58 -05:00
Richard Henderson	b356212b33	tcg: Dump register preference info with liveness Backports commit 1894f69a612b35c2a39b44a824da04d74bfe324a from qemu	2019-01-05 07:00:21 -05:00
Richard Henderson	494d802781	tcg: Improve register allocation for matching constraints Try harder to honor the output_pref. When we're forced to allocate a second register for the input, it does not need to use the input constraint; that will be honored by the register we allocate for the output and a move is already required. Backports commit d62816f2db439b2dd761c674f0256f21d9dd2ed0 from qemu	2019-01-05 06:57:56 -05:00
Richard Henderson	83a7de2566	tcg: Add output_pref to TCGOp Allocate storage for, but do not yet fill in, per-opcode preferences for the output operands. Pass it in to the register allocation routines for output operands. Backports commit 69e3706d2b473815e382552e729d12590339e0ac from qemu	2019-01-05 06:54:40 -05:00
Richard Henderson	19bde1a9cf	tcg: Add preferred_reg argument to tcg_reg_alloc_do_movi Pass this through to temp_sync. Backports commit ba87719cd267e6f07b17f6cda08246bf483146d4 from qemu	2019-01-05 06:51:55 -05:00
Richard Henderson	c3aa567b03	tcg: Add preferred_reg argument to temp_sync Pass this through to tcg_reg_alloc. Backports commit 98b4e186c1ccb8f1868c61a33a3be8c2b82654f3 from qemu	2019-01-05 06:50:22 -05:00
Richard Henderson	96b6640f3b	tcg: Add preferred_reg argument to temp_load Pass this through to tcg_reg_alloc. Backports commit b722452aefb089e003b16946a4d73bad1fd3b79b from qemu	2019-01-05 06:48:19 -05:00
Richard Henderson	5e73b27607	tcg: Add preferred_reg argument to tcg_reg_alloc This new argument will aid register allocation by indicating how the temporary will be used in future. If the preference cannot be satisfied, fall back to the constraints of the current insn. Short circuit the preference when it cannot be satisfied or if it does not further constrain the operation. With an eye toward optimizing function call sequences, optimize for the preferred_reg set containing a single register. For the moment, all users pass 0 for preference. Backports commit b016486e7baddb43cfc1e51909b05cde9cf82e0c from qemu	2019-01-05 06:45:15 -05:00
Richard Henderson	6aea2880d2	tcg: Add reachable_code_pass Delete trivially dead code that follows unconditional branches and noreturn helpers. These can occur either via optimization or via the structure of a target's translator following an exception. Backports commit b4fc67c7afd2c338d6e7c73a7f428dfe05ae0603 from qemu	2019-01-05 06:41:16 -05:00
Richard Henderson	26ab4d6560	tcg: Reference count labels Increment when adding branches, and decrement when removing them. Backports commit d88a117eaa39b1d0eb1a79fe84c81840a39eb233 from qemu	2019-01-05 06:39:20 -05:00
Richard Henderson	80b4bef1cc	tcg: Add TCG_CALL_NO_RETURN Remember which helpers have been marked noreturn. Backports commit 15d7409260498505e991e7b9d87118627165e613 from qemu	2019-01-05 06:35:21 -05:00
Richard Henderson	7dbbf58653	tcg: Renumber TCG_CALL_* flags Previously, the low 4 bits were used for TCG_CALL_TYPE_MASK, which was removed in 6a18ae2d2947532d5c26439548afa0481c4529f9. Backports commit 3b50352b05eeafeb95cccd770f7aaba00bbdf6fe from qemu	2019-01-05 06:32:52 -05:00
Marc-André Lureau	ba1f54804a	qapi: fix flat union on uncovered branches conditionals Default branches variant should use the member conditional. This fixes compilation with --disable-replication. Fixes: 335d10cd8e2c3bb6067804b095aaf6371fc1983e Backports commit ce1a1aec47877a281d69dbc2e65f86bfe8fea231 from qemu	2018-12-19 10:53:29 -05:00
Lioncash	f8435ca3a6	Temporarily disable tcg_debug_assert() Backporting 6fa2cef205a60b5c5c3b058f53852416b885c455 by Thomas Huth started invoking assertions on clang. This means Unicorn is doing something silly. This should be tracked down, but in the meantime, restore behavior to allow tests to still be run.	2018-12-19 10:50:48 -05:00
Emilio G. Cota	8276a4dc66	hardfloat: implement float32/64 comparison Performance results for fp-bench: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: cmp-single: 110.98 MFlops cmp-double: 107.12 MFlops - after: cmp-single: 506.28 MFlops cmp-double: 524.77 MFlops Note that flattening both eq and eq_signaling versions would give us extra performance (695v506, 615v524 Mflops for single/double, respectively) but this would emit two essentially identical functions for each eq/signaling pair, which is a waste. Aggregate performance improvement for the last few patches: [ all charts in png: https://imgur.com/a/4yV8p ] 1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz [ASCII chart omitted: qemu-aarch64 NBench score; higher is better; groups: FOURIER, NEURAL NET, LU DECOMPOSITION, gmean] [ASCII chart omitted: qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905; error bars: 95% confidence interval] 2. Host: ARM Aarch64 A57 @ 2.4GHz [ASCII chart omitted: qemu-aarch64 NBench score; higher is better; Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz; groups: FOURIER, NEURAL NET, LU DECOMPOSITION, gmean]	2018-12-19 10:45:22 -05:00
Emilio G. Cota	f7549fc13e	hardfloat: implement float32/64 square root Performance results for fp-bench: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: sqrt-single: 42.30 MFlops sqrt-double: 22.97 MFlops - after: sqrt-single: 311.42 MFlops sqrt-double: 311.08 MFlops Here USE_FP makes a huge difference for f64's, with throughput going from ~200 MFlops to ~300 MFlops. Backports commit f131bae8a7b7ed1928cc94c69df291db609c316a from qemu	2018-12-19 10:43:23 -05:00
Emilio G. Cota	3cf836ca83	hardfloat: implement float32/64 fused multiply-add Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: fma-single: 74.73 MFlops fma-double: 74.54 MFlops - after: fma-single: 203.37 MFlops fma-double: 169.37 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: fma-single: 23.24 MFlops fma-double: 23.70 MFlops - after: fma-single: 66.14 MFlops fma-double: 63.10 MFlops 3. IBM POWER8E @ 2.1 GHz - before: fma-single: 37.26 MFlops fma-double: 37.29 MFlops - after: fma-single: 48.90 MFlops fma-double: 59.51 MFlops Here having 3FP64 set to 1 pays off for x86_64: [1] 170.15 vs [0] 153.12 MFlops Backports commit ccf770ba7396c240ca8a1564740083742dd04c08 from qemu	2018-12-19 10:42:00 -05:00
Emilio G. Cota	95781d2bb5	hardfloat: implement float32/64 division Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: div-single: 34.84 MFlops div-double: 34.04 MFlops - after: div-single: 275.23 MFlops div-double: 216.38 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: div-single: 9.33 MFlops div-double: 9.30 MFlops - after: div-single: 51.55 MFlops div-double: 15.09 MFlops 3. IBM POWER8E @ 2.1 GHz - before: div-single: 25.65 MFlops div-double: 24.91 MFlops - after: div-single: 96.83 MFlops div-double: 31.01 MFlops Here setting 2FP64_USE_FP to 1 pays off for x86_64: [1] 215.97 vs [0] 62.15 MFlops Backports commit 4a6295613f533a6841de5968c50e1ca36748807e from qemu	2018-12-19 10:40:00 -05:00
Emilio G. Cota	93991714fb	hardfloat: implement float32/64 multiplication Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: mul-single: 126.91 MFlops mul-double: 118.28 MFlops - after: mul-single: 258.02 MFlops mul-double: 197.96 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: mul-single: 37.42 MFlops mul-double: 38.77 MFlops - after: mul-single: 73.41 MFlops mul-double: 76.93 MFlops 3. IBM POWER8E @ 2.1 GHz - before: mul-single: 58.40 MFlops mul-double: 59.33 MFlops - after: mul-single: 60.25 MFlops mul-double: 94.79 MFlops Backports commit 2dfabc86e656e835c67954c60e143ecd33e15817 from qemu	2018-12-19 10:38:33 -05:00
Emilio G. Cota	0862d9c462	hardfloat: implement float32/64 addition and subtraction Performance results (single and double precision) for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: add-single: 135.07 MFlops add-double: 131.60 MFlops sub-single: 130.04 MFlops sub-double: 133.01 MFlops - after: add-single: 443.04 MFlops add-double: 301.95 MFlops sub-single: 411.36 MFlops sub-double: 293.15 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: add-single: 44.79 MFlops add-double: 49.20 MFlops sub-single: 44.55 MFlops sub-double: 49.06 MFlops - after: add-single: 93.28 MFlops add-double: 88.27 MFlops sub-single: 91.47 MFlops sub-double: 88.27 MFlops 3. IBM POWER8E @ 2.1 GHz - before: add-single: 72.59 MFlops add-double: 72.27 MFlops sub-single: 75.33 MFlops sub-double: 70.54 MFlops - after: add-single: 112.95 MFlops add-double: 201.11 MFlops sub-single: 116.80 MFlops sub-double: 188.72 MFlops Note that the IBM and ARM machines benefit from having HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance can suffer significantly: - IBM Power8: add-single: [1] 54.94 vs [0] 116.37 MFlops add-double: [1] 58.92 vs [0] 201.44 MFlops - Aarch64 A57: add-single: [1] 80.72 vs [0] 93.24 MFlops add-double: [1] 82.10 vs [0] 88.18 MFlops On the Intel machine, having 2F64 set to 1 pays off, but it doesn't for 2F32: - Intel i7-6700K: add-single: [1] 285.79 vs [0] 426.70 MFlops add-double: [1] 302.15 vs [0] 278.82 MFlops Backports commit 1b615d482094e0123d187f0ad3c676ba8eb9d0a3 from qemu	2018-12-19 10:36:55 -05:00
Emilio G. Cota	bca8e39e3c	fpu: introduce hardfloat The appended paves the way for leveraging the host FPU for a subset of guest FP operations. For most guest workloads (e.g. FP flags aren't ever cleared, inexact occurs often and rounding is set to the default [to nearest]) this will yield sizable performance speedups. The approach followed here avoids checking the FP exception flags register. See the added comment for details. This assumes that QEMU is running on an IEEE754-compliant FPU and that the rounding is set to the default (to nearest). The implementation-dependent specifics of the FPU should not matter; things like tininess detection and snan representation are still dealt with in soft-fp. However, this approach will break on most hosts if we compile QEMU with flags that break IEEE compatibility. There is no way to detect all of these flags at compilation time, but at least we check for -ffast-math (which defines __FAST_MATH__) and disable hardfloat (plus emit a #warning) when it is set. This patch just adds common code. Some operations will be migrated to hardfloat in subsequent patches to ease bisection. Note: some architectures (at least PPC, there might be others) clear the status flags passed to softfloat before most FP operations. This precludes the use of hardfloat, so to avoid introducing a performance regression for those targets, we add a flag to disable hardfloat. In the long run though it would be good to fix the targets so that at least the inexact flag passed to softfloat is indeed sticky. Backports commit a94b783952cc493cb241aabb1da8c7a830385baa from qemu	2018-12-19 10:32:32 -05:00
Emilio G. Cota	5d3ccde625	softfloat: add float{32,64}_is_zero_or_normal These will gain some users very soon. Backports commit 315df0d193929b167b9d7be4665d5f2c0e2427e0 from qemu	2018-12-19 10:31:10 -05:00

5453 commits