Performance results for fp-bench:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
sqrt-single: 42.30 MFlops
sqrt-double: 22.97 MFlops
- after:
sqrt-single: 311.42 MFlops
sqrt-double: 311.08 MFlops
Here USE_FP makes a huge difference for f64's, with throughput
going from ~200 MFlops to ~300 MFlops.
Backports commit f131bae8a7b7ed1928cc94c69df291db609c316a from qemu
The appended paves the way for leveraging the host FPU for a subset
of guest FP operations. For most guest workloads (e.g. FP flags
aren't ever cleared, inexact occurs often and rounding is set to the
default [to nearest]) this will yield sizable performance speedups.
The approach followed here avoids checking the FP exception flags register.
See the added comment for details.
This assumes that QEMU is running on an IEEE754-compliant FPU and
that the rounding is set to the default (to nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and snan representation are still dealt with in
soft-fp. However, this approach will break on most hosts if we compile
QEMU with flags that break IEEE compatibility. There is no way to detect
all of these flags at compilation time, but at least we check for
-ffast-math (which defines __FAST_MATH__) and disable hardfloat
(plus emit a #warning) when it is set.
This patch just adds common code. Some operations will be migrated
to hardfloat in subsequent patches to ease bisection.
Note: some architectures (at least PPC, there might be others) clear
the status flags passed to softfloat before most FP operations. This
precludes the use of hardfloat, so to avoid introducing a performance
regression for those targets, we add a flag to disable hardfloat.
In the long run though it would be good to fix the targets so that
at least the inexact flag passed to softfloat is indeed sticky.
Backports commit a94b783952cc493cb241aabb1da8c7a830385baa from qemu
glibc >= 2.25 defines canonicalize in commit eaf5ad0
(Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).
Given that we'll be including <math.h> soon, prepare
for this by prefixing our canonicalize() with sf_ to avoid
clashing with the libc's canonicalize().
Backports commit f9943c7f766678af36d31076b78e466256f4871b from qemu
Change the order in which we extract a/b and c/d to
match the output of the upstream xxhash32.
Tested with:
https://github.com/cota/xxhash/tree/qemu
Backports commit b7c2cd08a6f68010ad27c9c0bf2fde02fb743a0e from qemu
For now, defined universally as true, since we previously required
backends to implement swapped memory operations. Future patches
may now remove that support where it is onerous.
Backports commit e1dcf3529d0797b25bb49a20e94b62eb93e7276a from qemu
Somehow we forgot these operations, once upon a time.
This will allow immediate stores to have their bswap
optimized away.
Backports commit 6498594c8eda83c5f5915afc34bd03396f8de6df from qemu
Based on the only current user, Sparc:
New code uses 2 constants that take 2 insns to load from constant pool,
plus 13. Old code used 6 constants that took 1 or 2 insns to create,
plus 21. The result is a new total of 17 vs an old total of 29.
Backports commit 9e821eab0ab708add35fa0446d880086e845ee3e from qemu
Based on the only current user, Sparc:
New code uses 1 constant that takes 2 insns to create, plus 8.
Old code used 2 constants that took 2 insns to create, plus 9.
The result is a new total of 10 vs an old total of 13.
Backports commit a686dc71d89b1d7934becd95c843aa1375cdb7e7 from qemu
We now have an invariant that all TCG_TYPE_I32 values are
zero-extended, which means that we do not need to extend
them again during qemu_ld/st, either explicitly via a separate
tcg_out_ext32u or implicitly via P_ADDR32.
Backports commit 4810d96f03be4d3820563e3c6bf13dfc0627f205 from qemu
This preserves the invariant that all TCG_TYPE_I32 values are
zero-extended in the 64-bit host register.
Backports commit 75478279a0c1eafc7b69d5382356da138f58f1bd from qemu
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.
Backports commit 3dbc8c61de4e0d0a2afe0897cda7ab28cd37a164 from qemu
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.
Backports commit 1d21d95b6101786d44d3b4a12400eb80a1ecc647 from qemu
This does require an extra two checks within the slow paths
to replace the assert that we're moving. Also add two checks
within existing functions that lacked any kind of assert for
out of range branch.
Backports commit 55dfd8fedceb1311d9cdded1a0f94b2da91a387d from qemu
The reloc_pc{14,24}_val routines retain their asserts.
Use these directly within the slow paths.
Backports commit d5132903518fadad579ef2de9e45fce98eefaa63 from qemu
This does require an extra two checks within the slow paths
to replace the assert that we're moving.
Backports commit 43fabd30e2f411e8d70ff347902a7c8ed308233e from qemu
This does require an extra two checks within the slow paths
to replace the assert that we're moving.
Backports commit 214bfe83d5a5af70bac2b8d0bd649b018c33c03b from qemu
This will move the assert for success from within (subroutines of)
patch_reloc into the callers. It will also let new code do something
different when a relocation is out of range.
For the moment, all backends are trivially converted to return true.
Backports commit 6ac1778676f4259c10b0629ccd9df319a5d1baeb from qemu
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.
Backports commit 8c1b079279fadaee10dc39ca9a58c4c91c7a1854 from qemu
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.
Backports commit 791645f0227c9d52ce5fe1ad6e1cda55a9bfe633 from qemu
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.
Backports commit 3661612fc3e4b65be03482bf6bafd116101881e1 from qemu
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.
Backports commit f9c7246faa279237200a2a53beacaa8100ea1900 from qemu
There are one use apiece for these. There is no longer a need for
preserving branch offset operands, as we no longer re-translate.
Backports commit 37ee93a974c49ab9edfcd1db0aad3838b0395b14 from qemu
There are one use apiece for these. There is no longer a need for
preserving branch offset operands, as we no longer re-translate.
Backports commit 733589b3382afcb0ae9f43e72e083a5ddd38abd5 from qemu
For x86_64, this can remove a REX prefix resulting in smaller code
when manipulating globals of type i32, as we move them between backing
store via cpu_env, aka TCG_AREG0.
Backports commit 5740d9f714835964873325d1210b26811252843f from qemu
Partially reverts ab20bdc1162. The 14-bit displacement that we
allowed to reach the constant pool is not always sufficient.
Retain the tb-relative addressing, as that is how most return
values from the tb are computed.
Backports commit f6823cbe3787aa47db62deede6683077e3da9a2c from qemu
When we add a new entry to the ARMCPRegInfo hash table in
add_cpreg_to_hashtable(), we allocate memory for tehe
ARMCPRegInfo struct itself, and we also g_strdup() the
name string. So the hashtable's value destructor function
must free the name string as well as the struct.
Spotted by clang's leak sanitizer. The leak here is a
small one-off leak at startup, because we don't support
CPU hotplug, and so the only time when we destroy
hash table entries is for the case where ARM_CP_OVERRIDE
means we register a wildcard entry and then override it later.
Backports commit ac87e5072e2cbfcf8e80caac7ef43ceb6914c7af from qemu
The generated C enumeration types explicitly set the enumeration
constants to 0, 1, 2, ... That's exactly what you get when you don't
supply values.
Drop the explicit values. No change now, but it will avoid gaps in
the values when we later add support for 'if' conditions. Avoiding
such gaps will save us the trouble of changing the ENUM_lookup[]
tables to work without a sentinel.
We'll have to take care to ensure the headers required by the 'if'
conditions get always included before the generated QAPI code.
Fortunately, our convention to include "qemu/osdep.h" first in any .c
ensures that's the case for our CONFIG_FOO macros
Backports commit 9c2f56e9f9d5a1f9ddac77dda35f997738e85d11 from qemu
Rename QAPISchemaEnumType.values and related variables to members.
Makes sense ever since commit 93bda4dd4 changed .values from list of
string to list of QAPISchemaMember. Obvious no-op.
Backports commit 57516863644817ca59fab023e0c68d139929f3e0 from qemu
The input visitor has some problems right now, especially
- unsigned type "Range" is used to process signed ranges, resulting in
inconsistent behavior and ugly/magical code
- uint64_t are parsed like int64_t, so big uint64_t values are not
supported and error messages are misleading
- lists/ranges of int64_t are accepted although no list is parsed and
we should rather report an error
- lists/ranges are preparsed using int64_t, making it hard to
implement uint64_t values or uint64_t lists
- types that don't support lists don't bail out
- visiting beyond the end of a list is not handled properly
- we don't actually parse lists, we parse *sets*: members are sorted,
and duplicates eliminated
So let's rewrite it by getting rid of usage of the type "Range" and
properly supporting lists of int64_t and uint64_t (including ranges of
both types), fixing the above mentioned issues.
Lists of other types are not supported and will properly report an
error. Virtual walks are now supported.
Tests have to be fixed up:
- Two BUGs were hardcoded that are fixed now
- The string-input-visitor now actually returns a parsed list and not
an ordered set.
Please note that no users/callers have to be fixed up. Candidates using
visit_type_uint16List() and friends are:
- backends/hostmem.c:host_memory_backend_set_host_nodes()
-- Code can deal with duplicates/unsorted lists
- numa.c::query_memdev()
-- via object_property_get_uint16List(), the list will still be sorted
and without duplicates (via host_memory_backend_get_host_nodes())
- qapi-visit.c::visit_type_Memdev_members()
- qapi-visit.c::visit_type_NumaNodeOptions_members()
- qapi-visit.c::visit_type_RockerOfDpaGroup_members
- qapi-visit.c::visit_type_RxFilterInfo_members()
-- Not used with string-input-visitor.
Backports commit c9fba9de89db51a07689e2cba4865a1e564b8f0f from qemu
The string-input-visitor happily accepts NaN and infinities when parsing
numbers (doubles). They shouldn't. Fix that.
Also, add two test cases, testing if "NaN" and "inf" is properly
rejected.
Backports commit 4b69d4c3d7c133ebc9393ef3f86ce38831921cb6 from qemu
qemu_strtosz() & friends reject NaNs, but happily accept infinities.
They shouldn't. Fix that.
The fix makes use of qemu_strtod_finite(). To avoid ugly casts,
change the @end parameter of qemu_strtosz() & friends from char **
to const char **.
Also, add two test cases, testing that "inf" and "NaN" are properly
rejected. While at it, also fixup the function documentation.
Backports commit af02f4c5179675ad4e26b17ba26694a8fcde17fa from qemu
Provide a trivial implementation with zero limited ordering regions,
which causes the LDLAR and STLLR instructions to devolve into the
LDAR and STLR instructions from the base ARMv8.0 instruction set.
Backports commit 2d7137c10fafefe40a0a049ff8a7bd78b66e661f from qemu
Since arm_hcr_el2_eff includes a check against
arm_is_secure_below_el3, we can often remove a
nearby check against secure state.
In some cases, sort the call to arm_hcr_el2_eff
to the end of a short-circuit logical sequence.
Backports commit 7c208e0f4171c9e2cc35efc12e1bf264a45c229f from qemu
Replace arm_hcr_el2_{fmo,imo,amo} with a more general routine
that also takes SCR_EL3.NS (aka arm_is_secure_below_el3) into
account, as documented for the plethora of bits in HCR_EL2.
Backports commit f77784446045231f7dfa46c9b872091241fa1557 from qemu
The bulk of the work here, beyond base HPD, is defining the
TTBCR2 register. In addition we must check TTBCR.T2E, which
is not present (RES0) for AArch64.
Backports commit ab638a328fd099ba0b23c8c818eb39f2c35414f3 from qemu