Convert the Neon VDUP (scalar) insn to decodetree. (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register" insn.)
Backports commit 9aaa23c2ae18e6fb9a291b81baf91341db76dfa0 from qemu
Convert the Neon VTBL, VTBX instructions to decodetree. The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.
Backports commit 54e96c744b70a5d19f14b212a579dd3be8fcaad9 from qemu
Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.
We don't need to special case 0 or 8 immediates any more as the
optimizer is smart enough to throw away the dead code.
Backports commit 0aad761fb0aed40c99039eacac470cbd03d07019 from qemu
Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.
Backports commit 77e576a9281825fc170f3b3af83f47e110549b5c from qemu
Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.
Backports commit 85ac9aef9a5418de3168df569e21258e853840a2 from qemu
Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree. These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.
The refactoring removes some of the oddities of the old decoder:
* operands to the operation and accumulation were often
reversed (taking advantage of the fact that most of these ops
are commutative); the new code follows the pseudocode order
* the Q bit in the insn was in a local variable 'u'; in the
new code it is decoded into a->q
Backports commit 96fc80f5f186decd1a649f6c04252faceb057ad2 from qemu
In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
temporary in do_2shift_env_64(); free it.
Backports commit a4f67e180def790ff0bbb33fc93bb6e80382f041 from qemu
Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.
Backports commit 448f0e5f3ecfbd089b934e5e3aa0ccd1f51a6174 from qemu
Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.
Backports commit 18fb58d588898550919392277787979ee7d0d84e from qemu
Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.
These are the last insns in the group which use the pass-over-each
elements loop, so we can delete that code.
Backports commit 9546ca5998d3cbd98a81b2d46a2e92a11b0f78a4 from qemu
Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.
Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.
Backports commit 3a1d9eb07b767a7592abca642af80906f9eab0ed from qemu
Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.
Backports commit f5b28401200ec95ba89552df3ecdcdc342f6b90b from qemu
Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.
Backports commit 0fa1ab0302badabc3581aefcbb2f189ef52c4985 from qemu
Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.
As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
Backports commit b28be09570d0827969b62b8f82b0f720a9915427 from qemu
The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.
Backports commit 9593a3988c3e788790aa107d778386b09f456a6d from qemu
The last real change to this file is from 2012, so it is very likely
that this file is completely out-of-date and ignored today. Let's
simply remove it to avoid confusion if someone finds it by accident.
Backports commit 3575b0aea983ad57804c9af739ed8ff7bc168393 from qemu
This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
/ pcmpestrm / pcmpistri / pcmpistrm substring search, commit
ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.
That commit fixed a bug that showed up in four GCC tests with one libc
implementation. The tests in question generate random inputs to the
intrinsics and compare results to a C implementation, but they only
test 1024 possible random inputs, and when the tests use the cases of
those instructions that work with word rather than byte inputs, it's
easy to have problematic cases that show up much less frequently than
that. Thus, testing with a different libc implementation, and so a
different random number generator, showed up a problem with the
previous patch.
When investigating the previous test failures, I found the description
of these instructions in the Intel manuals (starting from computing a
16x16 or 8x8 set of comparison results) confusing and hard to match up
with the more optimized implementation in QEMU, and referred to AMD
manuals which described the instructions in a different way. Those
AMD descriptions are very explicit that the whole of the string being
searched for must be found in the other operand, not running off the
end of that operand; they say "If the prototype and the SUT are equal
in length, the two strings must be identical for the comparison to be
TRUE.". However, that statement is incorrect.
In my previous commit message, I noted:
The operation in this case is a search for a string (argument d to
the helper) in another string (argument s to the helper); if a copy
of d at a particular position would run off the end of s, the
resulting output bit should be 0 whether or not the strings match in
the region where they overlap, but the QEMU implementation was
wrongly comparing only up to the point where s ends and counting it
as a match if an initial segment of d matched a terminal segment of
s. Here, "run off the end of s" means that some byte of d would
overlap some byte outside of s; thus, if d has zero length, it is
considered to match everywhere, including after the end of s.
The description "some byte of d would overlap some byte outside of s"
is accurate only when understood to refer to overlapping some byte
*within the 16-byte operand* but at or after the zero terminator; it
is valid to run over the end of s if the end of s is the end of the
16-byte operand. So the fix in the previous patch for the case of d
being empty was correct, but the other part of that patch was not
correct (as it never allowed partial matches even at the end of the
16-byte operand). Nor was the code before the previous patch correct
for the case of d nonempty, as it would always have allowed partial
matches at the end of s.
Fix with a partial revert of my previous change, combined with
inserting a check for the special case of s having maximum length to
determine where it is necessary to check for matches.
In the added test, test 1 is for the case of empty strings, which
failed before my 2017 patch, test 2 is for the bug introduced by my
2017 patch and test 3 deals with the case where a match of an initial
segment at the end of the string is not valid when the string ends
before the end of the 16-byte operand (that is, the case that would be
broken by a simple revert of the non-empty-string part of my 2017
patch).
Backports commit bc921b2711c4e2e8ab99a3045f6c0f134a93b535 from qemu
Most x87 instruction implementations fail to raise the expected IEEE
floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in the x87 status word. There is special-case handling of division to
raise the divide-by-zero exception, but that handling is itself buggy:
it raises the exception in inappropriate cases (inf / 0 and nan / 0,
which should not raise any exceptions, and 0 / 0, which should raise
"invalid" instead).
Fix this by converting the floating-point exceptions raised during an
operation by the softfloat machinery into exceptions in the x87 status
word (passing through the existing fpu_set_exception function for
handling related to trapping exceptions). There are special cases
where some functions convert to integer internally but exceptions from
that conversion are not always correct exceptions for the instruction
to raise.
There might be scope for some simplification if the softfloat
exception state either could always be assumed to be in sync with the
state in the status word, or could always be ignored at the start of
each instruction and just set to 0 then; I haven't looked into that in
detail, and it might run into interactions with the various ways the
emulation does not yet handle trapping exceptions properly. I think
the approach taken here, of saving the softfloat state, setting
exceptions there to 0 and then merging the old exceptions back in
after carrying out the operation, is conservatively safe
Backports commit 975af797f1e04e4d1b1a12f1731141d3770fdbce from qemu
The fist / fistt family of instructions should all store the most
negative integer in the destination format when the rounded /
truncated integer result is out of range or the input is an invalid
encoding, infinity or NaN. The fisttpl and fisttpll implementations
(32-bit and 64-bit results, truncate towards zero) failed to do this,
producing the most positive integer in some cases instead. Fix this
by copying the code used to handle this issue for fistpl and fistpll,
adjusted to use the _round_to_zero functions for the actual
conversion (but without any other changes to that code).
Backports commit c8af85b10c818709755f5dc8061c69920611fd4c from qemu
The fbstp implementation fails to check for out-of-range and invalid
values, instead just taking the result of conversion to int64_t and
storing its sign and low 18 decimal digits. Fix this by checking for
an out-of-range result (invalid conversions always result in INT64_MAX
or INT64_MIN from the softfloat code, which are large enough to be
considered as out-of-range by this code) and storing the packed BCD
indefinite encoding in that case.
Backports commit 374ff4d0a3c2cce2bc6e4ba8a77eaba55c165252 from qemu
The fbstp implementation stores +0 when the rounded result should be
-0 because it compares an integer value with 0 to determine the sign.
Fix this by checking the sign bit of the operand instead.
Backports commit 18c53e1e73197a24f9f4b66b1276eb9868db5bf0 from qemu
The fxam implementation does not check for invalid encodings, instead
treating them like NaN or normal numbers depending on the exponent.
Fix it to check that the high bit of the significand is set before
treating an encoding as NaN or normal, thus resulting in correct
handling (all of C0, C2 and C3 cleared) for invalid encodings.
Backports commit 34b9cc076ff423023a779a04a9f7cd7c17372cbf from qemu
The implementations of the fldl2t, fldl2e, fldpi, fldlg2 and fldln2
instructions load fixed constants independent of the rounding mode.
Fix them to load a value correctly rounded for the current rounding
mode (but always rounded to 64-bit precision independent of the
precision control, and without setting "inexact") as specified.
Backports commit 80b4008c805ebcfd4c0d302ac31c1689e34571e0 from qemu
The fscale implementation uses floatx80_scalbn for the final scaling
operation. floatx80_scalbn ends up rounding the result using the
dynamic rounding precision configured for the FPU. But only a limited
set of x87 floating-point instructions are supposed to respect the
dynamic rounding precision, and fscale is not in that set. Fix the
implementation to save and restore the rounding precision around the
call to floatx80_scalbn.
Backports commit c535d68755576bfa33be7aef7bd294a601f776e0 from qemu
The fscale implementation passes infinite exponents through to generic
code that rounds the exponent to a 32-bit integer before using
floatx80_scalbn. In round-to-nearest mode, and ignoring exceptions,
this works in many cases. But it fails to handle the special cases of
scaling 0 by a +Inf exponent or an infinity by a -Inf exponent, which
should produce a NaN, and because it produces an inexact result for
finite nonzero numbers being scaled, the result is sometimes incorrect
in other rounding modes. Add appropriate handling of infinite
exponents to produce a NaN or an appropriately signed exact zero or
infinity as a result
Backports commit c1c5fb8f9067c830e36830c2b82c0ec146c03d7b from qemu
The fscale implementation does not check for invalid encodings in the
exponent operand, thus treating them like INT_MIN (the value returned
for invalid encodings by floatx80_to_int32_round_to_zero). Fix it to
treat them similarly to signaling NaN exponents, thus generating a
quiet NaN result.
Backports commit b40eec96b26028b68c3594fbf34b6d6f029df26a from qemu
The implementation of the fscale instruction returns a NaN exponent
unchanged. Fix it to return a quiet NaN when the provided exponent is
a signaling NaN.
Backports commit 0d48b436327955c69e2eb53f88aba9aa1e0dbaa0 from qemu
The implementation of the fxtract instruction treats all nonzero
operands as normal numbers, so yielding incorrect results for invalid
formats, infinities, NaNs and subnormal and pseudo-denormal operands.
Implement appropriate handling of all those cases.
Backports commit c415f2c58296d86e9abb7e4a133111acf7031da3 from qemu
Detected by asm test suite failures in dav1d
(https://code.videolan.org/videolan/dav1d). Can be reproduced by
`qemu-x86_64 -cpu core2duo ./tests/checkasm --test=mc_8bpc 1659890620`.
Backports commit 2dfbea1a872727fb747ca6adf2390e09956cdc6e from qemu
The miscellaneous control instructions are mutually exclusive
within the t32 decode sub-group.
Backports commit d6084fba47bb9aef79775c1102d4b647eb58c365 from qemu
Includes multiple changes by Richard Henderson as follows:
- Use proper varargs to print the arguments. (2fd51b19c9)
- Rename MultiPattern to IncMultiPattern (040145c4f8)
- Split out MultiPattern from IncMultiPattern (df63044d02)
- Allow group covering the entire insn space (b44b3449a0)
- Move semantic propagation into classes (08561fc128)
- Implement non-overlapping groups (067e8b0f45)
- Drop check for less than 2 patterns in a group (fe079aa13d)
Convert the insns in the one-register-and-immediate group to decodetree.
In the new decode, our asimd_imm_const() function returns a 64-bit value
rather than a 32-bit one, which means we don't need to treat cmode=14 op=1
as a special case in the decoder (it is the only encoding where the two
halves of the 64-bit value are different).
Backports commit 2c35a39eda0b16c2ed85c94cec204bf5efb97812 from qemu
Convert the VCVT fixed-point conversion operations in the
Neon 2-regs-and-shift group to decodetree.
Backports commit 3da26f11711caeaa18318b6afa14dfb81d7650ab from qemu
Convert the VSHLL and VMOVL insns from the 2-reg-shift group
to decodetree. Since the loop always has two passes, we unroll
it to avoid the awkward reassignment of one TCGv to another.
Backports commit 968bf842742a5ffbb0041cb31089e61a9f7a833d from qemu
Convert the VQSHLU and QVSHL 2-reg-shift insns to decodetree.
These are the last of the simple shift-by-immediate insns.
Backports commit 37bfce81b10450071193c8495a07f182ec652e2a from qemu
Convert the VSHR 2-reg-shift insns to decodetree.
Note that unlike the legacy decoder, we present the right shift
amount to the trans_ function as a positive integer.
Backports commit 66432d6b8294e3508218b360acfdf7c244eea993 from qemu
Convert the VSHL and VSLI insns from the Neon 2-registers-and-a-shift
group to decodetree.
Backports commit d3c8c736f8b4bdd02831076286b1788232f46ced from qemu
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Backports commit 43fa36c96c24349145497adc1b451f9caf74e344 from qemu
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Backports commit afc8b7d32668547308bdd654a63cf5228936e0ba from qemu
Do not yet convert the helpers to loop over opr_sz, but the
descriptor allows the vector tail to be cleared. Which fixes
an existing bug vs SVE.
Backports commit effa992f153f5e7ab97ab843b565690748c5b402 from qemu
Do not yet convert the helpers to loop over opr_sz, but the
descriptor allows the vector tail to be cleared. Which fixes
an existing bug vs SVE.
Backports commit aaffebd6d3135b8aed7e61932af53b004d261579 from qemu
With this conversion, we will be able to use the same helpers
with sve. This also fixes a bug in which we failed to clear
the high bits of the SVE register after an AdvSIMD operation.
Backports commit 1738860d7e60dec5dbeba17f8b44d31aae3accac from qemu
With this conversion, we will be able to use the same helpers
with sve. In particular, pass 3 vector parameters for the
3-operand operations; for advsimd the destination register
is also an input.
This also fixes a bug in which we failed to clear the high bits
of the SVE register after an AdvSIMD operation.
Backports commit a04b68e1d4c4f0cd5cd7542697b1b230b84532f5 from qemu