Commit graph

11 commits

Author SHA1 Message Date
Joseph Myers cf54c51869 target/i386: fix IEEE SSE floating-point exception raising
The SSE instruction implementations all fail to raise the expected
IEEE floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in MXCSR.

Fix this by adding such conversions. Unlike for x87, emulated SSE
floating-point operations might be optimized using hardware floating
point on the host, and so a different approach is taken that is
compatible with such optimizations. The required invariant is that
all exceptions set in env->sse_status (other than "denormal operand",
for which the SSE semantics are different from those in the softfloat
code) are ones that are set in the MXCSR; the emulated MXCSR is
updated lazily when code reads MXCSR, while when code sets MXCSR, the
exceptions in env->sse_status are set accordingly.

A few instructions do not raise all the exceptions that would be
raised by the softfloat code, and those instructions are made to save
and restore the softfloat exception state accordingly.

Nothing is done about "denormal operand"; setting that (only for the
case when input denormals are *not* flushed to zero, the opposite of
the logic in the softfloat code for such an exception) will require
custom code for relevant instructions, or else architecture-specific
conditionals in the softfloat code for when to set such an exception
together with custom code for various SSE conversion and rounding
instructions that do not set that exception.

Nothing is done about trapping exceptions (for which there is minimal
and largely broken support in QEMU's emulation in the x87 case and no
support at all in the SSE case).

Backports commit 418b0f93d12a1589d5031405de857844f32e9ccc from qemu
2021-02-25 23:21:32 -05:00
Joseph Myers 2aee4714ab target/i386: reimplement f2xm1 using floatx80 operations
The x87 f2xm1 emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (computing with pow and then
subtracting 1 rather than attempting a better emulation using expm1).

Reimplement using the soft-float operations, including additions and
multiplications with higher precision where appropriate to limit
accumulation of errors. I considered reusing some of the m68k code
for transcendental operations, but the instructions don't generally
correspond exactly to x87 operations (for example, m68k has 2^x and
e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors
from applying multiple such operations each rounding to floatx80
precision, I wrote a direct implementation of 2^x - 1 instead. It
would be possible in principle to make the implementation more
efficient by doing the intermediate operations directly with
significands, signs and exponents and not packing / unpacking floatx80
format for each operation, but that would make it significantly more
complicated and it's not clear that's worthwhile; the m68k emulation
doesn't try to do that.

A test is included with many randomly generated inputs. The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of 2^x - 1; the implementation aims to do somewhat
better than that (about 70 correct bits before rounding). I haven't
investigated how accurate hardware is.

Backports commit eca30647fc078f4d9ed1b455bd67960f99dbeb7a from qemu
2021-02-25 13:31:13 -05:00
Joseph Myers 18b0ae9ebd target/i386: correct fix for pcmpxstrx substring search
This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
/ pcmpestrm / pcmpistri / pcmpistrm substring search, commit
ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.

That commit fixed a bug that showed up in four GCC tests with one libc
implementation. The tests in question generate random inputs to the
intrinsics and compare results to a C implementation, but they only
test 1024 possible random inputs, and when the tests use the cases of
those instructions that work with word rather than byte inputs, it's
easy to have problematic cases that show up much less frequently than
that. Thus, testing with a different libc implementation, and so a
different random number generator, showed up a problem with the
previous patch.

When investigating the previous test failures, I found the description
of these instructions in the Intel manuals (starting from computing a
16x16 or 8x8 set of comparison results) confusing and hard to match up
with the more optimized implementation in QEMU, and referred to AMD
manuals which described the instructions in a different way. Those
AMD descriptions are very explicit that the whole of the string being
searched for must be found in the other operand, not running off the
end of that operand; they say "If the prototype and the SUT are equal
in length, the two strings must be identical for the comparison to be
TRUE.". However, that statement is incorrect.

In my previous commit message, I noted:

The operation in this case is a search for a string (argument d to
the helper) in another string (argument s to the helper); if a copy
of d at a particular position would run off the end of s, the
resulting output bit should be 0 whether or not the strings match in
the region where they overlap, but the QEMU implementation was
wrongly comparing only up to the point where s ends and counting it
as a match if an initial segment of d matched a terminal segment of
s. Here, "run off the end of s" means that some byte of d would
overlap some byte outside of s; thus, if d has zero length, it is
considered to match everywhere, including after the end of s.

The description "some byte of d would overlap some byte outside of s"
is accurate only when understood to refer to overlapping some byte
*within the 16-byte operand* but at or after the zero terminator; it
is valid to run over the end of s if the end of s is the end of the
16-byte operand. So the fix in the previous patch for the case of d
being empty was correct, but the other part of that patch was not
correct (as it never allowed partial matches even at the end of the
16-byte operand). Nor was the code before the previous patch correct
for the case of d nonempty, as it would always have allowed partial
matches at the end of s.

Fix with a partial revert of my previous change, combined with
inserting a check for the special case of s having maximum length to
determine where it is necessary to check for matches.

In the added test, test 1 is for the case of empty strings, which
failed before my 2017 patch, test 2 is for the bug introduced by my
2017 patch and test 3 deals with the case where a match of an initial
segment at the end of the string is not valid when the string ends
before the end of the 16-byte operand (that is, the case that would be
broken by a simple revert of the non-empty-string part of my 2017
patch).

Backports commit bc921b2711c4e2e8ab99a3045f6c0f134a93b535 from qemu
2020-06-15 13:20:48 -04:00
Janne Grunau 6f41687234 target/i386: fix phadd* with identical destination and source register
Detected by asm test suite failures in dav1d
(https://code.videolan.org/videolan/dav1d). Can be reproduced by
`qemu-x86_64 -cpu core2duo ./tests/checkasm --test=mc_8bpc 1659890620`.

Backports commit 2dfbea1a872727fb747ca6adf2390e09956cdc6e from qemu
2020-06-15 12:59:49 -04:00
Richard Henderson d960523cbd softfloat: Name compare relation enum
Give the previously unnamed enum a typedef name. Use it in the
prototypes of compare functions. Use it to hold the results
of the compare functions.

Backports commit 71bfd65c5fcd72f8af2735905415c7ce4220f6dc from qemu
2020-05-21 18:08:52 -04:00
Peter Maydell 5899803c3c
target/i386: Return 'indefinite integer value' for invalid SSE fp->int conversions
The x86 architecture requires that all conversions from floating
point to integer which raise the 'invalid' exception (infinities of
both signs, NaN, and all values which don't fit in the destination
integer) return what the x86 spec calls the "indefinite integer
value", which is 0x8000_0000 for 32-bits or 0x8000_0000_0000_0000 for
64-bits. The softfloat functions return the more usual behaviour of
positive overflows returning the maximum value that fits in the
destination integer format and negative overflows returning the
minimum value that fits.

Wrap the softfloat functions in x86-specific versions which
detect the 'invalid' condition and return the indefinite integer.

Note that we don't use these wrappers for the 3DNow! pf2id and pf2iw
instructions, which do return the minimum value that fits in
an int32 if the input float is a large negative number.

Fixes: https://bugs.launchpad.net/qemu/+bug/1815423

Backports commit 1e8a98b53867f61da9ca09f411288e2085d323c4 from qemu
2019-11-18 21:48:03 -05:00
Joseph Myers a237d9dbca
target/i386: fix phminposuw in-place operation
The SSE4.1 phminposuw instruction finds the minimum 16-bit element in
the source vector, putting the value of that element in the low 16
bits of the destination vector, the index of that element in the next
three bits and zeroing the rest of the destination. The helper for
this operation fills the destination from high to low, meaning that
when the source and destination are the same register, the minimum
source element can be overwritten before it is copied to the
destination. This patch fixes it to fill the destination from low to
high instead, so the minimum source element is always copied first.
This fixes one gcc test failure in my GCC 6-based testing (and so
concludes the present sequence of patches, as I don't have any further
gcc test failures left in that testing that I attribute to QEMU bugs).

Backports commit aa406feadfc5b095ca147ec56d6187c64be015a7 from qemu
2018-03-04 23:59:26 -05:00
Joseph Myers 85b647e486
target/i386: fix pcmpxstrx substring search
One of the cases of the SSE4.2 pcmpestri / pcmpestrm / pcmpistri /
pcmpistrm instructions does a substring search. The implementation of
this case in the pcmpxstrx helper is incorrect. The operation in this
case is a search for a string (argument d to the helper) in another
string (argument s to the helper); if a copy of d at a particular
position would run off the end of s, the resulting output bit should
be 0 whether or not the strings match in the region where they
overlap, but the QEMU implementation was wrongly comparing only up to
the point where s ends and counting it as a match if an initial
segment of d matched a terminal segment of s. Here, "run off the end
of s" means that some byte of d would overlap some byte outside of s;
thus, if d has zero length, it is considered to match everywhere,
including after the end of s. This patch fixes the implementation to
correspond with the proper instruction semantics. This fixes four gcc
test failures in my GCC 6-based testing.

Backports commit ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60 from qemu
2018-03-04 23:58:45 -05:00
Joseph Myers 59df0bae12
target/i386: fix packusdw in-place operation
The SSE4.1 packusdw instruction combines source and destination
vectors of signed 32-bit integers into a single vector of unsigned
16-bit integers, with unsigned saturation. When the source and
destination are the same register, this means each 32-bit element of
that register is used twice as an input, to produce two of the 16-bit
output elements, and so if the operation is carried out
element-by-element in-place, no matter what the order in which it is
applied to the elements, the first element's operation will overwrite
some future input. The helper for packssdw avoids this issue by
computing the result in a local temporary and copying it to the
destination at the end; this patch fixes the packusdw helper to do
likewise. This fixes three gcc test failures in my GCC 6-based
testing.

Backports commit 80e19606215d4df370dfe8fe21c558a129f00f0b from qemu
2018-03-04 23:57:54 -05:00
Joseph Myers e883a15231
target/i386: fix pmovsx/pmovzx in-place operations
The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte
or 4-byte inputs and sign-extend or zero-extend them to a wider vector
output. The associated helpers for these instructions do the
extension on each element in turn, starting with the lowest. If the
input and output are the same register, this means that all the input
elements after the first have been overwritten before they are read.
This patch makes the helpers extend starting with the highest element,
not the lowest, to avoid such overwriting. This fixes many GCC test
failures (161 in the gcc testsuite in my GCC 6-based testing) when
testing with a default CPU setting enabling those instructions.

Backports commit c6a56c8e990b213a1638af2d34352771d5fa4d9c from qemu
2018-03-04 23:56:01 -05:00
Thomas Huth b2f1326437
Move target-* CPU file into a target/ folder
We've currently got 18 architectures in QEMU, and thus 18 target-xxx
folders in the root folder of the QEMU source tree. More architectures
(e.g. RISC-V, AVR) are likely to be included soon, too, so the main
folder of the QEMU sources slowly gets quite overcrowded with the
target-xxx folders.
To disburden the main folder a little bit, let's move the target-xxx
folders into a dedicated target/ folder, so that target-xxx/ simply
becomes target/xxx/ instead.

Backports commit fcf5ef2ab52c621a4617ebbef36bf43b4003f4c0 from qemu
2018-03-01 22:50:58 -05:00
Renamed from qemu/target-i386/ops_sse.h (Browse further)