target/i386: fix pmovsx/pmovzx in-place operations

The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte
or 4-byte inputs and sign-extend or zero-extend them to a wider vector
output. The associated helpers for these instructions do the
extension on each element in turn, starting with the lowest. If the
input and output are the same register, this means that all the input
elements after the first have been overwritten before they are read.
This patch makes the helpers extend starting with the highest element,
not the lowest, to avoid such overwriting. This fixes many GCC test
failures (161 in the gcc testsuite in my GCC 6-based testing) when
testing with a default CPU setting enabling those instructions.
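
The overwrite can be reproduced outside QEMU. The following is a minimal
sketch, not the QEMU code: DemoReg, extend_ascending and extend_descending
are hypothetical names, a union stands in for the real Reg type and its
B()/W() accessors, and loops replace the unrolled assignments of the
macro. Only the byte-to-word case (as in pmovsxbw) is modelled.

```c
#include <stdint.h>

/* Hypothetical stand-in for QEMU's Reg: the same 16 bytes of storage
 * viewed either as 16 bytes or as 8 16-bit words. */
typedef union {
    int8_t  b[16];
    int16_t w[8];
} DemoReg;

/* Lowest element first (the old order). Writing w[i] overwrites b[2*i]
 * and b[2*i + 1], so when d == s every input byte after b[0] has been
 * overwritten by the time it is read. */
static void extend_ascending(DemoReg *d, const DemoReg *s)
{
    for (int i = 0; i < 8; i++) {
        d->w[i] = (int8_t)s->b[i];
    }
}

/* Highest element first (the order the patch switches to). Writing w[i]
 * only touches bytes at index 2*i and above, and the bytes still to be
 * read all have an index below i, so d == s is safe. */
static void extend_descending(DemoReg *d, const DemoReg *s)
{
    for (int i = 7; i >= 0; i--) {
        d->w[i] = (int8_t)s->b[i];
    }
}
```

Calling extend_descending(&r, &r) on a register whose low eight bytes are
1, -2, 3, -4, 5, -6, 7, -8 leaves those same eight values in r.w, whereas
extend_ascending(&r, &r) corrupts every word after the first.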

Backports commit c6a56c8e990b213a1638af2d34352771d5fa4d9c from qemu
Joseph Myers 2018-03-04 23:55:52 -05:00 committed by Lioncash
parent 7168f72d4d
commit e883a15231

@@ -1617,18 +1617,18 @@ void glue(helper_ptest, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 #define SSE_HELPER_F(name, elem, num, F) \
     void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \
     { \
-        d->elem(0) = F(0); \
-        d->elem(1) = F(1); \
         if (num > 2) { \
-            d->elem(2) = F(2); \
-            d->elem(3) = F(3); \
             if (num > 4) { \
-                d->elem(4) = F(4); \
-                d->elem(5) = F(5); \
-                d->elem(6) = F(6); \
                 d->elem(7) = F(7); \
+                d->elem(6) = F(6); \
+                d->elem(5) = F(5); \
+                d->elem(4) = F(4); \
             } \
+            d->elem(3) = F(3); \
+            d->elem(2) = F(2); \
         } \
+        d->elem(1) = F(1); \
+        d->elem(0) = F(0); \
     }
 SSE_HELPER_F(helper_pmovsxbw, W, 8, (int8_t) s->B)