Commit graph

106 commits

Author SHA1 Message Date
Richard Henderson e3356f9bad tcg: Drop union from TCGArgConstraint
The union is unused; let "regs" appear in the main structure
without the "u.regs" wrapping.

Backports 9be0d08019465b38e2f1a605960961a491430c21
2021-03-01 19:29:19 -05:00
Richard Henderson 6b91e9bae1 tcg/i386: Implement INDEX_op_rotl{i,s,v}_vec
For immediates, we must continue the special casing of 8-bit
elements. The other element sizes and shift types are trivially
implemented with shifts.

Backports commit 885b1706df6f0211a22e120fac910fb3abf3e733 from qemu
2020-06-14 22:09:24 -04:00
Richard Henderson cc3187b1e4 tcg: Implement gvec support for rotate by scalar
No host backend support yet, but the interfaces for rotls
are in place. Only implement left-rotate for now, as the
only known use of vector rotate by scalar is s390x, so any
right-rotate would be unused and untestable.

Backports commit 23850a74afb641102325b4b7f74071d929fc4594 from qemu
2020-06-14 22:00:50 -04:00
Richard Henderson be78062fd8 tcg: Implement gvec support for rotate by vector
No host backend support yet, but the interfaces for rotlv
and rotrv are in place.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Drop the generic expansion from rot to shift; we can do better
for each backend, and then this code becomes unused.

Backports commit 5d0ceda902915e3f0e21c39d142c92c4e97c3ebb from qemu
2020-06-14 21:43:46 -04:00
Richard Henderson 5cce52a04b tcg: Implement gvec support for rotate by immediate
No host backend support yet, but the interfaces for rotli
are in place. Canonicalize immediate rotate to the left,
based on a survey of architectures, but provide both left
and right shift interfaces to the translators.

Backports commit b0f7e7444c03da17e41bf327c8aea590104a28ab from qemu
2020-06-14 21:26:58 -04:00
Richard Henderson 299ba4e867 tcg/i386: Fix INDEX_op_dup2_vec
We were only constructing the 64-bit element, and not
replicating the 64-bit element across the rest of the vector.

Backports commit e20cb81d9c5a3d0f9c08f3642728a210a1c162c9 from qemu
2020-04-30 07:15:08 -04:00
Richard Henderson b358f771f6 tcg/i386: Bound shift count expanding sari_vec
A given RISU testcase for SVE can produce

tcg-op-vec.c:511: do_shifti: Assertion `i >= 0 && i < (8 << vece)' failed.

because expand_vec_sari gave a shift count of 32 to a MO_32
vector shift.

In 44f1441dbe1, we changed from direct expansion of vector opcodes
to re-use of the tcg expanders. So while the comment correctly notes
that the hw will handle such a shift count, we now have to take our
own sanity checks into account. Which is easy in this particular case.

Fixes: 44f1441dbe1

Backports commit 312b426fea4d6dd322d7472c80010a8ba7a166d2 from qemu
2020-04-30 06:26:42 -04:00
Tony Nguyen f75368cd0f
tcg: TCGMemOp is now accelerator independent MemOp
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.

Target dependant attributes are conditionalized upon NEED_CPU_H.

Backports commit 14776ab5a12972ea439c7fb2203a4c15a09094b4 from qemu
2019-11-28 03:01:12 -05:00
Richard Henderson c79510378f
tcg/i386: Use umin/umax in expanding unsigned compare
Using umin(a, b) == a as an expansion for TCG_COND_LEU is a
better alternative to (a - INT_MIN) <= (b - INT_MIN).

Backports commit ebcfb91abed8c0fb180a968b9004419c208dcc02 from qemu
2019-05-24 18:36:32 -04:00
Richard Henderson ffdbc1a233
tcg/i386: Remove expansion for missing minmax
This is now handled by code within tcg-op-vec.c.

Backports commit 3ec3538a45f2fead475b0cca6945092c87927b4f from qemu
2019-05-24 18:34:44 -04:00
Richard Henderson 68cb096196
tcg/i386: Support vector comparison select value
We already had backend support for this feature. Expand the new
cmpsel opcode using vpblendb. The combination allows us to avoid
an extra NOT for some comparison codes.

Backports commit 904c5e19672778cc3349f4975437cfdf3371abb6 from qemu
2019-05-24 18:33:16 -04:00
Richard Henderson 2ea6dfbd63
tcg: Add support for vector compare select
Perform a per-element conditional move. This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.

Backports commit f75da2988eb2457fa23d006d573220c5c680ec4e from qemu
2019-05-24 18:21:13 -04:00
Richard Henderson ca58be9cb4
tcg: Add support for vector bitwise select
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units. Include gvec expanders.

Backports commit 38dc12947ec9106237f9cdbd428792c985cd86ae from qemu
2019-05-24 18:15:10 -04:00
Richard Henderson 60cfe541b2
tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register. This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.

Also fix the 32-bit avx2, which should have used VPBROADCASTW.

Fixes: 1e262b49b533

Backports commit 7b60ef3264e9627ac6efb34e9a6130647e9b55c0 from qemu
2019-05-24 18:04:08 -04:00
Lioncash fcaa52c1fe
tcg: Synchronize with qemu
Resolves any formatting discrepancies and bad merges that slipped
through.
2019-05-16 18:11:08 -04:00
Richard Henderson fd35490991
tcg/i386: Support vector absolute value
Backports commit 18f9b65f1a4225dd314cb9b0a8dea968c5bc2ef3 from qemu
2019-05-16 16:37:33 -04:00
Richard Henderson 6d5e7856ff
tcg: Add support for vector absolute value
Backports commit bcefc90208f8a1d6f619d61c2647281d92277015 from qemu
2019-05-16 16:33:43 -04:00
Richard Henderson 18b3df6e4e
tcg/i386: Support vector scalar shift opcodes
Backports commit 0a8d7a3bf5a149a82450eef555fd61728703dd84 from qemu
2019-05-16 16:19:44 -04:00
Richard Henderson f793ec847d
tcg/i386: Support vector variable shift opcodes
Backports commit a2ce146a06807fe1d1a81e878b8f249ff1e14038 from qemu
2019-05-16 15:53:33 -04:00
Richard Henderson 66e6bea084
tcg: Add INDEX_op_dupm_vec
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first. This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.

VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.

Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.

Backports commit 37ee55a081b7863ffab2151068dd1b2f11376914 from qemu
2019-05-16 15:38:02 -04:00
Richard Henderson a6fd4e2345
tcg/i386: Implement tcg_out_dupm_vec
At the same time, improve tcg_out_dupi_vec wrt broadcast
from the constant pool.

Backports commit 1e262b49b5331441f697461e4305fe06719758a7 from qemu
2019-05-16 15:27:15 -04:00
Richard Henderson d4e7c6a8c5
tcg: Add tcg_out_dupm_vec to the backend interface
Currently stubbed out in all backends that support vectors.

Backports commit d6ecb4a978b718dbe108a9fa9ecccc8b7f7cb579 from qemu
2019-05-16 15:24:48 -04:00
Richard Henderson cf238d3544
tcg: Manually expand INDEX_op_dup_vec
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.

Backports commit bab1671f0fa928fd678a22f934739f06fd5fd035 from qemu
2019-05-16 15:22:29 -04:00
Richard Henderson 3d20e1678c
tcg: Promote tcg_out_{dup,dupi}_vec to backend interface
The i386 backend already has these functions, and the aarch64 backend
could easily split out one. Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.

Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.

Backports commit e7632cfa8b76cdbbc1c76e8737338ef5844e7d60 from qemu
2019-05-16 15:18:48 -04:00
Richard Henderson f86bd1c5d6
tcg: Return bool success from tcg_out_mov
This patch merely changes the interface, aborting on all failures,
of which there are currently none.

Backports commit 78113e83e0007e869c9f0cb4c0497a77538988e3 from qemu
2019-05-16 15:14:42 -04:00
Richard Henderson 6145e3fdd7
tcg: Restart TB generation after out-of-line ldst overflow
This is part c of relocation overflow handling.

Backports commit aeee05f53a5d67304a521d2644dc0a607e3c8b28 from qemu
2019-04-30 10:06:53 -04:00
Richard Henderson 0f20a26b36
tcg/i386: Support INDEX_op_extract2_{i32,i64}
Backports commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab from qemu
2019-04-30 09:37:39 -04:00
Richard Henderson 269fa0daba
tcg: Add INDEX_op_extract2_{i32,i64}
This will let backends implement the double-word shift operation.

Backports commit fce1296f135669eca85dc42154a2a352c818ad76 from qemu
2019-04-30 09:29:05 -04:00
Lioncash 96c52ea053
tcg: Synchronize with qemu 2019-04-22 02:03:01 -04:00
Mark Cave-Ayland 576df55076
tcg/i386: fix unsigned vector saturating arithmetic
Due to a cut/paste error in the original implementation, the unsigned
vector saturating arithmetic was erroneously being calculated as signed
vector saturating arithmetic.

Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic")

Backports commit 3115584d39afe8cf2a84a40549029f53792abca5 from qemu
2019-02-12 11:37:12 -05:00
Richard Henderson 63d1aae6b2
tcg/i386: Implement vector minmax arithmetic
The avx instruction set does not directly provide MO_64.
We can still implement 64-bit with comparison and vpblendvb.

Backports commit bc37faf4cb2baa77c44298c01558970b88d32808 from qemu
2019-01-29 16:41:11 -05:00
Richard Henderson 5518b543ed
tcg/i386: Implement vector saturating arithmetic
Only MO_8 and MO_16 are implemented, since that's all the
instruction set provides.

Backports commit 8ffafbcec275e61f6a1a17ac1d0bd918d5b23db3 from qemu
2019-01-29 16:37:55 -05:00
Richard Henderson 24e65f60ed
tcg/i386: Split subroutines out of tcg_expand_vec_op
This routine was becoming too large.

Backports commit 44f1441dbe14e7174a707d7e7ecbc2c8e080bfda from qemu
2019-01-29 16:33:59 -05:00
Richard Henderson fb684825c8
tcg: Add opcodes for vector minmax arithmetic
Backports commit dd0a0fcdd8848c2a18970c44a62bd8f394c2b495 from qemu
2019-01-29 16:24:52 -05:00
Richard Henderson e0266239ea
tcg: Add opcodes for vector saturated arithmetic
Backports commit 8afaf0506606f8003ef696df849c5a98637a7a83 from qemu
2019-01-29 16:14:34 -05:00
Richard Henderson 5c4e852c6e
tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP
For now, defined universally as true, since we previously required
backends to implement swapped memory operations. Future patches
may now remove that support where it is onerous.

Backports commit e1dcf3529d0797b25bb49a20e94b62eb93e7276a from qemu
2018-12-18 05:56:58 -05:00
Richard Henderson 3b85c29bb9
tcg/i386: Assume 32-bit values are zero-extended
We now have an invariant that all TCG_TYPE_I32 values are
zero-extended, which means that we do not need to extend
them again during qemu_ld/st, either explicitly via a separate
tcg_out_ext32u or implicitly via P_ADDR32.

Backports commit 4810d96f03be4d3820563e3c6bf13dfc0627f205 from qemu
2018-12-18 05:42:52 -05:00
Richard Henderson b7b142ed79
tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests
This preserves the invariant that all TCG_TYPE_I32 values are
zero-extended in the 64-bit host register.

Backports commit 75478279a0c1eafc7b69d5382356da138f58f1bd from qemu
2018-12-18 05:38:55 -05:00
Richard Henderson 4e882a95f3
tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.

Backports commit 3dbc8c61de4e0d0a2afe0897cda7ab28cd37a164 from qemu
2018-12-18 05:36:58 -05:00
Richard Henderson bdd6118105
tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.

Backports commit 1d21d95b6101786d44d3b4a12400eb80a1ecc647 from qemu
2018-12-18 05:35:34 -05:00
Richard Henderson fc86fd34ff
tcg/i386: Return false on failure from patch_reloc
Backports commit bec3afd5fc6ab0b6e9d8a01575d58db8d1ad82ce from qemu
2018-12-18 05:27:14 -05:00
Richard Henderson 46189d87b3
tcg: Return success from patch_reloc
This will move the assert for success from within (subroutines of)
patch_reloc into the callers. It will also let new code do something
different when a relocation is out of range.

For the moment, all backends are trivially converted to return true.

Backports commit 6ac1778676f4259c10b0629ccd9df319a5d1baeb from qemu
2018-12-18 05:25:45 -05:00
Richard Henderson 091b4fa1ff
tcg/i386: Move TCG_REG_CALL_STACK from define to enum
Backports commit 66c0285df4270d184afce5ac8b97ac175c89562f from qemu
2018-12-18 05:13:47 -05:00
Richard Henderson f3a8a4a306
tcg/i386: Always use %ebp for TCG_AREG0
For x86_64, this can remove a REX prefix resulting in smaller code
when manipulating globals of type i32, as we move them between backing
store via cpu_env, aka TCG_AREG0.

Backports commit 5740d9f714835964873325d1210b26811252843f from qemu
2018-12-18 05:13:05 -05:00
Roman Kapl 33e69342e3
tcg/i386: fix vector operations on 32-bit hosts
The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers.
This was defined as no-op for 32-bit x86, with the assumption that we have
eight registers anyway. This assumption is not true once we have xmm regs.

Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes
and have overflown into other opcode fields, wreaking havoc.

To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".

Fixes: 770c2fc7bb ("Add vector operations")

Backports commit 93bf9a42733321fb632bcb9eafd049ef0e3d9417 from qemu
2018-10-02 04:22:35 -04:00
Richard Henderson a4c2dbef3e
tcg/i386: Mark xmm registers call-clobbered
When host vector registers and operations were introduced, I failed
to mark the registers call clobbered as required by the ABI.

Fixes: 770c2fc7bb7

Backports commit 672189cd586ea38a2c1d8ab91eb1f9dcff5ceb05 from qemu
2018-07-23 20:00:26 -04:00
John Arbuckle 22c3206738
tcg/i386: Use byte form of xgetbv instruction
The assembler in most versions of Mac OS X is pretty old and does not
support the xgetbv instruction. To go around this problem, the raw
encoding of the instruction is used instead.

Backports commit 1019242af11400252f6735ca71a35f81ac23a66d from qemu
2018-06-28 13:23:32 -05:00
Richard Henderson 33f7f6f09a
tcg/i386: Fix dup_vec in non-AVX2 codepath
The VPUNPCKLD* instructions are all "non-destructive source",
indicated by "NDS" in the encoding string in the x86 ISA manual.
This means that they take two source operands, one of which is
encoded in the VEX.vvvv field. We were incorrectly treating them
as if they were destructive-source and passing 0 as the 'v'
argument of tcg_out_vex_modrm(). This meant we were always
using %xmm0 as one of the source operands, causing incorrect
results if the register allocator happened to want to use
something else. For instance the input AArch64 insn:
DUP v26.16b, w21
which becomes TCG IR ops:
dup_vec v128,e8,tmp2,x21
st_vec v128,e8,tmp2,env,$0xa40
was assembled to:
0x607c568c: c4 c1 7a 7e 86 e8 00 00 vmovq 0xe8(%r14), %xmm0
0x607c5694: 00
0x607c5695: c5 f9 60 c8 vpunpcklbw %xmm0, %xmm0, %xmm1
0x607c5699: c5 f9 61 c9 vpunpcklwd %xmm1, %xmm0, %xmm1
0x607c569d: c5 f9 70 c9 00 vpshufd $0, %xmm1, %xmm1
0x607c56a2: c4 c1 7a 7f 8e 40 0a 00 vmovdqu %xmm1, 0xa40(%r14)
0x607c56aa: 00

when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1".
This resulted in our incorrectly setting the output vector to
q26=0000320000003200:0000320000003200
when given an input of x21 == 0000000002803200
rather than the expected all-zeroes.

Pass the correct source register number to tcg_out_vex_modrm()
for these insns.

Backports commit 7eb30ef0ba2eb59e7430d4848ae8d4bf4e50f768 from qemu
2018-05-11 11:22:38 -04:00
Lioncash 6bdfeb35ec
tcg/i386: Perform comparison pass against qemu
Ensures formatting and code are consistent.
2018-03-20 06:29:06 -04:00
Richard Henderson 2310bd4887
tcg/i386: Support INDEX_op_dup2_vec for -m32
Unknown why -m32 was passing with gcc but not clang; it should have
failed for both. This would be used for tcg_gen_dup_i64_vec, and
visible with the right TB and an aarch64 guest.

Backports commit 7f34ed4bcdfda55f978f51aadca64aa970c9f4b6 from qemu
2018-03-17 20:22:24 -04:00