Commit graph

621 commits

Author SHA1 Message Date
Richard Henderson b49c4639d1 tcg: Add temp_readonly
In most, but not all, places that we check for TEMP_FIXED,
we are really testing that we do not modify the temporary.

Backports e01fa97dea857a35be5bb8cce0d632a62e72c689
2021-03-03 20:45:25 -05:00
Richard Henderson 30739864d2 tcg: Consolidate 3 bits into enum TCGTempKind
The temp_fixed, temp_global, temp_local bits are all related.
Combine them into a single enumeration.

Backports ee17db83d2dce35792e9bf03366af193e5e0e5c9
2021-03-03 20:41:24 -05:00
Richard Henderson 520ec7ca76 tcg: Increase tcg_out_dupi_vec immediate to int64_t
While we don't store more than tcg_target_long in TCGTemp,
we shouldn't be limited to that for code generation. We will
be able to use this for INDEX_op_dup2_vec with 2 constants.

Also pass along the minimal vece that may be said to apply
to the constant. This allows some simplification in the
various backends.

Backports 4e18617555955503628a004ed97e1fc2fa7818b9
2021-03-03 20:27:39 -05:00
Richard Henderson c5c19529c5 tcg: Use tcg_out_dupi_vec from temp_load
Having dupi pass though movi is confusing and arguably wrong.

Backports 0a6a8bc8ebfe5ae2a3f18ef48b92a74bc2df2f96
2021-03-03 20:23:02 -05:00
Richard Henderson d0e0c847e1 tcg/aarch64: Use B not BL for tcg_out_goto_long
A typo generated a branch-and-link insn instead of plain branch.

Backports f716bab3a9553259ff90505b3ddd245f4f8c4061
2021-03-03 19:55:13 -05:00
Richard Henderson 9d453e820a tcg: Introduce INDEX_op_qemu_st8_i32
Enable this on i386 to restrict the set of input registers
for an 8-bit store, as required by the architecture. This
removes the last use of scratch registers for user-only mode.

Backports 07ce0b05300de5bc8f1932a4cfbe38f3323e5ab1
2021-03-03 19:51:20 -05:00
Richard Henderson a90476c897 tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP
Always true when movbe is available, otherwise leave
this to generic code.

Backports d2ef1b83a7a2047e0e36d7b62b3a5d151ab958f5
2021-03-03 19:40:00 -05:00
Richard Henderson 5fc12c7fba tcg: Add tcg_gen_bswap_tl alias
The alias is intended to indicate that the bswap is for the
entire target_long. This should avoid ifdefs on some targets.

Backports a66424ba17d661007dc13d78c9e3014ccbaf0efb
2021-03-03 19:30:11 -05:00
Richard Henderson 4ccadaf6cf tcg: Use memset for large vector byte replication
In f47db80cc07, we handled odd-sized tail clearing for
the case of hosts that have vector operations, but did
not handle the case of hosts that do not have vector ops.

This was ok until e2e7168a214b, which changed the encoding
of simd_desc such that the odd sizes are impossible.

Add memset as a tcg helper, and use that for all out-of-line
byte stores to vectors. This includes, but is not limited to,
the tail clearing operation in question.

Backports 6d3ef04893bdea3e7aa08be3cce5141902836a31
2021-03-03 19:28:15 -05:00
Thomas Huth 6a22a7b80e tcg/optimize: Add fallthrough annotations
To be able to compile this file with -Werror=implicit-fallthrough,
we need to add some fallthrough annotations to the case statements
that might fall through. Unfortunately, the typical "/* fallthrough */"
comments do not work here as expected since some case labels are
wrapped in macros and the compiler fails to match the comments in
this case. But using __attribute__((fallthrough)) seems to work fine,
so let's use that instead (by introducing a new QEMU_FALLTHROUGH
macro in our compiler.h header file).

Backports d84568b773fe1fc469c4d8419c3545be52eec82c
2021-03-03 19:18:50 -05:00
Richard Henderson c648361597 tcg: Remove TCG_TARGET_HAS_cmp_vec
The cmp_vec opcode is mandatory; this symbol is unused.

Backports cae5d53b9e72d7a1e43cebeb36471d77a16c6e43
2021-03-01 19:49:02 -05:00
Richard Henderson 45af31fcb4 tcg/optimize: Fold dup2_vec
When the two arguments are identical, this can be reduced to
dup_vec or to mov_vec from a tcg_constant_vec.

Backports commit 1dc4fe70128db05237a00eda6eb15e2b44deb31f
2021-03-01 19:46:14 -05:00
Richard Henderson 456fb66617 tcg: Fix generation of dupi_vec for 32-bit host
The definition of INDEX_op_dupi_vec is that it operates on
units of tcg_target_ulong -- in this case 32 bits. It does
not work to use this for a uint64_t value that happens to be
small enough to fit in tcg_target_ulong.

Backports a5b30d950c42b14bc9da24d1e68add6538d23336
2021-03-01 19:45:30 -05:00
Richard Henderson 578673be68 tcg/i386: Fix dupi for avx2 32-bit hosts
The previous change wrongly stated that 32-bit avx2 should have
used VPBROADCASTW. But that's a 16-bit broadcast and we want a
32-bit broadcast.

Backports f80d09b599a5e0fd7f44653f23b04104cb703f7a
2021-03-01 19:44:09 -05:00
Richard Henderson 50b3632ab4 tcg: Remove TCGOpDef.used
The last user of this field disappeared in f69d277ece4.
2021-03-01 19:43:37 -05:00
Richard Henderson 7813c57f9e tcg: Move some TCG_CT_* bits to TCGArgConstraint bitfields
These are easier to set and test when they have their own fields.
Reduce the size of alias_index and sort_index to 4 bits, which is
sufficient for TCG_MAX_OP_ARGS. This leaves only the bits indicating
constants within the ct field.

Move all initialization to allocation time, rather than init
individual fields in process_op_defs.

Backports bc2b17e6ea582ef3ade2bdca750de269c674c915
2021-03-01 19:41:34 -05:00
Richard Henderson 71a34d84e5 tcg: Remove TCG_CT_REG
This wasn't actually used for anything, really. All variable
operands must accept registers, and which are indicated by the
set in TCGArgConstraint.regs.

Backports commit 74a117906b87ff9220e4baae5a7431d6f4eadd45
2021-03-01 19:38:00 -05:00
Richard Henderson ae075d324d tcg: Move sorted_args into TCGArgConstraint.sort_index
This uses an existing hole in the TCGArgConstraint structure
and will be convenient for keeping the data in one place.

Backports 66792f90f14fef18b25a168922877a367ecdca05
2021-03-01 19:33:45 -05:00
Richard Henderson e3356f9bad tcg: Drop union from TCGArgConstraint
The union is unused; let "regs" appear in the main structure
without the "u.regs" wrapping.

Backports 9be0d08019465b38e2f1a605960961a491430c21
2021-03-01 19:29:19 -05:00
Richard Henderson 1551f6be9d tcg: Adjust simd_desc size encoding
With larger vector sizes, it turns out oprsz == maxsz, and we only
need to represent mismatch for oprsz <= 32. We do, however, need
to represent larger oprsz and do so without reducing SIMD_DATA_BITS.

Reduce the size of the oprsz field and increase the maxsz field.
Steal the oprsz value of 24 to indicate equality with maxsz.

Backports e2e7168a214b0ed98dc357bba96816486a289762
2021-03-01 19:23:37 -05:00
Richard Henderson cd79d2a915 tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
We already support duplication of 128-bit blocks. This extends
that support to 256-bit blocks. This will be needed by SVE2.

Backports commit fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
2021-03-01 18:10:07 -05:00
Richard Henderson b478ce5052 tcg: Eliminate one store for in-place 128-bit dup_mem
Do not store back to the exact memory from which we just loaded.

Backports 6a17646176e011ddc463a2870a64c7aaccfe9c50
2021-03-01 18:06:17 -05:00
Stephen Long c9dc750058 tcg: Fix tcg gen for vectorized absolute value
The fallback inline expansion for vectorized absolute value,
when the host doesn't support such an insn was flawed.

E.g. when a vector of bytes has all elements negative, mask
will be 0xffff_ffff_ffff_ffff. Subtracting mask only adds 1
to the low element instead of all elements becase -mask is 1
and not 0x0101_0101_0101_0101.

Backports commit e7e8f33fb603c3bfa0479d7d924f2ad676a84317
2021-03-01 18:04:46 -05:00
Peter Maydell e0000d1700 target/arm/translate.c: Delete/amend incorrect comments
In arm_tr_init_disas_context() we have a FIXME comment that suggests
"cpu_M0 can probably be the same as cpu_V0". This isn't in fact
possible: cpu_V0 is used as a temporary inside gen_iwmmxt_shift(),
and that function is called in various places where cpu_M0 contains a
live value (i.e. between gen_op_iwmmxt_movq_M0_wRn() and
gen_op_iwmmxt_movq_wRn_M0() calls). Remove the comment.

We also have a comment on the declarations of cpu_V0/V1/M0 which
claims they're "for efficiency". This isn't true with modern TCG, so
replace this comment with one which notes that they're only used with
the iwmmxt decode

Backports 8b4c9a50dc9531a729ae4b5941d287ad0422db48
2021-02-26 11:23:52 -05:00
LIU Zhiwei 0968caa249 target/riscv: add vector extension field in CPURISCVState
The 32 vector registers will be viewed as a continuous memory block.
It avoids the convension between element index and (regno, offset).
Thus elements can be directly accessed by offset from the first vector
base address.

Backports ad9e5aa2ae8032f19a8293b6b8f4661c06167bf0 from qemu
2021-02-26 02:17:49 -05:00
Richard Henderson 55369d710c tcg: Save/restore vecop_list around minmax fallback
Forgetting this asserts when tcg_gen_cmp_vec is called from
within tcg_gen_cmpsel_vec.

Fixes: 72b4c792c7a

Backports commit 69c918d2ef319ac63cd759c527debc2a2bdf3a0c from qemu
2021-02-25 23:33:24 -05:00
Richard Henderson 65d5288563 tcg: Fix do_nonatomic_op_* vs signed operations
The smin/smax/umin/umax operations require the operands to be
properly sign extended. Do not drop the MO_SIGN bit from the
load, and additionally extend the val input.

Backports commit 852f933e482518797f7785a2e017a215b88df815 from qemu
2021-02-25 23:10:40 -05:00
Richard Henderson 0e68fa345e tcg: Improve move ops in liveness_pass_2
If the output of the move is dead, then the last use is in
the store. If we propagate the input to the store, then we
can remove the move opcode entirely.

Backports commit 61f15c487fc2aea14f6b0e52c459ae8b7d252a65 from qemu
2020-06-14 22:13:04 -04:00
Richard Henderson 6b91e9bae1 tcg/i386: Implement INDEX_op_rotl{i,s,v}_vec
For immediates, we must continue the special casing of 8-bit
elements. The other element sizes and shift types are trivially
implemented with shifts.

Backports commit 885b1706df6f0211a22e120fac910fb3abf3e733 from qemu
2020-06-14 22:09:24 -04:00
Richard Henderson cc3187b1e4 tcg: Implement gvec support for rotate by scalar
No host backend support yet, but the interfaces for rotls
are in place. Only implement left-rotate for now, as the
only known use of vector rotate by scalar is s390x, so any
right-rotate would be unused and untestable.

Backports commit 23850a74afb641102325b4b7f74071d929fc4594 from qemu
2020-06-14 22:00:50 -04:00
Richard Henderson 2aa9d13120 tcg: Remove expansion to shift by vector from do_shifts
We do not reflect this expansion in tcg_can_emit_vecop_list,
so it is unused and unusable. However, we actually perform
the same expansion in do_gvec_shifts, so it is also unneeded.

Backports commit 3d5bb2ea5cc9ed54f65a6929a6e6baa01cabd98b from qemu
2020-06-14 21:53:36 -04:00
Richard Henderson be78062fd8 tcg: Implement gvec support for rotate by vector
No host backend support yet, but the interfaces for rotlv
and rotrv are in place.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Drop the generic expansion from rot to shift; we can do better
for each backend, and then this code becomes unused.

Backports commit 5d0ceda902915e3f0e21c39d142c92c4e97c3ebb from qemu
2020-06-14 21:43:46 -04:00
Richard Henderson 5cce52a04b tcg: Implement gvec support for rotate by immediate
No host backend support yet, but the interfaces for rotli
are in place. Canonicalize immediate rotate to the left,
based on a survey of architectures, but provide both left
and right shift interfaces to the translators.

Backports commit b0f7e7444c03da17e41bf327c8aea590104a28ab from qemu
2020-06-14 21:26:58 -04:00
Richard Henderson 742301a7c1 tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64}
For the benefit of compatibility of function pointer types,
we have standardized on int32_t and int64_t as the integral
argument to tcg expanders.

We converted most of them in 474b2e8f0f7, but missed the rotates.

Backports commit 07dada0336a83002dfa8673a9220a88e13d9a45c from qemu
2020-05-07 10:41:01 -04:00
Richard Henderson 0bcd0ca93d tcg: Add load_dest parameter to GVecGen2
We have this same parameter for GVecGen2i, GVecGen3,
and GVecGen3i. This will make some SVE2 insns easier
to parameterize.

Backports commit ac09ae627e9a2c65c8a452b69c3dac33c29d0719 from qemu
2020-05-07 10:35:47 -04:00
Richard Henderson f02f71f38f tcg: Improve vector tail clearing
Better handling of non-power-of-2 tails as seen with Arm 8-byte
vector operations.

Backports commit f47db80cc073c0a7a22136c8296b5eca20c0e199 from qemu
2020-05-07 10:24:00 -04:00
Richard Henderson 549b0ec3c5 tcg: Add tcg_gen_gvec_dup_tl
For use when a target needs to pass a configure-specific
target_ulong value to duplicate.

Backports commit 0f039e3ad9131966d9fe509c231b756868b015e2 from qemu
2020-05-07 10:12:09 -04:00
Richard Henderson e65806c356 tcg: Remove tcg_gen_gvec_dup{8,16,32,64}i
These interfaces are now unused.

Backports commit 398f21412aeec158338963e3f71c9313bc126a71 form qemu
2020-05-07 10:11:00 -04:00
Richard Henderson 43a72b0540 tcg: Use tcg_gen_gvec_dup_imm in logical simplifications
Replace the outgoing interface.

Backports commit 03ddb6f315ca6d02dfdba0aecc43aa97c728c428 from qemu
2020-05-07 10:09:53 -04:00
Richard Henderson 07f622e57d tcg: Add tcg_gen_gvec_dup_imm
Add a version of tcg_gen_dup_* that takes both immediate and
a vector element size operand. This will replace the set of
tcg_gen_gvec_dup{8,16,32,64}i functions that encode the element
size within the function name.

Backports commit 44c94677febd15488f9190b11eaa4a08e8ac696b from qemu
2020-05-07 09:55:25 -04:00
lixinyu 7c32c5b0a4 tcg/mips: mips sync* encode error
OPC_SYNC_WMB, OPC_SYNC_MB, OPC_SYNC_ACQUIRE, OPC_SYNC_RELEASE and
OPC_SYNC_RMB have wrong encode. According to the mips manual,
their encode should be 'OPC_SYNC | 0x?? << 6' rather than
'OPC_SYNC | 0x?? << 5'. Wrong encode can lead illegal instruction
errors. These instructions often appear with multi-threaded
simulation.

Fixes: 6f0b99104a3 ("tcg/mips: Add support for fence")

Backports commit a4e57084c16d5b0eff3651693fba04f26b30b551 from qemu
2020-04-30 07:24:57 -04:00
Richard Henderson 299ba4e867 tcg/i386: Fix INDEX_op_dup2_vec
We were only constructing the 64-bit element, and not
replicating the 64-bit element across the rest of the vector.

Backports commit e20cb81d9c5a3d0f9c08f3642728a210a1c162c9 from qemu
2020-04-30 07:15:08 -04:00
Richard Henderson b358f771f6 tcg/i386: Bound shift count expanding sari_vec
A given RISU testcase for SVE can produce

tcg-op-vec.c:511: do_shifti: Assertion `i >= 0 && i < (8 << vece)' failed.

because expand_vec_sari gave a shift count of 32 to a MO_32
vector shift.

In 44f1441dbe1, we changed from direct expansion of vector opcodes
to re-use of the tcg expanders. So while the comment correctly notes
that the hw will handle such a shift count, we now have to take our
own sanity checks into account. Which is easy in this particular case.

Fixes: 44f1441dbe1

Backports commit 312b426fea4d6dd322d7472c80010a8ba7a166d2 from qemu
2020-04-30 06:26:42 -04:00
Richard Henderson 12b4e01d9c tcg: Add tcg_gen_gvec_5_ptr
Extend the vector generator infrastructure to handle
5 vector arguments.

Backports commit 2445971604c1cfd3ec484457159f4ac300fb04d2 from qemu
2020-03-21 16:54:01 -04:00
Tony Nguyen f75368cd0f
tcg: TCGMemOp is now accelerator independent MemOp
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.

Target dependant attributes are conditionalized upon NEED_CPU_H.

Backports commit 14776ab5a12972ea439c7fb2203a4c15a09094b4 from qemu
2019-11-28 03:01:12 -05:00
Emilio G. Cota ca15f620b0
tcg/README: fix typo s/afterwise/afterwards/
Afterwise is "wise after the fact", as in "hindsight".
Here we meant "afterwards" (as in "subsequently"). Fix it.
2019-11-28 02:37:51 -05:00
tony.nguyen@bt.com b4c2c94602
configure: Define target access alignment in configure
This patch moves the define of target access alignment earlier from
target/foo/cpu.h to configure.

Suggested in Richard Henderson's reply to "[PATCH 1/4] tcg: TCGMemOp is now
accelerator independent MemOp"

Backports commit 52bf9771fdfce98e98cea36a17a18915be6f6b7f from qemu
2019-11-18 21:41:35 -05:00
Richard Henderson d291a311ee
tcg/aarch64: Fix output of extract2 opcodes
This patch fixes two problems:
(1) The inputs to the EXTR insn were reversed,
(2) The input constraints use rZ, which means that we need to use
the REG0 macro in order to supply XZR for a constant 0 input.

Fixes: 464c2969d5d

Backports commit 1789d4274b851fb8fdf4a947ce5474c63e813d0d from qemu
2019-08-08 19:25:37 -04:00
Richard Henderson b2d75f4955
tcg: Fix constant folding of INDEX_op_extract2_i32
On a 64-bit host, discard any replications of the 32-bit
sign bit when performing the shift and merge.

Backports commit 80f4d7c3ae216c191fb403e149bcba88d6aa40bb from qemu
2019-08-08 19:25:01 -04:00
Richard Henderson 955661ad7b
tcg: Fix expansion of INDEX_op_not_vec
This operation can always be emitted, even if we need to
fall back to xor. Adjust the assertions to match.

Backports commit 11978f6f58f1d3d66429f7ff897524f693d823ce from qemu
2019-08-08 19:23:03 -04:00