We already support duplication of 128-bit blocks. This extends
that support to 256-bit blocks. This will be needed by SVE2.
Backports commit fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
The fallback inline expansion for vectorized absolute value,
when the host doesn't support such an insn was flawed.
E.g. when a vector of bytes has all elements negative, mask
will be 0xffff_ffff_ffff_ffff. Subtracting mask only adds 1
to the low element instead of all elements becase -mask is 1
and not 0x0101_0101_0101_0101.
Backports commit e7e8f33fb603c3bfa0479d7d924f2ad676a84317
In arm_tr_init_disas_context() we have a FIXME comment that suggests
"cpu_M0 can probably be the same as cpu_V0". This isn't in fact
possible: cpu_V0 is used as a temporary inside gen_iwmmxt_shift(),
and that function is called in various places where cpu_M0 contains a
live value (i.e. between gen_op_iwmmxt_movq_M0_wRn() and
gen_op_iwmmxt_movq_wRn_M0() calls). Remove the comment.
We also have a comment on the declarations of cpu_V0/V1/M0 which
claims they're "for efficiency". This isn't true with modern TCG, so
replace this comment with one which notes that they're only used with
the iwmmxt decode
Backports 8b4c9a50dc9531a729ae4b5941d287ad0422db48
The 32 vector registers will be viewed as a continuous memory block.
It avoids the convension between element index and (regno, offset).
Thus elements can be directly accessed by offset from the first vector
base address.
Backports ad9e5aa2ae8032f19a8293b6b8f4661c06167bf0 from qemu
Forgetting this asserts when tcg_gen_cmp_vec is called from
within tcg_gen_cmpsel_vec.
Fixes: 72b4c792c7a
Backports commit 69c918d2ef319ac63cd759c527debc2a2bdf3a0c from qemu
The smin/smax/umin/umax operations require the operands to be
properly sign extended. Do not drop the MO_SIGN bit from the
load, and additionally extend the val input.
Backports commit 852f933e482518797f7785a2e017a215b88df815 from qemu
If the output of the move is dead, then the last use is in
the store. If we propagate the input to the store, then we
can remove the move opcode entirely.
Backports commit 61f15c487fc2aea14f6b0e52c459ae8b7d252a65 from qemu
For immediates, we must continue the special casing of 8-bit
elements. The other element sizes and shift types are trivially
implemented with shifts.
Backports commit 885b1706df6f0211a22e120fac910fb3abf3e733 from qemu
No host backend support yet, but the interfaces for rotls
are in place. Only implement left-rotate for now, as the
only known use of vector rotate by scalar is s390x, so any
right-rotate would be unused and untestable.
Backports commit 23850a74afb641102325b4b7f74071d929fc4594 from qemu
We do not reflect this expansion in tcg_can_emit_vecop_list,
so it is unused and unusable. However, we actually perform
the same expansion in do_gvec_shifts, so it is also unneeded.
Backports commit 3d5bb2ea5cc9ed54f65a6929a6e6baa01cabd98b from qemu
No host backend support yet, but the interfaces for rotlv
and rotrv are in place.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Drop the generic expansion from rot to shift; we can do better
for each backend, and then this code becomes unused.
Backports commit 5d0ceda902915e3f0e21c39d142c92c4e97c3ebb from qemu
No host backend support yet, but the interfaces for rotli
are in place. Canonicalize immediate rotate to the left,
based on a survey of architectures, but provide both left
and right shift interfaces to the translators.
Backports commit b0f7e7444c03da17e41bf327c8aea590104a28ab from qemu
For the benefit of compatibility of function pointer types,
we have standardized on int32_t and int64_t as the integral
argument to tcg expanders.
We converted most of them in 474b2e8f0f7, but missed the rotates.
Backports commit 07dada0336a83002dfa8673a9220a88e13d9a45c from qemu
We have this same parameter for GVecGen2i, GVecGen3,
and GVecGen3i. This will make some SVE2 insns easier
to parameterize.
Backports commit ac09ae627e9a2c65c8a452b69c3dac33c29d0719 from qemu
For use when a target needs to pass a configure-specific
target_ulong value to duplicate.
Backports commit 0f039e3ad9131966d9fe509c231b756868b015e2 from qemu
Add a version of tcg_gen_dup_* that takes both immediate and
a vector element size operand. This will replace the set of
tcg_gen_gvec_dup{8,16,32,64}i functions that encode the element
size within the function name.
Backports commit 44c94677febd15488f9190b11eaa4a08e8ac696b from qemu
OPC_SYNC_WMB, OPC_SYNC_MB, OPC_SYNC_ACQUIRE, OPC_SYNC_RELEASE and
OPC_SYNC_RMB have wrong encode. According to the mips manual,
their encode should be 'OPC_SYNC | 0x?? << 6' rather than
'OPC_SYNC | 0x?? << 5'. Wrong encode can lead illegal instruction
errors. These instructions often appear with multi-threaded
simulation.
Fixes: 6f0b99104a3 ("tcg/mips: Add support for fence")
Backports commit a4e57084c16d5b0eff3651693fba04f26b30b551 from qemu
We were only constructing the 64-bit element, and not
replicating the 64-bit element across the rest of the vector.
Backports commit e20cb81d9c5a3d0f9c08f3642728a210a1c162c9 from qemu
A given RISU testcase for SVE can produce
tcg-op-vec.c:511: do_shifti: Assertion `i >= 0 && i < (8 << vece)' failed.
because expand_vec_sari gave a shift count of 32 to a MO_32
vector shift.
In 44f1441dbe1, we changed from direct expansion of vector opcodes
to re-use of the tcg expanders. So while the comment correctly notes
that the hw will handle such a shift count, we now have to take our
own sanity checks into account. Which is easy in this particular case.
Fixes: 44f1441dbe1
Backports commit 312b426fea4d6dd322d7472c80010a8ba7a166d2 from qemu
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.
Target dependant attributes are conditionalized upon NEED_CPU_H.
Backports commit 14776ab5a12972ea439c7fb2203a4c15a09094b4 from qemu
This patch moves the define of target access alignment earlier from
target/foo/cpu.h to configure.
Suggested in Richard Henderson's reply to "[PATCH 1/4] tcg: TCGMemOp is now
accelerator independent MemOp"
Backports commit 52bf9771fdfce98e98cea36a17a18915be6f6b7f from qemu
This patch fixes two problems:
(1) The inputs to the EXTR insn were reversed,
(2) The input constraints use rZ, which means that we need to use
the REG0 macro in order to supply XZR for a constant 0 input.
Fixes: 464c2969d5d
Backports commit 1789d4274b851fb8fdf4a947ce5474c63e813d0d from qemu
On a 64-bit host, discard any replications of the 32-bit
sign bit when performing the shift and merge.
Backports commit 80f4d7c3ae216c191fb403e149bcba88d6aa40bb from qemu
This operation can always be emitted, even if we need to
fall back to xor. Adjust the assertions to match.
Backports commit 11978f6f58f1d3d66429f7ff897524f693d823ce from qemu
Amusingly, we had already ignored the comment to keep this value
at the end of CPUState. This restores the minimum negative offset
from TCG_AREG0 for code generation.
For the couple of uses within qom/cpu.c, without NEED_CPU_H, add
a pointer from the CPUState object to the IcountDecr object within
CPUNegativeOffsetState.
Backports commit 5e1401969b25f676fee6b1c564441759cf967a43 from qemu
Now that we have ArchCPU, we can define this generically,
in the one place that needs it.
Backports commit 677c4d69ac21961e76a386f9bfc892a44923acc0 from qemu
The allows immediates to be used for ORR and BIC,
as well as the trivial inversions, ORC and AND.
Backports commit 9e27f58b9902834dffc0d66d9eb62f78d9c2a632 from qemu
Use MOVI+ORR or MVNI+BIC in order to build some vector constants,
as opposed to dropping them to the constant pool. This includes
all 16-bit constants and a similar set of 32-bit constants.
Backports commit 02f3a5b4744885258758d07ebe09cf965de78bcf from qemu
The compliment of a subset of immediates can be computed
with a single instruction.
Backports commit 7e308e003e5b6ddd3130e09711e1d33693230696 from qemu
There are several sub-classes of vector immediate, and only MOVI
can use them all. This will enable usage of MVNI and ORRI, which
use progressively fewer sub-classes.
This patch adds no new functionality, merely splits the function
and moves part of the logic into tcg_out_dupi_vec.
Backports commit 984fdcee342473dfe797897758929dad654693c8 from qemu
The instruction set has 3 insns that perform the same operation,
only varying in which operand must overlap the destination. We
can represent the operation without overlap and choose based on
the operands seen.
Backports commit a9e434a5dc16f71ee156428619fc3c3765b68f26 from qemu
Using umin(a, b) == a as an expansion for TCG_COND_LEU is a
better alternative to (a - INT_MIN) <= (b - INT_MIN).
Backports commit ebcfb91abed8c0fb180a968b9004419c208dcc02 from qemu
We already had backend support for this feature. Expand the new
cmpsel opcode using vpblendb. The combination allows us to avoid
an extra NOT for some comparison codes.
Backports commit 904c5e19672778cc3349f4975437cfdf3371abb6 from qemu
If INDEX_op_foo is always expanded by tcg_expand_vec_op, then
there may be no reasonable set of constraints to return from
tcg_target_op_def for that opcode.
Let TCG_TARGET_HAS_foo be specified as -1 in that case. Thus a
boolean test for TCG_TARGET_HAS_foo is true, but we will not
assert within process_op_defs when no constraints are specified.
Compare this with tcg_can_emit_vec_op, which already uses this
tri-state indication.
Backports commit 25c012b4009256505be3430480954a0233de343e from qemu
This makes do_op3 match do_op2 in allowing for failure,
and thus fall back expansions.
Backports commit 17f79944ebeace8bf43047a33b7775ba5ed9070e from qemu
Perform a per-element conditional move. This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.
Backports commit f75da2988eb2457fa23d006d573220c5c680ec4e from qemu
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units. Include gvec expanders.
Backports commit 38dc12947ec9106237f9cdbd428792c985cd86ae from qemu
The paths through tcg_gen_dup_mem_vec and through MO_128 were
missing the check_size_align. The path through MO_128 was also
missing the expand_clr. This last was not visible because the
only user is ARM SVE, which would set oprsz == maxsz, and not
require the clear.
Fix by adding the check_size_align and using do_dup directly
instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}.
Backports commit 532ba368a13712724137228b5e7e9435994d25e1 from qemu
The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register. This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.
Also fix the 32-bit avx2, which should have used VPBROADCASTW.
Fixes: 1e262b49b533
Backports commit 7b60ef3264e9627ac6efb34e9a6130647e9b55c0 from qemu
Remove a function of the same name from target/arm/.
Use a branchless implementation of abs gleaned from gcc.
Backports commit ff1f11f7f8710a768f9313f24bd7f509d3db27e5 from qemu
Allow expansion either via shift by scalar or by replicating
the scalar for shift by vector.
Backports commit b4578cd91cda4cef1c413304353ca6dc5b957b60 from qemu
The gvec expanders perform a modulo on the shift count. If the target
requires alternate behaviour, then it cannot use the generic gvec
expanders anyway, and will have to have its own custom code.
Backports commit 5ee5c14cacda27e904cd6b0d9e7ffe1acff42838 from qemu
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first. This is especially important
if integer/vector register moves do not exist.
Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:
VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.
VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.
Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.
All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.
Backports commit 37ee55a081b7863ffab2151068dd1b2f11376914 from qemu
The LD1R instruction does all the work. Note that the only
useful addressing mode is a base register with no offset.
Backports commit f23e5e15edfd49d5dd72cab2ed2d85ac354b2eeb from qemu
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.
Backports commit bab1671f0fa928fd678a22f934739f06fd5fd035 from qemu
The i386 backend already has these functions, and the aarch64 backend
could easily split out one. Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.
Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.
Backports commit e7632cfa8b76cdbbc1c76e8737338ef5844e7d60 from qemu
PowerPC Altivec does not support direct moves between vector registers
and general registers. So when tcg_out_mov fails, we can use the
backing memory for the temporary to perform the move.
Backports commit 240c08d0998f402c325fce489de0d14831048128 from qemu
This patch merely changes the interface, aborting on all failures,
of which there are currently none.
Backports commit 78113e83e0007e869c9f0cb4c0497a77538988e3 from qemu
We have a function that takes an additional condition parameter
over the standard backend interface. It already takes care of
eliding no-op moves.
Backports commit c16f52b2c5d91c36e121795bd3b386cea0b7573c from qemu
The only fixed_reg is cpu_env, and it should not be modified
during any TB. Therefore code that tries to special-case moves
into a fixed_reg is dead. Remove it.
Backports commit d63e3b6e694ad6c887be135dddb9cd4893f1a844 from qemu
Replace the single opcode in .opc with a null-terminated
array in .opt_opc. We still require that all opcodes be
used with the same .vece.
Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion. Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.
Convert all existing vector aware front ends.
Backports commit 53229a7703eeb2bbe101a19a33ef22aaf960c65b from qemu
PowerPC Altivec does not support add and subtract of 64-bit elements.
Prepare for that configuration by not assuming the operation is
universally supported.
Backports commit ce27c5d1a38e93da38653af71fb468c5eded4c7b from qemu
Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
so that we check the type and vece for the actual operation.
Backports commit ac383dde33405106469d04a78de1d76f1a730cb1 from qemu
Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.
Backports commit e1227bb6e59173117f094a6a13b998587b45c928 from qemu
Thereby decoupling the resulting translated code from the current state
of the system.
The tb->cflags field is not passed to tcg generation functions. So
we add a field to TCGContext, storing there a copy of tb->cflags.
Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the new field.
Backports commit e82d5a2460b0e176128027651ff9b104e4bdf5cc from qemu
This will not necessarily restrict the size of the TB, since for v7
the majority of constant pool usage is for calls from the out-of-line
ldst code, which is already at the end of the TB. But this does
allow us to save one insn per reference on the off-chance.
Backports commit b4b82d7e9caff7ccca5c621817b5a4b8e95eb9b1 from qemu
There is no point in coding for a 2GB offset when the max TB size
is already limited to 64k. If we further restrict to 32k then we
can eliminate the extra ADDIS instruction.
Backports commit a7cdaf710f2aaaf0be855a338dd67463d4bb99e2 from qemu
If the TB generates too much code, such that backend relocations
overflow, try again with a smaller TB. In support of this, move
relocation processing from a random place within tcg_out_op, in
the handling of branch opcodes, to a new function at the end of
tcg_gen_code.
This is not a complete solution, as there are additional relocs
generated for out-of-line ldst handling and constant pools.
Backports commit 7ecd02a06f8f4c0bbf872ecc15e37035b7e1df5f from qemu
If a TB generates too much code, try again with fewer insns.
Fixes: https://bugs.launchpad.net/bugs/1824853
Backports commit 6e6c4efed995d9eca6ae0cfdb2252df830262f50 from qemu
Will be helpful for s390x. Input 128 bit and output 64 bit only,
which is sufficient for now.
Backports commit 2089fcc9e7b4174d1c351eaa7d277c02188a6dd2 from qemu
This ports over the RISC-V architecture from Qemu. This is currently a
very barebones transition. No code hooking or any fancy stuff.
Currently, you can feed it instructions and query the CPU state itself.
This also allows choosing whether or not RISC-V 32-bit or RISC-V 64-bit
is desirable through Unicorn's interface as well.
Extremely basic examples of executing a single instruction have been
added to the samples directory to help demonstrate how to use the basic
functionality.
The last update to this file was 9 years ago. In the meantime,
4 of the 6 ideas have actually been completed. The lat two do
not actually make sense anymore.
Backports commit 9e564a1dde5abc7ae4cebc115142f685d98938d7 from qemu
Completely rewrite conditional stores handling. Use cmpxchg.
This eliminates need for separate implementations of SC instruction
emulation for user and system emulation.
Backports commit 33a07fa2db66376e6ee780d4a8b064dc5118cf34 from qemu
Due to a cut/paste error in the original implementation, the unsigned
vector saturating arithmetic was erroneously being calculated as signed
vector saturating arithmetic.
Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic")
Backports commit 3115584d39afe8cf2a84a40549029f53792abca5 from qemu
Currently, a jump to a label that is not defined anywhere will
be emitted not be relocated. This results in a jump to a random
jump target. With tcg debugging, print a diagnostic to the -d op
file and abort.
This could help debug or detect errors like
c2d9644e6d ("target/arm: Fix crash on conditional instruction in an IT block")
Backports commit bef16ab4e641636b4e85c3d863b4257ce0be4e6f from qemu