Convert the Neon narrowing moves VMQNV, VQMOVN, VQMOVUN in the 2-reg-misc
group to decodetree.
Backports commit 3882bdacb0ad548864b9f2582a32bb5c785e3165 from qemu
Convert the pairwise ops VPADDL and VPADAL in the 2-reg-misc grouping
to decodetree.
At this point we can get rid of the weird CPU_V001 #define that was
used to avoid having to explicitly list all the arguments being
passed to some TCG gen/helper functions.
Backports commit 6106af3aa2304fccee91a3a90138352b0c2af998 from qemu
Convert the Neon VDUP (scalar) insn to decodetree. (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register" insn.)
Backports commit 9aaa23c2ae18e6fb9a291b81baf91341db76dfa0 from qemu
Convert the Neon VTBL, VTBX instructions to decodetree. The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.
Backports commit 54e96c744b70a5d19f14b212a579dd3be8fcaad9 from qemu
Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.
We don't need to special case 0 or 8 immediates any more as the
optimizer is smart enough to throw away the dead code.
Backports commit 0aad761fb0aed40c99039eacac470cbd03d07019 from qemu
Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.
Backports commit 77e576a9281825fc170f3b3af83f47e110549b5c from qemu
Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.
Backports commit 85ac9aef9a5418de3168df569e21258e853840a2 from qemu
Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree. These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.
The refactoring removes some of the oddities of the old decoder:
* operands to the operation and accumulation were often
reversed (taking advantage of the fact that most of these ops
are commutative); the new code follows the pseudocode order
* the Q bit in the insn was in a local variable 'u'; in the
new code it is decoded into a->q
Backports commit 96fc80f5f186decd1a649f6c04252faceb057ad2 from qemu
Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.
Backports commit 18fb58d588898550919392277787979ee7d0d84e from qemu
Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.
These are the last insns in the group which use the pass-over-each
elements loop, so we can delete that code.
Backports commit 9546ca5998d3cbd98a81b2d46a2e92a11b0f78a4 from qemu
Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.
Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.
Backports commit 3a1d9eb07b767a7592abca642af80906f9eab0ed from qemu
Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.
Backports commit f5b28401200ec95ba89552df3ecdcdc342f6b90b from qemu
Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.
Backports commit 0fa1ab0302badabc3581aefcbb2f189ef52c4985 from qemu
Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.
As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
Backports commit b28be09570d0827969b62b8f82b0f720a9915427 from qemu
Convert the insns in the one-register-and-immediate group to decodetree.
In the new decode, our asimd_imm_const() function returns a 64-bit value
rather than a 32-bit one, which means we don't need to treat cmode=14 op=1
as a special case in the decoder (it is the only encoding where the two
halves of the 64-bit value are different).
Backports commit 2c35a39eda0b16c2ed85c94cec204bf5efb97812 from qemu
Convert the VCVT fixed-point conversion operations in the
Neon 2-regs-and-shift group to decodetree.
Backports commit 3da26f11711caeaa18318b6afa14dfb81d7650ab from qemu
Convert the VSHLL and VMOVL insns from the 2-reg-shift group
to decodetree. Since the loop always has two passes, we unroll
it to avoid the awkward reassignment of one TCGv to another.
Backports commit 968bf842742a5ffbb0041cb31089e61a9f7a833d from qemu
Convert the VQSHLU and QVSHL 2-reg-shift insns to decodetree.
These are the last of the simple shift-by-immediate insns.
Backports commit 37bfce81b10450071193c8495a07f182ec652e2a from qemu
Convert the VSHR 2-reg-shift insns to decodetree.
Note that unlike the legacy decoder, we present the right shift
amount to the trans_ function as a positive integer.
Backports commit 66432d6b8294e3508218b360acfdf7c244eea993 from qemu
Convert the VSHL and VSLI insns from the Neon 2-registers-and-a-shift
group to decodetree.
Backports commit d3c8c736f8b4bdd02831076286b1788232f46ced from qemu
Do not yet convert the helpers to loop over opr_sz, but the
descriptor allows the vector tail to be cleared. Which fixes
an existing bug vs SVE.
Backports commit effa992f153f5e7ab97ab843b565690748c5b402 from qemu
With this conversion, we will be able to use the same helpers
with sve. In particular, pass 3 vector parameters for the
3-operand operations; for advsimd the destination register
is also an input.
This also fixes a bug in which we failed to clear the high bits
of the SVE register after an AdvSIMD operation.
Backports commit a04b68e1d4c4f0cd5cd7542697b1b230b84532f5 from qemu
Convert the Neon floating point VFMA and VFMS insn to decodetree.
These are the last insns in the 3-reg-same group so we can
remove all the support/loop code from the old decoder.
Backports commit e95485f85657be21135c17a9226e297c21e73360 from qemu
Convert the Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS 3-reg-same
insns to decodetree. (These are all the remaining non-accumulation
instructions in this group.)
Backports commit d5fdf9e9e1c6f2bbb0a4bcaafd85d344cce9c298 from qemu
The usual location for the env argument in the argument list of a TCG helper
is immediately after the return-value argument. recps_f32 and rsqrts_f32
differ in that they put it at the end.
Move the env argument to its usual place; this will allow us to
more easily use these helper functions with the gvec APIs.
Backports commit 26c6f695cfd2a3ccddb4d015a25b56f56aa62928 from qemu
Convert the Neon integer 3-reg-same compare insns VCGE, VCGT,
VCEQ, VACGE and VACGT to decodetree.
Backports commit 727ff1d63213e6666e511956903b9e97a339ec7e from qemu
Convert the Neon integer VMUL, VMLA, and VMLS 3-reg-same inssn to
decodetree.
We don't have a gvec helper for multiply-accumulate, so VMLA and VMLS
need a loop function do_3same_fp(). This takes a reads_vd parameter
to do_3same_fp() which tells it to load the old value into vd before
calling the callback function, in the same way that the do_vfp_3op_sp()
and do_vfp_3op_dp() functions in translate-vfp.inc.c work. (The
only uses in this patch pass reads_vd == true, but later commits
will use reads_vd == false.)
This conversion fixes in passing an underdecoding for VMUL
Backports commit 8aa71ead912ca0a9c0d29b74e0976f91952f950a from qemu
Convert the Neon float VPMIN, VPMAX and VPADD 3-reg-same insns to
decodetree. These are the only remaining 'pairwise' operations,
so we can delete the pairwise-specific bits of the old decoder's
for-each-element loop now.
Backports commit ab978335a56e3618212868fdce3a54217c6e71e6 from qemu
Convert the Neon VADD, VSUB, VABD 3-reg-same insns to decodetree.
We already have gvec helpers for addition and subtraction, but must
add one for fabd.
Backports commit a26a352bb498662cd0c205cb433a352f86fac7d2 from qemu
Convert the Neon VQDMULH and VQRDMULH 3-reg-same insns to
decodetree. These are the last integer operations in the
3-reg-same group.
Backports commit 7ecc28bc72b8033cf4e0c6332135ec20d4125dfb from qemu
Convert the Neon integer VPADD 3-reg-same insns to decodetree. These
are 'pairwise' operations. (Note that VQRDMLAH, which shares the
same primary opcode but has U=1, has already been converted.)
Backports commit fa22827d4eb078b6c58cd3d19af0b50ed951e832 from qemu
Convert the Neon integer VPMAX and VPMIN 3-reg-same insns to
decodetree. These are 'pairwise' operations.
Backports commit 059c2398a2b1ae86c6722c45e79fb0d0f4d95b1d from qemu
Convert the VQSHL, VRSHL and VQRSHL insns in the 3-reg-same
group to decodetree. We have already implemented the size==0b11
case of these insns; this commit handles the remaining sizes
Backports commit 6812dfdc6b0286730d6f903ebfbdc4f81b80c29b from qemu
Convert the Neon VRHADD and VHSUB 3-reg-same insns to decodetree.
(These are all the other insns in 3-reg-same which were using
GEN_NEON_INTEGER_OP() and which are not pairwise or
reversed-operands.)
Backports commit 8e44d03f4b5590e19a4f7910ca1c327609933dd7 from qemu
Convert the 64-bit element insns in the 3-reg-same group
to decodetree. This covers VQSHL, VRSHL and VQRSHL where
size==0b11.
Backports commit 35d4352fa9e94b35bf17f58181cb16c184b98d56 from qemu
Convert the Neon VQRDMLAH and VQRDMLSH insns in the 3-reg-same group
to decodetree. These don't use do_3same() because they want to
operate on VFP double registers, whose offsets are different from the
neon_reg_offset() calculations do_3same does.
Backports commit a063569508af8295cf6271e06700e5b956bb402d from qemu
Pass a pointer directly to env->vfp.qc[0], rather than env.
This will allow SVE2, which does not modify QC, to pass a
pointer to dummy storage.
Change the return type of inl_qrdml.h_s16 to match the
sense of the operation: signed.
Backports commit e286bf4a72fe3a60490b8d6e3f28d6335677e08c from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Backports commit 146aa66ce58b686b8037d0eb3921c1125942dbde from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Backports commit c7715b6b51a6f7a5412c5fcb40a4c8586105e597 from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Backports commit 8161b75357095fef54c76b1a6ed1e54d0e8655e0 from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Backports commit 271063206a46062a45fc6bab8dabe45f0b88159d from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Macro-ize the 5 nearly identical comparisons.
Backports commit 69d5e2bf8c3cefedbfa1c1670137e636dbd7faa5 from qemu
In 1dc8425e551, while converting to gvec, I added an extra range check
against the shift count. This was unnecessary because the encoding of
the shift count produces 0 to the element size - 1.
Backports commit 2f27c5244db300387f15d9ffa5067a204ffd625d from qemu
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Backports commit 893ab0542aa385a287cbe46d5535c8b9e95ce699 from qemu
Create vectorized versions of handle_shri_with_rndacc
for shift+round and shift+round+accumulate. Add out-of-line
helpers in preparation for longer vector lengths from SVE.
Backports commit 6ccd48d4ea244c1c46a24dfa50bfb547f11422dd from qemu
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Backports commit 631e565450c483e0622eec3d8b61d7fa41d16bca from qemu
Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
3-reg-same grouping to decodetree.
Backports commit 0de34fd48ad4e44bf5caa2330657ebefa93cea7d from qemu
Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
Note that for the logic ops the 'size' field forms part of their
decode and the actual operations are always bitwise.
Backports commit 35a548edb6f5043386183b9f6b4139d99d1f130a from qemu
Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.
Note that we don't need the neon_3r_sizes[op] check here because all
size values are OK for VADD and VSUB; we'll add this when we convert
the first insn that has size restrictions.
For this we need one of the GVecGen*Fn typedefs currently in
translate-a64.h; move them all to translate.h as a block so they
are visible to the 32-bit decoder.
Backports commit a4e143ac5b9185f670d2f17ee9cc1a430047cb65 from qemu
Convert the Neon "load/store single structure to one lane" insns to
decodetree.
As this is the last set of insns in the neon load/store group,
we can remove the whole disas_neon_ls_insn() function.
Backports commit 123ce4e3daba26b760b472687e1fb1ad82cf1993 from qemu
Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
to decodetree. These are the last ones in the group so we can remove
all the legacy decode for the group.
Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.
Backports commit d27e82f7d02f35e5919bd9cbbcb157f3537069a0 from qemu
Convert the VFM[AS]L (vector) insns to decodetree. This is the last
insn in the legacy decoder for the 3same_ext group, so we can
delete the legacy decoder function for the group entirely.
Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.
Backports commit 9a107e7b8a3c87ab63ec830d3d60f319fc577ff7 from qemu
Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 Neon encodings. At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.
We follow the same pattern we did for the VFP decodetree conversion
(commit 78e138bc1f672c145ef6ace74617d and following): code that deals
with Neon will be moving gradually out to translate-neon.vfp.inc,
which we #include into translate.c.
In order to share the decode files between A32 and T32, we
split Neon into 3 parts:
* data-processing
* load-store
* 'shared' encodings
The first two groups of instructions have similar but not identical
A32 and T32 encodings, so we need to manually transform the T32
encoding into the A32 one before calling the decoder; the third group
covers the Neon instructions which are identical in A32 and T32.
Backports commit 625e3dd44a15dfbe9532daa6454df3f86cf04d3e from qemu
We were accidentally permitting decode of Thumb Neon insns even if
the CPU didn't have the FEATURE_NEON bit set, because the feature
check was being done before the call to disas_neon_data_insn() and
disas_neon_ls_insn() in the Arm decoder but was omitted from the
Thumb decoder. Push the feature bit check down into the called
functions so it is done for both Arm and Thumb encodings.
Backports commit d1a6d3b594157425232a1ae5ea7f51b7a1c1aa2e from qemu
According to Arm ARM, VQDMULL is only valid when U=0, while having
U=1 is unallocated.
Backports commit ab553ef74ee52c0889679d0bd0da084aaf938f5c from qemu
These instructions are often used in glibc's string routines.
They were the final uses of the 32-bit at a time neon helpers.
Backports commit 6b375d3546b009d1e63e07397ec9c6af256e15e9 from qemu
Have the calls adjacent as an intermediate step toward
actually merging the decodes.
Backports commit f0f6d5c81be47d593e5ece7f06df6fba4c15738b from qemu
Now that we no longer have an early check for ARM_FEATURE_VFP,
we can use the proper ISA check in trans_VLLDM_VLSTM.
Backports commit dc778a6873f534817a13257be2acba3ca87ec015 from qemu
All remaining tests for VFP4 are for fused multiply-add insns.
Since the MVFR1 field is used for both VFP and NEON, move its adjustment
from the !has_neon block to the (!has_vfp && !has_neon) block.
Test for vfp of the appropraite width alongside the test for simdfmac
within translate-vfp.inc.c. Within disas_neon_data_insn, we have
already tested for ARM_FEATURE_NEON.
Backports commit c52881bbc22b50db99a6c37171ad3eea7d959ae6 from qemu
Many uses of ARM_FEATURE_VFP3 are testing for the number of simd
registers implemented. Use the proper test vs MVFR0.SIMDReg.
Backports commit a6627f5fc607939f7c8b9c3157fdcb2d368ba0ed from qemu
We still need two different helpers, since NEON and SVE2 get the
inputs from different locations within the source vector. However,
we can convert both to the same internal form for computation.
The sve2 helper is not used yet, but adding it with this patch
helps illustrate why the neon changes are helpful.
Backports commit e7e96fc5ec8c79dc77fef522d5226ac09f684ba5 from qemu
The gvec form will be needed for implementing SVE2.
Extend the implementation to operate on uint64_t instead of uint32_t.
Use a counted inner loop instead of terminating when op1 goes to zero,
looking toward the required implementation for ARMv8.4-DIT.
Backports commit a21bb78e5817be3f494922e1dadd6455fe5d6318 from qemu
These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift. This
requires several masks and selects in addition to the actual
shifts to form the complete answer.
That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.
Backports commit 87b74e8b6edd287ea2160caa0ebea725fa8f1ca1 from qemu
Enforce a convention that an isar_feature function that tests a
32-bit ID register always has _aa32_ in its name, and one that
tests a 64-bit ID register always has _aa64_ in its name.
We already follow this except for three cases: thumb_div,
arm_div and jazelle, which all need _aa32_ adding.
(As noted in the comment, isar_feature_aa32_fp16_arith()
is an exception in that it currently tests ID_AA64PFR0_EL1,
but will switch to MVFR1 once we've properly implemented
FP16 for AArch32.)
Backports commit 873b73c0c891ec20adacc7bd1ae789294334d675 from qemu
Split this helper out of msr_mask in translate.c. At the same time,
transform the negative reductive logic to positive accumulative logic.
It will be usable along the exception paths.
While touching msr_mask, fix up formatting.
Backports commit 4f9584ed4bba8a57a3cb2fa48a682725005d530a from qemu
To implement PAN, we will want to swap, for short periods
of time, to a different privileged mmu_idx. In addition,
we cannot do this with flushing alone, because the AT*
instructions have both PAN and PAN-less versions.
Add the ARMMMUIdx*_PAN constants where necessary next to
the corresponding ARMMMUIdx* constant.
Backports commit 452ef8cb8c7b06f44a30a3c3a54d3be82c4aef59 from qemu
Prepare for, but do not yet implement, the EL2&0 regime.
This involves adding the new MMUIdx enumerators and adjusting
some of the MMUIdx related predicates to match.
Backports commit b9f6033c1a5fb7da55ed353794db8ec064f78bb2 from qemu.
We had completely run out of TBFLAG bits.
Split A- and M-profile bits into two overlapping buckets.
This results in 4 free bits.
We used to initialize all of the a32 and m32 fields in DisasContext
by assignment, in arm_tr_init_disas_context. Now we only initialize
either the a32 or m32 by assignment, because the bits overlap in
tbflags. So zero the entire structure in gen_intermediate_code.
Backports commit 79cabf1f473ca6e9fa0727f64ed9c2a84a36f0aa from qemu
This is part of a reorganization to the set of mmu_idx.
The non-secure EL2 regime only has a single stage translation;
there is no point in pointing out that the idx is for stage1.
Backports commit e013b7411339342aac8d986c5d5e329e1baee8e1 from qemu
This is part of a reorganization to the set of mmu_idx.
The EL3 regime only has a single stage translation, and
is always secure.
Backports commit 127b2b086303296289099a6fb10bbc51077f1d53 from qemu
This is part of a reorganization to the set of mmu_idx.
This emphasizes that they apply to the Secure EL1&0 regime.
Backports commit fba37aedecb82506c62a1f9e81d066b4fd04e443 from qemu
This is part of a reorganization to the set of mmu_idx.
This emphasizes that they apply to the EL1&0 regime.
The ultimate goal is
-- Non-secure regimes:
ARMMMUIdx_E10_0,
ARMMMUIdx_E20_0,
ARMMMUIdx_E10_1,
ARMMMUIdx_E2,
ARMMMUIdx_E20_2,
-- Secure regimes:
ARMMMUIdx_SE10_0,
ARMMMUIdx_SE10_1,
ARMMMUIdx_SE3,
-- Helper mmu_idx for non-secure EL1&0 stage1 and stage2
ARMMMUIdx_Stage2,
ARMMMUIdx_Stage1_E0,
ARMMMUIdx_Stage1_E1,
The 'S' prefix is reserved for "Secure". Unless otherwise specified,
each mmu_idx represents all stages of translation.
Backports commit 01b98b686460b3a0fb47125882e4f8d4268ac1b6 from qemu