Convert the pairwise ops VPADDL and VPADAL in the 2-reg-misc grouping
to decodetree.
At this point we can get rid of the weird CPU_V001 #define that was
used to avoid having to explicitly list all the arguments being
passed to some TCG gen/helper functions.
Backports commit 6106af3aa2304fccee91a3a90138352b0c2af998 from qemu
Convert the Neon VDUP (scalar) insn to decodetree. (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register" insn.)
Backports commit 9aaa23c2ae18e6fb9a291b81baf91341db76dfa0 from qemu
Convert the Neon VTBL, VTBX instructions to decodetree. The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.
Backports commit 54e96c744b70a5d19f14b212a579dd3be8fcaad9 from qemu
Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.
We don't need to special case 0 or 8 immediates any more as the
optimizer is smart enough to throw away the dead code.
Backports commit 0aad761fb0aed40c99039eacac470cbd03d07019 from qemu
Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.
Backports commit 77e576a9281825fc170f3b3af83f47e110549b5c from qemu
Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.
Backports commit 85ac9aef9a5418de3168df569e21258e853840a2 from qemu
Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree. These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.
The refactoring removes some of the oddities of the old decoder:
* operands to the operation and accumulation were often
reversed (taking advantage of the fact that most of these ops
are commutative); the new code follows the pseudocode order
* the Q bit in the insn was in a local variable 'u'; in the
new code it is decoded into a->q
Backports commit 96fc80f5f186decd1a649f6c04252faceb057ad2 from qemu
In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
temporary in do_2shift_env_64(); free it.
Backports commit a4f67e180def790ff0bbb33fc93bb6e80382f041 from qemu
Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.
Backports commit 448f0e5f3ecfbd089b934e5e3aa0ccd1f51a6174 from qemu
Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.
Backports commit 18fb58d588898550919392277787979ee7d0d84e from qemu
Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.
These are the last insns in the group which use the pass-over-each
elements loop, so we can delete that code.
Backports commit 9546ca5998d3cbd98a81b2d46a2e92a11b0f78a4 from qemu
Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.
Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.
Backports commit 3a1d9eb07b767a7592abca642af80906f9eab0ed from qemu
Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.
Backports commit f5b28401200ec95ba89552df3ecdcdc342f6b90b from qemu
Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.
Backports commit 0fa1ab0302badabc3581aefcbb2f189ef52c4985 from qemu
Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.
As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
Backports commit b28be09570d0827969b62b8f82b0f720a9915427 from qemu
The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.
Backports commit 9593a3988c3e788790aa107d778386b09f456a6d from qemu
The miscellaneous control instructions are mutually exclusive
within the t32 decode sub-group.
Backports commit d6084fba47bb9aef79775c1102d4b647eb58c365 from qemu
Convert the insns in the one-register-and-immediate group to decodetree.
In the new decode, our asimd_imm_const() function returns a 64-bit value
rather than a 32-bit one, which means we don't need to treat cmode=14 op=1
as a special case in the decoder (it is the only encoding where the two
halves of the 64-bit value are different).
Backports commit 2c35a39eda0b16c2ed85c94cec204bf5efb97812 from qemu
Convert the VCVT fixed-point conversion operations in the
Neon 2-regs-and-shift group to decodetree.
Backports commit 3da26f11711caeaa18318b6afa14dfb81d7650ab from qemu
Convert the VSHLL and VMOVL insns from the 2-reg-shift group
to decodetree. Since the loop always has two passes, we unroll
it to avoid the awkward reassignment of one TCGv to another.
Backports commit 968bf842742a5ffbb0041cb31089e61a9f7a833d from qemu
Convert the VQSHLU and QVSHL 2-reg-shift insns to decodetree.
These are the last of the simple shift-by-immediate insns.
Backports commit 37bfce81b10450071193c8495a07f182ec652e2a from qemu
Convert the VSHR 2-reg-shift insns to decodetree.
Note that unlike the legacy decoder, we present the right shift
amount to the trans_ function as a positive integer.
Backports commit 66432d6b8294e3508218b360acfdf7c244eea993 from qemu
Convert the VSHL and VSLI insns from the Neon 2-registers-and-a-shift
group to decodetree.
Backports commit d3c8c736f8b4bdd02831076286b1788232f46ced from qemu
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Backports commit 43fa36c96c24349145497adc1b451f9caf74e344 from qemu
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Backports commit afc8b7d32668547308bdd654a63cf5228936e0ba from qemu
Do not yet convert the helpers to loop over opr_sz, but the
descriptor allows the vector tail to be cleared. Which fixes
an existing bug vs SVE.
Backports commit effa992f153f5e7ab97ab843b565690748c5b402 from qemu
Do not yet convert the helpers to loop over opr_sz, but the
descriptor allows the vector tail to be cleared. Which fixes
an existing bug vs SVE.
Backports commit aaffebd6d3135b8aed7e61932af53b004d261579 from qemu
With this conversion, we will be able to use the same helpers
with sve. This also fixes a bug in which we failed to clear
the high bits of the SVE register after an AdvSIMD operation.
Backports commit 1738860d7e60dec5dbeba17f8b44d31aae3accac from qemu
With this conversion, we will be able to use the same helpers
with sve. In particular, pass 3 vector parameters for the
3-operand operations; for advsimd the destination register
is also an input.
This also fixes a bug in which we failed to clear the high bits
of the SVE register after an AdvSIMD operation.
Backports commit a04b68e1d4c4f0cd5cd7542697b1b230b84532f5 from qemu
Using the MSR instruction to write to CPSR.E is deprecated, but it is
required to work from any mode including unprivileged code. We were
incorrectly forbidding usermode code from writing it because
CPSR_USER did not include the CPSR_E bit.
We use CPSR_USER in only three places:
* as the mask of what to allow userspace MSR to write to CPSR
* when deciding what bits a linux-user signal-return should be
able to write from the sigcontext structure
* in target_user_copy_regs() when we set up the initial
registers for the linux-user process
In the first two cases not being able to update CPSR.E is a bug, and
in the third case it doesn't matter because CPSR.E is always 0 there.
So we can fix both bugs by adding CPSR_E to CPSR_USER.
Because the cpsr_write() in restore_sigcontext() is now changing
a CPSR bit which is cached in hflags, we need to add an
arm_rebuild_hflags() call there; the callsite in
target_user_copy_regs() was already rebuilding hflags for other
reasons.
(The recommended way to change CPSR.E is to use the 'SETEND'
instruction, which we do correctly allow from usermode code.)
Backports commit 268b1b3dfbb92a9348406f728a33f39e3d8dcd8a from qemu
Do not explicitly store zero to the NEON high part
when we can pass !is_q to clear_vec_high.
Backports commit e1f778596ebfa8782276f4dd4651f2b285d734ff from qemu
The 8-byte store for the end a !is_q operation can be
merged with the other stores. Use a no-op vector move
to trigger the expand_clr portion of tcg_gen_gvec_mov.
Backports commit 5c27392dd08bd8534893abf25ef501f1bd8680fe from qemu
Give the previously unnamed enum a typedef name. Use it in the
prototypes of compare functions. Use it to hold the results
of the compare functions.
Backports commit 71bfd65c5fcd72f8af2735905415c7ce4220f6dc from qemu
Give the previously unnamed enum a typedef name. Use the packed
attribute so that we do not affect the layout of the float_status
struct. Use it in the prototypes of relevant functions.
Adjust switch statements as necessary to avoid compiler warnings.
Backports commit 3dede407cc61b64997f0c30f6dbf4df09949abc9 from qemu
The existing f{32,64}_addsub_post test, which checks for zero
inputs, is identical to f{32,64}_mul_fast_test. Which means
we can eliminate the fast_test/fast_op hooks in favor of
reusing the same post hook.
This means we have one fewer test along the fast path for multiply.
Backports commit b240c9c497b9880ac0ba29465907d5ebecd48083 from qemu
Convert the Neon floating point VFMA and VFMS insn to decodetree.
These are the last insns in the 3-reg-same group so we can
remove all the support/loop code from the old decoder.
Backports commit e95485f85657be21135c17a9226e297c21e73360 from qemu
Convert the Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS 3-reg-same
insns to decodetree. (These are all the remaining non-accumulation
instructions in this group.)
Backports commit d5fdf9e9e1c6f2bbb0a4bcaafd85d344cce9c298 from qemu
The usual location for the env argument in the argument list of a TCG helper
is immediately after the return-value argument. recps_f32 and rsqrts_f32
differ in that they put it at the end.
Move the env argument to its usual place; this will allow us to
more easily use these helper functions with the gvec APIs.
Backports commit 26c6f695cfd2a3ccddb4d015a25b56f56aa62928 from qemu
Convert the Neon integer 3-reg-same compare insns VCGE, VCGT,
VCEQ, VACGE and VACGT to decodetree.
Backports commit 727ff1d63213e6666e511956903b9e97a339ec7e from qemu
Convert the Neon integer VMUL, VMLA, and VMLS 3-reg-same inssn to
decodetree.
We don't have a gvec helper for multiply-accumulate, so VMLA and VMLS
need a loop function do_3same_fp(). This takes a reads_vd parameter
to do_3same_fp() which tells it to load the old value into vd before
calling the callback function, in the same way that the do_vfp_3op_sp()
and do_vfp_3op_dp() functions in translate-vfp.inc.c work. (The
only uses in this patch pass reads_vd == true, but later commits
will use reads_vd == false.)
This conversion fixes in passing an underdecoding for VMUL
Backports commit 8aa71ead912ca0a9c0d29b74e0976f91952f950a from qemu
Convert the Neon float VPMIN, VPMAX and VPADD 3-reg-same insns to
decodetree. These are the only remaining 'pairwise' operations,
so we can delete the pairwise-specific bits of the old decoder's
for-each-element loop now.
Backports commit ab978335a56e3618212868fdce3a54217c6e71e6 from qemu
Convert the Neon VADD, VSUB, VABD 3-reg-same insns to decodetree.
We already have gvec helpers for addition and subtraction, but must
add one for fabd.
Backports commit a26a352bb498662cd0c205cb433a352f86fac7d2 from qemu
Convert the Neon VQDMULH and VQRDMULH 3-reg-same insns to
decodetree. These are the last integer operations in the
3-reg-same group.
Backports commit 7ecc28bc72b8033cf4e0c6332135ec20d4125dfb from qemu
Convert the Neon integer VPADD 3-reg-same insns to decodetree. These
are 'pairwise' operations. (Note that VQRDMLAH, which shares the
same primary opcode but has U=1, has already been converted.)
Backports commit fa22827d4eb078b6c58cd3d19af0b50ed951e832 from qemu
Convert the Neon integer VPMAX and VPMIN 3-reg-same insns to
decodetree. These are 'pairwise' operations.
Backports commit 059c2398a2b1ae86c6722c45e79fb0d0f4d95b1d from qemu
Convert the VQSHL, VRSHL and VQRSHL insns in the 3-reg-same
group to decodetree. We have already implemented the size==0b11
case of these insns; this commit handles the remaining sizes
Backports commit 6812dfdc6b0286730d6f903ebfbdc4f81b80c29b from qemu
Convert the Neon VRHADD and VHSUB 3-reg-same insns to decodetree.
(These are all the other insns in 3-reg-same which were using
GEN_NEON_INTEGER_OP() and which are not pairwise or
reversed-operands.)
Backports commit 8e44d03f4b5590e19a4f7910ca1c327609933dd7 from qemu
Convert the 64-bit element insns in the 3-reg-same group
to decodetree. This covers VQSHL, VRSHL and VQRSHL where
size==0b11.
Backports commit 35d4352fa9e94b35bf17f58181cb16c184b98d56 from qemu
Convert the Neon VQRDMLAH and VQRDMLSH insns in the 3-reg-same group
to decodetree. These don't use do_3same() because they want to
operate on VFP double registers, whose offsets are different from the
neon_reg_offset() calculations do_3same does.
Backports commit a063569508af8295cf6271e06700e5b956bb402d from qemu
Pass a pointer directly to env->vfp.qc[0], rather than env.
This will allow SVE2, which does not modify QC, to pass a
pointer to dummy storage.
Change the return type of inl_qrdml.h_s16 to match the
sense of the operation: signed.
Backports commit e286bf4a72fe3a60490b8d6e3f28d6335677e08c from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Backports commit 146aa66ce58b686b8037d0eb3921c1125942dbde from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Backports commit c7715b6b51a6f7a5412c5fcb40a4c8586105e597 from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Backports commit 8161b75357095fef54c76b1a6ed1e54d0e8655e0 from qemu
Rather than perform the argument swap during code generation,
perform it during decode. This means it doesn't have to be
special cased later, and we can share code with aarch64 code
generation. Hopefully the decode comment addresses any confusion
that might arise in between.
Backports commit e9eee5316ffec5f37643de806b2e5577c5c189cf from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Backports commit 271063206a46062a45fc6bab8dabe45f0b88159d from qemu
Provide a functional interface for the vector expansion.
This fits better with the existing set of helpers that
we provide for other operations.
Macro-ize the 5 nearly identical comparisons.
Backports commit 69d5e2bf8c3cefedbfa1c1670137e636dbd7faa5 from qemu
Now that we've converted all cases to gvec, there is quite a bit
of dead code at the end of the function. Remove it.
Sink the call to gen_gvec_fn2i to the end, loading a function
pointer within the switch statement.
Backports commit 3f08f0bce841e7857ec98ce7909629d0c335005e from qemu
In 1dc8425e551, while converting to gvec, I added an extra range check
against the shift count. This was unnecessary because the encoding of
the shift count produces 0 to the element size - 1.
Backports commit 2f27c5244db300387f15d9ffa5067a204ffd625d from qemu
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Backports commit 893ab0542aa385a287cbe46d5535c8b9e95ce699 from qemu
Create vectorized versions of handle_shri_with_rndacc
for shift+round and shift+round+accumulate. Add out-of-line
helpers in preparation for longer vector lengths from SVE.
Backports commit 6ccd48d4ea244c1c46a24dfa50bfb547f11422dd from qemu
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Backports commit 631e565450c483e0622eec3d8b61d7fa41d16bca from qemu
DUP (indexed) can duplicate 128-bit elements, so using esz
unconditionally can assert in tcg_gen_gvec_dup_imm.
Fixes: 8711e71f9cbb
Backports commit 7e17d50ebd359ee5fa3d65d7fdc0fe0336d60694 from qemu
Now that we can pass 7 parameters, do not encode register
operands within simd_data.
Backports commit 08975da9f0bfcfa654628cae71201a351ba5449a from qemu
Move the common set_feature() and unset_feature() functions
from cpu.c and cpu64.c to cpu.h.
Backports commit 5fda95041d7237ab35733ceb66e0cb89f6107169 from qemu
Since on the aarch64-linux-user build, arm_cpus[] is empty, add
the cpu_count variable and only iterate when it is non-zero.
Backports commit 92b6a659388ab3735e5fbb17ac486923b681f57f from qemu
Calling access_el3_aa32ns() works for AArch32 only cores
but it does not handle 32-bit EL2 on top of 64-bit EL3
for mixed 32/64-bit cores.
Merge access_el3_aa32ns_aa64any() into access_el3_aa32ns()
and only use the latter.
Fixes: 68e9c2fe65 ("target-arm: Add VTCR_EL2")
Backports commit 93dd1e6140e2652347cfe7208591d4cd32762d08 from qemu
We're going to want at least some of the NeonGen* typedefs
for the refactored 32-bit Neon decoder, so move them all
to translate.h since it makes more sense to keep them in
one group.
Backports commit 9aefc6cf9b73f66062d2f914a0136756e7a28211 from qemu
Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
3-reg-same grouping to decodetree.
Backports commit 0de34fd48ad4e44bf5caa2330657ebefa93cea7d from qemu
Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
Note that for the logic ops the 'size' field forms part of their
decode and the actual operations are always bitwise.
Backports commit 35a548edb6f5043386183b9f6b4139d99d1f130a from qemu
Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.
Note that we don't need the neon_3r_sizes[op] check here because all
size values are OK for VADD and VSUB; we'll add this when we convert
the first insn that has size restrictions.
For this we need one of the GVecGen*Fn typedefs currently in
translate-a64.h; move them all to translate.h as a block so they
are visible to the 32-bit decoder.
Backports commit a4e143ac5b9185f670d2f17ee9cc1a430047cb65 from qemu
Convert the Neon "load/store single structure to one lane" insns to
decodetree.
As this is the last set of insns in the neon load/store group,
we can remove the whole disas_neon_ls_insn() function.
Backports commit 123ce4e3daba26b760b472687e1fb1ad82cf1993 from qemu
Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
to decodetree. These are the last ones in the group so we can remove
all the legacy decode for the group.
Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.
Backports commit d27e82f7d02f35e5919bd9cbbcb157f3537069a0 from qemu
Convert the VFM[AS]L (vector) insns to decodetree. This is the last
insn in the legacy decoder for the 3same_ext group, so we can
delete the legacy decoder function for the group entirely.
Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.
Backports commit 9a107e7b8a3c87ab63ec830d3d60f319fc577ff7 from qemu
Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 Neon encodings. At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.
We follow the same pattern we did for the VFP decodetree conversion
(commit 78e138bc1f672c145ef6ace74617d and following): code that deals
with Neon will be moving gradually out to translate-neon.vfp.inc,
which we #include into translate.c.
In order to share the decode files between A32 and T32, we
split Neon into 3 parts:
* data-processing
* load-store
* 'shared' encodings
The first two groups of instructions have similar but not identical
A32 and T32 encodings, so we need to manually transform the T32
encoding into the A32 one before calling the decoder; the third group
covers the Neon instructions which are identical in A32 and T32.
Backports commit 625e3dd44a15dfbe9532daa6454df3f86cf04d3e from qemu
We were accidentally permitting decode of Thumb Neon insns even if
the CPU didn't have the FEATURE_NEON bit set, because the feature
check was being done before the call to disas_neon_data_insn() and
disas_neon_ls_insn() in the Arm decoder but was omitted from the
Thumb decoder. Push the feature bit check down into the called
functions so it is done for both Arm and Thumb encodings.
Backports commit d1a6d3b594157425232a1ae5ea7f51b7a1c1aa2e from qemu
Somewhere along theline we accidentally added a duplicate
"using D16-D31 when they don't exist" check to do_vfm_dp()
(probably an artifact of a patchseries rebase). Remove it.
Backports commit 0d787cf1f3c88fa29477e054f8523f6d82d91c98 from qemu
MIDR_EL1 is a 64-bit system register with the top 32-bit being RES0.
Represent it in QEMU's ARMCPU struct with a uint64_t, not a
uint32_t.
This fixes an error when compiling with -Werror=conversion
because we were manipulating the register value using a
local uint64_t variable:
target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
628 | cpu->midr = t;
| ^
and future-proofs us against a possible future architecture
change using some of the top 32 bits.
Backports commit e544f80030121040c8932ff1bd4006f390266c0f from qemu
In aarch64_max_initfn() we update both 32-bit and 64-bit ID
registers. The intended pattern is that for 64-bit ID registers we
use FIELD_DP64 and the uint64_t 't' register, while 32-bit ID
registers use FIELD_DP32 and the uint32_t 'u' register. For
ID_AA64DFR0 we accidentally used 'u', meaning that the top 32 bits of
this 64-bit ID register would end up always zero. Luckily at the
moment that's what they should be anyway, so this bug has no visible
effects.
Use the right-sized variable.
Backports commit 5a89dd2385a193aa954a7c9bf4e381f2ba6ae359 from qemu
The ARMv8.2-TTS2UXN feature extends the XN field in stage 2
translation table descriptors from just bit [54] to bits [54:53],
allowing stage 2 to control execution permissions separately for EL0
and EL1. Implement the new semantics of the XN field and enable
the feature for our 'max' CPU.
Backports commit ce3125bed935a12e619a8253c19340ecaa899347 from qemu
For ARMv8.2-TTS2UXN, the stage 2 page table walk wants to know
whether the stage 1 access is for EL0 or not, because whether
exec permission is given can depend on whether this is an EL0
or EL1 access. Add a new argument to get_phys_addr_lpae() so
the call sites can pass this information in.
Since get_phys_addr_lpae() doesn't already have a doc comment,
add one so we have a place to put the documentation of the
semantics of the new s1_is_el0 argument.
Backports commit ff7de2fc2c994030bfb83af9ddc9a3cd70ce3e88 from qemu
The access_type argument to get_phys_addr_lpae() is an MMUAccessType;
use the enum constant MMU_DATA_LOAD rather than a literal 0 when we
call it in S1_ptw_translate().
Backports commit 59dff859cd850876df2cfa561c7bcfc4bdda4599 from qemu
We define ARMMMUIdx_Stage2 as being an MMU index which uses a QEMU
TLB. However we never actually use the TLB -- all stage 2 lookups
are done by direct calls to get_phys_addr_lpae() followed by a
physical address load via address_space_ld*().
Remove Stage2 from the list of ARM MMU indexes which correspond to
real core MMU indexes, and instead put it in the set of "NOTLB" ARM
MMU indexes.
This allows us to drop NB_MMU_MODES to 11. It also means we can
safely add support for the ARMv8.3-TTS2UXN extension, which adds
permission bits to the stage 2 descriptors which define execute
permission separatel for EL0 and EL1; supporting that while keeping
Stage2 in a QEMU TLB would require us to use separate TLBs for
"Stage2 for an EL0 access" and "Stage2 for an EL1 access", which is a
lot of extra complication given we aren't even using the QEMU TLB.
In the process of updating the comment on our MMU index use,
fix a couple of other minor errors:
* NS EL2 EL2&0 was missing from the list in the comment
* some text hadn't been updated from when we bumped NB_MMU_MODES
above 8
Backports commit bf05340cb655637451162c02dadcd6581a05c02c from qemu
According to Arm ARM, VQDMULL is only valid when U=0, while having
U=1 is unallocated.
Backports commit ab553ef74ee52c0889679d0bd0da084aaf938f5c from qemu
We will move this code in the next commit. Clean it up
first to avoid checkpatch.pl errors.
Backports commit 51c510aa5876a681cd0059ed3bacaa17590dc2d5 from qemu
Make cpu_register() (renamed to arm_cpu_register()) available
from internals.h so we can register CPUs also from other files
in the future.
Backports commit 37bcf244454f4efb82e2c0c64bbd7eabcc165a0c from qemu
Under KVM these registers are written by the hardware.
Restrict the writefn handlers to TCG to avoid when building
without TCG:
LINK aarch64-softmmu/qemu-system-aarch64
target/arm/helper.o: In function `do_ats_write':
target/arm/helper.c:3524: undefined reference to `raise_exception'
Backports commit 9fb005b02dbda7f47b789b7f19bf5f73622a4756 from qemu
These instructions are often used in glibc's string routines.
They were the final uses of the 32-bit at a time neon helpers.
Backports commit 6b375d3546b009d1e63e07397ec9c6af256e15e9 from qemu
In commit 41a4bf1feab098da4cd the added code to set the CNP
field in ID_MMFR4 for the AArch64 'max' CPU had a typo
where it used the wrong variable name, resulting in ID_MMFR4
fields AC2, XNX and LSM being wrong. Fix the typo.
Fixes: 41a4bf1feab098da4cd
Backports commit e73c4443473107ddf11ad3a7fea5bef2001ee802 from qemu
An old comment in get_phys_addr_lpae() claims that the code does not
support the different format TCR for VTCR_EL2. This used to be true
but it is not true now (in particular the aa64_va_parameters() and
aa32_va_parameters() functions correctly handle the different
register format by checking whether the mmu_idx is Stage2).
Remove the out of date parts of the comment.
Backports commit 07d1be3b3aac20c21ac4a95c7f3f01a3622a31a3 from qemu
Our implementation of the PSTATE.PAN bit incorrectly cleared all
access permission bits for privileged access to memory which is
user-accessible. It should only affect the privileged read and write
permissions; execute permission is dealt with via XN/PXN instead.
Fixes: 81636b70c226dc27d7ebc8d
Backports commit f4e1dbc578a051db08a40c05276ebf525b98f949 from qemu
The arm_current_el() should be invoked after mode switching. Otherwise, we
get a wrong current EL value, since current EL is also determined by
current mode.
Fixes: 4a2696c0d4 ("target/arm: Set PAN bit as required on exception entry")
Backports commit 88828bf133b64b7a860c166af3423ef1a47c5d3b from qemu
Coverity reports a BAD_SHIFT with ctz32(imm5), with imm5 == 0.
This is an invalid encoding, but we diagnose that just below
by rejecting size > 3. Avoid the warning by sinking the
computation of index below the check.
Backports commit 550a04893c2bd4442211b353680b9a6408d94dba from qemu
Coverity raised a shed-load of errors cascading from inferring
that clz32(immh) might yield 32, from immh might be 0.
While immh cannot be 0 from encoding, it is not obvious even to
a human how we've checked that: via the filtering provided by
data_proc_simd[].
Backports commit 3944d58db3fc5bf131345a21a44013bc13849a12 from qemu
Coverity rightly notes that ctz32(bas) on 0 will return 32,
which makes the len calculation a BAD_SHIFT.
A value of 0 in DBGWCR<n>_EL1.BAS is reserved. Simply move
the existing check we have for this case
Backports commit ae1111d4def40c6f592c3a307c599272b778eb65 from qemu
For system emulation we need to check the state of the GIC before we
report the value. However this isn't relevant to exporting of the
value to linux-user and indeed breaks the exported value as set by
modify_arm_cp_regs.
Backports commit 976b99b6ec2e15cd7c36d72fdb9b60c37c5494f8 from qemu
We must include the tag in the FAR_ELx register when raising
an addressing exception. Which means that we should not clear
out the tag during translation.
We cannot at present comply with this for user mode, so we
retain the clean_data_tbi function for the moment, though it
no longer does what it says on the tin for system mode. This
function is to be replaced with MTE, so don't worry about the
slight misnaming.
Buglink: https://bugs.launchpad.net/qemu/+bug/1867072
Backports commit 38d931687fa196a7ef860f8583815abc7fd5521a from qemu
This data access was forgotten when we added support for cleaning
addresses of TBI information.
Fixes: 3a471103ac1823ba
Backports commit 597d61a3b1f94c53a3aaa77671697c0c5f797dbf from qemu.
The function does not write registers, and only reads them by
implication via the exception path.
Backports commit 1371b02c5a060e423e70560dbca769b54e471ba9 from qemu
This is an aarch64-only function. Move it out of the shared file.
This patch is code movement only.
Backports commit 7b182eb2467af6c47c9c77c64bbbeed8ed53c330 from qemu
If by context we know that we're in AArch64 mode, we need not
test for M-profile when reconstructing the full ARMMMUIdx.
Backports commit 20dc67c947a691fa9df05e76aec6df50204b4b94 from qemu
Replicate the single TBI bit from TCR_EL2 and TCR_EL3 so that
we can unconditionally use pointer bit 55 to index into our
composite TBI1:TBI0 field.
Backports commit 3e270f67f0f05277021763af119a6ce195f8ed51 from qemu
This bit traps EL1 access to cache maintenance insns that operate
to the point of unification. There are no longer any references to
plain aa64_cacheop_access, so remove it.
Backports commit 38262d8a732f8bd0e9ca3dc064f6e73d00c08b9a from qemu
This bit traps EL1 access to cache maintenance insns that operate
to the point of coherency or persistence.
Backports commit 1bed4d2e55459129c19f5952bcfc65bd0c70db5b from qemu
Update the {TGE,E2H} == '11' masking to ARMv8.6.
If EL2 is configured for aarch32, disable all of
the bits that are RES0 in aarch32 mode.
Backports commit 4990e1d3c128580dd2fa0bbb1a42b6d63ba1ac28 from qemu
Don't merely start with v8.0, handle v7VE as well. Ensure that writes
from aarch32 mode do not change bits in the other half of the register.
Protect reads of aa64 id registers with ARM_FEATURE_AARCH64.
Backports commit d1fb4da208411ce7b3dafb9f9e7726ebcec14edb from qemu
The ARMv8.2-TTCNP extension allows an implementation to optimize by
sharing TLB entries between multiple cores, provided that software
declares that it's ready to deal with this by setting a CnP bit in
the TTBRn_ELx. It is mandatory from ARMv8.2 onward.
For QEMU's TLB implementation, sharing TLB entries between different
cores would not really benefit us and would be a lot of work to
implement. So we implement this extension in the "trivial" manner:
we allow the guest to set and read back the CnP bit, but don't change
our behaviour (this is an architecturally valid implementation
choice).
The only code path which looks at the TTBRn_ELx values for the
long-descriptor format where the CnP bit is defined is already doing
enough masking to not get confused when the CnP bit at the bottom of
the register is set, so we can simply add a comment noting why we're
relying on that mask.
Backports commit 41a4bf1feab098da4cd5495cd56a99b0339e2275 from qemu
The ARMv8.3-CCIDX extension makes the CCSIDR_EL1 system ID registers
have a format that uses the full 64 bit width of the register, and
adds a new CCSIDR2 register so AArch32 can get at the high 32 bits.
QEMU doesn't implement caches, so we just treat these ID registers as
opaque values that are set to the correct constant values for each
CPU. The only thing we need to do is allow 64-bit values in our
cssidr[] array and provide the CCSIDR2 accessors.
We don't set the CCIDX field in our 'max' CPU because the CCSIDR
constant values we use are the same as the ones used by the
Cortex-A57 and they are in the old 32-bit format. This means
that the extra regdef added here is unused currently, but it
means that whenever in the future we add a CPU that does need
the new 64-bit format it will just work when we set the cssidr
values and the ID registers for it.
Backports commit 957e615503bd0de22393fd8dbcb22a5064fd2b5c from qemu
The v8.4-RCPC extension implements some new instructions:
* LDAPUR, LDAPURB, LDAPURH, LDAPRSB, LDAPRSH, LDAPRSW
* STLUR, STLURB, STLURH
These are all in a new subgroup of encodings that sits below the
top-level "Loads and Stores" group in the Arm ARM.
The STLUR* instructions have standard store-release semantics; the
LDAPUR* have Load-AcquirePC semantics, but (as with LDAPR*) we choose
to implement them as the slightly stronger Load-Acquire.
Backports commit a1229109dec4375259d3fff99f362405aab7917a from qemu
The v8.3-RCPC extension implements three new load instructions
which provide slightly weaker consistency guarantees than the
existing load-acquire operations. For QEMU we choose to simply
implement them with a full LDAQ barrier.
Backports commit 2677cf9f92a5319bb995927f9225940414ce879d from qemu
We missed an instance of using FIELD_EX32 on a 64-bit ID
register, in isar_feature_aa64_pmu_8_4(). Fix it.
Backports commit 54117b90ffd8a3977917971c3bd99bb5242710d9 from qemu.
Passing the raw op field from the manual is less instructive
than it might be. Do the full decode and use the existing
helpers to perform the expansion.
Since these are v8 insns, VECLEN+VECSTRIDE are already RES0.
Backports commit f2eafb75511e5d2ee601b43dc6ee0bcc6e453acd from qemu
Passing the raw o1 and o2 fields from the manual is less
instructive than it might be. Do the full decode and let
the trans_* functions pass in booleans to a helper.
Backports commit d486f8308a13543bbcc4887f246e856df991a4bc from qemu
Those vfp instructions without extra opcode fields can
share a common @format for brevity.
Backports commit 906b60facc3d3dd3af56cb1a7860175d805e10a3 from qemu
Have the calls adjacent as an intermediate step toward
actually merging the decodes.
Backports commit f0f6d5c81be47d593e5ece7f06df6fba4c15738b from qemu
Now that we no longer have an early check for ARM_FEATURE_VFP,
we can use the proper ISA check in trans_VLLDM_VLSTM.
Backports commit dc778a6873f534817a13257be2acba3ca87ec015 from qemu
All remaining tests for VFP4 are for fused multiply-add insns.
Since the MVFR1 field is used for both VFP and NEON, move its adjustment
from the !has_neon block to the (!has_vfp && !has_neon) block.
Test for vfp of the appropraite width alongside the test for simdfmac
within translate-vfp.inc.c. Within disas_neon_data_insn, we have
already tested for ARM_FEATURE_NEON.
Backports commit c52881bbc22b50db99a6c37171ad3eea7d959ae6 from qemu
We will eventually remove the early ARM_FEATURE_VFP test,
so add a proper test for each trans_* that does not already
have another ISA test.
Backports commit 82f6abe16b9b951180657c5fe15942d5214aa12e from qemu
Sort this check to the start of a trans_* function.
Merge this with any existing test for fpdp_v2.
Backports commit 84774cc37f2c17e48a4867a8e8e055deb23bea69 from qemu
Shuffle the order of the checks so that we test the ISA
before we test anything else, such as the register arguments.
Backports commit 799449abda137153a0e68b8788d8e1486f389490 from qemu
We cannot easily create "any" functions for these, because the
ID_AA64PFR0 fields for FP and SIMD signal "enabled" with zero.
Which means that an aarch32-only cpu will return incorrect results
when testing the aarch64 registers.
To use these, we must either have context or additionally test
vs ARM_FEATURE_AARCH64.
Backports commit 7d63183ff1a61b3f7934dc9b40b10e4fd5e100cd from qemu
The old name, isar_feature_aa32_fpdp, does not reflect
that the test includes VFPv2. We will introduce another
feature tests for VFPv3.
Backports commit c4ff873583834c8275586914fff714e3ae65dee4 from qemu
Use this in the places that were checking ARM_FEATURE_VFP, and
are obviously testing for the existance of the register set
as opposed to testing for some particular instruction extension.
Backports commit 7fbc6a403a0aab834e764fa61d81ed8586cfe352 from qemu
We had set this for aarch32-only in arm_max_initfn, but
failed to set the same bit for aarch64.
Backports commit dac65ba1d7945c5d58ab63d8769103634adb2b01 from qemu
We are going to convert FEATURE tests to ISAR tests,
so FPSP needs to be set for these cpus, like we have
already for FPDP.
Backports commit 9eb4f58918a851fb46895fd9b7ce579afeac9d02 from qemu
Many uses of ARM_FEATURE_VFP3 are testing for the number of simd
registers implemented. Use the proper test vs MVFR0.SIMDReg.
Backports commit a6627f5fc607939f7c8b9c3157fdcb2d368ba0ed from qemu
The old name, isar_feature_aa32_fp_d32, does not reflect
the MVFR0 field name, SIMDReg.
Backports commit 0e13ba7889432c5e2f1bdb1b25e7076ca1b1dcba from qemu
We still need two different helpers, since NEON and SVE2 get the
inputs from different locations within the source vector. However,
we can convert both to the same internal form for computation.
The sve2 helper is not used yet, but adding it with this patch
helps illustrate why the neon changes are helpful.
Backports commit e7e96fc5ec8c79dc77fef522d5226ac09f684ba5 from qemu
The gvec form will be needed for implementing SVE2.
Extend the implementation to operate on uint64_t instead of uint32_t.
Use a counted inner loop instead of terminating when op1 goes to zero,
looking toward the required implementation for ARMv8.4-DIT.
Backports commit a21bb78e5817be3f494922e1dadd6455fe5d6318 from qemu
These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift. This
requires several masks and selects in addition to the actual
shifts to form the complete answer.
That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.
Backports commit 87b74e8b6edd287ea2160caa0ebea725fa8f1ca1 from qemu
The ACTLR2 and HACTLR2 AArch32 system registers didn't exist in ARMv7
or the original ARMv8. They were later added as optional registers,
whose presence is signaled by the ID_MMFR4.AC2 field. From ARMv8.2
they are mandatory (ie ID_MMFR4.AC2 must be non-zero).
We implemented HACTLR2 in commit 0e0456ab8895a5e85, but we
incorrectly made it exist for all v8 CPUs, and we didn't implement
ACTLR2 at all.
Sort this out by implementing both registers only when they are
supposed to exist, and setting the ID_MMFR4 bit for -cpu max.
Note that this removes HACTLR2 from our Cortex-A53, -A47 and -A72
CPU models; this is correct, because those CPUs do not implement
this register.
Fixes: 0e0456ab8895a5e85
Backports commit f6287c24c66d6b9187c1c2887e1c7cfa4d304b0c from qemu
Cut-and-paste errors mean we're using FIELD_EX64() to extract fields from
some 32-bit ID register fields. Use FIELD_EX32() instead. (This makes
no difference in behaviour, it's just more consistent.)
Backports commit b3a816f6ce1ec184ab6072f50bbe4479fc5116c3 from qemu
Now we have moved ID_MMFR4 into the ARMISARegisters struct, we
can define and use an isar_feature for the presence of the
ARMv8.2-AA32HPD feature, rather than open-coding the test.
While we're here, correct a comment typo which missed an 'A'
from the feature name.
Backports commit 4036b7d1cd9fb1097a5f4bc24d7d31744256260f from qemu
The isar_feature_aa32_pan and isar_feature_aa32_ats1e1 functions
are supposed to be testing fields in ID_MMFR3; but a cut-and-paste
error meant we were looking at MVFR0 instead.
Fix the functions to look at the right register; this requires
us to move at least id_mmfr3 to the ARMISARegisters struct; we
choose to move all the ID_MMFRn registers for consistency.
Backports commit 10054016eda1b13bdd8340d100fd029cc8b58f36 from qemu
The LC bit in the PMCR_EL0 register is supposed to be:
* read/write
* RES1 on an AArch64-only implementation
* an architecturally UNKNOWN value on reset
(and use of LC==0 by software is deprecated).
We were implementing it incorrectly as read-only always zero,
though we do have all the code needed to test it and behave
accordingly.
Instead make it a read-write bit which resets to 1 always, which
satisfies all the architectural requirements above.
Backports commit 62d96ff48510f4bf648ad12f5d3a5507227b026f from qemu
The PMCR_EL0.DP bit is bit 5, which is 0x20, not 0x10. 0x10 is 'X'.
Correct our #define of PMCRDP and add the missing PMCRX.
We do have the correct behaviour for handling the DP bit being
set, so this fixes a guest-visible bug.
Fixes: 033614c47de
Backports commit a1ed04dd79aabb9dbeeb5fa7d49f1a3de0357553 from qemu
Set the ID register bits to provide ARMv8.4-PMU (and implicitly
also ARMv8.1-PMU) in the 'max' CPU.
Backports commit 3bec78447a958d4819911252e056f29740ac25e4 from qemu
The ARMv8.4-PMU extension adds:
* one new required event, STALL
* one new system register PMMIR_EL1
(There are also some more L1-cache related events, but since
we don't implement any cache we don't provide these, in the
same way we don't provide the base-PMUv3 cache events.)
The STALL event "counts every attributable cycle on which no
attributable instruction or operation was sent for execution on this
PE". QEMU doesn't stall in this sense, so this is another
always-reads-zero event.
The PMMIR_EL1 register is a read-only register providing
implementation-specific information about the PMU; currently it has
only one field, SLOTS, which defines behaviour of the STALL_SLOT PMU
event. Since QEMU doesn't implement the STALL_SLOT event, we can
validly make the register read zero.
Backports commit 15dd1ebda4a6ef928d484c5a4f48b8ccb7438bb2 from qemu
The ARMv8.1-PMU extension requires:
* the evtCount field in PMETYPER<n>_EL0 is 16 bits, not 10
* MDCR_EL2.HPMD allows event counting to be disabled at EL2
* two new required events, STALL_FRONTEND and STALL_BACKEND
* ID register bits in ID_AA64DFR0_EL1 and ID_DFR0
We already implement the 16-bit evtCount field and the
HPMD bit, so all that is missing is the two new events:
STALL_FRONTEND
"counts every cycle counted by the CPU_CYCLES event on which no
operation was issued because there are no operations available
to issue to this PE from the frontend"
STALL_BACKEND
"counts every cycle counted by the CPU_CYCLES event on which no
operation was issued because the backend is unable to accept
any available operations from the frontend"
QEMU never stalls in this sense, so our implementation is trivial:
always return a zero count.
Backports commit 0727f63b1ecf765ebc48266f616f8fc362dc7fbc from qemu
We're going to want to read the DBGDIDR register from KVM in
a subsequent commit, which means it needs to be in the
ARMISARegisters sub-struct. Move it.
Backports commit 4426d3617d64922d97b74ed22e67e33b6fb7de0a from qemu
The AArch32 DBGDIDR defines properties like the number of
breakpoints, watchpoints and context-matching comparators. On an
AArch64 CPU, the register may not even exist if AArch32 is not
supported at EL1.
Currently we hard-code use of DBGDIDR to identify the number of
breakpoints etc; this works for all our TCG CPUs, but will break if
we ever add an AArch64-only CPU. We also have an assert() that the
AArch32 and AArch64 registers match, which currently works only by
luck for KVM because we don't populate either of these ID registers
from the KVM vCPU and so they are both zero.
Clean this up so we have functions for finding the number
of breakpoints, watchpoints and context comparators which look
in the appropriate ID register.
This allows us to drop the "check that AArch64 and AArch32 agree
on the number of breakpoints etc" asserts:
* we no longer look at the AArch32 versions unless that's the
right place to be looking
* it's valid to have a CPU (eg AArch64-only) where they don't match
* we shouldn't have been asserting the validity of ID registers
in a codepath used with KVM anyway
Backports commit 88ce6c6ee85d902f59dc65afc3ca86b34f02b9ed from qemu
Add the 64-bit version of the "is this a v8.1 PMUv3?"
ID register check function, and the _any_ version that
checks for either AArch32 or AArch64 support. We'll use
this in a later commit.
We don't (yet) do any isar_feature checks on ID_AA64DFR1_EL1,
but we move id_aa64dfr1 into the ARMISARegisters struct with
id_aa64dfr0, for consistency.
Backports commit 2a609df87d9b886fd38a190a754dbc241ff707e8 from qemu
Instead of open-coding a check on the ID_DFR0 PerfMon ID register
field, create a standardly-named isar_feature for "does AArch32 have
a v8.1 PMUv3" and use it.
This entails moving the id_dfr0 field into the ARMISARegisters struct.
Backports commit a617953855b65a602d36364b9643f7e5bc31288e from qemu
We already define FIELD macros for ID_DFR0, so use them in the
one place where we're doing direct bit value manipulation.
Backports commit d52c061e541982a3663ad5c65bd3b518dbe85b87 from qemu
Add FIELD() definitions for the ID_AA64DFR0_EL1 and use them
where we currently have hard-coded bit values.
Backports commit ceb2744b47a1ef4184dca56a158eb3156b6eba36 from qemu
Pull the code that defines the various PMU registers out
into its own function, matching the pattern we have
already for the debug registers.
Apart from one style fix to a multi-line comment, this
is purely movement of code with no changes to it.
Backports commit 24183fb6f00ecca8b508e245c95ff50ddde3f18b from qemu
Instead of open-coding "ARM_FEATURE_AARCH64 ? aa64_predinv: aa32_predinv",
define and use an any_predinv isar_feature test function.
Backports commit 22e570730d15374453baa73ff2a699e01ef4e950 from qemu
Our current usage of the isar_feature feature tests almost always
uses an _aa32_ test when the code path is known to be AArch32
specific and an _aa64_ test when the code path is known to be
AArch64 specific. There is just one exception: in the vfp_set_fpscr
helper we check aa64_fp16 to determine whether the FZ16 bit in
the FP(S)CR exists, but this code is also used for AArch32.
There are other places in future where we're likely to want
a general "does this feature exist for either AArch32 or
AArch64" check (typically where architecturally the feature exists
for both CPU states if it exists at all, but the CPU might be
AArch32-only or AArch64-only, and so only have one set of ID
registers).
Introduce a new category of isar_feature_* functions:
isar_feature_any_foo() should be tested when what we want to
know is "does this feature exist for either AArch32 or AArch64",
and always returns the logical OR of isar_feature_aa32_foo()
and isar_feature_aa64_foo().
Backports commit 6e61f8391cc6cb0846d4bf078dbd935c2aeebff5 from qemu
In take_aarch32_exception(), we know we are dealing with a CPU that
has AArch32, so the right isar_feature test is aa32_pan, not aa64_pan.
Backports commit f8af1143ef93954e77cf59e09b5e004dafbd64fd from qemu
Enforce a convention that an isar_feature function that tests a
32-bit ID register always has _aa32_ in its name, and one that
tests a 64-bit ID register always has _aa64_ in its name.
We already follow this except for three cases: thumb_div,
arm_div and jazelle, which all need _aa32_ adding.
(As noted in the comment, isar_feature_aa32_fp16_arith()
is an exception in that it currently tests ID_AA64PFR0_EL1,
but will switch to MVFR1 once we've properly implemented
FP16 for AArch32.)
Backports commit 873b73c0c891ec20adacc7bd1ae789294334d675 from qemu
For the purpose of rebuild_hflags_a64, we do not need to compute
all of the va parameters, only tbi. Moreover, we can compute them
in a form that is more useful to storing in hflags.
This eliminates the need for aa64_va_parameter_both, so fold that
in to aa64_va_parameter. The remaining calls to aa64_va_parameter
are in get_phys_addr_lpae and in pauth_helper.c.
This reduces the total cpu consumption of aa64_va_parameter in a
kernel boot plus a kvm guest kernel boot from 3% to 0.5%.
Backports commit b830a5ee82e66f54697dcc6450fe9239b7412d13 from qemu
Now that aa64_va_parameters_both sets select based on the number
of ranges in the regime, the ttbr1_valid check is redundant.
Backports commit 03f27724dff15633911e68a3906c30f57938ea45 from qemu
The psuedocode in aarch64/functions/pac/auth/Auth and
aarch64/functions/pac/strip/Strip always uses bit 55 for
extfield and do not consider if the current regime has 2 ranges.
Backports commit 7eeb4c2ce8dc0a5655526f3f39bd5d6cc02efb39 from qemu
The ARMv8.1-VMID16 extension extends the VMID from 8 bits to 16 bits:
* the ID_AA64MMFR1_EL1.VMIDBits field specifies whether the VMID is
8 or 16 bits
* the VMID field in VTTBR_EL2 is extended to 16 bits
* VTCR_EL2.VS lets the guest specify whether to use the full 16 bits,
or use the backwards-compatible 8 bits
For QEMU implementing this is trivial:
* we do not track VMIDs in TLB entries, so we never use the VMID field
* we treat any write to VTTBR_EL2, not just a change to the VMID field
bits, as a "possible VMID change" that causes us to throw away TLB
entries, so that code doesn't need changing
* we allow the guest to read/write the VTCR_EL2.VS bit already
So all that's missing is the ID register part: report that we support
VMID16 in our 'max' CPU.
Backports commit dc7a88d0810ad272bdcd2e0869359af78fdd9114 from qemu
Add definitions for all of the fields, up to ARMv8.5.
Convert the existing RESERVED register to a full register.
Query KVM for the value of the register for the host.
Backports commit 64761e10af2742a916c08271828890274137b9e8 from qemu
This is a minor enhancement over ARMv8.1-PAN.
The *_PAN mmu_idx are used with the existing do_ats_write.
Backports commit 04b07d29722192926f467ea5fedf2c3b0996a2a5 from qemu
The PAN bit is preserved, or set as per SCTLR_ELx.SPAN,
plus several other conditions listed in the ARM ARM.
Backports commit 4a2696c0d4d80e14a192b28148c6167bc5056f94 from qemu
For aarch64, there's a dedicated msr (imm, reg) insn.
For aarch32, this is done via msr to cpsr. Writes from el0
are ignored, which is already handled by the CPSR_USER mask.
Backports commit 220f508f49c5f49fb771d5105f991c19ffede3f7 from qemu
The only remaining use was in op_helper.c. Use PSTATE_SS
directly, and move the commentary so that it is more obvious
what is going on.
Backports commit 70dae0d069c45250bbefd9424089383a8ac239de from qemu
Using ~0 as the mask on the aarch64->aarch32 exception return
was not even as correct as the CPSR_ERET_MASK that we had used
on the aarch32->aarch32 exception return.
Backports commit d203cabd1bd12f31c9df0b5737421ba67b96857b from qemu
CPSR_ERET_MASK was a useless renaming of CPSR_RESERVED.
The function also takes into account bits that the cpu
does not support.
Backports commit 437864216d63f052f3cd06ec8861d0e432496424 from qemu
The J bit signals Jazelle mode, and so of course is RES0
when the feature is not enabled.
Backports commit f062d1447f2a80e7a5f593b8cb5ac7cab5e16eb0 from qemu
Split this helper out of msr_mask in translate.c. At the same time,
transform the negative reductive logic to positive accumulative logic.
It will be usable along the exception paths.
While touching msr_mask, fix up formatting.
Backports commit 4f9584ed4bba8a57a3cb2fa48a682725005d530a from qemu
Include definitions for all of the bits in ID_MMFR3.
We already have a definition for ID_AA64MMFR1.PAN.
Backports commit 3d6ad6bb466f487bcc861f99e2c9054230df1076 from qemu
To implement PAN, we will want to swap, for short periods
of time, to a different privileged mmu_idx. In addition,
we cannot do this with flushing alone, because the AT*
instructions have both PAN and PAN-less versions.
Add the ARMMMUIdx*_PAN constants where necessary next to
the corresponding ARMMMUIdx* constant.
Backports commit 452ef8cb8c7b06f44a30a3c3a54d3be82c4aef59 from qemu
The fall through organization of this function meant that we
would raise an interrupt, then might overwrite that with another.
Since interrupt prioritization is IMPLEMENTATION DEFINED, we
can recognize these in any order we choose.
Unify the code to raise the interrupt in a block at the end.
Backports commit d63d0ec59d87a698de5ed843288f90a23470cf2e from qemu
Avoid redundant computation of cpu state by passing it in
from the caller, which has already computed it for itself.
Backports commit be87955687446be152f366af543c9234eab78a7c from qemu
This inline function has one user in cpu.c, and need not be exposed
otherwise. Code movement only, with fixups for checkpatch.
Backports commit 310cedf39dea240a89f90729fd99481ff6158e90 from qemu
When VHE is enabled, the exception level below EL2 is not EL1,
but EL0, and so to identify the entry vector offset for exceptions
targeting EL2 we need to look at the width of EL0, not of EL1.
Backports commit cb092fbbaeb7b4e91b3f9c53150c8160f91577c7 from qemu
The EL2&0 translation regime is affected by Load Register (unpriv).
The code structure used here will facilitate later changes in this
area for implementing UAO and NV.
Backports commit cc28fc30e333dc2f20ebfde54444697e26cd8f6d from qemu
Since we only support a single ASID, flush the tlb when it changes.
Note that TCR_EL2, like TCR_EL1, has the A1 bit that chooses between
the two TTBR* registers for the location of the ASID.
Backports commit d06dc93340825030b6297c61199a17c0067b0377 from qemu
Apart from the wholesale redirection that HCR_EL2.E2H performs
for EL2, there's a separate redirection specific to the timers
that happens for EL0 when running in the EL2&0 regime.
Backports commit bb5972e439dc0ac4d21329a9d97bad6760ec702d from qemu
Several of the EL1/0 registers are redirected to the EL2 version when in
EL2 and HCR_EL2.E2H is set. Many of these registers have side effects.
Link together the two ARMCPRegInfo structures after they have been
properly instantiated. Install common dispatch routines to all of the
relevant registers.
The same set of registers that are redirected also have additional
EL12/EL02 aliases created to access the original register that was
redirected.
Omit the generic timer registers from redirection here, because we'll
need multiple kinds of redirection from both EL0 and EL2.
Backports commit e2cce18f5c1d0d55328c585c8372cdb096bbf528 from qemu
The comment that we don't support EL2 is somewhat out of date.
Update to include checks against HCR_EL2.TDZ.
Backports commit 4351cb72fb65926136ab618c9e40c1f5a8813251 from qemu
Use the correct sctlr for EL2&0 regime. Due to header ordering,
and where arm_mmu_idx_el is declared, we need to move the function
out of line. Use the function in many more places in order to
select the correct control.
Backports commit aaec143212bb70ac9549cf73203d13100bd5c7c2 from qemu
Return the indexes for the EL2&0 regime when the appropriate bits
are set within HCR_EL2.
Backports commit 6003d9800ee38aa11eefb5cd64ae55abb64bef16 from qemu
Create a predicate to indicate whether the regime has
both positive and negative addresses.
Backports commit 339370b90d067345b69585ddf4b668fa01f41d67 from qemu
Prepare for, but do not yet implement, the EL2&0 regime.
This involves adding the new MMUIdx enumerators and adjusting
some of the MMUIdx related predicates to match.
Backports commit b9f6033c1a5fb7da55ed353794db8ec064f78bb2 from qemu.
Replace the magic numbers with the relevant ARM_MMU_IDX_M_* constants.
Keep the definitions short by referencing previous symbols.
Backports commit 25568316b2a7e73d68701042ba6ebdb217205e20 from qemu
Define via macro expansion, so that renumbering of the base ARMMMUIdx
symbols is automatically reflected in the bit definitions.
Backports commit 5f09a6dfbfbff4662f52cc3130a2e07044816497 from qemu
We are about to expand the number of mmuidx to 10, and so need 4 bits.
For the benefit of reading the number out of -d exec, align it to the
penultimate nibble.
Backports commit 506f149815c2168f16ade17893e117419d93f248 from qemu
We had completely run out of TBFLAG bits.
Split A- and M-profile bits into two overlapping buckets.
This results in 4 free bits.
We used to initialize all of the a32 and m32 fields in DisasContext
by assignment, in arm_tr_init_disas_context. Now we only initialize
either the a32 or m32 by assignment, because the bits overlap in
tbflags. So zero the entire structure in gen_intermediate_code.
Backports commit 79cabf1f473ca6e9fa0727f64ed9c2a84a36f0aa from qemu
This is part of a reorganization to the set of mmu_idx.
The non-secure EL2 regime only has a single stage translation;
there is no point in pointing out that the idx is for stage1.
Backports commit e013b7411339342aac8d986c5d5e329e1baee8e1 from qemu
This is part of a reorganization to the set of mmu_idx.
The EL3 regime only has a single stage translation, and
is always secure.
Backports commit 127b2b086303296289099a6fb10bbc51077f1d53 from qemu
This is part of a reorganization to the set of mmu_idx.
This emphasizes that they apply to the Secure EL1&0 regime.
Backports commit fba37aedecb82506c62a1f9e81d066b4fd04e443 from qemu
This is part of a reorganization to the set of mmu_idx.
The EL1&0 regime is the only one that uses 2-stage translation.
Spelling out Stage avoids confusion with Secure.
Backports commit 2859d7b590760283a7b5aef40b723e9dfd7c98ba from qemu
This is part of a reorganization to the set of mmu_idx.
This emphasizes that they apply to the EL1&0 regime.
The ultimate goal is
-- Non-secure regimes:
ARMMMUIdx_E10_0,
ARMMMUIdx_E20_0,
ARMMMUIdx_E10_1,
ARMMMUIdx_E2,
ARMMMUIdx_E20_2,
-- Secure regimes:
ARMMMUIdx_SE10_0,
ARMMMUIdx_SE10_1,
ARMMMUIdx_SE3,
-- Helper mmu_idx for non-secure EL1&0 stage1 and stage2
ARMMMUIdx_Stage2,
ARMMMUIdx_Stage1_E0,
ARMMMUIdx_Stage1_E1,
The 'S' prefix is reserved for "Secure". Unless otherwise specified,
each mmu_idx represents all stages of translation.
Backports commit 01b98b686460b3a0fb47125882e4f8d4268ac1b6 from qemu
At the same time, add writefn to TTBR0_EL2 and TCR_EL2.
A later patch will update any ASID therein.
Backports commit ed30da8eee6906032b38a84e4807e2142b09d8ec from qemu
Not all of the breakpoint types are supported, but those that
only examine contextidr are extended to support the new register.
Backports commit e2a1a4616c86159eb4c07659a02fff8bb25d3729 from qemu
When support for the AHP flag was added we inexplicably only freed the
new temps in one of the two legs. Move those tcg_temp_free to the same
level as the allocation to fix that leak.
Backports commit aeab8e5eb220cc5ff84b0b68b9afccc611bf0fcd from qemu
In the PAC computation, sbox was applied over wrong bits.
As this is a 4-bit sbox, bit index should be incremented by 4 instead of 16.
Test vector from QARMA paper (https://eprint.iacr.org/2016/444.pdf) was
used to verify one computation of the pauth_computepac() function which
uses sbox2.
Launchpad: https://bugs.launchpad.net/bugs/1859713
Backports commit de0b1bae6461f67243282555475f88b2384a1eb9 from qemu
The PMU is not optional on cortex-r5 and cortex-r5f (see
the "Features" chapter of the Technical Reference Manual).
Backports commit 90f671581ac601fcc1b840d9e9abe7e3c3e672db from qemu
During the conversion to decodetree, the setting of
ISSIs16Bit got lost. This causes the guest os to
incorrectly adjust trapping memory operations.
Backports commit 1a1fbc6cbb34c26d43d8360c66c1d21681af14a9 from qemu
The IL bit is set for 32-bit instructions, thus passing false
with the is_16bit parameter to syn_data_abort_with_iss() makes
a syn mask that always has the IL bit set.
Pass is_16bit as true to make the initial syn mask have IL=0,
so that the final IL value comes from or'ing template_syn.
Cc: qemu-stable@nongnu.org
Fixes: aaa1f954d4ca ("target-arm: A64: Create Instruction Syndromes for Data Aborts")
Backports commit 30d544839e278dc76017b9a42990c41e84a34377 from qemu
The wfi instruction can be configured to be trapped by a higher exception
level, such as the EL2 hypervisor. When the instruction is trapped, the
program counter should contain the address of the wfi instruction that
caused the exception. The program counter is adjusted for this in the wfi op
helper function.
However, this correction is done to env->pc, which only applies to AArch64
mode. For AArch32, the program counter is stored in env->regs[15]. This
adds an if-else statement to modify the correct program counter location
based on the the current CPU mode.
Backports commit 855532912b0e1bf803ae393e5b0c7e80948cd6a4 from qemu
The SPSR register is named within the Unicorn headers, but the code
to access it is absent. This means that it will always read as 0 and
ignore writes. This makes it harder to work with changes in processor
mode, as the usual way to return from a CPU exception is a
`MOVS pc, lr` for undefined instructions or `SUBS pc, lr, #4`
for most other aborts - which implicitly restores the CPSR from SPSR.
This change adds the access to the SPSR so that it can be read and
written as the caller might expect.
Backports commit 99097cab4c39fb3fc50eea8f0006954f62a149b2 from unicorn.
Fixes:
target/arm/translate-a64.c: In function 'disas_crypto_three_reg_sha512':
target/arm/translate-a64.c:13625:9: error: 'genfn' may be used uninitialized in this function [-Werror=maybe-uninitialized]
genfn(tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
qemu/target/arm/translate-a64.c:13609:8: error: 'feature' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (!feature) {
Backports commit c7a5e7910517e2711215a9e869a733ffde696091 from qemu
Before we introduce blocking semihosting calls we need to ensure we
can restart the system on semi hosting exception. To be able to do
this the EXCP_SEMIHOST operation should be idempotent until it finally
completes. Practically this means ensureing we only update the pc
after the semihosting call has completed.
Backports commit 4ff5ef9e911c670ca10cdd36dd27c5395ec2c753 from qemu
All semihosting exceptions are dealt with earlier in the common code
so we should never get here.
Backports commit b906acbb3aceed5b1eca30d9d365d5bd7431400b from qemu
A write to the SCR can change the effective EL by droppping the system
from secure to non-secure mode. However if we use a cached current_el
from before the change we'll rebuild the flags incorrectly. To fix
this we introduce the ARM_CP_NEWEL CP flag to indicate the new EL
should be used when recomputing the flags.
Backports partof commit f80741d107673f162e3b097fc76a1590036cc9d1 from
qemu
ARMv8.2 introduced support for Data Cache Clean instructions
to PoP (point-of-persistence) - DC CVAP and PoDP (point-of-deep-persistence)
- DV CVADP. Both specify conceptual points in a memory system where all writes
that are to reach them are considered persistent.
The support provided considers both to be actually the same so there is no
distinction between the two. If none is available (there is no backing store
for given memory) both will result in Data Cache Clean up to the point of
coherency. Otherwise sync for the specified range shall be performed.
Backports commit 0d57b49992200a926c4436eead97ecfc8cc710be from qemu
This change ensures that the FPU can be accessed in Non-Secure mode
when the CPU core is reset using the arm_set_cpu_on() function call.
The NSACR.{CP11,CP10} bits define the exception level required to
access the FPU in Non-Secure mode. Without these bits set, the CPU
will give an undefined exception trap on the first FPU access for the
secondary cores under Linux.
This is necessary because in this power-control codepath QEMU
is effectively emulating a bit of EL3 firmware, and has to set
the CPU up as the EL3 firmware would.
Fixes: fc1120a7f5
Backports commit 0c7f8c43daf6556078e51de98aa13f069e505985 from qemu
QEMU lacks the minimum Jazelle implementation that is required
by the architecture (everything is RAZ or RAZ/WI). Add it
together with the HCR_EL2.TID0 trapping that goes with it.
Backports commit f96f3d5f09973ef40f164cf2d5fd98ce5498b82a from qemu
HSTR_EL2 offers a way to trap ranges of CP15 system register
accesses to EL2, and it looks like this register is completely
ignored by QEMU.
To avoid adding extra .accessfn filters all over the place (which
would have a direct performance impact), let's add a new TB flag
that gets set whenever HSTR_EL2 is non-zero and that QEMU translates
a context where this trap has a chance to apply, and only generate
the extra access check if the hypervisor is actively using this feature.
Tested with a hand-crafted KVM guest accessing CBAR.
Backports commit 5bb0a20b74ad17dee5dae38e3b8b70b383ee7c2d from qemu
HCR_EL2.TID3 requires that AArch32 reads of MVFR[012] are trapped to
EL2, and HCR_EL2.TID0 does the same for reads of FPSID.
In order to handle this, introduce a new TCG helper function that
checks for these control bits before executing the VMRC instruction.
Tested with a hacked-up version of KVM/arm64 that sets the control
bits for 32bit guests.
Backports commit 9ca1d776cb49c09b09579d9edd0447542970c834 from qemu
HCR_EL2.TID1 mandates that access from EL1 to REVIDR_EL1, AIDR_EL1
(and their 32bit equivalents) as well as TCMTR, TLBTR are trapped
to EL2. QEMU ignores it, making it harder for a hypervisor to
virtualize the HW (though to be fair, no known hypervisor actually
cares).
Do the right thing by trapping to EL2 if HCR_EL2.TID1 is set.
Backports commit 93fbc983b29a2eb84e2f6065929caf14f99c3681 from qemu
HCR_EL2.TID2 mandates that access from EL1 to CTR_EL0, CCSIDR_EL1,
CCSIDR2_EL1, CLIDR_EL1, CSSELR_EL1 are trapped to EL2, and QEMU
completely ignores it, making it impossible for hypervisors to
virtualize the cache hierarchy.
Do the right thing by trapping to EL2 if HCR_EL2.TID2 is set.
Backports commit 630fcd4d2ba37050329e0adafdc552d656ebe2f3 from qemu
This is derived from cortex-m4 description, adding DP support and FPv5
instructions with the corresponding flags in isar and mvfr2.
Checked that it could successfully execute
vrinta.f32 s15, s15
while cortex-m4 emulation rejects it with "illegal instruction".
Backports commit cf7beda5072e106ddce875c1996446540c5fe239 from qemu
HCR_EL2.TID3 mandates that access from EL1 to a long list of id
registers traps to EL2, and QEMU has so far ignored this requirement.
This breaks (among other things) KVM guests that have PtrAuth enabled,
while the hypervisor doesn't want to expose the feature to its guest.
To achieve this, KVM traps the ID registers (ID_AA64ISAR1_EL1 in this
case), and masks out the unsupported feature.
QEMU not honoring the trap request means that the guest observes
that the feature is present in the HW, starts using it, and dies
a horrible death when KVM injects an UNDEF, because the feature
*really* isn't supported.
Do the right thing by trapping to EL2 if HCR_EL2.TID3 is set.
Note that this change does not include trapping of the MVFR
registers from AArch32 (they are accessed via the VMRS
instruction and need to be handled in a different way).
Backports commit 6a4ef4e5d1084ce41fafa7d470a644b0fd3d9317 from qemu
The ARMv8 ARM states when executing at EL2, EL3 or Secure EL1,
ISR_EL1 shows the pending status of the physical IRQ, FIQ, or
SError interrupts.
Unfortunately, QEMU's implementation only considers the HCR_EL2
bits, and ignores the current exception level. This means a hypervisor
trying to look at its own interrupt state actually sees the guest
state, which is unexpected and breaks KVM as of Linux 5.3.
Instead, check for the running EL and return the physical bits
if not running in a virtualized context.
Backports commit 7cf95aed53c8770a338617ef40d5f37d2c197853 from qemu
According to the PushStack() pseudocode in the armv7m RM,
bit 4 of the LR should be set to NOT(CONTROL.PFCA) when
an FPU is present. Current implementation is doing it for
armv8, but not for armv7. This patch makes the existing
logic applicable to both code paths.
Backports commit f900b1e5b087a02199fbb6de7038828008e9e419 from qemu
Simply moving the non-stub helper_v7m_mrs/msr outside of
!CONFIG_USER_ONLY is not an option, because of all of the
other system-mode helpers that are called.
But we can split out a few subroutines to handle the few
EL0 accessible registers without duplicating code.
Backports commit 04c9c81b8fa2ee33f59a26265700fae6fc646062 from qemu
There was too much cut and paste between ldrexd and strexd,
as ldrexd does prohibit two output registers the same.
Fixes: af288228995
Backports commit 655b02646dc175dc10666459b0a1e4346fc8d46a from qemu
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.
Target dependant attributes are conditionalized upon NEED_CPU_H.
Backports commit 14776ab5a12972ea439c7fb2203a4c15a09094b4 from qemu
There are only two remaining uses of gen_bx_im. In each case, we
know the destination mode -- not changing in the case of gen_jmp
or changing in the case of trans_BLX_i. Use this to simplify the
surrounding code.
For trans_BLX_i, use gen_jmp for the actual branch. For gen_jmp,
use gen_set_pc_im to set up the single-step.
Backports commit eac2f39602e0423adf56be410c9a22c31fec9a81 from qemu
Now that all callers pass a constant value, split the switch
statement into the individual trans_* functions.
Backports commit 279de61a21a1622cb875ead82d6e78c989ba2966 from qemu
Add a check for ARMv6 in trans_CPS. We had this correct in
the T16 path, but had previously forgotten the check on the
A32 and T32 paths.
Backports commit 20556e7bd6111266fbf1d81e4ff7a89bfa5795a7 from qemu
Fold away all of the cases that now just goto illegal_op,
because all of their internal bits are now in decodetree.
Backports commit 590057d969a54de5d97261701c5702b3bebc9c07 from qemu
Fold away all of the cases that now just goto illegal_op,
because all of their internal bits are now in decodetree.
Backports commit f843e77144c9334e244a422848177f2fbef5eb05 from qemu
We have been using store_reg and not store_reg_for_load when writing
back a loaded value into the base register. At first glance this is
incorrect when base == pc, however that case is UNPREDICTABLE.
Backports commit b0e382b8cf365fed8b8c43482029ac7655961a85 from qemu
This has been a TODO item for quite a while. The minimum bit
count for A32 and T16 is 1, and for T32 is 2.
Backports commit 4b222545dbf30b60c033e1cd6eddda612575fd8c from qemu
Prior to v7, for the A32 encoding, this operation wrote an UNKNOWN
value back to the base register. Starting in v7 this is UNPREDICTABLE.
Backports commit 3949f4675d13c587078f8f423845a3a537a22595 from qemu
This includes a minor bug fix to LDM (user), which requires
bit 21 to be 0, which means no writeback.
Backports commit c5c426d4c680f908a1e262091a17b088b5709200 from qemu
In op_bfx, note that tcg_gen_{,s}extract_i32 already checks
for width == 32, so we don't need to special case that here.
Backports commit 86d21e4b509a2835ed79f234f476a4c5191d435b from qemu
Pass the T5 encoding of SUBS PC, LR, #IMM through the normal SUBS path
to make it clear exactly what's happening -- we hit ALUExceptionReturn
along that path.
Backports commit ef11bc3c461e2c650e8bef552146a4b08f81884e from qemu
Document our choice about the T32 CONSTRAINED UNPREDICTABLE behaviour.
This matches the undocumented choice made by the legacy decoder.
Backports commit 4c97f5b2f0fa9b37f9ff497f15411d809e6fd098 from qemu
The m-profile and a-profile decodings overlap. Only return false
for the case of wrong profile; handle UNDEFINED for permission failure
directly. This ensures that we don't accidentally pass an insn that
applies to the wrong profile.
Backports commit d0b26644502103ca97093ef67749812dc1df7eea from qemu
By shifting the 16-bit input left by 16, we can align the desired
portion of the 48-bit product and use tcg_gen_muls2_i32.
Backports commit 485b607d4f393e0de92c922806a68aef22340c98 from qemu
Since all of the inputs and outputs are i32, dispense with
the intermediate promotion to i64 and use tcg_gen_add2_i32.
Backports commit ea96b374641bc429269096d88d4e91ee544273e9 from qemu