Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
macro: VADD, VSUB, VABD, VMUL.
For VABD this requires us to implement a new gvec_fabd_h helper
using the machinery we have already for the other helpers.
Backport e4a6d4a69e239becfd83bdcd996476e7b8e1138d
Now the VFP_CONV_FIX macros can handle fp16's distinction between the
width of the operation and the width of the type used to pass operands,
use the macros rather than the open-coded functions.
This creates an extra six helper functions, all of which we are going
to need for the AArch32 VFP fp16 instructions.
Backports commit 414ba270c4fb758d987adf37ae9bfe531715c604
Implement VFP fp16 for VABS, VNEG and VSQRT. This is all
the fp16 insns that use the DO_VFP_2OP macro, because there
is no fp16 version of VMOV_reg.
Notes:
* the gen_helper_vfp_negh already exists as we needed to create
it for the fp16 multiply-add insns
* as usual we need to use the f16 version of the fp_status;
this is only relevant for VSQRT
Backports ce2d65a5d191380756cdac7a1fd1ba76bd1621cf
Implement fp16 versions of the VFP VMLA, VMLS, VNMLS, VNMLA, VNMUL
instructions. (These are all the remaining ones which we implement
via do_vfp_3op_[hsd]p().)
Backports commit e7cb0ded52c6d7b86585b09935fe7caeb9e38b69
Implmeent VFP fp16 support for simple binary-operator VFP insns VADD,
VSUB, VMUL, VDIV, VMINNM and VMAXNM:
* make the VFP_BINOP() macro generate float16 helpers as well as
float32 and float64
* implement a do_vfp_3op_hp() function similar to the existing
do_vfp_3op_sp()
* add decode for the half-precision insn patterns
Note that the VFP_BINOP macro use creates a couple of unused helper
functions vfp_maxh and vfp_minh, but they're small so it's not worth
splitting the BINOP operations into "needs halfprec" and "no
halfprec" groups.
Backports commit 120a0eb3ea23a5b06fae2f3daebd46a4035864cf
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Backports commit 43fa36c96c24349145497adc1b451f9caf74e344 from qemu
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Backports commit afc8b7d32668547308bdd654a63cf5228936e0ba from qemu
Do not yet convert the helpers to loop over opr_sz, but the
descriptor allows the vector tail to be cleared. Which fixes
an existing bug vs SVE.
Backports commit effa992f153f5e7ab97ab843b565690748c5b402 from qemu
Do not yet convert the helpers to loop over opr_sz, but the
descriptor allows the vector tail to be cleared. Which fixes
an existing bug vs SVE.
Backports commit aaffebd6d3135b8aed7e61932af53b004d261579 from qemu
With this conversion, we will be able to use the same helpers
with sve. This also fixes a bug in which we failed to clear
the high bits of the SVE register after an AdvSIMD operation.
Backports commit 1738860d7e60dec5dbeba17f8b44d31aae3accac from qemu
With this conversion, we will be able to use the same helpers
with sve. In particular, pass 3 vector parameters for the
3-operand operations; for advsimd the destination register
is also an input.
This also fixes a bug in which we failed to clear the high bits
of the SVE register after an AdvSIMD operation.
Backports commit a04b68e1d4c4f0cd5cd7542697b1b230b84532f5 from qemu
The usual location for the env argument in the argument list of a TCG helper
is immediately after the return-value argument. recps_f32 and rsqrts_f32
differ in that they put it at the end.
Move the env argument to its usual place; this will allow us to
more easily use these helper functions with the gvec APIs.
Backports commit 26c6f695cfd2a3ccddb4d015a25b56f56aa62928 from qemu
Convert the Neon VADD, VSUB, VABD 3-reg-same insns to decodetree.
We already have gvec helpers for addition and subtraction, but must
add one for fabd.
Backports commit a26a352bb498662cd0c205cb433a352f86fac7d2 from qemu
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Backports commit 893ab0542aa385a287cbe46d5535c8b9e95ce699 from qemu
Create vectorized versions of handle_shri_with_rndacc
for shift+round and shift+round+accumulate. Add out-of-line
helpers in preparation for longer vector lengths from SVE.
Backports commit 6ccd48d4ea244c1c46a24dfa50bfb547f11422dd from qemu
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Backports commit 631e565450c483e0622eec3d8b61d7fa41d16bca from qemu
These instructions are often used in glibc's string routines.
They were the final uses of the 32-bit at a time neon helpers.
Backports commit 6b375d3546b009d1e63e07397ec9c6af256e15e9 from qemu
This is an aarch64-only function. Move it out of the shared file.
This patch is code movement only.
Backports commit 7b182eb2467af6c47c9c77c64bbbeed8ed53c330 from qemu
We still need two different helpers, since NEON and SVE2 get the
inputs from different locations within the source vector. However,
we can convert both to the same internal form for computation.
The sve2 helper is not used yet, but adding it with this patch
helps illustrate why the neon changes are helpful.
Backports commit e7e96fc5ec8c79dc77fef522d5226ac09f684ba5 from qemu
The gvec form will be needed for implementing SVE2.
Extend the implementation to operate on uint64_t instead of uint32_t.
Use a counted inner loop instead of terminating when op1 goes to zero,
looking toward the required implementation for ARMv8.4-DIT.
Backports commit a21bb78e5817be3f494922e1dadd6455fe5d6318 from qemu
These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift. This
requires several masks and selects in addition to the actual
shifts to form the complete answer.
That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.
Backports commit 87b74e8b6edd287ea2160caa0ebea725fa8f1ca1 from qemu
HCR_EL2.TID3 requires that AArch32 reads of MVFR[012] are trapped to
EL2, and HCR_EL2.TID0 does the same for reads of FPSID.
In order to handle this, introduce a new TCG helper function that
checks for these control bits before executing the VMRC instruction.
Tested with a hacked-up version of KVM/arm64 that sets the control
bits for 32bit guests.
Backports commit 9ca1d776cb49c09b09579d9edd0447542970c834 from qemu
Replace x = double_saturate(y) with x = add_saturate(y, y).
There is no need for a separate more specialized helper.
Backports commit 640581a06d14e2d0d3c3ba79b916de6bc43578b0 from qemu
The M-profile architecture floating point system supports
lazy FP state preservation, where FP registers are not
pushed to the stack when an exception occurs but are instead
only saved if and when the first FP instruction in the exception
handler is executed. Implement this in QEMU, corresponding
to the check of LSPACT in the pseudocode ExecuteFPCheck().
Backports commit e33cf0f8d8c9998a7616684f9d6aa0d181b88803 from qemu
We do not need an out-of-line helper for manipulating bits in pstate.
While changing things, share the implementation of gen_ss_advance.
Backports commit 22ac3c49641f6eed93dca5b852030b4d3eacf6c4 from qemu
The EL0+UMA check is unique to DAIF. While SPSel had avoided the
check by nature of already checking EL >= 1, the other post v8.0
extensions to MSR (imm) allow EL0 and do not require UMA. Avoid
the unconditional write to pc and use raise_exception_ra to unwind.
Backports commit ff730e9666a716b669ac4a8ca7c521177d1d2b15 from qemu
Note that float16_to_float32 rightly squashes SNaN to QNaN.
But of course pickNaNMulAdd, for ARM, selects SNaNs first.
So we have to preserve SNaN long enough for the correct NaN
to be selected. Thus float16_to_float32_by_bits.
Backports commit a4e943a716d5fac923d82df3eabc65d1e3624019 from qemu
For same-sign saturation, we have tcg vector operations. We can
compute the QC bit by comparing the saturated value against the
unsaturated value.
Backports commit 89e68b575e138d0af1435f11a8ffcd8779c237bd from qemu
Add code to insert calls to a helper function to do the stack
limit checking when we handle these forms of instruction
that write to SP:
* ADD (SP plus immediate)
* ADD (SP plus register)
* SUB (SP minus immediate)
* SUB (SP minus register)
* MOV (register)
Backports commit 5520318939fea5d659bf808157cd726cb967b761 from qemu