Convert the neon floating-point vector compare-vs-0 insns VCEQ0,
VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
implement the fp16 case.
Backport 635187aaa92f21ab001e2868e803b3c5460261ca
Convert the neon floating-point vector operations VFMA and VFMS
to use a gvec helper, and use this to implement the fp16 case.
This is the last use of do_3same_fp() so we can now delete
that function.
Backports commit cf722d75b329ef3f86b869e7e68cbfb1607b3bde
Convert the Neon floating-point VMLA and VMLS insns over to using a
gvec helper, and use this to implement the fp16 case.
Backports e5adc70665ecaf4009c2fb8d66775ea718a85abd
Convert the Neon floating point VMAXNM and VMINNM insns to
using a gvec helper and use this to implement the fp16 case.
Backports e22705bb941d82d6c2a09e8b2031084326902be3
Convert the Neon float-point VMAX and VMIN insns over to using
a gvec helper, and use this to implement the fp16 case.
Backport e43268c54b6cbcb197d179409df7126e81f8cd52
Convert the neon floating-point vector absolute comparison ops
VACGE and VACGT over to using a gvec hepler and use this to
implement the fp16 case.
Backports bb2741da186ebaebc7d5189372be4401e1ff9972
Convert the Neon floating-point vector comparison ops VCEQ,
VCGE and VCGT over to using a gvec helper and use this to
implement the fp16 case.
(We put the float16_ceq() etc functions above the DO_2OP()
macro definition because later when we convert the
compare-against-zero instructions we'll want their
definitions to be visible at that point in the source file.)
Backports ad505db233b89b7fd4b5a98b6f0e8ac8d05b11db
Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
macro: VADD, VSUB, VABD, VMUL.
For VABD this requires us to implement a new gvec_fabd_h helper
using the machinery we have already for the other helpers.
Backport e4a6d4a69e239becfd83bdcd996476e7b8e1138d
Unify add/sub helpers and add a parameter for rounding.
This will allow saturating non-rounding to reuse this code.
Backports d21798856b227a20a0a41640236af445f4f4aeb0
With this conversion, we will be able to use the same helpers
with sve. In particular, pass 3 vector parameters for the
3-operand operations; for advsimd the destination register
is also an input.
This also fixes a bug in which we failed to clear the high bits
of the SVE register after an AdvSIMD operation.
Backports commit a04b68e1d4c4f0cd5cd7542697b1b230b84532f5 from qemu
Convert the Neon VADD, VSUB, VABD 3-reg-same insns to decodetree.
We already have gvec helpers for addition and subtraction, but must
add one for fabd.
Backports commit a26a352bb498662cd0c205cb433a352f86fac7d2 from qemu
Pass a pointer directly to env->vfp.qc[0], rather than env.
This will allow SVE2, which does not modify QC, to pass a
pointer to dummy storage.
Change the return type of inl_qrdml.h_s16 to match the
sense of the operation: signed.
Backports commit e286bf4a72fe3a60490b8d6e3f28d6335677e08c from qemu
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Backports commit 893ab0542aa385a287cbe46d5535c8b9e95ce699 from qemu
Create vectorized versions of handle_shri_with_rndacc
for shift+round and shift+round+accumulate. Add out-of-line
helpers in preparation for longer vector lengths from SVE.
Backports commit 6ccd48d4ea244c1c46a24dfa50bfb547f11422dd from qemu
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Backports commit 631e565450c483e0622eec3d8b61d7fa41d16bca from qemu
These instructions are often used in glibc's string routines.
They were the final uses of the 32-bit at a time neon helpers.
Backports commit 6b375d3546b009d1e63e07397ec9c6af256e15e9 from qemu
We still need two different helpers, since NEON and SVE2 get the
inputs from different locations within the source vector. However,
we can convert both to the same internal form for computation.
The sve2 helper is not used yet, but adding it with this patch
helps illustrate why the neon changes are helpful.
Backports commit e7e96fc5ec8c79dc77fef522d5226ac09f684ba5 from qemu
The gvec form will be needed for implementing SVE2.
Extend the implementation to operate on uint64_t instead of uint32_t.
Use a counted inner loop instead of terminating when op1 goes to zero,
looking toward the required implementation for ARMv8.4-DIT.
Backports commit a21bb78e5817be3f494922e1dadd6455fe5d6318 from qemu
These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift. This
requires several masks and selects in addition to the actual
shifts to form the complete answer.
That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.
Backports commit 87b74e8b6edd287ea2160caa0ebea725fa8f1ca1 from qemu
Note that float16_to_float32 rightly squashes SNaN to QNaN.
But of course pickNaNMulAdd, for ARM, selects SNaNs first.
So we have to preserve SNaN long enough for the correct NaN
to be selected. Thus float16_to_float32_by_bits.
Backports commit a4e943a716d5fac923d82df3eabc65d1e3624019 from qemu
Fortunately, the functions affected are so far only called from SVE,
so there is no tail to be cleared. But as we convert more of AdvSIMD
to gvec, this will matter.
Backports commit d8efe78e8039511b95c23d75bb48eca6873fbb0f from qemu
For same-sign saturation, we have tcg vector operations. We can
compute the QC bit by comparing the saturated value against the
unsaturated value.
Backports commit 89e68b575e138d0af1435f11a8ffcd8779c237bd from qemu
Change the representation of this field such that it is easy
to set from vector code.
Backports commit a4d5846245c5e029e5aa3945a9bda1de1c3fedbf from qemu
Enhance the existing helpers to support SVE, which takes the
index from each 128-bit segment. The change has no effect
for AdvSIMD, since there is only one such segment.
Backports commit 18fc24057815bf3d956cfab892a2bc2344bd1dcb from qemu
For aa64 advsimd, we had been passing the pre-indexed vector.
However, sve applies the index to each 128-bit segment, so we
need to pass in the index separately.
For aa32 advsimd, the fp32 operation always has index 0, but
we failed to interpret the fp16 index correctly.
Backports commit 2cc99919a81a62589a4a6b0f365eabfead1db1a7 from qemu