Commit graph

53 commits

Author SHA1 Message Date
Peter Maydell 6b8096d9fc target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index is indexing over group of 4 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Backports d1a9254be5cc93afb15be19f7543da6ff4806256
2021-03-02 13:03:51 -05:00
Peter Maydell 5c6730a432 target/arm: Fix float16 pairwise Neon ops on big-endian hosts
In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data. This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.

Backports 552714c0812a10e5cff239bd29928e5fcb8d8b3b
2021-03-02 13:02:31 -05:00
Peter Maydell 8c6affbca4 target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
Add gvec helpers for doing Neon-style indexed non-fused fp
multiply-and-accumulate operations.

Backports commit c50d8d144098a8261233ca31b47e3bc487e112fe
2021-03-01 17:52:31 -05:00
Peter Maydell 3cc3099e36 target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations
In the gvec helper functions for indexed operations, for AArch32
Neon the oprsz (total size of the vector) can be less than 16 bytes
if the operation is on a D reg. Since the inner loop in these
helpers always goes from 0 to segment, we must clamp it based
on oprsz to avoid processing a full 16 byte segment when asked to
handle an 8 byte wide vector.

Backports commit d7ce81e553e6789bf27657105b32575668d60b1c
2021-03-01 17:48:42 -05:00
Peter Maydell 681218b4ab target/arm: Implement fp16 for Neon VRINTX
Convert the Neon VRINTX insn to use gvec, and use this to implement
fp16 support for it.

Backports 23afcdd2511f2a3dc05bed650d27bd25cf9b2a3c
2021-03-01 17:47:25 -05:00
Peter Maydell 53aba9d900 target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
Convert the Neon VRINT-with-specified-rounding-mode insns to gvec,
and use this to implement the fp16 versions.

Backports 18725916b1438b54d6d6533980833d2251a20b7c
2021-03-01 17:44:49 -05:00
Peter Maydell eb4054d04f target/arm: Implement fp16 for Neon VCVT with rounding modes
Convert the Neon VCVT with-specified-rounding-mode instructions
to gvec, and use this to implement fp16 support for them.

Backports ca88a6efdf4ce96b646a896059f9bd324c2cebc4
2021-03-01 17:40:36 -05:00
Peter Maydell 56fe927d40 target/arm: Implement fp16 for Neon VCVT fixed-point
Implement fp16 for the Neon VCVT insns which convert between
float and fixed-point.

Backports 24018cf3990b692b51e50183c5fbd98d17b3fa40
2021-03-01 17:36:43 -05:00
Peter Maydell 948b01ad01 target/arm: Convert Neon VCVT fixed-point to gvec
Convert the Neon VCVT float<->fixed-point insns to a
gvec style, in preparation for adding fp16 support.

Backports 7b959c5890deb9a6d71bc6800006a0eae0a84c60
2021-03-01 17:33:20 -05:00
Peter Maydell c324c6817e target/arm: Implement fp16 for Neon float-integer VCVT
Convert the Neon float-integer VCVT insns to gvec, and use this
to implement fp16 support for them.

Note that unlike the VFP int<->fp16 VCVT insns we converted
earlier and which convert to/from a 32-bit integer, these
Neon insns convert to/from 16-bit integers. So we can use
the existing vfp conversion helpers for the f32<->u32/i32
case but need to provide our own for f16<->u16/i16.

Backports 7782a9afec81d1efe23572135c1ed777691ccde5
2021-03-01 17:29:02 -05:00
Peter Maydell 82f4a7e135 target/arm: Implement fp16 for Neon pairwise fp ops
Convert the Neon pairwise fp ops to use a single gvic-style
helper to do the full operation instead of one helper call
for each 32-bit part. This allows us to use the same
framework to implement the fp16.

Backports 1dc587ee9bfe804406eb3e0bacf47a80644d8abc
2021-03-01 17:25:19 -05:00
Peter Maydell b08ea84374 target/arm: Implement fp16 for Neon VRSQRTS
Convert the Neon VRSQRTS insn to using a gvec helper,
and use this to implement the fp16 case.

As with VRECPS, we adjust the phrasing of the new implementation
slightly so that the fp32 version parallels the fp16 one.

Backports 40fde72dda2da8d55b820fa6c5efd85814be2023
2021-03-01 17:20:22 -05:00
Peter Maydell f4ebbba9fd target/arm: Implement fp16 for Neon VRECPS
Convert the Neon VRECPS insn to using a gvec helper, and
use this to implement the fp16 case.

The phrasing of the new float32_recps_nf() is slightly different from
the old recps_f32() so that it parallels the f16 version; for f16 we
can't assume that flush-to-zero is always enabled.

Backports ac8c62c4e5a3f24e6d47f52ec1bfb20994caefa5
2021-03-01 17:09:16 -05:00
Peter Maydell 5776c594e4 target/arm: Implement fp16 for Neon fp compare-vs-0
Convert the neon floating-point vector compare-vs-0 insns VCEQ0,
VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
implement the fp16 case.

Backport 635187aaa92f21ab001e2868e803b3c5460261ca
2021-03-01 17:05:03 -05:00
Peter Maydell 8de258c3cb target/arm: Implement fp16 for Neon VFMA, VMFS
Convert the neon floating-point vector operations VFMA and VFMS
to use a gvec helper, and use this to implement the fp16 case.

This is the last use of do_3same_fp() so we can now delete
that function.

Backports commit cf722d75b329ef3f86b869e7e68cbfb1607b3bde
2021-03-01 17:00:49 -05:00
Peter Maydell 587c3549b7 target/arm: Implement fp16 for Neon VMLA, VMLS operations
Convert the Neon floating-point VMLA and VMLS insns over to using a
gvec helper, and use this to implement the fp16 case.

Backports e5adc70665ecaf4009c2fb8d66775ea718a85abd
2021-03-01 16:57:20 -05:00
Peter Maydell 0068d12355 target/arm: Implement fp16 for Neon VMAXNM, VMINNM
Convert the Neon floating point VMAXNM and VMINNM insns to
using a gvec helper and use this to implement the fp16 case.

Backports e22705bb941d82d6c2a09e8b2031084326902be3
2021-03-01 16:53:57 -05:00
Peter Maydell 465cfb54c4 target/arm: Implement fp16 for Neon VMAX, VMIN
Convert the Neon float-point VMAX and VMIN insns over to using
a gvec helper, and use this to implement the fp16 case.

Backport e43268c54b6cbcb197d179409df7126e81f8cd52
2021-03-01 16:50:23 -05:00
Peter Maydell 6dd4a8e93f target/arm: Implement fp16 for VACGE, VACGT
Convert the neon floating-point vector absolute comparison ops
VACGE and VACGT over to using a gvec hepler and use this to
implement the fp16 case.

Backports bb2741da186ebaebc7d5189372be4401e1ff9972
2021-03-01 16:47:44 -05:00
Peter Maydell 4eb39f1b2f target/arm: Implement fp16 for VCEQ, VCGE, VCGT comparisons
Convert the Neon floating-point vector comparison ops VCEQ,
VCGE and VCGT over to using a gvec helper and use this to
implement the fp16 case.

(We put the float16_ceq() etc functions above the DO_2OP()
macro definition because later when we convert the
compare-against-zero instructions we'll want their
definitions to be visible at that point in the source file.)

Backports ad505db233b89b7fd4b5a98b6f0e8ac8d05b11db
2021-03-01 16:44:34 -05:00
Peter Maydell 4850377f01 target/arm: Implement FP16 for Neon VADD, VSUB, VABD, VMUL
Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
macro: VADD, VSUB, VABD, VMUL.

For VABD this requires us to implement a new gvec_fabd_h helper
using the machinery we have already for the other helpers.

Backport e4a6d4a69e239becfd83bdcd996476e7b8e1138d
2021-03-01 16:31:54 -05:00
Lioncash f5a21abc0b target/arm: Convert sq{, r}dmulh to gvec for aa64 advsimd 2021-02-26 15:01:44 -05:00
Richard Henderson aa97b6b755 target/arm: Convert integer multiply-add (indexed) to gvec for aa64 advsimd
Backports 3607440c4df6498585a570cfc1041e4972b41b56
2021-02-26 14:51:17 -05:00
Richard Henderson 732674b868 target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd
Backports 2e5a265e6a9e7169c4a3e87db261b2fa92582590
2021-02-26 14:46:29 -05:00
Richard Henderson 80325ac866 target/arm: Generalize inl_qrdmlah_* helper functions
Unify add/sub helpers and add a parameter for rounding.
This will allow saturating non-rounding to reuse this code.

Backports d21798856b227a20a0a41640236af445f4f4aeb0
2021-02-26 14:41:32 -05:00
Richard Henderson 1df7314dc3 target/arm: Convert aes and sm4 to gvec helpers
With this conversion, we will be able to use the same helpers
with sve. In particular, pass 3 vector parameters for the
3-operand operations; for advsimd the destination register
is also an input.

This also fixes a bug in which we failed to clear the high bits
of the SVE register after an AdvSIMD operation.

Backports commit a04b68e1d4c4f0cd5cd7542697b1b230b84532f5 from qemu
2020-06-14 22:41:33 -04:00
Peter Maydell bb0aa79847 target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree
Convert the Neon VADD, VSUB, VABD 3-reg-same insns to decodetree.
We already have gvec helpers for addition and subtraction, but must
add one for fabd.

Backports commit a26a352bb498662cd0c205cb433a352f86fac7d2 from qemu
2020-05-15 23:26:51 -04:00
Richard Henderson 451683ee79 target/arm: Vectorize SABA/UABA
Include 64-bit element size in preparation for SVE2.

Backports commit cfdb2c0c95ae9205b0dd7f0f5e970cdec50fef20 from qemu
2020-05-15 22:15:14 -04:00
Richard Henderson 98c79f9afc target/arm: Vectorize SABD/UABD
Include 64-bit element size in preparation for SVE2.

Backports commit 50c160d44eb059c7fc7f348ae2c3b0cb41437044 from qemu
2020-05-15 22:01:29 -04:00
Richard Henderson 765dbb57f0 target/arm: Clear tail in gvec_fmul_idx_*, gvec_fmla_idx_*
Must clear the tail for AdvSIMD when SVE is enabled.

Fixes: ca40a6e6e39

Backports commit 525d9b6d42844e187211d25b69be8b378785bc24 from qemu
2020-05-15 21:50:30 -04:00
Richard Henderson 73d08253a2 target/arm: Pass pointer to qc to qrdmla/qrdmls
Pass a pointer directly to env->vfp.qc[0], rather than env.
This will allow SVE2, which does not modify QC, to pass a
pointer to dummy storage.

Change the return type of inl_qrdml.h_s16 to match the
sense of the operation: signed.

Backports commit e286bf4a72fe3a60490b8d6e3f28d6335677e08c from qemu
2020-05-15 21:48:35 -04:00
Richard Henderson 6190be3191 target/arm: Create gen_gvec_{sri,sli}
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.

Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.

Backports commit 893ab0542aa385a287cbe46d5535c8b9e95ce699 from qemu
2020-05-15 20:39:28 -04:00
Richard Henderson 2609e6f319 target/arm: Create gen_gvec_{u,s}{rshr,rsra}
Create vectorized versions of handle_shri_with_rndacc
for shift+round and shift+round+accumulate. Add out-of-line
helpers in preparation for longer vector lengths from SVE.

Backports commit 6ccd48d4ea244c1c46a24dfa50bfb547f11422dd from qemu
2020-05-15 20:28:44 -04:00
Richard Henderson 5d7c46204d target/arm: Create gen_gvec_[us]sra
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.

Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.

Backports commit 631e565450c483e0622eec3d8b61d7fa41d16bca from qemu
2020-05-15 20:10:32 -04:00
Richard Henderson b26b4c06cd target/arm: Vectorize integer comparison vs zero
These instructions are often used in glibc's string routines.
They were the final uses of the 32-bit at a time neon helpers.

Backports commit 6b375d3546b009d1e63e07397ec9c6af256e15e9 from qemu
2020-04-30 21:29:17 -04:00
Richard Henderson fcce8d4aa1 target/arm: Convert PMULL.8 to gvec
We still need two different helpers, since NEON and SVE2 get the
inputs from different locations within the source vector. However,
we can convert both to the same internal form for computation.

The sve2 helper is not used yet, but adding it with this patch
helps illustrate why the neon changes are helpful.

Backports commit e7e96fc5ec8c79dc77fef522d5226ac09f684ba5 from qemu
2020-03-21 19:35:46 -04:00
Richard Henderson c00f72f74f target/arm: Convert PMULL.64 to gvec
The gvec form will be needed for implementing SVE2.

Backports commit b9ed510e46f2f9e31e5e8adb4661d5d1cbe9a459 from qemu
2020-03-21 19:27:38 -04:00
Richard Henderson db8a935b44 target/arm: Convert PMUL.8 to gvec
The gvec form will be needed for implementing SVE2.

Extend the implementation to operate on uint64_t instead of uint32_t.
Use a counted inner loop instead of terminating when op1 goes to zero,
looking toward the required implementation for ARMv8.4-DIT.

Backports commit a21bb78e5817be3f494922e1dadd6455fe5d6318 from qemu
2020-03-21 19:22:18 -04:00
Richard Henderson d3139f2f0a target/arm: Vectorize USHL and SSHL
These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift. This
requires several masks and selects in addition to the actual
shifts to form the complete answer.

That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.

Backports commit 87b74e8b6edd287ea2160caa0ebea725fa8f1ca1 from qemu
2020-03-21 19:14:17 -04:00
Richard Henderson 5473c3603f
target/arm: Add helpers for FMLAL
Note that float16_to_float32 rightly squashes SNaN to QNaN.
But of course pickNaNMulAdd, for ARM, selects SNaNs first.
So we have to preserve SNaN long enough for the correct NaN
to be selected. Thus float16_to_float32_by_bits.

Backports commit a4e943a716d5fac923d82df3eabc65d1e3624019 from qemu
2019-02-28 15:31:48 -05:00
Richard Henderson 5c34cab41c
target/arm: Add missing clear_tail calls
Fortunately, the functions affected are so far only called from SVE,
so there is no tail to be cleared. But as we convert more of AdvSIMD
to gvec, this will matter.

Backports commit d8efe78e8039511b95c23d75bb48eca6873fbb0f from qemu
2019-02-15 18:15:20 -05:00
Richard Henderson f3cb92c86c
target/arm: Use vector operations for saturation
For same-sign saturation, we have tcg vector operations. We can
compute the QC bit by comparing the saturated value against the
unsaturated value.

Backports commit 89e68b575e138d0af1435f11a8ffcd8779c237bd from qemu
2019-02-15 18:14:09 -05:00
Richard Henderson 10d468f601
target/arm: Split out FPSCR.QC to a vector field
Change the representation of this field such that it is easy
to set from vector code.

Backports commit a4d5846245c5e029e5aa3945a9bda1de1c3fedbf from qemu
2019-02-15 18:04:13 -05:00
Lioncash 8eaa850287
target/arm/vec_helper: Remove use of void pointer arithmetic
This is a GNU-specific extension.
2019-01-30 14:03:26 -05:00
Richard Henderson d343f8ac0f
target/arm: Implement SVE dot product (indexed)
Backports commit 16fcfdc7325649b187ac489f3ae0b0d2a20b6230 from qemu
2018-07-03 04:42:41 -04:00
Richard Henderson 2f6d555473
target/arm: Implement SVE dot product (vectors)
Backports commit d730ecaae77ac696515207a5ef99509240fc792b from qemu
2018-07-03 04:35:25 -04:00
Richard Henderson 01a7224cdb
target/arm: Implement SVE fp complex multiply add (indexed)
Enhance the existing helpers to support SVE, which takes the
index from each 128-bit segment. The change has no effect
for AdvSIMD, since there is only one such segment.

Backports commit 18fc24057815bf3d956cfab892a2bc2344bd1dcb from qemu
2018-07-03 04:32:46 -04:00
Richard Henderson 281deae0a9
target/arm: Pass index to AdvSIMD FCMLA (indexed)
For aa64 advsimd, we had been passing the pre-indexed vector.
However, sve applies the index to each 128-bit segment, so we
need to pass in the index separately.

For aa32 advsimd, the fp32 operation always has index 0, but
we failed to interpret the fp16 index correctly.

Backports commit 2cc99919a81a62589a4a6b0f365eabfead1db1a7 from qemu
2018-07-03 04:27:10 -04:00
Richard Henderson 942f3c835e
target/arm: Implement SVE Floating Point Unary Operations - Unpredicated Group
Backports commit 3887c0388d39930ab419d4ae6e8ca5ea67a74ad5 from qemu
2018-07-03 03:44:40 -04:00
Richard Henderson cf3c7824ff
target/arm: Implement SVE Floating Point Multiply Indexed Group
Backports commit ca40a6e6e390eb1cad7ade881dc7c622793f9324 from qemu
2018-07-03 03:35:49 -04:00