Convert the Neon VRSQRTS insn to using a gvec helper,
and use this to implement the fp16 case.
As with VRECPS, we adjust the phrasing of the new implementation
slightly so that the fp32 version parallels the fp16 one.
Backports 40fde72dda2da8d55b820fa6c5efd85814be2023
Convert the Neon VRECPS insn to using a gvec helper, and
use this to implement the fp16 case.
The phrasing of the new float32_recps_nf() is slightly different from
the old recps_f32() so that it parallels the f16 version; for f16 we
can't assume that flush-to-zero is always enabled.
Backports ac8c62c4e5a3f24e6d47f52ec1bfb20994caefa5
Convert the neon floating-point vector compare-vs-0 insns VCEQ0,
VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
implement the fp16 case.
Backport 635187aaa92f21ab001e2868e803b3c5460261ca
Convert the neon floating-point vector operations VFMA and VFMS
to use a gvec helper, and use this to implement the fp16 case.
This is the last use of do_3same_fp() so we can now delete
that function.
Backports commit cf722d75b329ef3f86b869e7e68cbfb1607b3bde
Convert the Neon floating-point VMLA and VMLS insns over to using a
gvec helper, and use this to implement the fp16 case.
Backports e5adc70665ecaf4009c2fb8d66775ea718a85abd
Convert the Neon floating point VMAXNM and VMINNM insns to
using a gvec helper and use this to implement the fp16 case.
Backports e22705bb941d82d6c2a09e8b2031084326902be3
Convert the Neon float-point VMAX and VMIN insns over to using
a gvec helper, and use this to implement the fp16 case.
Backport e43268c54b6cbcb197d179409df7126e81f8cd52
Convert the neon floating-point vector absolute comparison ops
VACGE and VACGT over to using a gvec hepler and use this to
implement the fp16 case.
Backports bb2741da186ebaebc7d5189372be4401e1ff9972
Convert the Neon floating-point vector comparison ops VCEQ,
VCGE and VCGT over to using a gvec helper and use this to
implement the fp16 case.
(We put the float16_ceq() etc functions above the DO_2OP()
macro definition because later when we convert the
compare-against-zero instructions we'll want their
definitions to be visible at that point in the source file.)
Backports ad505db233b89b7fd4b5a98b6f0e8ac8d05b11db
Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
macro: VADD, VSUB, VABD, VMUL.
For VABD this requires us to implement a new gvec_fabd_h helper
using the machinery we have already for the other helpers.
Backport e4a6d4a69e239becfd83bdcd996476e7b8e1138d
Now the VFP_CONV_FIX macros can handle fp16's distinction between the
width of the operation and the width of the type used to pass operands,
use the macros rather than the open-coded functions.
This creates an extra six helper functions, all of which we are going
to need for the AArch32 VFP fp16 instructions.
Backports commit 414ba270c4fb758d987adf37ae9bfe531715c604
Implement VFP fp16 for VABS, VNEG and VSQRT. This is all
the fp16 insns that use the DO_VFP_2OP macro, because there
is no fp16 version of VMOV_reg.
Notes:
* the gen_helper_vfp_negh already exists as we needed to create
it for the fp16 multiply-add insns
* as usual we need to use the f16 version of the fp_status;
this is only relevant for VSQRT
Backports ce2d65a5d191380756cdac7a1fd1ba76bd1621cf
Implement fp16 versions of the VFP VMLA, VMLS, VNMLS, VNMLA, VNMUL
instructions. (These are all the remaining ones which we implement
via do_vfp_3op_[hsd]p().)
Backports commit e7cb0ded52c6d7b86585b09935fe7caeb9e38b69
Implmeent VFP fp16 support for simple binary-operator VFP insns VADD,
VSUB, VMUL, VDIV, VMINNM and VMAXNM:
* make the VFP_BINOP() macro generate float16 helpers as well as
float32 and float64
* implement a do_vfp_3op_hp() function similar to the existing
do_vfp_3op_sp()
* add decode for the half-precision insn patterns
Note that the VFP_BINOP macro use creates a couple of unused helper
functions vfp_maxh and vfp_minh, but they're small so it's not worth
splitting the BINOP operations into "needs halfprec" and "no
halfprec" groups.
Backports commit 120a0eb3ea23a5b06fae2f3daebd46a4035864cf
The temp that gets assigned to clean_addr has been allocated with
new_tmp_a64, which means that it will be freed at the end of the
instruction. Freeing it earlier leads to assertion failure.
The loop creates a complication, in which we allocate a new local
temp, which does need freeing, and the final code path is shared
between the loop and non-loop.
Fix this complication by adding new_tmp_a64_local so that the new
local temp is freed at the end, and can be treated exactly like
the non-loop path.
Fixes: bba87d0a0f4
Backports commit 4b4dc9750a0aa0b9766bd755bf6512a84744ce8a from qemu
Because the elements are non-sequential, we cannot eliminate many
tests straight away like we can for sequential operations. But
we often have the PTE details handy, so we can test for Tagged.
Backports commit d28d12f008ee44dc2cc2ee5d8f673be9febc951e from qemu
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.
Backports commit aa13f7c3c378fa41366b9fcd6c29af1c3d81126a from qemu
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.
Backports commit 71b9f3948c75bb97641a3c8c7de96d1cb47cdc07 from qemu
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.
Backports commit 206adacfb8d35e671e3619591608c475aa046b63 from qemu
For contiguous predicated memory operations, we want to
minimize the number of tlb lookups performed. We have
open-coded this for sve_ld1_r, but for correctness with
MTE we will need this for all of the memory operations.
Create a structure that holds the bounds of active elements,
and metadata for two pages. Add routines to find those
active elements, lookup the pages, and run watchpoints
for those pages.
Temporarily mark the functions unused to avoid Werror.
Backports commit b4cd95d2f4c7197b844f51b29871d888063ea3e7 from qemu
Use the "normal" memory access functions, rather than the
softmmu internal helper functions directly.
Since fb901c9, cpu_mem_index is now a simple extract
from env->hflags and not a large computation. Which means
that it's now more work to pass around this value than it
is to recompute it.
This only adjusts the primitives, and does not clean up
all of the uses within sve_helper.c.
Replace existing uses of check_data_tbi in translate-a64.c that
perform multiple logical memory access. Leave the helper blank
for now to reduce the patch size.
Backports commit 73ceeb0011b23bac8bd2c09ebe3c18d034aa69ce from qemu
Replace existing uses of check_data_tbi in translate-a64.c that
perform a single logical memory access. Leave the helper blank
for now to reduce the patch size.
Backports commit 0a405be2b8fd9506a009b10d7d2d98c394b36db6 from qemu
Now that we know that the operation is on a single page,
we need not loop over pages while probing.
Backports commit e26d0d226892f67435cadcce86df0ddfb9943174 from qemu
The m68k-specific softfloat code includes a function floatx80_mod that
is extremely similar to floatx80_rem, but computing the remainder
based on truncating the quotient toward zero rather than rounding it
to nearest integer. This is also useful for emulating the x87 fprem
and fprem1 instructions. Change the floatx80_rem implementation into
floatx80_modrem that can perform either operation, with both
floatx80_rem and floatx80_mod as thin wrappers available for all
targets.
There does not appear to be any use for the _mod operation for other
floating-point formats in QEMU (the only other architectures using
_rem at all are linux-user/arm/nwfpe, for FPA emulation, and openrisc,
for instructions that have been removed in the latest version of the
architecture), so no change is made to the code for other formats.
Backports commit 6b8b0136ab3018e4b552b485f808bf66bcf19ead from qemu
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Backports commit 43fa36c96c24349145497adc1b451f9caf74e344 from qemu
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Backports commit afc8b7d32668547308bdd654a63cf5228936e0ba from qemu
With this conversion, we will be able to use the same helpers
with sve. This also fixes a bug in which we failed to clear
the high bits of the SVE register after an AdvSIMD operation.
Backports commit 1738860d7e60dec5dbeba17f8b44d31aae3accac from qemu
No host backend support yet, but the interfaces for rotls
are in place. Only implement left-rotate for now, as the
only known use of vector rotate by scalar is s390x, so any
right-rotate would be unused and untestable.
Backports commit 23850a74afb641102325b4b7f74071d929fc4594 from qemu
No host backend support yet, but the interfaces for rotlv
and rotrv are in place.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Drop the generic expansion from rot to shift; we can do better
for each backend, and then this code becomes unused.
Backports commit 5d0ceda902915e3f0e21c39d142c92c4e97c3ebb from qemu