Commit graph

5687 commits

Author SHA1 Message Date
Eduardo Habkost cefb1666c0 arm: Fix typo in AARCH64_CPU_GET_CLASS definition
There's a typo in the type name of AARCH64_CPU_GET_CLASS. This
was never detected because the macro is not used by any code.

Backports 37e3d65043229bb20bd07af74dc0866e12071415
2021-03-01 18:03:29 -05:00
Peter Maydell ff74ede2fd target/arm: Enable FP16 in '-cpu max'
Set the MVFR1 ID register FPHP and SIMDHP fields to indicate
that our "-cpu max" has v8.2-FP16.

Backports commit 5f07817eb94542e39a419baafa3026b15e8d33f7
2021-03-01 18:00:13 -05:00
Peter Maydell b948636c4a target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS
Convert the Neon floating-point VMUL, VMLA and VMLS to use gvec,
and use this to implement fp16 support.

Backports fc8ae790311882afa3c7816df004daf978c40e9a
2021-03-01 17:57:36 -05:00
Peter Maydell 8c6affbca4 target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
Add gvec helpers for doing Neon-style indexed non-fused fp
multiply-and-accumulate operations.

Backports commit c50d8d144098a8261233ca31b47e3bc487e112fe
2021-03-01 17:52:31 -05:00
Peter Maydell 3cc3099e36 target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations
In the gvec helper functions for indexed operations, for AArch32
Neon the oprsz (total size of the vector) can be less than 16 bytes
if the operation is on a D reg. Since the inner loop in these
helpers always goes from 0 to segment, we must clamp it based
on oprsz to avoid processing a full 16 byte segment when asked to
handle an 8 byte wide vector.

Backports commit d7ce81e553e6789bf27657105b32575668d60b1c
2021-03-01 17:48:42 -05:00
Peter Maydell 681218b4ab target/arm: Implement fp16 for Neon VRINTX
Convert the Neon VRINTX insn to use gvec, and use this to implement
fp16 support for it.

Backports 23afcdd2511f2a3dc05bed650d27bd25cf9b2a3c
2021-03-01 17:47:25 -05:00
Peter Maydell 53aba9d900 target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
Convert the Neon VRINT-with-specified-rounding-mode insns to gvec,
and use this to implement the fp16 versions.

Backports 18725916b1438b54d6d6533980833d2251a20b7c
2021-03-01 17:44:49 -05:00
Peter Maydell eb4054d04f target/arm: Implement fp16 for Neon VCVT with rounding modes
Convert the Neon VCVT with-specified-rounding-mode instructions
to gvec, and use this to implement fp16 support for them.

Backports ca88a6efdf4ce96b646a896059f9bd324c2cebc4
2021-03-01 17:40:36 -05:00
Peter Maydell 56fe927d40 target/arm: Implement fp16 for Neon VCVT fixed-point
Implement fp16 for the Neon VCVT insns which convert between
float and fixed-point.

Backports 24018cf3990b692b51e50183c5fbd98d17b3fa40
2021-03-01 17:36:43 -05:00
Peter Maydell 948b01ad01 target/arm: Convert Neon VCVT fixed-point to gvec
Convert the Neon VCVT float<->fixed-point insns to a
gvec style, in preparation for adding fp16 support.

Backports 7b959c5890deb9a6d71bc6800006a0eae0a84c60
2021-03-01 17:33:20 -05:00
Peter Maydell c324c6817e target/arm: Implement fp16 for Neon float-integer VCVT
Convert the Neon float-integer VCVT insns to gvec, and use this
to implement fp16 support for them.

Note that unlike the VFP int<->fp16 VCVT insns we converted
earlier and which convert to/from a 32-bit integer, these
Neon insns convert to/from 16-bit integers. So we can use
the existing vfp conversion helpers for the f32<->u32/i32
case but need to provide our own for f16<->u16/i16.

Backports 7782a9afec81d1efe23572135c1ed777691ccde5
2021-03-01 17:29:02 -05:00
Peter Maydell 82f4a7e135 target/arm: Implement fp16 for Neon pairwise fp ops
Convert the Neon pairwise fp ops to use a single gvic-style
helper to do the full operation instead of one helper call
for each 32-bit part. This allows us to use the same
framework to implement the fp16.

Backports 1dc587ee9bfe804406eb3e0bacf47a80644d8abc
2021-03-01 17:25:19 -05:00
Peter Maydell b08ea84374 target/arm: Implement fp16 for Neon VRSQRTS
Convert the Neon VRSQRTS insn to using a gvec helper,
and use this to implement the fp16 case.

As with VRECPS, we adjust the phrasing of the new implementation
slightly so that the fp32 version parallels the fp16 one.

Backports 40fde72dda2da8d55b820fa6c5efd85814be2023
2021-03-01 17:20:22 -05:00
Peter Maydell f4ebbba9fd target/arm: Implement fp16 for Neon VRECPS
Convert the Neon VRECPS insn to using a gvec helper, and
use this to implement the fp16 case.

The phrasing of the new float32_recps_nf() is slightly different from
the old recps_f32() so that it parallels the f16 version; for f16 we
can't assume that flush-to-zero is always enabled.

Backports ac8c62c4e5a3f24e6d47f52ec1bfb20994caefa5
2021-03-01 17:09:16 -05:00
Peter Maydell 5776c594e4 target/arm: Implement fp16 for Neon fp compare-vs-0
Convert the neon floating-point vector compare-vs-0 insns VCEQ0,
VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
implement the fp16 case.

Backport 635187aaa92f21ab001e2868e803b3c5460261ca
2021-03-01 17:05:03 -05:00
Peter Maydell 8de258c3cb target/arm: Implement fp16 for Neon VFMA, VMFS
Convert the neon floating-point vector operations VFMA and VFMS
to use a gvec helper, and use this to implement the fp16 case.

This is the last use of do_3same_fp() so we can now delete
that function.

Backports commit cf722d75b329ef3f86b869e7e68cbfb1607b3bde
2021-03-01 17:00:49 -05:00
Peter Maydell 587c3549b7 target/arm: Implement fp16 for Neon VMLA, VMLS operations
Convert the Neon floating-point VMLA and VMLS insns over to using a
gvec helper, and use this to implement the fp16 case.

Backports e5adc70665ecaf4009c2fb8d66775ea718a85abd
2021-03-01 16:57:20 -05:00
Peter Maydell 0068d12355 target/arm: Implement fp16 for Neon VMAXNM, VMINNM
Convert the Neon floating point VMAXNM and VMINNM insns to
using a gvec helper and use this to implement the fp16 case.

Backports e22705bb941d82d6c2a09e8b2031084326902be3
2021-03-01 16:53:57 -05:00
Peter Maydell 465cfb54c4 target/arm: Implement fp16 for Neon VMAX, VMIN
Convert the Neon float-point VMAX and VMIN insns over to using
a gvec helper, and use this to implement the fp16 case.

Backport e43268c54b6cbcb197d179409df7126e81f8cd52
2021-03-01 16:50:23 -05:00
Peter Maydell 6dd4a8e93f target/arm: Implement fp16 for VACGE, VACGT
Convert the neon floating-point vector absolute comparison ops
VACGE and VACGT over to using a gvec hepler and use this to
implement the fp16 case.

Backports bb2741da186ebaebc7d5189372be4401e1ff9972
2021-03-01 16:47:44 -05:00
Peter Maydell 4eb39f1b2f target/arm: Implement fp16 for VCEQ, VCGE, VCGT comparisons
Convert the Neon floating-point vector comparison ops VCEQ,
VCGE and VCGT over to using a gvec helper and use this to
implement the fp16 case.

(We put the float16_ceq() etc functions above the DO_2OP()
macro definition because later when we convert the
compare-against-zero instructions we'll want their
definitions to be visible at that point in the source file.)

Backports ad505db233b89b7fd4b5a98b6f0e8ac8d05b11db
2021-03-01 16:44:34 -05:00
Peter Maydell 0e8fd4cd0c target/arm: Implement fp16 for Neon VABS, VNEG of floats
Rewrite Neon VABS/VNEG of floats to use gvec logical AND and XOR, so
that we can implement the fp16 version of the insns.

Backport 2b70d8cd09f5450c15788acd24f6f8bc4116c395
2021-03-01 16:40:33 -05:00
Peter Maydell 6c71951d54 target/arm: Implement fp16 for Neon VRECPE, VRSQRTE using gvec
We already have gvec helpers for floating point VRECPE and
VRQSRTE, so convert the Neon decoder to use them and
add the fp16 support.

Backports 4a15d9a3b39d4d161d7e03dfcf52e9f214eef0b8
2021-03-01 16:35:04 -05:00
Peter Maydell 4850377f01 target/arm: Implement FP16 for Neon VADD, VSUB, VABD, VMUL
Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
macro: VADD, VSUB, VABD, VMUL.

For VABD this requires us to implement a new gvec_fabd_h helper
using the machinery we have already for the other helpers.

Backport e4a6d4a69e239becfd83bdcd996476e7b8e1138d
2021-03-01 16:31:54 -05:00
Peter Maydell 08b70267d0 target/arm: Implement VFP fp16 VMOV between gp and halfprec registers
Implement the VFP fp16 variant of VMOV that transfers a 16-bit
value between a general purpose register and a VFP register.

Note that Rt == 15 is UNPREDICTABLE; since this insn is v8 and later
only we have no need to replicate the old "updates CPSR.NZCV"
behaviour that the singleprec version of this insn does

Backports commit 46a4b854525cb9f34a611f6ada6cdff1eab0ac2d
2021-03-01 16:26:34 -05:00
Peter Maydell 58485bca97 target/arm: Implement new VFP fp16 insn VMOVX
The fp16 extension includes a new instruction VMOVX, which copies the
upper 16 bits of a 32-bit source VFP register into the lower 16
bits of the destination and zeroes the high half of the destination.
Implement it.

Backports f61e5c43b86907dea17f431b528d806659d62bcb
2021-03-01 16:24:50 -05:00
Peter Maydell 3dd587e3df target/arm: Implement new VFP fp16 insn VINS
The fp16 extension includes a new instruction VINS, which copies the
lower 16 bits of a 32-bit source VFP register into the upper 16 bits
of the destination. Implement it.

Backports commit e4875e3bcc3a9c54d7e074c8f51e04c2e6364e2e
2021-03-01 16:22:27 -05:00
Peter Maydell 90aa9647e0 target/arm: Implement VFP fp16 VRINT*
Implement the fp16 version of the VFP VRINT* insns.

Backports 0a6f4b4cb338665b81ad824d9a6868932461b7f7
2021-03-01 16:15:21 -05:00
Peter Maydell 1c8088b48a target/arm: Implement VFP fp16 VSEL
Implement the fp16 versions of the VFP VSEL instruction.

Backports commit 11e78fecdf2d605cfed33aa09bbcf0cc4fb95886
2021-03-01 16:08:51 -05:00
Peter Maydell beee4ad7f3 target/arm: Implement VFP vp16 VCVT-with-specified-rounding-mode
Implement the fp16 versions of the VFP VCVT instruction forms
which convert between floating point and integer with a specified
rounding mode.

Backports c505bc6a9d50a48f9d89d6cf930e863838a5b367
2021-02-28 05:18:07 -05:00
Peter Maydell 74a6af4e23 target/arm: Implement VFP fp16 VCVT between float and fixed-point
Implement the fp16 versions of the VFP VCVT instruction forms which
convert between floating point and fixed-point.

Backports a149e2de0b63e3906729ed1d3df7d9ecdb6de5e6
2021-02-28 05:15:40 -05:00
Peter Maydell 9c5b6f06a2 target/arm: Use macros instead of open-coding fp16 conversion helpers
Now the VFP_CONV_FIX macros can handle fp16's distinction between the
width of the operation and the width of the type used to pass operands,
use the macros rather than the open-coded functions.

This creates an extra six helper functions, all of which we are going
to need for the AArch32 VFP fp16 instructions.

Backports commit 414ba270c4fb758d987adf37ae9bfe531715c604
2021-02-28 05:08:44 -05:00
Peter Maydell dd6e11eaa7 target/arm: Make VFP_CONV_FIX macros take separate float type and float size
Currently the VFP_CONV_FIX macros take a single fsz argument for the
size of the float type, which is used both to select the name of
the functions to call (eg float32_is_any_nan()) and also for the
type to use for the float inputs and outputs (eg float32).

Separate these into fsz and ftype arguments, so that we can use them
for fp16, which uses 'float16' in the function names but is still
passing inputs and outputs in a 32-bit sized type.

Backports 5366f6ad7da4f6def2733ec7ee24495430256839
2021-02-28 05:05:53 -05:00
Peter Maydell f8241ae22f target/arm: Implement VFP fp16 VCVT between float and integer
Backports 0094e9f475a5a742d10d2f1e1beceea82b69f982
2021-02-28 05:02:25 -05:00
Peter Maydell ac9ae5cbe7 target/arm: Implement VFP fp16 VLDR and VSTR
Implement the fp16 versions of the VFP VLDR/VSTR (immediate).

Backports commit 274afbb121107b8aaeaa11b3e7904d5f8ae38a94
2021-02-28 04:58:32 -05:00
Peter Maydell 5d98e14545 target/arm: Implement VFP fp16 VCMP
Implement fp16 version of VCMP.

Backports 1b88b054c5b201e8581114d29527c6a5a7e088c9
2021-02-28 04:56:24 -05:00
Peter Maydell 25d95570f3 target/arm: Implement VFP fp16 for VMOV immediate
Implement VFP fp16 support for the VMOV immediate insn.

Backports commit 28c28728e53c9f4c13a5cd50f313788c7ec2f9ad
2021-02-28 04:51:11 -05:00
Peter Maydell 2d9abf7c0b target/arm: Implement VFP fp16 for VABS, VNEG, VSQRT
Implement VFP fp16 for VABS, VNEG and VSQRT. This is all
the fp16 insns that use the DO_VFP_2OP macro, because there
is no fp16 version of VMOV_reg.

Notes:
* the gen_helper_vfp_negh already exists as we needed to create
it for the fp16 multiply-add insns
* as usual we need to use the f16 version of the fp_status;
this is only relevant for VSQRT

Backports ce2d65a5d191380756cdac7a1fd1ba76bd1621cf
2021-02-28 04:48:28 -05:00
Peter Maydell f3af6b8c25 target/arm: Macroify uses of do_vfp_2op_sp() and do_vfp_2op_dp()
Macroify the uses of do_vfp_2op_sp() and do_vfp_2op_dp(); this will
make it easier to add the halfprec support.

Backports 009a07335b8ff492d940e1eb229a1b0d302c2512
2021-02-28 04:43:01 -05:00
Peter Maydell 6ac2c597ab target/arm: Implement VFP fp16 for fused-multiply-add
Implement VFP fp16 support for fused multiply-add insns
VFNMA, VFNMS, VFMA, VFMS.

Backports 9886fe2834b064a3cf0675a4659942ed547aed42
2021-02-28 04:39:21 -05:00
Peter Maydell f86c84425b target/arm: Macroify trans functions for VFMA, VFMS, VFNMA, VFNMS
Macroify creation of the trans functions for single and double
precision VFMA, VFMS, VFNMA, VFNMS. The repetition was OK for
two sizes, but we're about to add halfprec and it will get a bit
more than seems reasonable.

Backports 2aa8dcfa14558fe2a63ed0496d60b02565c9a225
2021-02-28 04:36:07 -05:00
Peter Maydell a42ecfe203 target/arm: Implement VFP fp16 VMLA, VMLS, VNMLS, VNMLA, VNMUL
Implement fp16 versions of the VFP VMLA, VMLS, VNMLS, VNMLA, VNMUL
instructions. (These are all the remaining ones which we implement
via do_vfp_3op_[hsd]p().)

Backports commit e7cb0ded52c6d7b86585b09935fe7caeb9e38b69
2021-02-28 04:29:37 -05:00
Peter Maydell eae621098d target/arm: Implement VFP fp16 for VFP_BINOP operations
Implmeent VFP fp16 support for simple binary-operator VFP insns VADD,
VSUB, VMUL, VDIV, VMINNM and VMAXNM:

* make the VFP_BINOP() macro generate float16 helpers as well as
float32 and float64
* implement a do_vfp_3op_hp() function similar to the existing
do_vfp_3op_sp()
* add decode for the half-precision insn patterns

Note that the VFP_BINOP macro use creates a couple of unused helper
functions vfp_maxh and vfp_minh, but they're small so it's not worth
splitting the BINOP operations into "needs halfprec" and "no
halfprec" groups.

Backports commit 120a0eb3ea23a5b06fae2f3daebd46a4035864cf
2021-02-28 04:24:39 -05:00
Peter Maydell 1afb240134 target/arm: Use correct ID register check for aa32_fp16_arith
The aa32_fp16_arith feature check function currently looks at the
AArch64 ID_AA64PFR0 register. This is (as the comment notes) not
correct. The bogus check was put in mostly to allow testing of the
fp16 variants of the VCMLA instructions and it was something of
a mistake that we allowed them to exist in master.

Switch the feature check function to testing VMFR1.FPHP, which is
what it ought to be.

This will remove emulation of the VCMLA and VCADD insns from
AArch32 code running on an AArch64 '-cpu max' using system emulation.
(They were never enabled for aarch32 linux-user and system-emulation.)
Since we weren't advertising their existence via the AArch32 ID
register, well-behaved guests wouldn't have been using them anyway.

Once we have implemented all the AArch32 support for the FP16 extension
we will advertise it in the MVFR1 ID register field, which will reenable
these insns along with all the others.

Backports 02bc236d0131a666d4ac2bb7197bbad2897c336a
2021-02-27 16:47:48 -05:00
Peter Maydell b93ca1fca6 target/arm: Remove local definitions of float constants
In several places the target/arm code defines local float constants
for 2, 3 and 1.5, which are also provided by include/fpu/softfloat.h.
Remove the unnecessary local duplicate versions.

Backports b684e49a17da39539b0ac6e4c4c98b28b38feb76
2021-02-27 16:47:10 -05:00
Chen Qun 46af765bbb target/arm/translate-a64:Remove redundant statement in disas_simd_two_reg_misc_fp16()
Clang static code analyzer show warning:
target/arm/translate-a64.c:13007:5: warning: Value stored to 'rd' is never read
rd = extract32(insn, 0, 5);
^ ~~~~~~~~~~~~~~~~~~~~~
target/arm/translate-a64.c:13008:5: warning: Value stored to 'rn' is never read
rn = extract32(insn, 5, 5);
^ ~~~~~~~~~~~~~~~~~~~~~

Backports fa71dd531c12ad9a05cdd78392e9fc2a30ea921d
2021-02-27 16:45:25 -05:00
Chen Qun 9bac2113cd target/arm/translate-a64:Remove dead assignment in handle_scalar_simd_shli()
Clang static code analyzer show warning:
target/arm/translate-a64.c:8635:14: warning: Value stored to 'tcg_rn' during its
initialization is never read
TCGv_i64 tcg_rn = new_tmp_a64(s);
^~~~~~ ~~~~~~~~~~~~~~
target/arm/translate-a64.c:8636:14: warning: Value stored to 'tcg_rd' during its
initialization is never read
TCGv_i64 tcg_rd = new_tmp_a64(s);
^~~~~~ ~~~~~~~~~~~~~~

Backports 07174c86b41e91d98ed2ee0ee12e516694853c6b
2021-02-27 16:44:29 -05:00
LIU Zhiwei ad78fc2df5 softfloat: Define comparison operations for bfloat16
Backports c53b1079334c41b342a8ad3b7ccfd51bf5427f5
2021-02-27 16:43:10 -05:00
LIU Zhiwei d26cd63ad6 softfloat: Define misc operations for bfloat16
Backports 5ebf5f4be66c378fd5f3dee85f54dd4942171d57
2021-02-27 16:41:46 -05:00
LIU Zhiwei d8168a8142 softfloat: Define convert operations for bfloat16
Backports 34f0c0a98a5f3bb6706088c0384f937f7a294d3e
2021-02-27 16:37:11 -05:00
LIU Zhiwei b0be0d28cc softfloat: Define operations for bfloat16
Backports 8282310d8535cc2a8431c516e907da79f92df6eb
2021-02-26 15:20:30 -05:00
Stephen Long 95a0837f2d softfloat: Add float16_is_normal
This float16 predicate was missing from the normal set.

Backports a03e924cf8a22888060fc0de4d91de053cd5cde4
2021-02-26 15:12:37 -05:00
Frank Chang d97454eb63 softfloat: Add fp16 and uint8/int8 conversion functions
Backports 0d93d8ec632154dea2627a9e989972ee09721187
2021-02-26 15:11:57 -05:00
Kito Cheng 76d123efee softfloat: Implement the full set of comparisons for float16
Backports dd205025a048ef6f53ff51eb86ddc58e7a82a771
2021-02-26 15:04:12 -05:00
Lioncash f5a21abc0b target/arm: Convert sq{, r}dmulh to gvec for aa64 advsimd 2021-02-26 15:01:44 -05:00
Richard Henderson aa97b6b755 target/arm: Convert integer multiply-add (indexed) to gvec for aa64 advsimd
Backports 3607440c4df6498585a570cfc1041e4972b41b56
2021-02-26 14:51:17 -05:00
Richard Henderson 732674b868 target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd
Backports 2e5a265e6a9e7169c4a3e87db261b2fa92582590
2021-02-26 14:46:29 -05:00
Richard Henderson 80325ac866 target/arm: Generalize inl_qrdmlah_* helper functions
Unify add/sub helpers and add a parameter for rounding.
This will allow saturating non-rounding to reuse this code.

Backports d21798856b227a20a0a41640236af445f4f4aeb0
2021-02-26 14:41:32 -05:00
Richard Henderson 1bedcfbda3 target/arm: Tidy SVE tszimm shift formats
Rather than require the user to fill in the immediate (shl or shr),
create full formats that include the immediate.
2021-02-26 14:35:53 -05:00
Richard Henderson da41a23a1b target/arm: Split out gen_gvec_ool_zz
Backports 40e32e5a8a379baf6e0d49d83cf19950cfbaf96b
2021-02-26 14:32:36 -05:00
Richard Henderson 5bd98feed9 target/arm: Split out gen_gvec_ool_zzz
Backports e645d1a17a359156c6047006d760ca176d493edb
2021-02-26 14:29:48 -05:00
Richard Henderson aa3819c396 target/arm: Split out gen_gvec_ool_zzp
Model after gen_gvec_fn_zzz et al.

Backports 96a461f7c12587d3a64a71e4d90cda5c09ca3eb4
2021-02-26 14:26:33 -05:00
Lioncash 2da89a626c target/arm: Merge helper_sve_clr_* and helper_sve_movz_* 2021-02-26 14:23:06 -05:00
Richard Henderson 8eb3642d96 target/arm: Split out gen_gvec_ool_zzzp
Model after gen_gvec_fn_zzz et al.

Backports 36cbb7a8e7100864c488a1153cecba90b1c33a4c
2021-02-26 14:14:13 -05:00
Richard Henderson 9b3671e9ad target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp
The gvec operation was added after the initial implementation
of the SEL instruction and was missed in the conversion.

Backports d4bc623254b55e2f9613c9450216fa7e50c03929
2021-02-26 14:12:25 -05:00
Richard Henderson c8c247410f target/arm: Clean up 4-operand predicate expansion
Move the check for !S into do_pppp_flags, which allows to merge in
do_vecop4_p. Split out gen_gvec_fn_ppp without sve_access_check,
to mirror gen_gvec_fn_zzz.

Backport dd81a8d7cf5c90963603806e58a217bbe759f75e
2021-02-26 14:07:14 -05:00
Richard Henderson 7bef6489a8 target/arm: Merge do_vector2_p into do_mov_p
This is the only user of the function

Backports d0b2df5a01eeccbac71d4d883158b91e7f9a6a29
2021-02-26 13:59:00 -05:00
Richard Henderson f329d428f3 target/arm: Rearrange {sve,fp}_check_access assert
We want to ensure that access is checked by the time we ask
for a specific fp/vector register. We want to ensure that
we do not emit two lots of code to raise an exception.

But sometimes it's difficult to cleanly organize the code
such that we never pass through sve_check_access exactly once.
Allow multiple calls so long as the result is true, that is,
no exception to be raised.

Backports 8a40fe5f1bf3837ae3f9961efe1d51e7214f2664
2021-02-26 13:56:27 -05:00
Richard Henderson 64822511dd target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn
Model gen_gvec_fn_zzz on gen_gvec_fn3 in translate-a64.c, but
indicating which kind of register and in which order.

Model do_zzz_fn on the other do_foo functions that take an
argument set and verify sve enabled.

Backports 28c4da31be6a5e501b60b77bac17652dd3211378
2021-02-26 13:53:10 -05:00
Richard Henderson 3146cbb64e target/arm: Split out gen_gvec_fn_zz
Model the new function on gen_gvec_fn2 in translate-a64.c, but
indicating which kind of register and in which order. Since there
is only one user of do_vector2_z, fold it into do_mov_z

Backports f7d79c41fa4bd0f0d27dcd14babab8575fbed39f
2021-02-26 13:50:05 -05:00
Richard Henderson 234a22803d qemu/int128: Add int128_lshift
Add left-shift to match the existing right-shift.

Backports 5be4dd043f5beb5e7587d1ef8dd4e3716ec05639
2021-02-26 13:45:44 -05:00
Richard Henderson 6f341e0199 target/arm: Fill in the WnR syndrome bit in mte_check_fail
According to AArch64.TagCheckFault, none of the other ISS values are
provided, so we do not need to go so far as merge_syn_data_abort.
But we were missing the WnR bit.

Backports commit 9a4670be7f0734d27bf4058db3becf83cd0cc9d5 from qemu
2021-02-26 12:26:15 -05:00
Richard Henderson 6969435fb8 target/arm: Pass the entire mte descriptor to mte_check_fail
We need more information than just the mmu_idx in order
to create the proper exception syndrome. Only change the
function signature so far.

Backports dbf8c32178291169e111a6a9fd7ae17af4a3039d
2021-02-26 12:19:51 -05:00
Philippe Mathieu-Daudé d4c59cce4e target/arm: Clarify HCR_EL2 ARMCPRegInfo type
In commit ce4afed839 ("target/arm: Implement AArch32 HCR and HCR2")
the HCR_EL2 register has been changed from type NO_RAW (no underlying
state and does not support raw access for state saving/loading) to
type CONST (TCG can assume the value to be constant), removing the
read/write accessors.
We forgot to remove the previous type ARM_CP_NO_RAW. This is not
really a problem since the field is overwritten. However it makes
code review confuse, so remove it.

Backports 0e5aac18bc31dbdfab51f9784240d0c31a4c5579
2021-02-26 12:18:15 -05:00
Max Filippov d9e561ab2a softfloat: add xtensa specialization for pickNaNMulAdd
pickNaNMulAdd logic on Xtensa is to apply pickNaN to the inputs of the
expression (a * b) + c. However if default NaN is produces as a result
of (a * b) calculation it is not considered when c is NaN.
So with two pickNaN variants there must be two pickNaNMulAdd variants.
In addition the invalid flag is always set when (a * b) produces NaN.

Backports commit fbcc38e4cb1b539b8615ec9b0adc285351d77628 from qemu
2021-02-26 12:16:51 -05:00
Max Filippov fee4c62fe4 softfloat: pass float_status pointer to pickNaN
Pass float_status structure pointer to the pickNaN so that
machine-specific settings are available to NaN selection code.
Add use_first_nan property to float_status and use it in Xtensa-specific
pickNaN.

Backports commit 913602e3ffe6bf50b869a14028a55cb267645ba3
2021-02-26 12:16:05 -05:00
Max Filippov db780eff66 softfloat: make NO_SIGNALING_NANS runtime property
target/xtensa, the only user of NO_SIGNALING_NANS macro has FPU
implementations with and without the corresponding property. With
NO_SIGNALING_NANS being a macro they cannot be a part of the same QEMU
executable.
Replace macro with new property in float_status to allow cores with
different FPU implementations coexist.

Backports cc43c6925113c5bc8f1a0205375931d2e4807c99
2021-02-26 12:11:40 -05:00
Peter Maydell 3e5aa58139 target/arm: Use correct FPST for VCMLA, VCADD on fp16
When we implemented the VCMLA and VCADD insns we put in the
code to handle fp16, but left it using the standard fp status
flags. Correct them to use FPST_STD_F16 for fp16 operations.

Bacports commit b34aa5129e9c3aff890b4f4bcc84962e94185629
2021-02-26 12:02:23 -05:00
Peter Maydell 61377ce01c target/arm: Implement FPST_STD_F16 fpstatus
Architecturally, Neon FP16 operations use the "standard FPSCR" like
all other Neon operations. However, this is defined in the Arm ARM
pseudocode as "a fixed value, except that FZ16 (and AHP) follow the
FPSCR bits". In QEMU, the softfloat float_status doesn't include
separate flush-to-zero for FP16 operations, so we must keep separate
fp_status for "Neon non-FP16" and "Neon fp16" operations, in the
same way we do already for the non-Neon "fp_status" vs "fp_status_f16".

Add the extra float_status field to the CPU state structure,
ensure it is correctly initialized and updated on FPSCR writes,
and make fpstatus_ptr(FPST_STD_F16) return a pointer to it.

Backports commit aaae563bc73de0598bbc09a102e68f27fafe704a
2021-02-26 12:00:25 -05:00
Peter Maydell b1b0a41507 target/arm: Make A32/T32 use new fpstatus_ptr() API
Make A32/T32 code use the new fpstatus_ptr() API:
get_fpstatus_ptr(0) -> fpstatus_ptr(FPST_FPCR)
get_fpstatus_ptr(1) -> fpstatus_ptr(FPST_STD)

Backports a84d1d1316726704edd2617b2c30c921d98a8137
2021-02-26 11:55:55 -05:00
Peter Maydell 79359e3a69 target/arm: Replace A64 get_fpstatus_ptr() with generic fpstatus_ptr()
We currently have two versions of get_fpstatus_ptr(), which both take
an effectively boolean argument:
* the one for A64 takes "bool is_f16" to distinguish fp16 from other ops
* the one for A32/T32 takes "int neon" to distinguish Neon from other ops

This is confusing, and to implement ARMv8.2-FP16 the A32/T32 one will
need to make a four-way distinction between "non-Neon, FP16",
"non-Neon, single/double", "Neon, FP16" and "Neon, single/double".
The A64 version will then be a strict subset of the A32/T32 version.

To clean this all up, we want to go to a single implementation which
takes an enum argument with values FPST_FPCR, FPST_STD,
FPST_FPCR_F16, and FPST_STD_F16. We rename the function to
fpstatus_ptr() so that unconverted code gets a compilation error
rather than silently passing the wrong thing to the new function.

This commit implements that new API, and converts A64 to use it:
get_fpstatus_ptr(false) -> fpstatus_ptr(FPST_FPCR)
get_fpstatus_ptr(true) -> fpstatus_ptr(FPST_FPCR_F16)

Backports commit cdfb22bb7326fee607d9553358856cca341dbc9a
2021-02-26 11:46:51 -05:00
Peter Maydell e9240f0f54 target/arm: Delete unused ARM_FEATURE_CRC
In commit 962fcbf2efe57231a9f5df we converted the uses of the
ARM_FEATURE_CRC bit to use the aa32_crc32 isar_feature test
instead. However we forgot to remove the now-unused definition
of the feature name in the enum. Delete it now.

Backports commit cf6303d262e31f4812dfeb654c6c6803e52000af
2021-02-26 11:24:40 -05:00
Peter Maydell e0000d1700 target/arm/translate.c: Delete/amend incorrect comments
In arm_tr_init_disas_context() we have a FIXME comment that suggests
"cpu_M0 can probably be the same as cpu_V0". This isn't in fact
possible: cpu_V0 is used as a temporary inside gen_iwmmxt_shift(),
and that function is called in various places where cpu_M0 contains a
live value (i.e. between gen_op_iwmmxt_movq_M0_wRn() and
gen_op_iwmmxt_movq_wRn_M0() calls). Remove the comment.

We also have a comment on the declarations of cpu_V0/V1/M0 which
claims they're "for efficiency". This isn't true with modern TCG, so
replace this comment with one which notes that they're only used with
the iwmmxt decode

Backports 8b4c9a50dc9531a729ae4b5941d287ad0422db48
2021-02-26 11:23:52 -05:00
Peter Maydell 0759bb8eaf target/arm: Delete unused VFP_DREG macros
As part of the Neon decodetree conversion we removed all
the uses of the VFP_DREG macros, but forgot to remove the
macro definitions. Do so now.

Backports e60527c5d501e5015a119a0388a27abeae4dac09
2021-02-26 11:22:01 -05:00
Peter Maydell 368323b03f target/arm: Remove ARCH macro
The ARCH() macro was used a lot in the legacy decoder, but
there are now just two uses of it left. Since a macro which
expands out to a goto is liable to be confusing when reading
code, replace the last two uses with a simple open-coded
qeuivalent.

Backports ce51c7f522ca488c795c3510413e338021141c96
2021-02-26 11:21:20 -05:00
Peter Maydell 5d9c0addcf target/arm: Convert T32 coprocessor insns to decodetree
Convert the T32 coprocessor instructions to decodetree.
As with the A32 conversion, this corrects an underdecoding
where we did not check that MRRC/MCRR [24:21] were 0b0010
and so treated some kinds of LDC/STC and MRRC/MCRR rather
than UNDEFing them.

Backports commit 4c498dcfd84281f20bd55072630027d1b3c115fd
2021-02-26 11:19:35 -05:00
Peter Maydell bdaaac68f5 target/arm: Do M-profile NOCP checks early and via decodetree
For M-profile CPUs, the architecture specifies that the NOCP
exception when a coprocessor is not present or disabled should cover
the entire wide range of coprocessor-space encodings, and should take
precedence over UNDEF exceptions. (This is the opposite of
A-profile, where checking for a disabled FPU has to happen last.)

Implement this with decodetree patterns that cover the specified
ranges of the encoding space. There are a few instructions (VLLDM,
VLSTM, and in v8.1 also VSCCLRM) which are in copro-space but must
not be NOCP'd: these must be handled also in the new m-nocp.decode so
they take precedence.

This is a minor behaviour change: for unallocated insn patterns in
the VFP area (cp=10,11) we will now NOCP rather than UNDEF when the
FPU is disabled.

As well as giving us the correct architectural behaviour for v8.1M
and the recommended behaviour for v8.0M, this refactoring also
removes the old NOCP handling from the remains of the 'legacy
decoder' in disas_thumb2_insn(), paving the way for cleaning that up.

Since we don't currently have a v8.1M feature bit or any v8.1M CPUs,
the minor changes to this logic that we'll need for v8.1M are marked
up with TODO comments.

Backports commit a3494d4671797c291c88bd414acb0aead15f7239 from qemu
2021-02-26 11:17:23 -05:00
Peter Maydell c675b73b1f target/arm: Tidy up disas_arm_insn()
The only thing left in the "legacy decoder" is the handling
of disas_xscale_insn(), and we can simplify the code.

Backports commit 8198c071bc55bee55ef4f104a5b125f541b51096
2021-02-26 10:59:09 -05:00
Peter Maydell fc4cc9d95f target/arm: Convert A32 coprocessor insns to decodetree
Convert the A32 coprocessor instructions to decodetree.

Note that this corrects an underdecoding: for the 64-bit access case
(MRRC/MCRR) we did not check that bits [24:21] were 0b0010, so we
would incorrectly treat LDC/STC as MRRC/MCRR rather than UNDEFing
them.

The decodetree versions of these insns assume the coprocessor
is in the range 0..7 or 14..15. This is architecturally sensible
(as per the comments) and OK in practice for QEMU because the only
uses of the ARMCPRegInfo infrastructure we have that aren't
for coprocessors 14 or 15 are the pxa2xx use of coprocessor 6.
We add an assertion to the define_one_arm_cp_reg_with_opaque()
function to catch any accidental future attempts to use it to
define coprocessor registers for invalid coprocessors.

Backports commit cd8be50e58f63413c033531d3273c0e44851684f from qemu
2021-02-26 10:57:00 -05:00
Peter Maydell ef0e23f1f9 target/arm: Separate decode from handling of coproc insns
As a prelude to making coproc insns use decodetree, split out the
part of disas_coproc_insn() which does instruction decoding from the
part which does the actual work, and make do_coproc_insn() handle the
UNDEF-on-bad-permissions and similar cases itself rather than
returning 1 to eventually percolate up to a callsite that calls
unallocated_encoding() for it.

Backports 19c23a9baafc91dd3881a7a4e9bf454e42d24e4e
2021-02-26 10:53:52 -05:00
Peter Maydell 2944a75b98 target/arm: Pull handling of XScale insns out of disas_coproc_insn()
At the moment we check for XScale/iwMMXt insns inside
disas_coproc_insn(): for CPUs with ARM_FEATURE_XSCALE all copro insns
with cp 0 or 1 are handled specially. This works, but is an odd
place for this check, because disas_coproc_insn() is called from both
the Arm and Thumb decoders but the XScale case never applies for
Thumb (all the XScale CPUs were ARMv5, which has only Thumb1, not
Thumb2 with the 32-bit coprocessor insn encodings). It also makes it
awkward to convert the real copro access insns to decodetree.

Move the identification of XScale out to its own function
which is only called from disas_arm_insn().

Backports commit 7b4f933db865391a90a3b4518bb2050a83f2a873 from qemu
2021-02-26 10:50:32 -05:00
LIU Zhiwei 9b7f4b72fc target/riscv: vector single-width integer multiply instructions 2021-02-26 10:46:26 -05:00
LIU Zhiwei ab81642440 target/riscv: vector integer min/max instructions
558fa7797c919c4f21ac10980f3ed28160d6d3cb
2021-02-26 10:43:13 -05:00
LIU Zhiwei 965af9986a target/riscv: vector integer comparison instructions
1366fc79be04fa56a0e3f078ba4f26c27ac67e89
2021-02-26 10:40:33 -05:00
LIU Zhiwei 244793c4e8 target/riscv: vector single-width bit shift instructions
Backports 3277d955d21d8943d80062b4cfd8547f831dbd51
2021-02-26 10:37:09 -05:00
LIU Zhiwei 56c0e253c2 target/riscv: vector bitwise logical instructions
Backports d3842924cf93d104f691c5ea9090d6700ccef281
2021-02-26 10:30:33 -05:00
LIU Zhiwei 05153c6d7c target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
3a6f8f68ad2f4a22d9ae8287f336b5dcc80b6448
2021-02-26 10:19:48 -05:00
LIU Zhiwei b9814de4c3 target/riscv: vector widening integer add and subtract
Backports 8fcdf77630290591a6068c2d82ca2935338c3b0c
2021-02-26 10:05:43 -05:00
LIU Zhiwei f564388e89 target/riscv: vector single-width integer add and subtract
Backports 43740e3a3b3bb66456103684e622ba4e9baae297
2021-02-26 09:58:31 -05:00
LIU Zhiwei 7d0d7338c2 target/riscv: add vector amo operations
Vector AMOs operate as if aq and rl bits were zero on each element
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction

Backports 268fcca66bde62257960ec8d859de374315a5e3d
2021-02-26 09:47:32 -05:00
LIU Zhiwei 152934bade target/riscv: add fault-only-first unit stride load
The unit-stride fault-only-fault load instructions are used to
vectorize loops with data-dependent exit conditions(while loops).
These instructions execute as a regular load except that they
will only take a trap on element 0.

Backports commit 022b4ecf775ffeff522eaea4f0d94edcfe00a0a9 from qemu
2021-02-26 09:28:19 -05:00
LIU Zhiwei 887c29bc79 target/riscv: add vector index load and store instructions
Vector indexed operations add the contents of each element of the
vector offset operand specified by vs2 to the base effective address
to give the effective address of each element.

Backports f732560e3551c0823cee52efba993fbb8f689a36
2021-02-26 03:00:45 -05:00
LIU Zhiwei c7a17d04a2 target/riscv: add vector stride load and store instructions
Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.

Vector unit-stride operations access elements stored contiguously in memory
starting from the base effective address. It can been seen as a special
case of strided operations.

Backports 751538d5da557e5c10e5045c2d27639580ea54a7
2021-02-26 02:55:14 -05:00
LIU Zhiwei e4bc5056cd target/riscv: add an internals.h header
The internals.h keeps things that are not relevant to the actual architecture,
only to the implementation, separate.

Backports f476f17740ad42288d42dd8fedcdae8ca7007a16
2021-02-26 02:39:29 -05:00
LIU Zhiwei 9db3b70869 target/riscv: add vector configure instruction
vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
should update after configure instructions. The (ill, lmul, sew ) of vtype
and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.

Backports 2b7168fc43fb270fb89e1dddc17ef54714712f3a from qemu
2021-02-26 02:37:59 -05:00
LIU Zhiwei 0554e79ad1 target/riscv: support vector extension csr
The v0.7.1 specification does not define vector status within mstatus.
A future revision will define the privileged portion of the vector status.

Backports 8e3a1f18871e0ea251b95561fe1ec5a9bc896c4a from qemu
2021-02-26 02:25:58 -05:00
LIU Zhiwei bff31d8822 target/riscv: implementation-defined constant parameters
vlen is the vector register length in bits.
elen is the max element size in bits.
vext_spec is the vector specification version, default value is v0.7.1.

Backports 32931383270e2ca8209267ca99f23f3c5f780982 from qemu
2021-02-26 02:23:28 -05:00
LIU Zhiwei 0968caa249 target/riscv: add vector extension field in CPURISCVState
The 32 vector registers will be viewed as a continuous memory block.
It avoids the convension between element index and (regno, offset).
Thus elements can be directly accessed by offset from the first vector
base address.

Backports ad9e5aa2ae8032f19a8293b6b8f4661c06167bf0 from qemu
2021-02-26 02:17:49 -05:00
Peter Maydell fceb5e309a Open 5.2 development tree
Backports commit 672b2f2695891b6d818bddc3ce0df964c7627969 from qemu
2021-02-25 23:52:17 -05:00
Peter Maydell 1f497fc74a Update version for v5.1.0 release
Backports commit d0ed6a69d399ae193959225cdeaa9382746c91cc from qemu
2021-02-25 23:51:51 -05:00
Peter Maydell 3c229a2b9e Update version for v5.1.0-rc3 release 2021-02-25 23:51:33 -05:00
Peter Maydell 0718459fb3 target/arm: Fix Rt/Rt2 in ESR_ELx for copro traps from AArch32 to 64
When a coprocessor instruction in an AArch32 guest traps to AArch32
Hyp mode, the syndrome register (HSR) includes Rt and Rt2 fields
which are simply copies of the Rt and Rt2 fields from the trapped
instruction. However, if the instruction is trapped from AArch32 to
an AArch64 higher exception level, the Rt and Rt2 fields in the
syndrome register (ESR_ELx) must be the AArch64 view of the register.
This makes a difference if the AArch32 guest was in a mode other than
User or System and it was using r13 or r14, or if it was in FIQ mode
and using r8-r14.

We don't know at translate time which AArch32 CPU mode we are in, so
we leave the values we generate in our prototype syndrome register
value at translate time as the raw Rt/Rt2 from the instruction, and
instead correct them to the AArch64 view when we find we need to take
an exception from AArch32 to AArch64 with one of these syndrome
values.

Fixes: https://bugs.launchpad.net/qemu/+bug/1879587

Backports commit a65dabf71a9f9b949d556b1b57fd72595df92398 from qemu
2021-02-25 23:50:18 -05:00
Peter Collingbourne 7de60dfa51 target/arm: Fix decode of LDRA[AB] instructions
These instructions use zero as the discriminator, not SP.

Backports commit d250bb19ced3b702c7c37731855f6876d0cc7995 from qemu
2021-02-25 23:47:25 -05:00
Kaige Li 3004cc1f97 target/arm: Avoid maybe-uninitialized warning with gcc 4.9
GCC version 4.9.4 isn't clever enough to figure out that all
execution paths in disas_ldst() that use 'fn' will have initialized
it first, and so it warns:

/home/LiKaige/qemu/target/arm/translate-a64.c: In function ‘disas_ldst’:
/home/LiKaige/qemu/target/arm/translate-a64.c:3392:5: error: ‘fn’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
^
/home/LiKaige/qemu/target/arm/translate-a64.c:3318:22: note: ‘fn’ was declared here
AtomicThreeOpFn *fn;
^

Make it happy by initializing the variable to NULL.

Backports commit 88a90e3de6ae99cbcfcc04c862c51f241fdf685f from qemu
2021-02-25 23:45:13 -05:00
Richard Henderson ce8282d9cd target/arm: Fix AddPAC error indication
The definition of top_bit used in this function is one higher
than that used in the Arm ARM psuedo-code, which put the error
indication at top_bit - 1 at the wrong place, which meant that
it wasn't visible to Auth.

Fixing the definition of top_bit requires more changes, because
its most common use is for the count of bits in top_bit:bot_bit,
which would then need to be computed as top_bit - bot_bit + 1.

For now, prefer the minimal fix to the error indication alone.

Fixes: 63ff0ca94cb

Backports commit 8796fe40dd30cd9ffd3c958906471715c923b341 from qemu
2021-02-25 23:44:28 -05:00
Peter Maydell 4952920d4d Update version for v5.1.0-rc2 release
Backports commit 5772f2b1fc5d00e7e04e01fa28e9081d6550440a from qemu
2021-02-25 23:43:39 -05:00
Lioncash a1e8e0adff target/arm: Fix bad rebase within do_mem_zpz 2021-02-25 23:43:16 -05:00
Richard Henderson 5e1316a92e target/arm: Always pass cacheattr in S1_ptw_translate
When we changed the interface of get_phys_addr_lpae to require
the cacheattr parameter, this spot was missed. The compiler is
unable to detect the use of NULL vs the nonnull attribute here.

Fixes: 7e98e21c098

Backports commit a6d6f37aed4b171d121cd4a9363fbb41e90dcb53 from qemu
2021-02-25 23:40:32 -05:00
Laszlo Ersek 40c04c73b0 target/i386: floatx80: avoid compound literals in static initializers
Quoting ISO C99 6.7.8p4, "All the expressions in an initializer for an
object that has static storage duration shall be constant expressions or
string literals".

The compound literal produced by the make_floatx80() macro is not such a
constant expression, per 6.6p7-9. (An implementation may accept it,
according to 6.6p10, but is not required to.)

Therefore using "floatx80_zero" and make_floatx80() for initializing
"f2xm1_table" and "fpatan_table" is not portable. And gcc-4.8 in RHEL-7.6
actually chokes on them:

> target/i386/fpu_helper.c:871:5: error: initializer element is not constant
> { make_floatx80(0xbfff, 0x8000000000000000ULL),
> ^

We've had the make_floatx80_init() macro for this purpose since commit
3bf7e40ab914 ("softfloat: fix for C99", 2012-03-17), so let's use that
macro again.

Fixes: eca30647fc0 ("target/i386: reimplement f2xm1 using floatx80 operations")
Fixes: ff57bb7b632 ("target/i386: reimplement fpatan using floatx80 operations")

Backports commit 163b3d1af2552845a60967979aca8d78a6b1b088 from qemu
2021-02-25 23:38:54 -05:00
Richard Henderson 6390789a09 target/i386: Save cc_op before loop insns
We forgot to update cc_op before these branch insns,
which lead to losing track of the current eflags.

Buglink: https://bugs.launchpad.net/qemu/+bug/1888165

Backports commit 3cb3a7720b01830abd5fbb81819dbb9271bf7821 from qemu
2021-02-25 23:36:43 -05:00
Zong Li 001d2e6a29 target/riscv: Fix the range of pmpcfg of CSR funcion table
Backports commit 8ba26b0b2b00dd5849a6c0981e358dc7a7cc315d from qemu
2021-02-25 23:35:21 -05:00
Peter Maydell 08ce565d7c Update version for v5.1.0-rc1 release
Backports commit c8004fe6bbfc0d9c2e7b942c418a85efb3ac4b00 from qemu
2021-02-25 23:34:20 -05:00
Richard Henderson 55369d710c tcg: Save/restore vecop_list around minmax fallback
Forgetting this asserts when tcg_gen_cmp_vec is called from
within tcg_gen_cmpsel_vec.

Fixes: 72b4c792c7a

Backports commit 69c918d2ef319ac63cd759c527debc2a2bdf3a0c from qemu
2021-02-25 23:33:24 -05:00
Chenyi Qiang e5d9e0ed53 target/i386: add fast short REP MOV support
For CPUs support fast short REP MOV[CPUID.(EAX=7,ECX=0):EDX(bit4)], e.g
Icelake and Tigerlake, expose it to the guest VM.

Backports commit 5cb287d2bd578dfe4897458793b4fce35bc4f744 from qemu
2021-02-25 23:31:42 -05:00
Peter Maydell 113dc25fbf Update version for v5.1.0-rc0 release
Backports commit 8746309137ba470d1b2e8f5ce86ac228625db940 from qemu
2021-02-25 23:30:37 -05:00
Aaron Lindsay e532ce610e target/arm: Don't do raw writes for PMINTENCLR
Raw writes to this register when in KVM mode can cause interrupts to be
raised (even when the PMU is disabled). Because the underlying state is
already aliased to PMINTENSET (which already provides raw write
functions), we can safely disable raw accesses to PMINTENCLR entirely.

Backports commit 887c0f1544991f567543b7c214aa11ab0cea0a29 from qemu
2021-02-25 23:27:47 -05:00
Richard Henderson f403c1f54f target/arm: Fix mtedesc for do_mem_zpz
The mtedesc that was constructed was not actually passed in.
Found by Coverity (CID 1429996).

Backports commit cdecb3fc1eb182d90666348a47afe63c493686e7 from qemu
2021-02-25 23:25:54 -05:00
Paolo Bonzini 5b794349d3 target/i386: implement undocumented 'smsw r32' behavior
In 32-bit mode, the higher 16 bits of the destination
register are undefined. In practice CR0[31:0] is stored,
just like in 64-bit mode, so just remove the "if" that
currently differentiates the behavior.

Backports commit c0c8445255b2b5b440c355431c8b01b7b7b7c8cf from qemu
2021-02-25 23:23:51 -05:00
Joseph Myers cf54c51869 target/i386: fix IEEE SSE floating-point exception raising
The SSE instruction implementations all fail to raise the expected
IEEE floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in MXCSR.

Fix this by adding such conversions. Unlike for x87, emulated SSE
floating-point operations might be optimized using hardware floating
point on the host, and so a different approach is taken that is
compatible with such optimizations. The required invariant is that
all exceptions set in env->sse_status (other than "denormal operand",
for which the SSE semantics are different from those in the softfloat
code) are ones that are set in the MXCSR; the emulated MXCSR is
updated lazily when code reads MXCSR, while when code sets MXCSR, the
exceptions in env->sse_status are set accordingly.

A few instructions do not raise all the exceptions that would be
raised by the softfloat code, and those instructions are made to save
and restore the softfloat exception state accordingly.

Nothing is done about "denormal operand"; setting that (only for the
case when input denormals are *not* flushed to zero, the opposite of
the logic in the softfloat code for such an exception) will require
custom code for relevant instructions, or else architecture-specific
conditionals in the softfloat code for when to set such an exception
together with custom code for various SSE conversion and rounding
instructions that do not set that exception.

Nothing is done about trapping exceptions (for which there is minimal
and largely broken support in QEMU's emulation in the x87 case and no
support at all in the SSE case).

Backports commit 418b0f93d12a1589d5031405de857844f32e9ccc from qemu
2021-02-25 23:21:32 -05:00
Joseph Myers fd5b0dd456 target/i386: set SSE FTZ in correct floating-point state
The code to set floating-point state when MXCSR changes calls
set_flush_to_zero on &env->fp_status, so affecting the x87
floating-point state rather than the SSE state. Fix to call it for
&env->sse_status instead.

Backports commit 3ddc0eca2229846bfecc3485648a6cb85a466dc7 from qemu
2021-02-25 23:15:53 -05:00
Laurent Vivier c15ddf11dd softfloat,m68k: disable floatx80_invalid_encoding() for m68k
According to the comment, this definition of invalid encoding is given
by intel developer's manual, and doesn't comply with 680x0 FPU.

With m68k, the explicit integer bit can be zero in the case of:
- zeros (exp == 0, mantissa == 0)
- denormalized numbers (exp == 0, mantissa != 0)
- unnormalized numbers (exp != 0, exp < 0x7FFF)
- infinities (exp == 0x7FFF, mantissa == 0)
- not-a-numbers (exp == 0x7FFF, mantissa != 0)

For infinities and NaNs, the explicit integer bit can be either one or
zero.

The IEEE 754 standard does not define a zero integer bit. Such a number
is an unnormalized number. Hardware does not directly support
denormalized and unnormalized numbers, but implicitly supports them by
trapping them as unimplemented data types, allowing efficient conversion
in software.

See "M68000 FAMILY PROGRAMMER’S REFERENCE MANUAL",
"1.6 FLOATING-POINT DATA TYPES"

We will implement in the m68k TCG emulator the FP_UNIMP exception to
trap into the kernel to normalize the number. In case of linux-user,
the number will be normalized by QEMU.

Backports commit d159dd058c7dc48a9291fde92eaae52a9f26a4d1 from qemu
2021-02-25 23:14:47 -05:00
Mark Cave-Ayland db742bec00 target/m68k: consolidate physical translation offset into get_physical_address()
Since all callers to get_physical_address() now apply the same page offset to
the translation result, move the logic into get_physical_address() itself to
avoid duplication.

Backports commit 852002b5664bf079da05c5201dbf2345b870e5ed from qemu
2021-02-25 23:13:48 -05:00
Mark Cave-Ayland 3b2bc4b0c8 target/m68k: fix physical address translation in m68k_cpu_get_phys_page_debug()
The result of the get_physical_address() function should be combined with the
offset of the original page access before being returned. Otherwise the
m68k_cpu_get_phys_page_debug() function can round to the wrong page causing
incorrect lookups in gdbstub and various "Disassembler disagrees with
translator over instruction decoding" warnings to appear at translation time.

Fixes: 88b2fef6c3 ("target/m68k: add MC68040 MMU")
2021-02-25 23:12:12 -05:00
Richard Henderson 65d5288563 tcg: Fix do_nonatomic_op_* vs signed operations
The smin/smax/umin/umax operations require the operands to be
properly sign extended. Do not drop the MO_SIGN bit from the
load, and additionally extend the val input.

Backports commit 852f933e482518797f7785a2e017a215b88df815 from qemu
2021-02-25 23:10:40 -05:00
Richard Henderson 57c66389c2 target/arm: Fix temp double-free in sve ldr/str
The temp that gets assigned to clean_addr has been allocated with
new_tmp_a64, which means that it will be freed at the end of the
instruction. Freeing it earlier leads to assertion failure.

The loop creates a complication, in which we allocate a new local
temp, which does need freeing, and the final code path is shared
between the loop and non-loop.

Fix this complication by adding new_tmp_a64_local so that the new
local temp is freed at the end, and can be treated exactly like
the non-loop path.

Fixes: bba87d0a0f4

Backports commit 4b4dc9750a0aa0b9766bd755bf6512a84744ce8a from qemu
2021-02-25 23:10:37 -05:00
Richard Henderson 54e2107bdf target/arm: Enable MTE
We now implement all of the components of MTE, without actually
supporting any tagged memory. All MTE instructions will work,
trivially, so we can enable support.

Backports commit c7459633baa71d1781fde4a245d6ec9ce2f008cf from qemu
2021-02-25 23:00:27 -05:00
Richard Henderson a34fda25b0 target/arm: Add allocation tag storage for system mode
Look up the physical address for the given virtual address,
convert that to a tag physical address, and finally return
the host address that backs it.

Backports commit e4d5bf4fbd5abfc3727e711eda64a583cab4d637 from qemu
2021-02-25 22:58:56 -05:00
Richard Henderson 9b6c64f8f8 target/arm: Create tagged ram when MTE is enabled
Backports commit 8bce44a2f6beb388a3f157652b46e99929839a96 from qemu
2021-02-25 22:51:23 -05:00
Richard Henderson 2ea0b53c1a target/arm: Cache the Tagged bit for a page in MemTxAttrs
This "bit" is a particular value of the page's MemAttr.

Backports commit 337a03f07ff0f9e6295662f4094e03a045b60bdc from qemu
2021-02-25 22:48:04 -05:00
Richard Henderson 28cd096d67 target/arm: Always pass cacheattr to get_phys_addr
We need to check the memattr of a page in order to determine
whether it is Tagged for MTE. Between Stage1 and Stage2,
this becomes simpler if we always collect this data, instead
of occasionally being presented with NULL.

Use the nonnull attribute to allow the compiler to check that
all pointer arguments are non-null.

Backports commit 7e98e21c09871cddc20946c8f3f3595e93154ecb from qemu
2021-02-25 22:46:00 -05:00
Richard Henderson e2456a83a4 target/arm: Set PSTATE.TCO on exception entry
D1.10 specifies that exception handlers begin with tag checks overridden.

Backports commit 34669338bd9d66255fceaa84c314251ca49ca8d5 from qemu
2021-02-25 22:41:26 -05:00
Richard Henderson 35d0443056 target/arm: Implement data cache set allocation tags
This is DC GVA and DC GZVA, and the tag check for DC ZVA.

Backports commit eb821168db798302bd124a3b000cebc23bd0a395 from qemu
2021-02-25 22:40:08 -05:00
Richard Henderson 33f5bdabb1 target/arm: Complete TBI clearing for user-only for SVE
There are a number of paths by which the TBI is still intact
for user-only in the SVE helpers.

Because we currently always set TBI for user-only, we do not
need to pass down the actual TBI setting from above, and we
can remove the top byte in the inner-most primitives, so that
none are forgotten. Moreover, this keeps the "dirty" pointer
around at the higher levels, where we need it for any MTE checking.

Since the normal case, especially for user-only, goes through
RAM, this clearing merely adds two insns per page lookup, which
will be completely in the noise.

Backports commit c4af8ba19b9d22aac79cab679a20b159af9d6809 from qemu
2021-02-25 22:37:12 -05:00
Richard Henderson 732efce958 target/arm: Add mte helpers for sve scatter/gather memory ops
Because the elements are non-sequential, we cannot eliminate many
tests straight away like we can for sequential operations. But
we often have the PTE details handy, so we can test for Tagged.

Backports commit d28d12f008ee44dc2cc2ee5d8f673be9febc951e from qemu
2021-02-25 22:34:24 -05:00
Richard Henderson 5698b7badb target/arm: Handle TBI for sve scalar + int memory ops
We still need to handle tbi for user-only when mte is inactive.

Backports commit 9473d0ecafcffc8b258892b1f9f18e037bdba958 from qemu
2021-02-25 22:17:46 -05:00
Richard Henderson 586235d02d target/arm: Add mte helpers for sve scalar + int ff/nf loads
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.

Backports commit aa13f7c3c378fa41366b9fcd6c29af1c3d81126a from qemu
2021-02-25 22:09:17 -05:00
Richard Henderson cb31d54b18 target/arm: Add mte helpers for sve scalar + int stores
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.

Backports commit 71b9f3948c75bb97641a3c8c7de96d1cb47cdc07 from qemu
2021-02-25 21:53:55 -05:00
Richard Henderson 670b25c5fa target/arm: Add mte helpers for sve scalar + int loads
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.

Backports commit 206adacfb8d35e671e3619591608c475aa046b63 from qemu
2021-02-25 21:45:32 -05:00
Richard Henderson 6a78133659 target/arm: Remove sve_memopidx
None of the sve helpers use TCGMemOpIdx any longer, so we can
stop passing it.

Backports commit ba080b8682fc6bde7f2d9dedddb519d63cbe138f from qemu
2021-02-25 21:33:44 -05:00
Richard Henderson 1f306230d4 target/arm: Reuse sve_probe_page for gather loads
Backports commit 10a85e2c8ab6e004e7f3f1dcfea8cb0bf58fb9fb from qemu
2021-02-25 21:30:13 -05:00