The M-profile definition of the MVFR1 ID register differs slightly
from the A-profile one, and in particular the check for "does the CPU
support fp16 arithmetic" is not the same.
We don't currently implement any M-profile CPUs with fp16 arithmetic,
so this is not yet a visible bug, but correcting the logic now
disarms this beartrap for when we eventually do.
Backports commit dfc523a84b06b6a4b583ed4c29d24fd980dd37a0
Move the id_pfr0 and id_pfr1 fields into the ARMISARegisters
sub-struct. We're going to want id_pfr1 for an isar_features
check, and moving both at the same time avoids an odd
inconsistency.
Changes other than the ones to cpu.h and kvm64.c made
automatically with:
perl -p -i -e 's/cpu->id_pfr/cpu->isar.id_pfr/' target/arm/*.c hw/intc/armv7m_nvic.c
Backports commit 8a130a7be6e222965641e1fd9469fd3ee752c7d4
The ARM_FEATURE_PXN bit indicates whether the CPU supports the PXN
bit in short-descriptor translation table format descriptors. This
is indicated by ID_MMFR0.VMSA being at least 0b0100. Replace the
feature bit with an ID register check, in line with our preference
for ID register checks over feature bits.
Backports commit 0ae0326b984e77a55c224b7863071bd3d8951231
Per Intel SDM vol 1, 13.2, if CPUID.1:ECX.XSAVE[bit 26] is 0, the
processor provides no further enumeration through CPUID function 0DH.
QEMU does not do this for "-cpu host,-xsave".
Backports 19ca8285fcd61a8f60f2f44f789a561e0958e8e6
This check was backwards when introduced in commit
033614c47de78409ad3fb39bb7bd1483b71c6789:
target/arm: Filter cycle counter based on PMCCFILTR_EL0
Backports commit db1f3afb17269cf2bd86c222e1bced748487ef71
The VCMLA and VCADD insns have a size field which is 0 for fp16
and 1 for fp32 (note that this is the reverse of the Neon 3-same
encoding!). Convert it to MO_* values in decode for consistency.
Backports d186a4854c04e9832907b0b4240a47731da20993
Convert the insns using the 2reg_vcvt and 2reg_vcvt_f16 formats
to pass the size through to the trans function as a MO_* value
rather than the '0==f32, 1==f16' used in the fp 3-same encodings.
Backports commit 0ae715c658a02af1834b63563c56112a6d8842cb
In the Neon instructions, some instruction formats have a 2-bit size
field which corresponds exactly to QEMU's MO_8/16/32/64. However the
floating-point insns in the 3-same group have a 1-bit size field
which is "0 for 32-bit float and 1 for 16-bit float". Currently we
pass these values directly through to trans_ functions, which means
that when reading a particular trans_ function you need to know if
that insn uses a 2-bit size or a 1-bit size.
Move the handling of the 1-bit size to the decodetree file, so that
all these insns consistently pass a size to the trans_ function which
is an MO_8/16/32/64 value.
In this commit we switch over the insns using the 3same_fp and
3same_fp_q0 formats.
Backports commit 6cf0f240e0b980a877abed12d2995f740eae6515
There's a typo in the type name of AARCH64_CPU_GET_CLASS. This
was never detected because the macro is not used by any code.
Backports 37e3d65043229bb20bd07af74dc0866e12071415
Set the MVFR1 ID register FPHP and SIMDHP fields to indicate
that our "-cpu max" has v8.2-FP16.
Backports commit 5f07817eb94542e39a419baafa3026b15e8d33f7
Convert the Neon floating-point VMUL, VMLA and VMLS to use gvec,
and use this to implement fp16 support.
Backports fc8ae790311882afa3c7816df004daf978c40e9a
In the gvec helper functions for indexed operations, for AArch32
Neon the oprsz (total size of the vector) can be less than 16 bytes
if the operation is on a D reg. Since the inner loop in these
helpers always goes from 0 to segment, we must clamp it based
on oprsz to avoid processing a full 16 byte segment when asked to
handle an 8 byte wide vector.
Backports commit d7ce81e553e6789bf27657105b32575668d60b1c
Convert the Neon VRINT-with-specified-rounding-mode insns to gvec,
and use this to implement the fp16 versions.
Backports 18725916b1438b54d6d6533980833d2251a20b7c
Convert the Neon VCVT with-specified-rounding-mode instructions
to gvec, and use this to implement fp16 support for them.
Backports ca88a6efdf4ce96b646a896059f9bd324c2cebc4
Convert the Neon VCVT float<->fixed-point insns to a
gvec style, in preparation for adding fp16 support.
Backports 7b959c5890deb9a6d71bc6800006a0eae0a84c60
Convert the Neon float-integer VCVT insns to gvec, and use this
to implement fp16 support for them.
Note that unlike the VFP int<->fp16 VCVT insns we converted
earlier and which convert to/from a 32-bit integer, these
Neon insns convert to/from 16-bit integers. So we can use
the existing vfp conversion helpers for the f32<->u32/i32
case but need to provide our own for f16<->u16/i16.
Backports 7782a9afec81d1efe23572135c1ed777691ccde5
Convert the Neon pairwise fp ops to use a single gvic-style
helper to do the full operation instead of one helper call
for each 32-bit part. This allows us to use the same
framework to implement the fp16.
Backports 1dc587ee9bfe804406eb3e0bacf47a80644d8abc
Convert the Neon VRSQRTS insn to using a gvec helper,
and use this to implement the fp16 case.
As with VRECPS, we adjust the phrasing of the new implementation
slightly so that the fp32 version parallels the fp16 one.
Backports 40fde72dda2da8d55b820fa6c5efd85814be2023
Convert the Neon VRECPS insn to using a gvec helper, and
use this to implement the fp16 case.
The phrasing of the new float32_recps_nf() is slightly different from
the old recps_f32() so that it parallels the f16 version; for f16 we
can't assume that flush-to-zero is always enabled.
Backports ac8c62c4e5a3f24e6d47f52ec1bfb20994caefa5
Convert the neon floating-point vector compare-vs-0 insns VCEQ0,
VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
implement the fp16 case.
Backport 635187aaa92f21ab001e2868e803b3c5460261ca
Convert the neon floating-point vector operations VFMA and VFMS
to use a gvec helper, and use this to implement the fp16 case.
This is the last use of do_3same_fp() so we can now delete
that function.
Backports commit cf722d75b329ef3f86b869e7e68cbfb1607b3bde
Convert the Neon floating-point VMLA and VMLS insns over to using a
gvec helper, and use this to implement the fp16 case.
Backports e5adc70665ecaf4009c2fb8d66775ea718a85abd
Convert the Neon floating point VMAXNM and VMINNM insns to
using a gvec helper and use this to implement the fp16 case.
Backports e22705bb941d82d6c2a09e8b2031084326902be3
Convert the Neon float-point VMAX and VMIN insns over to using
a gvec helper, and use this to implement the fp16 case.
Backport e43268c54b6cbcb197d179409df7126e81f8cd52
Convert the neon floating-point vector absolute comparison ops
VACGE and VACGT over to using a gvec hepler and use this to
implement the fp16 case.
Backports bb2741da186ebaebc7d5189372be4401e1ff9972
Convert the Neon floating-point vector comparison ops VCEQ,
VCGE and VCGT over to using a gvec helper and use this to
implement the fp16 case.
(We put the float16_ceq() etc functions above the DO_2OP()
macro definition because later when we convert the
compare-against-zero instructions we'll want their
definitions to be visible at that point in the source file.)
Backports ad505db233b89b7fd4b5a98b6f0e8ac8d05b11db
Rewrite Neon VABS/VNEG of floats to use gvec logical AND and XOR, so
that we can implement the fp16 version of the insns.
Backport 2b70d8cd09f5450c15788acd24f6f8bc4116c395
We already have gvec helpers for floating point VRECPE and
VRQSRTE, so convert the Neon decoder to use them and
add the fp16 support.
Backports 4a15d9a3b39d4d161d7e03dfcf52e9f214eef0b8
Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
macro: VADD, VSUB, VABD, VMUL.
For VABD this requires us to implement a new gvec_fabd_h helper
using the machinery we have already for the other helpers.
Backport e4a6d4a69e239becfd83bdcd996476e7b8e1138d
Implement the VFP fp16 variant of VMOV that transfers a 16-bit
value between a general purpose register and a VFP register.
Note that Rt == 15 is UNPREDICTABLE; since this insn is v8 and later
only we have no need to replicate the old "updates CPSR.NZCV"
behaviour that the singleprec version of this insn does
Backports commit 46a4b854525cb9f34a611f6ada6cdff1eab0ac2d
The fp16 extension includes a new instruction VMOVX, which copies the
upper 16 bits of a 32-bit source VFP register into the lower 16
bits of the destination and zeroes the high half of the destination.
Implement it.
Backports f61e5c43b86907dea17f431b528d806659d62bcb
The fp16 extension includes a new instruction VINS, which copies the
lower 16 bits of a 32-bit source VFP register into the upper 16 bits
of the destination. Implement it.
Backports commit e4875e3bcc3a9c54d7e074c8f51e04c2e6364e2e
Implement the fp16 versions of the VFP VCVT instruction forms
which convert between floating point and integer with a specified
rounding mode.
Backports c505bc6a9d50a48f9d89d6cf930e863838a5b367
Implement the fp16 versions of the VFP VCVT instruction forms which
convert between floating point and fixed-point.
Backports a149e2de0b63e3906729ed1d3df7d9ecdb6de5e6
Now the VFP_CONV_FIX macros can handle fp16's distinction between the
width of the operation and the width of the type used to pass operands,
use the macros rather than the open-coded functions.
This creates an extra six helper functions, all of which we are going
to need for the AArch32 VFP fp16 instructions.
Backports commit 414ba270c4fb758d987adf37ae9bfe531715c604
Currently the VFP_CONV_FIX macros take a single fsz argument for the
size of the float type, which is used both to select the name of
the functions to call (eg float32_is_any_nan()) and also for the
type to use for the float inputs and outputs (eg float32).
Separate these into fsz and ftype arguments, so that we can use them
for fp16, which uses 'float16' in the function names but is still
passing inputs and outputs in a 32-bit sized type.
Backports 5366f6ad7da4f6def2733ec7ee24495430256839
Implement VFP fp16 for VABS, VNEG and VSQRT. This is all
the fp16 insns that use the DO_VFP_2OP macro, because there
is no fp16 version of VMOV_reg.
Notes:
* the gen_helper_vfp_negh already exists as we needed to create
it for the fp16 multiply-add insns
* as usual we need to use the f16 version of the fp_status;
this is only relevant for VSQRT
Backports ce2d65a5d191380756cdac7a1fd1ba76bd1621cf
Macroify the uses of do_vfp_2op_sp() and do_vfp_2op_dp(); this will
make it easier to add the halfprec support.
Backports 009a07335b8ff492d940e1eb229a1b0d302c2512
Macroify creation of the trans functions for single and double
precision VFMA, VFMS, VFNMA, VFNMS. The repetition was OK for
two sizes, but we're about to add halfprec and it will get a bit
more than seems reasonable.
Backports 2aa8dcfa14558fe2a63ed0496d60b02565c9a225
Implement fp16 versions of the VFP VMLA, VMLS, VNMLS, VNMLA, VNMUL
instructions. (These are all the remaining ones which we implement
via do_vfp_3op_[hsd]p().)
Backports commit e7cb0ded52c6d7b86585b09935fe7caeb9e38b69
Implmeent VFP fp16 support for simple binary-operator VFP insns VADD,
VSUB, VMUL, VDIV, VMINNM and VMAXNM:
* make the VFP_BINOP() macro generate float16 helpers as well as
float32 and float64
* implement a do_vfp_3op_hp() function similar to the existing
do_vfp_3op_sp()
* add decode for the half-precision insn patterns
Note that the VFP_BINOP macro use creates a couple of unused helper
functions vfp_maxh and vfp_minh, but they're small so it's not worth
splitting the BINOP operations into "needs halfprec" and "no
halfprec" groups.
Backports commit 120a0eb3ea23a5b06fae2f3daebd46a4035864cf
The aa32_fp16_arith feature check function currently looks at the
AArch64 ID_AA64PFR0 register. This is (as the comment notes) not
correct. The bogus check was put in mostly to allow testing of the
fp16 variants of the VCMLA instructions and it was something of
a mistake that we allowed them to exist in master.
Switch the feature check function to testing VMFR1.FPHP, which is
what it ought to be.
This will remove emulation of the VCMLA and VCADD insns from
AArch32 code running on an AArch64 '-cpu max' using system emulation.
(They were never enabled for aarch32 linux-user and system-emulation.)
Since we weren't advertising their existence via the AArch32 ID
register, well-behaved guests wouldn't have been using them anyway.
Once we have implemented all the AArch32 support for the FP16 extension
we will advertise it in the MVFR1 ID register field, which will reenable
these insns along with all the others.
Backports 02bc236d0131a666d4ac2bb7197bbad2897c336a
In several places the target/arm code defines local float constants
for 2, 3 and 1.5, which are also provided by include/fpu/softfloat.h.
Remove the unnecessary local duplicate versions.
Backports b684e49a17da39539b0ac6e4c4c98b28b38feb76
Clang static code analyzer show warning:
target/arm/translate-a64.c:13007:5: warning: Value stored to 'rd' is never read
rd = extract32(insn, 0, 5);
^ ~~~~~~~~~~~~~~~~~~~~~
target/arm/translate-a64.c:13008:5: warning: Value stored to 'rn' is never read
rn = extract32(insn, 5, 5);
^ ~~~~~~~~~~~~~~~~~~~~~
Backports fa71dd531c12ad9a05cdd78392e9fc2a30ea921d
Clang static code analyzer show warning:
target/arm/translate-a64.c:8635:14: warning: Value stored to 'tcg_rn' during its
initialization is never read
TCGv_i64 tcg_rn = new_tmp_a64(s);
^~~~~~ ~~~~~~~~~~~~~~
target/arm/translate-a64.c:8636:14: warning: Value stored to 'tcg_rd' during its
initialization is never read
TCGv_i64 tcg_rd = new_tmp_a64(s);
^~~~~~ ~~~~~~~~~~~~~~
Backports 07174c86b41e91d98ed2ee0ee12e516694853c6b
Unify add/sub helpers and add a parameter for rounding.
This will allow saturating non-rounding to reuse this code.
Backports d21798856b227a20a0a41640236af445f4f4aeb0
The gvec operation was added after the initial implementation
of the SEL instruction and was missed in the conversion.
Backports d4bc623254b55e2f9613c9450216fa7e50c03929
Move the check for !S into do_pppp_flags, which allows to merge in
do_vecop4_p. Split out gen_gvec_fn_ppp without sve_access_check,
to mirror gen_gvec_fn_zzz.
Backport dd81a8d7cf5c90963603806e58a217bbe759f75e
We want to ensure that access is checked by the time we ask
for a specific fp/vector register. We want to ensure that
we do not emit two lots of code to raise an exception.
But sometimes it's difficult to cleanly organize the code
such that we never pass through sve_check_access exactly once.
Allow multiple calls so long as the result is true, that is,
no exception to be raised.
Backports 8a40fe5f1bf3837ae3f9961efe1d51e7214f2664
Model gen_gvec_fn_zzz on gen_gvec_fn3 in translate-a64.c, but
indicating which kind of register and in which order.
Model do_zzz_fn on the other do_foo functions that take an
argument set and verify sve enabled.
Backports 28c4da31be6a5e501b60b77bac17652dd3211378
Model the new function on gen_gvec_fn2 in translate-a64.c, but
indicating which kind of register and in which order. Since there
is only one user of do_vector2_z, fold it into do_mov_z
Backports f7d79c41fa4bd0f0d27dcd14babab8575fbed39f
According to AArch64.TagCheckFault, none of the other ISS values are
provided, so we do not need to go so far as merge_syn_data_abort.
But we were missing the WnR bit.
Backports commit 9a4670be7f0734d27bf4058db3becf83cd0cc9d5 from qemu
We need more information than just the mmu_idx in order
to create the proper exception syndrome. Only change the
function signature so far.
Backports dbf8c32178291169e111a6a9fd7ae17af4a3039d
In commit ce4afed839 ("target/arm: Implement AArch32 HCR and HCR2")
the HCR_EL2 register has been changed from type NO_RAW (no underlying
state and does not support raw access for state saving/loading) to
type CONST (TCG can assume the value to be constant), removing the
read/write accessors.
We forgot to remove the previous type ARM_CP_NO_RAW. This is not
really a problem since the field is overwritten. However it makes
code review confuse, so remove it.
Backports 0e5aac18bc31dbdfab51f9784240d0c31a4c5579
When we implemented the VCMLA and VCADD insns we put in the
code to handle fp16, but left it using the standard fp status
flags. Correct them to use FPST_STD_F16 for fp16 operations.
Bacports commit b34aa5129e9c3aff890b4f4bcc84962e94185629
Architecturally, Neon FP16 operations use the "standard FPSCR" like
all other Neon operations. However, this is defined in the Arm ARM
pseudocode as "a fixed value, except that FZ16 (and AHP) follow the
FPSCR bits". In QEMU, the softfloat float_status doesn't include
separate flush-to-zero for FP16 operations, so we must keep separate
fp_status for "Neon non-FP16" and "Neon fp16" operations, in the
same way we do already for the non-Neon "fp_status" vs "fp_status_f16".
Add the extra float_status field to the CPU state structure,
ensure it is correctly initialized and updated on FPSCR writes,
and make fpstatus_ptr(FPST_STD_F16) return a pointer to it.
Backports commit aaae563bc73de0598bbc09a102e68f27fafe704a
Make A32/T32 code use the new fpstatus_ptr() API:
get_fpstatus_ptr(0) -> fpstatus_ptr(FPST_FPCR)
get_fpstatus_ptr(1) -> fpstatus_ptr(FPST_STD)
Backports a84d1d1316726704edd2617b2c30c921d98a8137
We currently have two versions of get_fpstatus_ptr(), which both take
an effectively boolean argument:
* the one for A64 takes "bool is_f16" to distinguish fp16 from other ops
* the one for A32/T32 takes "int neon" to distinguish Neon from other ops
This is confusing, and to implement ARMv8.2-FP16 the A32/T32 one will
need to make a four-way distinction between "non-Neon, FP16",
"non-Neon, single/double", "Neon, FP16" and "Neon, single/double".
The A64 version will then be a strict subset of the A32/T32 version.
To clean this all up, we want to go to a single implementation which
takes an enum argument with values FPST_FPCR, FPST_STD,
FPST_FPCR_F16, and FPST_STD_F16. We rename the function to
fpstatus_ptr() so that unconverted code gets a compilation error
rather than silently passing the wrong thing to the new function.
This commit implements that new API, and converts A64 to use it:
get_fpstatus_ptr(false) -> fpstatus_ptr(FPST_FPCR)
get_fpstatus_ptr(true) -> fpstatus_ptr(FPST_FPCR_F16)
Backports commit cdfb22bb7326fee607d9553358856cca341dbc9a
In commit 962fcbf2efe57231a9f5df we converted the uses of the
ARM_FEATURE_CRC bit to use the aa32_crc32 isar_feature test
instead. However we forgot to remove the now-unused definition
of the feature name in the enum. Delete it now.
Backports commit cf6303d262e31f4812dfeb654c6c6803e52000af
In arm_tr_init_disas_context() we have a FIXME comment that suggests
"cpu_M0 can probably be the same as cpu_V0". This isn't in fact
possible: cpu_V0 is used as a temporary inside gen_iwmmxt_shift(),
and that function is called in various places where cpu_M0 contains a
live value (i.e. between gen_op_iwmmxt_movq_M0_wRn() and
gen_op_iwmmxt_movq_wRn_M0() calls). Remove the comment.
We also have a comment on the declarations of cpu_V0/V1/M0 which
claims they're "for efficiency". This isn't true with modern TCG, so
replace this comment with one which notes that they're only used with
the iwmmxt decode
Backports 8b4c9a50dc9531a729ae4b5941d287ad0422db48
As part of the Neon decodetree conversion we removed all
the uses of the VFP_DREG macros, but forgot to remove the
macro definitions. Do so now.
Backports e60527c5d501e5015a119a0388a27abeae4dac09
The ARCH() macro was used a lot in the legacy decoder, but
there are now just two uses of it left. Since a macro which
expands out to a goto is liable to be confusing when reading
code, replace the last two uses with a simple open-coded
qeuivalent.
Backports ce51c7f522ca488c795c3510413e338021141c96
Convert the T32 coprocessor instructions to decodetree.
As with the A32 conversion, this corrects an underdecoding
where we did not check that MRRC/MCRR [24:21] were 0b0010
and so treated some kinds of LDC/STC and MRRC/MCRR rather
than UNDEFing them.
Backports commit 4c498dcfd84281f20bd55072630027d1b3c115fd
For M-profile CPUs, the architecture specifies that the NOCP
exception when a coprocessor is not present or disabled should cover
the entire wide range of coprocessor-space encodings, and should take
precedence over UNDEF exceptions. (This is the opposite of
A-profile, where checking for a disabled FPU has to happen last.)
Implement this with decodetree patterns that cover the specified
ranges of the encoding space. There are a few instructions (VLLDM,
VLSTM, and in v8.1 also VSCCLRM) which are in copro-space but must
not be NOCP'd: these must be handled also in the new m-nocp.decode so
they take precedence.
This is a minor behaviour change: for unallocated insn patterns in
the VFP area (cp=10,11) we will now NOCP rather than UNDEF when the
FPU is disabled.
As well as giving us the correct architectural behaviour for v8.1M
and the recommended behaviour for v8.0M, this refactoring also
removes the old NOCP handling from the remains of the 'legacy
decoder' in disas_thumb2_insn(), paving the way for cleaning that up.
Since we don't currently have a v8.1M feature bit or any v8.1M CPUs,
the minor changes to this logic that we'll need for v8.1M are marked
up with TODO comments.
Backports commit a3494d4671797c291c88bd414acb0aead15f7239 from qemu
The only thing left in the "legacy decoder" is the handling
of disas_xscale_insn(), and we can simplify the code.
Backports commit 8198c071bc55bee55ef4f104a5b125f541b51096
Convert the A32 coprocessor instructions to decodetree.
Note that this corrects an underdecoding: for the 64-bit access case
(MRRC/MCRR) we did not check that bits [24:21] were 0b0010, so we
would incorrectly treat LDC/STC as MRRC/MCRR rather than UNDEFing
them.
The decodetree versions of these insns assume the coprocessor
is in the range 0..7 or 14..15. This is architecturally sensible
(as per the comments) and OK in practice for QEMU because the only
uses of the ARMCPRegInfo infrastructure we have that aren't
for coprocessors 14 or 15 are the pxa2xx use of coprocessor 6.
We add an assertion to the define_one_arm_cp_reg_with_opaque()
function to catch any accidental future attempts to use it to
define coprocessor registers for invalid coprocessors.
Backports commit cd8be50e58f63413c033531d3273c0e44851684f from qemu
As a prelude to making coproc insns use decodetree, split out the
part of disas_coproc_insn() which does instruction decoding from the
part which does the actual work, and make do_coproc_insn() handle the
UNDEF-on-bad-permissions and similar cases itself rather than
returning 1 to eventually percolate up to a callsite that calls
unallocated_encoding() for it.
Backports 19c23a9baafc91dd3881a7a4e9bf454e42d24e4e
At the moment we check for XScale/iwMMXt insns inside
disas_coproc_insn(): for CPUs with ARM_FEATURE_XSCALE all copro insns
with cp 0 or 1 are handled specially. This works, but is an odd
place for this check, because disas_coproc_insn() is called from both
the Arm and Thumb decoders but the XScale case never applies for
Thumb (all the XScale CPUs were ARMv5, which has only Thumb1, not
Thumb2 with the 32-bit coprocessor insn encodings). It also makes it
awkward to convert the real copro access insns to decodetree.
Move the identification of XScale out to its own function
which is only called from disas_arm_insn().
Backports commit 7b4f933db865391a90a3b4518bb2050a83f2a873 from qemu
Vector AMOs operate as if aq and rl bits were zero on each element
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction
Backports 268fcca66bde62257960ec8d859de374315a5e3d