Commit graph

5685 commits

Author SHA1 Message Date
Frank Chang d97454eb63 softfloat: Add fp16 and uint8/int8 conversion functions
Backports 0d93d8ec632154dea2627a9e989972ee09721187
2021-02-26 15:11:57 -05:00
Kito Cheng 76d123efee softfloat: Implement the full set of comparisons for float16
Backports dd205025a048ef6f53ff51eb86ddc58e7a82a771
2021-02-26 15:04:12 -05:00
Lioncash f5a21abc0b target/arm: Convert sq{, r}dmulh to gvec for aa64 advsimd 2021-02-26 15:01:44 -05:00
Richard Henderson aa97b6b755 target/arm: Convert integer multiply-add (indexed) to gvec for aa64 advsimd
Backports 3607440c4df6498585a570cfc1041e4972b41b56
2021-02-26 14:51:17 -05:00
Richard Henderson 732674b868 target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd
Backports 2e5a265e6a9e7169c4a3e87db261b2fa92582590
2021-02-26 14:46:29 -05:00
Richard Henderson 80325ac866 target/arm: Generalize inl_qrdmlah_* helper functions
Unify add/sub helpers and add a parameter for rounding.
This will allow saturating non-rounding to reuse this code.

Backports d21798856b227a20a0a41640236af445f4f4aeb0
2021-02-26 14:41:32 -05:00
Richard Henderson 1bedcfbda3 target/arm: Tidy SVE tszimm shift formats
Rather than require the user to fill in the immediate (shl or shr),
create full formats that include the immediate.
2021-02-26 14:35:53 -05:00
Richard Henderson da41a23a1b target/arm: Split out gen_gvec_ool_zz
Backports 40e32e5a8a379baf6e0d49d83cf19950cfbaf96b
2021-02-26 14:32:36 -05:00
Richard Henderson 5bd98feed9 target/arm: Split out gen_gvec_ool_zzz
Backports e645d1a17a359156c6047006d760ca176d493edb
2021-02-26 14:29:48 -05:00
Richard Henderson aa3819c396 target/arm: Split out gen_gvec_ool_zzp
Model after gen_gvec_fn_zzz et al.

Backports 96a461f7c12587d3a64a71e4d90cda5c09ca3eb4
2021-02-26 14:26:33 -05:00
Lioncash 2da89a626c target/arm: Merge helper_sve_clr_* and helper_sve_movz_* 2021-02-26 14:23:06 -05:00
Richard Henderson 8eb3642d96 target/arm: Split out gen_gvec_ool_zzzp
Model after gen_gvec_fn_zzz et al.

Backports 36cbb7a8e7100864c488a1153cecba90b1c33a4c
2021-02-26 14:14:13 -05:00
Richard Henderson 9b3671e9ad target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp
The gvec operation was added after the initial implementation
of the SEL instruction and was missed in the conversion.

Backports d4bc623254b55e2f9613c9450216fa7e50c03929
2021-02-26 14:12:25 -05:00
Richard Henderson c8c247410f target/arm: Clean up 4-operand predicate expansion
Move the check for !S into do_pppp_flags, which allows to merge in
do_vecop4_p. Split out gen_gvec_fn_ppp without sve_access_check,
to mirror gen_gvec_fn_zzz.

Backport dd81a8d7cf5c90963603806e58a217bbe759f75e
2021-02-26 14:07:14 -05:00
Richard Henderson 7bef6489a8 target/arm: Merge do_vector2_p into do_mov_p
This is the only user of the function

Backports d0b2df5a01eeccbac71d4d883158b91e7f9a6a29
2021-02-26 13:59:00 -05:00
Richard Henderson f329d428f3 target/arm: Rearrange {sve,fp}_check_access assert
We want to ensure that access is checked by the time we ask
for a specific fp/vector register. We want to ensure that
we do not emit two lots of code to raise an exception.

But sometimes it's difficult to cleanly organize the code
such that we never pass through sve_check_access exactly once.
Allow multiple calls so long as the result is true, that is,
no exception to be raised.

Backports 8a40fe5f1bf3837ae3f9961efe1d51e7214f2664
2021-02-26 13:56:27 -05:00
Richard Henderson 64822511dd target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn
Model gen_gvec_fn_zzz on gen_gvec_fn3 in translate-a64.c, but
indicating which kind of register and in which order.

Model do_zzz_fn on the other do_foo functions that take an
argument set and verify sve enabled.

Backports 28c4da31be6a5e501b60b77bac17652dd3211378
2021-02-26 13:53:10 -05:00
Richard Henderson 3146cbb64e target/arm: Split out gen_gvec_fn_zz
Model the new function on gen_gvec_fn2 in translate-a64.c, but
indicating which kind of register and in which order. Since there
is only one user of do_vector2_z, fold it into do_mov_z

Backports f7d79c41fa4bd0f0d27dcd14babab8575fbed39f
2021-02-26 13:50:05 -05:00
Richard Henderson 234a22803d qemu/int128: Add int128_lshift
Add left-shift to match the existing right-shift.

Backports 5be4dd043f5beb5e7587d1ef8dd4e3716ec05639
2021-02-26 13:45:44 -05:00
Richard Henderson 6f341e0199 target/arm: Fill in the WnR syndrome bit in mte_check_fail
According to AArch64.TagCheckFault, none of the other ISS values are
provided, so we do not need to go so far as merge_syn_data_abort.
But we were missing the WnR bit.

Backports commit 9a4670be7f0734d27bf4058db3becf83cd0cc9d5 from qemu
2021-02-26 12:26:15 -05:00
Richard Henderson 6969435fb8 target/arm: Pass the entire mte descriptor to mte_check_fail
We need more information than just the mmu_idx in order
to create the proper exception syndrome. Only change the
function signature so far.

Backports dbf8c32178291169e111a6a9fd7ae17af4a3039d
2021-02-26 12:19:51 -05:00
Philippe Mathieu-Daudé d4c59cce4e target/arm: Clarify HCR_EL2 ARMCPRegInfo type
In commit ce4afed839 ("target/arm: Implement AArch32 HCR and HCR2")
the HCR_EL2 register has been changed from type NO_RAW (no underlying
state and does not support raw access for state saving/loading) to
type CONST (TCG can assume the value to be constant), removing the
read/write accessors.
We forgot to remove the previous type ARM_CP_NO_RAW. This is not
really a problem since the field is overwritten. However it makes
code review confuse, so remove it.

Backports 0e5aac18bc31dbdfab51f9784240d0c31a4c5579
2021-02-26 12:18:15 -05:00
Max Filippov d9e561ab2a softfloat: add xtensa specialization for pickNaNMulAdd
pickNaNMulAdd logic on Xtensa is to apply pickNaN to the inputs of the
expression (a * b) + c. However if default NaN is produces as a result
of (a * b) calculation it is not considered when c is NaN.
So with two pickNaN variants there must be two pickNaNMulAdd variants.
In addition the invalid flag is always set when (a * b) produces NaN.

Backports commit fbcc38e4cb1b539b8615ec9b0adc285351d77628 from qemu
2021-02-26 12:16:51 -05:00
Max Filippov fee4c62fe4 softfloat: pass float_status pointer to pickNaN
Pass float_status structure pointer to the pickNaN so that
machine-specific settings are available to NaN selection code.
Add use_first_nan property to float_status and use it in Xtensa-specific
pickNaN.

Backports commit 913602e3ffe6bf50b869a14028a55cb267645ba3
2021-02-26 12:16:05 -05:00
Max Filippov db780eff66 softfloat: make NO_SIGNALING_NANS runtime property
target/xtensa, the only user of NO_SIGNALING_NANS macro has FPU
implementations with and without the corresponding property. With
NO_SIGNALING_NANS being a macro they cannot be a part of the same QEMU
executable.
Replace macro with new property in float_status to allow cores with
different FPU implementations coexist.

Backports cc43c6925113c5bc8f1a0205375931d2e4807c99
2021-02-26 12:11:40 -05:00
Peter Maydell 3e5aa58139 target/arm: Use correct FPST for VCMLA, VCADD on fp16
When we implemented the VCMLA and VCADD insns we put in the
code to handle fp16, but left it using the standard fp status
flags. Correct them to use FPST_STD_F16 for fp16 operations.

Bacports commit b34aa5129e9c3aff890b4f4bcc84962e94185629
2021-02-26 12:02:23 -05:00
Peter Maydell 61377ce01c target/arm: Implement FPST_STD_F16 fpstatus
Architecturally, Neon FP16 operations use the "standard FPSCR" like
all other Neon operations. However, this is defined in the Arm ARM
pseudocode as "a fixed value, except that FZ16 (and AHP) follow the
FPSCR bits". In QEMU, the softfloat float_status doesn't include
separate flush-to-zero for FP16 operations, so we must keep separate
fp_status for "Neon non-FP16" and "Neon fp16" operations, in the
same way we do already for the non-Neon "fp_status" vs "fp_status_f16".

Add the extra float_status field to the CPU state structure,
ensure it is correctly initialized and updated on FPSCR writes,
and make fpstatus_ptr(FPST_STD_F16) return a pointer to it.

Backports commit aaae563bc73de0598bbc09a102e68f27fafe704a
2021-02-26 12:00:25 -05:00
Peter Maydell b1b0a41507 target/arm: Make A32/T32 use new fpstatus_ptr() API
Make A32/T32 code use the new fpstatus_ptr() API:
get_fpstatus_ptr(0) -> fpstatus_ptr(FPST_FPCR)
get_fpstatus_ptr(1) -> fpstatus_ptr(FPST_STD)

Backports a84d1d1316726704edd2617b2c30c921d98a8137
2021-02-26 11:55:55 -05:00
Peter Maydell 79359e3a69 target/arm: Replace A64 get_fpstatus_ptr() with generic fpstatus_ptr()
We currently have two versions of get_fpstatus_ptr(), which both take
an effectively boolean argument:
* the one for A64 takes "bool is_f16" to distinguish fp16 from other ops
* the one for A32/T32 takes "int neon" to distinguish Neon from other ops

This is confusing, and to implement ARMv8.2-FP16 the A32/T32 one will
need to make a four-way distinction between "non-Neon, FP16",
"non-Neon, single/double", "Neon, FP16" and "Neon, single/double".
The A64 version will then be a strict subset of the A32/T32 version.

To clean this all up, we want to go to a single implementation which
takes an enum argument with values FPST_FPCR, FPST_STD,
FPST_FPCR_F16, and FPST_STD_F16. We rename the function to
fpstatus_ptr() so that unconverted code gets a compilation error
rather than silently passing the wrong thing to the new function.

This commit implements that new API, and converts A64 to use it:
get_fpstatus_ptr(false) -> fpstatus_ptr(FPST_FPCR)
get_fpstatus_ptr(true) -> fpstatus_ptr(FPST_FPCR_F16)

Backports commit cdfb22bb7326fee607d9553358856cca341dbc9a
2021-02-26 11:46:51 -05:00
Peter Maydell e9240f0f54 target/arm: Delete unused ARM_FEATURE_CRC
In commit 962fcbf2efe57231a9f5df we converted the uses of the
ARM_FEATURE_CRC bit to use the aa32_crc32 isar_feature test
instead. However we forgot to remove the now-unused definition
of the feature name in the enum. Delete it now.

Backports commit cf6303d262e31f4812dfeb654c6c6803e52000af
2021-02-26 11:24:40 -05:00
Peter Maydell e0000d1700 target/arm/translate.c: Delete/amend incorrect comments
In arm_tr_init_disas_context() we have a FIXME comment that suggests
"cpu_M0 can probably be the same as cpu_V0". This isn't in fact
possible: cpu_V0 is used as a temporary inside gen_iwmmxt_shift(),
and that function is called in various places where cpu_M0 contains a
live value (i.e. between gen_op_iwmmxt_movq_M0_wRn() and
gen_op_iwmmxt_movq_wRn_M0() calls). Remove the comment.

We also have a comment on the declarations of cpu_V0/V1/M0 which
claims they're "for efficiency". This isn't true with modern TCG, so
replace this comment with one which notes that they're only used with
the iwmmxt decode

Backports 8b4c9a50dc9531a729ae4b5941d287ad0422db48
2021-02-26 11:23:52 -05:00
Peter Maydell 0759bb8eaf target/arm: Delete unused VFP_DREG macros
As part of the Neon decodetree conversion we removed all
the uses of the VFP_DREG macros, but forgot to remove the
macro definitions. Do so now.

Backports e60527c5d501e5015a119a0388a27abeae4dac09
2021-02-26 11:22:01 -05:00
Peter Maydell 368323b03f target/arm: Remove ARCH macro
The ARCH() macro was used a lot in the legacy decoder, but
there are now just two uses of it left. Since a macro which
expands out to a goto is liable to be confusing when reading
code, replace the last two uses with a simple open-coded
qeuivalent.

Backports ce51c7f522ca488c795c3510413e338021141c96
2021-02-26 11:21:20 -05:00
Peter Maydell 5d9c0addcf target/arm: Convert T32 coprocessor insns to decodetree
Convert the T32 coprocessor instructions to decodetree.
As with the A32 conversion, this corrects an underdecoding
where we did not check that MRRC/MCRR [24:21] were 0b0010
and so treated some kinds of LDC/STC and MRRC/MCRR rather
than UNDEFing them.

Backports commit 4c498dcfd84281f20bd55072630027d1b3c115fd
2021-02-26 11:19:35 -05:00
Peter Maydell bdaaac68f5 target/arm: Do M-profile NOCP checks early and via decodetree
For M-profile CPUs, the architecture specifies that the NOCP
exception when a coprocessor is not present or disabled should cover
the entire wide range of coprocessor-space encodings, and should take
precedence over UNDEF exceptions. (This is the opposite of
A-profile, where checking for a disabled FPU has to happen last.)

Implement this with decodetree patterns that cover the specified
ranges of the encoding space. There are a few instructions (VLLDM,
VLSTM, and in v8.1 also VSCCLRM) which are in copro-space but must
not be NOCP'd: these must be handled also in the new m-nocp.decode so
they take precedence.

This is a minor behaviour change: for unallocated insn patterns in
the VFP area (cp=10,11) we will now NOCP rather than UNDEF when the
FPU is disabled.

As well as giving us the correct architectural behaviour for v8.1M
and the recommended behaviour for v8.0M, this refactoring also
removes the old NOCP handling from the remains of the 'legacy
decoder' in disas_thumb2_insn(), paving the way for cleaning that up.

Since we don't currently have a v8.1M feature bit or any v8.1M CPUs,
the minor changes to this logic that we'll need for v8.1M are marked
up with TODO comments.

Backports commit a3494d4671797c291c88bd414acb0aead15f7239 from qemu
2021-02-26 11:17:23 -05:00
Peter Maydell c675b73b1f target/arm: Tidy up disas_arm_insn()
The only thing left in the "legacy decoder" is the handling
of disas_xscale_insn(), and we can simplify the code.

Backports commit 8198c071bc55bee55ef4f104a5b125f541b51096
2021-02-26 10:59:09 -05:00
Peter Maydell fc4cc9d95f target/arm: Convert A32 coprocessor insns to decodetree
Convert the A32 coprocessor instructions to decodetree.

Note that this corrects an underdecoding: for the 64-bit access case
(MRRC/MCRR) we did not check that bits [24:21] were 0b0010, so we
would incorrectly treat LDC/STC as MRRC/MCRR rather than UNDEFing
them.

The decodetree versions of these insns assume the coprocessor
is in the range 0..7 or 14..15. This is architecturally sensible
(as per the comments) and OK in practice for QEMU because the only
uses of the ARMCPRegInfo infrastructure we have that aren't
for coprocessors 14 or 15 are the pxa2xx use of coprocessor 6.
We add an assertion to the define_one_arm_cp_reg_with_opaque()
function to catch any accidental future attempts to use it to
define coprocessor registers for invalid coprocessors.

Backports commit cd8be50e58f63413c033531d3273c0e44851684f from qemu
2021-02-26 10:57:00 -05:00
Peter Maydell ef0e23f1f9 target/arm: Separate decode from handling of coproc insns
As a prelude to making coproc insns use decodetree, split out the
part of disas_coproc_insn() which does instruction decoding from the
part which does the actual work, and make do_coproc_insn() handle the
UNDEF-on-bad-permissions and similar cases itself rather than
returning 1 to eventually percolate up to a callsite that calls
unallocated_encoding() for it.

Backports 19c23a9baafc91dd3881a7a4e9bf454e42d24e4e
2021-02-26 10:53:52 -05:00
Peter Maydell 2944a75b98 target/arm: Pull handling of XScale insns out of disas_coproc_insn()
At the moment we check for XScale/iwMMXt insns inside
disas_coproc_insn(): for CPUs with ARM_FEATURE_XSCALE all copro insns
with cp 0 or 1 are handled specially. This works, but is an odd
place for this check, because disas_coproc_insn() is called from both
the Arm and Thumb decoders but the XScale case never applies for
Thumb (all the XScale CPUs were ARMv5, which has only Thumb1, not
Thumb2 with the 32-bit coprocessor insn encodings). It also makes it
awkward to convert the real copro access insns to decodetree.

Move the identification of XScale out to its own function
which is only called from disas_arm_insn().

Backports commit 7b4f933db865391a90a3b4518bb2050a83f2a873 from qemu
2021-02-26 10:50:32 -05:00
LIU Zhiwei 9b7f4b72fc target/riscv: vector single-width integer multiply instructions 2021-02-26 10:46:26 -05:00
LIU Zhiwei ab81642440 target/riscv: vector integer min/max instructions
558fa7797c919c4f21ac10980f3ed28160d6d3cb
2021-02-26 10:43:13 -05:00
LIU Zhiwei 965af9986a target/riscv: vector integer comparison instructions
1366fc79be04fa56a0e3f078ba4f26c27ac67e89
2021-02-26 10:40:33 -05:00
LIU Zhiwei 244793c4e8 target/riscv: vector single-width bit shift instructions
Backports 3277d955d21d8943d80062b4cfd8547f831dbd51
2021-02-26 10:37:09 -05:00
LIU Zhiwei 56c0e253c2 target/riscv: vector bitwise logical instructions
Backports d3842924cf93d104f691c5ea9090d6700ccef281
2021-02-26 10:30:33 -05:00
LIU Zhiwei 05153c6d7c target/riscv: vector integer add-with-carry / subtract-with-borrow instructions
3a6f8f68ad2f4a22d9ae8287f336b5dcc80b6448
2021-02-26 10:19:48 -05:00
LIU Zhiwei b9814de4c3 target/riscv: vector widening integer add and subtract
Backports 8fcdf77630290591a6068c2d82ca2935338c3b0c
2021-02-26 10:05:43 -05:00
LIU Zhiwei f564388e89 target/riscv: vector single-width integer add and subtract
Backports 43740e3a3b3bb66456103684e622ba4e9baae297
2021-02-26 09:58:31 -05:00
LIU Zhiwei 7d0d7338c2 target/riscv: add vector amo operations
Vector AMOs operate as if aq and rl bits were zero on each element
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction

Backports 268fcca66bde62257960ec8d859de374315a5e3d
2021-02-26 09:47:32 -05:00
LIU Zhiwei 152934bade target/riscv: add fault-only-first unit stride load
The unit-stride fault-only-fault load instructions are used to
vectorize loops with data-dependent exit conditions(while loops).
These instructions execute as a regular load except that they
will only take a trap on element 0.

Backports commit 022b4ecf775ffeff522eaea4f0d94edcfe00a0a9 from qemu
2021-02-26 09:28:19 -05:00
LIU Zhiwei 887c29bc79 target/riscv: add vector index load and store instructions
Vector indexed operations add the contents of each element of the
vector offset operand specified by vs2 to the base effective address
to give the effective address of each element.

Backports f732560e3551c0823cee52efba993fbb8f689a36
2021-02-26 03:00:45 -05:00
LIU Zhiwei c7a17d04a2 target/riscv: add vector stride load and store instructions
Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.

Vector unit-stride operations access elements stored contiguously in memory
starting from the base effective address. It can been seen as a special
case of strided operations.

Backports 751538d5da557e5c10e5045c2d27639580ea54a7
2021-02-26 02:55:14 -05:00
LIU Zhiwei e4bc5056cd target/riscv: add an internals.h header
The internals.h keeps things that are not relevant to the actual architecture,
only to the implementation, separate.

Backports f476f17740ad42288d42dd8fedcdae8ca7007a16
2021-02-26 02:39:29 -05:00
LIU Zhiwei 9db3b70869 target/riscv: add vector configure instruction
vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
should update after configure instructions. The (ill, lmul, sew ) of vtype
and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.

Backports 2b7168fc43fb270fb89e1dddc17ef54714712f3a from qemu
2021-02-26 02:37:59 -05:00
LIU Zhiwei 0554e79ad1 target/riscv: support vector extension csr
The v0.7.1 specification does not define vector status within mstatus.
A future revision will define the privileged portion of the vector status.

Backports 8e3a1f18871e0ea251b95561fe1ec5a9bc896c4a from qemu
2021-02-26 02:25:58 -05:00
LIU Zhiwei bff31d8822 target/riscv: implementation-defined constant parameters
vlen is the vector register length in bits.
elen is the max element size in bits.
vext_spec is the vector specification version, default value is v0.7.1.

Backports 32931383270e2ca8209267ca99f23f3c5f780982 from qemu
2021-02-26 02:23:28 -05:00
LIU Zhiwei 0968caa249 target/riscv: add vector extension field in CPURISCVState
The 32 vector registers will be viewed as a continuous memory block.
It avoids the convension between element index and (regno, offset).
Thus elements can be directly accessed by offset from the first vector
base address.

Backports ad9e5aa2ae8032f19a8293b6b8f4661c06167bf0 from qemu
2021-02-26 02:17:49 -05:00
Peter Maydell fceb5e309a Open 5.2 development tree
Backports commit 672b2f2695891b6d818bddc3ce0df964c7627969 from qemu
2021-02-25 23:52:17 -05:00
Peter Maydell 1f497fc74a Update version for v5.1.0 release
Backports commit d0ed6a69d399ae193959225cdeaa9382746c91cc from qemu
2021-02-25 23:51:51 -05:00
Peter Maydell 3c229a2b9e Update version for v5.1.0-rc3 release 2021-02-25 23:51:33 -05:00
Peter Maydell 0718459fb3 target/arm: Fix Rt/Rt2 in ESR_ELx for copro traps from AArch32 to 64
When a coprocessor instruction in an AArch32 guest traps to AArch32
Hyp mode, the syndrome register (HSR) includes Rt and Rt2 fields
which are simply copies of the Rt and Rt2 fields from the trapped
instruction. However, if the instruction is trapped from AArch32 to
an AArch64 higher exception level, the Rt and Rt2 fields in the
syndrome register (ESR_ELx) must be the AArch64 view of the register.
This makes a difference if the AArch32 guest was in a mode other than
User or System and it was using r13 or r14, or if it was in FIQ mode
and using r8-r14.

We don't know at translate time which AArch32 CPU mode we are in, so
we leave the values we generate in our prototype syndrome register
value at translate time as the raw Rt/Rt2 from the instruction, and
instead correct them to the AArch64 view when we find we need to take
an exception from AArch32 to AArch64 with one of these syndrome
values.

Fixes: https://bugs.launchpad.net/qemu/+bug/1879587

Backports commit a65dabf71a9f9b949d556b1b57fd72595df92398 from qemu
2021-02-25 23:50:18 -05:00
Peter Collingbourne 7de60dfa51 target/arm: Fix decode of LDRA[AB] instructions
These instructions use zero as the discriminator, not SP.

Backports commit d250bb19ced3b702c7c37731855f6876d0cc7995 from qemu
2021-02-25 23:47:25 -05:00
Kaige Li 3004cc1f97 target/arm: Avoid maybe-uninitialized warning with gcc 4.9
GCC version 4.9.4 isn't clever enough to figure out that all
execution paths in disas_ldst() that use 'fn' will have initialized
it first, and so it warns:

/home/LiKaige/qemu/target/arm/translate-a64.c: In function ‘disas_ldst’:
/home/LiKaige/qemu/target/arm/translate-a64.c:3392:5: error: ‘fn’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
^
/home/LiKaige/qemu/target/arm/translate-a64.c:3318:22: note: ‘fn’ was declared here
AtomicThreeOpFn *fn;
^

Make it happy by initializing the variable to NULL.

Backports commit 88a90e3de6ae99cbcfcc04c862c51f241fdf685f from qemu
2021-02-25 23:45:13 -05:00
Richard Henderson ce8282d9cd target/arm: Fix AddPAC error indication
The definition of top_bit used in this function is one higher
than that used in the Arm ARM psuedo-code, which put the error
indication at top_bit - 1 at the wrong place, which meant that
it wasn't visible to Auth.

Fixing the definition of top_bit requires more changes, because
its most common use is for the count of bits in top_bit:bot_bit,
which would then need to be computed as top_bit - bot_bit + 1.

For now, prefer the minimal fix to the error indication alone.

Fixes: 63ff0ca94cb

Backports commit 8796fe40dd30cd9ffd3c958906471715c923b341 from qemu
2021-02-25 23:44:28 -05:00
Peter Maydell 4952920d4d Update version for v5.1.0-rc2 release
Backports commit 5772f2b1fc5d00e7e04e01fa28e9081d6550440a from qemu
2021-02-25 23:43:39 -05:00
Lioncash a1e8e0adff target/arm: Fix bad rebase within do_mem_zpz 2021-02-25 23:43:16 -05:00
Richard Henderson 5e1316a92e target/arm: Always pass cacheattr in S1_ptw_translate
When we changed the interface of get_phys_addr_lpae to require
the cacheattr parameter, this spot was missed. The compiler is
unable to detect the use of NULL vs the nonnull attribute here.

Fixes: 7e98e21c098

Backports commit a6d6f37aed4b171d121cd4a9363fbb41e90dcb53 from qemu
2021-02-25 23:40:32 -05:00
Laszlo Ersek 40c04c73b0 target/i386: floatx80: avoid compound literals in static initializers
Quoting ISO C99 6.7.8p4, "All the expressions in an initializer for an
object that has static storage duration shall be constant expressions or
string literals".

The compound literal produced by the make_floatx80() macro is not such a
constant expression, per 6.6p7-9. (An implementation may accept it,
according to 6.6p10, but is not required to.)

Therefore using "floatx80_zero" and make_floatx80() for initializing
"f2xm1_table" and "fpatan_table" is not portable. And gcc-4.8 in RHEL-7.6
actually chokes on them:

> target/i386/fpu_helper.c:871:5: error: initializer element is not constant
> { make_floatx80(0xbfff, 0x8000000000000000ULL),
> ^

We've had the make_floatx80_init() macro for this purpose since commit
3bf7e40ab914 ("softfloat: fix for C99", 2012-03-17), so let's use that
macro again.

Fixes: eca30647fc0 ("target/i386: reimplement f2xm1 using floatx80 operations")
Fixes: ff57bb7b632 ("target/i386: reimplement fpatan using floatx80 operations")

Backports commit 163b3d1af2552845a60967979aca8d78a6b1b088 from qemu
2021-02-25 23:38:54 -05:00
Richard Henderson 6390789a09 target/i386: Save cc_op before loop insns
We forgot to update cc_op before these branch insns,
which lead to losing track of the current eflags.

Buglink: https://bugs.launchpad.net/qemu/+bug/1888165

Backports commit 3cb3a7720b01830abd5fbb81819dbb9271bf7821 from qemu
2021-02-25 23:36:43 -05:00
Zong Li 001d2e6a29 target/riscv: Fix the range of pmpcfg of CSR funcion table
Backports commit 8ba26b0b2b00dd5849a6c0981e358dc7a7cc315d from qemu
2021-02-25 23:35:21 -05:00
Peter Maydell 08ce565d7c Update version for v5.1.0-rc1 release
Backports commit c8004fe6bbfc0d9c2e7b942c418a85efb3ac4b00 from qemu
2021-02-25 23:34:20 -05:00
Richard Henderson 55369d710c tcg: Save/restore vecop_list around minmax fallback
Forgetting this asserts when tcg_gen_cmp_vec is called from
within tcg_gen_cmpsel_vec.

Fixes: 72b4c792c7a

Backports commit 69c918d2ef319ac63cd759c527debc2a2bdf3a0c from qemu
2021-02-25 23:33:24 -05:00
Chenyi Qiang e5d9e0ed53 target/i386: add fast short REP MOV support
For CPUs support fast short REP MOV[CPUID.(EAX=7,ECX=0):EDX(bit4)], e.g
Icelake and Tigerlake, expose it to the guest VM.

Backports commit 5cb287d2bd578dfe4897458793b4fce35bc4f744 from qemu
2021-02-25 23:31:42 -05:00
Peter Maydell 113dc25fbf Update version for v5.1.0-rc0 release
Backports commit 8746309137ba470d1b2e8f5ce86ac228625db940 from qemu
2021-02-25 23:30:37 -05:00
Aaron Lindsay e532ce610e target/arm: Don't do raw writes for PMINTENCLR
Raw writes to this register when in KVM mode can cause interrupts to be
raised (even when the PMU is disabled). Because the underlying state is
already aliased to PMINTENSET (which already provides raw write
functions), we can safely disable raw accesses to PMINTENCLR entirely.

Backports commit 887c0f1544991f567543b7c214aa11ab0cea0a29 from qemu
2021-02-25 23:27:47 -05:00
Richard Henderson f403c1f54f target/arm: Fix mtedesc for do_mem_zpz
The mtedesc that was constructed was not actually passed in.
Found by Coverity (CID 1429996).

Backports commit cdecb3fc1eb182d90666348a47afe63c493686e7 from qemu
2021-02-25 23:25:54 -05:00
Paolo Bonzini 5b794349d3 target/i386: implement undocumented 'smsw r32' behavior
In 32-bit mode, the higher 16 bits of the destination
register are undefined. In practice CR0[31:0] is stored,
just like in 64-bit mode, so just remove the "if" that
currently differentiates the behavior.

Backports commit c0c8445255b2b5b440c355431c8b01b7b7b7c8cf from qemu
2021-02-25 23:23:51 -05:00
Joseph Myers cf54c51869 target/i386: fix IEEE SSE floating-point exception raising
The SSE instruction implementations all fail to raise the expected
IEEE floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in MXCSR.

Fix this by adding such conversions. Unlike for x87, emulated SSE
floating-point operations might be optimized using hardware floating
point on the host, and so a different approach is taken that is
compatible with such optimizations. The required invariant is that
all exceptions set in env->sse_status (other than "denormal operand",
for which the SSE semantics are different from those in the softfloat
code) are ones that are set in the MXCSR; the emulated MXCSR is
updated lazily when code reads MXCSR, while when code sets MXCSR, the
exceptions in env->sse_status are set accordingly.

A few instructions do not raise all the exceptions that would be
raised by the softfloat code, and those instructions are made to save
and restore the softfloat exception state accordingly.

Nothing is done about "denormal operand"; setting that (only for the
case when input denormals are *not* flushed to zero, the opposite of
the logic in the softfloat code for such an exception) will require
custom code for relevant instructions, or else architecture-specific
conditionals in the softfloat code for when to set such an exception
together with custom code for various SSE conversion and rounding
instructions that do not set that exception.

Nothing is done about trapping exceptions (for which there is minimal
and largely broken support in QEMU's emulation in the x87 case and no
support at all in the SSE case).

Backports commit 418b0f93d12a1589d5031405de857844f32e9ccc from qemu
2021-02-25 23:21:32 -05:00
Joseph Myers fd5b0dd456 target/i386: set SSE FTZ in correct floating-point state
The code to set floating-point state when MXCSR changes calls
set_flush_to_zero on &env->fp_status, so affecting the x87
floating-point state rather than the SSE state. Fix to call it for
&env->sse_status instead.

Backports commit 3ddc0eca2229846bfecc3485648a6cb85a466dc7 from qemu
2021-02-25 23:15:53 -05:00
Laurent Vivier c15ddf11dd softfloat,m68k: disable floatx80_invalid_encoding() for m68k
According to the comment, this definition of invalid encoding is given
by intel developer's manual, and doesn't comply with 680x0 FPU.

With m68k, the explicit integer bit can be zero in the case of:
- zeros (exp == 0, mantissa == 0)
- denormalized numbers (exp == 0, mantissa != 0)
- unnormalized numbers (exp != 0, exp < 0x7FFF)
- infinities (exp == 0x7FFF, mantissa == 0)
- not-a-numbers (exp == 0x7FFF, mantissa != 0)

For infinities and NaNs, the explicit integer bit can be either one or
zero.

The IEEE 754 standard does not define a zero integer bit. Such a number
is an unnormalized number. Hardware does not directly support
denormalized and unnormalized numbers, but implicitly supports them by
trapping them as unimplemented data types, allowing efficient conversion
in software.

See "M68000 FAMILY PROGRAMMER’S REFERENCE MANUAL",
"1.6 FLOATING-POINT DATA TYPES"

We will implement in the m68k TCG emulator the FP_UNIMP exception to
trap into the kernel to normalize the number. In case of linux-user,
the number will be normalized by QEMU.

Backports commit d159dd058c7dc48a9291fde92eaae52a9f26a4d1 from qemu
2021-02-25 23:14:47 -05:00
Mark Cave-Ayland db742bec00 target/m68k: consolidate physical translation offset into get_physical_address()
Since all callers to get_physical_address() now apply the same page offset to
the translation result, move the logic into get_physical_address() itself to
avoid duplication.

Backports commit 852002b5664bf079da05c5201dbf2345b870e5ed from qemu
2021-02-25 23:13:48 -05:00
Mark Cave-Ayland 3b2bc4b0c8 target/m68k: fix physical address translation in m68k_cpu_get_phys_page_debug()
The result of the get_physical_address() function should be combined with the
offset of the original page access before being returned. Otherwise the
m68k_cpu_get_phys_page_debug() function can round to the wrong page causing
incorrect lookups in gdbstub and various "Disassembler disagrees with
translator over instruction decoding" warnings to appear at translation time.

Fixes: 88b2fef6c3 ("target/m68k: add MC68040 MMU")
2021-02-25 23:12:12 -05:00
Richard Henderson 65d5288563 tcg: Fix do_nonatomic_op_* vs signed operations
The smin/smax/umin/umax operations require the operands to be
properly sign extended. Do not drop the MO_SIGN bit from the
load, and additionally extend the val input.

Backports commit 852f933e482518797f7785a2e017a215b88df815 from qemu
2021-02-25 23:10:40 -05:00
Richard Henderson 57c66389c2 target/arm: Fix temp double-free in sve ldr/str
The temp that gets assigned to clean_addr has been allocated with
new_tmp_a64, which means that it will be freed at the end of the
instruction. Freeing it earlier leads to assertion failure.

The loop creates a complication, in which we allocate a new local
temp, which does need freeing, and the final code path is shared
between the loop and non-loop.

Fix this complication by adding new_tmp_a64_local so that the new
local temp is freed at the end, and can be treated exactly like
the non-loop path.

Fixes: bba87d0a0f4

Backports commit 4b4dc9750a0aa0b9766bd755bf6512a84744ce8a from qemu
2021-02-25 23:10:37 -05:00
Richard Henderson 54e2107bdf target/arm: Enable MTE
We now implement all of the components of MTE, without actually
supporting any tagged memory. All MTE instructions will work,
trivially, so we can enable support.

Backports commit c7459633baa71d1781fde4a245d6ec9ce2f008cf from qemu
2021-02-25 23:00:27 -05:00
Richard Henderson a34fda25b0 target/arm: Add allocation tag storage for system mode
Look up the physical address for the given virtual address,
convert that to a tag physical address, and finally return
the host address that backs it.

Backports commit e4d5bf4fbd5abfc3727e711eda64a583cab4d637 from qemu
2021-02-25 22:58:56 -05:00
Richard Henderson 9b6c64f8f8 target/arm: Create tagged ram when MTE is enabled
Backports commit 8bce44a2f6beb388a3f157652b46e99929839a96 from qemu
2021-02-25 22:51:23 -05:00
Richard Henderson 2ea0b53c1a target/arm: Cache the Tagged bit for a page in MemTxAttrs
This "bit" is a particular value of the page's MemAttr.

Backports commit 337a03f07ff0f9e6295662f4094e03a045b60bdc from qemu
2021-02-25 22:48:04 -05:00
Richard Henderson 28cd096d67 target/arm: Always pass cacheattr to get_phys_addr
We need to check the memattr of a page in order to determine
whether it is Tagged for MTE. Between Stage1 and Stage2,
this becomes simpler if we always collect this data, instead
of occasionally being presented with NULL.

Use the nonnull attribute to allow the compiler to check that
all pointer arguments are non-null.

Backports commit 7e98e21c09871cddc20946c8f3f3595e93154ecb from qemu
2021-02-25 22:46:00 -05:00
Richard Henderson e2456a83a4 target/arm: Set PSTATE.TCO on exception entry
D1.10 specifies that exception handlers begin with tag checks overridden.

Backports commit 34669338bd9d66255fceaa84c314251ca49ca8d5 from qemu
2021-02-25 22:41:26 -05:00
Richard Henderson 35d0443056 target/arm: Implement data cache set allocation tags
This is DC GVA and DC GZVA, and the tag check for DC ZVA.

Backports commit eb821168db798302bd124a3b000cebc23bd0a395 from qemu
2021-02-25 22:40:08 -05:00
Richard Henderson 33f5bdabb1 target/arm: Complete TBI clearing for user-only for SVE
There are a number of paths by which the TBI is still intact
for user-only in the SVE helpers.

Because we currently always set TBI for user-only, we do not
need to pass down the actual TBI setting from above, and we
can remove the top byte in the inner-most primitives, so that
none are forgotten. Moreover, this keeps the "dirty" pointer
around at the higher levels, where we need it for any MTE checking.

Since the normal case, especially for user-only, goes through
RAM, this clearing merely adds two insns per page lookup, which
will be completely in the noise.

Backports commit c4af8ba19b9d22aac79cab679a20b159af9d6809 from qemu
2021-02-25 22:37:12 -05:00
Richard Henderson 732efce958 target/arm: Add mte helpers for sve scatter/gather memory ops
Because the elements are non-sequential, we cannot eliminate many
tests straight away like we can for sequential operations. But
we often have the PTE details handy, so we can test for Tagged.

Backports commit d28d12f008ee44dc2cc2ee5d8f673be9febc951e from qemu
2021-02-25 22:34:24 -05:00
Richard Henderson 5698b7badb target/arm: Handle TBI for sve scalar + int memory ops
We still need to handle tbi for user-only when mte is inactive.

Backports commit 9473d0ecafcffc8b258892b1f9f18e037bdba958 from qemu
2021-02-25 22:17:46 -05:00
Richard Henderson 586235d02d target/arm: Add mte helpers for sve scalar + int ff/nf loads
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.

Backports commit aa13f7c3c378fa41366b9fcd6c29af1c3d81126a from qemu
2021-02-25 22:09:17 -05:00
Richard Henderson cb31d54b18 target/arm: Add mte helpers for sve scalar + int stores
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.

Backports commit 71b9f3948c75bb97641a3c8c7de96d1cb47cdc07 from qemu
2021-02-25 21:53:55 -05:00
Richard Henderson 670b25c5fa target/arm: Add mte helpers for sve scalar + int loads
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.

Backports commit 206adacfb8d35e671e3619591608c475aa046b63 from qemu
2021-02-25 21:45:32 -05:00
Richard Henderson 6a78133659 target/arm: Remove sve_memopidx
None of the sve helpers use TCGMemOpIdx any longer, so we can
stop passing it.

Backports commit ba080b8682fc6bde7f2d9dedddb519d63cbe138f from qemu
2021-02-25 21:33:44 -05:00
Richard Henderson 1f306230d4 target/arm: Reuse sve_probe_page for gather loads
Backports commit 10a85e2c8ab6e004e7f3f1dcfea8cb0bf58fb9fb from qemu
2021-02-25 21:30:13 -05:00
Richard Henderson 585da952ec target/arm: Reuse sve_probe_page for scatter stores
Backports commit 88a660a48ef513ce9875b595e19b2a820b3f3fca from qemu
2021-02-25 21:27:14 -05:00
Richard Henderson 3eee880c2a target/arm: Reuse sve_probe_page for gather first-fault loads
This avoids the need for a separate set of helpers to implement
no-fault semantics, and will enable MTE in the future.

Backports commit 50de9b78cec06e6d16e92a114a505779359ca532 from qemu
2021-02-25 21:22:16 -05:00
Richard Henderson b1e31f3bf3 target/arm: Use SVEContLdSt for contiguous stores
Follow the model set up for contiguous loads. This handles
watchpoints correctly for contiguous stores, recognizing the
exception before any changes to memory.

Backports commit 0fa476c1bb37a70df7eeff1e5bfb4791feb37e0e from qemu
2021-02-25 21:15:14 -05:00
Richard Henderson 3591c2f548 target/arm: Update contiguous first-fault and no-fault loads
With sve_cont_ldst_pages, the differences between first-fault and no-fault
are minimal, so unify the routines. With cpu_probe_watchpoint, we are able
to make progress through pages with TLB_WATCHPOINT set when the watchpoint
does not actually fire.

Backports commit c647673ce4d72a8789703c62a7f3cbc732cb1ea8 from qemu
2021-02-25 21:06:14 -05:00
Richard Henderson 6c9304448e target/arm: Use SVEContLdSt for multi-register contiguous loads
Backports commit 5c9b8458a0b3008d24d84b67e1c9b6d5f39f4d66 from qemu
2021-02-25 20:50:22 -05:00
Richard Henderson 3979c8f73e target/arm: Handle watchpoints in sve_ld1_r
Handle all of the watchpoints for active elements all at once,
before we've modified the vector register. This removes the
TLB_WATCHPOINT bit from page[].flags, which means that we can
use the normal fast path via RAM.

Backports commit 4bcc3f0ff8e5ae2b17b5aab9aa613ff1b8025896 from qemu
2021-02-25 20:44:13 -05:00
Richard Henderson 0e5aa37c9a target/arm: Use SVEContLdSt in sve_ld1_r
First use of the new helper functions, so we can remove the
unused markup. No longer need a scratch for user-only, as
we completely probe the page set before reading; system mode
still requires a scratch for MMIO.

Backports commit b854fd06a868e0308bcfe05ad0a71210705814c7 from qemu
2021-02-25 20:41:53 -05:00
Richard Henderson d363c3d0ba target/arm: Adjust interface of sve_ld1_host_fn
The current interface includes a loop; change it to load a
single element. We will then be able to use the function
for ld{2,3,4} where individual vector elements are not adjacent.

Replace each call with the simplest possible loop over active
elements.

Backports commit cf4a49b71b1712142d7122025a8ca7ea5b59d73f from qemu
2021-02-25 20:34:18 -05:00
Richard Henderson 94b0876f15 target/arm: Add sve infrastructure for page lookup
For contiguous predicated memory operations, we want to
minimize the number of tlb lookups performed. We have
open-coded this for sve_ld1_r, but for correctness with
MTE we will need this for all of the memory operations.

Create a structure that holds the bounds of active elements,
and metadata for two pages. Add routines to find those
active elements, lookup the pages, and run watchpoints
for those pages.

Temporarily mark the functions unused to avoid Werror.

Backports commit b4cd95d2f4c7197b844f51b29871d888063ea3e7 from qemu
2021-02-25 20:28:23 -05:00
Richard Henderson f430a399d4 target/arm: Drop manual handling of set/clear_helper_retaddr
Since we converted back to cpu_*_data_ra, we do not need to
do this ourselves.

Backports commit f32e2ab65f3a0fc03d58936709e5a565c4b0db50 from qemu
2021-02-25 20:20:29 -05:00
Richard Henderson 2e03f74a53 target/arm: Use cpu_*_data_ra for sve_ldst_tlb_fn
Use the "normal" memory access functions, rather than the
softmmu internal helper functions directly.

Since fb901c9, cpu_mem_index is now a simple extract
from env->hflags and not a large computation.  Which means
that it's now more work to pass around this value than it
is to recompute it.

This only adjusts the primitives, and does not clean up
all of the uses within sve_helper.c.
2021-02-25 20:16:38 -05:00
Richard Henderson 84012be55c target/arm: Add arm_tlb_bti_gp
Introduce an lvalue macro to wrap target_tlb_bit0.

Backports commit 149d3b31f3f0f7f9e1c3a77043450a95c7a7e93d from qemu
2021-02-25 17:45:50 -05:00
Richard Henderson a9eb62d211 target/arm: Tidy trans_LD1R_zpri
Move the variable declarations to the top of the function,
but do not create a new label before sve_access_check.

Backports commit c0ed9166b1aea86a2fbaada1195aacd1049f9e85 from qemu
2021-02-25 17:42:40 -05:00
Richard Henderson 49bd9a5c68 target/arm: Use mte_checkN for sve unpredicated stores
Backports commit bba87d0a0f480805223a6428a7942a51733c488a from qemu
2021-02-25 17:40:43 -05:00
Richard Henderson 3ce14ebc78 target/arm: Use mte_checkN for sve unpredicated loads
Backports commit b2aa8879b884cd66acde4123899dd92a38fe6527 from qemu
2021-02-25 17:26:37 -05:00
Richard Henderson 4fdd05e1aa target/arm: Add helper_mte_check_zva
Use a special helper for DC_ZVA, rather than the more
general mte_checkN.

Backports commit 46dc1bc0601554823a42ad27f236da2ad8f3bdc6 from qemu
2021-02-25 17:17:54 -05:00
Richard Henderson 9a05ca01e7 target/arm: Implement helper_mte_checkN
Fill out the stub that was added earlier.

Backports commit 5add8248556a3c1006018d7d8e601c9572b280a9 from qemu
2021-02-25 17:10:56 -05:00
Richard Henderson 91e2f55b69 target/arm: Implement helper_mte_check1
Fill out the stub that was added earlier.

Backports commit 2e34ff45f32cb032883616a1cc5ea8ac96f546d5 from qemu
2021-02-25 17:02:34 -05:00
Richard Henderson 3e786526cf target/arm: Add gen_mte_checkN
Replace existing uses of check_data_tbi in translate-a64.c that
perform multiple logical memory access. Leave the helper blank
for now to reduce the patch size.

Backports commit 73ceeb0011b23bac8bd2c09ebe3c18d034aa69ce from qemu
2021-02-25 16:40:16 -05:00
Richard Henderson 582e64f348 target/arm: Add gen_mte_check1
Replace existing uses of check_data_tbi in translate-a64.c that
perform a single logical memory access. Leave the helper blank
for now to reduce the patch size.

Backports commit 0a405be2b8fd9506a009b10d7d2d98c394b36db6 from qemu
2021-02-25 16:13:31 -05:00
Richard Henderson 4488858072 target/arm: Move regime_tcr to internals.h
We will shortly need this in mte_helper.c as well.

Backports commit 38659d311d05e6c5feff6bddcc1c33b60d3b86a1 from qemu
2021-02-25 16:04:54 -05:00
Richard Henderson e46cec8543 target/arm: Move regime_el to internals.h
We will shortly need this in mte_helper.c as well.

Backports commit 9c7ab8fc8cb6d6e2fb7a82c1088691c7c23fa1b9 from qemu
2021-02-25 16:04:12 -05:00
Richard Henderson 7147f5f28a target/arm: Implement the access tag cache flushes
Like the regular data cache flushes, these are nops within qemu.

Backports commit 5463df160ecee510e78493993eb1bd38b4838a10 from qemu
2021-02-25 16:02:30 -05:00
Richard Henderson 4bb37fc3c1 target/arm: Implement the LDGM, STGM, STZGM instructions
Backports commit 5f716a82388eb09754dd900e7dbb8ffa15897a28 from qemu
2021-02-25 16:00:50 -05:00
Richard Henderson 5b3ddcf2e2 target/arm: Simplify DC_ZVA
Now that we know that the operation is on a single page,
we need not loop over pages while probing.

Backports commit e26d0d226892f67435cadcce86df0ddfb9943174 from qemu
2021-02-25 15:55:46 -05:00
Richard Henderson 7de60598d5 target/arm: Restrict the values of DCZID.BS under TCG
We can simplify our DC_ZVA if we recognize that the largest BS
that we actually use in system mode is 64. Let us just assert
that it fits within TARGET_PAGE_SIZE.

For DC_GVA and STZGM, we want to be able to write whole bytes
of tag memory, so assert that BS is >= 2 * TAG_GRANULE, or 32.

Backports commit a4157b80242bf1c8aa0ee77aae7458ba79012d5d from qemu
2021-02-25 15:12:20 -05:00
Richard Henderson e15aa7c5a7 target/arm: Implement the STGP instruction
Backports commit 6439d67fc944cf29de94a160e9450a2063c7b515 from qemu
2021-02-25 15:10:40 -05:00
Richard Henderson e8b9cb8b4a target/arm: Implement LDG, STG, ST2G instructions
Backports commit c15294c1e36a7dd9b25bd54d98178e80f4b64bc1 from qemu
2021-02-25 15:08:44 -05:00
Richard Henderson 448fc3ae4a target/arm: Define arm_cpu_do_unaligned_access for user-only
Use the same code as system mode, so that we generate the same
exception + syndrome for the unaligned access.

For the moment, if MTE is enabled so that this path is reachable,
this would generate a SIGSEGV in the user-only cpu_loop. Decoding
the syndrome to produce the proper SIGBUS will be done later.

Backports commit 0d1762e931f8a694f261c604daba605bcda70928 from qemu
2021-02-25 14:51:19 -05:00
Richard Henderson 43ea7aa828 target/arm: Implement the SUBP instruction
Backports commit dad3015f55f8d48f84f0eae36021a9c6f9587e57 from qemu
2021-02-25 14:46:00 -05:00
Richard Henderson 13fd83fcc9 target/arm: Implement the GMI instruction
Backports commit 438efea0bb639c9c2dfb42c8d9459e21aa183c8a from qemu
2021-02-25 14:44:29 -05:00
Richard Henderson 911a6b57ed target/arm: Implement the ADDG, SUBG instructions
Backports commit efbc78ad978763aedd11cb718eb1ff8db3fc9152 from qemu
2021-02-25 14:42:33 -05:00
Richard Henderson acd7e4cb18 target/arm: Revise decoding for disas_add_sub_imm
The current Arm ARM has adjusted the official decode of
"Add/subtract (immediate)" so that the shift field is only bit 22,
and bit 23 is part of the op1 field of the parent category
"Data processing - immediate".

Backports commit 21a8b343eaae63f6984f9a200092b0ea167647f1 from qemu
2021-02-25 14:38:46 -05:00
Richard Henderson 58f3dd2cc7 target/arm: Implement the IRG instruction
Backports commit da54941f45b820cbaca72aa6efd5669b3dc86e2f from qemu
2021-02-25 14:36:11 -05:00
Richard Henderson 6bec295bf8 target/arm: Add MTE bits to tb_flags
Cache the composite ATA setting.

Cache when MTE is fully enabled, i.e. access to tags are enabled
and tag checks affect the PE. Do this for both the normal context
and the UNPRIV context.

Backports commit 81ae05fa2d21ac1a0054935b74342aa38a5ecef7 from qemu
2021-02-25 14:31:41 -05:00
Richard Henderson f6be2a1a42 target/arm: Add MTE system registers
This is TFSRE0_EL1, TFSR_EL1, TFSR_EL2, TFSR_EL3,
RGSR_EL1, GCR_EL1, GMID_EL1, and PSTATE.TCO.

Backports commit 4b779cebb3e5ab30b945181f1ba3932f5f8a1cb5 from qemu
2021-02-25 14:12:24 -05:00
Richard Henderson 179a3aacdf target/arm: Add DISAS_UPDATE_NOCHAIN
Add an option that writes back the PC, like DISAS_UPDATE_EXIT,
but does not exit back to the main loop.

Backports commit 329833286d7a1b0ef8c7daafe13c6ae32429694e from qemu
2021-02-25 14:08:08 -05:00
Richard Henderson eaa6291aa7 target/arm: Rename DISAS_UPDATE to DISAS_UPDATE_EXIT
Emphasize that the is_jmp option exits to the main loop.

Backports commit 14407ec2007e18536ed34772eef46f6e0a0e3d0e from qemu
2021-02-25 14:02:46 -05:00
Richard Henderson 2540911bdd target/arm: Add support for MTE to SCTLR_ELx
target/arm: Add support for MTE to HCR_EL2 and SCR_EL3

This does not attempt to rectify all of the res0 bits, but does
clear the mte bits when not enabled. Since there is no high-part
mapping of SCTLR, aa32 mode cannot write to these bits.

Backports commits f00faf130d5dcf64b04f71a95f14745845ca1014, and
8ddb300bf60a5f3d358dd6fbf81174f6c03c1d9f from qemu.
2021-02-25 13:59:11 -05:00
Richard Henderson d81feac642 target/arm: Improve masking of SCR RES0 bits
Protect reads of aa64 id registers with ARM_CP_STATE_AA64.
Use this as a simpler test than arm_el_is_aa64, since EL3
cannot change mode.

Backports commit 252e8c69669599b4bcff802df300726300292f47 from qemu
2021-02-25 13:56:35 -05:00
Richard Henderson 1a35600453 target/arm: Add isar tests for mte
Backports commit c7fd0baac0c24defec66263799faa8618327b352 from qemu
2021-02-25 13:55:52 -05:00
Joseph Myers c01b7432a1 target/i386: reimplement fpatan using floatx80 operations
The x87 fpatan emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation. Reimplement using the soft-float operations, as
for other such instructions.

Backports commit ff57bb7b63267dabd60f88354c8c29ea5e1eb3ec from qemu
2021-02-25 13:48:32 -05:00
Joseph Myers ddb2f1d4dd target/i386: reimplement fyl2x using floatx80 operations
The x87 fyl2x emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation. Reimplement using the soft-float operations,
building on top of the reimplementation of fyl2xp1 and factoring out
code to be shared between the two instructions.

The included test assumes that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematically exact result (including that it should be exact, in the
exact cases which cover more cases than for fyl2xp1).

Backports commit 1f18a1e6ab8368a4eab2d22894d3b2ae75250cd3 from qemu
2021-02-25 13:46:29 -05:00
Joseph Myers ac2f3fa0f2 target/i386: reimplement fyl2xp1 using floatx80 operations
The x87 fyl2xp1 emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (adding 1 then using log rather than
attempting a better emulation using log1p).

Reimplement using the soft-float operations, as was done for f2xm1; as
in that case, m68k has related operations but not exactly this one and
it seemed safest to implement directly rather than reusing the m68k
code to avoid accumulation of errors.

A test is included with many randomly generated inputs. The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of y * log2(x + 1); the implementation aims to do
somewhat better than that (about 70 correct bits before rounding). I
haven't investigated how accurate hardware is.

Intel manuals describe a narrower range of valid arguments to this
instruction than AMD manuals. The implementation accepts the wider
range (it's needed anyway for the core code to be reusable in a
subsequent patch reimplementing fyl2x), but the test only has inputs
in the narrower range so that it's valid on hardware that may reject
or produce poor results for inputs outside that range.

Code in the previous implementation that sets C2 for some out-of-range
arguments is not carried forward to the new implementation; C2 is
undefined for this instruction and I suspect that code was just
cut-and-pasted from the trigonometric instructions (fcos, fptan, fsin,
fsincos) where C2 *is* defined to be set for out-of-range arguments.

Backports commit 5eebc49d2d0aa5fc7e90eeac97533051bb7b72fa from qemu
2021-02-25 13:43:46 -05:00
Joseph Myers 0a790f9937 target/i386: reimplement fprem, fprem1 using floatx80 operations
The x87 fprem and fprem1 emulation is currently based around
conversion to double, which is inherently unsuitable for a good
emulation of any floatx80 operation. Reimplement using the soft-float
floatx80 remainder operations.

Backports commit 5ef396e2ba865f34a4766dbd60c739fb4bcb4fcc from qemu
2021-02-25 13:41:54 -05:00
Joseph Myers 8d0bf2d6e1 softfloat: return low bits of quotient from floatx80_modrem
Both x87 and m68k need the low parts of the quotient for their
remainder operations. Arrange for floatx80_modrem to track those bits
and return them via a pointer.

The architectures using float32_rem and float64_rem do not appear to
need this information, so the *_rem interface is left unchanged and
the information returned only from floatx80_modrem. The logic used to
determine the low 7 bits of the quotient for m68k
(target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
looks at the result of converting the remainder to integer, the
quotient having been discarded by that point); this patch does not
change that, but the m68k maintainers may wish to do so.

Backports commit 445810ec915687d37b8ae0ef8d7340ab4a153efa from qemu
2021-02-25 13:39:10 -05:00
Joseph Myers e4cfbc1f06 softfloat: do not set denominator high bit for floatx80 remainder
The floatx80 remainder implementation unnecessarily sets the high bit
of bSig explicitly. By that point in the function, arguments that are
invalid, zero, infinity or NaN have already been handled and
subnormals have been through normalizeFloatx80Subnormal, so the high
bit will already be set. Remove the unnecessary code.

Backports commit 566601f1f9d972e44214696d3cb320e6c18880aa from qemu
2021-02-25 13:37:13 -05:00
Joseph Myers 2d50384633 softfloat: do not return pseudo-denormal from floatx80 remainder
The floatx80 remainder implementation sometimes returns the numerator
unchanged when the denominator is sufficiently larger than the
numerator. But if the value to be returned unchanged is a
pseudo-denormal, that is incorrect. Fix it to normalize the numerator
in that case.

Backports commit b662495dca0a2a36008cf8def91e2566519ed3f2 from qemu
2021-02-25 13:36:42 -05:00
Joseph Myers 6b63555a00 softfloat: fix floatx80 remainder pseudo-denormal check for zero
The floatx80 remainder implementation ignores the high bit of the
significand when checking whether an operand (numerator) with zero
exponent is zero. This means it mishandles a pseudo-denormal
representation of 0x1p-16382L by treating it as zero. Fix this by
checking the whole significand instead.

Backports commit 499a2f7b554a295cfc10f8cd026d9b20a38fe664 from qemu
2021-02-25 13:35:17 -05:00
Joseph Myers b08d204a37 softfloat: merge floatx80_mod and floatx80_rem
The m68k-specific softfloat code includes a function floatx80_mod that
is extremely similar to floatx80_rem, but computing the remainder
based on truncating the quotient toward zero rather than rounding it
to nearest integer. This is also useful for emulating the x87 fprem
and fprem1 instructions. Change the floatx80_rem implementation into
floatx80_modrem that can perform either operation, with both
floatx80_rem and floatx80_mod as thin wrappers available for all
targets.

There does not appear to be any use for the _mod operation for other
floating-point formats in QEMU (the only other architectures using
_rem at all are linux-user/arm/nwfpe, for FPA emulation, and openrisc,
for instructions that have been removed in the latest version of the
architecture), so no change is made to the code for other formats.

Backports commit 6b8b0136ab3018e4b552b485f808bf66bcf19ead from qemu
2021-02-25 13:34:05 -05:00
Joseph Myers 2aee4714ab target/i386: reimplement f2xm1 using floatx80 operations
The x87 f2xm1 emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (computing with pow and then
subtracting 1 rather than attempting a better emulation using expm1).

Reimplement using the soft-float operations, including additions and
multiplications with higher precision where appropriate to limit
accumulation of errors. I considered reusing some of the m68k code
for transcendental operations, but the instructions don't generally
correspond exactly to x87 operations (for example, m68k has 2^x and
e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors
from applying multiple such operations each rounding to floatx80
precision, I wrote a direct implementation of 2^x - 1 instead. It
would be possible in principle to make the implementation more
efficient by doing the intermediate operations directly with
significands, signs and exponents and not packing / unpacking floatx80
format for each operation, but that would make it significantly more
complicated and it's not clear that's worthwhile; the m68k emulation
doesn't try to do that.

A test is included with many randomly generated inputs. The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of 2^x - 1; the implementation aims to do somewhat
better than that (about 70 correct bits before rounding). I haven't
investigated how accurate hardware is.

Backports commit eca30647fc078f4d9ed1b455bd67960f99dbeb7a from qemu
2021-02-25 13:31:13 -05:00
Peter Maydell 4a1996502f target/arm: Remove dead code relating to SABA and UABA
In commit cfdb2c0c95ae9205b0 ("target/arm: Vectorize SABA/UABA") we
replaced the old handling of SABA/UABA with a vectorized implementation
which returns early rather than falling into the loop-ever-elements
code. We forgot to delete the part of the old looping code that
did the accumulate step, and Coverity correctly warns (CID 1428955)
that this code is now dead. Delete it.

Fixes: cfdb2c0c95ae9205b0

Backports commit ced7e8edb282765685d2ba0206a11f8692d8ec1c from qemu
2021-02-25 13:18:51 -05:00
Peter Maydell 167ed57625 target/arm: Remove unnecessary gen_io_end() calls
Since commit ba3e7926691ed3 it has been unnecessary for target code
to call gen_io_end() after an IO instruction in icount mode; it is
sufficient to call gen_io_start() before it and to force the end of
the TB.

Many now-unnecessary calls to gen_io_end() were removed in commit
9e9b10c6491153b, but some were missed or accidentally added later.
Remove unneeded calls from the arm target:

* the call in the handling of exception-return-via-LDM is
unnecessary, and the code is already forcing end-of-TB
* the call in the VFP access check code is more complicated:
we weren't ending the TB, so we need to add the code to
force that by setting DISAS_UPDATE
* the doc comment for ARM_CP_IO doesn't need to mention
gen_io_end() any more

Backports commit 55c812b74289863c348449135812027d188f040a from qemu
2021-02-25 13:17:32 -05:00
Peter Maydell 083d207fb0 target/arm: Move some functions used only in translate-neon.inc.c to that file
The functions neon_element_offset(), neon_load_element(),
neon_load_element64(), neon_store_element() and
neon_store_element64() are used only in the translate-neon.inc.c
file, so move their definitions there.

Since the .inc.c file is #included in translate.c this doesn't make
much difference currently, but it's a more logical place to put the
functions and it might be helpful if we ever decide to try to make
the .inc.c files genuinely separate compilation units.

Backports commit 6fb5787898aab6aa04887fed9cf3220dd4c3f36a from qemu
2021-02-25 13:15:23 -05:00
Peter Maydell 0b06317dc4 target/arm: Convert Neon VTRN to decodetree
Convert the Neon VTRN insn to decodetree. This is the last insn in the
Neon data-processing group, so we can remove all the now-unused old
decoder framework.

It's possible that there's a more efficient implementation of
VTRN, but for this conversion we just copy the existing approach.

Backports commit d4366190f84fe89cc5d46da995dac1e7d541b98e from qemu
2021-02-25 13:12:28 -05:00
Peter Maydell b7584069dd target/arm: Convert Neon VSWP to decodetree
Convert the Neon VSWP insn to decodetree. Since the new implementation
doesn't have to share a pass-loop with the other 2-reg-misc operations
we can implement the swap with 64-bit accesses rather than 32-bits
(which brings us into line with the pseudocode and is more efficient).

Backports commit 8ab3a227a0f13f0ff85846f36f7c466769aef4fc from qemu
2021-02-25 13:07:56 -05:00
Peter Maydell 73abdfea53 target/arm: Convert Neon 2-reg-misc VCVT insns to decodetree
Convert the VCVT instructions in the 2-reg-misc grouping to
decodetree.

Backports commit a183d5fb38b07bab2a840196186c4806f3c67c0d from qemu
2021-02-25 13:07:15 -05:00
Peter Maydell 7e705fdc8c target/arm: Convert Neon 2-reg-misc VRINT insns to decodetree
Convert the Neon 2-reg-misc VRINT insns to decodetree.
Giving these insns their own do_vrint() function allows us
to change the rounding mode just once at the start and end
rather than doing it for every element in the vector.

Backports commit 128123ea34e9e6afe4842aefcb9cf84b9642ac22 from qemu
2021-02-25 13:02:24 -05:00
Peter Maydell 3eddb77327 target/arm: Convert Neon 2-reg-misc fp-compare-with-zero insns to decodetree
Convert the fp-compare-with-zero insns in the Neon 2-reg-misc group to
decodetree.

Backports commit baa59323e841f76523f6ad4d746cdeb47ea574cd from qemu
2021-02-25 12:59:22 -05:00
Peter Maydell 6eb852ec1c target/arm: Convert simple fp Neon 2-reg-misc insns
Convert the Neon 2-reg-misc insns which are implemented with
simple calls to functions that take the input, output and
fpstatus pointer.

Backports commit 3e96b205286dfb8bbf363229709e4f8648fce379 from qemu
2021-02-25 12:56:28 -05:00
Peter Maydell 3dcee11013 target/arm: Convert Neon VQABS, VQNEG to decodetree
Convert the Neon VQABS and VQNEG insns to decodetree.
Since these are the only ones which need cpu_env passing to
the helper, we wrap the helper rather than creating a whole
new do_2misc_env() function.

Backports commit 4936f38abe6db0a9d23fd04e4cb0cf4d51cff174 from qemu
2021-02-25 12:53:18 -05:00
Peter Maydell 4033a3ca5c target/arm: Convert remaining simple 2-reg-misc Neon ops
Convert the remaining ops in the Neon 2-reg-misc group which
can be implemented simply with our do_2misc() helper.

Backports commit 84eae770af69c37a92496a4c4248875c070d5ee3 from qemu
2021-02-25 12:50:55 -05:00
Peter Maydell 88f8111500 target/arm: Convert Neon 2-reg-misc VREV32 and VREV16 to decodetree
Convert the VREV32 and VREV16 insns in the Neon 2-reg-misc group
to decodetree.

Backports commit 8966808205b59d6c196b380b638475bcd1657ef4 from qemu
2021-02-25 12:49:16 -05:00
Peter Maydell db1e503708 target/arm: Make gen_swap_half() take separate src and dest
Make gen_swap_half() take a source and destination TCGv_i32 rather
than modifying the input TCGv_i32; we're going to want to be able to
use it with the more flexible function signature, and this also
brings it into line with other functions like gen_rev16() and
gen_revsh().

Backports commit 8ec3de7018a8198624aae49eef5568256114a829 from qemu
2021-02-25 12:40:23 -05:00
Peter Maydell 3c1289c594 target/arm: Fix capitalization in NeonGenTwo{Single, Double}OPFn typedefs
All the other typedefs like these spell "Op" with a lowercase 'p';
remane the NeonGenTwoSingleOPFn and NeonGenTwoDoubleOPFn typedefs to
match.

Backports commit 5de3fd045be11b74cd0fbf36c6d4fb8387d5463b from qemu
2021-02-25 12:38:30 -05:00
Peter Maydell fa6727ebba target/arm: Rename NeonGenOneOpFn to NeonGenOne64OpFn
The NeonGenOneOpFn typedef breaks with the pattern of the other
NeonGen*Fn typedefs, because it is a TCGv_i64 -> TCGv_i64 operation
but it does not have '64' in its name. Rename it to NeonGenOne64OpFn,
so that the old name is available for a TCGv_i32 -> TCGv_i32 operation
(which we will need in a subsequent commit).

Backports commit 039f4e809ad2772fb33de4511ff68a485d875618 from qemu
2021-02-25 12:34:51 -05:00
Peter Maydell 27e74962e5 target/arm: Convert Neon 2-reg-misc crypto operations to decodetree
Convert the Neon-2-reg misc crypto ops (AESE, AESMC, SHA1H, SHA1SU1)
to decodetree.

Backports commit 0b30dd5b85e20aba259768cb7aaa952b3e319468 from qemu
2021-02-25 12:32:39 -05:00
Peter Maydell 4354448f57 target/arm: Convert vectorised 2-reg-misc Neon ops to decodetree
Convert to decodetree the insns in the Neon 2-reg-misc grouping which
we implement using gvec.

Backports commit 75153179e9928775d5333243ea4b278f438d75ae from qemu
2021-02-25 12:28:31 -05:00
Peter Maydell 6301f9acaa target/arm: Convert Neon VCVT f16/f32 insns to decodetree
Convert the Neon insns in the 2-reg-misc group which are
VCVT between f32 and f16 to decodetree.

Backports commit 654a517355e249435505ae5ff14a7520410cf7a4 from qemu
2021-02-25 12:25:32 -05:00
Peter Maydell 4ca33c54a2 target/arm: Convert Neon 2-reg-misc VSHLL to decodetree
Convert the VSHLL insn in the 2-reg-misc Neon group to decodetree.

Backports commit 749e2be36d75f11d5fa8f8277e2a0569bd2a1c97 from qemu
2021-02-25 12:20:57 -05:00
Peter Maydell 48d57d0dc7 target/arm: Convert Neon narrowing moves to decodetree
Convert the Neon narrowing moves VMQNV, VQMOVN, VQMOVUN in the 2-reg-misc
group to decodetree.

Backports commit 3882bdacb0ad548864b9f2582a32bb5c785e3165 from qemu
2021-02-25 12:18:01 -05:00
Peter Maydell 35d8a3e83f target/arm: Convert VZIP, VUZP to decodetree
Convert the Neon VZIP and VUZP insns in the 2-reg-misc group to
decodetree.

Backports commit 567663a2af2457da8aa74f221b1f3f8a6d2eddf6 from qemu
2021-02-25 12:14:29 -05:00
Peter Maydell d21fae82ba target/arm: Convert Neon 2-reg-misc pairwise ops to decodetree
Convert the pairwise ops VPADDL and VPADAL in the 2-reg-misc grouping
to decodetree.

At this point we can get rid of the weird CPU_V001 #define that was
used to avoid having to explicitly list all the arguments being
passed to some TCG gen/helper functions.

Backports commit 6106af3aa2304fccee91a3a90138352b0c2af998 from qemu
2021-02-25 12:12:11 -05:00
Peter Maydell 505923e676 target/arm: Convert Neon 2-reg-misc VREV64 to decodetree
Convert the Neon VREV64 insn from the 2-reg-misc grouping to decodetree.

Backports commit 353d2b85058711a5e44c2dc63eb5b620db50a602 from qemu
2021-02-25 12:07:06 -05:00
Alistair Francis e1f49dc888 target/riscv: Implement checks for hfence
Call the helper_hyp_tlb_flush() function on hfence instructions which
will generate an illegal insruction execption if we don't have
permission to flush the Hypervisor level TLBs.

Backports commit 2761db5fc20943bbd606b6fd49640ac000398de6 from qemu
2021-02-25 12:03:57 -05:00
Alistair Francis 8eb8bc290f target/riscv: Move the hfence instructions to the rvh decode
Also correct the name of the VVMA instruction.

Backports commit b8429ded723ec52568e05f6a24ed78c93224687c from qemu
2021-02-25 11:59:49 -05:00
Alistair Francis 39ff690eff target/riscv: Report errors validating 2nd-stage PTEs
Backports commit 88914473e748db20d8e18b9735f647a683319fa6 from qemu
2021-02-25 11:55:53 -05:00
Alistair Francis a6c323c912 target/riscv: Set access as data_load when validating stage-2 PTEs
Backports commit efe9f9c820d1322729957a60ff785c9527a79ddf from qemu
2021-02-25 11:54:31 -05:00
Ian Jiang 5c3a2f391c riscv: Add helper to make NaN-boxing for FP register
The function that makes NaN-boxing when a 32-bit value is assigned
to a 64-bit FP register is split out to a helper gen_nanbox_fpr().
Then it is applied in translating of the FLW instruction.

Backports commit 354908cee1f7ff761b5fedbdb6376c378c10f941 from qemu
2021-02-25 11:53:27 -05:00
MerryMage 92243aefd4 arm/translate: Do not tracecode when in an IT block 2021-02-07 19:14:32 +00:00
Sunho Kim d56c79776e unicorn: fix uc_emu_start until if end instruction is in another tlb 2020-08-05 03:18:51 +09:00
MerryMage 9ac17104b8 arm: Add missing file vec_internal.h
Missing from commit 1df7314dc3.

Ported from qemu a04b68e1d4c4f0cd5cd7542697b1b230b84532f5.
2020-06-20 00:12:09 +01:00
Philippe Mathieu-Daudé 4465ff9c93 fpu/softfloat: Silence 'bitwise negation of boolean expression' warning
When building with clang version 10.0.0-4ubuntu1, we get:

CC lm32-softmmu/fpu/softfloat.o
fpu/softfloat.c:3365:13: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation]
absZ &= ~ ( ( ( roundBits ^ 0x40 ) == 0 ) & roundNearestEven );
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

fpu/softfloat.c:3423:18: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation]
absZ0 &= ~ ( ( (uint64_t) ( absZ1<<1 ) == 0 ) & roundNearestEven );
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

...

fpu/softfloat.c:4273:18: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation]
zSig1 &= ~ ( ( zSig2 + zSig2 == 0 ) & roundNearestEven );
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix by rewriting the fishy bitwise AND of two bools as an int.

Backports commit 4066288694c3bdd175df813cad675a3b5191956b from qemu
2020-06-18 23:56:27 -04:00
Peter Maydell 709610e606 target/arm: Convert Neon VDUP (scalar) to decodetree
Convert the Neon VDUP (scalar) insn to decodetree. (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register" insn.)

Backports commit 9aaa23c2ae18e6fb9a291b81baf91341db76dfa0 from qemu
2020-06-17 00:43:19 -04:00
Peter Maydell 8de8a4500a target/arm: Convert Neon VTBL, VTBX to decodetree
Convert the Neon VTBL, VTBX instructions to decodetree. The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.

Backports commit 54e96c744b70a5d19f14b212a579dd3be8fcaad9 from qemu
2020-06-17 00:39:27 -04:00
Peter Maydell 4731a69d66 target/arm: Convert Neon VEXT to decodetree
Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.

We don't need to special case 0 or 8 immediates any more as the
optimizer is smart enough to throw away the dead code.

Backports commit 0aad761fb0aed40c99039eacac470cbd03d07019 from qemu
2020-06-17 00:29:04 -04:00
Peter Maydell 1aa9046120 target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.

Backports commit 77e576a9281825fc170f3b3af83f47e110549b5c from qemu
2020-06-17 00:24:12 -04:00
Peter Maydell 088a1e8ba9 target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
group to decodetree.

Backports commit aa318f5b9b4ab3b6744b5305dd8ae9b96676f20e from qemu
2020-06-17 00:15:18 -04:00
Peter Maydell c0551804d4 target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
to decodetree.

Backports commit b2fc7be972b94872f6a6dd32d9bda1b88ddbcaad from qemu
2020-06-17 00:11:56 -04:00
Peter Maydell 2e8ae1130e target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.

Backports commit 85ac9aef9a5418de3168df569e21258e853840a2 from qemu
2020-06-17 00:09:32 -04:00
Peter Maydell bf1b0374b9 target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree. These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.

The refactoring removes some of the oddities of the old decoder:
* operands to the operation and accumulation were often
reversed (taking advantage of the fact that most of these ops
are commutative); the new code follows the pseudocode order
* the Q bit in the insn was in a local variable 'u'; in the
new code it is decoded into a->q

Backports commit 96fc80f5f186decd1a649f6c04252faceb057ad2 from qemu
2020-06-17 00:04:29 -04:00
Peter Maydell 1817f28afd target/arm: Add missing TCG temp free in do_2shift_env_64()
In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
temporary in do_2shift_env_64(); free it.

Backports commit a4f67e180def790ff0bbb33fc93bb6e80382f041 from qemu
2020-06-16 23:57:17 -04:00
Peter Maydell 06dfc2ada6 target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.

Backports commit 448f0e5f3ecfbd089b934e5e3aa0ccd1f51a6174 from qemu
2020-06-16 23:56:30 -04:00
Peter Maydell 6383a2bd15 target/arm: Convert Neon 3-reg-diff polynomial VMULL
Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.

Backports commit 18fb58d588898550919392277787979ee7d0d84e from qemu
2020-06-16 23:54:51 -04:00
Peter Maydell 090426b120 target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.

These are the last insns in the group which use the pass-over-each
elements loop, so we can delete that code.

Backports commit 9546ca5998d3cbd98a81b2d46a2e92a11b0f78a4 from qemu
2020-06-16 23:51:56 -04:00
Peter Maydell 5464405d5c target/arm: Convert Neon 3-reg-diff long multiplies
Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.

Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.

Backports commit 3a1d9eb07b767a7592abca642af80906f9eab0ed from qemu
2020-06-16 23:47:28 -04:00
Peter Maydell 21044a1d11 target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.

Backports commit f5b28401200ec95ba89552df3ecdcdc342f6b90b from qemu
2020-06-16 23:41:20 -04:00
Peter Maydell 34418f1998 target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.

Backports commit 0fa1ab0302badabc3581aefcbb2f189ef52c4985 from qemu
2020-06-16 23:36:18 -04:00
Peter Maydell d25998ba7d target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.

As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.

Backports commit b28be09570d0827969b62b8f82b0f720a9915427 from qemu
2020-06-16 23:29:53 -04:00
Peter Maydell a9d0e36bcf target/arm: Fix missing temp frees in do_vshll_2sh
The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.

Backports commit 9593a3988c3e788790aa107d778386b09f456a6d from qemu
2020-06-16 23:26:04 -04:00
Thomas Huth 6053203c1c target/i386: Remove obsolete TODO file
The last real change to this file is from 2012, so it is very likely
that this file is completely out-of-date and ignored today. Let's
simply remove it to avoid confusion if someone finds it by accident.

Backports commit 3575b0aea983ad57804c9af739ed8ff7bc168393 from qemu
2020-06-15 13:22:56 -04:00
Joseph Myers 18b0ae9ebd target/i386: correct fix for pcmpxstrx substring search
This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
/ pcmpestrm / pcmpistri / pcmpistrm substring search, commit
ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.

That commit fixed a bug that showed up in four GCC tests with one libc
implementation. The tests in question generate random inputs to the
intrinsics and compare results to a C implementation, but they only
test 1024 possible random inputs, and when the tests use the cases of
those instructions that work with word rather than byte inputs, it's
easy to have problematic cases that show up much less frequently than
that. Thus, testing with a different libc implementation, and so a
different random number generator, showed up a problem with the
previous patch.

When investigating the previous test failures, I found the description
of these instructions in the Intel manuals (starting from computing a
16x16 or 8x8 set of comparison results) confusing and hard to match up
with the more optimized implementation in QEMU, and referred to AMD
manuals which described the instructions in a different way. Those
AMD descriptions are very explicit that the whole of the string being
searched for must be found in the other operand, not running off the
end of that operand; they say "If the prototype and the SUT are equal
in length, the two strings must be identical for the comparison to be
TRUE.". However, that statement is incorrect.

In my previous commit message, I noted:

The operation in this case is a search for a string (argument d to
the helper) in another string (argument s to the helper); if a copy
of d at a particular position would run off the end of s, the
resulting output bit should be 0 whether or not the strings match in
the region where they overlap, but the QEMU implementation was
wrongly comparing only up to the point where s ends and counting it
as a match if an initial segment of d matched a terminal segment of
s. Here, "run off the end of s" means that some byte of d would
overlap some byte outside of s; thus, if d has zero length, it is
considered to match everywhere, including after the end of s.

The description "some byte of d would overlap some byte outside of s"
is accurate only when understood to refer to overlapping some byte
*within the 16-byte operand* but at or after the zero terminator; it
is valid to run over the end of s if the end of s is the end of the
16-byte operand. So the fix in the previous patch for the case of d
being empty was correct, but the other part of that patch was not
correct (as it never allowed partial matches even at the end of the
16-byte operand). Nor was the code before the previous patch correct
for the case of d nonempty, as it would always have allowed partial
matches at the end of s.

Fix with a partial revert of my previous change, combined with
inserting a check for the special case of s having maximum length to
determine where it is necessary to check for matches.

In the added test, test 1 is for the case of empty strings, which
failed before my 2017 patch, test 2 is for the bug introduced by my
2017 patch and test 3 deals with the case where a match of an initial
segment at the end of the string is not valid when the string ends
before the end of the 16-byte operand (that is, the case that would be
broken by a simple revert of the non-empty-string part of my 2017
patch).

Backports commit bc921b2711c4e2e8ab99a3045f6c0f134a93b535 from qemu
2020-06-15 13:20:48 -04:00