Commit graph

5431 commits

Author SHA1 Message Date
Richard Henderson 0e5aa37c9a target/arm: Use SVEContLdSt in sve_ld1_r
First use of the new helper functions, so we can remove the
unused markup. No longer need a scratch for user-only, as
we completely probe the page set before reading; system mode
still requires a scratch for MMIO.

Backports commit b854fd06a868e0308bcfe05ad0a71210705814c7 from qemu
2021-02-25 20:41:53 -05:00
Richard Henderson d363c3d0ba target/arm: Adjust interface of sve_ld1_host_fn
The current interface includes a loop; change it to load a
single element. We will then be able to use the function
for ld{2,3,4} where individual vector elements are not adjacent.

Replace each call with the simplest possible loop over active
elements.

Backports commit cf4a49b71b1712142d7122025a8ca7ea5b59d73f from qemu
2021-02-25 20:34:18 -05:00
Richard Henderson 94b0876f15 target/arm: Add sve infrastructure for page lookup
For contiguous predicated memory operations, we want to
minimize the number of tlb lookups performed. We have
open-coded this for sve_ld1_r, but for correctness with
MTE we will need this for all of the memory operations.

Create a structure that holds the bounds of active elements,
and metadata for two pages. Add routines to find those
active elements, lookup the pages, and run watchpoints
for those pages.

Temporarily mark the functions unused to avoid Werror.

Backports commit b4cd95d2f4c7197b844f51b29871d888063ea3e7 from qemu
2021-02-25 20:28:23 -05:00
Richard Henderson f430a399d4 target/arm: Drop manual handling of set/clear_helper_retaddr
Since we converted back to cpu_*_data_ra, we do not need to
do this ourselves.

Backports commit f32e2ab65f3a0fc03d58936709e5a565c4b0db50 from qemu
2021-02-25 20:20:29 -05:00
Richard Henderson 2e03f74a53 target/arm: Use cpu_*_data_ra for sve_ldst_tlb_fn
Use the "normal" memory access functions, rather than the
softmmu internal helper functions directly.

Since fb901c9, cpu_mem_index is now a simple extract
from env->hflags and not a large computation.  Which means
that it's now more work to pass around this value than it
is to recompute it.

This only adjusts the primitives, and does not clean up
all of the uses within sve_helper.c.
2021-02-25 20:16:38 -05:00
Richard Henderson 84012be55c target/arm: Add arm_tlb_bti_gp
Introduce an lvalue macro to wrap target_tlb_bit0.

Backports commit 149d3b31f3f0f7f9e1c3a77043450a95c7a7e93d from qemu
2021-02-25 17:45:50 -05:00
Richard Henderson a9eb62d211 target/arm: Tidy trans_LD1R_zpri
Move the variable declarations to the top of the function,
but do not create a new label before sve_access_check.

Backports commit c0ed9166b1aea86a2fbaada1195aacd1049f9e85 from qemu
2021-02-25 17:42:40 -05:00
Richard Henderson 49bd9a5c68 target/arm: Use mte_checkN for sve unpredicated stores
Backports commit bba87d0a0f480805223a6428a7942a51733c488a from qemu
2021-02-25 17:40:43 -05:00
Richard Henderson 3ce14ebc78 target/arm: Use mte_checkN for sve unpredicated loads
Backports commit b2aa8879b884cd66acde4123899dd92a38fe6527 from qemu
2021-02-25 17:26:37 -05:00
Richard Henderson 4fdd05e1aa target/arm: Add helper_mte_check_zva
Use a special helper for DC_ZVA, rather than the more
general mte_checkN.

Backports commit 46dc1bc0601554823a42ad27f236da2ad8f3bdc6 from qemu
2021-02-25 17:17:54 -05:00
Richard Henderson 9a05ca01e7 target/arm: Implement helper_mte_checkN
Fill out the stub that was added earlier.

Backports commit 5add8248556a3c1006018d7d8e601c9572b280a9 from qemu
2021-02-25 17:10:56 -05:00
Richard Henderson 91e2f55b69 target/arm: Implement helper_mte_check1
Fill out the stub that was added earlier.

Backports commit 2e34ff45f32cb032883616a1cc5ea8ac96f546d5 from qemu
2021-02-25 17:02:34 -05:00
Richard Henderson 3e786526cf target/arm: Add gen_mte_checkN
Replace existing uses of check_data_tbi in translate-a64.c that
perform multiple logical memory access. Leave the helper blank
for now to reduce the patch size.

Backports commit 73ceeb0011b23bac8bd2c09ebe3c18d034aa69ce from qemu
2021-02-25 16:40:16 -05:00
Richard Henderson 582e64f348 target/arm: Add gen_mte_check1
Replace existing uses of check_data_tbi in translate-a64.c that
perform a single logical memory access. Leave the helper blank
for now to reduce the patch size.

Backports commit 0a405be2b8fd9506a009b10d7d2d98c394b36db6 from qemu
2021-02-25 16:13:31 -05:00
Richard Henderson 4488858072 target/arm: Move regime_tcr to internals.h
We will shortly need this in mte_helper.c as well.

Backports commit 38659d311d05e6c5feff6bddcc1c33b60d3b86a1 from qemu
2021-02-25 16:04:54 -05:00
Richard Henderson e46cec8543 target/arm: Move regime_el to internals.h
We will shortly need this in mte_helper.c as well.

Backports commit 9c7ab8fc8cb6d6e2fb7a82c1088691c7c23fa1b9 from qemu
2021-02-25 16:04:12 -05:00
Richard Henderson 7147f5f28a target/arm: Implement the access tag cache flushes
Like the regular data cache flushes, these are nops within qemu.

Backports commit 5463df160ecee510e78493993eb1bd38b4838a10 from qemu
2021-02-25 16:02:30 -05:00
Richard Henderson 4bb37fc3c1 target/arm: Implement the LDGM, STGM, STZGM instructions
Backports commit 5f716a82388eb09754dd900e7dbb8ffa15897a28 from qemu
2021-02-25 16:00:50 -05:00
Richard Henderson 5b3ddcf2e2 target/arm: Simplify DC_ZVA
Now that we know that the operation is on a single page,
we need not loop over pages while probing.

Backports commit e26d0d226892f67435cadcce86df0ddfb9943174 from qemu
2021-02-25 15:55:46 -05:00
Richard Henderson 7de60598d5 target/arm: Restrict the values of DCZID.BS under TCG
We can simplify our DC_ZVA if we recognize that the largest BS
that we actually use in system mode is 64. Let us just assert
that it fits within TARGET_PAGE_SIZE.

For DC_GVA and STZGM, we want to be able to write whole bytes
of tag memory, so assert that BS is >= 2 * TAG_GRANULE, or 32.

Backports commit a4157b80242bf1c8aa0ee77aae7458ba79012d5d from qemu
2021-02-25 15:12:20 -05:00
Richard Henderson e15aa7c5a7 target/arm: Implement the STGP instruction
Backports commit 6439d67fc944cf29de94a160e9450a2063c7b515 from qemu
2021-02-25 15:10:40 -05:00
Richard Henderson e8b9cb8b4a target/arm: Implement LDG, STG, ST2G instructions
Backports commit c15294c1e36a7dd9b25bd54d98178e80f4b64bc1 from qemu
2021-02-25 15:08:44 -05:00
Richard Henderson 448fc3ae4a target/arm: Define arm_cpu_do_unaligned_access for user-only
Use the same code as system mode, so that we generate the same
exception + syndrome for the unaligned access.

For the moment, if MTE is enabled so that this path is reachable,
this would generate a SIGSEGV in the user-only cpu_loop. Decoding
the syndrome to produce the proper SIGBUS will be done later.

Backports commit 0d1762e931f8a694f261c604daba605bcda70928 from qemu
2021-02-25 14:51:19 -05:00
Richard Henderson 43ea7aa828 target/arm: Implement the SUBP instruction
Backports commit dad3015f55f8d48f84f0eae36021a9c6f9587e57 from qemu
2021-02-25 14:46:00 -05:00
Richard Henderson 13fd83fcc9 target/arm: Implement the GMI instruction
Backports commit 438efea0bb639c9c2dfb42c8d9459e21aa183c8a from qemu
2021-02-25 14:44:29 -05:00
Richard Henderson 911a6b57ed target/arm: Implement the ADDG, SUBG instructions
Backports commit efbc78ad978763aedd11cb718eb1ff8db3fc9152 from qemu
2021-02-25 14:42:33 -05:00
Richard Henderson acd7e4cb18 target/arm: Revise decoding for disas_add_sub_imm
The current Arm ARM has adjusted the official decode of
"Add/subtract (immediate)" so that the shift field is only bit 22,
and bit 23 is part of the op1 field of the parent category
"Data processing - immediate".

Backports commit 21a8b343eaae63f6984f9a200092b0ea167647f1 from qemu
2021-02-25 14:38:46 -05:00
Richard Henderson 58f3dd2cc7 target/arm: Implement the IRG instruction
Backports commit da54941f45b820cbaca72aa6efd5669b3dc86e2f from qemu
2021-02-25 14:36:11 -05:00
Richard Henderson 6bec295bf8 target/arm: Add MTE bits to tb_flags
Cache the composite ATA setting.

Cache when MTE is fully enabled, i.e. access to tags are enabled
and tag checks affect the PE. Do this for both the normal context
and the UNPRIV context.

Backports commit 81ae05fa2d21ac1a0054935b74342aa38a5ecef7 from qemu
2021-02-25 14:31:41 -05:00
Richard Henderson f6be2a1a42 target/arm: Add MTE system registers
This is TFSRE0_EL1, TFSR_EL1, TFSR_EL2, TFSR_EL3,
RGSR_EL1, GCR_EL1, GMID_EL1, and PSTATE.TCO.

Backports commit 4b779cebb3e5ab30b945181f1ba3932f5f8a1cb5 from qemu
2021-02-25 14:12:24 -05:00
Richard Henderson 179a3aacdf target/arm: Add DISAS_UPDATE_NOCHAIN
Add an option that writes back the PC, like DISAS_UPDATE_EXIT,
but does not exit back to the main loop.

Backports commit 329833286d7a1b0ef8c7daafe13c6ae32429694e from qemu
2021-02-25 14:08:08 -05:00
Richard Henderson eaa6291aa7 target/arm: Rename DISAS_UPDATE to DISAS_UPDATE_EXIT
Emphasize that the is_jmp option exits to the main loop.

Backports commit 14407ec2007e18536ed34772eef46f6e0a0e3d0e from qemu
2021-02-25 14:02:46 -05:00
Richard Henderson 2540911bdd target/arm: Add support for MTE to SCTLR_ELx
target/arm: Add support for MTE to HCR_EL2 and SCR_EL3

This does not attempt to rectify all of the res0 bits, but does
clear the mte bits when not enabled. Since there is no high-part
mapping of SCTLR, aa32 mode cannot write to these bits.

Backports commits f00faf130d5dcf64b04f71a95f14745845ca1014, and
8ddb300bf60a5f3d358dd6fbf81174f6c03c1d9f from qemu.
2021-02-25 13:59:11 -05:00
Richard Henderson d81feac642 target/arm: Improve masking of SCR RES0 bits
Protect reads of aa64 id registers with ARM_CP_STATE_AA64.
Use this as a simpler test than arm_el_is_aa64, since EL3
cannot change mode.

Backports commit 252e8c69669599b4bcff802df300726300292f47 from qemu
2021-02-25 13:56:35 -05:00
Richard Henderson 1a35600453 target/arm: Add isar tests for mte
Backports commit c7fd0baac0c24defec66263799faa8618327b352 from qemu
2021-02-25 13:55:52 -05:00
Joseph Myers c01b7432a1 target/i386: reimplement fpatan using floatx80 operations
The x87 fpatan emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation. Reimplement using the soft-float operations, as
for other such instructions.

Backports commit ff57bb7b63267dabd60f88354c8c29ea5e1eb3ec from qemu
2021-02-25 13:48:32 -05:00
Joseph Myers ddb2f1d4dd target/i386: reimplement fyl2x using floatx80 operations
The x87 fyl2x emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation. Reimplement using the soft-float operations,
building on top of the reimplementation of fyl2xp1 and factoring out
code to be shared between the two instructions.

The included test assumes that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematically exact result (including that it should be exact, in the
exact cases which cover more cases than for fyl2xp1).

Backports commit 1f18a1e6ab8368a4eab2d22894d3b2ae75250cd3 from qemu
2021-02-25 13:46:29 -05:00
Joseph Myers ac2f3fa0f2 target/i386: reimplement fyl2xp1 using floatx80 operations
The x87 fyl2xp1 emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (adding 1 then using log rather than
attempting a better emulation using log1p).

Reimplement using the soft-float operations, as was done for f2xm1; as
in that case, m68k has related operations but not exactly this one and
it seemed safest to implement directly rather than reusing the m68k
code to avoid accumulation of errors.

A test is included with many randomly generated inputs. The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of y * log2(x + 1); the implementation aims to do
somewhat better than that (about 70 correct bits before rounding). I
haven't investigated how accurate hardware is.

Intel manuals describe a narrower range of valid arguments to this
instruction than AMD manuals. The implementation accepts the wider
range (it's needed anyway for the core code to be reusable in a
subsequent patch reimplementing fyl2x), but the test only has inputs
in the narrower range so that it's valid on hardware that may reject
or produce poor results for inputs outside that range.

Code in the previous implementation that sets C2 for some out-of-range
arguments is not carried forward to the new implementation; C2 is
undefined for this instruction and I suspect that code was just
cut-and-pasted from the trigonometric instructions (fcos, fptan, fsin,
fsincos) where C2 *is* defined to be set for out-of-range arguments.

Backports commit 5eebc49d2d0aa5fc7e90eeac97533051bb7b72fa from qemu
2021-02-25 13:43:46 -05:00
Joseph Myers 0a790f9937 target/i386: reimplement fprem, fprem1 using floatx80 operations
The x87 fprem and fprem1 emulation is currently based around
conversion to double, which is inherently unsuitable for a good
emulation of any floatx80 operation. Reimplement using the soft-float
floatx80 remainder operations.

Backports commit 5ef396e2ba865f34a4766dbd60c739fb4bcb4fcc from qemu
2021-02-25 13:41:54 -05:00
Joseph Myers 8d0bf2d6e1 softfloat: return low bits of quotient from floatx80_modrem
Both x87 and m68k need the low parts of the quotient for their
remainder operations. Arrange for floatx80_modrem to track those bits
and return them via a pointer.

The architectures using float32_rem and float64_rem do not appear to
need this information, so the *_rem interface is left unchanged and
the information returned only from floatx80_modrem. The logic used to
determine the low 7 bits of the quotient for m68k
(target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
looks at the result of converting the remainder to integer, the
quotient having been discarded by that point); this patch does not
change that, but the m68k maintainers may wish to do so.

Backports commit 445810ec915687d37b8ae0ef8d7340ab4a153efa from qemu
2021-02-25 13:39:10 -05:00
Joseph Myers e4cfbc1f06 softfloat: do not set denominator high bit for floatx80 remainder
The floatx80 remainder implementation unnecessarily sets the high bit
of bSig explicitly. By that point in the function, arguments that are
invalid, zero, infinity or NaN have already been handled and
subnormals have been through normalizeFloatx80Subnormal, so the high
bit will already be set. Remove the unnecessary code.

Backports commit 566601f1f9d972e44214696d3cb320e6c18880aa from qemu
2021-02-25 13:37:13 -05:00
Joseph Myers 2d50384633 softfloat: do not return pseudo-denormal from floatx80 remainder
The floatx80 remainder implementation sometimes returns the numerator
unchanged when the denominator is sufficiently larger than the
numerator. But if the value to be returned unchanged is a
pseudo-denormal, that is incorrect. Fix it to normalize the numerator
in that case.

Backports commit b662495dca0a2a36008cf8def91e2566519ed3f2 from qemu
2021-02-25 13:36:42 -05:00
Joseph Myers 6b63555a00 softfloat: fix floatx80 remainder pseudo-denormal check for zero
The floatx80 remainder implementation ignores the high bit of the
significand when checking whether an operand (numerator) with zero
exponent is zero. This means it mishandles a pseudo-denormal
representation of 0x1p-16382L by treating it as zero. Fix this by
checking the whole significand instead.

Backports commit 499a2f7b554a295cfc10f8cd026d9b20a38fe664 from qemu
2021-02-25 13:35:17 -05:00
Joseph Myers b08d204a37 softfloat: merge floatx80_mod and floatx80_rem
The m68k-specific softfloat code includes a function floatx80_mod that
is extremely similar to floatx80_rem, but computing the remainder
based on truncating the quotient toward zero rather than rounding it
to nearest integer. This is also useful for emulating the x87 fprem
and fprem1 instructions. Change the floatx80_rem implementation into
floatx80_modrem that can perform either operation, with both
floatx80_rem and floatx80_mod as thin wrappers available for all
targets.

There does not appear to be any use for the _mod operation for other
floating-point formats in QEMU (the only other architectures using
_rem at all are linux-user/arm/nwfpe, for FPA emulation, and openrisc,
for instructions that have been removed in the latest version of the
architecture), so no change is made to the code for other formats.

Backports commit 6b8b0136ab3018e4b552b485f808bf66bcf19ead from qemu
2021-02-25 13:34:05 -05:00
Joseph Myers 2aee4714ab target/i386: reimplement f2xm1 using floatx80 operations
The x87 f2xm1 emulation is currently based around conversion to
double. This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (computing with pow and then
subtracting 1 rather than attempting a better emulation using expm1).

Reimplement using the soft-float operations, including additions and
multiplications with higher precision where appropriate to limit
accumulation of errors. I considered reusing some of the m68k code
for transcendental operations, but the instructions don't generally
correspond exactly to x87 operations (for example, m68k has 2^x and
e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors
from applying multiple such operations each rounding to floatx80
precision, I wrote a direct implementation of 2^x - 1 instead. It
would be possible in principle to make the implementation more
efficient by doing the intermediate operations directly with
significands, signs and exponents and not packing / unpacking floatx80
format for each operation, but that would make it significantly more
complicated and it's not clear that's worthwhile; the m68k emulation
doesn't try to do that.

A test is included with many randomly generated inputs. The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of 2^x - 1; the implementation aims to do somewhat
better than that (about 70 correct bits before rounding). I haven't
investigated how accurate hardware is.

Backports commit eca30647fc078f4d9ed1b455bd67960f99dbeb7a from qemu
2021-02-25 13:31:13 -05:00
Peter Maydell 4a1996502f target/arm: Remove dead code relating to SABA and UABA
In commit cfdb2c0c95ae9205b0 ("target/arm: Vectorize SABA/UABA") we
replaced the old handling of SABA/UABA with a vectorized implementation
which returns early rather than falling into the loop-ever-elements
code. We forgot to delete the part of the old looping code that
did the accumulate step, and Coverity correctly warns (CID 1428955)
that this code is now dead. Delete it.

Fixes: cfdb2c0c95ae9205b0

Backports commit ced7e8edb282765685d2ba0206a11f8692d8ec1c from qemu
2021-02-25 13:18:51 -05:00
Peter Maydell 167ed57625 target/arm: Remove unnecessary gen_io_end() calls
Since commit ba3e7926691ed3 it has been unnecessary for target code
to call gen_io_end() after an IO instruction in icount mode; it is
sufficient to call gen_io_start() before it and to force the end of
the TB.

Many now-unnecessary calls to gen_io_end() were removed in commit
9e9b10c6491153b, but some were missed or accidentally added later.
Remove unneeded calls from the arm target:

* the call in the handling of exception-return-via-LDM is
unnecessary, and the code is already forcing end-of-TB
* the call in the VFP access check code is more complicated:
we weren't ending the TB, so we need to add the code to
force that by setting DISAS_UPDATE
* the doc comment for ARM_CP_IO doesn't need to mention
gen_io_end() any more

Backports commit 55c812b74289863c348449135812027d188f040a from qemu
2021-02-25 13:17:32 -05:00
Peter Maydell 083d207fb0 target/arm: Move some functions used only in translate-neon.inc.c to that file
The functions neon_element_offset(), neon_load_element(),
neon_load_element64(), neon_store_element() and
neon_store_element64() are used only in the translate-neon.inc.c
file, so move their definitions there.

Since the .inc.c file is #included in translate.c this doesn't make
much difference currently, but it's a more logical place to put the
functions and it might be helpful if we ever decide to try to make
the .inc.c files genuinely separate compilation units.

Backports commit 6fb5787898aab6aa04887fed9cf3220dd4c3f36a from qemu
2021-02-25 13:15:23 -05:00
Peter Maydell 0b06317dc4 target/arm: Convert Neon VTRN to decodetree
Convert the Neon VTRN insn to decodetree. This is the last insn in the
Neon data-processing group, so we can remove all the now-unused old
decoder framework.

It's possible that there's a more efficient implementation of
VTRN, but for this conversion we just copy the existing approach.

Backports commit d4366190f84fe89cc5d46da995dac1e7d541b98e from qemu
2021-02-25 13:12:28 -05:00
Peter Maydell b7584069dd target/arm: Convert Neon VSWP to decodetree
Convert the Neon VSWP insn to decodetree. Since the new implementation
doesn't have to share a pass-loop with the other 2-reg-misc operations
we can implement the swap with 64-bit accesses rather than 32-bits
(which brings us into line with the pseudocode and is more efficient).

Backports commit 8ab3a227a0f13f0ff85846f36f7c466769aef4fc from qemu
2021-02-25 13:07:56 -05:00