There are a number of paths by which the TBI is still intact
for user-only in the SVE helpers.
Because we currently always set TBI for user-only, we do not
need to pass down the actual TBI setting from above, and we
can remove the top byte in the inner-most primitives, so that
none are forgotten. Moreover, this keeps the "dirty" pointer
around at the higher levels, where we need it for any MTE checking.
Since the normal case, especially for user-only, goes through
RAM, this clearing merely adds two insns per page lookup, which
will be completely in the noise.
Backports commit c4af8ba19b9d22aac79cab679a20b159af9d6809 from qemu
Because the elements are non-sequential, we cannot eliminate many
tests straight away like we can for sequential operations. But
we often have the PTE details handy, so we can test for Tagged.
Backports commit d28d12f008ee44dc2cc2ee5d8f673be9febc951e from qemu
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.
Backports commit aa13f7c3c378fa41366b9fcd6c29af1c3d81126a from qemu
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.
Backports commit 71b9f3948c75bb97641a3c8c7de96d1cb47cdc07 from qemu
Because the elements are sequential, we can eliminate many tests all
at once when the tag hits TCMA, or if the page(s) are not Tagged.
Backports commit 206adacfb8d35e671e3619591608c475aa046b63 from qemu
This avoids the need for a separate set of helpers to implement
no-fault semantics, and will enable MTE in the future.
Backports commit 50de9b78cec06e6d16e92a114a505779359ca532 from qemu
Follow the model set up for contiguous loads. This handles
watchpoints correctly for contiguous stores, recognizing the
exception before any changes to memory.
Backports commit 0fa476c1bb37a70df7eeff1e5bfb4791feb37e0e from qemu
With sve_cont_ldst_pages, the differences between first-fault and no-fault
are minimal, so unify the routines. With cpu_probe_watchpoint, we are able
to make progress through pages with TLB_WATCHPOINT set when the watchpoint
does not actually fire.
Backports commit c647673ce4d72a8789703c62a7f3cbc732cb1ea8 from qemu
Handle all of the watchpoints for active elements all at once,
before we've modified the vector register. This removes the
TLB_WATCHPOINT bit from page[].flags, which means that we can
use the normal fast path via RAM.
Backports commit 4bcc3f0ff8e5ae2b17b5aab9aa613ff1b8025896 from qemu
First use of the new helper functions, so we can remove the
unused markup. No longer need a scratch for user-only, as
we completely probe the page set before reading; system mode
still requires a scratch for MMIO.
Backports commit b854fd06a868e0308bcfe05ad0a71210705814c7 from qemu
The current interface includes a loop; change it to load a
single element. We will then be able to use the function
for ld{2,3,4} where individual vector elements are not adjacent.
Replace each call with the simplest possible loop over active
elements.
Backports commit cf4a49b71b1712142d7122025a8ca7ea5b59d73f from qemu
For contiguous predicated memory operations, we want to
minimize the number of tlb lookups performed. We have
open-coded this for sve_ld1_r, but for correctness with
MTE we will need this for all of the memory operations.
Create a structure that holds the bounds of active elements,
and metadata for two pages. Add routines to find those
active elements, lookup the pages, and run watchpoints
for those pages.
Temporarily mark the functions unused to avoid Werror.
Backports commit b4cd95d2f4c7197b844f51b29871d888063ea3e7 from qemu
Use the "normal" memory access functions, rather than the
softmmu internal helper functions directly.
Since fb901c9, cpu_mem_index is now a simple extract
from env->hflags and not a large computation. Which means
that it's now more work to pass around this value than it
is to recompute it.
This only adjusts the primitives, and does not clean up
all of the uses within sve_helper.c.
Move the variable declarations to the top of the function,
but do not create a new label before sve_access_check.
Backports commit c0ed9166b1aea86a2fbaada1195aacd1049f9e85 from qemu
Replace existing uses of check_data_tbi in translate-a64.c that
perform multiple logical memory access. Leave the helper blank
for now to reduce the patch size.
Backports commit 73ceeb0011b23bac8bd2c09ebe3c18d034aa69ce from qemu
Replace existing uses of check_data_tbi in translate-a64.c that
perform a single logical memory access. Leave the helper blank
for now to reduce the patch size.
Backports commit 0a405be2b8fd9506a009b10d7d2d98c394b36db6 from qemu
Now that we know that the operation is on a single page,
we need not loop over pages while probing.
Backports commit e26d0d226892f67435cadcce86df0ddfb9943174 from qemu
We can simplify our DC_ZVA if we recognize that the largest BS
that we actually use in system mode is 64. Let us just assert
that it fits within TARGET_PAGE_SIZE.
For DC_GVA and STZGM, we want to be able to write whole bytes
of tag memory, so assert that BS is >= 2 * TAG_GRANULE, or 32.
Backports commit a4157b80242bf1c8aa0ee77aae7458ba79012d5d from qemu
Use the same code as system mode, so that we generate the same
exception + syndrome for the unaligned access.
For the moment, if MTE is enabled so that this path is reachable,
this would generate a SIGSEGV in the user-only cpu_loop. Decoding
the syndrome to produce the proper SIGBUS will be done later.
Backports commit 0d1762e931f8a694f261c604daba605bcda70928 from qemu
The current Arm ARM has adjusted the official decode of
"Add/subtract (immediate)" so that the shift field is only bit 22,
and bit 23 is part of the op1 field of the parent category
"Data processing - immediate".
Backports commit 21a8b343eaae63f6984f9a200092b0ea167647f1 from qemu
Cache the composite ATA setting.
Cache when MTE is fully enabled, i.e. access to tags are enabled
and tag checks affect the PE. Do this for both the normal context
and the UNPRIV context.
Backports commit 81ae05fa2d21ac1a0054935b74342aa38a5ecef7 from qemu
This is TFSRE0_EL1, TFSR_EL1, TFSR_EL2, TFSR_EL3,
RGSR_EL1, GCR_EL1, GMID_EL1, and PSTATE.TCO.
Backports commit 4b779cebb3e5ab30b945181f1ba3932f5f8a1cb5 from qemu
Add an option that writes back the PC, like DISAS_UPDATE_EXIT,
but does not exit back to the main loop.
Backports commit 329833286d7a1b0ef8c7daafe13c6ae32429694e from qemu
target/arm: Add support for MTE to HCR_EL2 and SCR_EL3
This does not attempt to rectify all of the res0 bits, but does
clear the mte bits when not enabled. Since there is no high-part
mapping of SCTLR, aa32 mode cannot write to these bits.
Backports commits f00faf130d5dcf64b04f71a95f14745845ca1014, and
8ddb300bf60a5f3d358dd6fbf81174f6c03c1d9f from qemu.
Protect reads of aa64 id registers with ARM_CP_STATE_AA64.
Use this as a simpler test than arm_el_is_aa64, since EL3
cannot change mode.
Backports commit 252e8c69669599b4bcff802df300726300292f47 from qemu
In commit cfdb2c0c95ae9205b0 ("target/arm: Vectorize SABA/UABA") we
replaced the old handling of SABA/UABA with a vectorized implementation
which returns early rather than falling into the loop-ever-elements
code. We forgot to delete the part of the old looping code that
did the accumulate step, and Coverity correctly warns (CID 1428955)
that this code is now dead. Delete it.
Fixes: cfdb2c0c95ae9205b0
Backports commit ced7e8edb282765685d2ba0206a11f8692d8ec1c from qemu
Since commit ba3e7926691ed3 it has been unnecessary for target code
to call gen_io_end() after an IO instruction in icount mode; it is
sufficient to call gen_io_start() before it and to force the end of
the TB.
Many now-unnecessary calls to gen_io_end() were removed in commit
9e9b10c6491153b, but some were missed or accidentally added later.
Remove unneeded calls from the arm target:
* the call in the handling of exception-return-via-LDM is
unnecessary, and the code is already forcing end-of-TB
* the call in the VFP access check code is more complicated:
we weren't ending the TB, so we need to add the code to
force that by setting DISAS_UPDATE
* the doc comment for ARM_CP_IO doesn't need to mention
gen_io_end() any more
Backports commit 55c812b74289863c348449135812027d188f040a from qemu
The functions neon_element_offset(), neon_load_element(),
neon_load_element64(), neon_store_element() and
neon_store_element64() are used only in the translate-neon.inc.c
file, so move their definitions there.
Since the .inc.c file is #included in translate.c this doesn't make
much difference currently, but it's a more logical place to put the
functions and it might be helpful if we ever decide to try to make
the .inc.c files genuinely separate compilation units.
Backports commit 6fb5787898aab6aa04887fed9cf3220dd4c3f36a from qemu
Convert the Neon VTRN insn to decodetree. This is the last insn in the
Neon data-processing group, so we can remove all the now-unused old
decoder framework.
It's possible that there's a more efficient implementation of
VTRN, but for this conversion we just copy the existing approach.
Backports commit d4366190f84fe89cc5d46da995dac1e7d541b98e from qemu
Convert the Neon VSWP insn to decodetree. Since the new implementation
doesn't have to share a pass-loop with the other 2-reg-misc operations
we can implement the swap with 64-bit accesses rather than 32-bits
(which brings us into line with the pseudocode and is more efficient).
Backports commit 8ab3a227a0f13f0ff85846f36f7c466769aef4fc from qemu
Convert the Neon 2-reg-misc VRINT insns to decodetree.
Giving these insns their own do_vrint() function allows us
to change the rounding mode just once at the start and end
rather than doing it for every element in the vector.
Backports commit 128123ea34e9e6afe4842aefcb9cf84b9642ac22 from qemu
Convert the Neon 2-reg-misc insns which are implemented with
simple calls to functions that take the input, output and
fpstatus pointer.
Backports commit 3e96b205286dfb8bbf363229709e4f8648fce379 from qemu
Convert the Neon VQABS and VQNEG insns to decodetree.
Since these are the only ones which need cpu_env passing to
the helper, we wrap the helper rather than creating a whole
new do_2misc_env() function.
Backports commit 4936f38abe6db0a9d23fd04e4cb0cf4d51cff174 from qemu
Convert the remaining ops in the Neon 2-reg-misc group which
can be implemented simply with our do_2misc() helper.
Backports commit 84eae770af69c37a92496a4c4248875c070d5ee3 from qemu
Make gen_swap_half() take a source and destination TCGv_i32 rather
than modifying the input TCGv_i32; we're going to want to be able to
use it with the more flexible function signature, and this also
brings it into line with other functions like gen_rev16() and
gen_revsh().
Backports commit 8ec3de7018a8198624aae49eef5568256114a829 from qemu
All the other typedefs like these spell "Op" with a lowercase 'p';
remane the NeonGenTwoSingleOPFn and NeonGenTwoDoubleOPFn typedefs to
match.
Backports commit 5de3fd045be11b74cd0fbf36c6d4fb8387d5463b from qemu
The NeonGenOneOpFn typedef breaks with the pattern of the other
NeonGen*Fn typedefs, because it is a TCGv_i64 -> TCGv_i64 operation
but it does not have '64' in its name. Rename it to NeonGenOne64OpFn,
so that the old name is available for a TCGv_i32 -> TCGv_i32 operation
(which we will need in a subsequent commit).
Backports commit 039f4e809ad2772fb33de4511ff68a485d875618 from qemu
Convert to decodetree the insns in the Neon 2-reg-misc grouping which
we implement using gvec.
Backports commit 75153179e9928775d5333243ea4b278f438d75ae from qemu
Convert the Neon insns in the 2-reg-misc group which are
VCVT between f32 and f16 to decodetree.
Backports commit 654a517355e249435505ae5ff14a7520410cf7a4 from qemu
Convert the Neon narrowing moves VMQNV, VQMOVN, VQMOVUN in the 2-reg-misc
group to decodetree.
Backports commit 3882bdacb0ad548864b9f2582a32bb5c785e3165 from qemu
Convert the pairwise ops VPADDL and VPADAL in the 2-reg-misc grouping
to decodetree.
At this point we can get rid of the weird CPU_V001 #define that was
used to avoid having to explicitly list all the arguments being
passed to some TCG gen/helper functions.
Backports commit 6106af3aa2304fccee91a3a90138352b0c2af998 from qemu
Convert the Neon VDUP (scalar) insn to decodetree. (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register" insn.)
Backports commit 9aaa23c2ae18e6fb9a291b81baf91341db76dfa0 from qemu
Convert the Neon VTBL, VTBX instructions to decodetree. The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.
Backports commit 54e96c744b70a5d19f14b212a579dd3be8fcaad9 from qemu
Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.
We don't need to special case 0 or 8 immediates any more as the
optimizer is smart enough to throw away the dead code.
Backports commit 0aad761fb0aed40c99039eacac470cbd03d07019 from qemu
Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.
Backports commit 77e576a9281825fc170f3b3af83f47e110549b5c from qemu
Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.
Backports commit 85ac9aef9a5418de3168df569e21258e853840a2 from qemu
Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree. These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.
The refactoring removes some of the oddities of the old decoder:
* operands to the operation and accumulation were often
reversed (taking advantage of the fact that most of these ops
are commutative); the new code follows the pseudocode order
* the Q bit in the insn was in a local variable 'u'; in the
new code it is decoded into a->q
Backports commit 96fc80f5f186decd1a649f6c04252faceb057ad2 from qemu
In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
temporary in do_2shift_env_64(); free it.
Backports commit a4f67e180def790ff0bbb33fc93bb6e80382f041 from qemu
Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.
Backports commit 448f0e5f3ecfbd089b934e5e3aa0ccd1f51a6174 from qemu
Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.
Backports commit 18fb58d588898550919392277787979ee7d0d84e from qemu
Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.
These are the last insns in the group which use the pass-over-each
elements loop, so we can delete that code.
Backports commit 9546ca5998d3cbd98a81b2d46a2e92a11b0f78a4 from qemu
Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.
Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.
Backports commit 3a1d9eb07b767a7592abca642af80906f9eab0ed from qemu
Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.
Backports commit f5b28401200ec95ba89552df3ecdcdc342f6b90b from qemu
Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.
Backports commit 0fa1ab0302badabc3581aefcbb2f189ef52c4985 from qemu
Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.
As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
Backports commit b28be09570d0827969b62b8f82b0f720a9915427 from qemu
The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.
Backports commit 9593a3988c3e788790aa107d778386b09f456a6d from qemu
The miscellaneous control instructions are mutually exclusive
within the t32 decode sub-group.
Backports commit d6084fba47bb9aef79775c1102d4b647eb58c365 from qemu
Convert the insns in the one-register-and-immediate group to decodetree.
In the new decode, our asimd_imm_const() function returns a 64-bit value
rather than a 32-bit one, which means we don't need to treat cmode=14 op=1
as a special case in the decoder (it is the only encoding where the two
halves of the 64-bit value are different).
Backports commit 2c35a39eda0b16c2ed85c94cec204bf5efb97812 from qemu
Convert the VCVT fixed-point conversion operations in the
Neon 2-regs-and-shift group to decodetree.
Backports commit 3da26f11711caeaa18318b6afa14dfb81d7650ab from qemu
Convert the VSHLL and VMOVL insns from the 2-reg-shift group
to decodetree. Since the loop always has two passes, we unroll
it to avoid the awkward reassignment of one TCGv to another.
Backports commit 968bf842742a5ffbb0041cb31089e61a9f7a833d from qemu
Convert the VQSHLU and QVSHL 2-reg-shift insns to decodetree.
These are the last of the simple shift-by-immediate insns.
Backports commit 37bfce81b10450071193c8495a07f182ec652e2a from qemu