Commit graph

476 commits

Author SHA1 Message Date
Peter Maydell edae732810 target/arm: Move general-use constant expanders up in translate.c
The constant-expander functions like negate, plus_2, etc, are
generally useful; move them up in translate.c so we can use them in
the VFP/Neon decoders as well as in the A32/T32/T16 decoders.

Backports f7ed0c9433e7c5c157d2e6235eb5c8b93234a71a
2021-03-03 18:29:32 -05:00
Peter Maydell 2e3bd010a8 target/arm: Implement CLRM instruction
In v8.1M the new CLRM instruction allows zeroing an arbitrary set of
the general-purpose registers and APSR. Implement this.

The encoding is a subset of the LDMIA T2 encoding, using what would
be Rn=0b1111 (which UNDEFs for LDMIA).

Backports 6e21a013fbdf54960a079dccc90772bb622e28e8
2021-03-03 18:00:28 -05:00
Peter Maydell 43d8441881 target/arm: Implement VSCCLRM insn
Implement the v8.1M VSCCLRM insn, which zeros floating point
registers if there is an active floating point context.
This requires support in write_neon_element32() for the MO_32
element size, so add it.

Because we want to use arm_gen_condlabel(), we need to move
the definition of that function up in translate.c so it is
before the #include of translate-vfp.c.inc.

Backports 83ff3d6add965c9752324de11eac5687121ea826
2021-03-03 17:57:30 -05:00
Chetan Pant c7f6786089 arm tcg cpus: Fix Lesser GPL version number
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.

Backports 50f57e09fda4b7ffbc5ba62aad6cebf660824023
2021-03-02 13:30:35 -05:00
Xinhao Zhang b3f63b72a2 target/arm: add space before the open parenthesis '('
Fix code style. Space required before the open parenthesis '('.

Backports 7f350a87e3a85e8a260ce4b133d549a7b2789213
2021-03-02 13:17:48 -05:00
Richard Henderson d473f66177 target/arm: Improve do_prewiden_3d
We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.

Backports 8aab18a2c5209e4e48998a61fbc2d89f374331ed
2021-03-02 13:00:25 -05:00
Richard Henderson 07c2b70234 target/arm: Rename neon_load_reg64 to vfp_load_reg64
The only uses of this function are for loading VFP
double-precision values, and nothing to do with NEON.

Backports b38b96ca90827012ab8eb045c1337cea83a54c4b
2021-03-02 12:43:25 -05:00
Richard Henderson 9d87b62578 target/arm: Add read/write_neon_element64
Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.

Backports 0aa8e700a53b0aa7275ed747b8fa3acb61d35f2d
2021-03-02 12:40:33 -05:00
Richard Henderson 89b1f62878 target/arm: Rename neon_load_reg32 to vfp_load_reg32
The only uses of this function are for loading VFP
single-precision values, and nothing to do with NEON.

Backports 21c1c0e50b73c580c6bfc8f2314d1b6a14793561
2021-03-02 12:30:20 -05:00
Richard Henderson 011d9ab061 target/arm: Expand read/write_neon_element32 to all MemOp
We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than inserting or
extracting from a 32-bit quantity.

These were the last uses of neon_load/store_reg, so remove them.

Backports 4d5fa5a80ac28f34b8497be1e85371272413a12e
2021-03-02 12:26:41 -05:00
Richard Henderson d21316d639 target/arm: Add read/write_neon_element32
Model these off the aa64 read/write_vec_element functions.
Use it within translate-neon.c.inc. The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Backports a712266f5d5a36d04b22fe69fa15592d62bed019
2021-03-02 12:18:31 -05:00
Richard Henderson e390c1ec7f target/arm: Use neon_element_offset in vfp_reg_offset
This seems a bit more readable than using offsetof CPU_DoubleU.

Backports d8719785fde2f5041986853a314c05c6f567d3cb
2021-03-02 11:55:49 -05:00
Richard Henderson c1ca9e53da target/arm: Use neon_element_offset in neon_load/store_reg
These are the only users of neon_reg_offset, so remove that.

Backports 0f2cdc82276a723ee58562b56b9d537a4bd7bfef
2021-03-02 11:54:56 -05:00
Richard Henderson 1b09d0d96f target/arm: Move neon_element_offset to translate.c
This will shortly have users outside of translate-neon.c.inc.

Backports 7ec85c02833f4264840c6ed78b749443a7b4ffe0
2021-03-02 11:52:59 -05:00
Richard Henderson 8a20537e7f target/arm: Introduce neon_full_reg_offset
This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0. This fixes a bug
when running on a big-endian host.

Backports 015ee81a4c06b644969f621fd9965cc6372b879e
2021-03-02 11:50:36 -05:00
Peter Maydell 3ae5543825 target/arm: Implement v8.1M low-overhead-loop instructions
v8.1M's "low-overhead-loop" extension has three instructions
for looping:
* DLS (start of a do-loop)
* WLS (start of a while-loop)
* LE (end of a loop)

The loop-start instructions are both simple operations to start a
loop whose iteration count (if any) is in LR. The loop-end
instruction handles "decrement iteration count and jump back to loop
start"; it also caches the information about the branch back to the
start of the loop to improve performance of the branch on subsequent
iterations.

As with the branch-future instructions, the architecture permits an
implementation to discard the LO_BRANCH_INFO cache at any time, and
QEMU takes the IMPDEF option to never set it in the first place
(equivalent to discarding it immediately), because for us a "real"
implementation would be unnecessary complexity.

(This implementation only provides the simple looping constructs; the
vector extension MVE (Helium) adds some extra variants to handle
looping across vectors. We'll add those later when we implement
MVE.)

Backports commit b7226369721896ab9ef71544e4fe95b40710e05a
2021-03-01 20:29:04 -05:00
Peter Maydell be197f9857 target/arm: Implement v8.1M branch-future insns (as NOPs)
v8.1M implements a new 'branch future' feature, which is a
set of instructions that request the CPU to perform a branch
"in the future", when it reaches a particular execution address.
In hardware, the expected implementation is that the information
about the branch location and destination is cached and then
acted upon when execution reaches the specified address.
However the architecture permits an implementation to discard
this cached information at any point, and so guest code must
always include a normal branch insn at the branch point as
a fallback. In particular, an implementation is specifically
permitted to treat all BF insns as NOPs (which is equivalent
to discarding the cached information immediately).

For QEMU, implementing this caching of branch information
would be complicated and would not improve the speed of
execution at all, so we make the IMPDEF choice to implement
all BF insns as NOPs.

Backports commit 05903f036edba8e3ed940cc215b8e27fb49265b9
2021-03-01 20:25:15 -05:00
Peter Maydell 966246d991 target/arm: Don't allow BLX imm for M-profile
The BLX immediate insn in the Thumb encoding always performs
a switch from Thumb to Arm state. This would be totally useless
in M-profile which has no Arm decoder, and so the instruction
does not exist at all there. Make the encoding UNDEF for M-profile.

(This part of the encoding space is used for the branch-future
and low-overhead-loop insns in v8.1M.)

Backports 920f04fa3ea789f8f85a52cee5395b8887b56cf7
2021-03-01 20:23:59 -05:00
Peter Maydell 666fe17025 target/arm: Implement v8.1M conditional-select insns
v8.1M brings four new insns to M-profile:
* CSEL : Rd = cond ? Rn : Rm
* CSINC : Rd = cond ? Rn : Rm+1
* CSINV : Rd = cond ? Rn : ~Rm
* CSNEG : Rd = cond ? Rn : -Rm

Implement these.

Backports cc73bbded0dfb5612b0e416f7eda13a66950542a
2021-03-01 20:19:33 -05:00
Peter Maydell 31013d5a8f target/arm: Fix SMLAD incorrect setting of Q bit
The SMLAD instruction is supposed to:
* signed multiply Rn[15:0] * Rm[15:0]
* signed multiply Rn[31:16] * Rm[31:16]
* perform a signed addition of the products and Ra
* set Rd to the low 32 bits of the theoretical
infinite-precision result
* set the Q flag if the sign-extension of Rd
would differ from the infinite-precision result
(ie on overflow)

Our current implementation doesn't quite do this, though: it performs
an addition of the products setting Q on overflow, and then it adds
Ra, again possibly setting Q. This sometimes incorrectly sets Q when
the architecturally mandated only-check-for-overflow-once algorithm
does not. For instance:
r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
smlad r0, r1, r2, r3
This is (-32768 * -32768) + (-32768 * -32768) - 1

The products are both 0x4000_0000, so when added together as 32-bit
signed numbers they overflow (and QEMU sets Q), but because the
addition of Ra == -1 brings the total back down to 0x7fff_ffff
there is no overflow for the complete operation and setting Q is
incorrect.

Fix this edge case by resorting to 64-bit arithmetic for the
case where we need to add three values together.

Backports commit 5288145d716338ace0f83e3ff05c4d07715bb4f4
2021-03-01 19:58:39 -05:00
Peter Maydell b1b0a41507 target/arm: Make A32/T32 use new fpstatus_ptr() API
Make A32/T32 code use the new fpstatus_ptr() API:
get_fpstatus_ptr(0) -> fpstatus_ptr(FPST_FPCR)
get_fpstatus_ptr(1) -> fpstatus_ptr(FPST_STD)

Backports a84d1d1316726704edd2617b2c30c921d98a8137
2021-02-26 11:55:55 -05:00
Peter Maydell e0000d1700 target/arm/translate.c: Delete/amend incorrect comments
In arm_tr_init_disas_context() we have a FIXME comment that suggests
"cpu_M0 can probably be the same as cpu_V0". This isn't in fact
possible: cpu_V0 is used as a temporary inside gen_iwmmxt_shift(),
and that function is called in various places where cpu_M0 contains a
live value (i.e. between gen_op_iwmmxt_movq_M0_wRn() and
gen_op_iwmmxt_movq_wRn_M0() calls). Remove the comment.

We also have a comment on the declarations of cpu_V0/V1/M0 which
claims they're "for efficiency". This isn't true with modern TCG, so
replace this comment with one which notes that they're only used with
the iwmmxt decode

Backports 8b4c9a50dc9531a729ae4b5941d287ad0422db48
2021-02-26 11:23:52 -05:00
Peter Maydell 0759bb8eaf target/arm: Delete unused VFP_DREG macros
As part of the Neon decodetree conversion we removed all
the uses of the VFP_DREG macros, but forgot to remove the
macro definitions. Do so now.

Backports e60527c5d501e5015a119a0388a27abeae4dac09
2021-02-26 11:22:01 -05:00
Peter Maydell 368323b03f target/arm: Remove ARCH macro
The ARCH() macro was used a lot in the legacy decoder, but
there are now just two uses of it left. Since a macro which
expands out to a goto is liable to be confusing when reading
code, replace the last two uses with a simple open-coded
qeuivalent.

Backports ce51c7f522ca488c795c3510413e338021141c96
2021-02-26 11:21:20 -05:00
Peter Maydell 5d9c0addcf target/arm: Convert T32 coprocessor insns to decodetree
Convert the T32 coprocessor instructions to decodetree.
As with the A32 conversion, this corrects an underdecoding
where we did not check that MRRC/MCRR [24:21] were 0b0010
and so treated some kinds of LDC/STC and MRRC/MCRR rather
than UNDEFing them.

Backports commit 4c498dcfd84281f20bd55072630027d1b3c115fd
2021-02-26 11:19:35 -05:00
Peter Maydell bdaaac68f5 target/arm: Do M-profile NOCP checks early and via decodetree
For M-profile CPUs, the architecture specifies that the NOCP
exception when a coprocessor is not present or disabled should cover
the entire wide range of coprocessor-space encodings, and should take
precedence over UNDEF exceptions. (This is the opposite of
A-profile, where checking for a disabled FPU has to happen last.)

Implement this with decodetree patterns that cover the specified
ranges of the encoding space. There are a few instructions (VLLDM,
VLSTM, and in v8.1 also VSCCLRM) which are in copro-space but must
not be NOCP'd: these must be handled also in the new m-nocp.decode so
they take precedence.

This is a minor behaviour change: for unallocated insn patterns in
the VFP area (cp=10,11) we will now NOCP rather than UNDEF when the
FPU is disabled.

As well as giving us the correct architectural behaviour for v8.1M
and the recommended behaviour for v8.0M, this refactoring also
removes the old NOCP handling from the remains of the 'legacy
decoder' in disas_thumb2_insn(), paving the way for cleaning that up.

Since we don't currently have a v8.1M feature bit or any v8.1M CPUs,
the minor changes to this logic that we'll need for v8.1M are marked
up with TODO comments.

Backports commit a3494d4671797c291c88bd414acb0aead15f7239 from qemu
2021-02-26 11:17:23 -05:00
Peter Maydell c675b73b1f target/arm: Tidy up disas_arm_insn()
The only thing left in the "legacy decoder" is the handling
of disas_xscale_insn(), and we can simplify the code.

Backports commit 8198c071bc55bee55ef4f104a5b125f541b51096
2021-02-26 10:59:09 -05:00
Peter Maydell fc4cc9d95f target/arm: Convert A32 coprocessor insns to decodetree
Convert the A32 coprocessor instructions to decodetree.

Note that this corrects an underdecoding: for the 64-bit access case
(MRRC/MCRR) we did not check that bits [24:21] were 0b0010, so we
would incorrectly treat LDC/STC as MRRC/MCRR rather than UNDEFing
them.

The decodetree versions of these insns assume the coprocessor
is in the range 0..7 or 14..15. This is architecturally sensible
(as per the comments) and OK in practice for QEMU because the only
uses of the ARMCPRegInfo infrastructure we have that aren't
for coprocessors 14 or 15 are the pxa2xx use of coprocessor 6.
We add an assertion to the define_one_arm_cp_reg_with_opaque()
function to catch any accidental future attempts to use it to
define coprocessor registers for invalid coprocessors.

Backports commit cd8be50e58f63413c033531d3273c0e44851684f from qemu
2021-02-26 10:57:00 -05:00
Peter Maydell ef0e23f1f9 target/arm: Separate decode from handling of coproc insns
As a prelude to making coproc insns use decodetree, split out the
part of disas_coproc_insn() which does instruction decoding from the
part which does the actual work, and make do_coproc_insn() handle the
UNDEF-on-bad-permissions and similar cases itself rather than
returning 1 to eventually percolate up to a callsite that calls
unallocated_encoding() for it.

Backports 19c23a9baafc91dd3881a7a4e9bf454e42d24e4e
2021-02-26 10:53:52 -05:00
Peter Maydell 2944a75b98 target/arm: Pull handling of XScale insns out of disas_coproc_insn()
At the moment we check for XScale/iwMMXt insns inside
disas_coproc_insn(): for CPUs with ARM_FEATURE_XSCALE all copro insns
with cp 0 or 1 are handled specially. This works, but is an odd
place for this check, because disas_coproc_insn() is called from both
the Arm and Thumb decoders but the XScale case never applies for
Thumb (all the XScale CPUs were ARMv5, which has only Thumb1, not
Thumb2 with the 32-bit coprocessor insn encodings). It also makes it
awkward to convert the real copro access insns to decodetree.

Move the identification of XScale out to its own function
which is only called from disas_arm_insn().

Backports commit 7b4f933db865391a90a3b4518bb2050a83f2a873 from qemu
2021-02-26 10:50:32 -05:00
Richard Henderson 179a3aacdf target/arm: Add DISAS_UPDATE_NOCHAIN
Add an option that writes back the PC, like DISAS_UPDATE_EXIT,
but does not exit back to the main loop.

Backports commit 329833286d7a1b0ef8c7daafe13c6ae32429694e from qemu
2021-02-25 14:08:08 -05:00
Richard Henderson eaa6291aa7 target/arm: Rename DISAS_UPDATE to DISAS_UPDATE_EXIT
Emphasize that the is_jmp option exits to the main loop.

Backports commit 14407ec2007e18536ed34772eef46f6e0a0e3d0e from qemu
2021-02-25 14:02:46 -05:00
Peter Maydell 167ed57625 target/arm: Remove unnecessary gen_io_end() calls
Since commit ba3e7926691ed3 it has been unnecessary for target code
to call gen_io_end() after an IO instruction in icount mode; it is
sufficient to call gen_io_start() before it and to force the end of
the TB.

Many now-unnecessary calls to gen_io_end() were removed in commit
9e9b10c6491153b, but some were missed or accidentally added later.
Remove unneeded calls from the arm target:

* the call in the handling of exception-return-via-LDM is
unnecessary, and the code is already forcing end-of-TB
* the call in the VFP access check code is more complicated:
we weren't ending the TB, so we need to add the code to
force that by setting DISAS_UPDATE
* the doc comment for ARM_CP_IO doesn't need to mention
gen_io_end() any more

Backports commit 55c812b74289863c348449135812027d188f040a from qemu
2021-02-25 13:17:32 -05:00
Peter Maydell 083d207fb0 target/arm: Move some functions used only in translate-neon.inc.c to that file
The functions neon_element_offset(), neon_load_element(),
neon_load_element64(), neon_store_element() and
neon_store_element64() are used only in the translate-neon.inc.c
file, so move their definitions there.

Since the .inc.c file is #included in translate.c this doesn't make
much difference currently, but it's a more logical place to put the
functions and it might be helpful if we ever decide to try to make
the .inc.c files genuinely separate compilation units.

Backports commit 6fb5787898aab6aa04887fed9cf3220dd4c3f36a from qemu
2021-02-25 13:15:23 -05:00
Peter Maydell 0b06317dc4 target/arm: Convert Neon VTRN to decodetree
Convert the Neon VTRN insn to decodetree. This is the last insn in the
Neon data-processing group, so we can remove all the now-unused old
decoder framework.

It's possible that there's a more efficient implementation of
VTRN, but for this conversion we just copy the existing approach.

Backports commit d4366190f84fe89cc5d46da995dac1e7d541b98e from qemu
2021-02-25 13:12:28 -05:00
Peter Maydell b7584069dd target/arm: Convert Neon VSWP to decodetree
Convert the Neon VSWP insn to decodetree. Since the new implementation
doesn't have to share a pass-loop with the other 2-reg-misc operations
we can implement the swap with 64-bit accesses rather than 32-bits
(which brings us into line with the pseudocode and is more efficient).

Backports commit 8ab3a227a0f13f0ff85846f36f7c466769aef4fc from qemu
2021-02-25 13:07:56 -05:00
Peter Maydell 73abdfea53 target/arm: Convert Neon 2-reg-misc VCVT insns to decodetree
Convert the VCVT instructions in the 2-reg-misc grouping to
decodetree.

Backports commit a183d5fb38b07bab2a840196186c4806f3c67c0d from qemu
2021-02-25 13:07:15 -05:00
Peter Maydell 7e705fdc8c target/arm: Convert Neon 2-reg-misc VRINT insns to decodetree
Convert the Neon 2-reg-misc VRINT insns to decodetree.
Giving these insns their own do_vrint() function allows us
to change the rounding mode just once at the start and end
rather than doing it for every element in the vector.

Backports commit 128123ea34e9e6afe4842aefcb9cf84b9642ac22 from qemu
2021-02-25 13:02:24 -05:00
Peter Maydell 3eddb77327 target/arm: Convert Neon 2-reg-misc fp-compare-with-zero insns to decodetree
Convert the fp-compare-with-zero insns in the Neon 2-reg-misc group to
decodetree.

Backports commit baa59323e841f76523f6ad4d746cdeb47ea574cd from qemu
2021-02-25 12:59:22 -05:00
Peter Maydell 6eb852ec1c target/arm: Convert simple fp Neon 2-reg-misc insns
Convert the Neon 2-reg-misc insns which are implemented with
simple calls to functions that take the input, output and
fpstatus pointer.

Backports commit 3e96b205286dfb8bbf363229709e4f8648fce379 from qemu
2021-02-25 12:56:28 -05:00
Peter Maydell 3dcee11013 target/arm: Convert Neon VQABS, VQNEG to decodetree
Convert the Neon VQABS and VQNEG insns to decodetree.
Since these are the only ones which need cpu_env passing to
the helper, we wrap the helper rather than creating a whole
new do_2misc_env() function.

Backports commit 4936f38abe6db0a9d23fd04e4cb0cf4d51cff174 from qemu
2021-02-25 12:53:18 -05:00
Peter Maydell 4033a3ca5c target/arm: Convert remaining simple 2-reg-misc Neon ops
Convert the remaining ops in the Neon 2-reg-misc group which
can be implemented simply with our do_2misc() helper.

Backports commit 84eae770af69c37a92496a4c4248875c070d5ee3 from qemu
2021-02-25 12:50:55 -05:00
Peter Maydell 88f8111500 target/arm: Convert Neon 2-reg-misc VREV32 and VREV16 to decodetree
Convert the VREV32 and VREV16 insns in the Neon 2-reg-misc group
to decodetree.

Backports commit 8966808205b59d6c196b380b638475bcd1657ef4 from qemu
2021-02-25 12:49:16 -05:00
Peter Maydell db1e503708 target/arm: Make gen_swap_half() take separate src and dest
Make gen_swap_half() take a source and destination TCGv_i32 rather
than modifying the input TCGv_i32; we're going to want to be able to
use it with the more flexible function signature, and this also
brings it into line with other functions like gen_rev16() and
gen_revsh().

Backports commit 8ec3de7018a8198624aae49eef5568256114a829 from qemu
2021-02-25 12:40:23 -05:00
Peter Maydell 27e74962e5 target/arm: Convert Neon 2-reg-misc crypto operations to decodetree
Convert the Neon-2-reg misc crypto ops (AESE, AESMC, SHA1H, SHA1SU1)
to decodetree.

Backports commit 0b30dd5b85e20aba259768cb7aaa952b3e319468 from qemu
2021-02-25 12:32:39 -05:00
Peter Maydell 4354448f57 target/arm: Convert vectorised 2-reg-misc Neon ops to decodetree
Convert to decodetree the insns in the Neon 2-reg-misc grouping which
we implement using gvec.

Backports commit 75153179e9928775d5333243ea4b278f438d75ae from qemu
2021-02-25 12:28:31 -05:00
Peter Maydell 6301f9acaa target/arm: Convert Neon VCVT f16/f32 insns to decodetree
Convert the Neon insns in the 2-reg-misc group which are
VCVT between f32 and f16 to decodetree.

Backports commit 654a517355e249435505ae5ff14a7520410cf7a4 from qemu
2021-02-25 12:25:32 -05:00
Peter Maydell 4ca33c54a2 target/arm: Convert Neon 2-reg-misc VSHLL to decodetree
Convert the VSHLL insn in the 2-reg-misc Neon group to decodetree.

Backports commit 749e2be36d75f11d5fa8f8277e2a0569bd2a1c97 from qemu
2021-02-25 12:20:57 -05:00
Peter Maydell 48d57d0dc7 target/arm: Convert Neon narrowing moves to decodetree
Convert the Neon narrowing moves VMQNV, VQMOVN, VQMOVUN in the 2-reg-misc
group to decodetree.

Backports commit 3882bdacb0ad548864b9f2582a32bb5c785e3165 from qemu
2021-02-25 12:18:01 -05:00
Peter Maydell 35d8a3e83f target/arm: Convert VZIP, VUZP to decodetree
Convert the Neon VZIP and VUZP insns in the 2-reg-misc group to
decodetree.

Backports commit 567663a2af2457da8aa74f221b1f3f8a6d2eddf6 from qemu
2021-02-25 12:14:29 -05:00