unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-12-25 13:15:39 +00:00

Author	SHA1	Message	Date
Peter Maydell	67ad40ffa4	target/arm: Convert VFP VMLS to decodetree Convert the VFP VMLS instruction to decodetree. Backports commit e7258280d46af4ab6a0cc93ccfe8f6614defb4b7 from qemu	2019-06-13 18:02:37 -04:00
Peter Maydell	edf81eb214	target/arm: Convert VFP VMLA to decodetree Convert the VFP VMLA instruction to decodetree. This is the first of the VFP 3-operand data processing instructions, so we include in this patch the code which loops over the elements for an old-style VFP vector operation. The existing code to do this looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since we are going to be converting instructions one at a time anyway we can take the opportunity to make the new loop use TCG temporaries, which means we can do that conversion one operation at a time rather than needing to do it all in one go. We include an UNDEF check which was missing in the old code: short-vector operations (with stride or length non-zero) were deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec field does not indicate that support for short vectors is present we UNDEF the operations that would use them. (This is a change of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which previously were all incorrectly allowing short-vector operations.) Note that the conversion fixes a bug in the old code for the case of VFP short-vector "mixed scalar/vector operations". These happen where the destination register is in a vector bank but but the second operand is in a scalar bank. For example vmla.f64 d10, d1, d16 with length 2 stride 2 is equivalent to the pair of scalar operations vmla.f64 d10, d1, d16 vmla.f64 d8, d3, d16 where the destination and first input register cycle through their vector but the second input is scalar (d16). In the old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d} as a temporary output for the multiply, which trashes the second input operand. For the fully-scalar case (where we never do a second iteration) and the fully-vector case (where the loop loads the new second input operand) this doesn't matter, but for the mixed scalar/vector case we will end up using the wrong value for later loop iterations. In the new code we use TCG temporaries and so avoid the bug. This bug is present for all the multiply-accumulate insns that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS. Note 2: the expression used to calculate the next register number in the vector bank is not in fact correct; we leave this behaviour unchanged from the old decoder and will fix this bug later in the series. Backports commit 266bd25c485597c94209bfdb3891c1d0c573c164 from qemu	2019-06-13 17:59:16 -04:00
Peter Maydell	93fe4cbe9e	target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans functions which perform the memory accesses by going via the TCG globals cpu_F0s and cpu_F0d, to use local TCG temps instead. Backports commit 3993d0407dff7233e42f2251db971e126a0497e9 from qemu	2019-06-13 17:31:28 -04:00
Peter Maydell	ff7042567e	target/arm: Convert the VFP load/store multiple insns to decodetree Convert the VFP load/store multiple insns to decodetree. This includes tightening up the UNDEF checking for pre-VFPv3 CPUs which only have D0-D15 : they now UNDEF for any access to D16-D31, not merely when the smallest register in the transfer list is in D16-D31. This conversion does not try to share code between the single precision and the double precision versions; this looks a bit duplicative of code, but it leaves the door open for a future refactoring which gets rid of the use of the "F0" registers by inlining the various functions like gen_vfp_ld() and gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }" conditionalisation. Backports commit fa288de272c5c8a66d5eb683b123706a52bc7ad6 from qemu	2019-06-13 17:26:52 -04:00
Peter Maydell	6f0633ce80	target/arm: Convert VFP VLDR and VSTR to decodetree Convert the VFP single load/store insns VLDR and VSTR to decodetree. Backports commit 79b02a3b5231c5b8cd31e50cd549968dd0a05c49 from qemu	2019-06-13 17:22:48 -04:00
Peter Maydell	fe98885ff2	target/arm: Convert VFP two-register transfer insns to decodetree Convert the VFP two-register transfer instructions to decodetree (in the v8 Arm ARM these are the "Advanced SIMD and floating-point 64-bit move" encoding group). Again, we expand out the sequences involving gen_vfp_msr() and gen_msr_vfp(). Backports commit 81f681106eabe21c55118a5a41999fb7387fb714 from qemu	2019-06-13 17:20:00 -04:00
Peter Maydell	3fb3403b82	target/arm: Convert single-precision register moves to decodetree Convert the "single-precision" register moves to decodetree: * VMSR * VMRS * VMOV between general purpose register and single precision Note that the VMSR/VMRS conversions make our handling of the "should this UNDEF?" checks consistent between the two instructions: * VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0 (previously was a nop) * VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better (previously was a nop) * VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better (previously would write to the register, which had no guest-visible effect because we always UNDEF reads) We also tighten up the decode: we were previously underdecoding some SBZ or SBO bits. The conversion of VMOV_single includes the expansion out of the gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr() sequences into the simpler direct load/store of the TCG temp via neon_{load,store}_reg32(): we know in the new function that we're always single-precision, we don't need to use the old-and-deprecated cpu_F0* TCG globals, and we don't happen to have the declaration of gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the new function is. Backports commit a9ab50011aeda2dd012da99069e078379315ea18 from qemu	2019-06-13 17:16:38 -04:00
Peter Maydell	694058da94	target/arm: Convert double-precision register moves to decodetree Convert the "double-precision" register moves to decodetree: this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP. Note that the conversion process has tightened up a few of the UNDEF encoding checks: we now correctly forbid: * VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10 * VMOV-from-gpr with opc1:opc2 == 0x10 * VDUP with B:E == 11 * VDUP with Q == 1 and Vn<0> == 1 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- The accesses of elements < 32 bits could be improved by doing direct ld/st of the right size rather than 32-bit read-and-shift or read-modify-write, but we leave this for later cleanup, since this series is generally trying to stick to fixing the decode. Backports commit 9851ed9269d214c0c6feba960dd14ff09e6c34b4 from qemu	2019-06-13 17:11:56 -04:00
Peter Maydell	7265161108	target/arm: Add helpers for VFP register loads and stores The current VFP code has two different idioms for loading and storing from the VFP register file: 1 using the gen_mov_F0_vreg() and similar functions, which load and store to a fixed set of TCG globals cpu_F0s, CPU_F0d, etc 2 by direct calls to tcg_gen_ld_f64() and friends We want to phase out idiom 1 (because the use of the fixed globals is a relic of a much older version of TCG), but idiom 2 is quite longwinded: tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg)) requires us to specify the 64-bitness twice, once in the function name and once by passing 'true' to vfp_reg_offset(). There's no guard against accidentally passing the wrong flag. Instead, let's move to a convention of accessing 64-bit registers via the existing neon_load_reg64() and neon_store_reg64(), and provide new neon_load_reg32() and neon_store_reg32() for the 32-bit equivalents. Implement the new functions and use them in the code in translate-vfp.inc.c. We will convert the rest of the VFP code as we do the decodetree conversion in subsequent commits. Backports commit 160f3b64c5cc4c8a09a1859edc764882ce6ad6bf from qemu	2019-06-13 17:01:59 -04:00
Peter Maydell	033a386ffb	target/arm: Move the VFP trans_* functions to translate-vfp.inc.c Move the trans_*() functions we've just created from translate.c to translate-vfp.inc.c. This is pure code motion with no textual changes (this can be checked with 'git show --color-moved'). Backports commit f7bbb8f31f0761edbf0c64b7ab3c3f49c13612ea from qemu	2019-06-13 16:56:24 -04:00
Peter Maydell	3994dfd079	target/arm: Convert the VSEL instructions to decodetree Convert the VSEL instructions to decodetree. We leave trans_VSEL() in translate.c for now as this allows the patch to show just the changes from the old handle_vsel(). In the old code the check for "do D16-D31 exist" was hidden in the VFP_DREG macro, and assumed that VFPv3 always implied that D16-D31 exist. In the new code we do the correct ID register test. This gives identical behaviour for most of our CPUs, and fixes previously incorrect handling for Cortex-R5F, Cortex-M4 and Cortex-M33, which all implement VFPv3 or better with only 16 double-precision registers. Backports commit b3ff4b87b4ae08120a51fe12592725e1dca8a085 from qemu	2019-06-13 16:41:22 -04:00
Lioncash	b3cfede44f	target/arm: Make load_cpu_offset() take a DisasContext* instead of uc_struct* Keeps it consistent with store_cpu_offset	2019-06-13 16:35:31 -04:00
Peter Maydell	78997058e4	target/arm: Factor out VFP access checking code Factor out the VFP access checking code so that we can use it in the leaf functions of the decodetree decoder. We call the function full_vfp_access_check() so we can keep the more natural vfp_access_check() for a version which doesn't have the 'ignore_vfp_enabled' flag -- that way almost all VFP insns will be able to use vfp_access_check(s) and only the special-register access function will have to use full_vfp_access_check(s, ignore_vfp_enabled). Backports commit 06db8196bba34776829020192ed623a0b22e6557 from qemu	2019-06-13 16:33:38 -04:00
Peter Maydell	9732ebba5c	target/arm: Add stubs for AArch32 VFP decodetree Add the infrastructure for building and invoking a decodetree decoder for the AArch32 VFP encodings. At the moment the new decoder covers nothing, so we always fall back to the existing hand-written decode. We need to have one decoder for the unconditional insns and one for the conditional insns, as otherwise the patterns for conditional insns would incorrectly match against the unconditional ones too. Since translate.c is over 14,000 lines long and we're going to be touching pretty much every line of the VFP code as part of the decodetree conversion, we create a new translate-vfp.inc.c to hold the code which deals with VFP in the new scheme. It should be possible to convert this into a standalone translation unit eventually, but the conversion process will be much simpler if we simply #include it midway through translate.c to start with. Backports commit 78e138bc1f672c145ef6ace74617db00eebaa2ba from qemu	2019-06-13 16:24:37 -04:00

14 commits