unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-07-05 14:20:43 +00:00

Author	SHA1	Message	Date
Peter Maydell	08b70267d0	target/arm: Implement VFP fp16 VMOV between gp and halfprec registers Implement the VFP fp16 variant of VMOV that transfers a 16-bit value between a general purpose register and a VFP register. Note that Rt == 15 is UNPREDICTABLE; since this insn is v8 and later only we have no need to replicate the old "updates CPSR.NZCV" behaviour that the singleprec version of this insn does Backports commit 46a4b854525cb9f34a611f6ada6cdff1eab0ac2d	2021-03-01 16:26:34 -05:00
Peter Maydell	90aa9647e0	target/arm: Implement VFP fp16 VRINT* Implement the fp16 version of the VFP VRINT* insns. Backports 0a6f4b4cb338665b81ad824d9a6868932461b7f7	2021-03-01 16:15:21 -05:00
Peter Maydell	74a6af4e23	target/arm: Implement VFP fp16 VCVT between float and fixed-point Implement the fp16 versions of the VFP VCVT instruction forms which convert between floating point and fixed-point. Backports a149e2de0b63e3906729ed1d3df7d9ecdb6de5e6	2021-02-28 05:15:40 -05:00
Peter Maydell	f8241ae22f	target/arm: Implement VFP fp16 VCVT between float and integer Backports 0094e9f475a5a742d10d2f1e1beceea82b69f982	2021-02-28 05:02:25 -05:00
Peter Maydell	ac9ae5cbe7	target/arm: Implement VFP fp16 VLDR and VSTR Implement the fp16 versions of the VFP VLDR/VSTR (immediate). Backports commit 274afbb121107b8aaeaa11b3e7904d5f8ae38a94	2021-02-28 04:58:32 -05:00
Peter Maydell	5d98e14545	target/arm: Implement VFP fp16 VCMP Implement fp16 version of VCMP. Backports 1b88b054c5b201e8581114d29527c6a5a7e088c9	2021-02-28 04:56:24 -05:00
Peter Maydell	25d95570f3	target/arm: Implement VFP fp16 for VMOV immediate Implement VFP fp16 support for the VMOV immediate insn. Backports commit 28c28728e53c9f4c13a5cd50f313788c7ec2f9ad	2021-02-28 04:51:11 -05:00
Peter Maydell	2d9abf7c0b	target/arm: Implement VFP fp16 for VABS, VNEG, VSQRT Implement VFP fp16 for VABS, VNEG and VSQRT. This is all the fp16 insns that use the DO_VFP_2OP macro, because there is no fp16 version of VMOV_reg. Notes: * the gen_helper_vfp_negh already exists as we needed to create it for the fp16 multiply-add insns * as usual we need to use the f16 version of the fp_status; this is only relevant for VSQRT Backports ce2d65a5d191380756cdac7a1fd1ba76bd1621cf	2021-02-28 04:48:28 -05:00
Peter Maydell	6ac2c597ab	target/arm: Implement VFP fp16 for fused-multiply-add Implement VFP fp16 support for fused multiply-add insns VFNMA, VFNMS, VFMA, VFMS. Backports 9886fe2834b064a3cf0675a4659942ed547aed42	2021-02-28 04:39:21 -05:00
Peter Maydell	a42ecfe203	target/arm: Implement VFP fp16 VMLA, VMLS, VNMLS, VNMLA, VNMUL Implement fp16 versions of the VFP VMLA, VMLS, VNMLS, VNMLA, VNMUL instructions. (These are all the remaining ones which we implement via do_vfp_3op_[hsd]p().) Backports commit e7cb0ded52c6d7b86585b09935fe7caeb9e38b69	2021-02-28 04:29:37 -05:00
Peter Maydell	eae621098d	target/arm: Implement VFP fp16 for VFP_BINOP operations Implmeent VFP fp16 support for simple binary-operator VFP insns VADD, VSUB, VMUL, VDIV, VMINNM and VMAXNM: * make the VFP_BINOP() macro generate float16 helpers as well as float32 and float64 * implement a do_vfp_3op_hp() function similar to the existing do_vfp_3op_sp() * add decode for the half-precision insn patterns Note that the VFP_BINOP macro use creates a couple of unused helper functions vfp_maxh and vfp_minh, but they're small so it's not worth splitting the BINOP operations into "needs halfprec" and "no halfprec" groups. Backports commit 120a0eb3ea23a5b06fae2f3daebd46a4035864cf	2021-02-28 04:24:39 -05:00
Peter Maydell	bdaaac68f5	target/arm: Do M-profile NOCP checks early and via decodetree For M-profile CPUs, the architecture specifies that the NOCP exception when a coprocessor is not present or disabled should cover the entire wide range of coprocessor-space encodings, and should take precedence over UNDEF exceptions. (This is the opposite of A-profile, where checking for a disabled FPU has to happen last.) Implement this with decodetree patterns that cover the specified ranges of the encoding space. There are a few instructions (VLLDM, VLSTM, and in v8.1 also VSCCLRM) which are in copro-space but must not be NOCP'd: these must be handled also in the new m-nocp.decode so they take precedence. This is a minor behaviour change: for unallocated insn patterns in the VFP area (cp=10,11) we will now NOCP rather than UNDEF when the FPU is disabled. As well as giving us the correct architectural behaviour for v8.1M and the recommended behaviour for v8.0M, this refactoring also removes the old NOCP handling from the remains of the 'legacy decoder' in disas_thumb2_insn(), paving the way for cleaning that up. Since we don't currently have a v8.1M feature bit or any v8.1M CPUs, the minor changes to this logic that we'll need for v8.1M are marked up with TODO comments. Backports commit a3494d4671797c291c88bd414acb0aead15f7239 from qemu	2021-02-26 11:17:23 -05:00
Richard Henderson	303d922e5d	target/arm: Split VFM decode Passing the raw o1 and o2 fields from the manual is less instructive than it might be. Do the full decode and let the trans_* functions pass in booleans to a helper. Backports commit d486f8308a13543bbcc4887f246e856df991a4bc from qemu	2020-03-22 00:07:53 -04:00
Richard Henderson	a445b1dab9	target/arm: Add formats for some vfp 2 and 3-register insns Those vfp instructions without extra opcode fields can share a common @format for brevity. Backports commit 906b60facc3d3dd3af56cb1a7860175d805e10a3 from qemu	2020-03-22 00:05:27 -04:00
Richard Henderson	f1ce64857c	target/arm: Move VLLDM and VLSTM to vfp.decode Now that we no longer have an early check for ARM_FEATURE_VFP, we can use the proper ISA check in trans_VLLDM_VLSTM. Backports commit dc778a6873f534817a13257be2acba3ca87ec015 from qemu	2020-03-21 23:51:59 -04:00
Peter Maydell	ba0ddd3459	target/arm: Use vfp_expand_imm() for AArch32 VFP VMOV_imm The AArch32 VMOV (immediate) instruction uses the same VFP encoded immediate format we already handle in vfp_expand_imm(). Use that function rather than hand-decoding it. Backports commit 9bee50b498410ed6466018b26464d7384c7879e9 from qemu	2019-06-25 18:20:19 -05:00
Peter Maydell	1a0d31c05e	target/arm: Convert float-to-integer VCVT insns to decodetree Convert the float-to-integer VCVT instructions to decodetree. Since these are the last unconverted instructions, we can delete the old decoder structure entirely now. Backports commit 3111bfc2da6ba0c8396dc97ca479942d711c6146 from qemu	2019-06-13 19:40:02 -04:00
Peter Maydell	f6c67559d4	target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree Convert the VCVT (between floating-point and fixed-point) instructions to decodetree. Backports commit e3d6f4290c788e850c64815f0b3e331600a4bcc0 from qemu	2019-06-13 19:35:51 -04:00
Peter Maydell	c66d477359	target/arm: Convert VJCVT to decodetree Convert the VJCVT instruction to decodetree. Backports commit 92073e947487e2109f3dfebfeaa48d6323cbd981 from qemu	2019-06-13 19:31:35 -04:00
Peter Maydell	7be9e6f9b4	target/arm: Convert integer-to-float insns to decodetree Convert the VCVT integer-to-float instructions to decodetree. Backports commit 8fc9d8918cde342c71923e361b9f2193e36ed18b from qemu	2019-06-13 19:20:41 -04:00
Peter Maydell	e0e4f99103	target/arm: Convert double-single precision conversion insns to decodetree Convert the VCVT double/single precision conversion insns to decodetree. Backports commit 6ed7e49c3693ed8411773c4880f42b2932beb12d from qemu	2019-06-13 19:18:01 -04:00
Peter Maydell	ab9d0235ed	target/arm: Convert VFP round insns to decodetree Convert the VFP round-to-integer instructions VRINTR, VRINTZ and VRINTX to decodetree. These instructions were only introduced as part of the "VFP misc" additions in v8A, so we check this. The old decoder's implementation was incorrectly providing them even for v7A CPUs. Backports commit e25155f55dc4abb427a88dfe58bbbc550fe7d643 from qemu	2019-06-13 19:15:05 -04:00
Peter Maydell	9e842a0f2a	target/arm: Convert the VCVT-to-f16 insns to decodetree Convert the VCVTT and VCVTB instructions which convert from f32 and f64 to f16 to decodetree. Since we're no longer constrained to the old decoder's style using cpu_F0s and cpu_F0d we can perform a direct 16 bit store of the right half of the input single-precision register rather than doing a load/modify/store sequence on the full 32 bits. Backports commit cdfd14e86ab0b1ca29a702d13a8e4af2e902a9bf from qemu	2019-06-13 19:03:59 -04:00
Peter Maydell	7d927b2d0e	target/arm: Convert the VCVT-from-f16 insns to decodetree Convert the VCVTT, VCVTB instructions that deal with conversion from half-precision floats to f32 or 64 to decodetree. Since we're no longer constrained to the old decoder's style using cpu_F0s and cpu_F0d we can perform a direct 16 bit load of the right half of the input single-precision register rather than loading the full 32 bits and then doing a separate shift or sign-extension. Backports commit b623d803dda805f07aadcbf098961fde27315c19 from qemu	2019-06-13 19:00:23 -04:00
Peter Maydell	e6cc2616d2	target/arm: Convert VFP comparison insns to decodetree Convert the VFP comparison instructions to decodetree. Note that comparison instructions should not honour the VFP short-vector length and stride information: they are scalar-only operations. This applies to all the 2-operand instructions except for VMOV, VABS, VNEG and VSQRT. (In the old decoder this is implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.) Backports commit 386bba2368842fc74388a3c1651c6c0c0c70adbd from qemu	2019-06-13 18:55:53 -04:00
Peter Maydell	a75a3e321f	target/arm: Convert VMOV (register) to decodetree Backports commit 17552b979ebb9848a534c25ebed18a1072710058 from qemu	2019-06-13 18:49:49 -04:00
Peter Maydell	ee30962891	target/arm: Convert VSQRT to decodetree Convert the VSQRT instruction to decodetree. Backports commit b8474540cbce4e2fa45010416375d1bcbe86dc15 from qemu	2019-06-13 18:47:32 -04:00
Peter Maydell	7aea3da6b7	target/arm: Convert VNEG to decodetree Convert the VNEG instruction to decodetree. Backports commit 1882651afdb0ca44f0631192fbe65a71c660d809 from qemu	2019-06-13 18:43:50 -04:00
Peter Maydell	1032d86ad3	target/arm: Convert VABS to decodetree Convert the VFP VABS instruction to decodetree. Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or VFPGen2OpDPFn because none of the operations which use this format and support short vectors will need it. Backports commit 90287e22c987e9840704345ed33d237cbe759dd9 from qemu	2019-06-13 18:41:43 -04:00
Peter Maydell	7a16bc6876	target/arm: Convert VMOV (imm) to decodetree Convert the VFP VMOV (immediate) instruction to decodetree. Backports commit b518c753f0b94e14e01e97b4ec42c100dafc0cc2 from qemu	2019-06-13 18:37:58 -04:00
Peter Maydell	0ebb6b8b90	target/arm: Convert VFP fused multiply-add insns to decodetree Convert the VFP fused multiply-add instructions (VFNMA, VFNMS, VFMA, VFMS) to decodetree. Note that in the old decode structure we were implementing these to honour the VFP vector stride/length. These instructions were introduced in VFPv4, and in the v7A architecture they are UNPREDICTABLE if the vector stride or length are non-zero. In v8A they must UNDEF if stride or length are non-zero, like all VFP instructions; we choose to UNDEF always. Backports commit d4893b01d23060845ee3855bc96626e16aad9ab5 from qemu	2019-06-13 18:24:36 -04:00
Peter Maydell	321bcc822b	target/arm: Convert VDIV to decodetree Convert the VDIV instruction to decodetree. Backports commit 519ee7ae31e050eb0ff9ad35c213f0bd7ab1c03e from qemu	2019-06-13 18:19:47 -04:00
Peter Maydell	76c74bc657	target/arm: Convert VSUB to decodetree Convert the VSUB instruction to decodetree. Backports commit 8fec9a119264b7936503abce3c106fad7e3ccb76 from qemu.	2019-06-13 18:18:00 -04:00
Peter Maydell	f56f0342ad	target/arm: Convert VADD to decodetree Convert the VADD instruction to decodetree. Backports commit ce28b303716e7eca3f3765bf6776d722ebbe1122 from qemu	2019-06-13 18:15:52 -04:00
Peter Maydell	06584edf61	target/arm: Convert VNMUL to decodetree Convert the VNMUL instruction to decodetree. Backports commit 43c4be1236c105090d134540da1036073d157cd4 from qemu	2019-06-13 18:14:16 -04:00
Peter Maydell	2c5e102017	target/arm: Convert VMUL to decodetree Convert the VMUL instruction to decodetree. Backports commit 88c5188ced60e9f2b8cc3af3b9bc4a8031c8c996 from qemu	2019-06-13 18:12:03 -04:00
Peter Maydell	b26b6a12a2	target/arm: Convert VFP VNMLA to decodetree Convert the VFP VNMLA instruction to decodetree. Backports commit 8a483533adc1bdc2decb8f456dbe930a2d245a8b from qemu	2019-06-13 18:09:57 -04:00
Peter Maydell	638b90de31	target/arm: Convert VFP VNMLS to decodetree Convert the VFP VNMLS instruction to decodetree. Backports commit c54a416cc6d60efbc79dd37aaf0c8918c05b5815 from qemu	2019-06-13 18:06:59 -04:00
Peter Maydell	67ad40ffa4	target/arm: Convert VFP VMLS to decodetree Convert the VFP VMLS instruction to decodetree. Backports commit e7258280d46af4ab6a0cc93ccfe8f6614defb4b7 from qemu	2019-06-13 18:02:37 -04:00
Peter Maydell	edf81eb214	target/arm: Convert VFP VMLA to decodetree Convert the VFP VMLA instruction to decodetree. This is the first of the VFP 3-operand data processing instructions, so we include in this patch the code which loops over the elements for an old-style VFP vector operation. The existing code to do this looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since we are going to be converting instructions one at a time anyway we can take the opportunity to make the new loop use TCG temporaries, which means we can do that conversion one operation at a time rather than needing to do it all in one go. We include an UNDEF check which was missing in the old code: short-vector operations (with stride or length non-zero) were deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec field does not indicate that support for short vectors is present we UNDEF the operations that would use them. (This is a change of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which previously were all incorrectly allowing short-vector operations.) Note that the conversion fixes a bug in the old code for the case of VFP short-vector "mixed scalar/vector operations". These happen where the destination register is in a vector bank but but the second operand is in a scalar bank. For example vmla.f64 d10, d1, d16 with length 2 stride 2 is equivalent to the pair of scalar operations vmla.f64 d10, d1, d16 vmla.f64 d8, d3, d16 where the destination and first input register cycle through their vector but the second input is scalar (d16). In the old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d} as a temporary output for the multiply, which trashes the second input operand. For the fully-scalar case (where we never do a second iteration) and the fully-vector case (where the loop loads the new second input operand) this doesn't matter, but for the mixed scalar/vector case we will end up using the wrong value for later loop iterations. In the new code we use TCG temporaries and so avoid the bug. This bug is present for all the multiply-accumulate insns that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS. Note 2: the expression used to calculate the next register number in the vector bank is not in fact correct; we leave this behaviour unchanged from the old decoder and will fix this bug later in the series. Backports commit 266bd25c485597c94209bfdb3891c1d0c573c164 from qemu	2019-06-13 17:59:16 -04:00
Peter Maydell	ff7042567e	target/arm: Convert the VFP load/store multiple insns to decodetree Convert the VFP load/store multiple insns to decodetree. This includes tightening up the UNDEF checking for pre-VFPv3 CPUs which only have D0-D15 : they now UNDEF for any access to D16-D31, not merely when the smallest register in the transfer list is in D16-D31. This conversion does not try to share code between the single precision and the double precision versions; this looks a bit duplicative of code, but it leaves the door open for a future refactoring which gets rid of the use of the "F0" registers by inlining the various functions like gen_vfp_ld() and gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }" conditionalisation. Backports commit fa288de272c5c8a66d5eb683b123706a52bc7ad6 from qemu	2019-06-13 17:26:52 -04:00
Peter Maydell	6f0633ce80	target/arm: Convert VFP VLDR and VSTR to decodetree Convert the VFP single load/store insns VLDR and VSTR to decodetree. Backports commit 79b02a3b5231c5b8cd31e50cd549968dd0a05c49 from qemu	2019-06-13 17:22:48 -04:00
Peter Maydell	fe98885ff2	target/arm: Convert VFP two-register transfer insns to decodetree Convert the VFP two-register transfer instructions to decodetree (in the v8 Arm ARM these are the "Advanced SIMD and floating-point 64-bit move" encoding group). Again, we expand out the sequences involving gen_vfp_msr() and gen_msr_vfp(). Backports commit 81f681106eabe21c55118a5a41999fb7387fb714 from qemu	2019-06-13 17:20:00 -04:00
Peter Maydell	3fb3403b82	target/arm: Convert single-precision register moves to decodetree Convert the "single-precision" register moves to decodetree: * VMSR * VMRS * VMOV between general purpose register and single precision Note that the VMSR/VMRS conversions make our handling of the "should this UNDEF?" checks consistent between the two instructions: * VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0 (previously was a nop) * VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better (previously was a nop) * VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better (previously would write to the register, which had no guest-visible effect because we always UNDEF reads) We also tighten up the decode: we were previously underdecoding some SBZ or SBO bits. The conversion of VMOV_single includes the expansion out of the gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr() sequences into the simpler direct load/store of the TCG temp via neon_{load,store}_reg32(): we know in the new function that we're always single-precision, we don't need to use the old-and-deprecated cpu_F0* TCG globals, and we don't happen to have the declaration of gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the new function is. Backports commit a9ab50011aeda2dd012da99069e078379315ea18 from qemu	2019-06-13 17:16:38 -04:00
Peter Maydell	694058da94	target/arm: Convert double-precision register moves to decodetree Convert the "double-precision" register moves to decodetree: this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP. Note that the conversion process has tightened up a few of the UNDEF encoding checks: we now correctly forbid: * VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10 * VMOV-from-gpr with opc1:opc2 == 0x10 * VDUP with B:E == 11 * VDUP with Q == 1 and Vn<0> == 1 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- The accesses of elements < 32 bits could be improved by doing direct ld/st of the right size rather than 32-bit read-and-shift or read-modify-write, but we leave this for later cleanup, since this series is generally trying to stick to fixing the decode. Backports commit 9851ed9269d214c0c6feba960dd14ff09e6c34b4 from qemu	2019-06-13 17:11:56 -04:00
Peter Maydell	9732ebba5c	target/arm: Add stubs for AArch32 VFP decodetree Add the infrastructure for building and invoking a decodetree decoder for the AArch32 VFP encodings. At the moment the new decoder covers nothing, so we always fall back to the existing hand-written decode. We need to have one decoder for the unconditional insns and one for the conditional insns, as otherwise the patterns for conditional insns would incorrectly match against the unconditional ones too. Since translate.c is over 14,000 lines long and we're going to be touching pretty much every line of the VFP code as part of the decodetree conversion, we create a new translate-vfp.inc.c to hold the code which deals with VFP in the new scheme. It should be possible to convert this into a standalone translation unit eventually, but the conversion process will be much simpler if we simply #include it midway through translate.c to start with. Backports commit 78e138bc1f672c145ef6ace74617db00eebaa2ba from qemu	2019-06-13 16:24:37 -04:00

46 commits