unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-11-05 19:04:50 +00:00

Author	SHA1	Message	Date
Peter Maydell	5c6730a432	target/arm: Fix float16 pairwise Neon ops on big-endian hosts In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error meant we were using the H4() address swizzler macro rather than the H2() which is required for 2-byte data. This had no effect on little-endian hosts but meant we put the result data into the destination Dreg in the wrong order on big-endian hosts. Backports 552714c0812a10e5cff239bd29928e5fcb8d8b3b	2021-03-02 13:02:31 -05:00
Richard Henderson	d473f66177	target/arm: Improve do_prewiden_3d We can use proper widening loads to extend 32-bit inputs, and skip the "widenfn" step. Backports 8aab18a2c5209e4e48998a61fbc2d89f374331ed	2021-03-02 13:00:25 -05:00
Richard Henderson	9263117d47	target/arm: Simplify do_long_3d and do_2scalar_long In both cases, we can sink the write-back and perform the accumulate into the normal destination temps Backports 9f1a5f93c2dd345dc6c8fe86ed14bf1485056f6e	2021-03-02 12:46:53 -05:00
Richard Henderson	07c2b70234	target/arm: Rename neon_load_reg64 to vfp_load_reg64 The only uses of this function are for loading VFP double-precision values, and nothing to do with NEON. Backports b38b96ca90827012ab8eb045c1337cea83a54c4b	2021-03-02 12:43:25 -05:00
Richard Henderson	9d87b62578	target/arm: Add read/write_neon_element64 Replace all uses of neon_load/store_reg64 within translate-neon.c.inc. Backports 0aa8e700a53b0aa7275ed747b8fa3acb61d35f2d	2021-03-02 12:40:33 -05:00
Richard Henderson	89b1f62878	target/arm: Rename neon_load_reg32 to vfp_load_reg32 The only uses of this function are for loading VFP single-precision values, and nothing to do with NEON. Backports 21c1c0e50b73c580c6bfc8f2314d1b6a14793561	2021-03-02 12:30:20 -05:00
Richard Henderson	011d9ab061	target/arm: Expand read/write_neon_element32 to all MemOp We can then use this to improve VMOV (scalar to gp) and VMOV (gp to scalar) so that we simply perform the memory operation that we wanted, rather than inserting or extracting from a 32-bit quantity. These were the last uses of neon_load/store_reg, so remove them. Backports 4d5fa5a80ac28f34b8497be1e85371272413a12e	2021-03-02 12:26:41 -05:00
Richard Henderson	d21316d639	target/arm: Add read/write_neon_element32 Model these off the aa64 read/write_vec_element functions. Use it within translate-neon.c.inc. The new functions do not allocate or free temps, so this rearranges the calling code a bit. Backports a712266f5d5a36d04b22fe69fa15592d62bed019	2021-03-02 12:18:31 -05:00
Richard Henderson	e390c1ec7f	target/arm: Use neon_element_offset in vfp_reg_offset This seems a bit more readable than using offsetof CPU_DoubleU. Backports d8719785fde2f5041986853a314c05c6f567d3cb	2021-03-02 11:55:49 -05:00
Richard Henderson	c1ca9e53da	target/arm: Use neon_element_offset in neon_load/store_reg These are the only users of neon_reg_offset, so remove that. Backports 0f2cdc82276a723ee58562b56b9d537a4bd7bfef	2021-03-02 11:54:56 -05:00
Richard Henderson	1b09d0d96f	target/arm: Move neon_element_offset to translate.c This will shortly have users outside of translate-neon.c.inc. Backports 7ec85c02833f4264840c6ed78b749443a7b4ffe0	2021-03-02 11:52:59 -05:00
Richard Henderson	8a20537e7f	target/arm: Introduce neon_full_reg_offset This function makes it clear that we're talking about the whole register, and not the 32-bit piece at index 0. This fixes a bug when running on a big-endian host. Backports 015ee81a4c06b644969f621fd9965cc6372b879e	2021-03-02 11:50:36 -05:00
Peter Maydell	2f0940677e	target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension If the M-profile low-overhead-branch extension is implemented, FPSCR bits [18:16] are a new field LTPSIZE. If MVE is not implemented (currently always true for us) then this field always reads as 4 and ignores writes. These bits used to be the vector-length field for the old short-vector extension, so we need to take care that they are not misinterpreted as setting vec_len. We do this with a rearrangement of the vfp_set_fpscr() code that deals with vec_len, vec_stride and also the QC bit; this obviates the need for the M-profile only masking step that we used to have at the start of the function. We provide a new field in CPUState for LTPSIZE, even though this will always be 4, in preparation for MVE, so we don't have to come back later and split it out of the vfp.xregs[FPSCR] value. (This state struct field will be saved and restored as part of the FPSCR value via the vmstate_fpscr in machine.c.) Backports 8128c8e8cc9489a8387c74075974f86dc0222e7f	2021-03-01 20:36:02 -05:00
Peter Maydell	8a6e118a17	target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16 M-profile CPUs with half-precision floating point support should be able to write to FPSCR.FZ16, but an M-profile specific masking of the value at the top of vfp_set_fpscr() currently prevents that. This is not yet an active bug because we have no M-profile FP16 CPUs, but needs to be fixed before we can add any. The bits that the masking is effectively preventing from being set are the A-profile only short-vector Len and Stride fields, plus the Neon QC bit. Rearrange the order of the function so that those fields are handled earlier and only under a suitable guard; this allows us to drop the M-profile specific masking, making FZ16 writeable. This change also makes the QC bit correctly RAZ/WI for older no-Neon A-profile cores. This refactoring also paves the way for the low-overhead-branch LTPSIZE field, which uses some of the bits that are used for A-profile Stride and Len. Backports commit d31e2ce68d56f5bcc83831497e5fe4b8a7e18e85	2021-03-01 20:33:22 -05:00
Peter Maydell	3ae5543825	target/arm: Implement v8.1M low-overhead-loop instructions v8.1M's "low-overhead-loop" extension has three instructions for looping: * DLS (start of a do-loop) * WLS (start of a while-loop) * LE (end of a loop) The loop-start instructions are both simple operations to start a loop whose iteration count (if any) is in LR. The loop-end instruction handles "decrement iteration count and jump back to loop start"; it also caches the information about the branch back to the start of the loop to improve performance of the branch on subsequent iterations. As with the branch-future instructions, the architecture permits an implementation to discard the LO_BRANCH_INFO cache at any time, and QEMU takes the IMPDEF option to never set it in the first place (equivalent to discarding it immediately), because for us a "real" implementation would be unnecessary complexity. (This implementation only provides the simple looping constructs; the vector extension MVE (Helium) adds some extra variants to handle looping across vectors. We'll add those later when we implement MVE.) Backports commit b7226369721896ab9ef71544e4fe95b40710e05a	2021-03-01 20:29:04 -05:00
Peter Maydell	be197f9857	target/arm: Implement v8.1M branch-future insns (as NOPs) v8.1M implements a new 'branch future' feature, which is a set of instructions that request the CPU to perform a branch "in the future", when it reaches a particular execution address. In hardware, the expected implementation is that the information about the branch location and destination is cached and then acted upon when execution reaches the specified address. However the architecture permits an implementation to discard this cached information at any point, and so guest code must always include a normal branch insn at the branch point as a fallback. In particular, an implementation is specifically permitted to treat all BF insns as NOPs (which is equivalent to discarding the cached information immediately). For QEMU, implementing this caching of branch information would be complicated and would not improve the speed of execution at all, so we make the IMPDEF choice to implement all BF insns as NOPs. Backports commit 05903f036edba8e3ed940cc215b8e27fb49265b9	2021-03-01 20:25:15 -05:00
Peter Maydell	966246d991	target/arm: Don't allow BLX imm for M-profile The BLX immediate insn in the Thumb encoding always performs a switch from Thumb to Arm state. This would be totally useless in M-profile which has no Arm decoder, and so the instruction does not exist at all there. Make the encoding UNDEF for M-profile. (This part of the encoding space is used for the branch-future and low-overhead-loop insns in v8.1M.) Backports 920f04fa3ea789f8f85a52cee5395b8887b56cf7	2021-03-01 20:23:59 -05:00
Peter Maydell	5680bc701b	target/arm: Make the t32 insn[25:23]=111 group non-overlapping The t32 decode has a group which represents a set of insns which overlap with B_cond_thumb because they have [25:23]=111 (which is an invalid condition code field for the branch insn). This group is currently defined using the {} overlap-OK syntax, but it is almost entirely non-overlapping patterns. Switch it over to use a non-overlapping group. For this to be valid syntactically, CPS must move into the same overlapping-group as the hint insns (CPS vs hints was the only actual use of the overlap facility for the group). The non-overlapping subgroup for CLREX/DSB/DMB/ISB/SB is no longer necessary and so we can remove it (promoting those insns to be members of the parent group). Backports 45f11876ae86128bdee27e0b089045de43cc88e4	2021-03-01 20:22:11 -05:00
Peter Maydell	666fe17025	target/arm: Implement v8.1M conditional-select insns v8.1M brings four new insns to M-profile: * CSEL : Rd = cond ? Rn : Rm * CSINC : Rd = cond ? Rn : Rm+1 * CSINV : Rd = cond ? Rn : ~Rm * CSNEG : Rd = cond ? Rn : -Rm Implement these. Backports cc73bbded0dfb5612b0e416f7eda13a66950542a	2021-03-01 20:19:33 -05:00
Peter Maydell	2dae268fcb	target/arm: Implement v8.1M NOCP handling From v8.1M, disabled-coprocessor handling changes slightly: * coprocessors 8, 9, 14 and 15 are also governed by the cp10 enable bit, like cp11 * an extra range of instruction patterns is considered to be inside the coprocessor space We previously marked these up with TODO comments; implement the correct behaviour. Unfortunately there is no ID register field which indicates this behaviour. We could in theory test an unrelated ID register which indicates guaranteed-to-be-in-v8.1M behaviour like ID_ISAR0.CmpBranch >= 3 (low-overhead-loops), but it seems better to simply define a new ARM_FEATURE_V8_1M feature flag and use it for this and other new-in-v8.1M behaviour that isn't identifiable from the ID registers. Backports commit 5d2555a1fe7370feeb1efbbf276a653040910017	2021-03-01 20:16:09 -05:00
Peter Maydell	51093daf5f	decodetree: Fix codegen for non-overlapping group inside overlapping group For nested groups like: { [ pattern 1 pattern 2 ] pattern 3 } the intended behaviour is that patterns 1 and 2 must not overlap with each other; if the insn matches neither then we fall through to pattern 3 as the next thing in the outer overlapping group. Currently we generate incorrect code for this situation, because in the code path for a failed match inside the inner non-overlapping group we generate a "return" statement, which causes decode to stop entirely rather than continuing to the next thing in the outer group. Generate a "break" instead, so that decode flow behaves as required for this nested group case. Backports 514101c0b931f0a11a40d29d26af1cc40482f951	2021-03-01 20:14:19 -05:00
Richard Henderson	f7e831a7e4	target/arm: Ignore HCR_EL2.ATA when {E2H,TGE} != 11 Unlike many other bits in HCR_EL2, the description for this bit does not contain the phrase "if ... this field behaves as 0 for all purposes other than", so do not squash the bit in arm_hcr_el2_eff. Instead, replicate the E2H+TGE test in the two places that require it. Backports 4301acd7d7d455792ea873ced75c0b5d653618b1	2021-03-01 20:12:36 -05:00
Richard Henderson	4f00eacb11	target/arm: Fix reported EL for mte_check_fail The reporting in AArch64.TagCheckFail only depends on PSTATE.EL, and not the AccType of the operation. There are two guest visible problems that affect LDTR and STTR because of this: (1) Selecting TCF0 vs TCF1 to decide on reporting, (2) Report "data abort same el" not "data abort lower el". Backports 50244cc76abcac3296cff3d84826f5ff71808c80	2021-03-01 20:10:44 -05:00
Richard Henderson	511636a3f4	target/arm: Remove redundant mmu_idx lookup We already have the full ARMMMUIdx as computed from the function parameter. For the purpose of regime_has_2_ranges, we can ignore any difference between AccType_Normal and AccType_Unpriv, which would be the only difference between the passed mmu_idx and arm_mmu_idx_el. Backports 4aedfc0f633fd06dd2a5dc8ffa93f4c56117e37f	2021-03-01 20:09:51 -05:00
Peter Maydell	d350644817	target/arm: AArch32 VCVT fixed-point to float is always round-to-nearest For AArch32, unlike the VCVT of integer to float, which honours the rounding mode specified by the FPSCR, VCVT of fixed-point to float is always round-to-nearest. (AArch64 fixed-point-to-float conversions always honour the FPCR rounding mode.) Implement this by providing _round_to_nearest versions of the relevant helpers which set the rounding mode temporarily when making the call to the underlying softfloat function. We only need to change the VFP VCVT instructions, because the standard- FPSCR value used by the Neon VCVT is always set to round-to-nearest, so we don't need to do the extra work of saving and restoring the rounding mode. Backports commit 61db12d9f9eb36761edba4d9a414cd8dd34c512b	2021-03-01 20:04:31 -05:00
Peter Maydell	31013d5a8f	target/arm: Fix SMLAD incorrect setting of Q bit The SMLAD instruction is supposed to: * signed multiply Rn[15:0] * Rm[15:0] * signed multiply Rn[31:16] * Rm[31:16] * perform a signed addition of the products and Ra * set Rd to the low 32 bits of the theoretical infinite-precision result * set the Q flag if the sign-extension of Rd would differ from the infinite-precision result (ie on overflow) Our current implementation doesn't quite do this, though: it performs an addition of the products setting Q on overflow, and then it adds Ra, again possibly setting Q. This sometimes incorrectly sets Q when the architecturally mandated only-check-for-overflow-once algorithm does not. For instance: r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff smlad r0, r1, r2, r3 This is (-32768 * -32768) + (-32768 * -32768) - 1 The products are both 0x4000_0000, so when added together as 32-bit signed numbers they overflow (and QEMU sets Q), but because the addition of Ra == -1 brings the total back down to 0x7fff_ffff there is no overflow for the complete operation and setting Q is incorrect. Fix this edge case by resorting to 64-bit arithmetic for the case where we need to add three values together. Backports commit 5288145d716338ace0f83e3ff05c4d07715bb4f4	2021-03-01 19:58:39 -05:00
Peter Maydell	6cd06169ee	target/arm: Make '-cpu max' have a 48-bit PA QEMU supports a 48-bit physical address range, but we don't currently expose it in the '-cpu max' ID registers (you get the same range as Cortex-A57, which is 44 bits). Set the ID_AA64MMFR0.PARange field to indicate 48 bits. Backports d1b6b7017572e8d82f26eb827a1dba0e8cf3cae6	2021-03-01 19:50:28 -05:00
Richard Henderson	c648361597	tcg: Remove TCG_TARGET_HAS_cmp_vec The cmp_vec opcode is mandatory; this symbol is unused. Backports cae5d53b9e72d7a1e43cebeb36471d77a16c6e43	2021-03-01 19:49:02 -05:00
Richard Henderson	45af31fcb4	tcg/optimize: Fold dup2_vec When the two arguments are identical, this can be reduced to dup_vec or to mov_vec from a tcg_constant_vec. Backports commit 1dc4fe70128db05237a00eda6eb15e2b44deb31f	2021-03-01 19:46:14 -05:00
Richard Henderson	456fb66617	tcg: Fix generation of dupi_vec for 32-bit host The definition of INDEX_op_dupi_vec is that it operates on units of tcg_target_ulong -- in this case 32 bits. It does not work to use this for a uint64_t value that happens to be small enough to fit in tcg_target_ulong. Backports a5b30d950c42b14bc9da24d1e68add6538d23336	2021-03-01 19:45:30 -05:00
Richard Henderson	578673be68	tcg/i386: Fix dupi for avx2 32-bit hosts The previous change wrongly stated that 32-bit avx2 should have used VPBROADCASTW. But that's a 16-bit broadcast and we want a 32-bit broadcast. Backports f80d09b599a5e0fd7f44653f23b04104cb703f7a	2021-03-01 19:44:09 -05:00
Richard Henderson	50b3632ab4	tcg: Remove TCGOpDef.used The last user of this field disappeared in f69d277ece4.	2021-03-01 19:43:37 -05:00
Richard Henderson	7813c57f9e	tcg: Move some TCG_CT_* bits to TCGArgConstraint bitfields These are easier to set and test when they have their own fields. Reduce the size of alias_index and sort_index to 4 bits, which is sufficient for TCG_MAX_OP_ARGS. This leaves only the bits indicating constants within the ct field. Move all initialization to allocation time, rather than init individual fields in process_op_defs. Backports bc2b17e6ea582ef3ade2bdca750de269c674c915	2021-03-01 19:41:34 -05:00
Richard Henderson	71a34d84e5	tcg: Remove TCG_CT_REG This wasn't actually used for anything, really. All variable operands must accept registers, which are indicated by the set in TCGArgConstraint.regs. Backports commit 74a117906b87ff9220e4baae5a7431d6f4eadd45	2021-03-01 19:38:00 -05:00
Richard Henderson	ae075d324d	tcg: Move sorted_args into TCGArgConstraint.sort_index This uses an existing hole in the TCGArgConstraint structure and will be convenient for keeping the data in one place. Backports 66792f90f14fef18b25a168922877a367ecdca05	2021-03-01 19:33:45 -05:00
Richard Henderson	e3356f9bad	tcg: Drop union from TCGArgConstraint The union is unused; let "regs" appear in the main structure without the "u.regs" wrapping. Backports 9be0d08019465b38e2f1a605960961a491430c21	2021-03-01 19:29:19 -05:00
Richard Henderson	1551f6be9d	tcg: Adjust simd_desc size encoding With larger vector sizes, it turns out oprsz == maxsz, and we only need to represent mismatch for oprsz <= 32. We do, however, need to represent larger oprsz and do so without reducing SIMD_DATA_BITS. Reduce the size of the oprsz field and increase the maxsz field. Steal the oprsz value of 24 to indicate equality with maxsz. Backports e2e7168a214b0ed98dc357bba96816486a289762	2021-03-01 19:23:37 -05:00
Richard Henderson	567fa21c65	target/arm: Fix SVE splice While converting to gen_gvec_ool_zzzp, we lost passing a->esz as the data argument to the function. Backports commit dd701fafe55a78e655d4823d29226d92250a6b56	2021-03-01 19:20:44 -05:00
Richard Henderson	ccb293911f	target/arm: Fix sve ldr/str The mte update missed a bit when producing clean addresses. Fixes: b2aa8879b88 Backports d8227b098301935ea8e0e032e7d41e5dc3e97590	2021-03-01 19:20:04 -05:00
Peter Maydell	79feec40df	target/arm: Make isar_feature_aa32_fp16_arith() handle M-profile The M-profile definition of the MVFR1 ID register differs slightly from the A-profile one, and in particular the check for "does the CPU support fp16 arithmetic" is not the same. We don't currently implement any M-profile CPUs with fp16 arithmetic, so this is not yet a visible bug, but correcting the logic now disarms this beartrap for when we eventually do. Backports commit dfc523a84b06b6a4b583ed4c29d24fd980dd37a0	2021-03-01 19:17:23 -05:00
Peter Maydell	09a7d6381e	target/arm: Move id_pfr0, id_pfr1 into ARMISARegisters Move the id_pfr0 and id_pfr1 fields into the ARMISARegisters sub-struct. We're going to want id_pfr1 for an isar_features check, and moving both at the same time avoids an odd inconsistency. Changes other than the ones to cpu.h and kvm64.c made automatically with: perl -p -i -e 's/cpu->id_pfr/cpu->isar.id_pfr/' target/arm/*.c hw/intc/armv7m_nvic.c Backports commit 8a130a7be6e222965641e1fd9469fd3ee752c7d4	2021-03-01 19:15:10 -05:00
Peter Maydell	ed92f3c42b	target/arm: Replace ARM_FEATURE_PXN with ID_MMFR0.VMSA check The ARM_FEATURE_PXN bit indicates whether the CPU supports the PXN bit in short-descriptor translation table format descriptors. This is indicated by ID_MMFR0.VMSA being at least 0b0100. Replace the feature bit with an ID register check, in line with our preference for ID register checks over feature bits. Backports commit 0ae0326b984e77a55c224b7863071bd3d8951231	2021-03-01 19:06:15 -05:00
Xiaoyao Li	d9d68cc128	i386/cpu: Clear FEAT_XSAVE_COMP_{LO,HI} when XSAVE is not available Per Intel SDM vol 1, 13.2, if CPUID.1:ECX.XSAVE[bit 26] is 0, the processor provides no further enumeration through CPUID function 0DH. QEMU does not do this for "-cpu host,-xsave". Backports 19ca8285fcd61a8f60f2f44f789a561e0958e8e6	2021-03-01 19:04:03 -05:00
Richard Henderson	5e6196ea6b	target/riscv: Set instance_align on RISCVCPU TypeInfo Fix alignment of CPURISCVState.vreg. Backports 5de5b99b3101a1648ed583193db8d92eea0c4545	2021-03-01 19:00:27 -05:00
Richard Henderson	cdf40f7ff6	target/arm: Set instance_align on CPUARM TypeInfo Fix alignment of CPUARMState.vfp.zregs. Backports d03087bda4ba17076b430fd2af083020d7c5112a	2021-03-01 18:58:44 -05:00
Richard Henderson	86dd30850d	qom: Allow objects to be allocated with increased alignment It turns out that some hosts have a default malloc alignment less than that required for vectors. We assume that, with compiler annotation on CPUArchState, we can properly align the vector portion of the guest state. Fix the alignment of the allocation by using qemu_memalign when required.	2021-03-01 18:32:51 -05:00
Eduardo Habkost	6baafeafd4	qom: Correct object_class_dynamic_cast_assert() documentation object_class_dynamic_cast_assert() is not used by INTERFACE_CHECK, remove misleading mention of that function in the documentation.	2021-03-01 18:29:34 -05:00
Aaron Lindsay	97702da7ad	target/arm: Count PMU events when MDCR.SPME is set This check was backwards when introduced in commit 033614c47de78409ad3fb39bb7bd1483b71c6789: target/arm: Filter cycle counter based on PMCCFILTR_EL0 Backports commit db1f3afb17269cf2bd86c222e1bced748487ef71	2021-03-01 18:25:25 -05:00
Peter Maydell	16ad0d93d9	target/arm: Convert VCMLA, VCADD size field to MO_* in decode The VCMLA and VCADD insns have a size field which is 0 for fp16 and 1 for fp32 (note that this is the reverse of the Neon 3-same encoding!). Convert it to MO_* values in decode for consistency. Backports d186a4854c04e9832907b0b4240a47731da20993	2021-03-01 18:23:34 -05:00
Peter Maydell	61abec1908	target/arm: Convert Neon VCVT fp size field to MO_* in decode Convert the insns using the 2reg_vcvt and 2reg_vcvt_f16 formats to pass the size through to the trans function as a MO_* value rather than the '0==f32, 1==f16' used in the fp 3-same encodings. Backports commit 0ae715c658a02af1834b63563c56112a6d8842cb	2021-03-01 18:20:11 -05:00
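
The SMLAD fix above (31013d5a8f) turns on the difference between checking for overflow after each 32-bit addition and checking once against the infinite-precision result. The following is a minimal Python model of that difference, not the QEMU or unicorn code: the function names (`to_signed`, `overflows_i32`, `smlad_two_step`, `smlad_single_check`) are hypothetical and chosen only for illustration. It reproduces the example from the commit message, r1 = r2 = 0x80008000 and r3 = 0xffffffff.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def overflows_i32(x):
    """True if x does not fit in a signed 32-bit integer."""
    return not (-(1 << 31) <= x <= (1 << 31) - 1)


def products(rn, rm):
    """Signed 16x16 products of the low and high halfwords (hypothetical helper)."""
    p_lo = to_signed(rn, 16) * to_signed(rm, 16)
    p_hi = to_signed(rn >> 16, 16) * to_signed(rm >> 16, 16)
    return p_lo, p_hi


def smlad_two_step(rn, rm, ra):
    """Model of the old behaviour: Q may be set by either 32-bit addition."""
    p_lo, p_hi = products(rn, rm)
    s = p_lo + p_hi
    q = overflows_i32(s)
    s = to_signed(s, 32) + to_signed(ra, 32)  # wrap, then add the accumulator
    q = q or overflows_i32(s)
    return s & 0xFFFFFFFF, q


def smlad_single_check(rn, rm, ra):
    """Model of the fixed behaviour: one overflow check on the wide sum."""
    p_lo, p_hi = products(rn, rm)
    total = p_lo + p_hi + to_signed(ra, 32)  # Python ints do not wrap
    return total & 0xFFFFFFFF, overflows_i32(total)


if __name__ == "__main__":
    args = (0x80008000, 0x80008000, 0xFFFFFFFF)
    print(smlad_two_step(*args))
    print(smlad_single_check(*args))
```

Both models yield the same Rd value for this input; only the Q flag differs, which is the edge case the commit describes. Python's unbounded integers stand in here for the 64-bit arithmetic the commit message mentions.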


5641 commits