Commit graph

63 commits

Tony Nguyen f75368cd0f
tcg: TCGMemOp is now accelerator independent MemOp
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.

Target-dependent attributes are conditionalized upon NEED_CPU_H.

Backports commit 14776ab5a12972ea439c7fb2203a4c15a09094b4 from qemu
2019-11-28 03:01:12 -05:00
Richard Henderson c79510378f
tcg/i386: Use umin/umax in expanding unsigned compare
Using umin(a, b) == a as an expansion for TCG_COND_LEU is a
better alternative to (a - INT_MIN) <= (b - INT_MIN).

Backports commit ebcfb91abed8c0fb180a968b9004419c208dcc02 from qemu
2019-05-24 18:36:32 -04:00
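A scalar C model of the two expansions compared in the commit above, checking that both agree with a direct unsigned <= (illustration only; the helper name and sample values are not from the backend):

    #include <assert.h>
    #include <limits.h>
    #include <stdint.h>

    static uint32_t umin32(uint32_t a, uint32_t b) { return a < b ? a : b; }

    int main(void)
    {
        const uint32_t samples[] = { 0, 1, 0x7fffffffu, 0x80000000u, UINT32_MAX };
        for (int i = 0; i < 5; i++) {
            for (int j = 0; j < 5; j++) {
                uint32_t a = samples[i], b = samples[j];
                int direct = a <= b;
                /* New expansion: LEU holds exactly when umin(a, b) == a. */
                int via_umin = umin32(a, b) == a;
                /* Old expansion: subtracting INT_MIN flips the sign bit,
                 * turning the unsigned compare into a signed one. */
                int via_bias = (int32_t)(a - INT_MIN) <= (int32_t)(b - INT_MIN);
                assert(direct == via_umin && direct == via_bias);
            }
        }
        return 0;
    }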
Richard Henderson ffdbc1a233
tcg/i386: Remove expansion for missing minmax
This is now handled by code within tcg-op-vec.c.

Backports commit 3ec3538a45f2fead475b0cca6945092c87927b4f from qemu
2019-05-24 18:34:44 -04:00
Richard Henderson 68cb096196
tcg/i386: Support vector comparison select value
We already had backend support for this feature. Expand the new
cmpsel opcode using vpblendvb. The combination allows us to avoid
an extra NOT for some comparison codes.

Backports commit 904c5e19672778cc3349f4975437cfdf3371abb6 from qemu
2019-05-24 18:33:16 -04:00
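A scalar C sketch of what cmpsel computes and why swapping the selected operands can replace a NOT of the comparison mask (conceptual model only; the backend emits a vector compare feeding VPBLENDVB):

    #include <assert.h>
    #include <stdint.h>

    /* cmpsel_eq(a, b, c, d): select c where a == b, else d.  The mask is
     * what a byte-wise PCMPEQ would produce; the blend is what VPBLENDVB
     * computes. */
    static uint8_t cmpsel_eq(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
    {
        uint8_t mask = (a == b) ? 0xff : 0x00;
        return (uint8_t)((c & mask) | (d & ~mask));
    }

    int main(void)
    {
        uint8_t a = 1, b = 2, c = 10, d = 20;
        /* There is no direct NE compare; instead of inverting the EQ mask,
         * keep it and swap c and d. */
        assert(cmpsel_eq(a, b, d, c) == ((a != b) ? c : d));
        return 0;
    }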
Richard Henderson 60cfe541b2
tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register. This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.

Also fix the 32-bit avx2, which should have used VPBROADCASTW.

Fixes: 1e262b49b533

Backports commit 7b60ef3264e9627ac6efb34e9a6130647e9b55c0 from qemu
2019-05-24 18:04:08 -04:00
Lioncash fcaa52c1fe
tcg: Synchronize with qemu
Resolves any formatting discrepancies and bad merges that slipped
through.
2019-05-16 18:11:08 -04:00
Richard Henderson fd35490991
tcg/i386: Support vector absolute value
Backports commit 18f9b65f1a4225dd314cb9b0a8dea968c5bc2ef3 from qemu
2019-05-16 16:37:33 -04:00
Richard Henderson 18b3df6e4e
tcg/i386: Support vector scalar shift opcodes
Backports commit 0a8d7a3bf5a149a82450eef555fd61728703dd84 from qemu
2019-05-16 16:19:44 -04:00
Richard Henderson f793ec847d
tcg/i386: Support vector variable shift opcodes
Backports commit a2ce146a06807fe1d1a81e878b8f249ff1e14038 from qemu
2019-05-16 15:53:33 -04:00
Richard Henderson 66e6bea084
tcg: Add INDEX_op_dupm_vec
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first. This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.

VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.

Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.

Backports commit 37ee55a081b7863ffab2151068dd1b2f11376914 from qemu
2019-05-16 15:38:02 -04:00
Richard Henderson a6fd4e2345
tcg/i386: Implement tcg_out_dupm_vec
At the same time, improve tcg_out_dupi_vec wrt broadcast
from the constant pool.

Backports commit 1e262b49b5331441f697461e4305fe06719758a7 from qemu
2019-05-16 15:27:15 -04:00
Richard Henderson d4e7c6a8c5
tcg: Add tcg_out_dupm_vec to the backend interface
Currently stubbed out in all backends that support vectors.

Backports commit d6ecb4a978b718dbe108a9fa9ecccc8b7f7cb579 from qemu
2019-05-16 15:24:48 -04:00
Richard Henderson cf238d3544
tcg: Manually expand INDEX_op_dup_vec
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.

Backports commit bab1671f0fa928fd678a22f934739f06fd5fd035 from qemu
2019-05-16 15:22:29 -04:00
Richard Henderson 3d20e1678c
tcg: Promote tcg_out_{dup,dupi}_vec to backend interface
The i386 backend already has these functions, and the aarch64 backend
could easily split out one. Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.

Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.

Backports commit e7632cfa8b76cdbbc1c76e8737338ef5844e7d60 from qemu
2019-05-16 15:18:48 -04:00
Richard Henderson f86bd1c5d6
tcg: Return bool success from tcg_out_mov
This patch merely changes the interface, aborting on all failures,
of which there are currently none.

Backports commit 78113e83e0007e869c9f0cb4c0497a77538988e3 from qemu
2019-05-16 15:14:42 -04:00
Richard Henderson 6145e3fdd7
tcg: Restart TB generation after out-of-line ldst overflow
This is part c of relocation overflow handling.

Backports commit aeee05f53a5d67304a521d2644dc0a607e3c8b28 from qemu
2019-04-30 10:06:53 -04:00
Richard Henderson 0f20a26b36
tcg/i386: Support INDEX_op_extract2_{i32,i64}
Backports commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab from qemu
2019-04-30 09:37:39 -04:00
Lioncash 96c52ea053
tcg: Synchronize with qemu
2019-04-22 02:03:01 -04:00
Mark Cave-Ayland 576df55076
tcg/i386: fix unsigned vector saturating arithmetic
Due to a cut/paste error in the original implementation, the unsigned
vector saturating arithmetic was erroneously being calculated as signed
vector saturating arithmetic.

Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic")

Backports commit 3115584d39afe8cf2a84a40549029f53792abca5 from qemu
2019-02-12 11:37:12 -05:00
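A scalar C illustration of why picking the signed form for an unsigned saturating add gives wrong results (a model of the semantics, not the vector code itself):

    #include <assert.h>
    #include <stdint.h>

    static uint8_t add_sat_u8(uint8_t a, uint8_t b)   /* PADDUSB-style: clamp at 255 */
    {
        unsigned s = (unsigned)a + b;
        return s > 0xff ? 0xff : (uint8_t)s;
    }

    static int8_t add_sat_s8(int8_t a, int8_t b)      /* PADDSB-style: clamp to [-128, 127] */
    {
        int s = a + b;
        return s > 127 ? 127 : (s < -128 ? -128 : (int8_t)s);
    }

    int main(void)
    {
        /* 200 + 100 must saturate to 255 as unsigned bytes, but the same
         * bit patterns interpreted as signed (-56 + 100 = 44) never clamp. */
        assert(add_sat_u8(200, 100) == 255);
        assert((uint8_t)add_sat_s8((int8_t)200, 100) == 44);
        return 0;
    }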
Richard Henderson 63d1aae6b2
tcg/i386: Implement vector minmax arithmetic
The avx instruction set does not directly provide MO_64.
We can still implement 64-bit with comparison and vpblendvb.

Backports commit bc37faf4cb2baa77c44298c01558970b88d32808 from qemu
2019-01-29 16:41:11 -05:00
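A scalar C model of the compare-and-blend approach for a 64-bit minimum (conceptual only; the backend emits a vector compare feeding VPBLENDVB):

    #include <assert.h>
    #include <stdint.h>

    static int64_t smin64(int64_t a, int64_t b)
    {
        /* All-ones where a > b, mirroring what a vector compare produces. */
        int64_t mask = -(int64_t)(a > b);
        /* Blend: take b where the mask is set, otherwise keep a. */
        return (b & mask) | (a & ~mask);
    }

    int main(void)
    {
        assert(smin64(-5, 3) == -5);
        assert(smin64(7, 2) == 2);
        return 0;
    }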
Richard Henderson 5518b543ed
tcg/i386: Implement vector saturating arithmetic
Only MO_8 and MO_16 are implemented, since that's all the
instruction set provides.

Backports commit 8ffafbcec275e61f6a1a17ac1d0bd918d5b23db3 from qemu
2019-01-29 16:37:55 -05:00
Richard Henderson 24e65f60ed
tcg/i386: Split subroutines out of tcg_expand_vec_op
This routine was becoming too large.

Backports commit 44f1441dbe14e7174a707d7e7ecbc2c8e080bfda from qemu
2019-01-29 16:33:59 -05:00
Richard Henderson 3b85c29bb9
tcg/i386: Assume 32-bit values are zero-extended
We now have an invariant that all TCG_TYPE_I32 values are
zero-extended, which means that we do not need to extend
them again during qemu_ld/st, either explicitly via a separate
tcg_out_ext32u or implicitly via P_ADDR32.

Backports commit 4810d96f03be4d3820563e3c6bf13dfc0627f205 from qemu
2018-12-18 05:42:52 -05:00
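A small C illustration of the invariant: on x86-64, a 32-bit operation already clears the upper half of the destination register, which is what makes a separate tcg_out_ext32u redundant (model only, not the backend code):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t dirty = 0xdeadbeefcafef00dULL;
        /* Modeling e.g. "movl %esi, %eax": narrowing to 32 bits and
         * widening back leaves the upper 32 bits zero. */
        uint64_t widened = (uint32_t)dirty;
        assert(widened == 0xcafef00dULL);
        return 0;
    }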
Richard Henderson b7b142ed79
tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests
This preserves the invariant that all TCG_TYPE_I32 values are
zero-extended in the 64-bit host register.

Backports commit 75478279a0c1eafc7b69d5382356da138f58f1bd from qemu
2018-12-18 05:38:55 -05:00
Richard Henderson 4e882a95f3
tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.

Backports commit 3dbc8c61de4e0d0a2afe0897cda7ab28cd37a164 from qemu
2018-12-18 05:36:58 -05:00
Richard Henderson bdd6118105
tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.

Backports commit 1d21d95b6101786d44d3b4a12400eb80a1ecc647 from qemu
2018-12-18 05:35:34 -05:00
Richard Henderson fc86fd34ff
tcg/i386: Return false on failure from patch_reloc
Backports commit bec3afd5fc6ab0b6e9d8a01575d58db8d1ad82ce from qemu
2018-12-18 05:27:14 -05:00
Richard Henderson 46189d87b3
tcg: Return success from patch_reloc
This will move the assert for success from within (subroutines of)
patch_reloc into the callers. It will also let new code do something
different when a relocation is out of range.

For the moment, all backends are trivially converted to return true.

Backports commit 6ac1778676f4259c10b0629ccd9df319a5d1baeb from qemu
2018-12-18 05:25:45 -05:00
Roman Kapl 33e69342e3
tcg/i386: fix vector operations on 32-bit hosts
The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers.
This was defined as a no-op for 32-bit x86, with the assumption that we have
only eight registers anyway. This assumption is not true once we have xmm regs.

Since LOWREGMASK was a no-op, xmm register indices were wrong in opcodes
and overflowed into other opcode fields, wreaking havoc.

To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".

Fixes: 770c2fc7bb ("Add vector operations")

Backports commit 93bf9a42733321fb632bcb9eafd049ef0e3d9417 from qemu
2018-10-02 04:22:35 -04:00
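A C sketch of the masking involved (the exact QEMU macro may differ; the point is that only the low 3 bits of a register index belong in a ModRM field, while any higher bit goes into the REX/VEX prefix):

    #include <assert.h>

    #define LOWREGMASK(x)  ((x) & 7)

    static unsigned modrm(unsigned mod, unsigned reg, unsigned rm)
    {
        return (mod << 6) | (LOWREGMASK(reg) << 3) | LOWREGMASK(rm);
    }

    int main(void)
    {
        /* With xmm registers numbered from 8 internally on 32-bit hosts,
         * an unmasked index such as 9 would set bit 6 of the byte and
         * corrupt the mod field; masked, it encodes hardware register 1. */
        assert(modrm(3, 9, 9) == modrm(3, 1, 1));
        return 0;
    }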
Richard Henderson a4c2dbef3e
tcg/i386: Mark xmm registers call-clobbered
When host vector registers and operations were introduced, I failed
to mark the registers call-clobbered as required by the ABI.

Fixes: 770c2fc7bb7

Backports commit 672189cd586ea38a2c1d8ab91eb1f9dcff5ceb05 from qemu
2018-07-23 20:00:26 -04:00
John Arbuckle 22c3206738
tcg/i386: Use byte form of xgetbv instruction
The assembler in most versions of Mac OS X is pretty old and does not
support the xgetbv instruction. To go around this problem, the raw
encoding of the instruction is used instead.

Backports commit 1019242af11400252f6735ca71a35f81ac23a66d from qemu
2018-06-28 13:23:32 -05:00
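A sketch of the technique: emit the raw XGETBV encoding (0f 01 d0) through inline assembly so old assemblers that lack the mnemonic still build it (illustrative helper, not the exact QEMU code):

    /* Reads the extended control register selected by %ecx (XCR0 here)
     * into %edx:%eax using the raw instruction bytes. */
    static unsigned long long read_xcr0(void)
    {
        unsigned int eax, edx;
        __asm__ volatile(".byte 0x0f, 0x01, 0xd0"
                         : "=a"(eax), "=d"(edx)
                         : "c"(0));
        return ((unsigned long long)edx << 32) | eax;
    }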
Richard Henderson 33f7f6f09a
tcg/i386: Fix dup_vec in non-AVX2 codepath
The VPUNPCKLD* instructions are all "non-destructive source",
indicated by "NDS" in the encoding string in the x86 ISA manual.
This means that they take two source operands, one of which is
encoded in the VEX.vvvv field. We were incorrectly treating them
as if they were destructive-source and passing 0 as the 'v'
argument of tcg_out_vex_modrm(). This meant we were always
using %xmm0 as one of the source operands, causing incorrect
results if the register allocator happened to want to use
something else. For instance the input AArch64 insn:
DUP v26.16b, w21
which becomes TCG IR ops:
dup_vec v128,e8,tmp2,x21
st_vec v128,e8,tmp2,env,$0xa40
was assembled to:
0x607c568c: c4 c1 7a 7e 86 e8 00 00 vmovq 0xe8(%r14), %xmm0
0x607c5694: 00
0x607c5695: c5 f9 60 c8 vpunpcklbw %xmm0, %xmm0, %xmm1
0x607c5699: c5 f9 61 c9 vpunpcklwd %xmm1, %xmm0, %xmm1
0x607c569d: c5 f9 70 c9 00 vpshufd $0, %xmm1, %xmm1
0x607c56a2: c4 c1 7a 7f 8e 40 0a 00 vmovdqu %xmm1, 0xa40(%r14)
0x607c56aa: 00

when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1".
This resulted in our incorrectly setting the output vector to
q26=0000320000003200:0000320000003200
when given an input of x21 == 0000000002803200
rather than the expected all-zeroes.

Pass the correct source register number to tcg_out_vex_modrm()
for these insns.

Backports commit 7eb30ef0ba2eb59e7430d4848ae8d4bf4e50f768 from qemu
2018-05-11 11:22:38 -04:00
Lioncash 6bdfeb35ec
tcg/i386: Perform comparison pass against qemu
Ensures formatting and code are consistent.
2018-03-20 06:29:06 -04:00
Richard Henderson 2310bd4887
tcg/i386: Support INDEX_op_dup2_vec for -m32
It is unknown why -m32 was passing with gcc but not clang; it should have
failed for both. This would be used for tcg_gen_dup_i64_vec, and
visible with the right TB and an aarch64 guest.

Backports commit 7f34ed4bcdfda55f978f51aadca64aa970c9f4b6 from qemu
2018-03-17 20:22:24 -04:00
Lioncash b28c64ed34
tcg/i386: Amend bad merge
2018-03-12 10:11:03 -04:00
Richard Henderson a16ee979fc
tcg/i386: Always use TZCNT when available
I think this is cleaner than sometimes using BSF.

Backports commit 39f099ec9d6d420b6fe6f7f4f8ed80ae29c65ff2 from qemu
2018-03-12 05:11:42 -04:00
Richard Henderson 7e327aaf84
util: Introduce include/qemu/cpuid.h
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.

Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.

Backports commit 5dd8990841a9e331d9d4838a116291698208cbb6 from qemu
2018-03-09 12:12:00 -05:00
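A sketch of the header's approach: keep using the compiler's <cpuid.h> for __cpuid/__cpuid_count, and define only the bit_* constants it may be missing (the values below are the standard CPUID leaf-7 EBX bits; the exact contents of qemu/cpuid.h are not reproduced here):

    #include <cpuid.h>

    /* Leaf 7, subleaf 0, EBX feature bits that older <cpuid.h> copies omit. */
    #ifndef bit_BMI2
    #define bit_BMI2  (1 << 8)
    #endif
    #ifndef bit_AVX2
    #define bit_AVX2  (1 << 5)
    #endif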
Richard Henderson b3e89e9996
tcg/i386: Add vector operations
The x86 vector instruction set is extremely irregular. With newer
editions, Intel has filled in some of the blanks. However, we don't
get many 64-bit operations until SSE4.2, introduced in 2009.

The subsequent edition was for AVX1, introduced in 2011, which added
three-operand addressing, and adjusts how all instructions should be
encoded.

Given the relatively narrow 2-year window between possible to support
and desirable to support, and to vastly simplify code maintenance,
I am only planning to support AVX1 and later cpus.

Backports commit 770c2fc7bb70804ae9869995fd02dadd6d7656ac from qemu
2018-03-07 08:07:40 -05:00
Emilio G. Cota 3cf23eb256
tcg/i386: constify tcg_target_callee_save_regs
Backports commit e268f4c036d2b47a4f8bf293c1371b328e03ca04 from qemu
2018-03-05 02:08:02 -05:00
Richard Henderson fc8b4316a9
tcg: Remove tcg_regset_set32
It's not even clear what the interface's REG and VAL32 arguments were supposed to mean.
All uses had REG = 0, and VAL32 was the bitset assigned to the destination.

Backports commit f46934df662182097dce07d57ec00f37e4d2abf1 from qemu
2018-03-04 23:42:59 -05:00
Richard Henderson 49d09d6888
tcg: Remove tcg_regset_clear
Backports commit ccb1bb66ea2a42e773bfa04178d8b383ff86d4d8 from qemu
2018-03-04 23:24:45 -05:00
Richard Henderson b96f53e8a3
tcg/i386: Store out-of-range call targets in constant pool
Already it saves 2 bytes per call, but also the constant pool
entry may well be shared across multiple calls.

Backports commit 4e45f23943c0bb91588627de3801826546155ad8 from qemu
2018-03-04 22:22:49 -05:00
Richard Henderson f96514a99c
tcg: Rearrange ldst label tracking
Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer. Use a define in the cpu's tcg-target.h to
signal the requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c.

Backports commit 659ef5cbb893872d25e9d95191cc23b16546c8a1 from qemu
2018-03-04 22:13:13 -05:00
Emilio G. Cota e4dfb7f807
tcg/i386: implement goto_ptr
Backports commit 5cb4ef80f65252dd85b86fa7f3c985015423d670 from qemu
2018-03-02 21:08:38 -05:00
Richard Henderson 4bec129626
tcg/i386: Handle ctpop opcode
Backports commit 993508e43e6d180e9ba9b747a9657eac69aec5bb from qemu
2018-03-01 18:49:43 -05:00
Richard Henderson 246d891668
tcg/i386: Handle ctz and clz opcodes
Backports commit bbf25f90ba802a286fd72be9175a860ae5fec726 from qemu
2018-03-01 16:56:08 -05:00
Richard Henderson 73ab332185
tcg/i386: Allow bmi2 shiftx to have non-matching operands
Previously we could not have different constraints for different ISA levels,
which prevented us from eliding the matching constraint for shifts.

We do now have to make sure that the operands match for constant shifts.
We can also handle some small left shifts via lea.

Backports commit 6a5aed4bdc7078838a8098336588d56c9ce09d1d from qemu
2018-03-01 16:45:04 -05:00
Richard Henderson 9e3feebbfb
tcg/i386: Hoist common arguments in tcg_out_op
Backports commit 42d5b514928a8a0d2f55a4c243d1333f9675815b from qemu
2018-03-01 16:42:30 -05:00
Richard Henderson 142ca07077
tcg/i386: Fully convert tcg_target_op_def
Use a switch instead of searching a table. Share constraints between
32-bit and 64-bit, when at all possible.

Backports commit cd26449a505f808e479af4fdd539e05767e09c06 from qemu
2018-03-01 16:32:31 -05:00
Richard Henderson 3f38611159
tcg: Pass the opcode width to target_parse_constraint
This will let us choose how to interpret a given constraint
depending on whether the opcode is 32- or 64-bit. Which will
let us share more constraint combinations between opcodes.

At the same time, change the interface to return the advanced
pointer instead of passing it in/out by reference.

Backports commit 069ea736b50b75fdec99c9b8cc603b97bd98419e from qemu
2018-03-01 15:45:40 -05:00
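A sketch of the shape of this interface change (the TCG type names are QEMU's; the exact prototypes are an approximation, not copied from the commit):

    /* Stand-ins so the sketch compiles outside QEMU. */
    typedef struct TCGArgConstraint TCGArgConstraint;
    typedef int TCGType;

    /* Before: the constraint string was advanced through an in/out pointer. */
    void target_parse_constraint_old(TCGArgConstraint *ct, const char **pct_str);

    /* After: the parser also receives the opcode's type/width and returns
     * the advanced pointer instead. */
    const char *target_parse_constraint(TCGArgConstraint *ct,
                                        const char *ct_str, TCGType type);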