unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-07-01 03:08:31 +00:00

Author	SHA1	Message	Date
Richard Henderson	cd538f0b7e	tcg: Initialize cpu_env generically This is identical for each target. So, move the initialization to common code. Move the variable itself out of tcg_ctx and name it cpu_env to minimize changes within targets. This also means we can remove tcg_global_reg_new_{ptr,i32,i64}, since there are no longer global-register temps created by targets. Backports commit 1c2adb958fc07e5b3e81ed21b801c04a15f41f4f from qemu	2018-03-15 15:49:19 -04:00
Emilio G. Cota	23a55a277f	tcg: enable multiple TCG contexts in softmmu This enables parallel TCG code generation. However, we do not take advantage of it yet since tb_lock is still held during tb_gen_code. In user-mode we use a single TCG context; see the documentation added to tcg_region_init for the rationale. Note that targets do not need any conversion: targets initialize a TCGContext (e.g. defining TCG globals), and after this initialization has finished, the context is cloned by the vCPU threads, each of them keeping a separate copy. TCG threads claim one entry in tcg_ctxs[] by atomically increasing n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's of that variable and tcg_ctxs; they are there just to play nice with analysis tools such as thread sanitizer. Note that we do not allocate an array of contexts (we allocate an array of pointers instead) because when tcg_context_init is called, we do not know yet how many contexts we'll use since the bool behind qemu_tcg_mttcg_enabled() isn't set yet. Previous patches folded some TCG globals into TCGContext. The non-const globals remaining are only set at init time, i.e. before the TCG threads are spawned. Here is a list of these set-at-init-time globals under tcg/: Only written by tcg_context_init: - indirect_reg_alloc_order - tcg_op_defs Only written by tcg_target_init (called from tcg_context_init): - tcg_target_available_regs - tcg_target_call_clobber_regs - arm: arm_arch, use_idiv_instructions - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt - mips: use_movnz_instructions, use_mips32_instructions, use_mips32r2_instructions, got_sigill (tcg_target_detect_isa) - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr - s390: tb_ret_addr, s390_facilities - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines), use_vis3_instructions Only written by tcg_prologue_init: - 'struct jit_code_entry one_entry' - aarch64: tb_ret_addr - arm: tb_ret_addr - i386: tb_ret_addr, guest_base_flags - ia64: tb_ret_addr - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr Backports commit 3468b59e18b179bc63c7ce934de912dfa9596122 from qemu	2018-03-14 14:32:34 -04:00
Emilio G. Cota	f772fd986d	tcg: introduce regions to split code_gen_buffer This is groundwork for supporting multiple TCG contexts. The naive solution here is to split code_gen_buffer statically among the TCG threads; this however results in poor utilization if translation needs are different across TCG threads. What we do here is to add an extra layer of indirection, assigning regions that act just like pages do in virtual memory allocation. (BTW if you are wondering about the chosen naming, I did not want to use blocks or pages because those are already heavily used in QEMU). We use a global lock to serialize allocations as well as statistics reporting (we now export the size of the used code_gen_buffer with tcg_code_size()). Note that for the allocator we could just use a counter and atomic_inc; however, that would complicate the gathering of tcg_code_size()-like stats. So given that the region operations are not a fast path, a lock seems the most reasonable choice. The effectiveness of this approach is clear after seeing some numbers. I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark. Note that I'm evaluating this after enabling per-thread TCG (which is done by a subsequent commit). * -smp 1, 1 region (entire buffer): qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357 qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363 qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364 qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373 qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373 qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360 qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370 qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367 That is, 8 flushes. * -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]: qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356 qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361 qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361 qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375 qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375 qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360 qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365 qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368 Again, 8 flushes. Note how buffer utilization is not 100%, but it is close. Smaller region sizes would yield higher utilization, but we want region allocation to be rare (it acquires a lock), so we do not want to go too small. * -smp 8, static partitioning of 8 regions (10 MB per region): qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354 qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370 qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365 qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377 qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358 qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367 qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364 qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358 qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362 qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372 qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374 qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376 qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374 qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372 qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359 qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362 qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368 qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378 qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367 qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364 That is, 20 flushes. Note how a static partitioning approach uses the code buffer poorly, leading to many unnecessary flushes. Backports commit e8feb96fcc6c16eab8923332e86ff4ef0e2ac276 from qemu	2018-03-14 12:10:29 -04:00
Emilio G. Cota	1be7b55bb4	tcg: introduce **tcg_ctxs to keep track of all TCGContext's Groundwork for supporting multiple TCG contexts. Note that having n_tcg_ctxs is unnecessary. However, it is convenient to have it, since it will simplify iterating over the array: we'll have just a for loop instead of having to iterate over a NULL-terminated array (which would require n+1 elems) or having to check with ifdef's for usermode/softmmu. Backports commit df2cce2968069526553d82331ce9817eaca6b03a from qemu	2018-03-14 12:10:25 -04:00
Lioncash	21b0afe218	tcg: Perform comparison pass with qemu Makes formatting and code consistent with qemu	2018-03-12 18:03:06 -04:00
Richard Henderson	ab8579123e	tcg: Add generic vector ops for multiplication Backports commit 3774030a3e523689df24a7ed22854ce7a06b0116 from qemu	2018-03-06 16:10:08 -05:00
Richard Henderson	f9c4930ecd	tcg: Add generic vector ops for comparisons Backports commit 212be173f01e85e6589fd76676827953a84a732b from qemu	2018-03-06 16:09:38 -05:00
Richard Henderson	577ee114c3	tcg: Add generic vector ops for constant shifts Opcodes are added for scalar and vector shifts but, considering their varied semantics, they are not exposed to the front ends. They are provided anyway in case they are needed for backend expansion. Backports commit d0ec97967f940bbc11dced83422b39c224127f1e from qemu	2018-03-06 14:03:30 -05:00
Richard Henderson	64365612bf	tcg: Add generic vector expanders Backports commit db432672dc50ed86dda17ac821b7eb07411a90af from qemu	2018-03-06 13:42:52 -05:00
Richard Henderson	b9cd924fa5	tcg: Add types and basic operations for host vectors Nothing uses or enables them yet. Backports commit d2fd745fe8b9ac574d28b7ac63c39f6529749bd2 from qemu	2018-03-06 12:13:32 -05:00
Richard Henderson	140058221d	tcg: Generalize TCGOp parameters We had two fields specific to INDEX_op_call. Rename these and add some macros so that the fields may be reused for other opcodes. Backports commit cd9090aa9dbba30db8aec9a2fc103aaf1ab0f5a7 from qemu	2018-03-05 16:53:50 -05:00
Richard Henderson	7fe5f620df	tcg: Dynamically allocate TCGOps With no fixed array allocation, we can't overflow a buffer. This will be important as optimizations related to host vectors may expand the number of ops used. Use QTAILQ to link the ops together. Backports commit 15fa08f8451babc88d733bd411d4c94976f9d0f8 from qemu	2018-03-05 16:34:40 -05:00
Richard Henderson	5f074f09ab	tcg: Remove TCGV_UNUSED* and TCGV_IS_UNUSED* These are now trivial sets and tests against NULL. Unwrap. Backports commit f764718d0cb30af9f1f8e1d6a33622cc05ca4155 from qemu	2018-03-05 15:58:15 -05:00
Richard Henderson	5ef155a68f	tcg/s390x: Use constant pool for prologue Rather than have separate code only used for guest_base, rely on a recent change to handle constant pool entries. Backports commit ba2c747992f8c315c2fbddba196ce9137430d61d from qemu	2018-03-05 11:28:39 -05:00
Richard Henderson	ef3f552229	tcg: Allow constant pool entries in the prologue Both ARMv6 and AArch64 currently may drop complex guest_base values into the constant pool. But generic code wasn't expecting that, and the pool is not emitted. Correct that. Backports commit 5b38ee31616d1532c3c3a6dc644a9160d608ed2f from qemu	2018-03-05 11:25:56 -05:00
Richard Henderson	ab9df6244c	tcg: Use offsets not indices for TCGv_* Using the offset of a temporary, relative to TCGContext, rather than its index means that we don't use 0. That leaves offset 0 free for a NULL representation without having to leave index 0 unused. Backports commit e89b28a63501c0ad6d2501fe851d0c5202055e70 from qemu	2018-03-05 10:12:08 -05:00
Richard Henderson	d450156414	tcg: Remove GET_TCGV_* and MAKE_TCGV_* The GET and MAKE functions weren't really specific enough. We now have a full complement of functions that convert exactly between temporaries, arguments, tcgv pointers, and indices. The target/sparc change is also a bug fix, which would have affected a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64. Backports commit dc41aa7d34989b552efe712ffe184236216f960b from qemu	2018-03-05 09:12:26 -05:00
Richard Henderson	960eb3f4f9	tcg: Introduce temp_tcgv_{i32,i64,ptr} Backports commit 085272b35e0644fea373c33b5265c1818b7a978c from qemu	2018-03-05 08:55:52 -05:00
Richard Henderson	2bb5011b18	tcg: Introduce tcgv_{i32,i64,ptr}_{arg,temp} Transform TCGv_* to an "argument" or a temporary. For now, an argument is simply the temporary index. Backports commit ae8b75dc6ec808378487064922f25f1e7ea7a9be from qemu	2018-03-05 08:46:12 -05:00
Richard Henderson	d104b792a6	tcg: Change temp_allocate_frame arg to TCGTemp Backports commit 2272e4a791b7e1a01ffac143616ba4ece9a5762d from qemu	2018-03-05 07:51:40 -05:00
Richard Henderson	35a7a9c9a4	tcg: Avoid loops against variable bounds Copy s->nb_globals or s->nb_temps to a local variable for the purposes of iteration. This should allow the compiler to use low-overhead looping constructs on some hosts. Backports commit ac3b88911ebc6fc841f28898ee8aed40839debe2 from qemu	2018-03-05 07:50:06 -05:00
Richard Henderson	1f4ac863bf	tcg: Use per-temp state data in liveness This avoids having to allocate external memory for each temporary. Backports commit b83eabeac06e38706738bd5e92b1ba117a1b554d from qemu	2018-03-05 07:47:51 -05:00
Richard Henderson	010ded3088	tcg: Add temp_global bit to TCGTemp This avoids needing to test the index of a temp against nb_globals. Backports commit fa477d25470187030614288d35bc734edffa41ee from qemu	2018-03-05 07:21:10 -05:00
Richard Henderson	a9c46ad7a0	tcg: Introduce arg_temp Backports commit 434391390ba99996af1591b427a73b3f5c05065e from qemu	2018-03-05 07:17:44 -05:00
Richard Henderson	c8f0f6901e	tcg: Propagate TCGOp down to allocators Backports commit dd186292017641d5b31fc13225a420677e1d20d3 from qemu	2018-03-05 07:12:48 -05:00
Richard Henderson	f1e2ea6847	tcg: Propagate args to op->args in tcg.c Backports commit efee3746fa471852daba7674b0d34f8c88be7559 from qemu	2018-03-05 07:06:50 -05:00
Richard Henderson	eb488f5bd6	tcg: Merge opcode arguments into TCGOp Rather than have a separate buffer of 10*max_ops entries, give each opcode 10 entries. The result is actually a bit smaller and should have slightly more cache locality. Backports commit 75e8b9b7aa0b95a761b9add7e2f09248b101a392 from qemu	2018-03-05 04:45:20 -05:00
Emilio G. Cota	239e9771df	tcg: define TCG_HIGHWATER Will come in handy very soon. Backports commit a505785cd221994dd3713bde860861869a059940 from qemu	2018-03-05 03:22:27 -05:00
Emilio G. Cota	8552d95c52	exec-all: extract tb->tc_* into a separate struct tc_tb In preparation for adding tc.size to be able to keep track of TB's using the binary search tree implementation from glib. Backports commit e7e168f41364c6e83d0f75fc1b3ce7f9c41ccf76 from qemu	2018-03-05 02:57:22 -05:00
Richard Henderson	9a9c2ede4a	tcg: Remove tcg_regset_{or,and,andnot,not} Backports commit 07ddf036fa66bca279590c09fe1c46bcdcc5bcff from qemu	2018-03-04 23:34:16 -05:00
Richard Henderson	7ba6f6f5e6	tcg: Remove tcg_regset_set Backports commit d21369f5fb41299d5e7b032ec6da12da7f95f72f from qemu	2018-03-04 23:31:35 -05:00
Richard Henderson	49d09d6888	tcg: Remove tcg_regset_clear Backports commit ccb1bb66ea2a42e773bfa04178d8b383ff86d4d8 from qemu	2018-03-04 23:24:45 -05:00
Richard Henderson	7b68a8f0ca	tcg: Add tcg_op_supported Backports commit be0f34b5840312bbe9627c2b9f68a25f32903dae from qemu	2018-03-04 23:20:28 -05:00
Richard Henderson	e9d8cef430	tcg: Infrastructure for managing constant pools A new shared header tcg-pool.inc.c adds new_pool_label, for registering a tcg_target_ulong to be emitted after the generated code, plus relocation data to install a pointer to the data. A new pointer is added to the TCGContext, so that we dump the constant pool as data, not code. Backports commit 57a269469dbf70013dab3a176e1735636010a772 from qemu	2018-03-04 22:17:33 -05:00
Richard Henderson	f96514a99c	tcg: Rearrange ldst label tracking Dispense with TCGBackendData, as it has never been used for more than holding a single pointer. Use a define in the cpu/tcg-target.h to signal requirement for TCGLabelQemuLdst, so that we can drop the no-op tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c. Backports commit 659ef5cbb893872d25e9d95191cc23b16546c8a1 from qemu	2018-03-04 22:13:13 -05:00
Emilio G. Cota	d3ada2feb5	tcg: allocate TB structs before the corresponding translated code Allocating an arbitrarily-sized array of tbs results in either (a) a lot of memory wasted or (b) unnecessary flushes of the code cache when we run out of TB structs in the array. An obvious solution would be to just malloc a TB struct when needed, and keep the TB array as an array of pointers (recall that tb_find_pc() needs the TB array to run in O(log n)). Perhaps a better solution, which is implemented in this patch, is to allocate TB's right before the translated code they describe. This results in some memory waste due to padding to have code and TBs in separate cache lines--for instance, I measured 4.7% of padding in the used portion of code_gen_buffer when booting aarch64 Linux on a host with 64-byte cache lines. However, it can allow for optimizations in some host architectures, since TCG backends could safely assume that the TB and the corresponding translated code are very close to each other in memory. See this message by rth for a detailed explanation: https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html Subject: Re: GSoC 2017 Proposal: TCG performance enhancements Backports commit 6e3b2bfd6af488a896f7936e99ef160f8f37e6f2 from qemu	2018-03-03 17:05:49 -05:00
Emilio G. Cota	8f4f15e5f5	tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr Instead of exporting goto_ptr directly to TCG frontends, export tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer returned by the lookup_tb_ptr() helper. This is the only use case we have for goto_ptr and lookup_tb_ptr, so having this function is very convenient. Furthermore, it trivially allows us to avoid calling the lookup helper if goto_ptr is not implemented by the backend. Backports commit cedbcb01529cb6cf9a2289cdbebbc63f6149fc18 from qemu	2018-03-02 21:05:18 -05:00
Richard Henderson	b4b173615c	tcg: Allow an operand to be matching or a constant This allows an output operand to match an input operand only when the input operand needs a register. Backports commit 17280ff4a5f264e01e55ae514ee6d3586f9577b2 from qemu	2018-03-01 15:49:05 -05:00
Richard Henderson	3f38611159	tcg: Pass the opcode width to target_parse_constraint This will let us choose how to interpret a given constraint depending on whether the opcode is 32- or 64-bit, which in turn will let us share more constraint combinations between opcodes. At the same time, change the interface to return the advanced pointer instead of passing it in/out by reference. Backports commit 069ea736b50b75fdec99c9b8cc603b97bd98419e from qemu	2018-03-01 15:45:40 -05:00
Richard Henderson	b8c93597b4	tcg: Transition flat op_defs array to a target callback This will allow the target to tailor the constraints to the auto-detected ISA extensions. Backports commit f69d277ece43c42c7ab0144c2ff05ba740f6706b from qemu	2018-03-01 15:40:11 -05:00
Richard Henderson	551ef0a9f7	tcg: Add markup for output requires new register This is the same concept as, and same markup as, the early clobber markup in gcc. Backports commit 82790a870992bd87d5fd9e607f40859dcf4f82ac from qemu	2018-03-01 15:24:58 -05:00
Paolo Bonzini	8734e13a73	tcg: try sti when moving a constant into a dead memory temp This comes for free from unifying tcg_reg_alloc_mov and tcg_reg_alloc_movi's handling of TEMP_VAL_CONST. It triggers often on moves to cc_dst, such as the following translation of "sub $0x3c,%esp": before: subl $0x3c,%ebp; movl %ebp,0x10(%r14); movl $0x3c,%ebx; movl %ebx,0x2c(%r14) after: subl $0x3c,%ebp; movl %ebp,0x10(%r14); movl $0x3c,0x2c(%r14) Backports commit 0fe4fca4e1a5e06a270127dd80bb753d4dda61c6 from qemu	2018-02-26 10:08:47 -05:00
Thomas Huth	b581d4033f	tcg: Remove duplicate header includes host-utils.h and timer.h are included twice in tcg.c. One time should be enough. Backports commit 347519eb9d68303a6c23a7663c0fa6c20a225191 from qemu	2018-02-26 02:29:38 -05:00
Richard Henderson	ede1cae3dc	tcg: Lower indirect registers in a separate pass Rather than rely on recursion during the middle of register allocation, lower indirect registers to loads and stores off the indirect base into plain temps. For an x86_64 host, with sufficient registers, this results in identical code, modulo the actual register assignments. For an i686 host, with insufficient registers, this means that temps can be (temporarily) spilled to the stack in order to satisfy an allocation. This is as opposed to the possibility of not being able to spill in order to allocate a register for the indirect base, which is itself needed in order to perform a spill. Backports commit 5a18407f55ade924aa6397c9a043a9ffd59645fe from qemu	2018-02-25 22:32:28 -05:00
Richard Henderson	8a012ff6d3	tcg: Require liveness analysis Backports commit c0ef05b5e62ab0c291a94022f14104e61e306f03 from qemu	2018-02-25 22:20:42 -05:00
Richard Henderson	2aa46dd9a1	tcg: Include liveness info in the dumps Backports commit bdfb460ef77500f7b186759b585f06ff2120929d from qemu	2018-02-25 22:13:08 -05:00
Richard Henderson	e973e89a57	tcg: Compress dead_temps and mem_temps into a single array We only need two bits per temporary. Fold the two bytes into one, and reduce the memory and cachelines required during compilation. Backports commit c70fbf0a9938baf3b4f843355a77c17a7e945b98 from qemu	2018-02-25 22:07:08 -05:00
Richard Henderson	690985a582	tcg: Fold life data into TCGOp Reduce the size of other bitfields to make room. This reduces the cache footprint of compilation. Backports commit bee158cb4dde35c41632a3a129c869f14a32f8f0 from qemu	2018-02-25 21:49:42 -05:00
Richard Henderson	1547048a22	tcg: Reorg TCGOp chaining Instead of using -1 as end of chain, use 0, and link through the 0 entry as a fully circular double-linked list. Backports commit dcb8e75870e2de199db853697f8839cb603beefe from qemu	2018-02-25 21:44:50 -05:00
Richard Henderson	b2e6e351c2	tcg: Compress liveness data to 16 bits This reduces both memory usage and per-insn cacheline usage during code generation. Backports commit a1b3c48d2b23d6eaeb4529d3e1183d2648731bf8 from qemu	2018-02-25 21:27:24 -05:00
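
The region scheme described in commit f772fd986d (split code_gen_buffer into regions, hand them out under a global lock, and let each context fill its current region before asking for another) can be modeled roughly as follows. This is only a toy sketch in Python, not QEMU's C implementation: the names RegionPool, Context, alloc_region and emit are hypothetical, sizes are plain integers rather than real memory, and the sketch assumes every emitted chunk fits within one region.

```python
import threading


class RegionPool:
    """Hypothetical model: one buffer split into equal regions, handed out under a lock."""

    def __init__(self, buffer_size, n_regions):
        self.region_size = buffer_size // n_regions
        self.n_regions = n_regions
        self.next_free = 0            # index of the next unassigned region
        self.lock = threading.Lock()  # allocation is not a fast path, so a lock is fine

    def alloc_region(self):
        """Return (start, end) of a fresh region, or None once all regions are taken."""
        with self.lock:
            if self.next_free == self.n_regions:
                return None
            start = self.next_free * self.region_size
            self.next_free += 1
            return (start, start + self.region_size)


class Context:
    """Hypothetical per-thread code generator that draws regions from a shared pool."""

    def __init__(self, pool):
        self.pool = pool
        self.region = None  # (start, end) of the region currently being filled
        self.ptr = 0        # next free offset inside the current region
        self.used = 0       # total bytes emitted by this context

    def emit(self, nbytes):
        """Reserve nbytes; return their offset, or None when a flush would be needed."""
        if nbytes > self.pool.region_size:
            raise ValueError("chunk larger than a region (not handled in this sketch)")
        if self.region is None or self.ptr + nbytes > self.region[1]:
            region = self.pool.alloc_region()
            if region is None:
                return None  # buffer exhausted: the caller would flush everything
            # The unused tail of the old region is wasted, hence utilization < 100%.
            self.region = region
            self.ptr = region[0]
        offset = self.ptr
        self.ptr += nbytes
        self.used += nbytes
        return offset
```

In this model a context that translates more code simply claims more regions, which is the property the static 8-way partitioning in the benchmark above lacks. Smaller regions waste less space at region tails but take the lock more often.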

112 commits