unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-07-03 02:28:18 +00:00

Author	SHA1	Message	Date
Emilio G. Cota	23a55a277f	tcg: enable multiple TCG contexts in softmmu
This enables parallel TCG code generation. However, we do not take advantage of it yet since tb_lock is still held during tb_gen_code.
In user-mode we use a single TCG context; see the documentation added to tcg_region_init for the rationale.
Note that targets do not need any conversion: targets initialize a TCGContext (e.g. defining TCG globals), and after this initialization has finished, the context is cloned by the vCPU threads, each of them keeping a separate copy.
TCG threads claim one entry in tcg_ctxs[] by atomically increasing n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's of that variable and tcg_ctxs; they are there just to play nice with analysis tools such as thread sanitizer.
Note that we do not allocate an array of contexts (we allocate an array of pointers instead) because when tcg_context_init is called, we do not know yet how many contexts we'll use since the bool behind qemu_tcg_mttcg_enabled() isn't set yet.
Previous patches folded some TCG globals into TCGContext. The non-const globals remaining are only set at init time, i.e. before the TCG threads are spawned. Here is a list of these set-at-init-time globals under tcg/:
Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions, use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines), use_vis3_instructions
Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr
Backports commit 3468b59e18b179bc63c7ce934de912dfa9596122 from qemu	2018-03-14 14:32:34 -04:00
Emilio G. Cota	f772fd986d	tcg: introduce regions to split code_gen_buffer
This is groundwork for supporting multiple TCG contexts.
The naive solution here is to split code_gen_buffer statically among the TCG threads; this however results in poor utilization if translation needs are different across TCG threads.
What we do here is to add an extra layer of indirection, assigning regions that act just like pages do in virtual memory allocation. (BTW if you are wondering about the chosen naming, I did not want to use blocks or pages because those are already heavily used in QEMU).
We use a global lock to serialize allocations as well as statistics reporting (we now export the size of the used code_gen_buffer with tcg_code_size()). Note that for the allocator we could just use a counter and atomic_inc; however, that would complicate the gathering of tcg_code_size()-like stats. So given that the region operations are not a fast path, a lock seems the most reasonable choice.
The effectiveness of this approach is clear after seeing some numbers. I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark. Note that I'm evaluating this after enabling per-thread TCG (which is done by a subsequent commit).
* -smp 1, 1 region (entire buffer):
qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367
That is, 8 flushes.
* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:
qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368
Again, 8 flushes. Note how buffer utilization is not 100%, but it is close. Smaller region sizes would yield higher utilization, but we want region allocation to be rare (it acquires a lock), so we do not want to go too small.
* -smp 8, static partitioning of 8 regions (10 MB per region):
qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364
That is, 20 flushes. Note how a static partitioning approach uses the code buffer poorly, leading to many unnecessary flushes.
Backports commit e8feb96fcc6c16eab8923332e86ff4ef0e2ac276 from qemu	2018-03-14 12:10:29 -04:00
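The region idea in the "introduce regions" commit can be illustrated with a small model. This is a Python sketch, not QEMU code: the names `RegionPool`, `alloc_region`, `assigned_bytes` and `regions_needed` are hypothetical and exist only for this example. It assumes a buffer divided into equally sized regions that are handed out one at a time under a single lock, with a flush needed only when no region is left.

```python
import threading


class RegionPool:
    """Toy model of a code buffer split into equally sized regions.

    All names are hypothetical; this mirrors the idea in the commit
    message and is not the QEMU implementation.
    """

    def __init__(self, buffer_size, n_regions):
        self.region_size = buffer_size // n_regions
        self.n_regions = n_regions
        self._next_free = 0              # index of the next unassigned region
        self._lock = threading.Lock()    # serializes allocation and statistics

    def alloc_region(self):
        """Return the index of a fresh region, or None if none are left."""
        with self._lock:
            if self._next_free == self.n_regions:
                return None              # the caller would have to flush
            idx = self._next_free
            self._next_free += 1
            return idx

    def assigned_bytes(self):
        """Bytes handed out so far (an upper bound on code emitted)."""
        with self._lock:
            return self._next_free * self.region_size

    def reset(self):
        """Model a flush: every region becomes available again."""
        with self._lock:
            self._next_free = 0


def regions_needed(pool, demand_bytes):
    """Claim regions until demand_bytes fits; return the claimed indices.

    Raises MemoryError when the pool runs dry, which in this model
    stands for a flush of the whole buffer.
    """
    claimed = []
    while len(claimed) * pool.region_size < demand_bytes:
        idx = pool.alloc_region()
        if idx is None:
            raise MemoryError("no regions left; flush required")
        claimed.append(idx)
    return claimed
```

With an 80 MB buffer and 32 regions, a busy thread in this model can claim 20 regions (50 MB) while an idle thread holds only one. Under a static split into eight 10 MB partitions, the busy thread would have hit its limit at 10 MB, which is the utilization problem the commit message describes.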
Markus Armbruster	5d554fefeb	Include qapi/error.h exactly where needed This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Backports commit e688df6bc4549f28534cdb001f168b8caae55b0c from qemu	2018-03-07 12:26:38 -05:00
Peter Xu	1bb34aadf9	cpu: refactor cpu_address_space_init() Normally we create an address space for that CPU and pass that address space into the function. Let's just do it inside to unify address space creations. It'll simplify my next patch to rename those address spaces. Backports commit 80ceb07a83375e3a0091591f96bd47bce2f640ce from qemu	2018-03-05 14:39:25 -05:00
Alex Bennée	d56a4b0be4	tcg: handle EXCP_ATOMIC exception for system emulation The patch enables handling atomic code in the guest. This should preferably be done in cpu_handle_exception(), but the current assumptions regarding when we can execute atomic sections cause a deadlock. The current mechanism discards the flags which were set in atomic execution. We ensure they are properly saved by calling the cc->cpu_exec_enter/leave() functions around the loop. As we are running cpu_exec_step_atomic() from the outermost loop we need to avoid an abort() when single stepping over atomic code since debug exception longjmp will point to the setlongjmp in cpu_exec(). We do this by setting a new jmp_env so that it jumps back here on an exception. Backports relevant parts of commit 08e73c48b053566bfe0c994f154f73991cd0ff0e from qemu	2018-03-02 09:56:43 -05:00
Alex Bennée	632b853761	tcg: remove global exit_request There are now only two uses of the global exit_request left. The first ensures we exit the run_loop when we first start to process pending work and in the kick handler. This is just as easily done by setting the first_cpu->exit_request flag. The second use is in the round robin kick routine. The global exit_request ensured every vCPU would set its local exit_request and cause a full exit of the loop. Now that the iothread isn't being held while running, we can just rely on the kick handler to push us out as intended. We lightly re-factor the main vCPU thread to ensure cpu->exit_requests cause us to exit the main loop and process any IO requests that might come along. As a cpu->exit_request may legitimately get squashed while processing the EXCP_INTERRUPT exception, we also check cpu->queued_work_first to ensure queued work is expedited as soon as possible. Backports commit e5143e30fb87fbf179029387f83f98a5a9b27f19 from qemu	2018-03-02 09:38:08 -05:00
KONRAD Frederic	c5730ff194	tcg: add options for enabling MTTCG We know there will be cases where MTTCG won't work until additional work is done in the front/back ends to support it. It will, however, be useful to be able to turn it on. As a result MTTCG will default to off unless the combination is supported. However, the user can turn it on for the sake of testing. Backports commit 8d4e9146b3568022ea5730d92841345d41275d66 from qemu	2018-03-02 09:25:01 -05:00
Richard Henderson	e35aacd5ae	tcg: Add EXCP_ATOMIC When we cannot emulate an atomic operation within a parallel context, this exception allows us to stop the world and try again in a serial context. Backports commit fdbc2b5722f6092e47181a947c90fd4bdcc1c121 from qemu Also backports parts of commit 02d57ea115b7669f588371c86484a2e8ebc369be	2018-02-27 11:57:58 -05:00
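The "stop the world and try again in a serial context" behaviour described for EXCP_ATOMIC can be sketched as follows. This is a Python model with hypothetical names (`ExcpAtomic`, `emulate_atomic_op`, `run_step`), and it assumes a single lock is an adequate stand-in for exclusive execution; it is not how QEMU or unicorn implement it.

```python
import threading


class ExcpAtomic(Exception):
    """Stand-in for EXCP_ATOMIC: the operation needs a serial context."""


def emulate_atomic_op(state, parallel, supported_in_parallel):
    """Pretend atomic increment that may be impossible while others run."""
    if parallel and not supported_in_parallel:
        raise ExcpAtomic
    state["value"] += 1
    return state["value"]


def run_step(state, world_lock, supported_in_parallel):
    """Try the parallel path first; on ExcpAtomic, retry serially."""
    try:
        result = emulate_atomic_op(state, True, supported_in_parallel)
        return result, "parallel"
    except ExcpAtomic:
        # "Stop the world": the lock models exclusive execution here.
        with world_lock:
            result = emulate_atomic_op(state, False, supported_in_parallel)
        return result, "serial"
```

The point of the sketch is only the control flow: the failed parallel attempt has no side effects, so the retry in the serial context starts from the same state.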
Alex Bennée	33589eb75f	cpus: pass CPUState to run_on_cpu helpers CPUState is a fairly common pointer to pass to these helpers. This means if you need other arguments for the async_run_on_cpu case you end up having to do a g_malloc to stuff additional data into the routine. For the current users this isn't a massive deal but for MTTCG this gets cumbersome when the only other parameter is often an address. This adds the typedef run_on_cpu_func for helper functions which has an explicit CPUState * passed as the first parameter. All the users of run_on_cpu and async_run_on_cpu have had their helpers updated to use CPUState where available. Backports commit e0eeb4a21a3ca4b296220ce4449d8acef9de9049 from qemu	2018-02-26 04:54:55 -05:00
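The helper convention described in the run_on_cpu commit, with the CPU passed explicitly as the first parameter, can be modelled like this. This is a Python sketch: the `CPUState` dataclass, `process_queued_work` and `flush_page` are hypothetical stand-ins, and the second `data` argument is an assumption for illustration (the commit message only states that CPUState is the first parameter).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CPUState:
    """Minimal stand-in for a vCPU with a queue of pending work items."""
    index: int
    queued_work: deque = field(default_factory=deque)
    log: list = field(default_factory=list)


# Hypothetical Python analogue of a run_on_cpu_func helper type:
# the CPU comes first, followed by one opaque argument (assumed).
RunOnCpuFunc = Callable[[CPUState, Any], None]


def async_run_on_cpu(cpu: CPUState, func: RunOnCpuFunc, data: Any) -> None:
    """Queue func to run later in the context of cpu."""
    cpu.queued_work.append((func, data))


def process_queued_work(cpu: CPUState) -> None:
    """Drain the queue, passing the CPU explicitly to every helper."""
    while cpu.queued_work:
        func, data = cpu.queued_work.popleft()
        func(cpu, data)


def flush_page(cpu: CPUState, addr: Any) -> None:
    """Example helper: it needs only the CPU and an address."""
    cpu.log.append(("flush", cpu.index, addr))
```

Because the CPU arrives as its own parameter, a helper whose only other input is an address does not need a separately allocated structure to carry both values.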
Paolo Bonzini	9485b7c2e1	cpu: move exec-all.h inclusion out of cpu.h exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Backports commit 63c915526d6a54a95919ebece83fa9ca631b2508 from qemu	2018-02-24 02:39:08 -05:00
Paolo Bonzini	37f26922dd	qemu-common: push cpu.h inclusion out of qemu-common.h Backports commit 33c11879fd422b759483ed25fef133ea900ea8d7 from qemu	2018-02-24 01:50:56 -05:00
Peter Maydell	293266a9d8	exec: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Backports commit 7b31bbc2e68605ab2f10dc609dd54cf4c7b5f49a from qemu	2018-02-19 00:49:55 -05:00
Peter Crosthwaite	ce997e1caf	qom/cpu: Add MemoryRegion property Add a MemoryRegion property, which if set is used to construct the CPU's initial (default) AddressSpace. Backports commit 6731d864f80938e404dc3e5eb7f6b76b891e3e43 from qemu	2018-02-18 21:54:50 -05:00
Lioncash	2210c7f486	cpus: Relocate address space initialization Moves it to qemu_init_vcpu where it belongs	2018-02-18 21:05:04 -05:00
Peter Maydell	51369b67cd	exec.c: Allow target CPUs to define multiple AddressSpaces Allow multiple calls to cpu_address_space_init(); each call adds an entry to the cpu->ases array at the specified index. It is up to the target-specific CPU code to actually use these extra address spaces. Since this multiple AddressSpace support won't work with KVM, add an assertion to avoid confusing failures. Backports commit 12ebc9a76dd7702aef0a3618717a826c19c34ef4 from qemu	2018-02-17 22:35:13 -05:00
Peter Maydell	f1b237236c	exec.c: Don't set cpu->as until cpu_address_space_init Rather than setting cpu->as unconditionally in cpu_exec_init (and then having target-i386 override this later), don't set it until the first call to cpu_address_space_init. This requires us to initialise the address space for both TCG and KVM (KVM doesn't need the AS listener but it does require cpu->as to be set). For target CPUs which don't set up any address spaces (currently everything except i386), add the default address_space_memory in qemu_init_vcpu(). Backports commit 56943e8cc14b7eeeab67d1942fa5d8bcafe3e53f from qemu	2018-02-17 22:24:36 -05:00
Peter Crosthwaite	e51f8c9f6f	cpu-exec: Purge all uses of ENV_GET_CPU() Remove un-needed usages of ENV_GET_CPU() by converting the APIs to use CPUState pointers and retrieving the env_ptr as minimally needed. Scripted conversion for target-* change: for I in target-*/cpu.h; do sed -i \ 's/\(^int cpu_[^_]*_exec(\)[^ ][^ ]* \*s);$/\1CPUState *cpu);/' \ $I; done Backports commit ea3e9847408131abc840240bd61e892d28459452 from qemu	2018-02-17 15:23:18 -05:00
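The effect of the scripted conversion can be checked on a sample declaration. The sketch below assumes the sed expression is meant to read `s/\(^int cpu_[^_]*_exec(\)[^ ][^ ]* \*s);$/\1CPUState *cpu);/` (the scraped text appears to have lost some characters) and translates it to Python's `re` syntax. The sample declarations are made up for illustration.

```python
import re

# Python translation of the assumed sed basic-regex pattern:
# the group captures "int cpu_<target>_exec(", and the rest matches
# a single "<Type> *s);" parameter up to the end of the line.
PATTERN = re.compile(r"(^int cpu_[^_]*_exec\()[^ ][^ ]* \*s\);$", re.MULTILINE)
REPLACEMENT = r"\1CPUState *cpu);"


def convert(text):
    """Rewrite cpu_<target>_exec() prototypes to take CPUState *cpu."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    # Hypothetical header content: only the first line should change.
    sample = "int cpu_arm_exec(CPUARMState *s);\nvoid cpu_reset(CPUState *s);"
    print(convert(sample))
```

Only prototypes of the exact form `int cpu_<name>_exec(<Type> *s);` are rewritten; any other declaration is left untouched.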
Peter Crosthwaite	50b6fa93a8	cpu: Change tcg_cpu_exec() arg to cpu, not env The sole caller of this function navigates the cpu->env_ptr only for this function to take it back to the cpu pointer straight away. Pass in the cpu pointer instead and grab the env pointer locally in the function. Removes a core code usage of ENV_GET_CPU(). Backports commit 3d57f7893c90d911d786cb2c622b0926fc808b57 from qemu	2018-02-17 15:23:18 -05:00
Nguyen Anh Quynh	52cb0ba78e	cleanup more synchronization code	2017-01-09 14:05:39 +08:00
Chris Eagle	fccbcfd4c2	revert to use of g_free to make future qemu integrations easier (#695) * revert to use of g_free to make future qemu integrations easier * bracing	2016-12-21 22:28:36 +08:00
Chris Eagle	e46545f722	remove glib dependency by providing compatible replacements	2016-12-18 14:56:58 -08:00
Ryan Hileman	cb615fdba7	remove uc->cpus	2016-09-23 07:38:21 -07:00
Ryan Hileman	f99030179c	fix free() -> g_free()	2016-08-11 07:49:19 -07:00
danghvu	ada1c13662	Fix memleak: do not re-initialize halt_cond	2016-07-06 01:49:10 -05:00
Nguyen Anh Quynh	3a742fb6f6	fix conflicts when merging no-thread to master	2016-04-23 10:06:57 +08:00
Chris Eagle	9467254fc0	strip out per cpu thread code	2016-03-25 17:24:28 -07:00
Ryan Hileman	f0af8f8282	execute cpus in same thread as uc_emu_start() note: I'm sure this makes some dead code	2016-03-23 22:50:56 -07:00
Nguyen Anh Quynh	20b01a6933	fix merge conflict	2016-02-01 12:08:38 +08:00
Nguyen Anh Quynh	e750a4e97c	when uc_mem_exec() removes EXE permission, quit the current TB & continue emulating with the TB flushed. this fixes the issue in PR #378	2016-01-28 00:56:55 +08:00
Nguyen Anh Quynh	580bc7b56a	cleanup	2016-01-10 23:10:00 +08:00
farmdve	036763d6ae	Fix memory leaks as reported by DrMemory and Valgrind. ARM and probably the rest of the arches have significant memory leaks as they have no release interface. Additionally, DrMemory does not have 64-bit support and thus I can't test the 64-bit version under Windows. Under Linux valgrind supports both 32-bit and 64-bit but there are different macros and code for Linux and Windows.	2016-01-08 01:42:56 +02:00
Nguyen Anh Quynh	2f297bdd3a	handle some errors properly to avoid exit() during initialization. this fixes issue #237	2015-11-12 01:43:41 +08:00
Nguyen Anh Quynh	8b39ec5b0c	initial support to remove a static variable in qemu-thread-win32.c	2015-09-02 16:13:12 +08:00
Nguyen Anh Quynh	344d016104	import	2015-08-21 15:04:50 +08:00

34 commits