unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-12-22 04:51:24 +00:00


Emilio G. Cota f772fd986d tcg: introduce regions to split code_gen_buffer

This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically among the TCG threads; this however results in poor utilization if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning regions that act just like pages do in virtual memory allocation. (BTW if you are wondering about the chosen naming, I did not want to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics reporting (we now export the size of the used code_gen_buffer with tcg_code_size()). Note that for the allocator we could just use a counter and atomic_inc; however, that would complicate the gathering of tcg_code_size()-like stats. So given that the region operations are not a fast path, a lock seems the most reasonable choice.

The effectiveness of this approach is clear after seeing some numbers. I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark. Note that I'm evaluating this after enabling per-thread TCG (which is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367
  That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:
    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368
  Again, 8 flushes. Note how buffer utilization is not 100%, but it is close. Smaller region sizes would yield higher utilization, but we want region allocation to be rare (it acquires a lock), so we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364
  That is, 20 flushes. Note how a static partitioning approach uses the code buffer poorly, leading to many unnecessary flushes.

Backports commit e8feb96fcc6c16eab8923332e86ff4ef0e2ac276 from qemu

2018-03-14 12:10:29 -04:00
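
The region scheme in the commit message above can be illustrated with a small model. The Python sketch below is not the QEMU or Unicorn code (which is C); it is a toy under stated assumptions: the class `RegionAllocator`, its methods `alloc_region`, `emit`, and `code_size`, the helper `fill_until_flush`, and the byte counts are all hypothetical names and values chosen for illustration. It shows the two ideas the message describes: a buffer split into equal regions that are handed out on demand under a global lock, and a size query that takes the same lock.

```python
import threading


class RegionAllocator:
    """Toy model of a code buffer split into equally sized regions (hypothetical)."""

    def __init__(self, buffer_size, n_regions):
        self.region_size = buffer_size // n_regions
        self.n_regions = n_regions
        self.next_region = 0          # index of the next unassigned region
        self.used = {}                # owner -> bytes used in its current region
        self.retired_bytes = 0        # bytes used in regions owners have moved past
        self.lock = threading.Lock()  # one global lock for allocation and stats

    def alloc_region(self, owner):
        """Assign a fresh region to owner; return False if none are left."""
        with self.lock:
            if self.next_region == self.n_regions:
                return False          # buffer exhausted: the caller would flush
            self.retired_bytes += self.used.get(owner, 0)
            self.used[owner] = 0
            self.next_region += 1
            return True

    def emit(self, owner, nbytes):
        """Reserve nbytes in owner's current region, taking a new one if needed."""
        if nbytes > self.region_size:
            raise ValueError("request larger than a region")
        current = self.used.get(owner)
        if current is None or current + nbytes > self.region_size:
            if not self.alloc_region(owner):
                return False
        self.used[owner] += nbytes
        return True

    def code_size(self):
        """Total bytes in use, gathered under the same lock as allocation."""
        with self.lock:
            return self.retired_bytes + sum(self.used.values())


def fill_until_flush(alloc):
    """One light owner and one busy owner; return bytes used when a flush is due."""
    alloc.emit("light", 10)
    while alloc.emit("busy", 10):
        pass
    return alloc.code_size()


if __name__ == "__main__":
    print("static, 2 regions:", fill_until_flush(RegionAllocator(800, 2)))
    print("dynamic, 8 regions:", fill_until_flush(RegionAllocator(800, 8)))
```

With two owners of uneven demand, the two-region (static) split stops once the busy owner fills its half, while the eight-region split lets the busy owner keep claiming regions the light owner never needs. This mirrors, in miniature, the utilization gap between the static-partitioning and 32-region measurements in the commit message.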
accel	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
crypto	crypto: Clean up includes	2018-02-19 00:47:40 -05:00
default-configs	arm64eb: add support for ARM64 big endian.	2017-04-24 23:30:01 +08:00
docs	docs: clarify memory region lifecycle	2018-02-12 15:11:21 -05:00
fpu	softfloat: fix crash on int conversion of SNaN	2018-03-09 11:40:17 -05:00
hw	target/arm: Make 'any' CPU just an alias for 'max'	2018-03-12 10:11:49 -04:00
include	cpu: Unicorn-ify the qemu_tcg_mttcg_enabled() macro	2018-03-14 12:10:29 -04:00
qapi	qapi: Move qapi-schema.json to qapi/, rename generated files	2018-03-09 11:35:11 -05:00
qobject	qdict: Introduce qdict_rename_keys()	2018-03-12 10:11:48 -04:00
qom	tcg: Add CPUState cflags_next_tb	2018-03-13 14:39:43 -04:00
scripts	qapi: Move qapi-schema.json to qapi/, rename generated files	2018-03-09 11:35:11 -05:00
target	tcg: take tb_ctx out of TCGContext	2018-03-14 09:18:12 -04:00
tcg	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
util	osdep: introduce qemu_mprotect_rwx/none	2018-03-14 12:10:28 -04:00
aarch64.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
aarch64eb.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
accel.c	clean-up: removed duplicate #includes	2018-02-28 08:51:56 -05:00
arm.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
armeb.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
CODING_STYLE	import	2015-08-21 15:04:50 +08:00
configure	tcg: move tcg backend files into accel/tcg/	2018-03-13 11:48:15 -04:00
COPYING	import	2015-08-21 15:04:50 +08:00
COPYING.LIB	import	2015-08-21 15:04:50 +08:00
cpus.c	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
exec.c	exec: Drop unnecessary code for unicorn	2018-03-12 10:11:46 -04:00
gen_all_header.sh	arm64eb: add support for ARM64 big endian.	2017-04-24 23:30:01 +08:00
glib_compat.c	translate-all: use a binary search tree to track TBs in TBContext	2018-03-13 16:18:29 -04:00
HACKING	import	2015-08-21 15:04:50 +08:00
header_gen.py	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
ioport.c	hw: remove pio_addr_t	2018-02-24 02:43:16 -05:00
LICENSE	import	2015-08-21 15:04:50 +08:00
m68k.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
Makefile	qapi: Don't create useless directory qapi-generated	2018-03-09 11:36:49 -05:00
Makefile.objs	qapi: Move qapi-schema.json to qapi/, rename generated files	2018-03-09 11:35:11 -05:00
Makefile.target	tcg: move tcg backend files into accel/tcg/	2018-03-13 11:48:15 -04:00
memory.c	memory: Share special empty FlatView	2018-03-11 22:34:28 -04:00
memory_ldst.inc.c	exec: Drop unnecessary code for unicorn	2018-03-12 10:11:46 -04:00
memory_mapping.c	include/qemu/osdep.h: Don't include qapi/error.h	2018-02-21 23:08:18 -05:00
mips.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
mips64.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
mips64el.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
mipsel.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
powerpc.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
qemu-timer.c	timer/cpus: fix some typos and update some comments	2018-02-25 23:21:57 -05:00
rules.mak	build-sys: silence make by default or V=0	2018-03-06 08:58:03 -05:00
sparc.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
sparc64.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00
unicorn_common.h	tcg: take tb_ctx out of TCGContext	2018-03-14 09:18:12 -04:00
VERSION	import	2015-08-21 15:04:50 +08:00
vl.c	machine: Eliminate QEMUMachine and qemu_register_machine()	2018-03-11 15:22:25 -04:00
vl.h	import	2015-08-21 15:04:50 +08:00
x86_64.h	tcg: introduce regions to split code_gen_buffer	2018-03-14 12:10:29 -04:00