mirror of https://github.com/yuzu-emu/unicorn.git synced 2026-07-20 15:54:11 +00:00

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, X86)


Emilio G. Cota 6bc05eeee4 tb hash: track translated blocks with qht

Having a fixed-size hash table for keeping track of all translation blocks is suboptimal: some workloads are just too big or too small to get maximum performance from the hash table. The MRU promotion policy helps improve performance when the hash table is a little undersized, but it cannot make up for severely undersized hash tables.

Furthermore, frequent MRU promotions result in writes that are a scalability bottleneck. For scalability, lookups should only perform reads, not writes. This is not a big deal for now, but it will become one once MTTCG matures.

The appended patch fixes these issues by using qht as the implementation of the TB hash table. This solution is superior to the other alternatives considered, namely:

- master: implementation in QEMU before this patchset
- xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
- xxhash-rcu: fixed buckets + xxhash + RCU list + MRU. MRU is implemented here by adding an intermediate struct that contains the u32 hash and a pointer to the TB; this allows us, on an MRU promotion, to copy said struct (that is not at the head), and put this new copy at the head. After a grace period, the original non-head struct can be eliminated, and after another grace period, freed.
- qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize + no MRU for lookups; MRU for inserts.

The appended solution is the following:

- qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize + no MRU for lookups; MRU for inserts.

The plots below compare the considered solutions. The Y axis shows the boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis sweeps the number of buckets (or initial number of buckets for qht-autoresize). The plots in PNG format (and with errorbars) can be seen here: http://imgur.com/a/Awgnq

Each test runs 5 times, and the entire QEMU process is pinned to a single core for repeatability of results.
[Plots: boot time (s) vs. log2 number of buckets (14 to 24). Host: Intel Xeon E5-2690 (Y axis 20 to 28 s). Host: Intel i7-4790K (Y axis 9.5 to 14.5 s). Series: master, xxhash, xxhash-rcu, qht-fixed-nomru, qht-dyn-mru, qht-dyn-nomru.]

Note that the original point before this patch series is X=15 for "master"; the low sensitivity to the increased number of buckets is due to the poor hashing function in master.

xxhash-rcu has significant overhead due to the constant churn of allocating and deallocating intermediate structs for implementing MRU. An alternative would be to consider failed lookups as "maybe not there", and then acquire the external lock (tb_lock in this case) to really confirm that there was indeed a failed lookup.
This, however, would not be enough to implement dynamic resizing, which is more complex: see "Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming" by Triplett, McKenney and Walpole. This solution was discarded due to the very coarse RCU read critical sections that we have in MTTCG; resizing requires waiting for readers after every pointer update, and resizes require many pointer updates, so this would quickly become prohibitive.

qht-fixed-nomru shows that MRU promotion is advisable for undersized hash tables.

However, qht-dyn-mru shows that MRU promotion is not important if the hash table is properly sized: there is virtually no difference in performance between qht-dyn-nomru and qht-dyn-mru.

Before this patch, we're at X=15 on "xxhash"; after this patch, we're at X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we can achieve with optimum sizing of the hash table, while keeping the hash table scalable for readers.

The improvement we get before and after this patch for booting debian jessie with arm-softmmu is:

- Intel Xeon E5-2690: 10.5% less time
- Intel i7-4790K: 5.2% less time

We could get this same improvement _for this particular workload_ by statically increasing the size of the hash table. But this would hurt workloads that do not need a large hash table. The dynamic (upward) resizing allows us to start small and enlarge the hash table as needed.

A quick note on downsizing: the table is resized back to 2^15 buckets on every tb_flush; this makes sense because it is not guaranteed that the table will reach the same number of TBs later on (e.g. most bootup code is thrown away after boot); it makes sense to grow the hash table as more code blocks are translated. This also avoids the complication of having to build downsizing hysteresis logic into qht.

Backports commit 909eaac9bbc2ed4f3a82ce38e905b87d478a3e00 from qemu		2018-03-13 14:16:26 -04:00
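The sizing policy described in this commit (read-only lookups with no MRU promotion, new entries inserted at the chain head, upward resizing, and a reset to the initial size on every flush) can be illustrated with a minimal single-threaded sketch in C. This is not qht: the `tbh_*` names are hypothetical, there is no attempt at the concurrent reader scalability that qht provides, and the growth trigger (doubling once entries exceed buckets) is an assumed threshold chosen for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; an illustration only, not QEMU's qht API. */
struct tbh_entry {
    uint32_t hash;           /* precomputed hash of the key */
    uint64_t key;            /* e.g. a guest PC identifying the block */
    void *tb;                /* the translated block */
    struct tbh_entry *next;
};

struct tbh_table {
    struct tbh_entry **buckets;
    size_t nbuckets;         /* always a power of two */
    size_t nentries;
    size_t init_nbuckets;    /* size restored on every flush */
};

static void tbh_alloc_buckets(struct tbh_table *t, size_t n)
{
    t->buckets = calloc(n, sizeof(*t->buckets));
    if (t->buckets == NULL)
        abort();
    t->nbuckets = n;
    t->nentries = 0;
}

static void tbh_init(struct tbh_table *t, size_t init_nbuckets)
{
    /* Bucket index is hash & (n - 1), so n must be a power of two. */
    assert(init_nbuckets > 0 && (init_nbuckets & (init_nbuckets - 1)) == 0);
    t->init_nbuckets = init_nbuckets;
    tbh_alloc_buckets(t, init_nbuckets);
}

/* Read-only lookup: no MRU promotion, so chains are never reordered here. */
static void *tbh_lookup(const struct tbh_table *t, uint32_t hash, uint64_t key)
{
    const struct tbh_entry *e = t->buckets[hash & (t->nbuckets - 1)];
    for (; e != NULL; e = e->next)
        if (e->hash == hash && e->key == key)
            return e->tb;
    return NULL;
}

/* Double the bucket count and relink every entry into its new bucket. */
static void tbh_grow(struct tbh_table *t)
{
    size_t new_n = t->nbuckets * 2;
    struct tbh_entry **nb = calloc(new_n, sizeof(*nb));
    if (nb == NULL)
        abort();
    for (size_t i = 0; i < t->nbuckets; i++) {
        struct tbh_entry *e = t->buckets[i];
        while (e != NULL) {
            struct tbh_entry *next = e->next;
            size_t j = e->hash & (new_n - 1);
            e->next = nb[j];
            nb[j] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = new_n;
}

/* Insert at the chain head ("MRU for inserts"); grow when the average
 * chain length would exceed 1 (an assumed threshold). */
static void tbh_insert(struct tbh_table *t, uint32_t hash, uint64_t key, void *tb)
{
    struct tbh_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        abort();
    size_t i = hash & (t->nbuckets - 1);
    e->hash = hash;
    e->key = key;
    e->tb = tb;
    e->next = t->buckets[i];
    t->buckets[i] = e;
    if (++t->nentries > t->nbuckets)
        tbh_grow(t);
}

/* Drop every entry and shrink back to the initial size, as on tb_flush. */
static void tbh_flush(struct tbh_table *t)
{
    for (size_t i = 0; i < t->nbuckets; i++) {
        struct tbh_entry *e = t->buckets[i];
        while (e != NULL) {
            struct tbh_entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(t->buckets);
    tbh_alloc_buckets(t, t->init_nbuckets);
}
```

For example, starting from 4 buckets and inserting 100 entries, the table doubles five times to 128 buckets, and `tbh_flush` returns it to 4. The commit message states that QEMU's TB table starts at, and is reset to, 2^15 buckets; the small sizes here are only for illustration.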
bindings	link to Crystal binding	2017-12-23 00:26:40 +08:00
docs	Added note about installing tests dependencies on Mac OS X. Added note about tests failing when required architecture support is disabled in build. (#908)	2017-10-12 19:56:00 +08:00
include	memory: Share special empty FlatView	2018-03-11 22:34:28 -04:00
msvc	tcg: move tcg backend files into accel/tcg/	2018-03-13 11:48:15 -04:00
qemu	tb hash: track translated blocks with qht	2018-03-13 14:16:26 -04:00
samples	Fixed register mistake in comments (#894)	2017-09-17 16:40:01 +07:00
tests	add 64-bit test demonstrating setting MSRs and FS/GS segments (#901)	2017-09-29 04:26:23 +08:00
.appveyor.yml	MSYS test (#852)	2017-06-25 10:11:35 +08:00
.gitignore	qapi: Move qapi-schema.json to qapi/, rename generated files	2018-03-09 11:35:11 -05:00
.travis.yml	use new travis osx image and brew (#935)	2018-01-05 10:29:49 +08:00
AUTHORS.TXT	import	2015-08-21 15:04:50 +08:00
Brewfile	Update Brewfile	2017-09-30 17:36:44 +07:00
ChangeLog	update ChangeLog	2017-04-20 13:28:02 +08:00
config.mk	Fix document file extension	2016-08-08 17:33:49 +09:00
COPYING	import	2015-08-21 15:04:50 +08:00
COPYING.LGPL2	LGPL2 for all header files under include/unicorn/	2017-12-16 10:08:42 +08:00
COPYING_GLIB	glib_compat: add COPYING_GLIB	2016-12-27 10:15:08 +08:00
CREDITS.TXT	update CREDITS.TXT	2017-04-25 12:56:47 +08:00
install-cmocka-linux.sh	Start moving examples in S files (#851)	2017-06-25 10:14:22 +08:00
list.c	callback to count number of instructions in uc_emu_start() should be executed first. fix #727	2017-06-16 13:22:38 +08:00
make.sh	Added MSVC support for arm64eb.	2017-04-25 14:23:58 +10:00
Makefile	crypto: introduce new module for computing hash digests	2018-02-17 15:23:17 -05:00
msvc.bat	add msvc.bat	2017-04-21 15:35:40 +08:00
pkgconfig.mk	bump extra version to 2	2017-04-21 15:30:40 +08:00
README.md	add Clojure	2017-12-23 00:32:33 +08:00
uc.c	exec: Drop unnecessary code for unicorn	2018-03-12 10:11:46 -04:00
windows_export.bat	Make the call out to visual studio extremely resilient	2017-01-02 03:32:48 -08:00

README.md

Unicorn Engine

Unicorn is a lightweight, multi-platform, multi-architecture CPU emulator framework based on QEMU.

Unicorn offers some unparalleled features:

- Multi-architecture: ARM, ARM64 (ARMv8), M68K, MIPS, SPARC, and X86 (16, 32, 64-bit)
- Clean/simple/lightweight/intuitive architecture-neutral API
- Implemented in pure C, with bindings for Crystal, Clojure, Visual Basic, Perl, Rust, Ruby, Python, Java, .NET, Go, Delphi/Free Pascal and Haskell
- Native support for Windows & *nix (with Mac OS X, Linux, *BSD & Solaris confirmed)
- High performance via Just-In-Time compilation
- Support for fine-grained instrumentation at various levels
- Thread-safety by design
- Distributed under free software license GPLv2

Further information is available at http://www.unicorn-engine.org

License

This project is released under the GPLv2 license.

Compilation & Docs

See the docs/COMPILE.md file for instructions on compiling and installing Unicorn.

More documentation is available in docs/README.md.

Contact

Contribute

If you want to contribute, please pick up something from our GitHub issues.

We also maintain a TODO list of more challenging problems.

CREDITS.TXT records important contributors to our project.