mirror of
https://github.com/yuzu-emu/unicorn.git
synced 2025-02-02 15:21:00 +00:00
3dc16ebca3
It's been superseded by the atomic helpers. The use of the atomic helpers provides a significant performance and scalability improvement. Below is the result of running the atomic_add-test microbenchmark with: $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n , where $n is the number of threads and $r is the allowed range for the additions. The scenarios measured are: - atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset) - cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper - master: before this patchset Results sorted in ascending range, i.e. descending degree of contention. Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64 Opteron 6376 cores. atomic_add-bench: 5000000 ops/thread, [0,1] range 25 ++---------+----------+---------+----------+----------+----------+---++ + atomic +-E--+ + + + + + | |cmpxchg +-H--+ | 20 +Emaster +-N--+ ++ || | |++ | || | 15 +++ ++ |N| | |+| | 10 ++| ++ |+|+ | | | -+E+------ +++ ---+E+------+E+------+E+-----+E+------+E| |+E+E+- +++ +E+------+E+-- | 5 ++|+ ++ |+N+H+--- +++ | ++++N+--+H++----+++ + +++ --++H+------+H+------+H++----+H+---+--- | 0 ++---------+-----H----+---H-----+----------+----------+----------+---H+ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 5000000 ops/thread, [0,2] range 25 ++---------+----------+---------+----------+----------+----------+---++ ++atomic +-E--+ + + + + + | |cmpxchg +-H--+ | 20 ++master +-N--+ ++ |E| | |++ | ||E | 15 ++| ++ |N|| | |+|| ---+E+------+E+-----+E+------+E| 10 ++| | ---+E+------+E+-----+E+--- +++ +++ ||H+E+--+E+-- | |+++++ | | || | 5 ++|+H+-- +++ ++ |+N+ - ---+H+------+H+------ | + +N+--+H++----+H+---+--+H+----++H+--- + + +H+---+--+H| 0 ++---------+----------+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 5000000 ops/thread, [0,8] range 40 ++---------+----------+---------+----------+----------+----------+---++ ++atomic +-E--+ + + + + + | 35 +cmpxchg +-H--+ ++ | master +-N--+ ---+E+------+E+------+E+-----+E+------+E| 30 ++| ---+E+-- +++ ++ | | -+E+--- | 25 ++E ---- +++ ++ |+++++ -+E+ | 20 +E+ E-- +++ ++ |H|+++ | |+| +H+------- | 15 ++H+ ---+++ +H+------ ++ |N++H+-- +++--- +H+------++| 10 ++ +++ - +++ ---+H+ +++ +H+ | | +H+-----+H+------+H+-- | 5 ++| +++ ++ ++N+N+--+N++ + + + + + | 0 ++---------+----------+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 5000000 ops/thread, [0,128] range 160 ++---------+---------+----------+---------+----------+----------+---++ + atomic +-E--+ + + + + + | 140 +cmpxchg +-H--+ +++ +++ ++ | master +-N--+ E--------E------+E+------++| 120 ++ --| | +++ E+ | -- +++ +++ ++| 100 ++ - ++ | +++- +++ ++| 80 ++ -+E+ -+H+------+H+------H--------++ | ---- ---- +++ H| | ---+E+-----+E+- ---+H+ ++| 60 ++ +E+--- +++ ---+H+--- ++ | --+++ ---+H+-- | 40 ++ +E+-+H+--- ++ | +H+ | 20 +EE+ ++ +N+ + + + + + + | 0 ++N-N---N--+---------+----------+---------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 5000000 ops/thread, [0,1024] range 350 ++---------+---------+----------+---------+----------+----------+---++ + atomic +-E--+ + + + + + | 300 +cmpxchg +-H--+ +++ | master +-N--+ +++ || | +++ | ----E| 250 ++ | ----E---- ++ | ----E--- | ---+H| 200 ++ -+E+--- +++ ---+H+--- ++ | ---- -+H+-- | | +E+ +++ ---- +++ | 150 ++ ---+++ ---+H+- ++ | --- -+H+-- | 100 ++ ---+E+ ---- +++ ++ | +++ ---+E+-----+H+- | | -+E+------+H+-- | 50 ++ +E+ ++ +EE+ + + + + + + | 0 ++N-N---N--+---------+----------+---------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads hi-res: http://imgur.com/a/fMRmq For master I stopped measuring master after 8 threads, because there is little point in measuring the well-known performance collapse of a contended lock. Backports commit 37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b from qemu |
||
---|---|---|
bindings | ||
docs | ||
include | ||
msvc | ||
qemu | ||
samples | ||
tests | ||
.appveyor.yml | ||
.gitignore | ||
.travis.yml | ||
AUTHORS.TXT | ||
Brewfile | ||
ChangeLog | ||
config.mk | ||
COPYING | ||
COPYING.LGPL2 | ||
COPYING_GLIB | ||
CREDITS.TXT | ||
install-cmocka-linux.sh | ||
list.c | ||
make.sh | ||
Makefile | ||
msvc.bat | ||
pkgconfig.mk | ||
README.md | ||
uc.c | ||
windows_export.bat |
Unicorn Engine
Unicorn is a lightweight, multi-platform, multi-architecture CPU emulator framework based on QEMU.
Unicorn offers some unparalleled features:
- Multi-architecture: ARM, ARM64 (ARMv8), M68K, MIPS, SPARC, and X86 (16, 32, 64-bit)
- Clean/simple/lightweight/intuitive architecture-neutral API
- Implemented in pure C language, with bindings for Crystal, Clojure, Visual Basic, Perl, Rust, Ruby, Python, Java, .NET, Go, Delphi/Free Pascal and Haskell.
- Native support for Windows & *nix (with Mac OSX, Linux, *BSD & Solaris confirmed)
- High performance via Just-In-Time compilation
- Support for fine-grained instrumentation at various levels
- Thread-safety by design
- Distributed under free software license GPLv2
Further information is available at http://www.unicorn-engine.org
License
This project is released under the GPL license.
Compilation & Docs
See docs/COMPILE.md file for how to compile and install Unicorn.
More documentation is available in docs/README.md.
Contact
Contact us via mailing list, email or twitter for any questions.
Contribute
If you want to contribute, please pick up something from our Github issues.
We also maintain a list of more challenged problems in a TODO list.
CREDITS.TXT records important contributors of our project.