Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, X86)
Go to file
Longpeng(Mike) 8b5400d675
target-i386: present virtual L3 cache info for vcpus
Some software algorithms are based on the hardware's cache info, for example,
for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger
a resched IPI and told cpu2 to do the wakeup if they don't share low level
cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc.
The relevant linux-kernel code as bellow:

static void ttwu_queue(struct task_struct *p, int cpu)
{
struct rq *rq = cpu_rq(cpu);
......
if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
......
ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
return;
}
......
ttwu_do_activate(rq, p, 0); /* access target's rq directly */
......
}

In real hardware, the cpus on the same socket share L3 cache, so one won't
trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a
virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs
under some workloads even if the virtual cpus belongs to the same virtual socket.

For KVM, there will be lots of vmexit due to guest send IPIs.
The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates)
and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during
the period:
No-L3 With-L3(applied this patch)
cpu0:	363890	44582
cpu1:	373405	43109
cpu2:	340783	43797
cpu3:	333854	43409
cpu4:	327170	40038
cpu5:	325491	39922
cpu6:	319129	42391
cpu7:	306480	41035
cpu8:	161139	32188
cpu9:	164649	31024
cpu10:	149823	30398
cpu11:	149823	32455
cpu12:	164830	35143
cpu13:	172269	35805
cpu14:	179979	33898
cpu15:	194505	32754
avg:	268963.6	40129.8

The VM's topology is "1*socket 8*cores 2*threads".
After present virtual L3 cache info for VM, the amounts of RES IPIs in guest
reduce 85%.

For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause
severe performance degradation. We had tested the overall system performance if
vcpus actually run on sparate physical socket. With L3 cache, the performance
improves 7.2%~33.1%(avg:15.7%).

Backports commit 14c985cffa6cb177fc01a163d8bcf227c104718c from qemu
2018-02-25 23:16:14 -05:00
bindings link to Crystal binding 2017-12-23 00:26:40 +08:00
docs Added note about installing tests dependencies on Mac OS X. Added note about tests failing when required architecture support is disabled in build. (#908) 2017-10-12 19:56:00 +08:00
include exec: avoid realloc in phys_map_node_reserve 2018-02-25 19:32:40 -05:00
msvc util: Move qemu-log to utils 2018-02-25 22:17:44 -05:00
qemu target-i386: present virtual L3 cache info for vcpus 2018-02-25 23:16:14 -05:00
samples Fixed register mistake in comments (#894) 2017-09-17 16:40:01 +07:00
tests add 64-bit test demonstrating setting MSRs and FS/GS segments (#901) 2017-09-29 04:26:23 +08:00
.appveyor.yml MSYS test (#852) 2017-06-25 10:11:35 +08:00
.gitignore arm64eb: add support for ARM64 big endian. 2017-04-24 23:30:01 +08:00
.travis.yml use new travis osx image and brew (#935) 2018-01-05 10:29:49 +08:00
AUTHORS.TXT import 2015-08-21 15:04:50 +08:00
Brewfile Update Brewfile 2017-09-30 17:36:44 +07:00
ChangeLog update ChangeLog 2017-04-20 13:28:02 +08:00
config.mk Fix document file extension 2016-08-08 17:33:49 +09:00
COPYING import 2015-08-21 15:04:50 +08:00
COPYING.LGPL2 LGPL2 for all header files under include/unicorn/ 2017-12-16 10:08:42 +08:00
COPYING_GLIB glib_compat: add COPYING_GLIB 2016-12-27 10:15:08 +08:00
CREDITS.TXT update CREDITS.TXT 2017-04-25 12:56:47 +08:00
install-cmocka-linux.sh Start moving examples in S files (#851) 2017-06-25 10:14:22 +08:00
list.c callback to count number of instructions in uc_emu_start() should be executed first. fix #727 2017-06-16 13:22:38 +08:00
make.sh Added MSVC support for arm64eb. 2017-04-25 14:23:58 +10:00
Makefile crypto: introduce new module for computing hash digests 2018-02-17 15:23:17 -05:00
msvc.bat add msvc.bat 2017-04-21 15:35:40 +08:00
pkgconfig.mk bump extra version to 2 2017-04-21 15:30:40 +08:00
README.md add Clojure 2017-12-23 00:32:33 +08:00
uc.c exec: avoid realloc in phys_map_node_reserve 2018-02-25 19:32:40 -05:00
windows_export.bat Make the call out to visual studio extremely resilient 2017-01-02 03:32:48 -08:00

Unicorn Engine

Join the chat at https://gitter.im/unicorn-engine/chat

Build Status Build status

Unicorn is a lightweight, multi-platform, multi-architecture CPU emulator framework based on QEMU.

Unicorn offers some unparalleled features:

  • Multi-architecture: ARM, ARM64 (ARMv8), M68K, MIPS, SPARC, and X86 (16, 32, 64-bit)
  • Clean/simple/lightweight/intuitive architecture-neutral API
  • Implemented in pure C language, with bindings for Crystal, Clojure, Visual Basic, Perl, Rust, Ruby, Python, Java, .NET, Go, Delphi/Free Pascal and Haskell.
  • Native support for Windows & *nix (with Mac OSX, Linux, *BSD & Solaris confirmed)
  • High performance via Just-In-Time compilation
  • Support for fine-grained instrumentation at various levels
  • Thread-safety by design
  • Distributed under free software license GPLv2

Further information is available at http://www.unicorn-engine.org

License

This project is released under the GPL license.

Compilation & Docs

See docs/COMPILE.md file for how to compile and install Unicorn.

More documentation is available in docs/README.md.

Contact

Contact us via mailing list, email or twitter for any questions.

Contribute

If you want to contribute, please pick up something from our Github issues.

We also maintain a list of more challenged problems in a TODO list.

CREDITS.TXT records important contributors of our project.