unicorn/qemu
Emilio G. Cota ec14a00925
target-arm: emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

atomic_add-bench: 1000000 ops/thread, [0,1] range

9 ++---------+----------+----------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
8 +Emaster +-H--+ ++
| | |
7 ++E ++
| | |
6 ++++ ++
| | |
5 ++ | ++
4 ++ | ++
| | |
3 ++ | ++
| | |
2 ++ | ++
|H++E+--- +++ ---+E+------+E+------+E|
1 +++ +E+-----+E+------+E+------+E+------+E+-- +++ +++ ++
++H+ + +++ + +++ ++++ + + + |
0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,2] range

16 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
14 ++master +-H--+ ++
| | |
12 ++| ++
| E |
10 ++| ++
| | |
8 ++++ ++
|E+| |
| | |
6 ++ | ++
| | |
4 ++ | ++
| +E+--- +++ +++ +++ ---+E+------+E|
2 +H+ +E+------E-------+E+-----+E+------+E+------+E+-- +++
+ | + +++ + ++++ + + + |
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,128] range

70 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + ++++ + |
60 ++master +-H--+ ----E------+E+-------++
| -+E+--- +++ +++ +E|
| +++ ---- +++ ++|
50 ++ +++ ---+E+- ++
| -E--- |
40 ++ ---+++ ++
| +++--- |
| -+E+ |
30 ++ +++---- ++
| +E+ |
20 ++ +++-- ++
| +E+ |
|+E+ |
10 +E+ ++
+ + + + + + + |
0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,1024] range

120 ++---------+---------+----------+---------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
| master +-H--+ ++|
100 ++ ----E+
| +++ ---+E+--- ++|
| --E--- +++ |
80 ++ ---- +++ ++
| ---+E+- |
60 ++ -+E+-- ++
| +++ ---- +++ |
| -+E+- |
40 ++ +++---- ++
| +++ ---+E+ |
| -+E+--- |
20 ++ +E+ ++
|+E+++ |
+E+ + + + + + + |
0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

Backports commit 354161b37c6465a32073eac5f16fa35939af2bb4 from qemu
2018-02-28 00:07:44 -05:00
..
crypto crypto: Clean up includes 2018-02-19 00:47:40 -05:00
default-configs arm64eb: add support for ARM64 big endian. 2017-04-24 23:30:01 +08:00
docs docs: clarify memory region lifecycle 2018-02-12 15:11:21 -05:00
fpu fpu: add mechanism to check for invalid long double formats 2018-02-26 02:27:40 -05:00
hw qdev: Fix object reference leak in case device.realize() fails 2018-02-25 21:00:26 -05:00
include tcg: Add atomic128 helpers 2018-02-27 21:43:48 -05:00
qapi qapi: rename QmpOutputVisitor to QObjectOutputVisitor 2018-02-27 08:05:33 -05:00
qobject qapi: rename QmpOutputVisitor to QObjectOutputVisitor 2018-02-27 08:05:33 -05:00
qom qapi: rename QmpOutputVisitor to QObjectOutputVisitor 2018-02-27 08:05:33 -05:00
scripts qapi: rename QmpOutputVisitor to QObjectOutputVisitor 2018-02-27 08:05:33 -05:00
target-arm target-arm: emulate LL/SC using cmpxchg helpers 2018-02-28 00:07:44 -05:00
target-i386 target-i386: remove helper_lock() 2018-02-27 23:43:22 -05:00
target-m68k target-m68k: Optimize gen_flush_flags 2018-02-27 10:19:54 -05:00
target-mips softmmu: Add probe_write() 2018-02-27 12:20:50 -05:00
target-sparc sparc: Use g_memdup() instead of g_new0() + memcpy() 2018-02-25 23:19:44 -05:00
tcg tcg: Emit barriers with parallel_cpus 2018-02-27 22:28:33 -05:00
util qapi: rename QmpOutputVisitor to QObjectOutputVisitor 2018-02-27 08:05:33 -05:00
aarch64.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
aarch64eb.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
accel.c accel: make configure_accelerator return void 2018-02-24 00:31:28 -05:00
arm.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
armeb.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
atomic_template.h tcg: Add atomic128 helpers 2018-02-27 21:43:48 -05:00
CODING_STYLE import 2015-08-21 15:04:50 +08:00
configure tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
COPYING import 2015-08-21 15:04:50 +08:00
COPYING.LIB import 2015-08-21 15:04:50 +08:00
cpu-exec-common.c tcg: Add EXCP_ATOMIC 2018-02-27 11:57:58 -05:00
cpu-exec.c tcg: Add EXCP_ATOMIC 2018-02-27 11:57:58 -05:00
cpus.c tcg: Add EXCP_ATOMIC 2018-02-27 11:57:58 -05:00
cputlb.c tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
exec.c exec: Avoid direct references to Int128 parts 2018-02-27 11:01:43 -05:00
gen_all_header.sh arm64eb: add support for ARM64 big endian. 2017-04-24 23:30:01 +08:00
glib_compat.c qapi: Fix memleak in string visitors on int lists 2018-02-25 00:20:34 -05:00
HACKING import 2015-08-21 15:04:50 +08:00
header_gen.py tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
ioport.c hw: remove pio_addr_t 2018-02-24 02:43:16 -05:00
LICENSE import 2015-08-21 15:04:50 +08:00
m68k.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
Makefile Makefile: Add a FORCE target 2018-02-24 17:03:51 -05:00
Makefile.objs tcg: Add atomic helpers 2018-02-27 15:57:47 -05:00
Makefile.target tcg: Add atomic helpers 2018-02-27 15:57:47 -05:00
memory.c exec.c: Remove static allocation of sub_section of sub_page 2018-02-26 10:50:04 -05:00
memory_mapping.c include/qemu/osdep.h: Don't include qapi/error.h 2018-02-21 23:08:18 -05:00
mips.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
mips64.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
mips64el.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
mipsel.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
powerpc.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
qapi-schema.json qapi: Lazy creation of array types 2018-02-19 18:55:35 -05:00
qemu-timer.c timer/cpus: fix some typos and update some comments 2018-02-25 23:21:57 -05:00
rules.mak rules.mak: Don't extract libs from .mo-libs in link command 2018-02-26 02:08:03 -05:00
softmmu_template.h cputlb: Remove includes from softmmu_template.h 2018-02-27 12:40:43 -05:00
sparc.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
sparc64.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
tcg-runtime.c tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00
translate-all.c tcg: Add EXCP_ATOMIC 2018-02-27 11:57:58 -05:00
translate-all.h translate-all.c: Compute L1 page table properties at runtime 2018-02-26 11:46:58 -05:00
translate-common.c exec: Clean up includes 2018-02-19 00:49:55 -05:00
unicorn_common.h qom/cpu: Add MemoryRegion property 2018-02-18 21:54:50 -05:00
VERSION import 2015-08-21 15:04:50 +08:00
vl.c cpu: Support a target CPU having a variable page size 2018-02-26 12:29:08 -05:00
vl.h import 2015-08-21 15:04:50 +08:00
x86_64.h tcg: Add CONFIG_ATOMIC64 2018-02-27 22:25:36 -05:00