unicorn/qemu/target-arm
Emilio G. Cota ec14a00925
target-arm: emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

atomic_add-bench: 1000000 ops/thread, [0,1] range

9 ++---------+----------+----------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
8 +Emaster +-H--+ ++
| | |
7 ++E ++
| | |
6 ++++ ++
| | |
5 ++ | ++
4 ++ | ++
| | |
3 ++ | ++
| | |
2 ++ | ++
|H++E+--- +++ ---+E+------+E+------+E|
1 +++ +E+-----+E+------+E+------+E+------+E+-- +++ +++ ++
++H+ + +++ + +++ ++++ + + + |
0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,2] range

16 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
14 ++master +-H--+ ++
| | |
12 ++| ++
| E |
10 ++| ++
| | |
8 ++++ ++
|E+| |
| | |
6 ++ | ++
| | |
4 ++ | ++
| +E+--- +++ +++ +++ ---+E+------+E|
2 +H+ +E+------E-------+E+-----+E+------+E+------+E+-- +++
+ | + +++ + ++++ + + + |
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,128] range

70 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + ++++ + |
60 ++master +-H--+ ----E------+E+-------++
| -+E+--- +++ +++ +E|
| +++ ---- +++ ++|
50 ++ +++ ---+E+- ++
| -E--- |
40 ++ ---+++ ++
| +++--- |
| -+E+ |
30 ++ +++---- ++
| +E+ |
20 ++ +++-- ++
| +E+ |
|+E+ |
10 +E+ ++
+ + + + + + + |
0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,1024] range

120 ++---------+---------+----------+---------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
| master +-H--+ ++|
100 ++ ----E+
| +++ ---+E+--- ++|
| --E--- +++ |
80 ++ ---- +++ ++
| ---+E+- |
60 ++ -+E+-- ++
| +++ ---- +++ |
| -+E+- |
40 ++ +++---- ++
| +++ ---+E+ |
| -+E+--- |
20 ++ +E+ ++
|+E+++ |
+E+ + + + + + + |
0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

Backports commit 354161b37c6465a32073eac5f16fa35939af2bb4 from qemu
2018-02-28 00:07:44 -05:00
..
arm_ldst.h cpu: move exec-all.h inclusion out of cpu.h 2018-02-24 02:39:08 -05:00
cpu-qom.h target-arm: make cpu-qom.h not target specific 2018-02-24 00:48:59 -05:00
cpu.c arm: add Cortex A7 CPU parameters 2018-02-26 03:44:24 -05:00
cpu.h target-arm: Implement new HLT trap for semihosting 2018-02-26 15:28:45 -05:00
cpu64.c target-arm: Get rid of unused variable warnings 2018-02-23 12:43:09 -05:00
crypto_helper.c target-arm: Clean up includes 2018-02-17 21:09:32 -05:00
helper-a64.c softfloat: Implement run-time-configurable meaning of signaling NaN bit 2018-02-24 20:27:12 -05:00
helper-a64.h import 2015-08-21 15:04:50 +08:00
helper.c target-arm: Implement new HLT trap for semihosting 2018-02-26 15:28:45 -05:00
helper.h target-arm: Implement MRS (banked) and MSR (banked) instructions 2018-02-21 21:50:42 -05:00
internals.h Fix confusing argument names in some common functions 2018-02-25 03:58:27 -05:00
iwmmxt_helper.c target-arm: Clean up includes 2018-02-17 21:09:32 -05:00
kvm-consts.h import 2015-08-21 15:04:50 +08:00
Makefile.objs delete sparc32_dma.h & arm-semi.c 2017-01-19 15:10:41 +08:00
neon_helper.c target-arm: Fix warn about implicit conversion 2018-02-25 22:44:43 -05:00
op_addsub.h import 2015-08-21 15:04:50 +08:00
op_helper.c Fix masking of PC lower bits when doing exception returns 2018-02-26 08:09:28 -05:00
psci.c Use #include "..." for our own headers, <...> for others 2018-02-25 04:10:33 -05:00
translate-a64.c target-arm: Comments added to identify cases in a switch 2018-02-26 08:05:49 -05:00
translate.c target-arm: emulate LL/SC using cmpxchg helpers 2018-02-28 00:07:44 -05:00
translate.h target-arm: Infrastucture changes to enable handling of tagged address loading into PC 2018-02-26 07:58:17 -05:00
unicorn.h arm64eb: add support for ARM64 big endian. 2017-04-24 23:30:01 +08:00
unicorn_aarch64.c qemu-common: push cpu.h inclusion out of qemu-common.h 2018-02-24 01:50:56 -05:00
unicorn_arm.c qemu-common: push cpu.h inclusion out of qemu-common.h 2018-02-24 01:50:56 -05:00