Emilio G. Cota
ec14a00925
target-arm: emulate LL/SC using cmpxchg helpers
...
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.
The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.
Hi-res plots: http://imgur.com/a/aNQpB
atomic_add-bench: 1000000 ops/thread, [0,1] range
9 ++---------+----------+----------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
8 +Emaster +-H--+ ++
| | |
7 ++E ++
| | |
6 ++++ ++
| | |
5 ++ | ++
4 ++ | ++
| | |
3 ++ | ++
| | |
2 ++ | ++
|H++E+--- +++ ---+E+------+E+------+E|
1 +++ +E+-----+E+------+E+------+E+------+E+-- +++ +++ ++
++H+ + +++ + +++ ++++ + + + |
0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads
atomic_add-bench: 1000000 ops/thread, [0,2] range
16 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
14 ++master +-H--+ ++
| | |
12 ++| ++
| E |
10 ++| ++
| | |
8 ++++ ++
|E+| |
| | |
6 ++ | ++
| | |
4 ++ | ++
| +E+--- +++ +++ +++ ---+E+------+E|
2 +H+ +E+------E-------+E+-----+E+------+E+------+E+-- +++
+ | + +++ + ++++ + + + |
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads
atomic_add-bench: 1000000 ops/thread, [0,128] range
70 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + ++++ + |
60 ++master +-H--+ ----E------+E+-------++
| -+E+--- +++ +++ +E|
| +++ ---- +++ ++|
50 ++ +++ ---+E+- ++
| -E--- |
40 ++ ---+++ ++
| +++--- |
| -+E+ |
30 ++ +++---- ++
| +E+ |
20 ++ +++-- ++
| +E+ |
|+E+ |
10 +E+ ++
+ + + + + + + |
0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads
atomic_add-bench: 1000000 ops/thread, [0,1024] range
120 ++---------+---------+----------+---------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
| master +-H--+ ++|
100 ++ ----E+
| +++ ---+E+--- ++|
| --E--- +++ |
80 ++ ---- +++ ++
| ---+E+- |
60 ++ -+E+-- ++
| +++ ---- +++ |
| -+E+- |
40 ++ +++---- ++
| +++ ---+E+ |
| -+E+--- |
20 ++ +E+ ++
|+E+++ |
+E+ + + + + + + |
0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads
Backports commit 354161b37c6465a32073eac5f16fa35939af2bb4 from qemu
2018-02-28 00:07:44 -05:00
Richard Henderson
fd9933fbd5
target-arm: Rearrange aa32 load and store functions
...
Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl. Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.
Backports commit 7f5616f53896a4e08ad37de3ac50d3a4cc8eff7a from qemu
2018-02-27 23:59:16 -05:00
Emilio G. Cota
3dc16ebca3
target-i386: remove helper_lock()
...
It's been superseded by the atomic helpers.
The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
$ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.
The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset
Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.
atomic_add-bench: 5000000 ops/thread, [0,1] range
25 ++---------+----------+---------+----------+----------+----------+---++
+ atomic +-E--+ + + + + + |
|cmpxchg +-H--+ |
20 +Emaster +-N--+ ++
|| |
|++ |
|| |
15 +++ ++
|N| |
|+| |
10 ++| ++
|+|+ |
| | -+E+------ +++ ---+E+------+E+------+E+-----+E+------+E|
|+E+E+- +++ +E+------+E+-- |
5 ++|+ ++
|+N+H+--- +++ |
++++N+--+H++----+++ + +++ --++H+------+H+------+H++----+H+---+--- |
0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
0 10 20 30 40 50 60
Number of threads
atomic_add-bench: 5000000 ops/thread, [0,2] range
25 ++---------+----------+---------+----------+----------+----------+---++
++atomic +-E--+ + + + + + |
|cmpxchg +-H--+ |
20 ++master +-N--+ ++
|E| |
|++ |
||E |
15 ++| ++
|N|| |
|+|| ---+E+------+E+-----+E+------+E|
10 ++| | ---+E+------+E+-----+E+--- +++ +++
||H+E+--+E+-- |
|+++++ |
| || |
5 ++|+H+-- +++ ++
|+N+ - ---+H+------+H+------ |
+ +N+--+H++----+H+---+--+H+----++H+--- + + +H+---+--+H|
0 ++---------+----------+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads
atomic_add-bench: 5000000 ops/thread, [0,8] range
40 ++---------+----------+---------+----------+----------+----------+---++
++atomic +-E--+ + + + + + |
35 +cmpxchg +-H--+ ++
| master +-N--+ ---+E+------+E+------+E+-----+E+------+E|
30 ++| ---+E+-- +++ ++
| | -+E+--- |
25 ++E ---- +++ ++
|+++++ -+E+ |
20 +E+ E-- +++ ++
|H|+++ |
|+| +H+------- |
15 ++H+ ---+++ +H+------ ++
|N++H+-- +++--- +H+------++|
10 ++ +++ - +++ ---+H+ +++ +H+
| | +H+-----+H+------+H+-- |
5 ++| +++ ++
++N+N+--+N++ + + + + + |
0 ++---------+----------+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads
atomic_add-bench: 5000000 ops/thread, [0,128] range
160 ++---------+---------+----------+---------+----------+----------+---++
+ atomic +-E--+ + + + + + |
140 +cmpxchg +-H--+ +++ +++ ++
| master +-N--+ E--------E------+E+------++|
120 ++ --| | +++ E+
| -- +++ +++ ++|
100 ++ - ++
| +++- +++ ++|
80 ++ -+E+ -+H+------+H+------H--------++
| ---- ---- +++ H|
| ---+E+-----+E+- ---+H+ ++|
60 ++ +E+--- +++ ---+H+--- ++
| --+++ ---+H+-- |
40 ++ +E+-+H+--- ++
| +H+ |
20 +EE+ ++
+N+ + + + + + + |
0 ++N-N---N--+---------+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads
atomic_add-bench: 5000000 ops/thread, [0,1024] range
350 ++---------+---------+----------+---------+----------+----------+---++
+ atomic +-E--+ + + + + + |
300 +cmpxchg +-H--+ +++
| master +-N--+ +++ ||
| +++ | ----E|
250 ++ | ----E---- ++
| ----E--- | ---+H|
200 ++ -+E+--- +++ ---+H+--- ++
| ---- -+H+-- |
| +E+ +++ ---- +++ |
150 ++ ---+++ ---+H+- ++
| --- -+H+-- |
100 ++ ---+E+ ---- +++ ++
| +++ ---+E+-----+H+- |
| -+E+------+H+-- |
50 ++ +E+ ++
+EE+ + + + + + + |
0 ++N-N---N--+---------+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads
hi-res: http://imgur.com/a/fMRmq
For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.
Backports commit 37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b from qemu
2018-02-27 23:43:22 -05:00
Emilio G. Cota
9d9b7dedac
target-i386: emulate XCHG using atomic helper
...
Backports commit ea97ebe89f7a879ea9aba90140e40c29b5cbd653 from qemu
2018-02-27 23:40:20 -05:00
Emilio G. Cota
8f96b6beb9
target-i386: emulate LOCK'ed BTX ops using atomic helpers
...
Backports commit cfe819d309d472f75fd129faf1d1064a2498326c from qemu
2018-02-27 23:39:21 -05:00
Emilio G. Cota
089965fa8d
target-i386: emulate LOCK'ed XADD using atomic helper
...
Backports commit f53b01817f95781d2bcc8a82e057d1416601e13b from qemu
2018-02-27 23:06:28 -05:00
Emilio G. Cota
f9ed728f27
target-i386: emulate LOCK'ed NEG using cmpxchg helper
...
Backports commit 8eb8c7385608b99bed6055a22d897ff727a6cb8e from qemu
2018-02-27 23:03:28 -05:00
Emilio G. Cota
fedeb0f93e
target-i386: emulate LOCK'ed NOT using atomic helper
...
Backports commit 2a5fe8ae145ef7a3ab480922116d27efcc97b85d from qemu
2018-02-27 23:00:33 -05:00
Emilio G. Cota
05c94546d5
target-i386: emulate LOCK'ed INC using atomic helper
...
Backports commit 60e573462fcdb83aa1a41e66a9f31dc8a4364399 from qemu
2018-02-27 22:56:05 -05:00
Emilio G. Cota
7c7b0fe746
target-i386: emulate LOCK'ed OP instructions using atomic helpers
...
Backports commit a7cee522f3529c2fc85379237b391ea98823271e from qemu
2018-02-27 22:53:46 -05:00
Emilio G. Cota
a386368f82
target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
...
The diff here is uglier than necessary. All this does is to turn
FOO
into:
if (s->prefix & PREFIX_LOCK) {
BAR
} else {
FOO
}
where FOO is the original implementation of an unlocked cmpxchg.
Backports commit ae03f8de45427042ecd10b0941a005f21ecc064c from qemu
2018-02-27 22:38:37 -05:00
Richard Henderson
b48508a6c1
tcg: Emit barriers with parallel_cpus
...
Backports commit 91682118aa330aff7e8ef0cc685c32d101f49940 from qemu
2018-02-27 22:28:33 -05:00
Richard Henderson
064543a415
tcg: Add CONFIG_ATOMIC64
...
Allow qemu to build on 32-bit hosts without 64-bit atomic ops.
Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation. Do so by dropping back to single-step.
Backports commit df79b996a7b21c6ea7847f7927a2e1a294b86c72 from qemu
2018-02-27 22:25:36 -05:00
Richard Henderson
da01e53757
tcg: Add atomic128 helpers
...
Force the use of cmpxchg16b on x86_64.
Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction. Further, it's required by Windows 8 so no new cpus
will ever omit it.
If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.
Backports commit 7ebee43ee3e2fcd7b5063058b7ef74bc43216733 from qemu
2018-02-27 21:43:48 -05:00
Richard Henderson
5c0ce1b99c
tcg: Add atomic helpers
...
Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.
Backports commit c482cb117cc418115ca9c6d21a7a2315414c0a40 from qemu
2018-02-27 15:57:47 -05:00
Richard Henderson
0245f93c02
cputlb: Remove includes from softmmu_template.h
...
We already include exec/address-spaces.h and exec/memory.h in
cputlb.c; the include of qemu/timer.h appears to be a fossil.
Backports commit 40978428853e2f7b4597ab2a9ffeb187333802dc from qemu
2018-02-27 12:40:43 -05:00
Richard Henderson
5c79851143
cputlb: Tidy some macros
...
TGT_LE and TGT_BE are not size dependent and do not need to be
redefined. The others are no longer used at all.
Backports commit c86c6e4c80fee4d9423bedb10ba9e9c4aa68f861 from qemu
2018-02-27 12:36:25 -05:00
Richard Henderson
4da1cfb902
cputlb: Move most of iotlb code out of line
...
Saves 2k code size off of a cold path.
Backports commit 82a45b96a203a7403427183f1afd3d295222ff7d from qemu
2018-02-27 12:34:19 -05:00
Richard Henderson
5df7c9eec7
cputlb: Move probe_write out of softmmu_template.h
...
Backports commit 3b08f0a92545ba06fbdeaae929a5172480300c33 from qemu
2018-02-27 12:25:24 -05:00
Yongbok Kim
79e4c001a9
softmmu: Add probe_write()
...
Probe for whether the specified guest write access is permitted.
If it is not permitted then an exception will be taken in the same
way as if this were a real write access (and we will not return).
Otherwise the function will return, and there will be a valid
entry in the TLB for this access.
Backports commit 3b4afc9e75ab1a95f33e41f462921093f8a109c4 from qemu
2018-02-27 12:20:50 -05:00
Richard Henderson
1c9c8d3f10
cputlb: Replace SHIFT with DATA_SIZE
...
Backports commit dea2198201b3e0151d75b42774c51cf2ffe2ca4b from qemu
2018-02-27 12:00:33 -05:00
Richard Henderson
e35aacd5ae
tcg: Add EXCP_ATOMIC
...
When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.
Backports commit fdbc2b5722f6092e47181a947c90fd4bdcc1c121 from qemu
Also backports parts of commit 02d57ea115b7669f588371c86484a2e8ebc369be
2018-02-27 11:57:58 -05:00
Richard Henderson
d5510a546f
int128: Add int128_make128
...
Allows Int128 to be used more generally, rather than having to
begin with 64-bit inputs and accumulate.
Backports commit 1edaeee0955fba7d834b7c8f4e372e7eae030745 from qemu
2018-02-27 11:06:33 -05:00
Richard Henderson
9084e5fe1b
int128: Use __int128 if available
...
Backports commit 0846beb36641e8f0c3ee55a5bb84d468b653c852 from qemu
2018-02-27 11:03:06 -05:00
Richard Henderson
4fdbe94eea
exec: Avoid direct references to Int128 parts
...
Backports commit 258dfaaad05a5fbe32a142b794e1df3e16501d0e from qemu
2018-02-27 11:01:43 -05:00
Richard Henderson
ba1c63572e
atomics: Add __nocheck atomic operations
...
While the check against sizeof(void *) is appropriate for
normal usage within qemu, there are places in which we want
wider operaions and have checked for their existance.
Backports commit 84bca3927b36fb1d9a2ca85cbbdf9023d2b84678 from qemu
2018-02-27 11:00:20 -05:00
Lioncash
a59eef391e
atomic: MSVC compatible equivalents to some functions
2018-02-27 10:56:04 -05:00
Emilio G. Cota
c837d76a86
atomics: add atomic_op_fetch variants
...
This paves the way for upcoming work.
Backports commit 83d0c719f837724d9e3963b078211b2242bdd2a5 from qemu
2018-02-27 10:28:27 -05:00
Emilio G. Cota
102a53aa50
atomics: add atomic_xor
...
This paves the way for upcoming work.
Backports commit 61696ddbdc74263ddb6869856772cfe355a5d3bd from qemu
2018-02-27 10:23:31 -05:00
Richard Henderson
3fe8d46a15
atomics: Add parameters to macros
...
Making these functional rather than object macros will
prevent later problems with complex macro expansion.
Backports commit d1a9f2d12fcfc942924956fbe321aedf4226ccb7 from qemu
2018-02-27 10:21:35 -05:00
Richard Henderson
4168095fed
target-m68k: Optimize gen_flush_flags
...
Backports commit 36f0399d46f2ccf4f6e7451ba46b1e8d0e9ab341 from qemu
2018-02-27 10:19:54 -05:00
Richard Henderson
7403e63f2f
target-m68k: Optimize some comparisons
...
Backports commit 9d896621c1820fd8f437fac26fd7d2e0921091c3 from qemu
2018-02-27 10:15:20 -05:00
Richard Henderson
672a28173f
target-m68k: Use setcond for scc
...
Backports commit b459e3eccfae7fe83e30187c391de00bccf4f51d from qemu
2018-02-27 10:11:35 -05:00
Richard Henderson
ed6feb9329
target-m68k: Introduce DisasCompare
...
Backports commit 6a432295d73df91890dc70c4a94dcc4ba88ad1c3 from qemu
2018-02-27 10:08:32 -05:00
Richard Henderson
4e498cc54d
target-m68k: Reorg flags handling
...
Separate all ccr bits. Continue to batch updates via cc_op.
Backports commit 620c6cf66584bfbee90db84a7e87a6eabf230ca9 from qemu
2018-02-27 10:02:02 -05:00
Richard Henderson
121309a4d0
target-m68k: Reorg flags handling
...
Separate all ccr bits. Continue to batch updates via cc_op.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Fix gen_logic_cc() to really extend the size of the result.
Fix gen_get_ccr(): update cc_op as it is used by the helper.
Factorize flags computing and src/ccr cleanup
Backports commit 620c6cf66584bfbee90db84a7e87a6eabf230ca9 from qemu
2018-02-27 09:30:32 -05:00
Richard Henderson
61ab9a42cd
target-m68k: Remove incorrect clearing of cc_x
...
The CF docs certainly doesnt suggest this is true.
Backports commit 18dd87f26bed46f22bb1b9536329c02de500f407 from qemu
2018-02-27 09:21:26 -05:00
Richard Henderson
187c2a9807
target-m68k: Some fixes to SR and flags management
...
Backports commit 99c514485b1d7922c4ca1ed767fd45525de4701f from qemu
2018-02-27 09:19:21 -05:00
Richard Henderson
9493b29399
target-m68k: Print flags properly
...
Backports commit 8e394ccabdb1e439aab092de6b9d2f26432e962f from qemu
2018-02-27 09:17:44 -05:00
Laurent Vivier
57ea90a91f
target-m68k: update CPU flags management
...
Copied from target-i386
Backports commit 9fdb533fb129b19610941bd1e5dd93e7471a18f5 from qemu
2018-02-27 09:15:29 -05:00
Laurent Vivier
125675e334
target-m68k: don't update cc_dest in helpers
...
Backports commit 91f90d7191f862ab27528dbdf76cee55c77f79cf from qemu
2018-02-27 09:04:51 -05:00
Laurent Vivier
12f9ba3fe4
target-m68k: update move to/from ccr/sr
...
Backports commit 7c0eb318bdcc3667a861e7b0f140df0b6d9895e2 from qemu
2018-02-27 08:57:05 -05:00
Laurent Vivier
b8366d5b31
target-m68k: remove m68k_cpu_exec_enter() and m68k_cpu_exec_exit()
...
Update cc_op directly from tcg_gen_insn_start() and
restore_state_to_opc()
Copied from target-i386
Backports commit 20a8856eba0980fbe9d2b8ed2b33ecdb9c9fe5ad from qemu
2018-02-27 08:53:02 -05:00
Laurent Vivier
a521f4f41d
target-m68k: Replace helper_xflag_lt with setcond
...
Backports commit f9083519034aaa5ad5cd2c5727bd61c29bf60bc5 from qemu
2018-02-27 08:50:44 -05:00
Laurent Vivier
b079255576
target-m68k: allow to update flags with operation on words and bytes
...
Backports commit 5dbb6784b7e2b833c036b4df58aa07067e35f476 from qemu
2018-02-27 08:47:12 -05:00
Laurent Vivier
f069762b61
target-m68k: REG() macro cleanup
...
Backports commit bcc098b0c23b4dd902ff56987d769bd839677331 from qemu
2018-02-27 08:37:25 -05:00
Laurent Vivier
3d59fe56b3
target-m68k: set PAGE_BITS to 12 for m68k
...
Backports commit 2b04e85a3401e13cb19b1de197e6c211eaadca4c from qemu
2018-02-27 08:36:09 -05:00
Laurent Vivier
292fc83c86
target-m68k: define operand sizes
...
Backports commit 7ef25cdd6cee4fa468d6cb913fa064a6689faf7d from qemu
2018-02-27 08:35:13 -05:00
Laurent Vivier
2653165c63
target-m68k: introduce read_imXX() functions
...
Read a 8, 16 or 32bit immediat constant.
An immediate constant is stored in the instruction opcode and
can be in one or two extension words.
Backports commit 28b68cd79ef01e8b1f5bd26718cd8c09a12c625f from qemu
2018-02-27 08:32:04 -05:00
Laurent Vivier
d29cbb70b3
target-m68k: manage scaled index
...
Scaled index is not supported by 68000, 68008, and 68010.
EA = (bd + PC) + Xn.SIZE*SCALE + od
Ignore it:
M68000 FAMILY PROGRAMMER’S REFERENCE MANUAL
2.4 BRIEF EXTENSION WORD FORMAT COMPATIBILITY
"If the MC68000 were to execute an instruction that
encoded a scaling factor, the scaling factor would be
ignored and would not access the desired memory address.
The earlier microprocessors do not recognize the brief
extension word formats implemented by newer processors.
Although they can detect illegal instructions, they do not
decode invalid encodings of the brief extension word formats
as exceptions."
Backports commit d8633620a112296fcf6a6ae9a1cbba614c0ca502 from qemu
2018-02-27 08:27:20 -05:00