Commit graph

2138 commits

Author SHA1 Message Date
Joseph Myers 7ff441826c
tcg: correct 32-bit tcg_gen_ld8s_i64 sign-extension
The version of tcg_gen_ld8s_i64 for 32-bit systems does a load into
the low part of the return value - then attempts a sign extension into
the high part, but wrongly sets the high part to a sign extension of
itself rather than of the low part. This results in TCG internal
errors from the use of the uninitialized high part (in some GCC tests
of AArch64 NEON shift intrinsics, in particular). This patch corrects
the sign-extension logic, making it match other functions such as
tcg_gen_ld16s_i64.

Backports commit 3ff91d7e85176f8b4b131163d7fd801757a2c949 from qemu
2018-03-01 08:41:23 -05:00
Peter Maydell f9c5c1a604
tcg/tcg.h: Improve documentation of TCGv_i32 etc types
The typedefs we use for the TCGv_i32, TCGv_i64 and TCGv_ptr
types are somewhat confusing, because we define them as
pointers to structs, but the structs themselves are never
defined. Explain in the comments a bit more clearly why
this is OK and what is going on under the hood.

Backports commit a40d4701bc9f6e6a3bbfb7b4fbe756a5b72b5df1 from qemu
2018-03-01 08:40:35 -05:00
Richard Henderson f5a35908da
tcg: Add tcg_gen_mulsu2_{i32,i64,tl}
This multiply has one signed input and one unsigned input,
producing the full double-width result.

Backports commit 5087abfb7dfd1d368ae6939420057036b4d8e509 from qemu
2018-03-01 08:39:37 -05:00
Richard Henderson f9d91a81b5
target-sparc: Use tcg_gen_atomic_cmpxchg_tl
Backports commit 5a7267b6a9e94c264ca77a7ca5a239e70dac81da from qemu
2018-03-01 08:34:35 -05:00
Richard Henderson 47313adedd
target-sparc: Use tcg_gen_atomic_xchg_tl
Backports commit da1bcae65288bdd51e0a7203d1e6c9cde1be5b3d from qemu
2018-03-01 08:32:10 -05:00
Richard Henderson 6b09040e23
target-sparc: Remove MMU_MODE*_SUFFIX
The functions that these generate are no longer used.

Backports commit 47b2696b975b794c6fa7b9fa8ae4699e749d662c from qemu
2018-03-01 08:30:27 -05:00
Richard Henderson 00fc847229
target-sparc: Allow 4-byte alignment on fp mem ops
The cpu is allowed to require stricter alignment on these 8- and 16-byte
operations, and the OS is required to fix up the accesses as necessary,
so the previous code was not wrong.

However, we can easily handle this misalignment for all direct 8-byte
operations and for direct 16-byte loads.

We must retain 16-byte alignment for 16-byte stores, so that we don't have
to probe for writability of a second page before performing the first of
two 8-byte stores. We also retain 8-byte alignment for no-fault loads,
since they are rare and it's not worth extending the helpers for this.

Backports commit cb21b4da6cca1bb4e3f5fefb698fb9e4d00c8f66 from qemu
2018-03-01 08:29:11 -05:00
Richard Henderson eec264526e
target-sparc: Implement ldqf and stqf inline
At the same time, fix a problem with stqf_asi, when
a write might access two pages.

Backports commit f939ffe5a022a8798824e2720ed5a14186fca6b6 from qemu
2018-03-01 08:20:36 -05:00
Richard Henderson 3a25695841
target-sparc: Remove asi helper code handled inline
Now that we never call out to helpers when direct accesses can
handle an asi, remove the corresponding code in those helpers.
For ldda, this removes the entire helper.

Backports commit 918d9a2c9d36378a3cf6636018900a4731c83b9d from qemu
2018-03-01 08:14:31 -05:00
Richard Henderson 15c8bf0b42
target-sparc: Implement BCOPY/BFILL inline
Backports commit 34810610acbde7a0745be3a88e99f2ef9282260f from qemu
2018-02-28 12:54:10 -05:00
Richard Henderson 3c48eb4aaf
target-sparc: Implement cas_asi/casx_asi inline
Backports commit 7268adebfda6548b8ae6865dc8337f116a5d266d from qemu
2018-02-28 12:47:26 -05:00
Richard Henderson b28b5cd3d3
target-sparc: Implement ldstub_asi inline
Backports commit fbb4bbb62e5603c991b880e25dc4bb30d342b944 from qemu
2018-02-28 12:42:26 -05:00
Richard Henderson adf9faf075
target-sparc: Implement swap_asi inline
Backports commit 4fb554bc6c88eb45270a3ad3cf6e6e2ad476aede from qemu
2018-02-28 12:39:55 -05:00
Richard Henderson ebc292c174
target-sparc: Handle more twinx asis
As used by HelenOS, presumably for ultra 2 and 3,
prior to the sun4v platform and the current twinx names.

Backports commit 34a6e13da70b2c798630a8dbd03d09f201c0198f from qemu
2018-02-28 12:28:08 -05:00
Richard Henderson ecbeea7c56
target-sparc: Use MMU_PHYS_IDX for bypass asis
Backports commit 7f87c90527d7363e8cecf1c6b5ad3d4cc85d3d28 from qemu
2018-02-28 12:26:29 -05:00
Richard Henderson 15eea419e5
target-sparc: Add MMU_PHYS_IDX
It's handy to have a mmu idx for physical addresses, so
that mmu disabled and physical access asis can use the
same path as normal accesses.

Backports commit af7a06bac7d3abb2da48ef3277d2a415772d2ae8 from qemu
2018-02-28 12:24:17 -05:00
Richard Henderson 9e60a8e432
target-sparc: Introduce cpu_raise_exception_ra
Several helpers call helper_raise_exception directly, which requires
in turn that their callers have performed save_state. The new function
allows a TCG return address to be passed in so that we can restore
PC + NPC + flags data from that.

This fixes a bug in the usage of helper_check_align, whose callers had
not been calling save_state. It fixes another bug in which the divide
helpers used GETPC at a level other than the direct callee from TCG.

This allows the translator to avoid save_state prior to SAVE, RESTORE,
and FLUSHW instructions.

Backports commit 2f9d35fc4006122bad33f9ae3e2e51d2263e98ee from qemu
2018-02-28 12:15:06 -05:00
Richard Henderson 62ae2a5102
target-sparc: Use overalignment flags for twinx and block asis
This allows us to enforce 16 and 64-byte alignment
without any extra overhead.

Backports commit 808832277af11dafee5a55da2b9e41d019b879ca from qemu
2018-02-28 12:01:50 -05:00
Alex Bennée da124da4b1
tcg: move locking for tb_invalidate_phys_page_range up
In the linux-user case all things that involve ''l1_map' and PageDesc
tweaks are protected by the memory lock (mmpa_lock). For SoftMMU mode
we previously relied on single threaded behaviour, with MTTCG we now use
the tb_lock().

As a result we need to do a little re-factoring and push the taking of
this lock up the call tree. This requires a slightly different entry for
the SoftMMU and user-mode cases from tb_invalidate_phys_range.

This also means user-mode breakpoint insertion needs to take two locks
but it hadn't taken any previously so this is an improvement.

Backpoirts commit ba051fb5e56d5ff5e4fa672d37954452e58543b2 from qemu
2018-02-28 10:35:41 -05:00
Paolo Bonzini 9d64a89acf
tcg: comment on which functions have to be called with tb_lock held
softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks. Probably the
same holds for user-mode emulation, it's just that no one has ever
tried to produce a coherent locking there.

This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.

Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock. The next one will rectify this.

Backports commit 7d7500d99895f888f97397ef32bb536bb0df3b74 from qemu
2018-02-28 10:26:28 -05:00
Alex Bennée 7aab0bd9a6
translate-all: add DEBUG_LOCKING asserts
This adds asserts to check the locking on the various translation
engines structures. There are two sets of structures that are protected
by locks.

The first the l1map and PageDesc structures used to track which
translation blocks are associated with which physical addresses. In
user-mode this is covered by the mmap_lock.

The second case are TB context related structures which are protected by
tb_lock which is also user-mode only.

Currently the asserts do nothing in SoftMMU mode but this will change
for MTTCG.

Backports commit 301e40ed8005306c009978be295ed9a4b725178b from qemu
2018-02-28 08:56:15 -05:00
Alex Bennée 075aaad106
translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
Make the debug define consistent with the others. The flush operation is
all about invalidating TranslationBlocks on flush events.

Also fix up the commenting on the other DEBUG for the benefit of
checkpatch.

Backports commit 955939a2b51f72bea1c200b559ea39985df5a633 from qemu
2018-02-28 08:53:38 -05:00
Anand J 8278af45cd
clean-up: removed duplicate #includes
Some files contain multiple #includes of the same header file.
Removed most of those unnecessary duplicate entries using
scripts/clean-includes.

Backports commit 814bb12a561d36aeb5ae4440ad43d2b0761d76da from qemu
2018-02-28 08:51:56 -05:00
Wei Huang bceed21d23
arm: Add an option to turn on/off vPMU support
This patch adds a pmu=[on/off] option to enable/disable vPMU support
in guest vCPU. It allows virt tools, such as libvirt, to determine the
exsitence of vPMU and configure it. Note this option is only available
for cortex-a57/cortex-53/ host CPUs, but unavailable on ARMv7 and other
processors. Also even though "pmu=" option is available for TCG mode,
setting it doesn't turn PMU on.

Backports commit 929e754d5a621cd53f30e69b766ccf381b58d124 from qemu
2018-02-28 08:49:23 -05:00
Laurent Vivier 5daf91ea48
target-m68k: immediate ops manage word and byte operands
Backports commit 92c62548f69cb4ba739d7d046e9caf9ea75753e4 from qemu
2018-02-28 08:42:22 -05:00
Laurent Vivier f7c29f73b3
target-m68k: cmp manages word and bytes operands
Backports commit ff99b952c8280853801fe14f7ae62d0f87464f7d from qemu
2018-02-28 08:37:46 -05:00
Laurent Vivier fc28e8127f
target-m68k: add/sub manage word and byte operands
Backports commit 8a370c6cb770b618f7eb66628116c25e84588df8 from qemu
2018-02-28 07:18:25 -05:00
Laurent Vivier bc27695926
target-m68k: add addressing modes to neg
Backports commit 227de713e0f4224a82c32991b4e4c4973381426b from qemu
2018-02-28 07:07:28 -05:00
Laurent Vivier 3558b93f11
target-m68k: introduce byte and word cc_ops
Backports commit db3d7945ae7992c91cc5705dccf60fec79b24dc4 from qemu
2018-02-28 06:52:16 -05:00
Laurent Vivier 4e257ffda9
target-m68k: some bit ops cleanup
Backports commit 3c980d2ef664e6d5a1a0c98aca4d11d33b17ca59 from qemu
2018-02-28 01:25:58 -05:00
Laurent Vivier cfab571859
target-m68k: suba/adda can manage word operand
Backports commit 415f4b62eb4629bd3702e6fb8aa51437a92983ff from qemu
2018-02-28 01:20:23 -05:00
Laurent Vivier 99c297efe3
target-m68k: and can manage word and byte operands
Backports commit 52dc23c5956159a79a4e2d4193e44d2c4cf3883c from qemu
2018-02-28 01:19:02 -05:00
Laurent Vivier 41372b0cc9
target-m68k: or can manage word and byte operands
Backports commit 020a4659208a6f9a985881504fd4d3b44ab589be from qemu
2018-02-28 01:15:27 -05:00
Laurent Vivier e140aac281
target-m68k: eor can manage word and byte operands
Backports commit eec37aec85af9f5fd59b534d20c86a775b8e7973 from qemu
2018-02-28 01:05:21 -05:00
Laurent Vivier bc52777b00
target-m68k: add addressing modes to not
Backports commit ea4f2a844132c81f1e6b51fed7019686ce4e3bc5 from qemu
2018-02-28 01:03:38 -05:00
Richard Henderson 549e31cc72
target-m68k: Inline addx, subx, negx
And add opcodes for 680x0

Backports commit a665a820e5d46b1611f409fbc7a540fe1c6bf5c8 from qemu
2018-02-28 01:02:31 -05:00
Laurent Vivier b796f934ff
target-m68k: add dbcc
Backports commit beff27ab3a60d8abab4a166670ca79b3c0970005 from qemu
2018-02-28 00:45:39 -05:00
Laurent Vivier 977c3fe6c4
target-m68k: add addressing modes to scc
Backports commit d5a3cf33f2f65069d2f79a6e349f0d8140f02bb4 from qemu
2018-02-28 00:43:30 -05:00
Laurent Vivier 77b1754376
target-m68k: add exg ops
Backports commit 29cf437da4eeacb46cd7076014d06c85ca47c91d from qemu
2018-02-28 00:37:41 -05:00
Laurent Vivier 56882899be
target-m68k: add linkl
Backports commit c630e436c0ed3adc3a858c328119daf6d1b3357f from qemu
2018-02-28 00:31:27 -05:00
Laurent Vivier 59d6a1a744
target-m68k: add bkpt instruction
Backports commit 71600eda7cc48f03ea306bc69ed7e52ef1d9dd91 from qemu
2018-02-28 00:29:41 -05:00
Emilio G. Cota 22be035e60
target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
The exception is not emitted anymore; remove it and the associated
TCG variables.

Backports commit 05188cc72f0399e99c92f608a8e7ca4c8e552c4b from qemu
2018-02-28 00:24:20 -05:00
Emilio G. Cota cb92eea81a
target-arm: emulate aarch64's LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/JVc8Y

atomic_add-bench: 1000000 ops/thread, [0,1] range

18 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
16 ++master +-H--+ ++
|| |
14 ++ ++
| | |
12 ++| ++
| | |
10 ++++ ++
8 ++E ++
|+++ |
6 ++ | ++
| | |
4 ++ | ++
| | |
2 +H++E+--- ++
+ | +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,2] range

18 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
16 ++master +-H--+ ++
| | |
14 ++E ++
| | |
12 ++| ++
|+++ |
10 ++ | ++
8 ++ | ++
| | |
6 ++ | ++
| | |
4 ++ | ++
| +E+--- |
2 +H+ +E+-----+++ +++ +++ ---+E+-----+E+------+++
+++ + +E+---+--+E+----++E+------+E+--- ++++ +++ + +E|
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,128] range

70 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
60 ++master +-H--+ +++ ---+E+-----+E+------+E+
| +E+------E-------+E+--- |
| --- +++ |
50 ++ +++--- ++
| -+E+ |
40 ++ +++---- ++
| E- |
| --| |
30 ++ -- +++ ++
| +E+ |
20 ++E+ ++
|E+ |
| |
10 ++ ++
+ + + + + + + |
0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,1024] range

160 ++---------+---------+----------+---------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
140 ++master +-H--+ +++ +++
| -+E+-----+E+-------E|
120 ++ +++ ---- +++
| +++ ----E-- |
100 ++ --E--- +++ ++
| +++ ---- +++ |
80 ++ --E-- ++
| ---- +++ |
| -+E+ |
60 ++ ---- +++ ++
| +E+- |
40 ++ -- ++
| +E+ |
20 +EE+ ++
+++ + + + + + + |
0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

Backports commit 1dd089d0eec060dcd8478735114d98421d414805 from qemu
2018-02-28 00:21:27 -05:00
Emilio G. Cota 3546558f66
target-arm: emulate SWP with atomic_xchg helper
Backports commit cf12bce088f22b92bf62ffa0d7f6a3e951e355a9 from qemu
2018-02-28 00:11:23 -05:00
Emilio G. Cota ec14a00925
target-arm: emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

atomic_add-bench: 1000000 ops/thread, [0,1] range

9 ++---------+----------+----------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
8 +Emaster +-H--+ ++
| | |
7 ++E ++
| | |
6 ++++ ++
| | |
5 ++ | ++
4 ++ | ++
| | |
3 ++ | ++
| | |
2 ++ | ++
|H++E+--- +++ ---+E+------+E+------+E|
1 +++ +E+-----+E+------+E+------+E+------+E+-- +++ +++ ++
++H+ + +++ + +++ ++++ + + + |
0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,2] range

16 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
14 ++master +-H--+ ++
| | |
12 ++| ++
| E |
10 ++| ++
| | |
8 ++++ ++
|E+| |
| | |
6 ++ | ++
| | |
4 ++ | ++
| +E+--- +++ +++ +++ ---+E+------+E|
2 +H+ +E+------E-------+E+-----+E+------+E+------+E+-- +++
+ | + +++ + ++++ + + + |
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,128] range

70 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + ++++ + |
60 ++master +-H--+ ----E------+E+-------++
| -+E+--- +++ +++ +E|
| +++ ---- +++ ++|
50 ++ +++ ---+E+- ++
| -E--- |
40 ++ ---+++ ++
| +++--- |
| -+E+ |
30 ++ +++---- ++
| +E+ |
20 ++ +++-- ++
| +E+ |
|+E+ |
10 +E+ ++
+ + + + + + + |
0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,1024] range

120 ++---------+---------+----------+---------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
| master +-H--+ ++|
100 ++ ----E+
| +++ ---+E+--- ++|
| --E--- +++ |
80 ++ ---- +++ ++
| ---+E+- |
60 ++ -+E+-- ++
| +++ ---- +++ |
| -+E+- |
40 ++ +++---- ++
| +++ ---+E+ |
| -+E+--- |
20 ++ +E+ ++
|+E+++ |
+E+ + + + + + + |
0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

Backports commit 354161b37c6465a32073eac5f16fa35939af2bb4 from qemu
2018-02-28 00:07:44 -05:00
Richard Henderson fd9933fbd5
target-arm: Rearrange aa32 load and store functions
Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl. Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.

Backports commit 7f5616f53896a4e08ad37de3ac50d3a4cc8eff7a from qemu
2018-02-27 23:59:16 -05:00
Emilio G. Cota 3dc16ebca3
target-i386: remove helper_lock()
It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
$ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

atomic_add-bench: 5000000 ops/thread, [0,1] range

25 ++---------+----------+---------+----------+----------+----------+---++
+ atomic +-E--+ + + + + + |
|cmpxchg +-H--+ |
20 +Emaster +-N--+ ++
|| |
|++ |
|| |
15 +++ ++
|N| |
|+| |
10 ++| ++
|+|+ |
| | -+E+------ +++ ---+E+------+E+------+E+-----+E+------+E|
|+E+E+- +++ +E+------+E+-- |
5 ++|+ ++
|+N+H+--- +++ |
++++N+--+H++----+++ + +++ --++H+------+H+------+H++----+H+---+--- |
0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,2] range

25 ++---------+----------+---------+----------+----------+----------+---++
++atomic +-E--+ + + + + + |
|cmpxchg +-H--+ |
20 ++master +-N--+ ++
|E| |
|++ |
||E |
15 ++| ++
|N|| |
|+|| ---+E+------+E+-----+E+------+E|
10 ++| | ---+E+------+E+-----+E+--- +++ +++
||H+E+--+E+-- |
|+++++ |
| || |
5 ++|+H+-- +++ ++
|+N+ - ---+H+------+H+------ |
+ +N+--+H++----+H+---+--+H+----++H+--- + + +H+---+--+H|
0 ++---------+----------+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,8] range

40 ++---------+----------+---------+----------+----------+----------+---++
++atomic +-E--+ + + + + + |
35 +cmpxchg +-H--+ ++
| master +-N--+ ---+E+------+E+------+E+-----+E+------+E|
30 ++| ---+E+-- +++ ++
| | -+E+--- |
25 ++E ---- +++ ++
|+++++ -+E+ |
20 +E+ E-- +++ ++
|H|+++ |
|+| +H+------- |
15 ++H+ ---+++ +H+------ ++
|N++H+-- +++--- +H+------++|
10 ++ +++ - +++ ---+H+ +++ +H+
| | +H+-----+H+------+H+-- |
5 ++| +++ ++
++N+N+--+N++ + + + + + |
0 ++---------+----------+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,128] range

160 ++---------+---------+----------+---------+----------+----------+---++
+ atomic +-E--+ + + + + + |
140 +cmpxchg +-H--+ +++ +++ ++
| master +-N--+ E--------E------+E+------++|
120 ++ --| | +++ E+
| -- +++ +++ ++|
100 ++ - ++
| +++- +++ ++|
80 ++ -+E+ -+H+------+H+------H--------++
| ---- ---- +++ H|
| ---+E+-----+E+- ---+H+ ++|
60 ++ +E+--- +++ ---+H+--- ++
| --+++ ---+H+-- |
40 ++ +E+-+H+--- ++
| +H+ |
20 +EE+ ++
+N+ + + + + + + |
0 ++N-N---N--+---------+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,1024] range

350 ++---------+---------+----------+---------+----------+----------+---++
+ atomic +-E--+ + + + + + |
300 +cmpxchg +-H--+ +++
| master +-N--+ +++ ||
| +++ | ----E|
250 ++ | ----E---- ++
| ----E--- | ---+H|
200 ++ -+E+--- +++ ---+H+--- ++
| ---- -+H+-- |
| +E+ +++ ---- +++ |
150 ++ ---+++ ---+H+- ++
| --- -+H+-- |
100 ++ ---+E+ ---- +++ ++
| +++ ---+E+-----+H+- |
| -+E+------+H+-- |
50 ++ +E+ ++
+EE+ + + + + + + |
0 ++N-N---N--+---------+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Backports commit 37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b from qemu
2018-02-27 23:43:22 -05:00
Emilio G. Cota 9d9b7dedac
target-i386: emulate XCHG using atomic helper
Backports commit ea97ebe89f7a879ea9aba90140e40c29b5cbd653 from qemu
2018-02-27 23:40:20 -05:00
Emilio G. Cota 8f96b6beb9
target-i386: emulate LOCK'ed BTX ops using atomic helpers
Backports commit cfe819d309d472f75fd129faf1d1064a2498326c from qemu
2018-02-27 23:39:21 -05:00
Emilio G. Cota 089965fa8d
target-i386: emulate LOCK'ed XADD using atomic helper
Backports commit f53b01817f95781d2bcc8a82e057d1416601e13b from qemu
2018-02-27 23:06:28 -05:00
Emilio G. Cota f9ed728f27
target-i386: emulate LOCK'ed NEG using cmpxchg helper
Backports commit 8eb8c7385608b99bed6055a22d897ff727a6cb8e from qemu
2018-02-27 23:03:28 -05:00
Emilio G. Cota fedeb0f93e
target-i386: emulate LOCK'ed NOT using atomic helper
Backports commit 2a5fe8ae145ef7a3ab480922116d27efcc97b85d from qemu
2018-02-27 23:00:33 -05:00
Emilio G. Cota 05c94546d5
target-i386: emulate LOCK'ed INC using atomic helper
Backports commit 60e573462fcdb83aa1a41e66a9f31dc8a4364399 from qemu
2018-02-27 22:56:05 -05:00
Emilio G. Cota 7c7b0fe746
target-i386: emulate LOCK'ed OP instructions using atomic helpers
Backports commit a7cee522f3529c2fc85379237b391ea98823271e from qemu
2018-02-27 22:53:46 -05:00
Emilio G. Cota a386368f82
target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
BAR
} else {
FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

Backports commit ae03f8de45427042ecd10b0941a005f21ecc064c from qemu
2018-02-27 22:38:37 -05:00
Richard Henderson b48508a6c1
tcg: Emit barriers with parallel_cpus
Backports commit 91682118aa330aff7e8ef0cc685c32d101f49940 from qemu
2018-02-27 22:28:33 -05:00
Richard Henderson 064543a415
tcg: Add CONFIG_ATOMIC64
Allow qemu to build on 32-bit hosts without 64-bit atomic ops.

Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation. Do so by dropping back to single-step.

Backports commit df79b996a7b21c6ea7847f7927a2e1a294b86c72 from qemu
2018-02-27 22:25:36 -05:00
Richard Henderson da01e53757
tcg: Add atomic128 helpers
Force the use of cmpxchg16b on x86_64.

Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction. Further, it's required by Windows 8 so no new cpus
will ever omit it.

If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.

Backports commit 7ebee43ee3e2fcd7b5063058b7ef74bc43216733 from qemu
2018-02-27 21:43:48 -05:00
Richard Henderson 5c0ce1b99c
tcg: Add atomic helpers
Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.

Backports commit c482cb117cc418115ca9c6d21a7a2315414c0a40 from qemu
2018-02-27 15:57:47 -05:00
Richard Henderson 0245f93c02
cputlb: Remove includes from softmmu_template.h
We already include exec/address-spaces.h and exec/memory.h in
cputlb.c; the include of qemu/timer.h appears to be a fossil.

Backports commit 40978428853e2f7b4597ab2a9ffeb187333802dc from qemu
2018-02-27 12:40:43 -05:00
Richard Henderson 5c79851143
cputlb: Tidy some macros
TGT_LE and TGT_BE are not size dependent and do not need to be
redefined. The others are no longer used at all.

Backports commit c86c6e4c80fee4d9423bedb10ba9e9c4aa68f861 from qemu
2018-02-27 12:36:25 -05:00
Richard Henderson 4da1cfb902
cputlb: Move most of iotlb code out of line
Saves 2k code size off of a cold path.

Backports commit 82a45b96a203a7403427183f1afd3d295222ff7d from qemu
2018-02-27 12:34:19 -05:00
Richard Henderson 5df7c9eec7
cputlb: Move probe_write out of softmmu_template.h
Backports commit 3b08f0a92545ba06fbdeaae929a5172480300c33 from qemu
2018-02-27 12:25:24 -05:00
Yongbok Kim 79e4c001a9
softmmu: Add probe_write()
Probe for whether the specified guest write access is permitted.
If it is not permitted then an exception will be taken in the same
way as if this were a real write access (and we will not return).
Otherwise the function will return, and there will be a valid
entry in the TLB for this access.

Backports commit 3b4afc9e75ab1a95f33e41f462921093f8a109c4 from qemu
2018-02-27 12:20:50 -05:00
Richard Henderson 1c9c8d3f10
cputlb: Replace SHIFT with DATA_SIZE
Backports commit dea2198201b3e0151d75b42774c51cf2ffe2ca4b from qemu
2018-02-27 12:00:33 -05:00
Richard Henderson e35aacd5ae
tcg: Add EXCP_ATOMIC
When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.

Backports commit fdbc2b5722f6092e47181a947c90fd4bdcc1c121 from qemu

Also backports parts of commit 02d57ea115b7669f588371c86484a2e8ebc369be
2018-02-27 11:57:58 -05:00
Richard Henderson d5510a546f
int128: Add int128_make128
Allows Int128 to be used more generally, rather than having to
begin with 64-bit inputs and accumulate.

Backports commit 1edaeee0955fba7d834b7c8f4e372e7eae030745 from qemu
2018-02-27 11:06:33 -05:00
Richard Henderson 9084e5fe1b
int128: Use __int128 if available
Backports commit 0846beb36641e8f0c3ee55a5bb84d468b653c852 from qemu
2018-02-27 11:03:06 -05:00
Richard Henderson 4fdbe94eea
exec: Avoid direct references to Int128 parts
Backports commit 258dfaaad05a5fbe32a142b794e1df3e16501d0e from qemu
2018-02-27 11:01:43 -05:00
Richard Henderson ba1c63572e
atomics: Add __nocheck atomic operations
While the check against sizeof(void *) is appropriate for
normal usage within qemu, there are places in which we want
wider operaions and have checked for their existance.

Backports commit 84bca3927b36fb1d9a2ca85cbbdf9023d2b84678 from qemu
2018-02-27 11:00:20 -05:00
Lioncash a59eef391e
atomic: MSVC compatible equivalents to some functions 2018-02-27 10:56:04 -05:00
Emilio G. Cota c837d76a86
atomics: add atomic_op_fetch variants
This paves the way for upcoming work.

Backports commit 83d0c719f837724d9e3963b078211b2242bdd2a5 from qemu
2018-02-27 10:28:27 -05:00
Emilio G. Cota 102a53aa50
atomics: add atomic_xor
This paves the way for upcoming work.

Backports commit 61696ddbdc74263ddb6869856772cfe355a5d3bd from qemu
2018-02-27 10:23:31 -05:00
Richard Henderson 3fe8d46a15
atomics: Add parameters to macros
Making these functional rather than object macros will
prevent later problems with complex macro expansion.

Backports commit d1a9f2d12fcfc942924956fbe321aedf4226ccb7 from qemu
2018-02-27 10:21:35 -05:00
Richard Henderson 4168095fed
target-m68k: Optimize gen_flush_flags
Backports commit 36f0399d46f2ccf4f6e7451ba46b1e8d0e9ab341 from qemu
2018-02-27 10:19:54 -05:00
Richard Henderson 7403e63f2f
target-m68k: Optimize some comparisons
Backports commit 9d896621c1820fd8f437fac26fd7d2e0921091c3 from qemu
2018-02-27 10:15:20 -05:00
Richard Henderson 672a28173f
target-m68k: Use setcond for scc
Backports commit b459e3eccfae7fe83e30187c391de00bccf4f51d from qemu
2018-02-27 10:11:35 -05:00
Richard Henderson ed6feb9329
target-m68k: Introduce DisasCompare
Backports commit 6a432295d73df91890dc70c4a94dcc4ba88ad1c3 from qemu
2018-02-27 10:08:32 -05:00
Richard Henderson 4e498cc54d
target-m68k: Reorg flags handling
Separate all ccr bits. Continue to batch updates via cc_op.

Backports commit 620c6cf66584bfbee90db84a7e87a6eabf230ca9 from qemu
2018-02-27 10:02:02 -05:00
Richard Henderson 121309a4d0
target-m68k: Reorg flags handling
Separate all ccr bits. Continue to batch updates via cc_op.

Signed-off-by: Richard Henderson <rth@twiddle.net>

Fix gen_logic_cc() to really extend the size of the result.
Fix gen_get_ccr(): update cc_op as it is used by the helper.
Factorize flags computing and src/ccr cleanup

Backports commit 620c6cf66584bfbee90db84a7e87a6eabf230ca9 from qemu
2018-02-27 09:30:32 -05:00
Richard Henderson 61ab9a42cd
target-m68k: Remove incorrect clearing of cc_x
The CF docs certainly doesnt suggest this is true.

Backports commit 18dd87f26bed46f22bb1b9536329c02de500f407 from qemu
2018-02-27 09:21:26 -05:00
Richard Henderson 187c2a9807
target-m68k: Some fixes to SR and flags management
Backports commit 99c514485b1d7922c4ca1ed767fd45525de4701f from qemu
2018-02-27 09:19:21 -05:00
Richard Henderson 9493b29399
target-m68k: Print flags properly
Backports commit 8e394ccabdb1e439aab092de6b9d2f26432e962f from qemu
2018-02-27 09:17:44 -05:00
Laurent Vivier 57ea90a91f
target-m68k: update CPU flags management
Copied from target-i386

Backports commit 9fdb533fb129b19610941bd1e5dd93e7471a18f5 from qemu
2018-02-27 09:15:29 -05:00
Laurent Vivier 125675e334
target-m68k: don't update cc_dest in helpers
Backports commit 91f90d7191f862ab27528dbdf76cee55c77f79cf from qemu
2018-02-27 09:04:51 -05:00
Laurent Vivier 12f9ba3fe4
target-m68k: update move to/from ccr/sr
Backports commit 7c0eb318bdcc3667a861e7b0f140df0b6d9895e2 from qemu
2018-02-27 08:57:05 -05:00
Laurent Vivier b8366d5b31
target-m68k: remove m68k_cpu_exec_enter() and m68k_cpu_exec_exit()
Update cc_op directly from tcg_gen_insn_start() and
restore_state_to_opc()

Copied from target-i386

Backports commit 20a8856eba0980fbe9d2b8ed2b33ecdb9c9fe5ad from qemu
2018-02-27 08:53:02 -05:00
Laurent Vivier a521f4f41d
target-m68k: Replace helper_xflag_lt with setcond
Backports commit f9083519034aaa5ad5cd2c5727bd61c29bf60bc5 from qemu
2018-02-27 08:50:44 -05:00
Laurent Vivier b079255576
target-m68k: allow to update flags with operation on words and bytes
Backports commit 5dbb6784b7e2b833c036b4df58aa07067e35f476 from qemu
2018-02-27 08:47:12 -05:00
Laurent Vivier f069762b61
target-m68k: REG() macro cleanup
Backports commit bcc098b0c23b4dd902ff56987d769bd839677331 from qemu
2018-02-27 08:37:25 -05:00
Laurent Vivier 3d59fe56b3
target-m68k: set PAGE_BITS to 12 for m68k
Backports commit 2b04e85a3401e13cb19b1de197e6c211eaadca4c from qemu
2018-02-27 08:36:09 -05:00
Laurent Vivier 292fc83c86
target-m68k: define operand sizes
Backports commit 7ef25cdd6cee4fa468d6cb913fa064a6689faf7d from qemu
2018-02-27 08:35:13 -05:00
Laurent Vivier 2653165c63
target-m68k: introduce read_imXX() functions
Read a 8, 16 or 32bit immediat constant.

An immediate constant is stored in the instruction opcode and
can be in one or two extension words.

Backports commit 28b68cd79ef01e8b1f5bd26718cd8c09a12c625f from qemu
2018-02-27 08:32:04 -05:00
Laurent Vivier d29cbb70b3
target-m68k: manage scaled index
Scaled index is not supported by 68000, 68008, and 68010.

EA = (bd + PC) + Xn.SIZE*SCALE + od

Ignore it:

M68000 FAMILY PROGRAMMER’S REFERENCE MANUAL
2.4 BRIEF EXTENSION WORD FORMAT COMPATIBILITY

"If the MC68000 were to execute an instruction that
encoded a scaling factor, the scaling factor would be
ignored and would not access the desired memory address.
The earlier microprocessors do not recognize the brief
extension word formats implemented by newer processors.
Although they can detect illegal instructions, they do not
decode invalid encodings of the brief extension word formats
as exceptions."

Backports commit d8633620a112296fcf6a6ae9a1cbba614c0ca502 from qemu
2018-02-27 08:27:20 -05:00
Laurent Vivier fa4a71a1bf
target-m68k: define m680x0 CPUs and features
This patch defines height new features:

- M68K_FEATURE_SCALED_INDEX, scaled address index register
- M68K_FEATURE_LONG_MULDIV, 32bit multiply/divide
- M68K_FEATURE_QUAD_MULDIV, 64bit multiply/divide
- M68K_FEATURE_BCCL, long conditional branches
- M68K_FEATURE_BITFIELD, bit field instructions
- M68K_FEATURE_FPU, FPU instructions
- M68K_FEATURE_CAS, cas instruction
- M68K_FEATURE_BKPT, bkpt instruction

Backports commit f076803bbf6ad1618f493f543faff97f3dd0c970 from qemu
2018-02-27 08:26:06 -05:00
John Paul Adrian Glaubitz 2fd7779aa5
target-m68k: Build the opcode table only once to avoid multithreading issues
Backports commit b208525797b031c1be4121553e21746686318a38 from qemu
2018-02-27 08:14:35 -05:00
Laurent Vivier fd84549b3e
target-m68k: fix DEBUG_DISPATCH
Backports commit a1ff19302007986fa081738e88905a715bd68e2e from qemu
2018-02-27 08:07:21 -05:00
Daniel P. Berrange 83a5bf2d25
qapi: rename QmpOutputVisitor to QObjectOutputVisitor
The QmpOutputVisitor has no direct dependency on QMP. It is
valid to use it anywhere that one wants a QObject. Rename it
to better reflect its functionality as a generic QAPI
to QObject converter.

The commit before previous renamed the files, this one renames C
identifiers.

Backports commit 7d5e199ade76c53ec316ab6779800581bb47c50a from qemu
2018-02-27 08:05:33 -05:00
Daniel P. Berrange 2949a90977
qapi: rename QmpInputVisitor to QObjectInputVisitor
The QmpInputVisitor has no direct dependency on QMP. It is
valid to use it anywhere that one has a QObject. Rename it
to better reflect its functionality as a generic QObject
to QAPI converter.

The previous commit renamed the files, this one renames C identifiers.

Backports commit 09e68369a88d7de0f988972bf28eec1b80cc47f9 from qemu
2018-02-26 15:54:15 -05:00
Daniel P. Berrange 228f122248
qapi: rename *qmp-*-visitor* to *qobject-*-visitor*
The QMP visitors have no direct dependency on QMP. It is
valid to use them anywhere that one has a QObject. Rename them
to better reflect their functionality as a generic QObject
to QAPI converter.

This is the first of three parts: rename the files. The next two
parts will rename C identifiers. The split is necessary to make git
rename detection work.

Backports commit b3db211f3c80bb996a704d665fe275619f728bd4 from qemu
2018-02-26 15:42:37 -05:00