Commit graph

22 commits

Author SHA1 Message Date
Richard Henderson 22ebc5fcee
target-i386: Use clz and ctz opcodes
Backports commit e5143c90883cd32a432eb793cdcce6bee747834a from qemu
2018-03-01 16:17:42 -05:00
Doug Evans 7c874b1b2b
target-i386: Fix eflags.TF/#DB handling of syscall/sysret insns
The syscall and sysret instructions behave a bit differently:
TF is checked after the instruction completes.
This allows the o/s to disable #DB at a syscall by adding TF to FMASK.
And then when the sysret is executed the #DB is taken "as if" the
syscall insn just completed.

Backports commit c52ab08aee6f7d4717fc6b517174043126bd302f from qemu
2018-03-01 10:56:22 -05:00
Paolo Bonzini 560515941a
target-i386: correctly propagate retaddr into SVM helpers
Commit 2afbdf8 ("target-i386: exception handling for memory helpers",
2015-09-15) changed tlb_fill's cpu_restore_state+raise_exception_err
to raise_exception_err_ra. After this change, the cpu_restore_state
and raise_exception_err's cpu_loop_exit are merged into
raise_exception_err_ra's cpu_loop_exit_restore.

This actually fixed some bugs, but when SVM is enabled there is a
second path from raise_exception_err_ra to cpu_loop_exit. This is
the VMEXIT path, and now cpu_vmexit is called without a
cpu_restore_state before.

The fix is to pass the retaddr to cpu_vmexit (via
cpu_svm_check_intercept_param). All helpers can now use GETPC() to pass
the correct retaddr, too.

Backports commit 823fb688ebc52a7d79c1308acb28c92b56820167 from qemu
2018-03-01 09:31:16 -05:00
Emilio G. Cota 3dc16ebca3
target-i386: remove helper_lock()
It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
$ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

atomic_add-bench: 5000000 ops/thread, [0,1] range

25 ++---------+----------+---------+----------+----------+----------+---++
+ atomic +-E--+ + + + + + |
|cmpxchg +-H--+ |
20 +Emaster +-N--+ ++
|| |
|++ |
|| |
15 +++ ++
|N| |
|+| |
10 ++| ++
|+|+ |
| | -+E+------ +++ ---+E+------+E+------+E+-----+E+------+E|
|+E+E+- +++ +E+------+E+-- |
5 ++|+ ++
|+N+H+--- +++ |
++++N+--+H++----+++ + +++ --++H+------+H+------+H++----+H+---+--- |
0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,2] range

25 ++---------+----------+---------+----------+----------+----------+---++
++atomic +-E--+ + + + + + |
|cmpxchg +-H--+ |
20 ++master +-N--+ ++
|E| |
|++ |
||E |
15 ++| ++
|N|| |
|+|| ---+E+------+E+-----+E+------+E|
10 ++| | ---+E+------+E+-----+E+--- +++ +++
||H+E+--+E+-- |
|+++++ |
| || |
5 ++|+H+-- +++ ++
|+N+ - ---+H+------+H+------ |
+ +N+--+H++----+H+---+--+H+----++H+--- + + +H+---+--+H|
0 ++---------+----------+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,8] range

40 ++---------+----------+---------+----------+----------+----------+---++
++atomic +-E--+ + + + + + |
35 +cmpxchg +-H--+ ++
| master +-N--+ ---+E+------+E+------+E+-----+E+------+E|
30 ++| ---+E+-- +++ ++
| | -+E+--- |
25 ++E ---- +++ ++
|+++++ -+E+ |
20 +E+ E-- +++ ++
|H|+++ |
|+| +H+------- |
15 ++H+ ---+++ +H+------ ++
|N++H+-- +++--- +H+------++|
10 ++ +++ - +++ ---+H+ +++ +H+
| | +H+-----+H+------+H+-- |
5 ++| +++ ++
++N+N+--+N++ + + + + + |
0 ++---------+----------+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,128] range

160 ++---------+---------+----------+---------+----------+----------+---++
+ atomic +-E--+ + + + + + |
140 +cmpxchg +-H--+ +++ +++ ++
| master +-N--+ E--------E------+E+------++|
120 ++ --| | +++ E+
| -- +++ +++ ++|
100 ++ - ++
| +++- +++ ++|
80 ++ -+E+ -+H+------+H+------H--------++
| ---- ---- +++ H|
| ---+E+-----+E+- ---+H+ ++|
60 ++ +E+--- +++ ---+H+--- ++
| --+++ ---+H+-- |
40 ++ +E+-+H+--- ++
| +H+ |
20 +EE+ ++
+N+ + + + + + + |
0 ++N-N---N--+---------+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 5000000 ops/thread, [0,1024] range

350 ++---------+---------+----------+---------+----------+----------+---++
+ atomic +-E--+ + + + + + |
300 +cmpxchg +-H--+ +++
| master +-N--+ +++ ||
| +++ | ----E|
250 ++ | ----E---- ++
| ----E--- | ---+H|
200 ++ -+E+--- +++ ---+H+--- ++
| ---- -+H+-- |
| +E+ +++ ---- +++ |
150 ++ ---+++ ---+H+- ++
| --- -+H+-- |
100 ++ ---+E+ ---- +++ ++
| +++ ---+E+-----+H+- |
| -+E+------+H+-- |
50 ++ +E+ ++
+EE+ + + + + + + |
0 ++N-N---N--+---------+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Backports commit 37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b from qemu
2018-02-27 23:43:22 -05:00
Emilio G. Cota a386368f82
target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
BAR
} else {
FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

Backports commit ae03f8de45427042ecd10b0941a005f21ecc064c from qemu
2018-02-27 22:38:37 -05:00
Paolo Bonzini 1435732c0d
target-i386: implement PKE for TCG
Backports commit 0f70ed4759a29ca932af1e9525729f4f455642f8 from qemu
2018-02-22 10:18:55 -05:00
Richard Henderson 22d4f95912
target-i386: Implement FSGSBASE
Backports commit 07929f2ab2ab9c9e01d4ae79f48f2b2476b715c8 from qemu
2018-02-20 14:45:58 -05:00
Richard Henderson 86cc5862a1
target-i386: Clear bndregs during legacy near jumps
Backports commit 7d117ce81ef6258cdcc0d24c774d045fa4b5fd26 from qemu
2018-02-20 14:36:11 -05:00
Richard Henderson f30c3efd0e
target-i386: Implement BNDLDX, BNDSTX
Backports commit bdd87b3b591add6e4d7c6b6125fcf0d706cc8bc4 from qemu
2018-02-20 14:26:18 -05:00
Richard Henderson 554c41f05f
target-i386: Implement BNDCL, BNDCU, BNDCN
Backports commit 523e28d7614571680d21641bd0bd9b9e84570cee from qemu
2018-02-20 14:22:46 -05:00
Richard Henderson 159e837a6c
target-i386: Perform set/reset_inhibit_irq inline
With helpers that can be reused for other things.

Backports commit 7f0b7141b4c7deab51efd8ee1e83eab2d9b7a9ea from qemu
2018-02-20 13:34:47 -05:00
Richard Henderson 7a7a72f49b
target-i386: Implement XSAVEOPT
Backports commit c9cfe8f9fb21f086e24b3a8f7ccd9c06e4d8d9d6 from qemu
2018-02-20 12:52:10 -05:00
Richard Henderson 6c5b6a0e7f
target-i386: Add XSAVE extension
This includes XSAVE, XRSTOR, XGETBV, XSETBV, which are all related,
as well as the associate cpuid bits.

Backports commit 19dc85dba23c0db1ca932c62e453c37e00761628 from qemu
2018-02-20 12:47:52 -05:00
Richard Henderson b490486028
target-i386: Split fxsave/fxrstor implementation
We will be able to reuse these pieces for XSAVE/XRSTOR.

Backports commit 64dbaff09bb768dbbb13142862554f18ab642866 from qemu
2018-02-20 11:58:00 -05:00
Richard Henderson 7dd4fcc621
target-i386: Rewrite gen_enter inline
Use gen_lea_v_seg for centralized segment base knowledge. Unify
code across 32- and 64-bit. Fix note about "must save state"
before using the out-of-line helpers.

Backports commit 743e398e2fbf2f7183bf7a53c9d011fabcaa1770 from qemu
2018-02-20 10:13:43 -05:00
Richard Henderson fcc9dbc103
target-i386: Check CR4[DE] for processing DR4/DR5
Introduce helper_get_dr so that we don't have to put CR4[DE]
into the scarce HFLAGS resource. At the same time, rename
helper_movl_drN_T0 to helper_set_dr and set the helper flags.

Backports commit d0052339236072bbf08c1d600c0906126b1ab258 from qemu
2018-02-17 15:24:06 -05:00
Eduardo Habkost c6bfe2a03d
target-i386: Handle I/O breakpoints
Backports commit 5223a9423c5fb9e32b0c3eaaa2c0bf8c5cfd6866 from qemu
2018-02-17 15:24:06 -05:00
Pavel Dovgalyuk 08f93c3fe6
target-i386: exception handling for seg_helper functions
This patch fixes exception handling for seg_helper functions.

Backports commit 100ec0991958d0c1b61f140e64dbe92991c6dd2c from qemu
2018-02-17 15:23:50 -05:00
Paolo Bonzini e57e92feca
target-i386: Use correct memory attributes for ioport accesses
In order to do this, stop using the cpu_in*/out* helpers, and instead
access address_space_io directly.

cpu_in* and cpu_out* remain for usage in the monitor, in qtest, and
in Xen.

Backports commit 3f7d84648607cc0fcb3812bb4b88978e2a7aa24f from qemu
2018-02-13 12:27:43 -05:00
Ryan Hileman 0886ae8ede rework code/block tracing 2016-01-22 18:42:27 -08:00
Spl3en 4c3ad139ea (Fix #341) SYSENTER instruction is not properly hooked with uc_hook_add in x86 emulation.
helper_sysenter in qemu/target-i386/seg_helper.c didn't check properly if a call interrupt callback was registred.
It has been fixed by copying the helper_syscall behavior.
2015-12-24 16:00:22 +01:00
Nguyen Anh Quynh 344d016104 import 2015-08-21 15:04:50 +08:00