Commit graph

11 commits

Author SHA1 Message Date
Emilio G. Cota 1677898a09
cputlb: read CPUTLBEntry.addr_write atomically
Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).

This completes the conversion to tlb_lock. This conversion results
on average in no performance loss, as the following experiments
(run on an Intel i7-6700K CPU @ 4.00GHz) show.

1. aarch64 bootup+shutdown test:

- Before:
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

7487.087786 task-clock (msec) # 0.998 CPUs utilized ( +- 0.12% )
31,574,905,303 cycles # 4.217 GHz ( +- 0.12% )
57,097,908,812 instructions # 1.81 insns per cycle ( +- 0.08% )
10,255,415,367 branches # 1369.747 M/sec ( +- 0.08% )
173,278,962 branch-misses # 1.69% of all branches ( +- 0.18% )

7.504481349 seconds time elapsed ( +- 0.14% )

- After:
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

7462.441328 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% )
31,478,476,520 cycles # 4.218 GHz ( +- 0.07% )
57,017,330,084 instructions # 1.81 insns per cycle ( +- 0.05% )
10,251,929,667 branches # 1373.804 M/sec ( +- 0.05% )
173,023,787 branch-misses # 1.69% of all branches ( +- 0.11% )

7.474970463 seconds time elapsed ( +- 0.07% )

2. SPEC06int:
SPEC06int (test set)
[Y axis: Speedup over master]
1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+
| |
1.1 +-+.................................+++.............................+ tlb-lock-v2 (m+++x) +-+
| +++ | +++ tlb-lock-v3 (spinl|ck) |
| +++ | | +++ +++ | | |
1.05 +-+....+++...........####.........|####.+++.|......|.....###....+++...........+++....###.........+-+
| ### ++#| # |# |# ***### +++### +++#+# | +++ | #|# ### |
1 +-+++***+#++++####+++#++#++++++++++#++#+*+*++#++++#+#+****+#++++###++++###++++###++++#+#++++#+#+++-+
| *+* # #++# *** # #### *** # * *++# ****+# *| * # ****|# |# # #|# #+# # # |
0.95 +-+..*.*.#....#..#.*|*..#...#..#.*|*..#.*.*..#.*|.*.#.*++*.#.*++*+#.****.#....#+#....#.#..++#.#..+-+
| * * # # # *|* # # # *|* # * * # *++* # * * # * * # * |* # ++# # # # *** # |
| * * # ++# # *+* # # # *|* # * * # * * # * * # * * # *++* # **** # ++# # * * # |
0.9 +-+..*.*.#...|#..#.*.*..#.++#..#.*|*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*.|*.#...|#.#..*.*.#..+-+
| * * # *** # * * # |# # *+* # * * # * * # * * # * * # * * # *++* # |# # * * # |
0.85 +-+..*.*.#..*|*..#.*.*..#.***..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.****.#..*.*.#..+-+
| * * # *+* # * * # *|* # * * # * * # * * # * * # * * # * * # * * # * |* # * * # |
| * * # * * # * * # *+* # * * # * * # * * # * * # * * # * * # * * # * |* # * * # |
0.8 +-+..*.*.#..*.*..#.*.*..#.*.*..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.*++*.#..*.*.#..+-+
| * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # |
0.75 +-+--***##--***###-***###-***###-***###-***###-****##-****##-****##-****##-****##-****##--***##--+-+
400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean

png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e.
a spinlock and a single lock acquisition in tlb_set_page_with_attrs.

Backports commit 403f290c0603f35f2d09c982bf5549b6d0803ec1 from qemu
2018-10-23 15:37:43 -04:00
Richard Henderson c911ea7128
tcg: Add tlb_index and tlb_entry helpers
Isolate the computation of an index from an address into a
helper before we change that function.

Backports commit 383beda9cf32f795616c3b93f7d6154d70372d4b from qemu
2018-10-23 15:04:27 -04:00
Peter Maydell 6543f9ea26
tcg: Define and use new tlb_hit() and tlb_hit_page() functions
The condition to check whether an address has hit against a particular
TLB entry is not completely trivial. We do this in various places, and
in fact in one place (get_page_addr_code()) we have got the condition
wrong. Abstract it out into new tlb_hit() and tlb_hit_page() inline
functions (one for a known-page-aligned address and one for an
arbitrary address), and use them in all the places where we had the
condition correct.

This is a no-behaviour-change patch; we leave fixing the buggy
code in get_page_addr_code() to a subsequent patch

Backports commit 334692bce7f0653a93b8d84ecde8c847b08dec38 from qemu
2018-07-03 19:21:36 -04:00
Bobby Bingham d46e52d9d0
cpu_ldst.h: use correct guest address parameter
In the user emulation code path, tlb_vaddr_to_host erronesously passed
vaddr as the guest address to be translated, instead of addr, the parameter
which actually contained the guest address.

This resulted in incorrect addresses being used when emulating block copy
(mvc/mvpg) and block clear (xc) instructions for the s390x target.

Backports commit c2a85316902e67530da9d6548139fcce73c0cac6 from qemu
2018-03-01 08:56:37 -05:00
Pavel Dovgalyuk 28f154129b
softmmu: remove now unused functions
Now that the cpu_ld/st_* function directly call helper_ret_ld/st, we can
drop the old helper_ld/st functions.

Backports commit b8611499b940b1b4db67aa985e3a844437bcbf00 from qemu
2018-02-17 15:23:38 -05:00
Benjamin Herrenschmidt 1722be3e73
tlb: Add ifetch argument to cpu_mmu_index()
This is set to true when the index is for an instruction fetch
translation.

The core get_page_addr_code() sets it, as do the SOFTMMU_CODE_ACCESS
acessors.

All targets ignore it for now, and all other callers pass "false".

This will allow targets who wish to split the mmu index between
instruction and data accesses to do so. A subsequent patch will
do just that for PowerPC.

Backports commit 97ed5ccdee95f0b98bedc601ff979e368583472c from qemu
2018-02-17 15:23:37 -05:00
Aurelien Jarno 93df793d4d
softmmu: provide tlb_vaddr_to_host function for user mode
To avoid to many #ifdef in target code, provide a tlb_vaddr_to_host for
both user and softmmu modes. In the first case the function always
succeed and just call the g2h function.

Backports commit 2e83c496261c799b0fe6b8e18ac80cdc0a5c97ce from qemu
2018-02-17 15:22:43 -05:00
Paolo Bonzini 96e7e32972
softmmu: support up to 12 MMU modes
At 8k per TLB (for 64-bit host or target), 8 or more modes
make the TLBs bigger than 64k, and some RISC TCG backends do
not like that. On the affected hosts, cut the TLB size in
half---there is still a measurable speedup on PPC with the
next patch.

Backports commit 1de29aef17a7d70dbc04a7fe51e18942e3ebe313 from qemu
2018-02-13 08:34:52 -05:00
Peter Maydell 9d02c52b8a
cpu_ldst.h: Allow NB_MMU_MODES to be 7
Support guest CPUs which need 7 MMU index values.
Add a comment about what would be required to raise the limit
further (trivial for 8, TCG backend rework for 9 or more).

Backports commit 8f3ae2ae2d02727f6d56610c09d7535e43650dd4 from qemu
2018-02-12 11:21:19 -05:00
xorstream 1aeaf5c40d This code should now build the x86_x64-softmmu part 2. 2017-01-19 22:50:28 +11:00
Nguyen Anh Quynh 344d016104 import 2015-08-21 15:04:50 +08:00