unicorn/qemu/include/exec
Emilio G. Cota 210d13ec49
tcg: consolidate TB lookups in tb_lookup__cpu_state
This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

(A) Lookup succeeds for TB in hash without tb_lock
(B) Sets the TB's tb->invalid flag
(B) Removes the TB from tb_htable
(B) Clears every CPU's tb_jmp_cache
(A) Stores the TB into its local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.
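
The interleaving and the fix can be replayed in a small, single-threaded C sketch. This is not QEMU's code: every name below (SketchTB, SketchCPU, sketch_lookup, sketch_replay_race, and so on) is hypothetical, the hash table is reduced to a plain array, and TBs are matched on the PC alone. A real implementation would also need atomic accesses, since the flag and the cache are touched by several threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
#define SKETCH_CACHE_SIZE 16

typedef struct SketchTB {
    uint64_t pc;
    bool invalid;            /* set when the TB is invalidated */
} SketchTB;

typedef struct SketchCPU {
    SketchTB *jmp_cache[SKETCH_CACHE_SIZE];   /* per-CPU fast-path cache */
} SketchCPU;

/* Assumption: a plain array stands in for the global hash table. */
static SketchTB *sketch_htable[SKETCH_CACHE_SIZE];

static size_t sketch_hash(uint64_t pc)
{
    return (size_t)(pc % SKETCH_CACHE_SIZE);
}

static SketchTB *sketch_htable_lookup(uint64_t pc)
{
    SketchTB *tb = sketch_htable[sketch_hash(pc)];
    return (tb != NULL && tb->pc == pc) ? tb : NULL;
}

/* Lookup with the fix: a cached TB is used only if it is still valid. */
static SketchTB *sketch_lookup(SketchCPU *cpu, uint64_t pc)
{
    size_t idx = sketch_hash(pc);
    SketchTB *tb = cpu->jmp_cache[idx];

    if (tb != NULL && tb->pc == pc && !tb->invalid) {
        return tb;                        /* fast path */
    }
    tb = sketch_htable_lookup(pc);        /* slow path */
    if (tb == NULL) {
        return NULL;                      /* caller would retranslate */
    }
    cpu->jmp_cache[idx] = tb;
    return tb;
}

/* Replays the A/B interleaving; returns true if the stale TB is rejected. */
bool sketch_replay_race(void)
{
    SketchTB tb = { .pc = 0x1000, .invalid = false };
    SketchCPU cpu_a = { { NULL } };
    size_t idx = sketch_hash(tb.pc);

    sketch_htable[idx] = &tb;

    SketchTB *found = sketch_htable_lookup(tb.pc);  /* (A) lookup succeeds */
    assert(found == &tb);
    tb.invalid = true;                              /* (B) marks TB invalid */
    sketch_htable[idx] = NULL;                      /* (B) removes it from the table */
    cpu_a.jmp_cache[idx] = NULL;                    /* (B) clears A's cache */
    cpu_a.jmp_cache[idx] = found;                   /* (A) stores the stale TB */

    /* The next lookup sees the invalid flag and falls through to the table. */
    return sketch_lookup(&cpu_a, tb.pc) == NULL;
}
```

Calling sketch_replay_race() from a main function returns true. If the `!tb->invalid` test is dropped from sketch_lookup, the same replay hands the stale TB back from the cache, which is the behaviour described above.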

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

Performance counter stats for 'taskset -c 0 qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=jessie.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

Before:
18714.917392 task-clock # 0.952 CPUs utilized ( +- 0.95% )
23,142 context-switches # 0.001 M/sec ( +- 0.50% )
1 CPU-migrations # 0.000 M/sec
10,558 page-faults # 0.001 M/sec ( +- 0.95% )
53,957,727,252 cycles # 2.883 GHz ( +- 0.91% ) [83.33%]
24,440,599,852 stalled-cycles-frontend # 45.30% frontend cycles idle ( +- 1.20% ) [83.33%]
16,495,714,424 stalled-cycles-backend # 30.57% backend cycles idle ( +- 0.95% ) [66.66%]
76,267,572,582 instructions # 1.41 insns per cycle
12,692,186,323 branches # 678.186 M/sec ( +- 0.92% ) [83.35%]
263,486,879 branch-misses # 2.08% of all branches ( +- 0.73% ) [83.34%]

19.648474449 seconds time elapsed ( +- 0.82% )

After, w/ inline (this patch):
18471.376627 task-clock # 0.955 CPUs utilized ( +- 0.96% )
23,048 context-switches # 0.001 M/sec ( +- 0.48% )
1 CPU-migrations # 0.000 M/sec
10,708 page-faults # 0.001 M/sec ( +- 0.81% )
53,208,990,796 cycles # 2.881 GHz ( +- 0.98% ) [83.34%]
23,941,071,673 stalled-cycles-frontend # 44.99% frontend cycles idle ( +- 0.95% ) [83.34%]
16,161,773,848 stalled-cycles-backend # 30.37% backend cycles idle ( +- 0.76% ) [66.67%]
75,786,269,766 instructions # 1.42 insns per cycle
12,573,617,143 branches # 680.708 M/sec ( +- 1.34% ) [83.33%]
260,235,550 branch-misses # 2.07% of all branches ( +- 0.66% ) [83.33%]

19.340502161 seconds time elapsed ( +- 0.56% )

After, w/o inline:
18791.253967 task-clock # 0.954 CPUs utilized ( +- 0.78% )
23,230 context-switches # 0.001 M/sec ( +- 0.42% )
1 CPU-migrations # 0.000 M/sec
10,563 page-faults # 0.001 M/sec ( +- 1.27% )
54,168,674,622 cycles # 2.883 GHz ( +- 0.80% ) [83.34%]
24,244,712,629 stalled-cycles-frontend # 44.76% frontend cycles idle ( +- 1.37% ) [83.33%]
16,288,648,572 stalled-cycles-backend # 30.07% backend cycles idle ( +- 0.95% ) [66.66%]
77,659,755,503 instructions # 1.43 insns per cycle
12,922,780,045 branches # 687.702 M/sec ( +- 1.06% ) [83.34%]
261,962,386 branch-misses # 2.03% of all branches ( +- 0.71% ) [83.35%]

19.700174670 seconds time elapsed ( +- 0.56% )
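
To compare the three runs, the relative change of each counter against the "Before" run can be computed directly from the figures above. The helper below is a minimal sketch; pct_change and print_perf_summary are hypothetical names, and the constants are copied from the perf output in this message.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Prints the change in elapsed time and instruction count for both variants. */
void print_perf_summary(void)
{
    /* Values copied from the perf stat output above. */
    const double elapsed_before    = 19.648474449;
    const double elapsed_inline    = 19.340502161;
    const double elapsed_no_inline = 19.700174670;
    const double insns_before      = 76267572582.0;
    const double insns_inline      = 75786269766.0;
    const double insns_no_inline   = 77659755503.0;

    printf("elapsed, w/ inline:       %+.2f%%\n",
           pct_change(elapsed_before, elapsed_inline));
    printf("elapsed, w/o inline:      %+.2f%%\n",
           pct_change(elapsed_before, elapsed_no_inline));
    printf("instructions, w/ inline:  %+.2f%%\n",
           pct_change(insns_before, insns_inline));
    printf("instructions, w/o inline: %+.2f%%\n",
           pct_change(insns_before, insns_no_inline));
}
```

With these inputs, the inlined version takes roughly 1.6% less elapsed time than "Before", while the non-inlined version takes roughly 0.3% more, a difference smaller than the reported run-to-run variation of 0.56% to 0.82%.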

Backports commit f6bb84d53110398f4899c19dab4e0fe9908ec060 from qemu
2018-03-05 02:42:46 -05:00
address-spaces.h Clean up header guards that don't match their file name 2018-02-25 04:18:42 -05:00
cpu-all.h exec: introduce MemoryRegionCache 2018-03-01 10:50:30 -05:00
cpu-common.h cpu: Introduce a wrapper for tlb_flush() that can be used in common code 2018-03-03 21:24:55 -05:00
cpu-defs.h tcg: add CONFIG_TCG guards in headers 2018-03-03 21:37:52 -05:00
cpu_ldst.h cpu_ldst.h: use correct guest address parameter 2018-03-01 08:56:37 -05:00
cpu_ldst_template.h softmmu: add helper function to pass through retaddr 2018-02-17 15:23:38 -05:00
cputlb.h include/exec: Move cputlb exec.c defs out 2018-02-23 10:52:25 -05:00
exec-all.h exec-all: fix typos in TranslationBlock's documentation 2018-03-05 02:10:28 -05:00
gen-icount.h gen-icount: check cflags instead of use_icount global 2018-03-04 14:26:26 -05:00
helper-gen.h Clean up decorations and whitespace around header guards 2018-02-25 04:26:02 -05:00
helper-head.h Clean up header guards that don't match their file name 2018-02-25 04:18:42 -05:00
helper-proto.h Clean up decorations and whitespace around header guards 2018-02-25 04:26:02 -05:00
helper-tcg.h tcg: Expand glue macros before stringifying helper names 2018-03-03 23:07:21 -05:00
hwaddr.h qemu-common: push cpu.h inclusion out of qemu-common.h 2018-02-24 01:50:56 -05:00
ioport.h hw: remove pio_addr_t 2018-02-24 02:43:16 -05:00
memattrs.h memory.h: Move MemTxResult type to memattrs.h 2018-03-04 13:10:47 -05:00
memory-internal.h memory: Open code FlatView rendering 2018-03-04 02:06:48 -05:00
memory.h memory.h: Move MemTxResult type to memattrs.h 2018-03-04 13:10:47 -05:00
ram_addr.h memory: remove qemu_get_ram_fd, qemu_set_ram_fd, qemu_ram_block_host_ptr 2018-02-24 03:34:44 -05:00
ramlist.h memory: RCU ram_list.dirty_memory[] for safe RAM hotplug 2018-02-22 15:38:03 -05:00
semihost.h exec: Add semihosting stubs 2018-02-17 15:23:33 -05:00
tb-context.h tcg: allocate TB structs before the corresponding translated code 2018-03-03 17:05:49 -05:00
tb-hash-xx.h Clean up ill-advised or unusual header guards 2018-02-25 04:22:46 -05:00
tb-hash.h tb-hash: improve tb_jmp_cache hash function in user mode 2018-03-03 14:11:29 -05:00
tb-lookup.h tcg: consolidate TB lookups in tb_lookup__cpu_state 2018-03-05 02:42:46 -05:00
translator.h tcg: Add generic translation framework 2018-03-04 14:31:16 -05:00