unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-12-23 20:15:31 +00:00

Author	SHA1	Message	Date
Emilio G. Cota	ae3e22a689	tb hash: hash phys_pc, pc, and flags with xxhash For some workloads such as arm bootup, tb_phys_hash is performance-critical. This is due to the high frequency of accesses to the hash table, originated by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's. More info: https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html To dig further into this I modified an arm image booting debian jessie to immediately shut down after boot. Analysis revealed that quite a bit of time is unnecessarily spent in tb_phys_hash: the cause is poor hashing that results in very uneven loading of chains in the hash table's buckets; the longest observed chain had ~550 elements. The appended patch addresses this with two changes: 1) Use xxhash as the hash table's hash function. xxhash is a fast, high-quality hashing function. 2) Feed the hashing function with not just tb_phys, but also pc and flags. This improves performance over using just tb_phys for hashing, since that resulted in some hash buckets having many TB's, while others getting very few; with these changes, the longest observed chain on a single hash bucket is brought down from ~550 to ~40. Tests show that the other element checked for in tb_find_physical, cs_base, is always a match when tb_phys+pc+flags are a match, so hashing cs_base is wasteful. It could be that this is an ARM-only thing, though. UPDATE: On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote: > The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB > consisting of only a delay slot). > It may well still turn out to be reasonable to ignore cs_base for hashing. BTW, after this change the hash table should not be called "tb_hash_phys" anymore; this is addressed later in this series. This change gives consistent bootup time improvements. 
I tested two host machines: - Intel Xeon E5-2690: 11.6% less time - Intel i7-4790K: 19.2% less time Increasing the number of hash buckets yields further improvements. However, using a larger, fixed number of buckets can degrade performance for other workloads that do not translate as many blocks (600K+ for debian-jessie arm bootup). This is dealt with later in this series. Backports commit 42bd32287f3a18d823f2258b813824a39ed7c6d9 from qemu	2018-02-24 18:00:14 -05:00
Sergey Fedorov	3a9c5e7509	cpu-exec: Fix direct jump to TB spanning page It is not safe to make a direct jump to a TB spanning two pages in system emulation because the mapping for the second page can get changed but we don't take care of direct jumps in this case. However in user mode emulation, this is not the case because there's only static address translation and TBs are always invalidated properly. Backports commit c88c67e58b61618a904d2333ceebefc3c852d32e from qemu	2018-02-24 03:24:53 -05:00
Paolo Bonzini	9485b7c2e1	cpu: move exec-all.h inclusion out of cpu.h exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Backports commit 63c915526d6a54a95919ebece83fa9ca631b2508 from qemu	2018-02-24 02:39:08 -05:00
Paolo Bonzini	2f4ae94b5c	target-i386: make cpu-qom.h not target specific Make X86CPU an opaque type within cpu-qom.h, and move all definitions of private methods, as well as all type definitions that require knowledge of the layout to cpu.h. This helps making files independent of NEED_CPU_H if they only need to pass around CPU pointers. Backports commit 4da6f8d954429c0cd1471d25cb9dbe909607374e from qemu	2018-02-24 00:55:22 -05:00
Sergey Fedorov	eab60b7c77	cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt() Backports commit 8b1fe3f439eaa2f0a6ee7737942bb6c405725867 from qemu	2018-02-24 00:27:05 -05:00
Sergey Fedorov	b4b7b88f69	cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec() Backports commit ba048a4ae15ba0f70c6dcb12ee05db120408de78 from qemu	2018-02-24 00:16:40 -05:00
Sergey Fedorov	aefb8935a9	cpu-exec: Move TB execution stuff out of cpu_exec() Simplify cpu_exec() by extracting TB execution code outside of cpu_exec() into a new static inline function cpu_loop_exec_tb(). Backports commit 928de9ee14b0b63ee9f9275732ed3e1c8b5f4790 from qemu	2018-02-24 00:15:24 -05:00
Sergey Fedorov	d4ef96abf2	cpu-exec: Move interrupt handling out of cpu_exec() Simplify cpu_exec() by extracting interrupt handling code outside of cpu_exec() into a new static inline function cpu_handle_interrupt(). Backports commit c385e6e49763c6dd5dbbd90fadde95d986f8bd38 from qemu	2018-02-24 00:09:06 -05:00
Sergey Fedorov	c1b52a4387	cpu-exec: Move exception handling out of cpu_exec() Simplify cpu_exec() by extracting exception handling code out of cpu_exec() into a new static inline function cpu_handle_exception(). Also make cpu_handle_debug_exception() inline as it is used only once. Backports commit ea284766ec6b9f1712369249566b4c372f3cec8b from qemu	2018-02-24 00:03:37 -05:00
Sergey Fedorov	fc3d135dac	cpu-exec: Move halt handling out of cpu_exec() Simplify cpu_exec() by extracting CPU halt state handling code out of cpu_exec() into a new static inline function cpu_handle_halt(). Backports commit 8b2d34e997371c9729a0f41e3cc624d4300bbe78 from qemu	2018-02-23 23:53:20 -05:00
Lioncash	88d00a75ca	cpu-exec: move cpu_exec to the bottom of the file Remove forward declarations	2018-02-23 23:50:28 -05:00
Sergey Fedorov	0088ca994f	cpu-exec: Remove relic orphaned comment This comment should have been deleted by commit 0ac087f1f3ae ("removed unused code") but somehow it is still here. There's no point to keep it. Backports commit c6f0d9f84c43ae973270df1a77482466558ee487 from qemu	2018-02-23 23:47:05 -05:00
Sergey Fedorov	1a768018c2	tcg: Remove needless CPUState::current_tb This field was used for telling cpu_interrupt() to unlink a chain of TBs being executed when it worked that way. Now, cpu_interrupt() doesn't do this anymore. So we don't need this field anymore. Backports commit 3213525f8ab48742db09dab18cb9ae6f36a6c921 from qemu	2018-02-23 23:45:42 -05:00
Sergey Fedorov	73c75b4cf7	cpu-exec: Move TB chaining into tb_find_fast() Move tb_add_jump() call and surrounding code from cpu_exec() into tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct chaining optimization details into tb_find_fast(). It also allows to move tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer to tb_find_slow() which also manipulates the lock. Backports commit a0522c7a55cc8ac76d82884cf8e52f76daa664cc from qemu	2018-02-23 23:38:57 -05:00
Sergey Fedorov	ba9a237586	tcg: Rework tb_invalidated_flag 'tb_invalidated_flag' was meant to catch two events: * some TB has been invalidated by tb_phys_invalidate(); * the whole translation buffer has been flushed by tb_flush(). Then it was checked: * in cpu_exec() to ensure that the last executed TB can be safely linked to directly call the next one; * in cpu_exec_nocache() to decide if the original TB should be provided for further possible invalidation along with the temporarily generated TB. It is always safe to patch an invalidated TB since it is not going to be used anyway. It is also safe to call tb_phys_invalidate() for an already invalidated TB. Thus, setting this flag in tb_phys_invalidate() is simply unnecessary. Moreover, it can prevent from pretty proper linking of TBs, if any arbitrary TB has been invalidated. So just don't touch it in tb_phys_invalidate(). If this flag is only used to catch whether tb_flush() has been called then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using only 'true' and 'false' to set its value. Also, instead of setting it in tb_gen_code(), just after tb_flush() has been called, do it right inside of tb_flush(). In cpu_exec(), this flag is used to track if tb_flush() has been called and have made 'next_tb' (a reference to the last executed TB) invalid for linking it to directly call the next TB. tb_flush() can be called during the CPU execution loop from tb_gen_code(), during TB execution or by another thread while 'tb_lock' is released. Catch for translation buffer flush reliably by resetting this flag once before first TB lookup and each time we find it set before trying to add a direct jump. Don't touch it in tb_find_physical(). Each vCPU has its own execution loop in multithreaded mode and thus should have its own copy of the flag to be able to reset it with its own 'next_tb' and don't affect any other vCPU execution thread. So make this flag per-vCPU and move it to CPUState. 
In cpu_exec_nocache(), we only need to check if tb_flush() has been called from tb_gen_code() called by cpu_exec_nocache() itself. To do this reliably, preserve the old value of the flag, reset it before calling tb_gen_code(), check afterwards, and combine the saved value back to the flag. This patch is based on the patch "tcg: move tb_invalidated_flag to CPUState" from Paolo Bonzini <pbonzini@redhat.com>. Backports commit 6f789be56d3f38e9214dafcfab3bf9be7191f370 from qemu	2018-02-23 23:34:51 -05:00
Sergey Fedorov	c9700af2bd	tcg: Clean up from 'next_tb' The value returned from tcg_qemu_tb_exec() is the value passed to the corresponding tcg_gen_exit_tb() at translation time of the last TB attempted to execute. It is a little confusing to store it in a variable named 'next_tb'. In fact, it is a combination of 4-byte aligned pointer and additional information in its two least significant bits. Break it down right away into two variables named 'last_tb' and 'tb_exit' which are a pointer to the last TB attempted to execute and the TB exit reason, correspondingly. This simplifies the code and improves its readability. Correct a misleading documentation comment for tcg_qemu_tb_exec() and fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in another couple of places. Backports commit 819af24b9c1e95e6576f1cefd32f4d6bf56dfa56 from qemu	2018-02-23 23:29:04 -05:00
Sergey Fedorov	73c59faad5	tcg: Clean up direct block chaining safety checks We don't take care of direct jumps when address mapping changes. Thus we must be sure to generate direct jumps so that they always keep valid even if address mapping changes. Luckily, we can only allow to execute a TB if it was generated from the pages which match with current mapping. Document tcg_gen_goto_tb() declaration and note the reason for destination PC limitations. Some targets with variable length instructions allow TB to straddle a page boundary. However, we make sure that both of TB pages match the current address mapping when looking up TBs. So it is safe to do direct jumps into the both pages. Correct the checks for some of those targets. Given that, we can safely patch a TB which spans two pages. Remove the unnecessary check in cpu_exec() and allow such TBs to be patched. Backports commit 5b053a4a28278bca606eeff7d1c0730df1b047e9 from qemu	2018-02-23 22:26:00 -05:00
Emilio G. Cota	170f6e0b3b	tb: consistently use uint32_t for tb->flags We are inconsistent with the type of tb->flags: usage varies loosely between int and uint64_t. Settle to uint32_t everywhere, which is superior to both: at least one target (aarch64) uses the most significant bit in the u32, and uint64_t is wasteful. Compile-tested for all targets. Backports commit 89fee74a0f066dfd73830a7b5fa137e87888c870 from qemu	2018-02-23 21:28:11 -05:00
Alex Bennée	3da7d9d9ae	qemu-log: dfilter-ise exec, out_asm, op and opt_op This ensures the code generation debug code will honour -dfilter if set. For the "exec" tracing I've added a new inline macro for efficiency's sake. Backports commit d977e1c2dbc9e63454b2000f91954d02543bf43b from qemu	2018-02-22 10:06:19 -05:00
Peter Maydell	3f5e36e15f	qemu-log: Improve the exec TB execution logging Improve the TB execution logging so that it is easier to identify what is happening from trace logs: * move the "Trace" logging of executed TBs into cpu_tb_exec() so that it is emitted if and only if we actually execute a TB, and for consistency for the CPU state logging * log when we link two TBs together via tb_add_jump() * log when cpu_tb_exec() returns early from a chain of TBs The new style logging looks like this: Trace 0x7fb7cc822ca0 [ffffffc0000dce00] Linking TBs 0x7fb7cc822ca0 [ffffffc0000dce00] index 0 -> 0x7fb7cc823110 [ffffffc0000dce10] Trace 0x7fb7cc823110 [ffffffc0000dce10] Trace 0x7fb7cc823420 [ffffffc000302688] Trace 0x7fb7cc8234a0 [ffffffc000302698] Trace 0x7fb7cc823520 [ffffffc0003026a4] Trace 0x7fb7cc823560 [ffffffc0000dce44] Linking TBs 0x7fb7cc823560 [ffffffc0000dce44] index 1 -> 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc8235d0 [ffffffc0000dce70] Stopped execution of TB chain before 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc822fd0 [ffffffc0000dd52c] Backports commit 1a830635229e14c403600167823ea6b3b79d3097 from qemu	2018-02-22 09:40:11 -05:00
Peter Maydell	293266a9d8	exec: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Backports commit 7b31bbc2e68605ab2f10dc609dd54cf4c7b5f49a from qemu	2018-02-19 00:49:55 -05:00
Paolo Bonzini	3907ea1a3b	cpu-exec: Fix compiler warning (-Werror=clobbered) Reloading of local variables after sigsetjmp is only needed for some buggy compilers. The code which should reload these variables causes compiler warnings with gcc 4.7 when compiler optimizations are enabled: cpu-exec.c:204:15: error: variable ‘cpu’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] cpu-exec.c:207:15: error: variable ‘cc’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] cpu-exec.c:202:28: error: argument ‘env’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] Now this code is only used for compilers which need it (and gcc 4.5.x, x > 0 which does not need it but won't give warnings). There were bug reports for clang and gcc 4.5.0, while gcc 4.5.1 was reported to work fine without the reload code. For clang it is not clear which versions are affected, so simply keep the status quo for all clang compilations. This can be improved later. Backports commit 0448f5f8b816923b198ab6c32286fd1f3b2f3e45 from qemu	2018-02-17 15:24:15 -05:00
Richard Henderson	e9e8833da4	cpu-exec: Add nochain debug flag Respect it to avoid linking TBs together. Backports commit 89a82cd4b6a90fe117fa715e2abe51d5c607560c from qemu	2018-02-17 15:24:04 -05:00
Peter Crosthwaite	bf067fcc26	cpu-exec: Migrate some generic fns to cpu-exec-common The goal is to split the functions such that cpu-exec is CPU specific content, while cpus-exec-common.c is generic code only. The function interface to cpu-exec needs to be virtualised to prepare support for multi-arch and moving these definitions out saves bloating the QOM interface. So move these definitions out of cpu-exec to a new module, cpu-exec-common. Backports commit 5abf9495ca9ff41160260ac274115825c10545cc from qemu	2018-02-17 15:23:51 -05:00
Paolo Bonzini	62045513bb	tcg: synchronize exit_request and tcg_current_cpu accesses Synchronize the remaining pair of accesses in cpu_signal. These should be necessary on Windows as well, at least in theory. Probably SuspendProcess and ResumeProcess introduce some implicit memory barrier. Backports relevant parts of commit aed807c8e2bf009b2c6a35490d4fd4383887221d from qemu	2018-02-17 15:23:49 -05:00
Paolo Bonzini	7f1d59bb83	tcg: synchronize cpu->exit_request and cpu->tcg_exit_req accesses Backports commit ab096a75cd626dcd4ad34b2a11652df0269bee0d from qemu	2018-02-17 15:23:49 -05:00
Paolo Bonzini	1cfd4190a7	tcg: assign cpu->current_tb in a simpler place TCG has not been reading cpu->current_tb from signal handlers for years. The code that synchronized cpu_exec with the signal handler is not needed anymore. Backports commit b0a46fa796504c7334202877a68c857e49f7c96c from qemu	2018-02-17 15:23:49 -05:00
Paolo Bonzini	96e5a7ced3	tcg: introduce tcg_current_cpu This is already useful on Windows in order to remove tls.h, because accesses to current_cpu are done from a different thread on that platform. It will be used on POSIX platforms as soon as TCG stops using signals to interrupt the execution of translated code. Backports commit 9373e63297c43752f9cf085feb7f5aed57d959f8 from qemu	2018-02-17 15:23:49 -05:00
Pavel Dovgalyuk	4a05c9ee28	cpu-exec: introduce loop exit with restore function This patch introduces loop exit function, which also restores guest CPU state according to the value of host program counter. Backports commit 1c3c8af1fb40a481c07749e0448644d9b7700415 from qemu	2018-02-17 15:23:38 -05:00
Peter Crosthwaite	e51f8c9f6f	cpu-exec: Purge all uses of ENV_GET_CPU() Remove un-needed usages of ENV_GET_CPU() by converting the APIs to use CPUState pointers and retrieving the env_ptr as minimally needed. Scripted conversion for target-* change: for I in target-*/cpu.h; do sed -i \ 's/\(^int cpu_[^_]*_exec(\)[^ ][^ ]* \*s);$/\1CPUState *cpu);/' \ $I; done Backports commit ea3e9847408131abc840240bd61e892d28459452 from qemu	2018-02-17 15:23:18 -05:00
Peter Crosthwaite	8200453545	translate-all: Change tb_flush() env argument to cpu All of the core-code usages of this API have the cpu pointer handy so pass it in. There are only 3 architecture specific usages (2 of which are commented out) which can just use ENV_GET_CPU() locally to get the cpu pointer. This reduces core code usage of the CPU env, which brings us closer to common-obj'ing these core files. Backports commit bbd77c180d7ff1b04a7661bb878939b2e1d23798 from qemu	2018-02-17 15:23:18 -05:00
Peter Crosthwaite	09d23c6604	include/exec: Move tb hash functions out This is one of very few things in exec-all with a genuine CPU architecture dependency. Move these hashing helpers to a new header to trim exec-all.h down to a near architecture-agnostic header. The defs are only used by cpu-exec and translate-all which are both arch-obj's so the new tb-hash.h has no core code usage. Backports commit e1b89321bafea9fb33d87852fc91fee579d17dfe from qemu	2018-02-17 15:23:15 -05:00
Paolo Bonzini	a46accd252	exec: make iotlb RCU-friendly After the previous patch, TLBs will be flushed on every change to the memory mapping. This patch augments that with synchronization of the MemoryRegionSections referred to in the iotlb array. With this change, it is guaranteed that iotlb_to_region will access the correct memory map, even once the TLB will be accessed outside the BQL. Backports commit 9d82b5a792236db31a75b9db5c93af69ac07c7c5 from qemu	2018-02-12 15:20:39 -05:00
Paolo Bonzini	3fbda890df	exec: introduce cpu_reload_memory_map This for now is a simple TLB flush. This can change later for two reasons: 1) an AddressSpaceDispatch will be cached in the CPUState object 2) it will not be possible to do tlb_flush once the TCG-generated code runs outside the BQL. Backports commit 76e5c76f2e2e0d20bab2cd5c7a87452f711654fb from qemu	2018-02-12 15:09:49 -05:00
Andrew Dutcher	363cbacee4	Only set eip to the instruction pointer after an interrupt if the interrupt was user-generated (#875)	2017-08-29 17:14:36 +07:00
xorstream	b0ae2138fb	Merge remote-tracking branch 'unicorn-engine/master' into msvc_native	2017-01-20 22:37:51 +11:00
Nguyen Anh Quynh	42771848d6	no more spinlock	2017-01-20 14:57:33 +08:00
xorstream	1aeaf5c40d	This code should now build the x86_x64-softmmu part 2.	2017-01-19 22:50:28 +11:00
Hoang-Vu Dang	9a2a5b15d8	Rename unhandled CPU exception	2016-07-05 11:10:39 -05:00
Hoang-Vu Dang	9cdca5a32b	Unhandled interrupt will halt execution	2016-07-04 17:07:57 -05:00
Nguyen Anh Quynh	c8569d8128	arm: fix change PC feature. now tests/regress/callback-pc.py passes	2016-01-28 16:03:19 +08:00
Nguyen Anh Quynh	5a04bcb115	allow to change PC during callback. this solves issue #210	2016-01-28 14:06:17 +08:00
Ryan Hileman	93052f6566	refactor to allow multiple hooks for one type	2016-01-22 18:41:43 -08:00
farmdve	c9f4bd27cc	Reset env->invalid_error before executing a translation block.	2016-01-11 18:12:57 +02:00
Nguyen Anh Quynh	9099755ca1	flush JIT cache before finishing emulation. this fixes issue #263. TODO: optimize this for better performance	2015-11-13 23:57:03 +08:00
Nguyen Anh Quynh	938d0b89eb	x86: check for exit request after every hooked instruction. this should fix issue #232	2015-11-07 01:02:45 +08:00
Nguyen Anh Quynh	9e64cba6ec	Rename some hook related enums: - UC_ERR_READ_INVALID -> UC_ERR_READ_UNMAPPED - UC_ERR_WRITE_INVALID -> UC_ERR_WRITE_UNMAPPED - UC_ERR_FETCH_INVALID -> UC_ERR_FETCH_UNMAPPED - UC_MEM_READ_INVALID -> UC_MEM_READ_UNMAPPED - UC_MEM_WRITE_INVALID -> UC_MEM_WRITE_UNMAPPED - UC_MEM_FETCH_INVALID -> UC_MEM_FETCH_UNMAPPED - UC_HOOK_MEM_READ_INVALID -> UC_HOOK_MEM_READ_UNMAPPED - UC_HOOK_MEM_WRITE_INVALID -> UC_HOOK_MEM_WRITE_UNMAPPED - UC_HOOK_MEM_FETCH_INVALID -> UC_HOOK_MEM_FETCH_UNMAPPED - UC_HOOK_MEM_INVALID -> UC_HOOK_MEM_UNMAPPED This also renames some newly added macros to use _INVALID postfix: - UC_HOOK_MEM_READ_ERR -> UC_HOOK_MEM_READ_INVALID - UC_HOOK_MEM_WRITE_ERR -> UC_HOOK_MEM_WRITE_INVALID - UC_HOOK_MEM_FETCH_ERR -> UC_HOOK_MEM_FETCH_INVALID - UC_HOOK_MEM_ERR -> UC_HOOK_MEM_INVALID Fixed all the bindings Java, Go & Python.	2015-09-30 14:46:55 +08:00
Nguyen Anh Quynh	2b0b4169bc	mips: advance PC for SYSCALL instruction. this fixes issue #157	2015-09-28 10:58:43 +08:00
Nguyen Anh Quynh	886946dcf4	do not use syscall to quit emulation. this can fix issues #147 & #148	2015-09-26 16:49:00 +08:00
Nguyen Anh Quynh	a166c24f8e	x86: correct EIP of INT instruction by updating it only after calling interrupt handler	2015-09-06 14:58:11 +08:00
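
The enum renames listed in commit 9e64cba6ec overlap: the old `UC_HOOK_MEM_*_INVALID` names become `*_UNMAPPED`, while the old `UC_HOOK_MEM_*_ERR` names take over the `*_INVALID` spelling. Applying the renames one after another would therefore rewrite some names twice. The following is a minimal sketch of a single-pass migration for caller code, assuming the rename table is exactly the one given in that commit message; the helper name `migrate_constants` is hypothetical and is not part of Unicorn or its bindings.

```python
import re

# Rename table taken from the message of commit 9e64cba6ec.
RENAMES = {}
for op in ("READ", "WRITE", "FETCH"):
    RENAMES[f"UC_ERR_{op}_INVALID"] = f"UC_ERR_{op}_UNMAPPED"
    RENAMES[f"UC_MEM_{op}_INVALID"] = f"UC_MEM_{op}_UNMAPPED"
    RENAMES[f"UC_HOOK_MEM_{op}_INVALID"] = f"UC_HOOK_MEM_{op}_UNMAPPED"
    RENAMES[f"UC_HOOK_MEM_{op}_ERR"] = f"UC_HOOK_MEM_{op}_INVALID"
RENAMES["UC_HOOK_MEM_INVALID"] = "UC_HOOK_MEM_UNMAPPED"
RENAMES["UC_HOOK_MEM_ERR"] = "UC_HOOK_MEM_INVALID"

# One alternation over all old names; \b keeps matches to whole identifiers.
_OLD_NAME = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_constants(source: str) -> str:
    """Rewrite old constant names in one pass (hypothetical helper).

    Each identifier is replaced at most once, so a name produced by one
    rename (e.g. *_ERR -> *_INVALID) is not renamed again to *_UNMAPPED.
    """
    return _OLD_NAME.sub(lambda m: RENAMES[m.group(1)], source)
```

The single pass matters only for source that already used the short-lived `_ERR` macros alongside the older `_INVALID` names; for code that used only the `_INVALID` names, a plain search-and-replace gives the same result.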

56 commits
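
Commit c9700af2bd describes the value returned from tcg_qemu_tb_exec() as a 4-byte aligned pointer with extra information in its two least significant bits, which the patch splits into 'last_tb' and 'tb_exit'. The sketch below illustrates that split on plain integers. It is not the QEMU code: the constant and function names are chosen for illustration, and the address in the usage note is simply one of the TB addresses from the log excerpt in commit 3f5e36e15f.

```python
# Illustrative mask for the two least significant bits (name is an assumption).
TB_EXIT_MASK = 0b11


def split_exit_value(ret: int) -> tuple[int, int]:
    """Split a packed exit value into (last_tb address, tb_exit reason).

    Because the pointer is 4-byte aligned, its low two bits are always
    zero and can carry the exit reason without losing address bits.
    """
    last_tb = ret & ~TB_EXIT_MASK  # clear the low two bits to recover the pointer
    tb_exit = ret & TB_EXIT_MASK   # keep only the low two bits
    return last_tb, tb_exit
```

For example, `split_exit_value(0x7fb7cc822ca0 | 1)` yields the address `0x7fb7cc822ca0` and the exit reason `1`.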