unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-12-20 03:51:41 +00:00

Author	SHA1	Message	Date
Alexey Kardashevskiy	096ca207af	memory: Add reporting of supported page sizes Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate uses when translating, however this information is not available outside the translate context for various checks. This adds a get_min_page_size callback to MemoryRegionIOMMUOps and a wrapper for it so IOMMU users (such as VFIO) can know the minimum actual page size supported by an IOMMU. As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE as fallback. This removes vfio_container_granularity() and uses new helper in memory_region_iommu_replay() when replaying IOMMU mappings on added IOMMU memory region. Backports the relevant parts of commit f682e9c244af7166225f4a50cc18ff296bb9d43e from qemu	2018-02-24 19:23:28 -05:00
Emilio G. Cota	ae3e22a689	tb hash: hash phys_pc, pc, and flags with xxhash For some workloads such as arm bootup, tb_phys_hash is performance-critical. This is due to the high frequency of accesses to the hash table, originated by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's. More info: https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html To dig further into this I modified an arm image booting debian jessie to immediately shut down after boot. Analysis revealed that quite a bit of time is unnecessarily spent in tb_phys_hash: the cause is poor hashing that results in very uneven loading of chains in the hash table's buckets; the longest observed chain had ~550 elements. The appended patch addresses this with two changes: 1) Use xxhash as the hash table's hash function. xxhash is a fast, high-quality hashing function. 2) Feed the hashing function with not just tb_phys, but also pc and flags. This improves performance over using just tb_phys for hashing, since that resulted in some hash buckets having many TB's, while others got very few; with these changes, the longest observed chain on a single hash bucket is brought down from ~550 to ~40. Tests show that the other element checked for in tb_find_physical, cs_base, is always a match when tb_phys+pc+flags are a match, so hashing cs_base is wasteful. It could be that this is an ARM-only thing, though. UPDATE: On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote: > The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB > consisting of only a delay slot). > It may well still turn out to be reasonable to ignore cs_base for hashing. BTW, after this change the hash table should not be called "tb_hash_phys" anymore; this is addressed later in this series. This change gives consistent bootup time improvements. I tested two host machines: - Intel Xeon E5-2690: 11.6% less time - Intel i7-4790K: 19.2% less time Increasing the number of hash buckets yields further improvements. However, using a larger, fixed number of buckets can degrade performance for other workloads that do not translate as many blocks (600K+ for debian-jessie arm bootup). This is dealt with later in this series. Backports commit 42bd32287f3a18d823f2258b813824a39ed7c6d9 from qemu	2018-02-24 18:00:14 -05:00
Emilio G. Cota	9ef9de9cf8	exec: add tb_hash_func5, derived from xxhash This will be used by upcoming changes for hashing the tb hash. Add this into a separate file to include the copyright notice from xxhash. Backports commit dc8b295d05ec35a8c032f9abca421772347ba5d4 from qemu	2018-02-24 17:36:35 -05:00
Peter Maydell	d7dccff836	cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc() The function cpu_resume_from_signal() is now always called with a NULL puc argument, and is rather misnamed since it is never called from a signal handler. It is essentially forcing an exit to the top level cpu loop but without raising any exception, so rename it to cpu_loop_exit_noexc() and drop the useless unused argument. Backports commit 6886b98036a8f8f5bce8b10756ce080084cef11b from qemu	2018-02-24 17:25:28 -05:00
Peter Maydell	8d0faac1dc	qemu-common.h: Drop WORDS_ALIGNED define The WORDS_ALIGNED #define is not used anywhere, and hasn't been since 2013 when commit 612d590ebc6cef rewrote the various ld<type>_<endian>_p functions to not use it. Remove the #define and the comment describing it. Also remove the line in the comment about TARGET_WORDS_ALIGNED, since it has never actually existed. Backports commit 0d5c21f2b3bf1e0b562a2c74e353d2e03f2f50ef from qemu	2018-02-24 17:01:55 -05:00
Paolo Bonzini	8df5ad80b1	exec: hide mr->ram_addr from qemu_get_ram_ptr users Let users of qemu_get_ram_ptr and qemu_ram_ptr_length pass in an address that is relative to the MemoryRegion. This basically means what address_space_translate returns. Because the semantics of the second parameter change, rename the function to qemu_map_ram_ptr. Backports commit 0878d0e11ba8013dd759c6921cbf05ba6a41bd71 from qemu	2018-02-24 16:17:49 -05:00
Paolo Bonzini	b2e1b34bcc	memory: split memory_region_from_host from qemu_ram_addr_from_host Move the old qemu_ram_addr_from_host to memory_region_from_host and make it return an offset within the region. For qemu_ram_addr_from_host return the ram_addr_t directly, similar to what it was before commit 1b5ec23 ("memory: return MemoryRegion from qemu_ram_addr_from_host", 2013-07-04). Backports commit 07bdaa4196b51bc7ffa7c3f74e9e4a9dc8a7966a from qemu	2018-02-24 16:06:49 -05:00
Paolo Bonzini	918c626847	exec: remove ram_addr argument from qemu_ram_block_from_host Of the two callers, one does not use it, and the other can compute it itself based on the other output argument (offset) and the RAMBlock. Backports commit f615f39616c4fd1a3a3b078af8d75bb4be6390de from qemu	2018-02-24 03:37:40 -05:00
Paolo Bonzini	f26f1f123c	memory: remove qemu_get_ram_fd, qemu_set_ram_fd, qemu_ram_block_host_ptr Remove direct uses of ram_addr_t and optimize memory_region_{get,set}_fd now that a MemoryRegion knows its RAMBlock directly. Backports commit 4ff87573df3606856a92c14eef3393a63d736d11 from qemu	2018-02-24 03:34:44 -05:00
Fam Zheng	fb8135cd0d	memory: Remove code for mr->may_overlap The collision check does nothing and hasn't been used. Remove the variable together with related code. Backports commit b61359781958759317ee6fd1a45b59be0b7dbbe1 from qemu	2018-02-24 02:55:25 -05:00
Gonglei	feff56cc11	memory: drop find_ram_block() On the one hand, we already have qemu_get_ram_block(), whose function is similar. On the other hand, we can use mr->ram_block directly instead of searching for the RAMBlock by ram_addr, which is wasteful. Backports commit fa53a0e53efdc7002497ea4a76aacf6cceb170ef from qemu	2018-02-24 02:52:20 -05:00
Paolo Bonzini	d0d3712417	hw: remove pio_addr_t pio_addr_t is almost unused, because these days I/O ports are simply accessed through the address space. cpu_{in,out}[bwl] themselves are almost unused; monitor.c and xen-hvm.c could use address_space_read/write directly, since they have an integer size at hand. This leaves qtest as the only user of those functions. On the other hand even portio_* functions use this type; the only interesting use of pio_addr_t thus is include/hw/sysbus.h. I guess I could move it there, but I don't see much benefit in that either. Using uint32_t is enough and avoids the need to include ioport.h everywhere. Backports commit 89a80e7400f7225d9401b35ef32454b4ab29dc67 from qemu	2018-02-24 02:43:16 -05:00
Paolo Bonzini	9485b7c2e1	cpu: move exec-all.h inclusion out of cpu.h exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Backports commit 63c915526d6a54a95919ebece83fa9ca631b2508 from qemu	2018-02-24 02:39:08 -05:00
Paolo Bonzini	58693409ea	exec: extract exec/tb-context.h TCG backends do not need most of exec-all.h; extract what they actually need to a separate file or move it directly to tcg.h. The next patch will stop including exec-all.h from everywhere. Backports commit 00f6da6a1a5d1ce085334eccbb50ec899ceed513 from qemu	2018-02-24 02:09:58 -05:00
Paolo Bonzini	37f26922dd	qemu-common: push cpu.h inclusion out of qemu-common.h Backports commit 33c11879fd422b759483ed25fef133ea900ea8d7 from qemu	2018-02-24 01:50:56 -05:00
Paolo Bonzini	78fd1aab94	cpu: move endian-dependent load/store functions to cpu-all.h Disentangle cpu-common.h and memory.h from NEED_CPU_H. Prototypes are not defined for !NEED_CPU_H, so remove them from poison.h too. Only macros need poisoning. Backports commit a7d6039cb35592683ecc56d2b37817da2d2f8b00 from qemu	2018-02-24 01:04:26 -05:00
Sergey Fedorov	ba9a237586	tcg: Rework tb_invalidated_flag 'tb_invalidated_flag' was meant to catch two events: * some TB has been invalidated by tb_phys_invalidate(); * the whole translation buffer has been flushed by tb_flush(). Then it was checked: * in cpu_exec() to ensure that the last executed TB can be safely linked to directly call the next one; * in cpu_exec_nocache() to decide if the original TB should be provided for further possible invalidation along with the temporarily generated TB. It is always safe to patch an invalidated TB since it is not going to be used anyway. It is also safe to call tb_phys_invalidate() for an already invalidated TB. Thus, setting this flag in tb_phys_invalidate() is simply unnecessary. Moreover, it can prevent proper linking of TBs, if any arbitrary TB has been invalidated. So just don't touch it in tb_phys_invalidate(). If this flag is only used to catch whether tb_flush() has been called then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using only 'true' and 'false' to set its value. Also, instead of setting it in tb_gen_code(), just after tb_flush() has been called, do it right inside of tb_flush(). In cpu_exec(), this flag is used to track if tb_flush() has been called and has made 'next_tb' (a reference to the last executed TB) invalid for linking it to directly call the next TB. tb_flush() can be called during the CPU execution loop from tb_gen_code(), during TB execution or by another thread while 'tb_lock' is released. Catch translation buffer flushes reliably by resetting this flag once before the first TB lookup and each time we find it set before trying to add a direct jump. Don't touch it in tb_find_physical(). Each vCPU has its own execution loop in multithreaded mode and thus should have its own copy of the flag to be able to reset it with its own 'next_tb' and not affect any other vCPU execution thread. So make this flag per-vCPU and move it to CPUState. In cpu_exec_nocache(), we only need to check if tb_flush() has been called from tb_gen_code() called by cpu_exec_nocache() itself. To do this reliably, preserve the old value of the flag, reset it before calling tb_gen_code(), check afterwards, and combine the saved value back to the flag. This patch is based on the patch "tcg: move tb_invalidated_flag to CPUState" from Paolo Bonzini <pbonzini@redhat.com>. Backports commit 6f789be56d3f38e9214dafcfab3bf9be7191f370 from qemu	2018-02-23 23:34:51 -05:00
Sergey Fedorov	d60af028c5	tcg: Clarify thread safety check in tb_add_jump() The check is to make sure that another thread hasn't already done the same while we were outside of tb_lock. Mention this in a comment. Backports commit 9962c478b153a18fe88a6509fe58cd178aff8abc from qemu	2018-02-23 21:32:47 -05:00
Sergey Fedorov	fbc0a1105f	tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB These fields do not contain pure pointers to a TranslationBlock structure. So uintptr_t is the most appropriate type for them. Also put some asserts to assure that the two least significant bits of the pointer are always zero before assigning it to jmp_list_first. Backports commit c37e6d7e3589ecb96914faa21025ad7ba6654aea from qemu	2018-02-23 21:28:19 -05:00
Sergey Fedorov	e60c24cecf	tcg: Clean up direct block chaining data fields Briefly describe in a comment how direct block chaining is done. It should help in understanding of the following data fields. Rename some fields in TranslationBlock and TCGContext structures to better reflect their purpose (dropping excessive 'tb_' prefix in TranslationBlock but keeping it in TCGContext): tb_next_offset => jmp_reset_offset tb_jmp_offset => jmp_insn_offset tb_next => jmp_target_addr jmp_next => jmp_list_next jmp_first => jmp_list_first Avoid using a magic constant as an invalid offset which is used to indicate that there's no n-th jump generated. Backports commit f309101c26b59641fc1aa8fb2a98a5441cdaea03 from qemu	2018-02-23 21:28:19 -05:00
Sergey Fedorov	c5b234ed1f	tcg: Note requirement on atomic direct jump patching Backports commit 10b4f4855537dd421e193a7d0416513116370558 from qemu	2018-02-23 21:28:18 -05:00
Sergey Fedorov	52e2972300	tcg/arm: Make direct jump patching thread-safe Ensure direct jump patching in ARM is atomic by using atomic_read()/atomic_set() for code patching. Backports commit 7d14e0e2d661479985197203589c38840e1066df from qemu	2018-02-23 21:28:18 -05:00
Sergey Fedorov	57359fbe6c	tcg/s390: Make direct jump patching thread-safe Ensure direct jump patching in s390 is atomic by: * naturally aligning a location of direct jump address; * using atomic_read()/atomic_set() for code patching. Backports commit ed3d51ecd7fe248d3959e469d53890ac9ffe0cd2 from qemu	2018-02-23 21:28:18 -05:00
Sergey Fedorov	5eb2d6618f	tcg/i386: Make direct jump patching thread-safe Ensure direct jump patching in i386 is atomic by: * naturally aligning a location of direct jump address; * using atomic_read()/atomic_set() for code patching. Backports commit 0d07abf05e98903c7faf204a9a90f7d45b7554dc from qemu	2018-02-23 21:28:17 -05:00
Emilio G. Cota	170f6e0b3b	tb: consistently use uint32_t for tb->flags We are inconsistent with the type of tb->flags: usage varies loosely between int and uint64_t. Settle to uint32_t everywhere, which is superior to both: at least one target (aarch64) uses the most significant bit in the u32, and uint64_t is wasteful. Compile-tested for all targets. Backports commit 89fee74a0f066dfd73830a7b5fa137e87888c870 from qemu	2018-02-23 21:28:11 -05:00
Edgar E. Iglesias	bfc74c4da2	gen-icount: Use tcg_set_insn_param Use tcg_set_insn_param() instead of directly accessing internal tcg data structures to update an insn param. Backports commit 25caa94c4a26daaab1e65c6d887e2972aeb5749e from qemu	2018-02-23 20:01:17 -05:00
Lioncash	87130fc884	exec-all: Remove externs These are unused	2018-02-23 12:43:03 -05:00
Peter Crosthwaite	576f1752a6	include/exec: Move cputlb exec.c defs out Move the architecture agnostic function prototypes for exec.c out of cputlb.h to exec-all.h. This allows hiding of the arch specific cputlb.h from exec.c which should be getting close to having no architecture specifics. Prepares support for multi-arch, which will have a minimal cpu.h that services exec.c but not cputlb.h. Backports commit dfccc7602374c9fd3b083208b552d62daa244811 from qemu	2018-02-23 10:52:25 -05:00
Peter Crosthwaite	97c9423ee8	cputlb: move CPU_LOOP() for tlb_reset() to exec.c To prepare for multi-arch, cputlb.c should only have awareness of one single architecture. This means it should not have access to the full CPU lists which may be heterogeneous. Instead, push the CPU_LOOP() up to the one and only caller in exec.c. Backports commit 9a13565d52bfd321934fb44ee004bbaf5f5913a8 from qemu	2018-02-23 10:46:31 -05:00
Paolo Bonzini	9479199c6b	memory: fix usage of find_next_bit and find_next_zero_bit The last two arguments to these functions are the last and first bit to check relative to the base. The code was using incorrectly the first bit and the number of bits. Fix this in cpu_physical_memory_get_dirty and cpu_physical_memory_all_dirty. This requires a few changes in the iteration; change the code in cpu_physical_memory_set_dirty_range to match. Backports commit 88c73d16ad1b6c22a2ab082064d0d521f756296a from qemu	2018-02-22 19:51:43 -05:00
Stefan Hajnoczi	e79e0881cd	memory: RCU ram_list.dirty_memory[] for safe RAM hotplug Although accesses to ram_list.dirty_memory[] use atomics so multiple threads can safely dirty the bitmap, the data structure is not fully thread-safe yet. This patch handles the RAM hotplug case where ram_list.dirty_memory[] is grown. ram_list.dirty_memory[] is changed from a regular bitmap to an RCU array of pointers to fixed-size bitmap blocks. Threads can continue accessing bitmap blocks while the array is being extended. See the comments in the code for an in-depth explanation of struct DirtyMemoryBlocks. I have tested that live migration with virtio-blk dataplane works. Backports commit 5b82b703b69acc67b78b98a5efc897a3912719eb from qemu	2018-02-22 15:38:03 -05:00
Alex Bennée	3da7d9d9ae	qemu-log: dfilter-ise exec, out_asm, op and opt_op This ensures the code generation debug code will honour -dfilter if set. For the "exec" tracing I've added a new inline macro for efficiency's sake. Backports commit d977e1c2dbc9e63454b2000f91954d02543bf43b from qemu	2018-02-22 10:06:19 -05:00
Peter Maydell	3f5e36e15f	qemu-log: Improve the exec TB execution logging Improve the TB execution logging so that it is easier to identify what is happening from trace logs: * move the "Trace" logging of executed TBs into cpu_tb_exec() so that it is emitted if and only if we actually execute a TB, and for consistency for the CPU state logging * log when we link two TBs together via tb_add_jump() * log when cpu_tb_exec() returns early from a chain of TBs The new style logging looks like this: Trace 0x7fb7cc822ca0 [ffffffc0000dce00] Linking TBs 0x7fb7cc822ca0 [ffffffc0000dce00] index 0 -> 0x7fb7cc823110 [ffffffc0000dce10] Trace 0x7fb7cc823110 [ffffffc0000dce10] Trace 0x7fb7cc823420 [ffffffc000302688] Trace 0x7fb7cc8234a0 [ffffffc000302698] Trace 0x7fb7cc823520 [ffffffc0003026a4] Trace 0x7fb7cc823560 [ffffffc0000dce44] Linking TBs 0x7fb7cc823560 [ffffffc0000dce44] index 1 -> 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc8235d0 [ffffffc0000dce70] Stopped execution of TB chain before 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc822fd0 [ffffffc0000dd52c] Backports commit 1a830635229e14c403600167823ea6b3b79d3097 from qemu	2018-02-22 09:40:11 -05:00
Pavel Fedin	0201c71145	Merge memory_region_init_reservation() into memory_region_init_io() Just specifying ops = NULL in some cases can be more convenient than having two functions. Backports commit 6d6d2abf2c2e52c0f404d0a31a963e945b0cc7ad from qemu	2018-02-21 11:23:00 -05:00
Fam Zheng	fa7d3e6cdb	memory: Drop MemoryRegion.ram_addr All references to mr->ram_addr are replaced by memory_region_get_ram_addr(mr) (except for a few assertions that are replaced with mr->ram_block). Backports commit 8e41fb63c5bf29ecabe0cee1239bf6230f19978a from qemu	2018-02-21 08:53:08 -05:00
Fam Zheng	2c1a72635d	memory: Implement memory_region_get_ram_addr with mr->ram_block Backports commit 7ebb2745acbb8d910eab07dc5f0aa01a4457703c from qemu	2018-02-21 08:53:08 -05:00
Gonglei	aa80edbef0	exec: Return RAMBlock pointer from allocating functions Previously we return RAMBlock.offset; now return the pointer to the whole structure. ram_block_add returns void now, error is completely passed with errp. Backports commit 528f46af6ecd1e300db18684969104d4067b867b from qemu	2018-02-21 08:52:57 -05:00
Gonglei	26951bf754	memory: Remove unreachable return statement Backports commit d61524486c6e503e502241a2ea834f930f98a6a1 from qemu	2018-02-20 20:54:24 -05:00
Gonglei	d25285bc78	memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length These two functions spend too much CPU time finding the RAMBlock by RAM address. After this patch, we can pass the RAMBlock pointer to them so that they don't need to find the RAMBlock anymore most of the time. We can get better performance in address translation processing. Backports commit 3655cb9c7375a595a8051ec677c515b24d5c1fe6 from qemu	2018-02-20 20:53:31 -05:00
Gonglei	39e4d63e68	exec: store RAMBlock pointer into memory region Each RAM memory region has a unique corresponding RAMBlock. In the current implementation, the memory region only stores the ram_addr, which is the offset into the RAM address space. We need to query the global ram.list to find the RAM block by ram_addr if we want to get the RAM block, which is very expensive. Now, we store the RAMBlock pointer in the memory region structure. So, if we know the mr, we can easily get the RAMBlock. Backports commit 58eaa2174e99d9a05172d03fd2799ab8fd9e6f60 from qemu	2018-02-20 20:43:32 -05:00
Lioncash	c658126845	include: Move RAMList to ramlist.h Moves the struct back into qemu's headers	2018-02-20 08:47:51 -05:00
Lioncash	cdd4003ce9	Move RAMBlock to ram_addr.h Moves it back into qemu's includes.	2018-02-20 08:35:44 -05:00
Paolo Bonzini	cbc56b3ceb	memory: add early bail out from cpu_physical_memory_set_dirty_range This condition is true in the common case, so we can cut out the body of the function. In addition, this makes it easier for the compiler to do at least partial inlining, even if it decides that fully inlining the function is unreasonable. Backports commit 8bafcb21643a39a5b29109f8bd5ee5a6f0f6850b from qemu	2018-02-20 08:32:10 -05:00
Lioncash	a268815478	include: Add stubbed xen function Will allow us to not comment out code all the time for xen checks (ideally)	2018-02-20 08:29:58 -05:00
Lioncash	6d5f465449	uc: Handle freeing of multiple address spaces	2018-02-18 21:36:50 -05:00
Dr. David Alan Gilbert	75701d03ee	qemu_ram_foreach_block: pass up error value, and down the ramblock name check the return value of the function it calls and error if it's non-0 Fixup qemu_rdma_init_one_block that is the only current caller, and rdma_add_block the only function it calls using it. Pass the name of the ramblock to the function; helps in debugging. Backports commit e3807054e20fb3b94d18cb751c437ee2f43b6fac from qemu	2018-02-18 19:17:18 -05:00
Peter Crosthwaite	b82e711a65	memory: Add address_space_init_shareable() This will either create a new AS or return a pointer to an already existing equivalent one, if we have already created an AS for the specified root memory region. The motivation is to reuse address spaces as much as possible. It's going to be quite common that bus masters out in device land have pointers to the same memory region for their mastering yet each will need to create its own address space. Let the memory API implement sharing for them. Aside from the perf optimisations, this should reduce the amount of redundant output on info mtree as well. The returned value will be malloced, but the malloc will be automatically freed when the AS runs out of refs. Backports commit f0c02d15b57da6f5463e3768aa0cfeedccf4b8f4 from qemu	2018-02-18 00:18:21 -05:00
Peter Maydell	1dfba71bef	exec.c: Add cpu_get_address_space() Add a function to return the AddressSpace for a CPU based on its numerical index. (Callers outside exec.c don't have access to the CPUAddressSpace struct so can't just fish it out of the CPUState struct directly.) Backports commit 651a5bc03705102de519ebf079a40ecc1da991db from qemu	2018-02-17 23:22:23 -05:00
Peter Maydell	2fe995a0da	exec.c: Pass MemTxAttrs to iotlb_to_region so it uses the right AS Pass the MemTxAttrs for the memory access to iotlb_to_region(); this allows it to determine the correct AddressSpace to use for the lookup. Backports commit a54c87b68a0410d0cf6f8b84e42074a5cf463732 from qemu	2018-02-17 23:19:00 -05:00
Peter Maydell	8edd6ffdfd	cputlb.c: Use correct address space when looking up MemoryRegionSection When looking up the MemoryRegionSection for the new TLB entry in tlb_set_page_with_attrs(), use cpu_asidx_from_attrs() to determine the correct address space index for the lookup, and pass it into address_space_translate_for_iotlb(). Backports commit d7898cda81b6efa6b2d7a749882695cdcf280eaa from qemu	2018-02-17 23:15:22 -05:00


140 commits