unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-12-24 13:35:30 +00:00

Author	SHA1	Message	Date
Peter Maydell	36cd9f0df0	cpu_ldst.h: Drop unused ld/st*_kernel defines The ld*_kernel and st*_kernel defines are not used anywhere; delete them. Backports commit 5a0826f7d2f9bea6e02157985b103d0a4c458aaa from qemu	2019-04-22 06:54:26 -04:00
Lioncash	830756a725	gen-icount: Use tcg_ctx where applicable in commented out code If this is ever used in the future, it'll already be able to be used.	2019-04-22 06:17:10 -04:00
Lioncash	d844d7cc9d	exec: Backport tb_cflags accessor	2019-04-22 06:12:59 -04:00
Lioncash	9f0e469142	gen-icount: Synchronize with qemu	2019-04-22 05:53:46 -04:00
David Hildenbrand	8583c8f1f6	include/exec/helper-head.h: support "const void *" in helper calls Especially when dealing with out-of-line gvec helpers, it is often helpful to specify some vector pointers as constant. E.g. when we have two inputs and one output, marking the two inputs as const pointers helps to avoid bugs. Const pointers can be specified via "cptr", however behave in TCG just like ordinary pointers. We can specify helpers like: DEF_HELPER_FLAGS_4(gvec_vbperm, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32) void HELPER(gvec_vbperm)(void *v1, const void *v2, const void *v3, uint32_t desc) And make sure that here, only v1 will be written (as long as const is not cast away, of course). Backports commit 8c6edfdd90522caa4fc429144d393aba5b99f584 from qemu	2019-02-22 19:12:09 -05:00
Emilio G. Cota	1b44fd94ac	exec-all: document that tlb_fill can trigger a TLB resize Backports commit ae56a2ff92ac73782279abf8857585c34b15f509 from qemu	2019-02-12 11:38:28 -05:00
Richard Henderson	9c2a5963d0	exec: Add target-specific tlb bits to MemTxAttrs These bits can be used to cache target-specific data in cputlb read from the page tables. Backports commit d3765835ed02f91f0c6cbb452874209a6af4a730 from qemu	2019-02-05 17:00:56 -05:00
Lioncash	29d84a9296	target: Resolve repeated typedef warnings	2019-01-22 20:27:35 -05:00
Richard Henderson	80b4bef1cc	tcg: Add TCG_CALL_NO_RETURN Remember which helpers have been marked noreturn. Backports commit 15d7409260498505e991e7b9d87118627165e613 from qemu	2019-01-05 06:35:21 -05:00
Emilio G. Cota	308f4c1e0c	include: move exec/tb-hash-xx.h to qemu/xxhash.h Backports commit fe656e3185fa10973d43492c867643e80fa433cd from qemu	2018-12-18 06:07:55 -05:00
Emilio G. Cota	63082a4d20	exec: introduce qemu_xxhash{2,4,5,6,7} Before moving them all to include/qemu/xxhash.h. Backports commit c971d8fa73ff92996d751fa87d90f220cf3c8194 from qemu	2018-12-18 06:04:57 -05:00
Peter Maydell	1301becdab	tcg: Support MMU protection regions smaller than TARGET_PAGE_SIZE Add support for MMU protection regions that are smaller than TARGET_PAGE_SIZE. We do this by marking the TLB entry for those pages with a flag TLB_RECHECK. This flag causes us to always take the slow-path for accesses. In the slow path we can then special case them to always call tlb_fill() again, so we have the correct information for the exact address being accessed. This change allows us to handle reading and writing from small regions; we cannot deal with execution from the small region. Backports commit 55df6fcf5476b44bc1b95554e686ab3e91d725c5 from qemu	2018-11-16 21:35:54 -05:00
Lioncash	3a0ab1a64a	Partial backport of: exec.c: Handle IOMMUs in address_space_translate_for_iotlb() We just want the parameter changes here. Partial backport of commit 1f871c5e6b0f30644a60a81a6a7aadb3afb030ac from qemu	2018-11-16 21:24:55 -05:00
Marc-André Lureau	fc354aa464	memory: learn about non-volatile memory region Add a new flag to mark memory regions that are used as non-volatile, by NVDIMM for example. That bit is propagated down to the flat view, and reflected in HMP info mtree with a "nv-" prefix on the memory type. This way, guest_phys_blocks_region_add() can skip the NV memory regions for dumps and TCG memory clear in a following patch. Backports commit c26763f8ec70b1011098cab0da9178666d8256a5 from qemu	2018-11-11 08:50:39 -05:00
Li Qiang	b79f16c331	memory.h: fix typos in comments Backports commit 847b31f0d608bfcbc9ea11d5013ae62e956f32cd from qemu	2018-11-11 07:31:35 -05:00
Emilio G. Cota	1677898a09	cputlb: read CPUTLBEntry.addr_write atomically Updates can come from other threads, so readers that do not take tlb_lock must use atomic_read to avoid undefined behaviour (UB). This completes the conversion to tlb_lock. This conversion results on average in no performance loss, as the following experiments (run on an Intel i7-6700K CPU @ 4.00GHz) show. 1. aarch64 bootup+shutdown test: - Before: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs): 7487.087786 task-clock (msec) # 0.998 CPUs utilized ( +- 0.12% ) 31,574,905,303 cycles # 4.217 GHz ( +- 0.12% ) 57,097,908,812 instructions # 1.81 insns per cycle ( +- 0.08% ) 10,255,415,367 branches # 1369.747 M/sec ( +- 0.08% ) 173,278,962 branch-misses # 1.69% of all branches ( +- 0.18% ) 7.504481349 seconds time elapsed ( +- 0.14% ) - After: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs): 7462.441328 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% ) 31,478,476,520 cycles # 4.218 GHz ( +- 0.07% ) 57,017,330,084 instructions # 1.81 insns per cycle ( +- 0.05% ) 10,251,929,667 branches # 1373.804 M/sec ( +- 0.05% ) 173,023,787 branch-misses # 1.69% of all branches ( +- 0.11% ) 7.474970463 seconds time elapsed ( +- 0.07% ) 2. SPEC06int: [Figure: SPEC06int (test set); Y axis: speedup over master; series: tlb-lock-v2 (mutex) and tlb-lock-v3 (spinlock), per benchmark plus geomean; png: https://imgur.com/a/BHzpPTW] Notes: - tlb-lock-v2 corresponds to an implementation with a mutex. - tlb-lock-v3 corresponds to the current implementation, i.e. a spinlock and a single lock acquisition in tlb_set_page_with_attrs. Backports commit 403f290c0603f35f2d09c982bf5549b6d0803ec1 from qemu	2018-10-23 15:37:43 -04:00
Richard Henderson	c911ea7128	tcg: Add tlb_index and tlb_entry helpers Isolate the computation of an index from an address into a helper before we change that function. Backports commit 383beda9cf32f795616c3b93f7d6154d70372d4b from qemu	2018-10-23 15:04:27 -04:00
Emilio G. Cota	dfb3954571	exec: introduce tlb_init Paves the way for the addition of a per-TLB lock. Backports commit 5005e2537d090bee87aca3b924dcd17920fd146a from qemu	2018-10-23 14:41:29 -04:00
Peter Maydell	01683fe97e	memory: Remove old_mmio accessors Now that all the users of old_mmio MemoryRegion accessors have been converted, we can remove the core code support. Backports commit 62a0db942dec6ebfec19aac2b604737d3c9a2d75 from qemu	2018-10-04 04:45:30 -04:00
Junyan He	6ead2c3d1f	memory, exec: Expose all memory block related flags. We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h. Backports commit b0e5de93811077254a536c23b713b49e12efb742 from qemu	2018-08-22 13:00:05 -04:00
Peter Maydell	6543f9ea26	tcg: Define and use new tlb_hit() and tlb_hit_page() functions The condition to check whether an address has hit against a particular TLB entry is not completely trivial. We do this in various places, and in fact in one place (get_page_addr_code()) we have got the condition wrong. Abstract it out into new tlb_hit() and tlb_hit_page() inline functions (one for a known-page-aligned address and one for an arbitrary address), and use them in all the places where we had the condition correct. This is a no-behaviour-change patch; we leave fixing the buggy code in get_page_addr_code() to a subsequent patch. Backports commit 334692bce7f0653a93b8d84ecde8c847b08dec38 from qemu	2018-07-03 19:21:36 -04:00
Peter Maydell	8295b228e3	bswap: Add new stn_*_p() and ldn_*_p() memory access functions There's a common pattern in QEMU where a function needs to perform a data load or store of an N byte integer in a particular endianness. At the moment this is handled by doing a switch() on the size and calling the appropriate ld*_p or st*_p function for each size. Provide a new family of functions ldn_*_p() and stn_*_p() which take the size as an argument and do the switch() themselves. Backports commit afa4f6653dca095f63f3fe7f2001e9334f5676c1 from qemu	2018-06-15 12:17:21 -04:00
Peter Maydell	61a7ac6948	cpu-defs.h: Document CPUIOTLBEntry 'addr' field The 'addr' field in the CPUIOTLBEntry struct has a rather non-obvious use; add a comment documenting it (reverse-engineered from what the code that sets it is doing). Backports commit ace4109011b4912b24e76f152e2cf010e78819c5 from qemu	2018-06-15 12:07:39 -04:00
Peter Maydell	7a6ae26346	cputlb: Pass cpu_transaction_failed() the correct physaddr The API for cpu_transaction_failed() says that it takes the physical address for the failed transaction. However we were actually passing it the offset within the target MemoryRegion. We don't currently have any target CPU implementations of this hook that require the physical address; fix this bug so we don't get confused if we ever do add one. Backports commit 2d54f19401bc54b3b56d1cc44c96e4087b604b97 from qemu	2018-06-15 12:03:23 -04:00
Richard Henderson	10e2b13650	tcg: Pass tb and index to tcg_gen_exit_tb separately Do the cast to uintptr_t within the helper, so that the compiler can type check the pointer argument. We can also do some more sanity checking of the index argument. Backports commit 07ea28b41830f946de3841b0ac61a3413679feb9 from qemu	2018-06-07 11:56:32 -04:00
Richard Henderson	533a3f6a6c	tcg: Fix helper function vs host abi for float16 Depending on the host abi, float16, aka uint16_t, values are passed and returned either zero-extended in the host register or with garbage at the top of the host register. The tcg code generator has so far been assuming garbage, as that matches the x86 abi, but this is incorrect for other host abis. Further, target/arm has so far been assuming zero-extended results, so that it may store the 16-bit value into a 32-bit slot with the high 16-bits already clear. Rectify both problems by mapping "f16" in the helper definition to uint32_t instead of (a typedef for) uint16_t. This forces the host compiler to assume garbage in the upper 16 bits on input and to zero-extend the result on output. Backports commit 6c2be133a7478e443c99757b833d0f265c48e0a6 from qemu	2018-06-02 10:10:12 -04:00
Richard Henderson	1730d3cff0	target/arm: Implement SVE Integer Multiply-Add Group Backports commit 96a36e4a44bbf296ac212ed68ebf4e48d3dfb1f0 from qemu	2018-05-20 04:35:36 -04:00
Emilio G. Cota	d26bf1d446	translator: merge max_insns into DisasContextBase While at it, use int for both num_insns and max_insns to make sure we have same-type comparisons. Backports commit b542683d77b4f56cef0221b267c341616d87bce9 from qemu	2018-05-11 13:59:17 -04:00
Pavel Dovgalyuk	b4bf3c776b	icount: fix cpu_restore_state_from_tb for non-tb-exit cases In icount mode, instructions that access io memory spaces in the middle of the translation block invoke TB recompilation. After recompilation, such instructions become last in the TB and are allowed to access io memory spaces. When the code includes an instruction like i386 'xchg eax, 0xffffd080' which accesses APIC, QEMU goes into an infinite recompilation loop. This instruction includes two memory accesses - one read and one write. After the first access, APIC calls cpu_report_tpr_access, which restores the CPU state to get the current eip. But cpu_restore_state_from_tb resets the cpu->can_do_io flag which makes the second memory access invalid. Therefore the second memory access causes a recompilation of the block. Then these operations repeat again and again. This patch moves resetting the cpu->can_do_io flag from cpu_restore_state_from_tb to the cpu_loop_exit* functions. It also adds a parameter for cpu_restore_state which controls restoring icount. There is no need to restore icount when we only query CPU state without breaking the TB. Restoring it in such cases leads to the incorrect flow of the virtual time. In most cases the new parameter is true (icount should be recalculated). But there are two cases in i386 and openrisc when the CPU state is only queried without the need to break the TB. This patch fixes both of these cases. Backports commit afd46fcad2dceffda35c0586f5723c127b6e09d8 from qemu	2018-04-11 20:05:40 -04:00
Alex Bennée	4074587775	accel/tcg/translate-all: expand cpu_restore_state addr check We are still seeing signals during translation time when we walk over a page protection boundary. This expands the check to ensure the host PC is inside the code generation buffer. The original suggestion was to check versus tcg_ctx.code_gen_ptr but as we now segment the translation buffer we have to settle for just a general check for being inside. I've also fixed up the declaration to make it clear it can deal with invalid addresses. A later patch will fix up the call sites. Backports commit d25f2a72272b9ffe0d06710d6217d1169bc2cc7d from qemu	2018-04-11 19:53:57 -04:00
Bharata B Rao	e373c001fa	cpu: Add Error argument to cpu_exec_init() Add an Error argument to cpu_exec_init() to let users collect the error. This is in preparation to change the CPU enumeration logic in cpu_exec_init(). With the new enumeration logic, cpu_exec_init() can fail if cpu_index values corresponding to max_cpus have already been handed out. Since all current callers of cpu_exec_init() are from instance_init, use error_abort Error argument to abort in case of an error. Backports commit 5a790cc4b942e651fec7edc597c19b637fad5a76 from qemu	2018-03-21 07:50:33 -04:00
Peter Crosthwaite	ce1831bfb4	target-*: Don't redefine cpu_exec() This function needs to be converted to QOM hook and virtualised for multi-arch. This rename interferes, as cpu-qom will not have access to the renaming causing name divergence. This rename doesn't really do anything anyway so just delete it. Backports commit 8642c1b81e0418df066a7960a7426d85a923a253 from qemu	2018-03-20 07:02:47 -04:00
Richard Henderson	31e93018f3	tcg: Allow 6 arguments to TCG helpers We already handle this in the backends, and the lifetime datum for the TCGOp is already large enough. Backports commit 1df3caa946e08b387511dfba3a37d78910e51796 from qemu	2018-03-17 18:29:04 -04:00
Lioncash	a81439c7ca	exec: Drop unnecessary code for unicorn The dirty memory code isn't strictly necessary	2018-03-12 10:11:46 -04:00
Alexey Kardashevskiy	b90333a531	memory: Share special empty FlatView This shares a cached empty FlatView among address spaces. The empty FV is used every time a root MR renders into a FV without memory sections, which happens when the MR or its children are not enabled or zero-sized. The empty_view is not NULL to keep the rest of memory API intact; it also has a dispatch tree for the same reason. On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this halves the amount of FlatView's in use (557 -> 260) and dispatch tables (~800000 -> ~370000). In an unrelated experiment with 112 non-virtio devices on x86 ("-M pc"), only 4 FlatViews are alive, and about 2000 are created at startup. Backports commit 092aa2fc65b7a35121616aad8f39d47b8f921618 from qemu	2018-03-11 22:34:28 -04:00
Alexey Kardashevskiy	1fd8b64072	memory: Get rid of address_space_init_shareable Since FlatViews are shared now and ASes not, this gets rid of address_space_init_shareable(). This should cause no behavioural change. Backports commit b516572f31c0ea0937cd9d11d9bd72dd83809886 from qemu	2018-03-11 22:12:38 -04:00
Alexey Kardashevskiy	d9bc1bcc8c	memory: Rename mem_begin/mem_commit/mem_add helpers This renames some helpers to reflect better what they do. This should cause no behavioural change. Backports commit 8629d3fcb77e9775e44d9051bad0fb5187925eae from qemu	2018-03-11 21:36:50 -04:00
Alexey Kardashevskiy	aa2b76b4e8	memory: Switch memory from using AddressSpace to FlatView FlatView's will be shared between AddressSpace's and subpage_t and MemoryRegionSection cannot store AS anymore, hence this change. In particular, for: typedef struct subpage_t { MemoryRegion iomem; - AddressSpace *as; + FlatView *fv; hwaddr base; uint16_t sub_section[]; } subpage_t; struct MemoryRegionSection { MemoryRegion *mr; - AddressSpace *address_space; + FlatView *fv; hwaddr offset_within_region; Int128 size; hwaddr offset_within_address_space; bool readonly; }; This should cause no behavioural change. Backports commit 166206845f7fd75e720e6feea0bb01957c8da07f from qemu	2018-03-11 21:21:37 -04:00
Lioncash	1591f208c0	memory: Move AddressSpaceDispatch from AddressSpace to FlatView As we are going to share FlatView's between AddressSpace's, and AddressSpaceDispatch is a structure to perform quick lookup in FlatView, this moves ASD to FlatView. After previously open coded ASD rendering, we can also remove as->next_dispatch as the new FlatView pointer is stored on a stack and set to an AS atomically. flatview_destroy() is executed under RCU instead of address_space_dispatch_free() now. This makes mem_begin/mem_commit work with ASD and mem_add with FV, as later on mem_add will be taking FV as an argument anyway. This should cause no behavioural change. Backports commit 66a6df1dc6d5b28cc3e65db0d71683fbdddc6b62 from qemu	2018-03-11 20:40:24 -04:00
Alex Bennée	e56ed38819	include/exec/helper-head.h: support f16 in helper calls This allows us to explicitly pass float16 to helpers rather than assuming uint32_t and dealing with the result. Of course they will be passed in i32 sized registers by default. Backports commit 35737497008aeabce5dc381a41d3827bec486192 from qemu	2018-03-08 12:28:05 -05:00
Paolo Bonzini	c88064b52c	memory: remove memory_region_test_and_clear_dirty It is unused after g364fb has been converted to use DirtyBitmapSnapshot. Backports commit 77302fb5df05ffca9f41b5b54e3b67c601719d57 from qemu	2018-03-08 09:02:06 -05:00
Laurent Vivier	0aecb15f3b	accel/tcg: add size parameter in tlb_fill() The MC68040 MMU provides the size of the access that triggers the page fault. This size is set in the Special Status Word which is written in the stack frame of the access fault exception. So we need the size in m68k_cpu_unassigned_access() and m68k_cpu_handle_mmu_fault(). To be able to do that, this patch modifies the prototype of handle_mmu_fault handler, tlb_fill() and probe_write(). do_unassigned_access() already includes a size parameter. This patch also updates handle_mmu_fault handlers and tlb_fill() of all targets (only parameter, no code change). Backports commit 98670d47cd8d63a529ff230fd39ddaa186156f8c from qemu	2018-03-06 10:56:34 -05:00
Richard Henderson	7fe5f620df	tcg: Dynamically allocate TCGOps With no fixed array allocation, we can't overflow a buffer. This will be important as optimizations related to host vectors may expand the number of ops used. Use QTAILQ to link the ops together. Backports commit 15fa08f8451babc88d733bd411d4c94976f9d0f8 from qemu	2018-03-05 16:34:40 -05:00
Peter Xu	1bb34aadf9	cpu: refactor cpu_address_space_init() Normally we create an address space for that CPU and pass that address space into the function. Let's just do it inside to unify address space creations. It'll simplify my next patch to rename those address spaces. Backports commit 80ceb07a83375e3a0091591f96bd47bce2f640ce from qemu	2018-03-05 14:39:25 -05:00
Marc-André Lureau	ffa45adb57	memory: remove unused memory_region_set_global_locking() This was never used since its introduction in commit 196ea13104f8 ("memory: Add global-locking property to memory regions"). Backports commit e2fbe20851ceec5ccd7b539a89db0420393fb85d from qemu	2018-03-05 14:14:43 -05:00
Richard Henderson	d450156414	tcg: Remove GET_TCGV_* and MAKE_TCGV_* The GET and MAKE functions weren't really specific enough. We now have a full complement of functions that convert exactly between temporaries, arguments, tcgv pointers, and indices. The target/sparc change is also a bug fix, which would have affected a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64. Backports commit dc41aa7d34989b552efe712ffe184236216f960b from qemu	2018-03-05 09:12:26 -05:00
Richard Henderson	2bb5011b18	tcg: Introduce tcgv_{i32,i64,ptr}_{arg,temp} Transform TCGv_* to an "argument" or a temporary. For now, an argument is simply the temporary index. Backports commit ae8b75dc6ec808378487064922f25f1e7ea7a9be from qemu	2018-03-05 08:46:12 -05:00
Emilio G. Cota	8552d95c52	exec-all: extract tb->tc_* into a separate struct tc_tb In preparation for adding tc.size to be able to keep track of TB's using the binary search tree implementation from glib. Backports commit e7e168f41364c6e83d0f75fc1b3ce7f9c41ccf76 from qemu	2018-03-05 02:57:22 -05:00
Emilio G. Cota	5fc83f3eb2	exec-all: introduce TB_PAGE_ADDR_FMT And fix the following warning when DEBUG_TB_INVALIDATE is enabled in translate-all.c: CC mipsn32-linux-user/accel/tcg/translate-all.o /data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’: /data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=] printf("protecting code page: 0x" TARGET_FMT_lx "\n", ^ cc1: all warnings being treated as errors /data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed make[1]: *** [accel/tcg/translate-all.o] Error 1 Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed make: *** [subdir-mipsn32-linux-user] Error 2 cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$ Backports commit 67a5b5d2f6eb6d3b980570223ba5c478487ddb6f from qemu	2018-03-05 02:49:44 -05:00
Emilio G. Cota	b4a7d8b773	exec-all: bring tb->invalid into tb->cflags This gets rid of a hole in struct TranslationBlock. Backports commit 84f1c148da2b35fbb5a436597872765257e8914e from qemu	2018-03-05 02:46:21 -05:00
Emilio G. Cota	210d13ec49	tcg: consolidate TB lookups in tb_lookup__cpu_state This avoids duplicating code. cpu_exec_step will also use the new common function once we integrate parallel_cpus into tb->cflags. Note that in this commit we also fix a race, described by Richard Henderson during review. Think of this scenario with threads A and B: (A) Lookup succeeds for TB in hash without tb_lock (B) Sets the TB's tb->invalid flag (B) Removes the TB from tb_htable (B) Clears all CPU's tb_jmp_cache (A) Store TB into local tb_jmp_cache Given that order of events, (A) will keep executing that invalid TB until another flush of its tb_jmp_cache happens, which in theory might never happen. We can fix this by checking the tb->invalid flag every time we look up a TB from tb_jmp_cache, so that in the above scenario, next time we try to find that TB in tb_jmp_cache, we won't, and will therefore be forced to look it up in tb_htable. Performance-wise, I measured a small improvement when booting debian-arm. Note that inlining pays off: Performance counter stats for 'taskset -c 0 qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=jessie.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): Before: 18714.917392 task-clock # 0.952 CPUs utilized ( +- 0.95% ) 23,142 context-switches # 0.001 M/sec ( +- 0.50% ) 1 CPU-migrations # 0.000 M/sec 10,558 page-faults # 0.001 M/sec ( +- 0.95% ) 53,957,727,252 cycles # 2.883 GHz ( +- 0.91% ) [83.33%] 24,440,599,852 stalled-cycles-frontend # 45.30% frontend cycles idle ( +- 1.20% ) [83.33%] 16,495,714,424 stalled-cycles-backend # 30.57% backend cycles idle ( +- 0.95% ) [66.66%] 76,267,572,582 instructions # 1.41 insns per cycle 12,692,186,323 branches # 678.186 M/sec ( +- 0.92% ) [83.35%] 263,486,879 branch-misses # 2.08% of all branches ( +- 0.73% ) [83.34%] 19.648474449 seconds time elapsed ( +- 0.82% ) After, w/ inline (this patch): 18471.376627 task-clock # 0.955 CPUs utilized ( +- 0.96% ) 23,048 context-switches # 0.001 M/sec ( +- 0.48% ) 1 CPU-migrations # 0.000 M/sec 10,708 page-faults # 0.001 M/sec ( +- 0.81% ) 53,208,990,796 cycles # 2.881 GHz ( +- 0.98% ) [83.34%] 23,941,071,673 stalled-cycles-frontend # 44.99% frontend cycles idle ( +- 0.95% ) [83.34%] 16,161,773,848 stalled-cycles-backend # 30.37% backend cycles idle ( +- 0.76% ) [66.67%] 75,786,269,766 instructions # 1.42 insns per cycle 12,573,617,143 branches # 680.708 M/sec ( +- 1.34% ) [83.33%] 260,235,550 branch-misses # 2.07% of all branches ( +- 0.66% ) [83.33%] 19.340502161 seconds time elapsed ( +- 0.56% ) After, w/o inline: 18791.253967 task-clock # 0.954 CPUs utilized ( +- 0.78% ) 23,230 context-switches # 0.001 M/sec ( +- 0.42% ) 1 CPU-migrations # 0.000 M/sec 10,563 page-faults # 0.001 M/sec ( +- 1.27% ) 54,168,674,622 cycles # 2.883 GHz ( +- 0.80% ) [83.34%] 24,244,712,629 stalled-cycles-frontend # 44.76% frontend cycles idle ( +- 1.37% ) [83.33%] 16,288,648,572 stalled-cycles-backend # 30.07% backend cycles idle ( +- 0.95% ) [66.66%] 77,659,755,503 instructions # 1.43 insns per cycle 12,922,780,045 branches # 687.702 M/sec ( +- 1.06% ) [83.34%] 261,962,386 branch-misses # 2.03% of all branches ( +- 0.71% ) [83.35%] 19.700174670 seconds time elapsed ( +- 0.56% ) Backports commit f6bb84d53110398f4899c19dab4e0fe9908ec060 from qemu	2018-03-05 02:42:46 -05:00
Emilio G. Cota	68ddc0cb08	exec-all: fix typos in TranslationBlock's documentation Backports commit eb5e2b9e3b141de0c435eedc31c26cbbdefbee1b from qemu	2018-03-05 02:10:28 -05:00
Richard Henderson	31b8b67cd3	tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional function tb_target_set_jmp_target. While we're touching all backends, add a parameter for tb->tc_ptr; we're going to need it shortly for some backends. Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c. Backports commit a85833933628384d74ec412024d55cf012640287 from qemu	2018-03-04 21:52:35 -05:00
Lluís Vilanova	ed7225e685	tcg: Add generic translation framework Backports commit bb2e0039dc07177f928f9fe24758967da02d60a2 from qemu	2018-03-04 14:31:16 -05:00
Paolo Bonzini	6997a5a090	gen-icount: check cflags instead of use_icount global Backports commit cd42d5b23691ad73edfd6dbcfc935a960a9c5a65 from qemu	2018-03-04 14:26:26 -05:00
Lluís Vilanova	3a196c62ae	target: [tcg] Use a generic enum for DISAS_ values Used later. An enum makes expected values explicit and bounds the value space of switches. Backports commit 77fc6f5e28667634916f114ae04c6029cd7b9c45 from qemu	2018-03-04 14:08:43 -05:00
Richard Henderson	b8a16f841a	tcg: Add generic DISAS_NORETURN This will allow some amount of cleanup to happen before switching the backends over to enum DisasJumpType. Backports commit 5dc66895b0113034cd37fd5e65911d7959fc26a9 from qemu	2018-03-04 13:49:18 -05:00
Peter Maydell	26c8f31d9e	memory.h: Move MemTxResult type to memattrs.h Move the MemTxResult type to memattrs.h. We're going to want to use it in cpu/qom.h, which doesn't want to include all of memory.h. In practice MemTxResult and MemTxAttrs are pretty closely linked since both are used for the new-style read_with_attrs and write_with_attrs callbacks, so memattrs.h is a reasonable home for this rather than creating a whole new header file for it. Backports commit 3114d092b1740f9db9aa559aeb48ee387011e1da from qemu	2018-03-04 13:10:47 -05:00
Alexey Kardashevskiy	e723b8dd49	memory: Open code FlatView rendering We are going to share FlatView's between AddressSpace's and per-AS memory listeners won't suit the purpose anymore so open code the dispatch tree rendering. Since there is a good chance that dispatch_listener was the only listener, this avoids address_space_update_topology_pass() if there are no registered listeners; this should improve starting time. This should cause no behavioural change. Backports commit 1b04a1580917d9e41fd37ca62cbff9b4bf061e96 from qemu	2018-03-04 02:06:48 -05:00
Lluís Vilanova	32b3c3815d	tcg: Pass generic CPUState to gen_intermediate_code() Needed to implement a target-agnostic gen_intermediate_code() in the future. Backports commit 9c489ea6bed134fecfd556b439c68bba48fbe102 from qemu	2018-03-03 23:34:18 -05:00
Richard Henderson	fc52eea5e2	tcg: Expand glue macros before stringifying helper names Backports commit 44368ac62dc5ba014b68b2c1a8ec6fedc3242a5d from qemu	2018-03-03 23:07:21 -05:00
Alex Bennée	7d02489baf	include/exec/exec-all: document common exit conditions As a precursor to later patches attempt to come up with a more concrete wording for what each of the common exit cases would be. Backports commit df0311e634828fdc99ca59352aef68503d631aad from qemu	2018-03-03 22:31:28 -05:00
Peter Maydell	3bd5694a0a	memory: Rename memory_region_init_rom() and _rom_device() to _nomigrate() Rename memory_region_init_rom() to memory_region_init_rom_nomigrate() and memory_region_init_rom_device() to memory_region_init_rom_device_nomigrate(). Backports commit b59821a95bd1d7cb4697fd7748725c910582e0e7 from qemu	2018-03-03 22:29:01 -05:00
Peter Maydell	7b0027a828	memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate() Rename memory_region_init_ram() to memory_region_init_ram_nomigrate(). This leaves the way clear for us to provide a memory_region_init_ram() which does handle migration. Backports commit 1cfe48c1ce219b60a9096312f7a61806fae64ab3 from qemu	2018-03-03 22:25:39 -05:00
Peter Maydell	152c56f6a9	memory: Document that the RAM MR initializers do not handle migration The various functions for initializing RAM MemoryRegions do not do anything to cause the data in the MemoryRegion to be migrated. Note in their documentation comments that this is the responsibility of the caller. (We will shortly add a new function that does do this for you.) Backports commit a5c0234bb2754f5248e67929a34c843dbe039da5 from qemu	2018-03-03 22:20:32 -05:00
Pranith Kumar	d0a70720a3	Revert "exec.c: Fix breakpoint invalidation race" Now that we have proper locking after MTTCG patches have landed, we can revert the commit. This reverts commit a9353fe897ca2687e5b3385ed39e3db3927a90e0. Backports commit 406bc339b0505fcfc2ffcbca1f05a3756e338a65 from qemu	2018-03-03 22:14:35 -05:00
Yang Zhong	1135db176f	tcg: add CONFIG_TCG guards in headers Add CONFIG_TCG around TLB-related functions and structure declarations. Some of these functions are defined in ./accel/tcg/cputlb.c, which will not be linked in if TCG is disabled, and have no stubs; therefore, their callers will also be compiled out for --disable-tcg. Backports commit b11ec7f2e44b285a3967d629b55d1a6970b06787 from qemu	2018-03-03 21:37:52 -05:00
Yang Zhong	d70c141675	tcg: move page_size_init() function translate-all.c will be disabled if tcg is disabled in the build, so page_size_init() function and related variables will be moved to exec.c file. Backports commit a0be0c585f5dcc4d50a37f6a20d3d625c5ef3a2c from qemu	2018-03-03 21:30:08 -05:00
Thomas Huth	cf5d583ef0	cpu: Introduce a wrapper for tlb_flush() that can be used in common code Commit 1f5c00cfdb8114c ("qom/cpu: move tlb_flush to cpu_common_reset") moved the call to tlb_flush() from the target-specific reset handlers into the common code qom/cpu.c file, and protected the call with "#ifdef CONFIG_SOFTMMU" to avoid that it is called for linux-user only targets. But since qom/cpu.c is common code, CONFIG_SOFTMMU is never defined here, so the tlb_flush() was simply never executed anymore. Fix it by introducing a wrapper for tlb_flush() in a file that is re-compiled for each target, i.e. in translate-all.c. Backports commit 2cd53943115be5118b5b2d4b80ee0a39c94c4f73 from qemu	2018-03-03 21:24:55 -05:00
Emilio G. Cota	1a4e5da043	gen-icount: use tcg_ctx.tcg_env instead of cpu_env We are relying on cpu_env being defined as a global, yet most targets (i.e. all but arm/a64) have it defined as a local variable. Luckily all of them use the same "cpu_env" name, but really compilation shouldn't break if the name of that local variable changed. Fix it by using tcg_ctx.tcg_env, which all targets set in their translate_init function. This change also helps paving the way for the upcoming "translation loop common to all targets" work. Backports commit 53f6672bcf57d82b794a2cc3a3469be7d35c8653 from qemu	2018-03-03 21:08:58 -05:00
Richard Henderson	68275ba6f3	tcg/arm: Use indirect branch for goto_tb Backports commit 3fb53fb4d12f2e7833bd1659e6013237b130ef20 from qemu	2018-03-03 17:11:18 -05:00
Emilio G. Cota	d3ada2feb5	tcg: allocate TB structs before the corresponding translated code Allocating an arbitrarily-sized array of tbs results in either (a) a lot of memory wasted or (b) unnecessary flushes of the code cache when we run out of TB structs in the array. An obvious solution would be to just malloc a TB struct when needed, and keep the TB array as an array of pointers (recall that tb_find_pc() needs the TB array to run in O(log n)). Perhaps a better solution, which is implemented in this patch, is to allocate TB's right before the translated code they describe. This results in some memory waste due to padding to have code and TBs in separate cache lines--for instance, I measured 4.7% of padding in the used portion of code_gen_buffer when booting aarch64 Linux on a host with 64-byte cache lines. However, it can allow for optimizations in some host architectures, since TCG backends could safely assume that the TB and the corresponding translated code are very close to each other in memory. See this message by rth for a detailed explanation: https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html Subject: Re: GSoC 2017 Proposal: TCG performance enhancements Backports commit 6e3b2bfd6af488a896f7936e99ef160f8f37e6f2 from qemu	2018-03-03 17:05:49 -05:00
Emilio G. Cota	7d0440dec4	tb-hash: improve tb_jmp_cache hash function in user mode Optimizations to cross-page chaining and indirect branches make performance more sensitive to the hit rate of tb_jmp_cache. The constraint of reserving some bits for the page number lowers the achievable quality of the hashing function. However, user-mode does not have this requirement. Thus, with this change we use for user-mode a hashing function that is both faster and of better quality than the previous one. Measurements: Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0. - SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz [ASCII bar chart omitted: speedup (y-axis, 0.8x to 2.2x) for jr, jr+multhash and jr+hash across astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean] png: http://imgur.com/4UXTrEc Here I also tried the hash function suggested by Paolo ("multhash"): return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1); As you can see it is just as good as the other new function ("hash"), which is what I ended up going with. - SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz [ASCII bar chart omitted: speedup (y-axis, 1x to 2.6x) for jr and jr+hash across the same SPECint06 benchmarks and hmean] png: http://imgur.com/ArCbHqo - NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz [ASCII bar chart omitted: speedup (y-axis, 0.96x to 1.12x) for jr and jr+hash across ASSIGNMENT, BITFIELD, FOURIER, FP EMULATION, HUFFMAN, LU DECOMPOSITION, NEURAL NET, NUMERIC SORT, STRING SORT and hmean] png: http://imgur.com/ZXFX0hJ - NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz [ASCII bar chart omitted: speedup (y-axis, 0.9x to 1.3x) for jr and jr+hash across the same NBench tests and hmean] png: http://imgur.com/FfD27ey Backports commit 6f1653180f5701c6a8f1b35b89a80b1e3260928e from qemu	2018-03-03 14:11:29 -05:00
Emilio G. Cota	8f4f15e5f5	tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr Instead of exporting goto_ptr directly to TCG frontends, export tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer returned by the lookup_tb_ptr() helper. This is the only use case we have for goto_ptr and lookup_tb_ptr, so having this function is very convenient. Furthermore, it trivially allows us to avoid calling the lookup helper if goto_ptr is not implemented by the backend. Backports commit cedbcb01529cb6cf9a2289cdbebbc63f6149fc18 from qemu	2018-03-02 21:05:18 -05:00
Peter Xu	fce1b469e5	memory: tune last param of iommu_ops.translate() This patch converts the old "is_write" bool into IOMMUAccessFlags. The difference is that "is_write" can only express either read/write, but sometimes what we really want is "none" here (neither read nor write). Replay is a good example - during replay, we should not check any RW permission bits since that's not an actual IO at all. Backports commit bf55b7afce53718ef96f4e6616da62c0ccac37dd from qemu	2018-03-02 18:59:12 -05:00
Paolo Bonzini	c27870520a	exec: revert MemoryRegionCache MemoryRegionCache did not know about virtio support for IOMMUs (because the two features were developed at the same time). Revert MemoryRegionCache to "normal" address_space_* operations for 2.9, as it is simpler than undoing the virtio patches. Backports commit 90c4fe5fc517a045e7a7cf2f23472e114042ca29 from qemu	2018-03-02 14:30:41 -05:00
Dr. David Alan Gilbert	55d79cf4c0	RAMBlocks: qemu_ram_is_shared Provide a helper to say whether a RAMBlock was created as a shared mapping. Backports commit 463a4ac23bcf0f0b65c850fa66f5ae6e43edd243 from qemu	2018-03-02 13:05:35 -05:00
Dr. David Alan Gilbert	5dfbee8930	memory_region: Fix name comments The 'name' parameter to memory_region_init_* had been marked as debug only, however vmstate_region_ram uses it as a parameter to qemu_ram_set_idstr to set RAMBlock names and these form part of the migration stream. Backports commit e8f5fe2de125a0bfbefbaa6a69af81f4817cb7a0 from qemu	2018-03-02 13:01:23 -05:00
Yongji Xie	23f5b17a08	memory: Introduce DEVICE_HOST_ENDIAN for ram device At the moment ram device's memory regions are DEVICE_NATIVE_ENDIAN. It's incorrect. This memory region is backed by a MMIO area in host, so the uint64_t data that MemoryRegionOps read from/write to this area should be host-endian rather than target-endian. Hence, current code does not work when target and host endianness are different which is the most common case on PPC64. To fix it, this introduces DEVICE_HOST_ENDIAN for the ram device. This has been tested on PPC64 BE/LE host/guest in all possible combinations including TCG. Backports commit c99a29e702528698c0ce2590f06ca7ff239f7c39 from qemu	2018-03-02 11:24:32 -05:00
Alex Bennée	454932263c	cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap While the vargs approach was flexible the original MTTCG ended up having to munge the bits to a bitmap so the data could be used in deferred work helpers. Instead of hiding that in cputlb we push the change to the API to make it take a bitmap of MMU indexes instead. For ARM some of the resulting flushes end up being quite long so to aid readability I've tended to move the index shifting to a new line so all the bits being or-ed together line up nicely, for example: tlb_flush_page_by_mmuidx(other_cs, pageaddr, (1 << ARMMMUIdx_S1SE1) | (1 << ARMMMUIdx_S1SE0)); Backports commit 0336cbf8532935d8e23c2aabf3e2ce2c0697b6ac from qemu	2018-03-02 10:12:40 -05:00
Alex Bennée	e3e57ca08e	cputlb: drop flush_global flag from tlb_flush We have never had the concept of global TLB entries which would avoid the flush so we never actually use this flag. Drop it and make clear that tlb_flush is the sledge-hammer it has always been. Backports commit d10eb08f5d8389c814b554d01aa2882ac58221bf from qemu	2018-03-01 19:36:04 -05:00
Jason Wang	29932d0719	memory: handle alias in memory_region_is_iommu() Backports commit 12d37882f0c0def5dee1c21be5d8fea9c21baada from qemu	2018-03-01 13:06:18 -05:00
Jason Wang	fdca6292a1	exec: introduce address_space_get_iotlb_entry() This patch introduces a helper to query the iotlb entry for a possible iova. This will be used by later device IOTLB API to enable the capability for a dataplane (e.g vhost) to query the IOTLB. Backports commit 052c8fa9983f553fdfa0d61034774070dd639c2b from qemu	2018-03-01 13:05:08 -05:00
Paolo Bonzini	81ad780e5e	exec: introduce MemoryRegionCache Device models often have to perform multiple accesses to a single memory region that is known in advance, but would like to use "DMA-style" functions instead of address_space_map/unmap. This can happen for example when the data has to undergo endianness conversion. Introduce a new data structure to cache the result of address_space_translate without forcing usage of a host address like address_space_map does. Backports commit 1f4e496e1fc2eb6c8bf377a0f9695930c380bfd3 from qemu	2018-03-01 10:50:30 -05:00
Paolo Bonzini	88ad0f4f6e	exec: introduce memory_ldst.inc.c Templatize the address_space_* and *_phys functions, so that we can add similar functions in the next patch that work with a lightweight, cache-like version of address_space_map/unmap. Backports commit 0ce265ffef87f19f4dd1ff0663e09a63d66ae408 from qemu	2018-03-01 09:59:34 -05:00
Paolo Bonzini	9404dbf74e	cpu-exec: fix icount out-of-bounds access When icount is active, tb_add_jump is surprisingly called with an out of bounds basic block index. I have no idea how that can work, but it does not seem like a good idea. Clear *last_tb for all TB_EXIT_ICOUNT_EXPIRED cases, even when all you have to do is refill icount_extra. Backports commit d8dea6fbcbed177ca5d23ab77b3834a9437f0e88 from qemu	2018-03-01 09:17:26 -05:00
Bobby Bingham	d46e52d9d0	cpu_ldst.h: use correct guest address parameter In the user emulation code path, tlb_vaddr_to_host erroneously passed vaddr as the guest address to be translated, instead of addr, the parameter which actually contained the guest address. This resulted in incorrect addresses being used when emulating block copy (mvc/mvpg) and block clear (xc) instructions for the s390x target. Backports commit c2a85316902e67530da9d6548139fcce73c0cac6 from qemu	2018-03-01 08:56:37 -05:00
Paolo Bonzini	9d64a89acf	tcg: comment on which functions have to be called with tb_lock held softmmu requires more functions to be thread-safe, because translation blocks can be invalidated from e.g. notdirty callbacks. Probably the same holds for user-mode emulation, it's just that no one has ever tried to produce a coherent locking there. This patch will guide the introduction of more tb_lock and tb_unlock calls for system emulation. Note that after this patch some (most) of the mentioned functions are still called outside tb_lock/tb_unlock. The next one will rectify this. Backports commit 7d7500d99895f888f97397ef32bb536bb0df3b74 from qemu	2018-02-28 10:26:28 -05:00
Alex Bennée	7aab0bd9a6	translate-all: add DEBUG_LOCKING asserts This adds asserts to check the locking on the various translation engine structures. There are two sets of structures that are protected by locks. The first is the l1map and PageDesc structures used to track which translation blocks are associated with which physical addresses. In user-mode this is covered by the mmap_lock. The second is the TB context related structures, which are protected by tb_lock, which is also user-mode only. Currently the asserts do nothing in SoftMMU mode but this will change for MTTCG. Backports commit 301e40ed8005306c009978be295ed9a4b725178b from qemu	2018-02-28 08:56:15 -05:00
Yongbok Kim	79e4c001a9	softmmu: Add probe_write() Probe for whether the specified guest write access is permitted. If it is not permitted then an exception will be taken in the same way as if this were a real write access (and we will not return). Otherwise the function will return, and there will be a valid entry in the TLB for this access. Backports commit 3b4afc9e75ab1a95f33e41f462921093f8a109c4 from qemu	2018-02-27 12:20:50 -05:00
Richard Henderson	e35aacd5ae	tcg: Add EXCP_ATOMIC When we cannot emulate an atomic operation within a parallel context, this exception allows us to stop the world and try again in a serial context. Backports commit fdbc2b5722f6092e47181a947c90fd4bdcc1c121 from qemu Also backports parts of commit 02d57ea115b7669f588371c86484a2e8ebc369be	2018-02-27 11:57:58 -05:00
Peter Maydell	db8b0a82b1	cpu: Support a target CPU having a variable page size Support target CPUs having a page size which isn't known at compile time. To use this, the CPU implementation should: * define TARGET_PAGE_BITS_VARY * not define TARGET_PAGE_BITS * define TARGET_PAGE_BITS_MIN to the smallest value it might possibly want for TARGET_PAGE_BITS * call set_preferred_target_page_bits() in its realize function to indicate the actual preferred target page size for the CPU (and report any error from it) In CONFIG_USER_ONLY, the CPU implementation should continue to define TARGET_PAGE_BITS appropriately for the guest OS page size. Machines which want to take advantage of having the page size something larger than TARGET_PAGE_BITS_MIN must set the MachineClass minimum_page_bits field to a value which they guarantee will be no greater than the preferred page size for any CPU they create. Note that changing the target page size by setting minimum_page_bits is a migration compatibility break for that machine. For debugging purposes, attempts to use TARGET_PAGE_SIZE before it has been finally confirmed will assert. Backports commit 20bccb82ff3ea09bcb7c4ee226d3160cab15f7da from qemu	2018-02-26 12:29:08 -05:00
Paolo Bonzini	eb75004013	memory: add a per-AddressSpace list of listeners This speeds up MEMORY_LISTENER_CALL noticeably. Right now, with many PCI devices you have N regions added to M AddressSpaces (M = # PCI devices with bus-master enabled) and each call looks up the whole listener list, with at least M listeners in it. Because most of the regions in N are BARs, which are also roughly proportional to M, the whole thing is O(M^3). This changes it to O(M^2), which is the best we can do without rewriting the whole thing. Backports commit 9a54635dcb51a3fcf7507af630168f514a8cd4e7 from qemu	2018-02-26 10:46:50 -05:00
Paolo Bonzini	4b06e8bbb7	memory: eliminate global MemoryListeners There is none, so just drop the code. Backports commit d45fa784cd0c111131696808d1168259d66b7519 from qemu	2018-02-26 10:19:28 -05:00
Richard Henderson	66d79ac959	tcg: Merge GETPC and GETRA The return address argument to the softmmu template helpers was confused. In the legacy case, we wanted to indicate that there is no return address, and so passed in NULL. However, we then immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero value, indicating the presence of an (invalid) return address. Push the GETPC_ADJ subtraction down to the only point it's required: immediately before use within cpu_restore_state_from_tb, after all NULL pointer checks have been completed. This makes GETPC and GETRA identical. Remove GETRA as the lesser used macro, replacing all uses with GETPC. Backports commit 01ecaf438b1eb46abe23392c8ce5b7628b0c8cf5 from qemu	2018-02-26 02:54:44 -05:00
Paolo Bonzini	30845ae475	tcg: Prepare TB invalidation for lockless TB lookup When invalidating a translation block, set an invalid flag into the TranslationBlock structure first. It is also necessary to check whether the target TB is still valid after acquiring 'tb_lock' but before calling tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in future. Note that we don't have to check 'last_tb'; an already invalidated TB will not be executed anyway and it is thus safe to patch it. Backports commit 6d21e4208f382dd8ca1f7995a6dd9ea7ca281163 from qemu	2018-02-26 01:48:13 -05:00
Alex Williamson	fe66c2e088	memory: Don't use memcpy for ram_device regions With a vfio assigned device we lay down a base MemoryRegion registered as an IO region, giving us read & write accessors. If the region supports mmap, we lay down a higher priority sub-region MemoryRegion on top of the base layer initialized as a RAM device pointer to the mmap. Finally, if we have any quirks for the device (ie. address ranges that need additional virtualization support), we put another IO sub-region on top of the mmap MemoryRegion. When this is flattened, we now potentially have sub-page mmap MemoryRegions exposed which cannot be directly mapped through KVM. This is as expected, but a subtle detail of this is that we end up with two different access mechanisms through QEMU. If we disable the mmap MemoryRegion, we make use of the IO MemoryRegion and service accesses using pread and pwrite to the vfio device file descriptor. If the mmap MemoryRegion is enabled and results in one of these sub-page gaps, QEMU handles the access as RAM, using memcpy to the mmap. Using either pread/pwrite or the mmap directly should be correct, but using memcpy causes us problems. I expect that not only does memcpy not necessarily honor the original width and alignment in performing a copy, but it potentially also uses processor instructions not intended for MMIO spaces. It turns out that this has been a problem for Realtek NIC assignment, which has such a quirk that creates a sub-page mmap MemoryRegion access. To resolve this, we disable memory_access_is_direct() for ram_device regions since QEMU assumes that it can use memcpy for those regions. Instead we access through MemoryRegionOps, which replaces the memcpy with simple de-references of standard sizes to the host memory. With this patch we attempt to provide unrestricted access to the RAM device, allowing byte through qword access as well as unaligned access. 
The assumption here is that accesses initiated by the VM are driven by a device specific driver, which knows the device capabilities. If unaligned accesses are not supported by the device, we don't want them to work in a VM by performing multiple aligned accesses to compose the unaligned access. A down-side of this philosophy is that the xp command from the monitor attempts to use the largest available access width, unaware of the underlying device. Using memcpy had this same restriction, but at least now an operator can dump individual registers, even if blocks of device memory may result in access widths beyond the capabilities of a given device (RTL NICs only support up to dword). Backports commit 1b16ded6a512809f99c133a97f19026fe612b2de from qemu	2018-02-25 23:06:36 -05:00
Alex Williamson	5db45219c9	memory: Replace skip_dump flag with ram_device Setting skip_dump on a MemoryRegion allows us to modify one specific code path, but the restriction we're trying to address encompasses more than that. If we have a RAM MemoryRegion backed by a physical device, it not only restricts our ability to dump that region, but also affects how we should manipulate it. Here we recognize that MemoryRegions do not change to sometimes allow dumps and other times not, so we replace setting the skip_dump flag with a new initializer so that we know exactly the type of region to which we're applying this behavior. Backports commit ca83f87a66d19fdaabf23d4f5ebb49396fe232c1 from qemu	2018-02-25 23:00:45 -05:00
Richard Henderson	1547048a22	tcg: Reorg TCGOp chaining Instead of using -1 as end of chain, use 0, and link through the 0 entry as a fully circular double-linked list. Backports commit dcb8e75870e2de199db853697f8839cb603beefe from qemu	2018-02-25 21:44:50 -05:00
Igor Mammedov	62c89b9cd4	exec: Reduce CONFIG_USER_ONLY ifdeffenery Backports commit 1bc7e522d9cf1b58f2de9c8f1737be0bb5129c35 from qemu	2018-02-25 20:57:48 -05:00
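
The "multhash" expression quoted in commit 7d0440dec4 above can be tried in isolation. The following is only a sketch, not the code that was merged (the message says the author went with the other function, "hash"). It assumes that the operator between pc and the constant is a multiplication, as the name suggests, that pc is 64 bits wide, and that the cache has 2^12 entries. The function name multhash_sketch and the value chosen for TB_JMP_CACHE_BITS are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Assumed for illustration: a power-of-two cache size of 2^12 entries. */
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

/*
 * Multiplicative hash in the form quoted in the commit message:
 * multiply the guest pc by a large odd constant, keep the bits above
 * bit 31 of the 64-bit product, and mask down to a cache index.
 * multhash_sketch is a hypothetical name, not a QEMU function.
 */
static inline unsigned int multhash_sketch(uint64_t pc)
{
    uint64_t product = pc * 2654435761u; /* wraps modulo 2^64 */

    return (unsigned int)((product >> 32) & (TB_JMP_CACHE_SIZE - 1));
}
```

Because the mask is TB_JMP_CACHE_SIZE - 1, the result is always a valid index into a table of TB_JMP_CACHE_SIZE entries, whatever the value of pc.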


299 commits