unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2025-01-03 18:05:40 +00:00

Author	SHA1	Message	Date
Emilio G. Cota	b71769fa5f	target/arm: check CF_PARALLEL instead of parallel_cpus Thereby decoupling the resulting translated code from the current state of the system. Backports commit 2399d4e7cec22ecf1c51062d2ebfd45220dbaace from qemu	2018-03-13 15:05:45 -04:00
Emilio G. Cota	c384da2f47	tcg: convert tb->cflags reads to tb_cflags(tb) Convert all existing readers of tb->cflags to tb_cflags, so that we use atomic_read and therefore avoid undefined behaviour in C11. Note that the remaining setters/getters of the field are protected by tb_lock, and therefore do not need conversion. Luckily all readers access the field via 'tb->cflags' (so no foo.cflags, bar->cflags in the code base), which makes the conversion easily scriptable: FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h accel/tcg/translator.c | cut -f1 -d':' | sort | uniq) perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES Then manually fixed the few errors that checkpatch reported. Compile-tested for all targets. Backports commit c5a49c63fa26e8825ad101dfe86339ae4c216539 from qemu	2018-03-13 14:57:51 -04:00
Richard Henderson	d6ca4d59dc	tcg: Include CF_COUNT_MASK in CF_HASH_MASK Backports commit cdfef1715c779eb528d633e8b76cbc8a10e71ac8 from qemu	2018-03-13 14:42:42 -04:00
Richard Henderson	5d360366e9	tcg: Add CPUState cflags_next_tb We were generating code during tb_invalidate_phys_page_range, check_watchpoint, cpu_io_recompile, and (seemingly) discarding the TB, assuming that it would magically be picked up during the next iteration through the cpu_exec loop. Instead, record the desired cflags in CPUState so that we request the proper TB so that there is no more magic. Backports commit 9b990ee5a3cc6aa38f81266fb0c6ef37a36c45b9 from qemu	2018-03-13 14:39:43 -04:00
Emilio G. Cota	b5961a139b	tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Backports commit 4e2ca83e71b51577b06b1468e836556912bd5b6e from qemu	2018-03-13 14:32:43 -04:00
Emilio G. Cota	6bc05eeee4	tb hash: track translated blocks with qht Having a fixed-size hash table for keeping track of all translation blocks is suboptimal: some workloads are just too big or too small to get maximum performance from the hash table. The MRU promotion policy helps improve performance when the hash table is a little undersized, but it cannot make up for severely undersized hash tables. Furthermore, frequent MRU promotions result in writes that are a scalability bottleneck. For scalability, lookups should only perform reads, not writes. This is not a big deal for now, but it will become one once MTTCG matures. The appended fixes these issues by using qht as the implementation of the TB hash table. This solution is superior to other alternatives considered, namely: - master: implementation in QEMU before this patchset - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU. - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU. MRU is implemented here by adding an intermediate struct that contains the u32 hash and a pointer to the TB; this allows us, on an MRU promotion, to copy said struct (that is not at the head), and put this new copy at the head. After a grace period, the original non-head struct can be eliminated, and after another grace period, freed. - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize + no MRU for lookups; MRU for inserts. The appended solution is the following: - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize + no MRU for lookups; MRU for inserts. The plots below compare the considered solutions. The Y axis shows the boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis sweeps the number of buckets (or initial number of buckets for qht-autoresize). The plots in PNG format (and with errorbars) can be seen here: http://imgur.com/a/Awgnq Each test runs 5 times, and the entire QEMU process is pinned to a single core for repeatability of results. 
[Two ASCII plots omitted. Hosts: Intel Xeon E5-2690 and Intel i7-4790K. Y axis: boot time in seconds; X axis: log2 number of buckets (14 to 24). Series: master, xxhash, xxhash-rcu, qht-fixed-nomru, qht-dyn-mru, qht-dyn-nomru.] Note that the original point before this patch series is X=15 for "master"; the little sensitivity to the increased number of buckets is due to the poor hashing function in master. xxhash-rcu has significant overhead due to the constant churn of allocating and deallocating intermediate structs for implementing MRU. An alternative would be to consider failed lookups as "maybe not there", and then acquire the external lock (tb_lock in this case) to really confirm that there was indeed a failed lookup. 
This, however, would not be enough to implement dynamic resizing--this is more complex: see "Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming" by Triplett, McKenney and Walpole. This solution was discarded due to the very coarse RCU read critical sections that we have in MTTCG; resizing requires waiting for readers after every pointer update, and resizes require many pointer updates, so this would quickly become prohibitive. qht-fixed-nomru shows that MRU promotion is advisable for undersized hash tables. However, qht-dyn-mru shows that MRU promotion is not important if the hash table is properly sized: there is virtually no difference in performance between qht-dyn-nomru and qht-dyn-mru. Before this patch, we're at X=15 on "xxhash"; after this patch, we're at X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we can achieve with optimum sizing of the hash table, while keeping the hash table scalable for readers. The improvement we get before and after this patch for booting debian jessie with arm-softmmu is: - Intel Xeon E5-2690: 10.5% less time - Intel i7-4790K: 5.2% less time We could get this same improvement _for this particular workload_ by statically increasing the size of the hash table. But this would hurt workloads that do not need a large hash table. The dynamic (upward) resizing allows us to start small and enlarge the hash table as needed. A quick note on downsizing: the table is resized back to 2**15 buckets on every tb_flush; this makes sense because it is not guaranteed that the table will reach the same number of TBs later on (e.g. most bootup code is thrown away after boot); it makes sense to grow the hash table as more code blocks are translated. This also avoids the complication of having to build downsizing hysteresis logic into qht. Backports commit 909eaac9bbc2ed4f3a82ce38e905b87d478a3e00 from qemu	2018-03-13 14:16:26 -04:00
Lioncash	e45c294405	Backport qht hashtable	2018-03-13 13:55:30 -04:00
Philippe Mathieu-Daudé	4eeb4f7faf	accel/tcg: move atomic_template.h to accel/tcg/	2018-03-13 12:28:50 -04:00
Thomas Huth	975924bb2e	accel/tcg: move softmmu_template.h to accel/tcg/ The header is only used by accel/tcg/cputlb.c so we can move it to the accel/tcg/ folder, too. Backports commit da1849c1eba50aa372f87c7945d7b230eb2b2fb2 from qemu	2018-03-13 12:27:04 -04:00
Lioncash	035f1afa7d	tcg: move tcg backend files into accel/tcg/ move tcg-runtime.c, translate-all.(ch) and translate-common.c into accel/tcg/ subdirectory and updated related trace-events file. Backports commit 244f144134d0dd182f1af8654e7f9a79fe770368 and applies relevant changes made in db432672dc50ed86dda17ac821b7eb07411a90af and d9bb58e51068dfc48746c6af0179926c8dc05bce from qemu	2018-03-13 11:48:15 -04:00
Lioncash	99dbbf1571	tcg/optimize: Perform comparison pass with qemu Keeps formatting and code synced	2018-03-12 18:06:29 -04:00
Lioncash	21b0afe218	tcg: Perform comparison pass with qemu Makes formatting and code consistent with qemu	2018-03-12 18:03:06 -04:00
Lioncash	95d50a02a1	target/mips/translate: Perform comparison pass with qemu Keeps code and formatting in sync	2018-03-12 17:52:56 -04:00
Lioncash	7db1bff993	target/mips/op_helper: Perform comparison pass with qemu Keeps code and formatting in sync	2018-03-12 15:25:08 -04:00
Lioncash	48429b2bcb	target/mips/msa_helper: Perform comparison pass with qemu Keeps code and formatting in sync	2018-03-12 15:15:42 -04:00
Lioncash	4e8a1f8d6b	target/mips/internal: Perform comparison pass with qemu Keeps code and formatting in sync with qemu	2018-03-12 15:13:17 -04:00
Lioncash	05089ecb12	target/mips/helper: Perform comparison pass with qemu Keeps code and formatting in sync with qemu	2018-03-12 15:11:52 -04:00
Lioncash	56675f5215	cpu-exec: Resolve potential compilation errors We need to pass 'uc' to CPU_GET_CLASS	2018-03-12 14:59:21 -04:00
Lioncash	e9d9ed5eaa	target/i386/bpt_helper: Perform comparison pass with qemu Keep formatting and code in sync where applicable	2018-03-12 13:28:50 -04:00
Lioncash	fc7eaf7f77	target/i386/svm_helper: Perform comparison pass with qemu Keep code and formatting in sync where applicable	2018-03-12 13:27:03 -04:00
Lioncash	27c283bb3c	target/i386/smm_helper: Perform comparison pass with qemu Ensure code and formatting stay in sync where relevant	2018-03-12 13:25:37 -04:00
Lioncash	73426a7e79	target/i386/seg_helper: Perform comparison pass against qemu Ensure formatting and code stay in sync where relevant	2018-03-12 13:24:36 -04:00
Lioncash	a1910954cd	target/i386/mem_helper: Perform comparison pass against qemu Ensure formatting and relevant code are in order	2018-03-12 13:19:05 -04:00
Lioncash	995ae229a3	target/i386/excp_helper: remove unnecessary comment	2018-03-12 13:16:53 -04:00
Lioncash	c1e72be68d	target/i386/fpu_helper: Perform comparison pass against qemu	2018-03-12 13:15:51 -04:00
Lioncash	0d0dd2ba98	target/i386/translate: Perform comparison pass against qemu Ensure code and formatting match qemu where applicable	2018-03-12 13:12:01 -04:00
Lioncash	83b35aa797	target/sparc/win_helper: Perform comparison pass against qemu Ensure code and formatting are consistent with qemu	2018-03-12 12:46:59 -04:00
Lioncash	0215431990	target/sparc/mmu_helper: Perform comparison pass against qemu Ensure code and formatting match qemu	2018-03-12 12:45:18 -04:00
Lioncash	83c0769d90	target/sparc/ldst_helper: Perform comparison pass against qemu Ensure code and formatting is consistent with qemu	2018-03-12 12:43:14 -04:00
Lioncash	a228660860	target/sparc/fop_helper: Perform comparison pass against qemu Ensure formatting and code is consistent from the backporting	2018-03-12 12:38:21 -04:00
Lioncash	2114d28f7e	target/sparc/cc_helper: Perform a comparison pass against qemu	2018-03-12 12:36:51 -04:00
Lioncash	bcc8bc5c18	target/sparc/translate: Perform comparison pass against main qemu repo Ensure that formatting and relevant code is organized like qemu	2018-03-12 12:34:49 -04:00
Lioncash	b92dd8d299	target/m68k/op_helper: Adjust formatting to be in sync with qemu	2018-03-12 12:26:53 -04:00
Lioncash	6e9ecb876e	target/m68k/translate: Perform pass over code relative to qemu Catches a few things that got lost in the backporting process.	2018-03-12 12:22:57 -04:00
Lioncash	750d56421c	translate/arm/vec_helper: Align to qemu formatting	2018-03-12 11:59:14 -04:00
Lioncash	bab31a2510	target/arm/cpu and crypto_helper: Correct bad merge and adjust to qemu code style	2018-03-12 11:57:24 -04:00
Lioncash	0751366e5c	target/arm/op_helper: Correct bad merge	2018-03-12 11:42:43 -04:00
Lioncash	9a0632bfcf	target/arm/helper64: Correct bad merge	2018-03-12 11:37:27 -04:00
Lioncash	c93c3bd4b3	target/arm/helper: Correct bad merge	2018-03-12 11:33:45 -04:00
Lioncash	14c1fcd5bf	target/arm/translate: Correct bad merge	2018-03-12 11:17:37 -04:00
Lioncash	0dd13de42f	target/arm/translate-a64: Correct bad merge	2018-03-12 11:17:33 -04:00
Peter Maydell	fabd6c7ae8	target/arm: Make 'any' CPU just an alias for 'max' Now we have a working '-cpu max', the linux-user-only 'any' CPU is pretty much the same thing, so implement it that way. For the moment we don't add any of the extra feature bits to the system-emulation "max", because we don't set the ID register bits we would need to to advertise those features as present. Backports commit a0032cc5427d0d396aa0a9383ad9980533448ea4 from qemu	2018-03-12 10:11:49 -04:00
Peter Maydell	7388fff079	target/arm: Add "-cpu max" support Add support for "-cpu max" for ARM guests. This CPU type behaves like "-cpu host" when KVM is enabled, and like a system CPU with the maximum possible feature set otherwise. (Note that this means it won't be migratable across versions, as we will likely add features to it in future.) Backports commit bab52d4bba3f22921a690a887b4bd0342f2754cd from qemu	2018-03-12 10:11:49 -04:00
Alistair Francis	44d8c38138	target/arm: Add a core count property The cortex A53 TRM specifies that bits 24 and 25 of the L2CTLR register specify the number of cores in the processor, not the total number of cores in the system. To report this correctly on machines with multiple CPU clusters (ARM's big.LITTLE or Xilinx's ZynqMP) we need to allow the machine to overwrite this value. To do this let's add an optional property. Backports commit f9a697112ee64180354f98309a5d6b691cc8699d from qemu	2018-03-12 10:11:48 -04:00
Kevin Wolf	025e354370	qdict: Introduce qdict_rename_keys() A few block drivers will need to rename .bdrv_create options for their QAPIfication, so let's have a helper function for that. Backports commit bcebf102ccc3c6db327f341adc379fdf0673ca6b from qemu	2018-03-12 10:11:48 -04:00
Laurent Vivier	418f96df9b	target/m68k: implement ftentox Using a local m68k floatx80_tentox() [copied from previous: Written by Andreas Grabher for Previous, NeXT Computer Emulator.] Backports commit 6c25be6e30bda0e470f8f0b6b93d53a6efe469e8 from qemu	2018-03-12 10:11:48 -04:00
Laurent Vivier	61fa8cf539	target/m68k: implement ftwotox Using a local m68k floatx80_twotox() [copied from previous: Written by Andreas Grabher for Previous, NeXT Computer Emulator.] Backports commit 068f161536d9a28a5bc482f3de9c387b2fe5908d from qemu	2018-03-12 10:11:48 -04:00
Laurent Vivier	5d508f45b6	target/m68k: implement fetox Using a local m68k floatx80_etox() [copied from previous: Written by Andreas Grabher for Previous, NeXT Computer Emulator.] Backports commit 40ad087330bee5394c9e78c97f909f580be69b58 from qemu	2018-03-12 10:11:47 -04:00
Laurent Vivier	2b793fce0f	target/m68k: implement flog2 Using a local m68k floatx80_log2() [copied from previous: Written by Andreas Grabher for Previous, NeXT Computer Emulator.] Backports commit 67b453ed73fe65949c24e6ca2b43f6816a89a301 from qemu	2018-03-12 10:11:47 -04:00
Laurent Vivier	a052fcb40b	target/m68k: implement flog10 Using a local m68k floatx80_log10() [copied from previous: Written by Andreas Grabher for Previous, NeXT Computer Emulator.] Backports commit 248efb66fb88bc17c04a0d0f09a3539a43c80769 from qemu	2018-03-12 10:11:47 -04:00
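
The "Add a core count property" commit (44d8c38138) in the listing above says that bits 24 and 25 of L2CTLR carry the number of cores in the processor rather than in the whole system. The following is a minimal sketch of that bit layout, not the code from the commit. It assumes the field stores the core count minus one (so 1 to 4 cores fit in two bits), which is my reading of the Cortex-A53 TRM and is not stated in the commit message. The macro and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from unicorn or qemu. */
#define L2CTLR_CORES_SHIFT 24
#define L2CTLR_CORES_MASK  (UINT32_C(0x3) << L2CTLR_CORES_SHIFT)

/*
 * Build an L2CTLR value that advertises core_count cores in bits [25:24].
 * Assumption: the field encodes (core_count - 1), so only 1..4 is valid.
 */
static inline uint32_t l2ctlr_for_core_count(unsigned core_count)
{
    assert(core_count >= 1 && core_count <= 4);
    return ((uint32_t)(core_count - 1) << L2CTLR_CORES_SHIFT)
           & L2CTLR_CORES_MASK;
}

/* Recover the per-processor core count from an L2CTLR value. */
static inline unsigned l2ctlr_core_count(uint32_t l2ctlr)
{
    return (unsigned)((l2ctlr & L2CTLR_CORES_MASK) >> L2CTLR_CORES_SHIFT) + 1;
}
```

On a machine with several CPU clusters, a board model would pass each cluster's own core count to a helper like `l2ctlr_for_core_count()` instead of the system-wide CPU total, which is the situation the optional property is meant to handle.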


4574 commits