unicorn

mirror of https://github.com/yuzu-emu/unicorn.git synced 2024-12-26 04:15:44 +00:00

Author	SHA1	Message	Date
Kirill A. Shutemov	eb489625b5	x86: implement la57 paging mode The new paging mode is an extension of IA32e mode with one additional page table level. It brings support for a 57-bit virtual address space (128PB) and a 52-bit physical address space (4PB). The structure of the new page table level is identical to pml4. The feature is enumerated with CPUID.(EAX=07H, ECX=0):ECX[bit 16]. CR4.LA57[bit 12] needs to be set when paging is enabled to activate 5-level paging mode. Backports commit 6c7c3c21f95dd9af8a0691c0dd29b07247984122 from qemu	2018-03-01 11:02:07 -05:00
Doug Evans	7c874b1b2b	target-i386: Fix eflags.TF/#DB handling of syscall/sysret insns The syscall and sysret instructions behave a bit differently: TF is checked after the instruction completes. This allows the o/s to disable #DB at a syscall by adding TF to FMASK. And then when the sysret is executed the #DB is taken "as if" the syscall insn just completed. Backports commit c52ab08aee6f7d4717fc6b517174043126bd302f from qemu	2018-03-01 10:56:22 -05:00
Yi Sun	f6e624d97b	target-i386: Add Intel SHA_NI instruction support. Add SHA_NI feature bit. Its spec can be found at: https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf Backports commit 638cbd452d3a92a2ab18caee73078483d90f64eb from qemu	2018-03-01 10:52:54 -05:00
Paolo Bonzini	560515941a	target-i386: correctly propagate retaddr into SVM helpers Commit 2afbdf8 ("target-i386: exception handling for memory helpers", 2015-09-15) changed tlb_fill's cpu_restore_state+raise_exception_err to raise_exception_err_ra. After this change, the cpu_restore_state and raise_exception_err's cpu_loop_exit are merged into raise_exception_err_ra's cpu_loop_exit_restore. This actually fixed some bugs, but when SVM is enabled there is a second path from raise_exception_err_ra to cpu_loop_exit. This is the VMEXIT path, and now cpu_vmexit is called without a cpu_restore_state before. The fix is to pass the retaddr to cpu_vmexit (via cpu_svm_check_intercept_param). All helpers can now use GETPC() to pass the correct retaddr, too. Backports commit 823fb688ebc52a7d79c1308acb28c92b56820167 from qemu	2018-03-01 09:31:16 -05:00
Luwei Kang	57533d1adc	x86: add AVX512_4VNNIW and AVX512_4FMAPS features The spec can be found in Intel Software Developer Manual or in Instruction Set Extensions Programming Reference. Backports commit 95ea69fb46266aaa46d0c8b7f0ba8c4903dbe4e3 from qemu	2018-03-01 08:51:09 -05:00
Emilio G. Cota	3dc16ebca3	target-i386: remove helper_lock() It's been superseded by the atomic helpers. The use of the atomic helpers provides a significant performance and scalability improvement. Below is the result of running the atomic_add-test microbenchmark with: $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n, where $n is the number of threads and $r is the allowed range for the additions. The scenarios measured are: - atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset) - cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper - master: before this patchset Results sorted in ascending range, i.e. descending degree of contention. Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64 Opteron 6376 cores. [Five ASCII plots omitted: atomic_add-bench, 5000000 ops/thread, for ranges [0,1], [0,2], [0,8], [0,128] and [0,1024]; X axis: number of threads (0 to 60); series: atomic (E), cmpxchg (H), master (N).] hi-res: http://imgur.com/a/fMRmq For master I stopped measuring master after 8 threads, because there is little point in measuring the well-known performance collapse of a contended lock. Backports commit 37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b from qemu	2018-02-27 23:43:22 -05:00
Emilio G. Cota	9d9b7dedac	target-i386: emulate XCHG using atomic helper Backports commit ea97ebe89f7a879ea9aba90140e40c29b5cbd653 from qemu	2018-02-27 23:40:20 -05:00
Emilio G. Cota	8f96b6beb9	target-i386: emulate LOCK'ed BTX ops using atomic helpers Backports commit cfe819d309d472f75fd129faf1d1064a2498326c from qemu	2018-02-27 23:39:21 -05:00
Emilio G. Cota	089965fa8d	target-i386: emulate LOCK'ed XADD using atomic helper Backports commit f53b01817f95781d2bcc8a82e057d1416601e13b from qemu	2018-02-27 23:06:28 -05:00
Emilio G. Cota	f9ed728f27	target-i386: emulate LOCK'ed NEG using cmpxchg helper Backports commit 8eb8c7385608b99bed6055a22d897ff727a6cb8e from qemu	2018-02-27 23:03:28 -05:00
Emilio G. Cota	fedeb0f93e	target-i386: emulate LOCK'ed NOT using atomic helper Backports commit 2a5fe8ae145ef7a3ab480922116d27efcc97b85d from qemu	2018-02-27 23:00:33 -05:00
Emilio G. Cota	05c94546d5	target-i386: emulate LOCK'ed INC using atomic helper Backports commit 60e573462fcdb83aa1a41e66a9f31dc8a4364399 from qemu	2018-02-27 22:56:05 -05:00
Emilio G. Cota	7c7b0fe746	target-i386: emulate LOCK'ed OP instructions using atomic helpers Backports commit a7cee522f3529c2fc85379237b391ea98823271e from qemu	2018-02-27 22:53:46 -05:00
Emilio G. Cota	a386368f82	target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers The diff here is uglier than necessary. All this does is to turn FOO into: if (s->prefix & PREFIX_LOCK) { BAR } else { FOO } where FOO is the original implementation of an unlocked cmpxchg. Backports commit ae03f8de45427042ecd10b0941a005f21ecc064c from qemu	2018-02-27 22:38:37 -05:00
Paolo Bonzini	be00a3e100	target-i386: fix 32-bit addresses in LEA This was found with test-i386. The issue is that instructions such as addr32 lea (%eax), %rax did not perform a 32-bit extension, because the LEA translation skipped the gen_lea_v_seg step. That step does not just add segments, it also takes care of extending from address size to pointer size. Backports commit 620abfb004543404bef1953e25da2ad77352941a from qemu	2018-02-26 10:06:08 -05:00
Eduardo Habkost	b41bb81737	target-i386: Don't use cpu->migratable when filtering features When explicitly enabling unmigratable flags using "-cpu host" (e.g. "-cpu host,+invtsc"), the requested feature won't be enabled because cpu->migratable is true by default. This is inconsistent with all other CPU models, which don't have the "migratable" option, making "+invtsc" work without the need for extra options. This happens because x86_cpu_filter_features() uses cpu->migratable as an argument for x86_cpu_get_supported_feature_word(). This is not useful because: 1) on all the other CPU models, cpu->migratable is already false; 2) on "-cpu host" it only makes QEMU disable features that were explicitly enabled in the command-line. The fix is to just use 'false' as an argument to x86_cpu_get_supported_feature_word() in x86_cpu_filter_features(). Note that: * This won't change anything for people using "-cpu host" or "-cpu host,migratable=<on|off>" (with no extra features) because the x86_cpu_get_supported_feature_word() call on the cpu->host_features check uses cpu->migratable as argument. * This won't change anything for any CPU model except "host" because they all have cpu->migratable == false (and only "host" has the "migratable" property that allows it to be changed). * This will only change things for people using "-cpu host,+<feature>", where <feature> is a non-migratable feature. The only existing named non-migratable feature is "invtsc". In other words, this change will only affect people using "-cpu host,+invtsc" (that will now get what they asked for: the invtsc flag will be enabled). All other use cases are unaffected. Backports commit 46c032f3afcc05a0123914609f1003906ba63fda from qemu	2018-02-26 09:51:14 -05:00
Eduardo Habkost	4096ce0184	target-i386: x86_cpu_load_features() function When probing for CPU model information, we need to reuse the code that initializes CPUID fields, but not the remaining side-effects of x86_cpu_realizefn(). Move that code to a separate function that can be reused later. Backports commit 41f3d4d69a423dadb8431fda65d8d7c68c0de0fc from qemu	2018-02-26 09:49:34 -05:00
Eduardo Habkost	aa98c8a93f	target-i386: Move warning code outside x86_cpu_filter_features() x86_cpu_filter_features() will be reused by code that shouldn't print any warning. Move the warning code to a new x86_cpu_report_filtered_features() function, and call it from x86_cpu_realizefn(). Backports commit 8ca30e8673aff9bfcf8f969f8db4266b5f62e49c from qemu	2018-02-26 09:40:11 -05:00
Eduardo Habkost	08bfa41e1b	target-i386: xsave: Add FP and SSE bits to x86_ext_save_areas Instead of treating the FP and SSE bits as special cases, add them to the x86_ext_save_areas array. This will simplify the code that calculates the supported xsave components and the size of the xsave area. Backports commit e3c9022b4e2b6a4deb6518361d2bbf33522b9198 from qemu	2018-02-26 09:37:48 -05:00
Eduardo Habkost	54bd827472	target-i386: Register properties for feature aliases manually Instead of keeping the aliases inside the feature name arrays and require parsing the strings, just register alias properties manually. This simplifies the code for property registration and lookup. Backports commit 16d2fcaa509b1ca56eb2fcd8fe877279cf65cccc from qemu	2018-02-26 09:34:52 -05:00
Eduardo Habkost	b508b9e02a	target-i386: Remove underscores from feat_names arrays Instead of translating the feature name entries when adding property names, store the actual property names in the feature name array. For reference, here is the full list of functions that use FeatureWordInfo::feat_names: * x86_cpu_get_migratable_flags(): not affected, as it just checks for non-NULL values. * report_unavailable_features(): informative only. It will start printing feature names with hyphens. * x86_cpu_list(): informative only. It will start printing feature names with hyphens. * x86_cpu_register_feature_bit_props(): not affected, as it was already calling feat2prop(). Now we can remove the feat2prop() calls safely. So, the only user-visible effect of this patch is the new names being used in help and error messages for users. Backports commit fc7dfd205f3287893c436d932a167bffa30579c8 from qemu	2018-02-26 09:33:15 -05:00
Eduardo Habkost	6d1a7bccb5	target-i386: Disable VME by default with TCG VME is already disabled automatically when using TCG. So, instead of pretending it is there when reporting CPU model data on query-cpu-* QMP commands (making every CPU model to be reported as not runnable), we can disable it by default on all CPU models when using TCG. Do that by adding a tcg_default_props array that will work like kvm_default_props. Backports commit 04d99c3c61f4bdc0450dbeb6512b6dd743baca65 from qemu	2018-02-26 08:23:44 -05:00
Eduardo Habkost	594cbeaa06	target-i386: List CPU models using subclass list Instead of using the builtin_x86_defs array, use the QOM subclass list to list CPU models on "-cpu ?" and "query-cpu-definitions". Backports commit ee465a3ef77c2b2975ffa71c72208c05b3f3970d from qemu	2018-02-26 08:17:04 -05:00
Evgeny Yakovlev	fa9d708fbd	target-i386: Correct family/model/stepping for Opteron_G3 Current CPU definition for AMD Opteron third generation includes features like SSE4a and LAHF_LM support in emulated CPUID. These features are present in K8 rev.E or K10 CPUs and later. However, current G3 family and model describe 2nd generation K8 cores instead. This is incorrect but was considered harmless until our tests found a problem with linux kernels >= 3.10 (and maybe earlier) which specifically check for Opteron K8 model when parsing CPUID leaf 0x80000001: http://lxr.free-electrons.com/source/arch/x86/kernel/cpu/amd.c?v=3.16#L552 This code will disable LAHF_LM feature in /proc/cpuinfo if model number is inconsistent. This change sets Opteron_G3 family/model/stepping to 16/2/3 which is a proper Opteron 3rd generation 2350 CPU. Backports commit 339892d758efb2d0954160d41736a0eac9875d67 from qemu	2018-02-26 04:59:18 -05:00
Eduardo Habkost	b7f434373b	target-i386: Report known CPUID[EAX=0xD,ECX=0]:EAX bits as migratable A regression was introduced by commit 96193c22a "target-i386: Move xsave component mask to features array": all CPUID[EAX=0xD,ECX=0]:EAX bits were being reported as unmigratable because they don't have feature names defined. This broke "-cpu host" because it enables only migratable features by default. This adds a new field to FeatureWordInfo: migratable_flags, which will make those features be reported as migratable even if they don't have a property name defined. Backports commit 6fb2fff75dceed1716e757882a6dfbadd9042407 from qemu	2018-02-26 04:58:05 -05:00
Alex Bennée	33589eb75f	cpus: pass CPUState to run_on_cpu helpers CPUState is a fairly common pointer to pass to these helpers. This means if you need other arguments for the async_run_on_cpu case you end up having to do a g_malloc to stuff additional data into the routine. For the current users this isn't a massive deal but for MTTCG this gets cumbersome when the only other parameter is often an address. This adds the typedef run_on_cpu_func for helper functions which has an explicit CPUState * passed as the first parameter. All the users of run_on_cpu and async_run_on_cpu have had their helpers updated to use CPUState where available. Backports commit e0eeb4a21a3ca4b296220ce4449d8acef9de9049 from qemu	2018-02-26 04:54:55 -05:00
Eduardo Habkost	49c04d7104	target-i386: Clear KVM CPUID features if KVM is disabled This will ensure all checks for features[FEAT_KVM] in the code will be correct in case the KVM CPUID leaf is completely disabled. Backports commit aec661de86894e914d2d82431d9cefa9a9a40213 from qemu	2018-02-26 04:47:05 -05:00
Eduardo Habkost	f29384c810	target-i386: Move xsave component mask to features array This will reuse the existing check/enforce logic in x86_cpu_filter_features() to check the xsave component bits against GET_SUPPORTED_CPUID. Backports commit 96193c22ab39ea24f81e386ad7883260ff24f5fd from qemu	2018-02-26 04:45:35 -05:00
Eduardo Habkost	3fb3e6672b	target-i386: xsave: Calculate set of xsave components on realize Instead of doing complex calculations and calling kvm_arch_get_supported_cpuid() inside cpu_x86_cpuid(), calculate the set of required XSAVE components earlier, at realize time. Backports commit 2ca8a8becc2eeb5262e478ce502f5daa53f3d0bc from qemu	2018-02-26 04:40:41 -05:00
Eduardo Habkost	28f002cbaf	target-i386: xsave: Helper function to calculate xsave area size Move the xsave area size calculation from cpu_x86_cpuid() inside its own function. While doing it, change it to use the XSAVE area struct sizes for the initial size, instead of the magic 0x240 number. Backports commit 1fda6198e4126af9988754c8824cfc9928649890 from qemu	2018-02-26 04:36:27 -05:00
Eduardo Habkost	c35e9eb9af	target-i386: xsave: Simplify CPUID[0xD,0].{EAX,EDX} calculation Instead of assigning individual bits in a loop, just copy the values from ena_mask. Backports commit 8057c621b1b17cbcb35fe67d1a09ada9055873a9 from qemu	2018-02-26 04:35:14 -05:00
Eduardo Habkost	c7195afd32	target-i386: xsave: Calculate enabled components only once Instead of checking both env->features and ena_mask at two different places in the CPUID code, initialize ena_mask based on the features that are enabled for the CPU, and then clear unsupported bits based on kvm_arch_get_supported_cpuid(). The results should be exactly the same, but it will make it easier to move the mask calculation elsewhere, and reuse x86_cpu_filter_features() for the kvm_arch_get_supported_cpuid() check. Backports commit 4928cd6de6b4211a79f98c8dc39115be1e815c2b from qemu	2018-02-26 04:33:18 -05:00
Eduardo Habkost	c3a0cba5b1	target-i386: Don't try to enable PT State xsave component The code that calculates the set of supported XSAVE components on CPUID looks at ext_save_areas to find out which components should be enabled. However, if there are zeroed entries in the ext_save_areas array, the ((env->features[esa->feature] & esa->bits) == esa->bits) check will always succeed and QEMU will unconditionally try to enable the component. Luckily this never caused any problems because the only missing entry in ext_save_areas is the PT State component (bit 8), and KVM currently doesn't support it (so it was cleared on ena_mask). But the code was still incorrect and would break if KVM starts returning CPUID[EAX=0xD,ECX=0].EAX[bit 8] as supported on GET_SUPPORTED_CPUID. Fix the problem by changing the code to not enable an XSAVE component if ExtSaveArea::bits is zero. Backports commit 9646f4927faf68e8690588c2fd6dc9834c440b58 from qemu	2018-02-26 04:30:35 -05:00
Eduardo Habkost	6188c6d6e4	target-i386: Move feature name arrays inside FeatureWordInfo It makes it easier to guarantee the arrays are the right size, and to find information when looking at the code. Backports commit 2d5312da566e4424a807d078da05f92ee7be3eec from qemu	2018-02-26 04:29:47 -05:00
Eduardo Habkost	74ae087743	target-i386: Enable CPUID[0x8000000A] if SVM is enabled SVM needs CPUID[0x8000000A] to be available. So if SVM is enabled in a CPU model or explicitly in the command-line, adjust CPUID xlevel to expose the CPUID[0x8000000A] leaf. Backports commit 0c3d7c0051576d220e6da0a8ac08f2d8482e2f0b from qemu	2018-02-26 04:05:47 -05:00
Eduardo Habkost	37406874ea	target-i386: Automatically set level/xlevel/xlevel2 when needed Instead of requiring users and management software to be aware of required CPUID level/xlevel/xlevel2 values for each feature, automatically increase those values when features need them. This was already done for CPUID[7].EBX, and is now made generic for all CPUID feature flags. Unit test included, to make sure we don't break ABI on older machine-types and don't mess with the CPUID level values if they are explicitly set by the user. Backports commit c39c0edf9bb3b968ba95484465a50c7b19f4aa3a from qemu	2018-02-26 04:03:09 -05:00
Eduardo Habkost	6861fe80cf	target-i386: Add a marker to end of the region zeroed on reset Instead of using cpuid_level, use an empty struct as a marker (like we already did with {start,end}_init_save). This will avoid accidentally resetting the wrong fields if we change the field ordering on CPUX86State. Backports commit 5e992a8e337e710ea2d02f35668ac55a80e15f99 from qemu	2018-02-26 03:59:03 -05:00
Eduardo Habkost	c78d24b93c	target-i386: Remove unused X86CPUDefinition::xlevel2 field No CPU model in builtin_x86_defs has xlevel2 set, so it is always zero. Delete the field. Note that this is not a user-visible change. It doesn't remove the ability to set xlevel2 on the command-line, it just removes an unused field in builtin_x86_defs. Backports commit 0456441b5eb6694a561ad5bb8dad52483e6a08d0 from qemu	2018-02-26 03:57:02 -05:00
Richard Henderson	552ef4b3e6	target-i386: Use struct X86XSaveArea in fpu_helper.c This avoids a double hand-full of magic numbers in the xsave and xrstor helper functions. Backports commit 3f32bd21df655e62eb271182a5c63280d631c7b3 from qemu	2018-02-26 03:38:53 -05:00
Pranith Kumar	533e083495	target-i386: Generate fences for x86 Backports commit cc19e497a047193db5083425957d7292c8dd3226 from qemu	2018-02-26 03:28:31 -05:00
Stanislav Shmarov	5f9552657e	target-i386: Fixed syscall possible segfault In user-mode emulation env->idt.base memory is allocated in linux-user/main.c with size 8 * 512 = 4096 (for 64-bit). When fake interrupt EXCP_SYSCALL is thrown do_interrupt_user checks destination privilege level for this fake exception, and tries to read 4 bytes at address base + (256 * 2^4) = 4096, which causes a segfault. Privilege level was checked only for int's, so let's read dpl from memory only for this case. Backports commit 885b7c44e4f8b7a012a92770a0dba8b238662caa from qemu	2018-02-26 02:36:09 -05:00
Paolo Bonzini	d8d0d08262	target-i386: fix ordering of fields in CPUX86State Make sure reset zeroes TSC_AUX, XCR0, PKRU. Move XSTATE_BV from the "vmstate only" section to the "KVM only" section. Backports commit 7616f1c2da1c0f336a474a56ad6d32e15ccd666e from qemu	2018-02-26 02:34:22 -05:00
Longpeng(Mike)	8b5400d675	target-i386: present virtual L3 cache info for vcpus Some software algorithms are based on the hardware's cache info. For example, in the x86 Linux kernel, when cpu1 wants to wake up a task on cpu2, cpu1 will trigger a resched IPI and tell cpu2 to do the wakeup if they don't share a last-level cache (LLC). Conversely, cpu1 will access cpu2's runqueue directly if they share the LLC. The relevant Linux kernel code is as below: static void ttwu_queue(struct task_struct *p, int cpu) { struct rq *rq = cpu_rq(cpu); ...... if (... && !cpus_share_cache(smp_processor_id(), cpu)) { ...... ttwu_queue_remote(p, cpu); /* will trigger RES IPI */ return; } ...... ttwu_do_activate(rq, p, 0); /* access target's rq directly */ ...... } On real hardware, the cpus on the same socket share the L3 cache, so one won't trigger resched IPIs when waking up a task on another. But QEMU doesn't present virtual L3 cache info to the VM, so the Linux guest will trigger lots of RES IPIs under some workloads even if the virtual cpus belong to the same virtual socket. For KVM, there will be lots of vmexits due to the guest sending IPIs. The workload is SAP HANA's testsuite; we ran it for one round (about 40 minutes) and observed the number of RES IPIs triggered in the (Suse11sp3) guest during the period, listed as No-L3 / With-L3 (this patch applied): cpu0: 363890 / 44582; cpu1: 373405 / 43109; cpu2: 340783 / 43797; cpu3: 333854 / 43409; cpu4: 327170 / 40038; cpu5: 325491 / 39922; cpu6: 319129 / 42391; cpu7: 306480 / 41035; cpu8: 161139 / 32188; cpu9: 164649 / 31024; cpu10: 149823 / 30398; cpu11: 149823 / 32455; cpu12: 164830 / 35143; cpu13: 172269 / 35805; cpu14: 179979 / 33898; cpu15: 194505 / 32754; avg: 268963.6 / 40129.8. The VM's topology is "1socket 8cores 2threads". After presenting virtual L3 cache info to the VM, the number of RES IPIs in the guest is reduced by 85%. For KVM, vcpus sending IPIs cause vmexits, which are expensive, so this can cause severe performance degradation. We also tested the overall system performance when the vcpus actually run on separate physical sockets. With L3 cache, the performance improves by 7.2%~33.1% (avg: 15.7%). Backports commit 14c985cffa6cb177fc01a163d8bcf227c104718c from qemu	2018-02-25 23:16:14 -05:00
Luwei Kang	af7b3995dd	target-i386: Add more Intel AVX-512 instructions support Add more AVX512 feature bits, include AVX512DQ, AVX512IFMA, AVX512BW, AVX512VL, AVX512VBMI. Its spec can be found at: https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf Backports commit cc728d1493eee3e20c1547191862e43d3f55e714 from qemu	2018-02-25 23:09:18 -05:00
Richard Henderson	1547048a22	tcg: Reorg TCGOp chaining Instead of using -1 as end of chain, use 0, and link through the 0 entry as a fully circular double-linked list. Backports commit dcb8e75870e2de199db853697f8839cb603beefe from qemu	2018-02-25 21:44:50 -05:00
Igor Mammedov	d30410dc9a	target-i386: Add x86_cpu_unrealizefn() First remove VCPU from exec loop and only then remove lapic. Backports commit c884776e9dc947105827bd6c22192863f97267d2 from qemu	2018-02-25 20:54:13 -05:00
Igor Mammedov	298b0e6529	target-i386: Fix apic object leak when CPU is deleted Backports commit 67e55caa6dcb91c80428cee6fe463f8dd8a755ab from qemu	2018-02-25 20:48:40 -05:00
Igor Mammedov	e15fb246ab	target-i386: cpu: Do not ignore error and fix apic parent object_property_add_child() silently fails with an error that it can't create duplicate property 'apic' as we already have 'apic' property registered for 'apic' feature. As a result generic device_realize puts apic into unattached container. As it's a programming error, abort if name collision happens in future and fix property name for apic_state to 'lapic', this way apic is a child of cpu instance. Backports commit 6816b1b3811e839540df22855d975b6d76ae438b from qemu	2018-02-25 20:47:46 -05:00
Paolo Bonzini	403021183d	target-i386: Add support for UMIP and RDPID CPUID bits These are both stored in CPUID[EAX=7,ECX=0].ECX. KVM is going to be able to emulate both (albeit with a performance loss in the case of RDPID, which therefore will be in KVM_GET_EMULATED_CPUID rather than KVM_GET_SUPPORTED_CPUID). It's also possible to implement both in TCG, but this is for 2.8. Backports commit c2f193b538032accb9db504998bf2ea7c0ef65af from qemu	2018-02-25 20:46:40 -05:00
Igor Mammedov	6714284211	target-i386: Add socket/core/thread properties to X86CPU These properties will be used as the address at which to plug a CPU with the -device/device_add commands. Backports commit d89c2b8b98e097b9cad5104b0f178bde1cfa011b from qemu	2018-02-25 20:45:35 -05:00
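
The message of commit c3a0cba5b1 in the list above describes a check that always succeeds for zeroed ext_save_areas entries. The following is a minimal Python sketch of that reasoning, not the QEMU code itself: the names ExtSaveArea, component_mask_buggy and component_mask_fixed are hypothetical, and the feature words are plain integers chosen for illustration.

```python
from typing import NamedTuple, Sequence


class ExtSaveArea(NamedTuple):
    # Hypothetical stand-in for one ext_save_areas entry.
    feature: int  # index of the feature word that gates this component
    bits: int     # bits required in that word; 0 marks a zeroed entry


def component_mask_buggy(features: Sequence[int],
                         areas: Sequence[ExtSaveArea]) -> int:
    """Enable component i whenever (word & bits) == bits."""
    mask = 0
    for i, esa in enumerate(areas):
        # For a zeroed entry this is (x & 0) == 0, which is always true.
        if (features[esa.feature] & esa.bits) == esa.bits:
            mask |= 1 << i
    return mask


def component_mask_fixed(features: Sequence[int],
                         areas: Sequence[ExtSaveArea]) -> int:
    """Same check, but entries with bits == 0 are never enabled."""
    mask = 0
    for i, esa in enumerate(areas):
        if esa.bits != 0 and (features[esa.feature] & esa.bits) == esa.bits:
            mask |= 1 << i
    return mask
```

With a table whose entry 0 is zeroed and whose entry 1 requires bit 0x4 of word 0, the first function sets bit 0 regardless of the feature word, while the second sets only bit 1, and only when 0x4 is present.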

368 commits
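
The figures quoted in the la57 commit (eb489625b5) can be checked with a little arithmetic. The sketch below is an illustration in Python, not code from QEMU or unicorn. It assumes the usual x86-64 layout of a 12-bit page offset plus 9 index bits per page-table level, which is background knowledge rather than something the commit message states. All function and constant names are hypothetical, and la57_usable deliberately ignores every other condition a real CPU checks.

```python
PAGE_OFFSET_BITS = 12        # assumption: 4 KiB pages
INDEX_BITS_PER_LEVEL = 9     # assumption: 512 entries per table

CPUID_7_0_ECX_LA57 = 1 << 16  # CPUID.(EAX=07H, ECX=0):ECX[bit 16]
CR4_LA57 = 1 << 12            # CR4.LA57[bit 12]


def virtual_address_bits(levels: int) -> int:
    """Width of a virtual address translated by `levels` table levels."""
    return PAGE_OFFSET_BITS + INDEX_BITS_PER_LEVEL * levels


def size_in_pib(address_bits: int) -> int:
    """Size of a 2**address_bits byte address space, in PiB (2**50 bytes)."""
    return 2 ** address_bits // 2 ** 50


def la57_usable(cpuid_7_0_ecx: int, cr4: int) -> bool:
    """Simplified: the feature is enumerated and CR4.LA57 is set."""
    return bool(cpuid_7_0_ecx & CPUID_7_0_ECX_LA57) and bool(cr4 & CR4_LA57)
```

Under these assumptions, four levels give 48 bits and five levels give 57 bits, and size_in_pib reproduces the 128PB and 4PB figures from the commit message for 57 and 52 bits respectively.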