unicorn/qemu
Longpeng(Mike) 8b5400d675
target-i386: present virtual L3 cache info for vcpus
Some software algorithms depend on the hardware's cache info. For example,
in the x86 Linux kernel, when cpu1 wants to wake up a task on cpu2, cpu1
triggers a resched IPI and tells cpu2 to do the wakeup if they don't share a
last-level cache (LLC). Conversely, cpu1 accesses cpu2's runqueue directly if
they share the LLC. The relevant Linux kernel code is as follows:

static void ttwu_queue(struct task_struct *p, int cpu)
{
    struct rq *rq = cpu_rq(cpu);
    ......
    if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
        ......
        ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
        return;
    }
    ......
    ttwu_do_activate(rq, p, 0); /* access target's rq directly */
    ......
}
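
The decision made by the kernel snippet above can be sketched as a small
standalone model. This is only an illustration, not kernel code: the llc_id
table, share_cache() and choose_wake_path() are hypothetical names, and the
conditions elided with "..." in the original are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU table: which last-level cache domain each CPU is in.
 * Here CPUs 0-1 share one LLC and CPUs 2-3 share another. */
static const int llc_id[NR_CPUS] = { 0, 0, 1, 1 };

enum wake_path {
    WAKE_DIRECT,     /* waker touches the target's runqueue itself */
    WAKE_REMOTE_IPI  /* waker sends a resched IPI to the target CPU */
};

/* Toy stand-in for the cache-sharing test: same LLC domain or not. */
bool share_cache(int this_cpu, int that_cpu)
{
    assert(this_cpu >= 0 && this_cpu < NR_CPUS);
    assert(that_cpu >= 0 && that_cpu < NR_CPUS);
    return llc_id[this_cpu] == llc_id[that_cpu];
}

/* Mirrors the branch in the snippet: no shared cache means the IPI path. */
enum wake_path choose_wake_path(int waker, int target)
{
    return share_cache(waker, target) ? WAKE_DIRECT : WAKE_REMOTE_IPI;
}
```

In this model, choose_wake_path(0, 1) takes the direct path while
choose_wake_path(0, 2) takes the IPI path. If no L3 is described to the
guest, every pair of cores looks like the second case.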

On real hardware, the cpus on the same socket share an L3 cache, so one cpu
won't trigger a resched IPI when waking up a task on another. But QEMU doesn't
present virtual L3 cache info to the VM, so the Linux guest triggers lots of
RES IPIs under some workloads even if the virtual cpus belong to the same
virtual socket.
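
For context, one way an x86 guest can learn about cache levels and sharing is
the CPUID deterministic cache parameters leaf (leaf 4). The sketch below
decodes one subleaf from raw register values, following the field layout in
the Intel SDM. The struct and function names are hypothetical, and the
register values in the usage note are illustrative, not taken from this patch.

```c
#include <stdint.h>

/* Decoded view of one CPUID leaf 4 subleaf (hypothetical helper type). */
struct cache_info {
    unsigned type;       /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;      /* cache level: 1, 2, 3, ... */
    unsigned sharing;    /* max logical-processor IDs sharing this cache */
    uint64_t size_bytes; /* ways * partitions * line size * sets */
};

struct cache_info decode_cpuid4(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    struct cache_info ci;

    /* EBX and ECX store each quantity minus one. */
    unsigned line_size  = (ebx & 0xfff) + 1;          /* EBX[11:0]  */
    unsigned partitions = ((ebx >> 12) & 0x3ff) + 1;  /* EBX[21:12] */
    unsigned ways       = ((ebx >> 22) & 0x3ff) + 1;  /* EBX[31:22] */
    uint64_t sets       = (uint64_t)ecx + 1;          /* ECX[31:0]  */

    ci.type    = eax & 0x1f;                          /* EAX[4:0]   */
    ci.level   = (eax >> 5) & 0x7;                    /* EAX[7:5]   */
    ci.sharing = ((eax >> 14) & 0xfff) + 1;           /* EAX[25:14] */
    ci.size_bytes = (uint64_t)ways * partitions * line_size * sets;
    return ci;
}
```

As an illustration, decode_cpuid4(0x0003C063, 0x03C0003F, 0x00003FFF)
describes a unified level-3 cache shared by up to 16 logical processors, with
16 ways, 64-byte lines and 16384 sets, i.e. 16 MiB.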

For KVM, there will be lots of vmexits because the guest sends IPIs.
The workload is a SAP HANA test suite. We ran it for one round (about 40
minutes) and observed the number of RES IPIs triggered in the (Suse11sp3)
guest during that period:
         No-L3      With-L3 (this patch applied)
cpu0:    363890     44582
cpu1:    373405     43109
cpu2:    340783     43797
cpu3:    333854     43409
cpu4:    327170     40038
cpu5:    325491     39922
cpu6:    319129     42391
cpu7:    306480     41035
cpu8:    161139     32188
cpu9:    164649     31024
cpu10:   149823     30398
cpu11:   149823     32455
cpu12:   164830     35143
cpu13:   172269     35805
cpu14:   179979     33898
cpu15:   194505     32754
avg:     268963.6   40129.8

The VM's topology is "1*socket 8*cores 2*threads".
After presenting virtual L3 cache info to the VM, the number of RES IPIs in
the guest drops by about 85%.
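
The arithmetic behind these two statements can be checked with a few lines of
C. This is a minimal sketch; vcpu_count() and reduction_percent() are
hypothetical helper names, and the inputs are the topology and the "avg" row
reported above.

```c
#include <assert.h>

/* Number of vcpus implied by a sockets x cores x threads topology. */
unsigned vcpu_count(unsigned sockets, unsigned cores, unsigned threads)
{
    return sockets * cores * threads;
}

/* Relative reduction, in percent, when going from `before` to `after`. */
double reduction_percent(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

vcpu_count(1, 8, 2) gives 16, matching cpu0 through cpu15 in the table, and
reduction_percent(268963.6, 40129.8) comes out at roughly 85.1, in line with
the 85% figure.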

For KVM, vcpus sending IPIs causes vmexits, which are expensive, so this can
cause severe performance degradation. We also tested the overall system
performance when the vcpus actually run on separate physical sockets. With the
L3 cache, performance improves by 7.2%~33.1% (avg: 15.7%).

Backports commit 14c985cffa6cb177fc01a163d8bcf227c104718c from qemu
2018-02-25 23:16:14 -05:00
crypto crypto: Clean up includes 2018-02-19 00:47:40 -05:00
default-configs arm64eb: add support for ARM64 big endian. 2017-04-24 23:30:01 +08:00
docs docs: clarify memory region lifecycle 2018-02-12 15:11:21 -05:00
fpu softfloat: Fix warn about implicit conversion from int to int8_t 2018-02-25 22:54:39 -05:00
hw qdev: Fix object reference leak in case device.realize() fails 2018-02-25 21:00:26 -05:00
include glib_compat: Amend header guard 2018-02-25 23:12:20 -05:00
qapi qapi: change QmpInputVisitor to QSLIST 2018-02-25 20:02:09 -05:00
qobject util: move declarations out of qemu-common.h 2018-02-22 09:25:48 -05:00
qom qapi: Add new visit_complete() function 2018-02-25 01:20:03 -05:00
scripts qapi: Implement boxed types for commands/events 2018-02-25 20:22:03 -05:00
target-arm target-arm: Fix lpae bit in FSR on an alignment fault 2018-02-25 23:10:29 -05:00
target-i386 target-i386: present virtual L3 cache info for vcpus 2018-02-25 23:16:14 -05:00
target-m68k tcg: Reorg TCGOp chaining 2018-02-25 21:44:50 -05:00
target-mips target-mips: Silence unused function warning 2018-02-25 21:47:22 -05:00
target-sparc tcg: Reorg TCGOp chaining 2018-02-25 21:44:50 -05:00
tcg tcg: Lower indirect registers in a separate pass 2018-02-25 22:32:28 -05:00
util util: Move qemu-log to utils 2018-02-25 22:17:44 -05:00
aarch64.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
aarch64eb.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
accel.c accel: make configure_accelerator return void 2018-02-24 00:31:28 -05:00
arm.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
armeb.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
CODING_STYLE import 2015-08-21 15:04:50 +08:00
configure configure: Enable -Werror for MinGW builds, too 2018-02-24 18:56:05 -05:00
COPYING import 2015-08-21 15:04:50 +08:00
COPYING.LIB import 2015-08-21 15:04:50 +08:00
cpu-exec-common.c cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc() 2018-02-24 17:25:28 -05:00
cpu-exec.c tb hash: hash phys_pc, pc, and flags with xxhash 2018-02-24 18:00:14 -05:00
cpus.c cpu: move exec-all.h inclusion out of cpu.h 2018-02-24 02:39:08 -05:00
cputlb.c cputlb: Add address parameter to VICTIM_TLB_HIT 2018-02-25 03:03:36 -05:00
exec.c exec: avoid realloc in phys_map_node_reserve 2018-02-25 19:32:40 -05:00
gen_all_header.sh arm64eb: add support for ARM64 big endian. 2017-04-24 23:30:01 +08:00
glib_compat.c qapi: Fix memleak in string visitors on int lists 2018-02-25 00:20:34 -05:00
HACKING import 2015-08-21 15:04:50 +08:00
header_gen.py memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
ioport.c hw: remove pio_addr_t 2018-02-24 02:43:16 -05:00
LICENSE import 2015-08-21 15:04:50 +08:00
m68k.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
Makefile Makefile: Add a FORCE target 2018-02-24 17:03:51 -05:00
Makefile.objs util: Move qemu-log to utils 2018-02-25 22:17:44 -05:00
Makefile.target tcg: split tcg_op_defs to -common 2018-02-17 15:23:51 -05:00
memory.c memory: Don't use memcpy for ram_device regions 2018-02-25 23:06:36 -05:00
memory_mapping.c include/qemu/osdep.h: Don't include qapi/error.h 2018-02-21 23:08:18 -05:00
mips.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
mips64.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
mips64el.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
mipsel.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
powerpc.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
qapi-schema.json qapi: Lazy creation of array types 2018-02-19 18:55:35 -05:00
qemu-timer.c all: Clean up includes 2018-02-19 01:34:28 -05:00
rules.mak Makefile: add dependency on scripts/create_config 2018-02-24 17:05:03 -05:00
softmmu_template.h cputlb: Fix for self-modifying writes across page boundaries 2018-02-25 03:12:11 -05:00
sparc.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
sparc64.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00
tcg-runtime.c all: Clean up includes 2018-02-19 01:34:28 -05:00
translate-all.c translate-all: Fix user-mode self-modifying code in 2 page long TB 2018-02-25 03:14:22 -05:00
translate-all.h user-exec: Push resume-from-signal code out to handle_cpu_signal() 2018-02-24 17:21:06 -05:00
translate-common.c exec: Clean up includes 2018-02-19 00:49:55 -05:00
unicorn_common.h qom/cpu: Add MemoryRegion property 2018-02-18 21:54:50 -05:00
VERSION import 2015-08-21 15:04:50 +08:00
vl.c hw: explicitly include qemu/log.h 2018-02-24 02:00:45 -05:00
vl.h import 2015-08-21 15:04:50 +08:00
x86_64.h memory: Replace skip_dump flag with ram_device 2018-02-25 23:00:45 -05:00