Commit graph

100 commits

Author SHA1 Message Date
Lioncash 3fcc644fab
target/i386: Correct X86_CPU macro parameters in x86_cpu_handle_mmu_fault() in helper.c 2018-03-15 23:03:27 -04:00
Brijesh Singh 3a5e6fb750
cpu/i386: populate CPUID 0x8000_001F when SEV is active
When SEV is enabled, CPUID 0x8000_001F should provide additional
information regarding the feature (such as which page table bit is used
to mark the pages as encrypted etc).

The details for memory encryption CPUID is available in AMD APM
(https://support.amd.com/TechDocs/24594.pdf) Section E.4.17

Backports relevant parts of commit 6cb8f2a663a47c6e0da17fc4fb9e06abfda2bd48 from qemu
2018-03-15 16:34:39 -04:00
Liran Alon c1587be44c
KVM: x86: Add support for save/load MSR_SMI_COUNT
This MSR returns the number of #SMIs that occurred on
CPU since boot.

KVM commit 52797bf9a875 ("KVM: x86: Add emulation of MSR_SMI_COUNT")
introduced support for emulating this MSR.

This commit adds support for QEMU to save/load this
MSR for migration purposes.

Backports the relevant parts of commit e13713db5b609d9a83c9cfc8ba389d4215d4ba29 from qemu
2018-03-15 16:29:34 -04:00
Chao Peng 58fb5ce47d
i386: Add support to get/set/migrate Intel Processor Trace feature
Add Intel Processor Trace related definition. It also add
corresponding part to kvm_get/set_msr and vmstate.

Backports commit b77146e9a129bcdb60edc23639211679ae846a92 from qemu
2018-03-15 16:03:25 -04:00
Chao Peng 9b86780b9e
i386: Add Intel Processor Trace feature support
Expose Intel Processor Trace feature to guest.

To make Intel PT live migration safe and get same CPUID information
with same CPU model on diffrent host. CPUID[14] is constant in this
patch. Intel PT use EPT is first supported in IceLake, the CPUID[14]
get on this machine as default value. Intel PT would be disabled
if any machine don't support this minial feature list.

Backports commit e37a5c7fa459558b5020588994707fe3fdd6616e from qemu
2018-03-15 16:02:41 -04:00
Wanpeng Li de4b198371
target-i386: add KVM_HINTS_DEDICATED performance hint
Add KVM_HINTS_DEDICATED performance hint, guest checks this feature bit
to determine if they run on dedicated vCPUs, allowing optimizations such
as usage of qspinlocks.

Backports commit be7773268d98176489483a315d3e2323cb0615b9 from qemu
2018-03-15 15:58:59 -04:00
Richard Henderson cd538f0b7e
tcg: Initialize cpu_env generically
This is identical for each target. So, move the initialization to
common code. Move the variable itself out of tcg_ctx and name it
cpu_env to minimize changes within targets.

This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
since there are no longer global-register temps created by targets.

Backports commit 1c2adb958fc07e5b3e81ed21b801c04a15f41f4f from qemu
2018-03-15 15:49:19 -04:00
Emilio G. Cota 078c9e7e3b
tcg: take tb_ctx out of TCGContext
Groundwork for supporting multiple TCG contexts.

Backports commit 44ded3d04821bec57407cc26a8b4db620da2be04 from qemu
2018-03-14 09:18:12 -04:00
Emilio G. Cota f7c984d21f
translate-all: use a binary search tree to track TBs in TBContext
This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.

For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.

The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. However,
the code does not assume this to always be the case because this
behaviour is not guaranteed in the glib docs. However, we embed
this knowledge in the code as a branch hint for the compiler.

Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.

Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:

Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel img/arm/aarch32-current-linux-kernel-only.img \
-append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

- Before:

8048.598422 task-clock (msec) # 0.931 CPUs utilized ( +- 0.28% )
16,974 context-switches # 0.002 M/sec ( +- 0.12% )
0 cpu-migrations # 0.000 K/sec
10,125 page-faults # 0.001 M/sec ( +- 1.23% )
35,144,901,879 cycles # 4.367 GHz ( +- 0.14% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
65,758,252,643 instructions # 1.87 insns per cycle ( +- 0.33% )
10,871,298,668 branches # 1350.707 M/sec ( +- 0.41% )
192,322,212 branch-misses # 1.77% of all branches ( +- 0.32% )

8.640869419 seconds time elapsed ( +- 0.57% )

- After:
8146.242027 task-clock (msec) # 0.923 CPUs utilized ( +- 1.23% )
17,016 context-switches # 0.002 M/sec ( +- 0.40% )
0 cpu-migrations # 0.000 K/sec
18,769 page-faults # 0.002 M/sec ( +- 0.45% )
35,660,956,120 cycles # 4.378 GHz ( +- 1.22% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
65,095,366,607 instructions # 1.83 insns per cycle ( +- 1.73% )
10,803,480,261 branches # 1326.192 M/sec ( +- 1.95% )
195,601,289 branch-misses # 1.81% of all branches ( +- 0.39% )

8.828660235 seconds time elapsed ( +- 0.38% )

Backports commit 2ac01d6dafabd4a726254eea98824c798d416ee4 from qemu
2018-03-13 16:18:29 -04:00
Emilio G. Cota 5c1dbf456b
target/i386: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

Backports commit b5e3b4c2aca8eb5a9cfeedfb273af623f17c3731 from qemu
2018-03-13 15:10:44 -04:00
Emilio G. Cota c384da2f47
tcg: convert tb->cflags reads to tb_cflags(tb)
Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.

Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.

Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:

FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \
accel/tcg/translator.c | cut -f1 -d':' | sort | uniq)

perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES

Then manually fixed the few errors that checkpatch reported.

Compile-tested for all targets.

Backports commit c5a49c63fa26e8825ad101dfe86339ae4c216539 from qemu
2018-03-13 14:57:51 -04:00
Lioncash e9d9ed5eaa
target/i386/bpt_helper: Perform comparison pass with qemu
Keep formatting and code in sync where applicable
2018-03-12 13:28:50 -04:00
Lioncash fc7eaf7f77
target/i386/svm_helper: Perform comparison pass with qemu
Keep code and formatting in sync where applicable
2018-03-12 13:27:03 -04:00
Lioncash 27c283bb3c
target/i386/smm_helper: Perform comparison pass with qemu
Ensure code and formatting stay in sync where relevant
2018-03-12 13:25:37 -04:00
Lioncash 73426a7e79
target/i386/seg_helper: Perform comparison pass against qemu
Ensure formatting and code stay in sync where relevant
2018-03-12 13:24:36 -04:00
Lioncash a1910954cd
target/i386/mem_helper: Perform comparison pass against qemu
Ensure formatting and relevant code are in order
2018-03-12 13:19:05 -04:00
Lioncash 995ae229a3
target/i386/excp_helper: remove unnecessary comment 2018-03-12 13:16:53 -04:00
Lioncash c1e72be68d
target/i386/fpu_helper: Perform comparison pass against qemu 2018-03-12 13:15:51 -04:00
Lioncash 0d0dd2ba98
target/i386/translate: Perform comparison pass against qemu
Ensure code and formatting match qemu where applicable
2018-03-12 13:12:01 -04:00
Eduardo Habkost 12acb995fa
pc: Don't use QEMUMachine anymore
Now that we have a DEFINE_PC_MACHINE helper macro that just requires an
initialization function, it is trivial to convert them to register a QOM
machine class directly, instead of using QEMUMachine.

Backports commit 865906f7fdadd2732441ab158787f81f6a212bfe from qemu
2018-03-09 14:22:43 -05:00
Richard Henderson 7e327aaf84
util: Introduce include/qemu/cpuid.h
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.

Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.

Backports commit 5dd8990841a9e331d9d4838a116291698208cbb6 from qemu
2018-03-09 12:12:00 -05:00
Alex Bennée 0eee5afd0e
target/*/cpu.h: remove softfloat.h
As cpu.h is another typically widely included file which doesn't need
full access to the softfloat API we can remove the includes from here
as well. Where they do need types it's typically for float_status and
the rounding modes so we move that to softfloat-types.h as well.

As a result of not having softfloat in every cpu.h call we now need to
add it to various helpers that do need the full softfloat.h
definitions.

Backports commit 24f91e81b65fcdd0552d1f0fcb0ea7cfe3829c19 from qemu
2018-03-08 09:58:47 -05:00
Markus Armbruster 6a71ff06ca
Include qapi/qmp/qdict.h exactly where needed
This cleanup makes the number of objects depending on qapi/qmp/qdict.h
drop from 4550 (out of 4743) to 368 in my "build everything" tree.
For qapi/qmp/qobject.h, the number drops from 4552 to 390.

While there, separate #include from file comment with a blank line.

Backports commit 452fcdbc49c59884c8c284268d64baa24fea11e1 from qemu
2018-03-08 08:51:46 -05:00
Markus Armbruster 5d554fefeb
Include qapi/error.h exactly where needed
This cleanup makes the number of objects depending on qapi/error.h
drop from 1910 (out of 4743) to 1612 in my "build everything" tree.

While there, separate #include from file comment with a blank line,
and drop a useless comment on why qemu/osdep.h is included first.

Backports commit e688df6bc4549f28534cdb001f168b8caae55b0c from qemu
2018-03-07 12:26:38 -05:00
Markus Armbruster aa7a707738
Drop superfluous includes of qapi-types.h and test-qapi-types.h
Backports commit 522ece32d214bd4b086821c4350c2aebe5587878 from qemu
2018-03-07 12:21:43 -05:00
Lioncash 6cbcf9ce76
unicorn/i386: Lessen amount of X86_CPU macros and casts
Reduces the amount of line noise
2018-03-07 10:34:00 -05:00
Laurent Vivier 0aecb15f3b
accel/tcg: add size paremeter in tlb_fill()
The MC68040 MMU provides the size of the access that
triggers the page fault.

This size is set in the Special Status Word which
is written in the stack frame of the access fault
exception.

So we need the size in m68k_cpu_unassigned_access() and
m68k_cpu_handle_mmu_fault().

To be able to do that, this patch modifies the prototype of
handle_mmu_fault handler, tlb_fill() and probe_write().
do_unassigned_access() already includes a size parameter.

This patch also updates handle_mmu_fault handlers and
tlb_fill() of all targets (only parameter, no code change).

Backports commit 98670d47cd8d63a529ff230fd39ddaa186156f8c from qemu
2018-03-06 10:56:34 -05:00
Richard Henderson 5f074f09ab
tcg: Remove TCGV_UNUSED* and TCGV_IS_UNUSED*
These are now trivial sets and tests against NULL. Unwrap.

Backports commit f764718d0cb30af9f1f8e1d6a33622cc05ca4155 from qemu
2018-03-05 15:58:15 -05:00
Alex Bennée 8e973e762d
target/*helper: don't check retaddr before calling cpu_restore_state
cpu_restore_state officially supports being passed an address it can't
resolve the state for. As a result the checks in the helpers are
superfluous and can be removed. This makes the code consistent with
other users of cpu_restore_state.

Of course this does nothing to address what to do if cpu_restore_state
can't resolve the state but so far it seems this is handled elsewhere.

The change was made with included coccinelle script.

Backports commit 65255e8efdd5fca602bcc4ff61a879939ff75f4f from qemu
2018-03-05 14:47:41 -05:00
Peter Xu 1bb34aadf9
cpu: refactor cpu_address_space_init()
Normally we create an address space for that CPU and pass that address
space into the function. Let's just do it inside to unify address space
creations. It'll simplify my next patch to rename those address spaces.

Backports commit 80ceb07a83375e3a0091591f96bd47bce2f640ce from qemu
2018-03-05 14:39:25 -05:00
Stefan Weil 55b19c099e
target/i386: Fix compiler warnings
These gcc warnings are fixed:

target/i386/translate.c:4461:12: warning:
variable 'prefixes' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
target/i386/translate.c:4466:9: warning:
variable 'rex_w' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
target/i386/translate.c:4466:16: warning:
variable 'rex_r' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]

Tested with x86_64-w64-mingw32-gcc from Debian stretch.

Backports commit a4926d99129a1d8072fc4681cd4efdb214f65ed4 from qemu
2018-03-05 14:20:36 -05:00
Yang Zhong 258b885b17
x86/cpu: Enable new SSE/AVX/AVX512 cpu features
Intel IceLake cpu has added new cpu features,AVX512_VBMI2/GFNI/
VAES/VPCLMULQDQ/AVX512_VNNI/AVX512_BITALG. Those new cpu features
need expose to guest VM.

The bit definition:
CPUID.(EAX=7,ECX=0):ECX[bit 06] AVX512_VBMI2
CPUID.(EAX=7,ECX=0):ECX[bit 08] GFNI
CPUID.(EAX=7,ECX=0):ECX[bit 09] VAES
CPUID.(EAX=7,ECX=0):ECX[bit 10] VPCLMULQDQ
CPUID.(EAX=7,ECX=0):ECX[bit 11] AVX512_VNNI
CPUID.(EAX=7,ECX=0):ECX[bit 12] AVX512_BITALG

The release document ref below link:
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf

Backports commit aff9e6e46a343e1404498be4edd03db1112f0950 from qemu
2018-03-05 14:19:37 -05:00
Eduardo Habkost 64a535ea8c
i386: Add EPYC-IBPB CPU model
EPYC-IBPB is a copy of the EPYC CPU model with
just CPUID_8000_0008_EBX_IBPB added.

Backports commit 8ebfafa796ca0cb2b035a7f06f836a675d8b48be from qemu
2018-03-05 13:48:30 -05:00
Eduardo Habkost 676409d54e
i386: Add new -IBRS versions of Intel CPU models
The new MSR IA32_SPEC_CTRL MSR was introduced by a recent Intel
microcode updated and can be used by OSes to mitigate
CVE-2017-5715. Unfortunately we can't change the existing CPU
models without breaking existing setups, so users need to
explicitly update their VM configuration to use the new *-IBRS
CPU model if they want to expose IBRS to guests.

The new CPU models are simple copies of the existing CPU models,
with just CPUID_7_0_EDX_SPEC_CTRL added and model_id updated.

Backports commit 61efbbf869293f1deb9ee39d44bd4e635de59fa7 from qemu
2018-03-05 13:48:30 -05:00
Eduardo Habkost 6fbabb4bce
i386: Add FEAT_8000_0008_EBX CPUID feature word
Add the new feature word and the "ibpb" feature flag.

Based on a patch by Paolo Bonzini.

Backports commit 1ade973f5202404e772aae7b1acd331270d246dc from qemu
2018-03-05 13:48:30 -05:00
Eduardo Habkost 7bee95ad28
i386: Add spec-ctrl CPUID bit
Add the feature name and a CPUID_7_0_EDX_SPEC_CTRL macro.

Backports commit 803d42fa65a371f7bb13180a5953299dc3a160e0 from qemu
2018-03-05 13:48:30 -05:00
Paolo Bonzini 7cf98b1a6e
i386: Add support for SPEC_CTRL MSR
Backports commit cb2637a5ae1f4e61af423395300548f14e8a2e2a from qemu
2018-03-05 13:48:29 -05:00
Eduardo Habkost 181524d695
i386: Change X86CPUDefinition::model_id to const char*
It is valid to have a 48-character model ID on CPUID, however the
definition of X86CPUDefinition::model_id is char[48], which can
make the compiler drop the null terminator from the string.

If a CPU model happens to have 48 bytes on model_id, "-cpu help"
will print garbage and the object_property_set_str() call at
x86_cpu_load_def() will read data outside the model_id array.

We could increase the array size to 49, but this would mean the
compiler would not issue a warning if a 49-char string is used by
mistake for model_id.

To make things simpler, simply change model_id to be const char*,
and validate the string length using an assert() on
x86_register_cpudef_type().

Backports commit 4b220d88ba76fb2623ce4b8ba1f1eea66b82144e from qemu
2018-03-05 13:48:29 -05:00
Peter Maydell d89704eb0f
target/i386: Fix handling of VEX prefixes
In commit e3af7c788b73a6495eb9d94992ef11f6ad6f3c56 we
replaced direct calls to to cpu_ld*_code() with calls
to the x86_ld*_code() wrappers which incorporate an
advance of s->pc. Unfortunately we didn't notice that
in one place the old code was deliberately not incrementing
s->pc:

@@ -4501,7 +4528,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
static const int pp_prefix[4] = {
0, PREFIX_DATA, PREFIX_REPZ, PREFIX_REPNZ
};
- int vex3, vex2 = cpu_ldub_code(env, s->pc);
+ int vex3, vex2 = x86_ldub_code(env, s);

if (!CODE64(s) && (vex2 & 0xc0) != 0xc0) {
/* 4.1.4.6: In 32-bit mode, bits [7:6] must be 11b,

This meant we were mishandling this set of instructions.
Remove the manual advance of s->pc for the "is VEX" case
(which is now done by x86_ldub_code()) and instead rewind
PC in the case where we decide that this isn't really VEX.

Backports commit 817a9fcba8043faa467929e7b0193df6bdc92211 from qemu
2018-03-05 13:48:29 -05:00
Wanpeng Li 72430e4d75
target-i386: adds PV_TLB_FLUSH CPUID feature bit
Adds PV_TLB_FLUSH CPUID feature bit.

Backports commit 6976af663d3a19d1f86a56b5504f1b4559d0f1ae from qemu
2018-03-05 13:48:28 -05:00
Richard Henderson 28061c2e59
qom: Introduce CPUClass.tcg_initialize
Move target cpu tcg initialization to common code,
called from cpu_exec_realizefn.

Backports commit 55c3ceef61fcf06fc98ddc752b7cce788ce7680b from qemu
2018-03-05 09:49:26 -05:00
Richard Henderson 4d9c8583fa
tcg: Remove TCGV_EQUAL*
When we used structures for TCGv_*, we needed a macro in order to
perform a comparison. Now that we use pointers, this is just clutter

Backports commit 11f4e8f8bfaa2caaab24bef6bbbb8a0205015119 from qemu
2018-03-05 09:16:07 -05:00
Richard Henderson eb488f5bd6
tcg: Merge opcode arguments into TCGOp
Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries. The result is actually a bit
smaller and should have slightly more cache locality.

Backports commit 75e8b9b7aa0b95a761b9add7e2f09248b101a392 from qemu
2018-03-05 04:45:20 -05:00
Paolo Bonzini 7dd4afd8d9
target/i386: trap on instructions longer than >15 bytes
Besides being more correct, arbitrarily long instruction allow the
generation of a translation block that spans three pages. This
confuses the generator and even allows ring 3 code to poison the
translation block cache and inject code into other processes that are
in guest ring 3.

This is an improved (and more invasive) fix for commit 30663fd ("tcg/i386:
Check the size of instruction being translated", 2017-03-24). In addition
to being more precise (and generating the right exception, which is #GP
rather than #UD), it distinguishes better between page faults and too long
instructions, as shown by this test case:

int main()
{
char *x = mmap(NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0);
memset(x, 0x66, 4096);
x[4096] = 0x90;
x[4097] = 0xc3;
char *i = x + 4096 - 15;
mprotect(x + 4096, 4096, PROT_READ|PROT_WRITE);
((void(*)(void)) i) ();
}

... which produces a #GP without the mprotect, and a #PF with it.

Backports commit b066c5375737ad0d630196dab2a2b329515a1d00 from qemu
2018-03-05 04:12:28 -05:00
Paolo Bonzini 44f87a8fbf
target/i386: introduce x86_ld*_code
These take care of advancing s->pc, and will provide a unified point
where to check for the 15-byte instruction length limit.

Backports commit e3af7c788b73a6495eb9d94992ef11f6ad6f3c56 from qemu
2018-03-05 04:11:02 -05:00
Emilio G. Cota 5fae6dd433
tcg: remove addr argument from lookup_tb_ptr
It is unlikely that we will ever want to call this helper passing
an argument other than the current PC. So just remove the argument,
and use the pc we already get from cpu_get_tb_cpu_state.

This change paves the way to having a common "tb_lookup" function.

Backports commit 7f11636dbee89b0e4d03e9e2b96e14649a7db778 from qemu
2018-03-05 02:16:34 -05:00
Todd Eisenberger 75bdfd85a7
x86: Correct translation of some rdgsbase and wrgsbase encodings
It looks like there was a transcription error when writing this code
initially. The code previously only decoded src or dst of rax. This
resolves
https://bugs.launchpad.net/qemu/+bug/1719984.

Backports commit e0dd5fd41a1a38766009f442967fab700d2d0550 from qemu
2018-03-05 02:05:26 -05:00
Gonglei ce71be5c05
i386/cpu/hyperv: support over 64 vcpus for windows guests
Starting with Windows Server 2012 and Windows 8, if
CPUID.40000005.EAX contains a value of -1, Windows assumes specific
limit to the number of VPs. In this case, Windows Server 2012
guest VMs may use more than 64 VPs, up to the maximum supported
number of processors applicable to the specific Windows
version being used.

https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs

For compatibility, Let's introduce a new property for X86CPU,
named "x-hv-max-vps" as Eduardo's suggestion, and set it
to 0x40 before machine 2.10.

(The "x-" prefix indicates that the property is not supposed to
be a stable user interface.)

Backports relevant parts of commit 6c69dfb67e84747cf071958594d939e845dfcc0c from qemu
2018-03-05 00:00:53 -05:00
Joseph Myers a237d9dbca
target/i386: fix phminposuw in-place operation
The SSE4.1 phminposuw instruction finds the minimum 16-bit element in
the source vector, putting the value of that element in the low 16
bits of the destination vector, the index of that element in the next
three bits and zeroing the rest of the destination. The helper for
this operation fills the destination from high to low, meaning that
when the source and destination are the same register, the minimum
source element can be overwritten before it is copied to the
destination. This patch fixes it to fill the destination from low to
high instead, so the minimum source element is always copied first.
This fixes one gcc test failure in my GCC 6-based testing (and so
concludes the present sequence of patches, as I don't have any further
gcc test failures left in that testing that I attribute to QEMU bugs).

Backports commit aa406feadfc5b095ca147ec56d6187c64be015a7 from qemu
2018-03-04 23:59:26 -05:00
Joseph Myers 85b647e486
target/i386: fix pcmpxstrx substring search
One of the cases of the SSE4.2 pcmpestri / pcmpestrm / pcmpistri /
pcmpistrm instructions does a substring search. The implementation of
this case in the pcmpxstrx helper is incorrect. The operation in this
case is a search for a string (argument d to the helper) in another
string (argument s to the helper); if a copy of d at a particular
position would run off the end of s, the resulting output bit should
be 0 whether or not the strings match in the region where they
overlap, but the QEMU implementation was wrongly comparing only up to
the point where s ends and counting it as a match if an initial
segment of d matched a terminal segment of s. Here, "run off the end
of s" means that some byte of d would overlap some byte outside of s;
thus, if d has zero length, it is considered to match everywhere,
including after the end of s. This patch fixes the implementation to
correspond with the proper instruction semantics. This fixes four gcc
test failures in my GCC 6-based testing.

Backports commit ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60 from qemu
2018-03-04 23:58:45 -05:00