Previously, the low 4 bits were used for TCG_CALL_TYPE_MASK,
which was removed in 6a18ae2d2947532d5c26439548afa0481c4529f9.
Backports commit 3b50352b05eeafeb95cccd770f7aaba00bbdf6fe from qemu
Backporting 6fa2cef205a60b5c5c3b058f53852416b885c455 by Thomas Huth
started invoking assertions on clang. This means Unicorn is doing
something silly. This should be tracked down, but in the meantime,
restore behavior to allow tests to still be run.
For now, defined universally as true, since we previously required
backends to implement swapped memory operations. Future patches
may now remove that support where it is onerous.
Backports commit e1dcf3529d0797b25bb49a20e94b62eb93e7276a from qemu
Somehow we forgot these operations, once upon a time.
This will allow immediate stores to have their bswap
optimized away.
Backports commit 6498594c8eda83c5f5915afc34bd03396f8de6df from qemu
Based on the only current user, Sparc:
New code uses 2 constants that take 2 insns to load from constant pool,
plus 13. Old code used 6 constants that took 1 or 2 insns to create,
plus 21. The result is a new total of 17 vs an old total of 29.
Backports commit 9e821eab0ab708add35fa0446d880086e845ee3e from qemu
Based on the only current user, Sparc:
New code uses 1 constant that takes 2 insns to create, plus 8.
Old code used 2 constants that took 2 insns to create, plus 9.
The result is a new total of 10 vs an old total of 13.
Backports commit a686dc71d89b1d7934becd95c843aa1375cdb7e7 from qemu
We now have an invariant that all TCG_TYPE_I32 values are
zero-extended, which means that we do not need to extend
them again during qemu_ld/st, either explicitly via a separate
tcg_out_ext32u or implicitly via P_ADDR32.
Backports commit 4810d96f03be4d3820563e3c6bf13dfc0627f205 from qemu
This preserves the invariant that all TCG_TYPE_I32 values are
zero-extended in the 64-bit host register.
Backports commit 75478279a0c1eafc7b69d5382356da138f58f1bd from qemu
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.
Backports commit 3dbc8c61de4e0d0a2afe0897cda7ab28cd37a164 from qemu
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.
Backports commit 1d21d95b6101786d44d3b4a12400eb80a1ecc647 from qemu
This does require an extra two checks within the slow paths
to replace the assert that we're moving. Also add two checks
within existing functions that lacked any kind of assert for
out of range branch.
Backports commit 55dfd8fedceb1311d9cdded1a0f94b2da91a387d from qemu
The reloc_pc{14,24}_val routines retain their asserts.
Use these directly within the slow paths.
Backports commit d5132903518fadad579ef2de9e45fce98eefaa63 from qemu
This does require an extra two checks within the slow paths
to replace the assert that we're moving.
Backports commit 43fabd30e2f411e8d70ff347902a7c8ed308233e from qemu
This does require an extra two checks within the slow paths
to replace the assert that we're moving.
Backports commit 214bfe83d5a5af70bac2b8d0bd649b018c33c03b from qemu
This will move the assert for success from within (subroutines of)
patch_reloc into the callers. It will also let new code do something
different when a relocation is out of range.
For the moment, all backends are trivially converted to return true.
Backports commit 6ac1778676f4259c10b0629ccd9df319a5d1baeb from qemu
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.
Backports commit 8c1b079279fadaee10dc39ca9a58c4c91c7a1854 from qemu
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.
Backports commit 791645f0227c9d52ce5fe1ad6e1cda55a9bfe633 from qemu
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.
Backports commit 3661612fc3e4b65be03482bf6bafd116101881e1 from qemu
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.
Backports commit f9c7246faa279237200a2a53beacaa8100ea1900 from qemu
There are one use apiece for these. There is no longer a need for
preserving branch offset operands, as we no longer re-translate.
Backports commit 37ee93a974c49ab9edfcd1db0aad3838b0395b14 from qemu
There are one use apiece for these. There is no longer a need for
preserving branch offset operands, as we no longer re-translate.
Backports commit 733589b3382afcb0ae9f43e72e083a5ddd38abd5 from qemu
For x86_64, this can remove a REX prefix resulting in smaller code
when manipulating globals of type i32, as we move them between backing
store via cpu_env, aka TCG_AREG0.
Backports commit 5740d9f714835964873325d1210b26811252843f from qemu
Partially reverts ab20bdc1162. The 14-bit displacement that we
allowed to reach the constant pool is not always sufficient.
Retain the tb-relative addressing, as that is how most return
values from the tb are computed.
Backports commit f6823cbe3787aa47db62deede6683077e3da9a2c from qemu
Both GCC v4.8 and Clang v3.4 (our minimum versions) support
__builtin_unreachable(), so we can remove the version check here now.
Backports commit 6fa2cef205a60b5c5c3b058f53852416b885c455 from qemu
The tcg-op.h header was missing the usual guard against multiple
inclusion; add it.
(Spotted by lgtm.com's static analyzer.)
Backports commit a7ce790a029bd94eb320d8c69f38900f5233997e from qemu
Define and initialize the 16 MXU registers - 15 general computational
register, and 1 control register). There is also a zero register, but
it does not have any corresponding variable.
Backports commit eb5559f67dc8dc12335dd996877bb6daaea32eb2 from qemu.
GCC7+ will no longer advertise support for 16-byte __atomic operations
if only cmpxchg is supported, as for x86_64. Fortunately, x86_64 still
has support for __sync_compare_and_swap_16 and we can make use of that.
AArch64 does not have, nor ever has had such support, so open-code it.
Backports commit e6cd4bb59b8154fa00da611200beef7eb4e8ec56 from qemu
We forgot to initialize n in commit 15fa08f845 ("tcg: Dynamically
allocate TCGOps", 2017-12-29).
Backports commit c1f543b739086733024e31d74a52d9e41553f316 from qemu
Rather than test NOCHAIN before linking, do not emit the
goto_tb opcode at all. We already do this for goto_ptr.
Backports commit d7f425fdea991f052241c6479acd9feae834063b from qemu
The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers.
This was defined as no-op for 32-bit x86, with the assumption that we have
eight registers anyway. This assumption is not true once we have xmm regs.
Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes
and have overflown into other opcode fields, wreaking havoc.
To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
Fixes: 770c2fc7bb ("Add vector operations")
Backports commit 93bf9a42733321fb632bcb9eafd049ef0e3d9417 from qemu
When host vector registers and operations were introduced, I failed
to mark the registers call clobbered as required by the ABI.
Fixes: 770c2fc7bb7
Backports commit 672189cd586ea38a2c1d8ab91eb1f9dcff5ceb05 from qemu
In AdvSIMD we can only do 32x32 integer multiples although SVE is
capable of larger 64 bit multiples. As a result we can end up
generating invalid opcodes. Fix this by only reprting we can emit
mul vector ops if the size is small enough.
Fixes a crash on:
sve-all-short-v8.3+sve@vq3/insn_mul_z_zi___INC.risu.bin
When running on AArch64 hardware.
Backports commit e65a5f227d77a5dbae7a7123c3ee915ee4bd80cf from qemu
Normally this is automatic in the size restrictions that are placed
on vector sizes coming from the implementation. However, for the
legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final
24 bytes of the vector register. Without this check, do_dup selects
TCG_TYPE_V128 and clears only 16 bytes.
Backports commit 499748d7683198a765d17b4fdf6901ab9dca920c from qemu
The assembler in most versions of Mac OS X is pretty old and does not
support the xgetbv instruction. To go around this problem, the raw
encoding of the instruction is used instead.
Backports commit 1019242af11400252f6735ca71a35f81ac23a66d from qemu
Do the cast to uintptr_t within the helper, so that the compiler
can type check the pointer argument. We can also do some more
sanity checking of the index argument.
Backports commit 07ea28b41830f946de3841b0ac61a3413679feb9 from qemu
Given that this atomic operation will be used by both risc-v
and aarch64, let's not duplicate code across the two targets.
Backports commit 5507c2bf35aa6b4705939349184e71afd5e058b2 from qemu
These operations are re-invented by several targets so far.
Several supported hosts have insns for these, so place the
expanders out-of-line for a future introduction of tcg opcodes.
Backports commit b87fb8cd9f9a0ba599ff79e7bf03222da02e5724 from qemu
In 6001f7729e12 we partially attempt to address the branch
displacement overflow caused by 15fa08f845.
However, gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c
is a testcase that contains a TB so large as to overflow anyway.
The limit here of 8000 ops produces a maximum output TB size of
24112 bytes on a ppc64le host with that test case. This is still
much less than the maximum forward branch distance of 32764 bytes.
Backports commit abebf92597186be2bc48d487235da28b1127860f from qemu
The VPUNPCKLD* instructions are all "non-destructive source",
indicated by "NDS" in the encoding string in the x86 ISA manual.
This means that they take two source operands, one of which is
encoded in the VEX.vvvv field. We were incorrectly treating them
as if they were destructive-source and passing 0 as the 'v'
argument of tcg_out_vex_modrm(). This meant we were always
using %xmm0 as one of the source operands, causing incorrect
results if the register allocator happened to want to use
something else. For instance the input AArch64 insn:
DUP v26.16b, w21
which becomes TCG IR ops:
dup_vec v128,e8,tmp2,x21
st_vec v128,e8,tmp2,env,$0xa40
was assembled to:
0x607c568c: c4 c1 7a 7e 86 e8 00 00 vmovq 0xe8(%r14), %xmm0
0x607c5694: 00
0x607c5695: c5 f9 60 c8 vpunpcklbw %xmm0, %xmm0, %xmm1
0x607c5699: c5 f9 61 c9 vpunpcklwd %xmm1, %xmm0, %xmm1
0x607c569d: c5 f9 70 c9 00 vpshufd $0, %xmm1, %xmm1
0x607c56a2: c4 c1 7a 7f 8e 40 0a 00 vmovdqu %xmm1, 0xa40(%r14)
0x607c56aa: 00
when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1".
This resulted in our incorrectly setting the output vector to
q26=0000320000003200:0000320000003200
when given an input of x21 == 0000000002803200
rather than the expected all-zeroes.
Pass the correct source register number to tcg_out_vex_modrm()
for these insns.
Backports commit 7eb30ef0ba2eb59e7430d4848ae8d4bf4e50f768 from qemu
ppc64 uses a BC instruction to call the tcg_out_qemu_ld/st
slow path. BC instruction uses a relative address encoded
on 14 bits.
The slow path functions are added at the end of the generated
instructions buffer, in the reverse order of the callers.
So more we have slow path functions more the distance between
the caller (BC) and the function increases.
This patch changes the behavior to generate the functions in
the same order of the callers.
Backports commit 6001f7729e12dd1d810291e4cbf83cee8e07441d from qemu
Drop TCGV_PTR_TO_NAT and TCGV_NAT_TO_PTR internal macros.
Add tcg_temp_local_new_ptr, tcg_gen_brcondi_ptr, tcg_gen_ext_i32_ptr,
tcg_gen_trunc_i64_ptr, tcg_gen_extu_ptr_i64, tcg_gen_trunc_ptr_i32.
Use inlines instead of macros where possible.
Backports commit 5bfa803448638a45542441fd6b7cc1241403ea72 from qemu
In db432672, we allow wide inputs for operations such as add.
However, in 212be173 and 3774030a we didn't do the same for
compare and multiply.
Backports commit 9a938d86b04025ac605db0ea9819e5896bf576ec from qemu
I found with qemu 2.11.x or newer that I would get an illegal instruction
error running some Intel binaries on my ARM chromebook. On investigation,
I found it was quitting on memory barriers.
qemu instruction:
mb $0x31
was translating as:
0x604050cc: 5bf07ff5 blpl #0x600250a8
After patch it gives:
0x604050cc: f57ff05b dmb ish
In short, I found INSN_DMB_ISH (memory barrier for ARMv7) appeared to be
correct based on online docs, but due to some endian-related shenanigans it
had to be byte-swapped to suit qemu; it appears INSN_DMB_MCR (memory
barrier for ARMv6) also should be byte swapped (and this patch does so).
I have not checked for correctness of aarch64's barrier instruction.
Backports commit 3f814b803797c007abfe5c4041de754e01723031 from qemu
The MIPS TCG target makes the assumption that the offset from the
target env pointer to the tlb_table is less than about 64K. This
used to be true, but gradual addition of features to the Arm
target means that it's no longer true there. This results in
the build-time assertion failing:
In file included from /home/pm215/qemu/include/qemu/osdep.h:36:0,
from /home/pm215/qemu/tcg/tcg.c:28:
/home/pm215/qemu/tcg/mips/tcg-target.inc.c: In function ‘tcg_out_tlb_load’:
/home/pm215/qemu/include/qemu/compiler.h:90:36: error: static assertion failed: "not expecting: offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1]) > 0x7ff0 + 0x7fff"
^
/home/pm215/qemu/include/qemu/compiler.h:98:30: note: in expansion of macro ‘QEMU_BUILD_BUG_MSG’
^
/home/pm215/qemu/tcg/mips/tcg-target.inc.c:1236:9: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
QEMU_BUILD_BUG_ON(offsetof(CPUArchState,
^
/home/pm215/qemu/rules.mak:66: recipe for target 'tcg/tcg.o' failed
An ideal long term approach would be to rearrange the CPU state
so that the tlb_table was not so far along it, but this is tricky
because it would move it from the "not cleared on CPU reset" part
of the struct to the "cleared on CPU reset" part. As a simple fix
for the 2.12 release, make the MIPS TCG target handle an arbitrary
offset by emitting more add instructions. This will mean an extra
instruction in the fastpath for TCG loads and stores for the
affected guests (currently just aarch64-softmmu)
Backports commit 161dfd1e7fad1203840c0390f235030eba3fd23c from qemu
The parameters for tcg_gen_insn_start are target_ulong, which may be split
into two TCGArg parameters for storage in the opcode on 32-bit hosts.
Fixes the ARM target and its direct use of tcg_set_insn_param, which would
set the wrong argument in the 64-on-32 case.
Backports commit 9743cd5736263e90d312b2c33bd739ffe1eae70d from qemu
Failure to do so results in the tcg optimizer sign-extending
any constant fold from 32-bits. This turns out to be visible
in the RISC-V testsuite using a host that emits these opcodes
(e.g. any non-x86_64).
Backports commit f2f1dde75160cac6ede330f3db50dc817d01a2d6 from qemu
This unifies 5 copies of checks for supported vector size,
and in the process fixes a missing check in tcg_gen_gvec_2s.
This lead to an assertion failure for 64-bit vector multiply,
which is not available in the AVX instruction set.
Bakports commit adb196cbd5cff26547bc32a208074f03f4c4a627 from qemu
Unknown why -m32 was passing with gcc but not clang; it should have
failed for both. This would be used for tcg_gen_dup_i64_vec, and
visible with the right TB and an aarch64 guest.
Backports commit 7f34ed4bcdfda55f978f51aadca64aa970c9f4b6 from qemu
We already handle this in the backends, and the lifetime datum
for the TCGOp is already large enough.
Backports commit 1df3caa946e08b387511dfba3a37d78910e51796 from qemu
move tcg-runtime.c, translate-all.(ch) and translate-common.c into
accel/tcg/ subdirectory and updated related trace-events file.
Backports commit 244f144134d0dd182f1af8654e7f9a79fe770368 and applies
relevant changes made in db432672dc50ed86dda17ac821b7eb07411a90af and
d9bb58e51068dfc48746c6af0179926c8dc05bce from qemu
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.
Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.
Backports commit 5dd8990841a9e331d9d4838a116291698208cbb6 from qemu
The x86 vector instruction set is extremely irregular. With newer
editions, Intel has filled in some of the blanks. However, we don't
get many 64-bit operations until SSE4.2, introduced in 2009.
The subsequent edition was for AVX1, introduced in 2011, which added
three-operand addressing, and adjusts how all instructions should be
encoded.
Given the relatively narrow 2 year window between possible to support
and desirable to support, and to vastly simplify code maintainence,
I am only planning to support AVX1 and later cpus.
Backports commit 770c2fc7bb70804ae9869995fd02dadd6d7656ac from qemu
Trivial move and constant propagation. Some identity and constant
function folding, but nothing that requires knowledge of the size
of the vector element.
Backports commit 170ba88f45bd7b1c5593021ed8e174f663b0bd1a from qemu
Use dup to convert a non-constant scalar to a third vector.
Add addition, multiplication, and logical operations with an immediate.
Add addition, subtraction, multiplication, and logical operations with
a non-constant scalar. Allow for the front-end to build operations in
which the scalar operand comes first.
Backports commit 22fc3527034678489ec554e82fd52f8a7f05418e from qemu
No vector ops as yet. SSE only has direct support for 8- and 16-bit
saturation; handling 32- and 64-bit saturation is much more expensive.
Backports commit f49b12c6e6a75a5bd109bcbbda072b24e5fb8dfd from qemu
Opcodes are added for scalar and vector shifts, but considering the
varied semantics of these do not expose them to the front ends. Do
go ahead and provide them in case they are needed for backend expansion.
Backports commit d0ec97967f940bbc11dced83422b39c224127f1e from qemu
Some functions use intN_t arguments, some use uintN_t, some just
used "unsigned". To aid putting function pointers in tables, we
need consistency.
Backports commit 474b2e8f0f765515515b495e6872b5e18a660baf from qemu
The code sequence we were generating was only good for unsigned
comparisons. For signed comparisions, use the sequence from gcc.
Fixes booting of ppc64 firmware, with a patch changing the code
sequence for ppc comparisons.
Backports commit 7170ac33135e6ecf89752d3949bcecf9b9766d1c from qemu
We had two fields specific to INDEX_op_call. Rename these and
add some macros so that the fields may be reused for other opcodes.
Backports commit cd9090aa9dbba30db8aec9a2fc103aaf1ab0f5a7 from qemu
With no fixed array allocation, we can't overflow a buffer.
This will be important as optimizations related to host vectors
may expand the number of ops used.
Use QTAILQ to link the ops together.
Backports commit 15fa08f8451babc88d733bd411d4c94976f9d0f8 from qemu
Rather than have separate code only used for guest_base,
rely on a recent change to handle constant pool entries.
Backports commit ba2c747992f8c315c2fbddba196ce9137430d61d from qemu
Both ARMv6 and AArch64 currently may drop complex guest_base values
into the constant pool. But generic code wasn't expecting that, and
the pool is not emitted. Correct that.
Backports commit 5b38ee31616d1532c3c3a6dc644a9160d608ed2f from qemu
Using the offset of a temporary, relative to TCGContext, rather than
its index means that we don't use 0. That leaves offset 0 free for
a NULL representation without having to leave index 0 unused.
Backports commit e89b28a63501c0ad6d2501fe851d0c5202055e70 from qemu
When we used structures for TCGv_*, we needed a macro in order to
perform a comparison. Now that we use pointers, this is just clutter
Backports commit 11f4e8f8bfaa2caaab24bef6bbbb8a0205015119 from qemu
The GET and MAKE functions weren't really specific enough.
We now have a full complement of functions that convert exactly
between temporaries, arguments, tcgv pointers, and indices.
The target/sparc change is also a bug fix, which would have affected
a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64.
Backports commit dc41aa7d34989b552efe712ffe184236216f960b from qemu
Transform TCGv_* to an "argument" or a temporary.
For now, an argument is simply the temporary index.
Backports commit ae8b75dc6ec808378487064922f25f1e7ea7a9be from qemu
While we're touching many of the lines anyway, adjust the naming
of the functions to better distinguish when "TCGArg" vs "TCGTemp"
should be used.
Backports commit 6349039d0b06eda59820629b934944246b14a1c1 from qemu
Copy s->nb_globals or s->nb_temps to a local variable for the purposes
of iteration. This should allow the compiler to use low-overhead
looping constructs on some hosts.
Backports commit ac3b88911ebc6fc841f28898ee8aed40839debe2 from qemu
Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries. The result is actually a bit
smaller and should have slightly more cache locality.
Backports commit 75e8b9b7aa0b95a761b9add7e2f09248b101a392 from qemu
In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.
Backports commit e7e168f41364c6e83d0f75fc1b3ce7f9c41ccf76 from qemu
It is unlikely that we will ever want to call this helper passing
an argument other than the current PC. So just remove the argument,
and use the pc we already get from cpu_get_tb_cpu_state.
This change paves the way to having a common "tb_lookup" function.
Backports commit 7f11636dbee89b0e4d03e9e2b96e14649a7db778 from qemu
It's not even clear what the interface REG and VAL32 were supposed to mean.
All uses had REG = 0 and VAL32 was the bitset assigned to the destination.
Backports commit f46934df662182097dce07d57ec00f37e4d2abf1 from qemu
We are not going to use ldrd for loading the comparator
for 32-bit guests, so don't limit cmp_off to 8 bits then.
This eliminates one insn in the tlb load for some guests.
Backports commit 95ede84f4de18747d03d79c148013cff99acd60b from qemu
Use UBFX to avoid limitation on CPU_TLB_BITS. Since we're dropping
the initial shift, we need to replace the page masking. We can use
MOVW+BIC to do this without shifting. The result is the same size
as the armv6 path with one less conditional instruction.
Backports commit 647ab96aaf5defeb138e48d610f7f633c587b40d from qemu
Split out maybe_out_small_movi for use with other operations
that want to add to the constant pool.
Backports commit 28eef8aaece5e83df4568d9842ab9611ec130b2c from qemu
We were passing in -2 instead of +2, but then ignoring
the actual contents of addend in the calculation.
Backports commit e692a3492d04500355bcf23575eed7cf137b38d5 from qemu
Already it saves 2 bytes per call, but also the constant pool
entry may well be shared across multiple calls.
Backports commit 4e45f23943c0bb91588627de3801826546155ad8 from qemu
A new shared header tcg-pool.inc.c adds new_pool_label,
for registering a tcg_target_ulong to be emitted after
the generated code, plus relocation data to install a
pointer to the data.
A new pointer is added to the TCGContext, so that we
dump the constant pool as data, not code.
Backports commit 57a269469dbf70013dab3a176e1735636010a772 from qemu
Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer. Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c.
Backports commit 659ef5cbb893872d25e9d95191cc23b16546c8a1 from qemu
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.
While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.
Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.
Backports commit a85833933628384d74ec412024d55cf012640287 from qemu
This allows LOAD HALFWORD IMMEDIATE ON CONDITION,
eliminating one insn in some common cases.
Backports commit 7af525af01b9615c4f4df5da2e8a50f2fe00b023 from qemu
Currently, we cannot use mttcg for running strong memory model guests
on weak memory model hosts due to missing ordering semantics.
We implicitly generate fence instructions for stronger guests if an
ordering mismatch is detected. We generate fences only for the orders
for which fence instructions are necessary, for example a fence is not
necessary between a store and a subsequent load on x86 since its
absence in the guest binary tells that ordering need not be
ensured. Also note that if we find multiple subsequent fence
instructions in the generated IR, we combine them in the TCG
optimization pass.
This patch allows us to boot an x86 guest on ARM64 hosts using mttcg.
Backports commit b32dc3370a666e237b2099c22166b15e58cb6df8 from qemu
For a 64-bit ILP32 host, aligning to sizeof(long) is not enough.
Guess the minimum for any host is 8, as that covers uint64_t.
Qemu doesn't use a host long double or host vectors, except in
extremely limited circumstances.
Fixes a bus error for a sparc v8plus host.
Backports commit 13aaef678ed377b12b76dc7fb9e615b2f2f9047b from qemu
Patch 85aa80813dd changed the IF emitting the TST instruction,
but failed to change the ?: converting CMP to CMPEQ, so the
result of the TST is ignored.
Backports commit ca671de8af96798e0f493378240034620a3a04ee from qemu
Reserve a register for the guest_base using ppc code for reference.
By doing so, we do not have to recompute it for every memory load.
Backports commit 4df9cac57f5220c17d856292e90fce455f708421 from qemu
When running a helloworld program with qemu-i386 in linux-user
mode on Loongson 3A3000, it will crash. This patch fix the bug.
Backports commit 8b8d768f19037a825a0bc81654492caa7c8fab8b from qemu
This patch enables the indirect jump path using an LDR (literal)
instruction. It will be interesting to test and see which performs
better among the two paths.
Backports commit 2acee8b2b5e6bba2935bb6ce5be92d0f0f9799cb from qemu
We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction which is used to align the above
instruction pair so that we can use one atomic instruction to patch
the destination offsets.
Backports commit b68686bd4bfeb70040b4099df993dfa0b4f37b03 from qemu
We can use a branch to register instruction for exit_tb for offsets
greater than 128MB.
Backports commit 23b7aa1d2af04ba57cc94f74d9f0ab25dce72fa0 from qemu
Coldfire uses float64, but 680x0 use floatx80.
This patch introduces the use of floatx80 internally
and enables 680x0 80bits FPU.
Backports commit f83311e4764f1f25a8abdec2b32c64483be1759b from qemu
The new placement of the TB means that we can use one insn
to load the goto_tb destination directly from the TB.
Backports commit 308714e6bc945389c64faf1b9213e2c0d3f03391 from qemu
The new placement of the TB means that we can use one insn
to load the return value for exit_tb returning the TB pointer.
Backports commit cc74d332ff9a78684374847375ef63fc4bd10436 from qemu
Allocating an arbitrarily-sized array of tbs results in either
(a) a lot of memory wasted or (b) unnecessary flushes of the code
cache when we run out of TB structs in the array.
An obvious solution would be to just malloc a TB struct when needed,
and keep the TB array as an array of pointers (recall that tb_find_pc()
needs the TB array to run in O(log n)).
Perhaps a better solution, which is implemented in this patch, is to
allocate TB's right before the translated code they describe. This
results in some memory waste due to padding to have code and TBs in
separate cache lines--for instance, I measured 4.7% of padding in the
used portion of code_gen_buffer when booting aarch64 Linux on a
host with 64-byte cache lines. However, it can allow for optimizations
in some host architectures, since TCG backends could safely assume that
the TB and the corresponding translated code are very close to each
other in memory. See this message by rth for a detailed explanation:
https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
Subject: Re: GSoC 2017 Proposal: TCG performance enhancements
Backports commit 6e3b2bfd6af488a896f7936e99ef160f8f37e6f2 from qemu
In theory this would re-enable usage of QEMU on an armv4 host.
Whether this is worthwhile is debatable -- we've been unconditionally
issuing the armv5t BX instruction in the prologue since 2011 without
complaint. Possibly we should simply require an armv6 host.
Backports commit 702a947484eb3e615183dafc93de590ab0679f60 from qemu
Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.
Backports commit cedbcb01529cb6cf9a2289cdbebbc63f6149fc18 from qemu
Users of tcg_gen_atomic_cmpxchg and do_atomic_op rightfully utilize
the output. Even though this code is dead, it gets translated, and
without the initialization we encounter a tcg_error.
Backports commit 79b1af906245558c30e0a5faf26cb52b63f83cce from qemu
We already require gcc 4.1 or newer (for the atomic
support), so the fallback codepaths for older gcc
versions than that are now dead code and we can
just delete them.
NB: clang reports itself as gcc 4.2 (regardless of
clang version), so clang won't be using the fallbacks
either.
Backports commit fa54abb8c298f892639ffc4bc2f61448ac3be4a1 from qemu
The C store helper functions take the address argument as a
target_ulong type; if this is 32 bit but the host is 64 bit
then the SPARC calling convention requires that the caller
must zero extend the value. We weren't doing this, which
meant we could pass values to the caller with high bits set
and QEMU would crash if it was compiled with optimizations.
In particular, the i386 BIOS would not start.
Backports commit 5c32be5baf41aec4f4675d2bf24f9948756abf3c from qemu
The C store helper functions take the data argument as a uint8_t,
uint16_t, etc depending on the store size. The SPARC calling
convention requires that data types smaller than the register
size must be extended by the caller. We weren't doing this,
which meant that if QEMU was compiled with optimizations enabled
we could end up storing incorrect values to guest memory.
(In particular the i386 guest BIOS would crash on startup.)
Add code to the trampolines that call the store helpers to
do the zero extension as required.
Backports commit 709a340d679d95a0c6cbb9b5f654498f04345b50 from qemu
To fix the following warnings:
In file included from /users/pranith/qemu/tcg/tcg.c:255:
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:879:24: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
[-Wenum-conversion]
tcg_out_cmp(s, ext, a, b, b_const);
~~~~~~~~~~~ ^~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:893:36: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
[-Wenum-conversion]
tcg_out_insn(s, 3201, CBZ, ext, a, offset);
~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:389:65: note: expanded from macro 'tcg_out_insn'
glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
^
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:895:37: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
[-Wenum-conversion]
tcg_out_insn(s, 3201, CBNZ, ext, a, offset);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:389:65: note: expanded from macro 'tcg_out_insn'
glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
^
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:1610:27: warning: implicit conversion from enumeration type 'TCGType' (aka 'enum TCGType') to different enumeration type 'TCGMemOp' (aka 'enum TCGMemOp')
[-Wenum-conversion]
tcg_out_brcond(s, ext, a2, a0, a1, const_args[1], arg_label(args[3]));
~~~~~~~~~~~~~~ ^~~
backports commit dc1eccd661ada3b746ca4438e444993c36a0f04f from qemu
This enables the multi-threaded system emulation by default for ARMv7
and ARMv8 guests using the x86_64 TCG backend. This is because on the
guest side:
- The ARM translate.c/translate-64.c have been converted to
- use MTTCG safe atomic primitives
- emit the appropriate barrier ops
- The ARM machine has been updated to
- hold the BQL when modifying shared cross-vCPU state
- defer powerctl changes to async safe work
All the host backends support the barrier and atomic primitives but
need to provide same-or-better support for normal load/store
operations.
Backports commit ca759f9e387db87e1719911f019bc60c74be9ed8 from qemu