Commit graph

52 commits

Author SHA1 Message Date
Richard Henderson 67f0af4282
tcg/aarch64: Allow immediates for vector ORR and BIC
The allows immediates to be used for ORR and BIC,
as well as the trivial inversions, ORC and AND.

Backports commit 9e27f58b9902834dffc0d66d9eb62f78d9c2a632 from qemu
2019-05-24 18:47:07 -04:00
Richard Henderson 5ecfba4fe6
tcg/aarch64: Build vector immediates with two insns
Use MOVI+ORR or MVNI+BIC in order to build some vector constants,
as opposed to dropping them to the constant pool. This includes
all 16-bit constants and a similar set of 32-bit constants.

Backports commit 02f3a5b4744885258758d07ebe09cf965de78bcf from qemu
2019-05-24 18:43:54 -04:00
Richard Henderson 06058ef648
tcg/aarch64: Use MVNI in tcg_out_dupi_vec
The compliment of a subset of immediates can be computed
with a single instruction.

Backports commit 7e308e003e5b6ddd3130e09711e1d33693230696 from qemu
2019-05-24 18:42:40 -04:00
Richard Henderson c18ec586dc
tcg/aarch64: Split up is_fimm
There are several sub-classes of vector immediate, and only MOVI
can use them all. This will enable usage of MVNI and ORRI, which
use progressively fewer sub-classes.

This patch adds no new functionality, merely splits the function
and moves part of the logic into tcg_out_dupi_vec.

Backports commit 984fdcee342473dfe797897758929dad654693c8 from qemu
2019-05-24 18:41:37 -04:00
Richard Henderson 0ea4c05dc3
tcg/aarch64: Support vector bitwise select value
The instruction set has 3 insns that perform the same operation,
only varying in which operand must overlap the destination. We
can represent the operation without overlap and choose based on
the operands seen.

Backports commit a9e434a5dc16f71ee156428619fc3c3765b68f26 from qemu
2019-05-24 18:38:37 -04:00
Richard Henderson de260cfbd6
tcg/aarch64: Do not advertise minmax for MO_64
The min/max instructions are not available for 64-bit elements.

Backports commit a7b6d286cfb5205b9f5330aefc5727269b3d810f from qemu
2019-05-16 16:44:34 -04:00
Richard Henderson 7c9b3a9021
tcg/aarch64: Support vector absolute value
Backports commit a456394ae540f852cd0d10fd693fe9f33598dc01 from qemu
2019-05-16 16:39:14 -04:00
Richard Henderson 0217ee7b24
tcg/aarch64: Support vector variable shift opcodes
Backports commit 79525dfd08262d8de10d271f17e5a4096ef96d16 from qemu
2019-05-16 15:58:54 -04:00
Richard Henderson 66e6bea084
tcg: Add INDEX_op_dupm_vec
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first. This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.

VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.

Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.

Backports commit 37ee55a081b7863ffab2151068dd1b2f11376914 from qemu
2019-05-16 15:38:02 -04:00
Richard Henderson fd7a67e4a7
tcg/aarch64: Implement tcg_out_dupm_vec
The LD1R instruction does all the work. Note that the only
useful addressing mode is a base register with no offset.

Backports commit f23e5e15edfd49d5dd72cab2ed2d85ac354b2eeb from qemu
2019-05-16 15:29:04 -04:00
Richard Henderson d4e7c6a8c5
tcg: Add tcg_out_dupm_vec to the backend interface
Currently stubbed out in all backends that support vectors.

Backports commit d6ecb4a978b718dbe108a9fa9ecccc8b7f7cb579 from qemu
2019-05-16 15:24:48 -04:00
Richard Henderson cf238d3544
tcg: Manually expand INDEX_op_dup_vec
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.

Backports commit bab1671f0fa928fd678a22f934739f06fd5fd035 from qemu
2019-05-16 15:22:29 -04:00
Richard Henderson 3d20e1678c
tcg: Promote tcg_out_{dup,dupi}_vec to backend interface
The i386 backend already has these functions, and the aarch64 backend
could easily split out one. Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.

Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.

Backports commit e7632cfa8b76cdbbc1c76e8737338ef5844e7d60 from qemu
2019-05-16 15:18:48 -04:00
Richard Henderson f86bd1c5d6
tcg: Return bool success from tcg_out_mov
This patch merely changes the interface, aborting on all failures,
of which there are currently none.

Backports commit 78113e83e0007e869c9f0cb4c0497a77538988e3 from qemu
2019-05-16 15:14:42 -04:00
Richard Henderson 6145e3fdd7
tcg: Restart TB generation after out-of-line ldst overflow
This is part c of relocation overflow handling.

Backports commit aeee05f53a5d67304a521d2644dc0a607e3c8b28 from qemu
2019-04-30 10:06:53 -04:00
Richard Henderson 2479bbd3b2
tcg/aarch64: Support INDEX_op_extract2_{i32,i64}
Backports commit 464c2969d5d7a0a5d38d2aa5d930986df876d3fb from qemu
2019-04-30 09:40:40 -04:00
Richard Henderson 3f0781e39b
tcg/aarch64: Implement vector minmax arithmetic
Backports commit 93f332a50371936ea02392bdb748c8140ef3f06a from qemu
2019-01-29 16:44:09 -05:00
Richard Henderson 8a012c3929
tcg/aarch64: Implement vector saturating arithmetic
Backports commit d32648d445c534cea7e2ad7ed8608208aa8831c1 from qemu
2019-01-29 16:42:50 -05:00
Richard Henderson a22387f919
tcg/aarch64: Return false on failure from patch_reloc
This does require an extra two checks within the slow paths
to replace the assert that we're moving.

Backports commit 214bfe83d5a5af70bac2b8d0bd649b018c33c03b from qemu
2018-12-18 05:28:45 -05:00
Richard Henderson 46189d87b3
tcg: Return success from patch_reloc
This will move the assert for success from within (subroutines of)
patch_reloc into the callers. It will also let new code do something
different when a relocation is out of range.

For the moment, all backends are trivially converted to return true.

Backports commit 6ac1778676f4259c10b0629ccd9df319a5d1baeb from qemu
2018-12-18 05:25:45 -05:00
Richard Henderson 0a8bc142d3
tcg/aarch64: Fold away noaddr branch routines
There are one use apiece for these. There is no longer a need for
preserving branch offset operands, as we no longer re-translate.

Backports commit 733589b3382afcb0ae9f43e72e083a5ddd38abd5 from qemu
2018-12-18 05:15:41 -05:00
Richard Henderson cbe1065e83
tcg/aarch64: Remove reloc_pc26_atomic
It is unused since b68686bd4bfeb70040b4099df993dfa0b4f37b03.

Backports commit 90d6cb781130891f96eb54f8315e29fbd4e99a71 from qemu
2018-12-18 05:14:22 -05:00
Alex Bennée 11948dd1cc
tcg/aarch64: limit mul_vec size
In AdvSIMD we can only do 32x32 integer multiples although SVE is
capable of larger 64 bit multiples. As a result we can end up
generating invalid opcodes. Fix this by only reprting we can emit
mul vector ops if the size is small enough.

Fixes a crash on:

sve-all-short-v8.3+sve@vq3/insn_mul_z_zi___INC.risu.bin

When running on AArch64 hardware.

Backports commit e65a5f227d77a5dbae7a7123c3ee915ee4bd80cf from qemu
2018-07-21 14:15:59 -04:00
Richard Henderson d1da0b8f6d
tcg/aarch64: Add vector operations
Backports commit 14e4c1e2355473ccb2939afc69ac8f25de103b92 from qemu
2018-03-07 08:07:58 -05:00
Richard Henderson 47ed20fdd4
tcg/aarch64: Fully convert tcg_target_op_def
Backports commit 1897cc2eb8be2d8be23380b45a2d3c1a2808723f from qemu
2018-03-04 23:46:38 -05:00
Richard Henderson fc8b4316a9
tcg: Remove tcg_regset_set32
It's not even clear what the interface REG and VAL32 were supposed to mean.
All uses had REG = 0 and VAL32 was the bitset assigned to the destination.

Backports commit f46934df662182097dce07d57ec00f37e4d2abf1 from qemu
2018-03-04 23:42:59 -05:00
Richard Henderson 49d09d6888
tcg: Remove tcg_regset_clear
Backports commit ccb1bb66ea2a42e773bfa04178d8b383ff86d4d8 from qemu
2018-03-04 23:24:45 -05:00
Richard Henderson 0c3781e7eb
tcg/aarch64: Use constant pool for movi
Backports commit 55129955e92ec164ee2d778f20070dc214109bc6 from qemu
2018-03-04 22:46:50 -05:00
Richard Henderson f96514a99c
tcg: Rearrange ldst label tracking
Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer. Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c.

Backports commit 659ef5cbb893872d25e9d95191cc23b16546c8a1 from qemu
2018-03-04 22:13:13 -05:00
Richard Henderson 31b8b67cd3
tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.

While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.

Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.

Backports commit a85833933628384d74ec412024d55cf012640287 from qemu
2018-03-04 21:52:35 -05:00
Pranith Kumar 57f8eec080
tcg/aarch64: Enable indirect jump path using LDR (literal)
This patch enables the indirect jump path using an LDR (literal)
instruction. It will be interesting to test and see which performs
better among the two paths.

Backports commit 2acee8b2b5e6bba2935bb6ce5be92d0f0f9799cb from qemu
2018-03-03 22:03:39 -05:00
Pranith Kumar 5e9e39cafd
tcg/aarch64: Use ADRP+ADD to compute target address
We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction which is used to align the above
instruction pair so that we can use one atomic instruction to patch
the destination offsets.

Backports commit b68686bd4bfeb70040b4099df993dfa0b4f37b03 from qemu
2018-03-03 22:01:38 -05:00
Pranith Kumar 0998ba8259
tcg/aarch64: Introduce and use long branch to register
We can use a branch to register instruction for exit_tb for offsets
greater than 128MB.

Backports commit 23b7aa1d2af04ba57cc94f74d9f0ab25dce72fa0 from qemu
2018-03-03 21:59:58 -05:00
Richard Henderson 9a85cb0a26
tcg/aarch64: Use ADR in tcg_out_movi
The new placement of the TB means that we can use one insn
to load the return value for exit_tb returning the TB pointer.

Backports commit cc74d332ff9a78684374847375ef63fc4bd10436 from qemu
2018-03-03 17:09:42 -05:00
Richard Henderson 81f1aae572
tcg/aarch64: Implement goto_ptr
Measurements:

SPECint06 (test set), x86_64-linux-user. Host: APM 64-bit ARMv8 (Atlas/A57) @ 2.4 GHz

1.45x +-+-------------------------------------------------------------------------------------------------------------+-+
| ***** |
| +++ * * +goto-ptr |
1.4x +-+...*****............................*...*....................................................................+-+
| *+++* * * +++ |
1.35x +-+...*...*............................*...*...........................*****....................................+-+
| * * * * *+++* |
| * * * * * * |
1.3x +-+...*...*............................*...*...........................*...*....................................+-+
| * * * * * * |
| * * * * * * ***** |
1.25x +-+...*...*...........*****............*...*...........................*...*............*****...*...*...........+-+
| * * * * * * * * *+++* * * |
1.2x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...........+-+
| * * * * * * * * * * * * |
| * * * * * * * * * * * * ***** |
1.15x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...*...*...+-+
| * * * * * * * * +++ * * * * * * |
| * * * * * * * * ***** * * * * * * |
1.1x +-+...*...*...........*...*....*****...*...*...*****...................*...*...*...*....*...*...*...*...*...*...+-+
| * * * * * * * * * * * * * * * * * * * * |
1.05x +-+...*...*...........*...*....*...*...*...*...*...*...................*...*...*...*....*...*...*...*...*...*...+-+
| * * ***** * * * * * * * * * * * * * * * * * * |
| * * * * * * * * * * * * ***** ***** * * * * * * * * * * |
1x +-+---*****---*****---*****----*****---*****---*****---*****---*****---*****---*****----*****---*****---*****---+-+
astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjenxalancbmk hmean
png: http://imgur.com/en9HE8L

Backports commit b19f0c2e7d344d4d62daf554951acdb6c94a34b0 from qemu
2018-03-03 14:13:09 -05:00
Pranith Kumar ee609fa59f
aarch64: Change ext type to TCGType to fix warnings
To fix the following warnings:

In file included from /users/pranith/qemu/tcg/tcg.c:255:
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:879:24: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
[-Wenum-conversion]
tcg_out_cmp(s, ext, a, b, b_const);
~~~~~~~~~~~ ^~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:893:36: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
[-Wenum-conversion]
tcg_out_insn(s, 3201, CBZ, ext, a, offset);
~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:389:65: note: expanded from macro 'tcg_out_insn'
glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
^
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:895:37: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
[-Wenum-conversion]
tcg_out_insn(s, 3201, CBNZ, ext, a, offset);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:389:65: note: expanded from macro 'tcg_out_insn'
glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
^
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:1610:27: warning: implicit conversion from enumeration type 'TCGType' (aka 'enum TCGType') to different enumeration type 'TCGMemOp' (aka 'enum TCGMemOp')
[-Wenum-conversion]
tcg_out_brcond(s, ext, a2, a0, a1, const_args[1], arg_label(args[3]));
~~~~~~~~~~~~~~ ^~~

backports commit dc1eccd661ada3b746ca4438e444993c36a0f04f from qemu
2018-03-02 10:48:56 -05:00
Richard Henderson 2b87ddda35
tcg/aarch64: Handle ctz and clz opcodes
Backports commit 53c76c19904983d2c81e4f5e77027c241918a479 from qemu
2018-03-01 16:19:34 -05:00
Richard Henderson 3f38611159
tcg: Pass the opcode width to target_parse_constraint
This will let us choose how to interpret a given constraint
depending on whether the opcode is 32- or 64-bit. Which will
let us share more constraint combinations between opcodes.

At the same time, change the interface to return the advanced
pointer instead of passing it in/out by reference.

Backports commit 069ea736b50b75fdec99c9b8cc603b97bd98419e from qemu
2018-03-01 15:45:40 -05:00
Richard Henderson b8c93597b4
tcg: Transition flat op_defs array to a target callback
This will allow the target to tailor the constraints to the
auto-detected ISA extensions.

Backports commit f69d277ece43c42c7ab0144c2ff05ba740f6706b from qemu
2018-03-01 15:40:11 -05:00
Richard Henderson fbea4130fc
tcg/aarch64: Implement field extraction opcodes
Backports commit e2179f94a17bf0933df29ce1b4f6bc93cbe7dbd3 from qemu
2018-03-01 13:30:55 -05:00
Richard Henderson 6820964e2f
tcg/aarch64: Fix tcg_out_movi
There were some patterns, like 0x0000_ffff_ffff_00ff, for which we
would select to begin a multi-insn sequence with MOVN, but would
fail to set the 0x0000 lane back from 0xffff.

Backports commit 50b468d42107a2c646b1c566ed17d9ec362c51c4 from qemu
2018-03-01 09:15:34 -05:00
Richard Henderson a03666f2f2
tcg/aarch64: Fix addsub2 for 0+C
When al == xzr, we cannot use addi/subi because that encodes xsp.
Force a zero into the temp register for that (rare) case.

Backports commit 028fbea47713f909d6ea761a457779a82b276247 from qemu
2018-03-01 09:13:54 -05:00
Pranith Kumar 907060b865
tcg/aarch64: Add support for fence
Backports commit c7a59c2a92592e556b9361437c9c4229917bd1e3 from qemu
2018-02-26 03:11:03 -05:00
Richard Henderson 91f5cf0417
tcg: Support arbitrary size + alignment
Previously we allowed fully unaligned operations, but not operations
that are aligned but with less alignment than the operation size.

In addition, arm32, ia64, mips, and sparc had been omitted from the
previous overalignment patch, which would have led to that alignment
being enforced.

Backports commit 85aa80813dd9f5c1f581c743e45678a3bee220f8 from qemu
2018-02-26 02:47:26 -05:00
Sergey Sorokin e4d123caa9
tcg: Improve the alignment check infrastructure
Some architectures (e.g. ARMv8) need the address which is aligned
to a size more than the size of the memory access.
To support such check it's enough the current costless alignment
check implementation in QEMU, but we need to support
an alignment size specifying.

Backports commit 1f00b27f17518a1bcb4cedca49eaec96a4d560bd from qemu
2018-02-25 02:23:28 -05:00
Richard Henderson 23586e2674
tcg: Optimize spills of constants
While we can store constants via constrants on INDEX_op_st_i32 et al,
we weren't able to spill constants to backing store.

Add a new backend interface, tcg_out_sti, which may store the constant
(and is allowed to fail). Rearrange the temp_* helpers so that we only
attempt to directly store a constant when the temp is becoming dead/free.

Backports commit 59d7c14eeff8d2ad7f61aed86ce5a176113bc153 from qemu
2018-02-25 01:45:29 -05:00
Sergey Fedorov e60c24cecf
tcg: Clean up direct block chaining data fields
Briefly describe in a comment how direct block chaining is done. It
should help in understanding of the following data fields.

Rename some fields in TranslationBlock and TCGContext structures to
better reflect their purpose (dropping excessive 'tb_' prefix in
TranslationBlock but keeping it in TCGContext):
tb_next_offset => jmp_reset_offset
tb_jmp_offset => jmp_insn_offset
tb_next => jmp_target_addr
jmp_next => jmp_list_next
jmp_first => jmp_list_first

Avoid using a magic constant as an invalid offset which is used to
indicate that there's no n-th jump generated.

Backports commit f309101c26b59641fc1aa8fb2a98a5441cdaea03 from qemu
2018-02-23 21:28:19 -05:00
Sergey Fedorov a45f8cb49d
tcg/aarch64: Make direct jump patching thread-safe
Ensure direct jump patching in AArch64 is atomic by using
atomic_read()/atomic_set() for code patching.

Backports commit 9e269112953be4d670cb0d25042bd6546fcf3e45 from qemu
2018-02-23 21:28:18 -05:00
Aurelien Jarno 6060ab6596
tcg: check for CONFIG_DEBUG_TCG instead of NDEBUG
Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code.

Backports commit 8d8fdbae010aa75a23f0307172e81034125aba6e from qemu
2018-02-23 13:55:21 -05:00
Aurelien Jarno 355ed7cd08
tcg: use tcg_debug_assert instead of assert (fix performance regression)
The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why asserts that can be enabled with the
--enable-debug-tcg configure option.

This used to work the following way:

| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include <assert.h>

Since commit 757e725b (tcg: Clean up includes) "config.h" as been
replaced by "qemu/osdep.h" which itself includes <assert.h>. As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.

tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already
uses in some places. This patch replaces all the calls to assert into
calss to tcg_debug_assert.

Backports commit eabb7b91b36b202b4dac2df2d59d698e3aff197a from qemu
2018-02-23 13:52:13 -05:00