Commit graph

5906 commits

Author SHA1 Message Date
Richard Henderson 1e8da15e61 tcg/i386: Tidy register constraint definitions
Create symbolic constants for all low-byte-addressable
and second-byte-addressable registers. Create a symbol
for the registers that need reserving for softmmu.

There is no functional change for 's', as this letter is
only used for i386. The BYTEL name is correct for the
action we wish from the constraint.

Backports df903b94b3c6fa515da7cf2103513ade06ab0d0f
2021-03-04 15:48:12 -05:00
Richard Henderson ea25434061 tcg/i386: Move constraint type check to tcg_target_const_match
Rather than check the type when filling in the constraint,
check it when matching the constant. This removes the only
use of the type argument to target_parse_constraint.

Backports c7c778b5b9b7865a3e7200805ac561c5d334b8d0
2021-03-04 15:46:28 -05:00
Philippe Mathieu-Daudé daafb0ba17 target/arm: Replace magic value by MMU_DATA_LOAD definition
cpu_get_phys_page_debug() uses 'DATA LOAD' MMU access type.

Backports a9dd161ff2f54446f0b0547447d8196699aca3e1
2021-03-04 15:43:47 -05:00
Richard Henderson 2c8f7b1fbc target/arm: Conditionalize DBGDIDR
Only define the register if it exists for the cpu.

Backports 54a78718be6dd5fc6b6201f84bef8de5ac3b3802
2021-03-04 15:42:03 -05:00
Richard Henderson 073923709f target/arm: Implement ID_PFR2
This was defined at some point before ARMv8.4, and will
shortly be used by new processor descriptions.

Backports 1d51bc96cc4a9b2d31a3f4cb8442ce47753088e2
2021-03-04 15:40:49 -05:00
Richard Henderson 5fb8ab10eb tcg: Restart code generation when we run out of temps
Some large translation blocks can generate so many unique
constants that we run out of temps to hold them. In this
case, longjmp back to the start of code generation and
restart with a smaller translation block.

Backports ae30e86661b0f48562cd95918d37cbeec5d0226
2021-03-04 15:37:05 -05:00
Richard Henderson 554a304d3d qemu/compiler: Split out qemu_build_not_reached_always
Provide a symbol that can always be used to signal an error,
regardless of optimization. Usage of this should be protected
by e.g. __builtin_constant_p, which guards for optimization.

Backports c52ea111e0ea2d5368a3ae601baafaae75e3317f
2021-03-04 15:23:27 -05:00
Philippe Mathieu-Daudé d36a968f8e target/arm/m_helper: Silence GCC 10 maybe-uninitialized error
When building with GCC 10.2 configured with --extra-cflags=-Os, we get:

target/arm/m_helper.c: In function ‘arm_v7m_cpu_do_interrupt’:
target/arm/m_helper.c:1811:16: error: ‘restore_s16_s31’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 1811 |     if (restore_s16_s31) {
      |         ^
target/arm/m_helper.c:1350:10: note: ‘restore_s16_s31’ was declared here
 1350 |     bool restore_s16_s31;
      |          ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Initialize the 'restore_s16_s31' variable to silence the warning.
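
A minimal sketch of the shape of the fix; only the initializer mirrors the
change described here, the surrounding logic is invented for illustration:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for the interrupt-entry logic: the flag is only
 * assigned on some paths, so give it a defined initial value up front. */
static bool decide_restore(int lspact, int ts)
{
    bool restore_s16_s31 = false;   /* initialized to silence -Wmaybe-uninitialized */

    if (lspact) {
        restore_s16_s31 = ts != 0;  /* the real computation is more involved */
    }
    return restore_s16_s31;
}

int main(void)
{
    printf("%d\n", decide_restore(1, 1));
    return 0;
}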

Backports 0ae4f11ee57350dac0e705ba79516310400ff43c
2021-03-04 15:16:55 -05:00
Richard Henderson 0636518de4 target/arm: Update REV, PUNPK for pred_desc
Update all users of do_perm_pred2 for the new
predicate descriptor field definitions.

Backports 70acaafef2e053a312d54c09b6721c730690e72c
2021-03-04 15:15:47 -05:00
Richard Henderson eb315be37e target/arm: Update ZIP, UZP, TRN for pred_desc
Update all users of do_perm_pred3 for the new
predicate descriptor field definitions.

Backports f9b0fcceccfc05cde62ff7577fbf2bc13b842414
2021-03-04 15:15:10 -05:00
Richard Henderson fac4e416c9 target/arm: Update PFIRST, PNEXT for pred_desc
These two were odd, in that do_pfirst_pnext passed the
count of 64-bit words rather than bytes. Change to pass
the standard pred_full_reg_size to avoid confusion.

Backports 86300b5d044064046395ae8ed605cc19e63f2a7c
2021-03-04 15:09:47 -05:00
Richard Henderson 4ef4735cd3 target/arm: Introduce PREDDESC field definitions
SVE predicate operations cannot use the "usual" simd_desc
encoding, because the lengths are not a multiple of 8.
But we were abusing the SIMD_* fields to store values anyway.
This abuse broke when SIMD_OPRSZ_BITS was modified in e2e7168a214.

Introduce a new set of field definitions for exclusive use
of predicates, so that it is obvious what kind of predicate
we are manipulating. These will be used in future patches.
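
As a generic illustration of dedicated descriptor fields, here is a sketch of
packing and unpacking parameters in a 32-bit descriptor word; the field names
and widths are invented, not the actual PREDDESC layout:

#include <stdint.h>
#include <stdio.h>

/* Invented field layout: low 6 bits hold the predicate size in bytes,
 * the next 2 bits hold the element size (log2). */
#define SKETCH_OPRSZ_SHIFT 0
#define SKETCH_OPRSZ_BITS  6
#define SKETCH_ESZ_SHIFT   6
#define SKETCH_ESZ_BITS    2

static uint32_t sketch_pack(uint32_t oprsz, uint32_t esz)
{
    return (oprsz << SKETCH_OPRSZ_SHIFT) | (esz << SKETCH_ESZ_SHIFT);
}

static uint32_t sketch_field(uint32_t desc, int shift, int bits)
{
    return (desc >> shift) & ((1u << bits) - 1);
}

int main(void)
{
    uint32_t desc = sketch_pack(33, 2);   /* sizes need not be multiples of 8 */
    printf("oprsz=%u esz=%u\n",
           sketch_field(desc, SKETCH_OPRSZ_SHIFT, SKETCH_OPRSZ_BITS),
           sketch_field(desc, SKETCH_ESZ_SHIFT, SKETCH_ESZ_BITS));
    return 0;
}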

Backports b64ee454a4a086ed459bcda4c0bbb54e197841e4
2021-03-04 15:08:32 -05:00
Rémi Denis-Courmont 9dfa469976 target/arm: refactor vae1_tlbmask()
Backports bc944d3a8b305029196a5e1406702a92fa0b94cf
2021-03-04 15:05:54 -05:00
Rémi Denis-Courmont 8aeaff9385 target/arm: enable Secure EL2 in max CPU
Backports 24179fea7e34c4952d4878ae1b26108ba65e5933
2021-03-04 15:04:43 -05:00
Rémi Denis-Courmont e6d32dc2e0 target/arm: Implement SCR_EL2.EEL2
This adds handling for the SCR_EL3.EEL2 bit.

Backports 926c1b97895879b78ca14bca2831c08740ed1c38
2021-03-04 15:03:08 -05:00
Rémi Denis-Courmont 9690ed8236 target/arm: revector to run-time pick target EL
On ARMv8-A, accesses by 32-bit secure EL1 to monitor registers trap to
the upper (64-bit) EL. With Secure EL2 support, we can no longer assume
that that is always EL3, so make room for the value to be computed at
run-time.

Backports 6b340aeb48e4f7f983e1c38790de65ae93079840
2021-03-04 14:59:14 -05:00
Rémi Denis-Courmont ce8872709f target/arm: set HPFAR_EL2.NS on secure stage 2 faults
Backports 9861248f637ecf11113b04b0b5c7b13c9aa06f09
2021-03-04 14:54:33 -05:00
Rémi Denis-Courmont b49531cfef target/arm: secure stage 2 translation regime
Backports b1a10c868f9b2b09e64009b43450e9a86697d9f3
2021-03-04 14:49:33 -05:00
Rémi Denis-Courmont eeefc3c4a2 target/arm: generalize 2-stage page-walk condition
The stage_1_mmu_idx() already effectively keeps track of which
translation regimes have two stages. Don't hard-code another test.

Backports 7879460a6149ed5e80c29cac85449191d9c5754a
2021-03-04 14:26:22 -05:00
Rémi Denis-Courmont 07ebb7f7ba target/arm: translate NS bit in page-walks
Backports 588c6dd113b27b8db393c7264297b9d33261692e
2021-03-04 14:25:13 -05:00
Rémi Denis-Courmont 6f57520b1d target/arm: do S1_ptw_translate() before address space lookup
In the secure stage 2 translation regime, the VSTCR.SW and VTCR.NSW
bits can invert the secure flag for pagetable walks. This patchset
allows S1_ptw_translate() to change the non-secure bit.

Backports 3d4bd397433b12b148d150c8bc5655a696389bd1
2021-03-04 14:23:43 -05:00
Rémi Denis-Courmont ce50ba6d07 target/arm: handle VMID change in secure state
The VTTBR write callback so far assumes that the underlying VM lies in
non-secure state. This handles the secure state scenario.

Backports c4f060e89effd70ebdb23d3315495d33af377a09
2021-03-04 14:20:47 -05:00
Rémi Denis-Courmont a78c31e36a target/arm: add ARMv8.4-SEL2 system registers
Backports e9152ee91cc39ed8a53d03607e6e980a7e9444e6
2021-03-04 14:20:10 -05:00
Rémi Denis-Courmont edd5f021e6 target/arm: add MMU stage 1 for Secure EL2
This adds the MMU indices for EL2 stage 1 in secure state.

To keep the code contained (it is largely identical between secure
and non-secure modes), the MMU indices are reassigned. The new assignments
provide a systematic pattern with a non-secure bit.

Backports b6ad6062f1e55bd5b9407ce89e55e3a08b83827c
2021-03-04 14:16:31 -05:00
Rémi Denis-Courmont fbdcef3ca5 target/arm: add 64-bit S-EL2 to EL exception table
With the ARMv8.4-SEL2 extension, EL2 is a legal exception level in
secure mode, though it can only be AArch64.

This patch adds the target EL for exceptions from 64-bit S-EL2.

It also fixes the target EL to EL2 when HCR.{A,F,I}MO are set in secure
mode. Those values were never used in practice as the effective value of
HCR was always 0 in secure mode.

Backports 6c85f906261226e87211506bd9f787fd48a09f17
2021-03-04 14:00:23 -05:00
Rémi Denis-Courmont 159043008f target/arm: Define isar_feature function to test for presence of SEL2
Backports 5ca192dfc551c8a40871c4e30a8b8ceb879adc31
2021-03-04 13:58:57 -05:00
Rémi Denis-Courmont b42e6d6036 target/arm: factor MDCR_EL2 common handling
This adds a common helper to compute the effective value of MDCR_EL2.
That is the actual value if EL2 is enabled in the current security
context, or 0 otherwise.

Backports 59dd089cf9e4a9cddee596c8a1378620df51b9bb
2021-03-04 13:57:34 -05:00
Rémi Denis-Courmont b657bfc59b target/arm: use arm_hcr_el2_eff() where applicable
This will simplify accessing HCR conditionally in secure state.

Backports e04a5752cb03e066d7b1e583e340c7982fcd5e4e
2021-03-04 13:53:30 -05:00
Rémi Denis-Courmont 58af3e76e6 target/arm: use arm_is_el2_enabled() where applicable
Do not assume that EL2 is available in and only in non-secure context.
That equivalence is broken by ARMv8.4-SEL2.

Backports e6ef0169264b00cce552404f689ce137018ff290
2021-03-04 13:49:19 -05:00
Rémi Denis-Courmont 7a694223ca target/arm: add arm_is_el2_enabled() helper
This checks if EL2 is enabled (meaning EL2 registers take effects) in
the current security context.

Backports f3ee5160ce3c03795a28e16d1a0b4916a6c959f4
2021-03-04 13:44:04 -05:00
Rémi Denis-Courmont 7402645436 target/arm: remove redundant tests
In this context, the HCR value is the effective value, and thus is
zero in secure mode. The tests for HCR.{F,I}MO are sufficient.

Backports cc974d5cd84ea60a3dad59752aea712f3d47f8ce
2021-03-04 13:42:12 -05:00
Richard Henderson f6973abb3e target/arm: Add cpu properties to control pauth
The crypto overhead of emulating pauth can be significant for
some workloads. Add two boolean properties that allow the
feature to be turned off, on with the architected algorithm,
or on with an implementation defined algorithm.

We need two intermediate booleans to control the state while
parsing properties lest we clobber ID_AA64ISAR1 into an invalid
intermediate state.

Backports relevant members from eb94284d0812b4e7c11c5d075b584100ac1c1b9a
2021-03-04 13:40:27 -05:00
Richard Henderson 0332498752 target/arm: Implement an IMPDEF pauth algorithm
Without hardware acceleration, a cryptographically strong
algorithm is too expensive for pauth_computepac.

Even with hardware accel, we are not currently expecting
to link the linux-user binaries to any crypto libraries,
and doing so would generally make the --static build fail.

So choose XXH64 as a reasonably quick and decent hash.

Backports 283fc52ade85eb50141f3b8b85f82b07d016cb17
2021-03-04 13:38:22 -05:00
Philippe Mathieu-Daudé 296c32a8da decodetree: Open files with encoding='utf-8'
When decodetree.py was added in commit 568ae7efae7, QEMU was
using Python 2 which happily reads UTF-8 files in text mode.
Python 3 requires either UTF-8 locale or an explicit encoding
passed to open(). Now that Python 3 is required, explicitly specify
UTF-8 encoding when opening decodetree source files.

To avoid further problems with the user locale, also explicitly
specify UTF-8 encoding for the generated C files.

Make it explicit that both input and output are plain text by using the 't' mode.

This fixes:

$ /usr/bin/python3 scripts/decodetree.py test.decode
Traceback (most recent call last):
File "scripts/decodetree.py", line 1397, in <module>
main()
File "scripts/decodetree.py", line 1308, in main
parse_file(f, toppat)
File "scripts/decodetree.py", line 994, in parse_file
for line in f:
File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
ordinal not in range(128)

Backports 4cacecaaa2bbf8af0967bd3eee43297fada475a9
2021-03-04 13:34:08 -05:00
Richard Henderson 419941c3d1 tcg: Remove tcg_gen_dup{8,16,32,64}i_vec
These interfaces have been replaced by tcg_gen_dupi_vec
and tcg_constant_vec.

Backports be986adb35e3594b02ee0d7f1cbec96b08bb29b7
2021-03-04 13:32:32 -05:00
Richard Henderson 8975559888 tcg/i386: Use tcg_constant_vec with tcg vec expanders
Backports 9739a052ad313dbc9b1224f91f23f38e692d3f7e
2021-03-04 13:28:55 -05:00
Richard Henderson 1b811b8546 tcg: Add tcg_reg_alloc_dup2
There are several ways we can expand a vector dup of a 64-bit
element on a 32-bit host.

Backports efe86b21ead9b5d256ce90c378e31681c5e243a5
2021-03-04 13:20:00 -05:00
Richard Henderson 471fc98c49 tcg: Remove movi and dupi opcodes
These are now completely covered by mov from a
TYPE_CONST temporary.

Backports c58f4c97b2ad9247c5ee85d625a934370862fba1
2021-03-04 13:14:30 -05:00
Richard Henderson 6e38e5004f tcg: Use tcg_constant_{i32,i64,vec} with gvec expanders
Backports 88d4005b098427638d7551aa04ebde4fdd06835b
2021-03-04 13:05:25 -05:00
Richard Henderson bd7e78fe4b tcg: Use tcg_constant_{i32,i64} with tcg int expanders
Backports 11d11d61bd9e82ac917c8159f6a2b736829231ae
2021-03-04 12:46:13 -05:00
Richard Henderson 6e54b46d28 tcg: Convert tcg_gen_dupi_vec to TCG_CONST
Because we now store uint64_t in TCGTemp, we can always store the
full 64-bit duplicate immediate. So remove the
difference between 32- and 64-bit hosts.

Backports 0b4286dd15e2bcaf2aa53dfac0fb3103690f5a34
2021-03-04 12:19:48 -05:00
Richard Henderson 541ef541ae tcg/optimize: Use tcg_constant_internal with constant folding
Backports 8fe35e0444be88de4e3ab80a2a0e210a1f6d663d
2021-03-04 12:15:58 -05:00
Richard Henderson 0038cda620 tcg/optimize: Adjust TempOptInfo allocation
Do not allocate a large block for indexing. Instead, allocate
for each temporary as they are seen.

In general, this will use less memory, if we consider that most
TBs do not touch every target register. This also allows us to
allocate TempOptInfo for new temps created during optimization.

Backports 8f17a975e60b773d7c366a81c0d9bbe304f30859
2021-03-03 21:44:53 -05:00
Richard Henderson e751b45aea tcg/optimize: Improve find_better_copy
Prefer TEMP_CONST over anything else.

Backports 4c868ce6454872d395b29de8d82387b2ad14aeeb
2021-03-03 21:35:28 -05:00
Richard Henderson 8edc9b76dd tcg: Introduce TYPE_CONST temporaries
These will hold a single constant for the duration of the TB.
They are hashed, so that each value has one temp across the TB.

Not used yet, this is all infrastructure.

Backports c0522136adf550c7a0ef7c0755c1f9d1560d2757
2021-03-03 21:29:40 -05:00
Richard Henderson 6100deaffd tcg: Expand TempOptInfo to 64-bits
This propagates the extended value of TCGTemp.val that we did before.
In addition, it will be required for vector constants.

Backports 54795544e4cfb2fa198f7ca244b5ea9eaad322d4
2021-03-03 21:04:23 -05:00
Richard Henderson e84b88344a tcg: Rename struct tcg_temp_info to TempOptInfo
Fix this name vs our coding style.

Backports 6fcb98eda16b27d1999737346cdd4d3c1eae6a57
2021-03-03 20:52:59 -05:00
Richard Henderson 0f71f52216 tcg: Expand TCGTemp.val to 64-bits
This will reduce the differences between 32-bit and 64-bit hosts,
allowing full 64-bit constants to be created with the same interface.

Backports bdb38b95f72ebbef2d24e057828dd18ba9c81f63
2021-03-03 20:46:32 -05:00
Richard Henderson b49c4639d1 tcg: Add temp_readonly
In most, but not all, places that we check for TEMP_FIXED,
we are really testing that we do not modify the temporary.

Backports e01fa97dea857a35be5bb8cce0d632a62e72c689
2021-03-03 20:45:25 -05:00
Richard Henderson 30739864d2 tcg: Consolidate 3 bits into enum TCGTempKind
The temp_fixed, temp_global, temp_local bits are all related.
Combine them into a single enumeration.
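
A sketch of folding mutually exclusive boolean flags into one enumeration;
the names echo the flags above, but the values and struct layout are
illustrative, not TCG's:

#include <stdio.h>

typedef enum {
    SKETCH_TEMP_NORMAL,   /* none of the three flags set */
    SKETCH_TEMP_LOCAL,
    SKETCH_TEMP_GLOBAL,
    SKETCH_TEMP_FIXED,
} SketchTempKind;

typedef struct {
    SketchTempKind kind;
    int reg;
} SketchTemp;

int main(void)
{
    SketchTemp t = { .kind = SKETCH_TEMP_GLOBAL, .reg = 0 };
    printf("fixed? %d\n", t.kind == SKETCH_TEMP_FIXED);
    return 0;
}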

Backports ee17db83d2dce35792e9bf03366af193e5e0e5c9
2021-03-03 20:41:24 -05:00
Richard Henderson 520ec7ca76 tcg: Increase tcg_out_dupi_vec immediate to int64_t
While we don't store more than tcg_target_long in TCGTemp,
we shouldn't be limited to that for code generation. We will
be able to use this for INDEX_op_dup2_vec with 2 constants.

Also pass along the minimal vece that may be said to apply
to the constant. This allows some simplification in the
various backends.

Backports 4e18617555955503628a004ed97e1fc2fa7818b9
2021-03-03 20:27:39 -05:00
Richard Henderson c5c19529c5 tcg: Use tcg_out_dupi_vec from temp_load
Having dupi pass through movi is confusing and arguably wrong.

Backports 0a6a8bc8ebfe5ae2a3f18ef48b92a74bc2df2f96
2021-03-03 20:23:02 -05:00
Peter Maydell 68f645dd4f target/arm: Don't decode insns in the XScale/iWMMXt space as cp insns
In commit cd8be50e58f63413c0 we converted the A32 coprocessor
insns to decodetree. This accidentally broke XScale/iWMMXt insns,
because it moved the handling of "cp insns which are handled
by looking up the cp register in the hashtable" from after the
call to the legacy disas_xscale_insn() decode to before it,
with the result that all XScale/iWMMXt insns now UNDEF.

Update valid_cp() so that it knows that on XScale cp 0 and 1
are not standard coprocessor instructions; this will cause
the decodetree trans_ functions to ignore them, so that
execution will correctly get through to the legacy decode again.

Backports e4d51ac6921dc861bfb3d20e4c7dcf345840a9da
2021-03-03 20:17:20 -05:00
Leif Lindholm 09fd12e5f2 target/arm: add aarch32 ID register fields to cpu.h
Add entries present in ARM DDI 0487F.c (August 2020).

Backports bd78b6be24f3ceb71f1a7ec2c98c7a5e49cb4a86
2021-03-03 20:16:26 -05:00
Leif Lindholm a2faae9e30 target/arm: add aarch64 ID register fields to cpu.h
Add entries present in ARM DDI 0487F.c (August 2020).

Backports 00a92832f453275ca023962c00a60dde3a4f2fed
2021-03-03 20:15:16 -05:00
Leif Lindholm ba891afd32 target/arm: add descriptions of CLIDR_EL1, CCSIDR_EL1, CTR_EL0 to cpu.h
Backports 2a14526a6f56973348d622abc572db377f5a23ef
2021-03-03 20:14:05 -05:00
Leif Lindholm fc8e5fe38d target/arm: make ARMCPU.ctr 64-bit
When FEAT_MTE is implemented, the AArch64 view of CTR_EL0 adds the
TminLine field in bits [37:32].
Extend the ctr field to be able to hold this context.

Backports a5fd319ae7f6d496ff5448ec1dedcae8e2f59e9f
2021-03-03 20:13:20 -05:00
Leif Lindholm e6eb25f75a target/arm: make ARMCPU.clidr 64-bit
The AArch64 view of CLIDR_EL1 extends the ICB field to include also bit
32, as well as adding a Ttype<n> field when FEAT_MTE is implemented.
Extend the clidr field to be able to hold this context.

Backports f6450bcb6b2d3e4beae77141edce9e99cb8c277e
2021-03-03 20:12:48 -05:00
Leif Lindholm 3fff83e48f target/arm: fix typo in cpu.h ID_AA64PFR1 field name
SBSS -> SSBS

Backports 9a286bcdfd2b04afca9a668a6d6e0feb809d2d63
2021-03-03 20:12:08 -05:00
Rémi Denis-Courmont 6f06f383ea target/arm: enable Small Translation tables in max CPU
Backports 078e9fe3cbd6894fb6e420d8b53f304a3d5c0464
2021-03-03 20:11:10 -05:00
Rémi Denis-Courmont c7415c92d5 target/arm: ARMv8.4-TTST extension
This adds support for the Small Translation tables extension in AArch64 state.

Backports c36c65ea3c35b309d524c05a1c05fdeabf83ddd5
2021-03-03 20:09:01 -05:00
Peter Maydell f7939926dc target/arm: Implement Cortex-M55 model
Now that we have implemented all the features needed by the v8.1M
architecture, we can add the model of the Cortex-M55. This is the
configuration without MVE support; we'll add MVE later.

Backports 590e05d6b48937f6d3c631354fd706f8e005b8f6
2021-03-03 20:06:06 -05:00
Peter Maydell e586a27a7b target/arm: Implement FPCXT_NS fp system register
Implement the v8.1M FPCXT_NS floating-point system register. This is
a little more complicated than FPCXT_S, because it has specific
handling for "current FP state is inactive", and it only wants to do
PreserveFPState(), not the full set of actions done by
ExecuteFPCheck() which vfp_access_check() implements.

Backports eb20dafdbff92063a88624176fdc396e01961bf3
2021-03-03 20:02:36 -05:00
Peter Maydell 311b6fd74c target/arm: Correct store of FPSCR value via FPCXT_S
In commit 64f863baeedc8659 we implemented the v8.1M FPCXT_S register,
but we got the write behaviour wrong. On read, this register reads
bits [27:0] of FPSCR plus the CONTROL.SFPA bit. On write, it doesn't
just write back those bits -- it writes a value to the whole FPSCR,
whose upper 4 bits are zeroes.

We also incorrectly implemented the write-to-FPSCR as a simple store
to vfp.xregs; this skips the "update the softfloat flags" part of
the vfp_set_fpscr helper so the value would read back correctly but
not actually take effect.

Fix both of these things by doing a complete write to the FPSCR
using the helper function.

Backports 7fbf95a037d79c5e923ffb51ac902dbe9599c87f
2021-03-03 19:57:56 -05:00
Richard Henderson 85b417d438 target/arm: Fix MTE0_ACTIVE
In 50244cc76abc we updated mte_check_fail to match the ARM
pseudocode, using the correct EL to select the TCF field.
But we failed to update MTE0_ACTIVE the same way, which led
to g_assert_not_reached().

Backports cc97b0019bb590b9b3c2a623e9ebee48831e0ce3
2021-03-03 19:56:23 -05:00
Richard Henderson d0e0c847e1 tcg/aarch64: Use B not BL for tcg_out_goto_long
A typo generated a branch-and-link insn instead of plain branch.

Backports f716bab3a9553259ff90505b3ddd245f4f8c4061
2021-03-03 19:55:13 -05:00
Richard Henderson 9d453e820a tcg: Introduce INDEX_op_qemu_st8_i32
Enable this on i386 to restrict the set of input registers
for an 8-bit store, as required by the architecture. This
removes the last use of scratch registers for user-only mode.

Backports 07ce0b05300de5bc8f1932a4cfbe38f3323e5ab1
2021-03-03 19:51:20 -05:00
Richard Henderson a90476c897 tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP
Always true when movbe is available, otherwise leave
this to generic code.

Backports d2ef1b83a7a2047e0e36d7b62b3a5d151ab958f5
2021-03-03 19:40:00 -05:00
Peter Maydell 1a3abaa81a target/i386: Check privilege level for protected mode 'int N' task gate
When the 'int N' instruction is executed in protected mode, the
pseudocode in the architecture manual specifies that we need to check:

* vector number within IDT limits
* selected IDT descriptor is a valid type (interrupt, trap or task gate)
* if this was a software interrupt then gate DPL < CPL

The way we had structured the code meant that the privilege check for
software interrupts ended up not in the code path taken for task gate
handling, because all of the task gate handling code was in the 'case 5'
of the switch which was checking "is this descriptor a valid type".

Move the task gate handling code out of that switch (so that it is now
purely doing the "valid type?" check) and below the software interrupt
privilege check.

The effect of this missing check was that a userspace binary in the
guest executing 'int 8' would cause a guest kernel panic rather than the
userspace binary being handed a SEGV.

This is essentially the same bug fixed in VirtualBox in 2012:
https://www.halfdog.net/Security/2012/VirtualBoxSoftwareInterrupt0x8GuestCrash/

Note that for QEMU this is not a security issue because it is only
present when using TCG.

Backports 3df1a3d070575419859cbbab1083fafa7ec2669a
2021-03-03 19:32:10 -05:00
Richard Henderson 5fc12c7fba tcg: Add tcg_gen_bswap_tl alias
The alias is intended to indicate that the bswap is for the
entire target_long. This should avoid ifdefs on some targets.
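
An illustrative analogue of such a width-dependent alias (the names and the
width macro are invented, not TCG's):

#include <stdint.h>
#include <stdio.h>

/* Pick the byte swap matching the configured target_long width in one
 * place, instead of repeating the ifdef at every call site. */
#define SKETCH_TARGET_LONG_BITS 64

#if SKETCH_TARGET_LONG_BITS == 64
typedef uint64_t sketch_target_long;
#define sketch_bswap_tl(x) __builtin_bswap64(x)
#else
typedef uint32_t sketch_target_long;
#define sketch_bswap_tl(x) __builtin_bswap32(x)
#endif

int main(void)
{
    sketch_target_long v = 0x0102030405060708ULL;
    printf("%016llx\n", (unsigned long long)sketch_bswap_tl(v));
    return 0;
}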

Backports a66424ba17d661007dc13d78c9e3014ccbaf0efb
2021-03-03 19:30:11 -05:00
Richard Henderson 4ccadaf6cf tcg: Use memset for large vector byte replication
In f47db80cc07, we handled odd-sized tail clearing for
the case of hosts that have vector operations, but did
not handle the case of hosts that do not have vector ops.

This was ok until e2e7168a214b, which changed the encoding
of simd_desc such that the odd sizes are impossible.

Add memset as a tcg helper, and use that for all out-of-line
byte stores to vectors. This includes, but is not limited to,
the tail clearing operation in question.
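
The underlying observation is that an out-of-line store of one replicated
byte is just a memset; a trivial sketch (the register-file stand-in is
invented):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Stand-in for a slice of the vector register file. */
    uint8_t zreg[32];

    memset(zreg, 0, sizeof(zreg));   /* e.g. clearing a tail */
    memset(zreg + 5, 0xab, 11);      /* replicate a byte over an odd-sized span */

    printf("%02x %02x %02x %02x\n", zreg[4], zreg[5], zreg[15], zreg[16]);
    return 0;
}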

Backports 6d3ef04893bdea3e7aa08be3cce5141902836a31
2021-03-03 19:28:15 -05:00
Thomas Huth 6a22a7b80e tcg/optimize: Add fallthrough annotations
To be able to compile this file with -Werror=implicit-fallthrough,
we need to add some fallthrough annotations to the case statements
that might fall through. Unfortunately, the typical "/* fallthrough */"
comments do not work here as expected since some case labels are
wrapped in macros and the compiler fails to match the comments in
this case. But using __attribute__((fallthrough)) seems to work fine,
so let's use that instead (by introducing a new QEMU_FALLTHROUGH
macro in our compiler.h header file).
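
A self-contained sketch of the annotation described above; the macro name and
the version guard are illustrative rather than QEMU's exact compiler.h
definitions:

#include <stdio.h>

#if defined(__GNUC__) && __GNUC__ >= 7
#define SKETCH_FALLTHROUGH __attribute__((fallthrough))
#else
#define SKETCH_FALLTHROUGH do { } while (0)   /* no-op fallback */
#endif

static int classify(int op)
{
    int flags = 0;

    switch (op) {
    case 2:
        flags |= 1;
        SKETCH_FALLTHROUGH;   /* deliberate: case 2 also does case 1's work */
    case 1:
        flags |= 2;
        break;
    default:
        break;
    }
    return flags;
}

int main(void)
{
    printf("%d %d\n", classify(2), classify(1));   /* prints: 3 2 */
    return 0;
}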

Backports d84568b773fe1fc469c4d8419c3545be52eec82c
2021-03-03 19:18:50 -05:00
Marc-André Lureau 782e912c98 compiler.h: remove GCC < 3 __builtin_expect fallback
Since commit efc6c07 ("configure: Add a test for the minimum compiler
version"), QEMU explicitely depends on GCC >= 4.8.

(clang >= 3.4 advertises itself as GCC >= 4.2 compatible and supports
__builtin_expect too)
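
For context, this is the familiar likely/unlikely pattern that
__builtin_expect enables; the macro definitions below are the conventional
ones and are assumed rather than quoted from QEMU:

#include <stdio.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int first_byte(const char *s)
{
    if (unlikely(s == NULL)) {   /* hint to the compiler: this path is cold */
        return -1;
    }
    return (unsigned char)s[0];  /* hot path laid out as the fall-through */
}

int main(void)
{
    printf("%d %d\n", first_byte("Q"), first_byte(NULL));
    return 0;
}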

Backports 44cb2c9fe5dd2aa8b44eb42f34ec786ba21a2731
2021-03-03 19:16:12 -05:00
Philippe Mathieu-Daudé 4a0f9846b2 accel/tcg: Remove special case for GCC < 4.6
Since commit efc6c070aca ("configure: Add a test for the
minimum compiler version") the minimum compiler version
required for GCC is 4.8.

We can safely remove the special case for GCC 4.6 introduced
in commit 0448f5f8b81 ("cpu-exec: Fix compiler warning
(-Werror=clobbered)").
No change for Clang as we don't know.

Backports 19a84318c674c157f1b04c5c99595379f8ac8bb3
2021-03-03 19:15:07 -05:00
zhaolichang f526d4455c m68k: fix some comment spelling errors
Running a spellchecker over the comments in qemu/target/m68k turned up
a number of spelling errors; fix them.

Backports ce00ff729ee8461dc94a1593d25ceda65d973d3c
2021-03-03 19:13:26 -05:00
Laurent Vivier bf2c52bc83 target/m68k: remove useless qregs array
They are unused since the target has been converted to TCG.

Backports 4160d5e6bd347e5d27804912b61d02df0a90ba8e
2021-03-03 19:11:44 -05:00
Bin Meng c59e391194 target/i386: seg_helper: Correct segment selector nullification in the RET/IRET helper
Per the SDM, when returning to an outer privilege level, if the check
fails for a segment register (ES, FS, GS, or DS) the segment selector
becomes null, but QEMU clears the base/limit/flags as well as
nullifying the segment selector, which appears to violate the spec.

Real hardware seems to be compliant with the spec, at least on one
Coffee Lake board I tested.

Backports c2ba0515f2df58a661fcb5d6485139877d92ab1b
2021-03-03 19:10:24 -05:00
Paolo Bonzini 1da5d669a7 target/i386: fix operand order for PDEP and PEXT
For PDEP and PEXT, the mask is provided in the memory (mod+r/m)
operand, and therefore is loaded in s->T0 by gen_ldst_modrm.
The source is provided in the second source operand (VEX.vvvv)
and therefore is loaded in s->T1. Fix the order in which
they are passed to the helpers.
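
Since the fix is about which operand is the mask and which supplies the bits,
a plain-C reference model of the two operations may help make the roles
concrete (this is just the BMI2 semantics, not QEMU's helper code):

#include <stdint.h>
#include <stdio.h>

/* PDEP: deposit the low bits of src into the positions selected by mask. */
static uint64_t pdep64(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    for (uint64_t bit = 1; mask; bit <<= 1) {
        uint64_t lowest = mask & -mask;     /* next selected position */
        if (src & bit) {
            out |= lowest;
        }
        mask &= mask - 1;
    }
    return out;
}

/* PEXT: extract the bits of src selected by mask into the low bits. */
static uint64_t pext64(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    for (uint64_t bit = 1; mask; bit <<= 1) {
        uint64_t lowest = mask & -mask;
        if (src & lowest) {
            out |= bit;
        }
        mask &= mask - 1;
    }
    return out;
}

int main(void)
{
    printf("%llx %llx\n",
           (unsigned long long)pdep64(0xf, 0xff00),     /* f00 */
           (unsigned long long)pext64(0xff00, 0xff00)); /* ff */
    return 0;
}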

Backports 75b208c28316095c4685e8596ceb9e3f656592e2
2021-03-03 19:09:21 -05:00
Peter Maydell a9abb7c647 target/arm: Implement M-profile "minimal RAS implementation"
For v8.1M the architecture mandates that CPUs must provide at
least the "minimal RAS implementation" from the Reliability,
Availability and Serviceability extension. This consists of:
* an ESB instruction which is a NOP
-- since it is in the HINT space we need only add a comment
* an RFSR register which will RAZ/WI
* a RAZ/WI AIRCR.IESB bit
-- the code which handles writes to AIRCR does not allow setting
of RES0 bits, so we already treat this as RAZ/WI; add a comment
noting that this is deliberate
* minimal implementation of the RAS register block at 0xe0005000
-- this will be in a subsequent commit
* setting the ID_PFR0.RAS field to 0b0010
-- we will do this when we add the Cortex-M55 CPU model

Backports 46f4976f22a4549322307b34272e053d38653243
2021-03-03 19:07:27 -05:00
Peter Maydell 543483444d target/arm: Implement CCR_S.TRD behaviour for SG insns
v8.1M introduces a new TRD flag in the CCR register, which enables
checking for stack frame integrity signatures on SG instructions.
Add the code in the SG insn implementation for the new behaviour.

Backports 7f484147369080d36c411c4ba969f90d025aed55
2021-03-03 19:05:25 -05:00
Peter Maydell 7aa516aff2 target/arm: Implement new v8.1M VLLDM and VLSTM encodings
v8.1M adds new encodings of VLLDM and VLSTM (where bit 7 is set).
The only difference is that:
* the old T1 encodings UNDEF if the implementation implements 32
Dregs (this is currently architecturally impossible for M-profile)
* the new T2 encodings have the implementation-defined option to
read from memory (discarding the data) or write UNKNOWN values to
memory for the stack slots that would be D16-D31

We choose not to make those accesses, so for us the two
instructions behave identically assuming they don't UNDEF.

Backports fe6fa228a71f0eb8b8ee315452e6a7736c537b1f
2021-03-03 19:01:33 -05:00
Peter Maydell f02045f5f5 target/arm: Implement new v8.1M NOCP check for exception return
In v8.1M a new exception return check is added which may cause a NOCP
UsageFault (see rule R_XLTP): before we clear s0..s15 and the FPSCR
we must check whether access to CP10 from the Security state of the
returning exception is disabled; if it is then we must take a fault.

(Note that for our implementation CPPWR is always RAZ/WI and so can
never cause CP10 accesses to fail.)

The other v8.1M change to this register-clearing code is that if MVE
is implemented VPR must also be cleared, so add a TODO comment to
that effect.

Backports 3423fbf10427db7680d3237d4f62d8370052fca0
2021-03-03 18:59:37 -05:00
Peter Maydell 05d479a8c0 target/arm: For v8.1M, always clear R0-R3, R12, APSR, EPSR on exception entry
In v8.0M, on exception entry the registers R0-R3, R12, APSR and EPSR
are zeroed for an exception taken to Non-secure state; for an
exception taken to Secure state they become UNKNOWN, and we chose to
leave them at their previous values.

In v8.1M the behaviour is specified more tightly and these registers
are always zeroed regardless of the security state that the exception
targets (see rule R_KPZV). Implement this.

Backports a59b1ed618415212c5f0f05abc1192e14ad5fdbb
2021-03-03 18:55:56 -05:00
Peter Maydell 94b36be626 target/arm: Implement FPCXT_S fp system register
Implement the new-in-v8.1M FPCXT_S floating point system register.
This is for saving and restoring the secure floating point context,
and it reads and writes bits [27:0] from the FPSCR and the
CONTROL.SFPA bit in bit [31].

Backports 64f863baeedc86590a608e2f1722dd8640aa9431
2021-03-03 18:53:23 -05:00
Peter Maydell 362379a9e1 target/arm: Factor out preserve-fp-state from full_vfp_access_check()
Factor out the code which handles M-profile lazy FP state preservation
from full_vfp_access_check(); accesses to the FPCXT_NS register are
a special case which need to do just this part (corresponding in the
pseudocode to the PreserveFPState() function), and not the full
set of actions matching the pseudocode ExecuteFPCheck() which
normal FP instructions need to do.

Backports 96dfae686628fc14ba4f993824322b93395e221b
2021-03-03 18:48:47 -05:00
Peter Maydell 2de945ba4d target/arm: Use new FPCR_NZCV_MASK constant
We defined a constant name for the mask of NZCV bits in the FPCR/FPSCR
in the previous commit; use it in a couple of places in existing code,
where we're masking out everything except NZCV for the "load to Rt=15
sets CPSR.NZCV" special case.

Backports 6a017acdf83e3bb6bd5e85289ca90b2ea3282b7e
2021-03-03 18:47:30 -05:00
Peter Maydell 2c6e54d1cd target/arm: Implement M-profile FPSCR_nzcvqc
v8.1M defines a new FP system register FPSCR_nzcvqc; this behaves
like the existing FPSCR, except that it reads and writes only bits
[31:27] of the FPSCR (the N, Z, C, V and QC flag bits). (Unlike the
FPSCR, the special case for Rt=15 of writing the CPSR.NZCV is not
permitted.)

Implement the register. Since we don't yet implement MVE, we handle
the QC bit as RES0, with todo comments for where we will need to add
support later.

Backports 9542c30bcf13c495400d63616dd8dfa825b04685
2021-03-03 18:45:38 -05:00
Peter Maydell 56532aa94c target/arm: Implement VLDR/VSTR system register
Implement the new-in-v8.1M VLDR/VSTR variants which directly
read or write FP system registers to memory.

Backports 0bf0dd4dcbd9fab324700ac6e0cd061cd043de0d
2021-03-03 18:42:05 -05:00
Peter Maydell edae732810 target/arm: Move general-use constant expanders up in translate.c
The constant-expander functions like negate, plus_2, etc, are
generally useful; move them up in translate.c so we can use them in
the VFP/Neon decoders as well as in the A32/T32/T16 decoders.

Backports f7ed0c9433e7c5c157d2e6235eb5c8b93234a71a
2021-03-03 18:29:32 -05:00
Peter Maydell a72c744370 target/arm: Refactor M-profile VMSR/VMRS handling
Currently M-profile borrows the A-profile code for VMSR and VMRS
(access to the FP system registers), because all it needs to support
is the FPSCR. In v8.1M things become significantly more complicated
in two ways:

* there are several new FP system registers; some have side effects
on read, and one (FPCXT_NS) needs to avoid the usual
vfp_access_check() and the "only if FPU implemented" check

* all sysregs are now accessible both by VMRS/VMSR (which
reads/writes a general purpose register) and also by VLDR/VSTR
(which reads/writes them directly to memory)

Refactor the structure of how we handle VMSR/VMRS to cope with this:

* keep the M-profile code entirely separate from the A-profile code

* abstract out the "read or write the general purpose register" part
of the code into a loadfn or storefn function pointer, so we can
reuse it for VLDR/VSTR.

Backports 32a290b8c3c2dc85cd88bd8983baf900d575cab
2021-03-03 18:13:17 -05:00
Peter Maydell 4eafe42d67 target/arm: Enforce M-profile VMRS/VMSR register restrictions
For M-profile before v8.1M, the only valid register for VMSR/VMRS is
the FPSCR. We have a comment that states this, but the actual logic
to forbid accesses for any other register value is missing, so we
would end up with A-profile style behaviour. Add the missing check.

Backports ede97c9d71110821738a48f88ff9f10d6bec017f
2021-03-03 18:06:23 -05:00
Peter Maydell 2e3bd010a8 target/arm: Implement CLRM instruction
In v8.1M the new CLRM instruction allows zeroing an arbitrary set of
the general-purpose registers and APSR. Implement this.

The encoding is a subset of the LDMIA T2 encoding, using what would
be Rn=0b1111 (which UNDEFs for LDMIA).

Backports 6e21a013fbdf54960a079dccc90772bb622e28e8
2021-03-03 18:00:28 -05:00
Peter Maydell 43d8441881 target/arm: Implement VSCCLRM insn
Implement the v8.1M VSCCLRM insn, which zeros floating point
registers if there is an active floating point context.
This requires support in write_neon_element32() for the MO_32
element size, so add it.

Because we want to use arm_gen_condlabel(), we need to move
the definition of that function up in translate.c so it is
before the #include of translate-vfp.c.inc.

Backports 83ff3d6add965c9752324de11eac5687121ea826
2021-03-03 17:57:30 -05:00
Peter Maydell 952ebdc207 target/arm: Don't clobber ID_PFR1.Security on M-profile cores
In arm_cpu_realizefn() we check whether the board code disabled EL3
via the has_el3 CPU object property, which we create if the CPU
starts with the ARM_FEATURE_EL3 feature bit. If it is disabled, then
we turn off ARM_FEATURE_EL3 and also zero out the relevant fields in
the ID_PFR1 and ID_AA64PFR0 registers.

This codepath was incorrectly being taken for M-profile CPUs, which
do not have an EL3 and don't set ARM_FEATURE_EL3, but which may have
the M-profile Security extension and so should have non-zero values
in the ID_PFR1.Security field.

Restrict the handling of the feature flag to A/R-profile cores.

Backports 4018818840f499d0a478508aedbb6802c8eae928
2021-03-03 17:52:30 -05:00
Peter Maydell cfefada296 target/arm: Implement v8.1M PXN extension
In v8.1M the PXN architecture extension adds a new PXN bit to the
MPU_RLAR registers, which forbids execution of code in the region
from a privileged mode.

This is another feature which is just in the generic "in v8.1M" set
and has no ID register field indicating its presence.

Backports cad8e2e3160dd10371552fce6cd8c6e171503e13
2021-03-03 17:50:26 -05:00
Peter Maydell b9c51dc19a Open 6.0 development tree
Backports c923a30481baf87f631659085f94cd6000116192
2021-03-02 13:39:05 -05:00
Peter Maydell e6ae2e0245 Update version for v5.2.0 release
Backports 553032db17440f8de011390e5a1cfddd13751b0b
2021-03-02 13:38:38 -05:00
Peter Maydell 530491aef0 Update version for v5.2.0-rc4 release
Backports d73c46e4a84e47ffc61b8bf7c378b1383e7316b5
2021-03-02 13:38:19 -05:00
Peter Maydell d823f26c5e Update version for v5.2.0-rc3 release
Backports dd3d2340c4076d1735cd0f7cb61f4d8622b9562d
2021-03-02 13:37:49 -05:00
Rémi Denis-Courmont d9592046ef target/arm: fix stage 2 page-walks in 32-bit emulation
Using a target unsigned long would limit the Input Address to a LPAE
page-walk to 32 bits on AArch32 and 64 bits on AArch64. This is okay
for stage 1 or on AArch64, but it is insufficient for stage 2 on
AArch32. In that latter case, the Input Address can have up to 40 bits.

Backports commit 98e8779770c40901ed585745aacc9a8e2b934a28
2021-03-02 13:37:02 -05:00
Peter Maydell 5eb86e4d3c Update version for v5.2.0-rc2 release
Backports 66a300a107ec286725bdc943601cbd4247b82158
2021-03-02 13:35:58 -05:00
Philippe Mathieu-Daudé 7bb2c171ac qemu/bswap: Remove unused qemu_bswap_len()
Last use of qemu_bswap_len() has been removed in commit
e5fd1eb05ec ("apb: add busA qdev property to PBM PCI bridge").

Backports 949eaaad5341db318fc8bae79489a1f7624f3b9e
2021-03-02 13:35:17 -05:00
Chetan Pant 3e25486110 x86 tcg cpus: Fix Lesser GPL version number
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.

Backports d9ff33ada7f32ca59f99b270a2d0eb223b3c9c8f
2021-03-02 13:33:10 -05:00
Chetan Pant c7f6786089 arm tcg cpus: Fix Lesser GPL version number
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.

Backports 50f57e09fda4b7ffbc5ba62aad6cebf660824023
2021-03-02 13:30:35 -05:00
Peter Maydell e19550db6d Update version for v5.2.0-rc1 release
Backports c6f28ed5075df79fef39c500362a3f4089256c9c
2021-03-02 13:25:21 -05:00
Peter Maydell f991d945d3 target/arm/translate-neon.c: Handle VTBL UNDEF case before VFP access check
Checks for UNDEF cases should go before the "is VFP enabled?" access
check, except in special cases. Move a stray UNDEF check in the VTBL
trans function up above the access check.

Backports b6c56c8a9a4064ea783f352f43c5df6231a110fa
2021-03-02 13:24:51 -05:00
Richard Henderson 9623047097 target/arm: Fix neon VTBL/VTBX for len > 1
The helper function did not get updated when we reorganized
the vector register file for SVE. Since then, the neon dregs
are non-sequential and cannot be simply indexed.

At the same time, make the helper function operate on 64-bit
quantities so that we do not have to call it twice.

Backports 604cef3e57eaeeef77074d78f6cf2eca1be11c62
2021-03-02 13:23:13 -05:00
Xinhao Zhang b3f63b72a2 target/arm: add space before the open parenthesis '('
Fix code style. Space required before the open parenthesis '('.

Backports 7f350a87e3a85e8a260ce4b133d549a7b2789213
2021-03-02 13:17:48 -05:00
Xinhao Zhang 71d4aced5d target/arm: Don't use '#' flag of printf format
Fix code style. Don't use '#' flag of printf format ('%#') in
format strings, use '0x' prefix instead

Backports 6eb55edbabb9eed1e4c7dfb233e7d738e8b5fa89
2021-03-02 13:16:09 -05:00
Xinhao Zhang 492fbc4d2c target/arm: add spaces around operator
Fix code style. Operator needs spaces both sides.

Backports bdc3b6f570e8bd219aa6a24a149b35a691e6986c
2021-03-02 13:15:12 -05:00
Peter Maydell 348504c386 Update version for v5.2.0-rc0 release
Backports 3d6e32347a3b57dac7f469a07c5f520e69bd070a
2021-03-02 13:10:16 -05:00
Peter Maydell e528c8229e target/arm: Get correct MMU index for other-security-state
In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.

Backports 7142eb9e24b4aa5118cd67038057f15694d782aa
2021-03-02 13:08:44 -05:00
Rémi Denis-Courmont a4053565d6 target/arm: fix LORID_EL1 access check
Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
future HCR_EL2.TLOR when S-EL2 is enabled.

Backports 9bd268bae5c4760870522292fb1d46e7da7e372a
2021-03-02 13:06:50 -05:00
Rémi Denis-Courmont df4413edc7 target/arm: fix handling of HCR.FB
HCR should be applied when NS is set, not when it is cleared.

Backports 373e7ffde9bae90a20fb5db21b053f23091689f4
2021-03-02 13:05:01 -05:00
Peter Maydell 6b8096d9fc target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index is indexing over group of 4 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)
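
For context, a sketch of the host-endian index swizzling that H4() (and H2()
for 16-bit elements) performs; the macro bodies below are illustrative
stand-ins, not copied from QEMU's headers:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Swizzle element indices within a 64-bit unit so that "element n" names
 * the same architectural element on either host endianness. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_H2(i) ((i) ^ 3)   /* 16-bit elements */
#define SKETCH_H4(i) ((i) ^ 1)   /* 32-bit elements */
#else
#define SKETCH_H2(i) (i)
#define SKETCH_H4(i) (i)
#endif

int main(void)
{
    uint64_t lane = 0x0000000100000000ULL;   /* element 0 = 0, element 1 = 1 */
    uint32_t as32[2];

    memcpy(as32, &lane, sizeof(as32));
    /* With the swizzle, element 1 reads as 1 on both host endiannesses. */
    printf("element 1 = %u\n", (unsigned)as32[SKETCH_H4(1)]);
    return 0;
}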

Backports d1a9254be5cc93afb15be19f7543da6ff4806256
2021-03-02 13:03:51 -05:00
Peter Maydell 5c6730a432 target/arm: Fix float16 pairwise Neon ops on big-endian hosts
In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data. This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.

Backports 552714c0812a10e5cff239bd29928e5fcb8d8b3b
2021-03-02 13:02:31 -05:00
Richard Henderson d473f66177 target/arm: Improve do_prewiden_3d
We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.

Backports 8aab18a2c5209e4e48998a61fbc2d89f374331ed
2021-03-02 13:00:25 -05:00
Richard Henderson 9263117d47 target/arm: Simplify do_long_3d and do_2scalar_long
In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.

Backports 9f1a5f93c2dd345dc6c8fe86ed14bf1485056f6e
2021-03-02 12:46:53 -05:00
Richard Henderson 07c2b70234 target/arm: Rename neon_load_reg64 to vfp_load_reg64
The only uses of this function are for loading VFP
double-precision values, and nothing to do with NEON.

Backports b38b96ca90827012ab8eb045c1337cea83a54c4b
2021-03-02 12:43:25 -05:00
Richard Henderson 9d87b62578 target/arm: Add read/write_neon_element64
Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.

Backports 0aa8e700a53b0aa7275ed747b8fa3acb61d35f2d
2021-03-02 12:40:33 -05:00
Richard Henderson 89b1f62878 target/arm: Rename neon_load_reg32 to vfp_load_reg32
The only uses of this function are for loading VFP
single-precision values, and nothing to do with NEON.

Backports 21c1c0e50b73c580c6bfc8f2314d1b6a14793561
2021-03-02 12:30:20 -05:00
Richard Henderson 011d9ab061 target/arm: Expand read/write_neon_element32 to all MemOp
We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than inserting or
extracting from a 32-bit quantity.

These were the last uses of neon_load/store_reg, so remove them.

Backports 4d5fa5a80ac28f34b8497be1e85371272413a12e
2021-03-02 12:26:41 -05:00
Richard Henderson d21316d639 target/arm: Add read/write_neon_element32
Model these off the aa64 read/write_vec_element functions.
Use it within translate-neon.c.inc. The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Backports a712266f5d5a36d04b22fe69fa15592d62bed019
2021-03-02 12:18:31 -05:00
Richard Henderson e390c1ec7f target/arm: Use neon_element_offset in vfp_reg_offset
This seems a bit more readable than using offsetof CPU_DoubleU.

Backports d8719785fde2f5041986853a314c05c6f567d3cb
2021-03-02 11:55:49 -05:00
Richard Henderson c1ca9e53da target/arm: Use neon_element_offset in neon_load/store_reg
These are the only users of neon_reg_offset, so remove that.

Backports 0f2cdc82276a723ee58562b56b9d537a4bd7bfef
2021-03-02 11:54:56 -05:00
Richard Henderson 1b09d0d96f target/arm: Move neon_element_offset to translate.c
This will shortly have users outside of translate-neon.c.inc.

Backports 7ec85c02833f4264840c6ed78b749443a7b4ffe0
2021-03-02 11:52:59 -05:00
Richard Henderson 8a20537e7f target/arm: Introduce neon_full_reg_offset
This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0. This fixes a bug
when running on a big-endian host.

Backports 015ee81a4c06b644969f621fd9965cc6372b879e
2021-03-02 11:50:36 -05:00
Peter Maydell 2f0940677e target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
If the M-profile low-overhead-branch extension is implemented, FPSCR
bits [18:16] are a new field LTPSIZE. If MVE is not implemented
(currently always true for us) then this field always reads as 4 and
ignores writes.

These bits used to be the vector-length field for the old
short-vector extension, so we need to take care that they are not
misinterpreted as setting vec_len. We do this with a rearrangement
of the vfp_set_fpscr() code that deals with vec_len, vec_stride
and also the QC bit; this obviates the need for the M-profile
only masking step that we used to have at the start of the function.

We provide a new field in CPUState for LTPSIZE, even though this
will always be 4, in preparation for MVE, so we don't have to
come back later and split it out of the vfp.xregs[FPSCR] value.
(This state struct field will be saved and restored as part of
the FPSCR value via the vmstate_fpscr in machine.c.)

Backports 8128c8e8cc9489a8387c74075974f86dc0222e7f
2021-03-01 20:36:02 -05:00
Peter Maydell 8a6e118a17 target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16
M-profile CPUs with half-precision floating point support should
be able to write to FPSCR.FZ16, but an M-profile specific masking
of the value at the top of vfp_set_fpscr() currently prevents that.
This is not yet an active bug because we have no M-profile
FP16 CPUs, but needs to be fixed before we can add any.

The bits that the masking is effectively preventing from being
set are the A-profile only short-vector Len and Stride fields,
plus the Neon QC bit. Rearrange the order of the function so
that those fields are handled earlier and only under a suitable
guard; this allows us to drop the M-profile specific masking,
making FZ16 writeable.

This change also makes the QC bit correctly RAZ/WI for older
no-Neon A-profile cores.

This refactoring also paves the way for the low-overhead-branch
LTPSIZE field, which uses some of the bits that are used for
A-profile Stride and Len.

Backports commit d31e2ce68d56f5bcc83831497e5fe4b8a7e18e85
2021-03-01 20:33:22 -05:00
Peter Maydell 3ae5543825 target/arm: Implement v8.1M low-overhead-loop instructions
v8.1M's "low-overhead-loop" extension has three instructions
for looping:
* DLS (start of a do-loop)
* WLS (start of a while-loop)
* LE (end of a loop)

The loop-start instructions are both simple operations to start a
loop whose iteration count (if any) is in LR. The loop-end
instruction handles "decrement iteration count and jump back to loop
start"; it also caches the information about the branch back to the
start of the loop to improve performance of the branch on subsequent
iterations.

As with the branch-future instructions, the architecture permits an
implementation to discard the LO_BRANCH_INFO cache at any time, and
QEMU takes the IMPDEF option to never set it in the first place
(equivalent to discarding it immediately), because for us a "real"
implementation would be unnecessary complexity.

(This implementation only provides the simple looping constructs; the
vector extension MVE (Helium) adds some extra variants to handle
looping across vectors. We'll add those later when we implement
MVE.)
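
A rough C analogue of the three instructions' roles, purely illustrative
(not the translator implementation):

#include <stdio.h>

int main(void)
{
    unsigned lr = 3;            /* DLS: loop start, iteration count in LR */

    if (lr != 0) {              /* WLS additionally skips the body when LR == 0 */
        do {
            printf("iteration, lr=%u\n", lr);
            lr--;               /* LE: decrement and branch back while nonzero */
        } while (lr != 0);
    }
    return 0;
}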

Backports commit b7226369721896ab9ef71544e4fe95b40710e05a
2021-03-01 20:29:04 -05:00
Peter Maydell be197f9857 target/arm: Implement v8.1M branch-future insns (as NOPs)
v8.1M implements a new 'branch future' feature, which is a
set of instructions that request the CPU to perform a branch
"in the future", when it reaches a particular execution address.
In hardware, the expected implementation is that the information
about the branch location and destination is cached and then
acted upon when execution reaches the specified address.
However the architecture permits an implementation to discard
this cached information at any point, and so guest code must
always include a normal branch insn at the branch point as
a fallback. In particular, an implementation is specifically
permitted to treat all BF insns as NOPs (which is equivalent
to discarding the cached information immediately).

For QEMU, implementing this caching of branch information
would be complicated and would not improve the speed of
execution at all, so we make the IMPDEF choice to implement
all BF insns as NOPs.

Backports commit 05903f036edba8e3ed940cc215b8e27fb49265b9
2021-03-01 20:25:15 -05:00
Peter Maydell 966246d991 target/arm: Don't allow BLX imm for M-profile
The BLX immediate insn in the Thumb encoding always performs
a switch from Thumb to Arm state. This would be totally useless
in M-profile which has no Arm decoder, and so the instruction
does not exist at all there. Make the encoding UNDEF for M-profile.

(This part of the encoding space is used for the branch-future
and low-overhead-loop insns in v8.1M.)

Backports 920f04fa3ea789f8f85a52cee5395b8887b56cf7
2021-03-01 20:23:59 -05:00
Peter Maydell 5680bc701b target/arm: Make the t32 insn[25:23]=111 group non-overlapping
The t32 decode has a group which represents a set of insns
which overlap with B_cond_thumb because they have [25:23]=111
(which is an invalid condition code field for the branch insn).
This group is currently defined using the {} overlap-OK syntax,
but it is almost entirely non-overlapping patterns. Switch
it over to use a non-overlapping group.

For this to be valid syntactically, CPS must move into the same
overlapping-group as the hint insns (CPS vs hints was the
only actual use of the overlap facility for the group).

The non-overlapping subgroup for CLREX/DSB/DMB/ISB/SB is no longer
necessary and so we can remove it (promoting those insns to
be members of the parent group).

Backports 45f11876ae86128bdee27e0b089045de43cc88e4
2021-03-01 20:22:11 -05:00
Peter Maydell 666fe17025 target/arm: Implement v8.1M conditional-select insns
v8.1M brings four new insns to M-profile:
* CSEL : Rd = cond ? Rn : Rm
* CSINC : Rd = cond ? Rn : Rm+1
* CSINV : Rd = cond ? Rn : ~Rm
* CSNEG : Rd = cond ? Rn : -Rm

Implement these.
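
A plain-C restatement of the four semantics listed above, purely for
illustration (not the decoder or translator code):

#include <stdint.h>
#include <stdio.h>

static uint32_t csel (int c, uint32_t rn, uint32_t rm) { return c ? rn : rm; }
static uint32_t csinc(int c, uint32_t rn, uint32_t rm) { return c ? rn : rm + 1; }
static uint32_t csinv(int c, uint32_t rn, uint32_t rm) { return c ? rn : ~rm; }
static uint32_t csneg(int c, uint32_t rn, uint32_t rm) { return c ? rn : -rm; }

int main(void)
{
    /* With the condition false, each insn produces its "else" variant of Rm. */
    printf("%u %u %u %u\n",
           csel(0, 1, 5), csinc(0, 1, 5), csinv(0, 1, 5), csneg(0, 1, 5));
    return 0;
}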

Backports cc73bbded0dfb5612b0e416f7eda13a66950542a
2021-03-01 20:19:33 -05:00
Peter Maydell 2dae268fcb target/arm: Implement v8.1M NOCP handling
From v8.1M, disabled-coprocessor handling changes slightly:
* coprocessors 8, 9, 14 and 15 are also governed by the
cp10 enable bit, like cp11
* an extra range of instruction patterns is considered
to be inside the coprocessor space

We previously marked these up with TODO comments; implement the
correct behaviour.

Unfortunately there is no ID register field which indicates this
behaviour. We could in theory test an unrelated ID register which
indicates guaranteed-to-be-in-v8.1M behaviour like ID_ISAR0.CmpBranch
>= 3 (low-overhead-loops), but it seems better to simply define a new
ARM_FEATURE_V8_1M feature flag and use it for this and other
new-in-v8.1M behaviour that isn't identifiable from the ID registers.

Backports commit 5d2555a1fe7370feeb1efbbf276a653040910017
2021-03-01 20:16:09 -05:00
Peter Maydell 51093daf5f decodetree: Fix codegen for non-overlapping group inside overlapping group
For nested groups like:

{
  [
    pattern 1
    pattern 2
  ]
  pattern 3
}

the intended behaviour is that patterns 1 and 2 must not
overlap with each other; if the insn matches neither then
we fall through to pattern 3 as the next thing in the
outer overlapping group.

Currently we generate incorrect code for this situation,
because in the code path for a failed match inside the
inner non-overlapping group we generate a "return" statement,
which causes decode to stop entirely rather than continuing
to the next thing in the outer group.

Generate a "break" instead, so that decode flow behaves
as required for this nested group case.
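
A hand-written C illustration of the intended control flow (this is not
actual decodetree output; the pattern encodings are invented):

#include <stdbool.h>
#include <stdio.h>

static bool decode_sketch(unsigned insn)
{
    /* inner non-overlapping group: patterns 1 and 2 */
    switch (insn & 0xf0) {
    case 0x10:
        printf("pattern 1\n");
        return true;
    case 0x20:
        printf("pattern 2\n");
        return true;
    default:
        break;   /* the bug was effectively a "return false" here */
    }

    /* next member of the outer overlapping group: pattern 3 */
    if ((insn & 0x0f) == 0x03) {
        printf("pattern 3\n");
        return true;
    }
    return false;
}

int main(void)
{
    decode_sketch(0x13);   /* matches pattern 1 */
    decode_sketch(0x03);   /* no inner match, falls through to pattern 3 */
    return 0;
}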

Backports 514101c0b931f0a11a40d29d26af1cc40482f951
2021-03-01 20:14:19 -05:00
Richard Henderson f7e831a7e4 target/arm: Ignore HCR_EL2.ATA when {E2H,TGE} != 11
Unlike many other bits in HCR_EL2, the description for this
bit does not contain the phrase "if ... this field behaves
as 0 for all purposes other than", so do not squash the bit
in arm_hcr_el2_eff.

Instead, replicate the E2H+TGE test in the two places that
require it.

Backports 4301acd7d7d455792ea873ced75c0b5d653618b1
2021-03-01 20:12:36 -05:00
Richard Henderson 4f00eacb11 target/arm: Fix reported EL for mte_check_fail
The reporting in AArch64.TagCheckFail only depends on PSTATE.EL,
and not the AccType of the operation. There are two guest
visible problems that affect LDTR and STTR because of this:

(1) Selecting TCF0 vs TCF1 to decide on reporting,
(2) Report "data abort same el" not "data abort lower el".

Backports 50244cc76abcac3296cff3d84826f5ff71808c80
2021-03-01 20:10:44 -05:00
Richard Henderson 511636a3f4 target/arm: Remove redundant mmu_idx lookup
We already have the full ARMMMUIdx as computed from the
function parameter.

For the purpose of regime_has_2_ranges, we can ignore any
difference between AccType_Normal and AccType_Unpriv, which
would be the only difference between the passed mmu_idx
and arm_mmu_idx_el.

Backports 4aedfc0f633fd06dd2a5dc8ffa93f4c56117e37f
2021-03-01 20:09:51 -05:00
Peter Maydell d350644817 target/arm: AArch32 VCVT fixed-point to float is always round-to-nearest
For AArch32, unlike the VCVT of integer to float, which honours the
rounding mode specified by the FPSCR, VCVT of fixed-point to float is
always round-to-nearest. (AArch64 fixed-point-to-float conversions
always honour the FPCR rounding mode.)

Implement this by providing _round_to_nearest versions of the
relevant helpers which set the rounding mode temporarily when making
the call to the underlying softfloat function.

We only need to change the VFP VCVT instructions, because the
standard- FPSCR value used by the Neon VCVT is always set to
round-to-nearest, so we don't need to do the extra work of saving
and restoring the rounding mode.

Backports commit 61db12d9f9eb36761edba4d9a414cd8dd34c512b
2021-03-01 20:04:31 -05:00
Peter Maydell 31013d5a8f target/arm: Fix SMLAD incorrect setting of Q bit
The SMLAD instruction is supposed to:
* signed multiply Rn[15:0] * Rm[15:0]
* signed multiply Rn[31:16] * Rm[31:16]
* perform a signed addition of the products and Ra
* set Rd to the low 32 bits of the theoretical
infinite-precision result
* set the Q flag if the sign-extension of Rd
would differ from the infinite-precision result
(ie on overflow)

Our current implementation doesn't quite do this, though: it performs
an addition of the products setting Q on overflow, and then it adds
Ra, again possibly setting Q. This sometimes incorrectly sets Q when
the architecturally mandated only-check-for-overflow-once algorithm
does not. For instance:
r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
smlad r0, r1, r2, r3
This is (-32768 * -32768) + (-32768 * -32768) - 1

The products are both 0x4000_0000, so when added together as 32-bit
signed numbers they overflow (and QEMU sets Q), but because the
addition of Ra == -1 brings the total back down to 0x7fff_ffff
there is no overflow for the complete operation and setting Q is
incorrect.

Fix this edge case by resorting to 64-bit arithmetic for the
case where we need to add three values together.
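
A plain-C sketch of the single-overflow-check approach, fed with the example
values from the message above (this models the architectural semantics, not
QEMU's helper):

#include <stdint.h>
#include <stdio.h>

static uint32_t smlad_sketch(uint32_t rn, uint32_t rm, uint32_t ra, int *qflag)
{
    /* Two 16x16 signed products, widened to 64 bits. */
    int64_t p1 = (int16_t)rn * (int64_t)(int16_t)rm;
    int64_t p2 = (int16_t)(rn >> 16) * (int64_t)(int16_t)(rm >> 16);
    int64_t sum = p1 + p2 + (int32_t)ra;

    /* One saturation check on the complete result, as the architecture
     * mandates, rather than one per intermediate addition. */
    if (sum != (int32_t)sum) {
        *qflag = 1;
    }
    return (uint32_t)sum;
}

int main(void)
{
    int q = 0;
    uint32_t rd = smlad_sketch(0x80008000u, 0x80008000u, 0xffffffffu, &q);
    printf("rd=0x%08x Q=%d\n", (unsigned)rd, q);   /* rd=0x7fffffff Q=0 */
    return 0;
}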

Backports commit 5288145d716338ace0f83e3ff05c4d07715bb4f4
2021-03-01 19:58:39 -05:00
Peter Maydell 6cd06169ee target/arm: Make '-cpu max' have a 48-bit PA
QEMU supports a 48-bit physical address range, but we don't currently
expose it in the '-cpu max' ID registers (you get the same range as
Cortex-A57, which is 44 bits).

Set the ID_AA64MMFR0.PARange field to indicate 48 bits.

Backports d1b6b7017572e8d82f26eb827a1dba0e8cf3cae6
2021-03-01 19:50:28 -05:00
Richard Henderson c648361597 tcg: Remove TCG_TARGET_HAS_cmp_vec
The cmp_vec opcode is mandatory; this symbol is unused.

Backports cae5d53b9e72d7a1e43cebeb36471d77a16c6e43
2021-03-01 19:49:02 -05:00
Richard Henderson 45af31fcb4 tcg/optimize: Fold dup2_vec
When the two arguments are identical, this can be reduced to
dup_vec or to mov_vec from a tcg_constant_vec.

Backports commit 1dc4fe70128db05237a00eda6eb15e2b44deb31f
2021-03-01 19:46:14 -05:00
Richard Henderson 456fb66617 tcg: Fix generation of dupi_vec for 32-bit host
The definition of INDEX_op_dupi_vec is that it operates on
units of tcg_target_ulong -- in this case 32 bits. It does
not work to use this for a uint64_t value that happens to be
small enough to fit in tcg_target_ulong.

Backports a5b30d950c42b14bc9da24d1e68add6538d23336
2021-03-01 19:45:30 -05:00
Richard Henderson 578673be68 tcg/i386: Fix dupi for avx2 32-bit hosts
The previous change wrongly stated that 32-bit avx2 should have
used VPBROADCASTW. But that's a 16-bit broadcast and we want a
32-bit broadcast.

Backports f80d09b599a5e0fd7f44653f23b04104cb703f7a
2021-03-01 19:44:09 -05:00
Richard Henderson 50b3632ab4 tcg: Remove TCGOpDef.used
The last user of this field disappeared in f69d277ece4.
2021-03-01 19:43:37 -05:00
Richard Henderson 7813c57f9e tcg: Move some TCG_CT_* bits to TCGArgConstraint bitfields
These are easier to set and test when they have their own fields.
Reduce the size of alias_index and sort_index to 4 bits, which is
sufficient for TCG_MAX_OP_ARGS. This leaves only the bits indicating
constants within the ct field.

Move all initialization to allocation time, rather than initializing
individual fields in process_op_defs.

Backports bc2b17e6ea582ef3ade2bdca750de269c674c915
2021-03-01 19:41:34 -05:00
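
For reference, the resulting layout looks roughly like the sketch below (the exact field names are an assumption based on the description above; TCGRegSet gets a stand-in typedef so the fragment is self-contained):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t TCGRegSet;          /* stand-in for the real typedef */

typedef struct TCGArgConstraint {
    unsigned ct : 16;                /* only the constant-matching bits remain here */
    unsigned alias_index : 4;        /* 4 bits suffice for TCG_MAX_OP_ARGS */
    unsigned sort_index : 4;
    bool oalias : 1;                 /* former TCG_CT_* flag bits */
    bool ialias : 1;
    bool newreg : 1;
    TCGRegSet regs;
} TCGArgConstraint;
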
Richard Henderson 71a34d84e5 tcg: Remove TCG_CT_REG
This wasn't actually used for anything, really. All variable
operands must accept registers, which are indicated by the
set in TCGArgConstraint.regs.

Backports commit 74a117906b87ff9220e4baae5a7431d6f4eadd45
2021-03-01 19:38:00 -05:00
Richard Henderson ae075d324d tcg: Move sorted_args into TCGArgConstraint.sort_index
This uses an existing hole in the TCGArgConstraint structure
and will be convenient for keeping the data in one place.

Backports 66792f90f14fef18b25a168922877a367ecdca05
2021-03-01 19:33:45 -05:00
Richard Henderson e3356f9bad tcg: Drop union from TCGArgConstraint
The union is unused; let "regs" appear in the main structure
without the "u.regs" wrapping.

Backports 9be0d08019465b38e2f1a605960961a491430c21
2021-03-01 19:29:19 -05:00
Richard Henderson 1551f6be9d tcg: Adjust simd_desc size encoding
With larger vector sizes, it turns out oprsz == maxsz, and we only
need to represent mismatch for oprsz <= 32. We do, however, need
to represent larger oprsz and do so without reducing SIMD_DATA_BITS.

Reduce the size of the oprsz field and increase the maxsz field.
Steal the oprsz value of 24 to indicate equality with maxsz.

Backports e2e7168a214b0ed98dc357bba96816486a289762
2021-03-01 19:23:37 -05:00
Richard Henderson 567fa21c65 target/arm: Fix SVE splice
While converting to gen_gvec_ool_zzzp, we lost passing
a->esz as the data argument to the function.

Backports commit dd701fafe55a78e655d4823d29226d92250a6b56
2021-03-01 19:20:44 -05:00
Richard Henderson ccb293911f target/arm: Fix sve ldr/str
The mte update missed a bit when producing clean addresses.

Fixes: b2aa8879b88

Backports d8227b098301935ea8e0e032e7d41e5dc3e97590
2021-03-01 19:20:04 -05:00
Peter Maydell 79feec40df target/arm: Make isar_feature_aa32_fp16_arith() handle M-profile
The M-profile definition of the MVFR1 ID register differs slightly
from the A-profile one, and in particular the check for "does the CPU
support fp16 arithmetic" is not the same.

We don't currently implement any M-profile CPUs with fp16 arithmetic,
so this is not yet a visible bug, but correcting the logic now
disarms this beartrap for when we eventually do.

Backports commit dfc523a84b06b6a4b583ed4c29d24fd980dd37a0
2021-03-01 19:17:23 -05:00
Peter Maydell 09a7d6381e target/arm: Move id_pfr0, id_pfr1 into ARMISARegisters
Move the id_pfr0 and id_pfr1 fields into the ARMISARegisters
sub-struct. We're going to want id_pfr1 for an isar_features
check, and moving both at the same time avoids an odd
inconsistency.

Changes other than the ones to cpu.h and kvm64.c made
automatically with:
perl -p -i -e 's/cpu->id_pfr/cpu->isar.id_pfr/' target/arm/*.c hw/intc/armv7m_nvic.c

Backports commit 8a130a7be6e222965641e1fd9469fd3ee752c7d4
2021-03-01 19:15:10 -05:00
Peter Maydell ed92f3c42b target/arm: Replace ARM_FEATURE_PXN with ID_MMFR0.VMSA check
The ARM_FEATURE_PXN bit indicates whether the CPU supports the PXN
bit in short-descriptor translation table format descriptors. This
is indicated by ID_MMFR0.VMSA being at least 0b0100. Replace the
feature bit with an ID register check, in line with our preference
for ID register checks over feature bits.

Backports commit 0ae0326b984e77a55c224b7863071bd3d8951231
2021-03-01 19:06:15 -05:00
Xiaoyao Li d9d68cc128 i386/cpu: Clear FEAT_XSAVE_COMP_{LO,HI} when XSAVE is not available
Per Intel SDM vol 1, 13.2, if CPUID.1:ECX.XSAVE[bit 26] is 0, the
processor provides no further enumeration through CPUID function 0DH.
QEMU does not do this for "-cpu host,-xsave".

Backports 19ca8285fcd61a8f60f2f44f789a561e0958e8e6
2021-03-01 19:04:03 -05:00
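
A stand-alone sketch of that rule (the helper and parameter names are illustrative; only the XSAVE bit position in CPUID.1:ECX, bit 26, comes from the SDM as quoted above):

#include <stdint.h>

#define CPUID_EXT_XSAVE (1u << 26)   /* CPUID.1:ECX.XSAVE */

static void clamp_xsave_components(uint32_t cpuid1_ecx,
                                   uint32_t *xsave_comp_lo,
                                   uint32_t *xsave_comp_hi)
{
    if (!(cpuid1_ecx & CPUID_EXT_XSAVE)) {
        /* Leaf 0Dh must then enumerate nothing at all. */
        *xsave_comp_lo = 0;
        *xsave_comp_hi = 0;
    }
}
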
Richard Henderson 5e6196ea6b target/riscv: Set instance_align on RISCVCPU TypeInfo
Fix alignment of CPURISCVState.vreg.

Backports 5de5b99b3101a1648ed583193db8d92eea0c4545
2021-03-01 19:00:27 -05:00
Richard Henderson cdf40f7ff6 target/arm: Set instance_align on CPUARM TypeInfo
Fix alignment of CPUARMState.vfp.zregs.

Backports d03087bda4ba17076b430fd2af083020d7c5112a
2021-03-01 18:58:44 -05:00
Richard Henderson 86dd30850d qom: Allow objects to be allocated with increased alignment
It turns out that some hosts have a default malloc alignment less
than that required for vectors.

We assume that, with compiler annotation on CPUArchState, we
can properly align the vector portion of the guest state. Fix the
alignment of the allocation by using qemu_memalign when required.
2021-03-01 18:32:51 -05:00
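
The underlying technique, sketched stand-alone with posix_memalign standing in for QEMU's qemu_memalign (the instance_size/instance_align names mirror the TypeInfo fields mentioned in the neighbouring commits; this is not the backported allocator):

#include <stdlib.h>
#include <string.h>

static void *alloc_object(size_t instance_size, size_t instance_align)
{
    void *obj;

    if (instance_align > sizeof(void *)) {
        /* Assumes instance_align is a power of two, as alignments are. */
        if (posix_memalign(&obj, instance_align, instance_size) != 0) {
            return NULL;
        }
        memset(obj, 0, instance_size);
    } else {
        obj = calloc(1, instance_size);  /* default alignment is enough */
    }
    return obj;
}
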
Eduardo Habkost 6baafeafd4 qom: Correct object_class_dynamic_cast_assert() documentation
object_class_dynamic_cast_assert() is not used by
INTERFACE_CHECK; remove the misleading mention of that
function from the documentation.
2021-03-01 18:29:34 -05:00
Aaron Lindsay 97702da7ad target/arm: Count PMU events when MDCR.SPME is set
This check was backwards when introduced in commit
033614c47de78409ad3fb39bb7bd1483b71c6789:

target/arm: Filter cycle counter based on PMCCFILTR_EL0

Backports commit db1f3afb17269cf2bd86c222e1bced748487ef71
2021-03-01 18:25:25 -05:00
Peter Maydell 16ad0d93d9 target/arm: Convert VCMLA, VCADD size field to MO_* in decode
The VCMLA and VCADD insns have a size field which is 0 for fp16
and 1 for fp32 (note that this is the reverse of the Neon 3-same
encoding!). Convert it to MO_* values in decode for consistency.

Backports d186a4854c04e9832907b0b4240a47731da20993
2021-03-01 18:23:34 -05:00
Peter Maydell 61abec1908 target/arm: Convert Neon VCVT fp size field to MO_* in decode
Convert the insns using the 2reg_vcvt and 2reg_vcvt_f16 formats
to pass the size through to the trans function as a MO_* value
rather than the '0==f32, 1==f16' used in the fp 3-same encodings.

Backports commit 0ae715c658a02af1834b63563c56112a6d8842cb
2021-03-01 18:20:11 -05:00
Peter Maydell 524b54bc7b target/arm: Convert Neon 3-same-fp size field to MO_* in decode
In the Neon instructions, some instruction formats have a 2-bit size
field which corresponds exactly to QEMU's MO_8/16/32/64. However the
floating-point insns in the 3-same group have a 1-bit size field
which is "0 for 32-bit float and 1 for 16-bit float". Currently we
pass these values directly through to trans_ functions, which means
that when reading a particular trans_ function you need to know if
that insn uses a 2-bit size or a 1-bit size.

Move the handling of the 1-bit size to the decodetree file, so that
all these insns consistently pass a size to the trans_ function which
is an MO_8/16/32/64 value.

In this commit we switch over the insns using the 3same_fp and
3same_fp_q0 formats.

Backports commit 6cf0f240e0b980a877abed12d2995f740eae6515
2021-03-01 18:15:18 -05:00
Richard Henderson cd79d2a915 tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
We already support duplication of 128-bit blocks. This extends
that support to 256-bit blocks. This will be needed by SVE2.

Backports commit fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
2021-03-01 18:10:07 -05:00
Richard Henderson b478ce5052 tcg: Eliminate one store for in-place 128-bit dup_mem
Do not store back to the exact memory from which we just loaded.

Backports 6a17646176e011ddc463a2870a64c7aaccfe9c50
2021-03-01 18:06:17 -05:00
Stephen Long c9dc750058 tcg: Fix tcg gen for vectorized absolute value
The fallback inline expansion for vectorized absolute value,
used when the host doesn't support such an insn, was flawed.

E.g. when a vector of bytes has all elements negative, mask
will be 0xffff_ffff_ffff_ffff. Subtracting mask only adds 1
to the low element instead of all elements because -mask is 1
and not 0x0101_0101_0101_0101.

Backports commit e7e8f33fb603c3bfa0479d7d924f2ad676a84317
2021-03-01 18:04:46 -05:00
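
A scalar reference sketch of the per-element identity involved (illustrative only; the actual fix lives in the TCG vector expansion): abs(x) = (x ^ mask) - mask with mask = x >> 7 per byte, where the subtraction must happen element by element rather than as one 64-bit subtract of the combined mask.

#include <stdint.h>

static void abs_bytes(int8_t *d, const int8_t *s, int n)
{
    for (int i = 0; i < n; i++) {
        int8_t mask = s[i] >> 7;        /* 0x00 for >= 0, 0xff for < 0 */
        d[i] = (s[i] ^ mask) - mask;    /* per-element subtract */
    }
}
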
Eduardo Habkost cefb1666c0 arm: Fix typo in AARCH64_CPU_GET_CLASS definition
There's a typo in the type name of AARCH64_CPU_GET_CLASS. This
was never detected because the macro is not used by any code.

Backports 37e3d65043229bb20bd07af74dc0866e12071415
2021-03-01 18:03:29 -05:00
Peter Maydell ff74ede2fd target/arm: Enable FP16 in '-cpu max'
Set the MVFR1 ID register FPHP and SIMDHP fields to indicate
that our "-cpu max" has v8.2-FP16.

Backports commit 5f07817eb94542e39a419baafa3026b15e8d33f7
2021-03-01 18:00:13 -05:00
Peter Maydell b948636c4a target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS
Convert the Neon floating-point VMUL, VMLA and VMLS to use gvec,
and use this to implement fp16 support.

Backports fc8ae790311882afa3c7816df004daf978c40e9a
2021-03-01 17:57:36 -05:00
Peter Maydell 8c6affbca4 target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
Add gvec helpers for doing Neon-style indexed non-fused fp
multiply-and-accumulate operations.

Backports commit c50d8d144098a8261233ca31b47e3bc487e112fe
2021-03-01 17:52:31 -05:00
Peter Maydell 3cc3099e36 target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations
In the gvec helper functions for indexed operations, for AArch32
Neon the oprsz (total size of the vector) can be less than 16 bytes
if the operation is on a D reg. Since the inner loop in these
helpers always goes from 0 to segment, we must clamp it based
on oprsz to avoid processing a full 16 byte segment when asked to
handle an 8 byte wide vector.

Backports commit d7ce81e553e6789bf27657105b32575668d60b1c
2021-03-01 17:48:42 -05:00
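
A simplified stand-alone model of the clamp (the helper name, element type and multiply-accumulate operation are illustrative; only the "at most min(16, oprsz) bytes per segment" idea reflects the fix):

#include <stddef.h>

static void mla_idx_f32(float *d, const float *n, const float *m,
                        int idx, size_t oprsz /* vector size in bytes */)
{
    size_t segbytes = oprsz < 16 ? oprsz : 16;   /* clamp for 8-byte D regs */
    size_t segment = segbytes / sizeof(float);

    for (size_t i = 0; i < oprsz / sizeof(float); i += segment) {
        float mm = m[i + idx];                   /* indexed element for this segment */
        for (size_t j = 0; j < segment; j++) {
            d[i + j] += n[i + j] * mm;
        }
    }
}
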
Peter Maydell 681218b4ab target/arm: Implement fp16 for Neon VRINTX
Convert the Neon VRINTX insn to use gvec, and use this to implement
fp16 support for it.

Backports 23afcdd2511f2a3dc05bed650d27bd25cf9b2a3c
2021-03-01 17:47:25 -05:00
Peter Maydell 53aba9d900 target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
Convert the Neon VRINT-with-specified-rounding-mode insns to gvec,
and use this to implement the fp16 versions.

Backports 18725916b1438b54d6d6533980833d2251a20b7c
2021-03-01 17:44:49 -05:00
Peter Maydell eb4054d04f target/arm: Implement fp16 for Neon VCVT with rounding modes
Convert the Neon VCVT with-specified-rounding-mode instructions
to gvec, and use this to implement fp16 support for them.

Backports ca88a6efdf4ce96b646a896059f9bd324c2cebc4
2021-03-01 17:40:36 -05:00
Peter Maydell 56fe927d40 target/arm: Implement fp16 for Neon VCVT fixed-point
Implement fp16 for the Neon VCVT insns which convert between
float and fixed-point.

Backports 24018cf3990b692b51e50183c5fbd98d17b3fa40
2021-03-01 17:36:43 -05:00
Peter Maydell 948b01ad01 target/arm: Convert Neon VCVT fixed-point to gvec
Convert the Neon VCVT float<->fixed-point insns to a
gvec style, in preparation for adding fp16 support.

Backports 7b959c5890deb9a6d71bc6800006a0eae0a84c60
2021-03-01 17:33:20 -05:00
Peter Maydell c324c6817e target/arm: Implement fp16 for Neon float-integer VCVT
Convert the Neon float-integer VCVT insns to gvec, and use this
to implement fp16 support for them.

Note that unlike the VFP int<->fp16 VCVT insns we converted
earlier and which convert to/from a 32-bit integer, these
Neon insns convert to/from 16-bit integers. So we can use
the existing vfp conversion helpers for the f32<->u32/i32
case but need to provide our own for f16<->u16/i16.

Backports 7782a9afec81d1efe23572135c1ed777691ccde5
2021-03-01 17:29:02 -05:00
Peter Maydell 82f4a7e135 target/arm: Implement fp16 for Neon pairwise fp ops
Convert the Neon pairwise fp ops to use a single gvec-style
helper to do the full operation instead of one helper call
for each 32-bit part. This allows us to use the same
framework to implement the fp16 case.

Backports 1dc587ee9bfe804406eb3e0bacf47a80644d8abc
2021-03-01 17:25:19 -05:00
Peter Maydell b08ea84374 target/arm: Implement fp16 for Neon VRSQRTS
Convert the Neon VRSQRTS insn to using a gvec helper,
and use this to implement the fp16 case.

As with VRECPS, we adjust the phrasing of the new implementation
slightly so that the fp32 version parallels the fp16 one.

Backports 40fde72dda2da8d55b820fa6c5efd85814be2023
2021-03-01 17:20:22 -05:00
Peter Maydell f4ebbba9fd target/arm: Implement fp16 for Neon VRECPS
Convert the Neon VRECPS insn to using a gvec helper, and
use this to implement the fp16 case.

The phrasing of the new float32_recps_nf() is slightly different from
the old recps_f32() so that it parallels the f16 version; for f16 we
can't assume that flush-to-zero is always enabled.

Backports ac8c62c4e5a3f24e6d47f52ec1bfb20994caefa5
2021-03-01 17:09:16 -05:00
Peter Maydell 5776c594e4 target/arm: Implement fp16 for Neon fp compare-vs-0
Convert the neon floating-point vector compare-vs-0 insns VCEQ0,
VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
implement the fp16 case.

Backports 635187aaa92f21ab001e2868e803b3c5460261ca
2021-03-01 17:05:03 -05:00
Peter Maydell 8de258c3cb target/arm: Implement fp16 for Neon VFMA, VFMS
Convert the neon floating-point vector operations VFMA and VFMS
to use a gvec helper, and use this to implement the fp16 case.

This is the last use of do_3same_fp() so we can now delete
that function.

Backports commit cf722d75b329ef3f86b869e7e68cbfb1607b3bde
2021-03-01 17:00:49 -05:00
Peter Maydell 587c3549b7 target/arm: Implement fp16 for Neon VMLA, VMLS operations
Convert the Neon floating-point VMLA and VMLS insns over to using a
gvec helper, and use this to implement the fp16 case.

Backports e5adc70665ecaf4009c2fb8d66775ea718a85abd
2021-03-01 16:57:20 -05:00
Peter Maydell 0068d12355 target/arm: Implement fp16 for Neon VMAXNM, VMINNM
Convert the Neon floating point VMAXNM and VMINNM insns to
using a gvec helper and use this to implement the fp16 case.

Backports e22705bb941d82d6c2a09e8b2031084326902be3
2021-03-01 16:53:57 -05:00
Peter Maydell 465cfb54c4 target/arm: Implement fp16 for Neon VMAX, VMIN
Convert the Neon floating-point VMAX and VMIN insns over to using
a gvec helper, and use this to implement the fp16 case.

Backports e43268c54b6cbcb197d179409df7126e81f8cd52
2021-03-01 16:50:23 -05:00
Peter Maydell 6dd4a8e93f target/arm: Implement fp16 for VACGE, VACGT
Convert the neon floating-point vector absolute comparison ops
VACGE and VACGT over to using a gvec helper and use this to
implement the fp16 case.

Backports bb2741da186ebaebc7d5189372be4401e1ff9972
2021-03-01 16:47:44 -05:00
Peter Maydell 4eb39f1b2f target/arm: Implement fp16 for VCEQ, VCGE, VCGT comparisons
Convert the Neon floating-point vector comparison ops VCEQ,
VCGE and VCGT over to using a gvec helper and use this to
implement the fp16 case.

(We put the float16_ceq() etc functions above the DO_2OP()
macro definition because later when we convert the
compare-against-zero instructions we'll want their
definitions to be visible at that point in the source file.)

Backports ad505db233b89b7fd4b5a98b6f0e8ac8d05b11db
2021-03-01 16:44:34 -05:00
Peter Maydell 0e8fd4cd0c target/arm: Implement fp16 for Neon VABS, VNEG of floats
Rewrite Neon VABS/VNEG of floats to use gvec logical AND and XOR, so
that we can implement the fp16 version of the insns.

Backports 2b70d8cd09f5450c15788acd24f6f8bc4116c395
2021-03-01 16:40:33 -05:00
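
The bit-level view that makes the gvec AND/XOR rewrite possible, as a tiny scalar sketch for fp16 elements (illustrative constants, not the generated TCG ops):

#include <stdint.h>

static uint16_t fp16_abs(uint16_t x) { return x & 0x7fff; }  /* clear the sign bit */
static uint16_t fp16_neg(uint16_t x) { return x ^ 0x8000; }  /* flip the sign bit  */
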
Peter Maydell 6c71951d54 target/arm: Implement fp16 for Neon VRECPE, VRSQRTE using gvec
We already have gvec helpers for floating point VRECPE and
VRQSRTE, so convert the Neon decoder to use them and
add the fp16 support.

Backports 4a15d9a3b39d4d161d7e03dfcf52e9f214eef0b8
2021-03-01 16:35:04 -05:00
Peter Maydell 4850377f01 target/arm: Implement FP16 for Neon VADD, VSUB, VABD, VMUL
Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
macro: VADD, VSUB, VABD, VMUL.

For VABD this requires us to implement a new gvec_fabd_h helper
using the machinery we have already for the other helpers.

Backports e4a6d4a69e239becfd83bdcd996476e7b8e1138d
2021-03-01 16:31:54 -05:00
Peter Maydell 08b70267d0 target/arm: Implement VFP fp16 VMOV between gp and halfprec registers
Implement the VFP fp16 variant of VMOV that transfers a 16-bit
value between a general purpose register and a VFP register.

Note that Rt == 15 is UNPREDICTABLE; since this insn is v8 and later
only, we have no need to replicate the old "updates CPSR.NZCV"
behaviour that the singleprec version of this insn does.

Backports commit 46a4b854525cb9f34a611f6ada6cdff1eab0ac2d
2021-03-01 16:26:34 -05:00
Peter Maydell 58485bca97 target/arm: Implement new VFP fp16 insn VMOVX
The fp16 extension includes a new instruction VMOVX, which copies the
upper 16 bits of a 32-bit source VFP register into the lower 16
bits of the destination and zeroes the high half of the destination.
Implement it.

Backports f61e5c43b86907dea17f431b528d806659d62bcb
2021-03-01 16:24:50 -05:00
Peter Maydell 3dd587e3df target/arm: Implement new VFP fp16 insn VINS
The fp16 extension includes a new instruction VINS, which copies the
lower 16 bits of a 32-bit source VFP register into the upper 16 bits
of the destination. Implement it.

Backports commit e4875e3bcc3a9c54d7e074c8f51e04c2e6364e2e
2021-03-01 16:22:27 -05:00
Peter Maydell 90aa9647e0 target/arm: Implement VFP fp16 VRINT*
Implement the fp16 version of the VFP VRINT* insns.

Backports 0a6f4b4cb338665b81ad824d9a6868932461b7f7
2021-03-01 16:15:21 -05:00
Peter Maydell 1c8088b48a target/arm: Implement VFP fp16 VSEL
Implement the fp16 versions of the VFP VSEL instruction.

Backports commit 11e78fecdf2d605cfed33aa09bbcf0cc4fb95886
2021-03-01 16:08:51 -05:00
Peter Maydell beee4ad7f3 target/arm: Implement VFP fp16 VCVT-with-specified-rounding-mode
Implement the fp16 versions of the VFP VCVT instruction forms
which convert between floating point and integer with a specified
rounding mode.

Backports c505bc6a9d50a48f9d89d6cf930e863838a5b367
2021-02-28 05:18:07 -05:00
Peter Maydell 74a6af4e23 target/arm: Implement VFP fp16 VCVT between float and fixed-point
Implement the fp16 versions of the VFP VCVT instruction forms which
convert between floating point and fixed-point.

Backports a149e2de0b63e3906729ed1d3df7d9ecdb6de5e6
2021-02-28 05:15:40 -05:00