By shifting the 16-bit input left by 16, we can align the desired
portion of the 48-bit product and use tcg_gen_muls2_i32.
Backports commit 485b607d4f393e0de92c922806a68aef22340c98 from qemu
Since all of the inputs and outputs are i32, dispense with
the intermediate promotion to i64 and use tcg_gen_add2_i32.
Backports commit ea96b374641bc429269096d88d4e91ee544273e9 from qemu
Since all of the inputs and outputs are i32, dispense with
the intermediate promotion to i64 and use tcg_gen_mulu2_i32
and tcg_gen_add2_i32.
Backports commit 2409d56454f0d028619fb1002eda86bf240906dd from qemu
Convert the modified immediate form of the data processing insns.
For A32, we can finally remove any code that was intertwined with
the register and register-shifted-register forms.
Backports commit 581c6ebd17c8f56ad52772216e6c6d8cc8997e8b from qemu
Convert the register shifted by register form of the data
processing insns. For A32, we cannot yet remove any code
because the legacy decoder intertwines the immediate form.
Backports commit 5be2c12337f4cbdbda4efe6ab485350f730faaad from qemu
Convert the register shifted by immediate form of the data
processing insns. For A32, we cannot yet remove any code
because the legacy decoder intertwines the reg-shifted-reg
and immediate forms.
Backports commit 25ae32c558182c07fc6ad01b936e9151cbf00c44 from qemu
Add the infrastructure that will become the new decoder.
No instructions adjusted so far.
Backports commit 51409b9e8cfe997b1ac3365df7400e0c6e844437 from qemu
This function already includes the test for an interworking write
to PC from a load. Change the T32 LDM implementation to match the
A32 LDM implementation.
For LDM, the reordering of the tests does not change valid
behaviour because the only case that differs is has rn == 15,
which is UNPREDICTABLE.
Backports commit 69be3e13764111737e1a7a13bb0c231e4d5be756 from qemu
The previous simplification got the order of operands to the
subtraction wrong. Since the 64-bit product is the subtrahend,
we must use a 64-bit subtract to properly compute the borrow
from the low-part of the product.
Fixes: 5f8cd06ebcf5 ("target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR")
Backports commit e0a0c8322b8ebcdad674f443a3e86db8708d6738 from qemu
The translation table walk for an ATS instruction can result in
various faults. In general these are just reported back via the
PAR_EL1 fault status fields, but in some cases the architecture
requires that the fault is turned into an exception:
* synchronous stage 2 faults of any kind during AT S1E0* and
AT S1E1* instructions executed from NS EL1 fault to EL2 or EL3
* synchronous external aborts are taken as Data Abort exceptions
(This is documented in the v8A Arm ARM DDI0487A.e D5.2.11 and
G5.13.4.)
Backports commit 0710b2fa84a4aeb925422e1e88edac49ed407c79 from qemu
Currently the only part of an ARMCPRegInfo which is allowed to cause
a CPU exception is the access function, which returns a value indicating
that some flavour of UNDEF should be generated.
For the ATS system instructions, we would like to conditionally
generate exceptions as part of the writefn, because some faults
during the page table walk (like external aborts) should cause
an exception to be raised rather than returning a value.
There are several ways we could do this:
* plumb the GETPC() value from the top level set_cp_reg/get_cp_reg
helper functions through into the readfn and writefn hooks
* add extra readfn_with_ra/writefn_with_ra hooks that take the GETPC()
value
* require the ATS instructions to provide a dummy accessfn,
which serves no purpose except to cause the code generation
to emit TCG ops to sync the CPU state
* add an ARM_CP_ flag to mark the ARMCPRegInfo as possibly
throwing an exception in its read/write hooks, and make the
codegen sync the CPU state before calling the hooks if the
flag is set
This patch opts for the last of these, as it is fairly simple
to implement and doesn't require invasive changes like updating
the readfn/writefn hook function prototype signature.
Backports commit 37ff584c15bc3e1dd2c26b1998f00ff87189538c from qemu
Make this a static function private to translate.c.
Thus we can use the same idiom between aarch64 and aarch32
without actually sharing function implementations.
Backports commit 1ce21ba1eaf08b22da5925f3e37fc0b4322da858 from qemu
Despite the fact that the text for the call to gen_exception_insn
is identical for aarch64 and aarch32, the implementation inside
gen_exception_insn is totally different.
This fixes exceptions raised from aarch64.
This reverts commit fb2d3c9a9a.
Separate shift + extract low will result in one extra insn
for hosts like RISC-V, MIPS, and Sparc.
Backports commit 664b7e3b97d6376f3329986c465b3782458b0f8b from qemu
All of the inputs to these instructions are 32-bits. Rather than
extend each input to 64-bits and then extract the high 32-bits of
the output, use tcg_gen_muls2_i32 and other 32-bit generator functions.
Backports commit 5f8cd06ebcf57420be8fea4574de2e074de46709 from qemu
Rotate is the more compact and obvious way to swap 16-bit
elements of a 32-bit word.
Backports commit adefba76e8bf10dfb342094d2f5debfeedb1a74d from qemu
The helper function is more documentary, and also already
handles the case of rotate by zero.
Backports commit dd861b3f29be97a9e3cdb9769dcbc0c7d7825185 from qemu
The immediate shift generator functions already test for,
and eliminate, the case of a shift by zero.
Backports commit 464eaa9571fae5867d9aea7d7209c091c8a50223 from qemu
Unless we're guaranteed to always increase ARM_MAX_VQ by a multiple of
four, then we should use DIV_ROUND_UP to ensure we get an appropriate
array size.
Backports commit 46417784d21c89446763f2047228977bdc267895 from qemu
The current implementation of ZCR_ELx matches the architecture, only
implementing the lower four bits, with the rest RAZ/WI. This puts
a strict limit on ARM_MAX_VQ of 16. Make sure we don't let ARM_MAX_VQ
grow without a corresponding update here.
Backports commit 7b351d98709d3f77d6bb18562e1bf228862b0d57 from qemu
Replace x = double_saturate(y) with x = add_saturate(y, y).
There is no need for a separate more specialized helper.
Backports commit 640581a06d14e2d0d3c3ba79b916de6bc43578b0 from qemu
Promote this function from aarch64 to fully general use.
Use it to unify the code sequences for generating illegal
opcode exceptions.
Backports commit 3cb36637157088892e9e33ddb1034bffd1251d3b from qemu
Unlike the other more generic gen_exception{,_internal}_insn
interfaces, breakpoints always refer to the current instruction.
Backports commit 06bcbda3f64d464b6ecac789bce4bd69f199cd68 from qemu
The offset is variable depending on the instruction set.
Passing in the actual value is clearer in intent.
Backpors commit aee828e7541a5895669ade3a4b6978382b6b094a from qemu
We must update s->base.pc_next when we return from the translate_insn
hook to the main translator loop. By incrementing s->base.pc_next
immediately after reading the insn word, "pc_next" contains the address
of the next instruction throughout translation.
All remaining uses of s->pc are referencing the address of the next insn,
so this is now a simple global replacement. Remove the "s->pc" field.
Backports commit a04159166b880b505ccadc16f2fe84169806883d from qemu
Provide a common routine for the places that require ALIGN(PC, 4)
as the base address as opposed to plain PC. The two are always
the same for A32, but the difference is meaningful for thumb mode.
Backports commit 16e0d8234ef9291747332d2c431e46808a060472 from qemu
We currently have 3 different ways of computing the architectural
value of "PC" as seen in the ARM ARM.
The value of s->pc has been incremented past the current insn,
but that is all. Thus for a32, PC = s->pc + 4; for t32, PC = s->pc;
for t16, PC = s->pc + 2. These differing computations make it
impossible at present to unify the various code paths.
With the newly introduced s->pc_curr, we can compute the correct
value for all cases, using the formula given in the ARM ARM.
This changes the behaviour for load_reg() and load_reg_var()
when called with reg==15 from a 32-bit Thumb instruction:
previously they would have returned the incorrect value
of pc_curr + 6, and now they will return the architecturally
correct value of PC, which is pc_curr + 4. This will not
affect well-behaved guest software, because all of the places
we call these functions from T32 code are instructions where
using r15 is UNPREDICTABLE. Using the architectural PC value
here is more consistent with the T16 and A32 behaviour.
Backports commit fdbcf6329d0c2984c55d7019419a72bf8e583c36 from qemu
Add a new field to retain the address of the instruction currently
being translated. The 32-bit uses are all within subroutines used
by a32 and t32. This will become less obvious when t16 support is
merged with a32+t32, and having a clear definition will help.
Convert aarch64 as well for consistency. Note that there is one
instance of a pre-assert fprintf that used the wrong value for the
address of the current instruction.
Backports commit 43722a6d4f0c92f7e7e1e291580039b0f9789df1 from qemu
This function is used in two different contexts, and it will be
clearer if the function is given the address to which it applies.
Backports commit 331b1ca616cb708db30dab68e3262d286e687f24 from qemu
When generating an architectural single-step exception we were
routing it to the "default exception level", which is to say
the same exception level we execute at except that EL0 exceptions
go to EL1. This is incorrect because the debug exception level
can be configured by the guest for situations such as single
stepping of EL0 and EL1 code by EL2.
We have to track the target debug exception level in the TB
flags, because it is dependent on CPU state like HCR_EL2.TGE
and MDCR_EL2.TDE. (That we were previously calling the
arm_debug_target_el() function to determine dc->ss_same_el
is itself a bug, though one that would only have manifested
as incorrect syndrome information.) Since we are out of TB
flag bits unless we want to expand into the cs_base field,
we share some bits with the M-profile only HANDLER and
STACKCHECK bits, since only A-profile has this singlestep.
Fixes: https://bugs.launchpad.net/qemu/+bug/1838913
Backports commit 8bd587c1066f4456ddfe611b571d9439a947d74c from qemu
Factor out code to 'generate a singlestep exception', which is
currently repeated in four places.
To do this we need to also pull the identical copies of the
gen-exception() function out of translate-a64.c and translate.c
into translate.h.
(There is a bug in the code: we're taking the exception to the wrong
target EL. This will be simpler to fix if there's only one place to
do it.)
Backports commit c1d5f50f094ab204accfacc2ee6aafc9601dd5c4 from qemu
While most features are now detected by probing the ID_* registers
kernels can (and do) use MIDR_EL1 for working out of they have to
apply errata. This can trip up warnings in the kernel as it tries to
work out if it should apply workarounds to features that don't
actually exist in the reported CPU type.
Avoid this problem by synthesising our own MIDR value.
Backports commit 2bd5f41c00686a1f847a60824d0375f3df2c26bf from qemu
rt==15 is a special case when reading the flags: it means the
destination is APSR. This patch avoids rejecting vmrs apsr_nzcv, fpscr
as illegal instruction.
Backports commit cdc6896659b85f7ed8f7552850312e55170de0c5 from qemu
An attempt to do an exception-return (branch to one of the magic
addresses) in linux-user mode for M-profile should behave like
a normal branch, because linux-user mode is always going to be
in 'handler' mode. This used to work, but we broke it when we added
support for the M-profile security extension in commit d02a8698d7ae2bfed.
In that commit we allowed even handler-mode calls to magic return
values to be checked for and dealt with by causing an
EXCP_EXCEPTION_EXIT exception to be taken, because this is
needed for the FNC_RETURN return-from-non-secure-function-call
handling. For system mode we added a check in do_v7m_exception_exit()
to make any spurious calls from Handler mode behave correctly, but
forgot that linux-user mode would also be affected.
How an attempted return-from-non-secure-function-call in linux-user
mode should be handled is not clear -- on real hardware it would
result in return to secure code (not to the Linux kernel) which
could then handle the error in any way it chose. For QEMU we take
the simple approach of treating this erroneous return the same way
it would be handled on a CPU without the security extensions --
treat it as a normal branch.
The upshot of all this is that for linux-user mode we should never
do any of the bx_excret magic, so the code change is simple.
This ought to be a weird corner case that only affects broken guest
code (because Linux user processes should never be attempting to do
exception returns or NS function returns), except that the code that
assigns addresses in RAM for the process and stack in our linux-user
code does not attempt to avoid this magic address range, so
legitimate code attempting to return to a trampoline routine on the
stack can fall into this case. This change fixes those programs,
but we should also look at restricting the range of memory we
use for M-profile linux-user guests to the area that would be
real RAM in hardware.
Backports commit 9027d3fba605d8f6093342ebe4a1da450d374630 from qemu
The function neon_store_reg32() doesn't free the TCG temp that it
is passed, so the caller must do that. We got this right in most
places but forgot to free the TCG temps in trans_VMOV_64_sp().
Backports commit 38fb634853ac6547326d9f88b9a068d9fc6b4ad4 from qemu
In Arm v8.0 M-profile CPUs without the Security Extension and also in
v7M CPUs, there is no NSACR register. However, the code we have to handle
the FPU does not always check whether the ARM_FEATURE_M_SECURITY bit
is set before testing whether env->v7m.nsacr permits access to the
FPU. This means that for a CPU with an FPU but without the Security
Extension we would always take a bogus fault when trying to stack
the FPU registers on an exception entry.
We could fix this by adding extra feature bit checks for all uses,
but it is simpler to just make the internal value of nsacr 0xcff
("all non-secure accesses allowed"), since this is not guest
visible when the Security Extension is not present. This allows
us to continue to follow the Arm ARM pseudocode which takes a
similar approach. (In particular, in the v8.1 Arm ARM the register
is documented as reading as 0xcff in this configuration.)
Fixes: https://bugs.launchpad.net/qemu/+bug/1838475
Backports commit 02ac2f7f613b47f6a5b397b20ab0e6b2e7fb89fa from qemu
Most Arm architectural debug exceptions (eg watchpoints) are ignored
if the configured "debug exception level" is below the current
exception level (so for example EL1 can't arrange to get debug exceptions
for EL2 execution). Exceptions generated by the BRK or BPKT instructions
are a special case -- they must always cause an exception, so if
we're executing above the debug exception level then we
must take them to the current exception level.
This fixes a bug where executing BRK at EL2 could result in an
exception being taken at EL1 (which is strictly forbidden by the
architecture).
Fixes: https://bugs.launchpad.net/qemu/+bug/1838277
Backports commit 987a23224218fa3bb3aa0024ad236dcf29ebde9e from qemu
In arm_cpu_realizefn() we make several assertions about the values of
guest ID registers:
* if the CPU provides AArch32 v7VE or better it must advertise the
ARM_DIV feature
* if the CPU provides AArch32 A-profile v6 or better it must
advertise the Jazelle feature
These are essentially consistency checks that our ID register
specifications in cpu.c didn't accidentally miss out a feature,
because increasingly the TCG emulation gates features on the values
in ID registers rather than using old-style checks of ARM_FEATURE_FOO
bits.
Unfortunately, these asserts can cause problems if we're running KVM,
because in that case we don't control the values of the ID registers
-- we read them from the host kernel. In particular, if the host
kernel is older than 4.15 then it doesn't expose the ID registers via
the KVM_GET_ONE_REG ioctl, and we set up dummy values for some
registers and leave the rest at zero. (See the comment in
target/arm/kvm64.c kvm_arm_get_host_cpu_features().) This set of
dummy values is not sufficient to pass our assertions, and so on
those kernels running an AArch32 guest on AArch64 will assert.
We could provide a more sophisticated set of dummy ID registers in
this case, but that still leaves the possibility of a host CPU which
reports bogus ID register values that would cause us to assert. It's
more robust to only do these ID register checks if we're using TCG,
as that is the only case where this is truly a QEMU code bug.
Backports commit 8f4821d77e465bc2ef77302d47640d5a43d92b30 from qemu
Reported by GCC9 when building with -Wimplicit-fallthrough=2:
target/arm/helper.c: In function ‘arm_cpu_do_interrupt_aarch32_hyp’:
target/arm/helper.c:7958:14: error: this statement may fall through [-Werror=implicit-fallthrough=]
7958 | addr = 0x14;
| ~~~~~^~~~~~
target/arm/helper.c:7959:5: note: here
7959 | default:
| ^~~~~~~
cc1: all warnings being treated as errors
Backports commit 9bbb4ef991fa93323f87769a6e217c2b9273a128 from qemu
In the M-profile architecture, when we do a vector table fetch and it
fails, we need to report a HardFault. Whether this is a Secure HF or
a NonSecure HF depends on several things. If AIRCR.BFHFNMINS is 0
then HF is always Secure, because there is no NonSecure HardFault.
Otherwise, the answer depends on whether the 'underlying exception'
(MemManage, BusFault, SecureFault) targets Secure or NonSecure. (In
the pseudocode, this is handled in the Vector() function: the final
exc.isSecure is calculated by looking at the exc.isSecure from the
exception returned from the memory access, not the isSecure input
argument.)
We weren't doing this correctly, because we were looking at
the target security domain of the exception we were trying to
load the vector table entry for. This produces errors of two kinds:
* a load from the NS vector table which hits the "NS access
to S memory" SecureFault should end up as a Secure HardFault,
but we were raising an NS HardFault
* a load from the S vector table which causes a BusFault
should raise an NS HardFault if BFHFNMINS == 1 (because
in that case all BusFaults are NonSecure), but we were raising
a Secure HardFault
Correct the logic.
We also fix a comment error where we claimed that we might
be escalating MemManage to HardFault, and forgot about SecureFault.
(Vector loads can never hit MPU access faults, because they're
always aligned and always use the default address map.)
Backports commit 51c9122e92b776a3f16af0b9282f1dc5012e2a19 from qemu
The ARMv5 architecture didn't specify detailed per-feature ID
registers. Now that we're using the MVFR0 register fields to
gate the existence of VFP instructions, we need to set up
the correct values in the cpu->isar structure so that we still
provide an FPU to the guest.
This fixes a regression in the arm926 and arm1026 CPUs, which
are the only ones that both have VFP and are ARMv5 or earlier.
This regression was introduced by the VFP refactoring, and more
specifically by commits 1120827fa182f0e76 and 266bd25c485597c,
which accidentally disabled VFP short-vector support and
double-precision support on these CPUs.
Backports commit cb7cef8b32033f6284a47d797edd5c19c5491698 from qemu
When we converted to using feature bits in 602f6e42cfbf we missed out
the fact (dp && arm_dc_feature(s, ARM_FEATURE_V8)) was supported for
-cpu max configurations. This caused a regression in the GCC test
suite. Fix this by setting the appropriate bits in mvfr1.FPHP to
report ARMv8-A with FP support (but not ARMv8.2-FP16).
Fixes: https://bugs.launchpad.net/qemu/+bug/1836078
Backports commit 45b1a243b81a7c9ae56235937280711dd9914ca7 from qemu
In commit e9d652824b0 we extracted the vfp_set_fpscr_to_host()
function but failed at calling it in the correct place, we call
it after xregs[ARM_VFP_FPSCR] is modified.
Fix by calling this function before we update FPSCR.
Backports commit 85795187f416326f87177cabc39fae1911f04c50 from qemu
Off by one error in the EL2 and EL3 tests. Remove the test
against EL3 entirely, since it must always be true.
Backports commit 6a02a73211c5bc634fccd652777230954b83ccba from qemu
Coverity points out (CID 1402195) that the loop in trans_VMOV_imm_dp()
that iterates over the destination registers in a short-vector VMOV
accidentally throws away the returned updated register number
from vfp_advance_dreg(). Add the missing assignment. (We got this
correct in trans_VMOV_imm_sp().)
Backports commit 89a11ff756410aecb87d2c774df6e45dbf4105c1 from qemu
Thumb instructions in an IT block are set up to be conditionally
executed depending on a set of condition bits encoded into the IT
bits of the CPSR/XPSR. The architecture specifies that if the
condition bits are 0b1111 this means "always execute" (like 0b1110),
not "never execute"; we were treating it as "never execute". (See
the ConditionHolds() pseudocode in both the A-profile and M-profile
Arm ARM.)
This is a bit of an obscure corner case, because the only legal
way to get to an 0b1111 set of condbits is to do an exception
return which sets the XPSR/CPSR up that way. An IT instruction
which encodes a condition sequence that would include an 0b1111 is
UNPREDICTABLE, and for v8A the CONSTRAINED UNPREDICTABLE choices
for such an IT insn are to NOP, UNDEF, or treat 0b1111 like 0b1110.
Add a comment noting that we take the latter option.
Backports commit 5529de1e5512c05276825fa8b922147663fd6eac from qemu
In the various helper functions for v7M/v8M instructions, use
the _ra versions of cpu_stl_data() and friends. Otherwise we
may get wrong behaviour or an assert() due to not being able
to locate the TB if there is an exception on the memory access
or if it performs an IO operation when in icount mode
Backports commit 2884fbb60412049ec92389039ae716b32057382e from qemu
In preparation for supporting TCG disablement on ARM, we move most
of TCG related v7m/v8m helpers and APIs into their own file.
Note: It is easier to review this commit using the 'histogram'
diff algorithm:
$ git diff --diff-algorithm=histogram ...
or
$ git diff --histogram ...
Backports commit 7aab5a8c8bb525ea390b4ebc17ab82c0835cfdb6 from qemu
Semihosting hooks either SVC or HLT instructions, and inside KVM
both of those go to EL1, ie to the guest, and can't be trapped to
KVM.
Let check_for_semihosting() return False when not running on TCG.
backports commit 91f78c58da9ba78c8ed00f5d822b701765be8499 from qemu
In the next commit we will split the M-profile functions from this
file. Some function will be called out of helper.c. Declare them in
the "internals.h" header.
Backports commit 787a7e76c2e93a48c47b324fea592c9910a70483 from qemu
This code is specific to the SoftFloat floating-point
implementation, which is only used by TCG.
Backports commit 4a15527c9feecfd2fa2807d5e698abbc19feb35f from qemu
The vfp_set_fpscr() helper contains code specific to the host
floating point implementation (here the SoftFloat library).
Extract this code to vfp_set_fpscr_from_host().
Backports commit 0c6ad94809b37a1f0f1f75d3cd0e4a24fb77e65c from qemu
The vfp_set_fpscr() helper contains code specific to the host
floating point implementation (here the SoftFloat library).
Extract this code to vfp_set_fpscr_to_host().
Backports commit e9d652824b05845f143ef4797d707fae47d4b3ed from qemu
To ease the review of the next commit,
move the vfp_exceptbits_to_host() function directly after
vfp_exceptbits_from_host(). Amusingly the diff shows we
are moving vfp_get_fpscr().
Backports commit 20e62dd8c831c9065ed4a8e64813c93ad61c50d7 from qemu.
These routines are TCG specific.
The arm_deliver_fault() function is only used within the new
helper. Make it static.
Backports commit e21b551cb652663f2f2405a64d63ef6b4a1042b7 from qemu
In the next commit we will split the TLB related routines of
this file, and this function will also be called in the new
file. Declare it in the "internals.h" header.
Backports commit ebae861fc6c385a7bcac72dde4716be06e6776f1 from qemu
Those helpers are a software implementation of the ARM v8 memory zeroing
op code. They should be moved to the op helper file, which is going to
eventually be built only when TCG is enabled.
Backports commit 6cdca173ef81a9dbcee9e142f1a5a34ad9c44b75 from qemu
Since commit 8c06fbdf36b checkpatch.pl enforce a new multiline
comment syntax. Since we'll move this code around, fix its style
first.
Backports commit 9a223097e44d5320f5e0546710263f22d11f12fc from qemu
Group ARM objects together, TCG related ones at the bottom.
This will help when restricting TCG-only objects.
Backports commit 07774d584267488c8c2f104ae5b552791961908a from qemu
Group Aarch64 rules together, TCG related ones at the bottom.
This will help when restricting TCG-only objects.
Backports commit 87f4f183484dba7a460c59e99dac0dbb9f42ed87 from qemu
In commit 1120827fa182f0e7622 we accidentally put the
"UNDEF unless FPU has double-precision support" check in
the single-precision VFM function. Put it in the dp
function where it belongs.
Backports commit 34bea4edb9bbe8edf4b8606276482acdff5ca58b from qemu
The architecture permits FPUs which have only single-precision
support, not double-precision; Cortex-M4 and Cortex-M33 are
both like that. Add the necessary checks on the MVFR0 FPDP
field so that we UNDEF any double-precision instructions on
CPUs like this.
Note that even if FPDP==0 the insns like VMOV-to/from-gpreg,
VLDM/VSTM, VLDR/VSTR which take double precision registers
still exist.
Backports commit 1120827fa182f0e76226df7ffe7a86598d1df54f from qemu
In several places cut and paste errors meant we were using the wrong
type for the 'arg' struct in trans_ functions called by the
decodetree decoder, because we were using the _sp version of the
struct in the _dp function. These were harmless, because the two
structs were identical and so decodetree made them typedefs of the
same underlying structure (and we'd have had a compile error if they
were not harmless), but we should clean them up anyway.
Backports commit 83655223ac6143a563e981906ce13fd6f2cfbefd from qemu
Remove the now unused TCG globals cpu_F0s, cpu_F0d, cpu_F1s, cpu_F1d.
cpu_M0 is still used by the iwmmxt code, and cpu_V0 and
cpu_V1 are used by both iwmmxt and Neon.
Backports commit d9eea52c67c04c58ecceba6ffe5a93d1d02051fa from qemu
Remove some old constructns from NEON_2RM_VCVT_F16_F32 code:
* don't use CPU_F0s
* don't use tcg_gen_st_f32
Backports commit b66f6b9981004bbf120b8d17c20f92785179bdf2 from qemu
Remove some old constructs from NEON_2RM_VCVT_F16_F32 code:
* don't use cpu_F0s
* don't use tcg_gen_ld_f32
Backports commit 58f2682eee738e8890f9cfe858e0f4f68b00d45d from qemu
Stop using cpu_F0s for the Neon f32/s32 VCVT operations.
Since this is the last user of cpu_F0s in the Neon 2rm-op
loop, we can remove the handling code for it too.
Backports commit 60737ed5785b9c1c6f1c85575dfdd1e9eec91878 from qemu
Where Neon instructions are floating point operations, we
mostly use the old VFP utility functions like gen_vfp_abs()
which work on the TCG globals cpu_F0s and cpu_F1s. The
Neon for-each-element loop conditionally loads the inputs
into either a plain old TCG temporary for most operations
or into cpu_F0s for float operations, and similarly stores
back either cpu_F0s or the temporary.
Switch NEON_2RM_VABS_F away from using cpu_F0s, and
update neon_2rm_is_float_op() accordingly.
Backports commit fd8a68cdcf81d70eebf866a132e9780d4108da9c from qemu
The AArch32 VMOV (immediate) instruction uses the same VFP encoded
immediate format we already handle in vfp_expand_imm(). Use that
function rather than hand-decoding it.
Backports commit 9bee50b498410ed6466018b26464d7384c7879e9 from qemu
We want to use vfp_expand_imm() in the AArch32 VFP decode;
move it from the a64-only header/source file to the
AArch32 one (which is always compiled even for AArch64).
Backports commit d6a092d479333b5f20a647a912a31b0102d37335 from qemu
For VFP short vectors, the VFP registers are divided into a
series of banks: for single-precision these are s0-s7, s8-s15,
s16-s23 and s24-s31; for double-precision they are d0-d3,
d4-d7, ... d28-d31. Some banks are "scalar" meaning that
use of a register within them triggers a pure-scalar or
mixed vector-scalar operation rather than a full vector
operation. The scalar banks are s0-s7, d0-d3 and d16-d19.
When using a bank as part of a vector operation, we
iterate through it, increasing the register number by
the specified stride each time, and wrapping around to
the beginning of the bank.
Unfortunately our calculation of the "increment" part of this
was incorrect:
vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask)
will only do the intended thing if bank_mask has exactly
one set high bit. For instance for doubles (bank_mask = 0xc),
if we start with vd = 6 and delta_d = 2 then vd is updated
to 12 rather than the intended 4.
This only causes problems in the unlikely case that the
starting register is not the first in its bank: if the
register number doesn't have to wrap around then the
expression happens to give the right answer.
Fix this bug by abstracting out the "check whether register
is in a scalar bank" and "advance register within bank"
operations to utility functions which use the right
bit masking operations
Backports commit 18cf951af9a27ae573a6fa17f9d0c103f7b7679b from qemu
Convert the float-to-integer VCVT instructions to decodetree.
Since these are the last unconverted instructions, we can
delete the old decoder structure entirely now.
Backports commit 3111bfc2da6ba0c8396dc97ca479942d711c6146 from qemu
Convert the VCVT (between floating-point and fixed-point) instructions
to decodetree.
Backports commit e3d6f4290c788e850c64815f0b3e331600a4bcc0 from qemu
Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
VRINTX to decodetree.
These instructions were only introduced as part of the "VFP misc"
additions in v8A, so we check this. The old decoder's implementation
was incorrectly providing them even for v7A CPUs.
Backports commit e25155f55dc4abb427a88dfe58bbbc550fe7d643 from qemu
Convert the VCVTT and VCVTB instructions which convert from
f32 and f64 to f16 to decodetree.
Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
store of the right half of the input single-precision register
rather than doing a load/modify/store sequence on the full
32 bits.
Backports commit cdfd14e86ab0b1ca29a702d13a8e4af2e902a9bf from qemu
Convert the VCVTT, VCVTB instructions that deal with conversion
from half-precision floats to f32 or 64 to decodetree.
Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
load of the right half of the input single-precision register
rather than loading the full 32 bits and then doing a
separate shift or sign-extension.
Backports commit b623d803dda805f07aadcbf098961fde27315c19 from qemu
Convert the VFP comparison instructions to decodetree.
Note that comparison instructions should not honour the VFP
short-vector length and stride information: they are scalar-only
operations. This applies to all the 2-operand instructions except
for VMOV, VABS, VNEG and VSQRT. (In the old decoder this is
implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)
Backports commit 386bba2368842fc74388a3c1651c6c0c0c70adbd from qemu
Convert the VFP VABS instruction to decodetree.
Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
VFPGen2OpDPFn because none of the operations which use this format
and support short vectors will need it.
Backports commit 90287e22c987e9840704345ed33d237cbe759dd9 from qemu
Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
VFMA, VFMS) to decodetree.
Note that in the old decode structure we were implementing
these to honour the VFP vector stride/length. These instructions
were introduced in VFPv4, and in the v7A architecture they
are UNPREDICTABLE if the vector stride or length are non-zero.
In v8A they must UNDEF if stride or length are non-zero, like
all VFP instructions; we choose to UNDEF always.
Backports commit d4893b01d23060845ee3855bc96626e16aad9ab5 from qemu
Convert the VFP VMLA instruction to decodetree.
This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.
We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)
Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank but
but the second operand is in a scalar bank. For example
vmla.f64 d10, d1, d16 with length 2 stride 2
is equivalent to the pair of scalar operations
vmla.f64 d10, d1, d16
vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.
Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.
Backports commit 266bd25c485597c94209bfdb3891c1d0c573c164 from qemu
Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans
functions which perform the memory accesses by going via the TCG
globals cpu_F0s and cpu_F0d, to use local TCG temps instead.
Backports commit 3993d0407dff7233e42f2251db971e126a0497e9 from qemu
Convert the VFP load/store multiple insns to decodetree.
This includes tightening up the UNDEF checking for pre-VFPv3
CPUs which only have D0-D15 : they now UNDEF for any access
to D16-D31, not merely when the smallest register in the
transfer list is in D16-D31.
This conversion does not try to share code between the single
precision and the double precision versions; this looks a bit
duplicative of code, but it leaves the door open for a future
refactoring which gets rid of the use of the "F0" registers
by inlining the various functions like gen_vfp_ld() and
gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }"
conditionalisation.
Backports commit fa288de272c5c8a66d5eb683b123706a52bc7ad6 from qemu
Convert the VFP two-register transfer instructions to decodetree
(in the v8 Arm ARM these are the "Advanced SIMD and floating-point
64-bit move" encoding group).
Again, we expand out the sequences involving gen_vfp_msr() and
gen_msr_vfp().
Backports commit 81f681106eabe21c55118a5a41999fb7387fb714 from qemu
Convert the "single-precision" register moves to decodetree:
* VMSR
* VMRS
* VMOV between general purpose register and single precision
Note that the VMSR/VMRS conversions make our handling of
the "should this UNDEF?" checks consistent between the two
instructions:
* VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0
(previously was a nop)
* VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better
(previously was a nop)
* VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better
(previously would write to the register, which had no
guest-visible effect because we always UNDEF reads)
We also tighten up the decode: we were previously underdecoding
some SBZ or SBO bits.
The conversion of VMOV_single includes the expansion out of the
gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr()
sequences into the simpler direct load/store of the TCG temp via
neon_{load,store}_reg32(): we know in the new function that we're
always single-precision, we don't need to use the old-and-deprecated
cpu_F0* TCG globals, and we don't happen to have the declaration of
gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the
new function is.
Backports commit a9ab50011aeda2dd012da99069e078379315ea18 from qemu
Convert the "double-precision" register moves to decodetree:
this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP.
Note that the conversion process has tightened up a few of the
UNDEF encoding checks: we now correctly forbid:
* VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10
* VMOV-from-gpr with opc1:opc2 == 0x10
* VDUP with B:E == 11
* VDUP with Q == 1 and Vn<0> == 1
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
The accesses of elements < 32 bits could be improved by doing
direct ld/st of the right size rather than 32-bit read-and-shift
or read-modify-write, but we leave this for later cleanup,
since this series is generally trying to stick to fixing
the decode.
Backports commit 9851ed9269d214c0c6feba960dd14ff09e6c34b4 from qemu
The current VFP code has two different idioms for
loading and storing from the VFP register file:
1 using the gen_mov_F0_vreg() and similar functions,
which load and store to a fixed set of TCG globals
cpu_F0s, CPU_F0d, etc
2 by direct calls to tcg_gen_ld_f64() and friends
We want to phase out idiom 1 (because the use of the
fixed globals is a relic of a much older version of TCG),
but idiom 2 is quite longwinded:
tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg))
requires us to specify the 64-bitness twice, once in
the function name and once by passing 'true' to
vfp_reg_offset(). There's no guard against accidentally
passing the wrong flag.
Instead, let's move to a convention of accessing 64-bit
registers via the existing neon_load_reg64() and
neon_store_reg64(), and provide new neon_load_reg32()
and neon_store_reg32() for the 32-bit equivalents.
Implement the new functions and use them in the code in
translate-vfp.inc.c. We will convert the rest of the VFP
code as we do the decodetree conversion in subsequent
commits.
Backports commit 160f3b64c5cc4c8a09a1859edc764882ce6ad6bf from qemu
Move the trans_*() functions we've just created from translate.c
to translate-vfp.inc.c. This is pure code motion with no textual
changes (this can be checked with 'git show --color-moved').
Backports commit f7bbb8f31f0761edbf0c64b7ab3c3f49c13612ea from qemu
Convert the VCVTA/VCVTN/VCVTP/VCVTM instructions to decodetree.
trans_VCVT() is temporarily left in translate.c.
Backports commit c2a46a914cd5c38fd0ee57ff0befc1c5bde27bcf from qemu
Convert the VRINTA/VRINTN/VRINTP/VRINTM instructions to decodetree.
Again, trans_VRINT() is temporarily left in translate.c.
Backports commit e3bb599d16e4678b228d80194cee328f894b1ceb from qemu
Convert the VMINNM and VMAXNM instructions to decodetree.
As with VSEL, we leave the trans_VMINMAXNM() function
in translate.c for the moment.
Backports commit f65988a1efdb42f9058db44297591491842e697c from qemu
Convert the VSEL instructions to decodetree.
We leave trans_VSEL() in translate.c for now as this allows
the patch to show just the changes from the old handle_vsel().
In the old code the check for "do D16-D31 exist" was hidden in
the VFP_DREG macro, and assumed that VFPv3 always implied that
D16-D31 exist. In the new code we do the correct ID register test.
This gives identical behaviour for most of our CPUs, and fixes
previously incorrect handling for Cortex-R5F, Cortex-M4 and
Cortex-M33, which all implement VFPv3 or better with only 16
double-precision registers.
Backports commit b3ff4b87b4ae08120a51fe12592725e1dca8a085 from qemu
At the moment our -cpu max for AArch32 supports VFP short-vectors
because we always implement them, even for CPUs which should
not have them. The following commits are going to switch to
using the correct ID-register-check to enable or disable short
vector support, so we need to turn it on explicitly for -cpu max,
because Cortex-A15 doesn't implement it.
We don't enable this for the AArch64 -cpu max, because the v8A
architecture never supports short-vectors.
Backports commit 973751fd798d41402d34f9f705c0c6d1633d0cda from qemu
The Cortex-R5F initfn was not correctly setting up the MVFR
ID register values. Fill these in, since some subsequent patches
will use ID register checks rather than CPU feature bit checks.
Backports commit 3de79d335c9aa7d726865e3933d9b21781032183 from qemu
Factor out the VFP access checking code so that we can use it in the
leaf functions of the decodetree decoder.
We call the function full_vfp_access_check() so we can keep
the more natural vfp_access_check() for a version which doesn't
have the 'ignore_vfp_enabled' flag -- that way almost all VFP
insns will be able to use vfp_access_check(s) and only the
special-register access function will have to use
full_vfp_access_check(s, ignore_vfp_enabled).
Backports commit 06db8196bba34776829020192ed623a0b22e6557 from qemu
Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 VFP encodings. At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.
We need to have one decoder for the unconditional insns and one for
the conditional insns, as otherwise the patterns for conditional
insns would incorrectly match against the unconditional ones too.
Since translate.c is over 14,000 lines long and we're going to be
touching pretty much every line of the VFP code as part of the
decodetree conversion, we create a new translate-vfp.inc.c to hold
the code which deals with VFP in the new scheme. It should be
possible to convert this into a standalone translation unit
eventually, but the conversion process will be much simpler if we
simply #include it midway through translate.c to start with.
Backports commit 78e138bc1f672c145ef6ace74617db00eebaa2ba from qemu
The ARM pseudocode installs the error_code into the original
pointer, not the encrypted pointer. The difference applies
within the 7 bits of pac data; the result should be the sign
extension of bit 55.
Add a testcase to that effect.
Backports commit d67ebada159148bfdfde84871338738e4465e985 from qemu
The NSACR register allows secure code to configure the FPU
to be inaccessible to non-secure code. If the NSACR.CP10
bit is set then:
* NS accesses to the FPU trap as UNDEF (ie to NS EL1 or EL2)
* CPACR.{CP10,CP11} behave as if RAZ/WI
* HCPTR.{TCP11,TCP10} behave as if RAO/WI
Note that we do not implement the NSACR.NSASEDIS bit which
gates only access to Advanced SIMD, in the same way that
we don't implement the equivalent CPACR.ASEDIS and HCPTR.TASE.
Backports commit fc1120a7f5f2d4b601003205c598077d3eb11ad2 from qemu
Nothing in there so far, but all of the plumbing done
within the target ArchCPU state.
Backports commit 5b146dc716cfd247f99556c04e6e46fbd67565a0 from qemu
Now that we have ArchCPU, we can define this generically,
in the one place that needs it.
Backports commit 677c4d69ac21961e76a386f9bfc892a44923acc0 from qemu
Cleanup in the boilerplate that each target must define.
Replace arm_env_get_cpu with env_archcpu. The combination
CPU(arm_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Backports commit 2fc0cc0e1e034582f4718b1a2d57691474ccb6aa from qemu
Now that we have both ArchCPU and CPUArchState, we can define
this generically instead of via macro in each target's cpu.h.
Backports commit 29a0af618ddd21f55df5753c3e16b0625f534b3c from qemu
For all targets, into this new file move TARGET_LONG_BITS,
TARGET_PAGE_BITS, TARGET_PHYS_ADDR_SPACE_BITS,
TARGET_VIRT_ADDR_SPACE_BITS, and NB_MMU_MODES.
Include this new file from exec/cpu-defs.h.
This now removes the somewhat odd requirement that target/arch/cpu.h
defines TARGET_LONG_BITS before including exec/cpu-defs.h, so push the
bulk of the includes within target/arch/cpu.h to the top.
Backports commit 74433bf083b0766aba81534f92de13194f23ff3e from qemu
Commit 89e68b575 "target/arm: Use vector operations for saturation"
causes this abort() when booting QEMU ARM with a Cortex-A15:
0 0x00007ffff4c2382f in raise () at /usr/lib/libc.so.6
1 0x00007ffff4c0e672 in abort () at /usr/lib/libc.so.6
2 0x00005555559c1839 in disas_neon_data_insn (insn=<optimized out>, s=<optimized out>) at ./target/arm/translate.c:6673
3 0x00005555559c1839 in disas_neon_data_insn (s=<optimized out>, insn=<optimized out>) at ./target/arm/translate.c:6386
4 0x00005555559cd8a4 in disas_arm_insn (insn=4081107068, s=0x7fffe59a9510) at ./target/arm/translate.c:9289
5 0x00005555559cd8a4 in arm_tr_translate_insn (dcbase=0x7fffe59a9510, cpu=<optimized out>) at ./target/arm/translate.c:13612
6 0x00005555558d1d39 in translator_loop (ops=0x5555561cc580 <arm_translator_ops>, db=0x7fffe59a9510, cpu=0x55555686a2f0, tb=<optimized out>, max_insns=<optimized out>) at ./accel/tcg/translator.c:96
7 0x00005555559d10d4 in gen_intermediate_code (cpu=cpu@entry=0x55555686a2f0, tb=tb@entry=0x7fffd7840080 <code_gen_buffer+126091347>, max_insns=max_insns@entry=512) at ./target/arm/translate.c:13901
8 0x00005555558d06b9 in tb_gen_code (cpu=cpu@entry=0x55555686a2f0, pc=3067096216, cs_base=0, flags=192, cflags=-16252928, cflags@entry=524288) at ./accel/tcg/translate-all.c:1736
9 0x00005555558ce467 in tb_find (cf_mask=524288, tb_exit=1, last_tb=0x7fffd783e640 <code_gen_buffer+126084627>, cpu=0x1) at ./accel/tcg/cpu-exec.c:407
10 0x00005555558ce467 in cpu_exec (cpu=cpu@entry=0x55555686a2f0) at ./accel/tcg/cpu-exec.c:728
11 0x000055555588b0cf in tcg_cpu_exec (cpu=0x55555686a2f0) at ./cpus.c:1431
12 0x000055555588d223 in qemu_tcg_cpu_thread_fn (arg=0x55555686a2f0) at ./cpus.c:1735
13 0x000055555588d223 in qemu_tcg_cpu_thread_fn (arg=arg@entry=0x55555686a2f0) at ./cpus.c:1709
14 0x0000555555d2629a in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:502
15 0x00007ffff4db8a92 in start_thread () at /usr/lib/libpthread.
This patch ensures that we don't hit the abort() in the second switch
case in disas_neon_data_insn() as we will return from the first case.
Backports commit 2f143d3ad1c05e91cf2cdf5de06d59a80a95e6c8 from qemu
The mask implied by the extract is redundant with the one
implied by the deposit. Also, fix spelling of BFXIL.
Backports commit 87eb65a3c45c788a309986d48170a54a0d1c0705 from qemu
Use the newly introduced infrastructure for guest random numbers.
Backports commit de390645675966cce113bf5394445bc1f8d07c85 from qemu
(with the actual RNG portion disabled to preserve determinism for the
time being).
Most of the existing users would continue around a loop which
would fault the tlb entry in via a normal load/store.
But for AArch64 SVE we have an existing emulation bug wherein we
would mark the first element of a no-fault vector load as faulted
(within the FFR, not via exception) just because we did not have
its address in the TLB. Now we can properly only mark it as faulted
if there really is no valid, readable translation, while still not
raising an exception. (Note that beyond the first element of the
vector, the hardware may report a fault for any reason whatsoever;
with at least one element loaded, forward progress is guaranteed.)
Backports commit 4811e9095c0491bc6f5450e5012c9c4796b9e59d from qemu
We can now use the CPUClass hook instead of a named function.
Create a static tlb_fill function to avoid other changes within
cputlb.c. This also isolates the asserts within. Remove the
named tlb_fill function from all of the targets.
Backports commit c319dc13579a92937bffe02ad2c9f1a550e73973 from qemu
Remove a function of the same name from target/arm/.
Use a branchless implementation of abs gleaned from gcc.
Backports commit ff1f11f7f8710a768f9313f24bd7f509d3db27e5 from qemu
Replace the single opcode in .opc with a null-terminated
array in .opt_opc. We still require that all opcodes be
used with the same .vece.
Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion. Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.
Convert all existing vector aware front ends.
Backports commit 53229a7703eeb2bbe101a19a33ef22aaf960c65b from qemu
Currently the dc_zva helper function uses a variable length
array. In fact we know (as the comment above remarks) that
the length of this array is bounded because the architecture
limits the block size and QEMU limits the target page size.
Use a fixed array size and assert that we don't run off it.
Backports commit 63159601fb3e396b28da14cbb71e50ed3f5a0331 from qemu
In the M-profile architecture, if the CPU implements the DSP extension
then the XPSR has GE bits, in the same way as the A-profile CPSR. When
we added DSP extension support we forgot to add support for reading
and writing the GE bits, which are stored in env->GE. We did put in
the code to add XPSR_GE to the mask of bits to update in the v7m_msr
helper, but forgot it in v7m_mrs. We also must not allow the XPSR we
pull off the stack on exception return to set the nonexistent GE bits.
Correct these errors:
* read and write env->GE in xpsr_read() and xpsr_write()
* only set GE bits on exception return if DSP present
* read GE bits for MRS if DSP present
Backports commit f1e2598c46d480c9e21213a244bc514200762828 from qemu
Thereby decoupling the resulting translated code from the current state
of the system.
Backports commit 2399d4e7cec22ecf1c51062d2ebfd45220dbaace from qemu
The M-profile architecture floating point system supports
lazy FP state preservation, where FP registers are not
pushed to the stack when an exception occurs but are instead
only saved if and when the first FP instruction in the exception
handler is executed. Implement this in QEMU, corresponding
to the check of LSPACT in the pseudocode ExecuteFPCheck().
Backports commit e33cf0f8d8c9998a7616684f9d6aa0d181b88803 from qemu
Pushing registers to the stack for v7M needs to handle three cases:
* the "normal" case where we pend exceptions
* an "ignore faults" case where we set FSR bits but
do not pend exceptions (this is used when we are
handling some kinds of derived exception on exception entry)
* a "lazy FP stacking" case, where different FSR bits
are set and the exception is pended differently
Implement this by changing the existing flag argument that
tells us whether to ignore faults or not into an enum that
specifies which of the 3 modes we should handle.
Backports commit a356dacf647506bccdf8ecd23574246a8bf615ac from qemu
In the v7M architecture, if an exception is generated in the process
of doing the lazy stacking of FP registers, the handling of
possible escalation to HardFault is treated differently to the normal
approach: it works based on the saved information about exception
readiness that was stored in the FPCCR when the stack frame was
created. Provide a new function armv7m_nvic_set_pending_lazyfp()
which pends exceptions during lazy stacking, and implements
this logic.
This corresponds to the pseudocode TakePreserveFPException().
Backports the relevant parts of commit
a99ba8ab1601904e0fa20325192fc850362ce80e from qemu
Add a new helper function which returns the MMU index to use
for v7M, where the caller specifies all of the security
state, privilege level and whether the execution priority
is negative, and reimplement the existing
arm_v7m_mmu_idx_for_secstate_and_priv() in terms of it.
We are going to need this for the lazy-FP-stacking code.
Backports commit fa6252a988dbe440cd6087bf93cbe0887f0c401b from qemu
The M-profile FPCCR.ASPEN bit indicates that automatic floating-point
context preservation is enabled. Before executing any floating-point
instruction, if FPCCR.ASPEN is set and the CONTROL FPCA/SFPA bits
indicate that there is no active floating point context then we
must create a new context (by initializing FPSCR and setting
FPCA/SFPA to indicate that the context is now active). In the
pseudocode this is handled by ExecuteFPCheck().
Implement this with a new TB flag which tracks whether we
need to create a new FP context.
Backports commit 6000531e19964756673a5f4b694a649ef883605a from qemu
The M-profile FPCCR.S bit indicates the security status of
the floating point context. In the pseudocode ExecuteFPCheck()
function it is unconditionally set to match the current
security state whenever a floating point instruction is
executed.
Implement this by adding a new TB flag which tracks whether
FPCCR.S is different from the current security state, so
that we only need to emit the code to update it in the
less-common case when it is not already set correctly.
Note that we will add the handling for the other work done
by ExecuteFPCheck() in later commits.
Backports commit 6d60c67a1a03be32c3342aff6604cdc5095088d1 from qemu
We are close to running out of TB flags for AArch32; we could
start using the cs_base word, but before we do that we can
economise on our usage by sharing the same bits for the VFP
VECSTRIDE field and the XScale XSCALE_CPAR field. This
works because no XScale CPU ever had VFP.
Backports commit ea7ac69d124c94c6e5579145e727adec9ccbefef from qemu
Move the NS TBFLAG down from bit 19 to bit 6, which has not
been used since commit c1e3781090b9d36c60 in 2015, when we
started passing the entire MMU index in the TB flags rather
than just a 'privilege level' bit.
This rearrangement is not strictly necessary, but means that
we can put M-profile-only bits next to each other rather
than scattered across the flag word.
Backports commit 7fbb535f7aeb22896fedfcf18a1eeff48165f1d7 from qemu
Handle floating point registers in exception return.
This corresponds to pseudocode functions ValidateExceptionReturn(),
ExceptionReturn(), PopStack() and ConsumeExcStackFrame().
Backports commit 6808c4d2d2826920087533f517472c09edc7b0d2 from qemu
The magic value pushed onto the callee stack as an integrity
check is different if floating point is present.
Backports commit 0dc51d66fcfcc4c72011cdafb401fd876ca216e7 from qemu
The TailChain() pseudocode specifies that a tail chaining
exception should sanitize the excReturn all-ones bits and
(if there is no FPU) the excReturn FType bits; we weren't
doing this.
Backports commit 60fba59a2f9a092a44b688df5d058cdd6dd9c276 from qemu
For v8M floating point support, transitions from Secure
to Non-secure state via BLNS and BLXNS must clear the
CONTROL.SFPA bit. (This corresponds to the pseudocode
BranchToNS() function.)
Backports commit 3cd6726f0ba7cc77342ee721bd86094e13b2a42a from qemu
Implement the code which updates the FPCCR register on an
exception entry where we are going to use lazy FP stacking.
We have to defer to the NVIC to determine whether the
various exceptions are currently ready or not.
Backports commit b593c2b81287040ab6f452afec6281e2f7ee487b from qemu
Handle floating point registers in exception entry.
This corresponds to the FP-specific parts of the pseudocode
functions ActivateException() and PushStack().
We defer the code corresponding to UpdateFPCCR() to a later patch.
Backports commit 0ed377a8013f40653a83f6ad2c9693897522d7dc from qemu
Currently the code in v7m_push_stack() which detects a violation
of the v8M stack limit simply returns early if it does so. This
is OK for the current integer-only code, but won't work for the
floating point handling we're about to add. We need to continue
executing the rest of the function so that we check for other
exceptions like not having permission to use the FPU and so
that we correctly set the FPCCR state if we are doing lazy
stacking. Refactor to avoid the early return.
Backports commit 3432c79a4e7345818d2defcf9e61a1bcb2907f9f from qemu
The M-profile CONTROL register has two bits -- SFPA and FPCA --
which relate to floating-point support, and should be RES0 otherwise.
Handle them correctly in the MSR/MRS register access code.
Neither is banked between security states, so they are stored
in v7m.control[M_REG_S] regardless of current security state.
Backports commit 2e1c5bcd32014c9ede1b604ae6c2c653de17fc53 from qemu
If the floating point extension is present, then the SG instruction
must clear the CONTROL_S.SFPA bit. Implement this.
(On a no-FPU system the bit will always be zero, so we don't need
to make the clearing of the bit conditional on ARM_FEATURE_VFP.)
Backports commit 1702071302934af77a072b7ee7c5eadc45b37573 from qemu
Correct the decode of the M-profile "coprocessor and
floating-point instructions" space:
* op0 == 0b11 is always unallocated
* if the CPU has an FPU then all insns with op1 == 0b101
are floating point and go to disas_vfp_insn()
For the moment we leave VLLDM and VLSTM as NOPs; in
a later commit we will fill in the proper implementation
for the case where an FPU is present.
Backports commit 8859ba3c9625e7ceb5599f457a344bcd7c5e112b from qemu
Like AArch64, M-profile floating point has no FPEXC enable
bit to gate floating point; so always set the VFPEN TB flag.
M-profile also has CPACR and NSACR similar to A-profile;
they behave slightly differently:
* the CPACR is banked between Secure and Non-Secure
* if the NSACR forces a trap then this is taken to
the Secure state, not the Non-Secure state
Honour the CPACR and NSACR settings. The NSACR handling
requires us to borrow the exception.target_el field
(usually meaningless for M profile) to distinguish the
NOCP UsageFault taken to Secure state from the more
usual fault taken to the current security state.
Backports commit d87513c0abcbcd856f8e1dee2f2d18903b2c3ea2 from qemu
The only "system register" that M-profile floating point exposes
via the VMRS/VMRS instructions is FPSCR, and it does not have
the odd special case for rd==15. Add a check to ensure we only
expose FPSCR.
Backports commit ef9aae2522c22c05df17dd898099dd5c3f20d688 from qemu
The M-profile floating point support has three associated config
registers: FPCAR, FPCCR and FPDSCR. It also makes the registers
CPACR and NSACR have behaviour other than reads-as-zero.
Add support for all of these as simple reads-as-written registers.
We will hook up actual functionality later.
The main complexity here is handling the FPCCR register, which
has a mix of banked and unbanked bits.
Note that we don't share storage with the A-profile
cpu->cp15.nsacr and cpu->cp15.cpacr_el1, though the behaviour
is quite similar, for two reasons:
* the M profile CPACR is banked between security states
* it preserves the invariant that M profile uses no state
inside the cp15 substruct
Backports commit d33abe82c7c9847284a23e575e1078cccab540b5 from qemu
Enforce that for M-profile various FPSCR bits which are RES0 there
but have defined meanings on A-profile are never settable. This
ensures that M-profile code can't enable the A-profile behaviour
(notably vector length/stride handling) by accident.
Backports commit 5bcf8ed9401e62c73158ba110864ee1375558bf7 from qemu
In order to handle TB's that translate to too much code, we
need to place the control of the length of the translation
in the hands of the code gen master loop.
Backports commit 8b86d6d25807e13a63ab6ea879f976b9f18cc45a from qemu
This wasn't subtracting the size of the instruction off the PC like how
the ARM mode tracing was performing the tracing. This simplifies it and
makes the behavior identical.
Allows non-AArch64 environments to always access coprocessors initially.
Removes the need to do avoidable register management when testing
floating-point code.
cortex-a7 and cortex-a15 have pmus (PMUv2) and they advertise
them in ID_DFR0. Let's allow them to function. This also enables
the pmu cpu property to work with these cpu types, i.e. we can
now do '-cpu cortex-a15,pmu=off' to remove the pmu.
Backports commit a46118fc16537a593119e5b316052a98514046bb from qemu
Fix a QEMU NULL derefence that occurs when the guest attempts to
enable PMU counters with a non-v8 cpu model or a v8 cpu model
which has not configured a PMU.
Backports commit cbbb3041fe2f57a475cef5d6b0ef836118aad106 from qemu
The second word has been loaded from the unincremented
address since the first commit.
Backports commit a036f5302c13634f3d375615b2949fd1fa1657b6 from qemu
These instructions do not trap when SVE is disabled in EL0,
causing them to be executed with wrong size information.
Backports commit 5de56742a3c91de3d646326bec43a989bba83ca4 from qemu
Some generic arch timer registers are Config-RW in the EL0,
which means the EL0 exception level can have write permission
if it is appropriately configured.
When VM access registers, QEMU firstly checks whether they have RW
permission, then check whether it is appropriately configured.
If they are defined to read only in EL0, even though they have been
appropriately configured, they still do not have write permission.
So need to add the write permission according to ARMV8 spec when
define it.
Backports commit daf1dc5f82cefe2a57f184d5053e8b274ad2ba9a from qemu
These changes were mostly made in upstream unicorn for what I can guess,
was to support old versions of MSVC's compiler.
This is also a pain to maintain, since everything needs to be done
manually and can be a source of errors. It also makes it take more work
than it needs to, to backport changes from qemu.
Because of that, this change restores Qemu's organization of the
coprocessor registers.
This decoding more closely matches the ARMv8.4 Table C4-6,
Encoding table for Data Processing - Register Group.
In particular, op2 == 0 is now more than just Add/sub (with carry).
Backports commit 2fba34f70d9a81bab56e61bb99a4d6632bdfe531 from qemu
We do not need an out-of-line helper for manipulating bits in pstate.
While changing things, share the implementation of gen_ss_advance.
Backports commit 22ac3c49641f6eed93dca5b852030b4d3eacf6c4 from qemu
The EL0+UMA check is unique to DAIF. While SPSel had avoided the
check by nature of already checking EL >= 1, the other post v8.0
extensions to MSR (imm) allow EL0 and do not require UMA. Avoid
the unconditional write to pc and use raise_exception_ra to unwind.
Backports commit ff730e9666a716b669ac4a8ca7c521177d1d2b15 from qemu
Minimize the number of places that will need updating when
the virtual host extensions are added.
Backports commit 64e40755cd41fbe8cd266cf387e42ddc57a449ef from qemu
Found by inspection: Rn is the base register against which the
load began; I is the register within the mask being processed.
The exception return should of course be processed from the loaded PC.
Backports commit 9d090d17234058f55c3c439d285db78c94d7d4de from qemu
Previously we'd be checking prior to the actual decoding if we were at
the ending address. This worked fine using the old model of the
translation process in qemu. However, this causes the wrong behavior to
occur in both ARM and Thumb/Thumb-2 modes using the newer translator
model.
Given the translator itself checks for the end address already, this
needs to be placed within arm_post_translate_insn().
This prevents the emulation process being off-by-one as well when it
comes to actually executing the instructions.
1. Create an enum name for the IPSR register.
2. Implement read and write of the IPSR via the xpsr helper functions.
Fixes#1065
Backports commit 6c319941a5462ee3a4af4593c371f5674394d6ce from unicorn.
Note that float16_to_float32 rightly squashes SNaN to QNaN.
But of course pickNaNMulAdd, for ARM, selects SNaNs first.
So we have to preserve SNaN long enough for the correct NaN
to be selected. Thus float16_to_float32_by_bits.
Backports commit a4e943a716d5fac923d82df3eabc65d1e3624019 from qemu
There is a set of VFP instructions which we implement in
disas_vfp_v8_insn() and gate on the ARM_FEATURE_V8 bit.
These were all first introduced in v8 for A-profile, but in
M-profile they appeared in v7M. Gate them on the MVFR2
FPMisc field instead, and rename the function appropriately.
Backports commit c0c760afe800b60b48c80ddf3509fec413594778 from qemu
Instead of gating the A32/T32 FP16 conversion instructions on
the ARM_FEATURE_VFP_FP16 flag, switch to our new approach of
looking at ID register bits. In this case MVFR1 fields FPHP
and SIMDHP indicate the presence of these insns.
This change doesn't alter behaviour for any of our CPUs.
Backports commit 602f6e42cfbfe9278be34e9b91d2ceb695837e02 from qemu
There are lots of special cases within these insns. Split the
major argument decode/loading/saving into no_output (compares),
rd_is_dp, and rm_is_dp.
We still need to special case argument load for compare (rd as
input, rm as zero) and vcvt fixed (rd as input+output), but lots
of special cases do disappear.
Now that we have a full switch at the beginning, hoist the ISA
checks from the code generation.
Backports commit e80941bd64cc388554770fd72334e9e7d459a1ef from qemu
Move all of the fp helpers out of helper.c into a new file.
This is code movement only. Since helper.c has no copyright
header, take the one from cpu.h for the new file.
Backports commit 37356079fcdb34e13abbed8ea0c00ca880c31247 from qemu
For opcodes 0-5, move some if conditions into the structure
of a switch statement. For opcodes 6 & 7, decode everything
at once with a second switch.
Backports commit 3c3ff68492c2d00bd8cb39ed2d02bdaf5caf5cb8 from qemu
This was introduced by
commit bf8d09694ccc07487cd73d7562081fdaec3370c8
target/arm: Don't clear supported PMU events when initializing PMCEID1
and identified by Coverity (CID 1398645).
Backports commit 67da43d668320e1bcb0a0195aaf2de4ff2a001a0 from qemu
The "background region" for a v8M MPU is a default which will be used
(if enabled, and if the access is privileged) if the access does
not match any specific MPU region. We were incorrectly using it
always (by putting the condition at the wrong nesting level). This
meant that we would always return the default background permissions
rather than the correct permissions for a specific region, and also
that we would not return the right information in response to a
TT instruction.
Move the check for the background region to the same place in the
logic as the equivalent v8M MPUCheck() pseudocode puts it.
This in turn means we must adjust the condition we use to detect
matches in multiple regions to avoid false-positives.
Backports commit cff21316c666c8053b1f425577e324038d0ca30d from qemu
Fortunately, the functions affected are so far only called from SVE,
so there is no tail to be cleared. But as we convert more of AdvSIMD
to gvec, this will matter.
Backports commit d8efe78e8039511b95c23d75bb48eca6873fbb0f from qemu
For same-sign saturation, we have tcg vector operations. We can
compute the QC bit by comparing the saturated value against the
unsaturated value.
Backports commit 89e68b575e138d0af1435f11a8ffcd8779c237bd from qemu
Change the representation of this field such that it is easy
to set from vector code.
Backports commit a4d5846245c5e029e5aa3945a9bda1de1c3fedbf from qemu
Given that we mask bits properly on set, there is no reason
to mask them again on get. We failed to clear the exception
status bits, 0x9f, which means that the wrong value would be
returned on get. Except in the (probably normal) case in which
the set clears all of the bits.
Simplify the code in set to also clear the RES0 bits.
Backports commit 18aaa59c622208743565307668a2100ab24f7de9 from qemu
Minimize the code within a macro by splitting out a helper function.
Use deposit32 instead of manual bit manipulation.
Backports commit 55a889456ef78f3f9b8eae9846c2f1453b1dd77b from qemu
The 32-bit PMIN/PMAX has been decomposed to scalars,
and so can be trivially expanded inline.
Backports commit 9ecd3c5c1651fa7f9adbedff4806a2da0b50490c from qemu
Since we're now handling a == b generically, we no longer need
to do it by hand within target/arm/.
Backports commit 2900847ff4c862887af750935a875059615f509a from qemu
There are a whole bunch more registers in the CPUID space which are
currently not used but are exposed as RAZ. To avoid too much
duplication we expand ARMCPRegUserSpaceInfo to understand glob
patterns so we only need one entry to tweak whole ranges of registers.
Backports commit d040242effe47850060d2ef1c461ff637d88a84d from qemu
As this is a single register we could expose it with a simple ifdef
but we use the existing modify_arm_cp_regs mechanism for consistency.
Backports commit 522641660c3de64ed8322b8636c58625cd564a3f from qemu
A number of CPUID registers are exposed to userspace by modern Linux
kernels thanks to the "ARM64 CPU Feature Registers" ABI. For QEMU's
user-mode emulation we don't need to emulate the kernels trap but just
return the value the trap would have done. To avoid too much #ifdef
hackery we process ARMCPRegInfo with a new helper (modify_arm_cp_regs)
before defining the registers. The modify routine is driven by a
simple data structure which describes which bits are exported and
which are fixed.
Backports commit 6c5c0fec29bbfe36c64eca1edfd8455be46b77c6 from qemu
Although technically not visible to userspace the kernel does make
them visible via a trap and emulate ABI. We provide a new permission
mask (PL0U_R) which maps to PL0_R for CONFIG_USER builds and adjust
the minimum permission check accordingly.
Backports commit b5bd7440422bb66deaceb812bb9287a6a3cdf10c from qemu
The lo,hi order is different from the comments. And in commit
1ec182c33379 ("target/arm: Convert to HAVE_CMPXCHG128"), it changes
the original code logic. So just restore the old code logic before this
commit:
do_paired_cmpxchg64_be():
cmpv = int128_make128(env->exclusive_high, env->exclusive_val);
newv = int128_make128(new_hi, new_lo);
This fixes a bug that would only be visible for big-endian
AArch64 guest code.
Fixes: 1ec182c33379 ("target/arm: Convert to HAVE_CMPXCHG128")
Backports commit abd5abc58c5d4c9bd23427b0998a44eb87ed47a2 from qemu
HACR_EL2 is a register with IMPDEF behaviour, which allows
implementation specific trapping to EL2. Implement it as RAZ/WI,
since QEMU's implementation has no extra traps. This also
matches what h/w implementations like Cortex-A53 and A57 do.
Backports commit 831a2fca343ebcd6651eab9102bd7a36b77da65d from qemu
This bug was introduced in:
commit 5ecdd3e47cadae83a62dc92b472f1fe163b56f59
target/arm: Finish implementation of PM[X]EVCNTR and PM[X]EVTYPER
Backports commit 62c7ec3488fe0dcbabffd543f458914e27736115 from qemu
The {IOE, DZE, OFE, UFE, IXE, IDE} bits in the FPSCR/FPCR are for
enabling trapped IEEE floating point exceptions (where IEEE exception
conditions cause a CPU exception rather than updating the FPSR status
bits). QEMU doesn't implement this (and nor does the hardware we're
modelling), but for implementations which don't implement trapped
exception handling these control bits are supposed to be RAZ/WI.
This allows guest code to test for whether the feature is present
by trying to write to the bit and checking whether it sticks.
QEMU is incorrectly making these bits read as written. Make them
RAZ/WI as the architecture requires.
In particular this was causing problems for the NetBSD automatic
test suite.
Backports commit a15945d98d3a3390c3da344d1b47218e91e49d8b from qemu
This has been enabled in the linux kernel since v3.11
(commit d50240a5f6cea, 2013-09-03,
"arm64: mm: permit use of tagged pointers at EL0").
Backports commit f6a148fef63698826e69ca91cc11877ab1ed786f from qemu