Remove a function of the same name from target/arm/.
Use a branchless implementation of abs gleaned from gcc.
Backports commit ff1f11f7f8710a768f9313f24bd7f509d3db27e5 from qemu
Allow expansion either via shift by scalar or by replicating
the scalar for shift by vector.
Backports commit b4578cd91cda4cef1c413304353ca6dc5b957b60 from qemu
The gvec expanders perform a modulo on the shift count. If the target
requires alternate behaviour, then it cannot use the generic gvec
expanders anyway, and will have to have its own custom code.
Backports commit 5ee5c14cacda27e904cd6b0d9e7ffe1acff42838 from qemu
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first. This is especially important
if integer/vector register moves do not exist.
Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:
VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.
VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.
Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.
All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.
Backports commit 37ee55a081b7863ffab2151068dd1b2f11376914 from qemu
The LD1R instruction does all the work. Note that the only
useful addressing mode is a base register with no offset.
Backports commit f23e5e15edfd49d5dd72cab2ed2d85ac354b2eeb from qemu
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.
Backports commit bab1671f0fa928fd678a22f934739f06fd5fd035 from qemu
The i386 backend already has these functions, and the aarch64 backend
could easily split out one. Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.
Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.
Backports commit e7632cfa8b76cdbbc1c76e8737338ef5844e7d60 from qemu
PowerPC Altivec does not support direct moves between vector registers
and general registers. So when tcg_out_mov fails, we can use the
backing memory for the temporary to perform the move.
Backports commit 240c08d0998f402c325fce489de0d14831048128 from qemu
This patch merely changes the interface, aborting on all failures,
of which there are currently none.
Backports commit 78113e83e0007e869c9f0cb4c0497a77538988e3 from qemu
We have a function that takes an additional condition parameter
over the standard backend interface. It already takes care of
eliding no-op moves.
Backports commit c16f52b2c5d91c36e121795bd3b386cea0b7573c from qemu
The only fixed_reg is cpu_env, and it should not be modified
during any TB. Therefore code that tries to special-case moves
into a fixed_reg is dead. Remove it.
Backports commit d63e3b6e694ad6c887be135dddb9cd4893f1a844 from qemu
Replace the single opcode in .opc with a null-terminated
array in .opt_opc. We still require that all opcodes be
used with the same .vece.
Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion. Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.
Convert all existing vector aware front ends.
Backports commit 53229a7703eeb2bbe101a19a33ef22aaf960c65b from qemu
PowerPC Altivec does not support add and subtract of 64-bit elements.
Prepare for that configuration by not assuming the operation is
universally supported.
Backports commit ce27c5d1a38e93da38653af71fb468c5eded4c7b from qemu
Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
so that we check the type and vece for the actual operation.
Backports commit ac383dde33405106469d04a78de1d76f1a730cb1 from qemu
Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.
Backports commit e1227bb6e59173117f094a6a13b998587b45c928 from qemu
Leading underscores are ill-advised because such identifiers are
reserved. Trailing underscores are merely ugly. Strip both.
Our header guards commonly end in _H. Normalize the exceptions.
Done with scripts/clean-header-guards.pl.
Backports commit a8b991b52dcde75ab5065046653626951aac666d from qemu
This is less tricky than for loads, because we always fall
back to single byte stores to implement unaligned stores.
Backports commit 4601f8d10d7628bcaf2a8179af36e04b42879e91 from qemu
If we attempt to recurse from load_helper back to load_helper,
even via intermediary, we do not get all of the constants
expanded away as desired.
But if we recurse back to the original helper (or a shim that
has a consistent function signature), the operands are folded
away as desired.
Backports commit 2dd926067867c2dd19e66d31a7990e8eea7258f6 from qemu
Going to approach this problem via __attribute__((always_inline))
instead, but full conversion will take several steps.
Backports commit fc1bc777910dc14a3db4e2ad66f3e536effc297d from qemu
Having this in io_readx/io_writex meant that we forgot to
re-compute index after tlb_fill. It also means we can use
the normal aligned memory load path. It also fixes a bug
in that we had cached a use of index across a tlb_fill.
Backports commit f1be36969de2fb9b6b64397db1098f115210fcd9 from qemu
Instead of expanding a series of macros to generate the load/store
helpers we move stuff into common functions and rely on the compiler
to eliminate the dead code for each variant.
Backports commit eed5664238ea5317689cf32426d9318686b2b75c from qemu
Currently the dc_zva helper function uses a variable length
array. In fact we know (as the comment above remarks) that
the length of this array is bounded because the architecture
limits the block size and QEMU limits the target page size.
Use a fixed array size and assert that we don't run off it.
Backports commit 63159601fb3e396b28da14cbb71e50ed3f5a0331 from qemu
In the M-profile architecture, if the CPU implements the DSP extension
then the XPSR has GE bits, in the same way as the A-profile CPSR. When
we added DSP extension support we forgot to add support for reading
and writing the GE bits, which are stored in env->GE. We did put in
the code to add XPSR_GE to the mask of bits to update in the v7m_msr
helper, but forgot it in v7m_mrs. We also must not allow the XPSR we
pull off the stack on exception return to set the nonexistent GE bits.
Correct these errors:
* read and write env->GE in xpsr_read() and xpsr_write()
* only set GE bits on exception return if DSP present
* read GE bits for MRS if DSP present
Backports commit f1e2598c46d480c9e21213a244bc514200762828 from qemu
I encountered the following compilation error on mingw:
/mnt/d/qemu/include/qemu/osdep.h:97:9: error: '__USE_MINGW_ANSI_STDIO' macro redefined [-Werror,-Wmacro-redefined]
\#define __USE_MINGW_ANSI_STDIO 1
^
/mnt/d/llvm-mingw/aarch64-w64-mingw32/include/_mingw.h:433:9: note: previous definition is here
\#define __USE_MINGW_ANSI_STDIO 0 /* was not defined so it should be 0 */
It turns out that __USE_MINGW_ANSI_STDIO must be set before any
system headers are included, not just before stdio.h.
Backports commit 946376c21be1cd9dcc3c7936b204b113781603f7 from qemu
Assuming that the ISA clearly describes how to determine
the length of the instruction, and the ISA has a reasonable
maximum instruction length, the input to the decoder can be
right-justified in an appropriate insn word.
This is not 100% convenient, as out-of-line %fields are
numbered relative to the maximum instruction length, but
this appears to still be usable.
Backports commit 17560e9349ff1fcce814184b37993f92378cf0c4 from qemu
Now that we have curr_cflags, we can include CF_USE_ICOUNT
early and then remove it as necessary.
Backports commit 416986d3f97329655e30da7271a2d11c6d707b06 from qemu
Now that all code generation has been converted to check CF_PARALLEL, we can
generate !CF_PARALLEL code without having yet set !parallel_cpus --
and therefore without having to be in the exclusive region during
cpu_exec_step_atomic.
While at it, merge cpu_exec_step into cpu_exec_step_atomic.
Backports commit ac03ee5331612e44beb393df2b578c951d27dc0d from qemu
Thereby decoupling the resulting translated code from the current state
of the system.
The tb->cflags field is not passed to tcg generation functions. So
we add a field to TCGContext, storing there a copy of tb->cflags.
Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the new field.
Backports commit e82d5a2460b0e176128027651ff9b104e4bdf5cc from qemu
Thereby decoupling the resulting translated code from the current state
of the system.
Backports commit 87d757d60d66d5ee1608460b0f1e07e2b758db9c from qemu
Thereby decoupling the resulting translated code from the current state
of the system.
Backports commit f0ddf11b23260f0af84fb529486a8f9ba2d19401 from qemu
Thereby decoupling the resulting translated code from the current state
of the system.
Backports commit b5e3b4c2aca8eb5a9cfeedfb273af623f17c3731 from qemu
Thereby decoupling the resulting translated code from the current state
of the system.
Backports commit 2399d4e7cec22ecf1c51062d2ebfd45220dbaace from qemu
We were generating code during tb_invalidate_phys_page_range,
check_watchpoint, cpu_io_recompile, and (seemingly) discarding
the TB, assuming that it would magically be picked up during
the next iteration through the cpu_exec loop.
Instead, record the desired cflags in CPUState so that we request
the proper TB so that there is no more magic.
Backports commit 9b990ee5a3cc6aa38f81266fb0c6ef37a36c45b9 from qemu
This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.
Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.
Backports commit 4e2ca83e71b51577b06b1468e836556912bd5b6e from qemu
Unless overridden via an env var or configure arg, QEMU will only look
for the 'python' binary in $PATH. This is unhelpful on distros which
are only shipping Python 3.x (eg Fedora) in their default install as,
if they comply with PEP 394, the bare 'python' binary won't exist.
This changes configure so that by default it will search for all three
common python binaries, preferring to find Python 3.x versions.
Backports commit faf441429adfe5767be52c5dcdb8bc03161d064f from qemu
When running "make" in a build directory from the pre-Kconfig merge time,
the build process currently fails with:
make: *** No rule to make target `.../default-configs/pci.mak',
needed by `aarch64-softmmu/config-devices.mak'. Stop.
To make sure that this problem at least goes away when the user runs
"configure" (or "sh config.status") again, we have to make sure that
we re-generate the .mak.d files. Thus remove the old stale files
while running the configure script.
Backports commit 9c79024225af6b3ae04ea2dd94a5e5c4132a9e65 from qemu
Without the -Wno-typedef-redefinition option, clang complains if a typedef
gets redefined in gnu99 mode (since this is officially a C11 feature). This
used to also happen with older versions of GCC, but since we've bumped our
minimum GCC version to 4.8, all versions of GCC that we support do not seem
to issue this warning in gnu99 mode anymore. So this has become a common
problem for people who only test their code with GCC - they do not notice
the issue until they submit their patches and suddenly patchew or a
maintainer complains.
Now that we do not urgently need to keep the code clean from typedef
redefintions anymore with recent versions of GCC, we can ease the
situation with clang, too, and simply shut these warnings off for good.
Backports commit e6e90feedb706b1b92827a5977b37e1e8defb8ef from qemu
The M-profile architecture floating point system supports
lazy FP state preservation, where FP registers are not
pushed to the stack when an exception occurs but are instead
only saved if and when the first FP instruction in the exception
handler is executed. Implement this in QEMU, corresponding
to the check of LSPACT in the pseudocode ExecuteFPCheck().
Backports commit e33cf0f8d8c9998a7616684f9d6aa0d181b88803 from qemu
Pushing registers to the stack for v7M needs to handle three cases:
* the "normal" case where we pend exceptions
* an "ignore faults" case where we set FSR bits but
do not pend exceptions (this is used when we are
handling some kinds of derived exception on exception entry)
* a "lazy FP stacking" case, where different FSR bits
are set and the exception is pended differently
Implement this by changing the existing flag argument that
tells us whether to ignore faults or not into an enum that
specifies which of the 3 modes we should handle.
Backports commit a356dacf647506bccdf8ecd23574246a8bf615ac from qemu
In the v7M architecture, if an exception is generated in the process
of doing the lazy stacking of FP registers, the handling of
possible escalation to HardFault is treated differently to the normal
approach: it works based on the saved information about exception
readiness that was stored in the FPCCR when the stack frame was
created. Provide a new function armv7m_nvic_set_pending_lazyfp()
which pends exceptions during lazy stacking, and implements
this logic.
This corresponds to the pseudocode TakePreserveFPException().
Backports the relevant parts of commit
a99ba8ab1601904e0fa20325192fc850362ce80e from qemu
Add a new helper function which returns the MMU index to use
for v7M, where the caller specifies all of the security
state, privilege level and whether the execution priority
is negative, and reimplement the existing
arm_v7m_mmu_idx_for_secstate_and_priv() in terms of it.
We are going to need this for the lazy-FP-stacking code.
Backports commit fa6252a988dbe440cd6087bf93cbe0887f0c401b from qemu
The M-profile FPCCR.ASPEN bit indicates that automatic floating-point
context preservation is enabled. Before executing any floating-point
instruction, if FPCCR.ASPEN is set and the CONTROL FPCA/SFPA bits
indicate that there is no active floating point context then we
must create a new context (by initializing FPSCR and setting
FPCA/SFPA to indicate that the context is now active). In the
pseudocode this is handled by ExecuteFPCheck().
Implement this with a new TB flag which tracks whether we
need to create a new FP context.
Backports commit 6000531e19964756673a5f4b694a649ef883605a from qemu
The M-profile FPCCR.S bit indicates the security status of
the floating point context. In the pseudocode ExecuteFPCheck()
function it is unconditionally set to match the current
security state whenever a floating point instruction is
executed.
Implement this by adding a new TB flag which tracks whether
FPCCR.S is different from the current security state, so
that we only need to emit the code to update it in the
less-common case when it is not already set correctly.
Note that we will add the handling for the other work done
by ExecuteFPCheck() in later commits.
Backports commit 6d60c67a1a03be32c3342aff6604cdc5095088d1 from qemu
We are close to running out of TB flags for AArch32; we could
start using the cs_base word, but before we do that we can
economise on our usage by sharing the same bits for the VFP
VECSTRIDE field and the XScale XSCALE_CPAR field. This
works because no XScale CPU ever had VFP.
Backports commit ea7ac69d124c94c6e5579145e727adec9ccbefef from qemu
Move the NS TBFLAG down from bit 19 to bit 6, which has not
been used since commit c1e3781090b9d36c60 in 2015, when we
started passing the entire MMU index in the TB flags rather
than just a 'privilege level' bit.
This rearrangement is not strictly necessary, but means that
we can put M-profile-only bits next to each other rather
than scattered across the flag word.
Backports commit 7fbb535f7aeb22896fedfcf18a1eeff48165f1d7 from qemu
Handle floating point registers in exception return.
This corresponds to pseudocode functions ValidateExceptionReturn(),
ExceptionReturn(), PopStack() and ConsumeExcStackFrame().
Backports commit 6808c4d2d2826920087533f517472c09edc7b0d2 from qemu
The magic value pushed onto the callee stack as an integrity
check is different if floating point is present.
Backports commit 0dc51d66fcfcc4c72011cdafb401fd876ca216e7 from qemu
The TailChain() pseudocode specifies that a tail chaining
exception should sanitize the excReturn all-ones bits and
(if there is no FPU) the excReturn FType bits; we weren't
doing this.
Backports commit 60fba59a2f9a092a44b688df5d058cdd6dd9c276 from qemu
For v8M floating point support, transitions from Secure
to Non-secure state via BLNS and BLXNS must clear the
CONTROL.SFPA bit. (This corresponds to the pseudocode
BranchToNS() function.)
Backports commit 3cd6726f0ba7cc77342ee721bd86094e13b2a42a from qemu
Implement the code which updates the FPCCR register on an
exception entry where we are going to use lazy FP stacking.
We have to defer to the NVIC to determine whether the
various exceptions are currently ready or not.
Backports commit b593c2b81287040ab6f452afec6281e2f7ee487b from qemu
Handle floating point registers in exception entry.
This corresponds to the FP-specific parts of the pseudocode
functions ActivateException() and PushStack().
We defer the code corresponding to UpdateFPCCR() to a later patch.
Backports commit 0ed377a8013f40653a83f6ad2c9693897522d7dc from qemu
Currently the code in v7m_push_stack() which detects a violation
of the v8M stack limit simply returns early if it does so. This
is OK for the current integer-only code, but won't work for the
floating point handling we're about to add. We need to continue
executing the rest of the function so that we check for other
exceptions like not having permission to use the FPU and so
that we correctly set the FPCCR state if we are doing lazy
stacking. Refactor to avoid the early return.
Backports commit 3432c79a4e7345818d2defcf9e61a1bcb2907f9f from qemu
The M-profile CONTROL register has two bits -- SFPA and FPCA --
which relate to floating-point support, and should be RES0 otherwise.
Handle them correctly in the MSR/MRS register access code.
Neither is banked between security states, so they are stored
in v7m.control[M_REG_S] regardless of current security state.
Backports commit 2e1c5bcd32014c9ede1b604ae6c2c653de17fc53 from qemu
If the floating point extension is present, then the SG instruction
must clear the CONTROL_S.SFPA bit. Implement this.
(On a no-FPU system the bit will always be zero, so we don't need
to make the clearing of the bit conditional on ARM_FEATURE_VFP.)
Backports commit 1702071302934af77a072b7ee7c5eadc45b37573 from qemu
Correct the decode of the M-profile "coprocessor and
floating-point instructions" space:
* op0 == 0b11 is always unallocated
* if the CPU has an FPU then all insns with op1 == 0b101
are floating point and go to disas_vfp_insn()
For the moment we leave VLLDM and VLSTM as NOPs; in
a later commit we will fill in the proper implementation
for the case where an FPU is present.
Backports commit 8859ba3c9625e7ceb5599f457a344bcd7c5e112b from qemu
Like AArch64, M-profile floating point has no FPEXC enable
bit to gate floating point; so always set the VFPEN TB flag.
M-profile also has CPACR and NSACR similar to A-profile;
they behave slightly differently:
* the CPACR is banked between Secure and Non-Secure
* if the NSACR forces a trap then this is taken to
the Secure state, not the Non-Secure state
Honour the CPACR and NSACR settings. The NSACR handling
requires us to borrow the exception.target_el field
(usually meaningless for M profile) to distinguish the
NOCP UsageFault taken to Secure state from the more
usual fault taken to the current security state.
Backports commit d87513c0abcbcd856f8e1dee2f2d18903b2c3ea2 from qemu
The only "system register" that M-profile floating point exposes
via the VMRS/VMRS instructions is FPSCR, and it does not have
the odd special case for rd==15. Add a check to ensure we only
expose FPSCR.
Backports commit ef9aae2522c22c05df17dd898099dd5c3f20d688 from qemu
The M-profile floating point support has three associated config
registers: FPCAR, FPCCR and FPDSCR. It also makes the registers
CPACR and NSACR have behaviour other than reads-as-zero.
Add support for all of these as simple reads-as-written registers.
We will hook up actual functionality later.
The main complexity here is handling the FPCCR register, which
has a mix of banked and unbanked bits.
Note that we don't share storage with the A-profile
cpu->cp15.nsacr and cpu->cp15.cpacr_el1, though the behaviour
is quite similar, for two reasons:
* the M profile CPACR is banked between security states
* it preserves the invariant that M profile uses no state
inside the cp15 substruct
Backports commit d33abe82c7c9847284a23e575e1078cccab540b5 from qemu
Enforce that for M-profile various FPSCR bits which are RES0 there
but have defined meanings on A-profile are never settable. This
ensures that M-profile code can't enable the A-profile behaviour
(notably vector length/stride handling) by accident.
Backports commit 5bcf8ed9401e62c73158ba110864ee1375558bf7 from qemu
This change adapts io_readx() to its input access_type. Currently
io_readx() treats any memory access as a read, although it has an
input argument "MMUAccessType access_type". This results in:
1) Calling the tlb_fill() only with MMU_DATA_LOAD
2) Considering only entry->addr_read as the tlb_addr
Buglink: https://bugs.launchpad.net/qemu/+bug/1825359
Backports commit ef5dae6805cce7b59d129d801bdc5db71bcbd60d from qemu
This will not necessarily restrict the size of the TB, since for v7
the majority of constant pool usage is for calls from the out-of-line
ldst code, which is already at the end of the TB. But this does
allow us to save one insn per reference on the off-chance.
Backports commit b4b82d7e9caff7ccca5c621817b5a4b8e95eb9b1 from qemu
There is no point in coding for a 2GB offset when the max TB size
is already limited to 64k. If we further restrict to 32k then we
can eliminate the extra ADDIS instruction.
Backports commit a7cdaf710f2aaaf0be855a338dd67463d4bb99e2 from qemu
If the TB generates too much code, such that backend relocations
overflow, try again with a smaller TB. In support of this, move
relocation processing from a random place within tcg_out_op, in
the handling of branch opcodes, to a new function at the end of
tcg_gen_code.
This is not a complete solution, as there are additional relocs
generated for out-of-line ldst handling and constant pools.
Backports commit 7ecd02a06f8f4c0bbf872ecc15e37035b7e1df5f from qemu
If a TB generates too much code, try again with fewer insns.
Fixes: https://bugs.launchpad.net/bugs/1824853
Backports commit 6e6c4efed995d9eca6ae0cfdb2252df830262f50 from qemu
In order to handle TB's that translate to too much code, we
need to place the control of the length of the translation
in the hands of the code gen master loop.
Backports commit 8b86d6d25807e13a63ab6ea879f976b9f18cc45a from qemu
Will be helpful for s390x. Input 128 bit and output 64 bit only,
which is sufficient for now.
Backports commit 2089fcc9e7b4174d1c351eaa7d277c02188a6dd2 from qemu
Add a new base CPU model called 'Dhyana' to model processors from Hygon
Dhyana(family 18h), which derived from AMD EPYC(family 17h).
The following features bits have been removed compare to AMD EPYC:
aes, pclmulqdq, sha_ni
The Hygon Dhyana support to KVM in Linux is already accepted upstream[1].
So add Hygon Dhyana support to Qemu is necessary to create Hygon's own
CPU model.
Reference:
[1] https://git.kernel.org/tip/fec98069fb72fb656304a3e52265e0c2fc9adf87
Backports commit 8d031cec366f26669807eb43f61eb335973b7053 from qemu