Commit graph

493 commits

Author SHA1 Message Date
Richard Henderson 9cde8bfc44
target-arm: Use clz opcode
Backports commit 7539a012f614b724426ac9360238f3281d928a3f from qemu
2018-03-01 16:13:26 -05:00
Richard Henderson ce3c153bd8
target-arm: Use new deposit and extract ops
Use the new primitives for UBFX and SBFX.

Backports commits 59a71b4c5b4ef2ef6425b9e21c972dd5bf450275 and 86c9ab277615af4e0389eb80a83073873ff96c86 from qemu
2018-03-01 14:09:17 -05:00
Cédric Le Goater c7ab1e782b
target-arm: Add VBAR support to ARM1176 CPUs
ARM1176 CPUs have TrustZone support and can use the Vector Base
Address Register, but currently, qemu only adds VBAR support to ARMv7
CPUs. Fix this by adding a new feature ARM_FEATURE_VBAR which can used
for ARMv7 and ARM1176 CPUs.

The VBAR feature is always set for ARMv7 because some legacy boards
require it even if this is not architecturally correct.

Backports commit 91db4642f868cf2e591b62d31a19d35b02ea791e from qemu
2018-03-01 11:12:29 -05:00
Peter Maydell 554ad1f34e
target-arm: Log AArch64 exception returns
We already log exception entry; add logging of the AArch64 exception
return path as well.

Backports commit c9b61d9aa1ad234b0961f8add023cdc999cda3da from qemu
2018-03-01 11:08:35 -05:00
Julian Brown 86670028f7
Correct value of ARM Cortex-A8 MVFR1 register.
The value of the MVFR1 (Media and VFP Feature Register 1) register for
the Cortex-A8 appears to be incorrect (according to the TRM, DDI0344K),
with the "full denormal arithmetic" and "propagation of NaN" fields
holding both 0 instead of both 1.

I had a go tracing the history of the use of this value, and it seems
it's always just been wrong in QEMU: maybe it was derived from early
documentation, or guessed based on the use of a "VFP Lite" implementation
in the Cortex-A8.

Depending on the startup/early-boot code in use, this can manifest as
failure to perform denormal arithmetic properly: in our case, selecting
a Cortex-A8 CPU when using QEMU as an instruction-set simulator for
bare-metal GCC testing caused tests using denormal arithmetic to
fail. Problems might be masked (or not occur) when using a full OS kernel
with suitable trap handlers (I'm not sure).

Backports commit 0f1944735b6bac810b067e8a7a5154744536fd59 from qemu
2018-03-01 11:07:08 -05:00
Richard Henderson 0f94929fa7
target-arm: Fix aarch64 vec_reg_offset
Since CPUARMState.vfp.regs is not 16 byte aligned, the ^ 8 fixup used
for a big-endian host doesn't do what's intended. Fix this by adding
in the vfp.regs offset after computing the inter-register offset.

Backports commit d437262fa8edd0d9fbe038a515dda3dbf7c5bb54 from qemu
2018-03-01 09:36:03 -05:00
Richard Henderson ce9dca9c5e
target-arm: Fix aarch64 disas_ldst_single_struct
We add s->be_data within do_vec_ld/st. Adding it here means that
we have the wrong bits set in SIZE for a big-endian host, leading
to g_assert_not_reached in write_vec_element and read_vec_element.

Backports commit 74b13f92c2428abae41a61c46a5cf47545da5fcb from qemu
2018-03-01 09:34:34 -05:00
Alex Bennée f4c65739f3
target-arm/translate-a64: fix gen_load_exclusive
While testing rth's latest TCG patches with risu I found ldaxp was
broken. Investigating further I found it was broken by 1dd089d0 when
the cmpxchg atomic work was merged. As part of that change the code
attempted to be clever by doing a single 64 bit load and then shuffle
the data around to set the two 32 bit registers.

As I couldn't quite follow the endian magic I've simply partially
reverted the change to the original code gen_load_exclusive code. This
doesn't affect the cmpxchg functionality as that is all done on in
gen_store_exclusive part which is untouched.

I've also restored the comment that was removed (with a slight tweak
to mention cmpxchg).

Backports commit 5460da501a57cd72eda6fec736d76539122e2f99 from qemu
2018-03-01 09:09:16 -05:00
Wei Huang bceed21d23
arm: Add an option to turn on/off vPMU support
This patch adds a pmu=[on/off] option to enable/disable vPMU support
in guest vCPU. It allows virt tools, such as libvirt, to determine the
exsitence of vPMU and configure it. Note this option is only available
for cortex-a57/cortex-53/ host CPUs, but unavailable on ARMv7 and other
processors. Also even though "pmu=" option is available for TCG mode,
setting it doesn't turn PMU on.

Backports commit 929e754d5a621cd53f30e69b766ccf381b58d124 from qemu
2018-02-28 08:49:23 -05:00
Emilio G. Cota 22be035e60
target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
The exception is not emitted anymore; remove it and the associated
TCG variables.

Backports commit 05188cc72f0399e99c92f608a8e7ca4c8e552c4b from qemu
2018-02-28 00:24:20 -05:00
Emilio G. Cota cb92eea81a
target-arm: emulate aarch64's LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/JVc8Y

atomic_add-bench: 1000000 ops/thread, [0,1] range

18 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
16 ++master +-H--+ ++
|| |
14 ++ ++
| | |
12 ++| ++
| | |
10 ++++ ++
8 ++E ++
|+++ |
6 ++ | ++
| | |
4 ++ | ++
| | |
2 +H++E+--- ++
+ | +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,2] range

18 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
16 ++master +-H--+ ++
| | |
14 ++E ++
| | |
12 ++| ++
|+++ |
10 ++ | ++
8 ++ | ++
| | |
6 ++ | ++
| | |
4 ++ | ++
| +E+--- |
2 +H+ +E+-----+++ +++ +++ ---+E+-----+E+------+++
+++ + +E+---+--+E+----++E+------+E+--- ++++ +++ + +E|
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,128] range

70 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
60 ++master +-H--+ +++ ---+E+-----+E+------+E+
| +E+------E-------+E+--- |
| --- +++ |
50 ++ +++--- ++
| -+E+ |
40 ++ +++---- ++
| E- |
| --| |
30 ++ -- +++ ++
| +E+ |
20 ++E+ ++
|E+ |
| |
10 ++ ++
+ + + + + + + |
0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,1024] range

160 ++---------+---------+----------+---------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
140 ++master +-H--+ +++ +++
| -+E+-----+E+-------E|
120 ++ +++ ---- +++
| +++ ----E-- |
100 ++ --E--- +++ ++
| +++ ---- +++ |
80 ++ --E-- ++
| ---- +++ |
| -+E+ |
60 ++ ---- +++ ++
| +E+- |
40 ++ -- ++
| +E+ |
20 +EE+ ++
+++ + + + + + + |
0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

Backports commit 1dd089d0eec060dcd8478735114d98421d414805 from qemu
2018-02-28 00:21:27 -05:00
Emilio G. Cota 3546558f66
target-arm: emulate SWP with atomic_xchg helper
Backports commit cf12bce088f22b92bf62ffa0d7f6a3e951e355a9 from qemu
2018-02-28 00:11:23 -05:00
Emilio G. Cota ec14a00925
target-arm: emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

atomic_add-bench: 1000000 ops/thread, [0,1] range

9 ++---------+----------+----------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
8 +Emaster +-H--+ ++
| | |
7 ++E ++
| | |
6 ++++ ++
| | |
5 ++ | ++
4 ++ | ++
| | |
3 ++ | ++
| | |
2 ++ | ++
|H++E+--- +++ ---+E+------+E+------+E|
1 +++ +E+-----+E+------+E+------+E+------+E+-- +++ +++ ++
++H+ + +++ + +++ ++++ + + + |
0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,2] range

16 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
14 ++master +-H--+ ++
| | |
12 ++| ++
| E |
10 ++| ++
| | |
8 ++++ ++
|E+| |
| | |
6 ++ | ++
| | |
4 ++ | ++
| +E+--- +++ +++ +++ ---+E+------+E|
2 +H+ +E+------E-------+E+-----+E+------+E+------+E+-- +++
+ | + +++ + ++++ + + + |
0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,128] range

70 ++---------+----------+---------+----------+----------+----------+---++
+cmpxchg +-E--+ + + + ++++ + |
60 ++master +-H--+ ----E------+E+-------++
| -+E+--- +++ +++ +E|
| +++ ---- +++ ++|
50 ++ +++ ---+E+- ++
| -E--- |
40 ++ ---+++ ++
| +++--- |
| -+E+ |
30 ++ +++---- ++
| +E+ |
20 ++ +++-- ++
| +E+ |
|+E+ |
10 +E+ ++
+ + + + + + + |
0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

atomic_add-bench: 1000000 ops/thread, [0,1024] range

120 ++---------+---------+----------+---------+----------+----------+---++
+cmpxchg +-E--+ + + + + + |
| master +-H--+ ++|
100 ++ ----E+
| +++ ---+E+--- ++|
| --E--- +++ |
80 ++ ---- +++ ++
| ---+E+- |
60 ++ -+E+-- ++
| +++ ---- +++ |
| -+E+- |
40 ++ +++---- ++
| +++ ---+E+ |
| -+E+--- |
20 ++ +E+ ++
|+E+++ |
+E+ + + + + + + |
0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
0 10 20 30 40 50 60
Number of threads

Backports commit 354161b37c6465a32073eac5f16fa35939af2bb4 from qemu
2018-02-28 00:07:44 -05:00
Richard Henderson fd9933fbd5
target-arm: Rearrange aa32 load and store functions
Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl. Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.

Backports commit 7f5616f53896a4e08ad37de3ac50d3a4cc8eff7a from qemu
2018-02-27 23:59:16 -05:00
Peter Maydell 1a850bcb19
target-arm: Implement new HLT trap for semihosting
Version 2.0 of the semihosting specification introduces new trap
instructions for AArch32: HLT 0xF000 for A32 and HLT 0x3C for T32.
Implement these (in the same way we implement the existing HLT
semihosting trap for A64).

The old traps via SVC and BKPT are unaffected.

Backports commit 19a6e31c9d2701ef648b70ddcfc3bf64cec8c37e from qemu
2018-02-26 15:28:45 -05:00
Peter Maydell 200771d0ba
target-arm: Add trace events for the generic timers
Backports commit 194cbc492bcc8f3f1868ec97a35146bc99c3c71c from qemu
2018-02-26 08:15:42 -05:00
Peter Maydell 158bfc109a
target-arm: Implement dummy MDCCINT_EL1
MDCCINT_EL1 is part of the DCC debugger communication
channel between the CPU and an attached external debugger.
QEMU doesn't implement this, but since Linux may try
to access this register we need to provide at least
a dummy implementation.

Backports commit 5dbdc4342f479d799a1970dd5fd22e64c9dcd50d from qemu
2018-02-26 08:11:54 -05:00
Peter Maydell f2dcb81b27
Fix masking of PC lower bits when doing exception returns
In commit 9b6a3ea7a699594 store_reg() was changed to mask
both bits 0 and 1 of the new PC value when in ARM mode.
Unfortunately this broke the exception return code paths
when doing a return from ARM mode to Thumb mode: in some
of these we write a new CPSR including new Thumb mode
bit via gen_helper_cpsr_write_eret(), and then use store_reg()
to write the new PC. In this case if the new CPSR specified
Thumb mode then masking bit 1 of the PC is incorrect
(these code paths correspond to the v8 ARM ARM pseudocode
function AArch32.ExceptionReturn(), which always aligns the
new PC appropriately for the new instruction set state).

Instead of using store_reg() in exception-return code paths,
call a new store_pc_exc_ret() which stores the raw new PC
value to env->regs[15], and then mask it appropriately in
the subsequent helper_cpsr_write_eret() where the new
env->thumb state is available.

This fixes a bug introduced by 9b6a3ea7a699594 which caused
crashes/hangs or otherwise bad behaviour for Linux when
userspace was using Thumb.

Backports commit fb0e8e79a9d77ee240dbca036fa8698ce654e5d1 from qemu
2018-02-26 08:09:28 -05:00
Thomas Hanson c69ae10ca7
target-arm: Comments added to identify cases in a switch
3 cases in a switch in disas_exc() require reference to the
ARM ARM spec in order to determine what case they're handling.

Backports commit 957956b3013c8122a749dfe61a41aef8b4100e31 from qemu
2018-02-26 08:05:49 -05:00
Thomas Hanson 00d1803436
target-arm: Code changes to implement overwrite of tag field on PC load
For BR, BLR and RET instructions, if tagged addresses are enabled, the
tag field in the address must be cleared out prior to loading the
address into the PC. Depending on the current EL, it will be set to
either all 0's or all 1's.

Backports commit 6feecb8b941f2d21e5645d0b6e0cdb776998121b from qemu
2018-02-26 08:04:00 -05:00
Thomas Hanson 2af4ca54e9
target-arm: Infrastucture changes to enable handling of tagged address loading into PC
When capturing the current CPU state for the TB, extract the TBI0 and TBI1
values from the correct TCR for the current EL and then add them to the TB
flags field.

Then, at the start of code generation for the block, copy the TBI fields
into the DisasContext structure.

Backports commit 86fb3fa4ed5873b021a362ea26a021f4aeab1bb4 from qemu
2018-02-26 07:58:17 -05:00
Peter Maydell f48d1fe391
target-arm: Correctly handle 'sub pc, pc, 1' for ARMv6
In the ARM v6 architecture, 'sub pc, pc, 1' is not an interworking
branch, so the computed new value is written to r15 as a normal
value. The architecture says that in this case, bits [1:0] of
the value written must be ignored if we are in ARM mode (or
bit [0] ignored if in Thumb mode); this is a change from the
ARMv4/v5 specification that behaviour is UNPREDICTABLE.
Use the correct mask on the PC value when doing a non-interworking
store to PC.

A popular library used on RaspberryPi uses this instruction
as part of a trick to determine whether it is running on
ARMv6 or ARMv7, and we were mishandling the sequence.

Fixes bug: https://bugs.launchpad.net/bugs/1625295

Backports commit 9b6a3ea7a699594162ed3d11e4e04b98568dc5c0 from qemu
2018-02-26 05:02:32 -05:00
Edgar E. Iglesias dedab81d68
target-arm: A64: Fix decoding of iss_sf in disas_ld_lit
Fix the decoding of iss_sf in disas_ld_lit.
The SF (Sixty-Four) field in the ISS (Instruction Specific Syndrome)
is a bit that specifies the width of the register that the
instruction loads to.

If cleared it specifies 32 bits.
If set it specifies 64 bits.

Backports commit 173ff58580b383a7841b18fddb293038c9d40d1c from qemu
2018-02-26 05:01:33 -05:00
Andrey Yurovsky e24890a580
arm: add Cortex A7 CPU parameters
Add the "cortex-a7" CPU with features and registers matching the Cortex-A7
MPCore Technical Reference Manual and the Cortex-A7 Floating-Point Unit
Technical Reference Manual. The A7 is very similar to the A15.

Backports commit dcf578ed8cec89543158b103940e854ebd21a8cf from qemu
2018-02-26 03:44:24 -05:00
Pranith Kumar 32b7cee81e
target-aarch64: Generate fences for aarch64
Backports commit ce1bd93f94e8d4b7117744e49652d2f907bed99f from qemu
2018-02-26 03:26:35 -05:00
Pranith Kumar 7849f8d72a
target-arm: Generate fences in ARMv7 frontend
Backports commit 61e4c432ab26526bab0f3ef746c1861415b6da29 from qemu
2018-02-26 03:22:53 -05:00
Richard Henderson 66d79ac959
tcg: Merge GETPC and GETRA
The return address argument to the softmmu template helpers was
confused. In the legacy case, we wanted to indicate that there
is no return address, and so passed in NULL. However, we then
immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
value, indicating the presence of an (invalid) return address.

Push the GETPC_ADJ subtraction down to the only point it's required:
immediately before use within cpu_restore_state_from_tb, after all
NULL pointer checks have been completed.

This makes GETPC and GETRA identical. Remove GETRA as the lesser
used macro, replacing all uses with GETPC.

Backports commit 01ecaf438b1eb46abe23392c8ce5b7628b0c8cf5 from qemu
2018-02-26 02:54:44 -05:00
Sergey Sorokin a882118050
target-arm: Fix lpae bit in FSR on an alignment fault
If an alignment fault occurred and target EL is using AArch32,
then DFSR/IFSR bit LPAE[9] must be set correctly.

Backports commit e0fe723c24562c8f909bb40f131bfdbe75650677 from qemu
2018-02-25 23:10:29 -05:00
Pranith Kumar 4c880fba9d
target-arm: Fix warn about implicit conversion
Clang warns about an implicit conversion as follows:

/mnt/devops/code/qemu/target-arm/neon_helper.c:1075:1: warning: implicit conversion from 'int' to 'int8_t' (aka 'signed char') changes value from 128 to -128 [-Wconstant-conversion]
NEON_VOP_ENV(qrshl_s8, neon_s8, 4)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/devops/code/qemu/target-arm/neon_helper.c:116:83: note: expanded from macro 'NEON_VOP_ENV'
uint32_t HELPER(glue(neon_,name))(CPUARMState *env, uint32_t arg1, uint32_t arg2) \
^
/mnt/devops/code/qemu/target-arm/neon_helper.c:106:5: note: expanded from macro '\
NEON_VOP_BODY'
NEON_DO##n; \
^~~~~~~~~~
<scratch space>:21:1: note: expanded from here
NEON_DO4
^~~~~~~~
/mnt/devops/code/qemu/target-arm/neon_helper.c:93:5: note: expanded from macro 'NEON_DO4'
NEON_FN(vdest.v1, vsrc1.v1, vsrc2.v1); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/devops/code/qemu/target-arm/neon_helper.c:1054:23: note: expanded from macro 'NEON_FN'
dest = (1 << (sizeof(src1) * 8 - 1)); \
~ ~~^~~~~~~~~~~~~~~~~~~~~~~~~

Fix it by casting to appropriate type.

Backports commit 6bbbb0ac136102098a70b97ab0c07bc7bf53131c from qemu
2018-02-25 22:44:43 -05:00
Richard Henderson 1547048a22
tcg: Reorg TCGOp chaining
Instead of using -1 as end of chain, use 0, and link through the 0
entry as a fully circular double-linked list.

Backports commit dcb8e75870e2de199db853697f8839cb603beefe from qemu
2018-02-25 21:44:50 -05:00
Sergey Sorokin 4a904baaf5
target-arm: Add missed AArch32 TLBI sytem registers
Some PL2 related TLBI system registers are missed in AArch32
implementation. The patch fixes it.

Backports commit 541ef8c2e73fb99d173b125bef7c262fdd2fe33c from qemu
2018-02-25 19:59:15 -05:00
Markus Armbruster 2b65f98538
target-*: Clean up cpu.h header guards
Most of them use guard symbols like CPU_$target_H, but we also have
__MIPS_CPU_H__ and __TRICORE_CPU_H__. They all upset
scripts/clean-header-guards.pl.

The script dislikes CPU_$target_H because they don't match their file
name (they should, to make guard collisions less likely). The others
are reserved identifiers.

Clean them all up: use guard symbol $target_CPU_H for
target-$target/cpu.h.

Backports commit 07f5a258750b3b9a6e10fd5ec3e29c9a943b650e from qemu
2018-02-25 04:12:46 -05:00
Markus Armbruster 60e8836b74
Use #include "..." for our own headers, <...> for others
Tracked down with an ugly, brittle and probably buggy Perl script.

Also move includes converted to <...> up so they get included before
ours where that's obviously okay.

Backports commit a9c94277f07d19d3eb14f199c3e93491aa3eae0e from qemu
2018-02-25 04:10:33 -05:00
Sergey Sorokin d1e4ac0451
Fix confusing argument names in some common functions
There are functions tlb_fill(), cpu_unaligned_access() and
do_unaligned_access() that are called with access type and mmu index
arguments. But these arguments are named 'is_write' and 'is_user' in their
declarations. The patches fix the arguments to avoid a confusion.

Backports commit b35399bb4e9968296a12303b00f9f2066470e987 from qemu
2018-02-25 03:58:27 -05:00
Aleksandar Markovic 6eb4fa54f6
softfloat: Implement run-time-configurable meaning of signaling NaN bit
This patch modifies SoftFloat library so that it can be configured in
run-time in relation to the meaning of signaling NaN bit, while, at the
same time, strictly preserving its behavior on all existing platforms.

Background:

In floating-point calculations, there is a need for denoting undefined or
unrepresentable values. This is achieved by defining certain floating-point
numerical values to be NaNs (which stands for "not a number"). For additional
reasons, virtually all modern floating-point unit implementations use two
kinds of NaNs: quiet and signaling. The binary representations of these two
kinds of NaNs, as a rule, differ only in one bit (that bit is, traditionally,
the first bit of mantissa).

Up to 2008, standards for floating-point did not specify all details about
binary representation of NaNs. More specifically, the meaning of the bit
that is used for distinguishing between signaling and quiet NaNs was not
strictly prescribed. (IEEE 754-2008 was the first floating-point standard
that defined that meaning clearly, see [1], p. 35) As a result, different
platforms took different approaches, and that presented considerable
challenge for multi-platform emulators like QEMU.

Mips platform represents the most complex case among QEMU-supported
platforms regarding signaling NaN bit. Up to the Release 6 of Mips
architecture, "1" in signaling NaN bit denoted signaling NaN, which is
opposite to IEEE 754-2008 standard. From Release 6 on, Mips architecture
adopted IEEE standard prescription, and "0" denotes signaling NaN. On top of
that, Mips architecture for SIMD (also known as MSA, or vector instructions)
also specifies signaling bit in accordance to IEEE standard. MSA unit can be
implemented with both pre-Release 6 and Release 6 main processor units.

QEMU uses SoftFloat library to implement various floating-point-related
instructions on all platforms. The current QEMU implementation allows for
defining meaning of signaling NaN bit during build time, and is implemented
via preprocessor macro called SNAN_BIT_IS_ONE.

On the other hand, the change in this patch enables SoftFloat library to be
configured in run-time. This configuration is meant to occur during CPU
initialization, at the moment when it is definitely known what desired
behavior for particular CPU (or any additional FPUs) is.

The change is implemented so that it is consistent with existing
implementation of similar cases. This means that structure float_status is
used for passing the information about desired signaling NaN bit on each
invocation of SoftFloat functions. The additional field in float_status is
called snan_bit_is_one, which supersedes macro SNAN_BIT_IS_ONE.

IMPORTANT:

This change is not meant to create any change in emulator behavior or
functionality on any platform. It just provides the means for SoftFloat
library to be used in a more flexible way - in other words, it will just
prepare SoftFloat library for usage related to Mips platform and its
specifics regarding signaling bit meaning, which is done in some of
subsequent patches from this series.

Further break down of changes:

1) Added field snan_bit_is_one to the structure float_status, and
correspondent setter function set_snan_bit_is_one().

2) Constants <float16|float32|float64|floatx80|float128>_default_nan
(used both internally and externally) converted to functions
<float16|float32|float64|floatx80|float128>_default_nan(float_status*).
This is necessary since they are dependent on signaling bit meaning.
At the same time, for the sake of code cleanup and simplicity, constants
<floatx80|float128>_default_nan_<low|high> (used only internally within
SoftFloat library) are removed, as not needed.

3) Added a float_status* argument to SoftFloat library functions
XXX_is_quiet_nan(XXX a_), XXX_is_signaling_nan(XXX a_),
XXX_maybe_silence_nan(XXX a_). This argument must be present in
order to enable correct invocation of new version of functions
XXX_default_nan(). (XXX is <float16|float32|float64|floatx80|float128>
here)

4) Updated code for all platforms to reflect changes in SoftFloat library.
This change is twofolds: it includes modifications of SoftFloat library
functions invocations, and an addition of invocation of function
set_snan_bit_is_one() during CPU initialization, with arguments that
are appropriate for each particular platform. It was established that
all platforms zero their main CPU data structures, so snan_bit_is_one(0)
in appropriate places is not added, as it is not needed.

[1] "IEEE Standard for Floating-Point Arithmetic",
IEEE Computer Society, August 29, 2008.

Backports commit af39bc8c49224771ec0d38f1b693ea78e221d7bc from qemu
2018-02-24 20:27:12 -05:00
Lluís Vilanova 2297527755
exec: [tcg] Track which vCPU is performing translation and execution
Information is tracked inside the TCGContext structure, and later used
by tracing events with the 'tcg' and 'vcpu' properties.

The 'cpu' field is used to check tracing of translation-time
events ("*_trans"). The 'tcg_env' field is used to pass it to
execution-time events ("*_exec").

Backports commit 7c2550432abe62f53e6df878ceba6ceaf71f0e7e from qemu
2018-02-24 19:21:39 -05:00
Peter Maydell 5ae787f895
target-arm: Provide hook to tell GICv3 about changes of security state
The GICv3 CPU interface needs to know when the CPU it is attached
to makes an exception level or mode transition that changes the
security state, because whether it is asserting IRQ or FIQ can change
depending on these things. Provide a mechanism for letting the GICv3
device register a hook to be called on such changes.

Backports commit bd7d00fc50c9960876dd194ebf0c88889b53e765 from qemu
2018-02-24 19:09:22 -05:00
Peter Maydell eec3a5f843
target-arm: Define new arm_is_el3_or_mon() function
The GICv3 system registers need to know if the CPU is AArch64
in EL3 or AArch32 in Monitor mode. This happens to be the first
part of the check for arm_is_secure(), so factor it out into a
new arm_is_el3_or_mon() function that the GIC can also use.

Backports commit 712058764da29b2908f6fbf56760ca4f15980709 from qemu
2018-02-24 19:04:27 -05:00
Peter Maydell 9bdf310d49
target-arm: Don't permit ARMv8-only Neon insns on ARMv7
The Neon instructions VCVTA, VCVTM, VCVTN, VCVTP, VRINTA, VRINTM,
VRINTN, VRINTP, VRINTX, and VRINTZ were only introduced with ARMv8,
so they need a guard to make them UNDEF if the CPU only supports ARMv7.
(We got this right for all the other new-in-v8 insns, but forgot
it for these Neon 2-reg-misc ops.)

Backports commit fe8fcf3d642b4de1369841bf6acac13e0ec8770d from qemu
2018-02-24 18:20:00 -05:00
Peter Maydell a9fb399490
target-arm: Fix reset and migration of TTBCR(S)
Commit 6459b94c26dd666badb3 broke reset and migration of the AArch32
TTBCR(S) register if the guest used non-LPAE page tables. This is
because the AArch32 TTBCR register definition is marked as ARM_CP_ALIAS,
meaning that the AArch64 variant has to handle migration and reset.
Although AArch64 TCR_EL3 doesn't need to care about the mask and
base_mask fields, AArch32 may do so, and so we must use the special
TTBCR reset and raw write functions to ensure they are set correctly.

This doesn't affect TCR_EL2, because the AArch32 equivalent of that
is HTCR, which never uses the non-LPAE page table variant.

Backports commit 811595a2d4ab8c6354857a50ffd29fafce52a892 from qemu
2018-02-24 18:18:24 -05:00
Shannon Zhao 51c9e12605
target-arm: kvm64: set guest PMUv3 feature bit if supported
Check if kvm supports guest PMUv3. If so, set the corresponding feature
bit for vcpu.

Backports commit 5c0a3819f009639f67ce0453dff6ec7211bfee54 from qemu
2018-02-24 18:17:11 -05:00
Sergey Sorokin c05902eddd
target-arm: Fix TTBR selecting logic on AArch32 Stage 2 translation
Address size is 40-bit for the AArch32 stage 2 translation,
and t0sz can be negative (from -8 to 7),
so we need to adjust it to use the existing TTBR selecting logic.

Backports commit 6e99f762612827afeff54add2e4fc2c3b2657fed from qemu
2018-02-24 16:54:32 -05:00
Peter Maydell 806d72035e
target-arm: Don't try to set ESR IL bit in arm_cpu_do_interrupt_aarch64()
Remove some incorrect code from arm_cpu_do_interrupt_aarch64()
which attempts to set the IL bit in the syndrome register based
on the value of env->thumb. This is wrong in several ways:
* IL doesn't indicate Thumb-vs-ARM, it indicates instruction
length (which may be 16 or 32 for Thumb and is always 32 for ARM)
* not every syndrome format uses IL like this -- for some IL is
always set, and for some it is always clear
* the code is changing esr_el[new_el] even for interrupt entry,
which is not supposed to modify ESR_ELx at all

Delete the code, and instead rely on the syndrome value in
env->exception.syndrome having already been set up with the
correct value of IL.

Backports commit 78f1edb19fe11fa0c5d0bf484db59a384f455d3c from qemu
2018-02-24 16:49:53 -05:00
Peter Maydell dc8bf22d88
target-arm: Set IL bit in syndromes for insn abort, watchpoint, swstep
For some exception syndrome types, the IL bit should always be set.
This includes the instruction abort, watchpoint and software step
syndrome types; add the missing ARM_EL_IL bit to the syndrome
values returned by syn_insn_abort(), syn_swstep() and syn_watchpoint().

Backports commit 04ce861ea545477425ad9e045eec3f61c8a27df9 from qemu
2018-02-24 16:48:59 -05:00
Edgar E. Iglesias 8aee797956
target-arm: A64: Create Instruction Syndromes for Data Aborts
Add support for generating the ISS (Instruction Specific Syndrome) for
Data Abort exceptions taken from AArch64.
These syndromes are used by hypervisors for example to trap and emulate
memory accesses.

We save the decoded data out-of-band with the TBs at translation time.
When exceptions hit, the extra data attached to the TB is used to
recreate the state needed to encode instruction syndromes.
This avoids the need to emit moves with every load/store.

Based on a suggestion from Peter Maydell.

Backports commit aaa1f954d4cab243e3d5337a72bc6d104e1c4808 from qemu
2018-02-24 16:46:44 -05:00
Alistair Francis 25daa5363e
target-arm: Add the HSTR_EL2 register
Add the Hypervisor System Trap Register for EL2.

This register is used early in the Linux boot and without it the kernel
aborts with a "Synchronous Abort" error.

Backports commit 2a5a9abd4bc45e2f4c62c77e07aebe53608c6915 from qemu
2018-02-24 16:24:57 -05:00
Paolo Bonzini 9485b7c2e1
cpu: move exec-all.h inclusion out of cpu.h
exec-all.h contains TCG-specific definitions. It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.

One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.

Backports commit 63c915526d6a54a95919ebece83fa9ca631b2508 from qemu
2018-02-24 02:39:08 -05:00
Paolo Bonzini 058624b9e4
arm: move arm_log_exception into .c file
Avoid need for qemu/log.h inclusion, and make the function static too.

Backports commit 27a7ea8a1f351578ce869b41ba1ba662c063fd62 from qemu
2018-02-24 01:52:55 -05:00
Paolo Bonzini 37f26922dd
qemu-common: push cpu.h inclusion out of qemu-common.h
Backports commit 33c11879fd422b759483ed25fef133ea900ea8d7 from qemu
2018-02-24 01:50:56 -05:00
Lioncash 791413630e
target-arm: make cpu-qom.h not target specific
Make ARMCPU an opaque type within cpu-qom.h, and move all definitions of
private methods, as well as all type definitions that require knowledge
of the layout to cpu.h. This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Backports commit 74e755647c1598a6845df1ee4f8b96d01afd96e7 from qemu
2018-02-24 00:48:59 -05:00