2395 commits

Author SHA1 Message Date
Peter Maydell e528c8229e target/arm: Get correct MMU index for other-security-state
In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.

Backports 7142eb9e24b4aa5118cd67038057f15694d782aa
2021-03-02 13:08:44 -05:00
Rémi Denis-Courmont a4053565d6 target/arm: fix LORID_EL1 access check
Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
future HCR_EL2.TLOR when S-EL2 is enabled.

Backports 9bd268bae5c4760870522292fb1d46e7da7e372a
2021-03-02 13:06:50 -05:00
Rémi Denis-Courmont df4413edc7 target/arm: fix handling of HCR.FB
HCR should be applied when NS is set, not when it is cleared.

Backports 373e7ffde9bae90a20fb5db21b053f23091689f4
2021-03-02 13:05:01 -05:00
Peter Maydell 6b8096d9fc target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index is indexing over a group of four 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Backports d1a9254be5cc93afb15be19f7543da6ff4806256
2021-03-02 13:03:51 -05:00
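The two big-endian fixes above both depend on choosing the swizzle macro that matches the element width. The following is a minimal sketch of the idea, not the QEMU code: it assumes vector data is kept in 64-bit chunks in host byte order, and it simulates a big-endian host portably with a byte array. The names H2_BE, H4_BE, pack_u32_pair and read_elem32 are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for how the swizzle macros would behave on a
 * big-endian host: XOR the element index within each 64-bit chunk. */
#define H2_BE(x) ((x) ^ 3)  /* 2-byte elements, four per chunk */
#define H4_BE(x) ((x) ^ 1)  /* 4-byte elements, two per chunk */

/* Pack two 32-bit elements into one chunk; elem0 is architectural index 0. */
static uint64_t pack_u32_pair(uint32_t elem0, uint32_t elem1)
{
    return (uint64_t)elem0 | ((uint64_t)elem1 << 32);
}

/* Lay the chunk out in memory the way a big-endian host would. */
static void store_be64(uint8_t mem[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        mem[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Load the 32-bit value occupying memory slot 0 or 1 of the chunk. */
static uint32_t load_be32_slot(const uint8_t mem[8], int slot)
{
    const uint8_t *p = mem + 4 * slot;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read architectural element idx, with or without the H4-style swizzle. */
static uint32_t read_elem32(uint64_t chunk, int idx, int swizzle)
{
    uint8_t mem[8];

    assert(idx == 0 || idx == 1);
    store_be64(mem, chunk);
    return load_be32_slot(mem, swizzle ? H4_BE(idx) : idx);
}
```

In this model, reading index 0 without the swizzle returns the neighbouring element, which is the kind of wrong result the commit describes for big-endian hosts.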
Peter Maydell 5c6730a432 target/arm: Fix float16 pairwise Neon ops on big-endian hosts
In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data. This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.

Backports 552714c0812a10e5cff239bd29928e5fcb8d8b3b
2021-03-02 13:02:31 -05:00
Richard Henderson d473f66177 target/arm: Improve do_prewiden_3d
We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.

Backports 8aab18a2c5209e4e48998a61fbc2d89f374331ed
2021-03-02 13:00:25 -05:00
Richard Henderson 9263117d47 target/arm: Simplify do_long_3d and do_2scalar_long
In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.

Backports 9f1a5f93c2dd345dc6c8fe86ed14bf1485056f6e
2021-03-02 12:46:53 -05:00
Richard Henderson 07c2b70234 target/arm: Rename neon_load_reg64 to vfp_load_reg64
The only uses of this function are for loading VFP
double-precision values, and nothing to do with NEON.

Backports b38b96ca90827012ab8eb045c1337cea83a54c4b
2021-03-02 12:43:25 -05:00
Richard Henderson 9d87b62578 target/arm: Add read/write_neon_element64
Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.

Backports 0aa8e700a53b0aa7275ed747b8fa3acb61d35f2d
2021-03-02 12:40:33 -05:00
Richard Henderson 89b1f62878 target/arm: Rename neon_load_reg32 to vfp_load_reg32
The only uses of this function are for loading VFP
single-precision values, and nothing to do with NEON.

Backports 21c1c0e50b73c580c6bfc8f2314d1b6a14793561
2021-03-02 12:30:20 -05:00
Richard Henderson 011d9ab061 target/arm: Expand read/write_neon_element32 to all MemOp
We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than inserting or
extracting from a 32-bit quantity.

These were the last uses of neon_load/store_reg, so remove them.

Backports 4d5fa5a80ac28f34b8497be1e85371272413a12e
2021-03-02 12:26:41 -05:00
Richard Henderson d21316d639 target/arm: Add read/write_neon_element32
Model these off the aa64 read/write_vec_element functions.
Use it within translate-neon.c.inc. The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Backports a712266f5d5a36d04b22fe69fa15592d62bed019
2021-03-02 12:18:31 -05:00
Richard Henderson e390c1ec7f target/arm: Use neon_element_offset in vfp_reg_offset
This seems a bit more readable than using offsetof CPU_DoubleU.

Backports d8719785fde2f5041986853a314c05c6f567d3cb
2021-03-02 11:55:49 -05:00
Richard Henderson c1ca9e53da target/arm: Use neon_element_offset in neon_load/store_reg
These are the only users of neon_reg_offset, so remove that.

Backports 0f2cdc82276a723ee58562b56b9d537a4bd7bfef
2021-03-02 11:54:56 -05:00
Richard Henderson 1b09d0d96f target/arm: Move neon_element_offset to translate.c
This will shortly have users outside of translate-neon.c.inc.

Backports 7ec85c02833f4264840c6ed78b749443a7b4ffe0
2021-03-02 11:52:59 -05:00
Richard Henderson 8a20537e7f target/arm: Introduce neon_full_reg_offset
This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0. This fixes a bug
when running on a big-endian host.

Backports 015ee81a4c06b644969f621fd9965cc6372b879e
2021-03-02 11:50:36 -05:00
Peter Maydell 2f0940677e target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
If the M-profile low-overhead-branch extension is implemented, FPSCR
bits [18:16] are a new field LTPSIZE. If MVE is not implemented
(currently always true for us) then this field always reads as 4 and
ignores writes.

These bits used to be the vector-length field for the old
short-vector extension, so we need to take care that they are not
misinterpreted as setting vec_len. We do this with a rearrangement
of the vfp_set_fpscr() code that deals with vec_len, vec_stride
and also the QC bit; this obviates the need for the M-profile
only masking step that we used to have at the start of the function.

We provide a new field in CPUState for LTPSIZE, even though this
will always be 4, in preparation for MVE, so we don't have to
come back later and split it out of the vfp.xregs[FPSCR] value.
(This state struct field will be saved and restored as part of
the FPSCR value via the vmstate_fpscr in machine.c.)

Backports 8128c8e8cc9489a8387c74075974f86dc0222e7f
2021-03-01 20:36:02 -05:00
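The LTPSIZE behaviour described above (FPSCR bits [18:16] read as 4 and ignore writes when MVE is absent) can be sketched roughly as follows. This is an illustration under stated assumptions, not the vfp_set_fpscr() code: the struct SketchFPState and both functions are hypothetical, and only the bit positions and the constant 4 come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LTPSIZE_SHIFT 16
#define SKETCH_LTPSIZE_MASK  (0x7u << SKETCH_LTPSIZE_SHIFT)

/* Hypothetical state: LTPSIZE is held separately from the other bits. */
typedef struct {
    uint32_t other_bits; /* every FPSCR bit except [18:16] */
    uint32_t ltpsize;    /* always 4 while MVE is not implemented */
} SketchFPState;

/* Writes to bits [18:16] are dropped; the remaining bits are stored. */
static void sketch_set_fpscr(SketchFPState *s, uint32_t val)
{
    assert(s != 0);
    s->other_bits = val & ~SKETCH_LTPSIZE_MASK;
}

/* Reads merge the separately held LTPSIZE back into the value. */
static uint32_t sketch_get_fpscr(const SketchFPState *s)
{
    assert(s != 0);
    return s->other_bits | (s->ltpsize << SKETCH_LTPSIZE_SHIFT);
}
```

Holding the field outside the main value mirrors the stated motivation: it does not have to be split out of the stored FPSCR later.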
Peter Maydell 8a6e118a17 target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16
M-profile CPUs with half-precision floating point support should
be able to write to FPSCR.FZ16, but an M-profile specific masking
of the value at the top of vfp_set_fpscr() currently prevents that.
This is not yet an active bug because we have no M-profile
FP16 CPUs, but needs to be fixed before we can add any.

The bits that the masking is effectively preventing from being
set are the A-profile only short-vector Len and Stride fields,
plus the Neon QC bit. Rearrange the order of the function so
that those fields are handled earlier and only under a suitable
guard; this allows us to drop the M-profile specific masking,
making FZ16 writeable.

This change also makes the QC bit correctly RAZ/WI for older
no-Neon A-profile cores.

This refactoring also paves the way for the low-overhead-branch
LTPSIZE field, which uses some of the bits that are used for
A-profile Stride and Len.

Backports commit d31e2ce68d56f5bcc83831497e5fe4b8a7e18e85
2021-03-01 20:33:22 -05:00
Peter Maydell 3ae5543825 target/arm: Implement v8.1M low-overhead-loop instructions
v8.1M's "low-overhead-loop" extension has three instructions
for looping:
* DLS (start of a do-loop)
* WLS (start of a while-loop)
* LE (end of a loop)

The loop-start instructions are both simple operations to start a
loop whose iteration count (if any) is in LR. The loop-end
instruction handles "decrement iteration count and jump back to loop
start"; it also caches the information about the branch back to the
start of the loop to improve performance of the branch on subsequent
iterations.

As with the branch-future instructions, the architecture permits an
implementation to discard the LO_BRANCH_INFO cache at any time, and
QEMU takes the IMPDEF option to never set it in the first place
(equivalent to discarding it immediately), because for us a "real"
implementation would be unnecessary complexity.

(This implementation only provides the simple looping constructs; the
vector extension MVE (Helium) adds some extra variants to handle
looping across vectors. We'll add those later when we implement
MVE.)

Backports commit b7226369721896ab9ef71544e4fe95b40710e05a
2021-03-01 20:29:04 -05:00
Peter Maydell be197f9857 target/arm: Implement v8.1M branch-future insns (as NOPs)
v8.1M implements a new 'branch future' feature, which is a
set of instructions that request the CPU to perform a branch
"in the future", when it reaches a particular execution address.
In hardware, the expected implementation is that the information
about the branch location and destination is cached and then
acted upon when execution reaches the specified address.
However the architecture permits an implementation to discard
this cached information at any point, and so guest code must
always include a normal branch insn at the branch point as
a fallback. In particular, an implementation is specifically
permitted to treat all BF insns as NOPs (which is equivalent
to discarding the cached information immediately).

For QEMU, implementing this caching of branch information
would be complicated and would not improve the speed of
execution at all, so we make the IMPDEF choice to implement
all BF insns as NOPs.

Backports commit 05903f036edba8e3ed940cc215b8e27fb49265b9
2021-03-01 20:25:15 -05:00
Peter Maydell 966246d991 target/arm: Don't allow BLX imm for M-profile
The BLX immediate insn in the Thumb encoding always performs
a switch from Thumb to Arm state. This would be totally useless
in M-profile which has no Arm decoder, and so the instruction
does not exist at all there. Make the encoding UNDEF for M-profile.

(This part of the encoding space is used for the branch-future
and low-overhead-loop insns in v8.1M.)

Backports 920f04fa3ea789f8f85a52cee5395b8887b56cf7
2021-03-01 20:23:59 -05:00
Peter Maydell 5680bc701b target/arm: Make the t32 insn[25:23]=111 group non-overlapping
The t32 decode has a group which represents a set of insns
which overlap with B_cond_thumb because they have [25:23]=111
(which is an invalid condition code field for the branch insn).
This group is currently defined using the {} overlap-OK syntax,
but it is almost entirely non-overlapping patterns. Switch
it over to use a non-overlapping group.

For this to be valid syntactically, CPS must move into the same
overlapping-group as the hint insns (CPS vs hints was the
only actual use of the overlap facility for the group).

The non-overlapping subgroup for CLREX/DSB/DMB/ISB/SB is no longer
necessary and so we can remove it (promoting those insns to
be members of the parent group).

Backports 45f11876ae86128bdee27e0b089045de43cc88e4
2021-03-01 20:22:11 -05:00
Peter Maydell 666fe17025 target/arm: Implement v8.1M conditional-select insns
v8.1M brings four new insns to M-profile:
* CSEL : Rd = cond ? Rn : Rm
* CSINC : Rd = cond ? Rn : Rm+1
* CSINV : Rd = cond ? Rn : ~Rm
* CSNEG : Rd = cond ? Rn : -Rm

Implement these.

Backports cc73bbded0dfb5612b0e416f7eda13a66950542a
2021-03-01 20:19:33 -05:00
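The four conditional-select operations listed above can be modelled directly in C. This is a reference-style sketch of the semantics only; the enum and function names are hypothetical and it says nothing about how the translator generates code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation selector for the sketch. */
typedef enum {
    SKETCH_CSEL,
    SKETCH_CSINC,
    SKETCH_CSINV,
    SKETCH_CSNEG
} SketchCselOp;

/* Rd = cond ? Rn : f(Rm), where f depends on the operation. */
static uint32_t sketch_cond_select(SketchCselOp op, bool cond,
                                   uint32_t rn, uint32_t rm)
{
    assert(op >= SKETCH_CSEL && op <= SKETCH_CSNEG);
    if (cond) {
        return rn;
    }
    switch (op) {
    case SKETCH_CSINC:
        return rm + 1u;      /* wraps modulo 2^32 */
    case SKETCH_CSINV:
        return ~rm;
    case SKETCH_CSNEG:
        return 0u - rm;      /* two's complement negate */
    case SKETCH_CSEL:
    default:
        return rm;
    }
}
```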
Peter Maydell 2dae268fcb target/arm: Implement v8.1M NOCP handling
From v8.1M, disabled-coprocessor handling changes slightly:
* coprocessors 8, 9, 14 and 15 are also governed by the
  cp10 enable bit, like cp11
* an extra range of instruction patterns is considered
  to be inside the coprocessor space

We previously marked these up with TODO comments; implement the
correct behaviour.

Unfortunately there is no ID register field which indicates this
behaviour. We could in theory test an unrelated ID register which
indicates guaranteed-to-be-in-v8.1M behaviour like ID_ISAR0.CmpBranch
>= 3 (low-overhead-loops), but it seems better to simply define a new
ARM_FEATURE_V8_1M feature flag and use it for this and other
new-in-v8.1M behaviour that isn't identifiable from the ID registers.

Backports commit 5d2555a1fe7370feeb1efbbf276a653040910017
2021-03-01 20:16:09 -05:00
Richard Henderson f7e831a7e4 target/arm: Ignore HCR_EL2.ATA when {E2H,TGE} != 11
Unlike many other bits in HCR_EL2, the description for this
bit does not contain the phrase "if ... this field behaves
as 0 for all purposes other than", so do not squash the bit
in arm_hcr_el2_eff.

Instead, replicate the E2H+TGE test in the two places that
require it.

Backports 4301acd7d7d455792ea873ced75c0b5d653618b1
2021-03-01 20:12:36 -05:00
Richard Henderson 4f00eacb11 target/arm: Fix reported EL for mte_check_fail
The reporting in AArch64.TagCheckFail only depends on PSTATE.EL,
and not the AccType of the operation. There are two guest
visible problems that affect LDTR and STTR because of this:

(1) Selecting TCF0 vs TCF1 to decide on reporting,
(2) Report "data abort same el" not "data abort lower el".

Backports 50244cc76abcac3296cff3d84826f5ff71808c80
2021-03-01 20:10:44 -05:00
Richard Henderson 511636a3f4 target/arm: Remove redundant mmu_idx lookup
We already have the full ARMMMUIdx as computed from the
function parameter.

For the purpose of regime_has_2_ranges, we can ignore any
difference between AccType_Normal and AccType_Unpriv, which
would be the only difference between the passed mmu_idx
and arm_mmu_idx_el.

Backports 4aedfc0f633fd06dd2a5dc8ffa93f4c56117e37f
2021-03-01 20:09:51 -05:00
Peter Maydell d350644817 target/arm: AArch32 VCVT fixed-point to float is always round-to-nearest
For AArch32, unlike the VCVT of integer to float, which honours the
rounding mode specified by the FPSCR, VCVT of fixed-point to float is
always round-to-nearest. (AArch64 fixed-point-to-float conversions
always honour the FPCR rounding mode.)

Implement this by providing _round_to_nearest versions of the
relevant helpers which set the rounding mode temporarily when making
the call to the underlying softfloat function.

We only need to change the VFP VCVT instructions, because the
standard-FPSCR value used by the Neon VCVT is always set to
round-to-nearest, so we don't need to do the extra work of saving
and restoring the rounding mode.

Backports commit 61db12d9f9eb36761edba4d9a414cd8dd34c512b
2021-03-01 20:04:31 -05:00
Peter Maydell 31013d5a8f target/arm: Fix SMLAD incorrect setting of Q bit
The SMLAD instruction is supposed to:
* signed multiply Rn[15:0] * Rm[15:0]
* signed multiply Rn[31:16] * Rm[31:16]
* perform a signed addition of the products and Ra
* set Rd to the low 32 bits of the theoretical
  infinite-precision result
* set the Q flag if the sign-extension of Rd
  would differ from the infinite-precision result
  (ie on overflow)

Our current implementation doesn't quite do this, though: it performs
an addition of the products setting Q on overflow, and then it adds
Ra, again possibly setting Q. This sometimes incorrectly sets Q when
the architecturally mandated only-check-for-overflow-once algorithm
does not. For instance:
    r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
    smlad r0, r1, r2, r3
    This is (-32768 * -32768) + (-32768 * -32768) - 1

The products are both 0x4000_0000, so when added together as 32-bit
signed numbers they overflow (and QEMU sets Q), but because the
addition of Ra == -1 brings the total back down to 0x7fff_ffff
there is no overflow for the complete operation and setting Q is
incorrect.

Fix this edge case by resorting to 64-bit arithmetic for the
case where we need to add three values together.

Backports commit 5288145d716338ace0f83e3ff05c4d07715bb4f4
2021-03-01 19:58:39 -05:00
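The SMLAD edge case above can be reproduced with a small model. The sketch below compares a check-overflow-twice calculation with a single 64-bit calculation; it is an illustration of the arithmetic in the commit message, not the translator code, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t rd; /* low 32 bits of the result */
    bool q;      /* saturation/overflow flag */
} SketchSmlad;

/* Sign-extend the low 'bits' bits of v (v must already be masked). */
static int64_t sketch_sext(uint32_t v, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);

    assert(bits == 16 || bits == 32);
    return (int64_t)(v ^ sign) - (int64_t)sign;
}

static bool sketch_fits_int32(int64_t x)
{
    return x >= INT32_MIN && x <= INT32_MAX;
}

static int64_t sketch_products(uint32_t rn, uint32_t rm)
{
    int64_t lo = sketch_sext(rn & 0xffff, 16) * sketch_sext(rm & 0xffff, 16);
    int64_t hi = sketch_sext(rn >> 16, 16) * sketch_sext(rm >> 16, 16);

    return lo + hi;
}

/* Old behaviour: check overflow after each of the two 32-bit additions. */
static SketchSmlad sketch_smlad_twice(uint32_t rn, uint32_t rm, uint32_t ra)
{
    int64_t sum = sketch_products(rn, rm);
    bool q = !sketch_fits_int32(sum);
    int64_t total = sketch_sext((uint32_t)sum, 32) + sketch_sext(ra, 32);

    q = q || !sketch_fits_int32(total);
    return (SketchSmlad){ (uint32_t)total, q };
}

/* Fixed behaviour: add all three values in 64 bits, check once. */
static SketchSmlad sketch_smlad_once(uint32_t rn, uint32_t rm, uint32_t ra)
{
    int64_t total = sketch_products(rn, rm) + sketch_sext(ra, 32);
    uint32_t rd = (uint32_t)total;

    return (SketchSmlad){ rd, total != sketch_sext(rd, 32) };
}
```

With the inputs from the commit message, both versions produce Rd = 0x7fffffff, but only the two-step version sets Q.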
Peter Maydell 6cd06169ee target/arm: Make '-cpu max' have a 48-bit PA
QEMU supports a 48-bit physical address range, but we don't currently
expose it in the '-cpu max' ID registers (you get the same range as
Cortex-A57, which is 44 bits).

Set the ID_AA64MMFR0.PARange field to indicate 48 bits.

Backports d1b6b7017572e8d82f26eb827a1dba0e8cf3cae6
2021-03-01 19:50:28 -05:00
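As a rough illustration of the PARange change above: the field occupies the low four bits of ID_AA64MMFR0, and the architecture encodes 44 bits as 0b0100 and 48 bits as 0b0101. The helper names below are hypothetical, and the register value used in any example is arbitrary rather than a real CPU's.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PARANGE_MASK 0xfULL /* ID_AA64MMFR0 bits [3:0] */

/* Replace the PARange field, leaving the other fields untouched. */
static uint64_t sketch_set_parange(uint64_t mmfr0, unsigned encoding)
{
    assert(encoding <= 0xf);
    return (mmfr0 & ~SKETCH_PARANGE_MASK) | encoding;
}

/* Decode PARange into a physical address width; 0 means reserved here. */
static unsigned sketch_parange_bits(uint64_t mmfr0)
{
    static const unsigned widths[] = { 32, 36, 40, 42, 44, 48, 52 };
    unsigned enc = (unsigned)(mmfr0 & SKETCH_PARANGE_MASK);

    return enc < sizeof(widths) / sizeof(widths[0]) ? widths[enc] : 0;
}
```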
Richard Henderson 567fa21c65 target/arm: Fix SVE splice
While converting to gen_gvec_ool_zzzp, we lost passing
a->esz as the data argument to the function.

Backports commit dd701fafe55a78e655d4823d29226d92250a6b56
2021-03-01 19:20:44 -05:00
Richard Henderson ccb293911f target/arm: Fix sve ldr/str
The mte update missed a bit when producing clean addresses.

Fixes: b2aa8879b88

Backports d8227b098301935ea8e0e032e7d41e5dc3e97590
2021-03-01 19:20:04 -05:00
Peter Maydell 79feec40df target/arm: Make isar_feature_aa32_fp16_arith() handle M-profile
The M-profile definition of the MVFR1 ID register differs slightly
from the A-profile one, and in particular the check for "does the CPU
support fp16 arithmetic" is not the same.

We don't currently implement any M-profile CPUs with fp16 arithmetic,
so this is not yet a visible bug, but correcting the logic now
disarms this beartrap for when we eventually do.

Backports commit dfc523a84b06b6a4b583ed4c29d24fd980dd37a0
2021-03-01 19:17:23 -05:00
Peter Maydell 09a7d6381e target/arm: Move id_pfr0, id_pfr1 into ARMISARegisters
Move the id_pfr0 and id_pfr1 fields into the ARMISARegisters
sub-struct. We're going to want id_pfr1 for an isar_features
check, and moving both at the same time avoids an odd
inconsistency.

Changes other than the ones to cpu.h and kvm64.c made
automatically with:
perl -p -i -e 's/cpu->id_pfr/cpu->isar.id_pfr/' target/arm/*.c hw/intc/armv7m_nvic.c

Backports commit 8a130a7be6e222965641e1fd9469fd3ee752c7d4
2021-03-01 19:15:10 -05:00
Peter Maydell ed92f3c42b target/arm: Replace ARM_FEATURE_PXN with ID_MMFR0.VMSA check
The ARM_FEATURE_PXN bit indicates whether the CPU supports the PXN
bit in short-descriptor translation table format descriptors. This
is indicated by ID_MMFR0.VMSA being at least 0b0100. Replace the
feature bit with an ID register check, in line with our preference
for ID register checks over feature bits.

Backports commit 0ae0326b984e77a55c224b7863071bd3d8951231
2021-03-01 19:06:15 -05:00
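The ID register check that replaces ARM_FEATURE_PXN above amounts to a field extract and compare. A minimal sketch, assuming VMSA is the low four bits of ID_MMFR0; the function name is hypothetical and is not the name of the QEMU helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PXN in short descriptors is indicated by ID_MMFR0.VMSA >= 0b0100. */
static bool sketch_has_pxn(uint32_t id_mmfr0)
{
    uint32_t vmsa = id_mmfr0 & 0xfu; /* VMSA field, bits [3:0] */

    assert(vmsa <= 0xfu);
    return vmsa >= 4;
}
```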
Xiaoyao Li d9d68cc128 i386/cpu: Clear FEAT_XSAVE_COMP_{LO,HI} when XSAVE is not available
Per Intel SDM vol 1, 13.2, if CPUID.1:ECX.XSAVE[bit 26] is 0, the
processor provides no further enumeration through CPUID function 0DH.
QEMU does not do this for "-cpu host,-xsave".

Backports 19ca8285fcd61a8f60f2f44f789a561e0958e8e6
2021-03-01 19:04:03 -05:00
Richard Henderson 5e6196ea6b target/riscv: Set instance_align on RISCVCPU TypeInfo
Fix alignment of CPURISCVState.vreg.

Backports 5de5b99b3101a1648ed583193db8d92eea0c4545
2021-03-01 19:00:27 -05:00
Richard Henderson cdf40f7ff6 target/arm: Set instance_align on CPUARM TypeInfo
Fix alignment of CPUARMState.vfp.zregs.

Backports d03087bda4ba17076b430fd2af083020d7c5112a
2021-03-01 18:58:44 -05:00
Aaron Lindsay 97702da7ad target/arm: Count PMU events when MDCR.SPME is set
This check was backwards when introduced in commit
033614c47de78409ad3fb39bb7bd1483b71c6789:

target/arm: Filter cycle counter based on PMCCFILTR_EL0

Backports commit db1f3afb17269cf2bd86c222e1bced748487ef71
2021-03-01 18:25:25 -05:00
Peter Maydell 16ad0d93d9 target/arm: Convert VCMLA, VCADD size field to MO_* in decode
The VCMLA and VCADD insns have a size field which is 0 for fp16
and 1 for fp32 (note that this is the reverse of the Neon 3-same
encoding!). Convert it to MO_* values in decode for consistency.

Backports d186a4854c04e9832907b0b4240a47731da20993
2021-03-01 18:23:34 -05:00
Peter Maydell 61abec1908 target/arm: Convert Neon VCVT fp size field to MO_* in decode
Convert the insns using the 2reg_vcvt and 2reg_vcvt_f16 formats
to pass the size through to the trans function as a MO_* value
rather than the '0==f32, 1==f16' used in the fp 3-same encodings.

Backports commit 0ae715c658a02af1834b63563c56112a6d8842cb
2021-03-01 18:20:11 -05:00
Peter Maydell 524b54bc7b target/arm: Convert Neon 3-same-fp size field to MO_* in decode
In the Neon instructions, some instruction formats have a 2-bit size
field which corresponds exactly to QEMU's MO_8/16/32/64. However the
floating-point insns in the 3-same group have a 1-bit size field
which is "0 for 32-bit float and 1 for 16-bit float". Currently we
pass these values directly through to trans_ functions, which means
that when reading a particular trans_ function you need to know if
that insn uses a 2-bit size or a 1-bit size.

Move the handling of the 1-bit size to the decodetree file, so that
all these insns consistently pass a size to the trans_ function which
is an MO_8/16/32/64 value.

In this commit we switch over the insns using the 3same_fp and
3same_fp_q0 formats.

Backports commit 6cf0f240e0b980a877abed12d2995f740eae6515
2021-03-01 18:15:18 -05:00
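The size-field conversions in the three decode commits above can be summarised in a few lines. In this sketch the SKETCH_MO_* constants mirror QEMU's MO_8/16/32/64 numbering (0 to 3), and the two converter functions are hypothetical; in the real change the mapping lives in the decodetree file rather than in helper functions like these.

```c
#include <assert.h>

/* Stand-ins for MO_8/16/32/64: log2 of the element size in bytes. */
enum {
    SKETCH_MO_8 = 0,
    SKETCH_MO_16 = 1,
    SKETCH_MO_32 = 2,
    SKETCH_MO_64 = 3
};

/* Neon 3-same fp encodings: size bit 0 means fp32, 1 means fp16. */
static int sketch_3same_fp_size(unsigned size_bit)
{
    assert(size_bit <= 1);
    return size_bit ? SKETCH_MO_16 : SKETCH_MO_32;
}

/* VCMLA/VCADD use the reverse: size bit 0 means fp16, 1 means fp32. */
static int sketch_vcmla_size(unsigned size_bit)
{
    assert(size_bit <= 1);
    return size_bit ? SKETCH_MO_32 : SKETCH_MO_16;
}
```

Once both encodings are normalised this way, a trans_ function can compare against one set of constants regardless of which instruction format it came from.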
Eduardo Habkost cefb1666c0 arm: Fix typo in AARCH64_CPU_GET_CLASS definition
There's a typo in the type name of AARCH64_CPU_GET_CLASS. This
was never detected because the macro is not used by any code.

Backports 37e3d65043229bb20bd07af74dc0866e12071415
2021-03-01 18:03:29 -05:00
Peter Maydell ff74ede2fd target/arm: Enable FP16 in '-cpu max'
Set the MVFR1 ID register FPHP and SIMDHP fields to indicate
that our "-cpu max" has v8.2-FP16.

Backports commit 5f07817eb94542e39a419baafa3026b15e8d33f7
2021-03-01 18:00:13 -05:00
Peter Maydell b948636c4a target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS
Convert the Neon floating-point VMUL, VMLA and VMLS to use gvec,
and use this to implement fp16 support.

Backports fc8ae790311882afa3c7816df004daf978c40e9a
2021-03-01 17:57:36 -05:00
Peter Maydell 8c6affbca4 target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
Add gvec helpers for doing Neon-style indexed non-fused fp
multiply-and-accumulate operations.

Backports commit c50d8d144098a8261233ca31b47e3bc487e112fe
2021-03-01 17:52:31 -05:00
Peter Maydell 3cc3099e36 target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations
In the gvec helper functions for indexed operations, for AArch32
Neon the oprsz (total size of the vector) can be less than 16 bytes
if the operation is on a D reg. Since the inner loop in these
helpers always goes from 0 to segment, we must clamp it based
on oprsz to avoid processing a full 16 byte segment when asked to
handle an 8 byte wide vector.

Backports commit d7ce81e553e6789bf27657105b32575668d60b1c
2021-03-01 17:48:42 -05:00
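The oprsz clamp described above can be sketched with a simplified indexed multiply over float elements. This is an assumption-laden illustration: the function name and signature are hypothetical, host-endian swizzling is ignored, and plain float stands in for the softfloat types the real helpers use.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply each element of n by the indexed element of m, one 16-byte
 * segment at a time. oprsz is the total vector size in bytes. */
static void sketch_mul_idx_f32(float *d, const float *n, const float *m,
                               size_t oprsz, size_t idx)
{
    size_t elements = oprsz / sizeof(float);
    size_t segment = 16 / sizeof(float);
    /* The clamp: an 8-byte D register has only two elements, so the
     * inner loop must not run over a full four-element segment. */
    size_t per_segment = segment < elements ? segment : elements;

    assert(idx < per_segment);
    for (size_t i = 0; i < elements; i += segment) {
        float scalar = m[i + idx];

        for (size_t j = 0; j < per_segment; j++) {
            d[i + j] = n[i + j] * scalar;
        }
    }
}
```

Without the clamp, a call with oprsz of 8 would also write the third and fourth elements, which lie outside the D register.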
Peter Maydell 681218b4ab target/arm: Implement fp16 for Neon VRINTX
Convert the Neon VRINTX insn to use gvec, and use this to implement
fp16 support for it.

Backports 23afcdd2511f2a3dc05bed650d27bd25cf9b2a3c
2021-03-01 17:47:25 -05:00
Peter Maydell 53aba9d900 target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
Convert the Neon VRINT-with-specified-rounding-mode insns to gvec,
and use this to implement the fp16 versions.

Backports 18725916b1438b54d6d6533980833d2251a20b7c
2021-03-01 17:44:49 -05:00
Peter Maydell eb4054d04f target/arm: Implement fp16 for Neon VCVT with rounding modes
Convert the Neon VCVT with-specified-rounding-mode instructions
to gvec, and use this to implement fp16 support for them.

Backports ca88a6efdf4ce96b646a896059f9bd324c2cebc4
2021-03-01 17:40:36 -05:00