It may be possible that the implementation runs out of
memory when exporting a key from storage or a secure
element. For example, it may not be possible to directly
move the data from storage to the caller, so the implementation
will have to buffer the material temporarily (an issue if dynamic
memory allocation scheme is used). For a large key
this is more likely to return.
It may be possible that an implementation does not
fetch key material until a command like
this is called and such an error may occur if an
off-chip secure storage dependency may have been wiped.
Note that PSA_ERROR_NOT_PERMITTED is not included
because I can't think of a scenario where you have
a valid key handle but aren't allowed to read the
attributes
If the key doesn't exist by the time this call is made
then the handle is invalid,
which means that PSA_ERROR_INVALID_HANDLE should be
returned rather than "does not exist"
It may be possible that the implementation runs out of
memory when exporting a key from storage or a secure
element. For example, it may not be possible to directly
move the data from storage to the caller, so the implementation
will have to buffer the material temporarily (an issue if dynamic
memory allocation scheme is used). For a large key
this is more likely to return.
It may be possible that an implementation does not
fetch key material until a command like
this is called and such an error may occur if an
off-chip secure storage dependency may have been wiped.
Note that PSA_ERROR_NOT_PERMITTED is not included
because I can't think of a scenario where you have
a valid key handle but aren't allowed to read the
attributes
x0-x3 are skipped such that function parameters to not have to be moved.
MULADDC_INIT and MULADDC_STOP are mostly empty because it is more
efficient to keep everything in registers (and that should easily be
possible). I considered a MULADDC_HUIT implementation, but could not
think of something that would be more efficient than basically 8
consecutive MULADDC_CORE. You could combine the loads and stores, but
it's probably more efficient to interleave them with arithmetic,
depending on the specific microarchitecture. NEON allows to do a
64x64->128 bit multiplication (and optional accumulation) in one
instruction, but is not great at handling carries.