The AES XTS self-test was using a variable len, which was declared only when CTR
was enabled. Changed the declaration of len to be conditional on CTR and XTS.
The AES OFB self-test made use of a variable `offset` but failed to have a
preprocessor condition around it, so unless CTR and CBC were enabled, the
variable would be undeclared.
In ssl_parse_encrypted_pms, some operational failures from
ssl_decrypt_encrypted_pms lead to diff being set to a value that
depended on some uninitialized unsigned char and size_t values. This didn't
affect the behavior of the program (assuming an implementation with no
trap values for size_t) because all that matters is whether diff is 0,
but Valgrind rightfully complained about the use of uninitialized
memory. Behave nicely and initialize the offending memory.
THe function `mbedtls_gf128mul_x_ble()` doesn't multiply by x, x^4, and
x^8. Update the function description to properly describe what the function
does.
mbedtls_aes_crypt_xts() currently takes a `bits_length` parameter, unlike
the other block modes. Change the parameter to accept a bytes length
instead, as the `bits_length` parameter is not actually ever used in the
current implementation.
Add a new context structure for XTS. Adjust the API for XTS to use the new
context structure, including tests suites and the benchmark program. Update
Doxgen documentation accordingly.
AES-XEX is a building block for other cryptographic standards and not yet a
standard in and of itself. We'll just provide the standardized AES-XTS
algorithm, and not AES-XEX. The AES-XTS algorithm and interface provided
can be used to perform the AES-XEX algorithm when the length of the input
is a multiple of the AES block size.
If we're unlucky with memory placement, gf128mul_table_bbe may spread over
two cache lines and this would leak b >> 63 to a cache timing attack.
Instead, take an approach that is less likely to make different memory
loads depending on the value of b >> 63 and is also unlikely to be compiled
to a condition.
XTS mode is fully known as "xor-encrypt-xor with ciphertext-stealing".
This is the generalization of the XEX mode.
This implementation is limited to an 8-bits (1 byte) boundary, which
doesn't seem to be what was thought considering some test vectors [1].
This commit comes with tests, extracted from [1], and benchmarks.
Although, benchmarks aren't really nice here, as they work with a buffer
of a multiple of 16 bytes, which isn't a challenge for XTS compared to
XEX.
[1] http://csrc.nist.gov/groups/STM/cavp/documents/aes/XTSTestVectors.zip
As seen from the first benchmark run, AES-XEX was running pourly (even
slower than AES-CBC). This commit doubles the performances of the
current implementation.
XEX mode, known as "xor-encrypt-xor", is the simple case of the XTS
mode, known as "XEX with ciphertext stealing". When the buffers to be
encrypted/decrypted have a length divisible by the length of a standard
AES block (16), XTS is exactly like XEX.
When MBEDTLS_PLATFORM_MEMORY is defined but MBEDTLS_PLATFORM_FREE_MACRO or
MBEDTLS_PLATFORM_CALLOC_MACRO are not defined then the actual functions
used to allocate and free memory are stored in function pointers.
These pointers are exposed to the caller, and it means that the caller
and the library have to share a data section.
In TF-A, we execute in a very constrained environment, where some images
are executed from ROM and other images are executed from SRAM. The
images that are executed from ROM cannot be modified. The SRAM size
is very small and we are moving libraries to the ROM that can be shared
between the different SRAM images. These SRAM images could import all the
symbols used in mbedtls, but it would create an undesirable hard binary
dependency between the different images. For this reason, all the library
functions in ROM are accesed using a jump table whose base address is
known, allowing the images to execute with different versions of the ROM.
This commit changes the function pointers to actual functions,
so that the SRAM images only have to use the new exported symbols
(mbedtls_calloc and mbedtls_free) using the jump table. In
our scenario, mbedtls_platform_set_calloc_free is called from
mbedtls_memory_buffer_alloc_init which initializes the function pointers
to the internal buffer_alloc_calloc and buffer_alloc_free functions.
No functional changes to mbedtls_memory_buffer_alloc_init.
Signed-off-by: Roberto Vargas <roberto.vargas@arm.com>
Adds error handling into mbedtls_aes_crypt_ofb for AES errors, a self-test
for the OFB mode using NIST SP 800-38A test vectors and adds a check to
potential return errors in setting the AES encryption key in the OFB test
suite.
* development: (97 commits)
Updated version number to 2.10.0 for release
Add a disabled CMAC define in the no-entropy configuration
Adapt the ARIA test cases for new ECB function
Fix file permissions for ssl.h
Add ChangeLog entry for PR#1651
Fix MicroBlaze register typo.
Fix typo in doc and copy missing warning
Fix edit mistake in cipher_wrap.c
Update CTR doc for the 64-bit block cipher
Update CTR doc for other 128-bit block ciphers
Slightly tune ARIA CTR documentation
Remove double declaration of mbedtls_ssl_list_ciphersuites
Update CTR documentation
Use zeroize function from new platform_util
Move to new header style for ALT implementations
Add ifdef for selftest in header file
Fix typo in comments
Use more appropriate type for local variable
Remove useless parameter from function
Wipe sensitive info from the stack
...
Motivation is similar to NO_UDBL_DIVISION.
The alternative implementation of 64-bit mult is straightforward and aims at
obvious correctness. Also, visual examination of the generate assembly show
that it's quite efficient with clang, armcc5 and arm-clang. However current
GCC generates fairly inefficient code for it.
I tried to rework the code in order to make GCC generate more efficient code.
Unfortunately the only way to do that is to get rid of 64-bit add and handle
the carry manually, but this causes other compilers to generate less efficient
code with branches, which is not acceptable from a side-channel point of view.
So let's keep the obvious code that works for most compilers and hope future
versions of GCC learn to manage registers in a sensible way in that context.
See https://bugs.launchpad.net/gcc-arm-embedded/+bug/1775263
- in x509_profile_check_pk_alg
- in x509_profile_check_md_alg
- in x509_profile_check_key
and in ssl_cli.c : unsigned char gets promoted to signed integer
Allowing DECRYPT with crypt_and_tag is a risk as people might fail to check
the tag correctly (or at all). So force them to use auth_decrypt() instead.
See also https://github.com/ARMmbed/mbedtls/pull/1668
When MBEDTLS_TIMING_C was not defined in config.h, but the MemSan
memory sanitizer was activated, entropy_poll.c used memset without
declaring it. Fix this by including string.h unconditionally.
As a protection against the Lucky Thirteen attack, the TLS code for
CBC decryption in encrypt-then-MAC mode performs extra MAC
calculations to compensate for variations in message size due to
padding. The amount of extra MAC calculation to perform was based on
the assumption that the bulk of the time is spent in processing
64-byte blocks, which is correct for most supported hashes but not for
SHA-384. Correct the amount of extra work for SHA-384 (and SHA-512
which is currently not used in TLS, and MD2 although no one should
care about that).
Fix IAR compiler warnings
Two warnings have been fixed:
1. code 'if( len <= 0xFFFFFFFF )' gave warning 'pointless integer comparison'.
This was fixed by wraping the condition in '#if SIZE_MAX > 0xFFFFFFFF'.
2. code 'diff |= A[i] ^ B[i];' gave warning 'the order of volatile accesses is undefined in'.
This was fixed by read the volatile data in temporary variables before the computation.
Explain IAR warning on volatile access
Consistent use of CMAKE_C_COMPILER_ID
The cast to void was motivated by the assumption that the functions only
return non-zero when passed bad arguments, but that might not be true of
alternative implementation, for example on hardware failure.
- need HW failure codes too
- re-use relevant poly codes for chachapoly to save on limited space
Values were chosen to leave 3 free slots at the end of the NET odd range.
That's what it is. So we shouldn't set a block size != 1.
While at it, move call to chachapoly_update() closer to the one for GCM, as
they are similar (AEAD).
This reduces clutter, making the functions more readable.
Also, it makes lcov see each line as covered. This is not cheating, as the
lines that were previously seen as not covered are not supposed to be reached
anyway (failing branches of the selftests).
Thanks to this and previous test suite enhancements, lcov now sees chacha20.c
and poly1305.c at 100% line coverage, and for chachapoly.c only two lines are
not covered (error returns from lower-level module that should never happen
except perhaps if an alternative implementation returns an unexpected error).
This module used (len, pointer) while (pointer, len) is more common in the
rest of the library, in particular it's what's used in the GCM API that
very comparable to it, so switch to (pointer, len) for consistency.
Note that the crypt_and_tag() and auth_decrypt() functions were already using
the same convention as GCM, so this also increases intra-module consistency.
This module used (len, pointer) while (pointer, len) is more common in the
rest of the library, in particular it's what's used in the CMAC API that is
very comparable to Poly1305, so switch to (pointer, len) for consistency.
In addition to making the APIs of the various AEAD modules more consistent
with each other, it's useful to have an auth_decrypt() function so that we can
safely check the tag ourselves, as the user might otherwise do it in an
insecure way (or even forget to do it altogether).
While the old name is explicit and aligned with the RFC, it's also very long,
so with the mbedtls_ prefix prepended we get a 31-char prefix to each
identifier, which quickly conflicts with our 80-column policy.
The new name is shorter, it's what a lot of people use when speaking about
that construction anyway, and hopefully should not introduce confusion at
it seems unlikely that variants other than 20/1305 be standardised in the
foreseeable future.
- in .h files: only put the context declaration inside the #ifdef _ALT
(this was changed in 2.9.0, ie after the original PR)
- in .c file: only leave selftest out of _ALT: even though some function are
trivial to build from other parts, alt implementors might want to go another
way about them (for efficiency or other reasons)
I refactored some code into the function mbedtls_constant_time_memcmp
in commit 7aad291 but this function is only used by GCM and
AEAD_ChaCha20_Poly1305 to check the tags. So this function is now
only enabled if either of these two ciphers is enabled.
This change permits users of the ChaCha20/Poly1305 algorithms
(and the AEAD construction thereof) to pass NULL pointers for
data that they do not need, and avoids the need to provide a valid
buffer for data that is not used.
This implementation is based off the description in RFC 7539.
The ChaCha20 code is also updated to provide a means of generating
keystream blocks with arbitrary counter values. This is used to
generated the one-time Poly1305 key in the AEAD construction.
* development: (504 commits)
Fix minor code style issues
Add the uodate to the soversion to the ChangeLog
Fix the ChangeLog for clarity, english and credit
Update version to 2.9.0
ecp: Fix binary compatibility with group ID
Changelog entry
Change accepted ciphersuite versions when parsing server hello
Remove preprocessor directives around platform_util.h include
Fix style for mbedtls_mpi_zeroize()
Improve mbedtls_platform_zeroize() docs
mbedtls_zeroize -> mbedtls_platform_zeroize in docs
Reword config.h docs for MBEDTLS_PLATFORM_ZEROIZE_ALT
Organize CMakeLists targets in alphabetical order
Organize output objs in alfabetical order in Makefile
Regenerate errors after ecp.h updates
Update ecp.h
Change variable bytes_written to header_bytes in record decompression
Update ecp.h
Update ecp.h
Update ecp.h
...
Rename to mbedtls_ssl_get_async_operation_data and
mbedtls_ssl_set_async_operation_data so that they're about
"async operation data" and not about some not-obvious "data".
When a handshake step starts an asynchronous operation, the
application needs to know which SSL connection the operation is for,
so that when the operation completes, the application can wake that
connection up. Therefore the async start callbacks need to take the
SSL context as an argument. It isn't enough to let them set a cookie
in the SSL connection, the application needs to be able to find the
right SSL connection later.
Also pass the SSL context to the other callbacks for consistency. Add
a new field to the handshake that the application can use to store a
per-connection context. This new field replaces the former
context (operation_ctx) that was created by the start function and
passed to the resume function.
Add a boolean flag to the handshake structure to track whether an
asynchronous operation is in progress. This is more robust than
relying on the application to set a non-null application context.
Change the signature of mbedtls_ssl_handshake_free again. Now take the
whole SSL context as argument and not just the configuration and the
handshake substructure.
This is in preparation for changing the asynchronous cancel callback
to take the SSL context as an argument.
In the refactoring of ssl_parse_encrypted_pms, I advertently broke the
case when decryption signalled an error, with the variable ret getting
overwritten before calculating diff. Move the calculation of diff
immediately after getting the return code to make the connection more
obvious. Also move the calculation of mask immediately after the
calculation of diff, which doesn't change the behavior, because I find
the code clearer that way.
Conflict resolution:
* ChangeLog: put the new entry from my branch in the proper place.
* include/mbedtls/error.h: counted high-level module error codes again.
* include/mbedtls/ssl.h: picked different numeric codes for the
concurrently added errors; made the new error a full sentence per
current standards.
* library/error.c: ran scripts/generate_errors.pl.
* library/ssl_srv.c:
* ssl_prepare_server_key_exchange "DHE key exchanges": the conflict
was due to style corrections in development
(4cb1f4d49c) which I merged with
my refactoring.
* ssl_prepare_server_key_exchange "For key exchanges involving the
server signing", first case, variable declarations: merged line
by line:
* dig_signed_len: added in async
* signature_len: removed in async
* hashlen: type changed to size_t in development
* hash: size changed to MBEDTLS_MD_MAX_SIZE in async
* ret: added in async
* ssl_prepare_server_key_exchange "For key exchanges involving the
server signing", first cae comment: the conflict was due to style
corrections in development (4cb1f4d49c)
which I merged with my comment changes made as part of refactoring
the function.
* ssl_prepare_server_key_exchange "Compute the hash to be signed" if
`md_alg != MBEDTLS_MD_NONE`: conflict between
ebd652fe2d
"ssl_write_server_key_exchange: calculate hashlen explicitly" and
46f5a3e9b4 "Check return codes from
MD in ssl code". I took the code from commit
ca1d742904 made on top of development
which makes mbedtls_ssl_get_key_exchange_md_ssl_tls return the
hash length.
* programs/ssl/ssl_server2.c: multiple conflicts between the introduction
of MBEDTLS_ERR_SSL_ASYNC_IN_PROGRESS and new auxiliary functions and
definitions for async support, and the introduction of idle().
* definitions before main: concurrent additions, kept both.
* main, just after `handshake:`: in the loop around
mbedtls_ssl_handshake(), merge the addition of support for
MBEDTLS_ERR_SSL_ASYNC_IN_PROGRESS and SSL_ASYNC_INJECT_ERROR_CANCEL
with the addition of the idle() call.
* main, if `opt.transport == MBEDTLS_SSL_TRANSPORT_STREAM`: take the
code from development and add a check for
MBEDTLS_ERR_SSL_ASYNC_IN_PROGRESS.
* main, loop around mbedtls_ssl_read() in the datagram case:
take the code from development and add a check for
MBEDTLS_ERR_SSL_ASYNC_IN_PROGRESS; revert to a do...while loop.
* main, loop around mbedtls_ssl_write() in the datagram case:
take the code from development and add a check for
MBEDTLS_ERR_SSL_ASYNC_IN_PROGRESS; revert to a do...while loop.
In mbedtls_ssl_get_key_exchange_md_tls1_2, add an output parameter for
the hash length. The code that calls this function can currently do
without it, but it will need the hash length in the future, when
adding support for a third-party callback to calculate the signature
of the hash.
Reorganize ssl_parse_encrypted_pms so that it first prepares the
ciphertext to decrypt, then decrypts it, then returns either the
decrypted premaster secret or random data in an appropriate manner.
This is in preparation for allowing the private key operation to be
offloaded to an external cryptographic module which can operate
asynchronously. The refactored code no longer calculates state before
the decryption that needs to be saved until after the decryption,
which allows the decryption to be started and later resumed.
Use the public key to extract metadata rather than the public key.
Don't abort early if there is no private key.
This is in preparation for allowing the private key operation to be
offloaded to an external cryptographic module.
Implement SSL asynchronous private operation for the case of a
signature operation in a server.
This is a first implementation. It is functional, but the code is not
clean, with heavy reliance on goto.
The pk layer can infer the hash length from the hash type. Calculate
it explicitly here anyway because it's needed for debugging purposes,
and it's needed for the upcoming feature allowing the signature
operation to be offloaded to an external cryptographic processor, as
the offloading code will need to know what length hash to copy.
New compile-time option MBEDTLS_SSL_ASYNC_PRIVATE_C, enabling
callbacks to replace private key operations. These callbacks allow the
SSL stack to make an asynchronous call to an external cryptographic
module instead of calling the cryptography layer inside the library.
The call is asynchronous in that it may return the new status code
MBEDTLS_ERR_SSL_ASYNC_IN_PROGRESS, in which case the SSL stack returns
and can be later called where it left off.
This commit introduces the configuration option. Later commits will
implement the feature proper.
This function is declared in ssl_internal.h, so this is not a public
API change.
This is in preparation for mbedtls_ssl_handshake_free needing to call
methods from the config structure.
In SSL, don't use mbedtls_pk_ec or mbedtls_pk_rsa on a private
signature or decryption key (as opposed to a public key or a key used
for DH/ECDH). Extract the data (it's the same data) from the public
key object instead. This way the code works even if the private key is
opaque or if there is no private key object at all.
Specifically, with an EC key, when checking whether the curve in a
server key matches the handshake parameters, rely only on the offered
certificate and not on the metadata of the private key.
* public/pr/1380:
Update ChangeLog for #1380
Generate RSA keys according to FIPS 186-4
Generate primes according to FIPS 186-4
Avoid small private exponents during RSA key generation
Change mbedtls_zeroize() implementation to use memset() instead of a
custom implementation for performance reasons. Furthermore, we would
also like to prevent as much as we can compiler optimisations that
remove zeroization code.
The implementation of mbedtls_zeroize() now uses a volatile function
pointer to memset() as suggested by Colin Percival at:
http://www.daemonology.net/blog/2014-09-04-how-to-zero-a-buffer.html
Add a new macro MBEDTLS_UTILS_ZEROIZE that allows users to configure
mbedtls_zeroize() to an alternative definition when defined. If the
macro is not defined, then mbed TLS will use the default definition of
the function.
This commit removes all the static occurrencies of the function
mbedtls_zeroize() in each of the individual .c modules. Instead the
function has been moved to utils.h that is included in each of the
modules.
The new header contains common information across various mbed TLS
modules and avoids code duplication. To start, utils.h currently only
contains the mbedtls_zeroize() function.
The specification requires that P and Q are not too close. The specification
also requires that you generate a P and stick with it, generating new Qs until
you have found a pair that works. In practice, it turns out that sometimes a
particular P results in it being very unlikely a Q can be found matching all
the constraints. So we keep the original behavior where a new P and Q are
generated every round.
The specification requires that numbers are the raw entropy (except for odd/
even) and at least 2^(nbits-0.5). If not, new random bits need to be used for
the next number. Similarly, if the number is not prime new random bits need to
be used.
Attacks against RSA exist for small D. [Wiener] established this for
D < N^0.25. [Boneh] suggests the bound should be N^0.5.
Multiple possible values of D might exist for the same set of E, P, Q. The
attack works when there exists any possible D that is small. To make sure that
the generated key is not susceptible to attack, we need to make sure we have
found the smallest possible D, and then check that D is big enough. The
Carmichael function λ of p*q is lcm(p-1, q-1), so we can apply Carmichael's
theorem to show that D = d mod λ(n) is the smallest.
[Wiener] Michael J. Wiener, "Cryptanalysis of Short RSA Secret Exponents"
[Boneh] Dan Boneh and Glenn Durfee, "Cryptanalysis of RSA with Private Key d Less than N^0.292"
Clang-Msan is known to report spurious errors when MBEDTLS_AESNI_C is
enabled, due to the use of assembly code. The error reports don't
mention AES, so they can be difficult to trace back to the use of
AES-NI. Warn about this potential problem at compile time.
Zeroing out an fd_set before calling FD_ZERO on it is in principle
useless, but without it some memory sanitizers think the fd_set is
still uninitialized after FD_ZERO (e.g. clang-msan/Glibc/x86_64 where
FD_ZERO is implemented in assembly). Make the zeroing conditional on
using a memory sanitizer.
The initialization via FD_SET is not seen by memory sanitizers if
FD_SET is implemented through assembly. Additionally zeroizing the
respective fd_set's before calling FD_SET contents the sanitizers
and comes at a negligible computational overhead.
In mbedtls_ssl_derive_keys, don't call mbedtls_md_hmac_starts in
ciphersuites that don't use HMAC. This doesn't change the behavior of
the code, but avoids relying on an uncaught error when attempting to
start an HMAC operation that hadn't been initialized.
Clarify what MBEDTLS_ERR_ECP_SIG_LEN_MISMATCH and
MBEDTLS_ERR_PK_SIG_LEN_MISMATCH mean. Add comments to highlight that
this indicates that a valid signature is present, unlike other error
codes. See
https://github.com/ARMmbed/mbedtls/pull/1149#discussion_r178130705
Conflict resolution:
* ChangeLog
* tests/data_files/Makefile: concurrent additions, order irrelevant
* tests/data_files/test-ca.opensslconf: concurrent additions, order irrelevant
* tests/scripts/all.sh: one comment change conflicted with a code
addition. In addition some of the additions in the
iotssl-1381-x509-verify-refactor-restricted branch need support for
keep-going mode, this will be added in a subsequent commit.
The relevant ASN.1 definitions for a PKCS#8 encoded Elliptic Curve key are:
PrivateKeyInfo ::= SEQUENCE {
version Version,
privateKeyAlgorithm PrivateKeyAlgorithmIdentifier,
privateKey PrivateKey,
attributes [0] IMPLICIT Attributes OPTIONAL
}
AlgorithmIdentifier ::= SEQUENCE {
algorithm OBJECT IDENTIFIER,
parameters ANY DEFINED BY algorithm OPTIONAL
}
ECParameters ::= CHOICE {
namedCurve OBJECT IDENTIFIER
-- implicitCurve NULL
-- specifiedCurve SpecifiedECDomain
}
ECPrivateKey ::= SEQUENCE {
version INTEGER { ecPrivkeyVer1(1) } (ecPrivkeyVer1),
privateKey OCTET STRING,
parameters [0] ECParameters {{ NamedCurve }} OPTIONAL,
publicKey [1] BIT STRING OPTIONAL
}
Because of the two optional fields, there are 4 possible variants that need to
be parsed: no optional fields, only parameters, only public key, and both
optional fields. Previously mbedTLS was unable to parse keys with "only
parameters". Also, only "only public key" was tested. There was a test for "no
optional fields", but it was labelled incorrectly as SEC.1 and not run because
of a great renaming mixup.
check-names.sh reserves the prefix MBEDTLS_ for macros defined in
config.h so this name (or check-names.sh) had to change.
This is also more flexible because it allows for platforms that don't have
an EINTR equivalent or have multiple such values.
Also, introduce MBEDTLS_EINTR locally in net_sockets.c
for the platform-dependent return code macro used by
the `select` call to indicate that the poll was interrupted
by a signal handler: On Unix, the corresponding macro is EINTR,
while on Windows, it's WSAEINTR.
If the select UNIX system call is interrupted by a signal handler,
it is not automatically restarted but returns EINTR. This commit
modifies the use of select in mbedtls_net_poll from net_sockets.c
to retry the select call in this case.
Found by running:
CC=clang cmake -D CMAKE_BUILD_TYPE="Check"
tests/scripts/depend-pkalgs.pl
(Also tested with same command but CC=gcc)
Another PR will address improving all.sh and/or the depend-xxx.pl scripts
themselves to catch this kind of thing.
library\x509_crt.c(2137): warning C4267: 'function' : conversion from 'size_t' to 'int', possible loss of data
library\x509_crt.c(2265): warning C4267: 'function' : conversion from 'size_t' to 'int', possible loss of data
* development: (557 commits)
Add attribution for #1351 report
Adapt version_features.c
Note incompatibility of truncated HMAC extension in ChangeLog
Add LinkLibraryDependencies to VS2010 app template
Add ChangeLog entry for PR #1382
MD: Make deprecated functions not inline
Add ChangeLog entry for PR #1384
Have Visual Studio handle linking to mbedTLS.lib internally
Mention in ChangeLog that this fixes#1351
Add issue number to ChangeLog
Note in the changelog that this fixes an interoperability issue.
Style fix in ChangeLog
Add ChangeLog entries for PR #1168 and #1362
Add ChangeLog entry for PR #1165
ctr_drbg: Typo fix in the file description comment.
dhm: Fix typo in RFC 5114 constants
tests_suite_pkparse: new PKCS8-v2 keys with PRF != SHA1
data_files/pkcs8-v2: add keys generated with PRF != SHA1
tests/pkcs5/pbkdf2_hmac: extend array to accommodate longer results
tests/pkcs5/pbkdf2_hmac: add unit tests for additional SHA algorithms
...
Fix warnings from gcc -O -Wall about `ret` used uninitialized in
CMAC selftest auxiliary functions. The variable was indeed
uninitialized if the function was called with num_tests=0 (which
never happens).
Use specific instructions for moving bytes around in a word. This speeds
things up, and as a side-effect, slightly lowers code size.
ARIA_P3 and ARIA_P1 are now 1 single-cycle instruction each (those
instructions are available in all architecture versions starting from v6-M).
Note: ARIA_P3 was already translated to a single instruction by Clang 3.8 and
armclang 6.5, but not arm-gcc 5.4 nor armcc 5.06.
ARIA_P2 is already efficiently translated to the minimal number of
instruction (1 in ARM mode, 2 in thumb mode) by all tested compilers
Manually compiled and inspected generated code with the following compilers:
arm-gcc 5.4, clang 3.8, armcc 5.06 (with and without --gnu), armclang 6.5.
Size reduction (arm-none-eabi-gcc -march=armv6-m -mthumb -Os): 5288 -> 5044 B
Effect on executing time of self-tests on a few boards:
FRDM-K64F (Cortex-M4): 444 -> 385 us (-13%)
LPC1768 (Cortex-M3): 488 -> 432 us (-11%)
FRDM-KL64Z (Cortex-M0): 1429 -> 1134 us (-20%)
Measured using a config.h with no cipher mode and the following program with
aria.c and aria.h copy-pasted to the online compiler:
#include "mbed.h"
#include "aria.h"
int main() {
Timer t;
t.start();
int ret = mbedtls_aria_self_test(0);
t.stop();
printf("ret = %d; time = %d us\n", ret, t.read_us());
}
(A similar commit for Arm follows.)
Use specific instructions for moving bytes around in a word. This speeds
things up, and as a side-effect, slightly lowers code size.
ARIA_P3 (aka reverse byte order) is now 1 instruction on x86, which speeds up
key schedule. (Clang 3.8 finds this but GCC 5.4 doesn't.)
I couldn't find an Intel equivalent of ARM's ret16 (aka ARIA_P1), so I made it
two instructions, which is still much better than the code generated with
the previous mask-shift-or definition, and speeds up en/decryption. (Neither
Clang 3.8 nor GCC 5.4 find this.)
Before:
O aria.o ins
s 7976 43,865
2 10520 37,631
3 13040 28,146
After:
O aria.o ins
s 7768 33,497
2 9816 28,268
3 11432 20,829
For measurement method, see previous commit:
"aria: turn macro into static inline function"
This decreases the size with -Os by nearly 1k while
not hurting performance too much with -O2 and -O3
Before:
O aria.o ins
s 8784 41,408
2 11112 37,001
3 13096 27,438
After:
O aria.o ins
s 7976 43,865
2 10520 37,631
3 13040 28,146
(See previous commit for measurement details.)
Besides documenting types better and so on, this give the compiler more room
to optimise either for size or performance.
Here are some before/after measurements of:
- size of aria.o in bytes (less is better)
- instruction count for the selftest function (less is better)
with various -O flags.
Before:
O aria.o ins
s 10896 37,256
2 11176 37,199
3 12248 27,752
After:
O aria.o ins
s 8784 41,408
2 11112 37,001
3 13096 27,438
The new version allows the compiler to reach smaller size with -Os while
maintaining (actually slightly improving) performance with -O2 and -O3.
Measurements were done on x86_64 (but since this is mainly about inlining
code, this should transpose well to other platforms) using the following
helper program and script, after disabling CBC, CFB and CTR in config.h, in
order to focus on the core functions.
==> st.c <==
#include "mbedtls/aria.h"
int main( void ) {
return mbedtls_aria_self_test( 0 );
}
==> p.sh <==
#!/bin/sh
set -eu
ccount () {
(
valgrind --tool=callgrind --dump-line=no --callgrind-out-file=/dev/null --collect-atstart=no --toggle-collect=main $1
) 2>&1 | sed -n -e 's/.*refs: *\([0-9,]*\)/\1/p'
}
printf "O\taria.o\tins\n"
for O in s 2 3; do
GCC="gcc -Wall -Wextra -Werror -Iinclude"
$GCC -O$O -c library/aria.c
$GCC -O1 st.c aria.o -o st
./st
SIZE=$( du -b aria.o | cut -f1 )
INS=$( ccount ./st )
printf "$O\t$SIZE\t$INS\n"
done
We're not absolutely consistent in the rest of the library, but we tend to use
C99-style comments less often.
Change to use C89-style comments everywhere except for end-of-line comments
Those suites were defined in ciphersuite_definitions[] but not included in
ciphersuite_preference[] which meant they couldn't be negotiated unless
explicitly added by the user. Add them so that they're usable by default like
any other suite.