The race goes this way:
1. ssl_recv() succeeds (ie no signal received yet)
2. processing the message leads to aborting handshake with ret != 0
3. reset ret if we were signaled
4. print error if ret is still non-zero
5. go back to net_accept() which can be interrupted by a signal
We print the error message only if the signal is received between steps 3 and
5, not when it arrives between steps 1 and 3.
This can cause failures in ssl-opt.sh where we check for the presence of "Last
error was..." in the server's output: if we perform step 2, the client will be
notified and exit, then ssl-opt.sh will send SIGTERM to the server, but if it
didn't get a chance to run and pass step 3 in the meantime, we're in trouble.
The purpose of step 3 was to avoid spurious "Last error" messages in the
output so that ssl-opt.sh can check for a successful run by the absence of
that message. However, it is enough to suppress that message when the last
error we get is the one we expect from being interrupted by a signal - doing
more could hide real errors.
Also, improve the messages printed when interrupted to make it easier to
distinguish the two cases - this could be used in a testing script wanted to
check that the server doesn't see the client as disconnecting unexpectedly.
Restructed test suite helper and main code to support tests suite helper
functions, changed C++ comments to C-style, and made the generated
source code more navigable.
Added to generate_code.pl:
- support for per test suite helper functions
- description of the structure of the files the script uses to construct
the test suite file
- delimiters through the source code to make the machine generated code
easier to understand
The corner cases fixed include:
* Allocating a buffer of size 0. With this change, the allocator now
returns a NULL pointer in this case. Note that changes in pem.c and
x509_crl.c were required to fix tests that did not work under this
assumption.
* Initialising the allocator with less memory than required for headers.
* Fix header chain checks for uninitialised allocator.
Fix compilation error on Mingw32 when `_TRUNCATE` is defined. Use
`_TRUNCATE` only if `__MINGW32__` not defined. Fix suggested by
Thomas Glanzmann and Nick Wilson on issue #355
If lsof is not available, wait_server_start uses a fixed timeout,
which can trigger a race condition if the timeout turns out to be too
short. Emit a warning so that we know this is going on from the test
logs.
- Some of the CI machines don't have lsof installed yet, so rely on an sleeping
an arbitrary number of seconds while the server starts. We're seeing
occasional failures with the current delay because the CI machines are highly
loaded, which seems to indicate the current delay is not quite enough, but
hopefully not to far either, so double it.
- While at it, also double the watchdog delay: while I don't remember seeing
much failures due to client timeout, this change doesn't impact normal
running time of the script, so better err on the safe side.
These changes don't affect the test and should only affect the false positive
rate coming from the test framework in those scripts.
The SHA-256 / SHA-512 context used for entropy mixing in entropy.c
was previously reset by zeroization. The commit replaces this by
a pair of calls to `mbedtls_shaxxx_init` and `mbedtls_shaxxx_free`
which is safe also for alternative implementations of SHA-256 or
SHA-512 for which zeroization might not be a proper reset.
The entropy context contains a SHA-256 or SHA-512 context for entropy
mixing, but doesn't initialize / free this context properly in the
initialization and freeing functions `mbedtls_entropy_init` and
`mbedtls_entropy_free` through a call to `mbedtls_sha{256/512}_init`
resp. `mbedtls_sha{256/512}_free`. Instead, only a zeroization of the
entire entropy structure is performed. This doesn't lead to problems
for the current software implementations of SHA-256 and SHA-512 because
zeroization is proper initialization for them, but it may (and does)
cause problems for alternative implementations of SHA-256 and SHA-512
that use context structures that cannot be properly initialized through
zeroization. This commit fixes this. Found and fix suggested by ccli8.
1) The MPI test for prime generation missed a return value
check for a call to `mbedtls_mpi_shift_r`. This is neither
critical nor new but should be fixed.
2) The RSA keygeneration example program contained code
initializing an RSA context after a potentially failing
call to CTR DRBG initialization, leaving the corresponding
RSA context free call in the cleanup section orphaned.
The commit fixes this by moving the initializtion of the
RSA context prior to the first potentially failing call.
* mbedtls-2.1:
selftest: fix build error in some configurations
Timing self test: shorten redundant tests
Timing self test: increased duration
Timing self test: increased tolerance
selftest: allow excluding a subset of the tests
selftest: allow running a subset of the tests
selftest: fixed an erroneous return code
selftest: refactor to separate the list of tests from the logic
Timing self test: print some diagnosis information
mbedtls_timing_get_timer: don't use uninitialized memory
timing interface documentation: minor clarifications
Timing: fix mbedtls_set_alarm(0) on Unix/POSIX
* public/pr/1223:
selftest: fix build error in some configurations
Timing self test: shorten redundant tests
Timing self test: increased duration
Timing self test: increased tolerance
selftest: allow excluding a subset of the tests
selftest: allow running a subset of the tests
selftest: fixed an erroneous return code
selftest: refactor to separate the list of tests from the logic
Timing self test: print some diagnosis information
mbedtls_timing_get_timer: don't use uninitialized memory
timing interface documentation: minor clarifications
Timing: fix mbedtls_set_alarm(0) on Unix/POSIX
Increase the duration of the self test, otherwise it tends to fail on
a busy machine even with the recently upped tolerance. But run the
loop only once, it's enough for a simple smoke test.
mbedtls_timing_self_test fails annoyingly often when running on a busy
machine such as can be expected of a continous integration system.
Increase the tolerances in the delay test, to reduce the chance of
failures that are only due to missing a deadline on a busy machine.