While these tests and the issue with it are pre-existing:
- we previously didn't understand that the issue was an openssl bug
- failures seem to have become more frequent since the recent changes
So let's disable these fragile tests in order to get a clean CI. We still have
the tests against gnutls-serv for interop testing.
While making the initial commit, I thought $OPENSSL_LEGACY was not affect by
this bug, but it turns out I was wrong. All versions of OpenSSL installed on
the CI are. Therefore, the corresponding tests are disabled for the same
reason as the gnutls-cli tests above it.
This commit is only about the tests that were added in the recent
fragmentation work. One of those two tests had a particularly
annoying mode of failure: it failed consistently with seed=1 (use in the
release version of all.sh), once #1951 was applied. This has nothing
particular to do with #1951, except that by changing retransmission behaviour
1951 made the proxy run into a path that triggered the OpenSSL bug with this
seed, while it previously did that only with other seeds.
Other 3d interop test are also susceptible to triggering this OpenSSL bug or
others (or bugs in GnuTLS), but they are left untouched by this commit as:
- they were pre-existing to the recent DTLS branches;
- they don't seem to have the particularly annoying seed=1 mode of failure.
However it's probably desirable to do something about them at some point in
the future.
This commit adds a test to ssl-opt.sh which exercises the behavior
of the library in the situation where a single proper fragment
of a future handshake message is received prior to the next
expected handshake message (concretely, the client receives
the first fragment of the server's Certificate message prior
to the server's ServerHello).
This commit adds two builds to all.sh which use a value of
MBEDTLS_SSL_DTLS_MAX_BUFFERING that allows to run the
reordering tests in ssl-opt.sh introduced in the last commit.
This commit adds tests to ssl-opt.sh which trigger code-paths
responsible for freeing future buffered messages when the buffering
limitations set by MBEDTLS_SSL_DTLS_MAX_BUFFERING don't allow the
next expected message to be reassembled.
These tests only work for very specific ranges of
MBEDTLS_SSL_DTLS_MAX_BUFFERING and will therefore be skipped
on a run of ssl-opt.sh in ordinary configurations.
This commit adds functions requires_config_value_at_most()
and requires_config_value_at_least() which can be used to
only run tests when a numerical value from config.h
(e.g. MBEDTLS_SSL_IN_CONTENT_LEN) is within a certain range.
The negotiated MFL is always the one suggested by the client, even
if the server has a smaller MFL configured locally. Hence, in the test
where the client asks for an MFL of 4096 bytes while the server locally
has an MFL of 512 bytes configured, the client will still send datagrams
of up to ~4K size.
Depending on the settings of the local machine, gnutls-cli will either try
IPv4 or IPv6 when trying to connect to localhost. With TLS, whatever it tries
first, it will notice if any failure happens and try the other protocol if
necessary. With DTLS it can't do that. Unfortunately for now there isn't
really any good way to specify an address and hostname independently, though
that might come soon: https://gitlab.com/gnutls/gnutls/issues/344
A work around is to specify an address directly and then use --insecure to
ignore certificate hostname mismatch; that is OK for tests that are completely
unrelated to certificate verification (such as the recent fragmenting tests)
but unacceptable for others.
For that reason, don't specify a default hostname for gnutls-cli, but instead
let each test choose between `--insecure 127.0.0.1` and `localhost` (or
`--insecure '::1'` if desired).
Alternatives include:
- having test certificates with 127.0.0.1 as the hostname, but having an IP as
the CN is unusual, and we would need to change our test certs;
- have our server open two sockets under the hood and listen on both IPv4 and
IPv6 (that's what gnutls-serv does, and IMO it's a good thing) but that
obviously requires development and testing (esp. for windows compatibility)
- wait for a newer version of GnuTLS to be released, install it on the CI and
developer machines, and use that in all tests - quite satisfying but can't
be done now (and puts stronger requirements on test environment).
From Hanno:
When a server replies to a cookieless ClientHello with a HelloVerifyRequest,
it is supposed to reset the connection and wait for a subsequent ClientHello
which includes the cookie from the HelloVerifyRequest.
In testing environments, it might happen that the reset of the server
takes longer than for the client to replying to the HelloVerifyRequest
with the ClientHello+Cookie. In this case, the ClientHello gets lost
and the client will need retransmit. This may happen even if the underlying
datagram transport is reliable.
This commit continues commit 47db877 by removing resend guards in the
ssl-opt.sh tests 'DTLS fragmenting: proxy MTU, XXX' which sometimes made
the tests fail in case the log showed a resend from the client.
See 47db877 for more information.
When a server replies to a cookieless ClientHello with a HelloVerifyRequest,
it is supposed to reset the connection and wait for a subsequent ClientHello
which includes the cookie from the HelloVerifyRequest.
In testing environments, it might happen that the reset of the server
takes longer than for the client to replying to the HelloVerifyRequest
with the ClientHello+Cookie. In this case, the ClientHello gets lost
and the client will need retransmit. This may happen even if the underlying
datagram transport is reliable.
This commit removes a guard in the ssl-opt.sh test
'DTLS fragmenting: proxy MTU, resumed handshake' which made
the test fail in case the log showed a resend from the client.
We previously observed random-looking failures from this test. I think they
were caused by a race condition where the client tries to reconnect while the
server is still closing the connection and has not yet returned to an
accepting state. In that case, the server would fail to see and reply to the
ClientHello, and the client would have to resend it.
I believe logs of failing runs are compatible with this interpretation:
- the proxy logs show the new ClientHello and the server's closing Alert are
sent the same millisecond.
- the client logs show the server's closing Alert is received after the new
handshake has been started (discarding message from wrong epoch).
The attempted fix is for the client to wait a bit before reconnecting, which
should vastly enhance the probability of the server reaching its accepting
state before the client tries to reconnect. The value of 1 second is arbitrary
but should be more than enough even on loaded machines.
The test was run locally 100 times in a row on a slightly loaded machine (an
instance of all.sh running in parallel) without any failure after this fix.
Use the same values as other 3d tests: this makes the test hopefully a bit
faster than the default values, while not increasing the failure rate.
While at it:
- adjust "needs_more_time" setting for 3d interop tests (we can't set the
timeout values for other implementations, so the test might be slow)
- fix some supposedly DTLS 1.0 test that were using dtls1_2 on the command
line
Now that the UDP proxy has the ability to delay specific
handshake message on the client and server side, use
this to rewrite the reordering tests and thereby make
them independent on the choice of PRNG used by the proxy
(which is not stable across platforms).