Fix possible close_notify/ClientHello confusion

The ssl-opt.sh test cases using session resumption tend to fail occasionally
on the CI due to a race condition in how ssl_server2 and ssl_client2 handle
the reconnection cycle.

The server does the following in order:
- S1 send application data
- S2 send a close_notify alert
- S3 close the client socket
- S4 wait for a "new connection" (actually a new datagram)
- S5 start a handshake

The client does the following in order:
- C1 wait for and read application data from the server
- C2 send a close_notify alert
- C3 close the server socket
- C4 reset session data and re-open a server socket
- C5 start a handshake

If the client has been able to send the close_notify (C2) and if has been
delivered to the server before if closes the client socket (S3), when the
server reaches S4, the datagram that we start the new connection will be the
ClientHello and everything will be fine.

However if S3 wins the race and happens before the close_notify is delivered,
in S4 the close_notify is what will be seen as the first datagram in a new
connection, and then in S5 this will rightfully be rejected as not being a
valid ClientHello and the server will close the connection (and go wait for
another one). The client will then fail to read from the socket and exit
non-zero and the ssl-opt.sh harness will correctly report this as a failure.

In order to avoid this race condition in test using ssl_client2 and
ssl_server2, this commits introduces a new command-line option
skip_close_notify to ssl_client2 and uses it in all ssl-opt.sh tests that use
session resumption with DTLS and ssl_server2.

This works because ssl_server2 knows how many messages it expects in each
direction and in what order, and closes the connection after that rather than
relying on close_notify (which is also why there was a race in the first
place).

Tests that use another server (in practice there are two of them, using
OpenSSL as a server) wouldn't work with skip_close_notify, as the server won't
close the connection until the client sends a close_notify, but for the same
reason they don't need it (there is no race between receiving close_notify and
closing as the former is the cause of the later).

An alternative approach would be to make ssl_server2 keep the connection open
until it receives a close_notify. Unfortunately it creates problems for tests
where we simulate a lossy network, as the close_notify could be lost (and the
client can't retransmit it). We could modify udp_proxy with an option to never
drop alert messages, but when TLS 1.3 comes that would no longer work as the
type of messages will be encrypted.

Signed-off-by: Manuel Pégourié-Gonnard <manuel.pegourie-gonnard@arm.com>
This commit is contained in:
Manuel Pégourié-Gonnard 2020-02-17 11:04:33 +01:00
parent 0fce215851
commit 498e632b08
2 changed files with 46 additions and 21 deletions

View file

@ -115,6 +115,7 @@ int main( void )
#define DFL_FALLBACK -1
#define DFL_EXTENDED_MS -1
#define DFL_ETM -1
#define DFL_SKIP_CLOSE_NOTIFY 0
#define GET_REQUEST "GET %s HTTP/1.0\r\nExtra-header: "
#define GET_REQUEST_END "\r\n\r\n"
@ -278,6 +279,7 @@ int main( void )
" options: 1 (level-triggered, implies nbio=1),\n" \
" read_timeout=%%d default: 0 ms (no timeout)\n" \
" max_resend=%%d default: 0 (no resend on timeout)\n" \
" skip_close_notify=%%d default: 0 (send close_notify)\n" \
"\n" \
USAGE_DTLS \
"\n" \
@ -376,6 +378,7 @@ struct options
int dgram_packing; /* allow/forbid datagram packing */
int extended_ms; /* negotiate extended master secret? */
int etm; /* negotiate encrypt then mac? */
int skip_close_notify; /* skip sending the close_notify alert */
} opt;
int query_config( const char *config );
@ -653,6 +656,7 @@ int main( int argc, char *argv[] )
opt.extended_ms = DFL_EXTENDED_MS;
opt.etm = DFL_ETM;
opt.dgram_packing = DFL_DGRAM_PACKING;
opt.skip_close_notify = DFL_SKIP_CLOSE_NOTIFY;
for( i = 1; i < argc; i++ )
{
@ -993,6 +997,12 @@ int main( int argc, char *argv[] )
{
return query_config( q );
}
else if( strcmp( p, "skip_close_notify" ) == 0 )
{
opt.skip_close_notify = atoi( q );
if( opt.skip_close_notify < 0 || opt.skip_close_notify > 1 )
goto usage;
}
else
goto usage;
}
@ -2032,10 +2042,25 @@ close_notify:
mbedtls_printf( " . Closing the connection..." );
fflush( stdout );
/* No error checking, the connection might be closed already */
do ret = mbedtls_ssl_close_notify( &ssl );
while( ret == MBEDTLS_ERR_SSL_WANT_WRITE );
ret = 0;
/*
* Most of the time sending a close_notify before closing is the right
* thing to do. However, when the server already knows how many messages
* are expected and closes the connection by itself, this alert becomes
* redundant. Sometimes with DTLS this redundancy becomes a problem by
* leading to a race condition where the server might close the connection
* before seeing the alert, and since UDP is connection-less when the
* alert arrives it will be seen as a new connection, which will fail as
* the alert is clearly not a valid ClientHello. This may cause spurious
* failures in tests that use DTLS and resumption with ssl_server2 in
* ssl-opt.sh, avoided by enabling skip_close_notify client-side.
*/
if( opt.skip_close_notify == 0 )
{
/* No error checking, the connection might be closed already */
do ret = mbedtls_ssl_close_notify( &ssl );
while( ret == MBEDTLS_ERR_SSL_WANT_WRITE );
ret = 0;
}
mbedtls_printf( " done\n" );

View file

@ -1575,7 +1575,7 @@ run_test "Session resume using tickets: openssl client" \
run_test "Session resume using tickets, DTLS: basic" \
"$P_SRV debug_level=3 dtls=1 tickets=1" \
"$P_CLI debug_level=3 dtls=1 tickets=1 reconnect=1" \
"$P_CLI debug_level=3 dtls=1 tickets=1 reconnect=1 skip_close_notify=1" \
0 \
-c "client hello, adding session ticket extension" \
-s "found session ticket extension" \
@ -1589,7 +1589,7 @@ run_test "Session resume using tickets, DTLS: basic" \
run_test "Session resume using tickets, DTLS: cache disabled" \
"$P_SRV debug_level=3 dtls=1 tickets=1 cache_max=0" \
"$P_CLI debug_level=3 dtls=1 tickets=1 reconnect=1" \
"$P_CLI debug_level=3 dtls=1 tickets=1 reconnect=1 skip_close_notify=1" \
0 \
-c "client hello, adding session ticket extension" \
-s "found session ticket extension" \
@ -1603,7 +1603,7 @@ run_test "Session resume using tickets, DTLS: cache disabled" \
run_test "Session resume using tickets, DTLS: timeout" \
"$P_SRV debug_level=3 dtls=1 tickets=1 cache_max=0 ticket_timeout=1" \
"$P_CLI debug_level=3 dtls=1 tickets=1 reconnect=1 reco_delay=2" \
"$P_CLI debug_level=3 dtls=1 tickets=1 reconnect=1 skip_close_notify=1 reco_delay=2" \
0 \
-c "client hello, adding session ticket extension" \
-s "found session ticket extension" \
@ -1735,7 +1735,7 @@ run_test "Session resume using cache: openssl server" \
run_test "Session resume using cache, DTLS: tickets enabled on client" \
"$P_SRV dtls=1 debug_level=3 tickets=0" \
"$P_CLI dtls=1 debug_level=3 tickets=1 reconnect=1" \
"$P_CLI dtls=1 debug_level=3 tickets=1 reconnect=1 skip_close_notify=1" \
0 \
-c "client hello, adding session ticket extension" \
-s "found session ticket extension" \
@ -1749,7 +1749,7 @@ run_test "Session resume using cache, DTLS: tickets enabled on client" \
run_test "Session resume using cache, DTLS: tickets enabled on server" \
"$P_SRV dtls=1 debug_level=3 tickets=1" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 skip_close_notify=1" \
0 \
-C "client hello, adding session ticket extension" \
-S "found session ticket extension" \
@ -1763,7 +1763,7 @@ run_test "Session resume using cache, DTLS: tickets enabled on server" \
run_test "Session resume using cache, DTLS: cache_max=0" \
"$P_SRV dtls=1 debug_level=3 tickets=0 cache_max=0" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 skip_close_notify=1" \
0 \
-S "session successfully restored from cache" \
-S "session successfully restored from ticket" \
@ -1772,7 +1772,7 @@ run_test "Session resume using cache, DTLS: cache_max=0" \
run_test "Session resume using cache, DTLS: cache_max=1" \
"$P_SRV dtls=1 debug_level=3 tickets=0 cache_max=1" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 skip_close_notify=1" \
0 \
-s "session successfully restored from cache" \
-S "session successfully restored from ticket" \
@ -1781,7 +1781,7 @@ run_test "Session resume using cache, DTLS: cache_max=1" \
run_test "Session resume using cache, DTLS: timeout > delay" \
"$P_SRV dtls=1 debug_level=3 tickets=0" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 reco_delay=0" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 skip_close_notify=1 reco_delay=0" \
0 \
-s "session successfully restored from cache" \
-S "session successfully restored from ticket" \
@ -1790,7 +1790,7 @@ run_test "Session resume using cache, DTLS: timeout > delay" \
run_test "Session resume using cache, DTLS: timeout < delay" \
"$P_SRV dtls=1 debug_level=3 tickets=0 cache_timeout=1" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 reco_delay=2" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 skip_close_notify=1 reco_delay=2" \
0 \
-S "session successfully restored from cache" \
-S "session successfully restored from ticket" \
@ -1799,7 +1799,7 @@ run_test "Session resume using cache, DTLS: timeout < delay" \
run_test "Session resume using cache, DTLS: no timeout" \
"$P_SRV dtls=1 debug_level=3 tickets=0 cache_timeout=0" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 reco_delay=2" \
"$P_CLI dtls=1 debug_level=3 tickets=0 reconnect=1 skip_close_notify=1 reco_delay=2" \
0 \
-s "session successfully restored from cache" \
-S "session successfully restored from ticket" \
@ -3348,19 +3348,19 @@ run_test "Event-driven I/O, DTLS: ticket + client auth" \
run_test "Event-driven I/O, DTLS: ticket + client auth + resume" \
"$P_SRV dtls=1 event=1 tickets=1 auth_mode=required" \
"$P_CLI dtls=1 event=1 tickets=1 reconnect=1" \
"$P_CLI dtls=1 event=1 tickets=1 reconnect=1 skip_close_notify=1" \
0 \
-c "Read from server: .* bytes read"
run_test "Event-driven I/O, DTLS: ticket + resume" \
"$P_SRV dtls=1 event=1 tickets=1 auth_mode=none" \
"$P_CLI dtls=1 event=1 tickets=1 reconnect=1" \
"$P_CLI dtls=1 event=1 tickets=1 reconnect=1 skip_close_notify=1" \
0 \
-c "Read from server: .* bytes read"
run_test "Event-driven I/O, DTLS: session-id resume" \
"$P_SRV dtls=1 event=1 tickets=0 auth_mode=none" \
"$P_CLI dtls=1 event=1 tickets=0 reconnect=1" \
"$P_CLI dtls=1 event=1 tickets=0 reconnect=1 skip_close_notify=1" \
0 \
-c "Read from server: .* bytes read"
@ -3372,7 +3372,7 @@ run_test "Event-driven I/O, DTLS: session-id resume" \
run_test "Event-driven I/O, DTLS: session-id resume, UDP packing" \
-p "$P_PXY pack=50" \
"$P_SRV dtls=1 event=1 tickets=0 auth_mode=required" \
"$P_CLI dtls=1 event=1 tickets=0 reconnect=1" \
"$P_CLI dtls=1 event=1 tickets=0 reconnect=1 skip_close_notify=1" \
0 \
-c "Read from server: .* bytes read"
@ -6378,7 +6378,7 @@ run_test "DTLS fragmenting: proxy MTU, resumed handshake" \
key_file=data_files/server8.key \
hs_timeout=10000-60000 \
force_ciphersuite=TLS-ECDHE-ECDSA-WITH-AES-128-GCM-SHA256 \
mtu=1450 reconnect=1 reco_delay=1" \
mtu=1450 reconnect=1 skip_close_notify=1 reco_delay=1" \
0 \
-S "autoreduction" \
-s "found fragmented DTLS handshake message" \
@ -7225,7 +7225,7 @@ run_test "DTLS proxy: 3d, min handshake, resumption" \
"$P_SRV dtls=1 dgram_packing=0 hs_timeout=500-10000 tickets=0 auth_mode=none \
psk=abc123 debug_level=3" \
"$P_CLI dtls=1 dgram_packing=0 hs_timeout=500-10000 tickets=0 psk=abc123 \
debug_level=3 reconnect=1 read_timeout=1000 max_resend=10 \
debug_level=3 reconnect=1 skip_close_notify=1 read_timeout=1000 max_resend=10 \
force_ciphersuite=TLS-PSK-WITH-AES-128-CCM-8" \
0 \
-s "a session has been resumed" \
@ -7239,7 +7239,7 @@ run_test "DTLS proxy: 3d, min handshake, resumption, nbio" \
"$P_SRV dtls=1 dgram_packing=0 hs_timeout=500-10000 tickets=0 auth_mode=none \
psk=abc123 debug_level=3 nbio=2" \
"$P_CLI dtls=1 dgram_packing=0 hs_timeout=500-10000 tickets=0 psk=abc123 \
debug_level=3 reconnect=1 read_timeout=1000 max_resend=10 \
debug_level=3 reconnect=1 skip_close_notify=1 read_timeout=1000 max_resend=10 \
force_ciphersuite=TLS-PSK-WITH-AES-128-CCM-8 nbio=2" \
0 \
-s "a session has been resumed" \