An interesting SSL implementation bug: CVE-2013-5914

Pascal Cuoq - 23rd Feb 2014

SSL in the news

SSL is a protocol for point-to-point confidential and authenticated communication over an insecure medium. It is the protocol behind HTTPS, among many other uses. In an Internet-connected system, the SSL implementation stands at the frontier between the system and the hostile outside world. For this reason, SSL implementation flaws are a prime target for attacks.

An ordinary bug

A rather banal SSL implementation bug was revealed over the weekend. A duplicated line in a crucial, if ordinary-looking, sequence of validation operations means that some of the validation steps are not taken:

    ...
    if ((err = ReadyHash(&SSLHashSHA1  &hashCtx)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx  &clientRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx  &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx  &signedParams)) != 0)
        goto fail;
        goto fail; // <-------------------------------------------------
    if ((err = SSLHashSHA1.final(&hashCtx  &hashOut)) != 0)
        goto fail;
    ...
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

Note that the second consecutive goto fail is executed with variable err containing the result of the last validation operation 0 indicating success. This value is returned as-is by the function; the caller is therefore mislead into believing that the validation succeeded despite some of the validation steps not having been executed.

Because C lacks an exception mechanism the above is an often-seen programming pattern. The programming style here can hardly be blamed: this is how the best C programs are written. Except of course for the extraneous goto.

The SSL implementation and thus the bug in question are found in tens of millions of iOS devices with a few additional Mac OS X computers on top of that. The scope of the security problem caused by this bug and the obviousness of the issue when pointed out have together lead to much commentary on Twitter and other popular discussion forums.

So reaction. Very noise

“I would never have had this problem because I know that goto is bad” some commenters claim. I wish I was caricaturing but I am unfortunately only paraphrasing and combining several public reactions into one.

“I would never have had this problem because I use braces” essentially state others. Certainly the root cause of the problem must have been that the developer who introduced the bug was confused about the meaning of if (c) stmt1; stmt2;. Just look how ey indented it!

These two often-heard remarks strongly suggest to use brace-less control flow or the presence of goto as predictors of defects in C code. I am sure this will be a fruitful research direction. Someone from the software engineering academic community should look into it.

“Just adding a single line of code can bring a system to its knees” reminds Arie van Deursen. True true an important lesson there. We should all insert the following line somewhere in our respective codebases from time to time and take a good look at the effect it has in remembrance.

It is Ariane 5 all over again! Worse instead of just the makers of formal verification systems everyone seems to have a scheme to prevent the problem they already know is there.

An interesting bug in a different SSL implementation

The problem in most of the armchair analysis that has been going on on the Internet lies in the two following questions: how many more bugs in security-critical C code does the proposed remedy (be it more braces fewer gotos …) find? How many time-costly tedious soul-crushingly boring false positives does it uncover?

Since commenters have been generalizing on the basis of a population of one sample I feel no shame in presenting my own single example raising the total number of scrutinized bugs to two. Note that for us to make statistically sound arguments we will eventually need to extend the discussion to examples of correct code that we do not wish to change.

Until then here is a funny SSL implementation bug. It is characterized as follows in a version of PolarSSL 1.1.8 that colleagues and I have been modifying.

Screenshot 1: an alarm in our tis_recv() function?

CVE-2013-5914-1.png

PolarSSL expects the program in which it is incorporated to provide it with a function that receives data from the outside world and writes it to a memory buffer. We have made such a function baptized it tis_recv and we have set up PolarSSL to use it.

The function tis_recv takes three arguments. The first one is a context argument in case someone needs one (our function ignores this argument). Second is a pointer to the buffer where the data is to be written then third is the maximum size the function is allowed to write there.

We have specified our function tis_recv thus:

/*@
  requires recv_r1: \valid(output+(0..output_len-1)) ;
  requires recv_r2: output_len > 0 ;
*/
int tis_recv(void* p  unsigned char * output  size_t output_len);

The screenshot indicates on the bottom right that the pre-condition recv_r1 which states that the argument output points to a buffer large enough for output_len characters is not verified. How could this be? Surely a false positive… Or someone is calling the function with the wrong arguments. It does not look like the problem is in our function.

The GUI informs us that the function tis_recv is called in one place only and that place is inside ssl_fetch_input(). It is called through a function pointer stored inside a member of a struct accessed through a pointer. The GUI tells us that we can mentally substitute ssl->f_recv(..) with tis_recv(...).

Screenshot 2: a wrong third argument

The GUI tells us that the buffer that PolarSSL passes to tis_recv() has size 16KiB-ish (not pictured) and that the variable len passed as third argument appears to take any of the values of its size_t type (pictured in the bottom panel below).

CVE-2013-5914-2.png

Screenshot 3: jumping back to the origin of the value

We inquire where the value of variable len was last set and the GUI tells us it is at the yellow line just above the function call (pictured in the middle panel). Well hum yes we could have noticed that but it was faster to click.

CVE-2013-5914-3.png

Screenshot 4: argument nb_want is too large

The value of len is computed from ssl_fetch_input()'s argument nb_want and the value of nb_want appears to be too large 65540 for the size of the buffer that the GUI tells us we have (in the screenshot below the values computed as possible for nb_want are displayed in the bottom panel).

CVE-2013-5914-4.png

Screenshot 5: dangerous values come from caller ssl_read_record()

A new possibility offered by the Frama-C version I am using that may not even(*) be available in the latest release Fluorine is to observe in the bottom panel which call-sites and originating call-stacks cause which values to occur in a variable. Here the GUI shows that nb_want appears to be 65540 when called from function ssl_read_record() at line 1178 in file ssl_tls.c of PolarSSL. This means we can continue the investigation there. In contrast the value of nb_want can only be 5 when ssl_fetch_input() is called from ssl_parse_client_key_exchange() so there is no need to look at that function: it is definitely not part of this problem.

(*) I don't remember. It has been a long time has it not?

CVE-2013-5914-5.png

Screenshot 6: the seemingly too large argument is ssl->in_msglen

CVE-2013-5914-6.png

The problem is that ssl->in_msglen is too large here. But where does it come from?

Screenshot 7:

ssl->in_msglen has been computed from two bytes sent by the network (bad bad network). But the value of ssl->in_msglen is supposed to have been validated before being used. This is what the lines between the obtention of the value and its use are supposed to do.

CVE-2013-5914-7.png

Screenshot 8:

CVE-2013-5914-8.png

The value of ssl->in_msglen is validated when ssl->minor_ver is 0 and it is validated when ssl->minor_ver is 1. But according to the GUI ssl->minor_ver can be any of 0 1 or 2.

Explanation

At this point it is only a matter of confirming that the call to ssl_read_record() can indeed be reached with ssl->minor_ver set to 2. This is where one switches to existential mode possibly crafting a malicious message or simply building a convincing argument that values can converge here to cause bad things and send it to the official maintainer .

When I said that this was a modified PolarSSL 1.1.8 we were analyzing I cheated a little. This is a 1.1.8 version in which I have re-introduced a bug that was there from 1.0.0 to 1.1.7. The principal maintainer of PolarSSL suggests to fix the bug by replacing == SSL_MINOR_VERSION_1 by >= SSL_MINOR_VERSION_1.

Conclusion

We have seen a second example of a real bug that can occur in an SSL implementation. Understandably in the current context there has been much speculation over the last 48 hours on the innocence of the bug in Apple's implementation. Might it be a voluntarily inserted backdoor? Is the NSA behind this bug? (I link here to John Gruber's post because he writes clearly but the same question is raised all over the Internet).

Allow me to put it this way: if the Apple SSL bug is a trick from the NSA then you US citizens are lucky. Our spy agency in Europe is so much better that it does not even have a name you have heard before and it is able to plant bugs where the buffer overflow leading to arbitrary code execution is three function calls away from the actual bug. The bug from our spy agency is so deniable that the code actually used to be fine when there were only two minor revisions of the SSL protocol. The backdoors from your spy agency are so lame that the Internet has opinions about them.

Real bugs come in all sizes and shapes. That one mistake with security implication also causes easily detectable unreachable code or wrongly-indented lines does not mean that all security flaws will be detected so easily and that plenty of quirky but functionally correct code will not be wrongly flagged.

Speaking of varied bugs “what about PolarSSL version 1.1.8 without the manually re-inserted bug from CVE-2013-5914?” I hear you ask. This will be the subject of another blog post.

Acknowledgments

Philippe Herrmann was first to use Frama-C to find a security bug in PolarSSL (Philippe's bug was a denial of service in versions up to 1.1.1 fixed in 1.1.2). Anne Pacalet and Paul Bakker have helped me with various aspects of PolarSSL's verification including but not limited to the bug described in this post. Twitter users aloria thegrugq and johnregehr provided comments on drafts of this post. And finally the Internet made this post possible by being itself.

Pascal Cuoq
23rd Feb 2014