Formally verifying zlib

Pascal Cuoq - 6th Dec 2012

In a blog post earlier this year, John Regehr wonders when software verification will finally matter. He means “formal verification”, I am pretty sure. “Verification” is what practitioners of, say, the software development V-cycle have been doing for decades, and it has kept us safe for that long—at least, when it mattered most.

Formal verification on the other hand, despite having been used here and there, is not yet the big deal that it could be. You'd better read the post I fear I may not be doing it justice.

In that post John writes:

For example I’d be happy to drop a proved-correct gzip into my Linux machine as long as it was respectably fast.

Okay John let us do this and see how far we can go.

We'll start with zlib that we are going to approach like we did QuickLZ by verifying that the decompression feature is resilient to malicious inputs. I have described the first potential issue on this code review site. If you have an opinion on this potential issue you can contribute by expressing it there (registering is a one-click process) or here in the comments if you prefer.

Edited to add: too late Andrew Šveikauskas has already determined that our first alarm was a false positive as you can see on codereview.stackexchange.com. Ah well… It would have been nice if out first alarm were a true positive but that was unlikely.

In case you regret missing out on the fun here is another one: when reaching line inffast.c:269 where from is computed as out - dist to be accessed next what prevents variable out to point one byte into the output buffer and dist to be 2 or 3 or 19? I expect the library must guard against this in the code above but this is the end of a long day and I did not find where. Also this would typically be the kind of relational information that the value analysis fails to remember so I am not promising there is a bug to find here.

Pascal Cuoq
6th Dec 2012