Code review finds minor issue in Zlib

Pascal Cuoq - 9th Jan 2013

In an article about comparing static analyzers (long-time readers of the blog do not follow the link. It is still the same old article)…

Where was I? Ah yes. One reason why it is extremely tricky to compare static analyzers is that a static analyzer is for identifying undefined behavior in C programs but while some families of undefined behavior are extremely dangerous C has many families of undefined behaviors. Some of them have been rather harmless until now and programmers voluntarily use constructs that cause them. One example briefly alluded to in a certain article is the computation of an invalid offset of a pointer.

It is not alright for a C program to compute an invalid offset as in char t[50]; ...(t-1).... This is technically undefined behavior. Unfortunately it is not alright for a C static analyzer to warn for this either because the programmer is likely doing it on purpose. With some compilation options the QuickLZ compression library does it (on purpose).

Frama-C's value analysis choice is to remain silent for the computation of t-1 but to warn if it is ultimately compared to another pointer—it could for instance equal &c against the programmer's expectations. Or of course the value analysis warns if the invalid pointer is dereferenced.

Et tu zlib?

Here is the beginning of file inffast.c in library zlib:

#ifdef POSTINC 
... 
#else 
#  define OFF 1 
#  define PUP(a) *++(a) 
#endif 

When you see *++p in a C program you can expect an invalid offset computation nearby. Indeed:

    in = strm->next_in - OFF; 
    ... 
    out = strm->next_out - OFF; 

Zlib's documentation allows the user of the library to pass in strm->next_in and strm->next_out pointers to the beginning of allocated blocks. This can cause a minor undefined behavior for some inputs.

A “bug” not found by static analysis

Nothing would be easier than implementing the detection of this undefined behavior in Frama-C but seeing more and more code that invoke it does not convince me that programmers want to know about it. Anyway I just stumbled on macro PUP as I was analyzing zlib.

Post-scriptum

A way to exhibit the undefined behavior is to use the input vector in a future post together with a more stringent instrumentation:

--- zlib-1.2.7/inffast.c	2010-04-19 06:16:23.000000000 +0200 
+++ zlib-modified/inffast.c	2013-01-21 00:56:25.000000000 +0100 
@@ -64 6 +64 9 @@ 
       requires strm->avail_out >= 258 for each loop to avoid checking for 
       output space. 
  */ 
+ 
+int printf(const char *fmt  ...); 
+ 
 void ZLIB_INTERNAL inflate_fast(strm  start) 
 z_streamp strm; 
 unsigned start;         /* inflate()'s starting value for strm->avail_out */ 
@@ -98 6 +101 7 @@ 
     state = (struct inflate_state FAR *)strm->state; 
     in = strm->next_in - OFF; 
     last = in + (strm->avail_in - 5); 
+    printf("strm->next_out=%p and zlib is about to subtract one from it"  strm->next_out); 
     out = strm->next_out - OFF; 
     beg = out - (start - strm->avail_out); 
     end = out + (strm->avail_out - 257); 
@@ -316 6 +320 7 @@ 
     strm->next_in = in + OFF; 
     strm->next_out = out + OFF; 
     strm->avail_in = (unsigned)(in < last ? 5 + (last - in) : 5 - (in - last)); 
+    printf("out=%p  end=%p"  out  end); 
     strm->avail_out = (unsigned)(out < end ? 
                                  257 + (end - out) : 257 - (out - end)); 
     state->hold = hold; 

When I compile and execute I get:

$ gcc -I Downloads/zlib-modified/ zlib_UB.c Downloads/zlib-modified/libz.a 
$ ./a.out  
out_buffer: 0x7fff59e83a34 
strm->next_out=0x7fff59e83a34 and zlib is about to subtract one from it 
out=0x7fff59e83a33  end=0x7fff59e83a5e 
Pascal Cuoq
9th Jan 2013