str2long

Pascal Cuoq - 20th Mar 2013

A coding contest of weeks past

A fortnight ago, I sent an entry in this Quick Coding Contest. The objective of the contest is to produce a C function to convert a string to the integer it represents. Now it turns out that my submission generates the shortest code after libc whatever that means. I am not sure why that is. Other submissions duplicate more code than mine but I read yet more submissions that felt just as short as mine.

Rather than the code duplication I could tease Robert Graham for using type int for indexes into a char array which can lead to incorrect behavior for strings of more than 2GiB on 64-bit and even on 32-bit architectures:

long str2long_robert_2(const char *str) 
{ 
    long result = 0; 
    int i; 
… 
       for (i=1; str[i]; i++) { 
            int c = str[i]; 

The proper type to use in C is size_t I think. In fact ptrdiff_t would feel just as natural to me although that would take us back to the issue with 2.5GiB strings and 32-bit architectures.

There are good reasons to forbid objects of more than 2GiB on 32-bit architectures (making the use of int as index in a char array safe in that case). But on my system malloc() still lets me allocate 2.5GiB without a peep. Actually I didn't change systems since the last time ptrdiff_t was discussed in this blog.

I do not have the heart to tease because writing with high-level idioms portable across languages is a laudable goal. Still if I was going to use a higher-level language with no unsigned type I would compute using the multiprecision integer type that any high-level language should have. C does not have native multiprecision integers and it has unsigned types. I feel that using the latter is acceptable when it compensates for the absence of the former.

Making an even shorter str2long() function

My submission looks like this:

long str2long_pascal (const char *p) 
{ 
  unsigned long l = 0; 
  int neg = 0; 
  if (!*p) goto err; 
  if (*p == '-') 
    { 
      p++; 
      neg = 1; 
    } 
 loop: 
  if (*p < '0' || *p > '9') goto err; 
  … 

One nice trick I saw in other submissions was to use neg = (*p == '-'); instead of what I wrote. I do not know whether compilers when provided with my version are typically able to simplify it into this one. It would be a sophisticated optimization but sounds plausible.

I really wanted to remark that statement if (!*p) goto err; in my submission was completely useless. It is intended to ensure that the empty string input is mapped onto the error case. If the statement was removed in the case of the empty input the test for '-' would fail and next control would go to if (*p < '0' || *p > '9') goto err; which would send execution to label err just the same.

I am pretty sure that no compiler knows how to detect this second trick by itself.

What can I say in guise of excuse? I spend a lot of time writing in OCaml where recursion and pattern-matching are more idiomatic. Yes I think this is a good excuse and I will explain why at length in a subsequent blog post.

Also I noticed the uselessness of the empty string test while the contest was still open but I wanted to keep my impressive score on metrics other than code size and speed as bragged about in the next section.

Other metrics the contest should definitely be judged on

I was a little disappointed to see that code size and speed were used for the first evaluation of the submissions because I optimized for completely different criteria.

  • Least number of structured loops: I hoped there would be a special prize for this easily measurable and totally objective criterion.
  • Time spent: I submitted my function after spending 27 minutes on it in total. This may seem like a lot for 20 lines of C. But it is actually fair for 20 lines of C code that do something and behave as intended. I achieved the 27 minutes time by :
  • Least quantity of compilation static analysis and general parsing not to mention testing of the submitted function : I did not compile or analyze my entry before sending it in. I was relieved to find out that it compiled at all after I sent the e-mail. I spent half of the working time proofreading my submission trying to think of all the ways I had written incorrect C programs in the past.

Conclusion

On a more serious note I am still unsure whether the confidence gained per unit of time was better or worse with reviewing than it would have been with testing. In this instance for a throwaway submission the tests of which I would never have a chance to reuse I was happy to forgo testing completely and simply read my function again and again.

If not having to write another unit test is like reaching the week-end not having to write any unit tests is like being on holidays.

Becoming serious again I think that similarly to the situation with static analysis and tests a comparison between reviewing code and tests does not make complete sense because different techniques are good at finding different issues. Rational developers use all available tools (and reviewing code as one such tool is probably underrated).

Pascal Cuoq
20th Mar 2013