New machdep mechanism in Frama-C

André Maroneze - 29th Jan 2024

A machdep (for machine-dependent) in Frama-C is a set of architecture-specific configurations, which include: integer sizes, predefined macros, compiler type, standard library constants, etc. They are essential when analyzing embedded, non-portable code. Thanks to some C11 features, the machdep generation mechanism has been revised in Frama-C, allowing users to more easily and reliably create their own machdeps. In this post, we briefly present the changes and how they will help users. Users only interested with how to use the new mechanism may want to jump directly to this section.

Machdep’s main features

A machdep defines a target platform. The exact meaning of what constitutes a “platform” is subject to changes, but currently it includes:

  • Compiler “class” (GCC, Clang, MSVC, …)
  • “Architecture” (x86-64, x86-32, AVR 16-bit, …)
  • OS (Linux, macOS, …) and libc (GNU libc, …)

The “class” is used by Frama-C to know whether some compiler extensions are allowed, and for predefined macros. For instance, a MSVC-based machdep will define the symbol __MSC_VER. Also, some accepted compiler extensions rely on this information to know whether we are in “GCC mode”, and thus extra built-ins and behaviors should be allowed.

The “architecture” part is mostly related to standard integer types (short, int, long, pointers, and their relation to types defined in stdint.h) and the associated libc definitions (e. g. INT_MAX). It also takes into account alignment constraints and more “exotic” details, such as whether char is signed or unsigned (which is true in PowerPC and some ARM architectures, but false in x86/AMD64).

Finally, the “OS and libc” part, which is the least well-defined one, takes into account some predefined macros, such as the numeric values for errno-related errors (EBADF, EINVAL, etc.), and the fact that the compiler and hardware architecture are not always enough to uniquely determine all the constants used during preprocessing.

Given the amount of information and tedious repetition involved in writing such a machdep file, it is clear that this task is best performed with a (semi-)automatic tool.

Previous attempts: compile-yourself C files

In the ideal case where the user is able to compile a program for the target architecture (using the same toolchain than the one they use for the program they want to analyze with Frama-C), the easiest solution was to use the C compiler to build and run a specially-crafted source file that would perform the tests itself and output the constants needed to create a new machdep. Even better, the program output itself could be the actual machdep!

Unfortunately, in practice, this did not always work: - Sometimes, users could compile a program for the target architecture, but not execute it; - In more extreme cases, users could preprocess and parse a file, but not link it to obtain the final binary; - Even if users could execute their program, the target architecture (often an embedded device) might not have support for stdio.h and variadic functions such as printf; - In case some features were not supported (e.g. sizeof applied to a void type, or to a function type; these are compiler extensions whose values might be needed in some programs), the compiler could simply fail and produce no output at all; and #ifdefs cannot always check these features reliably.

Overall, it would be necessary to write several C files, try to compile each of them separately, and then assemble the result in a single, unified machdep file. It would also be better if printf was not needed; even better, if the user did not even have to run the compiled binary.

C11’s features to the rescue

The solution adopted since Frama-C 27 (Cobalt), consists in using C11 features (namely, _Static_assert and _Generic) to obtain the above-mentioned results while minimizing the amount of tedious work, both in writing the machdep generator script, and in writing machdep files themselves.

This does add an extra requirement for the user: they need a C11-compatible parser, or at least one that supports _Static_assert and _Generic. Nowadays, most GCC- and Clang-based cross-toolchains do have such features; but if the user does not have such a compiler, they will have to manually fill in a machdep template.

As an example, here’s the char_is_unsigned.c file, which is used to test whether the char type is signed or unsigned in the target architecture:

_Static_assert((char)-1 >= 0 ? 1 : 0, "char_is_unsigned is False");
_Static_assert((char)-1 >= 0 ? 0 : 1, "char_is_unsigned is True");

This very simple file is (cross-)compiled by the machdep generator script, and the output is used to create the final machdep file. Similar techniques are used for several constants. For instance, to check the result of sizeof(void), we try to compile a file with such static assertions. If the file fails to compile, then we know the target machdep does not support it at all.

For errno.h error constants, the approach is slightly different: the compiler preprocesses a file containing several lines such as:

#include <errno.h>
#ifdef EACCES
int eacces_is = EACCES;
#endif

We obtain a preprocessed .i file that contains the numeric value for each constant, if it exists (thus their absence does not entail total failure). These constants are added to the machdep file and used when Frama-C preprocesses the sources.

Finally, an example where _Generic is useful (we also fall back to GCC’s __builtin_types_compatible_p if _Generic is not available; but such situations are increasingly rare), is when determining which integer type corresponds to a library type, e.g. size_t. The (simplified) macros below illustrate the approach:

#define COMPATIBLE(T1, T2) _Generic(((T1){0}), T2 : 1, default : 0)
#define TEST_TYPE_COMPATIBLE(T1, T2) \
  _Static_assert(!COMPATIBLE(T1, T2), "" #(T2) " is `" #T1 "`");
#define TEST_TYPE_IS(type) TEST_TYPE_COMPATIBLE(type, TEST_TYPE)
TEST_TYPE_IS(unsigned int)
TEST_TYPE_IS(unsigned long)
TEST_TYPE_IS(unsigned long long)

Thus, we combine _Generic and _Static_assert to check whether size_t is defined, in the target machdep, to unsigned int, unsigned long, or unsigned long long.

Machdeps are expressed in YAML

To simplify manual completion and improve readability, we chose YAML for machdep files. Using JSON would require adding double quotes and commas everywhere. The machdep files are quite long but simple in structure; using YAML ensures they remain very close to #define-like lists, making them easier to write and proofread.

Using it to create a machdep

The easiest way to run the machdep generator script is to run:

frama-c-script make-machdep

This will print the generated machdep in stdout, using the default compiler (cc) in your system. You can add --help to see the list of options. With --compiler and --compiler-flags, you can use a different C compiler (e.g. arm-linux-gnueabi-gcc) to create a machdep file for your target.

By default, Frama-C comes with a set of predefined machdeps, including some “compiler-neutral” ones (x86_32, x86_64), and some compiler-specific machdeps (e.g. gcc_x86_16 and msvc_x86_64). frama-c -machdep help lists the predefined machdeps.

Side effect: dynamic machdeps

One side effect of this new machdep architecture is that, instead of including one from a list of predefined header files, Frama-C now generates a new machdep on-the-fly, whenever parsing a source file, and stores it in a temporary directory (/tmp/__fc_machdepXXXXXX.dir, where XXXXXX are randomly-selected characters). This means that, if you want to manually re-run the preprocessing command executed by Frama-C (e.g. when debugging a parsing error), you need to add -kernel-msg-key pp, so that the generated machdep will not be erased. Frama-C will output a message such as:

[extlib] Debug: not removing dir /tmp/__fc_machdepc4cf14.dir

So that you can add -I /tmp/__fc_machdepc4cf14.dir to the preprocessing flags. This directory contains a generated __fc_machdep.h, which will be included by Frama-C’s standard library.

Conclusion: machdeps are better, more precise, and easier to use

The new machdep mechanism fixes some previous issues with predefined macros that could, in a few case studies, lead to discrepancies between the code seen by the compiler and the code seen by Frama-C, which generally resulted in issues during parsing. They should better adapt to different OSes (e.g. macOS) and architectures, avoiding some hardcoded values.

The generation of a machdep for a new architecture is easier than ever, especially if the target compiler supports C11. Otherwise, it is still possible to fill in the required constants by hand, but the user must then pay attention to their writing.

Overall, this overhaul of the machdep mechanism is a welcome change to ensure more reliable analyses and less manual work, thanks to improvements in the C standard and its toolchains.

Further machdep-related enhancements are under development, especially in the case where the preprocessor used to generate the machdep and the one used to preprocess the sources are the same.

Acknowledgments

We thank the DGA (Direction Générale de l’Armement) and the AID (Agence de l’Innovation de Défense) for their support in implementing this feature.

André Maroneze
29th Jan 2024