Internship Position at CEA LIST - LSL

Posted: 13 Oct 2025
Keywords: Machine Learning, Graph Neural Networks, Software verification

Program representations for Deep Learning

Institution

The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development, and innovation. Drawing on the widely acknowledged expertise gained by its 16,000 staff spanned over 9 research centers with a budget of 4.1 billion Euros, CEA actively participates in more than 400 European collaborative projects with a large number of academic (notably as a member of Paris-Saclay University) and industrial partners. Within the CEA Technological Research Division, the CEA List institute addresses the challenges coming from smart digital systems.

Among other activities, CEA List’s Software Safety and Security Laboratory (LSL) research teams design and implement automated analysis in order to make software systems more trustworthy, to exhaustively detect their vulnerabilities, to guarantee conformity to their specifications, and to accelerate their certification. In particular, the Frama-C platform is dedicated to perform a wide range of analyses over C programs (with an experimental C++ front-end).

Scientific context

As software becomes increasingly prevalent in critical applications, concerns about safety and security are growing in importance. To address this concern, formal methods offer a mathematical framework for applying logical reasoning to programs, enabling rigorous proofs of their correctness and the absence of flaws. Among these methods, Abstract Interpretation (Wilhelm 2022) stands out as a theory well-suited for fast and automated program verification.

Eva (Bühler, Maroneze, and Perrelle 2024) is a Frama-C plugin based on Abstract Interpretation dedicated to the automatic inference of properties about C programs, such as finding all the possible values of variables or inferring arithmetic relations between variables. The primary goal of this analysis is to prove the absence of runtime execution errors (e.g. arithmetic overflows, invalid memory accesses and other C undefined behaviors) that could be used by an attacker or cause safety issues. It is also used for code auditing and as a support for other analyses.

Eva includes a suite of techniques and strategies that can be selectively combined to enhance the precision of program analysis. However, enabling these techniques often incurs significant computational costs in terms of time and memory usage. Consequently, it is essential to make informed decisions about which techniques to apply in a given context. Addressing this problem requires in-depth expertise in both the code being analyzed and the internals of the analyzer—knowledge that can be difficult to obtain. Furthermore, manually fine-tuning the analysis to achieve optimal results can be a costly and time-consuming process, potentially even prohibitive in practice.

To tackle this challenge, we leverage Machine Learning methods—particularly Graph Neural Networks (GNNs) (Scarselli et al. 2008) —to automatically predict the most effective strategy for analyzing a specific program. These methods depend on representing the program as a graph (Cummins et al. 2021), capturing its syntactic structure, control flow, and data flow.

Building on this approach, we obtained promising initial results for predicting the loop-unrolling strategy in a recent ASE’25 publication.

The effectiveness of the prediction model heavily relies on the quality of this graph representation. An overly detailed graph may introduce noise and hinder learning, while a graph that omits key information may lack the expressiveness needed for accurate predictions. Although standard graph representations are often sufficient for many program analysis tasks, we hypothesize that our problem requires a deeper semantic understanding of the program. This suggests the need for more specialized graph constructions.

Internship objectives

The primary goal of the internship is to design and implement a set of graph generators capable of producing various representations of a given program. These graphs will be evaluated through a Machine Learning pipeline to identify which representation yields the best predictive performance for our specific task.

The resulting tool will be integrated into the Frama-C platform and should be designed with reusability in mind, allowing it to support future learning-based analyses.

This project involves preliminary theoretical research to identify relevant graph representations. Potential directions include:

Static Single Assignment Form
Static Single Information Form (Ananian 2001)
Joint program and memory representations, where nodes correspond to memory locations at specific control points

Qualifications

Minimal
- Master student in Computer Science or Computer Engineering (M1/M2)
- Good knowledge of Machine Learning, preferably Graph Neural Networks (GNNs)
- Ability to work in a team
Preferred
- A certain flair/affinity for mathematical matters
- Some knowledge of functional programming, preferably OCaml
- Some knowledge of imperative programming, preferably C

Characteristics

Duration: 5-6 months
Location: CEA Nano-INNOV, Paris-Saclay Campus, France (Work From Home possible up to 2-3 days/week)
Compensation:
- €1300 (Bac+4/M1), €1400 (Bac+5/M2) monthly stipend
- Maximum €229 housing and travel expense monthly allowance (in case a relocation is needed)
- CEA shuttle in Paris region and 75% refund of transit pass
- Subsidized lunches

Application

If you are interested in this internship, please send to the contact persons an application containing:

Your resume
A cover letter indicating how your curriculum and experience match the qualifications expected and how you would plan to contribute to the project
Your bachelor and master transcripts, if any

Applications are welcome until the position is filled. Please note that the administrative processing may take up to 3 months.

Contact persons

For further information or details about the internship before applying, please contact:

Valentin Perrelle (valentin.perrelle@cea.fr)
Michele Alberti (michele.alberti@cea.fr)

References

Ananian, C Scott. 2001. “The Static Single Information Form.” PhD thesis, Massachusetts Institute of Technology.

Bühler, David, André Maroneze, and Valentin Perrelle. 2024. “Abstract Interpretation with the Eva Plug-in.” In Guide to Software Verification with Frama-c: Core Components, Usages, and Applications, edited by Nikolai Kosmatov, Virgile Prevosto, and Julien Signoles, 131–86. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-55608-1_3.

Cummins, Chris, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Michael F. P. O’Boyle, and Hugh Leather. 2021. “ProGraML: A Graph-Based Program Representation for Data Flow Analysis and Compiler Optimizations.” In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, edited by Marina Meila and Tong Zhang, 139:2244–53. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/cummins21a.html.

Scarselli, Franco, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2008. “The Graph Neural Network Model.” IEEE Transactions on Neural Networks 20 (1): 61–80.

Wilhelm, Reinhard. 2022. “Principles of Abstract Interpretation: By Patrick Cousot MIT Press, 2021, ISBN 9780262044905, pp. 1-819. Reviewed by Reinhard Wilhelm.” Formal Aspects Comput. 34 (2): 1–3. https://doi.org/10.1145/3546953.