Below are listed some of the projects I have been working on (reverse-chronological order).

Static Single Assignment form

Static Single Assignment (SSA) form is an intermediate representation of program in which program variables are assigned exactly once, thus making explicit in the program syntax the link between the variable definition program point and program points where it is used (def-use chains). Introduced in the late 80's by Cytron et al., SSA has been extensively used in many static analysis, because of the strong invariants that its structural constraints trigger.

Despite its large use in practice, and the vast body of academic work on SSA-based analysis, little work had been done on proving them sound with respect to a formal semantics of SSA, up until ~2010. This work fills this gap by formalizing the intuitive semantics of SSA, and identifying the key lemma that allows SSA to be thought of as an equational form of programs. We illustrate how these invariants can be used to prove an SSA-based optimizations, such as Sparse Conditional Constant Analysis, or Global Value Numbering.

This work has been formalized on top of CompCert, a verified C compiler, by adding to it an SSA-based middle-end. The middle-end performs (i) conversion from RTL to SSA form (ii) SSA-based optimizations, and (iii) a coalescing destruction à la Boissinot et al. of the SSA form back to RTL via the Conventional SSA form.

Website: http://compcertssa.gforge.inria.fr/

References:

Tagged low-level architecture and dynamic security enforcement

I was involved in 2013 in the CRASH-SAFE project, that aims at designing, implement and verify, from scratch, a new low-level architecture whose built-in, programmable hardware tags provide a dynamic enforcement mechanism of a variety of security property (non-interference, memory safety, and more generally memory protection).

I was involved in the verification effort of this new architecture. We formalize in Coq a core-model of this architecture, that exposes its main characteristics : tags as pointers to (dynamically allocated) heap data-structures and a specific hardware cache for storing recent combination of tags, while the cache fault handling mechanism, handled by software, interprets the security policy of interest, and, in turn, in case the instruction is allowed, populates the cache with the appropriate resulting tags.

We prove that this concrete machine is indeed ensuring a dynamic non-interference property, relying on a powerful semantic refinement methodology (from an abstract machine denoting the specification, to the concrete implementation in hardware), allowing for the proof to scale.

The tag management unit (TMU) is however designed for supporting a wide variety of other security property, and we are currently investigating the generality of our approach.

Website: http://www.crash-safe.org/

References:

Certified compilation of multi-threaded Java

This work aims at providing a certified compiler of Java concurrent programs. Ultimately, an input bytecode program will compile down to an provably equivalent assembly program. This entails choosing a formal semantics for the input language. The Java Memory Model, as is, is not suitable, and this is twofold. First, its formal definition is known to be broken on several points. Second, the race-commiting semantics it defines is too complex to even think about formally proving a compiler with regard to it.

But, towards the formal verification of multi-threaded Java compilers, it seems reasonable to specialize the compiler to a given target architecture. This is semantically correct, as it amounts to refine the memory model early in the compiler chain. This way, one dimension of the generality of the JMM is cut down. This is what we propose to do here, by considering target architectures with a tractable, well-defined and well-understood semantics. Hence, we consider architectures with a TSO memory model, whose tractability in the field of verified compilation has already been demonstrated, thanks to its operational characterization. The contract we propose here is a Buffered Memory Model for Java (BMM). While not as expressive as the JMM, in terms of valid optimizations, we believe it imposes only modest impact on performance on TSO architectures, given its well-suitedness for formal reasoning. BMM comes in two forms that we formally prove equivalent. The axiomatic view, close to the current definition of the JMM, is easy for the programmer to understand even in the presence of races. The operational view of the model is amenable for reasoning about program transformations and static analysis, and better matches the semantics of the compiler's lower level representations.

References:

Java bytecode intermediate representation

This work deals with Static analysis of Java program at the bytecode level. This can be either (i) preferred to Java source program analysis (simpler syntax and semantics) (ii) or required in the case where the Java compiler cannot be trusted. One particular difficulty about static analysis of bytecode is that the Java Virtual Machine is stack-based: the bytecode manipulates an operand-stack (to compute expressions) that has to be properly handled by analyses, for precision concern. One solution is to abstract the operand stack with symbolic information, but this might deeply impact the complexity of the analysis. An alternative is to perform the analysis on an intermediate representation of the Bytecode, but the generation of this IR must in turn be proved semantics-preserving.

We propose here a provably correct stackless IR for Java Bytecode, in which side-effect free expression trees are recovered. It also fold the object constructor code sequence, that might be needed for some analyses. Finally, the IR provides dynamic explicit checks instructions, that prevents the exception throwing order to be altered. The transformation is proved to be correct with regards to an operational semantics of Java bytecode. This IR is now part of the larger project of Sawja (Static Analysis Workshop for Java), a framework for designing and implementing Java bytecode analysis -- and includes a full Java bytecode class file manipulation library. Sawja is written in Ocaml and developped in the Celtique research team. Sawja has been used to develop several static analysis (see URL below), and the use this IR turned out to be very helpful in simplifying their conception.

Website: http://sawja.inria.fr/

References:

Non-interference

Recent work of Askarov et al (ESORICS’08) shows that the notion of non-interference used in most of the tools for analysing information flow security of program is not as secure as it was thought before: for programs which perform output, the amount of information leaked by a Denning style analysis is not bounded; the good news is that if secrets are chosen to be sufficiently large and sufficiently random then they cannot be effectively leaked at all. But secrets cannot always be made sufficiently large or sufficiently random. This work addresses this issue by developping a notion of secret-sensitive noninterference in which “small” secrets are handled more carefully than “big” ones. We illustrate the idea with a type system enforcing this policy, and which combines a liberal Denning-style analysis with a more restrictive system according to the nature of the secrets at hand.

References: