GDR LaMHA/LTP

class: center, middle

# Verified Translation Validation of Static Analyses

Gilles Barthe$^1$, Sandrine Blazy$^2$, Vincent Laporte$^1$, David Pichardie$^3$,  
**Alix Trieu**$^2$

$^1$ IMDEA Software, Madrid, Spain.  
$^2$ CNRS IRISA - Inria Rennes - Université Rennes 1  
$^3$ CNRS IRISA - Inria Rennes - ENS Rennes

Journée GDR LaMHA/LTP

October 11, 2017

---

# Analyses at low-level

We need analyses at low-level (i.e. close to ASM) for:
- verifying security properties
- optimizations
- ...

However, low-level languages lack abstractions that can make the analysis easier and more precise:
- no type
- no memory separation
- ...

---
class: split-50

# Timing Attack

.column[![:scale 80%](https://www.troyhunt.com/content/images/2016/02/610609stopwatch1.jpg)]
.column[
```C
int cmp4(char s1[4], char s2[4]) {
  for (int i = 0; i < 4; i++) {
*    if (s1[i] != s2[i]) {
*      return 0;
    }
  };
  return 1;
}
```]

.center[The running time of the program is proportional to the length of the greatest common prefix.]
???
Our side channel of interest is the running time of the applications, we can show that it can leak secrets.

---

# Constant-Time Programming

- Programming discipline named constant-time programming which ensures that the running time is constant-time *with regards to secrets*.

Constant-time programs do not:
- branch on secrets
- perform memory accesses depending on secrets

Lucky Microseconds: A Timing Attack on Amazon's s2n Implementation of TLS (Albrecht, Paterson, 2015)

.center.red.bold[How do we make sure a program is constant-time ?]

---

# Taint Analysis

- Initially taint all secrets.
- Taint variables in which data depending on secrets are assigned to.
- If no branching test and memory accesses are tainted, then the program is constant-time.

```C
int main(void) {
  int `my_secret` = /* secret */;
  ...
  x = my_secret;
  ...
  if (*p > 42)
  ...
}
```
---
count: false
# Taint Analysis

- Initially taint all secrets.
- Taint variables in which data depending on secrets are assigned to.
- If no branching test and memory accesses are tainted, then the program is constant-time.

```C
int main(void) {
  int `my_secret` = /* secret */;
  ...
  `x` = my_secret;
  ...
  if (*p > 42)
  ...
}
```
---
count: false
# Taint Analysis

- Initially taint all secrets.
- Taint variables in which data depending on secrets are assigned to.
- If no branching test and memory accesses are tainted, then the program is constant-time.

```C
int main(void) {
  int `my_secret` = /* secret */;
  ...
  `x` = my_secret;
  ...
  if (`*p` > 42)
  ...
}
```

---
count: false
# Taint Analysis

- Initially taint all secrets.
- Taint variables in which data depending on secrets are assigned to.
- If no branching test and memory accesses are tainted, then the program is constant-time.

```C
int main(void) {
  int `my_secret` = /* secret */;
  ...
  `x` = my_secret;
  ...
  if (*p > 42)
  ...
}
```

#### ASM constant-time analysis

G. Barthe, G. Betarte, J. D. Campo, C. Luna and D. Pichardie. _System-level non-interference for constant-time cryptography_. CCS 2014

- We want what we execute to be constant-time, i.e. assembly
- Constant-Time Analysis at ASM level
- Needs points-to information for the analysis

---
# Load Elimination Optimization

```C
$r_1$ = load $addr_1$;
store $addr_2$;
$r_2$ = load $addr_1$;
```

---
class: center, middle

.big[How could we help low-level analyses ?]

???
We observe that it is easier to analyse source programs, our solution is thus to translate results of analyses at high level down to low level.

---
# Contributions

.bigger[
- General methodology to *correctly* validate the translation of results of a class of static analyses from source to lower-level.
- Instantiation of the methodology for a points-to analysis.
- Experimental validation by improving an optmization pass and the previous taint analysis.]

---
# CompCert

CompCert is a formally verified realistic C compiler with a proof that it does not introduce bugs.

.theorem[#### Semantics preservation theorem:
$\forall S, safe(S) \implies S \equiv compile(S)$]

For all source programs S and compiler-generated code C,  
if the compiler generates machine code C from source S,  
without reporting a compilation error,  
and S has a safe behavior,  
then «C behaves like S».

Observable behavior: termination / divergence / undefined («going wrong») + trace of I/O operations performed

.cite[X. Leroy. _Formal certification of a compiler back-end, or: programming a compiler with a proof assistant_. POPL 2006

X. Leroy. _A Formally Verified Compiler Back-end_. JAR 2009]

???
Notice that there is nothing that says that it preserves security properties.

---

# Verasco

Verasco is a formally verified abstract interpreter based on CompCert.

.theorem[#### Correctness theorem:
$\forall S, analyze(S) = OK \implies safe(S)$]

For all source program S,  
if the analyzer applied to S does not generate alarms,  
then S has a safe behavior.

In order to work, Verasco performs very precise analyses by relying on very precise numerical domains.

.cite[JH. Jourdan, V. Laporte, S. Blazy, X. Leroy, D. Pichardie. _A Formally-Verified C Static Analyzer_.  POPL 2015.

S. Blazy, V. Laporte, D. Pichardie. _An Abstract Memory Functor for Verified C Static Analyzers_.  ICFP 2016.]

---
# CompCert and Verasco

.center[![:scale 120%](compcertverasco.svg)]

---

class: split-50

# How to correctly translate static analyses ?

.column[
- Verasco computes very precise points-to information on C#minor
- How do we correctly translate them down to assembly ?

Direct proof:
- Need to add a proof for each pass
- Need to modify the proof if a pass is modified
- Need to add a proof each time a pass is added
- Brittle approach

We show another method.
]
.column[
![:scale 50%](constant-time-2.svg)
]

---

# Key idea 1: Relative-safety checker

- Given two programs `$p_1$` and `$p_2$` knowing that `$p_1$` is safe, tries to prove that `$p_2$` is also safe.

# Key idea 2: Enforceable properties

- Enforceable properties are a class of program properties that can be enforced using runtime monitors.
- Inlining these monitors yields defensive forms.
- Verifying that the property holds is reduced to verify that the defensive program is safe.

For example, we can verify if a variable is within a given interval:
```C
int x;
/* ... */
/* $x \mapsto$ [0, 42] */
assert (0 <= x && x <= 42);
/* ... */
```

.cite[F. Schneider. _Enforceable security policies_. ACM Transactions on Information and System Security. 2000.]
---
.center[![:scale 100%](images/images.001.png)]
---
count: false
.center[![:scale 100%](images/images.002.png)]
---
count: false
.center[![:scale 100%](images/images.003.png)]
---
# Example

Original code with annotations:
```C
/* $p \mapsto A[0..4]$ */
x = *p;
/* $p \mapsto B[4..19]$ */
 *p = y;
```

Defensive code:
```C
/* $p \mapsto A[0..4]$ */
assert (A + 0 <= p && p <= A + 4);
x = *p;
/* $p \mapsto B[4..19]$ */
assert (B + 4 <= p && p <= B + 19);
 *p = y;
```
---
count: false
.center[![:scale 100%](images/images.003.png)]
---
count: false
.center[![:scale 100%](images/images.004.png)]
---
count: false
.center[![:scale 100%](images/images.005.png)]
---
count: false
.center[![:scale 100%](images/images.006.png)]
---
count: false
.center[![:scale 100%](images/images.007.png)]
---
count: false
.center[![:scale 100%](images/images.008.png)]
---
### Example

A is a local array of the current function:
```C
/* $p \mapsto A[0..4]$*/
x = *p;
```

During compilation, all local arrays are merged into a stack for each function:
```C
/* $r8 \mapsto stack[16..20]$*/
r16 = load(int32, r8);
```
---
count: false
.center[![:scale 100%](images/images.008.png)]
---
count: false
.center[![:scale 100%](images/images.009.png)]
---
count: false
.center[![:scale 100%](images/images.010.png)]
---
count: false
.center[![:scale 100%](images/images.011.png)]

---
# Precision Theorem

.theorem[
Theorem: `$
\forall p, \forall \phi,\\\\
\texttt{make_def}(p,\phi)= p_{\phi} \implies\\\\
\texttt{safe}(p_\phi) \implies\\\\
p \models \phi
$`]

For all program $p$ annotated with points-to information $\phi$,  
if the transformed defensive program $p_{\phi}$ is safe  
then $p$ verifies $\phi$.

$p \models \phi$ is defined as $p$ is safe in a *blocking* semantics which dynamically checks that the memory accesses are indeed in the bounds given by the annotations.
---
.center[![:scale 100%](images/images.011.png)]

???

- On ne dit rien de l'utilité des annotations par contre !
- Pas besoin de preuve en haut, même pas besoin d'être sûr que les annotations sont corrects en haut.

---

# Instantiation

.center[![:scale 100%](instantiation.svg)]

---

# Relative-safety checker

- We want to prove that $p_2$ is safe knowing that $p_1$ is safe.

- In our case, the programs are very similar, we can directly prove that they are equivalent.

- Both programs have the same functions, we only need to prove that the functions are equivalent pairwise.

- We use program products, which is a witness of the equivalence proof of both program.

---

# Function product example

.center[![:scale 100%](product.svg)]

---
# Correctness

The relative-safety checker verifies the following theorem:

.theorem[
#### Correctness theorem:  
$\forall L, R, safe(L\times R) \implies safe(L) \implies L \equiv R \wedge safe(R)$
]

If the program product of program $L$ and $R$ is safe,  
and $L$ is safe,  
then there is a simulation between programs $L$ and $R$,  
which entails that $R$ is also safe.

---

# Experimental evaluation: CSE Benchmark

```C
$r_1$ = load $addr_1$;
store $addr_2$;
$r_2$ = load $addr_1$;
```

![:scale 100%](CSEbench.svg)

---

# Results

.center[![:scale 100%](results.svg)]

???
- First block: mBedTLS/PolarSSL
- Second block: OpenSSL
- Third block: From CompCert's bench
- Fourth block: random program produced by Csmith

---

# Conclusion
.bigger[
- Methodology to translate results of static analyses
- Instantiation of the methodology
- Improvement of constant-time analysis and optimization pass]

# Perspectives
.bigger[
- Can we do better and directly translate the constant-time property ?
]
---
count:false
class: middle, center

# Questions ?

---
count:false
class: middle, center

# Annex

---
count: false
# RTL

The relative-safety checker is implemented at the RTL level.

- skip(_l_)
- op($op, \vec{r}, r, l$)
- return

- if(`$op, \vec{r}, l_{true} , l_{false}$`)
- load($\kappa,addr,\vec{r},r,l$)
- store($\kappa,addr,\vec{r}, r, l$)
- call($sig,id,\vec{r},r,l$)
- return r

The product programs use the same instructions augmented with:
- havoc
- assert($e$)

---
count:false
# Function product

.center[![:scale 80%](table.svg)]

---
count: false
# Defensive form

```C
/* x $\mapsto$ $t_1[2..4]$, $t_2[6..8]$ */
assert (($t_1$ + 2 <= x && x <= $t_1$ + 4) ||
        ($t_2$ + 6 <= x && x <= $t_2$ + 8));
y = *x;
```
???
Unfortunately, this is not correct C.
--
Not correct C according to C99, as $t_1$ and $t_2$ do not point to the same object.

```C
/* x $\mapsto$ $t_1[2..4]$, $t_2[6..8]$ */
assert (x == $t_1$ + 2 || x == $t_1$ + 3 || x == $t_1$ + 4 ||
        x == $t_2$ + 6 || x == $t_2$ + 7 || x == $t_2$ + 8);
y = *x;
```
We have to enumerate all possible pointers.

???
However, this does not affect performance since this is only a proof artifact.
---
count:false
# Forging pointers

```C
int main(void) {
  int $t_1$[42], $t_2$[9001];
  /* ... call to f ... */
  return;
}

int f(int* z) {
  int y, *x;
  /* ... */
  /* x $\mapsto$ main@$t_1[2..4]$, main@$t_2[6..8]$ */
  y = *x;
  /* ... */
}
```

- Problem: How do we write pointers to `$t_1$` and `$t_2$` to implement the defensive test ?
- Our solution:
    - Implement a shadow stack
    - Global array that will hold all the needed pointers
    - Functions will leak their stack pointers into it

---
count:false
# Shadow stack

```C
int* STK[2048];         /* Shadow-stack */
int CNT = 0;            /* Stack counter */

int main(void) {
  int main_stk[9043];
  CNT = CNT + 1;        /* Prologue */
  STK[CNT] = main_stk;
  /* ... f ... */
  CNT = CNT - 1;        /* Epilogue */
}

int f(int* z) {
  int f_stk[2];
  CNT = CNT + 1;
  STK[CNT] = f_stk;
  /* ... */
  /* f_stk[1] $\mapsto$ -1[2..4], -1[48..50] */
  assert(f_stk[1] == STK[CNT - 1] + 2 || f_stk[1] == STK[CNT - 1] + 3 || ... );
  f_stk[0] = *(f_stk[1]);
  /* ... */
  CNT = CNT - 1;
}
```

???
- On va transformer le programme en introduisant ce qu'on appelle une shadow stack.
- On transforme le programme pour que chaque fonction n'ait plus qu'une seule variable locale, un tableau qui représente la fusion des variables originales. On parle de la pile de la fonction.
- On rajoute un grand tableau global (la .emph[shadow stack]) et un compteur global au programme.
- On instrumente le programme pour que chaque fonction à son entrée incrémente le compteur et rajoute un pointeur vers sa pile dans le tableau global.
- De même, à sa sortie, il décrémente le compteur.