

# EuskalHack Security Congress VII





### \$ whoami

- Name: Pepe Vila
- Alias: cgvwzq
- Web: <u>https://vwzq.net/</u>

In a past life: XSS troll master, NFC tinkerer, web security, CTFs, 2-columns-paper writer...

These days: security architect at Arm ¯\\(°_o)/¯





### **DISCLAIMER: this presentation reflects my personal views**

Most of which are nothing but a mix of other people's views incorporated into my neural network as a result of a random set of experiences...





### In a hole in the microarchitecture there lived a vulnerability...

Arm<sup>®</sup> Architecture Reference Manual for A-profile architecture. Copyright © 2013-2024 Arm Limited or its affiliates. All rights reserved. ARM DDI 0487K.a (ID032224)

https://developer.arm.com/documentation/ddi0487/latest/

An architect works on the architecture as an engineer works on the ...

The Architecture defines the behavior of an abstract machine:

• ISA, memory model, execution modes, I/O, etc.

Microarchitecture refers to the implementation details:

• caches, execution pipelines, optimizations, etc.

[Figure: software, architecture, microarchitecture]



### And security?

- 2017: Spectre v1 - Bound check bypass
- 2017: Spectre v2 - Branch target
- 2017: Meltdown - Rogue data cache load
- 2018: Rogue system register read
- 2018: Spectre v4 - Speculative Store bypass
- 2018: Lazy FP state restore
- 2018: Spectre v1.1 - bound check bypass store
- 2018: SpectreRSB/ret2spec - return mispredict
- 2018: Foreshadow - L1 terminal fault
- 2019: MDS: RIDL, Zombieload, Fallout, CacheOut, ...
- 2020: Load Value Injection
- 2022: MMIO stale data issues
- 2022: PACMAN
- 2022: Branch type confusion: retbleed, phantom jumps..
- 2022: Downfall
- 2022: Augury
- 2022: Spectre BHB
- 2023: Zenbleed
- 2023: Inception



@cgvwzq


### CPU vulns: There and back again

### Some (assumed) background

• Pipelining, out-of-order execution, speculation (e.g., branch prediction)

### One useful lie

For simplicity we'll replace the exfiltration channels by a simple **oracle** 

• we can ignore whether we do Flush+Reload, Prime+Probe, TLB covert-channel, etc.

Today, the attacker can **magically observe the addresses of all locations accessed** by the victim, but not the values.

Remember: Local attacker with code execution.
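A minimal sketch of the oracle model, to make the "addresses but not values" idea concrete. Everything here is hypothetical scaffolding for illustration: the `oracle_trace` type, the `probe` array and the 64-byte granularity are assumptions, not part of any real attack tooling. The victim performs one secret-dependent access and the attacker recovers the secret purely from the recorded address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* assumed granularity at which the oracle reports addresses */
#define TRACE_MAX 16u

/* Hypothetical oracle: records the address of every access, never the value. */
typedef struct {
    uintptr_t addr[TRACE_MAX];
    size_t    len;
} oracle_trace;

static uint8_t probe[256 * LINE_SIZE];

/* A load instrumented so that the oracle observes its address. */
static uint8_t observed_load(oracle_trace *t, const uint8_t *p)
{
    assert(t->len < TRACE_MAX);
    t->addr[t->len++] = (uintptr_t)p;
    return *p;
}

/* Victim: one access whose address depends on the secret. */
static void victim(oracle_trace *t, uint8_t secret)
{
    (void)observed_load(t, &probe[(size_t)secret * LINE_SIZE]);
}

/* Attacker: recovers the secret from the last observed address alone. */
static uint8_t attacker_recover(const oracle_trace *t)
{
    assert(t->len > 0);
    return (uint8_t)((t->addr[t->len - 1] - (uintptr_t)probe) / LINE_SIZE);
}
```

In a real attack the trace would come from Flush+Reload, Prime+Probe or a similar channel; the oracle simply hides that step.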



"Priestess of Delphi" by John Collier, 1891.



#### Spectre Attacks: Exploiting Speculative Execution

Paul Kocher<sup>1</sup>, Jann Horn<sup>2</sup>, Anders Fogh<sup>3</sup>, Daniel Genkin<sup>4</sup>, Daniel Gruss<sup>5</sup>, Werner Haas<sup>6</sup>, Mike Hamburg<sup>7</sup>, Moritz Lipp<sup>5</sup>, Stefan Mangard<sup>5</sup>, Thomas Prescher<sup>6</sup>, Michael Schwarz<sup>5</sup>, Yuval Yarom<sup>8</sup>

<sup>1</sup> Independent (www.paulkocher.com), <sup>2</sup> Google Project Zero, <sup>3</sup> G DATA Advanced Analytics, <sup>4</sup> University of Pennsylvania and University of Maryland, <sup>5</sup> Graz University of Technology, <sup>6</sup> Cyberus Technology, <sup>7</sup> Rambus, Cryptography Research Division, <sup>8</sup> University of Adelaide and Data61



### **Spectre v2 (aka. Branch target injection)**

Although more generally, we should say prediction injection

#### Branch predictor

| PC=0x704 | Target=0x500 |
|----------|--------------|
| PC=0x708 | Target=0x500 |
| PC=0x70c | Target=0x500 |
|          |              |
|          |              |
|          |              |
|          |              |

Attacker

    0x500: ret
    . . .
    0x700: mov x0, #0x500
    0x704: blr x0
    0x708: blr x0
    0x70c: blr x0

Victim

    0x708: blr x4

Long latency operation... Prediction hit



### Spectre v2 (aka. Branch target injection)

Although more generally, we should say prediction injection

#### Branch predictor

| PC=0x704 | Target=0x500 |
|----------|--------------|
| PC=0x708 | Target=0x500 |
| PC=0x70c | Target=0x500 |
|          |              |
|          |              |
|          |              |

Attacker

    0x700: mov x0, #0x500
    0x704: blr x0
    0x708: blr x0
    0x70c: blr x0

Victim

    ; attacker controls x0
    0x704: ldr x4, [x3]    ; long latency operation...
    0x708: blr x4

Value coming from memory to resolve pending load (dramatization)



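The branch predictor table on these slides can be approximated with a toy model. This is a sketch under stated assumptions: the `btb` type, its eight entries and the PC-based indexing are invented for illustration, and real predictors are far more elaborate. The point it shows is that an entry trained by the attacker at `PC=0x708` is returned to anyone who later looks up the same PC, because nothing in the entry identifies who trained it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BTB_ENTRIES 8u   /* assumed size, purely illustrative */

typedef struct {
    bool     valid;
    uint64_t pc;
    uint64_t target;
} btb_entry;

typedef struct { btb_entry e[BTB_ENTRIES]; } btb;

/* A64 instructions are 4 bytes, so drop the low two bits before indexing. */
static size_t btb_index(uint64_t pc) { return (size_t)((pc >> 2) % BTB_ENTRIES); }

/* Called when an indirect branch resolves: remember where it went. */
static void btb_train(btb *b, uint64_t pc, uint64_t target)
{
    assert(b != NULL);
    b->e[btb_index(pc)] = (btb_entry){ true, pc, target };
}

/* Called at fetch time: returns true and the predicted target on a hit. */
static bool btb_predict(const btb *b, uint64_t pc, uint64_t *target)
{
    assert(b != NULL && target != NULL);
    const btb_entry *e = &b->e[btb_index(pc)];
    if (!e->valid || e->pc != pc)
        return false;
    *target = e->target;
    return true;
}
```

Training with the attacker's three `blr x0` instructions and then predicting for the victim's `blr x4` at `0x708` yields `0x500`, which is what the victim speculatively executes while its load is still pending.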

### Whose problem is it?



#### B2.9.4 "Restrictions on the effects of speculation from Armv8.5"

#### If FEAT\_CSV2 is implemented:

- Code running in one hardware-defined context (context1) cannot either exploitatively control, or predictively leak to, the speculative execution of code in a different hardware-defined context (context2), as a result of the behavior of any of the following resources:
  - Branch target prediction based on the branch targets used in context1.
    - This applies to both direct and indirect branches, including return instructions, but excludes the prediction of the direction of a conditional branch.
  - Data Value predictions based on data value from execution in context1.
    - PSTATE.{N,Z,C,V} values from context1 are not considered a data value for this purpose.
  - Virtual address-based cache prefetch predictions generated as a result of execution in context1, based on, or causing dereference of, data values from memory.
  - Any other prediction mechanisms, other than Branch, Data Value, or Cache Prefetch predictions.

In this definition, the hardware-defined context is determined by:

- The Exception level.
- The Security state.
- When executing at EL1, if EL2 is implemented and enabled in the current Security state, the VMID.
- When executing at EL0, whether the EL1&0 or the EL2&0 translation regime is in use.
- When executing at EL0 and using the EL1&0 translation regime, the *address space identifier (ASID)* and, if EL2 is implemented and enabled in the current Security state, the VMID.
- When executing at EL0 and using the EL2&0 translation regime, the ASID.
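The context rules quoted above can be read as an equality check between two execution contexts. The following is only a sketch of that reading: the `hw_context` type, its field encodings and the `el2_enabled` parameter (standing for "EL2 is implemented and enabled in the current Security state") are assumptions made for illustration, not definitions from the manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CTX_EL0, CTX_EL1, CTX_EL2, CTX_EL3 } ctx_el;
typedef enum { REGIME_EL1_0, REGIME_EL2_0 } ctx_regime;

typedef struct {
    ctx_el     el;
    unsigned   security_state;  /* opaque encoding, an assumption of this sketch */
    ctx_regime regime;          /* only meaningful at EL0 */
    uint16_t   vmid;
    uint16_t   asid;
} hw_context;

/* Returns true when a and b fall into the same hardware-defined context
 * according to the bullet list above. */
static bool same_hw_context(const hw_context *a, const hw_context *b, bool el2_enabled)
{
    assert(a != NULL && b != NULL);
    if (a->el != b->el || a->security_state != b->security_state)
        return false;
    if (a->el == CTX_EL1)                 /* EL1: the VMID matters if EL2 is enabled */
        return !el2_enabled || a->vmid == b->vmid;
    if (a->el == CTX_EL0) {
        if (a->regime != b->regime || a->asid != b->asid)
            return false;
        if (a->regime == REGIME_EL1_0 && el2_enabled)
            return a->vmid == b->vmid;    /* EL1&0 regime: ASID and VMID */
    }
    return true;                          /* EL2/EL3: EL and Security state only */
}
```

Under this reading, two user processes with different ASIDs are different contexts, and so are a process and the kernel it calls into, which is exactly the separation that Spectre v2 crosses.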



### In practice...

An example of implementation that meets the requirements is **context tagging** 

#### **Branch predictor**

| Tag=A | PC=0x700 | Target=0x500 |
|-------|----------|--------------|
| Tag=A | PC=0x704 | Target=0x500 |
| Tag=A | PC=0x708 | Target=0x500 |
|       |          |              |
|       |          |              |
|       |          |              |
|       |          |              |



```
; exfiltration gadget
0x500: ldr x0, [x0]
0x504: ldr x1, [x0]
...
```



    0x700: mov x0, #0x500
    0x704: blr x0
    0x708: blr x0
    0x70c: blr x0

Prediction miss
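Context tagging can be sketched by adding a tag to each predictor entry and requiring it to match on lookup. As before this is a toy model: the `tagged_btb` type, the entry count and the way a tag is derived from a context are assumptions, and an implementation is free to meet the architectural requirement in other ways.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TBTB_ENTRIES 8u   /* assumed size, purely illustrative */

typedef struct {
    bool     valid;
    unsigned tag;      /* identifier of the hardware-defined context that trained it */
    uint64_t pc;
    uint64_t target;
} tagged_entry;

typedef struct { tagged_entry e[TBTB_ENTRIES]; } tagged_btb;

static size_t tbtb_index(uint64_t pc) { return (size_t)((pc >> 2) % TBTB_ENTRIES); }

static void tbtb_train(tagged_btb *b, unsigned tag, uint64_t pc, uint64_t target)
{
    assert(b != NULL);
    b->e[tbtb_index(pc)] = (tagged_entry){ true, tag, pc, target };
}

/* A lookup only hits when the entry was trained by the same context. */
static bool tbtb_predict(const tagged_btb *b, unsigned tag, uint64_t pc, uint64_t *target)
{
    assert(b != NULL && target != NULL);
    const tagged_entry *e = &b->e[tbtb_index(pc)];
    if (!e->valid || e->tag != tag || e->pc != pc)
        return false;                     /* prediction miss */
    *target = e->target;
    return true;
}
```

An entry trained with `Tag=A` still hits for context A, but a lookup at the same PC from a context with a different tag misses, so the attacker's training no longer steers the victim.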



### Spectre BHB

#### Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks

Enrico Barberis<sup>†</sup>, Pietro Frigo<sup>†</sup>, Marius Muench, Herbert Bos, Cristiano Giuffrida

Vrije Universiteit Amsterdam

{e.barberis, p.frigo, m.muench}@vu.nl, {herbertb, giuffrida}@cs.vu.nl



### Whose problem is it?

If either FEAT\_CSV2\_1p1 or FEAT\_CSV2\_3 is implemented, code running in one hardware-defined context (context1) cannot either exploitatively control, or predictively leak to, the speculative execution of code in a different hardware-defined context (context2) as a result of the behavior of branch target prediction based on the branch history used in context1.

#### Branch prediction

If FEAT\_CLRBHB is not implemented, then the architecture does not define any branch predictor maintenance instructions for AArch64 state.

If branch prediction is architecturally visible, cache maintenance must also apply to branch prediction.

When FEAT\_CLRBHB is implemented, the CLRBHB instruction is available. When the CLRBHB instruction is executed, the branch history is cleared for the current context to the extent that branch history information created before the CLRBHB instruction cannot be used by code before the CLRBHB instruction to exploitatively control the execution of any code in the current context appearing in program order after the instruction.

When FEAT\_ECBHB is implemented, the branch history information created in a context before an exception to a higher Exception level using AArch64 cannot be used by code before that exception to exploitatively control the execution of any indirect branches in code in a different context after the exception.
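To see why branch history needs its own rule, here is a toy predictor in which entries are tagged by context but the history register is not. All names (`predictor`, `history_push`, the tag constants) and the history hash are hypothetical, and the model is a loose illustration of the idea rather than a description of any real core. An attacker in user mode cannot insert kernel-tagged entries, but by replaying the right branch history it can make the kernel select an entry that the kernel itself trained earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES    8u
#define TAG_USER   1u
#define TAG_KERNEL 2u

typedef struct {
    bool     valid;
    unsigned tag;      /* context that trained the entry */
    uint64_t pc;
    uint64_t history;  /* branch history at training time */
    uint64_t target;
} entry;

typedef struct {
    entry    e[ENTRIES];
    size_t   next;     /* round-robin replacement */
    uint64_t history;  /* shared across contexts in this toy model */
} predictor;

/* Fold a taken branch into the history (arbitrary toy hash). */
static void history_push(predictor *p, uint64_t branch_pc)
{
    p->history = (p->history << 4) ^ branch_pc;
}

/* Stand-in for clearing the history, e.g. on exception entry. */
static void history_clear(predictor *p) { p->history = 0; }

static void train(predictor *p, unsigned tag, uint64_t pc, uint64_t target)
{
    p->e[p->next] = (entry){ true, tag, pc, p->history, target };
    p->next = (p->next + 1) % ENTRIES;
}

/* Hit only if tag, PC and current history all match the stored entry. */
static bool predict(const predictor *p, unsigned tag, uint64_t pc, uint64_t *target)
{
    assert(target != NULL);
    for (size_t i = 0; i < ENTRIES; i++) {
        const entry *e = &p->e[i];
        if (e->valid && e->tag == tag && e->pc == pc && e->history == p->history) {
            *target = e->target;
            return true;
        }
    }
    return false;
}
```

If the history left behind by user code is still in place when the kernel reaches its indirect branch, the lookup hits the kernel-trained entry of the attacker's choosing. Calling `history_clear` at the boundary makes the same lookup miss, which is the kind of effect the quoted FEAT\_CLRBHB and FEAT\_ECBHB text is after.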



#### Meltdown: Reading Kernel Memory from User Space

Moritz Lipp<sup>1</sup>, Michael Schwarz<sup>1</sup>, Daniel Gruss<sup>1</sup>, Thomas Prescher<sup>2</sup>, Werner Haas<sup>2</sup>, Anders Fogh<sup>3</sup>, Jann Horn<sup>4</sup>, Stefan Mangard<sup>1</sup>, Paul Kocher<sup>5</sup>, Daniel Genkin<sup>6,9</sup>, Yuval Yarom<sup>7</sup>, Mike Hamburg<sup>8</sup>

<sup>1</sup>Graz University of Technology, <sup>2</sup>Cyberus Technology GmbH, <sup>3</sup>G-Data Advanced Analytics, <sup>4</sup>Google Project Zero, <sup>5</sup>Independent (www.paulkocher.com), <sup>6</sup>University of Michigan, <sup>7</sup>University of Adelaide & Data61, <sup>8</sup>Rambus, Cryptography Research Division

### Meltdown

    ldrb w0, [x1]         // x1 = 0xffff00000000abcd
    ldr  xzr, [x2, x0]



#### 4KB granule 48-bit OA




Table D8-62 Summary of possible memory access permissions using Direct permissions for a stage 1 translation supporting two Exception levels

| UXN | PXN | AP[2:1] | WXN | Permission                                                  |  |
|-----|-----|---------|-----|-------------------------------------------------------------|--|
| 0   | 0   | 00      | 0   | PrivRead, PrivWrite, PrivExecute, UnprivExecute             |  |
| 0   | 0   | 00      | 1   | PrivRead, PrivWrite, PrivWXN, UnprivExecute                 |  |
| 0   | 0   | 01      | 0   | PrivRead, PrivWrite, UnprivRead, UnprivWrite, UnprivExecute |  |
| 0   | 0   | 01      | 1   | PrivRead, PrivWrite, UnprivRead, UnprivWrite, UnprivWXN     |  |
| 0   | 0   | 10      | x   | PrivRead, PrivExecute, UnprivExecute                        |  |
| 0   | 0   | 11      | x   | PrivRead, PrivExecute, UnprivRead, UnprivExecute            |  |
| 0   | 1   | 00      | x   | PrivRead, PrivWrite, UnprivExecute                          |  |
| 0   | 1   | 01      | 0   | PrivRead, PrivWrite, UnprivRead, UnprivWrite, UnprivExecute |  |
| 0   | 1   | 01      | 1   | PrivRead, PrivWrite, UnprivRead, UnprivWrite, UnprivWXN     |  |
| 0   | 1   | 10      | x   | PrivRead, UnprivExecute                                     |  |
| 0   | 1   | 11      | x   | PrivRead, UnprivRead, UnprivExecute                         |  |
| 1   | 0   | 00      | 0   | PrivRead, PrivWrite, PrivExecute                            |  |
| 1   | 0   | 00      | 1   | PrivRead, PrivWrite, PrivWXN                                |  |
| 1   | 0   | 01      | x   | PrivRead, PrivWrite, UnprivRead, UnprivWrite                |  |
| 1   | 0   | 10      | x   | PrivRead, PrivExecute                                       |  |
| 1   | 0   | 11      | x   | PrivRead, PrivExecute, UnprivRead                           |  |
| 1   | 1   | 00      | x   | PrivRead, PrivWrite                                         |  |
| 1   | 1   | 01      | x   | PrivRead, PrivWrite, UnprivRead, UnprivWrite                |  |
| 1   | 1   | 10      | x   | PrivRead                                                    |  |
| 1   | 1   | 11      | x   | PrivRead, UnprivRead                                        |  |

If a stage 1 translation regime supports two VA ranges, then all of the following are used to select the TTBR\_ELx:

- If VA bit[55] is zero, then TTBR0\_ELx is selected.
- If VA bit[55] is one, then TTBR1\_ELx is selected.
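The two rules above, together with the read/write columns of Table D8-62, fit in a few lines of C. This is a sketch for illustration only: `ttbr_select`, `decode_ap` and `data_perms` are hypothetical names, the execute permissions (UXN, PXN, WXN) are deliberately left out, and the assumption that the Meltdown example address is a kernel mapping with `AP[2:1] = 0b00` is mine rather than the slide's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool priv_read, priv_write, unpriv_read, unpriv_write;
} data_perms;

/* Decode the data-access part of AP[2:1]:
 * AP[2] = 1 makes the page read-only, AP[1] = 1 allows unprivileged access. */
static data_perms decode_ap(unsigned ap)
{
    assert(ap <= 3u);
    bool read_only = (ap & 2u) != 0;
    bool unpriv    = (ap & 1u) != 0;
    data_perms p = { true, !read_only, unpriv, unpriv && !read_only };
    return p;
}

/* Returns 0 for TTBR0_ELx and 1 for TTBR1_ELx, based on VA bit[55]. */
static unsigned ttbr_select(uint64_t va)
{
    return (unsigned)((va >> 55) & 1u);
}
```

For the address used in the Meltdown snippet, `ttbr_select(0xffff00000000abcd)` returns 1, so the walk starts from TTBR1\_ELx. With `AP[2:1] = 0b00`, `decode_ap` reports no unprivileged read, so the architectural outcome of the first load from EL0 is a Permission fault.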


### Whose problem is it?




#### B2.9.4 "Restrictions on the effects of speculation from Armv8.5"

FEAT\_CSV3 introduces these restrictions:

- Data loaded under speculation with a Permission or Domain fault cannot be used to form an address, generate condition codes, or generate SVE predicate values to be used by other instructions in the speculative sequence.
- Any read under speculation from a register that is not architecturally accessible from the current Exception level cannot be used to form an address, to generate condition codes, or to generate SVE predicate values to be used by other instructions in the speculative sequence.
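The first restriction can be illustrated with a toy model of the Meltdown sequence. The `toy_memory` type and both functions are hypothetical, and forwarding zero on a fault is only one assumed way for an implementation to comply; the architecture states what must not happen, not how. The model computes the address that the dependent load `ldr xzr, [x2, x0]` would present to the oracle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 16u

typedef struct {
    uint8_t data[MEM_SIZE];
    bool    unpriv_readable[MEM_SIZE];
} toy_memory;

/* Value forwarded to dependent instructions while the load is still speculative. */
static uint8_t speculative_forward(const toy_memory *m, size_t addr,
                                   bool unprivileged, bool csv3)
{
    assert(m != NULL && addr < MEM_SIZE);
    bool fault = unprivileged && !m->unpriv_readable[addr];
    if (fault && csv3)
        return 0;            /* assumed behaviour: faulting data never feeds an address */
    return m->data[addr];    /* without the restriction, the real data is forwarded */
}

/* Address of the second, dependent load as seen by the oracle (x2 + x0). */
static uint64_t dependent_address(const toy_memory *m, size_t addr,
                                  uint64_t x2, bool csv3)
{
    return x2 + speculative_forward(m, addr, true, csv3);
}
```

Without the restriction the observed address depends on the privileged byte. With it, the address is the same whatever the byte is, so the oracle learns nothing.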





#### Augury: Using Data Memory-Dependent Prefetchers to Leak Data at Rest

Jose Rodrigo Sanchez Vicarte<sup>1</sup>\*, Michael Flanders<sup>1†</sup>, Riccardo Paccagnella\*, Grant Garrett-Grossman\*, Adam Morrison<sup>‡</sup>, Christopher W. Fletcher\*, David Kohlbrenner<sup>†</sup>

\*University of Illinois Urbana-Champaign, <sup>‡</sup>Tel Aviv University, <sup>†</sup>University of Washington

{josers2, rp8, grantlg2, cwfletch}@illinois.edu, mad@cs.tau.ac.il, {mkf727, dkohlbre}@cs.washington.edu

#### GoFetch: Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers

Boru Chen (UIUC), Yingchen Wang (UT Austin), Pradyumna Shome (Georgia Tech), Christopher W. Fletcher (UC Berkeley), David Kohlbrenner (University of Washington), Riccardo Paccagnella (Carnegie Mellon University), Daniel Genkin (Georgia Tech)
### Data prefetchers

#### Stride prefetcher

```
for (int i=0; i<size; i+=stride) {
   sum += array[i];
}
```
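A stride prefetcher watches for accesses like the loop above and guesses the next address. The following is a minimal single-stream sketch; the `stride_pf` type, the confidence threshold and the one-ahead prefetch distance are assumptions chosen for brevity, while real prefetchers track many streams and are tuned very differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     seen;        /* at least one access observed */
    uint64_t last_addr;
    int64_t  stride;      /* last observed address delta */
    unsigned confidence;  /* how many times in a row the delta repeated */
} stride_pf;

/* Feed one demand access. Returns true and sets *prefetch when the same
 * non-zero stride has repeated, predicting the next address in the stream. */
static bool stride_pf_access(stride_pf *s, uint64_t addr, uint64_t *prefetch)
{
    assert(s != NULL && prefetch != NULL);
    bool issue = false;
    if (s->seen) {
        int64_t delta = (int64_t)(addr - s->last_addr);
        if (delta != 0 && delta == s->stride) {
            if (s->confidence < 3)
                s->confidence++;
        } else {
            s->stride = delta;       /* new pattern: start over */
            s->confidence = 0;
        }
        if (s->confidence >= 1) {
            *prefetch = addr + (uint64_t)s->stride;
            issue = true;
        }
    }
    s->last_addr = addr;
    s->seen = true;
    return issue;
}
```

Note that the predicted address depends only on the addresses of earlier accesses, never on the data they returned, which is what distinguishes this classic design from the data memory-dependent prefetchers studied in Augury and GoFetch.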

#### Replay (or SMS) prefetcher

    int process(my_struct *s) {
        s->out = s->field1 + s->field2;
        return s->out;
    }

#### Adjacent prefetcher

```
for (int i=0; i<size; i++) {
    sum += array[i];
}
```







### Concluding thoughts...

- Hardware security issues are not more complex than software ones
  - It's just a blacker blackbox
  - Different background, but offensive mindset translates well
- Researchers' (and attackers'?) understanding of microarchitectures has increased dramatically → We are seeing increasingly complex exploits
  - Keep breaking assumptions about what is and what is not exploitable
- Unexplored attack surface → Lots of opportunities
  - More and more accelerators (GPUs, NPUs, JPegUs? Dafuq! :S)
  - New vendors building their custom silicon
- Increasing number of hardware security features: PAN, PAC, BTI, MTE, PPL, APRR, KTRR...
  - Most bypasses exploit deployment/configuration issues
  - PAN bypass was an interesting case of an architectural (i.e., design) problem
  - Will we start seeing hardware vulns in exploit chains?





¡MUCHAS GRACIAS! ESKERRIK ASKO! (Thank you very much!)