P09 — C compiles to silicon

The whole point of implementing a real ISA (instead of inventing our own opcodes for convenience) is that there’s a real toolchain out there that targets it. Today P09 stopped being a chip you write hand-assembled programs for and started being a chip you compile C to.

tools/riscv-asm/ is the new harness. It’s small — a Makefile, a linker script, a 5-instruction boot stub in start.S, a 70-line bin_to_prog.py that converts riscv64-elf-objcopy -O binary output into a SystemVerilog localparam PROG_FROM_C = { ... }; literal. The flow is:

fib.c
  + start.S
  + p09.ld
       │
       ▼  riscv64-elf-gcc
  fib.elf
       │
       ▼  riscv64-elf-objcopy -O binary
  fib.bin
       │
       ▼  uv run bin_to_prog.py
  fib.svh
       │
       ▼  iverilog tb_c.sv …
  the chip runs the program.
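
The bin_to_prog.py step is mostly byte shuffling: RISC-V instructions are stored little-endian, so every four bytes of fib.bin become one 32-bit word. Here is a minimal sketch of that core step, written in C rather than the tool's Python. The function names (word_le, format_words) are hypothetical, and the 32'h... element format and word ordering are assumptions, since the exact layout of the PROG_FROM_C literal isn't shown here.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack four little-endian bytes into one 32-bit instruction word. */
static uint32_t word_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: render a raw binary image as comma-separated
 * 32'h literals (assumed element format). Returns the number of
 * characters written, or -1 if len is not a multiple of 4 or the
 * output buffer is too small. */
int format_words(const uint8_t *bin, size_t len, char *out, size_t cap) {
    size_t used = 0;
    if (cap == 0 || len % 4 != 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; i += 4) {
        int n = snprintf(out + used, cap - used, "%s32'h%08" PRIx32,
                         i ? ", " : "", word_le(bin + i));
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Feeding it the eight bytes 23 20 00 00 73 00 10 00 (the encodings of sw zero, 0(zero) and ebreak) gives 32'h00002023, 32'h00100073.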

make c-test in projects/09_rv32i_min/test/ chains all of this. It builds the .svh in tools/riscv-asm/, copies it next to the testbench, instantiates top with PROG(PROG_FROM_C), and asserts that dmem[0] holds 55 once the chip halts.

Result on the first clean run: 301 cycles, dmem[0] == 55, PASS.

Two surprises

The first version of fib.c was three lines. Leaving out the fib() definition, the two that matter were:

static volatile unsigned int * const DMEM = (unsigned int *)0x00000000;
int main(void) { DMEM[0] = fib(10); return 0; }

gcc compiled main to two instructions:

sw zero, 0(zero)
ebreak

That’s: write 0 (not 55) to address 0, then trap. Both wrong.

The “write 0” part is gcc treating *(int*)NULL = X as undefined behaviour and silently replacing the value being stored with zero. The ebreak is gcc treating the entire null-pointer write as UB and emitting __builtin_trap() as the body of the function. Both come from -fdelete-null-pointer-checks, which is on by default. With -fno-delete-null-pointer-checks added to the compiler flags, gcc treats address 0 as an ordinary location and emits code that really does store 55 to dmem[0].

This is one of those default optimizations that’s right for hosted programs (where 0 is genuinely null) and wrong for bare-metal programs (where 0 is a real, valid memory address — in P09’s case, the first slot of the data RAM). Worth knowing about.

The second surprise was milder: on the first -fno-delete-null-pointer-checks build, gcc -Os removed fib() entirely because its input was a compile-time constant. fib(10) collapsed to 55 and the loop disappeared. That’s actually fine: the chip still ran the (extremely short) compiled program correctly. But for the demo I wanted gcc to emit a real loop, so I made fib’s argument come from a volatile variable, which the optimizer can’t fold.
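
A minimal sketch of what the volatile-argument version could look like, assuming an iterative fib(). The variable name fib_n and the store_fib() wrapper are hypothetical: on P09 the store would happen in main() through DMEM[0], but taking the destination as a parameter lets the same code run on a host, where address 0 is not writable.

```c
/* The argument lives in a volatile object, so the compiler must read
 * it at run time and cannot fold fib(10) down to the constant 55. */
static volatile unsigned int fib_n = 10;   /* hypothetical name */

/* Iterative Fibonacci: after n steps, a holds fib(n). */
static unsigned int fib(unsigned int n) {
    unsigned int a = 0, b = 1;
    while (n != 0) {
        unsigned int t = b;
        b = a + b;
        a = t;
        n--;
    }
    return a;
}

/* Hypothetical wrapper standing in for main(): on the chip, dmem
 * would be the DMEM pointer to address 0. */
void store_fib(volatile unsigned int *dmem) {
    dmem[0] = fib(fib_n);
}
```

On the chip this still has to be built with -fno-delete-null-pointer-checks, since the store target is address 0.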

Final assembly for main:

main:
    li    a4, 10            # n
    li    a3, 1             # b
    li    a5, 0             # a
loop:
    mv    a2, a3            # t = b
    addi  a4, a4, -1        # n--
    add   a3, a3, a5        # b = b + a
    mv    a5, a2            # a = t
    bne   a4, x0, loop
    sw    a2, 0(x0)         # dmem[0] = result
    li    a0, 0             # return 0
    ret                     # jalr x0, 0(ra) -> back to the boot stub, which then halts

Eleven instructions for main (setup, loop, store and return) + four-instruction boot stub = fifteen instructions total, fitting comfortably in P09’s 256-slot ROM.

What this unlocks

We can now write real C programs and run them on the chip. Not “this assembly looks like what gcc would emit” — actual gcc. That’s the moment a CPU stops being a curiosity and starts being useful. The next demos can be things like:

a CRC8 routine (loop + bitwise ops),
a simple state machine driving a fictional UART output,
a small interpreter for an even simpler instruction set (a CPU running a CPU).
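
As a rough idea of the first item, here is a sketch of a bitwise CRC8 that uses only shifts, XORs and branches, so it needs no multiply and no lookup table in ROM. The choice of polynomial (0x07, initial value 0, no reflection, no final XOR) is an assumption made for illustration, not something P09 requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8 over len bytes.
 * Assumed parameters: polynomial x^8 + x^2 + x + 1 (0x07),
 * initial value 0x00, MSB first, no final XOR. */
uint8_t crc8(const uint8_t *data, size_t len) {
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out the top bit; if it was set, fold in the polynomial. */
            if (crc & 0x80)
                crc = (uint8_t)((crc << 1) ^ 0x07);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}
```

With these parameters the standard check string "123456789" produces 0xF4, which would make a convenient value for the testbench to look for in dmem[0].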

All within the 1 KB ROM ceiling, which is genuinely tight but forces the kind of attention to code size that real embedded work requires.

P09 is now hardened-and-toolchainable. The natural next step is TT-style packaging — but that’s its own project.