The whole point of implementing a real ISA (instead of inventing our own opcodes for convenience) is that there’s a real toolchain out there that targets it. Today P09 stopped being a chip you write hand-assembled programs for and started being a chip you compile C to.
tools/riscv-asm/ is the new harness. It’s small — a Makefile, a
linker script, a 5-instruction boot stub in start.S, a
70-line bin_to_prog.py that converts riscv64-elf-objcopy -O binary
output into a SystemVerilog localparam PROG_FROM_C = { ... };
literal. The flow is:
  fib.c
+ start.S
+ p09.ld
    │
    ▼  riscv64-elf-gcc
  fib.elf
    │
    ▼  riscv64-elf-objcopy -O binary
  fib.bin
    │
    ▼  uv run bin_to_prog.py
  fib.svh
    │
    ▼  iverilog tb_c.sv …
  the chip runs the program.
make c-test in projects/09_rv32i_min/test/ chains all of
this. It builds the .svh in tools/riscv-asm/, copies it next to
the testbench, instantiates top with PROG(PROG_FROM_C), and
asserts the chip writes 55 to dmem[0] after halt.
Result on the first clean run: 301 cycles, dmem[0] == 55, PASS.
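The real converter is the Python script named above, and its source isn't shown here. Purely as an illustration of the step it performs, here is a C sketch that packs the raw bytes of fib.bin into 32-bit words and prints each as a SystemVerilog hex literal. It assumes little-endian word order (which is how RISC-V instructions are stored) and one sized literal per word; le32 and format_word are hypothetical names, and the exact layout of the PROG_FROM_C literal is not something this sketch claims to reproduce.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble one 32-bit instruction word from four little-endian bytes,
 * i.e. the byte order objcopy -O binary emits for an RV32 image. */
uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write the word as a sized SystemVerilog hex literal, e.g. 32'h00100073.
 * Assumed per-word format; returns the number of characters written. */
int format_word(uint32_t word, char *out, size_t cap) {
    return snprintf(out, cap, "32'h%08x", (unsigned int)word);
}
```

For example, the four bytes 73 00 10 00 are the encoding of ebreak and come out as 32'h00100073.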
Two surprises
The first version of fib.c was three lines:
static volatile unsigned int * const DMEM = (unsigned int *)0x00000000;
int main(void) { DMEM[0] = fib(10); return 0; }
gcc compiled main to two instructions:
sw zero, 0(zero)
ebreak
That’s: write 0 (not 55) to address 0, then trap. Both wrong.
The “write 0” part is gcc treating a store through a null pointer,
*(int*)NULL = X, as undefined behaviour and silently replacing the
stored value with zero. The ebreak is gcc deciding the null-pointer
write can never legitimately execute and emitting __builtin_trap()
as the rest of the function. Both follow from
-fdelete-null-pointer-checks, which is enabled by default on most
targets and lets gcc assume that nothing lives at address 0. With
-fno-delete-null-pointer-checks added, gcc emits code that actually
stores 55 to dmem[0].
This is one of those default optimizations that’s right for hosted programs (where 0 is genuinely null) and wrong for bare-metal programs (where 0 is a real, valid memory address — in P09’s case, the first slot of the data RAM). Worth knowing about.
The second surprise was milder: gcc -Os removed fib() entirely
on the first -fno-delete-null-pointer-checks build because the
input was a compile-time constant. fib(10) collapsed to 55
and the loop disappeared. That’s actually fine — the chip still
ran the (extremely short) compiled program correctly. But for the
demo I wanted gcc to emit a real loop, so I made fib’s argument
come from a volatile variable so the optimizer can’t fold it.
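The final fib.c isn't reproduced here, so the following is only a sketch of the volatile trick, not the author's file. The names fib_input and store_result are made up for illustration, and the memory pointer is passed in as a parameter so the same code can also be exercised on a host machine.

```c
/* Sketch of a fib.c whose input the optimizer cannot constant-fold.
 * fib_input and store_result are hypothetical names. */

/* volatile forces a real load at run time, so gcc cannot replace
 * fib(fib_input) with the constant 55. */
volatile unsigned int fib_input = 10;

/* Iterative Fibonacci: fib(0) = 0, fib(1) = 1, fib(10) = 55. */
unsigned int fib(unsigned int n) {
    unsigned int a = 0, b = 1;
    while (n-- > 0) {
        unsigned int t = b;  /* t = b     */
        b = a + b;           /* b = b + a */
        a = t;               /* a = t     */
    }
    return a;
}

/* Store the result in slot 0 of whatever memory the caller passes in. */
void store_result(volatile unsigned int *dmem) {
    dmem[0] = fib(fib_input);
}

/* On the chip, main would pass the data-RAM base (address 0 on P09):
 *     int main(void) { store_result((volatile unsigned int *)0); return 0; }
 * That store to address 0 still needs -fno-delete-null-pointer-checks.
 */
```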
Final assembly for main:
main:
    li   a4, 10          # n
    li   a3, 1           # b
    li   a5, 0           # a
loop:
    mv   a2, a3          # t = b
    addi a4, a4, -1      # n--
    add  a3, a3, a5      # b = b + a
    mv   a5, a2          # a = t
    bne  a4, x0, loop
    sw   a2, 0(x0)       # dmem[0] = result
    li   a0, 0           # return 0
    ret                  # jalr x0, 0(ra) -> back to the boot stub, which halts
Eleven instructions of fib loop + four-instruction boot stub = fifteen instructions total, fitting comfortably in P09’s 256-slot ROM.
What this unlocks
We can now write real C programs and run them on the chip. Not “this assembly looks like what gcc would emit” — actual gcc. That’s the moment a CPU stops being a curiosity and starts being useful. The next demos can be things like:
- a CRC8 routine (loop + bitwise ops),
- a simple state machine driving a fictional UART output,
- a small interpreter for an even simpler instruction set (a CPU running a CPU).
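To give a feel for the size of the first item, here is a minimal bitwise CRC-8 sketch. The parameters are an assumption, not something the project has settled on: polynomial 0x07, initial value 0, no reflection, no final XOR. It needs only shifts, XORs and branches, all of which are in the RV32I base ISA.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8 over len bytes.
 * Assumed parameters: polynomial x^8 + x^2 + x + 1 (0x07), init 0,
 * no input/output reflection, no final XOR. */
uint8_t crc8(const uint8_t *data, size_t len) {
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                       /* fold in the next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80u)                  /* top bit set: shift and reduce */
                crc = (uint8_t)((crc << 1) ^ 0x07u);
            else                              /* top bit clear: just shift */
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}
```

With these parameters the single byte 0x01 yields 0x07, and the standard check string "123456789" yields 0xF4, which makes a convenient value for a testbench to assert on.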
All within the 1 KB ROM ceiling, which is genuinely tight but forces the kind of attention to code size that real embedded work requires.
P09 is now hardened-and-toolchainable. The natural next step is TT-style packaging — but that’s its own project.