journal 2026-05-03

P45 - atomics on the chip

p45 riscv atomics linux-path

P44 was the framebuffer demo - a side rung. P45 is the first rung back on the Linux climb: the A extension. Eleven new instructions (lr.w, sc.w, and nine amo*.w ops), a 32-bit reservation register, and two new FSM states for read-modify-write.

The implementation is small because we’re on a single-hart in-order core. The reservation set is one address register plus a valid bit; loads/stores can’t be reordered across each other anyway, so aq/rl ordering bits are effectively zero-cost (we ignore them). AMO ops just walk through S_AMO_LOAD -> S_AMO_STORE -> S_WB, with a combinational modify between the load and store.
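To make the reservation and AMO flow concrete, here is a rough behavioral sketch in Python. It is not the RTL. The class name, the sparse dict memory, and the method names are all made up for illustration; only the state names in the comments come from the design. It assumes word-aligned addresses and does not model whether ordinary stores or AMOs disturb the reservation, since that detail isn't stated here.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as two's-complement signed."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

# The combinational "modify" step for the nine amo*.w ops:
# (old memory value, rs2 value) -> value written back to memory.
AMO_OPS = {
    "amoswap": lambda old, src: src,
    "amoadd":  lambda old, src: (old + src) & MASK32,
    "amoxor":  lambda old, src: old ^ src,
    "amoand":  lambda old, src: old & src,
    "amoor":   lambda old, src: old | src,
    "amomin":  lambda old, src: old if to_signed(old) <= to_signed(src) else src,
    "amomax":  lambda old, src: old if to_signed(old) >= to_signed(src) else src,
    "amominu": lambda old, src: min(old, src),
    "amomaxu": lambda old, src: max(old, src),
}

class AtomicsModel:
    """Hypothetical single-hart model: one reservation address plus a valid bit."""

    def __init__(self):
        self.mem = {}            # sparse word memory (illustrative only)
        self.resv_addr = 0       # the 32-bit reservation register
        self.resv_valid = False  # the valid bit

    def lr_w(self, addr):
        # Load-reserved: record the address and mark the reservation valid.
        self.resv_addr = addr
        self.resv_valid = True
        return self.mem.get(addr, 0)

    def sc_w(self, addr, value):
        # Store-conditional: write only if the reservation is valid and matches.
        ok = self.resv_valid and self.resv_addr == addr
        if ok:
            self.mem[addr] = value & MASK32
        self.resv_valid = False  # sc.w drops the reservation whether or not it succeeded
        return 0 if ok else 1    # rd = 0 on success, nonzero on failure

    def amo(self, op, addr, src):
        old = self.mem.get(addr, 0)                      # S_AMO_LOAD
        self.mem[addr] = AMO_OPS[op](old, src & MASK32)  # modify, then S_AMO_STORE
        return old                                       # S_WB: rd gets the OLD value
```

A second sc.w without a fresh lr.w fails in this model, which is the property a directed probe would want to check first.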

Two real bugs surfaced during bring-up.

Bug 1: ALU forgot about AMO

First run of the directed probe halted with error 40: AMOSWAP’s “OLD value” came back wrong. The diagnosis took maybe ten minutes. The ALU’s effective-address case lists is_load, is_store, and a few others, but I’d forgotten to add is_amo, so mem_addr = alu_y_q = 0 for every AMO op. Memory at address 0 is the start of the FreeRTOS image, which has nonzero bytes there but not the value the test wrote.

Fix: add is_amo to the load/store case, set alu_b = 0 for AMO (so the address is just rs1).
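The failure mode and the fix can be sketched as a tiny Python model of the address select. This is a simplification, not the actual ALU: the function name and the amo_wired switch are hypothetical, and "result stays 0 when no case matches" is an assumption standing in for alu_y_q = 0. Forcing alu_b to 0 matters because AMO encodings have no immediate offset; the bits an I-type load would use for its immediate hold funct5, aq, rl and rs2 instead.

```python
def effective_address(rs1, imm, *, is_load=False, is_store=False,
                      is_amo=False, amo_wired=True):
    """Model of mem_addr selection. amo_wired=False reproduces the bug."""
    uses_addr_path = is_load or is_store or (is_amo and amo_wired)
    if not uses_addr_path:
        # No case matched, so the ALU result is modeled as 0 (assumption).
        return 0
    alu_a = rs1
    alu_b = 0 if is_amo else imm  # AMO: address is just rs1
    return (alu_a + alu_b) & 0xFFFFFFFF


# Before the fix: every AMO reads and writes address 0.
buggy = effective_address(0x2000, 0x7FF, is_amo=True, amo_wired=False)
# After the fix: the address is rs1, regardless of what sits in the imm bits.
fixed = effective_address(0x2000, 0x7FF, is_amo=True)
```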

This is the kind of bug that bites you specifically when you add a new opcode that “fits” in existing infrastructure but needs an explicit reference somewhere to be wired up. The decoder added is_amo just fine; the ALU forgot.

Bug 2: t0 IS x5

After fixing the ALU, the directed probe passed. Then I ran the arch-test sweep. Every test halted with x5 = 0x10001ff8.

That value is MMIO_HALT. So the sequence in the DUT plugin’s RVMODEL_HALT_PASS macro was clobbering x5 after setting it to 1. The macro looked like:
li x5, 1
li t0, 0x10001ff8
sw x1, 0(t0)

And t0 in the RISC-V ABI is x5. Same physical register. So li t0, 0x10001ff8 overwrites x5 = 1 with 0x10001ff8. By the time the chip halts, x5 is the MMIO address.

Fix: use t3/t4 as scratches for the halt-port write (those are x28/x29, never aliased to x5), and re-order so li x5, 1 happens after the MMIO_HALT store. Then x5 sits at 1 when halted goes high.
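The alias is easy to check mechanically. Below is a small Python sketch that resolves ABI names to x-register numbers and replays only the li instructions of a sequence. The helper names are hypothetical, and the "fixed" sequence is a guess at the shape of the new macro based on the description above (t3 as the address scratch, li x5, 1 moved last), not a copy of it.

```python
# Standard RISC-V integer ABI names mapped to x-register numbers.
ABI = {"zero": 0, "ra": 1, "sp": 2, "gp": 3, "tp": 4,
       "t0": 5, "t1": 6, "t2": 7, "s0": 8, "fp": 8, "s1": 9}
ABI.update({f"a{i}": 10 + i for i in range(8)})      # a0-a7  -> x10-x17
ABI.update({f"s{i}": 16 + i for i in range(2, 12)})  # s2-s11 -> x18-x27
ABI.update({f"t{i}": 25 + i for i in range(3, 7)})   # t3-t6  -> x28-x31

def reg_index(name):
    """Resolve 'x5' or 't0' style names to a register number."""
    if name in ABI:
        return ABI[name]
    if name.startswith("x") and name[1:].isdigit() and int(name[1:]) < 32:
        return int(name[1:])
    raise ValueError(f"unknown register: {name}")

def run_li_sequence(lines):
    """Replay only the 'li rd, imm' lines; other instructions are ignored."""
    regs = [0] * 32
    for line in lines:
        mnemonic, operands = line.split(None, 1)
        if mnemonic != "li":
            continue
        rd, imm = (part.strip() for part in operands.split(","))
        idx = reg_index(rd)
        if idx != 0:  # x0 is hardwired to zero
            regs[idx] = int(imm, 0) & 0xFFFFFFFF
    return regs

original = ["li x5, 1", "li t0, 0x10001ff8", "sw x1, 0(t0)"]
# Assumed shape of the fix: scratch moved to t3, x5 written last.
fixed = ["li t3, 0x10001ff8", "sw x1, 0(t3)", "li x5, 1"]

x5_before = run_li_sequence(original)[5]
x5_after = run_li_sequence(fixed)[5]
```

With the original ordering x5_before ends up holding the MMIO address; with the reordered version x5_after is 1.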

The catch is purely about the ABI alias and our legacy testbench checking x5. If we’d switched the testbench to read halt_code exclusively (P43’s MMIO halt port), the bug wouldn’t have mattered. But halt_code is per-project, and the P17 testbench shape that the arch-test runner uses still reads x5. So we keep both.

What works now

What’s next

P46 - the C extension (compressed). That’s a meaningfully bigger RTL change: 16-bit instruction decode, fetch alignment fixup (instructions can start on any 16-bit boundary), and a 32-bit expansion table for ~75 c.* encodings. The FSM front-end gets restructured; the back-end stays mostly the same.
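For a feel of what the front-end work involves, here is a hedged Python sketch of two pieces: telling 16-bit from 32-bit encodings by the low two bits, and expanding one compressed instruction (c.addi) into its 32-bit addi form. The function names are hypothetical and this covers a single entry of the eventual table. It ignores encodings longer than 32 bits and does not special-case c.nop or hint encodings.

```python
def insn_length(low_halfword):
    """Length in bytes: low two bits of 0b11 mean a 32-bit encoding."""
    return 4 if (low_halfword & 0b11) == 0b11 else 2

def sign_extend(value, bits):
    """Sign-extend a 'bits'-wide field to a Python int."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def expand_c_addi(c):
    """Expand c.addi rd, imm (quadrant 01, funct3 000) to addi rd, rd, imm."""
    if (c & 0b11) != 0b01 or (c >> 13) != 0b000:
        raise ValueError("not a c.addi encoding")
    rd = (c >> 7) & 0x1F
    # imm[5] lives in bit 12, imm[4:0] in bits 6:2.
    imm = sign_extend((((c >> 12) & 1) << 5) | ((c >> 2) & 0x1F), 6)
    # I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode (0x13 = OP-IMM).
    return ((imm & 0xFFF) << 20) | (rd << 15) | (0b000 << 12) | (rd << 7) | 0x13

# c.addi a0, 1 (0x0505) expands to addi a0, a0, 1 (0x00150513).
expanded = expand_c_addi(0x0505)
```

Because a 32-bit instruction can now start at a halfword boundary, the fetch side has to run insn_length on the first halfword before it knows whether a second one is needed.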

Or maybe Zba+Zbb first (P47 in the roadmap), since that’s smaller and additive. Engineering pragmatism vs roadmap order. We’ll see.