P44 was the framebuffer demo - a side rung. P45 is the first rung back on the Linux climb: the A extension. Eleven new instructions (lr.w, sc.w, and nine amo*.w ops), a 32-bit reservation register, and two new FSM states for read-modify-write.
The implementation is small because we’re on a single-hart in-order
core. The reservation set is one address register plus a valid
bit; loads/stores can’t be reordered across each other anyway, so
aq/rl ordering bits are effectively zero-cost (we ignore them).
AMO ops just walk through S_AMO_LOAD -> S_AMO_STORE -> S_WB,
with a combinational modify between the load and store.
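To make the load/modify/store walk concrete, here is a minimal behavioral sketch in Python. It is not the RTL: the function names, the dict-as-memory, and the string op names are all hypothetical, chosen only for illustration. The per-op semantics follow the A-extension definitions (signed compare for min/max, unsigned for minu/maxu, rd receives the old memory word).

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def amo_modify(op, old, src):
    """Combinational modify: value stored back, given loaded word and rs2."""
    old &= MASK32
    src &= MASK32
    if op == "swap":
        return src
    if op == "add":
        return (old + src) & MASK32
    if op == "xor":
        return old ^ src
    if op == "and":
        return old & src
    if op == "or":
        return old | src
    if op == "min":
        return old if to_signed(old) <= to_signed(src) else src
    if op == "max":
        return old if to_signed(old) >= to_signed(src) else src
    if op == "minu":
        return min(old, src)
    if op == "maxu":
        return max(old, src)
    raise ValueError(f"unknown AMO op: {op}")

def amo(mem, op, addr, src):
    """Model of S_AMO_LOAD -> modify -> S_AMO_STORE -> S_WB."""
    old = mem[addr]                        # S_AMO_LOAD
    mem[addr] = amo_modify(op, old, src)   # S_AMO_STORE
    return old                             # S_WB: rd gets the old word

mem = {0x100: 5}                           # hypothetical one-word memory
old = amo(mem, "add", 0x100, 3)            # old == 5, mem[0x100] == 8
```

The signed/unsigned split is the part most worth a directed check: 0xFFFFFFFF is the smallest value for amomin.w but the largest for amomaxu.w.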
Two real bugs surfaced during bring-up.
Bug 1: ALU forgot about AMO
First run of the directed probe halted with error 40 — AMOSWAP’s
“OLD value” came back wrong. The diagnosis took maybe ten minutes:
the ALU’s effective-address switch case lists is_load, is_store,
and a few others, but I’d forgotten to add is_amo. So
mem_addr = alu_y_q = 0 for every AMO op. Memory at address 0 is
the start of the FreeRTOS image, which has nonzero bytes there but
not the value the test wrote.
Fix: add is_amo to the load/store case, and set alu_b = 0 for AMO
(AMO encodings carry no immediate, so the address is just rs1).
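The shape of the bug and the fix can be sketched as a small Python model. This is an assumption-heavy illustration, not the actual ALU: the function names and the `fixed` flag are hypothetical, and the "unlisted kinds fall through to 0" default is inferred from the symptom described above.

```python
def alu_operands(kind, rs1, imm, fixed=True):
    """Return (alu_a, alu_b) for the effective-address add."""
    addr_kinds = {"load", "store"}
    if fixed:
        addr_kinds.add("amo")      # the missing case
    if kind not in addr_kinds:
        return 0, 0                # assumed default: operands stay 0
    # AMO has no immediate field, so the address is rs1 + 0.
    alu_b = 0 if kind == "amo" else imm
    return rs1, alu_b

def mem_addr(kind, rs1, imm=0, fixed=True):
    """Effective address as the memory port would see it (32-bit wrap)."""
    a, b = alu_operands(kind, rs1, imm, fixed)
    return (a + b) & 0xFFFFFFFF

buggy = mem_addr("amo", 0x2000, fixed=False)   # 0: AMO hits address 0
good = mem_addr("amo", 0x2000)                 # 0x2000: address is rs1
```

Forcing alu_b to 0 matters because the bits where an I-type immediate would sit hold funct5, aq, rl, and rs2 in an AMO encoding; feeding them to the adder would produce a garbage offset.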
This is the kind of bug that bites you specifically when you add a
new opcode that “fits” in existing infrastructure but needs an
explicit reference somewhere to be wired up. The decoder added is_amo
just fine; the ALU forgot.
Bug 2: t0 IS x5
After fixing the ALU, the directed probe passed. Then I ran the
arch-test sweep. Every test halted with x5 = 0x10001ff8.
That value is MMIO_HALT. So the sequence in the DUT plugin’s
RVMODEL_HALT_PASS macro was clobbering x5 after setting it
to 1. The macro looked like:
li x5, 1
li t0, 0x10001ff8
sw x1, 0(t0)
And t0 in the RISC-V ABI is x5. Same physical register. So
li t0, 0x10001ff8 overwrites x5 = 1 with 0x10001ff8. By the
time the chip halts, x5 is the MMIO address.
Fix: use t3/t4 as scratches for the halt-port write (those are
x28/x29, never aliased to x5), and re-order so li x5, 1 happens
after the MMIO_HALT store. Then x5 sits at 1 when halted goes
high.
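The alias is easy to reproduce in a toy register-file model. The Python sketch below is hypothetical throughout: the tuple instruction format and the `run` helper are mine, and the "fixed" sequence is only a guess at the shape of the new macro (the value written to the halt port and the use of t4 for it are assumptions). It tracks register and store effects only, not halt timing.

```python
MMIO_HALT = 0x10001FF8

# ABI temporaries and their architectural register numbers.
ABI = {"t0": 5, "t1": 6, "t2": 7, "t3": 28, "t4": 29}

def reg_index(name):
    """Map 'x5' or an ABI name like 't0' to a register number."""
    return ABI[name] if name in ABI else int(name[1:])

def run(program):
    """Execute ('li', rd, imm) and ('sw', rs2, base); return (regs, stores)."""
    regs = [0] * 32
    stores = []
    for op, a, b in program:
        if op == "li":
            idx = reg_index(a)
            if idx != 0:                       # x0 stays hardwired to zero
                regs[idx] = b & 0xFFFFFFFF
        elif op == "sw":
            stores.append((regs[reg_index(b)], regs[reg_index(a)]))
    return regs, stores

buggy = [("li", "x5", 1), ("li", "t0", MMIO_HALT), ("sw", "x1", "t0")]
fixed = [("li", "t3", MMIO_HALT), ("li", "t4", 1),
         ("sw", "t4", "t3"), ("li", "x5", 1)]   # assumed shape of the fix

buggy_regs, _ = run(buggy)    # buggy_regs[5] == 0x10001ff8
fixed_regs, _ = run(fixed)    # fixed_regs[5] == 1
```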
The catch is purely about the ABI alias and our legacy testbench
checking x5. If we’d switched the testbench to read halt_code
exclusively (P43’s MMIO halt port), the bug wouldn’t have mattered.
But halt_code is per-project, and the P17 testbench shape that
the arch-test runner uses still reads x5. So we keep both.
What works now
- Directed probe: 18 sub-checks, all pass (AMOSWAP/ADD/AND/OR/XOR/MIN/MAX/MINU/MAXU + LR/SC happy + LR/SC fail no-prior-LR + LR/SC fail intervening-store).
- FreeRTOS demo: still runs cleanly to halt_code=1 at 5,101,270 clocks (vs P43’s 5,100,751 - the difference is the new probe).
- ISA: RV32IMA + Zicsr + Zifencei + Zicntr. One letter closer to RV32IMAC.
- arch-test sweep: results pending; the I/M/Zicsr/Zifencei/Zicntr batches should match P39, and the new A batch is the first thing we’ll have a recorded count for.
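For reference, the three LR/SC cases in the directed probe can be modeled with the reservation set described earlier: one address register plus a valid bit. This Python sketch is a behavioral guess, not the RTL; the class and method names are hypothetical. The rule that any sc.w clears the reservation comes from the A-extension spec. The rule that an ordinary store also clears it is an assumption inferred from the intervening-store check.

```python
MASK32 = 0xFFFFFFFF

class Hart:
    """Single-hart model: word memory plus a one-entry reservation set."""

    def __init__(self):
        self.mem = {}
        self.resv_addr = 0
        self.resv_valid = False

    def lr_w(self, addr):
        """Load the word and register a reservation on its address."""
        self.resv_addr = addr
        self.resv_valid = True
        return self.mem.get(addr, 0)

    def sc_w(self, addr, value):
        """Store only if the reservation matches; return 0 on success, 1 on failure."""
        ok = self.resv_valid and self.resv_addr == addr
        if ok:
            self.mem[addr] = value & MASK32
        self.resv_valid = False        # every sc.w consumes the reservation
        return 0 if ok else 1

    def sw(self, addr, value):
        """Ordinary store; assumed to clear the reservation."""
        self.mem[addr] = value & MASK32
        self.resv_valid = False

h = Hart()
h.lr_w(0x40)
happy = h.sc_w(0x40, 7)               # 0: reservation was valid

no_prior_lr = Hart().sc_w(0x40, 7)    # 1: nothing reserved

h2 = Hart()
h2.lr_w(0x40)
h2.sw(0x40, 9)
intervening = h2.sc_w(0x40, 7)        # 1: the store cleared the reservation
```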
What’s next
P46 - the C extension (compressed). That’s a meaningfully bigger RTL change: 16-bit instruction decode, fetch alignment fixup (instructions can start on any 16-bit boundary), and a 32-bit expansion table for ~75 c.* encodings. The FSM front-end gets restructured; the back-end stays mostly the same.
Or maybe Zba+Zbb first (P47 in the roadmap), since that’s smaller and additive. Engineering pragmatism vs roadmap order. We’ll see.