Autonomous-mode rung, 8th past P45. The big architectural piece — the chip’s first MMU.
What landed
Two new FSM states (S_PTW1, S_PTW0) walk the Sv32 page tables.
On any load/store from S-mode with satp.MODE=1, the chip now:
- Saves the VA, transitions to S_PTW1.
- S_PTW1 issues a memory read at (satp.PPN<<12) + VPN[1]*4. On response, validates the PTE and either traps on invalid/leaf-at-L1 (we don’t support megapages yet) or carries the L1 PPN to S_PTW0.
- S_PTW0 reads at (L1.PPN<<12) + VPN[0]*4. Validates leaf PTE: V set, R/W matches access, A set, D set if store, U not set (S can’t access U pages without SUM, which we don’t have). On success, sets alu_y_q = {pte.PPN, va.offset} and transitions to S_MEM.
- S_MEM does the actual load/store at the translated PA.
- On any walker failure, raises load/store page-fault with VA in mtval.
The probe builds a real page table in BSS, plants known data at a chosen VA, mret-s to S, does load+store via that VA, then clears PTE.V to trigger a fault. PASS at 5,245,499 clocks.
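The walk and the probe sequence above can be modelled in a few lines of Python. This is a behavioural sketch, not the RTL: the addresses ROOT, L0, DATA_PA and TEST_VA and the helper names are made up for illustration, and memory is a plain dict keyed by byte address.

```python
# Behavioural model of the two-level Sv32 walk (illustrative, not the RTL).
PTE_V, PTE_R, PTE_W, PTE_X, PTE_U, PTE_A, PTE_D = (1 << i for i in (0, 1, 2, 3, 4, 6, 7))

# Hypothetical addresses, chosen only for this example.
ROOT, L0, DATA_PA, TEST_VA = 0x80010000, 0x80011000, 0x80012000, 0x40001000


class PageFault(Exception):
    """Load/store page fault; `va` is the value that would land in mtval."""
    def __init__(self, va, is_store):
        super().__init__(f"{'store' if is_store else 'load'} page fault at {va:#010x}")
        self.va = va
        self.is_store = is_store


def translate(mem, satp, va, is_store):
    """Return the PA for `va` or raise PageFault. `mem` maps byte address -> 32-bit word."""
    if not (satp >> 31) & 1:
        return va                                   # MODE=0: no translation
    vpn1, vpn0, offset = (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF

    # S_PTW1: read the level-1 PTE at (satp.PPN<<12) + VPN[1]*4.
    pte1 = mem.get(((satp & 0x3FFFFF) << 12) + vpn1 * 4, 0)
    if not pte1 & PTE_V or pte1 & (PTE_R | PTE_W | PTE_X):
        raise PageFault(va, is_store)               # invalid, or leaf at L1 (megapage)

    # S_PTW0: read the level-0 PTE at (L1.PPN<<12) + VPN[0]*4; it must be a usable leaf.
    pte0 = mem.get(((pte1 >> 10) << 12) + vpn0 * 4, 0)
    required = PTE_V | PTE_A | ((PTE_W | PTE_D) if is_store else PTE_R)
    if (pte0 & required) != required or pte0 & PTE_U:
        raise PageFault(va, is_store)

    # S_MEM address: {pte.PPN, va.offset}
    return ((pte0 >> 10) << 12) | offset


def build_demo_table():
    """Map TEST_VA -> DATA_PA with one pointer PTE and one leaf PTE."""
    mem = {
        ROOT + ((TEST_VA >> 22) & 0x3FF) * 4: ((L0 >> 12) << 10) | PTE_V,
        L0 + ((TEST_VA >> 12) & 0x3FF) * 4:
            ((DATA_PA >> 12) << 10) | PTE_V | PTE_R | PTE_W | PTE_A | PTE_D,
    }
    satp = (1 << 31) | (ROOT >> 12)
    return mem, satp
```

With `mem, satp = build_demo_table()`, both `translate(mem, satp, TEST_VA + 0x10, False)` and the store variant return `DATA_PA + 0x10`. Clearing PTE_V in the leaf entry makes the next call raise PageFault with the original VA, which is the sequence the probe exercises.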
A subtle NBA bug
Found a non-blocking-assignment timing bug during bring-up:
ptw_va_q <= alu_y_q reads the previous instruction’s
alu_y_q because alu_y_q <= alu_y is also an NBA in the
same S_EXECUTE cycle. Test 1 (load) passed by coincidence
because the prior li t0, TEST_VA had left alu_y_q at
TEST_VA. Test 2 (store) failed because a li t1, 0xCAFEBABE
intervened. Fix: read combinational alu_y directly instead
of going through the latched alu_y_q.
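The effect is easy to reproduce in a small two-phase register model, sketched here in Python rather than Verilog. It is a simplification under stated assumptions: each list element stands for the ALU result of one S_EXECUTE cycle, ptw_va_q is captured on every cycle (the real design presumably captures it only for loads/stores), and TEST_VA is a made-up value.

```python
TEST_VA = 0x40001000  # hypothetical test address


def execute_cycle(regs, alu_y, use_combinational):
    """One clock edge. Every right-hand side sees the pre-edge register values."""
    nxt = dict(regs)
    nxt["alu_y_q"] = alu_y                    # alu_y_q <= alu_y
    if use_combinational:
        nxt["ptw_va_q"] = alu_y               # fixed:  ptw_va_q <= alu_y
    else:
        nxt["ptw_va_q"] = regs["alu_y_q"]     # buggy:  ptw_va_q <= alu_y_q (stale)
    return nxt


def captured_va(alu_results, use_combinational):
    """Run the cycles in order and return the VA the walker would see."""
    regs = {"alu_y_q": 0, "ptw_va_q": 0}
    for alu_y in alu_results:
        regs = execute_cycle(regs, alu_y, use_combinational)
    return regs["ptw_va_q"]


# Test 1: li t0, TEST_VA ; load via t0   -> ALU results TEST_VA, TEST_VA
load_seq = [TEST_VA, TEST_VA]
# Test 2: li t0, TEST_VA ; li t1, 0xCAFEBABE ; store via t0
store_seq = [TEST_VA, 0xCAFEBABE, TEST_VA]
```

In this model `captured_va(load_seq, False)` still yields TEST_VA, which is the coincidence that hid the bug, while `captured_va(store_seq, False)` yields 0xCAFEBABE. With `use_combinational=True` both sequences yield TEST_VA.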
Pre-existing probe regression
The P47 storage probe was writing 0xC0DE0009 to satp and
reading it back. With the walker, that canary’s bit-31 set
leaves MODE=Sv32 enabled, and any later S-mode load/store
walks an unset page table → page faults. Updated the P47
probe to clear satp after its round-trip.
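Decoding the canary with the RV32 satp layout (MODE in bit 31, ASID in bits 30:22, PPN in bits 21:0) shows why it is harmful once the walker exists. The helper name below is illustrative only.

```python
def decode_satp(satp):
    """Split a 32-bit satp value into its RV32 fields."""
    return {
        "mode": (satp >> 31) & 0x1,     # 1 = Sv32
        "asid": (satp >> 22) & 0x1FF,
        "ppn": satp & 0x3FFFFF,         # root page table = ppn << 12
    }


canary = decode_satp(0xC0DE0009)
# mode = 1, asid = 0x103, ppn = 0x1E0009, so the walker would look for a
# root table at physical address 0x1E0009000, where the probe never built one.
```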
What’s NOT done
The minimum-viable cut leaves real holes:
- Instruction fetch translation. Kernel code in S still fetches via PA. A Linux kernel can be made to identity-map itself initially, but real userspace can’t run yet.
- Megapages. Linux uses 4 MiB pages for the kernel direct-map; we trap on leaf-at-L1.
- A/D updates. We trap; software must pre-set both.
- TLB. Every memory access walks. ~10× slowdown vs a TLB-equipped chip.
- sfence.vma. Not decoded (illegal-instr).
- AMO translation. AMO ops in S use raw VA. A small follow-up.
- SUM/MXR/MPRV.
A real Linux boot will need at least megapages, instruction-
fetch translation, and sfence.vma. None are deep changes
from here — they reuse the same walker.
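As a rough illustration of how small the megapage step is, here is how a level-1 leaf would form its PA under the Sv32 rules: the upper 12 PPN bits come from the PTE, the low 22 bits come from the VA, and a leaf with nonzero PPN[0] is a misaligned superpage that must fault. This is a sketch of the spec behaviour, not of anything in the current RTL; the function name and the example mapping are hypothetical.

```python
def megapage_pa(pte1, va):
    """PA for a level-1 leaf PTE, or None if the superpage is misaligned."""
    ppn0 = (pte1 >> 10) & 0x3FF       # must be zero for a 4 MiB leaf
    ppn1 = (pte1 >> 20) & 0xFFF
    if ppn0 != 0:
        return None                   # misaligned superpage: page fault
    return (ppn1 << 22) | (va & 0x3FFFFF)


# Hypothetical mapping: a 4 MiB page at PA 0x80400000, flags V|R|W|X|A|D.
FLAGS = 0xCF
aligned_pte = ((0x80400000 >> 12) << 10) | FLAGS
misaligned_pte = ((0x80401000 >> 12) << 10) | FLAGS
```

Here `megapage_pa(aligned_pte, 0xC0400123)` gives 0x80400123, and `megapage_pa(misaligned_pte, 0xC0400123)` gives None. The permission and A/D checks would be the same ones S_PTW0 already performs.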
Wart count
Only the open one (P46 gcc-zbb sext.b hang).
What’s next
The architectural ladder is essentially done; from here it’s platform glue. Next rungs in roadmap order:
- CLINT-shaped timer so a stock Linux DTS finds its timer interrupt.
- PLIC-shaped external interrupt controller.
- Larger external memory model (256 KiB → 16+ MiB) so we can actually load a kernel image.
- Device tree + minimal SBI.
Then BusyBox initramfs and the actual kernel boot. We’re much closer than I expected to be when “go to sleep please continue down the path” started 7 rungs ago.