Six projects on the ladder, half the roadmap. P06 is the first thing that deserves to be called a CPU: P05’s datapath gets a control unit, an instruction ROM, and a 4-state FSM — FETCH → DECODE → EXECUTE → WB.
What happened
The smallest defensible CPU. 5-bit PC walking a 32-entry × 16-bit ROM. 16-op encoding (4-bit opcode, 3 register fields, 8-bit imm for LDI, 5-bit absolute branch target). Branches read flags from the most-recent flag-writing instruction. HLT parks the FSM in S_HALT.
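The control flow described above can be sketched as a small Python model. This is an illustration only, not the RTL: the post does not say in which state HLT is detected, so parking from EXECUTE is an assumption, and the names `State`, `next_state`, and `run` are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()
    WB = auto()
    HALT = auto()  # stands in for S_HALT


def next_state(state, is_hlt):
    """Advance the FSM by one clock.

    Assumption: HLT is recognised in EXECUTE and parks the FSM there.
    """
    if state is State.HALT:
        return State.HALT  # parked for good
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return State.EXECUTE
    if state is State.EXECUTE:
        return State.HALT if is_hlt else State.WB
    return State.FETCH  # WB wraps around to the next fetch


def run(rom_is_hlt, max_cycles=1000):
    """Step a branch-free program until it halts.

    rom_is_hlt: one bool per ROM slot, True where the slot holds HLT.
    Returns (cycles, final_pc).
    """
    state, pc, cycles = State.FETCH, 0, 0
    while state is not State.HALT and cycles < max_cycles:
        nxt = next_state(state, rom_is_hlt[pc])
        if state is State.WB:
            pc = (pc + 1) % 32  # 5-bit PC over a 32-entry ROM
        state = nxt
        cycles += 1
    return cycles, pc


# Two ordinary instructions (4 cycles each), then HLT (3 cycles to park).
print(run([False, False, True]))  # (11, 2)
```

Under these assumptions every non-halting instruction costs exactly four clocks, which is where the "4× the cycles per instruction" figure comes from.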
Bring-up caught two RTL bugs:
- LDI was wired through the ALU’s MOV path (a-passthrough), but the immediate had been routed to op_b. Three test programs failed identically; r1 = LDI 5 produced zero. Re-routed LDI’s imm into op_a.
- The op_jmp encoding was 15 bits not 16, so HLT decoded as SAR. The CPU just kept executing whatever was after the halt.
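Both bugs are easy to reproduce in a toy model. The sketch below is a hedged illustration, not the project's encoder: the post does not give the opcode numbers, so `OP_SAR = 0x7` and `OP_HLT = 0xF` are hypothetical values chosen to make the effect visible, and `pack`, `encode`, `opcode_of`, and `alu_mov` are made-up helper names.

```python
WORD_BITS = 16
OPCODE_BITS = 4

# Hypothetical opcode numbers, for illustration only.
OP_SAR = 0x7
OP_HLT = 0xF


def pack(fields):
    """Concatenate (value, width) pairs MSB-first, like a Verilog {a, b, c}.

    Returns (word, total_width).
    """
    word, total = 0, 0
    for value, width in fields:
        if value >> width:
            raise ValueError(f"value {value:#x} does not fit in {width} bits")
        word = (word << width) | value
        total += width
    return word, total


def encode(fields):
    """Like pack(), but refuses any encoding that is not exactly 16 bits."""
    word, total = pack(fields)
    if total != WORD_BITS:
        raise ValueError(f"encoding is {total} bits, expected {WORD_BITS}")
    return word


def opcode_of(word):
    """Decode the top four bits of a 16-bit instruction word."""
    return (word >> (WORD_BITS - OPCODE_BITS)) & 0xF


# Bug 2: a 15-bit concatenation, zero-extended into a 16-bit ROM slot,
# shifts every field down by one bit, so the decoder sees another opcode.
short_word, width = pack([(OP_HLT, 4), (0, 11)])
print(width, hex(opcode_of(short_word)))  # 15 0x7

good_word = encode([(OP_HLT, 4), (0, 12)])
print(hex(opcode_of(good_word)))  # 0xf


def alu_mov(op_a, op_b):
    """MOV as an a-passthrough: op_b is ignored entirely."""
    return op_a & 0xFF


# Bug 1: the immediate sits on op_b, so MOV passes op_a (assumed 0 here).
buggy = alu_mov(op_a=0, op_b=5)
fixed = alu_mov(op_a=5, op_b=0)
print(buggy, fixed)  # 0 5
```

A width check like the one in `encode` is the kind of guard that would have flagged the 15-bit helper before simulation did.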
Three Yosys gotchas during synthesis:
- Yosys’s read_verilog rejected function automatic ... return {...} inside parameter initializers. Switched all encoder helpers and reg_read to Verilog-2001 style (function-name assignment, no automatic). Iverilog and Yosys both happy.
- Unpacked array parameters didn’t fly; used packed bit-vectors instead.
- (These are the same recurring issues that hit P07 a couple hours later. Worth memorizing: SystemVerilog functions in parameter contexts are a Yosys minefield.)
Hardened at 100 MHz: 2659 cells, 130 × 130 µm die, +2.65 ns slow-corner setup, +0.94 ns hold, DRC=0, LVS=0, antenna=0. The 4-state FSM split delivered as predicted: P05 needed 25 ns at slow corner; P06 ships at 10 ns with 2.65 ns of headroom. 2.5× the clock for 4× the cycles per instruction.
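The trade-off in those numbers can be worked through in a few lines. This is back-of-the-envelope arithmetic on the figures quoted above, under two assumptions the post does not state: that P05 completes one operation per 25 ns cycle, and that setup slack converts directly into achievable clock period.

```python
# Figures quoted in the post.
p06_period_ns = 10.0       # 100 MHz target clock
p06_setup_slack_ns = 2.65  # slow-corner setup slack
p05_period_ns = 25.0       # P05 slow-corner requirement
cycles_per_instr = 4       # FETCH, DECODE, EXECUTE, WB

# Clock speed-up from splitting the work across four states.
clock_ratio = p05_period_ns / p06_period_ns

# Rough critical-path and fmax estimate (assumes slack maps 1:1 to period).
critical_path_ns = p06_period_ns - p06_setup_slack_ns
fmax_mhz = 1000.0 / critical_path_ns

# Time per instruction, assuming P05 finishes one operation per cycle.
p06_instr_ns = p06_period_ns * cycles_per_instr
latency_ratio = p06_instr_ns / p05_period_ns

print(f"clock ratio:        {clock_ratio:.1f}x")
print(f"critical path:      {critical_path_ns:.2f} ns (~{fmax_mhz:.0f} MHz)")
print(f"time per instr:     {p06_instr_ns:.0f} ns vs {p05_period_ns:.0f} ns")
print(f"latency ratio:      {latency_ratio:.1f}x")
```

Read this way, the faster clock does not buy raw throughput on its own; what the FSM split buys is a much shorter critical path per state.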
Then the punchline: replaced the rare-on-8-bit SAR opcode with OUT,
a real CPU instruction that pushes regs[ra] out a hardware UART tx
pin. P03’s UART module gets inlined as a sub-module inside top.sv,
with new top-level ports baud_div and uart_tx. The CPU’s FSM
stalls in WB until the byte finishes transmitting. Re-hardened at
100 MHz with the UART on board: 2333 cells (slightly fewer because
SAR’s barrel shifter went away), +0.32 ns slow-corner setup. Tighter
than the no-UART build but still passes.
Demo TB runs Fibonacci(7), pushes each value out via OUT, and a
behavioral 8N1 receiver in the testbench samples uart_tx and prints
the decoded bytes. Reads like a serial console. New annotation U
calls out the UART’s 28-flop cluster along the top edge.
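For readers unfamiliar with 8N1 framing, here is a minimal Python sketch of what a behavioral receiver like the one in the testbench has to do. It is not the testbench itself: treating baud_div as "clocks per bit" is an assumption, as is the choice of which seven Fibonacci values get sent, and the function names are hypothetical.

```python
def uart_tx_bits(byte, baud_div):
    """One 8N1 frame as per-clock line levels.

    Start bit (0), eight data bits LSB first, stop bit (1),
    each held for baud_div clocks (assumed meaning of baud_div).
    """
    frame = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [bit for bit in frame for _ in range(baud_div)]


def uart_rx(samples, baud_div):
    """Behavioral 8N1 receiver: find a start bit, then sample mid-bit."""
    decoded = []
    i, n = 0, len(samples)
    while i < n:
        if samples[i] == 1:  # line idle, keep looking for a start bit
            i += 1
            continue
        mid = i + baud_div // 2     # middle of the start bit
        stop = mid + 9 * baud_div   # middle of the stop bit
        if stop >= n:
            break                   # incomplete frame at end of capture
        if samples[mid] != 0:       # glitch, not a real start bit
            i += 1
            continue
        byte = 0
        for b in range(8):
            byte |= samples[mid + (b + 1) * baud_div] << b
        if samples[stop] == 1:      # drop frames with a bad stop bit
            decoded.append(byte)
        i = stop + 1
    return decoded


def fib_values(count):
    """First `count` Fibonacci numbers, truncated to 8 bits."""
    a, b = 0, 1
    values = []
    for _ in range(count):
        values.append(a & 0xFF)
        a, b = b, a + b
    return values


baud_div = 4
line = [1] * 8  # idle-high before the first frame
for value in fib_values(7):
    line += uart_tx_bits(value, baud_div) + [1] * 3  # short idle gap

print(uart_rx(line, baud_div))  # [0, 1, 1, 2, 3, 5, 8]
```

Under the same assumption about baud_div, one frame occupies 10 × baud_div clocks, which gives a rough idea of how long the FSM would sit in WB per OUT.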
Subagent in parallel built a /stack page from scratch — a
plain-language tour of iverilog, Yosys, OpenROAD, Magic, KLayout,
Netgen, LibreLane, sky130, Nix, the gds_to_glb pipeline. Linked from
the nav. (It got rebuilt completely a few hours later; see
the site infrastructure entry.)
Receipts
- f4616d2: RTL + tests + librelane config.
- beea75b: hardened at 100 MHz, full signoff clean. Yosys function-syntax fix landed here.
- 484b1cb: OUT instruction wires P03’s UART into the CPU; re-harden at 100 MHz; first version of /stack.
Project page: /projects/06_fsm_cpu/.