P09 is hardened. 17,277 non-filler cells, 3,262 flops, 600 × 600 µm die, 40 MHz clock, 2.94 ns of setup slack, zero DRC/LVS/antenna violations. Three iverilog programs (arith, branch, fib(10)) already passed before harden; the testbench still passes when the same programs are replayed against the post-harden gate-level netlist.
The interesting part of this run wasn’t the harden — it was the two harden attempts that came before, both technically successful flows that produced the wrong chip.
Attempt 1: 210 cells
Started the harden with the original RTL. The flow finished cleanly. 25 ns target, 16.5 ns of slack, zero violations. Looked great on paper. Then I checked the cell count.
210 non-filler cells. Out of 92,996 placed components, 87,793
were decap_3 and 4,991 were fill_1. The actual logic was 210
cells. For a CPU with a 32-entry × 32-bit register file (1024
flops by themselves), a 64-word data memory (2048 flops), an ALU,
a branch comparator, and a five-stage FSM, that’s nonsense.
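As a quick sanity check on those numbers, here is a small Python sketch of the flop budget. It only uses figures quoted in this post; the variable names are mine.

```python
# Storage sizes as described in the post.
REGFILE_ENTRIES, DMEM_WORDS, WORD_BITS = 32, 64, 32

regfile_flops = REGFILE_ENTRIES * WORD_BITS   # 32 x 32 = 1024
dmem_flops = DMEM_WORDS * WORD_BITS           # 64 x 32 = 2048
state_flops = regfile_flops + dmem_flops      # flops needed for storage alone

# Component counts reported by the first harden.
placed, decap, fill = 92_996, 87_793, 4_991
filler_share = (decap + fill) / placed

print(state_flops)             # 3072
print(f"{filler_share:.1%}")   # 99.8%
```

The storage arrays alone need 3,072 flops, so a netlist with 210 cells in total cannot contain them, and roughly 99.8% of what got placed was decap and fill.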
Yosys had reasoned about the design correctly: the only top-level
outputs were pc_out and halted. The PC walks autonomously and
nothing else feeds it. halted follows the FSM state. The
register file and the data memory are unobservable from outside
the chip — every write goes back into the same flops, no
external port reads them. So yosys, behaving exactly as a smart
synth tool should, proved the regfile and dmem were dead state
and threw them away. The chip that came out the other end was
the FSM and the PC and a halt detector. Only about 37 of the 210
cells were flops at all.
This is dead-code elimination at the design level, and it’s correct. The lesson is that what the chip can do and what the chip can prove it does are not the same thing. The simulation testbenches read regs[] directly via SystemVerilog hierarchical references, so they observed the correct behavior. The synth tool can’t do that: it can only observe the outputs.
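A toy model of that reasoning, as a sketch and not a description of what yosys does internally: record what each block is computed from, then walk backward from the top-level outputs. The fan-in map is my simplification of the design as described above, and names like alu and fsm_state are illustrative rather than the RTL's real signal names. Anything the walk never reaches cannot influence a pin.

```python
from collections import deque

# Hypothetical fan-in map: each signal -> the signals it is computed from.
FANIN = {
    "pc_out": ["pc"],
    "halted": ["fsm_state"],
    "pc": ["pc", "fsm_state"],        # the PC walks on its own
    "fsm_state": ["fsm_state"],
    "alu": ["regfile"],
    "regfile": ["regfile", "alu"],    # writes only feed back into storage
    "dmem": ["dmem", "regfile"],
}

def live_signals(outputs, fanin):
    """Return every signal that can reach one of the outputs."""
    live = set()
    queue = deque(outputs)
    while queue:
        sig = queue.popleft()
        if sig in live:
            continue
        live.add(sig)
        queue.extend(fanin.get(sig, []))
    return live

live = live_signals(["pc_out", "halted"], FANIN)
dead = sorted(set(FANIN) - live)
print(dead)  # ['alu', 'dmem', 'regfile']
```

With only pc_out and halted as outputs, the regfile, the dmem, and the ALU that feeds them all fall outside the live set, which matches what the 210-cell netlist contained.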
Attempt 2: still 210 cells
Added two new ports — dbg_reg_out (mux from any regfile entry to
a chip pin) and dmem_out (dmem[0] to a chip pin). Re-ran the
harden expecting the regfile and dmem to materialize.
Cell count: still 210.
The default PROG baked into the parameter was jal x0, 0 — a
one-instruction infinite loop. After rst_n deasserts, the CPU
fetches jal x0, 0, decodes it, executes a jump back to PC=0,
and never writes a register. Yosys still proves the regfile stays
at zero forever, so dbg_reg_out is constant zero, so the regfile
is still dead. Same story for dmem.
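The same argument can be replayed in a few lines of Python. This is a minimal sketch, assuming the CPU uses standard RV32I encodings (which the mnemonics suggest); the interpreter only understands JAL and ADDI to stay short, and the function names are hypothetical. It reports which registers ever receive a nonzero value.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask

def jal_offset(instr):
    """Reassemble the J-type immediate from its scattered fields."""
    imm = ((((instr >> 31) & 0x1) << 20)
           | (((instr >> 12) & 0xFF) << 12)
           | (((instr >> 20) & 0x1) << 11)
           | (((instr >> 21) & 0x3FF) << 1))
    return sign_extend(imm, 21)

def nonzero_registers(program, max_steps=100):
    """Run a JAL/ADDI-only program; return registers that ever hold nonzero."""
    regs, touched, pc = [0] * 32, set(), 0
    for _ in range(max_steps):
        instr = program[pc // 4]
        opcode = instr & 0x7F
        rd = (instr >> 7) & 0x1F
        if opcode == 0x6F:                                   # JAL
            value, next_pc = pc + 4, pc + jal_offset(instr)
        elif opcode == 0x13 and ((instr >> 12) & 0x7) == 0:  # ADDI
            rs1 = (instr >> 15) & 0x1F
            value = regs[rs1] + sign_extend(instr >> 20, 12)
            next_pc = pc + 4
        else:
            raise ValueError(f"unsupported instruction {instr:#010x}")
        if rd != 0:                                          # x0 is hardwired to zero
            regs[rd] = value & 0xFFFFFFFF
            if regs[rd]:
                touched.add(rd)
        pc = next_pc
    return touched

PLACEHOLDER = [0x0000006F]              # jal x0, 0
WITH_WRITE = [0x00100093, 0x0000006F]   # addi x1, x0, 1 ; jal x0, 0

print(nonzero_registers(PLACEHOLDER))   # set()
print(nonzero_registers(WITH_WRITE))    # {1}
```

The placeholder program never puts a nonzero value into any register, so a mux over the regfile reads constant zero no matter which entry it selects. A single addi before the loop is already enough to change that.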
Attempt 3: real default PROG
Replaced the placeholder jal x0, 0 with an actual Fibonacci(10)
program (13 instructions, ending in a halt). Now yosys sees that
the regfile gets written from non-zero ALU outputs and the dmem
gets a sw to address 0, so both have observable downstream
effects through the new debug ports. Re-ran the harden.
17,277 non-filler cells. 3,262 flops. The regfile and dmem are visible in the layout as dense flop arrays. The chip works.
What this is really about
The default PROG parameter is not just a convenience for
testbench authors — it’s the contract the synth tool sees about
what the chip might do. If your default boot program is
“infinite loop forever,” you’ve signed a contract that says “this
chip’s logic is welcome to be optimized down to a halt detector.”
For an educational CPU where you want to be able to point at a
real regfile in the layout, the default has to actually exercise
the regfile.
(In a production chip with external memory, this would never come up, because you’d have an instruction-fetch port wired to a chip pin, and the contract would be “the program is whatever is on the bus.”)
I added a comment to the parameter declaration explaining all of this, so the next person to read this file (probably future me) doesn’t spend three harden cycles wondering where the regfile went.
Tooling: the asset pipeline survived a 240 MB shapes.json
While debugging the empty chip, I discovered the layout asset pipeline doesn’t gracefully handle designs with 1.2M shapes (P09 flattened with substrate layers included produces a 240 MB shapes.json that Astro happily ships at runtime — not OK). Added two small things:
- An EXTRA_SKIP_LAYERS env var on build_layout_assets.sh to drop noisy substrate layers (poly/nwell/diff/licon/mcon/via1/via2) from both the 2D shapes.json and the 3D layout.glb. For chip-scale viewing those layers are just texture; the metal stack carries the routing story.
- A SKIP_GLB=1 escape hatch for designs where the 3D extrusion OOMs. P09 with substrate dropped extrudes fine; the safety valve is for whatever the next big design ends up being.
P09’s final assets: shapes.json drops from 240 MB to ~95 MB with just substrate layers off; layout.glb fits in well under a megabyte after meshopt compression.
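For illustration, the layer filter might look roughly like the sketch below. Several things here are assumptions that this post does not show: that shapes.json is a list of objects with a layer key, that EXTRA_SKIP_LAYERS is a comma-separated list, and the function names themselves. The real build_layout_assets.sh may do this differently.

```python
import json
import os

def parse_skip_layers(env=os.environ):
    """Read EXTRA_SKIP_LAYERS as a comma-separated set (assumed format)."""
    raw = env.get("EXTRA_SKIP_LAYERS", "")
    return {name.strip() for name in raw.split(",") if name.strip()}

def drop_layers(shapes, skip):
    """Keep only shapes whose layer is not in the skip set."""
    return [s for s in shapes if s.get("layer") not in skip]

# Hypothetical shapes; the real schema is not documented in this post.
shapes = [
    {"layer": "met1", "rect": [0, 0, 10, 1]},
    {"layer": "nwell", "rect": [0, 0, 50, 50]},
    {"layer": "via1", "rect": [4, 0, 5, 1]},
    {"layer": "met2", "rect": [4, 0, 5, 20]},
]

skip = parse_skip_layers(
    {"EXTRA_SKIP_LAYERS": "poly,nwell,diff,licon,mcon,via1,via2"}
)
kept = drop_layers(shapes, skip)
print(json.dumps([s["layer"] for s in kept]))  # ["met1", "met2"]
```

Filtering before serialization keeps the skipped layers out of both outputs. The 240 MB to ~95 MB drop reported above is roughly a 60% reduction.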