journal 2026-04-28

P09 — hardened, after a lesson in synth-side dead-code elimination

p09, harden, yosys

P09 is hardened: 17,277 non-filler cells, 3,262 flops, a 600 × 600 µm die, a 40 MHz clock (25 ns period) with 2.94 ns of setup slack, and zero DRC/LVS/antenna violations. Three test programs (arith, branch, fib(10)) already passed in iverilog simulation before the harden, and the same testbench still passes when it replays them against the post-harden gate-level netlist.
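The headline timing numbers can be sanity-checked with simple arithmetic. The sketch below treats period minus slack as the worst path delay. That is a simplification: real STA also folds in setup time, clock uncertainty, and skew, so the result is a rough estimate rather than a sign-off figure.

```python
# Rough timing arithmetic for the final P09 harden.
# Simplification: period - slack is treated as the worst path delay;
# real STA also accounts for setup time, clock uncertainty, and skew.

def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1_000.0 / freq_mhz


def worst_path_ns(freq_mhz: float, slack_ns: float) -> float:
    """Approximate worst path delay implied by a reported setup slack."""
    return period_ns(freq_mhz) - slack_ns


def zero_slack_mhz(freq_mhz: float, slack_ns: float) -> float:
    """Frequency at which that same path would have exactly zero slack."""
    return 1_000.0 / worst_path_ns(freq_mhz, slack_ns)


if __name__ == "__main__":
    print(f"period:      {period_ns(40):.2f} ns")
    print(f"worst path: ~{worst_path_ns(40, 2.94):.2f} ns")
    print(f"zero slack: ~{zero_slack_mhz(40, 2.94):.1f} MHz")
```

With the final numbers this gives a 25 ns period, a worst path of roughly 22 ns, and a zero-slack frequency in the mid-40s MHz. Plugging in attempt 1’s 16.5 ns of slack instead gives about 8.5 ns, which fits with most of the datapath having been stripped.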

The interesting part of this run wasn’t the harden — it was the two harden attempts that came before, both technically successful flows that produced the wrong chip.

Attempt 1: 210 cells

Started the harden with the original RTL. The flow finished cleanly. 25 ns target, 16.5 ns of slack, zero violations. Looked great on paper. Then I checked the cell count.

210 non-filler cells. Out of 92,996 placed components, 87,793 were decap_3 and 4,991 were fill_1. The actual logic was 210 cells. For a CPU with a 32-entry × 32-bit register file (1,024 flops on its own), a 64-word data memory (2,048 flops), an ALU, a branch comparator, and a five-stage FSM, that’s nonsense.
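A quick flop budget makes “nonsense” concrete. This sketch assumes, as the figures above imply, that both the register file and dmem are built entirely from flops with 32-bit words (no SRAM macros).

```python
# Flop budget implied by the architecture described above.
# Assumption: every regfile entry and every dmem word is stored in flops,
# and dmem words are 32 bits wide (which the 2,048-flop figure implies).

REGFILE_ENTRIES, REGFILE_WIDTH = 32, 32
DMEM_WORDS, DMEM_WIDTH = 64, 32


def storage_flops() -> int:
    """Flops needed just for the register file and the data memory."""
    return REGFILE_ENTRIES * REGFILE_WIDTH + DMEM_WORDS * DMEM_WIDTH


def other_flops(total_flops: int) -> int:
    """Flops left over for the PC, the FSM, and any other state."""
    return total_flops - storage_flops()


if __name__ == "__main__":
    print("storage flops:", storage_flops())
    print("final harden, non-storage flops:", other_flops(3_262))
```

Storage alone needs 3,072 flops, against only about 37 flops in attempt 1’s netlist. Against the final harden’s 3,262 flops, the same budget leaves 190 for the PC, the FSM, and everything else.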

Yosys had reasoned about the design correctly. The only top-level outputs were pc_out and halted. The PC walks autonomously, and nothing from the regfile or dmem feeds it; halted follows the FSM state. The register file and the data memory are unobservable from outside the chip: every write goes back into the same flops, and no external port reads them. So yosys, behaving exactly as a smart synth tool should, proved that the regfile and dmem were dead state and threw them away. What came out the other end was the FSM, the PC, and a halt detector. Only about 37 of the 210 cells were flops at all.
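The reasoning above is a reachability argument: walk backwards from the top-level outputs, and anything the walk never reaches cannot affect a pin. The sketch below runs that walk over a hypothetical, block-level model of the attempt-1 design. The block names and edges are illustrative, not extracted from the real netlist, and real synthesis works on individual cells and bits, but the principle is the same.

```python
from collections import deque

# Hypothetical block-level model of the attempt-1 design. Each key is a block
# of logic; each value lists the blocks that drive it. "rom" stands for the
# program ROM built from the PROG parameter. As described above, nothing from
# the regfile or dmem reaches the PC, the FSM, or the two top-level outputs.
FANIN = {
    "pc_out":  ["pc"],
    "halted":  ["fsm"],
    "pc":      ["pc", "fsm", "rom"],
    "fsm":     ["fsm", "rom"],
    "rom":     ["pc"],
    "regfile": ["regfile", "alu", "rom"],
    "alu":     ["regfile", "rom"],
    "dmem":    ["dmem", "regfile", "alu"],
}


def live_blocks(fanin, outputs):
    """Walk backwards from the outputs; every block reached can affect a pin."""
    live = set()
    work = deque(outputs)
    while work:
        node = work.popleft()
        if node in live:
            continue
        live.add(node)
        work.extend(fanin.get(node, []))
    return live


if __name__ == "__main__":
    live = live_blocks(FANIN, ["pc_out", "halted"])
    print("kept:   ", sorted(live))
    print("removed:", sorted(set(FANIN) - live))
```

The walk keeps pc_out, halted, pc, fsm, and rom, and drops alu, dmem, and regfile. Structural reachability is only half the story, though: attempt 2 shows that a block can be reachable from an output and still be removed once it is proven constant.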

This is dead-code elimination at the design level, and it’s correct. The lesson is that what the chip does and what the synth tool can prove is observable at its pins are not the same thing. The simulation testbenches read regs[] directly through SystemVerilog hierarchical references, so they observed the correct behavior. The synth tool can’t do that; it can only see what reaches the top-level outputs.

Attempt 2: still 210 cells

Added two new output ports: dbg_reg_out (a mux that routes any selected regfile entry to chip pins) and dmem_out (dmem[0] routed to chip pins). Re-ran the harden expecting the regfile and dmem to materialize.

Cell count: still 210.

The default PROG baked into the parameter was jal x0, 0, a one-instruction infinite loop. After rst_n deasserts, the CPU fetches jal x0, 0, decodes it, jumps back to PC=0, and never writes a register: the only write jal performs is the link address, and it goes to x0, which discards it. Yosys can still prove the regfile stays at zero forever, so dbg_reg_out is constant zero and the regfile is still dead. Same story for dmem.
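For reference, jal x0, 0 is a single RV32I word. The sketch below encodes and decodes JAL using the standard RV32I bit layout (imm[20|10:1|11|19:12], rd, opcode 1101111), then steps the one-instruction program to show that the PC stays at 0 and the link-address write to x0 is dropped every time. The helper names are illustrative, not taken from the P09 sources.

```python
from typing import List, Tuple

JAL_OPCODE = 0b1101111


def encode_jal(rd: int, imm: int) -> int:
    """Encode RV32I `jal rd, imm` (imm is an even byte offset)."""
    u = imm & 0x1FFFFF  # 21-bit two's-complement immediate
    return ((((u >> 20) & 0x1) << 31) | (((u >> 1) & 0x3FF) << 21)
            | (((u >> 11) & 0x1) << 20) | (((u >> 12) & 0xFF) << 12)
            | ((rd & 0x1F) << 7) | JAL_OPCODE)


def decode_jal(inst: int) -> Tuple[int, int]:
    """Return (rd, signed byte offset) for an RV32I JAL word."""
    if inst & 0x7F != JAL_OPCODE:
        raise ValueError("not a JAL instruction")
    rd = (inst >> 7) & 0x1F
    imm = ((((inst >> 31) & 0x1) << 20) | (((inst >> 21) & 0x3FF) << 1)
           | (((inst >> 20) & 0x1) << 11) | (((inst >> 12) & 0xFF) << 12))
    if imm & (1 << 20):  # sign-extend from bit 20
        imm -= 1 << 21
    return rd, imm


def run_jal_loop(cycles: int) -> Tuple[int, List[int]]:
    """Execute a program consisting only of `jal x0, 0`; return (pc, regs)."""
    regs = [0] * 32
    pc = 0
    rd, imm = decode_jal(encode_jal(0, 0))
    for _ in range(cycles):
        if rd != 0:  # writes to x0 are discarded in RISC-V
            regs[rd] = pc + 4
        pc = pc + imm  # JAL target is this instruction's PC plus imm
    return pc, regs


if __name__ == "__main__":
    print(hex(encode_jal(0, 0)))
    print(run_jal_loop(1_000))
```

encode_jal(0, 0) comes out to 0x6f (0x0000006F as a 32-bit word). After any number of steps the register file is still all zeros, which is the fact yosys was able to prove.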

Attempt 3: real default PROG

Replaced the placeholder jal x0, 0 with an actual Fibonacci(10) program (13 instructions, ending in a halt). Now yosys sees that the regfile gets written from non-zero ALU outputs and the dmem gets a sw to address 0, so both have observable downstream effects through the new debug ports. Re-ran the harden.
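A toy interpreter makes the difference between the two default programs visible. It uses an invented tuple-based instruction format with instruction-index branch targets (not real RV32I encodings), and the Fibonacci program below is a hypothetical 11-instruction stand-in, not the actual 13-instruction PROG. Yosys reasons symbolically rather than by running the program, but the contrast is the same: the jal loop never leaves a non-zero value anywhere, while the Fibonacci program writes four registers and stores a result to dmem[0].

```python
def run(prog, max_steps=10_000):
    """Run a toy program; return (halted, regs, dmem, nonzero_regs)."""
    regs = [0] * 32
    dmem = [0] * 64
    nonzero_regs = set()  # registers that ever held a non-zero value

    def write(rd, value):
        if rd == 0:  # x0 is hardwired to zero, as in RISC-V
            return
        regs[rd] = value & 0xFFFFFFFF
        if regs[rd]:
            nonzero_regs.add(rd)

    pc = 0
    for _ in range(max_steps):
        op, a, b, c = prog[pc]
        next_pc = pc + 1
        if op == "addi":    # x[a] = x[b] + c
            write(a, regs[b] + c)
        elif op == "add":   # x[a] = x[b] + x[c]
            write(a, regs[b] + regs[c])
        elif op == "beq":   # if x[a] == x[b]: goto instruction c
            if regs[a] == regs[b]:
                next_pc = c
        elif op == "jal":   # x[a] = return index; goto instruction c
            write(a, pc + 1)
            next_pc = c
        elif op == "sw":    # dmem[x[b] + c] = x[a]
            dmem[regs[b] + c] = regs[a]
        elif op == "halt":
            return True, regs, dmem, nonzero_regs
        else:
            raise ValueError(f"unknown op {op!r}")
        pc = next_pc
    return False, regs, dmem, nonzero_regs


JAL_LOOP = [("jal", 0, 0, 0)]  # jal x0, 0

FIB10 = [                  # hypothetical stand-in for the real PROG
    ("addi", 1, 0, 0),     # 0: a = 0
    ("addi", 2, 0, 1),     # 1: b = 1
    ("addi", 3, 0, 10),    # 2: n = 10
    ("beq",  3, 0, 9),     # 3: if n == 0 goto 9
    ("add",  4, 1, 2),     # 4: t = a + b
    ("addi", 1, 2, 0),     # 5: a = b
    ("addi", 2, 4, 0),     # 6: b = t
    ("addi", 3, 3, -1),    # 7: n -= 1
    ("jal",  0, 0, 3),     # 8: goto 3
    ("sw",   1, 0, 0),     # 9: dmem[0] = a
    ("halt", 0, 0, 0),     # 10: stop
]

if __name__ == "__main__":
    for name, prog in (("jal loop", JAL_LOOP), ("fib(10)", FIB10)):
        halted, regs, dmem, nonzero = run(prog)
        print(name, halted, sorted(nonzero), dmem[0])
```

The jal loop runs until max_steps without ever halting or touching state; the Fibonacci stand-in halts with fib(10) = 55 in x1 and in dmem[0].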

17,277 non-filler cells. 3,262 flops. The regfile and dmem are visible in the layout as dense flop arrays. The chip works.

What this is really about

The default PROG parameter is not just a convenience for testbench authors — it’s the contract the synth tool sees about what the chip might do. If your default boot program is “infinite loop forever,” you’ve signed a contract that says “this chip’s logic is welcome to be optimized down to a halt detector.” For an educational CPU where you want to be able to point at a real regfile in the layout, the default has to actually exercise the regfile.

(In a production chip with external memory, this would never come up, because you’d have an instruction-fetch port wired to a chip pin, and the contract would be “the program is whatever is on the bus.”)

I added a comment to the parameter declaration explaining all of this, so the next person to read this file (probably future me) doesn’t spend three harden cycles wondering where the regfile went.

Tooling: the asset pipeline survived a 240 MB shapes.json

While debugging the empty chip, I discovered the layout asset pipeline doesn’t gracefully handle designs with 1.2M shapes (P09 flattened with substrate layers included produces a 240 MB shapes.json that Astro happily ships at runtime — not OK). Added two small things:

P09’s final assets: shapes.json drops from 240 MB to ~95 MB with just substrate layers off; layout.glb fits in well under a megabyte after meshopt compression.
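This entry doesn’t show how the substrate-layer toggle is implemented. As a hypothetical sketch only, assuming shapes.json is a flat list of shapes each tagged with a layer name, turning substrate layers off amounts to filtering them out before serialization. The layer names in SUBSTRATE_LAYERS are placeholders loosely modeled on typical PDK layers; the real pipeline’s schema and layer list may differ.

```python
import json

# Hypothetical: the real shapes.json schema and layer names are not shown
# here. Assumes a flat list of shapes, each a dict with a "layer" key.
SUBSTRATE_LAYERS = frozenset({"nwell", "pwell", "diff", "tap"})


def drop_substrate_layers(shapes, substrate=SUBSTRATE_LAYERS):
    """Keep only shapes whose layer is not a substrate-level layer."""
    return [s for s in shapes if s.get("layer") not in substrate]


def json_size_bytes(obj) -> int:
    """Size of obj serialized as compact UTF-8 JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


if __name__ == "__main__":
    shapes = [
        {"layer": "nwell", "rect": [0, 0, 600, 600]},
        {"layer": "diff", "rect": [10, 10, 20, 14]},
        {"layer": "met1", "rect": [10, 12, 40, 13]},
        {"layer": "poly", "rect": [15, 8, 16, 16]},
    ]
    kept = drop_substrate_layers(shapes)
    print([s["layer"] for s in kept])
    print(json_size_bytes(shapes), "->", json_size_bytes(kept), "bytes")
```

Filtering before serialization keeps the large JSON from ever being written, rather than relying on the browser to skip layers it doesn’t draw.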