P13 hardened. The chip is the same TT 8×2 tile as P12, the same
RV32E core, the same tt_um_* pin frame — but with an instruction
memory that’s writable from outside and a UART loader to drive it.
The fabricated chip stops being a “runs one program forever” toy
and becomes a real (very small) microcontroller.
Numbers
| | P12 | P13 | factor |
|---|---|---|---|
| Cells | 4,943 | 11,082 | 2.24× |
| Flops | 920 | 2,038 | 2.21× |
| imem flops | 0 (comb ROM) | 1,024 | +1,024 |
| Setup slack @ 50 MHz | 3.91 ns | 0.93 ns | tighter |
| Tile | TT 8×2 | TT 8×2 | same |
| DRC / LVS / antenna | 0/0/0 | 0/0/0 | clean |
The 2.24× growth in cell count comes almost entirely from the new instruction memory: moving the 32 × 32-bit ROM from a combinational mux into a flop array adds 1,024 flops directly, plus a couple hundred gates of write-port logic. The loader FSM (12 flops) and uart_rx (~30 flops) are basically free in comparison.
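The flop side of that accounting can be checked straight from the table. A minimal sketch in Python, using only the counts quoted above; the uart_rx figure is the approximate "~30" from the text, and the variable names are mine:

```python
# Flop counts taken from the table above.
P12_FLOPS = 920
P13_FLOPS = 2038

# New flops the text attributes to each added block.
# uart_rx is quoted as "~30", so this entry is approximate.
added = {"imem": 32 * 32, "loader_fsm": 12, "uart_rx": 30}

growth = P13_FLOPS - P12_FLOPS      # total new flops in P13
accounted = sum(added.values())     # flops explained by the three blocks
unexplained = growth - accounted    # remainder not itemized in the text

print(f"growth={growth} accounted={accounted} unexplained={unexplained}")
print(f"imem share of growth: {added['imem'] / growth:.1%}")
```

With these inputs the imem alone explains a little over 90% of the new flops. The few dozen left over are not itemized in the text.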
Setup slack tightened from 19.5% to 4.6% of the 20 ns clock period. The flop-imem read path is longer than P12’s combinational mux: P13 reads through a 32:1 mux of stored words, whereas P12 had a 32:1 mux of constant bits that Yosys collapsed to roughly five levels of gates. Timing still meets, but to push the clock higher I’d want to add a fetch-stage register and let the imem read take a full cycle.
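The slack percentages are just slack divided by the clock period. A small sketch of that arithmetic follows, with one extra step that is my own back-of-envelope assumption: treating "period minus slack" as the critical-path delay and inverting it to get an implied maximum clock. That ignores clock uncertainty and corner differences, so read the fmax figure as a rough indication only.

```python
def timing_summary(clock_mhz, slack_ns):
    """Return (period_ns, slack_fraction, critical_path_ns, implied_fmax_mhz)."""
    period_ns = 1000.0 / clock_mhz
    # Assumption: slack is measured against the full clock period.
    critical_path_ns = period_ns - slack_ns
    implied_fmax_mhz = 1000.0 / critical_path_ns
    return period_ns, slack_ns / period_ns, critical_path_ns, implied_fmax_mhz


for name, slack in (("P12", 3.91), ("P13", 0.93)):
    period, frac, crit, fmax = timing_summary(50.0, slack)
    print(f"{name}: slack is {frac:.2%} of {period:.0f} ns, "
          f"critical path {crit:.2f} ns, implied fmax {fmax:.1f} MHz")
```

Under that assumption P12 had headroom to roughly 62 MHz, while P13 tops out near 52 MHz. That is why a fetch-stage register would be the first thing to add for a faster clock.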
The bug that bit me
The first synthesis attempt failed with “1 Yosys check errors found” at the
very last check before placement. The error message was vague:
“Drivers conflicting with a constant 1’0 driver” on a $procmux
inside loader_fsm. The cause was simple, though: I had two separate
always_comb blocks, both driving imem_we. Yosys correctly
flagged the conflict.
Why the bug? When I was structuring the loader, I wanted a
“default outputs” block separate from the FSM-driven write strobe,
because the wdata depends on a different combinational expression
than the strobe. But splitting them across two always_comb
blocks means both write to the same signal, which is a multi-driver
error. Each signal gets exactly one always_comb driver; that is a
hard rule. I knew that and I still wrote it that way. Coding too fast.
The fix was to fold both blocks into one:
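```systemverilog
// Single always_comb block: each output has exactly one driver.
always_comb begin
  // Strobe a write when the fourth byte of a word arrives.
  imem_we    = (lstate == L_READ_BYTE) && (byte_idx == 2'd3) && rx_valid;
  imem_waddr = word_idx;
  // The newest byte lands in the most-significant position.
  imem_wdata = {rx_data, word_buf};
end
```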
After the merge, the rest of the synth ran clean.
Layout shape
The placer organized the chip very differently from P12. P12 had the regfile on the left and dmem on the right. P13 has imem on the left (because it’s the biggest single block now), regfile on the right, dmem in the lower-center, and the I/O peripherals (UART TX, UART RX, loader FSM) packed across the top edge of the tile in a thin horizontal stripe.
The placer sees:
- imem (1024 flops): biggest block, takes the dominant left half.
- regfile (448 flops): second-biggest, takes the right-edge column.
- dmem (256 flops): mid-size, parks bottom-center.
- everything else: pin-edge peripherals on top.
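To put rough proportions on that list, the sketch below divides each block's flop count by the P13 total from the table. Flop count is only a proxy for area (it ignores combinational logic and routing), so this is an illustration and not a floorplan measurement.

```python
TOTAL_FLOPS = 2038  # P13 flop count from the table

# Per-block flop counts as listed above.
blocks = {"imem": 1024, "regfile": 448, "dmem": 256}
# Whatever is left belongs to the CPU FSM, UARTs, loader, and so on.
blocks["everything else"] = TOTAL_FLOPS - sum(blocks.values())

shares = {name: n / TOTAL_FLOPS for name, n in blocks.items()}
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {blocks[name]:5d} flops  {share:.1%}")
```

By this measure imem holds about half of all flops, which lines up with it claiming about half the tile.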
This is the kind of organization a real microcontroller’s floorplan shows. Scale everything up by 1000× (bigger memory, bigger regfile, bigger ALU) and the picture starts to resemble an STM32-class part. The shape is right; the absolute numbers are tiny.
What this teaches
P13 is the smallest system-level difference between “a chip that runs the program in its mask” and “a chip that runs the program you sent it.” The new pieces are:
- A writable instruction memory. 1,024 flops + a single write port. Costs ~half a tile worth of silicon at this scale.
- A serial loader. 12 flops + a 5-state FSM. Reads the 0xA5 magic byte, then a word count N, then N×4 bytes, and writes them into the imem one word at a time.
- A separate UART receiver (uart_rx). Symmetric to UART TX, but listening instead of speaking.
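The loader framing described in the list above (magic, count, then four bytes per word) can be sketched from the host side. This is a minimal Python model, not the actual tooling. The function names are hypothetical, and it rests on several assumptions of mine: the count is a single byte giving the number of words, bytes within a word go least-significant first (the RTL shows only that the fourth byte lands in the top bits), and a stream with the wrong magic is ignored.

```python
MAGIC = 0xA5      # magic byte named in the text
IMEM_WORDS = 32   # imem is 32 x 32-bit


def build_load_stream(words):
    """Frame a program as: magic, word count, then 4 bytes per word."""
    if not 1 <= len(words) <= IMEM_WORDS:
        raise ValueError(f"program must be 1..{IMEM_WORDS} words")
    stream = bytearray([MAGIC, len(words)])
    for word in words:
        # Assumption: least-significant byte is sent first.
        stream += (word & 0xFFFFFFFF).to_bytes(4, "little")
    return bytes(stream)


def model_loader(stream):
    """Behavioral model of the loader: returns the imem it would write."""
    imem = [0] * IMEM_WORDS
    data = iter(stream)
    if next(data, None) != MAGIC:
        return imem  # assumption: no magic byte means nothing is written
    count = min(next(data, 0), IMEM_WORDS)
    word_idx = byte_idx = word_buf = 0
    for rx_data in data:
        if word_idx == count:
            break
        if byte_idx == 3:
            # Mirrors imem_wdata = {rx_data, word_buf} in the RTL.
            imem[word_idx] = (rx_data << 24) | word_buf
            word_idx, byte_idx, word_buf = word_idx + 1, 0, 0
        else:
            word_buf |= rx_data << (8 * byte_idx)
            byte_idx += 1
    return imem


program = [0x00000013, 0xDEADBEEF]  # an RV32 nop, then a marker word
stream = build_load_stream(program)
assert model_loader(stream)[:len(program)] == program
```

Running the stream back through the model is a cheap round-trip check on byte order before any bytes go out over a real serial port.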
That’s the entire reprogrammability surface. The CPU itself is
unchanged from P12 — same instruction encoding, same FSM, same
pin frame for everything except the two pins repurposed for
uart_rx and load_mode.
What’s next
Open question: how to make programs persistent. Right now imem loses its contents on every power cycle (it’s flops, not flash). A real microcontroller would have an SPI flash chip on the dev board and a boot ROM that loads instructions from it. P14 would be that: it would use 4 of the 8 uio bits as SPI master pins and add a small boot controller that copies flash → imem when rst_n is released, then releases the CPU.
But for the educational ladder, P13 is a clean stopping point — the lesson is “field reprogrammability via UART”, and the lesson landed.