P06’s CPU bolted onto a real bus with RAM, a full UART (TX and RX), and GPIO at memory-mapped addresses. Four harden attempts to land clean, and the failure modes are textbook.
What happened
P06’s OUT and NOT opcodes are swapped out for ST and LD: real
loads and stores over a real bus. Address map:
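| address | register | access |
|---|---|---|
| 0x00..0x0F | 16-byte RAM (flop-based) | R/W |
| 0x40 | UART TX data | W |
| 0x41 | UART status: bit0 = `tx_busy`, bit1 = `rx_valid` | R |
| 0x42 | UART RX data (reading clears `rx_valid`) | R |
| 0x80 | GPIO out | R/W |
| 0x81 | GPIO in | R |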
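As a behavioral reference for the map above, here is a Python sketch of the address decoder. It is not the RTL: the class and field names are hypothetical, transmit timing isn't modeled (so `tx_busy` stays low), and unmapped reads returning 0 is an assumption. It does capture the one side effect in the map, that reading 0x42 clears `rx_valid`.

```python
# Behavioral sketch of the memory map (hypothetical names; the real design is SystemVerilog).
RAM_BASE, RAM_SIZE = 0x00, 16
UART_TX, UART_STATUS, UART_RX = 0x40, 0x41, 0x42
GPIO_OUT, GPIO_IN = 0x80, 0x81


class Bus:
    def __init__(self):
        self.ram = [0] * RAM_SIZE
        self.tx_busy = False   # TX shifting is not modeled, so this never goes high here
        self.rx_valid = False
        self.rx_data = 0
        self.gpio_out = 0
        self.gpio_in = 0
        self.uart_tx_log = []  # bytes written to the TX data register

    def store(self, addr, value):
        """ST: byte-wide write; writes to read-only or unmapped addresses are dropped."""
        value &= 0xFF
        if RAM_BASE <= addr < RAM_BASE + RAM_SIZE:
            self.ram[addr - RAM_BASE] = value
        elif addr == UART_TX:
            self.uart_tx_log.append(value)  # real hardware would start shifting this byte out
        elif addr == GPIO_OUT:
            self.gpio_out = value

    def load(self, addr):
        """LD: byte-wide read, including the read-clears-rx_valid side effect."""
        if RAM_BASE <= addr < RAM_BASE + RAM_SIZE:
            return self.ram[addr - RAM_BASE]
        if addr == UART_STATUS:
            return (int(self.rx_valid) << 1) | int(self.tx_busy)
        if addr == UART_RX:
            self.rx_valid = False  # reading RX data clears rx_valid
            return self.rx_data
        if addr == GPIO_OUT:
            return self.gpio_out
        if addr == GPIO_IN:
            return self.gpio_in
        return 0  # assumption: unmapped reads return 0


# Usage: "UART RX byte routed to GPIO out", modeled at the bus level.
bus = Bus()
bus.rx_data, bus.rx_valid = 0x5A, True       # pretend the receiver just latched 0x5A
if bus.load(UART_STATUS) & 0b10:             # poll bit1 = rx_valid
    bus.store(GPIO_OUT, bus.load(UART_RX))   # LD from RX data, ST to GPIO out
assert bus.gpio_out == 0x5A and not bus.rx_valid
```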
UART RX is the new piece: the first time the chip samples something into itself instead of just emitting. It uses the same shift-register shape as TX, run in reverse, with a 1.5-bit delay after the start edge so each data bit is sampled mid-bit.
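The 1.5-bit trick is easiest to see on a waveform. Below is a minimal Python sketch assuming standard 8N1 framing (start bit 0, eight data bits LSB-first, stop bit 1) and a hypothetical `CLKS_PER_BIT` of 16; the real divider depends on the clock and baud rate. After the falling start edge, the receiver waits 1.5 bit periods to land in the middle of data bit 0, then samples once per bit period.

```python
# Sketch: 8N1 UART framing plus a receiver that samples mid-bit.
CLKS_PER_BIT = 16  # hypothetical divider, not taken from the design


def uart_tx_wave(data: bytes, clks_per_bit: int = CLKS_PER_BIT) -> list[int]:
    """Line level on every clock: one idle bit period, then an 8N1 frame per byte."""
    wave = [1] * clks_per_bit
    for byte in data:
        bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]  # start, data LSB-first, stop
        for bit in bits:
            wave.extend([bit] * clks_per_bit)
    return wave


def uart_rx(wave: list[int], clks_per_bit: int = CLKS_PER_BIT) -> bytes:
    """Decode 8N1 frames by sampling each data bit at its midpoint."""
    out = bytearray()
    t = 1
    while t < len(wave):
        if wave[t - 1] == 1 and wave[t] == 0:  # falling edge: start bit begins at t
            if t + (19 * clks_per_bit) // 2 >= len(wave):
                break  # frame is cut off before the middle of its stop bit
            sample = t + (3 * clks_per_bit) // 2  # 1.5 bit periods: middle of data bit 0
            byte = 0
            for i in range(8):
                byte |= wave[sample] << i
                sample += clks_per_bit
            if wave[sample] == 1:  # sample now sits mid-stop-bit; drop framing errors
                out.append(byte)
            t = sample  # resume the edge search from the middle of the stop bit
        t += 1
    return bytes(out)


# Usage: the echo test's payload survives a TX -> RX round trip.
assert uart_rx(uart_tx_wave(b"hey\n")) == b"hey\n"
```

Many real receivers also re-check the start bit at the half-bit point to reject glitches; this sketch skips that to keep the 1.5-bit timing front and center.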
Four programs in `tb.sv` all PASS: RAM round-trip, UART TX of “Hi”,
GPIO write/read, and a UART RX byte routed to GPIO out. `tb_demo.sv`
runs an echo loop where the host TB drives `"hey\n"` into the chip’s
`rx` pin and decodes whatever comes out on `tx`. All four bytes echo cleanly.
Bring-up bugs caught (some recurring):
- Yosys still rejects unpacked array parameters → packed bit-vectors.
- Yosys still rejects `function automatic ... return {...}` → Verilog-2001 form. (Same lesson from P06.)
- Iverilog rejects `ref` of continuously-assigned wires → inlined helpers.
- The initial test had all four DUTs sharing `rst_n`; the UART receiver caught garbage bytes from the first run before its own check ran. Per-DUT reset.
- The half-period reload was silently being assigned in two `always_ff` blocks; consolidated into the timer block.
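The packed-bit-vector workaround in the first bullet usually means storing N entries of W bits in one flat vector and reading entry `i` back with the indexed part-select `[i*W +: W]`. The post doesn't show the actual parameter, so the Python sketch below only illustrates that indexing arithmetic; the table contents are made up.

```python
# Sketch of the flattening arithmetic behind a packed parameter: entry i of an
# N-entry, W-bit table lives at bits [i*W +: W] of one wide vector.
def pack(entries: list[int], width: int) -> int:
    """Pack entries so entry 0 occupies the least-significant bits."""
    vec = 0
    for i, value in enumerate(entries):
        if not 0 <= value < (1 << width):
            raise ValueError(f"entry {i} does not fit in {width} bits")
        vec |= value << (i * width)
    return vec


def part_select(vec: int, i: int, width: int) -> int:
    """Python equivalent of vec[i*width +: width]."""
    return (vec >> (i * width)) & ((1 << width) - 1)


# Usage with a made-up three-entry byte table.
table = pack([0x40, 0x41, 0x42], 8)
print(hex(table))                 # 0x424140
print(hex(part_select(table, 1, 8)))  # 0x41
```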
Four harden attempts to land:
| attempt | clock period | die | result |
|---|---|---|---|
| 1 | 10 ns | 160×160 µm | placement failed at 82% utilization |
| 2 | 10 ns | 220×220 µm | slow-corner setup -0.89 ns |
| 3 | 12 ns | 220×220 µm | slow-corner setup -0.22 ns |
| 4 | 14 ns | 220×220 µm | slow-corner setup +0.63 ns (PASS) |
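A quick pass over the table's numbers: frequency is 1000 / period (ns → MHz), and period minus setup slack gives the effective worst-path requirement each run ended up with (arrival time plus setup and clock overheads, roughly). The implied figure is not constant across runs (10.89, 12.22, 13.37 ns), which is consistent with each run re-optimizing only as hard as its own constraint demands; the sketch below just does the arithmetic.

```python
# Arithmetic on the harden table: clock frequency and implied worst-path delay.
def freq_mhz(period_ns: float) -> float:
    return 1000.0 / period_ns


def implied_path_ns(period_ns: float, setup_slack_ns: float) -> float:
    """Period minus setup slack: what the worst path effectively needed in that run."""
    return period_ns - setup_slack_ns


runs = {2: (10.0, -0.89), 3: (12.0, -0.22), 4: (14.0, +0.63)}  # attempt: (period, slack)
for n, (period, slack) in runs.items():
    print(f"attempt {n}: {freq_mhz(period):.1f} MHz, worst path ~{implied_path_ns(period, slack):.2f} ns")
# attempt 2: 100.0 MHz, worst path ~10.89 ns
# attempt 3: 83.3 MHz, worst path ~12.22 ns
# attempt 4: 71.4 MHz, worst path ~13.37 ns
```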
Final at 71 MHz (14 ns): 8330 cells, 220 × 220 µm die (41% utilization), +0.63 ns slow-corner setup slack, +7.84 ns typical-corner setup slack, +0.88 ns hold slack, DRC=0, LVS=0, antenna=0.
The 82% utilization wall on attempt 1 is worth pausing over. Density that high doesn’t leave the placer enough whitespace (or, downstream, the router enough tracks) for an SoC with this much fan-in/fan-out. Growing the die from 160×160 to 220×220 µm (1.375× per side, about 1.9× the area) dropped utilization to a comfortable 41%, and the placement failure went away.
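The utilization drop roughly follows from area scaling alone. The sketch below assumes total cell area stays fixed and scales utilization by the die-area ratio, which predicts about 43%. The reported 41% comes from a different run (14 ns instead of 10 ns), and utilization is typically measured against core area rather than the full die, so only a rough match is expected.

```python
# Rough check: utilization after enlarging a square die, holding cell area fixed.
def scaled_utilization(util: float, old_side_um: float, new_side_um: float) -> float:
    return util * (old_side_um / new_side_um) ** 2


area_ratio = (220 / 160) ** 2  # 1.890625, i.e. "about 1.9x the area"
print(f"area ratio {area_ratio:.2f}x, predicted utilization {scaled_utilization(0.82, 160, 220):.1%}")
# area ratio 1.89x, predicted utilization 43.4%
```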
Annotations from the placed netlist call out six regions: the 16-byte (128-flop) RAM along the bottom, the 56-flop CPU regfile on the left, the 4-flop FSM state register, the 28-flop UART TX cluster (right column), the new 37-flop UART RX FSM, and the 16-flop GPIO.
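Summing the six annotated regions gives the flop budget at a glance, and shows the RAM alone is close to half of it. This only adds up the numbers quoted above; it says nothing about flops outside the annotated regions, and it doesn’t try to break down the rest of the 8330-cell total.

```python
# Flop counts per annotated region, as quoted in the text (RAM = 16 bytes x 8 bits).
flops_by_region = {
    "RAM": 16 * 8,
    "CPU regfile": 56,
    "FSM state register": 4,
    "UART TX": 28,
    "UART RX FSM": 37,
    "GPIO": 16,
}

total_flops = sum(flops_by_region.values())
for name, count in flops_by_region.items():
    print(f"{name:>20}: {count:4d} flops ({count / total_flops:.0%})")
print(f"{'total':>20}: {total_flops:4d} flops")  # 269
```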
Receipts
- `6d66669`: RTL pass. Address map decided. P06 MDX fix rolled in (`flags_d <= {Z, N, C, V}` was being parsed as a JSX expression and crashing the build).
- `2e9a8ad`: hardened at 71 MHz on the fourth attempt.
Project page: /projects/07_tiny_soc/.