journal 2026-04-28

P11 — wrapping P06's CPU into the Tiny Tapeout pin frame


P10 was the simplest possible Tiny Tapeout port: P02’s counter/PWM/LFSR design (50ish cells of pure datapath, no controller) wrapped in the tt_um_* pin frame. P11 is the natural follow-up — take an actual CPU through the same path.

The wrapper itself is small. Maybe sixty lines, almost all of it glue: take the TT pin-frame inputs (ui_in[7:0], uio_in[7:0], ena, clk, rst_n) and convert them into the inputs P06’s internal top module expects (clock, reset, baud divider, run/halt control). Take P06’s outputs (UART TX, halted, R7 register contents) and pack them into the TT output pins.

tt_um_librelane_p06_cpu (the wrapper)
   uo_out[0]    = uart_tx_line
   uo_out[1]    = halted
   uo_out[7:2]  = R7[5:0]   ; low 6 bits of R7
   uio_out      = 0
   uio_oe       = 0         ; uio is input-only here
   ui_in[7:0]   = baud_div[7:0]
   uio_in[7:0]  = baud_div[15:8]

The interesting design choice is what to expose. P06’s full output surface is out[7:0], pc_out[4:0], halted, and uart_tx. That’s 15 bits. The TT pin frame gives you 8 dedicated outputs (uo_out[7:0]) plus 8 bidirectional pins, and with the bidirectional pins spent as inputs for the high byte of the baud divider, those 15 bits have to fit into 8.

The pick: keep UART TX (because that’s the chip’s user-facing interface), keep halted (because that’s the easiest “did the chip work” signal), keep 6 bits of R7 (the program-result register), drop pc_out entirely (debug-only, not needed once the chip is fabbed), drop the high bits of R7 (the default boot program’s result fits in 6 bits). The bidirectional pins are never driven as outputs: uio_oe is tied to all-zero and they only carry baud_div[15:8] in.
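
A minimal Python model of that pin map, as a sketch rather than the wrapper RTL itself. The function names are made up for illustration, and the baud example assumes the divider counts clock cycles per UART bit, which the notes above do not spell out.

```python
def pack_uo_out(uart_tx: int, halted: int, r7: int) -> int:
    """Model uo_out: bit 0 = UART TX, bit 1 = halted, bits 7:2 = R7[5:0]."""
    return (uart_tx & 1) | ((halted & 1) << 1) | ((r7 & 0x3F) << 2)


def baud_div_from_pins(ui_in: int, uio_in: int) -> int:
    """Rebuild the 16-bit divider: ui_in is the low byte, uio_in the high byte."""
    return ((uio_in & 0xFF) << 8) | (ui_in & 0xFF)


def pins_from_baud_div(baud_div: int) -> tuple:
    """Split a 16-bit divider into the (ui_in, uio_in) values a host would drive."""
    return baud_div & 0xFF, (baud_div >> 8) & 0xFF


if __name__ == "__main__":
    # Halted CPU, TX line idle high (assumed idle level), R7 = 8.
    print(hex(pack_uo_out(uart_tx=1, halted=1, r7=8)))  # 0x23

    # ASSUMPTION: divider = clock cycles per bit, at a 50 MHz clock and 115200 baud.
    div = round(50_000_000 / 115_200)
    print(div, pins_from_baud_div(div))
```

Values of R7 above 63 are silently truncated by the 0x3F mask, which is the cost of dropping the high bits.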

Default boot output

I wanted the silicon-default boot program to be visually distinguishable from any C-compiled program a host might load. For P06 that meant: keep the CPU’s internal Fibonacci(6) program that prints '8' over UART, then halts. Three signals become observable once rst_n deasserts: the '8' byte (0x38) on the UART TX line, halted going high, and the program result in R7[5:0] on uo_out[7:2].

The host-side testbench uses a behavioural UART receiver (borrowed from P03’s tb_console.sv) to decode 0x38 from the chip’s TX line, and tb.sv asserts all three observables.
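
The decode step can be sketched in Python as a behavioural receiver over a list of per-clock line samples. This is not tb_console.sv. It assumes 8N1 framing, LSB first, an idle-high line, and a divider expressed in clocks per bit; the helper names are hypothetical.

```python
def uart_frame(byte: int, clocks_per_bit: int, idle_clocks: int = 4) -> list:
    """Build per-clock TX samples for one 8N1 frame (start, 8 data bits LSB first, stop)."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    samples = [1] * idle_clocks
    for b in bits:
        samples += [b] * clocks_per_bit
    return samples + [1] * idle_clocks


def uart_decode(samples: list, clocks_per_bit: int):
    """Return the first byte on the line, or None if no valid frame is found."""
    # Find the falling edge that opens the start bit.
    start = None
    for i in range(1, len(samples)):
        if samples[i - 1] == 1 and samples[i] == 0:
            start = i
            break
    if start is None:
        return None

    def sample_bit(n: int) -> int:
        # Sample in the middle of bit slot n (slot 0 = start bit, slot 9 = stop bit).
        return samples[start + n * clocks_per_bit + clocks_per_bit // 2]

    # Reject truncated captures and frames with a bad start or stop bit.
    if start + 9 * clocks_per_bit + clocks_per_bit // 2 >= len(samples):
        return None
    if sample_bit(0) != 0 or sample_bit(9) != 1:
        return None
    return sum(sample_bit(1 + k) << k for k in range(8))


if __name__ == "__main__":
    line = uart_frame(0x38, clocks_per_bit=4)
    print(chr(uart_decode(line, clocks_per_bit=4)))  # 8
```

Sampling mid-bit rather than at the edge is the usual way to tolerate small divider mismatches between transmitter and receiver.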

The submission flow vs the standalone harden

There are two parallel paths that both produce a hardened GDS:

  1. Standalone harden — the make harden PROJECT=11_tt_cpu target in this repo. Targets a 333 × 225 µm die (= TT 2×2 tile) at 50 MHz, runs full DRC/LVS/antenna. Useful for sanity-checking the wrapper before submitting it anywhere.

  2. Real TT submission — fork the tt-template repo, drop our top.sv into src/, fill in info.yaml, push to GitHub. TT’s CI workflow runs OpenLane against the TT shuttle’s chip-level template, produces a GDS, and the submission gets queued for the next shuttle.

P11 is the standalone harden case. The submission case adds an external aggregator (TT’s tt_top.v), an external mux that selects between hundreds of user projects, and a fixed pad ring. What lives in this repo is the upstream half — the thing every TT user hands to TT’s CI.

Lessons