P11 — wrapping P06's CPU into the Tiny Tapeout pin frame

P10 was the simplest possible Tiny Tapeout port: P02’s counter/PWM/LFSR design (50ish cells of pure datapath, no controller) wrapped in the tt_um_* pin frame. P11 is the natural follow-up — take an actual CPU through the same path.

The wrapper itself is small: about sixty lines, almost all of it glue. It takes the TT pin-frame inputs (ui_in[7:0], uio_in[7:0], ena, clk, rst_n) and converts them into the inputs P06’s internal top module expects (clock, reset, baud divider, run/halt control). It then takes P06’s outputs (UART TX, halted, R7 register contents) and packs them into the TT output pins.

tt_um_librelane_p06_cpu (the wrapper)
   uo_out[0]    = uart_tx_line
   uo_out[1]    = halted
   uo_out[7:2]  = R7[5:0]   ; low 6 bits of R7
   uio_out      = 0
   uio_oe       = 0         ; uio is input-only here
   ui_in[7:0]   = baud_div[7:0]
   uio_in[7:0]  = baud_div[15:8]
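
The input side of the map above can be modelled in a few lines of Python. This is only a sketch of the bit arithmetic, not the wrapper's RTL. The function names are made up for illustration, and it assumes the divider counts clock cycles per UART bit, which the map itself does not say. The 115200 baud figure is likewise just an example value.

```python
def baud_div_from_pins(ui_in: int, uio_in: int) -> int:
    """Combine the two 8-bit TT input buses into the 16-bit baud divider."""
    return ((uio_in & 0xFF) << 8) | (ui_in & 0xFF)


def pins_for_baud_div(baud_div: int) -> tuple:
    """Inverse mapping: the (ui_in, uio_in) values a host would drive."""
    if not 0 <= baud_div <= 0xFFFF:
        raise ValueError("baud_div must fit in 16 bits")
    return baud_div & 0xFF, (baud_div >> 8) & 0xFF


# Assumption: the divider is "clock cycles per UART bit".
CLOCK_HZ = 50_000_000  # clock target of the standalone harden
BAUD = 115_200         # hypothetical baud rate, chosen for illustration
div = round(CLOCK_HZ / BAUD)
ui_in, uio_in = pins_for_baud_div(div)
print(div, hex(ui_in), hex(uio_in))
```

Under that assumption the low byte lands on ui_in and the high byte on uio_in, which is why the bidirectional pins have to stay inputs.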

The interesting design choice is what to expose. P06’s full output surface is out[7:0] (the R7 contents), pc_out[4:0], halted, and uart_tx: 15 bits in total. The TT pin frame gives you 8 dedicated outputs (uo_out[7:0]) plus 8 bidirectional pins, and this wrapper spends the bidirectional pins on the high byte of the baud divider, so only 8 output bits are left. Picking what makes the cut is itself a design decision.

The pick: keep UART TX (because that’s the chip’s user-facing interface), keep halted (because that’s the easiest “did the chip work” signal), keep 6 bits of R7 (the program-result register), drop pc_out entirely (debug-only, not needed once the chip is fabbed), drop the high bits of R7 (the default boot program’s result fits in 6 bits). The bidirectional pins are used as inputs only (they carry baud_div[15:8]), so uio_out and uio_oe are tied to all-zero.
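
To make the packing concrete, here is a small Python model of the output byte. It follows the pin map given earlier (bit 0 = UART TX, bit 1 = halted, bits 7:2 = R7[5:0]). The function names are hypothetical and the model ignores ena.

```python
def pack_uo_out(uart_tx: int, halted: int, r7: int) -> int:
    """Pack the three kept signals into the 8-bit uo_out value."""
    return ((r7 & 0x3F) << 2) | ((halted & 1) << 1) | (uart_tx & 1)


def unpack_uo_out(uo_out: int) -> dict:
    """Recover the signals a host would read back from uo_out."""
    return {
        "uart_tx": uo_out & 1,
        "halted": (uo_out >> 1) & 1,
        "r7_low6": (uo_out >> 2) & 0x3F,
    }


# Default boot program finished: TX idle (high), halted, R7 = 8.
print(hex(pack_uo_out(uart_tx=1, halted=1, r7=8)))     # 0x23
# The two high bits of R7 are dropped, so 0x48 is indistinguishable from 0x08.
print(hex(pack_uo_out(uart_tx=1, halted=1, r7=0x48)))  # 0x23
```

The second call shows the cost of the cut: any result above 63 aliases onto a smaller value at the pins, which is acceptable only because the default program's result is 8.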

Default boot output

I wanted the silicon-default boot program to be visually distinguishable from any C-compiled program a host might load. For P06 that meant: keep the CPU’s internal Fibonacci(6) program that prints '8' over UART, then halts. Three signals become observable once rst_n deasserts and the program runs to completion:

UART emits one byte: 0x38 = ASCII '8'.
halted goes high.
uo_out[7:2] mirrors R7’s low 6 bits = 8.

The host-side testbench uses a behavioural UART receiver (borrowed from P03’s tb_console.sv) to decode 0x38 from the chip’s TX line, and tb.sv asserts all three observables.
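
For readers who have not seen a behavioural UART receiver, the idea can be sketched in Python. This is not the code from tb_console.sv. It assumes standard 8N1 framing with the LSB sent first and a line that idles high, and it works on a list of evenly spaced line samples. All names here are made up for the example.

```python
def uart_frame(byte: int, samples_per_bit: int) -> list:
    """Line samples for one 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    samples = []
    for b in bits:
        samples.extend([b] * samples_per_bit)
    return samples


def uart_decode(samples: list, samples_per_bit: int) -> list:
    """Decode every complete 8N1 frame found on a sampled TX line."""
    decoded = []
    i, n = 0, len(samples)
    while i < n:
        if samples[i] == 1:          # idle or stop level, keep scanning
            i += 1
            continue
        mid = i + samples_per_bit // 2        # centre of the start bit
        stop = mid + 9 * samples_per_bit      # centre of the stop bit
        if stop >= n:
            break                             # frame runs off the capture
        if samples[mid] != 0:                 # too short to be a start bit
            i += 1
            continue
        byte = 0
        for k in range(8):                    # sample each data bit mid-bit
            byte |= samples[mid + (k + 1) * samples_per_bit] << k
        if samples[stop] == 1:                # keep only well-framed bytes
            decoded.append(byte)
        i = stop
    return decoded


line = [1] * 20 + uart_frame(0x38, 8) + [1] * 20
print([chr(b) for b in uart_decode(line, 8)])  # ['8']
```

Sampling at mid-bit is what makes the receiver tolerant of small divider mismatches between the chip and the host.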

The submission flow vs the standalone harden

There are two parallel paths that both produce a hardened GDS:

Standalone harden — the make harden PROJECT=11_tt_cpu target in this repo. Targets a 333 × 225 µm die (= TT 2×2 tile) at 50 MHz, runs full DRC/LVS/antenna. Useful for sanity-checking the wrapper before submitting it anywhere.
Real TT submission — fork the tt-template repo, drop our top.sv into src/, fill in info.yaml, push to GitHub. TT’s CI workflow runs OpenLane against the TT shuttle’s chip-level template, produces a GDS, and the submission gets queued for the next shuttle.

P11 is the standalone harden case. The submission case adds an external aggregator (TT’s tt_top.v), an external mux that selects between hundreds of user projects, and a fixed pad ring. What lives in this repo is the upstream half — the thing every TT user hands to TT’s CI.

Lessons

Pin packing is a design decision. P06’s output bits don’t match TT’s output count, and there’s no automatic “expose everything” path. You pick what’s worth pinning out and document the rest as internal-only.
The wrapper preserves the inner module. I didn’t modify P06’s RTL. The wrapper imports it (renamed to p06_top to avoid name collisions on a shuttle hosting hundreds of top modules) and passes the appropriate signals through. It is the same module, and hardened in isolation it would give the same netlist; here it is just dressed up in the TT pin frame.
ena matters. TT’s user mux holds ena high while the shuttle is “running this project” and low otherwise. P11’s wrapper treats it as a synchronous gate on the outputs: when ena=0, every uo_out bit is forced to 0 regardless of internal state. That keeps the design from driving signals onto the shared shuttle pins while another project is selected.
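
The ena behaviour described in the last lesson reduces to a one-line rule. The Python sketch below models only the logical effect. It does not capture whether the gate is registered or combinational in the actual wrapper, and the function name is hypothetical.

```python
def gate_uo_out(ena: int, uo_out_internal: int) -> int:
    """Value seen on uo_out: the internal byte when selected, else all zeros."""
    return (uo_out_internal & 0xFF) if ena else 0x00


# 0x23 is the post-boot value: TX idle high, halted, R7 low bits = 8.
print(hex(gate_uo_out(ena=1, uo_out_internal=0x23)))  # 0x23
print(hex(gate_uo_out(ena=0, uo_out_internal=0x23)))  # 0x0
```

One side effect worth noting: with ena=0 the UART TX bit reads 0 rather than its idle-high level, so a host should only trust the line while the project is selected.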