P10 was the simplest possible Tiny Tapeout port: P02’s
counter/PWM/LFSR design (50ish cells of pure datapath, no
controller) wrapped in the tt_um_* pin frame. P11 is the
natural follow-up — take an actual CPU through the same path.
The wrapper itself is small. Maybe sixty lines, almost all of it
glue: take the TT pin-frame inputs (ui_in[7:0], uio_in[7:0],
ena, clk, rst_n) and convert them into the inputs P06’s
internal top module expects (clock, reset, baud divider, run/halt
control). Take P06’s outputs (UART TX, halted, R7 register
contents) and pack them into the TT output pins.
tt_um_librelane_p06_cpu (the wrapper)
uo_out[0] = uart_tx_line
uo_out[1] = halted
uo_out[7:2] = R7[5:0] ; low 6 bits of R7
uio_out = 0
uio_oe = 0 ; uio is input-only here
ui_in[7:0] = baud_div[7:0]
uio_in[7:0] = baud_div[15:8]
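
The pin map above can be checked with a small behavioural model. This is a minimal Python sketch rather than the wrapper's RTL: the function names are hypothetical, and 434 is an arbitrary example divider value, not one taken from the project.

```python
def pack_uo_out(uart_tx: int, halted: int, r7: int) -> int:
    """Pack the three exposed signals into the 8-bit uo_out value."""
    # bit 0 = UART TX, bit 1 = halted, bits 7:2 = low 6 bits of R7
    return (uart_tx & 1) | ((halted & 1) << 1) | ((r7 & 0x3F) << 2)


def baud_div_from_pins(ui_in: int, uio_in: int) -> int:
    """Rebuild the 16-bit baud divider: ui_in is the low byte, uio_in the high byte."""
    return ((uio_in & 0xFF) << 8) | (ui_in & 0xFF)


def pins_for_baud_div(baud_div: int) -> tuple:
    """Inverse mapping: the (ui_in, uio_in) values a host would drive."""
    return baud_div & 0xFF, (baud_div >> 8) & 0xFF


if __name__ == "__main__":
    # UART idle (line high), CPU halted, R7 = 8
    print(f"{pack_uo_out(1, 1, 8):#04x}")   # 0x23
    print(pins_for_baud_div(434))           # (178, 1)
```

Note that only R7's low 6 bits survive the packing, so any value above 63 is truncated on the pins.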
The interesting design choice is what to expose. P06’s full output
surface is out[7:0], pc_out[4:0], halted, and uart_tx.
That’s 15 bits. The TT pin frame gives you 8 dedicated outputs
(uo_out[7:0]) plus 8 bidirectional pins. Picking what makes the
cut is itself a design decision.
The pick: keep UART TX (because that’s the chip’s user-facing
interface), keep halted (because that’s the easiest “did the
chip work” signal), keep 6 bits of R7 (the program-result
register), drop pc_out entirely (debug-only, not needed once
the chip is fabbed), drop the high bits of R7 (the default boot
program’s result fits in 6 bits). The bidirectional pins are never
driven as outputs: uio_oe is tied to all-zero, and uio_in only
carries the upper byte of the baud divider.
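
The bit budget behind that pick is simple arithmetic, sketched here in Python. The signal widths come from the text; treating out[7:0] as the bus that carries R7 is an assumption, and the names are illustrative only.

```python
# Output signals of P06 and their widths, as listed in the text.
P06_OUTPUTS = {"out": 8, "pc_out": 5, "halted": 1, "uart_tx": 1}

# Bits of each signal routed to uo_out.
# Assumption: out[7:0] is the bus that carries R7.
EXPOSED = {"out": 6, "pc_out": 0, "halted": 1, "uart_tx": 1}

UO_OUT_WIDTH = 8  # dedicated TT output pins


def budget(outputs: dict, exposed: dict):
    """Return (bits available, bits exposed, bits dropped per signal)."""
    total = sum(outputs.values())
    used = sum(exposed.values())
    dropped = {
        name: outputs[name] - exposed[name]
        for name in outputs
        if outputs[name] > exposed[name]
    }
    return total, used, dropped


if __name__ == "__main__":
    total, used, dropped = budget(P06_OUTPUTS, EXPOSED)
    assert used <= UO_OUT_WIDTH  # the pick must fit the dedicated outputs
    print(total, used, dropped)  # 15 8 {'out': 2, 'pc_out': 5}
```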
Default boot output
I wanted the silicon-default boot program to be visually
distinguishable from any C-compiled program a host might load.
For P06 that meant: keep the CPU’s internal Fibonacci(6) program
that prints '8' over UART, then halts. Three signals become
observable once rst_n deasserts and the program runs to completion:
- UART emits one byte: 0x38 = ASCII '8'.
- halted goes high.
- uo_out[7:2] mirrors R7’s low 6 bits = 8.
The host-side testbench uses a behavioural UART receiver
(borrowed from P03’s tb_console.sv) to decode 0x38 from the
chip’s TX line, and tb.sv asserts all three observables.
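
To make the UART check concrete, here is a rough Python model of what a behavioural receiver does. It is not P03's tb_console.sv: it assumes 8N1 framing (one start bit, eight data bits LSB first, one stop bit), and clocks_per_bit is a hypothetical stand-in for the baud divider.

```python
def fib(n: int) -> int:
    """Fibonacci with fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def uart_frame(byte: int) -> list:
    """8N1 frame, one entry per bit: start (0), data LSB first, stop (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def uart_decode(line: list, clocks_per_bit: int) -> int:
    """Decode the first 8N1 frame from a line sampled once per clock."""
    start = line.index(0)  # idle is high, so the first 0 is the start bit

    def sample(bit_index: int) -> int:
        # Sample in the middle of the bit period.
        return line[start + bit_index * clocks_per_bit + clocks_per_bit // 2]

    if sample(0) != 0:
        raise ValueError("false start bit")
    byte = 0
    for i in range(8):
        byte |= sample(1 + i) << i
    if sample(9) != 1:
        raise ValueError("framing error: stop bit is low")
    return byte


if __name__ == "__main__":
    clocks_per_bit = 4
    tx_byte = ord("0") + fib(6)  # digit 8 as ASCII
    stretched = [b for b in uart_frame(tx_byte) for _ in range(clocks_per_bit)]
    line = [1] * 10 + stretched + [1] * 10  # idle high before and after
    rx = uart_decode(line, clocks_per_bit)
    print(hex(rx), chr(rx))  # 0x38 8
```

Sampling mid-bit is the usual choice because it tolerates small mismatches between the transmitter's and receiver's bit periods.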
The submission flow vs the standalone harden
There are two parallel paths that both produce a hardened GDS:
- Standalone harden: the make harden PROJECT=11_tt_cpu target in this repo. Targets a 333 × 225 µm die (= TT 2×2 tile) at 50 MHz, runs full DRC/LVS/antenna. Useful for sanity-checking the wrapper before submitting it anywhere.
- Real TT submission: fork the tt-template repo, drop our top.sv into src/, fill in info.yaml, push to GitHub. TT’s CI workflow runs OpenLane against the TT shuttle’s chip-level template, produces a GDS, and the submission gets queued for the next shuttle.
P11 is the standalone harden case. The submission case adds an
external aggregator (TT’s tt_top.v), an external mux that
selects between hundreds of user projects, and a fixed pad ring.
What lives in this repo is the upstream half — the thing every
TT user hands to TT’s CI.
Lessons
- Pin packing is a design decision. P06’s output bits don’t match TT’s output count, and there’s no automatic “expose everything” path. You pick what’s worth pinning out and document the rest as internal-only.
- The wrapper preserves the inner module. I didn’t modify P06’s RTL. The wrapper imports it (renamed to p06_top to avoid name collisions on a shuttle hosting hundreds of top modules) and passes the appropriate signals through. Same module, same hardened netlist if you ran them in isolation; just dressed up in the TT pin frame.
- ena matters. TT’s user mux holds ena high while the shuttle is “running this project” and low otherwise. P11’s wrapper treats it as a synchronous gate on the outputs: when ena=0, all uo_out bits force to 0 regardless of internal state. That keeps the chip from leaking signals onto the shared shuttle pins when another project is selected.
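
The ena behaviour described in the last lesson can be modelled in a few lines of Python. This sketch only captures the value of the gate and ignores timing (whether the gate is registered is not modelled); the function name is hypothetical.

```python
def gated_uo_out(ena: int, uo_internal: int) -> int:
    """Force every uo_out bit low unless this project is selected."""
    return (uo_internal & 0xFF) if ena else 0x00


if __name__ == "__main__":
    internal = 0x23  # UART idle, halted, R7 = 8
    print(hex(gated_uo_out(1, internal)))  # 0x23
    print(hex(gated_uo_out(0, internal)))  # 0x0
```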