journal 2026-04-29

P12 — RV32E hardened on a TT 8×2 tile

Tags: p12, harden, riscv, tinytapeout

After P09 hardened too big for any TT slot (17k cells / 360k µm² — needs about 1.25× the area of an 8×2 tile), the question was: can we make a real RISC-V core that fits? P12 is the answer.

It fits.

Numbers

Metric               P09 (RV32I-min)   P12 (RV32E-tt)   Factor
Cells                17,277            4,943            0.29×
Flops                3,262             920              0.28×
Die area             360,000 µm²       290,250 µm²      0.81×
Aspect               1:1 (square)      5.7:1 (wide)     TT-shaped
Setup slack          16.55 ns          3.91 ns          tighter
DRC / LVS / antenna  0 / 0 / 0         0 / 0 / 0        clean

The 3.5× cell shrink and 3.5× flop shrink came from three changes:

  1. RV32E instead of RV32I — 16 × 32-bit regfile vs 32 × 32-bit. Saves 512 flops outright. gcc supports -march=rv32e -mabi=ilp32e, so existing C code compiles for it directly.

  2. 8-word dmem vs 64-word — 256 flops vs 2,048. The shrink is eye-watering but in practice 32 bytes is enough for a couple of locals and one result slot. The MMIO UART register at 0x80 lives outside dmem, so I/O and storage don’t compete.

  3. 32-instruction PROG ROM vs 256. 128 bytes of code vs 1024. Forces programs to be tight, which is exactly the embedded discipline the chip is meant to teach.
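
As a sanity check on where the flops went, the storage arithmetic in the list above can be tallied in a few lines of Python. This is a back-of-the-envelope sketch, not output from the flow: it assumes one flop per stored bit, and `storage_flops` is a helper name made up for illustration. The totals are the ones from the table.

```python
# Back-of-the-envelope flop accounting for the P09 -> P12 shrink.
# Assumption: every regfile/dmem bit is one flop (no latches, no SRAM macros).
WORD_BITS = 32


def storage_flops(words: int, bits: int = WORD_BITS) -> int:
    """Flops needed to hold `words` entries of `bits` bits each."""
    return words * bits


regfile_saved = storage_flops(32) - storage_flops(16)  # RV32I vs RV32E regfile
dmem_saved = storage_flops(64) - storage_flops(8)      # 64-word vs 8-word dmem

# Flop totals taken from the Numbers table.
p09_flops, p12_flops = 3262, 920
total_saved = p09_flops - p12_flops

print(f"regfile: {regfile_saved} flops saved")
print(f"dmem:    {dmem_saved} flops saved")
print(f"storage explains {regfile_saved + dmem_saved} of {total_saved} flops removed")
print(f"flop shrink: {p09_flops / p12_flops:.1f}x")
```

Under that assumption the two storage changes account for 2,304 of the 2,342 flops removed, so the regfile and dmem cuts explain nearly all of the flop shrink.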

Single observable

P09 had two debug outputs (dbg_reg_out for any regfile entry, dmem_out for dmem[0]). P12 has none, because the TT pin frame doesn’t have spare bits. So we expose CPU state through a peripheral instead: a memory-mapped UART at byte address 0x80. Writes to 0x80 emit a byte; reads from 0x80 return the busy bit. The boot program polls the busy bit and sends three bytes ('1', '3', '\n'), which a host USB-UART bridge sees as 13\n on the wire.
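
The poll-and-send protocol is easy to model in software. The sketch below is a behavioural model in Python, not the RTL or the actual boot ROM: `UartModel`, `busy_cycles`, and the choice of bit 0 for the busy flag are assumptions made for illustration. Only the address 0x80, the write-emits-a-byte / read-returns-busy behaviour, and the three bytes come from the design.

```python
# Behavioural model of the poll-and-send loop (not the actual RTL or boot ROM).
UART_ADDR = 0x80  # byte address of the MMIO UART register


class UartModel:
    """Hypothetical UART: a write emits a byte, a read returns the busy bit."""

    def __init__(self, busy_cycles: int = 3):
        self.busy_cycles = busy_cycles  # assumed number of polls a byte takes
        self.remaining = 0              # polls left until the UART is idle
        self.wire = bytearray()         # bytes seen by the host bridge

    def read(self, addr: int) -> int:
        assert addr == UART_ADDR
        busy = 1 if self.remaining > 0 else 0
        if self.remaining > 0:
            self.remaining -= 1         # each poll lets the transmitter advance
        return busy                     # assumption: busy flag is bit 0

    def write(self, addr: int, value: int) -> None:
        assert addr == UART_ADDR
        self.wire.append(value & 0xFF)
        self.remaining = self.busy_cycles


def boot_program(uart: UartModel) -> None:
    """For each byte: spin while busy, then write it to the UART."""
    for byte in b"13\n":
        while uart.read(UART_ADDR) & 1:
            pass
        uart.write(UART_ADDR, byte)


uart = UartModel()
boot_program(uart)
print(bytes(uart.wire))
```

Running the model leaves `13\n` on the simulated wire, which is the same byte sequence the host bridge should see from the real chip.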

This is what real microcontrollers do: every storage element worth naming is either exposed through a memory-mapped peripheral or not exposed at all. Debug ports as wide as the internal state are a privilege of educational chips with pin counts to spare.

The wide-and-short floorplan

A TT 8×2 tile is 1290 × 225 µm. Most of my mental model for “a chip” is square — P09 is 600 × 600, P06 is 130 × 130, P08 is 130 × 130. P12 is 5.7× wider than tall, which is genuinely strange to look at. Standard cell rows go across the entire tile without folding, so the floorplan looks like a horizontal stripe of cells rather than a checkerboard.
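
The geometry is worth checking by hand once. A minimal sketch, using only the dimensions quoted above (the variable names are mine):

```python
# Tile geometry from the text; all dimensions in micrometres.
TILE_W, TILE_H = 1290, 225   # TT 8x2 tile
P09_W, P09_H = 600, 600      # P09 die

tile_area = TILE_W * TILE_H
p09_area = P09_W * P09_H

print(f"tile area:    {tile_area:,} um^2")
print(f"aspect ratio: {TILE_W / TILE_H:.1f}:1")
print(f"P09 / tile:   {p09_area / tile_area:.2f}x")
```

This reproduces the 290,250 µm² die area and the 5.7:1 aspect from the table, and puts P09 at roughly 1.24× the area of the tile it failed to fit.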

I dropped FP_CORE_UTIL from the default ~50 % to 35 % to leave more routing room in the long thin core. At 50 % the DRT (detailed routing) step was struggling with congestion; at 35 % it converged cleanly. Final core utilization comes out at around 17 % real density. The chip looks underfilled in the screenshot, but the metal stack is dense and that’s where the timing closure happens.

Caveat: max-cap warnings in slow corners

One non-fatal warning surfaced from STA: [Checker.MaxCapViolations] in the ss_100C_1v60 (slow-slow, 100 °C, 1.6 V) corner. That’s the worst-case process corner — the spec considers this a manufacturing bound, not a typical operating condition. On a real TT shuttle this typically translates to “the chip will work but might miss a few tens of MHz of speed at the high-temp corner.” The chip passes in the typical and fast corners, which is what matters for “does it work.”

If we were sending this to silicon, I’d want to add a PIN_CAP_FILE entry and re-run STA after the post-CTS resizer budgets cell-input capacitance more carefully. For an educational target that’s documented-as-rtl-pass-with-known-corner-warnings, the chip is good as-is.

What’s next

The natural follow-up: pipe the C toolchain (tools/riscv-asm/) through to P12 the way it now hooks up to P09. fib(10) compiled by gcc was 60 bytes, which fits in P12’s 128-byte ROM alongside the boot stub. The boot stub currently uses 16 bytes, so a 60-byte C program leaves 128 − 16 − 60 = 52 bytes for either a bigger compute kernel or a UART-print-result preamble. That’s right at the edge of what 128 bytes of RV32E can express.
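
The ROM budget is simple enough to keep as a small check next to the toolchain. This is a sketch under one assumption, that every instruction is an uncompressed 4-byte encoding; `rom_headroom` is a hypothetical helper, not something in tools/riscv-asm/.

```python
# ROM budget for P12, using the byte counts quoted above.
ROM_INSTRUCTIONS = 32
BYTES_PER_INSTRUCTION = 4  # assumption: uncompressed 32-bit encodings only


def rom_headroom(boot_stub: int, program: int) -> int:
    """Bytes left in the ROM; a negative result means the image does not fit."""
    rom_bytes = ROM_INSTRUCTIONS * BYTES_PER_INSTRUCTION
    return rom_bytes - boot_stub - program


headroom = rom_headroom(boot_stub=16, program=60)
print(f"{headroom} bytes free ({headroom // BYTES_PER_INSTRUCTION} instructions)")
```

With the current numbers that is 52 bytes, or 13 instructions, of headroom.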

After that: a real TT submission. The wrapper module tt_um_librelane_p12_rv32e matches the TT info.yaml signature; the last step is wiring up a TT user-project repo, dropping in our RTL, and submitting to the next open shuttle window. ~$300 and six months of waiting and we have a real RISC-V chip on a desk.