How fast can this go?

Picking apart “what’s the fastest RISC-V we could build on this PDK” into the parts the technology decides, the parts the architecture decides, and the parts careful PnR decides.

PDK: sky130A · sky130_fd_sc_hd
today (P37): 25 MHz target · 8.74 ns slack
implied Fmax: ~32 MHz signoff · ~48 MHz nominal
realistic ceiling: ~250 MHz pipelined

This page is a ceiling estimate, not a roadmap. The ladder is going somewhere else: toward enough RISC-V to host real software. But the question “if we just wanted to push speed, how far could we go on sky130A?” comes up often enough to write down.

§ What the technology decides

The PDK fixes two things that anchor every speed number on this site:

A back-of-envelope ceiling for a single fast path on sky130 hd is roughly 15–25 FO4 inverter delays per cycle once you account for setup time, clock skew, and a realistic mix of multi-input gates. That’s 750 ps – 1.25 ns, or 800 MHz – 1.3 GHz as a raw cell-delay number.
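The arithmetic behind that range can be checked in a few lines. This is a minimal sketch, assuming an FO4 delay of about 50 ps, which is the value implied by "15 FO4 = 750 ps" above rather than a characterized library number; the function name is made up for illustration.

```python
# Assumed FO4 delay in picoseconds, implied by "15 FO4 = 750 ps" in the text.
# It is not a measured value for sky130_fd_sc_hd.
FO4_PS = 50.0


def cycle_from_fo4(n_fo4: float, fo4_ps: float = FO4_PS) -> tuple[float, float]:
    """Return (cycle time in ps, clock frequency in MHz) for a path n_fo4 deep."""
    period_ps = n_fo4 * fo4_ps
    freq_mhz = 1e6 / period_ps  # a 1000 ps period corresponds to 1000 MHz
    return period_ps, freq_mhz


for depth in (15, 25):
    period_ps, freq_mhz = cycle_from_fo4(depth)
    print(f"{depth} FO4 -> {period_ps:.0f} ps -> {freq_mhz:.0f} MHz")
```

The shallow end of the range gives the high frequency and the deep end gives the low one, which is why 15–25 FO4 maps onto 1.3 GHz down to 800 MHz.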

That number is misleading. Real designs don’t get there because the critical path isn’t a chain of inverters — it’s a register-to-register path through ALU logic, mux trees, and routed wires. Every realistic RISC-V critical path on sky130 hd lands much further down.

§ What the architecture decides

The biggest single multiplier between “FO4 ceiling” and “what your chip actually clocks at” is whether you pipelined.

| core style | comfortable | with effort | hard ceiling |
| multi-cycle FSM (today) | 50–80 MHz | ~100 MHz | front-end FSM transitions |
| 3-stage pipelined (Ibex-ish) | 100–150 MHz | ~180 MHz | reg-file + ALU forwarding |
| 5-stage pipelined (Rocket-mini) | 100–180 MHz | ~220 MHz | branch + memory paths |
| aggressively tuned | | ~250 MHz | clock tree, std-cell skew |

A multi-cycle CPU like the one we ship today has long combinational paths between flop-stages because each instruction does several operations in series across one giant FSM. Adding a real pipeline breaks those paths into smaller pieces, and the cycle time drops roughly proportionally — at the cost of pipeline registers, forwarding logic, hazard detection, and a much larger test surface.
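One common first-order way to reason about "roughly proportionally" is to treat the cycle time as the combinational logic divided evenly across stages, plus a fixed per-stage overhead for flop setup, clock-to-Q, and skew. The sketch below uses that idealized model; the 24 ns of logic and 1 ns of overhead are illustrative assumptions, not measurements of P37 or any other core, and real pipelines rarely split evenly.

```python
def pipelined_period_ns(logic_ns: float, stages: int, overhead_ns: float) -> float:
    """Idealized cycle time: logic split evenly over stages plus fixed overhead."""
    if stages < 1:
        raise ValueError("stages must be >= 1")
    return logic_ns / stages + overhead_ns


def mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


# Hypothetical inputs chosen only to show the shape of the curve.
LOGIC_NS = 24.0     # assumed total combinational delay
OVERHEAD_NS = 1.0   # assumed per-stage flop and skew overhead

for stages in (1, 3, 5):
    period = pipelined_period_ns(LOGIC_NS, stages, OVERHEAD_NS)
    print(f"{stages} stage(s): {period:.1f} ns -> {mhz(period):.1f} MHz")
```

The overhead term does not shrink with depth, so each extra stage buys less than the one before it. That is the gap between "proportionally" and "roughly proportionally".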

5 stages is the classic RISC textbook split (fetch / decode / execute / memory / writeback). Most open-silicon RV32 cores ship as that or a shortened variant of it: VexRiscv (configurable depth), Ibex (2 stages, with an optional third), and SCR1-class designs all live in roughly the 100–180 MHz zone on sky130 hd.

§ What careful PnR decides

Past the architecture, you can squeeze another factor by being careful about the flow itself:

These are diminishing returns. Each one buys a 5–15% shorter cycle time. None of them turns a 100 MHz core into a 200 MHz core; that takes an architecture change.
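To see why per-step gains of this size do not add up to a doubling, it helps to compound them. The sketch below makes the optimistic assumption that every step delivers its full cycle-time reduction independently of the others; the function names and step counts are hypothetical.

```python
import math


def compounded_speedup(gains: list[float]) -> float:
    """Frequency multiplier after applying each fractional cycle-time cut in turn."""
    period = 1.0
    for gain in gains:
        if not 0.0 <= gain < 1.0:
            raise ValueError("each gain must be in [0, 1)")
        period *= 1.0 - gain
    return 1.0 / period


def steps_to_double(gain: float) -> int:
    """Number of equal, independent cuts needed to halve the cycle time."""
    return math.ceil(math.log(0.5) / math.log(1.0 - gain))


print(f"three 10% cuts: {compounded_speedup([0.10] * 3):.2f}x")
print(f"steps to double at 15% each: {steps_to_double(0.15)}")
print(f"steps to double at 5% each: {steps_to_double(0.05)}")
```

Even under that optimistic assumption, doubling the clock needs five best-case steps or fourteen worst-case ones, and in practice later steps recover less than earlier ones.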

§ Open-silicon reference points

| design | core type | sky130 PDK | reported clock |
| Caravel mgmt SoC | VexRiscv-derived RV32IMC | sky130A | ~10–40 MHz (system-bound) |
| Ibex (open-silicon hardenings) | RV32IMC, 2-stage pipe | sky130 hd | ~80–120 MHz |
| VexRiscv, mid-tune | RV32IMC, configurable pipe | sky130 hd | ~100–150 MHz |
| VexiiRiscv, aggressive | RV32IMC, deeper pipe | sky130 hd | ~180–220 MHz |
| TinyQV | RV32EC, multi-cycle | sky130A · TT | ~64 MHz (TT clock) |

These are the public points worth pinning the chart on. Anything claiming >250 MHz on sky130A for an in-order RISC-V is either running at the nominal corner only, ignoring SRAM access timing, or using cells the open community can’t reach.

§ Where today’s chip sits

P37 is FSM-bound, not technology-bound. We chose CLOCK_PERIOD: 40.0 because that’s the constraint the rest of the ladder used; the recorded slack at signoff is 8.74 ns against a 40 ns budget, which means the critical path is around 31 ns, not 40 ns.

A speed-push experiment confirmed this directly. P37 was re-hardened at CLOCK_PERIOD: 30.0 (33.3 MHz); the flow ran end-to-end and produced a clean GDS, but signoff reported 152 setup violations at the slow corner with worst slack -1.258 ns. Implied critical path: 30 + 1.258 = 31.258 ns, which matches the ~31.26 ns (40 − 8.74) implied by P37’s positive-slack number. The resizer didn’t gain anything from the tighter budget: same 27157 cells, same 191927 µm² of stdcell.
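Both runs reduce to the same small calculation: the critical path is the clock period minus the worst setup slack, and Fmax is its reciprocal. A minimal sketch using only the period and slack figures quoted above (the helper names are made up for illustration):

```python
def critical_path_ns(clock_period_ns: float, worst_slack_ns: float) -> float:
    """Critical-path delay implied by a clock period and its worst setup slack."""
    return clock_period_ns - worst_slack_ns


def fmax_mhz(path_ns: float) -> float:
    """Highest clock frequency in MHz that a path of path_ns can meet."""
    return 1000.0 / path_ns


# (CLOCK_PERIOD in ns, worst setup slack in ns) as reported in the text.
runs = {
    "P37 at 40 ns": (40.0, 8.74),
    "P37 at 30 ns": (30.0, -1.258),
}

for name, (period, slack) in runs.items():
    path = critical_path_ns(period, slack)
    print(f"{name}: path {path:.3f} ns, Fmax {fmax_mhz(path):.2f} MHz")
```

Positive slack shortens the implied path and negative slack lengthens it, so the two runs land on the same delay from opposite sides of the budget.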

That puts today’s RTL at an empirical ~32 MHz Fmax at the slow signoff corner, with the critical path landing inside a wide OR-reduction tree starting at the ALU operand-B register and walking through the divider/multiplier block. The journal entry has the endpoint detail.

A meaningful speed jump from here means a different core, not a tighter budget: pipelined fetch/decode/execute, register-file forwarding, branch-target resolution moved out of the same cycle as ALU. That’s a P-something-large rung, and only worth doing if “fast” becomes a real goal. For now, “boring and correct enough to host FreeRTOS” is the cheaper milestone.

§ Honest framing

The fastest RISC-V we could realistically produce on sky130A is ~200–250 MHz, with a well-pipelined RV32 core and careful but not exotic PnR. Above that requires custom flops, custom clocking, and research-level effort that doesn’t fit the educational shape of this project.

The ladder is not currently aimed there. The roadmap explains what it is aimed at: enough of the RISC-V architecture to plausibly host real software, starting with FreeRTOS.