This page is a ceiling estimate, not a roadmap. The ladder is going somewhere else: toward enough RISC-V to host real software. But the question “if we just wanted to push speed, how far could we go on sky130A?” comes up often enough to be worth writing down.
§ What the technology decides
The PDK fixes two things that anchor every speed number on this site:
- Standard-cell library. We use `sky130_fd_sc_hd`, the open high-density library. It has a moderate cell variety, no hand-optimized fast flops, and a published FO4 delay of around 50 ps for an inverter at the nominal corner. That’s the fundamental gate-delay yardstick.
- Routing parasitics. sky130 is a 130 nm node. Wire delay is a real fraction of the cycle once nets get long. Tools have to spend buffers to keep slew under control on big fanout trees, and that buffering eats into the cycle budget.
A back-of-envelope ceiling for a single fast path on sky130 hd is
roughly 15–25 FO4 inverter delays per cycle once you account for setup
time, clock skew, and a realistic mix of multi-input gates. That’s
750 ps – 1.25 ns, or 800 MHz – 1.3 GHz as a raw cell-delay
number.
That number is misleading. Real designs don’t get there because the critical path isn’t a chain of inverters. It’s a register-to-register path through ALU logic, mux trees, and routed wires. Every realistic RISC-V core on sky130 hd clocks far below that ceiling.
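The FO4 arithmetic above is simple enough to check in a few lines. This is a minimal Python sketch. It assumes the ~50 ps nominal-corner FO4 figure quoted earlier, and the helper names are hypothetical. It also runs the conversion in the other direction, which shows why the ceiling is misleading: a 150 MHz core spends well over a hundred FO4 delays per cycle, not 15–25.

```python
# Back-of-envelope FO4 budget: cycle time = (FO4 delays per cycle) x (FO4 delay).
# FO4_PS is the approximate nominal-corner figure quoted on this page (an assumption
# for illustration); slow-corner delays are larger.
FO4_PS = 50.0


def cycle_ps(fo4_per_cycle: float, fo4_ps: float = FO4_PS) -> float:
    """Cycle time in picoseconds for a path that is `fo4_per_cycle` FO4 deep."""
    return fo4_per_cycle * fo4_ps


def fmax_mhz(period_ps: float) -> float:
    """Clock frequency in MHz for a period in picoseconds (1 us = 1e6 ps)."""
    return 1e6 / period_ps


def fo4_per_cycle(freq_mhz: float, fo4_ps: float = FO4_PS) -> float:
    """How many FO4 delays fit in one clock period at `freq_mhz`."""
    return (1e6 / freq_mhz) / fo4_ps


if __name__ == "__main__":
    # The 15-25 FO4 range from the text.
    for depth in (15, 25):
        t = cycle_ps(depth)
        print(f"{depth} FO4 -> {t:.0f} ps -> {fmax_mhz(t):.0f} MHz")
    # The reverse direction: FO4 depth available at a realistic core clock.
    print(f"150 MHz -> {fo4_per_cycle(150):.0f} FO4 per cycle")
```

Running it prints 750 ps (about 1333 MHz) and 1250 ps (800 MHz) for the two ends of the range, and about 133 FO4 per cycle at 150 MHz.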
§ What the architecture decides
The biggest single multiplier between “FO4 ceiling” and “what your chip actually clocks at” is whether you pipelined.
| core style | comfortable | with effort | what sets the ceiling |
|---|---|---|---|
| multi-cycle FSM (today) | 50–80 MHz | ~100 MHz | front-end FSM transitions |
| 3-stage pipelined (Ibex-ish) | 100–150 MHz | ~180 MHz | reg-file + ALU forwarding |
| 5-stage pipelined (Rocket-mini) | 100–180 MHz | ~220 MHz | branch + memory paths |
| aggressively tuned | — | ~250 MHz | clock tree, std-cell skew |
A multi-cycle CPU like the one we ship today has long combinational paths between flop stages because each instruction does several operations in series across one giant FSM. Adding a real pipeline breaks those paths into smaller pieces, and the cycle time drops roughly in proportion to the number of stages. The cost is pipeline registers, forwarding logic, hazard detection, and a much larger test surface.
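The textbook first-order model makes “roughly in proportion” concrete: T_cycle ≈ T_logic / N + T_overhead. Here T_logic of combinational delay is split evenly across N stages, and each stage pays a fixed overhead (clock-to-Q, setup, skew) on top of its share. The Python sketch below uses hypothetical numbers (20 ns of logic, 1 ns of overhead) and assumes a perfectly even split, which real pipelines never achieve. It also ignores the forwarding and hazard logic mentioned above.

```python
def pipelined_period_ns(logic_ns: float, stages: int, overhead_ns: float) -> float:
    """First-order cycle time: an even share of the logic plus a fixed per-stage overhead."""
    if stages < 1:
        raise ValueError("need at least one stage")
    return logic_ns / stages + overhead_ns


def mhz(period_ns: float) -> float:
    """Frequency in MHz for a period in nanoseconds."""
    return 1000.0 / period_ns


if __name__ == "__main__":
    LOGIC_NS = 20.0    # hypothetical total combinational delay of one instruction
    OVERHEAD_NS = 1.0  # hypothetical clock-to-Q + setup + skew paid by every stage
    for n in (1, 3, 5):
        t = pipelined_period_ns(LOGIC_NS, n, OVERHEAD_NS)
        print(f"{n} stage(s): {t:.2f} ns -> {mhz(t):.0f} MHz")
```

In this model the overhead term does not shrink as N grows, so each added stage buys less cycle time than the one before it.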
5 stages is the classic RISC textbook split (fetch / decode / execute / memory / writeback). Most open-silicon RV32 cores ship that split or a shorter one. VexRiscv is commonly built with five stages, and Ibex with two or three. Together with SCR1-class designs, they live roughly in the 100–180 MHz zone on sky130 hd.
§ What careful PnR decides
Past the architecture, you can squeeze out a little more by being careful about the flow itself:
- Floorplan. Hand-placing critical macros (reg-file, instruction buffer) so they sit close to the path that uses them shortens routes and removes buffering.
- Clock tree. The default CTS targets a generous skew budget. Tightening it costs more buffers but recovers cycle time.
- Flops. The default flops in `sky130_fd_sc_hd` are general-purpose. Some designs swap in `*_2` or `*_4` drive-strength variants on critical paths to cut clock-to-Q delay into heavy loads.
- Synthesis. Pushing Yosys harder (`abc -fast` off, retiming on, different mapping passes) trades runtime for QoR.
These are diminishing returns. Each one buys roughly 5–15% of cycle time. None of them turns a 100 MHz core into a 200 MHz core; that takes an architecture change.
§ Open-silicon reference points
| design | core type | PDK / library | reported clock |
|---|---|---|---|
| Caravel mgmt SoC | VexRiscv-derived RV32IMC | sky130A | ~10–40 MHz (system-bound) |
| Ibex (open-silicon hardenings) | RV32IMC, 2-stage pipe | sky130 hd | ~80–120 MHz |
| VexRiscv, mid-tune | RV32IMC, configurable pipe | sky130 hd | ~100–150 MHz |
| VexiiRiscv, aggressive | RV32IMC, deeper pipe | sky130 hd | ~180–220 MHz |
| TinyQV | RV32EC, multi-cycle | sky130A · TT | ~64 MHz (TT clock) |
These are the public points worth pinning the chart on. Anything
claiming >250 MHz on sky130A for an in-order RISC-V is either
running at the nominal corner only, ignoring SRAM access timing, or
using cells the open community can’t reach.
§ Where today’s chip sits
P37 is FSM-bound, not technology-bound. We chose CLOCK_PERIOD: 40.0
because that’s the constraint the rest of the ladder used; the
recorded slack at signoff is 8.74 ns against a 40 ns budget, which
means the critical path is around 31 ns, not 40 ns.
A speed-push experiment confirmed this directly. P37 was re-hardened
at CLOCK_PERIOD: 30.0 (33.3 MHz); the flow ran end-to-end and
produced a clean GDS, but signoff reported 152 setup violations at
the slow corner with worst slack -1.258 ns. Implied critical path:
30 + 1.258 = 31.258 ns. That matches the 40 − 8.74 = 31.26 ns implied
by P37’s positive slack. The resizer didn’t gain anything from the
tighter budget: same 27157 cells, same 191927 µm² of stdcell area.
That puts today’s RTL at an empirical ~32 MHz Fmax at the slow
signoff corner, with the critical path landing inside a wide
OR-reduction tree starting at the ALU operand-B register and walking
through the divider/multiplier block. The
journal entry has the
endpoint detail.
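The slack arithmetic in the last two paragraphs works the same way on any signoff report. The minimum period a setup-critical path supports is the clock period minus the worst setup slack, whatever the sign of the slack. This minimal Python sketch assumes the same path stays critical under both constraints, which the matching numbers suggest. The function names are hypothetical.

```python
def critical_path_ns(period_ns: float, worst_slack_ns: float) -> float:
    """Minimum period implied by a clock constraint and its worst setup slack.

    Positive slack means the path finished early, and negative slack means it
    overran the period. In both cases the implied path is period - slack.
    """
    return period_ns - worst_slack_ns


def fmax_mhz(path_ns: float) -> float:
    """Frequency in MHz for a period in nanoseconds."""
    return 1000.0 / path_ns


if __name__ == "__main__":
    # Signoff numbers for P37 quoted above: (CLOCK_PERIOD in ns, worst setup slack in ns).
    runs = {
        "P37 @ 40 ns": (40.0, 8.74),
        "P37 @ 30 ns": (30.0, -1.258),
    }
    for name, (period, slack) in runs.items():
        path = critical_path_ns(period, slack)
        print(f"{name}: path {path:.3f} ns -> Fmax {fmax_mhz(path):.1f} MHz")
```

Both runs land at about 32.0 MHz, which is why tightening the constraint alone did not speed up the chip.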
A meaningful speed jump from here means a different core, not a tighter budget: pipelined fetch/decode/execute, register-file forwarding, and branch-target resolution moved out of the ALU’s cycle. That’s a P-something-large rung, and only worth doing if “fast” becomes a real goal. For now, “boring and correct enough to host FreeRTOS” is the cheaper milestone.
§ Honest framing
The fastest RISC-V we could realistically produce on sky130A is ~200–250 MHz, with a well-pipelined RV32 core and careful but not exotic PnR. Going above that requires custom flops, custom clocking, and research-level effort that doesn’t fit the educational shape of this project.
The ladder is not currently aimed there. The roadmap explains what it is aimed at: enough of the RISC-V architecture to plausibly host real software, starting with FreeRTOS.