The speed page put today’s chip at roughly 32 MHz
implied Fmax at the slow signoff corner, against a 25 MHz (40 ns)
target. P37’s recorded worst setup slack at signoff was 8.74 ns,
which puts the critical path around 31 ns. The natural diagnostic
is: drop CLOCK_PERIOD to 30 ns (33.3 MHz) and re-harden. If it
converges, we learn something about how much margin we actually had;
if it breaks, we learn which path inside the FSM CPU pinches first.
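The headroom arithmetic above can be sketched in a few lines. This is a minimal illustration, not part of the flow: `implied_fmax_mhz` is a hypothetical helper name, and it assumes the only inputs are the clock period and the worst setup slack reported at signoff.

```python
def implied_fmax_mhz(clock_period_ns: float, worst_setup_slack_ns: float) -> float:
    """Estimate Fmax from the constrained period and the worst setup slack.

    The used path length is period minus slack; Fmax is its reciprocal.
    """
    used_path_ns = clock_period_ns - worst_setup_slack_ns
    if used_path_ns <= 0:
        raise ValueError("slack must be smaller than the clock period")
    return 1000.0 / used_path_ns  # 1 / ns expressed in MHz


# P37 at the 40 ns target with 8.742 ns of worst setup slack.
print(f"{implied_fmax_mhz(40.0, 8.742):.2f} MHz")
```

The estimate only holds if the critical path does not change when the constraint changes, which is what the 30 ns run is meant to test.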
The new config is checked in at:
projects/37_rv32im_zicsr_zifencei/librelane/config_speed30.yaml
Same RTL, same SDC, same memory map, same DEFAULT_CORNER. The only
diff vs the recorded P37 harden is CLOCK_PERIOD: 40.0 -> 30.0.
Running it
scripts/run_librelane.sh hard-codes librelane/config.yaml, so this
one goes through librelane directly:
cd projects/37_rv32im_zicsr_zifencei
librelane librelane/config_speed30.yaml
The fresh RUN_<timestamp>/ lands under
projects/37_rv32im_zicsr_zifencei/librelane/runs/, alongside the
existing RUN_2026-05-02_22-49-46/ from the original P37 harden. The
original hardened result is not disturbed.
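Because both runs share one runs/ directory, it helps to pick the newest one programmatically. The sketch below is an assumption-laden convenience, not a librelane feature: `latest_run` is a hypothetical helper, and the `RUN_<timestamp>` format is inferred from the directory names shown here.

```python
from datetime import datetime
from pathlib import Path

# Timestamp layout inferred from names like RUN_2026-05-02_22-49-46.
RUN_FORMAT = "RUN_%Y-%m-%d_%H-%M-%S"


def latest_run(runs_dir):
    """Return the newest RUN_<timestamp> directory, or None if there is none."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    stamped = []
    for entry in root.iterdir():
        if not entry.is_dir():
            continue
        try:
            stamped.append((datetime.strptime(entry.name, RUN_FORMAT), entry))
        except ValueError:
            continue  # skip directories that are not timestamped runs
    if not stamped:
        return None
    return max(stamped, key=lambda pair: pair[0])[1]


# Run from projects/37_rv32im_zicsr_zifencei; prints None if no runs exist.
print(latest_run("librelane/runs"))
```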
What we got
Outcome 2: converges through GDS with negative setup slack. The
flow ran end-to-end and produced a clean GDS (Magic DRC, KLayout DRC,
LVS, antenna, routing DRC, XOR all 0 errors), but signoff timing
reported 152 setup violations at max_ss_100C_1v60 with worst
slack -1.258 ns.
| metric | P37 (40 ns) | P37-speed (30 ns) |
|---|---|---|
| Worst setup slack | 8.742 ns | -1.258 ns |
| Worst hold slack | 0.105 ns | 0.105 ns |
| Setup violation count | 0 | 152 |
| Hold violation count | 0 | 0 |
| max_ss slew vio | 83 | 83 |
| max_ss cap vio | 8 | 8 |
| Standard-cell area | 191927 um² | 191927 um² |
| Standard-cell count | 27157 | 27157 |
| Magic DRC / KLayout DRC / LVS / antenna / routing DRC | 0 | 0 |
The first interesting observation: the implied critical-path length
is identical. P37 had 40 - 8.742 = 31.258 ns of used path; P37-speed
has 30 + 1.258 = 31.258 ns. Same number, to three decimal places. The
resizer didn’t gain anything from the tighter budget - same
standard-cell count, same area, same DRV tail. This is the cell-
strength ceiling for that path; making the constraint tighter just
moved the slack from positive to negative without changing the
underlying logic.
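To make the comparison concrete, here is a small sketch that diffs the two columns of the table and recomputes the used path for each run. The dictionary keys and helper names are made up for illustration; the values are the ones reported in the table.

```python
P37_40NS = {
    "clock_period_ns": 40.0,
    "worst_setup_slack_ns": 8.742,
    "worst_hold_slack_ns": 0.105,
    "setup_violations": 0,
    "hold_violations": 0,
    "max_ss_slew_violations": 83,
    "max_ss_cap_violations": 8,
    "stdcell_area_um2": 191927,
    "stdcell_count": 27157,
}
# The 30 ns run: copy the baseline, then override what the table shows changed.
P37_30NS = dict(
    P37_40NS,
    clock_period_ns=30.0,
    worst_setup_slack_ns=-1.258,
    setup_violations=152,
)


def changed_metrics(before, after):
    """Return {metric: (before, after)} for every metric that differs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


def used_path_ns(run):
    """Used path length: clock period minus worst setup slack, to 1 ps."""
    return round(run["clock_period_ns"] - run["worst_setup_slack_ns"], 3)


print(sorted(changed_metrics(P37_40NS, P37_30NS)))
print(used_path_ns(P37_40NS), used_path_ns(P37_30NS))
```

Only the period, the worst setup slack, and the setup violation count differ; area, cell count, and the DRV counts are the same in both runs.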
The critical endpoint
The worst violator, like every one of the top 16 violators, starts at the same register:
Startpoint: _19399_/Q (u_core.op_b[2], the ALU operand-B register)
Endpoint: _17817_/D (a downstream flop)
The path runs through a ~16-deep chain of or4_2/or3_2/or2_2/
a22o_2 gates with buffer repeaters between segments. That shape is a
classic wide OR-reduction tree, almost certainly the divider’s
bit-by-bit quotient/remainder reduction or the multiplier’s add-
reduction in the same block. The fact that the same source register
(op_b[2]) drives the top 16 endpoints tells us those 16 destinations
are all part of the same wide reduction.
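One way to check that the violators share a source is to count startpoints in the timing report. The sketch below assumes the report contains OpenSTA-style `Startpoint:` lines; the sample text is a hypothetical stand-in (only `_19399_` and `_17817_` come from this run), and `count_startpoints` is an illustrative helper, not part of the flow.

```python
import re
from collections import Counter

# Capture the instance name that follows "Startpoint:" on each path header.
STARTPOINT_RE = re.compile(r"^\s*Startpoint:\s*(\S+)", re.MULTILINE)


def count_startpoints(report_text):
    """Count how many reported paths begin at each startpoint."""
    return Counter(STARTPOINT_RE.findall(report_text))


# Hypothetical excerpt; real reports carry full path details between headers.
SAMPLE = """\
Startpoint: _19399_ (rising edge-triggered flip-flop clocked by clk)
Endpoint: _17817_ (rising edge-triggered flip-flop clocked by clk)
Startpoint: _19399_ (rising edge-triggered flip-flop clocked by clk)
Endpoint: endpoint_b (hypothetical)
Startpoint: other_reg (hypothetical)
Endpoint: endpoint_c (hypothetical)
"""

print(count_startpoints(SAMPLE).most_common(1))  # [('_19399_', 2)]
```

A single startpoint dominating the count is what you would expect when one register fans out into a wide reduction.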
This is consistent with what the speed page said: today’s
chip is FSM-bound, not technology-bound. The path is not about
wire delay or routing parasitics - it is the depth of a combinational
arithmetic reduction inside the FSM’s S_DIV/S_MUL cycle. Adding a
single pipeline register inside that reduction would split the path
roughly in half (31.258 / 2 ≈ 15.6 ns, before register overhead), so
this path would close at about 16 ns. Doing that means
designing a real pipelined core, which is a P-something-large rung,
not a config flip.
What this confirms
- The headroom-from-slack estimate on the speed page is correct. The signoff Fmax of this RTL is about 1 / 31.258 ns ≈ 32 MHz, almost exactly the ~32 MHz we estimated.
- The resizer is already doing what it can. Same cells, same area at both budgets - it can’t trade more cells for more speed on this path.
- The path lives in the divider/multiplier reduction. That is a deliberate target for a future architectural rung if speed becomes a goal.
Status
Configured: PASS
Hardened (fast): PARTIAL - GDS/DRC/LVS clean, 152 setup violations
in signoff timing.
The fresh RUN_2026-05-03_01-29-50/ is checked into the run
directory next to the original P37 harden; the original 40 ns
harden is unchanged.
The roadmap-side conclusion is unchanged: today’s chip is FSM-bound, not technology-bound. A real speed jump means a different core, not a tighter budget. See /speed/ for the framing. This run graduates from “experiment we should try” to “data point we have,” and the divider/multiplier OR-reduction is the named target if we ever care about Fmax.