P126 splits the P124/P125 non-integer block classes into explicit
shadow holding records. One record models memory-class queued work; one
record models control-flow queued work. The architectural core still
runs the same in-order FSM.
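
As a rough illustration of how a one-entry shadow holding record could classify each candidate into the counters reported below, here is a minimal Python sketch. The class name `HoldingRecord`, its methods, and the check order (source readiness before occupancy) are assumptions made for illustration; they are not taken from the P126 RTL.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def _zero_counters() -> Dict[str, int]:
    return {
        "candidates": 0, "accepts": 0, "releases": 0,
        "hold_cycles": 0, "full_blocks": 0, "source_busy_blocks": 0,
    }


@dataclass
class HoldingRecord:
    """Hypothetical one-entry shadow record: it only counts, it never
    changes what the in-order core executes."""
    held: Optional[str] = None
    counters: Dict[str, int] = field(default_factory=_zero_counters)

    def offer(self, op: str, sources_ready: bool) -> str:
        """Classify one candidate. The check order is an assumption."""
        self.counters["candidates"] += 1
        if not sources_ready:
            self.counters["source_busy_blocks"] += 1
            return "source-busy"
        if self.held is not None:
            self.counters["full_blocks"] += 1
            return "full"
        self.held = op
        self.counters["accepts"] += 1
        return "accepted"

    def tick(self) -> None:
        """Advance one cycle; an occupied record accrues a hold cycle."""
        if self.held is not None:
            self.counters["hold_cycles"] += 1

    def release(self) -> None:
        """Drain the record, for example at writeback."""
        if self.held is not None:
            self.held = None
            self.counters["releases"] += 1


rec = HoldingRecord()
outcomes = [
    rec.offer("lw", sources_ready=True),       # record is empty
    rec.offer("sw", sources_ready=True),       # record already occupied
    rec.offer("amoadd", sources_ready=False),  # operands not ready
]
rec.tick()
rec.release()
print(outcomes, rec.counters)
```

Under this ordering every candidate lands in exactly one of accepts, full blocks, or source-busy blocks, which is one way to read the counters that follow.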

| check | result |
| --- | --- |
| Verilator build | PASS |
| BusyBox shell workload reaches P126-FILE-OK | PASS |
| Holding-record counters emitted | PASS |
| Hardened layout | NOT RUN |


| memory holding counter | value |
| --- | --- |
| candidates | 6,659,815 |
| accepts | 3,245,087 |
| releases | 3,245,025 |
| hold cycles | 6,726,473 |
| full blocks | 0 |
| source-busy blocks | 3,414,728 |
| sources ready | 3,245,087 |
| loads / stores / AMOs | 3,629,606 / 2,872,573 / 157,636 |
| flush clears | 61 |


| control holding counter | value |
| --- | --- |
| candidates | 9,328,049 |
| accepts | 6,280,124 |
| releases | 6,280,124 |
| hold cycles | 6,280,124 |
| full blocks | 29,878 |
| source-busy blocks | 3,018,251 |
| sources ready | 6,309,798 |
| branches / JALs / JALRs | 5,956,381 / 2,169,899 / 1,201,769 |
| flush clears | 0 |

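
The holding-record counters can be cross-checked against each other. The sketch below hard-codes the reported values and tests two relationships that the numbers appear to satisfy: the per-class breakdown sums to the candidate count, and sources ready equals candidates minus source-busy blocks. These relationships are inferred from the reported values, not stated definitions, and the dictionary keys are illustrative names.

```python
# Reported P126 holding-record counters (values copied from the report).
memory = {
    "candidates": 6_659_815, "accepts": 3_245_087, "releases": 3_245_025,
    "hold_cycles": 6_726_473, "full_blocks": 0,
    "source_busy_blocks": 3_414_728, "sources_ready": 3_245_087,
    "by_class": (3_629_606, 2_872_573, 157_636),  # loads, stores, AMOs
    "flush_clears": 61,
}
control = {
    "candidates": 9_328_049, "accepts": 6_280_124, "releases": 6_280_124,
    "hold_cycles": 6_280_124, "full_blocks": 29_878,
    "source_busy_blocks": 3_018_251, "sources_ready": 6_309_798,
    "by_class": (5_956_381, 2_169_899, 1_201_769),  # branches, JALs, JALRs
    "flush_clears": 0,
}


def check(c):
    """Return inferred consistency checks for one counter set."""
    return {
        # Class breakdown should cover every candidate.
        "class_sum_ok": sum(c["by_class"]) == c["candidates"],
        # Candidates not blocked on sources should be the ready ones.
        "ready_ok": c["candidates"] - c["source_busy_blocks"] == c["sources_ready"],
        # Ready candidates that were neither accepted nor counted as full.
        "ready_residual": c["sources_ready"] - c["accepts"] - c["full_blocks"],
        # Accepted entries never counted as released.
        "unreleased": c["accepts"] - c["releases"],
        "hold_cycles_per_accept": c["hold_cycles"] / c["accepts"],
    }


mem_report = check(memory)
ctl_report = check(control)
print("memory ", mem_report)
print("control", ctl_report)
```

Both inferred identities hold for both records. The memory record leaves 62 accepted entries unreleased against 61 flush clears, and the control record's ready residual is -204; the report does not explain either difference, so both are worth confirming against the counter definitions.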

| metric | P125 | P126 |
| --- | --- | --- |
| post-load cycles | 218,418,785 | 218,372,668 |
| shell window cycles | 64,495,264 | 64,451,352 |
| retired instructions | 86,247,293 | 86,230,113 |
| CPI | 2.5325 | 2.5324 |
| S_FETCH cycles | 7,621,101 | 7,621,517 |
| S_MEM cycles | 27,710,923 | 27,697,973 |

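
The CPI row can be reproduced from the other rows. The sketch below assumes CPI is post-load cycles divided by retired instructions; that definition is inferred because it matches the reported values to four decimal places, and the dictionary keys are illustrative names.

```python
# Reported P125 and P126 metrics (values copied from the report).
p125 = {
    "post_load_cycles": 218_418_785, "shell_window_cycles": 64_495_264,
    "retired": 86_247_293, "s_fetch": 7_621_101, "s_mem": 27_710_923,
}
p126 = {
    "post_load_cycles": 218_372_668, "shell_window_cycles": 64_451_352,
    "retired": 86_230_113, "s_fetch": 7_621_517, "s_mem": 27_697_973,
}


def cpi(run):
    """Cycles per instruction, assuming post-load cycles / retired instructions."""
    return run["post_load_cycles"] / run["retired"]


# Per-metric change from P125 to P126 (negative means P126 is lower).
deltas = {key: p126[key] - p125[key] for key in p125}

print(f"CPI P125 = {cpi(p125):.4f}, P126 = {cpi(p126):.4f}")
for key, delta in deltas.items():
    print(f"{key:>20}: {delta:+,} ({100 * delta / p125[key]:+.4f}%)")
```

Cycles and retired instructions both drop slightly between the two runs, so the ratio barely moves, which is consistent with reading P126 as an instrumentation step.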
This is not a speedup claim: CPI is effectively unchanged (2.5325 versus
2.5324). The value of the measurement is classification. The memory side
has 3.25M source-ready candidates and no full-record pressure (0 full
blocks); the control side has enough ready candidates to fill a one-entry
record and still shows 29.9K full blocks. That makes P127’s job concrete:
model scheduler wakeup/issue rules across integer, memory, and control
records instead of treating writeback as the only drain point.