No. 142 of 147 projects on the ladder

Selective prefetch second-word I-cache repair

introduces: frontend-consuming prefetch repair budget; split demand/prefetch repair grants; first post-P134 shell-window win

harden state
last run: 2026-05-06
signoff
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P142 extends P141 by giving a second repair word to prefetch fills that are immediately consumed by the frontend. Demand-fetch lines still get two repair words. Plain auxiliary/background prefetch fills stay at one.
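The rule above can be written down as a small behavioural model. This is a minimal sketch, not the RTL: `FillKind`, `repair_words_p141`, and `repair_words_p142` are hypothetical names, and "consumed by the frontend" is reduced to a boolean flag. The repair budget is not modelled.

```python
from enum import Enum


class FillKind(Enum):
    """Hypothetical fill classes; the names are illustrative, not from the RTL."""
    DEMAND = "demand"
    PREFETCH = "prefetch"


def repair_words_p141(kind: FillKind) -> int:
    """P141 as described: only demand-fetch lines get a second repair word."""
    return 2 if kind is FillKind.DEMAND else 1


def repair_words_p142(kind: FillKind, consumed_by_frontend: bool = False) -> int:
    """P142 as described: demand fills keep two words, and prefetch fills
    that the frontend immediately consumes also get two. Plain
    auxiliary/background prefetch fills stay at one."""
    if kind is FillKind.DEMAND:
        return 2
    if consumed_by_frontend:
        return 2
    return 1


if __name__ == "__main__":
    for kind, consumed in [(FillKind.DEMAND, False),
                           (FillKind.PREFETCH, True),
                           (FillKind.PREFETCH, False)]:
        print(kind.value, consumed,
              repair_words_p141(kind), repair_words_p142(kind, consumed))
```

The only row where the two policies differ is the frontend-consumed prefetch fill, which is the whole of the P142 change.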

| check | result |
| --- | --- |
| Verilator build | PASS |
| BusyBox shell workload reaches P142-FILE-OK | PASS |
| Aux-load queue full drops | PASS |
| Aux response errors/cancels | PASS |
| Hardened layout | NOT RUN |

| metric | P134 | P141 | P142 |
| --- | --- | --- | --- |
| post-load cycles | 218,247,567 | 219,252,781 | 218,856,863 |
| shell window cycles | 64,221,642 | 64,289,267 | 63,926,691 |
| retired instructions | 86,139,760 | 85,956,781 | 85,799,045 |
| CPI | 2.5336 | 2.5507 | 2.5508 |
| S_FETCH cycles | 7,617,695 | 7,613,755 | 7,598,133 |
| S_MEM cycles | 27,799,343 | 29,236,897 | 29,199,830 |
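
The reported CPI values are consistent with post-load cycles divided by retired instructions. A short check, using only the figures from the metric table (the `runs` dictionary and `cpi` helper are illustrative names):

```python
# (post-load cycles, retired instructions) per rung, copied from the table.
runs = {
    "P134": (218_247_567, 86_139_760),
    "P141": (219_252_781, 85_956_781),
    "P142": (218_856_863, 85_799_045),
}


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction over the post-load window."""
    return cycles / instructions


cpis = {name: round(cpi(c, i), 4) for name, (c, i) in runs.items()}

if __name__ == "__main__":
    for name, value in cpis.items():
        print(name, value)
```

P142 spends fewer post-load cycles than P141 but also retires fewer instructions, so its CPI does not improve even though the run is shorter.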

P142 is a speed PASS. It beats P141 by 362,576 shell-window cycles, P140 by 1,108,790 cycles, and P134 by 294,951 cycles.
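
The P141 and P134 margins can be reproduced from the shell-window row of the metric table. A small sketch (the P140 figure is not in the table, so it is left out; names are illustrative):

```python
# Shell-window cycles per rung, copied from the metric table.
shell_window = {"P134": 64_221_642, "P141": 64_289_267, "P142": 63_926_691}


def cycles_saved(baseline: str, candidate: str = "P142") -> int:
    """Shell-window cycles the candidate saves relative to the baseline."""
    return shell_window[baseline] - shell_window[candidate]


def percent_saved(baseline: str, candidate: str = "P142") -> float:
    """The same saving as a percentage of the baseline's shell window."""
    return 100.0 * cycles_saved(baseline, candidate) / shell_window[baseline]


if __name__ == "__main__":
    for base in ("P141", "P134"):
        print(base, cycles_saved(base), f"{percent_saved(base):.2f}%")
```

In relative terms the win is small: roughly 0.56% against P141 and 0.46% against P134.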

| repair-usefulness counter | P141 | P142 |
| --- | --- | --- |
| background repair word fills | 32,151,196 | 42,853,866 |
| first later fetch hits | 1,351,543 | 1,575,004 |
| repeat later fetch hits | 602,447 | 663,894 |
| first-hit usefulness ratio | 4.20% | 3.68% |
| first + repeat fetch-hit ratio | 6.08% | 5.22% |

| P142 policy bucket | count |
| --- | --- |
| repair starts | 51,970,292 |
| demand second-word grants | 2,636,400 |
| prefetch second-word grants | 48,846,507 |
| budget stops | 17,527,553 |
| already-valid word skips | 106,365 |
| budget at end | 0 |

The usefulness ratios got worse (first-hit 4.20% to 3.68%), but the absolute useful-hit count improved enough to win: first later fetch hits rose from 1,351,543 to 1,575,004. That is the useful story: P139’s full repair stream was too expensive, P141’s demand-only second word was nearly right, and P142 shows that some prefetch repair deserves budget too.
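
The two ratios in the usefulness table match hits divided by background repair word fills. The sketch below recomputes them and also derives a marginal ratio for the extra P142 fills. The marginal figure is an added illustration, not a counter reported by the run, and the dictionary keys are illustrative names.

```python
# Repair-usefulness counters copied from the table.
counters = {
    "P141": {"fills": 32_151_196, "first_hits": 1_351_543, "repeat_hits": 602_447},
    "P142": {"fills": 42_853_866, "first_hits": 1_575_004, "repeat_hits": 663_894},
}


def first_hit_ratio(c: dict) -> float:
    """First later fetch hits per background repair word fill, in percent."""
    return 100.0 * c["first_hits"] / c["fills"]


def combined_ratio(c: dict) -> float:
    """First plus repeat later fetch hits per fill, in percent."""
    return 100.0 * (c["first_hits"] + c["repeat_hits"]) / c["fills"]


p141, p142 = counters["P141"], counters["P142"]
extra_fills = p142["fills"] - p141["fills"]
extra_first_hits = p142["first_hits"] - p141["first_hits"]
# Marginal first-hit ratio of the additional fills only (derived, not measured).
marginal_first_hit_ratio = 100.0 * extra_first_hits / extra_fills

if __name__ == "__main__":
    for name, c in counters.items():
        print(name, f"{first_hit_ratio(c):.2f}%", f"{combined_ratio(c):.2f}%")
    print("extra fills", extra_fills, "extra first hits", extra_first_hits)
    print(f"marginal first-hit ratio {marginal_first_hit_ratio:.2f}%")
```

Under this reading the additional fills are less useful on average than the P141 fills, which is why the overall ratio drops while the hit count still rises.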

The next rung should split the prefetch-consumer bucket. Right now the new counter says 48.85M prefetch second-word grants, which is too broad. P143 should identify which prefetch consumers actually pay for themselves.
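
One way to split the bucket is to keep a grant counter and a later-hit counter per consumer class, then rank classes by hits per grant. The sketch below is only a shape for that bookkeeping: the class names `consumer_a` and `consumer_b`, the `SplitCounters` type, and the event counts are all made up for illustration and do not come from P142 or P143.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bucket:
    """Counters for one hypothetical prefetch-consumer class."""
    grants: int = 0
    later_hits: int = 0

    @property
    def hits_per_grant(self) -> float:
        return self.later_hits / self.grants if self.grants else 0.0


class SplitCounters:
    """Per-consumer replacement for a single 'prefetch second-word grants' count."""

    def __init__(self) -> None:
        self.buckets: dict[str, Bucket] = defaultdict(Bucket)

    def grant(self, consumer: str) -> None:
        self.buckets[consumer].grants += 1

    def later_hit(self, consumer: str) -> None:
        self.buckets[consumer].later_hits += 1

    def ranked(self) -> list[tuple[str, Bucket]]:
        """Consumer classes ordered from most to least useful per grant."""
        return sorted(self.buckets.items(),
                      key=lambda kv: kv[1].hits_per_grant, reverse=True)


if __name__ == "__main__":
    counters = SplitCounters()
    # Toy events only: 4 grants with 2 hits versus 10 grants with 1 hit.
    for _ in range(4):
        counters.grant("consumer_a")
    for _ in range(2):
        counters.later_hit("consumer_a")
    for _ in range(10):
        counters.grant("consumer_b")
    counters.later_hit("consumer_b")
    for name, bucket in counters.ranked():
        print(name, bucket.grants, bucket.later_hits, bucket.hits_per_grant)
```

A class that ranks low on hits per grant would be a candidate for falling back to one repair word.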

shell phases (label: P142 shell workload; cycles: 218,856,863; CPI: 2.55)
  1. kernel banner to /init: 117,353,521 (53.8%)
  2. /init to shell banner: 1,104,548 (0.5%)
  3. shell banner to first command: 35,843,251 (16.4%)
  4. echo command: 1,649 (0%)
  5. uname -a: 2,251,657 (1%)
  6. ls /bin /usr/share: 31,655,825 (14.5%)
  7. cat sample file: 3,007,339 (1.4%)
  8. touch/write/cat/rm /tmp file: 10,744,396 (4.9%)
  9. 8x ash loop with file I/O: 16,265,145 (7.5%)
  10. final marker: 680 (0%)
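
The phase list can be cross-checked against the P142 metrics. Phases 4 through 10 sum exactly to the shell window cycles (63,926,691), and the listed percentages match shares of the summed phase cycles (218,228,011) rather than of the 218,856,863 post-load total. A short check, with the phase data copied from the list (variable names are illustrative):

```python
# (phase name, cycles) in list order, copied from the shell phases list.
phases = [
    ("kernel banner to /init", 117_353_521),
    ("/init to shell banner", 1_104_548),
    ("shell banner to first command", 35_843_251),
    ("echo command", 1_649),
    ("uname -a", 2_251_657),
    ("ls /bin /usr/share", 31_655_825),
    ("cat sample file", 3_007_339),
    ("touch/write/cat/rm /tmp file", 10_744_396),
    ("8x ash loop with file I/O", 16_265_145),
    ("final marker", 680),
]

phase_total = sum(cycles for _, cycles in phases)
# Phases 4..10 (index 3 onward) are the commands typed at the shell.
shell_window_cycles = sum(cycles for _, cycles in phases[3:])
shares = {name: 100.0 * cycles / phase_total for name, cycles in phases}

if __name__ == "__main__":
    print("phase total", phase_total)
    print("shell window", shell_window_cycles)
    for name, share in shares.items():
        print(f"{name}: {share:.1f}%")
```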

state breakdown (label: P142 selective prefetch second-word I-cache repair workload; cycles: 218,856,863; CPI: 2.55)
  1. fetch: 7,598,133 (3.5%)
  2. execute: 85,823,609 (39.2%)
  3. mem: 29,477,350 (13.5%)
  4. walker: 2,659,820 (1.2%)
  5. writeback: 85,799,045 (39.2%)
  6. mul/div: 7,497,190 (3.4%)
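
The state percentages are shares of the 218,856,863 post-load cycles. The six listed states sum to 218,855,147, leaving 1,716 cycles that the list does not attribute, and the writeback count equals the P142 retired-instruction count. A short check using only the listed figures (names are illustrative):

```python
POST_LOAD_CYCLES = 218_856_863
RETIRED_INSTRUCTIONS = 85_799_045

# Cycles per state, copied from the state breakdown list.
states = {
    "fetch": 7_598_133,
    "execute": 85_823_609,
    "mem": 29_477_350,
    "walker": 2_659_820,
    "writeback": 85_799_045,
    "mul/div": 7_497_190,
}

listed_total = sum(states.values())
unattributed = POST_LOAD_CYCLES - listed_total
shares = {name: 100.0 * cycles / POST_LOAD_CYCLES for name, cycles in states.items()}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share:.1f}%")
    print("listed total", listed_total, "unattributed", unattributed)
    print("writeback == retired:", states["writeback"] == RETIRED_INSTRUCTIONS)
```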