Project No. 145 of 147 on the ladder

Conditional execute-prefetch repair

introduces: conditional execute-prefetch second-word repair; negative frontend repair predicate result

harden state (last run 2026-05-06)
signoff
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P145 tries the obvious repair-policy refinement after P144. Execute-prefetch fills get a second repair word only when the already-returned prefetch instruction says the next sequential PC is the adjacent word in the same I-cache line.
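
The gating condition described above can be sketched as a small predicate. This is a minimal illustration, not the P145 RTL: the 4-byte word size, the 32-byte line size, and the names `WORD_BYTES`, `LINE_BYTES`, and `wants_second_repair_word` are all assumptions made for the example.

```python
WORD_BYTES = 4    # assumed repair-word size (not stated in the write-up)
LINE_BYTES = 32   # assumed I-cache line size (not stated in the write-up)


def wants_second_repair_word(fill_addr: int, next_seq_pc: int) -> bool:
    """Return True when next_seq_pc falls in the word right after the
    filled word and both words belong to the same I-cache line."""
    fill_word = fill_addr // WORD_BYTES
    next_word = next_seq_pc // WORD_BYTES
    same_line = fill_addr // LINE_BYTES == next_seq_pc // LINE_BYTES
    return next_word == fill_word + 1 and same_line


print(wants_second_repair_word(0x1000, 0x1004))  # True: adjacent word, same line
print(wants_second_repair_word(0x101C, 0x1020))  # False: adjacent word, next line
print(wants_second_repair_word(0x1000, 0x1008))  # False: same line, not adjacent
```

Under these assumptions the predicate rejects both line-crossing fills and non-adjacent targets, which is one way to see why the condition selects only a narrow slice of execute-prefetch fills.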

check                                          result
Verilator build                                PASS
BusyBox shell workload reaches P145-FILE-OK    PASS
Aux-load queue full drops                      PASS
Aux response errors/cancels                    PASS
Hardened layout                                NOT RUN

metric                 P142           P144           P145
post-load cycles       218,856,863    219,161,535    220,074,276
shell window cycles    63,926,691     64,192,833     65,101,034
retired instructions   85,799,045     85,905,855     86,210,825
CPI                    2.5508         2.5512         2.5527
S_FETCH cycles         7,598,133      7,601,896      7,621,660
S_MEM cycles           29,199,830     29,215,063     29,346,230

This is a speed FAIL. In the shell window, P145 is 908,201 cycles slower than P144 and 1,174,343 cycles slower than P142. Over the full post-load run the gaps are 912,741 and 1,217,413 cycles.
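
The slowdown figures can be rechecked directly from the metric table. The sketch below copies the three cycle rows and the retired-instruction row, then derives the deltas and the CPI. The helper names are illustrative, and treating CPI as post-load cycles divided by retired instructions is an inference that happens to reproduce the table's CPI row.

```python
# Values copied from the metric table (P142, P144, P145).
post_load = {"P142": 218_856_863, "P144": 219_161_535, "P145": 220_074_276}
shell_window = {"P142": 63_926_691, "P144": 64_192_833, "P145": 65_101_034}
retired = {"P142": 85_799_045, "P144": 85_905_855, "P145": 86_210_825}


def slowdown(metric: dict, baseline: str, candidate: str = "P145") -> int:
    """Extra cycles the candidate spends relative to the baseline."""
    return metric[candidate] - metric[baseline]


def cpi(project: str) -> float:
    """Post-load cycles per retired instruction, to four decimals."""
    return round(post_load[project] / retired[project], 4)


for base in ("P144", "P142"):
    print(base, slowdown(shell_window, base), slowdown(post_load, base))
print({project: cpi(project) for project in post_load})
```

The 908,201 and 1,174,343 figures come out of the shell-window row; the post-load row gives slightly larger gaps of 912,741 and 1,217,413 cycles.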

counter                       P144          P145          delta
background repair fills       32,979,525    33,087,489    +107,964
first later fetch hits        1,383,118     1,385,394     +2,276
repeat later fetch hits       606,411       607,427       +1,016
demand second-word grants     2,728,297     2,736,584     +8,287
prefetch second-word grants   22,115,553    22,281,434    +165,881

The condition is too narrow to recover the useful locality P142 found, and the extra traffic is not buying enough hits to pay for itself.
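
One way to put a number on "not buying enough hits" is the marginal yield of the extra traffic: additional later fetch hits divided by additional background repair fills, using the P144 to P145 deltas from the counter table. The variable names are illustrative, and this ratio is only one possible cost/benefit measure, not one the project itself reports.

```python
# P144 -> P145 deltas copied from the counter table.
extra_repair_fills = 107_964
extra_first_hits = 2_276
extra_repeat_hits = 1_016

extra_hits = extra_first_hits + extra_repeat_hits
marginal_yield = extra_hits / extra_repair_fills

print(f"extra later fetch hits: {extra_hits}")
print(f"hits per extra repair fill: {marginal_yield:.4f}")
```

By this measure the added fills return roughly three later fetch hits per hundred extra fills.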

class                repair fills   first hits   repeat hits   first+repeat ratio
demand fetch         1,023,366      483,130      332,364       79.68%
execute prefetch     29,110,829     559,692      181,269       2.55%
load prefetch        2,435,053      134,976      15,577        6.18%
writeback prefetch   408,650        171,039      54,030        55.07%
steer prefetch       0              0            0             0.00%
aux prefetch         109,591        36,557       24,187        55.43%
unknown              0              0            0             0.00%
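
The ratio column can be recomputed from the other three columns. The sketch below assumes the ratio is (first hits + repeat hits) / repair fills, which matches the table to within the last printed digit; the exact rounding used by the original tooling is not known, so recomputed values may differ by 0.01 percentage points. The names `classes` and `hit_ratio` are illustrative.

```python
# (repair fills, first hits, repeat hits) per class, copied from the table.
classes = {
    "demand fetch": (1_023_366, 483_130, 332_364),
    "execute prefetch": (29_110_829, 559_692, 181_269),
    "load prefetch": (2_435_053, 134_976, 15_577),
    "writeback prefetch": (408_650, 171_039, 54_030),
    "steer prefetch": (0, 0, 0),
    "aux prefetch": (109_591, 36_557, 24_187),
    "unknown": (0, 0, 0),
}


def hit_ratio(fills: int, first: int, repeat: int) -> float:
    """(first + repeat) / fills as a percentage; 0.0 when there are no fills."""
    return 100.0 * (first + repeat) / fills if fills else 0.0


total_fills = sum(row[0] for row in classes.values())
for name, row in classes.items():
    share = 100.0 * row[0] / total_fills
    print(f"{name:<20}{hit_ratio(*row):7.2f}%  ({share:5.1f}% of fills)")
```

The per-class fills sum to the 33,087,489 background repair fills reported for P145, and execute prefetch accounts for about 88% of them while showing the lowest nonzero hit ratio.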

The next rung should be audit-only: keep P144’s active behavior and count multiple candidate execute-prefetch usefulness predicates before changing the repair budget again.
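
An audit-only rung of this kind could be modeled offline as a set of counters that observe each execute-prefetch repair fill without influencing it. The sketch below is purely hypothetical: the `FillEvent` fields, the three candidate predicates, the 4-byte word and 32-byte line sizes, and the sample trace are invented for illustration and are not taken from the project.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, NamedTuple, Tuple


class FillEvent(NamedTuple):
    """One execute-prefetch repair fill in a hypothetical trace."""
    fill_addr: int     # address of the filled word
    next_seq_pc: int   # next sequential PC reported by the prefetch
    later_hit: bool    # whether a later fetch hit the repaired word


Predicate = Callable[[FillEvent], bool]

# Hypothetical candidates; none of these comes from the project itself.
CANDIDATES: Dict[str, Predicate] = {
    "always": lambda e: True,
    "adjacent_word": lambda e: e.next_seq_pc // 4 == e.fill_addr // 4 + 1,
    "same_line_32B": lambda e: e.next_seq_pc // 32 == e.fill_addr // 32,
}


def audit(events: Iterable[FillEvent]) -> Dict[str, Tuple[int, int]]:
    """For each candidate, count the fills it would select and how many of
    those were later hit. Nothing here feeds back into repair behavior."""
    selected, useful = Counter(), Counter()
    for event in events:
        for name, predicate in CANDIDATES.items():
            if predicate(event):
                selected[name] += 1
                useful[name] += int(event.later_hit)
    return {name: (selected[name], useful[name]) for name in CANDIDATES}


trace = [
    FillEvent(0x1000, 0x1004, True),
    FillEvent(0x101C, 0x1020, False),
    FillEvent(0x2000, 0x2010, True),
]
print(audit(trace))
```

Counting every candidate against the same fills makes the predicates directly comparable before any of them is allowed to change the repair budget.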

shell phases (P145 shell workload; cycles 220,074,276; CPI 2.55)
  1. kernel banner to /init: 117,333,618 (53.5%)
  2. /init to shell banner: 1,105,760 (0.5%)
  3. shell banner to first command: 35,905,045 (16.4%)
  4. echo command: 1,649 (0%)
  5. uname -a: 2,439,451 (1.1%)
  6. ls /bin /usr/share: 32,110,057 (14.6%)
  7. cat sample file: 3,158,643 (1.4%)
  8. touch/write/cat/rm /tmp file: 10,985,859 (5%)
  9. 8x ash loop with file I/O: 16,404,695 (7.5%)
  10. final marker: 680 (0%)
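
The phase list ties back to the metric table. Summing phases 4 through 10 reproduces the 65,101,034 shell window cycles reported for P145, and the listed percentages appear to be shares of the phase sum (219,445,457) rather than of the 220,074,276 headline total. The sketch below checks both; the variable names are illustrative.

```python
# Phase cycle counts copied from the shell-phases list, in order.
phases = [
    ("kernel banner to /init", 117_333_618),
    ("/init to shell banner", 1_105_760),
    ("shell banner to first command", 35_905_045),
    ("echo command", 1_649),
    ("uname -a", 2_439_451),
    ("ls /bin /usr/share", 32_110_057),
    ("cat sample file", 3_158_643),
    ("touch/write/cat/rm /tmp file", 10_985_859),
    ("8x ash loop with file I/O", 16_404_695),
    ("final marker", 680),
]

phase_total = sum(cycles for _, cycles in phases)
# Phases 4..10: echo command through final marker.
shell_window = sum(cycles for _, cycles in phases[3:])

for name, cycles in phases:
    print(f"{name:<32}{cycles:>12,}{100 * cycles / phase_total:7.1f}%")
print(f"phase total {phase_total:,}; shell window {shell_window:,}")
```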
state breakdown (P145 conditional execute-prefetch repair workload; cycles 220,074,276; CPI 2.55)
  1. fetch: 7,621,660 (3.5%)
  2. execute: 86,235,573 (39.2%)
  3. mem: 29,625,182 (13.5%)
  4. walker: 2,695,859 (1.2%)
  5. writeback: 86,210,825 (39.2%)
  6. mul/div: 7,683,461 (3.5%)