Project No. 144 of 147 on the ladder

Execute-prefetch repair throttle

introduces: execute-prefetch second-word repair throttle; class-aware repair policy change; repair-traffic versus shell-speed tradeoff

harden state: last run 2026-05-06
signoff
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P144 applies P143’s audit result. Execute-stage prefetch fills still get one adjacent repair word, but no longer get a second repair word by default.
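The policy change can be pictured as a per-class cap on repair words. The sketch below is a small Python model, not the project's RTL. The class names follow the per-class table on this page; `REPAIR_WORD_CAP`, `DEFAULT_CAP`, and `repair_words_granted` are hypothetical names, and the default cap of two words for the other classes is an assumption made for illustration.

```python
# Hypothetical model of a class-aware repair policy (not the project's RTL).
# Assumption: a fill may request up to two adjacent repair words, and the
# throttle is expressed as a per-class cap on how many are granted.
DEFAULT_CAP = 2

REPAIR_WORD_CAP = {
    # P144: execute-prefetch fills keep the first repair word only.
    "execute prefetch": 1,
}

def repair_words_granted(fill_class: str, requested: int) -> int:
    """Return how many adjacent repair words a fill of this class receives."""
    cap = REPAIR_WORD_CAP.get(fill_class, DEFAULT_CAP)
    return max(0, min(requested, cap))

print(repair_words_granted("execute prefetch", 2))  # 1
print(repair_words_granted("demand fetch", 2))      # 2
```

Keeping the cap in a table keyed by class means a later rung could swap the fixed value for a condition without touching the other classes.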

check                                         result
Verilator build                               PASS
BusyBox shell workload reaches P144-FILE-OK   PASS
Aux-load queue full drops                     PASS
Aux response errors/cancels                   PASS
Hardened layout                               NOT RUN

metric                 P142          P143          P144
post-load cycles       218,856,863   219,787,362   219,161,535
shell window cycles    63,926,691    64,756,820    64,192,833
retired instructions   85,799,045    86,097,554    85,905,855
CPI                    2.5508        2.5528        2.5512
S_FETCH cycles         7,598,133     7,607,357     7,601,896
S_MEM cycles           29,199,830    29,346,505    29,215,063
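The CPI row can be cross-checked from the two rows above it. The sketch assumes CPI here means post-load cycles divided by retired instructions; the page does not say so, but the figures agree to four decimals. The names `metrics` and `cpi` are illustrative.

```python
# Figures copied from the metric table.
metrics = {
    "P142": {"post_load_cycles": 218_856_863, "retired": 85_799_045},
    "P143": {"post_load_cycles": 219_787_362, "retired": 86_097_554},
    "P144": {"post_load_cycles": 219_161_535, "retired": 85_905_855},
}

def cpi(post_load_cycles: int, retired: int) -> float:
    """Cycles per instruction, assuming CPI = post-load cycles / retired."""
    return post_load_cycles / retired

for name, m in metrics.items():
    print(f"{name}: CPI {cpi(m['post_load_cycles'], m['retired']):.4f}")
# P142: CPI 2.5508
# P143: CPI 2.5528
# P144: CPI 2.5512
```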

Measured in shell window cycles, P144 is 563,987 cycles faster than P143 and 28,809 cycles faster than P134, but 266,142 cycles slower than P142. That makes it a useful policy experiment, not the new best shell result.
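The P143 and P142 comparisons follow from the shell window row of the metric table. A minimal check, with `shell_window` and `cycles_saved` as illustrative names (P134 is left out because its cycle count is not listed on this page):

```python
# Shell window cycles copied from the metric table.
shell_window = {
    "P142": 63_926_691,
    "P143": 64_756_820,
    "P144": 64_192_833,
}

def cycles_saved(baseline: str, candidate: str) -> int:
    """Positive when the candidate is faster than the baseline."""
    return shell_window[baseline] - shell_window[candidate]

print(cycles_saved("P143", "P144"))  # 563987
print(cycles_saved("P142", "P144"))  # -266142
```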

counter                       P143         P144         delta
background repair fills       42,994,352   32,979,525   -10,014,827
first later fetch hits        1,575,088    1,383,118    -191,970
repeat later fetch hits       663,574      606,411      -57,163
demand second-word grants     2,641,195    2,728,297    +87,102
prefetch second-word grants   49,052,366   22,115,553   -26,936,813

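The throttle did cut the low-payback traffic: background repair fills fell by 10,014,827 (about 23%). But it also removed 249,133 later fetch hits (191,970 first plus 57,163 repeat), enough to fall behind P142.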
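One way to put a number on "low-payback" is the count of later fetch hits lost per repair fill removed. This is a derived figure, not a counter the page reports, and it is only approximate because P143 and P144 retire slightly different instruction counts. The name `marginal_payback` is illustrative.

```python
# Counter totals copied from the P143/P144 counter table.
p143 = {"repair_fills": 42_994_352, "first_hits": 1_575_088, "repeat_hits": 663_574}
p144 = {"repair_fills": 32_979_525, "first_hits": 1_383_118, "repeat_hits": 606_411}

fills_removed = p143["repair_fills"] - p144["repair_fills"]
hits_lost = (p143["first_hits"] + p143["repeat_hits"]) - (
    p144["first_hits"] + p144["repeat_hits"]
)

# Later fetch hits given up for each background repair fill that was cut.
marginal_payback = hits_lost / fills_removed

print(fills_removed)                # 10014827
print(hits_lost)                    # 249133
print(f"{marginal_payback:.2%}")    # roughly 2.5%
```

On these figures the removed fills paid back at about the same rate as the execute-prefetch class as a whole, so the cut traffic was cheap to lose per fill but large in volume.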

class                repair fills   first hits   repeat hits   first+repeat ratio
demand fetch         1,021,344      482,207      331,819       79.70%
execute prefetch     29,021,605     558,785      180,904       2.55%
load prefetch        2,419,665      134,871      15,575        6.22%
writeback prefetch   407,551        170,820      53,952        55.15%
steer prefetch       0              0            0             0.00%
aux prefetch         109,360        36,435       24,161        55.41%
unknown              0              0            0             0.00%
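The ratio column is (first hits + repeat hits) divided by repair fills, with empty classes reported as 0.00%. A short sketch that recomputes it from the table; `classes` and `hit_ratio` are illustrative names.

```python
# (repair fills, first hits, repeat hits) per class, copied from the table.
classes = {
    "demand fetch":       (1_021_344, 482_207, 331_819),
    "execute prefetch":   (29_021_605, 558_785, 180_904),
    "load prefetch":      (2_419_665, 134_871, 15_575),
    "writeback prefetch": (407_551, 170_820, 53_952),
    "steer prefetch":     (0, 0, 0),
    "aux prefetch":       (109_360, 36_435, 24_161),
    "unknown":            (0, 0, 0),
}

def hit_ratio(fills: int, first: int, repeat: int) -> float:
    """Later fetch hits per repair fill; 0.0 for classes with no fills."""
    return (first + repeat) / fills if fills else 0.0

for name, counts in classes.items():
    print(f"{name:<20} {hit_ratio(*counts):7.2%}")

# The per-class fills add up to the P144 background repair fill total.
print(sum(fills for fills, _, _ in classes.values()))  # 32979525
```

Execute prefetch still accounts for about 88% of P144 repair fills while returning hits on roughly 1 fill in 40, which is why it remains the class to tune.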

The next rung should bring back execute-prefetch second-word repair only under a tighter condition, rather than enabling it for the whole class.

shell phases (P144 shell workload, 219,161,535 cycles, CPI 2.55)
  1. kernel banner to /init 117,345,818 53.7%
  2. /init to shell banner 1,095,952 0.5%
  3. shell banner to first command 35,898,113 16.4%
  4. echo command 1,649 0%
  5. uname -a 2,449,446 1.1%
  6. ls /bin /usr/share 31,650,054 14.5%
  7. cat sample file 3,003,495 1.4%
  8. touch/write/cat/rm /tmp file 10,595,352 4.9%
  9. 8x ash loop with file I/O 16,492,157 7.6%
  10. final marker 680 0%
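The phase list ties back to the shell window metric. The sketch assumes the shell window runs from the first command (phase 4) through the final marker; the page does not define the window, but the sum of those phases matches the P144 shell window cycle count exactly.

```python
# Phase cycle counts copied from the list above, in order.
phases = [
    ("kernel banner to /init", 117_345_818),
    ("/init to shell banner", 1_095_952),
    ("shell banner to first command", 35_898_113),
    ("echo command", 1_649),
    ("uname -a", 2_449_446),
    ("ls /bin /usr/share", 31_650_054),
    ("cat sample file", 3_003_495),
    ("touch/write/cat/rm /tmp file", 10_595_352),
    ("8x ash loop with file I/O", 16_492_157),
    ("final marker", 680),
]

# Assumption: the shell window covers phases 4 through 10.
shell_window_cycles = sum(cycles for _, cycles in phases[3:])

print(shell_window_cycles)  # 64192833
```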

state breakdown (P144 execute-prefetch repair throttle workload, 219,161,535 cycles, CPI 2.55)
  1. fetch 3.5% 7,601,896
  2. execute 39.2% 85,930,489
  3. mem 13.5% 29,493,185
  4. walker 1.2% 2,682,490
  5. writeback 39.2% 85,905,855
  6. mul/div 3.4% 7,545,904
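The percentages in the state breakdown are each state's share of the 219,161,535 post-load cycles. A sketch that recomputes them and reports the cycles not attributed to any listed state; `states` and `TOTAL_CYCLES` are illustrative names.

```python
TOTAL_CYCLES = 219_161_535  # P144 post-load cycles

# Per-state cycle counts copied from the breakdown above.
states = {
    "fetch": 7_601_896,
    "execute": 85_930_489,
    "mem": 29_493_185,
    "walker": 2_682_490,
    "writeback": 85_905_855,
    "mul/div": 7_545_904,
}

for name, cycles in states.items():
    print(f"{name:<10} {cycles / TOTAL_CYCLES:6.1%}")

# Cycles in the total that none of the six listed states account for.
unattributed = TOTAL_CYCLES - sum(states.values())
print(unattributed)  # 1716
```

The writeback count equals the P144 retired instruction count (85,905,855), which is consistent with one writeback cycle per retired instruction.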