No. 147 of 147 on the ladder

Strict execute-prefetch guard

Introduces: an active composite guard for execute-prefetch second-word I-cache repair; final negative result for this repair bucket.

Harden state: last run 2026-05-07
Signoff:
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P147 is the final narrow execute-prefetch repair check promised by P146. It makes one composite predicate active:

predicted_not_taken && word_not_last && quiet_backend

When the guard fires, execute-prefetch repair gets a two-word budget. Otherwise it keeps the P144 one-word budget.
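The budget selection can be modeled in a few lines of software. This is a sketch only, not the RTL: it assumes each of the three signals is a plain boolean sampled at the moment the repair decision is made (their exact RTL definitions and timing are not given here), and the helper names `strict_guard` and `repair_budget_words` and the two constants are hypothetical.

```python
# Hypothetical software model of the P147 execute-prefetch repair budget.
# Signal names come from the guard expression; everything else is illustrative.

P144_BUDGET_WORDS = 1      # one-word repair budget kept from P144
GUARDED_BUDGET_WORDS = 2   # two-word budget granted when the guard fires


def strict_guard(predicted_not_taken: bool, word_not_last: bool,
                 quiet_backend: bool) -> bool:
    """Composite guard: fires only when all three conditions hold."""
    return predicted_not_taken and word_not_last and quiet_backend


def repair_budget_words(predicted_not_taken: bool, word_not_last: bool,
                        quiet_backend: bool) -> int:
    """Number of words an execute-prefetch I-cache repair may fetch."""
    if strict_guard(predicted_not_taken, word_not_last, quiet_backend):
        return GUARDED_BUDGET_WORDS
    return P144_BUDGET_WORDS


if __name__ == "__main__":
    from itertools import product

    # Enumerate all eight input combinations; only (True, True, True) widens.
    for combo in product((False, True), repeat=3):
        print(combo, repair_budget_words(*combo))
```

Because the guard is a pure AND, it can only narrow the set of repairs that P144 would otherwise widen; it never adds a new trigger.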

check                                          result
Verilator build                                PASS
BusyBox userspace build                        PASS
Linux image rebuilt with P147 initramfs        PASS
BusyBox shell workload reaches P147-FILE-OK    PASS
Aux-load queue full drops                      PASS
Aux response errors/cancels                    PASS
Hardened layout                                NOT RUN
metric                 P142          P144          P146          P147
post-load cycles       218,856,863   219,161,535   220,667,218   219,782,307
shell window cycles    63,926,691    64,192,833    65,663,039    64,812,561
retired instructions   85,799,045    85,905,855    86,410,376    86,122,202
CPI                    2.5508        2.5512        2.5537        2.5520
S_FETCH cycles         7,598,133     7,601,896     7,642,687     7,620,860
S_MEM cycles           29,199,830    29,215,063    29,435,054    29,306,905

In shell-window cycles, P147 improves on P146 by 850,478 cycles, but still loses to P144 by 619,728 cycles and to P142 by 885,870 cycles. That makes it an RTL PASS and a speed FAIL.
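The deltas above are plain differences of the shell-window row. A minimal sketch that recomputes them follows; the verdict rule in `speed_verdict` (the candidate must beat every listed baseline) is an assumption chosen for illustration, not a rule stated on this page.

```python
# Shell-window cycles per rung, copied from the metric table.
SHELL_WINDOW_CYCLES = {
    "P142": 63_926_691,
    "P144": 64_192_833,
    "P146": 65_663_039,
    "P147": 64_812_561,
}


def delta_vs(candidate: str, baseline: str) -> int:
    """Positive result means the candidate is slower (more cycles) than the baseline."""
    return SHELL_WINDOW_CYCLES[candidate] - SHELL_WINDOW_CYCLES[baseline]


def speed_verdict(candidate: str, baselines: list[str]) -> str:
    """Hypothetical rule: speed PASS only if the candidate beats every baseline."""
    return "PASS" if all(delta_vs(candidate, b) < 0 for b in baselines) else "FAIL"


if __name__ == "__main__":
    for base in ("P146", "P144", "P142"):
        print(base, delta_vs("P147", base))
    print("speed", speed_verdict("P147", ["P142", "P144", "P146"]))
```

Against P146 the delta is negative (a win); against P144 and P142 it is positive, which is what turns the rung into a speed FAIL.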

predicate              opportunities   fills        first hits   repeat hits   (first+repeat) / fill
seq_adjacent           63,419          27,079       0            0             0.00%
word_not_last          22,111,120      35,052,260   758,121      259,095       2.90%
uncompressed_not_last  21,322,755      33,848,167   225,827      48,353        0.81%
predicted_not_taken    20,165,355      33,108,902   737,761      251,896       2.99%
quiet_backend          24,924,183      32,829,051   636,269      219,427       2.61%
strict_composite       16,516,160      29,570,243   620,389      216,422       2.83%
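The last column of the predicate table is (first hits + repeat hits) divided by fills. The sketch below recomputes it from the other columns; the function names are hypothetical and the data is copied from the table.

```python
# (opportunities, fills, first hits, repeat hits) per predicate, from the table.
PREDICATES = {
    "seq_adjacent": (63_419, 27_079, 0, 0),
    "word_not_last": (22_111_120, 35_052_260, 758_121, 259_095),
    "uncompressed_not_last": (21_322_755, 33_848_167, 225_827, 48_353),
    "predicted_not_taken": (20_165_355, 33_108_902, 737_761, 251_896),
    "quiet_backend": (24_924_183, 32_829_051, 636_269, 219_427),
    "strict_composite": (16_516_160, 29_570_243, 620_389, 216_422),
}


def reuse_per_fill(fills: int, first_hits: int, repeat_hits: int) -> float:
    """(first hits + repeat hits) / fills; 0.0 when nothing was filled."""
    return (first_hits + repeat_hits) / fills if fills else 0.0


def reuse_column() -> dict[str, str]:
    """Recompute the last table column as a two-decimal percentage string."""
    return {
        name: f"{100 * reuse_per_fill(fills, first, repeat):.2f}%"
        for name, (_opps, fills, first, repeat) in PREDICATES.items()
    }


if __name__ == "__main__":
    for name, pct in reuse_column().items():
        print(f"{name:22s} {pct}")
```

Even the best predicate (predicted_not_taken) turns fewer than 3 of every 100 fills into a first or repeat hit, and the strict composite sits just below it.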

The active guard still spends too much bandwidth for too little fetch reuse. Against P144 it adds 8.09M background repair fills and 14.64M prefetch second-word grants, but only 191,040 extra first/repeat hits.
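The marginal return can be made concrete by dividing the increments quoted above. This sketch uses the rounded 8.09M and 14.64M figures as given, so its results are approximate.

```python
# Rounded increments against P144, as quoted in the text.
ADDED_REPAIR_FILLS = 8.09e6
ADDED_SECOND_WORD_GRANTS = 14.64e6
EXTRA_FIRST_REPEAT_HITS = 191_040

# Marginal return: how much extra reuse each extra unit of bandwidth buys.
hits_per_added_fill = EXTRA_FIRST_REPEAT_HITS / ADDED_REPAIR_FILLS
fills_per_extra_hit = ADDED_REPAIR_FILLS / EXTRA_FIRST_REPEAT_HITS
grants_per_extra_hit = ADDED_SECOND_WORD_GRANTS / EXTRA_FIRST_REPEAT_HITS

if __name__ == "__main__":
    print(f"extra hits per added fill: {hits_per_added_fill:.2%}")    # about 2.36%
    print(f"added fills per extra hit: {fills_per_extra_hit:.1f}")    # about 42.3
    print(f"added grants per extra hit: {grants_per_extra_hit:.1f}")  # about 76.6
```

Roughly 42 extra background fills and 77 extra second-word grants are spent per additional first/repeat hit, which is the bandwidth-for-reuse trade the paragraph rejects.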

class                repair fills   first hits   repeat hits   (first+repeat) / fill
demand fetch         931,741        438,523      295,535       78.79%
execute prefetch     37,205,331     759,054      259,109       2.74%
load prefetch        2,423,233      132,406      13,083        6.00%
writeback prefetch   396,342        169,072      52,954        56.02%
aux prefetch         109,673        36,639       24,194        55.47%
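Putting the repair classes side by side shows where the bandwidth goes. The sketch below (hypothetical helper `summarize`, data copied from the class table) ranks classes by reuse per fill and reports each class's share of all repair fills and of all first+repeat hits. On these numbers, execute prefetch accounts for about 90.6% of repair fills but only about 46.7% of first+repeat hits.

```python
# (repair fills, first hits, repeat hits) per repair class, from the table.
CLASSES = {
    "demand fetch": (931_741, 438_523, 295_535),
    "execute prefetch": (37_205_331, 759_054, 259_109),
    "load prefetch": (2_423_233, 132_406, 13_083),
    "writeback prefetch": (396_342, 169_072, 52_954),
    "aux prefetch": (109_673, 36_639, 24_194),
}


def summarize() -> list[tuple[str, float, float, float]]:
    """Per class: (name, reuse per fill, share of all fills, share of all hits),
    sorted from worst to best reuse per fill."""
    total_fills = sum(f for f, _, _ in CLASSES.values())
    total_hits = sum(a + b for _, a, b in CLASSES.values())
    rows = []
    for name, (fills, first, repeat) in CLASSES.items():
        hits = first + repeat
        rows.append((name, hits / fills, fills / total_fills, hits / total_hits))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, reuse, fill_share, hit_share in summarize():
        print(f"{name:18s} reuse {reuse:6.2%}  fills {fill_share:6.1%}  hits {hit_share:6.1%}")
```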

This closes the execute-prefetch second-word repair thread for now. The next architecture rung should pivot to a different frontend/memory bottleneck.

Shell phases (P147 shell workload: 219,782,307 cycles, CPI 2.55)
  1. kernel banner to /init: 117,338,268 cycles (53.5%)
  2. /init to shell banner: 1,086,082 cycles (0.5%)
  3. shell banner to first command: 35,916,577 cycles (16.4%)
  4. echo command: 1,649 cycles (0%)
  5. uname -a: 2,457,747 cycles (1.1%)
  6. ls /bin /usr/share: 32,296,246 cycles (14.7%)
  7. cat sample file: 3,052,234 cycles (1.4%)
  8. touch/write/cat/rm /tmp file: 10,694,437 cycles (4.9%)
  9. 8x ash loop with file I/O: 16,309,568 cycles (7.4%)
  10. final marker: 680 cycles (0%)
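Summing phases is a quick consistency check. In the sketch below (hypothetical helper `cycles_for`, data copied from the phase list), phases 4 to 10 add up to exactly 64,812,561, the P147 shell-window cycles in the metric table, which suggests (an inference, not stated on this page) that the shell window covers the command phases. All ten phases together total 219,153,488, which is 628,819 cycles below the 219,782,307 in the chart label.

```python
# Cycle counts for the P147 shell phases, in the order listed.
PHASES = [
    ("kernel banner to /init", 117_338_268),
    ("/init to shell banner", 1_086_082),
    ("shell banner to first command", 35_916_577),
    ("echo command", 1_649),
    ("uname -a", 2_457_747),
    ("ls /bin /usr/share", 32_296_246),
    ("cat sample file", 3_052_234),
    ("touch/write/cat/rm /tmp file", 10_694_437),
    ("8x ash loop with file I/O", 16_309_568),
    ("final marker", 680),
]


def cycles_for(first: int, last: int) -> int:
    """Sum cycles for 1-based phases first..last, inclusive."""
    return sum(cycles for _, cycles in PHASES[first - 1:last])


if __name__ == "__main__":
    print("all phases:       ", cycles_for(1, len(PHASES)))
    print("phases 4-10:      ", cycles_for(4, 10))
    print("boot (phases 1-3):", cycles_for(1, 3))
```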
State breakdown (P147 strict execute-prefetch guard workload: 219,782,307 cycles, CPI 2.55)
  1. fetch: 7,620,860 cycles (3.5%)
  2. execute: 86,147,006 cycles (39.2%)
  3. mem: 29,585,503 cycles (13.5%)
  4. walker: 2,675,861 cycles (1.2%)
  5. writeback: 86,122,202 cycles (39.2%)
  6. mul/div: 7,629,159 cycles (3.5%)
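The state percentages are each state's cycles divided by the workload total. The sketch below (hypothetical helpers `state_shares` and `cpi`) recomputes them and also shows that the writeback count equals the P147 retired-instruction count from the metric table, consistent with one writeback-state cycle per retired instruction (an inference, not stated on this page). The six states sum to 219,780,591, which is 1,716 cycles short of the label total.

```python
TOTAL_CYCLES = 219_782_307          # workload cycles from the chart label
RETIRED_INSTRUCTIONS = 86_122_202   # P147 column of the metric table

STATE_CYCLES = {
    "fetch": 7_620_860,
    "execute": 86_147_006,
    "mem": 29_585_503,
    "walker": 2_675_861,
    "writeback": 86_122_202,
    "mul/div": 7_629_159,
}


def state_shares() -> dict[str, str]:
    """Each state's share of total workload cycles, one decimal place."""
    return {s: f"{100 * c / TOTAL_CYCLES:.1f}%" for s, c in STATE_CYCLES.items()}


def cpi() -> float:
    """Cycles per retired instruction for the whole workload."""
    return TOTAL_CYCLES / RETIRED_INSTRUCTIONS


if __name__ == "__main__":
    print(state_shares())
    print("sum of states:", sum(STATE_CYCLES.values()))  # 1,716 short of TOTAL_CYCLES
    print(f"CPI: {cpi():.4f}")                           # matches the table's 2.5520
```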