No. 134 of 147 projects on the ladder

Aux load prefetch policy

introduces: guarded aux-load issue; frontend-memory pivot; load-miss overlap policy

harden state: last run 2026-05-06
signoff
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P134 is the pivot back to frontend/memory work. P133 proved the backend dispatch module boundary was coherent, but it was still one-deep and shadow-only. P134 reopens the aux-load path and only lets it fire when the main memory port can also prefetch the next instruction word.
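The guard described above can be pictured as a small predicate over per-cycle port state. The following is a minimal sketch, not the project's RTL: the `PortState` type, its field names, and both function names are hypothetical, and the three reasons are taken from the P134 block-reason table on this page, on the assumption that any one of them is enough to hold back a candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortState:
    """Hypothetical snapshot of the signals the guard looks at in one cycle."""
    frontend_prefetch_ok: bool  # main memory port can also prefetch the next instruction word
    dcache_fill_active: bool    # D-cache background fill in progress
    icache_fill_active: bool    # I-cache background fill in progress


def block_reasons(state: PortState) -> list[str]:
    """Return every reason that would block an aux-load candidate (may be several)."""
    reasons = []
    if not state.frontend_prefetch_ok:
        reasons.append("frontend prefetch not safe/useful")
    if state.dcache_fill_active:
        reasons.append("D-cache background fill active")
    if state.icache_fill_active:
        reasons.append("I-cache background fill active")
    return reasons


def may_issue_aux_load(state: PortState) -> bool:
    """Assumed policy: a candidate issues only when no block reason applies."""
    return not block_reasons(state)


if __name__ == "__main__":
    quiet = PortState(frontend_prefetch_ok=True, dcache_fill_active=False, icache_fill_active=False)
    busy = PortState(frontend_prefetch_ok=True, dcache_fill_active=False, icache_fill_active=True)
    print(may_issue_aux_load(quiet), block_reasons(quiet))
    print(may_issue_aux_load(busy), block_reasons(busy))
```

Returning a list rather than a single reason reflects that one blocked candidate could be charged to more than one counter, which is consistent with the reported block-reason counts adding up to more than the number of candidates.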

check                                          result
Verilator build                                PASS
BusyBox shell workload reaches P134-FILE-OK    PASS
Aux-load queue full drops                      PASS
Aux response errors/cancels                    PASS
Hardened layout                                NOT RUN

metric                  P133          P134
post-load cycles        218,556,365   218,247,567
shell window cycles      64,581,917    64,221,642
retired instructions     86,293,679    86,139,760
CPI                          2.5327        2.5336
S_FETCH cycles            7,626,643     7,617,695
S_MEM cycles             27,727,937    27,799,343

aux-load counter    P133        P134
candidates          5,410,388   5,372,186
issues                      0     139,881
queue enqueues              0     139,881
queue dequeues              0     139,881
queue full drops            0           0
aux load fills              0     139,881

P134 block reason                    count
frontend prefetch not safe/useful    1,587,474
D-cache background fill active       2,817,459
I-cache background fill active       4,862,648

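This is a modest speedup: 360,275 fewer shell-window cycles than P133, about 0.56%. Post-load CPI is essentially flat (2.5327 to 2.5336) because retired instructions fell by a similar proportion to post-load cycles. The more important result is that the aux-load mechanism is active again without returning to the too-eager P111/P112 behavior.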
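The headline figures can be re-derived from the metric table. This is a small check script, with the dictionary layout and variable names chosen here for illustration; the numbers themselves are copied from the table.

```python
# Values copied from the P133/P134 metric table on this page.
P133 = {"post_load_cycles": 218_556_365, "shell_window_cycles": 64_581_917, "retired": 86_293_679}
P134 = {"post_load_cycles": 218_247_567, "shell_window_cycles": 64_221_642, "retired": 86_139_760}

# Shell-window cycles saved by P134, absolute and relative to P133.
saved_cycles = P133["shell_window_cycles"] - P134["shell_window_cycles"]
saved_pct = 100.0 * saved_cycles / P133["shell_window_cycles"]

# CPI here is post-load cycles divided by retired instructions.
cpi_p133 = P133["post_load_cycles"] / P133["retired"]
cpi_p134 = P134["post_load_cycles"] / P134["retired"]

if __name__ == "__main__":
    print(f"shell-window cycles saved: {saved_cycles:,} ({saved_pct:.2f}%)")
    print(f"CPI P133: {cpi_p133:.4f}  CPI P134: {cpi_p134:.4f}")
```

The reported CPI values match post-load cycles over retired instructions, so the saving is in total work and cycle count rather than in cycles per instruction.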

The next memory-side question is now concrete: most remaining blocked aux-load candidates are blocked by I-cache or D-cache background-fill activity. P135 should measure whether those background policies are too strict, too loose, or just correctly protecting instruction service.
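A rough way to size that question is to compare each block-reason count against the number of candidates that did not issue. The sketch below assumes each block-reason counter is incremented per blocked candidate, which the page does not state; the variable names are illustrative.

```python
# P134 aux-load counters and block reasons, copied from the tables on this page.
candidates = 5_372_186
issues = 139_881
block_counts = {
    "frontend prefetch not safe/useful": 1_587_474,
    "D-cache background fill active": 2_817_459,
    "I-cache background fill active": 4_862_648,
}

# Candidates that never issued (assumes every candidate either issues or is blocked).
blocked = candidates - issues

# Share of blocked candidates charged to each reason. Shares can sum past 1.0
# if a single candidate is charged to several reasons at once.
shares = {reason: count / blocked for reason, count in block_counts.items()}
total_reason_events = sum(block_counts.values())
reasons_overlap = total_reason_events > blocked

if __name__ == "__main__":
    print(f"blocked candidates: {blocked:,}")
    for reason, share in shares.items():
        print(f"{reason}: {share:.1%}")
    print(f"reason events: {total_reason_events:,}, overlap implied: {reasons_overlap}")
```

Under that assumption the reason counts add up to more than the blocked candidates, so the reasons are not mutually exclusive, and a P135 measurement would need to separate candidates blocked only by background fills from those that would have been blocked by the frontend check anyway.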

shell phases (label: P134 shell workload; cycles: 218,247,567; CPI: 2.53)
  1. kernel banner to /init              116,783,090   53.7%
  2. /init to shell banner                 1,077,826    0.5%
  3. shell banner to first command        35,536,836   16.3%
  4. echo command                              1,649      0%
  5. uname -a                              2,498,995    1.2%
  6. ls /bin /usr/share                   31,861,392   14.6%
  7. cat sample file                       2,851,669    1.3%
  8. touch/write/cat/rm /tmp file         11,004,668    5.1%
  9. 8x ash loop with file I/O            16,002,589    7.4%
  10. final marker                               680      0%

state breakdown (label: P134 aux load policy workload; cycles: 218,247,567; CPI: 2.53)
  1. fetch        3.5%    7,617,695
  2. execute     39.5%   86,164,426
  3. mem         12.9%   28,077,899
  4. walker       1.2%    2,676,872
  5. writeback   39.5%   86,139,760
  6. mul/div      3.5%    7,569,199