No. 135 / project of 147 on the ladder

Cache background policy audit

introduces — mutually exclusive aux-load block buckets; I-cache versus D-cache background attribution; P136 policy target

harden state
last run: 2026-05-06
signoff
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P135 keeps P134’s aux-load policy and adds mutually exclusive block buckets. The question is which background-fill guard actually blocks otherwise-safe aux-load opportunities.
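A minimal sketch of what "mutually exclusive" means for the buckets: each candidate is assigned to exactly one bucket, so the bucket counts add up to the candidate count. The three boolean inputs and the bucket keys are hypothetical names chosen for illustration, and the check order (frontend first, then the two background guards) is an assumption inferred from the bucket labels, not taken from the RTL.

```python
from collections import Counter
from itertools import product


def classify(frontend_ready: bool, icache_bg_busy: bool, dcache_bg_busy: bool) -> str:
    """Assign one aux-load candidate to exactly one bucket (hypothetical names)."""
    if not frontend_ready:
        return "frontend_not_safe_or_useful"
    if icache_bg_busy and dcache_bg_busy:
        return "blocked_by_both"
    if icache_bg_busy:
        return "blocked_by_icache_only"
    if dcache_bg_busy:
        return "blocked_by_dcache_only"
    return "background_quiet_issue"


def tally(candidates) -> Counter:
    """Count candidates per bucket; every candidate lands in exactly one bucket."""
    return Counter(classify(*c) for c in candidates)


# Exercise every input combination once: 8 candidates, 8 bucket assignments.
all_cases = list(product([False, True], repeat=3))
counts = tally(all_cases)
```

Because `classify` returns on the first matching condition, no candidate can be counted twice, which is the property that lets the per-bucket counts be compared directly.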

| check | result |
| --- | --- |
| Verilator build | PASS |
| BusyBox shell workload reaches P135-FILE-OK | PASS |
| Aux-load queue full drops | PASS |
| Aux response errors/cancels | PASS |
| Hardened layout | NOT RUN |

| metric | P134 | P135 |
| --- | --- | --- |
| post-load cycles | 218,247,567 | 219,445,401 |
| shell window cycles | 64,221,642 | 65,411,939 |
| retired instructions | 86,139,760 | 86,534,503 |
| CPI | 2.5336 | 2.5359 |
| S_FETCH cycles | 7,617,695 | 7,634,641 |
| S_MEM cycles | 27,799,343 | 27,975,703 |
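The CPI figures can be reproduced from the other rows, assuming CPI here is post-load cycles divided by retired instructions (the numbers agree with that reading to four decimal places). The variable and function names below are illustrative only.

```python
# Figures copied from the P134/P135 metric rows.
post_load_cycles = {"P134": 218_247_567, "P135": 219_445_401}
retired_instructions = {"P134": 86_139_760, "P135": 86_534_503}


def cpi(rung: str) -> float:
    """Cycles per instruction over the post-load window (assumed definition)."""
    return post_load_cycles[rung] / retired_instructions[rung]


for rung in ("P134", "P135"):
    print(f"{rung}: CPI {cpi(rung):.4f}")

# Relative change from P134 to P135, as a fraction.
cycle_growth = post_load_cycles["P135"] / post_load_cycles["P134"] - 1
instr_growth = retired_instructions["P135"] / retired_instructions["P134"] - 1
print(f"cycles {cycle_growth:+.2%}, instructions {instr_growth:+.2%}")
```

Cycles and retired instructions both grow by roughly half a percent, which is why the CPI barely moves between the two rungs.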

P135 is not a speedup rung. The value is the attribution.

| aux-load counter | P134 | P135 |
| --- | --- | --- |
| candidates | 5,372,186 | 5,418,028 |
| issues | 139,881 | 140,540 |
| queue enqueues | 139,881 | 140,540 |
| queue dequeues | 139,881 | 140,540 |
| queue full drops | 0 | 0 |
| aux load fills | 139,881 | 140,540 |

| P135 mutually exclusive bucket | count |
| --- | --- |
| frontend prefetch not safe/useful | 1,596,970 |
| frontend-ready candidates | 3,821,058 |
| background quiet candidates / actual issues | 140,540 |
| blocked by D-cache background only | 196,378 |
| blocked by I-cache background only | 1,620,547 |
| blocked by both backgrounds | 1,863,593 |
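A quick consistency check on the bucket counts, reading "frontend-ready candidates" as a subtotal of the four rows that follow it. That reading is an assumption, but the arithmetic supports it: the four rows sum to the subtotal, and the subtotal plus the not-safe/useful row equals the P135 candidate count.

```python
# Counts copied from the P135 counter and bucket rows.
candidates = 5_418_028
frontend_not_safe_or_useful = 1_596_970
frontend_ready = 3_821_058

ready_buckets = {
    "background quiet / actual issues": 140_540,
    "blocked by D-cache background only": 196_378,
    "blocked by I-cache background only": 1_620_547,
    "blocked by both backgrounds": 1_863_593,
}

# The four frontend-ready buckets partition the subtotal...
assert sum(ready_buckets.values()) == frontend_ready
# ...and the subtotal plus the rejected candidates covers every candidate.
assert frontend_not_safe_or_useful + frontend_ready == candidates

# Share of frontend-ready candidates per bucket, in percent.
shares = {name: round(100 * n / frontend_ready, 1) for name, n in ready_buckets.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```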

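The next target is clear: D-cache-only blocks are small (196,378, about 5% of the 3,821,058 frontend-ready candidates). I-cache background activity is the larger single guard: on its own it blocks 1,620,547 candidates (about 42%), and it is also involved in the 1,863,593 (about 49%) blocked by both backgrounds. P136 should test whether a useful next-PC prefetch plus aux-load issue can preempt I-cache background fill while still keeping D-cache background fill quiet.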
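As a rough sketch of the policy change P136 is meant to test, the two guards can be written as predicates. Both functions and their argument names are hypothetical. The P135 form is inferred from the "background quiet candidates / actual issues" bucket, and the P136 form is only one possible reading of "preempt I-cache background fill while keeping D-cache background fill quiet".

```python
def p135_guard(frontend_ready: bool, icache_bg_busy: bool, dcache_bg_busy: bool) -> bool:
    """Assumed current rule: issue only when both background fills are quiet."""
    return frontend_ready and not icache_bg_busy and not dcache_bg_busy


def p136_candidate_guard(frontend_ready: bool, icache_bg_busy: bool, dcache_bg_busy: bool) -> bool:
    """Hypothetical P136 rule: I-cache background fill may be preempted,
    but the D-cache background fill must still be quiet."""
    return frontend_ready and not dcache_bg_busy


# Ceiling on extra issues if every I-cache-only block became an issue.
# This ignores queue capacity and whether each preemption is actually safe.
p135_issues = 140_540
blocked_icache_only = 1_620_547
issue_ceiling = p135_issues + blocked_icache_only
print(f"issue ceiling under the hypothetical rule: {issue_ceiling:,}")
```

The ceiling is an upper bound only. Candidates blocked by both backgrounds stay blocked under this rule, since the D-cache guard is unchanged.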

shell phases: P135 shell workload, 219,445,401 cycles, CPI 2.54
  1. kernel banner to /init: 116,777,482 (53.4%)
  2. /init to shell banner: 1,075,030 (0.5%)
  3. shell banner to first command: 35,552,777 (16.3%)
  4. echo command: 1,649 (0%)
  5. uname -a: 2,579,088 (1.2%)
  6. ls /bin /usr/share: 31,146,301 (14.2%)
  7. cat sample file: 4,002,356 (1.8%)
  8. touch/write/cat/rm /tmp file: 11,410,983 (5.2%)
  9. 8x ash loop with file I/O: 16,270,882 (7.4%)
  10. final marker: 680 (0%)

state breakdown: P135 cache background audit workload, 219,445,401 cycles, CPI 2.54
  1. fetch: 7,634,641 (3.5%)
  2. execute: 86,559,397 (39.4%)
  3. mem: 28,256,237 (12.9%)
  4. walker: 2,704,319 (1.2%)
  5. writeback: 86,534,503 (39.4%)
  6. mul/div: 7,754,588 (3.5%)