No. 136 of 147 on the project ladder

I-cache background preempt

introduces: I-cache background preemption test; aggressive aux-load issue policy; measured negative shell result

harden state: last run 2026-05-06
signoff
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P136 tests the narrow target from P135: allow a useful next-PC prefetch plus aux-load issue to preempt I-cache background fill, while still blocking on D-cache background fill.
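The policy change can be modeled as a small classifier over the bucket names reported further down. This is a minimal sketch under stated assumptions: the `AuxCandidate` fields, `classify`, and `issues` are hypothetical names chosen for illustration, and the decision order is inferred from the bucket labels, not taken from the RTL.

```python
from dataclasses import dataclass

# Hypothetical model: field names are illustrative, not the RTL's signal names.
@dataclass
class AuxCandidate:
    frontend_ready: bool   # next-PC prefetch judged safe and useful
    icache_bg_busy: bool   # I-cache background fill in flight
    dcache_bg_busy: bool   # D-cache background fill in flight

# Buckets that result in an aux-load issue.
ISSUE_BUCKETS = {
    "background quiet candidates / issued",
    "issued while preempting I-cache background",
}

def classify(c: AuxCandidate, preempt_icache_bg: bool) -> str:
    """Map one aux-load candidate to a policy bucket.

    preempt_icache_bg=False models the P135 quiet-I-cache guard;
    preempt_icache_bg=True models the P136 policy.
    """
    if not c.frontend_ready:
        return "frontend prefetch not safe/useful"
    if c.dcache_bg_busy and c.icache_bg_busy:
        return "blocked by both backgrounds"  # D-cache fill still blocks
    if c.dcache_bg_busy:
        return "blocked by D-cache background only"
    if c.icache_bg_busy:
        if preempt_icache_bg:
            return "issued while preempting I-cache background"
        return "blocked by I-cache background only"
    return "background quiet candidates / issued"

def issues(c: AuxCandidate, preempt_icache_bg: bool) -> bool:
    """True if the candidate issues an aux load under the chosen policy."""
    return classify(c, preempt_icache_bg) in ISSUE_BUCKETS
```

Under this model, the only candidates whose outcome changes between P135 and P136 are frontend-ready ones with only the I-cache background busy, which is consistent with P136 reporting 0 in the I-cache-only block bucket.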

check                                         result
Verilator build                               PASS
BusyBox shell workload reaches P136-FILE-OK   PASS
Aux-load queue full drops                     PASS
Aux response errors/cancels                   PASS
Hardened layout                               NOT RUN
metric                 P135           P136
post-load cycles       219,445,401    222,462,201
shell window cycles    65,411,939     67,392,718
retired instructions   86,534,503     86,961,841
CPI                    2.5359         2.5582
S_FETCH cycles         7,634,641      7,646,785
S_MEM cycles           27,975,703     29,771,841
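The CPI row is consistent with post-load cycles divided by retired instructions. The short check below reproduces both values from the table; the `cpi` helper and `METRICS` dictionary are illustrative, not part of the project's tooling.

```python
# Counters copied from the P135/P136 metric table.
METRICS = {
    "P135": {"post_load_cycles": 219_445_401, "retired": 86_534_503},
    "P136": {"post_load_cycles": 222_462_201, "retired": 86_961_841},
}

def cpi(post_load_cycles: int, retired: int) -> float:
    """Cycles per retired instruction over the post-load window."""
    return post_load_cycles / retired

for run, m in METRICS.items():
    print(run, f"{cpi(m['post_load_cycles'], m['retired']):.4f}")
# P135 2.5359
# P136 2.5582
```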

This is a measured speedup FAIL: P136 is 1,980,779 shell-window cycles (about 3.0%) slower than P135, even though it issues 12.6x as many auxiliary loads (1,773,934 vs 140,540).
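The headline comparison is plain arithmetic on the metric and counter tables; this sketch recomputes the slowdown and the aux-load issue ratio from the reported values (variable names are ours).

```python
P135_SHELL, P136_SHELL = 65_411_939, 67_392_718   # shell window cycles
P135_ISSUES, P136_ISSUES = 140_540, 1_773_934     # aux-load issues

slowdown = P136_SHELL - P135_SHELL
slowdown_pct = 100 * slowdown / P135_SHELL
issue_ratio = P136_ISSUES / P135_ISSUES

print(slowdown)                # 1980779
print(f"{slowdown_pct:.2f}%")  # 3.03%
print(f"{issue_ratio:.1f}x")   # 12.6x
```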

aux-load counter   P135        P136
candidates         5,418,028   5,182,600
issues             140,540     1,773,934
queue enqueues     140,540     1,773,934
queue dequeues     140,540     1,773,934
queue full drops   0           0
aux load fills     140,540     1,773,934
P136 policy bucket                              count
frontend prefetch not safe/useful               1,597,219
frontend-ready candidates                       3,585,381
  background quiet candidates / issued          172,991
  issued while preempting I-cache background    1,600,943
  blocked by D-cache background only            362,011
  blocked by I-cache background only            0
  blocked by both backgrounds                   1,449,436
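The P136 buckets account exactly for the counter table: the two frontend buckets sum to the 5,182,600 candidates, the five background buckets sum to the frontend-ready count, and the quiet and preempting buckets together equal the 1,773,934 issues. A minimal consistency check, with `accounting_checks` as a hypothetical helper name:

```python
P136_CANDIDATES = 5_182_600
P136_ISSUES = 1_773_934

P136_BUCKETS = {
    "frontend prefetch not safe/useful": 1_597_219,
    "frontend-ready candidates": 3_585_381,
    "background quiet candidates / issued": 172_991,
    "issued while preempting I-cache background": 1_600_943,
    "blocked by D-cache background only": 362_011,
    "blocked by I-cache background only": 0,
    "blocked by both backgrounds": 1_449_436,
}

# Buckets that partition the frontend-ready candidates.
READY_SUB_BUCKETS = (
    "background quiet candidates / issued",
    "issued while preempting I-cache background",
    "blocked by D-cache background only",
    "blocked by I-cache background only",
    "blocked by both backgrounds",
)

def accounting_checks(b: dict[str, int]) -> dict[str, bool]:
    """Return each accounting identity and whether the counts satisfy it."""
    ready = b["frontend-ready candidates"]
    return {
        "not-safe + ready == candidates":
            b["frontend prefetch not safe/useful"] + ready == P136_CANDIDATES,
        "ready == sum of ready sub-buckets":
            sum(b[k] for k in READY_SUB_BUCKETS) == ready,
        "quiet + preempting == issues":
            b["background quiet candidates / issued"]
            + b["issued while preempting I-cache background"] == P136_ISSUES,
    }

for name, ok in accounting_checks(P136_BUCKETS).items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")

blocked = sum(v for k, v in P136_BUCKETS.items() if k.startswith("blocked"))
print("blocked frontend-ready candidates:", blocked)  # 1811447
```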

The mechanical part worked: the old I-cache-only block bucket turned into 1.60M issued preemptions with 0 drops, 0 errors, and 0 cancels. The performance result, however, shows the policy is too aggressive. Interrupting I-cache background fill steals useful instruction-side service and raises S_MEM cycles by 1.80M (27,975,703 to 29,771,841) versus P135.
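As a quick arithmetic check on the counters, the S_MEM increase is the difference of the two table values, and on its own it covers roughly 60% of the 3,016,800-cycle post-load increase. The dictionaries below simply restate the metric table.

```python
S_MEM = {"P135": 27_975_703, "P136": 29_771_841}
POST_LOAD = {"P135": 219_445_401, "P136": 222_462_201}

d_mem = S_MEM["P136"] - S_MEM["P135"]
d_total = POST_LOAD["P136"] - POST_LOAD["P135"]

print(d_mem)                     # 1796138
print(d_total)                   # 3016800
print(f"{d_mem / d_total:.1%}")  # 59.5%
```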

The next step is an arbitration rung, not a wider open gate: either put an age/debt limit on I-cache-background preemption, or roll the policy back to the quiet-I-cache guard and keep P136 as the recorded negative result.
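One way to picture the age/debt option is a per-cycle limiter that lets aux loads preempt a pending I-cache background fill only until a budget runs out. This is a hypothetical sketch, not the project's design: `PreemptLimiter`, `max_debt`, `max_age`, and the one-call-per-cycle model are all assumptions made for illustration.

```python
class PreemptLimiter:
    """Hypothetical age/debt limit on I-cache-background preemption.

    debt: aux loads issued by preempting the current I-cache background fill.
    age:  cycles the current I-cache background fill has been pending.
    Once either budget is exhausted, the fill keeps priority until it finishes.
    """

    def __init__(self, max_debt: int, max_age: int) -> None:
        self.max_debt = max_debt
        self.max_age = max_age
        self.debt = 0
        self.age = 0

    def step(self, aux_wants_issue: bool, icache_bg_pending: bool) -> bool:
        """Advance one cycle; return True if the aux load may preempt this cycle."""
        if not icache_bg_pending:
            # No I-cache background fill to preempt: reset both budgets.
            # (Issue with quiet backgrounds is handled by the normal path.)
            self.debt = 0
            self.age = 0
            return False
        self.age += 1
        if aux_wants_issue and self.debt < self.max_debt and self.age <= self.max_age:
            self.debt += 1
            return True
        return False


lim = PreemptLimiter(max_debt=2, max_age=10)
print([lim.step(True, True) for _ in range(3)])  # [True, True, False]
```

Setting `max_debt=0` degenerates to the P135 quiet-I-cache guard, while very large limits approach the P136 behavior, so the two knobs span the range between the two measured policies.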

shell phases: P136 shell workload, 222,462,201 cycles, CPI 2.56
  1. kernel banner to /init: 117,366,189 cycles (52.9%)
  2. /init to shell banner: 1,083,891 cycles (0.5%)
  3. shell banner to first command: 35,990,549 cycles (16.2%)
  4. echo command: 1,649 cycles (0%)
  5. uname -a: 2,007,182 cycles (0.9%)
  6. ls /bin /usr/share: 33,073,949 cycles (14.9%)
  7. cat sample file: 2,725,174 cycles (1.2%)
  8. touch/write/cat/rm /tmp file: 11,954,163 cycles (5.4%)
  9. 8x ash loop with file I/O: 16,226,094 cycles (7.3%)
  10. final marker: 1,404,507 cycles (0.6%)
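Two relationships in the phase list can be checked directly: phases 4 through 10 (echo command through final marker) sum to exactly the 67,392,718 shell-window cycles reported for P136, and the percentages match shares of the phase sum (221,833,347 cycles) rather than of the 222,462,201-cycle workload total, which leaves 628,854 cycles outside the listed phases. The sketch recomputes both; `PHASES` just restates the list.

```python
PHASES = [
    ("kernel banner to /init", 117_366_189),
    ("/init to shell banner", 1_083_891),
    ("shell banner to first command", 35_990_549),
    ("echo command", 1_649),
    ("uname -a", 2_007_182),
    ("ls /bin /usr/share", 33_073_949),
    ("cat sample file", 2_725_174),
    ("touch/write/cat/rm /tmp file", 11_954_163),
    ("8x ash loop with file I/O", 16_226_094),
    ("final marker", 1_404_507),
]

phase_total = sum(cycles for _, cycles in PHASES)
shell_window = sum(cycles for _, cycles in PHASES[3:])  # phases 4-10

print(phase_total)   # 221833347
print(shell_window)  # 67392718
for i, (name, cycles) in enumerate(PHASES, start=1):
    print(f"{i}. {name}: {100 * cycles / phase_total:.1f}%")
```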
state breakdown: P136 I-cache background preempt workload, 222,462,201 cycles, CPI 2.56
  1. fetch: 7,646,785 cycles (3.4%)
  2. execute: 86,987,095 cycles (39.1%)
  3. mem: 30,054,083 cycles (13.5%)
  4. walker: 2,734,200 cycles (1.2%)
  5. writeback: 86,961,841 cycles (39.1%)
  6. mul/div: 8,076,493 cycles (3.6%)
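The state breakdown can be cross-checked the same way: the six states sum to 222,460,497 cycles, 1,704 short of the workload total, and the writeback count equals the 86,961,841 retired instructions in the metric table. That equality is consistent with one writeback cycle per retired instruction, though this is an inference from the numbers, not something the source states.

```python
WORKLOAD_CYCLES = 222_462_201
RETIRED = 86_961_841  # retired instructions, from the metric table

STATES = {
    "fetch": 7_646_785,
    "execute": 86_987_095,
    "mem": 30_054_083,
    "walker": 2_734_200,
    "writeback": 86_961_841,
    "mul/div": 8_076_493,
}

state_total = sum(STATES.values())
print(state_total, WORKLOAD_CYCLES - state_total)  # 222460497 1704
for name, cycles in STATES.items():
    # Share of the full workload, matching the percentages in the list.
    print(f"{name}: {100 * cycles / WORKLOAD_CYCLES:.1f}%")
print(STATES["writeback"] == RETIRED)  # True
```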