P136 tests the narrow target from P135: allow a useful next-PC prefetch plus an aux-load issue to preempt I-cache background fill, while still blocking on D-cache background fill.

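A minimal sketch of the gate described above, written as Python predicates rather than RTL. It assumes the P135 rule (the quiet-I-cache guard this report considers rolling back to) blocked on either background fill, and that P136 differs only by letting a frontend-ready candidate issue while an I-cache background fill is active. `IssueInputs` and the function names are hypothetical, not signals from the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueInputs:
    """Hypothetical per-cycle inputs to the aux-load issue gate (names are illustrative)."""
    prefetch_useful: bool  # a useful next-PC prefetch is available ("frontend-ready")
    icache_bg_busy: bool   # an I-cache background fill is in progress
    dcache_bg_busy: bool   # a D-cache background fill is in progress


def may_issue_p135(x: IssueInputs) -> bool:
    # Quiet-I-cache guard (assumed P135 rule): either background fill blocks the issue.
    return x.prefetch_useful and not x.icache_bg_busy and not x.dcache_bg_busy


def may_issue_p136(x: IssueInputs) -> bool:
    # P136: an I-cache background fill may be preempted; a D-cache background fill still blocks.
    return x.prefetch_useful and not x.dcache_bg_busy


def preempts_icache_bg(x: IssueInputs) -> bool:
    # An issue granted while the I-cache background fill is active counts as a preemption.
    return may_issue_p136(x) and x.icache_bg_busy


if __name__ == "__main__":
    cases = [IssueInputs(u, i, d) for u in (False, True) for i in (False, True) for d in (False, True)]
    for x in cases:
        if may_issue_p135(x) != may_issue_p136(x):
            print("policies differ for", x)
```

The two predicates disagree in exactly one input combination: a useful prefetch, a busy I-cache background fill, and an idle D-cache background fill. That case is what the "issued while preempting I-cache background" bucket counts.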

| check | result |
|---|---|
| Verilator build | PASS |
| BusyBox shell workload reaches P136-FILE-OK | PASS |
| Aux-load queue full drops | PASS |
| Aux response errors/cancels | PASS |
| Hardened layout | NOT RUN |

| metric | P135 | P136 |
|---|---|---|
| post-load cycles | 219,445,401 | 222,462,201 |
| shell window cycles | 65,411,939 | 67,392,718 |
| retired instructions | 86,534,503 | 86,961,841 |
| CPI | 2.5359 | 2.5582 |
| S_FETCH cycles | 7,634,641 | 7,646,785 |
| S_MEM cycles | 27,975,703 | 29,771,841 |

This is a measured speedup FAIL. P136 is 1,980,779 shell-window cycles (about 3.0%) slower than P135, even though it issues about 12.6x as many auxiliary loads (1,773,934 vs. 140,540).

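The CPI row is consistent with post-load cycles divided by retired instructions, and the deltas quoted above follow directly from the table. A short Python recomputation (the dictionary keys are ad hoc names, not counter names from the simulator):

```python
# Figures copied from the P135/P136 metric and aux-load tables.
P135 = {"post_load_cycles": 219_445_401, "shell_cycles": 65_411_939,
        "retired": 86_534_503, "s_mem": 27_975_703, "aux_issues": 140_540}
P136 = {"post_load_cycles": 222_462_201, "shell_cycles": 67_392_718,
        "retired": 86_961_841, "s_mem": 29_771_841, "aux_issues": 1_773_934}


def cpi(run: dict) -> float:
    # CPI as post-load cycles per retired instruction.
    return run["post_load_cycles"] / run["retired"]


shell_delta = P136["shell_cycles"] - P135["shell_cycles"]
shell_slowdown_pct = 100 * shell_delta / P135["shell_cycles"]
s_mem_delta = P136["s_mem"] - P135["s_mem"]
issue_ratio = P136["aux_issues"] / P135["aux_issues"]

print(f"CPI {cpi(P135):.4f} -> {cpi(P136):.4f}")
print(f"shell window +{shell_delta:,} cycles ({shell_slowdown_pct:.1f}%)")
print(f"S_MEM +{s_mem_delta:,} cycles; aux issues x{issue_ratio:.1f}")
```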

| aux-load counter | P135 | P136 |
|---|---|---|
| candidates | 5,418,028 | 5,182,600 |
| issues | 140,540 | 1,773,934 |
| queue enqueues | 140,540 | 1,773,934 |
| queue dequeues | 140,540 | 1,773,934 |
| queue full drops | 0 | 0 |
| aux load fills | 140,540 | 1,773,934 |

| P136 policy bucket | count |
|---|---|
| frontend prefetch not safe/useful | 1,597,219 |
| frontend-ready candidates | 3,585,381 |
| background quiet candidates / issued | 172,991 |
| issued while preempting I-cache background | 1,600,943 |
| blocked by D-cache background only | 362,011 |
| blocked by I-cache background only | 0 |
| blocked by both backgrounds | 1,449,436 |

The mechanical part worked: the old I-cache-only blocked bucket turned into 1.60M issued preemptions, with 0 queue-full drops, 0 errors, and 0 cancels. The performance result says the policy is too aggressive: interrupting I-cache background fill steals useful instruction-side service and raises S_MEM cycles by 1.80M (27,975,703 to 29,771,841) versus P135.

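The P136 bucket counts reconcile with the aux-load counter table: the not-safe/useful and frontend-ready rows sum to the 5,182,600 candidates, and the two issuing rows sum to the 1,773,934 issues, so roughly 90% of P136's aux-load issues were I-cache-background preemptions. A short Python check, assuming each candidate is counted in exactly one bucket:

```python
# P136 policy-bucket counts, copied from the bucket table.
NOT_READY = 1_597_219  # frontend prefetch not safe/useful
READY_BUCKETS = {
    "background quiet (issued)": 172_991,
    "issued while preempting I-cache background": 1_600_943,
    "blocked by D-cache background only": 362_011,
    "blocked by I-cache background only": 0,
    "blocked by both backgrounds": 1_449_436,
}
ISSUED_BUCKETS = ("background quiet (issued)", "issued while preempting I-cache background")

frontend_ready = sum(READY_BUCKETS.values())
candidates = NOT_READY + frontend_ready
issued = sum(READY_BUCKETS[name] for name in ISSUED_BUCKETS)
blocked = frontend_ready - issued
preempt_share = READY_BUCKETS["issued while preempting I-cache background"] / issued

print(f"candidates={candidates:,} frontend_ready={frontend_ready:,}")
print(f"issued={issued:,} blocked={blocked:,} preempting={preempt_share:.1%}")
```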

The next step is an arbitration rung, not a wider-open gate: either put an age/debt limit on I-cache-background preemption, or roll the policy back to the quiet-I-cache guard and keep P136 as a negative result.

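One possible shape for the age/debt rung, sketched in Python under stated assumptions: each granted preemption adds one unit of debt, each cycle the I-cache background fill is active and not preempted repays one unit, and a fill older than `max_age` cycles is never preempted. `PreemptDebtLimiter`, `arbitrate`, both limits, and the repayment rule are all hypothetical; nothing here reflects measured values or existing RTL.

```python
from dataclasses import dataclass


@dataclass
class PreemptDebtLimiter:
    """Hypothetical age/debt limit on I-cache-background preemption (not the P136 RTL).

    Each granted preemption adds one unit of debt; each cycle the I-cache
    background fill is active and not preempted repays one unit. Preemption is
    refused while debt >= max_debt or once the fill is max_age cycles old.
    """
    max_debt: int = 4   # assumed budget, in outstanding preemptions
    max_age: int = 64   # assumed age cap for the in-flight fill, in cycles
    debt: int = 0

    def may_preempt(self, fill_age: int) -> bool:
        return self.debt < self.max_debt and fill_age < self.max_age

    def on_cycle(self, icache_bg_busy: bool, preempted: bool) -> None:
        if preempted:
            self.debt += 1
        elif icache_bg_busy and self.debt > 0:
            self.debt -= 1  # the fill ran unpreempted this cycle: repay one unit


def arbitrate(lim: PreemptDebtLimiter, prefetch_useful: bool, icache_bg_busy: bool,
              dcache_bg_busy: bool, fill_age: int) -> tuple[bool, bool]:
    """Return (issue, preempted) for one cycle and update the limiter."""
    if not prefetch_useful or dcache_bg_busy:
        issue = False                      # D-cache background fill still blocks
    elif not icache_bg_busy:
        issue = True                       # quiet background: no preemption needed
    else:
        issue = lim.may_preempt(fill_age)  # preempt only within the age/debt budget
    preempted = issue and icache_bg_busy
    lim.on_cycle(icache_bg_busy, preempted)
    return issue, preempted
```

With `max_debt=0` this reduces to the quiet-I-cache guard, and with very large limits it behaves like P136, so the rollback and the new rung sit on one knob.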

P136 cycles by workload phase:

| phase | cycles | share |
|---|---|---|
| kernel banner to /init | 117,366,189 | 52.9% |
| /init to shell banner | 1,083,891 | 0.5% |
| shell banner to first command | 35,990,549 | 16.2% |
| echo command | 1,649 | 0.0% |
| uname -a | 2,007,182 | 0.9% |
| ls /bin /usr/share | 33,073,949 | 14.9% |
| cat sample file | 2,725,174 | 1.2% |
| touch/write/cat/rm /tmp file | 11,954,163 | 5.4% |
| 8x ash loop with file I/O | 16,226,094 | 7.3% |
| final marker | 1,404,507 | 0.6% |

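The phase breakdown agrees with the headline table: the phases from `echo command` through `final marker` sum to exactly the 67,392,718-cycle P136 shell window, and each share is that phase's cycles divided by the sum of all ten phases (221,833,347 cycles, slightly below the 222,462,201 post-load cycles). A quick Python check:

```python
# P136 phase cycles, in table order (dicts preserve insertion order in Python 3.7+).
phases = {
    "kernel banner to /init": 117_366_189,
    "/init to shell banner": 1_083_891,
    "shell banner to first command": 35_990_549,
    "echo command": 1_649,
    "uname -a": 2_007_182,
    "ls /bin /usr/share": 33_073_949,
    "cat sample file": 2_725_174,
    "touch/write/cat/rm /tmp file": 11_954_163,
    "8x ash loop with file I/O": 16_226_094,
    "final marker": 1_404_507,
}

names = list(phases)
# The shell window starts at the first command ("echo command").
shell_window = sum(phases[n] for n in names[names.index("echo command"):])
total = sum(phases.values())
shares = {n: 100 * c / total for n, c in phases.items()}

print(f"shell window {shell_window:,} of {total:,} phase cycles")
```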
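
P136 cycles by core state:

| state | cycles | share |
|---|---|---|
| fetch | 7,646,785 | 3.4% |
| execute | 86,987,095 | 39.1% |
| mem | 30,054,083 | 13.5% |
| walker | 2,734,200 | 1.2% |
| writeback | 86,961,841 | 39.1% |
| mul/div | 8,076,493 | 3.6% |
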
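The state shares are consistent with each state's cycles divided by the 222,462,201 P136 post-load cycles. The listed states account for all but 1,704 of those cycles, and the writeback count equals the retired-instruction count exactly. A short Python check:

```python
POST_LOAD_CYCLES = 222_462_201  # P136 post-load cycles
RETIRED = 86_961_841            # P136 retired instructions

states = {
    "fetch": 7_646_785,
    "execute": 86_987_095,
    "mem": 30_054_083,
    "walker": 2_734_200,
    "writeback": 86_961_841,
    "mul/div": 8_076_493,
}

# Share of post-load cycles, rounded like the table.
shares = {s: round(100 * c / POST_LOAD_CYCLES, 1) for s, c in states.items()}
unaccounted = POST_LOAD_CYCLES - sum(states.values())

print(shares)
print(f"cycles not attributed to a listed state: {unaccounted:,}")
```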