P136 answered the obvious question but gave the wrong performance answer: unbounded I-cache-background preemption is functionally safe but too aggressive. P137 added the smallest fairness policy possible: one I-cache-background preemption may issue, then the next otherwise-safe I-cache-only candidate defers and resets the burst counter.
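The policy is small enough to model in a few lines. The following is a behavioral sketch in Python, not the actual RTL: the class name `BurstCapArbiter`, the `step` method, and the decision strings are hypothetical, and the choice to leave the counter untouched on cycles with no candidate is an assumption the text above does not settle.

```python
from dataclasses import dataclass


@dataclass
class BurstCapArbiter:
    """Hypothetical model of the P137 burst cap (not the real RTL)."""

    cap: int = 1    # preemptions allowed before a forced deferral
    burst: int = 0  # preemptions issued in the current burst

    def step(self, safe_icache_only: bool) -> str:
        """Decide one cycle given whether a safe I-cache-only candidate exists."""
        if not safe_icache_only:
            # Assumption: cycles without a candidate leave the counter alone.
            return "idle"
        if self.burst < self.cap:
            self.burst += 1
            return "preempt"
        # Cap reached: defer this candidate and reset the burst counter.
        self.burst = 0
        return "defer"


def run(arbiter: BurstCapArbiter, candidates: list[bool]) -> list[str]:
    """Feed a sequence of per-cycle candidate flags through the arbiter."""
    return [arbiter.step(c) for c in candidates]


decisions = run(BurstCapArbiter(), [True, True, True, False, True])
print(decisions)
```

With `cap=1`, back-to-back candidates alternate between preempting and deferring, which is why the cap behaves like a single bit of state.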
The shell workload passes. The aux-load queue handles 1,674,017 load responses with 0 drops, 0 errors, and 0 cancels.
Performance recovers most of P136’s regression. P136’s shell window was 67,392,718 cycles. P137 lands at 65,356,841 cycles, which is 2,035,877 cycles better than P136 and 55,098 cycles faster than P135.
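The cycle deltas can be cross-checked with a short calculation. The variable names here are illustrative, and the P135 shell window is not quoted directly in these notes; it is derived from the stated 55,098-cycle gap.

```python
# Figures quoted in the notes.
P136_SHELL = 67_392_718
P137_SHELL = 65_356_841
P137_LEAD_OVER_P135 = 55_098

# Improvement of P137 over P136.
gain_vs_p136 = P136_SHELL - P137_SHELL

# Derived, not quoted: P135's shell window implied by the 55,098-cycle gap.
p135_shell = P137_SHELL + P137_LEAD_OVER_P135

# How far P136 had regressed relative to P135.
p136_regression_vs_p135 = P136_SHELL - p135_shell

print(f"P137 gain over P136:     {gain_vs_p136:,} cycles")
print(f"Implied P135 window:     {p135_shell:,} cycles")
print(f"P136 regression vs P135: {p136_regression_vs_p135:,} cycles")
```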
That is not a breakthrough. P134 is still faster, and P137 still spends 29,389,375 cycles in S_MEM versus P135’s 27,975,703, which is 1,413,672 more. The useful lesson is direction: bounded arbitration helps, but a one-bit-ish burst cap is not expressive enough. The next step should be a debt or age counter.
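To make the debt idea concrete, here is one possible shape for it, again as a Python behavioral sketch. Nothing in it comes from a real P138 design: `DebtArbiter`, `limit`, `cost`, and the one-unit-per-cycle repayment rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DebtArbiter:
    """Hypothetical debt-counter arbiter (illustrative, not a real design)."""

    limit: int = 4  # assumed maximum outstanding debt
    cost: int = 2   # assumed debt charged per granted preemption
    debt: int = 0   # current outstanding debt

    def step(self, safe_icache_only: bool) -> str:
        """Decide one cycle given whether a safe I-cache-only candidate exists."""
        if safe_icache_only and self.debt + self.cost <= self.limit:
            # Grant the preemption and charge it against the budget.
            self.debt += self.cost
            return "preempt"
        # Assumption: every cycle without a grant repays one unit of debt.
        self.debt = max(0, self.debt - 1)
        return "defer" if safe_icache_only else "idle"


arbiter = DebtArbiter()
trace = [arbiter.step(True) for _ in range(6)]
print(trace)
```

Unlike the one-grant cap, this allows short bursts while still bounding long-run preemption share, and the two parameters give the policy a tunable range that a single bit cannot express. Whether that recovers the S_MEM cycles is something only a measured run can answer.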