journal 2026-05-05

P97: D-cache line-fill

P97 tried the obvious D-cache follow-up: four-word lines with critical-word-first demand loads and background fill for the rest of the line.

Functional result: PASS. Speed result: FAIL versus P96.

metric                P96           P97
shell window cycles   66,084,155    67,369,576
post-load cycles      221,522,958   222,850,787
memory stall cycles   59,418,375    60,295,642
load stall cycles     10,976,902    10,387,310
fetch stall cycles    26,676,104    29,593,757
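
To make the regression easier to read, here is a small sketch that computes the P97 minus P96 delta for each row. The numbers are copied from the table above; the names CYCLES and deltas are only illustrative and are not part of any existing tooling.

```python
# Cycle counts copied from the P96/P97 table above, as (P96, P97) pairs.
CYCLES = {
    "shell window": (66_084_155, 67_369_576),
    "post-load": (221_522_958, 222_850_787),
    "memory stall": (59_418_375, 60_295_642),
    "load stall": (10_976_902, 10_387_310),
    "fetch stall": (26_676_104, 29_593_757),
}


def deltas(table):
    """Return {metric: (P97 - P96, percent change relative to P96)}."""
    return {
        name: (p97 - p96, 100.0 * (p97 - p96) / p96)
        for name, (p96, p97) in table.items()
    }


if __name__ == "__main__":
    # Positive values mean P97 spent more cycles than P96.
    for name, (diff, pct) in deltas(CYCLES).items():
        print(f"{name:<13} {diff:>+12,} {pct:>+7.2f}%")
```

Load stalls drop by 589,592 cycles, but fetch stalls grow by 2,917,653, and the shell window ends up 1,285,421 cycles slower.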

The local cache counters improved:

counter            P96         P97
load hits          3,656,064   4,370,122
load misses        6,354,876   5,746,602
background fills   0           3,419,006
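
The same counters as a load hit rate, as a quick sketch. It assumes every load access is counted as exactly one hit or one miss, which the journal does not state explicitly; load_hit_rate is an illustrative helper, not an existing function.

```python
def load_hit_rate(hits, misses):
    """Fraction of load accesses that hit, assuming accesses = hits + misses."""
    return hits / (hits + misses)


# Counter values copied from the table above.
P96_RATE = load_hit_rate(3_656_064, 6_354_876)
P97_RATE = load_hit_rate(4_370_122, 5_746_602)

if __name__ == "__main__":
    print(f"P96 {P96_RATE:.1%}  P97 {P97_RATE:.1%}")
```

Under that assumption the hit rate moves from roughly 36.5% to roughly 43.2%. The hit-plus-miss totals differ slightly between the two runs (10,010,940 versus 10,116,724), so the rate is a fairer comparison than the raw counts.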

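So the line-fill geometry is useful: load hits rose and load stalls fell by about 0.59M cycles. But the background policy is too eager for a one-port memory system: fetch stalls grew by about 2.9M cycles, which suggests background fills are holding the port while the frontend waits, and that more than cancels the load-side gain. P98 should make background fill conditional on real frontend idleness.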
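
One possible shape for the P98 gate, as a behavioural sketch only. Everything here is an assumption rather than the actual design: the FillGate name, the per-cycle request signals, the priority order (demand load, then fetch, then fill), and the definition of "idle" as a run of consecutive cycles without a fetch request.

```python
class FillGate:
    """Hypothetical one-port arbiter that only lets a background fill
    through after the frontend has been quiet for a few cycles."""

    def __init__(self, idle_threshold=2):
        # Number of consecutive fetch-free cycles required before a
        # background fill may take the port (assumed tuning knob).
        self.idle_threshold = idle_threshold
        self.idle_cycles = 0

    def step(self, fetch_req, load_req, fill_pending):
        """Advance one cycle and return which requester gets the port."""
        # Any fetch request resets the idle streak.
        self.idle_cycles = 0 if fetch_req else self.idle_cycles + 1

        if load_req:
            return "load"   # demand load first (assumed priority)
        if fetch_req:
            return "fetch"
        if fill_pending and self.idle_cycles >= self.idle_threshold:
            return "fill"   # background fill only when the frontend is idle
        return "none"


if __name__ == "__main__":
    gate = FillGate(idle_threshold=2)
    # Pending fill, no demand traffic, then a fetch arrives in cycle 3.
    trace = [(False, False, True), (False, False, True), (True, False, True)]
    for fetch_req, load_req, fill_pending in trace:
        print(gate.step(fetch_req, load_req, fill_pending))
```

With a threshold of 2, the pending fill waits out the first quiet cycle, is granted on the second, and yields again as soon as a fetch request shows up. The threshold is the knob to sweep if P98 turns out too timid or still too eager.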