journal 2026-05-05

P96: D-cache v0

P96 took the data-side path after P95’s store-buffer miss. The new RTL starts from P94, adds a tiny direct-mapped word D-cache, and leaves stores ordered and write-through.
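
As a rough illustration of what a direct-mapped word cache with write-through stores does, here is a minimal Python sketch. It is not the P96 RTL: the class name `WordDCache`, the line count, and the store policy (full-word stores update a matching line, narrower stores invalidate it) are assumptions made only for this example.

```python
class WordDCache:
    """Toy direct-mapped cache, one 32-bit word per line (hypothetical model)."""

    def __init__(self, num_lines=64):  # line count is an assumption
        self.num_lines = num_lines
        self.valid = [False] * num_lines
        self.tag = [0] * num_lines
        self.data = [0] * num_lines
        self.stats = {"load_hits": 0, "load_misses": 0, "fills": 0,
                      "store_updates": 0, "invalidations": 0}

    def _split(self, addr):
        # Word address -> (line index, tag).
        word = addr >> 2
        return word % self.num_lines, word // self.num_lines

    def load(self, addr, mem):
        idx, tag = self._split(addr)
        if self.valid[idx] and self.tag[idx] == tag:
            self.stats["load_hits"] += 1
            return self.data[idx]
        # Miss: fetch the word from memory and fill the line.
        self.stats["load_misses"] += 1
        value = mem.get(addr & ~3, 0)
        self.valid[idx], self.tag[idx], self.data[idx] = True, tag, value
        self.stats["fills"] += 1
        return value

    def store(self, addr, value, mem, full_word=True):
        # Write-through: memory is always written. Byte merging is omitted,
        # so `value` is treated as the resulting whole word.
        mem[addr & ~3] = value
        idx, tag = self._split(addr)
        if self.valid[idx] and self.tag[idx] == tag:
            if full_word:
                self.data[idx] = value          # assumed: update on word store
                self.stats["store_updates"] += 1
            else:
                self.valid[idx] = False         # assumed: invalidate on narrow store
                self.stats["invalidations"] += 1


cache = WordDCache(num_lines=4)
mem = {0x00: 11, 0x10: 22}
cache.load(0x00, mem)                        # miss, fill
cache.load(0x00, mem)                        # hit
cache.load(0x10, mem)                        # conflict miss on the same line
cache.store(0x10, 33, mem)                   # word store updates the line
cache.load(0x10, mem)                        # hit, sees 33
cache.store(0x10, 44, mem, full_word=False)  # narrow store invalidates
cache.load(0x10, mem)                        # miss, refilled from memory
print(cache.stats)
```

In this model every load miss produces exactly one fill, which matches the equal miss and fill counts reported for P96.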

The shell profile passed:

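| metric              | P94         | P96         |
|---------------------|-------------|-------------|
| post-load cycles    | 222,459,202 | 221,522,958 |
| shell window cycles | 67,050,374  | 66,084,155  |
| CPI                 | 2.5669      | 2.5656      |
| memory stall cycles | 60,032,329  | 59,418,375  |
| load stall cycles   | 14,632,992  | 10,976,902  |
| fetch stall cycles  | 23,549,359  | 26,676,104  |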

D-cache counters:

| counter       | value      |
|---------------|------------|
| load hits     | 3,656,064  |
| load misses   | 6,354,876  |
| fills         | 6,354,876  |
| store updates | 10,473,803 |
| invalidations | 1,873,327  |
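
The counters imply a load hit rate, and a short Python calculation using only the values from the table makes it explicit. The variable names are illustrative.

```python
# D-cache counters copied from the P96 table.
load_hits = 3_656_064
load_misses = 6_354_876
fills = 6_354_876

loads = load_hits + load_misses
hit_rate = load_hits / loads

print(f"loads seen by the D-cache: {loads:,}")            # 10,010,940
print(f"load hit rate: {hit_rate:.1%}")                   # 36.5%
print(f"fills per load miss: {fills / load_misses:.2f}")  # 1.00
```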

This is a modest speed win, about 1.4% fewer shell window cycles, and a useful signal. Load stalls drop by about 25%, but fetch stalls rise by about 13%, leaving total memory stall cycles only about 1% lower, so the next data-cache rung should use multi-word lines with critical-word-first service and background fill. Blocking fill is already known-bad from P90.
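
The percentage changes can be rechecked from the P94 and P96 profile numbers. The helper `pct_change` is a hypothetical name for this sketch, not part of any profiling tooling.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (P94, P96) pairs copied from the shell profile table.
profile = {
    "post-load cycles": (222_459_202, 221_522_958),
    "shell window cycles": (67_050_374, 66_084_155),
    "memory stall cycles": (60_032_329, 59_418_375),
    "load stall cycles": (14_632_992, 10_976_902),
    "fetch stall cycles": (23_549_359, 26_676_104),
}

for name, (p94, p96) in profile.items():
    print(f"{name:<20} {pct_change(p94, p96):+6.1f}%")
```

Load stall cycles come out near -25.0% and fetch stall cycles near +13.3%, consistent with the figures quoted above.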