journal 2026-05-05

P101: Split ITLB/DTLB

P101 split the unified 8-entry TLB into separate 8-entry ITLB and DTLB banks. The page-table walker is still shared, but fetch/prefetch and LSU traffic no longer evict each other’s translations from one tiny array.
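A toy model of why the split helps, as a minimal sketch rather than the actual RTL behavior. It assumes fully associative banks with LRU replacement and a made-up access pattern of five fetch pages interleaved with five data pages; neither the replacement policy nor the working-set sizes come from the P101 measurements, and `LruTlb` and `run` are hypothetical names.

```python
from collections import OrderedDict


class LruTlb:
    """Toy fully associative TLB. LRU replacement is an assumption."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # key: (stream, vpn), ordered oldest to newest
        self.walks = 0            # every miss costs one page walk

    def lookup(self, key):
        if key in self.map:
            self.map.move_to_end(key)     # refresh recency on a hit
            return True
        self.walks += 1
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict the least recently used entry
        self.map[key] = True
        return False


def run(itlb, dtlb, rounds=100, pages_per_stream=5):
    """Interleave fetch and data accesses over two small working sets."""
    for _ in range(rounds):
        for page in range(pages_per_stream):
            itlb.lookup(("I", page))
            dtlb.lookup(("D", page))


# Unified: both streams share one 8-entry array (10 live pages, 8 slots).
unified = LruTlb(8)
run(unified, unified)

# Split: each stream gets its own 8-entry bank (5 live pages, 8 slots each).
itlb, dtlb = LruTlb(8), LruTlb(8)
run(itlb, dtlb)

print("unified walks:", unified.walks)
print("split walks:  ", itlb.walks + dtlb.walks)
```

Under these assumptions the unified array thrashes, because the combined working set exceeds eight entries, while each split bank takes only cold misses. The real workload is far less adversarial, so this shows the mechanism only and does not predict the measured walk counts.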

Functional result: PASS. Speed result versus P100: PASS.

| metric              |        P100 |        P101 |
|---------------------|-------------|-------------|
| shell window cycles |  66,518,626 |  63,777,267 |
| post-load cycles    | 221,990,140 | 217,630,965 |
| memory stall cycles |  59,819,129 |  58,999,994 |
| fetch stall cycles  |  27,346,150 |  27,158,844 |
| load stall cycles   |  10,729,427 |  10,999,370 |
| fetch page walks    |   1,122,943 |     674,678 |
| data page walks     |   1,218,084 |     666,522 |
| CPI                 |      2.5660 |      2.5297 |

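The shell window improved by 4.12% (66,518,626 to 63,777,267 cycles). Fetch page walks fell 39.92% and data page walks fell 45.28%. Load stall cycles were the one metric that regressed, rising 2.52% (10,729,427 to 10,999,370). Walk reductions of that size are a real result, not just instrumentation churn.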
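The percentages can be rechecked from the raw counters with a short script. The counter values are copied from the P100/P101 results; the `METRICS` and `pct_change` names are only for illustration.

```python
# (P100, P101) counter pairs copied from the results above.
METRICS = {
    "shell window cycles": (66_518_626, 63_777_267),
    "post-load cycles": (221_990_140, 217_630_965),
    "memory stall cycles": (59_819_129, 58_999_994),
    "fetch stall cycles": (27_346_150, 27_158_844),
    "load stall cycles": (10_729_427, 10_999_370),
    "fetch page walks": (1_122_943, 674_678),
    "data page walks": (1_218_084, 666_522),
    "CPI": (2.5660, 2.5297),
}


def pct_change(before, after):
    """Signed percent change from before to after (negative means lower)."""
    return (after - before) / before * 100.0


for name, (p100, p101) in METRICS.items():
    print(f"{name:22s} {pct_change(p100, p101):+7.2f}%")
```

Negative values are improvements for every row here, since all of them count cycles, walks, or cycles per instruction.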

The remaining issue is now clearer: the ITLB and DTLB banks are split, but misses still serialize through one walker and the same lower memory service. P102 should probably be data-side buffering with forwarding, unless we decide to attack nonblocking miss tracking first.
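The serialization effect can be sketched with a small scheduling model. It assumes a single walker that services one miss at a time in arrival order with a fixed walk latency. The 20-cycle latency, the arrival times, and the `serialize_walks` helper are all hypothetical, not measured values or the real arbitration policy.

```python
def serialize_walks(requests, walk_latency):
    """Schedule TLB misses through one blocking walker, in arrival order.

    requests: list of (arrival_cycle, source) tuples.
    Returns a list of (source, start_cycle, done_cycle, wait_cycles).
    """
    busy_until = 0
    schedule = []
    for arrival, source in sorted(requests):
        start = max(arrival, busy_until)  # wait if the walker is still busy
        done = start + walk_latency
        schedule.append((source, start, done, start - arrival))
        busy_until = done
    return schedule


# Hypothetical case: an ITLB miss at cycle 0 and a DTLB miss at cycle 2.
for source, start, done, wait in serialize_walks(
    [(0, "ITLB"), (2, "DTLB")], walk_latency=20
):
    print(f"{source}: start={start} done={done} waited={wait}")
```

In this toy case the DTLB miss waits 18 cycles behind the ITLB walk even though the two banks no longer share entries, which is the kind of overlap a nonblocking miss tracker would aim to recover.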