P101: Split ITLB/DTLB

P101 split the unified 8-entry TLB into separate 8-entry ITLB and DTLB banks. The page-table walker is still shared, but fetch/prefetch and LSU traffic no longer evict each other’s translations from one tiny array.
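The interference that the split removes can be illustrated with a toy model. This is a minimal sketch, not the P101 RTL: it assumes fully associative banks with LRU replacement (the note does not state the real policy), and the `Tlb` class, `count_walks` helper, and synthetic trace are all hypothetical. It only shows the mechanism and does not reproduce the measured numbers.

```python
from collections import OrderedDict


class Tlb:
    """Toy fully associative TLB. LRU replacement is an assumption."""

    def __init__(self, entries=8):
        self.entries = entries
        self.lines = OrderedDict()  # vpn -> True, oldest entry first
        self.walks = 0              # page walks triggered by misses

    def lookup(self, vpn):
        if vpn in self.lines:
            self.lines.move_to_end(vpn)     # refresh LRU position on a hit
            return True
        self.walks += 1                     # miss: one page walk
        if len(self.lines) >= self.entries:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[vpn] = True
        return False


def count_walks(trace, split):
    """Replay (kind, vpn) accesses through a unified or split TLB."""
    itlb = Tlb()
    dtlb = Tlb() if split else itlb  # unified: both streams share one array
    for kind, vpn in trace:
        (itlb if kind == "fetch" else dtlb).lookup(vpn)
    return itlb.walks + (dtlb.walks if split else 0)


# Synthetic trace: 5 code pages and 5 data pages, interleaved, 100 passes.
trace = []
for _ in range(100):
    for i in range(5):
        trace.append(("fetch", 0x100 + i))
        trace.append(("data", 0x200 + i))

unified_walks = count_walks(trace, split=False)
split_walks = count_walks(trace, split=True)
print(unified_walks, split_walks)
```

In this contrived trace the ten-page working set thrashes a single 8-entry LRU array on every access, while each 8-entry bank holds its own five pages after the cold misses. Real workloads sit between those extremes.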

Functional result: PASS. Speed result versus P100: PASS.

metric	P100	P101
shell window cycles	66,518,626	63,777,267
post-load cycles	221,990,140	217,630,965
memory stall cycles	59,819,129	58,999,994
fetch stall cycles	27,346,150	27,158,844
load stall cycles	10,729,427	10,999,370
fetch page walks	1,122,943	674,678
data page walks	1,218,084	666,522
CPI	2.5660	2.5297

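The shell window improved by 4.12%. Fetch walks fell 39.92%; data walks fell 45.28%. Load stall cycles are the one regression, up 2.52%, but memory stall cycles overall still fell 1.37%. That is a real result, not just instrumentation churn.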
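The percentages can be rechecked directly from the table. A minimal sketch: the counts are copied from the table above, while the `pct_change` helper and the dictionary names are only illustrative.

```python
# Counts copied from the P100/P101 table above.
P100 = {
    "shell window cycles": 66_518_626,
    "post-load cycles": 221_990_140,
    "memory stall cycles": 59_819_129,
    "fetch stall cycles": 27_346_150,
    "load stall cycles": 10_729_427,
    "fetch page walks": 1_122_943,
    "data page walks": 1_218_084,
    "CPI": 2.5660,
}
P101 = {
    "shell window cycles": 63_777_267,
    "post-load cycles": 217_630_965,
    "memory stall cycles": 58_999_994,
    "fetch stall cycles": 27_158_844,
    "load stall cycles": 10_999_370,
    "fetch page walks": 674_678,
    "data page walks": 666_522,
    "CPI": 2.5297,
}


def pct_change(old, new):
    """Signed percent change from old to new; negative means a reduction."""
    return (new - old) / old * 100.0


deltas = {name: pct_change(P100[name], P101[name]) for name in P100}

for name, delta in deltas.items():
    print(f"{name:<22} {delta:+7.2f}%")
```

Every metric comes out negative except load stall cycles, which is the only row that moved the wrong way.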

The remaining issue is now clearer: the ITLB and DTLB banks are split, but misses from both still serialize through the single shared walker and the same lower-level memory service. P102 should probably be data-side buffering with forwarding, unless we decide to attack nonblocking miss tracking first.
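The serialization point can be sketched as a single-server queue. This is a rough model under loud assumptions: the walker serves one miss at a time in arrival order (the real arbitration policy is not stated here), and the 20-cycle walk latency and the request list are made-up illustrative values, not measurements.

```python
def serve_walks(requests, walk_latency):
    """Serve (arrival_cycle, source) misses on one walker, in arrival order.

    Returns a list of (source, arrival, start, finish) tuples.
    In-order service is an assumption, not the documented arbitration policy.
    """
    free_at = 0
    served = []
    # Stable sort on arrival only, so ties keep their input order.
    for arrival, source in sorted(requests, key=lambda r: r[0]):
        start = max(arrival, free_at)  # wait if the walker is still busy
        finish = start + walk_latency
        served.append((source, arrival, start, finish))
        free_at = finish
    return served


# Hypothetical example: an ITLB and a DTLB miss in the same cycle,
# then a second ITLB miss five cycles later.
requests = [(0, "itlb"), (0, "dtlb"), (5, "itlb")]
served = serve_walks(requests, walk_latency=20)

queue_wait = sum(start - arrival for _, arrival, start, _ in served)
for source, arrival, start, finish in served:
    print(f"{source}: arrived {arrival}, started {start}, finished {finish}")
print("cycles spent waiting for the walker:", queue_wait)
```

In this model the DTLB miss waits a full walk behind the ITLB miss even though the two banks are independent, which is the cost that nonblocking miss tracking or a second walker would target.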