P101 split the unified 8-entry TLB into separate 8-entry ITLB and DTLB banks. The page-table walker is still shared, but fetch/prefetch and LSU traffic no longer evict each other’s translations from one tiny array.
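The eviction effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the P101 design: it assumes fully associative banks with LRU replacement (the replacement policy is not given here), and the `TLB` class, the `run` helper, and the synthetic trace are all hypothetical. Note that the split configuration also has twice the total entries (8 + 8 versus 8), so the model shows the combined effect of isolation and added capacity.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB. LRU replacement is an assumption."""

    def __init__(self, entries):
        self.entries = entries
        self.lines = OrderedDict()  # vpn -> True, ordered oldest to newest
        self.misses = 0

    def access(self, vpn):
        if vpn in self.lines:
            self.lines.move_to_end(vpn)  # refresh LRU position on a hit
            return True
        self.misses += 1  # each miss would trigger a page walk
        if len(self.lines) >= self.entries:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[vpn] = True
        return False


def run(trace, split):
    """Return total misses for a trace of (kind, vpn) pairs."""
    if split:
        itlb, dtlb = TLB(8), TLB(8)  # separate 8-entry banks
    else:
        itlb = dtlb = TLB(8)  # one shared 8-entry array
    for kind, vpn in trace:
        (itlb if kind == "I" else dtlb).access(vpn)
    return itlb.misses + dtlb.misses if split else itlb.misses


# Hypothetical workload: 6 code pages and 6 data pages, interleaved,
# repeated for 100 rounds. Page numbers are arbitrary.
trace = [
    (kind, vpn)
    for _ in range(100)
    for i in range(6)
    for kind, vpn in (("I", i), ("D", 100 + i))
]

unified_misses = run(trace, split=False)
split_misses = run(trace, split=True)
print("unified misses:", unified_misses)
print("split misses:  ", split_misses)
```

With this trace the unified array cycles through 12 distinct pages in 8 entries, so under LRU every access misses, while each split bank holds its 6 pages after the initial cold misses. Real traffic is far less adversarial, which is consistent with the measured walk reductions being large but well short of total.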
Functional result: PASS. Speed result versus P100: PASS.
| metric | P100 | P101 |
|---|---|---|
| shell window cycles | 66,518,626 | 63,777,267 |
| post-load cycles | 221,990,140 | 217,630,965 |
| memory stall cycles | 59,819,129 | 58,999,994 |
| fetch stall cycles | 27,346,150 | 27,158,844 |
| load stall cycles | 10,729,427 | 10,999,370 |
| fetch page walks | 1,122,943 | 674,678 |
| data page walks | 1,218,084 | 666,522 |
| CPI | 2.5660 | 2.5297 |
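The shell window improved by 4.12% and CPI fell from 2.5660 to 2.5297. Fetch walks fell 39.92%; data walks fell 45.28%. Load stall cycles are the one metric that regressed, rising 2.52%. The walk reductions are far too large to be instrumentation churn, so this is a real result.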
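The percentage changes can be rechecked directly from the table. The snippet below is a small sketch that copies the table values verbatim; the dictionary keys and the `pct_change` helper are names chosen for illustration only.

```python
# Values copied from the P100 / P101 table.
p100 = {
    "shell window cycles": 66_518_626,
    "post-load cycles": 221_990_140,
    "memory stall cycles": 59_819_129,
    "fetch stall cycles": 27_346_150,
    "load stall cycles": 10_729_427,
    "fetch page walks": 1_122_943,
    "data page walks": 1_218_084,
}
p101 = {
    "shell window cycles": 63_777_267,
    "post-load cycles": 217_630_965,
    "memory stall cycles": 58_999_994,
    "fetch stall cycles": 27_158_844,
    "load stall cycles": 10_999_370,
    "fetch page walks": 674_678,
    "data page walks": 666_522,
}


def pct_change(old, new):
    """Signed percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


for name in p100:
    print(f"{name:22s} {pct_change(p100[name], p101[name]):+7.2f}%")
```

Negative values are improvements for every metric listed, since each one counts cycles or walks.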
The remaining issue is now clearer: the ITLB and DTLB banks are split, but misses still serialize through one walker and the same lower memory service. P102 should probably be data-side buffering with forwarding, unless we decide to attack nonblocking miss tracking first.