P99 is the architecture-map rung before the Harvard push. It keeps the P98 RTL behavior unchanged, reruns the BusyBox shell profile, and records the shape the instruction/data split needs to take.
Functional result: PASS. Speed result versus P98: FAIL, as expected for a map rung with no performance RTL change; shell window cycles rose by about 1.4%.
| metric | P98 | P99 |
|---|---|---|
| shell window cycles | 66,055,345 | 66,998,698 |
| post-load cycles | 221,452,591 | 222,509,604 |
| memory stall cycles | 59,683,338 | 59,928,278 |
| fetch stall cycles | 27,286,526 | 27,399,253 |
| load stall cycles | 10,697,962 | 10,718,661 |
| D-cache background fills | 377,930 | 379,701 |
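
The size of each regression is easier to read as a delta than as two raw counts. The following is a small sketch that recomputes the absolute and relative change per metric from the table values; the names `METRICS`, `delta`, and `report` are illustrative and not part of the project tooling.

```python
# Cycle counts copied from the table above, as (P98, P99) pairs.
METRICS = {
    "shell window cycles": (66_055_345, 66_998_698),
    "post-load cycles": (221_452_591, 222_509_604),
    "memory stall cycles": (59_683_338, 59_928_278),
    "fetch stall cycles": (27_286_526, 27_399_253),
    "load stall cycles": (10_697_962, 10_718_661),
    "D-cache background fills": (377_930, 379_701),
}


def delta(p98, p99):
    """Return (absolute change, percent change relative to P98)."""
    diff = p99 - p98
    return diff, 100.0 * diff / p98


def report(metrics=METRICS):
    """Format one line per metric with signed absolute and percent change."""
    lines = []
    for name, (p98, p99) in metrics.items():
        diff, pct = delta(p98, p99)
        lines.append(f"{name}: {diff:+,} ({pct:+.2f}%)")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Every metric moves in the same direction (P99 is slightly higher than P98), and all of the relative changes are below 1.5%.
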

The map says the core already has I-cache and D-cache structures, separate fetch/LSU TLB lookup wires, named memory request classes, and a fetch queue. The missing part is independent instruction and data service. All request classes still collapse into one final mem_valid port.

P100 should split the near-core service into instruction-side and data-side request records, while keeping the shared lower memory underneath so that conflicts between the two sides can be counted explicitly.
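
As a rough behavioral illustration of that split (a Python model, not the RTL), the sketch below keeps one request record per side and counts the cycles in which both sides want the shared lower memory at once. All names here (`Side`, `Request`, `SharedLowerMemory`, `run`) are hypothetical, and the tie-break policy (data side wins) is an assumption made only so the model is deterministic; the source does not specify an arbitration rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Side(Enum):
    INSTR = "instr"
    DATA = "data"


@dataclass(frozen=True)
class Request:
    """One near-core request record (hypothetical fields)."""
    side: Side
    addr: int


class SharedLowerMemory:
    """Single lower port that serves at most one request per cycle."""

    def __init__(self):
        self.conflicts = 0  # cycles where both sides requested
        self.served = {Side.INSTR: 0, Side.DATA: 0}

    def cycle(self, instr: Optional[Request], data: Optional[Request]):
        """Grant one request and return it, or None if the port is idle."""
        if instr is not None and data is not None:
            self.conflicts += 1
            winner = data  # assumption: data side wins ties
        else:
            winner = instr if instr is not None else data
        if winner is not None:
            self.served[winner.side] += 1
        return winner


def run(trace):
    """Replay (instr_addr, data_addr) pairs; None means no request."""
    mem = SharedLowerMemory()
    for instr_addr, data_addr in trace:
        instr = None if instr_addr is None else Request(Side.INSTR, instr_addr)
        data = None if data_addr is None else Request(Side.DATA, data_addr)
        mem.cycle(instr, data)
    return mem


if __name__ == "__main__":
    mem = run([(0x100, 0x2000), (0x100, None), (None, None), (0x104, 0x2008)])
    print(mem.conflicts, mem.served)
```

In this model the losing side is expected to present its request again on a later cycle, which is why the second trace entry repeats the instruction address. Keeping the conflict counter at the shared port, rather than inside either side, mirrors the intent of counting conflicts explicitly at the lower level.
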