P99: Harvard I/D map

P99 is the architecture-map rung before the Harvard push. It keeps the P98 RTL behavior, reruns the BusyBox shell profile, and records what has to change for the instruction/data split.

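Functional result: PASS. Speed result versus P98: FAIL, as expected for a map rung with no performance RTL change. The shell window grew by 943,353 cycles (about 1.4%), and every counter in the table below is slightly higher than in P98.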

metric	P98	P99
shell window cycles	66,055,345	66,998,698
post-load cycles	221,452,591	222,509,604
memory stall cycles	59,683,338	59,928,278
fetch stall cycles	27,286,526	27,399,253
load stall cycles	10,697,962	10,718,661
D-cache background fills	377,930	379,701
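
The P98-to-P99 change for each row can be recomputed directly from the table. The sketch below is only arithmetic on the numbers above. The names METRICS and deltas are illustrative and are not part of the project tooling.

```python
# Cycle counts copied from the P98/P99 table: name -> (P98, P99).
METRICS = {
    "shell window cycles": (66_055_345, 66_998_698),
    "post-load cycles": (221_452_591, 222_509_604),
    "memory stall cycles": (59_683_338, 59_928_278),
    "fetch stall cycles": (27_286_526, 27_399_253),
    "load stall cycles": (10_697_962, 10_718_661),
    "D-cache background fills": (377_930, 379_701),
}


def deltas(metrics):
    """Return {name: (absolute change, percent change relative to P98)}."""
    return {
        name: (p99 - p98, 100.0 * (p99 - p98) / p98)
        for name, (p98, p99) in metrics.items()
    }


if __name__ == "__main__":
    # Print one tab-separated row per metric, matching the table layout.
    for name, (delta, pct) in deltas(METRICS).items():
        print(f"{name}\t{delta:+,}\t{pct:+.2f}%")
```

Every row comes out positive, which is consistent with the FAIL speed result: the shell window is up 943,353 cycles and the load stall count is up 20,699 cycles.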

The map shows that the core already has I-cache and D-cache structures, separate fetch/LSU TLB lookup wires, named memory request classes, and a fetch queue. What is missing is independent instruction and data service: all request classes still collapse into one final mem_valid port, so a fetch request and a data request cannot be serviced at the same time.
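
As a rough behavioral picture of that collapse, the sketch below models one shared port that grants a single request per cycle. It is not the RTL. The class names FETCH, LOAD, and STORE and the fixed priority order are assumptions made for illustration, since the map only says the classes are named.

```python
from enum import Enum


class ReqClass(Enum):
    # Hypothetical class names, not taken from the RTL.
    FETCH = "fetch"
    LOAD = "load"
    STORE = "store"


def single_port_cycle(pending):
    """Model one cycle of a single shared port (like one final mem_valid).

    `pending` lists the request classes asking this cycle, highest priority
    first (assumed fixed priority). At most one is granted; the rest stall.
    Returns (granted, stalled).
    """
    if not pending:
        return None, []
    return pending[0], list(pending[1:])


# A load and a fetch arrive together: only one gets the port.
granted, stalled = single_port_cycle([ReqClass.LOAD, ReqClass.FETCH])
```

In this model the fetch stalls whenever a load holds the port, even though the I-cache and D-cache are separate structures. That is the coupling the split is meant to remove.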

P100 should split the near-core service into instruction-side and data-side request records while keeping lower shared memory underneath for explicit conflict counting.
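
One way to picture the P100 shape is sketched below, under stated assumptions: separate instruction-side and data-side request records feed a shared lower memory that serves one request per cycle and counts the cycles in which both sides ask. The names Request and SharedLowerMemory and the data-side-wins policy are hypothetical. The actual P100 records and arbitration are not defined in this note.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical near-core request record; only an address is modeled."""
    addr: int


@dataclass
class SharedLowerMemory:
    """Shared lower level: one request served per cycle, conflicts counted."""
    conflicts: int = 0

    def cycle(self, i_req: Optional[Request], d_req: Optional[Request]) -> Optional[str]:
        # Both sides asking in the same cycle is an explicit conflict.
        if i_req is not None and d_req is not None:
            self.conflicts += 1
            return "data"  # assumed policy: data side wins the shared level
        if d_req is not None:
            return "data"
        if i_req is not None:
            return "instr"
        return None  # idle cycle


mem = SharedLowerMemory()
served = [
    mem.cycle(Request(0x1000), None),            # instruction side only
    mem.cycle(None, Request(0x8000)),            # data side only
    mem.cycle(Request(0x1004), Request(0x8008)), # both sides: conflict
]
```

Keeping the lower memory shared means the conflict counter measures how often instruction and data traffic actually collide, which is the number a later rung would need to justify splitting further down.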