P94 split the shared memory-port mux into named request classes and added want/grant/class counters to the Verilator harness. The physical memory model is still one shared RAM port; the difference is that the traffic is now visible.
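
As a rough illustration of what want/grant counters per class measure, here is a minimal behavioral sketch of a fixed-priority arbiter on one shared port. It is not the harness code: the class names, the priority order, and the `PortArbiter` helper are all assumptions made for the example, and "denied" is taken to mean a cycle in which a class requested the port and did not get it.

```python
from collections import Counter

# Hypothetical class names, highest priority first; the real P94 classes may differ.
PRIORITY = ["load", "store", "ifetch_demand", "icache_bg_fill"]


class PortArbiter:
    """Fixed-priority arbiter for one shared RAM port with per-class counters."""

    def __init__(self, priority=PRIORITY):
        self.priority = list(priority)
        self.want = Counter()    # cycles in which a class requested the port
        self.grant = Counter()   # cycles in which a class was given the port
        self.denied = Counter()  # cycles in which a class requested and lost

    def step(self, requests):
        """Advance one cycle; `requests` is the set of classes asking for the port."""
        # The highest-priority requester wins; None means the port is idle.
        winner = next((c for c in self.priority if c in requests), None)
        for c in requests:
            self.want[c] += 1
            if c == winner:
                self.grant[c] += 1
            else:
                self.denied[c] += 1
        return winner


arb = PortArbiter()
arb.step({"load", "icache_bg_fill"})  # load wins, background fill is denied
arb.step({"icache_bg_fill"})          # port is free, fill is granted
arb.step(set())                       # idle cycle, nothing counted
print("want:", dict(arb.want))
print("grant:", dict(arb.grant))
print("denied:", dict(arb.denied))
```

In this model `want == grant + denied` holds for every class, which is a cheap consistency check to run against real counters if they are defined the same way.
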
The shell smoke passed:
`P94 direct UART console + memory attribution smoke PASS`
The run is slightly slower than P93 (about 0.3% more post-load cycles and about 1.1% more shell-window cycles), so this is not a speed claim:
| metric | P93 | P94 |
|---|---|---|
| post-load cycles | 221,863,586 | 222,459,202 |
| shell window cycles | 66,342,842 | 67,050,374 |
| memory stall cycles | 59,886,452 | 60,032,329 |
| fetch stall cycles | 23,503,650 | 23,549,359 |
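
The size of the regression can be checked directly from the table. The sketch below only redoes that arithmetic; the `METRICS` dictionary and the `deltas` helper are names made up for the example, and the numbers are copied from the rows above.

```python
# Cycle counts copied from the table above, as (P93, P94) pairs.
METRICS = {
    "post-load cycles": (221_863_586, 222_459_202),
    "shell window cycles": (66_342_842, 67_050_374),
    "memory stall cycles": (59_886_452, 60_032_329),
    "fetch stall cycles": (23_503_650, 23_549_359),
}


def deltas(metrics):
    """Return {metric: (absolute delta, percent change relative to P93)}."""
    rows = {}
    for name, (before, after) in metrics.items():
        diff = after - before
        rows[name] = (diff, 100.0 * diff / before)
    return rows


for name, (diff, pct) in deltas(METRICS).items():
    print(f"{name:22s} {diff:+10,d}  {pct:+.2f}%")
```

Every metric moves up by roughly 0.2% to 1.1%, with the shell window showing the largest relative change.
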
The useful new counter is contention. Only background I-cache fill was denied service by higher-priority work:
| class | denied cycles |
|---|---|
| I-cache background fill | 4,542,004 |
| all foreground classes | 0 |
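
To put the denied count in proportion, it can be compared with the length of the run. This is a sketch using only figures reported in this note; `share_pct` is a made-up helper, and a denied background-fill cycle is not assumed to be a pipeline stall cycle.

```python
DENIED_BG_FILL = 4_542_004      # I-cache background fill denied cycles (P94)
POST_LOAD_CYCLES = 222_459_202  # P94 post-load cycles


def share_pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


print(f"background fill denied in {share_pct(DENIED_BG_FILL, POST_LOAD_CYCLES):.2f}% "
      "of post-load cycles")
```

That works out to about 2% of post-load cycles, all of it charged to the lowest-priority class.
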
That points P95 away from another pure frontend guess. No foreground class was ever denied the port, so the foreground load/store path is not losing cycles to arbitration; it is paying real shared-memory latency. A store buffer or a tiny D-cache is therefore the next sensible experiment.