journal 2026-05-05

P94: memory arbiter v0

P94 split the shared memory-port mux into named request classes and added want/grant/class counters to the Verilator harness. The physical memory model is still one shared RAM port; the difference is that the traffic is now visible.
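
To make the request-class split concrete, here is a minimal Python model of a fixed-priority arbiter for one shared port that keeps per-class want, grant, and denied counts. It is a sketch under stated assumptions, not the P94 RTL or harness. The class names and their priority order are hypothetical, there is one grant per cycle, requests are single-cycle, and "denied" is assumed to mean "wanted the port in a cycle where another class was granted it".

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical request classes, highest priority first. The journal only
# implies that background I-cache fill sits below the foreground classes.
PRIORITY = ("load", "store", "ifetch", "icache_bg_fill")


class ArbiterCounters:
    """Fixed-priority single-port arbiter with per-class want/grant/denied counters.

    Assumes one grant per cycle and single-cycle requests (no multi-cycle
    bursts), which is a simplification of a real memory port.
    """

    def __init__(self, priority=PRIORITY):
        self.priority = tuple(priority)
        self.want = Counter()
        self.grant = Counter()
        self.denied = Counter()

    def step(self, wanting: Iterable[str]) -> Optional[str]:
        """Arbitrate one cycle; return the granted class, or None if idle."""
        wanting = set(wanting)
        unknown = wanting - set(self.priority)
        if unknown:
            raise ValueError(f"unknown request class(es): {sorted(unknown)}")
        winner = None
        for cls in self.priority:
            if cls in wanting:
                self.want[cls] += 1
                if winner is None:
                    winner = cls
                    self.grant[cls] += 1
                else:
                    # Wanted the port this cycle, but a higher class won it.
                    self.denied[cls] += 1
        return winner


arb = ArbiterCounters()
arb.step({"ifetch", "icache_bg_fill"})  # ifetch wins; bg fill denied
arb.step({"icache_bg_fill"})            # bg fill granted on an otherwise idle port
arb.step({"load", "icache_bg_fill"})    # load wins; bg fill denied
arb.step(set())                         # idle cycle, no counters change
print(dict(arb.denied))  # {'icache_bg_fill': 2}
print(dict(arb.grant))   # {'ifetch': 1, 'icache_bg_fill': 1, 'load': 1}
```

In this model want always equals grant plus denied for each class, which could also serve as a cheap consistency check on the real harness counters, if they are defined the same way.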

The shell smoke passed:

P94 direct UART console + memory attribution smoke PASS

The run is slightly slower than P93, so this is not a speed claim:

metric                       P93          P94
post-load cycles     221,863,586  222,459,202
shell window cycles   66,342,842   67,050,374
memory stall cycles   59,886,452   60,032,329
fetch stall cycles    23,503,650   23,549,359
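
The size of the slowdown can be read off with per-metric deltas. This Python sketch only subtracts and divides the P93/P94 numbers copied from the table; it adds no new measurements.

```python
# Cycle counts copied from the P93/P94 table: (P93, P94).
RUNS = {
    "post-load cycles": (221_863_586, 222_459_202),
    "shell window cycles": (66_342_842, 67_050_374),
    "memory stall cycles": (59_886_452, 60_032_329),
    "fetch stall cycles": (23_503_650, 23_549_359),
}


def deltas(runs):
    """Return {metric: (absolute delta, percent change)} for P94 relative to P93."""
    return {
        name: (p94 - p93, 100.0 * (p94 - p93) / p93)
        for name, (p93, p94) in runs.items()
    }


for name, (d, pct) in deltas(RUNS).items():
    print(f"{name:20s} {d:+10,d} {pct:+6.2f}%")
```

Post-load cycles grow by 595,616 (about 0.27%) and the shell window by 707,532 (about 1.07%), which matches "slightly slower".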

The most useful new counter is contention, i.e. how many cycles each class was denied the port. Only background I-cache fill was denied service by higher-priority work:

class                    denied cycles
I-cache background fill      4,542,004
all foreground classes               0
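
To put the denied count in scale, it can be divided by a cycle window. The journal does not say which window the contention counter covers, so this Python sketch assumes, loudly, that it spans the same post-load window as the P94 post-load cycle count.

```python
# Denied-cycle counts from the P94 contention table.
DENIED = {
    "I-cache background fill": 4_542_004,
    "all foreground classes": 0,
}

# ASSUMPTION: the denied counter covers the same window as the P94
# post-load cycle count; the journal does not state the window.
POST_LOAD_CYCLES_P94 = 222_459_202


def denied_share(denied, window):
    """Fraction of the window in which each class was denied the port."""
    return {cls: n / window for cls, n in denied.items()}


shares = denied_share(DENIED, POST_LOAD_CYCLES_P94)
for cls, share in shares.items():
    print(f"{cls:24s} {share:7.2%}")
```

Under that assumption, background fill was denied in about 2.04% of post-load cycles. The foreground share is exactly zero whichever window is used.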

That points P95 away from another pure frontend guess. No foreground class was ever denied, so the load/store path is not losing arbitration; it is paying the real latency of the shared memory. A store buffer or tiny D-cache is therefore the next sensible experiment.
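
As a rough picture of what the store-buffer option would buy, here is a toy Python model, hypothetical and not a P95 design. Stores complete into a small FIFO immediately unless it is full, loads forward from the youngest matching buffered store, and the buffer drains one store to RAM on cycles when the shared port is otherwise idle. It assumes word-granular addresses, no byte masks, no I/O ordering rules, and uninitialized RAM reading as 0.

```python
from collections import deque
from typing import Dict, Optional


class StoreBuffer:
    """Toy word-granular store buffer with store-to-load forwarding (hypothetical)."""

    def __init__(self, depth: int, memory: Dict[int, int]):
        self.depth = depth
        self.memory = memory    # backing RAM model: word address -> value
        self.entries = deque()  # (addr, value); oldest at left, youngest at right

    def store(self, addr: int, value: int) -> bool:
        """Accept a store; return False (caller must stall) if the buffer is full."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((addr, value))
        return True

    def load(self, addr: int) -> int:
        """Forward from the youngest matching buffered store, else read RAM."""
        for a, v in reversed(self.entries):
            if a == addr:
                return v
        # A load that misses the buffer would still need the shared port.
        return self.memory.get(addr, 0)  # assumption: uninitialized RAM reads 0

    def drain_one(self) -> Optional[int]:
        """Retire the oldest store to RAM; meant for idle port cycles."""
        if not self.entries:
            return None
        addr, value = self.entries.popleft()
        self.memory[addr] = value
        return addr


ram: Dict[int, int] = {}
sb = StoreBuffer(depth=2, memory=ram)
sb.store(0x100, 1)
sb.store(0x100, 2)
print(sb.store(0x104, 3))  # False: buffer full, the store would stall
print(sb.load(0x100))      # 2: forwarded from the youngest buffered store
```

The model only hides store latency and forwarded loads; loads that miss the buffer still pay the shared-port latency, which is the case a tiny D-cache would target instead.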