journal 2026-05-05

First tiny I-cache for fetch stalls

P89 added a 256-word direct-mapped instruction cache to the P88/P87 Linux shell core. The cache is physically tagged and holds one 32-bit word per entry. It is checked both in the normal S_FETCH state and in the existing writeback prefetch path.
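A minimal Python model of the lookup, as a sketch only. The index/tag split (byte addresses, index taken from the low 8 bits of the word address) is an assumption and is not copied from the P89 RTL. The class and method names are hypothetical.

```python
# Sketch only: the address split below is assumed, not taken from the RTL.
NUM_ENTRIES = 256   # from the entry: 256 one-word entries
INDEX_BITS = 8      # log2(256)
WORD_SHIFT = 2      # assumed: byte addresses, 32-bit words


class TinyICache:
    """Direct-mapped, physically tagged, one 32-bit word per entry."""

    def __init__(self):
        self.valid = [False] * NUM_ENTRIES
        self.tag = [0] * NUM_ENTRIES
        self.data = [0] * NUM_ENTRIES

    @staticmethod
    def split(paddr):
        """Return (index, tag) for a physical byte address."""
        word = paddr >> WORD_SHIFT
        return word & (NUM_ENTRIES - 1), word >> INDEX_BITS

    def lookup(self, paddr):
        """Return the cached word on a hit, or None on a miss."""
        idx, tag = self.split(paddr)
        if self.valid[idx] and self.tag[idx] == tag:
            return self.data[idx]
        return None

    def fill(self, paddr, word):
        """Install one word, replacing whatever shared the same index."""
        idx, tag = self.split(paddr)
        self.valid[idx] = True
        self.tag[idx] = tag
        self.data[idx] = word & 0xFFFFFFFF
```

Under this assumed split, two addresses 0x400 bytes apart map to the same entry, so a fill for one evicts the other.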

The first run exposed a counter bug: icache_fetch_hit was not gated by state == S_FETCH, so the hit counter also counted tag matches in unrelated cycles. The RTL fetch path itself was still gated by state, so only the hit count in the benchmark JSON was wrong. After fixing the counter, the shell smoke test passed again.
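The counter bug can be sketched in Python as a per-cycle trace. This is an illustration, not the RTL: the trace format and the state names other than S_FETCH are hypothetical.

```python
# Only S_FETCH comes from the entry; S_EXEC and S_WB are hypothetical names.
S_FETCH, S_EXEC, S_WB = range(3)


def count_hits(trace, gated):
    """Count cache hits over a trace of (state, tag_match) cycles.

    gated=False models the buggy counter: any tag match is counted.
    gated=True models the fix: a match counts only in S_FETCH.
    """
    hits = 0
    for state, tag_match in trace:
        if tag_match and (state == S_FETCH or not gated):
            hits += 1
    return hits


trace = [(S_FETCH, True), (S_EXEC, True), (S_WB, True), (S_FETCH, False)]
print(count_hits(trace, gated=False))  # 3: counts matches in every state
print(count_hits(trace, gated=True))   # 1: counts only the S_FETCH match
```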

Result: fetch stalls improved, full workload did not. Fetch stall cycles went from 58,870,166 in P88 to 54,266,192 in P89, a 7.82% reduction. Total memory stalls fell 5.15%. But post-load cycles rose from 221,748,021 to 222,317,206 (about 0.26% worse), and the shell window regressed about 1%.
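The percentages can be rechecked from the raw cycle counts above. The helper name is hypothetical.

```python
def pct_change(before, after):
    """Signed percent change from before to after."""
    return (after - before) / before * 100.0


fetch = pct_change(58_870_166, 54_266_192)        # P88 -> P89 fetch stall cycles
post_load = pct_change(221_748_021, 222_317_206)  # P88 -> P89 post-load cycles

print(f"fetch stalls:     {fetch:+.2f}%")      # -7.82%
print(f"post-load cycles: {post_load:+.2f}%")  # +0.26%
```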

That is still a useful result. The cache is too crude: one-word lines mean every miss fills only a single entry, and whole-cache invalidation on every store discards everything the cache has learned. That is not enough for a Linux shell. P90 should try line fill before we blame the idea of an I-cache.
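A toy model of why invalidating on every store hurts. This is a sketch with a made-up workload (a straight-line loop with a store every few instructions), not a measurement of P89. The function name and parameters are hypothetical.

```python
def run_loop(loop_words, iterations, store_every, flush_on_store):
    """Return (hits, fetches) for a toy loop on a 256-entry one-word cache.

    Hypothetical workload: the loop body is loop_words instructions long
    and every store_every-th instruction is a store.
    """
    valid = [False] * 256
    tags = [0] * 256
    hits = fetches = 0
    for _ in range(iterations):
        for i in range(loop_words):
            idx, tag = i % 256, i // 256
            fetches += 1
            if valid[idx] and tags[idx] == tag:
                hits += 1
            else:
                # Miss: fill exactly one word (one-word lines).
                valid[idx], tags[idx] = True, tag
            if flush_on_store and (i + 1) % store_every == 0:
                # Whole-cache invalidation on a store.
                valid = [False] * 256
    return hits, fetches


print(run_loop(64, 10, 8, flush_on_store=False))  # (576, 640)
print(run_loop(64, 10, 8, flush_on_store=True))   # (0, 640)
```

In this toy case the loop fits in the cache, yet flushing on each store leaves no entry alive long enough to be reused. Real shell code will sit somewhere between the two extremes.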

Status: PASS for RTL shell smoke and fetch-stall reduction, FAIL for whole-workload speedup, NOT RUN for LibreLane hardening.