journal 2026-05-05

Blocking I-cache line fill

P90 tried the obvious next frontend step: convert P89’s one-word I-cache into a 4-word direct-mapped line cache. The implementation is blocking: after the critical miss word returns, the core enters S_IC_FILL and fetches the rest of the line before it executes the original instruction.

The smoke passed, which is good. The benchmark result is bad, which is also useful. I-cache hits nearly doubled, but the workload got much slower: the shell window went from 66,957,620 cycles in P89 to 84,084,195 cycles in P90, an increase of 17,126,575 cycles (about 25.6%). S_IC_FILL alone took 11,224,132 cycles, equal to about 65.5% of that increase.
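The arithmetic behind those numbers, as a quick check. The three cycle counts are copied from this entry; the variable names are mine, and nothing here says where the cycles outside S_IC_FILL went.

```python
# Cycle counts copied from the journal entry; names are illustrative only.
p89_cycles = 66_957_620    # shell window with the one-word I-cache (P89)
p90_cycles = 84_084_195    # shell window with the blocking 4-word line fill (P90)
fill_cycles = 11_224_132   # cycles spent in S_IC_FILL during the P90 run

delta = p90_cycles - p89_cycles                       # extra cycles in P90
slowdown_pct = 100 * delta / p89_cycles               # regression relative to P89
fill_vs_delta_pct = 100 * fill_cycles / delta         # fill cycles relative to the regression
fill_vs_total_pct = 100 * fill_cycles / p90_cycles    # fill cycles relative to the whole window

print(f"extra cycles:        {delta:,}")
print(f"slowdown vs P89:     {slowdown_pct:.1f}%")
print(f"S_IC_FILL vs delta:  {fill_vs_delta_pct:.1f}%")
print(f"S_IC_FILL vs window: {fill_vs_total_pct:.1f}%")
```

S_IC_FILL is about 13.3% of the P90 window, and it does not cover the whole regression: about 5.9M extra cycles are not attributed to any state in this entry.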

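This is exactly why frontend design is more than “add a cache.” The fill already fetches the critical word first; what it lacks is early restart. The line-fill policy needs to resume execution as soon as the critical word arrives, or be nonblocking outright. Blocking the whole core to fill a line makes a Linux shell workload worse even when the hit counter looks better.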
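A toy cycle model of how hits and cycles can move in opposite directions. This is a sketch, not the RTL. Everything in it is an assumption: the 10-cycle per-word memory latency, the 16-line cache, the one-entry model of the P89 cache, and the synthetic branchy trace. The "early restart" case is an idealized bound where the rest of the line arrives for free in the background.

```python
def simulate(trace, line_words, num_lines, mem_latency, blocking_fill):
    """Count hits and fetch cycles for a direct-mapped I-cache (toy model).

    A hit costs 1 cycle. A miss costs mem_latency for the critical word.
    With blocking_fill, the core also waits mem_latency for every other
    word in the line before it executes the instruction.
    """
    tags = [None] * num_lines
    hits = 0
    cycles = 0
    for addr in trace:
        line_addr = addr // line_words
        index = line_addr % num_lines
        if tags[index] == line_addr:
            hits += 1
            cycles += 1
        else:
            tags[index] = line_addr
            cycles += mem_latency                           # critical word
            if blocking_fill:
                cycles += (line_words - 1) * mem_latency    # rest of the line
    return hits, cycles


# Hypothetical branchy trace: execute two words, then jump to a new line.
trace = []
for i in range(1000):
    base = i * 8
    trace += [base, base + 1]

MEM_LATENCY = 10  # assumed cycles per word, not a measured value

one_word = simulate(trace, line_words=1, num_lines=1,
                    mem_latency=MEM_LATENCY, blocking_fill=False)
blocking = simulate(trace, line_words=4, num_lines=16,
                    mem_latency=MEM_LATENCY, blocking_fill=True)
early_restart = simulate(trace, line_words=4, num_lines=16,
                         mem_latency=MEM_LATENCY, blocking_fill=False)

for name, (hits, cycles) in [("one-word", one_word),
                             ("blocking fill", blocking),
                             ("early restart (ideal)", early_restart)]:
    print(f"{name:22s} hits={hits:5d} cycles={cycles:6d}")
```

On this made-up trace the blocking line cache goes from 0 to 1000 hits while cycles rise from 20,000 to 41,000, because half of every filled line is never executed. The idealized early-restart case keeps the same 1000 hits at 11,000 cycles. The real ratio depends on the actual fetch trace and memory timing, which this model does not capture.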

Status: PASS for RTL shell smoke, FAIL for whole-workload speedup, NOT RUN for LibreLane hardening.