P90 tried the obvious next frontend step: convert P89’s one-word I-cache
into a 4-word direct-mapped line cache. The implementation is blocking:
after the critical miss word returns, the core enters S_IC_FILL and
fetches the rest of the line before it executes the original instruction.
The smoke test passed, which is good. The benchmark result is bad, which
is also useful. I-cache hits nearly doubled, but the workload got much
slower. The shell window went from 66,957,620 cycles in P89 to
84,084,195 cycles in P90, about 25.6% more. S_IC_FILL alone took
11,224,132 cycles.
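
A quick sketch of the arithmetic behind those counters. The three cycle counts are the ones reported above; the variable names are mine. It also assumes every S_IC_FILL cycle is a stall that P89 did not pay, which seems fair since the state is new in P90, but that assumption is not measured here.

```python
# Cycle counts reported for the shell window.
p89_cycles = 66_957_620
p90_cycles = 84_084_195
fill_cycles = 11_224_132   # cycles spent in S_IC_FILL in P90

extra = p90_cycles - p89_cycles            # total regression in cycles
slowdown = extra / p89_cycles              # relative slowdown vs. P89
fill_share = fill_cycles / extra           # part of the regression spent filling
not_in_fill = extra - fill_cycles          # regression outside S_IC_FILL

print(f"extra cycles:            {extra:,}")
print(f"slowdown vs P89:         {slowdown:.1%}")
print(f"S_IC_FILL share of that: {fill_share:.1%}")
print(f"extra cycles elsewhere:  {not_in_fill:,}")
```

Under that assumption, S_IC_FILL covers roughly two thirds of the regression. The remaining cycles are not attributed to any state by the numbers in this note.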
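This is exactly why frontend design is more than “add a cache.” P90 already fetches the critical word first, but it does not restart the core until the whole line is in. The line-fill policy needs early restart after the critical word, or a nonblocking fill. Blocking the whole core to fill a line made this Linux shell workload worse even though the hit counter looks better.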
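
To make the "more hits, more cycles" effect concrete, here is a toy cycle model. It is not the RTL and does not reproduce the numbers above. Everything in it is an assumption for illustration: the 8-cycle per-word memory latency, the 64-entry cache, the synthetic trace that uses only two words of each 4-word line before jumping away, and the optimistic early-restart case where the background fill costs nothing.

```python
def simulate(trace, line_words, fill_penalty, num_lines=64):
    """Direct-mapped I-cache over a word-address trace; returns (cycles, hits)."""
    tags = [None] * num_lines
    cycles = hits = 0
    for addr in trace:
        line = addr // line_words
        index = line % num_lines
        cycles += 1                    # one cycle to deliver the instruction
        if tags[index] == line:
            hits += 1
        else:
            tags[index] = line
            cycles += fill_penalty     # core stalls for the whole penalty
    return cycles, hits


MISS_LATENCY = 8  # assumed cycles per memory word, not a measured value

# Hypothetical trace: 1024 distinct lines, two words fetched per line,
# then a jump to the next line (short basic blocks, no reuse).
trace = [line * 4 + word for line in range(1024) for word in range(2)]

# One-word cache: every miss costs one memory access.
one_word = simulate(trace, line_words=1, fill_penalty=MISS_LATENCY)
# Blocking 4-word fill: the core waits for all four words on every miss.
blocking = simulate(trace, line_words=4, fill_penalty=4 * MISS_LATENCY)
# Early restart (idealized): the core waits only for the critical word.
early_restart = simulate(trace, line_words=4, fill_penalty=MISS_LATENCY)

for name, (cycles, hits) in [("one-word", one_word),
                             ("blocking fill", blocking),
                             ("early restart", early_restart)]:
    print(f"{name:14s} cycles={cycles:6d} hits={hits}")
```

In this toy trace the blocking line cache turns half the fetches into hits and still takes almost twice as many cycles as the one-word cache, because it pays for two words per line that never get executed. The idealized early-restart case keeps the hits without the stall, which is the direction the paragraph above argues for.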
Status: PASS for RTL shell smoke, FAIL for whole-workload speedup, NOT RUN for LibreLane hardening.