P92 added a one-entry fetch queue to the Linux-capable core. The queue
is deliberately conservative: safe word-aligned next_pc fetches can
launch from S_EXECUTE, then S_WB consumes the queued instruction if
the PC still matches.
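
The fill-then-consume handshake described above can be sketched as a small behavioral model. This is only an illustration, not the core's RTL: the class name `FetchQueue`, the methods `try_fill` and `consume`, the opaque `safe` flag, and the choice to drop the entry on a PC mismatch are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchQueue:
    """Behavioral model of a one-entry fetch queue (hypothetical names)."""
    valid: bool = False
    pc: int = 0
    instr: int = 0
    fills: int = 0
    consumes: int = 0

    def try_fill(self, next_pc: int, instr: int, safe: bool = True) -> bool:
        """S_EXECUTE side: queue the next_pc fetch if it is allowed."""
        # Assumption: `safe` stands in for whatever conditions the real core
        # checks; only the word-alignment rule is taken from the notes.
        if self.valid or not safe or next_pc % 4 != 0:
            return False
        self.valid, self.pc, self.instr = True, next_pc, instr
        self.fills += 1
        return True

    def consume(self, pc: int) -> Optional[int]:
        """S_WB side: hand over the queued instruction only on a PC match."""
        if not self.valid:
            return None
        self.valid = False  # assumption: the single entry is freed either way
        if self.pc != pc:
            return None  # PC moved elsewhere, so the queued word is discarded
        self.consumes += 1
        return self.instr


q = FetchQueue()
q.try_fill(0x1004, 0x00000013)   # sequential fetch launched early
hit = q.consume(0x1004)          # PC still matches: instruction is used
q.try_fill(0x1008, 0x00000013)
miss = q.consume(0x2000)         # PC changed: returns None
```

With one entry there is nothing to look ahead with after a redirect, so every mismatch falls back to the normal fetch path.
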
The shell smoke test passed: `P92 direct UART console + memory attribution smoke PASS`
The result was mixed. Queue counters prove the mechanism is active:
| counter | value |
|---|---|
| queue valid cycles | 53,982,463 |
| queue fills | 53,982,463 |
| queue consumes | 53,982,463 |
| execute-prefetch cycles | 53,982,463 |
But the shell window is slower than P91:
| metric | P91 | P92 |
|---|---|---|
| post-load cycles | 221,327,811 | 222,624,131 |
| shell window cycles | 65,985,297 | 67,206,635 |
| fetch stall cycles | 55,533,555 | 23,555,005 |
| I-cache hits | 8,855,599 | 42,665,352 |
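
To make the comparison easier to read, the deltas can be computed directly from the table. The snippet below is a small sketch that only uses the numbers listed above; the helper name `delta` is made up for the example.

```python
from typing import Tuple

# Values copied from the P91/P92 table: (P91, P92).
metrics = {
    "post-load cycles": (221_327_811, 222_624_131),
    "shell window cycles": (65_985_297, 67_206_635),
    "fetch stall cycles": (55_533_555, 23_555_005),
    "I-cache hits": (8_855_599, 42_665_352),
}


def delta(p91: int, p92: int) -> Tuple[int, float]:
    """Return (absolute change, percent change relative to P91)."""
    diff = p92 - p91
    return diff, 100.0 * diff / p91


for name, (p91, p92) in metrics.items():
    diff, pct = delta(p91, p92)
    print(f"{name:22s} {diff:+14,d} {pct:+8.2f}%")
```

The shell window grows by 1,221,338 cycles even though fetch stall cycles drop by 31,978,550, so most of the saved stall time is being spent somewhere else.
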
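So the queue is not the wrong idea, but this one-entry implementation is not enough. Fetch stall cycles fell by about 58% (55,533,555 to 23,555,005) and I-cache hits rose roughly 4.8x, yet the shell window grew by 1,221,338 cycles (about 1.9%). The queue cuts fetch-class stalls, but the core is still tangled around one memory path and has no branch prediction, so the saved stalls do not show up as a shorter shell window.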
Next: P93 branch predictor v0.