journal 2026-05-06

P111 nonblocking load aux

P111 made the P110 load owner real. An aligned integer D-cache miss can now consume AUX_OWNER_LOAD from the auxiliary lower-bank lane while the main port services a safe next-PC instruction prefetch.

The shell smoke test passed and reached P111-FILE-OK. The important counter is aux_response_slot.owner_counts.load = 3,545,688, with zero aux errors and zero cancels, so the aux lane is really being used for loads.

The speed result is not good yet; P111 is about 1.6% slower than P110 on the shell window:

P110 shell window: 63,761,231 cycles
P111 shell window: 64,766,712 cycles
delta:             +1,005,481 cycles
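
As a sanity check on the numbers above, the delta and the relative slowdown can be recomputed from the two window counts. The variable names are made up for this note; only the cycle counts come from the runs.

```python
# Cycle counts copied from the P110 and P111 shell windows above.
p110_cycles = 63_761_231
p111_cycles = 64_766_712

# Positive delta means P111 took more cycles than P110.
delta = p111_cycles - p110_cycles
pct = 100.0 * delta / p110_cycles

print(f"delta: {delta:+,} cycles ({pct:+.2f}%)")
# prints: delta: +1,005,481 cycles (+1.58%)
```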

So P111 is a functionality PASS and a speedup FAIL: the overlap works, but right now it costs more cycles than it saves. The next work should not throw away the load owner; it should add an MSHR-like queue/policy boundary so this overlap only fires when it actually helps.
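
A rough sketch of what such a boundary could look like, as a software model only. Everything here is hypothetical: the MissQueue name, the capacity and line size, and the three gating rules are placeholder assumptions for discussion, not the planned design and not anything measured in P111.

```python
from dataclasses import dataclass, field


@dataclass
class MissQueue:
    """Hypothetical MSHR-like table: a bounded set of outstanding miss lines."""
    capacity: int = 2        # assumed number of entries
    line_bytes: int = 32     # assumed cache line size
    pending: set = field(default_factory=set)

    def line_of(self, addr: int) -> int:
        return addr // self.line_bytes

    def may_overlap(self, load_addr: int, prefetch_addr: int) -> bool:
        """Placeholder policy: allow the aux-lane load only if it could help."""
        load_line = self.line_of(load_addr)
        if load_line in self.pending:
            return False     # line already in flight; do not issue it again
        if len(self.pending) >= self.capacity:
            return False     # no free entry to track the miss
        if self.line_of(prefetch_addr) == load_line:
            return False     # prefetch targets the same line; nothing to overlap
        return True

    def issue(self, load_addr: int) -> None:
        self.pending.add(self.line_of(load_addr))

    def retire(self, load_addr: int) -> None:
        self.pending.discard(self.line_of(load_addr))


q = MissQueue(capacity=1)
first = q.may_overlap(0x100, 0x200)      # free entry, different lines
q.issue(0x100)
same_line = q.may_overlap(0x104, 0x200)  # same line as the pending miss
full = q.may_overlap(0x300, 0x200)       # queue is at capacity
q.retire(0x100)
after_retire = q.may_overlap(0x300, 0x200)
print(first, same_line, full, after_retire)
```

The point of keeping the policy in one function is that the real gating rules can be swapped in and compared against the P110/P111 cycle counts without touching the load owner itself.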