P111 made the P110 load owner real. An aligned integer D-cache miss can
now consume AUX_OWNER_LOAD from the auxiliary lower-bank lane while the
main port services a safe next-PC instruction prefetch.
The shell smoke test passed and reached P111-FILE-OK. The important
counter is aux_response_slot.owner_counts.load = 3,545,688, with zero
aux errors and zero cancels.
The speed result is a regression, roughly 1.58% more cycles than P110:
P110 shell window: 63,761,231 cycles
P111 shell window: 64,766,712 cycles
delta: +1,005,481 cycles
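
As a quick check on the figures above, the delta and the relative
slowdown can be recomputed from the two shell-window counts. This is
only an illustrative sketch; the constant and function names are
made up for this note and do not come from the project.

```python
# Cycle counts copied from the shell-window measurements in this note.
P110_CYCLES = 63_761_231
P111_CYCLES = 64_766_712


def cycle_delta(before: int, after: int) -> tuple[int, float]:
    """Return (absolute delta, percent change) of `after` relative to `before`."""
    delta = after - before
    return delta, 100.0 * delta / before


delta, pct = cycle_delta(P110_CYCLES, P111_CYCLES)
# Positive values mean P111 took more cycles than P110.
print(f"delta: {delta:+,} cycles ({pct:+.2f}%)")
```

A positive delta here means P111 is slower over the same window.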
So P111 is a functional PASS and a speedup FAIL. The next work should
not throw away the load owner; it should add an MSHR-like queue/policy
boundary so this overlap only fires when it actually helps.
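
One possible shape for that boundary, as a small behavioral model
rather than the real design: a bounded queue of outstanding misses plus
a predicate that decides whether the aux-lane overlap may fire. All
names here (MissEntry, OverlapGate, capacity, prefetch_safe) are
hypothetical, and the predicate is a placeholder, since this note does
not yet say which conditions make the overlap profitable.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MissEntry:
    """Hypothetical record for one D-cache load miss."""
    addr: int
    aligned: bool        # aligned integer load
    prefetch_safe: bool  # a safe next-PC prefetch is available to overlap


class OverlapGate:
    """MSHR-like bounded queue that gates the aux-lane overlap."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.pending = deque()
        self.fired = 0
        self.skipped = 0

    def request(self, entry: MissEntry) -> bool:
        """Return True if the overlap should fire for this miss."""
        # Placeholder policy: the real "does it help" test is an open question.
        allowed = (
            entry.aligned
            and entry.prefetch_safe
            and len(self.pending) < self.capacity
        )
        if allowed:
            self.pending.append(entry)
            self.fired += 1
        else:
            self.skipped += 1
        return allowed

    def complete(self):
        """Retire the oldest outstanding miss, or return None if empty."""
        return self.pending.popleft() if self.pending else None


gate = OverlapGate(capacity=1)
first = gate.request(MissEntry(0x1000, aligned=True, prefetch_safe=True))
second = gate.request(MissEntry(0x1008, aligned=True, prefetch_safe=True))
print(first, second, gate.fired, gate.skipped)
```

Keeping the fired and skipped counts separate would make it easy to
compare a candidate policy against the owner_counts.load counter.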