Strict execute-prefetch guard · librelane-playground

P147 is the final narrow execute-prefetch repair check promised by P146. It activates one composite predicate:

`predicted_not_taken && word_not_last && quiet_backend`

When the guard fires, execute-prefetch repair gets a two-word budget. Otherwise it keeps the P144 one-word budget.
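The budget rule can be sketched as a small Python model. This is a hypothetical illustration, not the project's RTL: `FetchContext`, `strict_guard`, and `repair_budget_words` are invented names, and the comments on each input describe assumed meanings of the three predicate signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchContext:
    # Hypothetical inputs: names follow the predicate, meanings are assumed.
    predicted_not_taken: bool  # assumed: predictor expects fall-through
    word_not_last: bool        # assumed: fetched word is not the last of its line
    quiet_backend: bool        # assumed: backend has no competing memory traffic


def strict_guard(ctx: FetchContext) -> bool:
    """The P147 composite predicate: all three conditions must hold."""
    return ctx.predicted_not_taken and ctx.word_not_last and ctx.quiet_backend


def repair_budget_words(ctx: FetchContext) -> int:
    """Two-word repair budget when the guard fires, else the P144 one-word budget."""
    return 2 if strict_guard(ctx) else 1


if __name__ == "__main__":
    print(repair_budget_words(FetchContext(True, True, True)))   # 2
    print(repair_budget_words(FetchContext(True, True, False)))  # 1
```

Because the guard is a strict AND, clearing any single input falls back to the one-word budget.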

check	result
Verilator build	PASS
BusyBox userspace build	PASS
Linux image rebuilt with P147 initramfs	PASS
BusyBox shell workload reaches `P147-FILE-OK`	PASS
Aux-load queue full drops	PASS
Aux response errors/cancels	PASS
Hardened layout	NOT RUN

metric	P142	P144	P146	P147
post-load cycles	218,856,863	219,161,535	220,667,218	219,782,307
shell window cycles	63,926,691	64,192,833	65,663,039	64,812,561
retired instructions	85,799,045	85,905,855	86,410,376	86,122,202
CPI	2.5508	2.5512	2.5537	2.5520
S_FETCH cycles	7,598,133	7,601,896	7,642,687	7,620,860
S_MEM cycles	29,199,830	29,215,063	29,435,054	29,306,905

Measured in shell-window cycles, P147 improves on P146 by 850,478 but still loses to P144 by 619,728 and to P142 by 885,870. That makes it an RTL PASS and a speed FAIL.
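These comparisons can be recomputed from the metrics table. The short Python check below uses only numbers copied from that table. It shows that the 850,478 / 619,728 / 885,870 figures are shell-window differences, gives the corresponding post-load differences, and confirms that the CPI column equals post-load cycles divided by retired instructions. The dictionary and function names are illustrative.

```python
# Numbers copied from the metrics table (P142, P144, P146, P147).
SHELL_WINDOW = {"P142": 63_926_691, "P144": 64_192_833, "P146": 65_663_039, "P147": 64_812_561}
POST_LOAD = {"P142": 218_856_863, "P144": 219_161_535, "P146": 220_667_218, "P147": 219_782_307}
RETIRED = {"P142": 85_799_045, "P144": 85_905_855, "P146": 86_410_376, "P147": 86_122_202}


def delta(cycles: dict, baseline: str, candidate: str = "P147") -> int:
    """Candidate minus baseline; negative means the candidate is faster."""
    return cycles[candidate] - cycles[baseline]


def cpi(run: str) -> float:
    """Cycles per instruction over the post-load window."""
    return POST_LOAD[run] / RETIRED[run]


for base in ("P146", "P144", "P142"):
    print(f"P147 vs {base}: shell {delta(SHELL_WINDOW, base):+,} "
          f"post-load {delta(POST_LOAD, base):+,}")
for run in POST_LOAD:
    print(f"{run} CPI {cpi(run):.4f}")
```

In the post-load window, the same comparison gives 884,911 cycles saved against P146 and 620,772 and 925,444 cycles lost against P144 and P142, so the speed verdict does not depend on which window is used.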

predicate	opportunities	fills	first hits	repeat hits	first+repeat / fill
`seq_adjacent`	63,419	27,079	0	0	0.00%
`word_not_last`	22,111,120	35,052,260	758,121	259,095	2.90%
`uncompressed_not_last`	21,322,755	33,848,167	225,827	48,353	0.81%
`predicted_not_taken`	20,165,355	33,108,902	737,761	251,896	2.99%
`quiet_backend`	24,924,183	32,829,051	636,269	219,427	2.61%
`strict_composite`	16,516,160	29,570,243	620,389	216,422	2.83%
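The last column is (first hits + repeat hits) / fills, that is, hits per fill. A short Python recomputation from the table's counts makes the ranking explicit; `PREDICATES`, `hits_per_fill`, and `ratios` are illustrative names.

```python
# Counts copied from the predicate table:
# name: (opportunities, fills, first hits, repeat hits)
PREDICATES = {
    "seq_adjacent": (63_419, 27_079, 0, 0),
    "word_not_last": (22_111_120, 35_052_260, 758_121, 259_095),
    "uncompressed_not_last": (21_322_755, 33_848_167, 225_827, 48_353),
    "predicted_not_taken": (20_165_355, 33_108_902, 737_761, 251_896),
    "quiet_backend": (24_924_183, 32_829_051, 636_269, 219_427),
    "strict_composite": (16_516_160, 29_570_243, 620_389, 216_422),
}


def hits_per_fill(fills: int, first: int, repeat: int) -> float:
    """(first + repeat) / fills, guarding against an empty fill count."""
    return (first + repeat) / fills if fills else 0.0


ratios = {
    name: hits_per_fill(fills, first, repeat)
    for name, (_opps, fills, first, repeat) in PREDICATES.items()
}

# Highest reuse first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {100 * ratio:5.2f}%")
```

Sorted this way, `strict_composite` (2.83%) ranks below `predicted_not_taken` (2.99%) and `word_not_last` (2.90%): intersecting the three conditions cut fills but did not lift reuse above the best single predicate.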

The active guard still spends too much bandwidth for too little fetch reuse. Against P144, it adds 8.09M background repair fills and 14.64M prefetch second-word grants but only 191,040 extra first+repeat hits.
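The bandwidth-versus-reuse argument can be made concrete with the rounded P144 deltas quoted above. This is back-of-the-envelope arithmetic on those three numbers only; the variable names are illustrative.

```python
# Rounded P147-vs-P144 deltas quoted in the text.
extra_fills = 8.09e6     # additional background repair fills
extra_grants = 14.64e6   # additional prefetch second-word grants
extra_hits = 191_040     # additional first + repeat hits

hits_per_extra_fill = extra_hits / extra_fills
hits_per_extra_grant = extra_hits / extra_grants
fills_per_extra_hit = extra_fills / extra_hits

print(f"{100 * hits_per_extra_fill:.2f}% hits per extra fill")
print(f"{100 * hits_per_extra_grant:.2f}% hits per extra grant")
print(f"{fills_per_extra_hit:.0f} extra fills per extra hit")
```

At roughly 42 extra repair fills per extra hit (about 2.4% marginal reuse), the additional traffic buys very little fetch reuse.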

class	repair fills	first hits	repeat hits	first+repeat ratio
demand fetch	931,741	438,523	295,535	78.78%
execute prefetch	37,205,331	759,054	259,109	2.74%
load prefetch	2,423,233	132,406	13,083	6.00%
writeback prefetch	396,342	169,072	52,954	56.02%
aux prefetch	109,673	36,639	24,194	55.47%
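Putting the class table on a common scale shows where the repair bandwidth goes. This Python sketch uses only the table's counts; `CLASSES`, `fill_share`, and `hit_share` are illustrative names.

```python
# Counts copied from the class table: (repair fills, first hits, repeat hits)
CLASSES = {
    "demand fetch": (931_741, 438_523, 295_535),
    "execute prefetch": (37_205_331, 759_054, 259_109),
    "load prefetch": (2_423_233, 132_406, 13_083),
    "writeback prefetch": (396_342, 169_072, 52_954),
    "aux prefetch": (109_673, 36_639, 24_194),
}

total_fills = sum(fills for fills, _, _ in CLASSES.values())
total_hits = sum(first + repeat for _, first, repeat in CLASSES.values())


def fill_share(name: str) -> float:
    """Fraction of all repair fills issued by one class."""
    return CLASSES[name][0] / total_fills


def hit_share(name: str) -> float:
    """Fraction of all first + repeat hits earned by one class."""
    _, first, repeat = CLASSES[name]
    return (first + repeat) / total_hits


for name in CLASSES:
    print(f"{name:<20} fills {100 * fill_share(name):5.1f}%  "
          f"hits {100 * hit_share(name):5.1f}%")
```

Execute prefetch issues about 90.6% of all repair fills but earns only about 46.7% of first+repeat hits; it is the only class whose share of fills exceeds its share of hits.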

This closes the execute-prefetch second-word repair thread for now. The next architecture rung should pivot to a different frontend/memory bottleneck.

Shell phases: P147 shell workload, 219,782,307 cycles, CPI 2.55

phase	cycles	share
kernel banner to /init	117,338,268	53.5%
/init to shell banner	1,086,082	0.5%
shell banner to first command	35,916,577	16.4%
echo command	1,649	0%
uname -a	2,457,747	1.1%
ls /bin /usr/share	32,296,246	14.7%
cat sample file	3,052,234	1.4%
touch/write/cat/rm /tmp file	10,694,437	4.9%
8x ash loop with file I/O	16,309,568	7.4%
final marker	680	0%
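The phase shares appear to be taken relative to the sum of the listed phases (219,153,488 cycles) rather than the 219,782,307-cycle workload total, which leaves 628,819 cycles outside any listed phase. For example, the kernel-banner-to-/init phase is 53.5% of the phase sum but 53.4% of the workload total. A short Python check over the copied phase counts; `PHASES` and `WORKLOAD_CYCLES` are illustrative names.

```python
# Phase cycle counts copied from the P147 shell-phase breakdown.
PHASES = [
    ("kernel banner to /init", 117_338_268),
    ("/init to shell banner", 1_086_082),
    ("shell banner to first command", 35_916_577),
    ("echo command", 1_649),
    ("uname -a", 2_457_747),
    ("ls /bin /usr/share", 32_296_246),
    ("cat sample file", 3_052_234),
    ("touch/write/cat/rm /tmp file", 10_694_437),
    ("8x ash loop with file I/O", 16_309_568),
    ("final marker", 680),
]
WORKLOAD_CYCLES = 219_782_307  # workload total quoted with the breakdown

phase_total = sum(cycles for _, cycles in PHASES)
unattributed = WORKLOAD_CYCLES - phase_total

# Shares relative to the phase sum reproduce the published column.
for name, cycles in PHASES:
    print(f"{name:<30} {cycles:>12,} {100 * cycles / phase_total:5.1f}%")
print(f"phase total {phase_total:,}; unattributed {unattributed:,}")
```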

State breakdown: P147 strict execute-prefetch guard workload, 219,782,307 cycles, CPI 2.55

state	cycles	share
fetch	7,620,860	3.5%
execute	86,147,006	39.2%
mem	29,585,503	13.5%
walker	2,675,861	1.2%
writeback	86,122,202	39.2%
mul/div	7,629,159	3.5%
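For the state breakdown, the shares match the 219,782,307-cycle workload total, and the six state counts sum to 219,780,591 cycles, 1,716 short of it. The writeback count also equals the 86,122,202 retired instructions in the metrics table. A Python check over the copied counts; `STATES` and the constant names are illustrative.

```python
# State cycle counts copied from the P147 state breakdown.
STATES = {
    "fetch": 7_620_860,
    "execute": 86_147_006,
    "mem": 29_585_503,
    "walker": 2_675_861,
    "writeback": 86_122_202,
    "mul/div": 7_629_159,
}
WORKLOAD_CYCLES = 219_782_307
RETIRED_INSTRUCTIONS = 86_122_202  # from the metrics table

state_total = sum(STATES.values())

# Shares here are relative to the full workload total.
for name, cycles in STATES.items():
    print(f"{name:<10} {cycles:>12,} {100 * cycles / WORKLOAD_CYCLES:5.1f}%")
print(f"state total {state_total:,}; gap {WORKLOAD_CYCLES - state_total:,}")
print("writeback == retired:", STATES["writeback"] == RETIRED_INSTRUCTIONS)
```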