Banked lower-memory contract · librelane-playground

P106 is the first rung in the lower-memory banking arc that actually widens the top-level boundary. P105 only modeled the extra grants that are safe because they target a different bank. P106 emits those requests on a top-level auxiliary read lane, and the Verilator memory model services them against the real boot image.

The core does not consume the auxiliary response yet, so this is not a speed rung.

Result

check	result
`make check-tools`	PASS
Verilator build	PASS
Linux reaches `/init`	PASS
BusyBox prompt	PASS
BusyBox shell workload reaches `P106-FILE-OK`	PASS
Auxiliary lower-bank read lane serviced	PASS
Zero auxiliary read errors	PASS
Hardened layout	NOT RUN

Timing

metric	P105 model	P106 contract
post-load cycles	218,480,625	219,613,584
shell window cycles	64,438,096	65,558,077
retired instructions	86,106,731	86,478,207
CPI	2.5373	2.5395
BusyBox ready milestone	118,422,909	118,413,096
shell `FILE-OK` milestone	218,480,768	219,613,727
kernel panic milestone	0	0

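The timing table is included for completeness rather than as a result. Because the core still ignores the auxiliary response, P106 cannot reduce architectural cycles yet. Its post-load count is in fact about 1.13M cycles (0.5%) higher than P105's.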
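As a quick arithmetic check on the timing table, the sketch below recomputes CPI and the P105-to-P106 cycle delta. It assumes CPI is post-load cycles divided by retired instructions, which reproduces the reported values. The variable and function names are illustrative, not taken from the project.

```python
# Figures copied from the timing table above.
p105 = {"post_load_cycles": 218_480_625, "retired": 86_106_731}
p106 = {"post_load_cycles": 219_613_584, "retired": 86_478_207}


def cpi(run):
    """Cycles per retired instruction (assumed definition: post-load cycles / retired)."""
    return run["post_load_cycles"] / run["retired"]


# Extra cycles P106 spends relative to P105, absolute and as a percentage.
delta_cycles = p106["post_load_cycles"] - p105["post_load_cycles"]
delta_pct = 100.0 * delta_cycles / p105["post_load_cycles"]

print(f"P105 CPI {cpi(p105):.4f}")
print(f"P106 CPI {cpi(p106):.4f}")
print(f"post-load delta {delta_cycles:,} cycles ({delta_pct:.2f}%)")
```

The delta is small in both directions, which is consistent with a rung that adds boundary traffic without changing what the core consumes.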

Auxiliary Contract

The new top-level lane carries:

`banked_aux_valid`, `banked_aux_addr`, `banked_aux_size`, `banked_aux_bank`, `banked_aux_side`

It fires for the same conservative condition P105 modeled: split-bank I/D demand, one real grant, and a blocked read-like request.
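The firing condition can be sketched as a small behavioral predicate. This is a reading of the sentence above, not the project's RTL: the `Request` type, the `aux_request` function, and the `"I"`/`"D"` side encoding are hypothetical names chosen for illustration, and the bank index is passed in rather than derived from the address because the address-to-bank mapping is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    side: str        # "I" (instruction) or "D" (data); hypothetical encoding
    bank: int        # lower-memory bank index the request targets
    read_like: bool  # True for fetches and loads, False for stores


def aux_request(i_req: Optional[Request],
                d_req: Optional[Request],
                granted_side: str) -> Optional[Request]:
    """Return the request to emit on the auxiliary lane, or None if it stays idle."""
    if granted_side not in ("I", "D"):
        raise ValueError("granted_side must be 'I' or 'D'")
    # Both sides must be demanding, otherwise nothing is blocked.
    if i_req is None or d_req is None:
        return None
    # Split-bank demand only: a same-bank extra grant would conflict.
    if i_req.bank == d_req.bank:
        return None
    # Exactly one side holds the real grant; the other side is the blocked one.
    blocked = d_req if granted_side == "I" else i_req
    # Conservative rule: only read-like blocked requests go to the auxiliary lane.
    return blocked if blocked.read_like else None


# Fetch in bank 0 wins the real grant, so the blocked load in bank 1 fires.
print(aux_request(Request("I", 0, True), Request("D", 1, True), "I"))
```

Under this reading a blocked store never fires, which keeps the lane read-only as described.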

Service Counters

counter	value
auxiliary instruction reads serviced	488,792
auxiliary data reads serviced	20,141,261
auxiliary reads serviced total	20,630,053
shell-window auxiliary reads	8,458,681
auxiliary read errors	0
auxiliary read checksum	947,922,106
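The counters above suggest a simple accounting structure on the memory-model side. The sketch below is a Python stand-in for that bookkeeping, not the actual Verilator model. Several details are assumptions made for illustration: the class and method names, treating `size` as a byte count, counting an out-of-range read as an error, and using a 32-bit wrapping sum as the checksum. The real checksum algorithm is not described on this page.

```python
class AuxReadModel:
    """Toy accounting for auxiliary reads serviced against a boot image."""

    def __init__(self, image: bytes, num_banks: int = 4):
        self.image = image
        self.by_side = {"I": 0, "D": 0}   # instruction vs data reads serviced
        self.by_bank = [0] * num_banks    # reads serviced per bank
        self.errors = 0                   # reads that could not be serviced
        self.checksum = 0                 # hypothetical 32-bit wrapping sum

    def service(self, addr: int, size: int, bank: int, side: str) -> None:
        # Assumed error rule: any read that falls outside the image.
        if addr < 0 or size <= 0 or addr + size > len(self.image):
            self.errors += 1
            return
        value = int.from_bytes(self.image[addr:addr + size], "little")
        self.checksum = (self.checksum + value) & 0xFFFFFFFF
        self.by_side[side] += 1
        self.by_bank[bank] += 1

    @property
    def total(self) -> int:
        return sum(self.by_side.values())


model = AuxReadModel(bytes(range(16)))
model.service(0, 4, bank=0, side="I")   # serviced
model.service(4, 4, bank=1, side="D")   # serviced
model.service(32, 4, bank=2, side="D")  # outside the image: counted as an error
print(model.total, model.errors, model.by_bank, hex(model.checksum))
```

Keeping the per-side and per-bank counts separate is what allows the totals to be cross-checked against each other afterwards.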

The serviced read count matches the model exactly:

comparison	value
modeled extra grants	20,630,053
serviced auxiliary reads	20,630,053
match	100.00%
modeled shell extra grants	8,458,681
serviced shell auxiliary reads	8,458,681
shell match	100.00%

Per-bank auxiliary service:

bank	auxiliary reads serviced
0	8,716,171
1	4,270,912
2	3,843,696
3	3,799,274
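The counter tables are internally consistent, which the short check below makes explicit: the instruction and data reads sum to the total, the per-bank counts sum to the same total, and the serviced count equals the modeled count. All values are copied from the tables above. The variable names are illustrative.

```python
# Values copied from the service-counter tables above.
aux_instr_reads = 488_792
aux_data_reads = 20_141_261
aux_total_reads = 20_630_053
per_bank_reads = {0: 8_716_171, 1: 4_270_912, 2: 3_843_696, 3: 3_799_274}
modeled_extra_grants = 20_630_053


def match_percent(serviced: int, modeled: int) -> float:
    """Serviced reads as a percentage of modeled extra grants."""
    return 100.0 * serviced / modeled


# Two independent breakdowns must both add up to the reported total.
side_sum_ok = aux_instr_reads + aux_data_reads == aux_total_reads
bank_sum_ok = sum(per_bank_reads.values()) == aux_total_reads

# Share of auxiliary reads handled by each bank.
bank_share = {b: 100.0 * n / aux_total_reads for b, n in per_bank_reads.items()}

print(side_sum_ok, bank_sum_ok,
      f"{match_percent(aux_total_reads, modeled_extra_grants):.2f}%")
for bank, share in bank_share.items():
    print(f"bank {bank}: {share:.1f}% of auxiliary reads")
```

Both sums agree with the reported total. Bank 0 takes about 42% of the auxiliary reads, while the other three banks take roughly 18% to 21% each.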

Memory Stalls

P106 banked lower-contract workload: 58,912,691 stalls, 65,996,622 handshakes.

category	stalls	share of stalls	requests
instruction fetch	27,458,054	46.6%	46,933,482
data load	11,664,621	19.8%	560,189
data store	10,933,918	18.6%	77,398
atomic memory op	174,290	0.3%	167,624
page walk for fetch	684,866	1.2%	678,712
page walk for load/store	671,603	1.1%	665,427
other	7,325,339	12.4%	16,913,790

The main stall chart still reflects the single response consumed by the core. The new result lives in `banked_lower_contract`.
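The stall figures above can be re-derived with a few lines of arithmetic. The sketch below recomputes each category's share of stalls and adds a stalls-per-request ratio. That ratio is not reported on this page, and it is only meaningful if "stalls" counts stall cycles, which is an assumption here.

```python
# (stalls, requests) per category, copied from the stall figures above.
stalls = {
    "instruction fetch": (27_458_054, 46_933_482),
    "data load": (11_664_621, 560_189),
    "data store": (10_933_918, 77_398),
    "atomic memory op": (174_290, 167_624),
    "page walk for fetch": (684_866, 678_712),
    "page walk for load/store": (671_603, 665_427),
    "other": (7_325_339, 16_913_790),
}

total_stalls = sum(s for s, _ in stalls.values())
total_requests = sum(r for _, r in stalls.values())

for name, (s, r) in stalls.items():
    share = 100.0 * s / total_stalls  # share of all stalls
    per_request = s / r               # assumed meaning: stall cycles per request
    print(f"{name:26s} {share:5.1f}%  {per_request:8.2f} stalls/request")

print(f"total stalls {total_stalls:,}  total requests {total_requests:,}")
```

The category totals reproduce the reported 58,912,691 stalls and 65,996,622 handshakes.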

Shell Phases

P106 shell workload: 219,613,584 cycles, CPI 2.54.

phase	cycles	share
kernel banner to `/init`	116,715,634	53.3%
`/init` to shell banner	1,069,252	0.5%
shell banner to first command	35,642,554	16.3%
echo command	1,649	0%
uname -a	2,558,009	1.2%
ls /bin /usr/share	32,115,978	14.7%
cat sample file	2,713,112	1.2%
touch/write/cat/rm /tmp file	10,261,206	4.7%
8x ash loop with file I/O	17,907,443	8.2%
final marker	680	0%

The shell script reaches `P106-FILE-OK`.

Cycle Shape

P106 banked lower-contract workload: 219,613,584 cycles, CPI 2.54.

state	share	cycles
fetch	3.7%	8,131,172
execute	39.4%	86,503,337
mem	12.8%	28,069,359
walker	1.2%	2,700,608
writeback	39.4%	86,478,207
mul/div	3.5%	7,729,185

P106 retires 86.48M instructions at CPI 2.5395.

Hot Functions

P106 BusyBox shell symbols: 64,022 samples, one every 1,024 cycles.

symbol	image	share of samples	samples
printf_core	busybox	5.7%	3,676
memset	kernel	5.2%	3,308
memcpy	busybox	3.6%	2,288
vruntime_eligible	kernel	3.4%	2,195
blake2s_compress_generic	kernel	2.8%	1,796
__fwritex	busybox	2.7%	1,723
memcpy	kernel	2.5%	1,620
handle_exception	kernel	1.8%	1,148
unmap_page_range	kernel	1.6%	1,001
avg_vruntime	kernel	1.4%	872
n_tty_write	kernel	1.4%	867
memset	busybox	1.3%	825
ret_from_exception	kernel	1.3%	804
n_tty_read	kernel	1.1%	681
next_uptodate_folio	kernel	1%	669
(remaining)	remaining	55.1%	35,264

The software workload is unchanged. This page is about the hardware boundary and serviced auxiliary-memory work.

Next

P107 should consume the auxiliary response for one narrow, low-risk client first. Background D-cache fill or prefetch traffic is a better first target than demand fetch/load, because a wrong demand response path can corrupt architectural execution immediately.
