P106 is the first real widened boundary in the lower-memory banking arc. P105 modeled safe different-bank extra grants. P106 emits those requests on a top-level auxiliary read lane and has the Verilator memory model service them against the real boot image.
The core does not consume the auxiliary response yet, so this is not a speed rung.
Result
| check | result |
|---|---|
| make check-tools | PASS |
| Verilator build | PASS |
| Linux reaches /init | PASS |
| BusyBox prompt | PASS |
| BusyBox shell workload reaches P106-FILE-OK | PASS |
| Auxiliary lower-bank read lane serviced | PASS |
| Auxiliary read errors (0 observed) | PASS |
| Hardened layout | NOT RUN |
Timing
| metric | P105 model | P106 contract |
|---|---|---|
| post-load cycles | 218,480,625 | 219,613,584 |
| shell window cycles | 64,438,096 | 65,558,077 |
| retired instructions | 86,106,731 | 86,478,207 |
| CPI | 2.5373 | 2.5395 |
| BusyBox ready milestone | 118,422,909 | 118,413,096 |
| shell FILE-OK milestone | 218,480,768 | 219,613,727 |
| kernel panic milestone | 0 | 0 |
The timing table is included for completeness, not as a speedup claim. Because the core still ignores the auxiliary response, P106 cannot reduce architectural cycles yet; its post-load cycle count is about 0.5% higher than P105's.
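As a quick sanity check on the table, the reported CPI values are consistent with dividing post-load cycles by retired instructions. That definition is inferred from the numbers rather than stated on this page, and the helper name `cpi` is hypothetical.

```python
# Figures copied from the timing table above.
P105 = {"post_load_cycles": 218_480_625, "retired": 86_106_731}
P106 = {"post_load_cycles": 219_613_584, "retired": 86_478_207}


def cpi(run):
    """Cycles per instruction, assuming CPI = post-load cycles / retired instructions."""
    return run["post_load_cycles"] / run["retired"]


cpi_p105 = cpi(P105)
cpi_p106 = cpi(P106)

# Extra post-load cycles in P106 relative to P105.
extra_cycles = P106["post_load_cycles"] - P105["post_load_cycles"]

print(f"P105 CPI {cpi_p105:.4f}, P106 CPI {cpi_p106:.4f}, extra cycles {extra_cycles:,}")
```

Under that assumption both computed values agree with the table to four decimal places.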
Auxiliary Contract
The new top-level lane carries:
banked_aux_valid, banked_aux_addr, banked_aux_size,
banked_aux_bank, banked_aux_side
It fires for the same conservative condition P105 modeled: split-bank I/D demand, one real grant, and a blocked read-like request.
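The firing condition can be sketched as a small predicate. This is only an illustration of the three stated conditions, not the RTL: the `Request` type, the `aux_lane_request` function, the `"I"`/`"D"` side encoding, and the reading of "read-like" as fetches and loads are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    valid: bool
    bank: int       # lower-memory bank index (0..3 in this report)
    is_read: bool   # assumption: True for read-like requests such as fetches and loads


def aux_lane_request(ifetch: Request, data: Request, granted_side: str) -> Optional[str]:
    """Return the side ("I" or "D") eligible for the auxiliary lane, or None.

    granted_side names the single request that won the real grant this cycle.
    """
    # Split-bank I/D demand: both sides are requesting, and they target different banks.
    if not (ifetch.valid and data.valid):
        return None
    if ifetch.bank == data.bank:
        return None
    # One real grant: the other side is the blocked request.
    if granted_side == "I":
        blocked, side = data, "D"
    elif granted_side == "D":
        blocked, side = ifetch, "I"
    else:
        return None
    # Only a blocked read-like request may use the auxiliary lane.
    return side if blocked.is_read else None
```

For example, an instruction fetch to bank 0 that wins the grant while a load to bank 1 is blocked yields `"D"`, whereas a same-bank conflict or a blocked store yields `None`.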
Service Counters
| counter | value |
|---|---|
| auxiliary instruction reads serviced | 488,792 |
| auxiliary data reads serviced | 20,141,261 |
| auxiliary reads serviced total | 20,630,053 |
| shell-window auxiliary reads | 8,458,681 |
| auxiliary read errors | 0 |
| auxiliary read checksum | 947,922,106 |
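A minimal sketch of how a memory model could maintain counters like these while servicing auxiliary reads against an image. Everything here is hypothetical: the class and method names, the base address, little-endian byte order, and especially the checksum, which is a placeholder 32-bit wrapping sum and not necessarily the algorithm behind the value reported above.

```python
class AuxReadModel:
    """Hypothetical stand-in for the memory model's auxiliary read path."""

    def __init__(self, image: bytes, base: int, num_banks: int = 4):
        self.image = image
        self.base = base
        self.serviced = {"I": 0, "D": 0}   # instruction / data reads serviced
        self.per_bank = [0] * num_banks
        self.errors = 0
        self.checksum = 0

    def service(self, addr: int, size: int, bank: int, side: str) -> None:
        offset = addr - self.base
        # Count out-of-range reads as errors instead of returning junk data.
        if offset < 0 or offset + size > len(self.image):
            self.errors += 1
            return
        value = int.from_bytes(self.image[offset:offset + size], "little")
        self.serviced[side] += 1
        self.per_bank[bank] += 1
        # Placeholder checksum: 32-bit wrapping sum of the values read.
        self.checksum = (self.checksum + value) & 0xFFFFFFFF

    @property
    def total(self) -> int:
        return self.serviced["I"] + self.serviced["D"]


# Tiny usage example with a 16-byte image at a made-up base address.
model = AuxReadModel(bytes(range(16)), base=0x1000)
model.service(0x1000, 4, bank=0, side="I")
model.service(0x1008, 2, bank=1, side="D")
model.service(0x1010, 4, bank=2, side="D")  # past the end of the image: an error
```

Folding the returned data into a checksum means a run can only reproduce the reported value if the reads actually returned the same image contents, which a bare count would not show.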
The serviced read count matches the model exactly:
| comparison | value |
|---|---|
| modeled extra grants | 20,630,053 |
| serviced auxiliary reads | 20,630,053 |
| match | 100.00% |
| modeled shell extra grants | 8,458,681 |
| serviced shell auxiliary reads | 8,458,681 |
| shell match | 100.00% |
Per-bank auxiliary service:
| bank | auxiliary reads serviced |
|---|---|
| 0 | 8,716,171 |
| 1 | 4,270,912 |
| 2 | 3,843,696 |
| 3 | 3,799,274 |
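The counters above are mutually consistent, which can be checked with a few lines of arithmetic. The variable names are illustrative; the figures are copied from the tables.

```python
# Figures copied from the service counter, comparison, and per-bank tables.
aux_instruction_reads = 488_792
aux_data_reads = 20_141_261
aux_total = 20_630_053
modeled_extra_grants = 20_630_053
per_bank = {0: 8_716_171, 1: 4_270_912, 2: 3_843_696, 3: 3_799_274}

# Instruction and data reads add up to the reported total.
assert aux_instruction_reads + aux_data_reads == aux_total
# The per-bank breakdown accounts for every serviced read.
assert sum(per_bank.values()) == aux_total
# Serviced reads equal the extra grants modeled in P105.
assert aux_total == modeled_extra_grants

# Share of auxiliary reads handled by each bank.
bank_share = {bank: count / aux_total for bank, count in per_bank.items()}
busiest_bank = max(per_bank, key=per_bank.get)

for bank, share in bank_share.items():
    print(f"bank {bank}: {share:.1%}")
```

Bank 0 services roughly 42% of the auxiliary reads, about twice the share of any other bank.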
Memory Stalls
- instruction fetch 27,458,054 46.6% 46,933,482 req
- data load 11,664,621 19.8% 560,189 req
- data store 10,933,918 18.6% 77,398 req
- atomic memory op 174,290 0.3% 167,624 req
- page walk for fetch 684,866 1.2% 678,712 req
- page walk for load/store 671,603 1.1% 665,427 req
- other 7,325,339 12.4% 16,913,790 req
The main stall chart still reflects the single response consumed by the
core. The new result lives in banked_lower_contract.
Shell Phases
- kernel banner to /init 116,715,634 53.3%
- /init to shell banner 1,069,252 0.5%
- shell banner to first command 35,642,554 16.3%
- echo command 1,649 0%
- uname -a 2,558,009 1.2%
- ls /bin /usr/share 32,115,978 14.7%
- cat sample file 2,713,112 1.2%
- touch/write/cat/rm /tmp file 10,261,206 4.7%
- 8x ash loop with file I/O 17,907,443 8.2%
- final marker 680 0%
The shell script reaches P106-FILE-OK.
Cycle Shape
- fetch 3.7% 8,131,172
- execute 39.4% 86,503,337
- mem 12.8% 28,069,359
- walker 1.2% 2,700,608
- writeback 39.4% 86,478,207
- mul/div 3.5% 7,729,185
P106 retires 86.48M instructions at CPI 2.5395.
Hot Functions
- 5.7% of samples (3,676 samples)
- 5.2% of samples (3,308 samples)
- 3.6% of samples (2,288 samples)
- 3.4% of samples (2,195 samples)
- 2.8% of samples (1,796 samples)
- 2.7% of samples (1,723 samples)
- 2.5% of samples (1,620 samples)
- 1.8% of samples (1,148 samples)
- 1.6% of samples (1,001 samples)
- 1.4% of samples (872 samples)
- 1.4% of samples (867 samples)
- 1.3% of samples (825 samples)
- 1.3% of samples (804 samples)
- 1.1% of samples (681 samples)
- 1% of samples (669 samples)
- 55.1% of samples (35,264 samples)
The software workload is unchanged. This page is about the hardware boundary and serviced auxiliary-memory work.
Next
P107 should consume the auxiliary response for one narrow, low-risk client first. Background D-cache fill or prefetch traffic is a better first target than demand fetch/load, because a wrong demand response path can corrupt architectural execution immediately.