P105 keeps the real lower memory single-port, but models what a four-bank service could grant in parallel. This matters because P104 showed plenty of different-bank instruction/data overlap, but did not separate safe read-like overlap from ordering-sensitive traffic.
Result
| check | result |
|---|---|
| make check-tools | PASS |
| Verilator build | PASS |
| Linux reaches /init | PASS |
| BusyBox prompt | PASS |
| BusyBox shell workload reaches P105-FILE-OK | PASS |
| Banked service model emitted | PASS |
| Hardened layout | NOT RUN |
Timing
| metric | P104 bank counters | P105 banked service model |
|---|---|---|
| post-load cycles | 219,172,843 | 218,480,625 |
| shell window cycles | 65,062,462 | 64,438,096 |
| retired instructions | 86,339,942 | 86,106,731 |
| CPI | 2.5385 | 2.5373 |
| BusyBox ready milestone | 118,418,832 | 118,422,909 |
| shell FILE-OK milestone | 219,172,986 | 218,480,768 |
| kernel panic milestone | 0 | 0 |
The actual run is slightly faster than P104 (about 0.3% fewer post-load cycles and about 1.0% fewer shell-window cycles), but P105 does not claim a hardware speedup: the lower memory port is still one real lane. The meaningful result is the modeled service block below.
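As a quick consistency check, the CPI row appears to be post-load cycles divided by retired instructions. The sketch below assumes that definition (the report does not state it) and uses illustrative names; the figures are copied from the Timing table.

```python
# Figures copied from the Timing table; dictionary keys are illustrative names.
RUNS = {
    "P104": {"post_load_cycles": 219_172_843, "retired": 86_339_942},
    "P105": {"post_load_cycles": 218_480_625, "retired": 86_106_731},
}


def cpi(run):
    """Cycles per instruction, ASSUMING CPI = post-load cycles / retired instructions."""
    return run["post_load_cycles"] / run["retired"]


def pct_fewer(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(name, "CPI", round(cpi(run), 4))
    delta = pct_fewer(RUNS["P104"]["post_load_cycles"], RUNS["P105"]["post_load_cycles"])
    print("post-load cycles fewer in P105 (%):", round(delta, 2))
```

Under that assumption both CPI values in the table are reproduced to four decimal places, which suggests the small timing difference comes from the cycle and instruction counts themselves rather than from a change in how CPI is measured.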
Model Policy
P105 adds side-effect flags to the Harvard service taps. It models an extra grant only when instruction and data want different lower banks and the blocked side is read-like. Fetch, prefetch, cache fills, loads, and read-only page-table walks qualify. Stores, FP stores, AMOs, store-buffer drains, and PTE A/D writes do not.
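The policy above can be sketched as a small predicate. This is a minimal illustration, not the P105 RTL: the `Kind` enum, `READ_LIKE` set, and `models_extra_grant` function are hypothetical names, and only the classification of request kinds is taken from the text.

```python
from enum import Enum, auto


class Kind(Enum):
    # Read-like kinds named in the policy text.
    FETCH = auto()
    PREFETCH = auto()
    CACHE_FILL = auto()
    LOAD = auto()
    PTW_READ = auto()      # read-only page-table walk
    # Ordering-sensitive kinds named in the policy text.
    STORE = auto()
    FP_STORE = auto()
    AMO = auto()
    SB_DRAIN = auto()      # store-buffer drain
    PTE_AD_WRITE = auto()  # PTE accessed/dirty update


READ_LIKE = frozenset(
    {Kind.FETCH, Kind.PREFETCH, Kind.CACHE_FILL, Kind.LOAD, Kind.PTW_READ}
)


def models_extra_grant(i_bank, d_bank, blocked_kind):
    """True when the model would count an extra grant for the blocked side.

    Two conditions must hold: the instruction and data sides want different
    lower banks, and the side that lost arbitration is read-like.
    """
    return i_bank != d_bank and blocked_kind in READ_LIKE


if __name__ == "__main__":
    print(models_extra_grant(0, 1, Kind.LOAD))   # different banks, read-like
    print(models_extra_grant(0, 1, Kind.AMO))    # different banks, ordering-sensitive
    print(models_extra_grant(2, 2, Kind.FETCH))  # same bank stays serialized
```

Keeping the read-like set explicit makes the conservative default visible: any kind not listed is treated as ordering-sensitive and is never granted in parallel.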
Modeled Service
| counter | value |
|---|---|
| split-bank wants | 20,460,163 |
| modeled extra instruction grants | 488,212 |
| modeled extra data grants | 19,971,951 |
| modeled extra grants total | 20,460,163 |
| shell-window extra grants | 8,290,549 |
| same-bank cycles left serialized | 8,199,513 |
| unsafe split cycles left serialized | 0 |
| actual shell window | 64,438,096 |
| projected shell window if each extra grant saves one cycle | 56,147,547 |
| idealized shell-window reduction | 12.87% |
The surprising part is unsafe_split_cycles = 0 for this run. The
single-port policy usually grants the side-effecting data operation first
when one exists, so the blocked different-bank request is read-like
instruction traffic. Under P105’s conservative model, every split-bank
blocked cycle becomes a candidate extra grant.
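The derived rows in the Modeled Service table follow from simple arithmetic on the counters. The sketch below reproduces them; the function and variable names are illustrative, and the one-cycle-per-grant saving is the table's own idealization, not a measured result.

```python
# Counter values copied from the Modeled Service table.
EXTRA_INSTRUCTION_GRANTS = 488_212
EXTRA_DATA_GRANTS = 19_971_951
SHELL_WINDOW_EXTRA_GRANTS = 8_290_549
ACTUAL_SHELL_WINDOW = 64_438_096


def total_extra_grants(instr, data):
    """Total modeled extra grants across both sides."""
    return instr + data


def projected_window(actual, extra_grants, cycles_saved_per_grant=1):
    """Idealized window length if every extra grant saved a fixed number of cycles."""
    return actual - extra_grants * cycles_saved_per_grant


def reduction_pct(actual, projected):
    """Relative reduction of the projected window against the actual one."""
    return 100.0 * (actual - projected) / actual


if __name__ == "__main__":
    projected = projected_window(ACTUAL_SHELL_WINDOW, SHELL_WINDOW_EXTRA_GRANTS)
    print(total_extra_grants(EXTRA_INSTRUCTION_GRANTS, EXTRA_DATA_GRANTS))
    print(projected)
    print(round(reduction_pct(ACTUAL_SHELL_WINDOW, projected), 2))
```

The `cycles_saved_per_grant` parameter is there to make the idealization explicit: a real implementation would likely save less than one cycle per grant on average, so 12.87% is best read as an upper bound.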
Bank Distribution
| bank | I want | D want | I grant | D grant |
|---|---|---|---|---|
| 0 | 27,608,896 | 18,463,913 | 27,430,941 | 6,359,291 |
| 1 | 23,344,445 | 13,601,194 | 23,222,337 | 6,389,211 |
| 2 | 24,054,837 | 12,984,541 | 23,920,432 | 6,529,923 |
| 3 | 23,573,246 | 13,630,263 | 23,376,587 | 7,117,260 |
Bank 0 remains hottest, but the model still finds enough split-bank overlap to justify a real implementation experiment.
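One way to read the distribution is to compare each bank's share of the want counters. The sketch below does that for the table's values; the `BANKS` layout and helper names are illustrative, and treating share of want counts as "hotness" is an assumption about how the table is meant to be read.

```python
# Want counters copied from the Bank Distribution table.
BANKS = {
    0: {"i_want": 27_608_896, "d_want": 18_463_913},
    1: {"i_want": 23_344_445, "d_want": 13_601_194},
    2: {"i_want": 24_054_837, "d_want": 12_984_541},
    3: {"i_want": 23_573_246, "d_want": 13_630_263},
}


def share(column, bank):
    """Fraction of all requests in `column` that target `bank`."""
    total = sum(row[column] for row in BANKS.values())
    return BANKS[bank][column] / total


def hottest(column):
    """Bank with the largest count in `column`."""
    return max(BANKS, key=lambda bank: BANKS[bank][column])


if __name__ == "__main__":
    for column in ("i_want", "d_want"):
        print(column, "hottest bank:", hottest(column))
        for bank in BANKS:
            print(f"  bank {bank}: {100 * share(column, bank):.1f}%")
```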
Memory Stalls
| source | stalls | share | requests |
|---|---|---|---|
| instruction fetch | 27,340,309 | 46.6% | 46,723,707 |
| data load | 11,613,521 | 19.8% | 556,407 |
| data store | 10,885,814 | 18.6% | 76,888 |
| atomic memory op | 173,144 | 0.3% | 166,597 |
| page walk for fetch | 677,148 | 1.2% | 670,994 |
| page walk for load/store | 666,477 | 1.1% | 660,291 |
| other | 7,301,071 | 12.4% | 16,833,614 |
The stall chart still reflects the single-port hardware. Use the
banked_service_model numbers to read what could change.
Shell Phases
| phase | cycles | share |
|---|---|---|
| kernel banner to /init | 116,719,059 | 53.6% |
| /init to shell banner | 1,075,640 | 0.5% |
| shell banner to first command | 35,619,763 | 16.4% |
| echo command | 1,649 | 0% |
| uname -a | 2,233,445 | 1% |
| ls /bin /usr/share | 32,300,369 | 14.8% |
| cat sample file | 2,857,053 | 1.3% |
| touch/write/cat/rm /tmp file | 10,994,749 | 5.1% |
| 8x ash loop with file I/O | 16,050,151 | 7.4% |
| final marker | 680 | 0% |
The same BusyBox script reaches P105-FILE-OK.
Cycle Shape
| stage | share | cycles |
|---|---|---|
| fetch | 3.7% | 8,108,238 |
| execute | 39.4% | 86,131,517 |
| mem | 12.8% | 27,922,270 |
| walker | 1.2% | 2,674,910 |
| writeback | 39.4% | 86,106,731 |
| mul/div | 3.4% | 7,535,243 |
P105 retires 86.11M instructions at CPI 2.5373.
Hot Functions
- 5.8% of samples (3,626 samples)
- 5.2% of samples (3,249 samples)
- 3.8% of samples (2,374 samples)
- 3.2% of samples (2,011 samples)
- 2.9% of samples (1,811 samples)
- 2.7% of samples (1,712 samples)
- 2.7% of samples (1,671 samples)
- 1.7% of samples (1,059 samples)
- 1.6% of samples (1,026 samples)
- 1.4% of samples (856 samples)
- 1.3% of samples (832 samples)
- 1.2% of samples (778 samples)
- 1.2% of samples (758 samples)
- 1% of samples (644 samples)
- 1% of samples (641 samples)
- 55.1% of samples (34,665 samples)
The workload is unchanged. The new result is an architecture estimate: about 8.29M shell-window cycles of blocked read-like lower-memory service are candidates for real different-bank parallelism.
Next
P106 should change the memory contract instead of adding another estimator: two near-core lanes, four lower banks, same-cycle different-bank grants, deterministic same-bank priority, and conservative serialization for stores, AMOs, and PTE updates until ordering tests exist.
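The proposed contract can be sketched as a one-cycle arbitration function. This is an illustration under stated assumptions, not a P106 design: `Request`, `arbitrate`, and the `same_bank_winner` parameter are hypothetical, and the choices to let an ordering-sensitive request go first and to default same-bank priority to the data lane are assumptions made only for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NUM_BANKS = 4  # four lower banks, per the proposal


@dataclass(frozen=True)
class Request:
    bank: int
    ordering_sensitive: bool = False  # store, AMO, or PTE update

    def __post_init__(self):
        if not 0 <= self.bank < NUM_BANKS:
            raise ValueError(f"bank must be in 0..{NUM_BANKS - 1}")


def arbitrate(
    instr: Optional[Request],
    data: Optional[Request],
    same_bank_winner: str = "data",  # ASSUMPTION: fixed priority is a free choice
) -> Tuple[bool, bool]:
    """Return (grant_instr, grant_data) for one cycle."""
    # Only one lane (or neither) is requesting: grant whatever is present.
    if instr is None or data is None:
        return instr is not None, data is not None

    sensitive = instr.ordering_sensitive or data.ordering_sensitive

    # Same-cycle different-bank grants, only when both sides are read-like.
    if instr.bank != data.bank and not sensitive:
        return True, True

    # Conservative serialization: exactly one lane is granted this cycle.
    if data.ordering_sensitive:
        return False, True  # ASSUMPTION: the ordering-sensitive side goes first
    if instr.ordering_sensitive:
        return True, False
    # Deterministic same-bank priority between two read-like requests.
    return (False, True) if same_bank_winner == "data" else (True, False)


if __name__ == "__main__":
    print(arbitrate(Request(0), Request(1)))                           # parallel
    print(arbitrate(Request(2), Request(2)))                           # same bank
    print(arbitrate(Request(0), Request(1, ordering_sensitive=True)))  # serialized
```

Because the function is pure and deterministic, the same inputs always produce the same grants, which is the property the ordering tests mentioned above would need in order to be reproducible.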