P100 is the first measured Harvard-service rung. It keeps the ISA-visible behavior intact, splits the memory clients into instruction-side and data-side service intentions, and counts where the old lower shared memory policy still serializes them.
The short version: this is not two physical RAM ports yet. It is the boundary and the counters that make the next split honest.
Result
| metric | P94 arbiter | P98 throttle | P99 map | P100 split service |
|---|---|---|---|---|
| post-load cycles | 222,459,202 | 221,452,591 | 222,509,604 | 221,990,140 |
| shell window cycles | 67,050,374 | 66,055,345 | 66,998,698 | 66,518,626 |
| retired instructions | 86,664,089 | 86,329,983 | 86,648,693 | 86,512,027 |
| CPI | 2.5669 | 2.5652 | 2.5680 | 2.5660 |
| memory stall cycles | 60,032,329 | 59,683,338 | 59,928,278 | 59,819,129 |
| fetch stall cycles | 23,549,359 | 27,286,526 | 27,399,253 | 27,346,150 |
| load stall cycles | 14,632,992 | 10,697,962 | 10,718,661 | 10,729,427 |

| comparison | result |
|---|---|
| shell window vs P99 | -0.72% |
| post-load cycles vs P99 | -0.23% |
| memory stalls vs P99 | -0.18% |
| fetch stalls vs P99 | -0.19% |
| load stalls vs P99 | +0.10% |
| shell window vs P98 | +0.70% |
| shell window vs P94 | -0.79% |
P100 is a functional PASS. The small shell-window improvement versus P99 (-0.72%) is useful, but P100 is still 0.70% slower than P98 on the same window, and the main claim is structural: the old one-port service policy is now visible as instruction-side demand, data-side demand, and lower-service conflict.
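The comparison rows can be recomputed from the Result table. The following is a minimal sketch: the dictionary layout and the helper names `rel_change` and `cpi` are illustrative and not part of the project tooling, and the CPI formula (post-load cycles divided by retired instructions) is an assumption that happens to reproduce the published CPI values.

```python
# Values copied from the Result table; the layout and names are illustrative.
RESULT = {
    "post_load_cycles": {
        "P94": 222_459_202, "P98": 221_452_591,
        "P99": 222_509_604, "P100": 221_990_140,
    },
    "shell_window_cycles": {
        "P94": 67_050_374, "P98": 66_055_345,
        "P99": 66_998_698, "P100": 66_518_626,
    },
    "retired_instructions": {
        "P94": 86_664_089, "P98": 86_329_983,
        "P99": 86_648_693, "P100": 86_512_027,
    },
    "memory_stall_cycles": {
        "P94": 60_032_329, "P98": 59_683_338,
        "P99": 59_928_278, "P100": 59_819_129,
    },
    "fetch_stall_cycles": {
        "P94": 23_549_359, "P98": 27_286_526,
        "P99": 27_399_253, "P100": 27_346_150,
    },
    "load_stall_cycles": {
        "P94": 14_632_992, "P98": 10_697_962,
        "P99": 10_718_661, "P100": 10_729_427,
    },
}


def rel_change(metric: str, base: str, new: str = "P100") -> float:
    """Percent change of `new` relative to `base` (negative means fewer)."""
    row = RESULT[metric]
    return 100.0 * (row[new] - row[base]) / row[base]


def cpi(rung: str) -> float:
    """Assumed definition: post-load cycles per retired instruction."""
    return RESULT["post_load_cycles"][rung] / RESULT["retired_instructions"][rung]


if __name__ == "__main__":
    for metric, base in [
        ("shell_window_cycles", "P99"),
        ("post_load_cycles", "P99"),
        ("memory_stall_cycles", "P99"),
        ("fetch_stall_cycles", "P99"),
        ("load_stall_cycles", "P99"),
        ("shell_window_cycles", "P98"),
        ("shell_window_cycles", "P94"),
    ]:
        print(f"{metric} vs {base}: {rel_change(metric, base):+.2f}%")
    print(f"P100 CPI: {cpi('P100'):.4f}")
```

Keeping the raw counts next to the derived percentages makes it easy to rerun the same comparison when a later rung adds a column.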
Split Service Counters
| side | want cycles | grant cycles | not granted by lower policy |
|---|---|---|---|
| instruction | 99,366,119 | 99,366,119 | 0 |
| data | 59,349,365 | 26,952,797 | 32,396,568 |

| lower shared service | cycles |
|---|---|
| both sides wanted service | 28,051,030 |
| lower handshakes | 66,499,787 |
| lower stall cycles | 59,819,129 |
In this policy, instruction-side traffic always wins when it wants the lower service. Data-side traffic wants service for 59.35M cycles and is not granted for 32.40M of them (about 54.6%). That is the number the next rungs should either reduce or move into a better-explained lower-memory bucket.
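The counters above also allow a rough decomposition of the data-side denials. The sketch below only does arithmetic on the published numbers. It assumes that every cycle in which both sides wanted service is also a cycle in which the data side was not granted, which follows from the instruction-priority rule but is not stated as a measured fact. The function and key names are hypothetical.

```python
# Counter values copied from the Split Service Counters tables.
DATA_WANT = 59_349_365
DATA_GRANT = 26_952_797
DATA_NOT_GRANTED = 32_396_568
BOTH_WANTED = 28_051_030


def data_side_summary() -> dict:
    """Derive ratios from the published counters (no new measurements)."""
    # The table should be self-consistent: want = grant + not granted.
    if DATA_WANT - DATA_GRANT != DATA_NOT_GRANTED:
        raise ValueError("data-side counters do not add up")
    return {
        # Fraction of data-side demand cycles that were not granted.
        "not_granted_share": DATA_NOT_GRANTED / DATA_WANT,
        # Assumption: all both-wanted cycles are data-side denials.
        "overlap_share_of_denials": BOTH_WANTED / DATA_NOT_GRANTED,
        # Denied cycles not explained by direct instruction/data overlap.
        "denials_outside_overlap": DATA_NOT_GRANTED - BOTH_WANTED,
    }


if __name__ == "__main__":
    for key, value in data_side_summary().items():
        print(key, value)
```

Under that assumption, roughly 86.6% of the data-side denials coincide with both sides wanting service, and 4,345,538 not-granted cycles remain that direct overlap does not explain. Those cycles are a candidate for the "better explained lower-memory bucket".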
Request Split
| class | P100 side |
|---|---|
| fetch | instruction |
| execute_prefetch | instruction |
| writeback_prefetch | instruction |
| icache_background | instruction |
| ptw_fetch | instruction |
| load | data |
| store | data |
| fp_load | data |
| fp_store | data |
| amo | data |
| ptw_lsu | data |
| dcache_background | data |
The old lower mem_valid path still exists underneath this split. That is why P100 can measure lower conflict without also changing cache, translation, store, AMO, or fence behavior.
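To make the classification and the counters concrete, here is a toy software model, not the RTL. It maps each request class from the table to a side and accounts one cycle at a time under the stated policy (one lower grant per cycle, instruction side wins). The class names `SideCounters` and `SplitServiceModel` and the `tick` method are hypothetical, and the model ignores multi-cycle transactions and any other reason the lower service might be unavailable.

```python
from dataclasses import dataclass

# Request-class to side mapping, copied from the Request Split table.
SIDE_OF = {
    "fetch": "instruction",
    "execute_prefetch": "instruction",
    "writeback_prefetch": "instruction",
    "icache_background": "instruction",
    "ptw_fetch": "instruction",
    "load": "data",
    "store": "data",
    "fp_load": "data",
    "fp_store": "data",
    "amo": "data",
    "ptw_lsu": "data",
    "dcache_background": "data",
}


@dataclass
class SideCounters:
    want: int = 0   # cycles in which this side requested lower service
    grant: int = 0  # cycles in which the policy selected this side

    @property
    def not_granted(self) -> int:
        return self.want - self.grant


class SplitServiceModel:
    """Toy model: one lower grant per cycle, instruction side has priority."""

    def __init__(self) -> None:
        self.instruction = SideCounters()
        self.data = SideCounters()
        self.both_wanted = 0

    def tick(self, requests) -> None:
        """Account one cycle given the request classes asserting demand."""
        sides = {SIDE_OF[r] for r in requests}
        i_want = "instruction" in sides
        d_want = "data" in sides
        self.instruction.want += int(i_want)
        self.data.want += int(d_want)
        self.both_wanted += int(i_want and d_want)
        if i_want:
            self.instruction.grant += 1
        elif d_want:
            self.data.grant += 1


if __name__ == "__main__":
    model = SplitServiceModel()
    for reqs in (["fetch"], ["fetch", "load"], ["store"], []):
        model.tick(reqs)
    print(model.instruction, model.data, model.both_wanted)
```

In the four-cycle example the instruction side wants and is granted twice, the data side wants twice and is granted once, and one cycle counts as both-wanted. That is the same shape as the measured table, where the instruction side has zero not-granted cycles.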
Memory Stalls
| stall source | stall cycles | share | requests |
|---|---|---|---|
| instruction fetch | 27,346,150 | 45.7% | 46,090,879 |
| data load | 10,729,427 | 17.9% | 879,270 |
| data store | 11,963,949 | 20.0% | 217,262 |
| atomic memory op | 157,854 | 0.3% | 183,986 |
| page walk for fetch | 1,122,943 | 1.9% | 1,116,789 |
| page walk for load/store | 1,218,084 | 2.0% | 1,217,565 |
| other | 7,280,722 | 12.2% | 16,794,036 |
Load stalls remain well below P94 (10.73M versus 14.63M cycles, about 27% fewer) thanks to the D-cache work. Fetch stalls remain elevated versus P94 (27.35M versus 23.55M cycles, about 16% more) because the frontend still competes for the lower shared service whenever instruction-side work misses the near-core structures.
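The Memory Stalls figures can also be read per request. The sketch below recomputes each source's share of the total and a stall-cycles-per-request ratio. It assumes that the "req" column is a plain count of requests of that class. How meaningful the ratio is depends on how requests are counted, so treat it as a rough indicator only. The names `MEMORY_STALLS` and `stall_report` are illustrative.

```python
# (stall cycles, requests) copied from the Memory Stalls figures.
MEMORY_STALLS = {
    "instruction fetch": (27_346_150, 46_090_879),
    "data load": (10_729_427, 879_270),
    "data store": (11_963_949, 217_262),
    "atomic memory op": (157_854, 183_986),
    "page walk for fetch": (1_122_943, 1_116_789),
    "page walk for load/store": (1_218_084, 1_217_565),
    "other": (7_280_722, 16_794_036),
}


def stall_report(stalls=MEMORY_STALLS) -> dict:
    """Return share of total stall cycles and stall cycles per request."""
    total = sum(cycles for cycles, _ in stalls.values())
    return {
        name: {
            "share": cycles / total,
            "stalls_per_request": cycles / requests,
        }
        for name, (cycles, requests) in stalls.items()
    }


if __name__ == "__main__":
    for name, row in stall_report().items():
        print(f"{name:26s} {100 * row['share']:5.1f}% "
              f"{row['stalls_per_request']:7.2f} stall cycles/req")
```

Read this way, data stores cost about 55 stall cycles per request and data loads about 12, while instruction fetch dominates the total mainly through request volume (under one stall cycle per request).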
Shell Phases
| phase | cycles | share |
|---|---|---|
| kernel banner to /init | 117,624,921 | 53.1% |
| /init to shell banner | 1,093,314 | 0.5% |
| shell banner to first command | 36,125,214 | 16.3% |
| echo command | 1,649 | 0.0% |
| uname -a | 2,535,944 | 1.2% |
| ls /bin /usr/share | 31,727,690 | 14.3% |
| cat sample file | 4,129,183 | 1.9% |
| touch/write/cat/rm /tmp file | 11,905,538 | 5.4% |
| 8x ash loop with file I/O | 16,217,942 | 7.3% |
| final marker | 680 | 0.0% |
The full BusyBox shell script reaches P100-FILE-OK.
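The Shell Phases figures can be cross-checked against the shell window cycles in the Result table. This section does not define the shell window, so the sketch below makes an assumption: that the window runs from the echo command through the final marker. The phase cycles over that range sum exactly to the published 66,518,626, which is consistent with the assumption but does not prove it. The helper name `window_cycles` is hypothetical.

```python
# Phase cycle counts copied from the Shell Phases figures, in order.
PHASES = [
    ("kernel banner to /init", 117_624_921),
    ("/init to shell banner", 1_093_314),
    ("shell banner to first command", 36_125_214),
    ("echo command", 1_649),
    ("uname -a", 2_535_944),
    ("ls /bin /usr/share", 31_727_690),
    ("cat sample file", 4_129_183),
    ("touch/write/cat/rm /tmp file", 11_905_538),
    ("8x ash loop with file I/O", 16_217_942),
    ("final marker", 680),
]

# P100 shell window cycles from the Result table.
SHELL_WINDOW_CYCLES = 66_518_626


def window_cycles(phases, first: str, last: str) -> int:
    """Sum cycles over the inclusive range of phases from `first` to `last`."""
    names = [name for name, _ in phases]
    lo, hi = names.index(first), names.index(last)
    return sum(cycles for _, cycles in phases[lo:hi + 1])


if __name__ == "__main__":
    total = window_cycles(PHASES, "echo command", "final marker")
    print(total, total == SHELL_WINDOW_CYCLES)
```

A check like this is cheap to keep alongside the chart data, since it flags any rung where the phase markers and the window counter drift apart.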
Cycle Shape
| cycle category | cycles | share |
|---|---|---|
| fetch | 8,346,661 | 3.8% |
| execute | 86,537,001 | 39.0% |
| mem | 28,085,707 | 12.7% |
| walker | 4,675,381 | 2.1% |
| writeback | 86,512,027 | 39.0% |
| mul/div | 7,831,647 | 3.5% |
No ISA-visible execution state was added. The new work is counters and service classification around the existing memory arbiter.
Hot Functions
- 5.6% of samples (3,662 samples)
- 5% of samples (3,253 samples)
- 3.7% of samples (2,385 samples)
- 3.3% of samples (2,151 samples)
- 2.8% of samples (1,806 samples)
- 2.7% of samples (1,760 samples)
- 2.6% of samples (1,688 samples)
- 1.8% of samples (1,160 samples)
- 1.7% of samples (1,102 samples)
- 1.4% of samples (895 samples)
- 1.3% of samples (858 samples)
- 1.3% of samples (830 samples)
- 1.1% of samples (738 samples)
- 1% of samples (652 samples)
- 1% of samples (641 samples)
- 55.4% of samples (35,999 samples)
The software mix is still the BusyBox shell workload. P100 changes how we explain the hardware service path underneath it.
Honest Status
| check | status |
|---|---|
| make check-tools | PASS |
| Verilator build | PASS |
| BusyBox userspace/initramfs build | PASS |
| Linux image rebuilt with P100 initramfs | PASS |
| BusyBox shell workload reaches P100-FILE-OK | PASS |
| P100 chart data captured | PASS |
| Instruction/data service intent counters | PASS |
| Split physical RAM ports | NOT RUN |
| Split ITLB/DTLB storage | NOT RUN |
| Data-side write buffer with forwarding | NOT RUN |
| Nonblocking miss machinery | NOT RUN |
| LibreLane hardening | NOT RUN |
Next
P101 should keep the P100 service counters and split translation next: separate ITLB/DTLB lookup/refill accounting, with the shared page-table walker still underneath at first. That tells us whether the next block of lost cycles is translation interference or lower memory service.