Project No. 100 of 147 on the ladder

Split I/D service

Introduces: instruction/data service intent counters; lower shared conflict accounting; Harvard boundary evidence

harden state: last run 2026-05-05
signoff
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P100 is the first measured Harvard-service rung. It keeps the ISA-visible behavior intact, splits the memory clients into instruction-side and data-side service intentions, and counts where the old lower shared memory policy still serializes them.

The short version: this is not two physical RAM ports yet. It is the boundary and the counters that make the next split honest.

Result

| metric | P94 arbiter | P98 throttle | P99 map | P100 split service |
|---|---|---|---|---|
| post-load cycles | 222,459,202 | 221,452,591 | 222,509,604 | 221,990,140 |
| shell window cycles | 67,050,374 | 66,055,345 | 66,998,698 | 66,518,626 |
| retired instructions | 86,664,089 | 86,329,983 | 86,648,693 | 86,512,027 |
| CPI | 2.5669 | 2.5652 | 2.5680 | 2.5660 |
| memory stall cycles | 60,032,329 | 59,683,338 | 59,928,278 | 59,819,129 |
| fetch stall cycles | 23,549,359 | 27,286,526 | 27,399,253 | 27,346,150 |
| load stall cycles | 14,632,992 | 10,697,962 | 10,718,661 | 10,729,427 |

| comparison | result |
|---|---|
| shell window vs P99 | -0.72% |
| post-load cycles vs P99 | -0.23% |
| memory stalls vs P99 | -0.18% |
| fetch stalls vs P99 | -0.19% |
| load stalls vs P99 | +0.10% |
| shell window vs P98 | +0.70% |
| shell window vs P94 | -0.79% |

P100 is a functional PASS. The small shell-window improvement versus P99 (-0.72%) is useful, but the main claim is structural: the old one-port service policy is now visible as separate instruction-side and data-side demand, with the conflict at the lower shared service counted explicitly.
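The comparison rows can be re-derived from the Result table. The sketch below does that arithmetic in Python. The dictionary layout and the helper names `pct_change` and `cpi` are illustrative, not part of the project's tooling. The CPI column is consistent with post-load cycles divided by retired instructions, which is what the helper assumes.

```python
# Counter values copied from the Result table.
SHELL_WINDOW = {"P94": 67_050_374, "P98": 66_055_345,
                "P99": 66_998_698, "P100": 66_518_626}
P99 = {"post_load": 222_509_604, "memory_stall": 59_928_278,
       "fetch_stall": 27_399_253, "load_stall": 10_718_661}
P100 = {"post_load": 221_990_140, "memory_stall": 59_819_129,
        "fetch_stall": 27_346_150, "load_stall": 10_729_427,
        "retired": 86_512_027}


def pct_change(new: int, old: int) -> float:
    """Signed percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


def cpi(cycles: int, retired: int) -> float:
    """Cycles per retired instruction (assumed: post-load cycles / retired)."""
    return cycles / retired


if __name__ == "__main__":
    for base in ("P99", "P98", "P94"):
        delta = pct_change(SHELL_WINDOW["P100"], SHELL_WINDOW[base])
        print(f"shell window vs {base}: {delta:+.2f}%")
    for key in ("post_load", "memory_stall", "fetch_stall", "load_stall"):
        print(f"{key} vs P99: {pct_change(P100[key], P99[key]):+.2f}%")
    print(f"P100 CPI: {cpi(P100['post_load'], P100['retired']):.4f}")
```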

Split Service Counters

| side | want cycles | grant cycles | not granted by lower policy |
|---|---|---|---|
| instruction | 99,366,119 | 99,366,119 | 0 |
| data | 59,349,365 | 26,952,797 | 32,396,568 |

| lower shared service | cycles |
|---|---|
| both sides wanted service | 28,051,030 |
| lower handshakes | 66,499,787 |
| lower stall cycles | 59,819,129 |

In this policy, instruction-side traffic always wins when it wants the lower service, which is why its grant cycles equal its want cycles. Data-side traffic wants service for 59.35M cycles and is not granted for 32.40M of them, about 54.6%. That is the number the next rungs should either reduce or move into a better-explained lower-memory bucket.
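The counting rule can be sketched as a small cycle-level model. This is a simplified Python illustration, not the project's RTL. The class name `SplitServiceCounters`, its fields, and the `step` method are hypothetical. It assumes one selection per cycle with strict instruction priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SplitServiceCounters:
    """Hypothetical per-cycle counters for an instruction-priority policy."""
    i_want: int = 0
    i_grant: int = 0
    d_want: int = 0
    d_grant: int = 0
    d_not_granted: int = 0
    both_want: int = 0

    def step(self, i_req: bool, d_req: bool) -> Optional[str]:
        """Account one cycle and return the side selected, if any."""
        if i_req:
            self.i_want += 1
        if d_req:
            self.d_want += 1
        if i_req and d_req:
            self.both_want += 1
        if i_req:
            # Instruction side always wins when it wants service.
            self.i_grant += 1
            if d_req:
                self.d_not_granted += 1
            return "instruction"
        if d_req:
            self.d_grant += 1
            return "data"
        return None


# A five-cycle toy trace of (instruction wants, data wants).
counters = SplitServiceCounters()
for i_req, d_req in [(True, False), (True, True), (False, True),
                     (False, False), (True, True)]:
    counters.step(i_req, d_req)
```

In this simplified model the data-side not-granted count always equals the both-wanted count. The measured P100 counters differ: 32,396,568 not-granted cycles against 28,051,030 both-wanted cycles, a gap of 4,345,538 cycles that the sketch does not attempt to model.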

Request Split

| class | P100 side |
|---|---|
| fetch | instruction |
| execute_prefetch | instruction |
| writeback_prefetch | instruction |
| icache_background | instruction |
| ptw_fetch | instruction |
| load | data |
| store | data |
| fp_load | data |
| fp_store | data |
| amo | data |
| ptw_lsu | data |
| dcache_background | data |

The old lower mem_valid path still exists underneath this split. That is why P100 can measure lower conflict without also changing cache, translation, store, AMO, or fence behavior.
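The classification in the Request Split table amounts to a fixed lookup from request class to side. A minimal Python sketch of that lookup follows. The class names come from the table, while `REQUEST_SIDE` and `side_of` are hypothetical names used only for illustration.

```python
INSTRUCTION = "instruction"
DATA = "data"

# Request class -> service side, copied from the Request Split table.
REQUEST_SIDE = {
    "fetch": INSTRUCTION,
    "execute_prefetch": INSTRUCTION,
    "writeback_prefetch": INSTRUCTION,
    "icache_background": INSTRUCTION,
    "ptw_fetch": INSTRUCTION,
    "load": DATA,
    "store": DATA,
    "fp_load": DATA,
    "fp_store": DATA,
    "amo": DATA,
    "ptw_lsu": DATA,
    "dcache_background": DATA,
}


def side_of(request_class: str) -> str:
    """Return the P100 service side for a request class.

    Raises KeyError for a class that the table does not list, so an
    unclassified request cannot silently land on either side.
    """
    return REQUEST_SIDE[request_class]
```

Note that page-table walks are split by who triggered them: `ptw_fetch` counts as instruction-side and `ptw_lsu` as data-side.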

Memory Stalls

Memory stalls, P100 split I/D service workload: 59,819,129 stall cycles, 66,499,787 handshakes

| # | class | stall cycles | share | requests |
|---|---|---|---|---|
| 1 | instruction fetch | 27,346,150 | 45.7% | 46,090,879 |
| 2 | data load | 10,729,427 | 17.9% | 879,270 |
| 3 | data store | 11,963,949 | 20% | 217,262 |
| 4 | atomic memory op | 157,854 | 0.3% | 183,986 |
| 5 | page walk for fetch | 1,122,943 | 1.9% | 1,116,789 |
| 6 | page walk for load/store | 1,218,084 | 2% | 1,217,565 |
| 7 | other | 7,280,722 | 12.2% | 16,794,036 |

Load stalls remain far below P94 thanks to the D-cache work. Fetch stalls remain elevated versus P94 because the frontend still competes for the lower shared service whenever instruction-side work misses the near-core structures.
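The per-class rows can be checked against the totals, and dividing stall cycles by request count gives a rough average cost per request. The Python sketch below does both. The helper names are illustrative, and reading the "req" column as a request count that is directly comparable across classes is an assumption.

```python
# (stall cycles, requests) per class, copied from the Memory Stalls breakdown.
MEMORY_STALLS = {
    "instruction fetch": (27_346_150, 46_090_879),
    "data load": (10_729_427, 879_270),
    "data store": (11_963_949, 217_262),
    "atomic memory op": (157_854, 183_986),
    "page walk for fetch": (1_122_943, 1_116_789),
    "page walk for load/store": (1_218_084, 1_217_565),
    "other": (7_280_722, 16_794_036),
}
TOTAL_STALLS = 59_819_129
TOTAL_HANDSHAKES = 66_499_787


def stall_share(name: str) -> float:
    """Share of all memory stall cycles attributed to one class, in percent."""
    return MEMORY_STALLS[name][0] / TOTAL_STALLS * 100.0


def stalls_per_request(name: str) -> float:
    """Average stall cycles per request for one class."""
    stalls, requests = MEMORY_STALLS[name]
    return stalls / requests


if __name__ == "__main__":
    for name in MEMORY_STALLS:
        print(f"{name:26s} {stall_share(name):5.1f}%  "
              f"{stalls_per_request(name):7.2f} stall cycles/request")
```

Under that reading, fetch dominates the total through volume (roughly 0.6 stall cycles per request), while loads and stores are rarer but far more expensive per request (roughly 12 and 55).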

Shell Phases

Shell phases, P100 shell workload: 221,990,140 cycles, CPI 2.57

| # | phase | cycles | share |
|---|---|---|---|
| 1 | kernel banner to /init | 117,624,921 | 53.1% |
| 2 | /init to shell banner | 1,093,314 | 0.5% |
| 3 | shell banner to first command | 36,125,214 | 16.3% |
| 4 | echo command | 1,649 | 0% |
| 5 | uname -a | 2,535,944 | 1.2% |
| 6 | ls /bin /usr/share | 31,727,690 | 14.3% |
| 7 | cat sample file | 4,129,183 | 1.9% |
| 8 | touch/write/cat/rm /tmp file | 11,905,538 | 5.4% |
| 9 | 8x ash loop with file I/O | 16,217,942 | 7.3% |
| 10 | final marker | 680 | 0% |

The full BusyBox shell script reaches P100-FILE-OK.
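As a cross-check, the command phases from the echo command through the final marker sum to exactly the 66,518,626 shell window cycles reported in the Result table. Treating those seven phases as the definition of the shell window is an inference from the numbers, not something the page states. The helper name `window_cycles` is illustrative.

```python
# Phase cycle counts copied from the Shell Phases breakdown, in order.
PHASES = [
    ("kernel banner to /init", 117_624_921),
    ("/init to shell banner", 1_093_314),
    ("shell banner to first command", 36_125_214),
    ("echo command", 1_649),
    ("uname -a", 2_535_944),
    ("ls /bin /usr/share", 31_727_690),
    ("cat sample file", 4_129_183),
    ("touch/write/cat/rm /tmp file", 11_905_538),
    ("8x ash loop with file I/O", 16_217_942),
    ("final marker", 680),
]
SHELL_WINDOW_CYCLES = 66_518_626  # P100 value from the Result table


def window_cycles(first: int, last: int) -> int:
    """Sum cycles over phases first..last (1-indexed, inclusive)."""
    return sum(cycles for _, cycles in PHASES[first - 1:last])


if __name__ == "__main__":
    total = window_cycles(4, 10)
    print(total, total == SHELL_WINDOW_CYCLES)
```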

Cycle Shape

State breakdown, P100 split I/D service workload: 221,990,140 cycles, CPI 2.57

| # | state | share | cycles |
|---|---|---|---|
| 1 | fetch | 3.8% | 8,346,661 |
| 2 | execute | 39% | 86,537,001 |
| 3 | mem | 12.7% | 28,085,707 |
| 4 | walker | 2.1% | 4,675,381 |
| 5 | writeback | 39% | 86,512,027 |
| 6 | mul/div | 3.5% | 7,831,647 |

No ISA-visible execution state was added. The new work is counters and service classification around the existing memory arbiter.

Hot Functions

Hot functions, P100 BusyBox shell symbols: 64,960 samples, one every 1,024 cycles

| # | symbol | image | share | samples |
|---|---|---|---|---|
| 1 | printf_core | busybox | 5.6% | 3,662 |
| 2 | memset | kernel | 5% | 3,253 |
| 3 | memcpy | busybox | 3.7% | 2,385 |
| 4 | vruntime_eligible | kernel | 3.3% | 2,151 |
| 5 | blake2s_compress_generic | kernel | 2.8% | 1,806 |
| 6 | memcpy | kernel | 2.7% | 1,760 |
| 7 | __fwritex | busybox | 2.6% | 1,688 |
| 8 | handle_exception | kernel | 1.8% | 1,160 |
| 9 | unmap_page_range | kernel | 1.7% | 1,102 |
| 10 | avg_vruntime | kernel | 1.4% | 895 |
| 11 | n_tty_write | kernel | 1.3% | 858 |
| 12 | memset | busybox | 1.3% | 830 |
| 13 | ret_from_exception | kernel | 1.1% | 738 |
| 14 | do_trap_ecall_u | kernel | 1% | 652 |
| 15 | next_uptodate_folio | kernel | 1% | 641 |
| 16 | (remaining) | | 55.4% | 35,999 |

The software mix is still the BusyBox shell workload. P100 changes how we explain the hardware service path underneath it.
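The sample count lines up with the shell window: one sample every 1,024 cycles over 66,518,626 cycles gives 64,960 samples when rounded up. That the profiler covers exactly the shell window is an inference from this match, not a documented property. The helper names below are illustrative.

```python
import math

SAMPLES = 64_960                  # total samples in the Hot Functions list
PERIOD_CYCLES = 1_024             # sampling period from the same list
SHELL_WINDOW_CYCLES = 66_518_626  # P100 value from the Result table


def expected_samples(cycles: int, period: int) -> int:
    """Number of samples if one is taken per started period."""
    return math.ceil(cycles / period)


def sample_share(samples: int, total: int = SAMPLES) -> float:
    """Share of all samples attributed to one symbol, in percent."""
    return samples / total * 100.0


if __name__ == "__main__":
    print(expected_samples(SHELL_WINDOW_CYCLES, PERIOD_CYCLES))
    print(f"printf_core: {sample_share(3_662):.1f}%")
```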

Honest Status

| check | status |
|---|---|
| make check-tools | PASS |
| Verilator build | PASS |
| BusyBox userspace/initramfs build | PASS |
| Linux image rebuilt with P100 initramfs | PASS |
| BusyBox shell workload reaches P100-FILE-OK | PASS |
| P100 chart data captured | PASS |
| Instruction/data service intent counters | PASS |
| Split physical RAM ports | NOT RUN |
| Split ITLB/DTLB storage | NOT RUN |
| Data-side write buffer with forwarding | NOT RUN |
| Nonblocking miss machinery | NOT RUN |
| LibreLane hardening | NOT RUN |

Next

P101 should keep the P100 service counters and split translation next: separate ITLB/DTLB lookup/refill accounting, with the shared page-table walker still underneath at first. That tells us whether the next block of lost cycles is translation interference or lower memory service.