P113 keeps the P112 queue but gates aux-load issue behind frontend and
D-cache pressure checks.
Result
| check | result |
|---|---|
| Verilator build | PASS |
| BusyBox shell workload reaches P113-FILE-OK | PASS |
| Aux-load policy counters | PASS |
| Speedup against P112 | PASS |
| Speedup against P110 | FAIL |
| Hardened layout | NOT RUN |

| metric | P112 queued load aux | P113 load policy |
|---|---|---|
| post-load cycles | 227,966,087 | 218,189,372 |
| shell window cycles | 71,950,542 | 64,307,797 |
| retired instructions | 88,125,232 | 86,158,953 |
| CPI | 2.5868 | 2.5324 |
| S_MEM cycles | 32,192,993 | 27,674,926 |
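The P112-to-P113 comparison can be recomputed from the table. The snippet below is a minimal sketch: the dictionary keys and helper names are illustrative, and only the cycle and instruction counts come from this report.

```python
# Counts copied from the metric table; key names are illustrative.
P112 = {"post_load": 227_966_087, "shell_window": 71_950_542, "retired": 88_125_232}
P113 = {"post_load": 218_189_372, "shell_window": 64_307_797, "retired": 86_158_953}


def speedup(baseline_cycles: int, new_cycles: int) -> float:
    """Ratio above 1.0 means the new run needed fewer cycles."""
    return baseline_cycles / new_cycles


def cpi(cycles: int, retired: int) -> float:
    """Cycles per retired instruction."""
    return cycles / retired


post_load_speedup = speedup(P112["post_load"], P113["post_load"])
shell_speedup = speedup(P112["shell_window"], P113["shell_window"])
cpi_p112 = cpi(P112["post_load"], P112["retired"])
cpi_p113 = cpi(P113["post_load"], P113["retired"])

print(f"post-load speedup vs P112: {post_load_speedup:.4f}x")
print(f"shell window speedup vs P112: {shell_speedup:.4f}x")
print(f"CPI P112: {cpi_p112:.4f}  CPI P113: {cpi_p113:.4f}")
```

The CPI values in the table are post-load cycles divided by retired instructions, so the same helper reproduces both columns.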
Policy
| counter | value |
|---|---|
| aux-load candidates | 5,394,095 |
| aux-load issues | 0 |
| blocked: no next-PC fetch queue hit | 5,394,095 |
| blocked: D-cache background active | 2,843,612 |
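The policy recovered P112’s regression, but only by blocking the queue
entirely: all 5,394,095 aux-load candidates were blocked for lack of a
next-PC fetch queue hit, and none issued. It is too strict to be the
final answer.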
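As a rough behavioral model of the gate, the sketch below issues an aux-load only when both checks pass and counts each block reason independently. The class, method, and signal names are hypothetical and do not come from the RTL. Non-exclusive block counters are an assumption too, made because the two blocked counts together exceed the candidate count.

```python
from dataclasses import dataclass


@dataclass
class AuxLoadGate:
    """Hypothetical model of the P113 aux-load issue policy counters."""
    candidates: int = 0
    issues: int = 0
    blocked_no_fetch_hit: int = 0
    blocked_dcache_busy: int = 0

    def try_issue(self, next_pc_fetch_queue_hit: bool,
                  dcache_background_active: bool) -> bool:
        """Return True if the candidate aux-load may issue this cycle."""
        self.candidates += 1
        # Block reasons are counted independently, so one candidate can
        # increment both counters (assumption based on the reported totals).
        if not next_pc_fetch_queue_hit:
            self.blocked_no_fetch_hit += 1
        if dcache_background_active:
            self.blocked_dcache_busy += 1
        allowed = next_pc_fetch_queue_hit and not dcache_background_active
        if allowed:
            self.issues += 1
        return allowed


gate = AuxLoadGate()
gate.try_issue(next_pc_fetch_queue_hit=False, dcache_background_active=True)
gate.try_issue(next_pc_fetch_queue_hit=False, dcache_background_active=False)
print(gate)
```

In this model, a workload where the next-PC fetch queue never hits at candidate time produces zero issues, which matches the shape of the counters above.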
Memory stalls
P113 load-policy workload: 58,328,963 stalls, 64,113,801 handshakes.

| source | stalls | share | requests |
|---|---|---|---|
| instruction fetch | 28,212,291 | 48.4% | 45,152,834 |
| data load | 10,350,842 | 17.7% | 560,950 |
| data store | 10,899,784 | 18.7% | 76,768 |
| atomic memory op | 174,156 | 0.3% | 165,725 |
| page walk for fetch | 677,509 | 1.2% | 671,355 |
| page walk for load/store | 678,011 | 1.2% | 671,827 |
| other | 7,336,370 | 12.6% | 16,814,342 |
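A quick consistency check on the memory-stall breakdown: the per-source stalls should add up to the stall total, and the per-source requests to the handshake total. This is a small sketch with illustrative variable names; the numbers are copied from the breakdown above.

```python
# (stalls, requests) per source, copied from the memory-stall breakdown.
MEMORY_STALLS = {
    "instruction fetch": (28_212_291, 45_152_834),
    "data load": (10_350_842, 560_950),
    "data store": (10_899_784, 76_768),
    "atomic memory op": (174_156, 165_725),
    "page walk for fetch": (677_509, 671_355),
    "page walk for load/store": (678_011, 671_827),
    "other": (7_336_370, 16_814_342),
}
TOTAL_STALLS = 58_328_963
TOTAL_HANDSHAKES = 64_113_801

stall_sum = sum(stalls for stalls, _ in MEMORY_STALLS.values())
request_sum = sum(requests for _, requests in MEMORY_STALLS.values())

# Share of total stalls per source, in percent.
shares = {name: 100.0 * stalls / TOTAL_STALLS
          for name, (stalls, _) in MEMORY_STALLS.items()}

print(stall_sum == TOTAL_STALLS, request_sum == TOTAL_HANDSHAKES)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```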
Shell phases
P113 shell workload: 218,189,372 cycles, CPI 2.53.

| phase | cycles | share |
|---|---|---|
| kernel banner to /init | 116,719,451 | 53.7% |
| /init to shell banner | 1,068,355 | 0.5% |
| shell banner to first command | 35,465,702 | 16.3% |
| echo command | 1,649 | 0% |
| uname -a | 2,403,211 | 1.1% |
| ls /bin /usr/share | 31,786,095 | 14.6% |
| cat sample file | 2,860,644 | 1.3% |
| touch/write/cat/rm /tmp file | 11,258,176 | 5.2% |
| 8x ash loop with file I/O | 15,997,342 | 7.4% |
| final marker | 680 | 0% |
State breakdown
P113 load-policy workload: 218,189,372 cycles, CPI 2.53. Dominant state: execute (39.5%, 86,183,647 cycles).

| state | share | cycles |
|---|---|---|
| fetch | 3.5% | 7,617,467 |
| execute | 39.5% | 86,183,647 |
| mem | 12.8% | 27,953,768 |
| walker | 1.2% | 2,698,702 |
| writeback | 39.5% | 86,158,953 |
| mul/div | 3.5% | 7,575,119 |
Hot functions
P113 BusyBox shell symbols: 62,800 samples, one sample every 1,024 cycles.

| symbol | image | share of samples | samples |
|---|---|---|---|
| printf_core | busybox | 5.6% | 3,537 |
| memset | kernel | 5.1% | 3,212 |
| memcpy | busybox | 3.7% | 2,350 |
| vruntime_eligible | kernel | 3.3% | 2,084 |
| blake2s_compress_generic | kernel | 2.9% | 1,814 |
| __fwritex | busybox | 2.7% | 1,707 |
| memcpy | kernel | 2.7% | 1,698 |
| handle_exception | kernel | 1.7% | 1,090 |
| unmap_page_range | kernel | 1.6% | 1,018 |
| n_tty_write | kernel | 1.3% | 830 |
| memset | busybox | 1.3% | 808 |
| avg_vruntime | kernel | 1.3% | 786 |
| ret_from_exception | kernel | 1.3% | 786 |
| next_uptodate_folio | kernel | 1.1% | 668 |
| do_trap_ecall_u | kernel | 1% | 613 |
| (remaining) | remaining | 55.3% | 34,719 |

Legend categories: kernel (virtual addr), user ELF symbols, firmware / pre-relocation, tail of distribution.
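Since the profile takes one sample every 1,024 cycles, a sample count converts to an approximate cycle cost by multiplication. The sketch below uses illustrative names and only the sample counts and period reported above. The total comes out close to the 64,307,797-cycle shell window, which suggests (but does not prove) that the sampling covers that window.

```python
SAMPLE_PERIOD_CYCLES = 1_024   # one sample every 1,024 cycles
TOTAL_SAMPLES = 62_800


def estimated_cycles(samples: int, period: int = SAMPLE_PERIOD_CYCLES) -> int:
    """Approximate cycles attributed to a symbol from its sample count."""
    return samples * period


window_estimate = estimated_cycles(TOTAL_SAMPLES)
printf_core_estimate = estimated_cycles(3_537)  # busybox printf_core

print(f"sampled window: ~{window_estimate:,} cycles")
print(f"printf_core:    ~{printf_core_estimate:,} cycles")
```

The estimate is only as fine as the sampling period, so symbols with few samples carry a large relative error.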