Project No. 87 of 147 on the ladder

Direct UART console perf

Introduces: direct UART MMIO shell output; HVC/SBI console bypass measurement; P86/P87 shell workload comparison

harden state: last run 2026-05-05
signoff
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P87 keeps the P86 core and changes the shell bridge. Once BusyBox ash is running on its Linux PTY, console_sh writes PTY output directly to the platform UART MMIO register instead of writing it back through /dev/console.

That bypasses the guest Linux HVC/SBI output path for shell text. Early kernel printk still uses the normal SBI console.
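
The data flow described above can be modelled in a few lines. This is a minimal sketch, not console_sh itself: the source does not show its implementation, so the names `FakeUartTx` and `pump_to_uart` are hypothetical, a pipe stands in for the PTY master, and a Python object stands in for the UART transmit register. On real hardware the store would target the platform's MMIO address and would normally wait for the transmitter to be ready first.

```python
import os


class FakeUartTx:
    """Hypothetical stand-in for a UART transmit register (not real MMIO)."""

    def __init__(self):
        self.sent = bytearray()

    def store(self, byte):
        # A real bridge would store to the UART data register here,
        # typically after polling a "transmitter ready" status bit.
        self.sent.append(byte)


def pump_to_uart(src_fd, uart, chunk_size=256):
    """Copy bytes from src_fd to the UART register until EOF.

    Returns the number of bytes forwarded.
    """
    total = 0
    while True:
        chunk = os.read(src_fd, chunk_size)
        if not chunk:  # EOF: the writer closed its end
            break
        for byte in chunk:
            uart.store(byte)
        total += len(chunk)
    return total


def demo():
    # A pipe models the PTY master side that the shell writes into.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"hello from ash\n")
    os.close(write_fd)
    uart = FakeUartTx()
    count = pump_to_uart(read_fd, uart)
    os.close(read_fd)
    return count, bytes(uart.sent)
```

The point of the model is that shell output never re-enters a tty write path after it leaves the PTY: each byte goes straight from the read buffer to the register store.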

Result

metric                        P86 /dev/console bridge   P87 direct UART bridge   delta
post-load cycles              223,777,049               222,825,777              -0.43%
shell window cycles           68,361,945                67,266,772               -1.60%
retired instructions          87,361,454                86,750,479               -0.70%
CPI                           2.5615                    2.5686                   +0.28%
memory stall cycles           88,823,193                88,210,458               -0.69%
n_tty_write samples           1,613                     847                      -47.49%
hvc_sbi_tty_put samples       264                       0                        -100.00%
sbi_console_putchar samples   173                       0                        -100.00%

The profile moved in the intended direction. The old HVC/SBI output symbols disappear from the shell-window folded profile, and n_tty_write samples roughly halve (1,613 to 847). The wall-cycle gain is smaller (-1.60% on the shell window, -0.43% post-load), which is also useful data: the final console byte path was not the only bottleneck.
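
The delta column can be rechecked from the two raw columns. A minimal sketch, assuming delta means (P87 - P86) / P86 expressed as a percentage; the helper name `pct_delta` is illustrative and not part of the project.

```python
def pct_delta(p86, p87):
    """Relative change from P86 to P87, in percent."""
    return (p87 - p86) / p86 * 100.0


# (P86, P87) pairs copied from the result table.
RESULTS = {
    "post-load cycles": (223_777_049, 222_825_777),
    "shell window cycles": (68_361_945, 67_266_772),
    "retired instructions": (87_361_454, 86_750_479),
    "memory stall cycles": (88_823_193, 88_210_458),
    "n_tty_write samples": (1_613, 847),
}

# Round to two decimals, matching the precision used in the table.
DELTAS = {name: round(pct_delta(a, b), 2) for name, (a, b) in RESULTS.items()}

for name, delta in DELTAS.items():
    print(f"{name:24s} {delta:+.2f}%")
```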

Shell Phases

P87 shell workload: 222,825,777 cycles, CPI 2.57
  1. kernel banner to /init: 117,614,359 cycles (52.9%)
  2. /init to shell banner: 1,085,555 cycles (0.5%)
  3. shell banner to first command: 36,231,026 cycles (16.3%)
  4. echo command: 1,598 cycles (0%)
  5. uname -a: 2,544,990 cycles (1.2%)
  6. ls /bin /usr/share: 31,752,029 cycles (14.3%)
  7. cat sample file: 4,837,758 cycles (2.2%)
  8. touch/write/cat/rm /tmp file: 10,838,537 cycles (4.9%)
  9. 8x ash loop with file I/O: 16,336,480 cycles (7.4%)
  10. final marker: 955,380 cycles (0.4%)

phase                           P86 cycles   P87 cycles   delta
shell setup to first command    36,090,719   36,231,026   +0.39%
echo marker                     20,376       1,598        -92.16%
uname -a                        2,512,591    2,544,990    +1.29%
ls /bin /usr/share              34,108,367   31,752,029   -6.91%
cat sample file                 3,033,425    4,837,758    +59.48%
/tmp file create/read/remove    12,040,697   10,838,537   -9.98%
8x ash loop with file I/O       16,637,629   16,336,480   -1.81%

The phase movement is mixed. ls and /tmp work improve (-6.91% and -9.98%), cat gets slower in this single run (+59.48%), and the final marker is noisy. The aggregate shell window is the better number here.
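
One consistency check is worth noting: in the P87 numbers, phases 4 through 10 (echo through the final marker) sum exactly to the 67,266,772 shell window cycles in the result table. The sketch below verifies that. Treating those phases as the definition of the shell window is an inference from the arithmetic, not something the source states.

```python
# P87 phase cycles, copied from the shell phases list (phase number -> cycles).
P87_PHASES = {
    1: 117_614_359,  # kernel banner to /init
    2: 1_085_555,    # /init to shell banner
    3: 36_231_026,   # shell banner to first command
    4: 1_598,        # echo command
    5: 2_544_990,    # uname -a
    6: 31_752_029,   # ls /bin /usr/share
    7: 4_837_758,    # cat sample file
    8: 10_838_537,   # touch/write/cat/rm /tmp file
    9: 16_336_480,   # 8x ash loop with file I/O
    10: 955_380,     # final marker
}

SHELL_WINDOW_CYCLES = 67_266_772  # from the result table

# Assumed window: everything from the first command through the final marker.
window_cycles = sum(c for phase, c in P87_PHASES.items() if phase >= 4)
phase_total = sum(P87_PHASES.values())

# Share of each phase relative to the sum of all listed phases.
phase_share = {p: 100.0 * c / phase_total for p, c in P87_PHASES.items()}

print(window_cycles == SHELL_WINDOW_CYCLES)
print(f"phase 1 share: {phase_share[1]:.1f}%")
```

The listed percentages match shares of the phase sum (222,197,712 cycles) rather than of the 222,825,777 post-load total, so the two totals should not be mixed when comparing phases.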

Cycle Shape

P87 direct UART console workload: 222,825,777 cycles, CPI 2.57
  1. fetch: 8,359,801 cycles (3.8%)
  2. execute: 86,775,673 cycles (38.9%)
  3. mem: 28,211,557 cycles (12.7%)
  4. walker: 4,774,268 cycles (2.1%)
  5. writeback: 86,750,479 cycles (38.9%)
  6. mul/div: 7,952,283 cycles (3.6%)

P87 did not change the core pipeline, so the state chart should look a lot like P86. That is expected. The experiment is about removing a Linux console path, not about reducing page walks or execute/writeback cycles.
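
The headline CPI follows directly from the counters: post-load cycles divided by retired instructions. The sketch below recomputes it for both runs and recomputes the state shares. It also shows that the P87 writeback count equals the retired instruction count (86,750,479), which is consistent with one writeback cycle per retired instruction; that reading is an assumption, since the source does not define the states.

```python
def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions


# Post-load cycles and retired instructions from the result table.
P86_CPI = cpi(223_777_049, 87_361_454)
P87_CPI = cpi(222_825_777, 86_750_479)

TOTAL_CYCLES = 222_825_777
STATE_CYCLES = {
    "fetch": 8_359_801,
    "execute": 86_775_673,
    "mem": 28_211_557,
    "walker": 4_774_268,
    "writeback": 86_750_479,
    "mul/div": 7_952_283,
}

# Share of total cycles spent in each state, in percent.
state_share = {name: 100.0 * c / TOTAL_CYCLES for name, c in STATE_CYCLES.items()}

# Cycles not attributed to any listed state.
unaccounted = TOTAL_CYCLES - sum(STATE_CYCLES.values())

print(f"P86 CPI {P86_CPI:.4f}, P87 CPI {P87_CPI:.4f}")
print(f"unaccounted cycles: {unaccounted}")
```

CPI rises slightly even though total cycles fall, because the instruction count drops by a larger fraction than the cycle count.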

Hot Functions

P87 BusyBox shell symbols: 65,690 samples, one sample every 1,024 cycles
  1. printf_core (busybox): 3,593 samples (5.5%)
  2. memset (kernel): 3,286 samples (5%)
  3. memcpy (busybox): 2,339 samples (3.6%)
  4. vruntime_eligible (kernel): 2,274 samples (3.5%)
  5. blake2s_compress_generic (kernel): 1,806 samples (2.8%)
  6. memcpy (kernel): 1,775 samples (2.7%)
  7. __fwritex (busybox): 1,699 samples (2.6%)
  8. handle_exception (kernel): 1,186 samples (1.8%)
  9. unmap_page_range (kernel): 1,093 samples (1.7%)
  10. avg_vruntime (kernel): 893 samples (1.4%)
  11. n_tty_write (kernel): 847 samples (1.3%)
  12. memset (busybox): 846 samples (1.3%)
  13. ret_from_exception (kernel): 781 samples (1.2%)
  14. next_uptodate_folio (kernel): 684 samples (1%)
  15. do_trap_ecall_u (kernel): 658 samples (1%)
  16. (remaining): 36,532 samples (55.6%)

BusyBox formatting remains the dominant named userspace cost: printf_core, memcpy, and __fwritex are still high. The console driver symbols are much less prominent, which means P87 answered the specific question it asked.
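
Sample counts convert to approximate cycles by multiplying by the sampling period. A minimal sketch, assuming the 1,024-cycle period applies uniformly and that the P86 profile used the same period (the source states the period only for P87). The helper name `samples_to_cycles` is illustrative.

```python
PERIOD_CYCLES = 1_024  # one sample every 1,024 cycles (stated for P87)


def samples_to_cycles(samples, period=PERIOD_CYCLES):
    """Approximate cycles represented by a number of profile samples."""
    return samples * period


# All P87 samples together should cover roughly the shell window.
profiled_cycles = samples_to_cycles(65_690)
SHELL_WINDOW_CYCLES = 67_266_772  # from the result table

# Console-path sample counts from the result table.
P86_CONSOLE = {"n_tty_write": 1_613, "hvc_sbi_tty_put": 264, "sbi_console_putchar": 173}
P87_CONSOLE = {"n_tty_write": 847, "hvc_sbi_tty_put": 0, "sbi_console_putchar": 0}

removed_samples = sum(P86_CONSOLE.values()) - sum(P87_CONSOLE.values())
removed_cycles = samples_to_cycles(removed_samples)

print(profiled_cycles, SHELL_WINDOW_CYCLES - profiled_cycles)
print(removed_samples, removed_cycles)
```

Under these assumptions the three console symbols account for about 1.23 million fewer sampled cycles, the same order of magnitude as the 1,095,173-cycle shell window improvement. The two figures need not match, since sampling is coarse and cost can shift to other symbols.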

Honest Status

check                                              status
Direct UART MMIO output path in console_sh         PASS
BusyBox shell workload runs                        PASS
P86/P87 benchmark comparison staged                PASS
BusyBox-symbolized hot-function profile staged     PASS
LibreLane hardening                                NOT RUN

Next

The next feature round should target a bigger remaining cost: memory latency, exception/syscall overhead, or a more serious terminal device model. P87 says console output mattered, but it was not the whole story.