P87 tested the next obvious idea from the BusyBox symbol profile:
stop sending shell PTY output back through /dev/console.
The bridge still gives ash a real Linux PTY. Host input still enters
through the existing MMIO FIFO. The change is only on output: when
/dev/mem mapping succeeds, console_sh writes bytes directly to
MMIO_UART_DATA.
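The output switch described above can be sketched roughly as follows. This is a minimal illustration, not the actual console_sh code: `UART_PHYS_BASE`, `UART_DATA_OFF`, `struct out_sink`, `sink_init`, and `sink_write` are hypothetical names and values, and falling back to a plain file descriptor (for example one opened on /dev/console) when the mapping fails is an assumption drawn from the "when /dev/mem mapping succeeds" wording.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* HYPOTHETICAL values: the real physical base and register offset come
 * from the platform's memory map, not from this note. */
#define UART_PHYS_BASE 0x10000000UL
#define UART_DATA_OFF  0x0
#define UART_MAP_LEN   4096

struct out_sink {
    volatile uint8_t *uart_data; /* NULL when the /dev/mem mapping failed */
    int fallback_fd;             /* assumed fallback, e.g. /dev/console */
};

/* Try to map the UART data register; on any failure keep the fallback fd. */
static void sink_init(struct out_sink *s, const char *mem_path, int fallback_fd)
{
    s->uart_data = NULL;
    s->fallback_fd = fallback_fd;

    int fd = open(mem_path, O_RDWR | O_SYNC);
    if (fd < 0)
        return;

    void *p = mmap(NULL, UART_MAP_LEN, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, (off_t)UART_PHYS_BASE);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (p != MAP_FAILED)
        s->uart_data = (volatile uint8_t *)p + UART_DATA_OFF;
}

/* Send PTY output: one MMIO store per byte if mapped, else write(2). */
static void sink_write(struct out_sink *s, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (s->uart_data) {
        for (size_t i = 0; i < len; i++)
            *s->uart_data = buf[i];
        return;
    }
    while (len > 0) {
        ssize_t n = write(s->fallback_fd, buf, len);
        if (n <= 0)
            return; /* sketch only: a real bridge would retry on EINTR */
        buf += n;
        len -= (size_t)n;
    }
}
```

In this sketch the bridge would call `sink_init(&s, "/dev/mem", console_fd)` once at startup and then pass every chunk read from the PTY master to `sink_write`. The `volatile` qualifier keeps the compiler from merging or dropping the per-byte stores to the data register.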
The shell smoke test passed. Result:
| metric | P86 | P87 | delta |
|---|---|---|---|
| post-load cycles | 223,777,049 | 222,825,777 | -0.43% |
| shell window cycles | 68,361,945 | 67,266,772 | -1.60% |
| n_tty_write samples | 1,613 | 847 | -47.49% |
| hvc_sbi_tty_put samples | 264 | 0 | -100.00% |
| sbi_console_putchar samples | 173 | 0 | -100.00% |

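The delta column is a plain relative change against P86. A small helper along these lines reproduces it from the raw counts; `pct_delta` is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Negative values mean fewer cycles or samples than the baseline. */
static double pct_delta(double before, double after)
{
    assert(before != 0.0); /* baseline must be non-zero */
    return (after - before) / before * 100.0;
}
```

For example, `pct_delta(68361945.0, 67266772.0)` evaluates to about -1.60, matching the shell window row.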
So the theory was right, but the payoff was bounded. The HVC/SBI output
path was visible overhead: removing it drops hvc_sbi_tty_put and
sbi_console_putchar out of the profile and roughly halves n_tty_write
samples, but it buys only a small cycle improvement (-0.43% post-load,
-1.60% in the shell window). The remaining workload is still dominated
by BusyBox formatting, memory behavior, scheduler/filesystem paths, and
the general cost of running Linux shell machinery on this tiny core.