Project No. 102 of 147 on the ladder

Data-side write buffer with forwarding

Introduces: core-local translated store buffer; store-buffer counters; reproducible BusyBox corruption case

harden state: last run 2026-05-05
signoff:
  • DRC: NOT RUN
  • LVS: NOT RUN
  • antenna: NOT RUN

P102 tried to add the next obvious data-side piece: a one-entry store buffer inside the core, after translation and next to the D-cache.

It did not pass. That is the point of this page.

Result

check                        result
Verilator build              PASS
Linux reaches first printk   PASS
Linux reaches /init          PASS
BusyBox shell prompt         FAIL
Hardened layout              NOT RUN

The failing run reaches BusyBox as PID 1, then panics:

Run /init as init process
init[1]: unhandled signal 11 code 0x1 at 0xffffff88 in busybox[a9d42,10000+117000]
epc : 000b9d42
badaddr: ffffff88 cause: 0000000d
Kernel panic - not syncing: Attempted to kill init

At 0x000b9d42, BusyBox is doing:

lw a4,-120(s0)

The trap dump has s0 = 0, so the bad address is exactly 0 - 120 = 0xffffff88.
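The address arithmetic is easy to check by hand, but a short script makes it explicit. This is a minimal sketch that assumes 32-bit addresses (the dump prints 8-digit hex values) and reads the bracketed fields in the log line as offset, mapping base, and size; that reading is an assumption, although it does agree with the reported epc. The function and variable names are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit address space, as the 8-digit dump values suggest


def effective_address(base: int, offset: int) -> int:
    """Base register plus signed offset, wrapped to 32 bits, as lw computes it."""
    return (base + offset) & MASK32


# lw a4,-120(s0) with s0 = 0
bad_addr = effective_address(0x0, -120)
print(hex(bad_addr))  # 0xffffff88

# busybox[a9d42,10000+117000]: offset into the mapping, then mapping base + size (assumed reading)
fault_pc = 0x10000 + 0xA9D42
print(hex(fault_pc))  # 0xb9d42
```

Both values match the trap dump: badaddr is ffffff88 and epc is 000b9d42.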

Store-Buffer Counters At Failure

counter                      value
post-load cycles to stop     118,576,193
Linux version milestone      628,210
/init milestone              117,349,113
kernel panic milestone       118,576,336
accepted stores              79
drained stores               79
forwards                     0
valid cycles                 79
full-wait cycles             0
order-wait cycles            0
drain stalls                 0

That is a small, crisp failure. Only 79 user-mode stores enter the buffer before BusyBox loses a frame/base register.
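A few derived numbers make the counter table easier to read. The sketch below only does arithmetic on the values reported above; the dictionary keys are illustrative names, not signals or counters from the core.

```python
# Values copied from the counter table; key names are illustrative.
counters = {
    "init_milestone": 117_349_113,
    "panic_milestone": 118_576_336,
    "accepted_stores": 79,
    "drained_stores": 79,
    "forwards": 0,
    "valid_cycles": 79,
}

# Stores counted as accepted but never counted as drained.
in_flight = counters["accepted_stores"] - counters["drained_stores"]

# Cycles between launching /init and the kernel panic.
init_to_panic = counters["panic_milestone"] - counters["init_milestone"]

# Average number of cycles the single entry stayed valid per accepted store.
valid_per_store = counters["valid_cycles"] / counters["accepted_stores"]

print(in_flight)        # 0
print(init_to_panic)    # 1227223
print(valid_per_store)  # 1.0
```

By the counters alone the buffer looks healthy: nothing is left in flight, nothing stalls, and no load ever takes forwarded data. The failure is not visible in these totals.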

What We Tried

The first version let early M-mode stores use the buffer. That broke the stage-0 handoff and parked the core at PC zero.

The second version drained the buffer before privilege-boundary crossings and satp writes, but still allowed translated S-mode kernel stores. Linux reached the kernel mapping, then parked before the first printk.

The third version limited acceptance to aligned SW (word store) instructions issued in translated user mode. Linux booted and launched /init, but BusyBox faulted before the prompt.

The final conservative version drains before the next fetch and invalidates D-cache state instead of publishing buffered store data optimistically. The same BusyBox fault remains.
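The third version's acceptance rule can be written as a single predicate. This is a sketch under stated assumptions, not the core's RTL: the StoreRequest type, its field names, and the buffer_accepts function are hypothetical and exist only to show the three conditions together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreRequest:
    """Hypothetical view of a store as it reaches the buffer."""
    privilege: str    # "M", "S", or "U"
    translated: bool  # True when the address went through translation
    width_bytes: int  # 1 (SB), 2 (SH), or 4 (SW)
    address: int


def buffer_accepts(req: StoreRequest) -> bool:
    """Accept only aligned SW from translated user mode; all else bypasses the buffer."""
    is_word = req.width_bytes == 4
    is_aligned = req.address % 4 == 0
    return req.privilege == "U" and req.translated and is_word and is_aligned


print(buffer_accepts(StoreRequest("U", True, 4, 0x1000)))   # True
print(buffer_accepts(StoreRequest("S", True, 4, 0x1000)))   # False: kernel store
print(buffer_accepts(StoreRequest("M", False, 4, 0x1000)))  # False: early M-mode store
print(buffer_accepts(StoreRequest("U", True, 4, 0x1002)))   # False: misaligned
```

Each rejected case corresponds to a class of store that an earlier version let in, or that the third version deliberately kept out.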

Why This Matters

This is the right kind of failure for the architecture arc. A store buffer is not just a queue. It has to define precise visibility against:

  • later loads
  • fetch and prefetch fast paths
  • timer interrupts
  • synchronous traps
  • ecall and privilege return
  • satp and sfence.vma
  • D-cache fill/update policy

P102 shows that the current core does not yet define that contract cleanly enough. The next step should be a trace rung, not another speculative performance feature.
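The first item on that list, visibility to later loads, is the easiest to show in isolation. The toy model below is a word-granular, one-entry buffer in front of a dictionary standing in for memory. It is a sketch for illustration, not a description of the P102 hardware; the class and method names are hypothetical.

```python
class OneEntryStoreBuffer:
    """Toy one-entry store buffer with store-to-load forwarding."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.entry = None  # (address, data) while a store is buffered, else None

    def store(self, address: int, data: int) -> None:
        # Only one entry: drain the older store before accepting a new one.
        if self.entry is not None:
            self.drain()
        self.entry = (address, data)

    def load(self, address: int) -> int:
        # A later load to the buffered address must see the buffered data.
        if self.entry is not None and self.entry[0] == address:
            return self.entry[1]
        return self.memory.get(address, 0)

    def drain(self) -> None:
        # Publish the buffered store to memory and free the entry.
        if self.entry is not None:
            address, data = self.entry
            self.memory[address] = data
            self.entry = None


memory = {}
sb = OneEntryStoreBuffer(memory)
sb.store(0x100, 7)
print(sb.load(0x100))   # 7, forwarded from the buffer
print(0x100 in memory)  # False, not yet drained
sb.drain()
print(memory[0x100])    # 7
```

Every other item on the list is a rule about when drain must have happened before something else is allowed to proceed.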

Next

P103 did the focused debug rung. It found that P102 cleared the store buffer on mem_ready even when prefetch, not the buffered store, had won the memory grant. The repaired contract is request plus store grant plus ready, and the BusyBox shell smoke passes again.
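The difference between the two clear rules can be reproduced in a few lines. This is a behavioural sketch, assuming a shared memory port where prefetch can win the grant on a cycle in which memory is also ready. Only mem_ready is a name from the text; store_request, store_grant, and the helper functions are hypothetical.

```python
def clear_buggy(store_request: bool, store_grant: bool, mem_ready: bool) -> bool:
    """P102 behaviour as described: clear on mem_ready alone."""
    return mem_ready


def clear_repaired(store_request: bool, store_grant: bool, mem_ready: bool) -> bool:
    """Repaired contract: request plus store grant plus ready."""
    return store_request and store_grant and mem_ready


def run(clear_rule) -> dict:
    memory = {}
    entry = (0x80, 0xABCD)  # one buffered store waiting to drain
    # One (store_grant, mem_ready) pair per cycle; prefetch wins the first grant.
    for store_grant, mem_ready in [(False, True), (True, True)]:
        store_request = entry is not None
        if store_request and store_grant and mem_ready:
            memory[entry[0]] = entry[1]  # the store really reaches memory
        if entry is not None and clear_rule(store_request, store_grant, mem_ready):
            entry = None  # the buffer forgets the store
    return memory


print(run(clear_buggy))     # {}: the store was dropped
print(run(clear_repaired))  # {128: 43981}: the store drained on the second cycle
```

With the buggy rule the entry is cleared on the cycle prefetch owns the port, so the store never reaches memory.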