P102 tried to add the next obvious data-side piece: a one-entry store buffer inside the core, after translation and next to the D-cache.
It did not pass. That is the point of this page.
Result
| check | result |
|---|---|
| Verilator build | PASS |
| Linux reaches first printk | PASS |
| Linux reaches /init | PASS |
| BusyBox shell prompt | FAIL |
| Hardened layout | NOT RUN |
The failing run reaches BusyBox as PID 1; init then takes signal 11 (SIGSEGV) and the kernel panics:
Run /init as init process
init[1]: unhandled signal 11 code 0x1 at 0xffffff88 in busybox[a9d42,10000+117000]
epc : 000b9d42
badaddr: ffffff88 cause: 0000000d
Kernel panic - not syncing: Attempted to kill init
At 0x000b9d42, BusyBox is doing:
lw a4,-120(s0)
The trap dump has s0 = 0, so the bad address is exactly
0 - 120 = 0xffffff88.
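
The address arithmetic can be checked with a minimal sketch. The helper name `effective_address` is made up for illustration, and the reading of the bracketed `busybox[...]` field as offset, base + size is an assumption about the kernel's log format.

```python
MASK32 = 0xFFFFFFFF  # RV32 addresses wrap at 32 bits


def effective_address(base: int, offset: int) -> int:
    """Load/store address: base register plus signed offset, wrapped to 32 bits."""
    return (base + offset) & MASK32


# lw a4,-120(s0) with s0 = 0, as in the trap dump.
bad_addr = effective_address(0x00000000, -120)
print(hex(bad_addr))  # 0xffffff88

# Assumed reading of busybox[a9d42,10000+117000]: offset 0xa9d42 into a
# mapping that starts at 0x10000. Base plus offset matches the reported epc.
print(hex(0x10000 + 0xA9D42))  # 0xb9d42
```

The cause value 0xd is 13, which the RISC-V privileged spec assigns to a load page fault, consistent with a load from an unmapped address near the top of the 32-bit space.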
Store-Buffer Counters At Failure
| counter | value |
|---|---|
| post-load cycles to stop | 118,576,193 |
| Linux version milestone | 628,210 |
| /init milestone | 117,349,113 |
| kernel panic milestone | 118,576,336 |
| accepted stores | 79 |
| drained stores | 79 |
| forwards | 0 |
| valid cycles | 79 |
| full-wait cycles | 0 |
| order-wait cycles | 0 |
| drain stalls | 0 |
That is a small, crisp failure. Only 79 user-mode stores enter the buffer before BusyBox loses a frame/base register.
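
A few derived numbers make the table easier to read. This is only arithmetic on the values above; the dictionary keys and the `summarize` helper are hypothetical names, not counters exported by the core.

```python
# Values copied from the counter table above.
counters = {
    "init_milestone": 117_349_113,
    "panic_milestone": 118_576_336,
    "accepted_stores": 79,
    "drained_stores": 79,
    "valid_cycles": 79,
    "full_wait_cycles": 0,
    "order_wait_cycles": 0,
    "drain_stalls": 0,
}


def summarize(c: dict) -> dict:
    """Derive a few sanity figures from the raw counters."""
    waits = ("full_wait_cycles", "order_wait_cycles", "drain_stalls")
    return {
        "cycles_init_to_panic": c["panic_milestone"] - c["init_milestone"],
        "stores_still_buffered": c["accepted_stores"] - c["drained_stores"],
        "valid_cycles_per_store": c["valid_cycles"] / c["accepted_stores"],
        "any_backpressure": any(c[k] for k in waits),
    }


print(summarize(counters))
```

By these counters every accepted store is also counted as drained, each entry is valid for one cycle on average, and there is no back-pressure. The counters alone do not show a stall or a stuck entry, which suggests a correctness problem rather than a throughput one.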
What We Tried
The first version let early M-mode stores use the buffer. That broke the stage-0 handoff and parked the core at PC zero.
The second version drained before privilege boundaries and satp writes, but
still allowed translated S-mode kernel stores. Linux reached the kernel
mapping, then parked before the first printk.
The third version limited acceptance to translated user-mode aligned
SW. Linux booted and launched /init, but BusyBox faulted before the
prompt.
The final conservative version drains before the next fetch and invalidates D-cache state instead of publishing buffered store data optimistically. The same BusyBox fault remains.
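
The third version's acceptance rule can be written as a single predicate. This is a behavioural sketch, not the core's RTL: `Priv`, `accepts_store`, and the argument names are hypothetical. Only the conditions (user mode, translated, aligned, SW) come from the description above.

```python
from enum import Enum


class Priv(Enum):
    U = 0
    S = 1
    M = 3


FUNCT3_SW = 0b010  # RISC-V store-word encoding


def accepts_store(priv: Priv, translated: bool, funct3: int, addr: int) -> bool:
    """True if the store may enter the buffer under the third version's rule."""
    return (
        priv is Priv.U          # user mode only
        and translated          # address went through translation
        and funct3 == FUNCT3_SW # word stores only
        and addr % 4 == 0       # naturally aligned
    )


print(accepts_store(Priv.U, True, FUNCT3_SW, 0x0001_2340))  # True
print(accepts_store(Priv.S, True, FUNCT3_SW, 0x0001_2340))  # False
```

Anything the predicate rejects would take the existing unbuffered store path, so each narrowing step removes one class of store from suspicion.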
Why This Matters
This is the right kind of failure for the architecture arc. A store buffer is not just a queue. It has to define precise visibility against:
- later loads
- fetch and prefetch fast paths
- timer interrupts
- synchronous traps
- ecall and privilege return
- satp and sfence.vma
- D-cache fill/update policy
P102 shows that the current core does not yet define that contract cleanly enough. The next step should be a trace rung, not another speculative performance feature.
Next
P103 did the focused debug rung.
It found that P102 cleared the store buffer on mem_ready even when
prefetch, not the buffered store, had won the memory grant. The repaired
contract is request plus store grant plus ready, and the BusyBox shell
smoke passes again.
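
The difference between the two clear conditions can be shown with a small cycle-level model. It is a sketch under assumptions, not the core's RTL: only `mem_ready` is named in the text above, while `store_grant`, `StoreBuffer`, `step`, and `run` are hypothetical names, and the memory port is reduced to a dictionary.

```python
from dataclasses import dataclass


@dataclass
class StoreBuffer:
    valid: bool = False
    addr: int = 0
    data: int = 0


def step(buf, memory, *, store_grant, mem_ready, repaired):
    """Advance one cycle, mutating buf and memory in place."""
    store_req = buf.valid
    # The store really completes only if it requested, won the grant,
    # and memory reported ready in the same cycle.
    store_done = store_req and store_grant and mem_ready
    if store_done:
        memory[buf.addr] = buf.data
    # Buggy rule: clear on ready alone. Repaired rule: request + grant + ready.
    clear = store_done if repaired else (store_req and mem_ready)
    if clear:
        buf.valid = False


def run(repaired):
    memory = {}
    buf = StoreBuffer(valid=True, addr=0x1000, data=0x1234)
    # Cycle 1: prefetch wins the grant; ready belongs to the prefetch.
    step(buf, memory, store_grant=False, mem_ready=True, repaired=repaired)
    # Cycle 2: the store wins the grant, if it is still pending.
    step(buf, memory, store_grant=True, mem_ready=True, repaired=repaired)
    return memory.get(0x1000)


print(run(repaired=False))  # None: the buffered store was silently dropped
print(run(repaired=True))   # 4660, i.e. 0x1234 reached memory
```

With the buggy rule the entry is cleared in the cycle where prefetch owns the grant, so the store never reaches memory and a later load of that address sees stale data.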