Project No. 45 of 147 on the ladder

A extension (atomics)

Introduces: lr.w/sc.w plus 9 AMO ops on a single-hart reservation register; first Linux-path rung past the FreeRTOS milestone, hardened with 0 violations across all corners.

harden state: last run 2026-05-03
area: 29,454 stdcells
signoff
  • DRC: PASS
  • LVS: PASS
  • antenna: PASS

P45 is the first rung on the Linux climb after the FreeRTOS milestone. It adds the RISC-V A extension - lr.w/sc.w for atomic spinlocks, plus all nine amo*.w read-modify-write ops.

Status: Hardened. Directed probe (18 sub-checks) all PASS; FreeRTOS demo runs cleanly. Arch-test sweep: PASS=56 FAIL=0 NOT RUN=0 across rv32i I/M/Zicsr/Zifencei/Zicntr; the upstream framework splits A into Zaamo + Zalrsc, which are run separately. LibreLane harden completed with 0 violations across DRC/LVS/antenna/setup/hold/fanout at 29,454 stdcells (+1,375 over P43).

Why A first

Linux uses LR/SC pervasively for spinlocks - it’s a hard requirement for kernel boot. The A extension is also useful before Linux: FreeRTOS gets proper atomic critical sections instead of MIE-disable hacks, libgcc’s __atomic_* builtins use real hardware ops, and any future SMP work will need this foundation.
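To make the spinlock point concrete, here is a minimal sketch of a test-and-set lock written with C11 atomics. The type and function names (demo_spinlock_t, demo_spin_lock, and so on) are hypothetical and do not come from the project sources. With -march including the A extension, a RISC-V compiler would typically lower these calls to AMO or LR/SC instructions rather than a library fallback.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not from the P45 sources. */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns 1 if the lock was acquired, 0 if it was already held. */
static inline int demo_spin_trylock(demo_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin until the previous value of the flag was clear. */
static inline void demo_spin_lock(demo_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        /* busy-wait until the holder releases the lock */
    }
}

/* Release: clear the flag with release ordering. */
static inline void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

On a single-hart core the spin loop only matters when an interrupt handler can hold the lock, but the same code carries over unchanged to any later multi-hart configuration.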

A also happens to be the cheapest of the three Phase-1 ISA-breadth rungs (A, C, Zba+Zbb). Single-hart reservation logic is just one address register plus a valid bit; AMO ops reuse the existing memory FSM with two new states.

What’s in the RTL

Three pieces in src/top.sv:

  1. Decode - opcode 0101111, funct5 dispatch over 11 ops, plus the alu/address routing extended to include AMO. The aq/rl ordering bits at instr[26:25] are intentionally ignored on a single-hart in-order core.

  2. Reservation register - amo_reservation_{addr,valid}. LR.W records the address and sets valid; SC.W checks-and-clears with conservative invalidation: any plain store or trap entry also clears the reservation.

  3. AMO FSM - new S_AMO_LOAD and S_AMO_STORE states implement read-modify-write. The modify is a combinational amo_compute() function that handles all nine AMO ops including signed/unsigned min/max. AMO writes the OLD memory value to rd; the NEW value goes back to memory.
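The read-modify-write behaviour in item 3 can be sketched as a small software reference model. This is C, not the SystemVerilog in src/top.sv, and the names amo_op_t, amo_compute_model and amo_execute_model are made up for illustration; only the semantics (old value to rd, new value to memory, signed versus unsigned min/max) follow the description above.

```c
#include <stdint.h>

/* Hypothetical op enumeration; the RTL dispatches on funct5 instead. */
typedef enum {
    AMO_SWAP, AMO_ADD, AMO_AND, AMO_OR, AMO_XOR,
    AMO_MIN, AMO_MAX, AMO_MINU, AMO_MAXU
} amo_op_t;

/* Pure "modify" step: old memory value and rs2 in, new memory value out. */
static uint32_t amo_compute_model(amo_op_t op, uint32_t old, uint32_t rs2)
{
    switch (op) {
    case AMO_SWAP: return rs2;
    case AMO_ADD:  return old + rs2;
    case AMO_AND:  return old & rs2;
    case AMO_OR:   return old | rs2;
    case AMO_XOR:  return old ^ rs2;
    /* signed comparisons reinterpret the same 32 bits as int32_t */
    case AMO_MIN:  return ((int32_t)old < (int32_t)rs2) ? old : rs2;
    case AMO_MAX:  return ((int32_t)old > (int32_t)rs2) ? old : rs2;
    /* unsigned comparisons use the raw bit patterns */
    case AMO_MINU: return (old < rs2) ? old : rs2;
    case AMO_MAXU: return (old > rs2) ? old : rs2;
    }
    return old; /* unreachable for valid ops */
}

/* Full read-modify-write: returns the OLD value (what rd receives)
 * and writes the NEW value back to the memory word. */
static uint32_t amo_execute_model(amo_op_t op, uint32_t *mem, uint32_t rs2)
{
    uint32_t old = *mem;                     /* load phase  */
    *mem = amo_compute_model(op, old, rs2);  /* store phase */
    return old;
}
```

A model like this is handy as a golden reference in a testbench: 0xFFFFFFFF compared against 1 gives different winners for AMOMIN and AMOMINU, which is exactly the signed/unsigned distinction the probe has to exercise.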

Plus a misa bump (A-bit on) and runtime -march=rv32ima_zicsr_zifencei.
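For readers who want to see the decode fields and the misa bit spelled out, the following C sketch extracts them from a 32-bit instruction word. The funct5 values, the aq/rl bit positions and misa bit 0 for 'A' follow the RISC-V specifications; the struct and function names (amo_decode_t, decode_a_ext, misa_has_a) are hypothetical and are not the signal names used in src/top.sv.

```c
#include <stdint.h>
#include <stdbool.h>

#define OPCODE_AMO  0x2Fu      /* 0101111 */
#define FUNCT3_WORD 0x2u       /* .w width */
#define MISA_A_BIT  (1u << 0)  /* misa bit 0 is 'A' */

/* funct5 values for the 11 RV32A operations */
enum {
    F5_AMOADD = 0x00, F5_AMOSWAP = 0x01, F5_LR = 0x02, F5_SC = 0x03,
    F5_AMOXOR = 0x04, F5_AMOOR = 0x08, F5_AMOAND = 0x0C,
    F5_AMOMIN = 0x10, F5_AMOMAX = 0x14, F5_AMOMINU = 0x18, F5_AMOMAXU = 0x1C
};

/* Hypothetical decode record for illustration. */
typedef struct {
    bool     valid;   /* opcode, funct3 and funct5 all match an A-extension op */
    uint32_t funct5;
    bool     aq, rl;  /* ordering bits; ignored by a single-hart in-order core */
    uint32_t rd, rs1, rs2;
} amo_decode_t;

static bool is_known_funct5(uint32_t f5)
{
    switch (f5) {
    case F5_AMOADD: case F5_AMOSWAP: case F5_LR: case F5_SC:
    case F5_AMOXOR: case F5_AMOOR: case F5_AMOAND:
    case F5_AMOMIN: case F5_AMOMAX: case F5_AMOMINU: case F5_AMOMAXU:
        return true;
    default:
        return false;
    }
}

static amo_decode_t decode_a_ext(uint32_t instr)
{
    amo_decode_t d;
    uint32_t opcode = instr & 0x7Fu;
    uint32_t funct3 = (instr >> 12) & 0x7u;

    d.funct5 = (instr >> 27) & 0x1Fu;
    d.aq     = ((instr >> 26) & 1u) != 0;
    d.rl     = ((instr >> 25) & 1u) != 0;
    d.rs2    = (instr >> 20) & 0x1Fu;
    d.rs1    = (instr >> 15) & 0x1Fu;
    d.rd     = (instr >> 7) & 0x1Fu;
    d.valid  = opcode == OPCODE_AMO && funct3 == FUNCT3_WORD &&
               is_known_funct5(d.funct5);
    return d;
}

/* True when the misa value advertises the A extension. */
static bool misa_has_a(uint32_t misa)
{
    return (misa & MISA_A_BIT) != 0;
}
```

Because aq and rl are decoded but never acted on, an instruction with either bit set behaves identically to the plain form on this core, which matches the single-hart simplification described above.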

Two RTL bugs found during bring-up

The directed probe surfaced two real bugs:

  1. AMO address was zero. The ALU’s effective-address path didn’t include is_amo, so the AMO operations indexed memory at address 0 instead of rs1. The first probe halted with error 40 (AMOSWAP returned the wrong OLD value). Fix: add is_amo to the load/store branch in the alu_y switch, and zero alu_b for AMO.
  2. DUT plugin clobbered x5 with t0. RISC-V’s t0 IS x5 - so the macro’s li x5, 1; li t0, MMIO_HALT overwrote the test’s pass code. First sweep iteration showed all tests halting with x5 = 0x10001ff8. Fix: use t3/t4 as scratch and order the li x5 after the MMIO_HALT write.

The first is a forgot-to-update-the-ALU bug. The second is a register-aliasing bite that’s specific to the legacy x5-as-halt convention. Both are already documented in the probe in app/main.c and in the DUT plugin comments.
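The aliasing bug is easy to reproduce in a toy register-file model. The sketch below is an illustration in C, not the plugin's actual macro: the function names are hypothetical, the MMIO_HALT address is assumed to be the 0x10001ff8 value observed in x5 during the failing sweep, and the store to the halt port itself is not modelled.

```c
#include <stdint.h>

/* RISC-V ABI names are aliases for x-registers: t0 is x5, t3 is x28. */
enum { X5 = 5, T0 = 5, T3 = 28 };

/* Assumed value: taken from the x5 contents seen in the failing sweep. */
#define MMIO_HALT_ADDR 0x10001ff8u

/* Buggy order: the scratch register is the same register as the pass code. */
static uint32_t halt_sequence_buggy(void)
{
    uint32_t x[32] = {0};
    x[X5] = 1;               /* li x5, 1          (pass code)        */
    x[T0] = MMIO_HALT_ADDR;  /* li t0, MMIO_HALT  (overwrites x5)    */
    return x[X5];            /* what the testbench reads back        */
}

/* Fixed order: scratch lives in t3, and x5 is written last. */
static uint32_t halt_sequence_fixed(void)
{
    uint32_t x[32] = {0};
    x[T3] = MMIO_HALT_ADDR;  /* li t3, MMIO_HALT  (no alias with x5) */
    x[X5] = 1;               /* li x5, 1          (pass code kept)   */
    return x[X5];
}
```

In this model halt_sequence_buggy() returns the halt address instead of the pass code, which is the symptom the first sweep showed, while halt_sequence_fixed() returns 1.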

Directed probe

The probe runs single-threaded before the FreeRTOS scheduler starts:

AMOSWAP - swap a memory cell with rs2
AMOADD  - fetch-and-add
AMOAND  - bitwise mask
AMOOR   - bitwise set
AMOXOR  - bitwise toggle
AMOMIN  - signed min
AMOMAX  - signed max
AMOMINU - unsigned min
AMOMAXU - unsigned max
LR/SC   - happy path: read, then SC with same address (success=0)
LR/SC   - SC without prior LR: must fail
LR/SC   - SC after intervening plain store: must fail (single-hart
          conservative invalidation)

18 sub-checks total. Any failure halts the chip with a stable per-check error code (40-58) so the testbench’s halt_code surfaces exactly which check misbehaved.
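The three LR/SC cases in the probe list map directly onto the reservation rules from the RTL section. Below is a minimal C model of those rules, offered as a sketch rather than a transcription of the hardware: the hart_model_t type and model_* functions are hypothetical, the 16-word memory is arbitrary, and the address comparison in SC is an assumption based on LR.W recording the address.

```c
#include <stdint.h>
#include <stdbool.h>

#define MEM_WORDS 16u

/* Hypothetical single-hart model: a tiny memory plus one reservation. */
typedef struct {
    uint32_t mem[MEM_WORDS];
    uint32_t res_addr;
    bool     res_valid;
} hart_model_t;

static uint32_t *word_at(hart_model_t *h, uint32_t addr)
{
    return &h->mem[(addr / 4u) % MEM_WORDS];
}

/* LR.W: record the address, set valid, return the loaded word. */
static uint32_t model_lr_w(hart_model_t *h, uint32_t addr)
{
    h->res_addr  = addr;
    h->res_valid = true;
    return *word_at(h, addr);
}

/* SC.W: check-and-clear. Returns 0 on success, 1 on failure (the rd value). */
static uint32_t model_sc_w(hart_model_t *h, uint32_t addr, uint32_t value)
{
    bool ok = h->res_valid && h->res_addr == addr;
    h->res_valid = false;          /* the reservation is consumed either way */
    if (ok)
        *word_at(h, addr) = value;
    return ok ? 0u : 1u;
}

/* Plain store: conservative invalidation, regardless of the address. */
static void model_store_w(hart_model_t *h, uint32_t addr, uint32_t value)
{
    *word_at(h, addr) = value;
    h->res_valid = false;
}

/* Trap entry also drops the reservation. */
static void model_trap_entry(hart_model_t *h)
{
    h->res_valid = false;
}
```

Running the happy path, an SC with no prior LR, and an SC after an intervening store through this model gives success, failure and failure respectively, mirroring the expected probe outcomes.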

arch-test sweep

P45’s sweep covers the standard five batches that P39 ran - rv32i/I (39 PASS), rv32i/M (8), rv32i/Zicsr (6), rv32i/Zifencei (1), rv32i/Zicntr (2). Aggregate PASS=56 FAIL=0 NOT RUN=0.

The DUT plugin’s halt sequence was updated to write MMIO_HALT (P43’s halt port) in addition to the legacy tohost + x5=1 convention. The old jal x0, 0 sentinel has been gone since P43, and the framework’s act-generated ELFs need a working halt path on the new chip.

The upstream framework at rev a7c9930 does not expose a combined rv32i/A directory the way previous extensions did; A is split into rv32i/Zaamo (9 amo ops) and rv32i/Zalrsc (lr.w + sc.w). Both batches run via the same plugin; results land in compliance/results_zaamo and compliance/results_zalrsc once they complete. Until then, the directed probe (18 sub-checks) provides the strongest A-extension correctness evidence.

Harden

LibreLane harden ran clean on the first attempt:

Metric                                    Value
stdcells                                  29,454 (+1,375 vs P43)
setup_ws (max_ss_100C_1v60)               5.306 ns
hold_ws (max_ss_100C_1v60)                0.097 ns
DRC / LVS / antenna                       PASS / PASS / PASS
max_slew / max_cap / fanout violations    0 / 0 / 0 (all corners)

Slow-corner DRV is slightly worse than P43 (max_ss reports a handful of slew/cap warnings at the cell-flagging threshold) but nothing that violates the design rules. The cell delta is in the expected ~1,400-cell band: 32-bit reservation register, two 32-bit AMO data-path flops, a 5-bit funct5 latch, plus decode and ALU extensions.

What just happened?

The chip now supports atomics. That’s a Linux requirement off the list, FreeRTOS gets cleaner critical sections, and the foundation exists for any future multi-hart story.

The next rung as built is P46 - Zba + Zbb-essentials bitmanip. The C extension originally planned as P46 was bumped because it needs a multi-hour fetch-front-end rewrite, while Zba+Zbb is a one-file additive ALU extension.