P45 is the first rung on the Linux climb after the FreeRTOS
milestone. It adds the RISC-V A extension - lr.w/sc.w for atomic
spinlocks, plus all nine amo*.w read-modify-write ops.
Status: Hardened. Directed probe (18 sub-checks) all PASS; FreeRTOS demo runs cleanly. Arch-test sweep:
PASS=56 FAIL=0 NOT RUN=0 across rv32i I/M/Zicsr/Zifencei/Zicntr; the upstream framework splits A as Zaamo + Zalrsc (run separately). LibreLane harden completed with 0 violations across DRC/LVS/antenna/setup/hold/fanout at 29,454 stdcells (+1,375 over P43).
Why A first
Linux uses LR/SC pervasively for spinlocks - it’s a hard requirement
for kernel boot. The A extension is also useful before Linux:
FreeRTOS gets proper atomic critical sections instead of
MIE-disable hacks, libgcc’s __atomic_* builtins use real hardware
ops, and any future SMP work will need this foundation.
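To make the "real hardware ops" point concrete, here is a minimal sketch of the kind of C11 code that benefits. The names demo_spinlock_t, demo_lock, demo_trylock, demo_unlock and counter_next are hypothetical and do not come from the P45 sources. A compiler targeting rv32ima would be expected to lower the 32-bit exchange and fetch-add to A-extension instructions rather than library calls, but that depends on the toolchain and flags, so check the disassembly before relying on it.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical spinlock: 0 = free, 1 = held. */
typedef struct { atomic_uint locked; } demo_spinlock_t;

/* Try once: returns 1 if we took the lock, 0 if it was already held. */
static inline int demo_trylock(demo_spinlock_t *l) {
    /* atomic_exchange returns the previous value of the cell. */
    return atomic_exchange_explicit(&l->locked, 1u, memory_order_acquire) == 0u;
}

/* Spin until the previous value we swapped out was 0 (free). */
static inline void demo_lock(demo_spinlock_t *l) {
    while (!demo_trylock(l)) {
        /* busy-wait */
    }
}

static inline void demo_unlock(demo_spinlock_t *l) {
    atomic_store_explicit(&l->locked, 0u, memory_order_release);
}

/* Fetch-and-add counter: returns the value before the increment. */
static inline uint32_t counter_next(atomic_uint *c) {
    return atomic_fetch_add_explicit(c, 1u, memory_order_relaxed);
}
```

The acquire/release orderings are what a lock normally wants; on a single-hart in-order core they cost nothing extra, but they keep the code correct if it is ever run on a multi-hart system.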
A also happens to be the cheapest of the three Phase-1 ISA-breadth rungs (A, C, Zba+Zbb). Single-hart reservation logic is just one register; AMO ops reuse the existing memory FSM with two new states.
What’s in the RTL
Three pieces in src/top.sv:
- Decode - opcode 0101111, funct5 dispatch over 11 ops, plus the alu/address routing extended to include AMO. The aq/rl ordering bits at instr[26:25] are intentionally ignored on a single-hart in-order core.
- Reservation register - amo_reservation_{addr,valid}. LR.W records the address and sets valid; SC.W checks-and-clears with conservative invalidation: any plain store or trap entry also clears the reservation.
- AMO FSM - new S_AMO_LOAD and S_AMO_STORE states implement read-modify-write. The modify is a combinational amo_compute() function that handles all nine AMO ops including signed/unsigned min/max. AMO writes the OLD memory value to rd; the NEW value goes back to memory.
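The decode and modify steps can be sketched as a small C reference model. This is not the RTL: it is a software stand-in, useful for cross-checking expected values in a testbench. The funct5 values and the 0101111 opcode follow the RISC-V A-extension encoding; the helper names is_amo_opcode, funct5_of and amo_execute are made up for this sketch, and the default branch in amo_compute is a modelling choice rather than anything the RTL is known to do.

```c
#include <stdint.h>

/* funct5 (instr[31:27]) for the nine AMO ops; LR.W is 0x02 and SC.W is 0x03. */
enum amo_funct5 {
    AMO_ADD  = 0x00, AMO_SWAP = 0x01, AMO_XOR  = 0x04,
    AMO_OR   = 0x08, AMO_AND  = 0x0C, AMO_MIN  = 0x10,
    AMO_MAX  = 0x14, AMO_MINU = 0x18, AMO_MAXU = 0x1C
};

/* The A extension shares one major opcode: 0101111 (0x2F). */
static inline int is_amo_opcode(uint32_t instr) { return (instr & 0x7Fu) == 0x2Fu; }
static inline uint32_t funct5_of(uint32_t instr) { return instr >> 27; }

/* Software model of the combinational modify step: NEW = f(OLD, rs2). */
static inline uint32_t amo_compute(uint32_t funct5, uint32_t old, uint32_t rs2) {
    switch (funct5) {
    case AMO_SWAP: return rs2;
    case AMO_ADD:  return old + rs2;
    case AMO_XOR:  return old ^ rs2;
    case AMO_OR:   return old | rs2;
    case AMO_AND:  return old & rs2;
    case AMO_MIN:  return ((int32_t)old < (int32_t)rs2) ? old : rs2; /* signed */
    case AMO_MAX:  return ((int32_t)old > (int32_t)rs2) ? old : rs2; /* signed */
    case AMO_MINU: return (old < rs2) ? old : rs2;                   /* unsigned */
    case AMO_MAXU: return (old > rs2) ? old : rs2;                   /* unsigned */
    default:       return old; /* not an AMO: leave the cell unchanged (model choice) */
    }
}

/* One full AMO: read OLD, write NEW back, return OLD for rd. */
static inline uint32_t amo_execute(uint32_t *mem_word, uint32_t funct5, uint32_t rs2) {
    uint32_t old = *mem_word;                  /* corresponds to S_AMO_LOAD  */
    *mem_word = amo_compute(funct5, old, rs2); /* corresponds to S_AMO_STORE */
    return old;
}
```

The signed/unsigned split is the part most worth checking: with 0xFFFFFFFF in memory and rs2 = 1, AMOMIN keeps 0xFFFFFFFF (it is -1 when read as signed), while AMOMINU writes 1.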
Plus a misa bump (A-bit on) and runtime -march=rv32ima_zicsr_zifencei.
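For readers who have not decoded misa by hand: extension bits are indexed by letter, so A is bit 0. A small sketch of the check follows. MISA_RV32IMA_EXAMPLE is an illustrative value built from the privileged-spec layout (MXL = 1 for RV32 in bits [31:30], plus I, M and A); it is an assumption for the example, not a value read from the P45 chip.

```c
#include <stdint.h>

/* misa extension bits are indexed by letter: 'A' = bit 0, 'I' = bit 8, 'M' = bit 12. */
static inline uint32_t misa_ext_bit(char letter) {
    return 1u << (uint32_t)(letter - 'A');
}

static inline int misa_has(uint32_t misa, char letter) {
    return (misa & misa_ext_bit(letter)) != 0u;
}

/* Illustrative RV32 value with I, M and A set (assumed, not taken from the RTL). */
#define MISA_RV32IMA_EXAMPLE (0x40000000u | (1u << 8) | (1u << 12) | (1u << 0))
```

Software that wants to pick an atomic implementation at run time could call misa_has(value, 'A') on the CSR contents it read.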
Two RTL bugs found during bring-up
The directed probe surfaced two real bugs:
- AMO address was zero. The ALU’s effective-address path didn’t include is_amo, so the AMO operations indexed memory at address 0 instead of rs1. The first probe halted with error 40 (AMOSWAP returned the wrong OLD value). Fix: add is_amo to the load/store branch in the alu_y switch, and zero alu_b for AMO.
- DUT plugin clobbered x5 with t0. RISC-V’s t0 is x5, so the macro’s li x5, 1; li t0, MMIO_HALT overwrote the test’s pass code. First sweep iteration showed all tests halting with x5 = 0x10001ff8. Fix: use t3/t4 as scratch and order the li x5 after the MMIO_HALT write.
The first is a forgot-to-update-the-ALU bug. The second is a
register-aliasing bite that’s specific to the legacy x5-as-halt
convention. Both are already documented in app/main.c’s probe and the
DUT plugin comments.
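The first bug is easy to reproduce in a software model. The sketch below is an assumption-laden stand-in for the alu_y selection, not the RTL itself: mem_op_t, eff_addr_buggy and eff_addr_fixed are hypothetical names, and the "falls through to 0" default is inferred from the observed symptom. The reason alu_b has to be zeroed is that AMO instructions carry no immediate: the bits where an I-type immediate would sit hold funct5, aq, rl and rs2 instead.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     is_load, is_store, is_amo;
    uint32_t rs1_val; /* base address from rs1 */
    int32_t  imm;     /* decoded immediate; meaningless for AMO */
} mem_op_t;

/* Before the fix: AMO is not treated as a memory op, so the address is 0. */
static inline uint32_t eff_addr_buggy(const mem_op_t *op) {
    if (op->is_load || op->is_store)
        return op->rs1_val + (uint32_t)op->imm;
    return 0u;
}

/* After the fix: AMO joins the load/store branch with the second operand forced to 0. */
static inline uint32_t eff_addr_fixed(const mem_op_t *op) {
    if (op->is_load || op->is_store || op->is_amo) {
        uint32_t b = op->is_amo ? 0u : (uint32_t)op->imm;
        return op->rs1_val + b;
    }
    return 0u;
}
```

With is_amo set and rs1_val = 0x20000010, the buggy version yields 0 and the fixed version yields 0x20000010, regardless of what the immediate field happened to decode to.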
Directed probe
The probe runs single-threaded before the FreeRTOS scheduler starts:
AMOSWAP - swap a memory cell with rs2
AMOADD - fetch-and-add
AMOAND - bitwise mask
AMOOR - bitwise set
AMOXOR - bitwise toggle
AMOMIN - signed min
AMOMAX - signed max
AMOMINU - unsigned min
AMOMAXU - unsigned max
LR/SC - happy path: read, then SC with same address (success=0)
LR/SC - SC without prior LR: must fail
LR/SC - SC after intervening plain store: must fail (single-hart
conservative invalidation)
18 sub-checks total. Any failure halts the chip with a stable per-check error code (40-58) so the testbench’s halt_code surfaces exactly which check misbehaved.
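The three LR/SC checks can be mirrored on a host with a small model of the reservation register. This is a sketch under stated assumptions: SC.W is taken to succeed only when the reservation is valid and the address matches, and the return values 1 to 3 are placeholder failure codes for this example, not the chip's real 40-58 codes. The names reservation_t, lr_w, sc_w, store_w, trap_entry and lrsc_probe are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;
    bool     valid;
} reservation_t;

/* LR.W: record the address, set valid, return the loaded word. */
static inline uint32_t lr_w(reservation_t *r, const uint32_t *mem, uint32_t addr) {
    r->addr = addr;
    r->valid = true;
    return mem[addr / 4u];
}

/* SC.W: returns 0 on success, 1 on failure; the reservation is always cleared. */
static inline uint32_t sc_w(reservation_t *r, uint32_t *mem, uint32_t addr, uint32_t val) {
    bool ok = r->valid && r->addr == addr;
    r->valid = false;
    if (ok)
        mem[addr / 4u] = val;
    return ok ? 0u : 1u;
}

/* Conservative invalidation: any plain store clears the reservation. */
static inline void store_w(reservation_t *r, uint32_t *mem, uint32_t addr, uint32_t val) {
    mem[addr / 4u] = val;
    r->valid = false;
}

/* Trap entry clears it too. */
static inline void trap_entry(reservation_t *r) { r->valid = false; }

/* Returns 0 if all three checks pass, else a placeholder code for the first failure. */
static inline int lrsc_probe(void) {
    uint32_t mem[2] = { 7u, 0u };
    reservation_t r = { 0u, false };

    (void)lr_w(&r, mem, 0u);
    if (sc_w(&r, mem, 0u, 8u) != 0u || mem[0] != 8u) return 1; /* happy path */
    if (sc_w(&r, mem, 0u, 9u) == 0u || mem[0] != 8u) return 2; /* SC without LR */
    (void)lr_w(&r, mem, 0u);
    store_w(&r, mem, 4u, 1u);                                  /* different address */
    if (sc_w(&r, mem, 0u, 9u) == 0u || mem[0] != 8u) return 3; /* intervening store */
    return 0;
}
```

Returning the first failing code follows the same idea as the on-chip probe: one stable number per check, so a failure points at a specific behaviour rather than at "atomics are broken".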
arch-test sweep
P45’s sweep covers the standard five batches that P39 ran -
rv32i/I (39 PASS), rv32i/M (8), rv32i/Zicsr (6),
rv32i/Zifencei (1), rv32i/Zicntr (2). Aggregate
PASS=56 FAIL=0 NOT RUN=0.
The DUT plugin’s halt sequence was updated to write
MMIO_HALT (P43’s halt port) in addition to the legacy tohost +
x5=1 convention: the old jal x0, 0 sentinel has been gone since P43,
and the framework’s act-generated ELFs need a working halt path on
the new chip.
The upstream framework at rev a7c9930 does not expose a
combined rv32i/A directory the way previous extensions did; A is
split into rv32i/Zaamo (9 amo ops) and rv32i/Zalrsc (lr.w +
sc.w). Both batches run via the same plugin; results land in
compliance/results_zaamo and compliance/results_zalrsc once
they complete. Until then, the directed probe (18 sub-checks)
provides the strongest A-extension correctness evidence.
Harden
LibreLane harden ran clean on the first attempt:
| Metric | Value |
|---|---|
| stdcells | 29,454 (+1,375 vs P43) |
| setup_ws (max_ss_100C_1v60) | 5.306 ns |
| hold_ws (max_ss_100C_1v60) | 0.097 ns |
| DRC / LVS / antenna | PASS / PASS / PASS |
| max_slew / max_cap / fanout violations | 0 / 0 / 0 (all corners) |
Slow-corner DRV is slightly worse than P43 (max_ss reports a handful of slew/cap warnings at the cell-flagging threshold) but nothing that violates the design rules. The cell delta is in the expected ~1,400-cell band: 32-bit reservation register, two 32-bit AMO data-path flops, a 5-bit funct5 latch, plus decode and ALU extensions.
What just happened?
The chip now supports atomics. That’s a Linux requirement off the list, FreeRTOS gets cleaner critical sections, and the foundation exists for any future multi-hart story.
The next rung as built is P46 - Zba + Zbb-essentials bitmanip (the original P46 = C extension got bumped because it’s a multi-hour fetch-front-end rewrite, while Zba+Zbb is a one-file additive ALU extension).