The plan for P68 was simple: take AtomVM
upstream, vendor it, write a platform shim modeled on its RP2 port,
and run hello-world Erlang on our chip. Reality had other plans:
we linked against newlib, and the very first call into libc trapped
at PC 0x95610 on instruction 0x1141, a c.addi (c.addi sp, -16). A compressed
instruction. The nixpkgs riscv32-none-elf toolchain ships
libc/libgcc compiled with -march=rv32imafdc and there is no
practical way to get a no-C build without forking and rebuilding the
toolchain. So P68 became two things: AtomVM bring-up and the C
extension our chip needed to link against the real world.
Headline: AtomVM hello-world runs on the chip. The BEAM VM loads
hello.beam, executes hello:start/0, and emits "hello from atomvm on a homemade chip" over UART. Total run: ~4.7M cycles, 1.5 seconds of Verilator wall time.
AtomVM on a homemade RV32 chip (P68 bare-metal port)
Starting AtomVM revision 0.8.0-dev+git.7a57441
Found startup beam: hello.beam
"hello from atomvm on a homemade chip"
AtomVM exited
Why the C extension at all
Without C, every link against newlib means stepping through compiled object files and rejecting any that contain compressed encodings. The math doesn’t work: all of newlib is built that way in the standard toolchains. We had three options:
- Maintain a parallel no-C toolchain build (months of work and a permanent maintenance cost).
- Replace every call into libc with a hand-rolled equivalent (also months — newlib has hundreds of internal helpers).
- Add C to the chip (~a day of careful RTL).
Option 3 is also the right answer for the broader chip story: real RV32 software in 2026 is ~95% compressed encodings on common code paths, and synthesizing without C would mean designing the chip around an artificial constraint that no stock toolchain's prebuilt libraries honour.
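To show what "rejecting any object that contains compressed encodings" would involve, here is a minimal C sketch of the per-section check. It assumes you have already pulled the raw .text bytes out of the object (ELF parsing is omitted), and rvc_census / rvc_census_t are hypothetical names. It does a linear sweep, so it assumes the section holds only code. It also relies on the RISC-V length rule that any encoding whose low two bits are not 0b11 is 16 bits long.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts of 16-bit (compressed) and 32-bit encodings found in a section. */
typedef struct {
    size_t compressed;
    size_t full;
} rvc_census_t;

/* Linear sweep over raw little-endian .text bytes. An encoding is 16-bit
   unless its low two bits are 0b11; longer (48-bit+) formats are ignored. */
static rvc_census_t rvc_census(const uint8_t *text, size_t len) {
    rvc_census_t c = { 0, 0 };
    size_t off = 0;
    while (off + 2 <= len) {
        uint16_t half = (uint16_t)(text[off] | (text[off + 1] << 8));
        if ((half & 3u) != 3u) {
            c.compressed++;          /* e.g. 0x1141, c.addi sp, -16 */
            off += 2;
        } else if (off + 4 <= len) {
            c.full++;
            off += 4;
        } else {
            break;                   /* truncated trailing 32-bit encoding */
        }
    }
    return c;
}
```

A no-C link would have to reject every object whose census reports compressed > 0.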
What “C support” means in this chip
Three RTL changes, all in the existing CPU module (no new states):
- A 16-bit → 32-bit decompressor (rvc_decompress, a combinational function in top.sv). Covers all of integer RV32C: ADDI/LI/LUI, LW/SW + LWSP/SWSP + ADDI4SPN, J/JAL/JR/JALR, BEQZ/BNEZ, MV/ADD/SUB/AND/OR/XOR/ANDI, SLLI/SRLI/SRAI, EBREAK. Unsupported, RV64-only, and FP-flavoured encodings return 32'h0, which the existing illegal-opcode decode catches as MCAUSE_ILLEGAL_INSTR. About 120 lines of mostly mechanical, table-driven code.
- Straddle-aware fetch path. Instructions can now sit at any 2-byte boundary, so a 32-bit instruction at pc[1] == 1 straddles two memory words. The fetch state gains two registers: fetch_straddle_q (1 = we hold the low half from the previous word) and fetch_straddle_lo_q[15:0] (the stashed low 16 bits). When the chip detects a straddle (pc[1] == 1 and the halfword at pc has low bits 2'b11, marking it as the first half of a 32-bit instruction), it stashes that half, doesn't advance pc, and the next cycle's S_FETCH issues a fetch at (pc & ~3) + 4 to grab the high half.
    if (fetch_straddle_q) begin
      // second word of a straddling 32-bit insn: splice the two halves
      ir               <= {mem_rdata[15:0], fetch_straddle_lo_q};
      is_compressed_q  <= 1'b0;
      fetch_straddle_q <= 1'b0;
      state            <= S_EXECUTE;
    end else if (pc[1] && mem_rdata[17:16] == 2'b11) begin
      // 32-bit insn starting in the upper half: stash its low half
      fetch_straddle_lo_q <= mem_rdata[31:16];
      fetch_straddle_q    <= 1'b1;
      // stay in S_FETCH; the mem_addr drive picks up the +4 path
    end else if (pc[1] ? mem_rdata[17:16] == 2'b11 : mem_rdata[1:0] == 2'b11) begin
      // the pc[1] arm is already caught above, so this is the aligned case
      ir              <= mem_rdata;  // 32-bit insn at pc[1] == 0
      is_compressed_q <= 1'b0;
      state           <= S_EXECUTE;
    end else begin
      // 16-bit insn in whichever half pc[1] selects
      ir              <= rvc_decompress(pc[1] ? mem_rdata[31:16] : mem_rdata[15:0]);
      is_compressed_q <= 1'b1;
      state           <= S_EXECUTE;
    end
- Compressed-aware PC advance and link value. next_pc defaults to pc + (is_compressed_q ? 2 : 4), so a compressed instruction advances PC by two bytes and a 32-bit one by four. The same conditional drives the JAL/JALR link value (alu_b = is_compressed_q ? 32'd2 : 32'd4). A c.jalr has to write pc + 2 into ra, not pc + 4. Otherwise the callee returns two bytes too far, into the middle of the following instruction, and traps. We learned that one the hard way on the first run with C in the chip.
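To make the expansion concrete, here is a partial C model of what a decompressor like rvc_decompress computes. It is a sketch, not the chip's SystemVerilog. The name rvc_decompress_sketch and its helpers are hypothetical. The sketch only covers C.ADDI, C.LI, the quadrant-1 funct3 = 100 arithmetic group, and the quadrant-2 MV/ADD/JR/JALR/EBREAK group. Everything else returns 0 (illegal), mirroring the 32'h0 convention above. In the CB/CA formats, the destination rd'/rs1' comes from ci[9:7], not ci[4:2].

```c
#include <stdint.h>

/* Extract ci[hi:lo] as an unsigned field. */
static uint32_t field(uint16_t ci, unsigned hi, unsigned lo) {
    return ((uint32_t)ci >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

/* 6-bit CI/CB immediate {ci[12], ci[6:2]}, sign-extended. */
static int32_t imm6(uint16_t ci) {
    int32_t v = (int32_t)((field(ci, 12, 12) << 5) | field(ci, 6, 2));
    return (v & 0x20) ? v - 64 : v;
}

/* Base-ISA encoders for the I and R formats. */
static uint32_t enc_i(int32_t imm, uint32_t rs1, uint32_t f3, uint32_t rd, uint32_t op) {
    return (((uint32_t)imm & 0xFFFu) << 20) | (rs1 << 15) | (f3 << 12) | (rd << 7) | op;
}
static uint32_t enc_r(uint32_t f7, uint32_t rs2, uint32_t rs1, uint32_t f3, uint32_t rd, uint32_t op) {
    return (f7 << 25) | (rs2 << 20) | (rs1 << 15) | (f3 << 12) | (rd << 7) | op;
}

#define OP_IMM 0x13u
#define OP     0x33u
#define JALR   0x67u

/* Partial RV32C expander: returns the 32-bit equivalent, or 0 (illegal). */
uint32_t rvc_decompress_sketch(uint16_t ci) {
    uint32_t quadrant = ci & 3u, f3 = field(ci, 15, 13);
    uint32_t rd = field(ci, 11, 7), rs2 = field(ci, 6, 2);
    /* rd'/rs1' lives in ci[9:7], rs2' in ci[4:2]; both map to x8..x15. */
    uint32_t rs1p = 8u + field(ci, 9, 7), rs2p = 8u + field(ci, 4, 2);

    if (quadrant == 1u && f3 == 0u)              /* C.ADDI (C.NOP when rd = 0) */
        return enc_i(imm6(ci), rd, 0u, rd, OP_IMM);
    if (quadrant == 1u && f3 == 2u)              /* C.LI: addi rd, x0, imm */
        return enc_i(imm6(ci), 0u, 0u, rd, OP_IMM);
    if (quadrant == 1u && f3 == 4u) {
        uint32_t shamt = field(ci, 6, 2);        /* RV32: ci[12] must be 0 for shifts */
        switch (field(ci, 11, 10)) {             /* funct2 selects the sub-case */
        case 0u:                                 /* C.SRLI */
            return field(ci, 12, 12) ? 0u : enc_r(0x00u, shamt, rs1p, 5u, rs1p, OP_IMM);
        case 1u:                                 /* C.SRAI */
            return field(ci, 12, 12) ? 0u : enc_r(0x20u, shamt, rs1p, 5u, rs1p, OP_IMM);
        case 2u:                                 /* C.ANDI */
            return enc_i(imm6(ci), rs1p, 7u, rs1p, OP_IMM);
        default: {                               /* C.SUB / C.XOR / C.OR / C.AND */
            static const uint32_t f3_of[4] = { 0u, 4u, 6u, 7u };
            if (field(ci, 12, 12))
                return 0u;                       /* C.SUBW / C.ADDW are RV64-only */
            uint32_t op2 = field(ci, 6, 5);
            return enc_r(op2 == 0u ? 0x20u : 0x00u, rs2p, rs1p, f3_of[op2], rs1p, OP);
        }
        }
    }
    if (quadrant == 2u && f3 == 4u) {
        if (!field(ci, 12, 12)) {
            if (rs2 == 0u)                       /* C.JR: jalr x0, 0(rs1) */
                return rd ? enc_i(0, rd, 0u, 0u, JALR) : 0u;
            return enc_r(0x00u, rs2, 0u, 0u, rd, OP);   /* C.MV: add rd, x0, rs2 */
        }
        if (rd == 0u && rs2 == 0u)
            return 0x00100073u;                  /* C.EBREAK */
        if (rs2 == 0u)
            return enc_i(0, rd, 0u, 1u, JALR);   /* C.JALR: jalr ra, 0(rs1) */
        return enc_r(0x00u, rs2, rd, 0u, rd, OP);       /* C.ADD: add rd, rd, rs2 */
    }
    return 0u;  /* everything else is outside this sketch */
}
```

For example, 0x1141 (c.addi sp, -16) expands to 0xff010113 (addi sp, sp, -16), and 0x9a61 (c.andi a2, -8) expands to 0xff867613 (andi a2, a2, -8).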
Alongside those three, the fetch_aligned check relaxes from
pc[1:0] == 2'b00 to pc[0] == 0 to allow halfword-aligned PCs, so the
misalignment trap now fires only on pc[0] != 0.
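The fetch and PC-advance rules can be checked against a small behavioural model. This is a C sketch, not the RTL. fetch_at, fetch_t, and pc_plus_len are hypothetical names, and the model does both word reads in a single call, where the chip spends two S_FETCH cycles on a straddle. Memory is modelled as a little-endian word array indexed by pc >> 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* One fetched instruction, as the fetch stage would hand it to decode. */
typedef struct {
    uint32_t insn;        /* 16-bit encodings sit in the low half */
    bool     compressed;  /* drives both next_pc and the link value */
    int      word_reads;  /* 2 when a 32-bit insn straddles two words */
} fetch_t;

/* Fetch the instruction at a halfword-aligned pc (pc[0] == 0). */
static fetch_t fetch_at(const uint32_t *mem, uint32_t pc) {
    uint32_t word = mem[pc >> 2];
    uint16_t first = (pc & 2u) ? (uint16_t)(word >> 16) : (uint16_t)word;
    fetch_t f = { first, true, 1 };
    if ((first & 3u) != 3u)
        return f;                                   /* 16-bit: done */
    f.compressed = false;
    if (pc & 2u) {                                  /* straddle: high half is */
        uint32_t next = mem[((pc & ~3u) + 4u) >> 2];  /* in the next word */
        f.insn = ((next & 0xFFFFu) << 16) | first;
        f.word_reads = 2;
    } else {
        f.insn = word;                              /* aligned 32-bit insn */
    }
    return f;
}

/* Sequential successor; also the value JAL/JALR must write to rd. */
static uint32_t pc_plus_len(uint32_t pc, bool compressed) {
    return pc + (compressed ? 2u : 4u);
}
```

With a c.jalr at 0x6, pc_plus_len gives a link value of 0x8. The pre-fix hardcoded pc + 4 would have linked 0xa instead.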
What broke during bring-up
- First run (no C in the chip): trapped at the very first c.addi (0x1141) inside libc's fputs, ~3000 cycles in. PC 0x95610, mtval = the compressed instruction we couldn't decode.
- Second run (C in the chip, but the JAL/JALR link value still hardcoded to pc + 4): trapped at 0x745b6, exactly two bytes past the start of the instruction following a c.jalr. The callee had returned into the middle of that instruction. The diagnostic PC dump told us instantly that the link value was wrong.
- Third run (link fixed): the chip ran 17M+ cycles cleanly through libc, but newlib's __sfvwrite_r chunking emitted bad chunk lengths in our environment (no __libc_init_array call, so FILE buffers were never properly initialised). Workaround: replace the fputs calls in our shim with direct UART writes (p68_uart_puts), bypassing newlib's stdio entirely. The banner now emits cleanly.
- Fourth run (banner via direct UART): AtomVM's globalcontext_new returned NULL because newlib's malloc dragged in __malloc_lock and friends, which require runtime init we don't do. Workaround: provide our own bump-allocator malloc/free/calloc/realloc in port/p68_libc.c that carves blocks directly out of _sbrk's heap window. AtomVM's malloc-heavy hello-world startup needs ~25 calls totalling under 1 MiB, which is trivial for an 8 MiB heap window.
- Fifth run (custom malloc): trapped on STORE_ADDR_MISALIGNED at PC 0xb54, because malloc returned a pointer ending in 0x7 where it should have returned an 8-byte-aligned one. Instrumenting the allocator ([m:size=ret] markers) confirmed that the first call returned 0x95167 when it should have returned 0x95160. The culprit was a decompressor bug. c.andi (and c.srli/c.srai/c.sub/c.xor/c.or/c.and) took the destination register from the wrong field. It used rd_p = {2'b01, ci[4:2]} (the rs2' position) instead of rs1_p = {2'b01, ci[9:7]} (the rd'/rs1' position). The AND on a2 = 0x95167 was being applied to a different register, so a2 kept its misaligned value and the returned allocation inherited it. Fixed by swapping rd_p for rs1_p in those four sub-cases (the funct2 = 00/01/10/11 arms of the quadrant-1, funct3 = 100 group). Boot fully succeeded on the next run.
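The fifth run's arithmetic is easy to reproduce. Below is a minimal C sketch of an 8-byte-aligned bump allocator in the spirit of the p68_libc.c workaround. The names (bump_malloc and friends, heap, align_up) are hypothetical; they avoid clashing with the host libc. A static array stands in for the _sbrk heap window, and the real shim may differ. align_up is the usual add-7-then-mask idiom, which typically compiles to an addi/andi pair. Drop the mask, as the buggy c.andi effectively did, and 0x95160 becomes 0x95167.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN 8u
#define HDR   8u  /* per-block size header; 8 bytes keeps payloads 8-aligned */

/* Hypothetical stand-in for the _sbrk heap window. */
static _Alignas(8) unsigned char heap[1u << 16];
static size_t heap_used;

/* Round p up to a multiple of ALIGN: add ALIGN - 1, then mask the low bits.
   Without the mask (the c.andi bug), 0x95160 comes out as 0x95167. */
static uintptr_t align_up(uintptr_t p) {
    return (p + (ALIGN - 1u)) & ~(uintptr_t)(ALIGN - 1u);
}

void *bump_malloc(size_t n) {
    uintptr_t base = (uintptr_t)heap;
    size_t off = (size_t)(align_up(base + heap_used) - base);
    if (n > sizeof heap || sizeof heap - off < HDR + n)
        return NULL;                          /* window exhausted */
    memcpy(heap + off, &n, sizeof n);         /* remember size for realloc */
    heap_used = off + HDR + n;
    return heap + off + HDR;
}

void bump_free(void *p) { (void)p; }          /* bump allocators never reclaim */

void *bump_calloc(size_t count, size_t size) {
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                          /* count * size would overflow */
    void *p = bump_malloc(count * size);
    if (p != NULL)
        memset(p, 0, count * size);
    return p;
}

void *bump_realloc(void *old, size_t n) {
    if (old == NULL)
        return bump_malloc(n);
    size_t old_n;
    memcpy(&old_n, (unsigned char *)old - HDR, sizeof old_n);
    if (n <= old_n)
        return old;                           /* shrink in place */
    void *p = bump_malloc(n);
    if (p != NULL)
        memcpy(p, old, old_n);                /* old block is simply abandoned */
    return p;
}
```

Never reclaiming memory is what makes this viable: ~25 allocations under 1 MiB never come close to exhausting the window.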
The AtomVM build flow
End-to-end reproducible from clean. Drops into the nix devshell
(pkgs.erlang, pkgs.cmake, pkgs.ninja, pkgs.gperf,
pkgs.rebar3, pkgs.pkgsCross.riscv32-embedded.buildPackages.gcc)
which already had previous-rung agent work behind it.
nix develop # shell with all tools
cd projects/68_atomvm_port/vendor/AtomVM/tools/packbeam
rebar3 escriptize # builds packbeam escript
cd ../../../.. # back to projects/68_atomvm_port
cd test && make all # libAtomVM + hello.beam + main.avm + final ELF + boot blob
make verilator-run # ~13 sec wall to first UART output
Pinned commit: AtomVM 7b282159 (2025-W16 main).
What’s next
- A bigger Erlang demo, blocked on a soft-float toolchain. We tried a two-process ping-pong (projects/68_atomvm_port/erlang/pingpong.erl) to exercise the scheduler and got partway. pingpong:start/0 runs and prints the starting atom, then traps on fsd fs0, 168(sp), a hardware FP store, in the prologue of libAtomVM's term_compare. Our chip has no FPU, but the only RV32 bare-metal toolchain in nixpkgs (pkgsCross.riscv32-embedded) ships a single multilib, rv32imafdc/ilp32d. The compiler uses callee-saved FP registers as scratch spill space even when no float arithmetic appears in the source. Hello-world doesn't trip this because its inlined functions never need an FP-register spill; deeper code paths do. Three exits from this trap:
  - Soft-float multilib: override the nixpkgs toolchain to add -march=rv32imac -mabi=ilp32 libgcc/newlib variants (chunky packaging work).
  - Trap-and-emulate FP: an illegal-instruction trap handler that decodes and emulates F/D ops in software (a real chunk of trap-handler RTL/firmware work).
  - Add F to the chip: biggest scope, but the cleanest answer long-term and a fun project rung in its own right.
  Pick one when there's energy for it. AtomVM's hello-world stays green in the meantime.
- Fold C into a trunk core. The optimization-arc rungs P63-P66 don't have C; this rung does. Future trunk work will either retrofit C onto the pipelined cores or branch a new rung that combines C with the latest pipeline work.
- Bring back libc properly if we want printf/scanf/etc. The custom bump-malloc and direct-UART-write workarounds are fine for hello-world but won't carry larger Erlang programs that hit io:format or anything else routed through stdio. Either run __libc_init_array and friends from start.S, or vendor a smaller libc (picolibc, llvm-libc) that doesn't drag in the same dependencies.
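To gauge the trap-and-emulate option, here is a minimal C sketch of the smallest useful piece. It emulates a 32-bit FLD/FSD, such as the fsd fs0, 168(sp) (0x0a813427) that stopped ping-pong, against a software shadow of the FP register file. Everything here is hypothetical: the trap_frame_t layout, the flat little-endian RAM pointer standing in for physical memory, and the function name. A real handler would also need the F/D arithmetic ops, fcsr, and the compressed forms (a c.fsdsp would need expanding first and would advance mepc by 2, not 4).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register snapshot saved by the trap entry code. */
typedef struct {
    uint32_t x[32];  /* integer registers; x[0] reads as 0 */
    uint64_t f[32];  /* software shadow of the D register file */
    uint32_t mepc;   /* PC of the trapping instruction */
} trap_frame_t;

#define OPC_LOAD_FP  0x07u  /* FLW/FLD major opcode */
#define OPC_STORE_FP 0x27u  /* FSW/FSD major opcode */
#define WIDTH_D      3u     /* funct3 = 011 selects a 64-bit access */

static int32_t sext12(uint32_t v) {
    return (v & 0x800u) ? (int32_t)v - 0x1000 : (int32_t)v;
}

/* Emulates a 32-bit FLD/FSD; returns false for anything else. */
static bool emulate_fld_fsd(trap_frame_t *t, uint32_t insn, uint8_t *ram) {
    uint32_t opc = insn & 0x7Fu, width = (insn >> 12) & 7u;
    uint32_t rs1 = (insn >> 15) & 31u;
    if (width != WIDTH_D)
        return false;
    if (opc == OPC_STORE_FP) {             /* S-type: imm in 31:25 and 11:7 */
        int32_t imm = sext12(((insn >> 25) << 5) | ((insn >> 7) & 31u));
        uint32_t addr = t->x[rs1] + (uint32_t)imm;
        uint64_t v = t->f[(insn >> 20) & 31u];
        for (int i = 0; i < 8; i++)
            ram[addr + i] = (uint8_t)(v >> (8 * i));   /* little-endian store */
    } else if (opc == OPC_LOAD_FP) {       /* I-type: imm in 31:20 */
        int32_t imm = sext12(insn >> 20);
        uint32_t addr = t->x[rs1] + (uint32_t)imm;
        uint64_t v = 0;
        for (int i = 0; i < 8; i++)
            v |= (uint64_t)ram[addr + i] << (8 * i);   /* little-endian load */
        t->f[(insn >> 7) & 31u] = v;
    } else {
        return false;
    }
    t->mepc += 4u;                         /* resume after the emulated insn */
    return true;
}
```

Loads and stores are the easy part. Most of the real effort in this option would go into the arithmetic ops and rounding modes.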