
How RISC-V tests its own chips

The official compliance suite isn’t a runner; it’s a contract. This page walks through the contract, Sail’s role in it, and what our DUT plugin actually plugs into.

upstream: riscv-non-isa/riscv-arch-test
reference model: Sail RISC-V
our harness: scripts/p17_act4_batch.py
scoped result: RV32I + M + Zicsr + Zifencei

People say “we ran the compliance suite” like it’s a button. It isn’t. riscv-arch-test is closer to a contract: a pile of test sources, a trusted reference model, and a set of macros that every device under test (DUT), in our case a CPU, is expected to implement. The framework only works if you bring the runner.

This page walks through what the contract is, what generates the “correct” answers, and how our existing harness already implements the DUT side — something we did not always describe accurately on the site.

§ What riscv-arch-test is

The upstream repo lives at riscv-non-isa/riscv-arch-test. What’s inside:

What it isn’t: a runner, a simulator, an FPGA bitstream, or a self-contained app. There is no riscv-arch-test command that “runs the tests on your chip.” You build the runner.

§ The Sail role

Every *.reference_output file in the suite is generated by running the corresponding test on Sail RISC-V, the formal model that RISC-V International has adopted as the golden reference for the ISA. Sail is a domain-specific language for ISA semantics; the RISC-V Sail model is effectively the executable version of the architecture manual.

The flow is:

  1. Test author writes the assembly source using the standard macros.
  2. They run that source on the Sail model, which dumps the signature region.
  3. That signature is checked into the repo as *.reference_output.
  4. Anyone who later runs the same test on a real DUT must produce bit-identical bytes in their signature region — or the test fails.

This is why the suite is so strict. The reference isn’t “what the author thinks should happen”; it’s “what the formal spec computes, recorded as bytes.” A DUT that disagrees on a single byte fails the test, even if all of its other visible state matches.
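A byte-exact check is simple to express. The sketch below compares a DUT signature against the reference word by word and reports the byte offset of every mismatch. It assumes signature files hold one hexadecimal word per line and that words are 4 bytes (RV32); both are assumptions to verify against the framework version in use, and the function names are hypothetical.

```python
from pathlib import Path


def read_signature(path: Path) -> list[int]:
    """Parse a signature file, assumed to hold one hex word per line."""
    return [int(token, 16) for token in path.read_text().split()]


def compare_signatures(reference: list[int], actual: list[int], word_bytes: int = 4) -> list[str]:
    """Return one message per mismatch; an empty list means the test passes."""
    width = 2 * word_bytes  # hex digits per word
    report = []
    if len(reference) != len(actual):
        report.append(f"length differs: reference has {len(reference)} words, DUT has {len(actual)}")
    for i, (ref, got) in enumerate(zip(reference, actual)):
        if ref != got:
            # Equal words mean equal bytes, so a single differing byte lands here.
            report.append(f"offset {i * word_bytes:#06x}: expected {ref:0{width}x}, got {got:0{width}x}")
    return report
```

Parsing into integers before comparing makes the check insensitive to hex case and whitespace in the files, while any differing byte in the signature region still fails.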

§ The signature loop

Every test follows the same shape:

[Figure: test source (assembly + macros) → compile (riscv-gcc) → run on Sail (reference model) / run on DUT (your CPU) → .reference_output / .signature.output (bytes) → diff → PASS / FAIL]
The arch-test loop: every test produces the same kind of artifact and is judged the same way.

The framework’s job stops at “you provide a way to compile and run each ELF, and to produce a signature file that looks like the reference.” Everything else — your simulator, your testbench, your trap handler, your halt mechanism — is yours to wire up.
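The loop in the figure can be sketched as a single Python function. This is a minimal sketch, not our harness: the hook names (`compile_elf`, `run_reference`, `run_dut`) and the `Verdict` type are hypothetical, and each hook would wrap riscv-gcc, Sail, or your simulator respectively.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical hooks: each wraps one tool that the framework leaves to you.
CompileFn = Callable[[Path], Path]  # test source (.S) -> ELF path (riscv-gcc)
RunFn = Callable[[Path], bytes]     # ELF path -> signature bytes


@dataclass
class Verdict:
    test: str
    passed: bool


def run_one(src: Path, compile_elf: CompileFn,
            run_reference: RunFn, run_dut: RunFn) -> Verdict:
    """One trip around the arch-test loop for a single test source."""
    elf = compile_elf(src)
    # Sail's signature; in practice this may just read the checked-in .reference_output.
    expected = run_reference(elf)
    # The DUT's signature, i.e. the contents of .signature.output.
    actual = run_dut(elf)
    # Every test is judged the same way: the bytes must be identical.
    return Verdict(test=src.name, passed=(expected == actual))
```

Passing the tools in as callables keeps the judging rule (identical bytes) separate from the compile and run steps that the framework leaves to you.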

§ The DUT contract, in detail

A working integration is three small files plus your existing CPU runner:

rvmodel_macros.h — the macros the framework needs, filled in for your CPU:

A linker script (link.ld) — places .text.init at the DUT’s reset address, lays out the rest of the code, data, and .tohost section. For us, text starts at 0x00000000 to match the external memory model.

A test-config YAML — points the framework at the right compiler, reference-model executable, and DUT plugin directory. We pass ours via --act4-dir to the runner script.

Optionally, a signature comparator — a tiny Python or shell script that reads the dumped memory region and diffs against *.reference_output. We don’t use this today; the in-ELF self-check covers our test population.
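Of the pieces above, the linker script is the easiest to verify mechanically: after a build, `.text.init` should sit at the reset address and a `.tohost` section should exist. `readelf -S` shows this directly; the sketch below does the same check in Python by reading the ELF32 section headers with the standard `struct` module. It assumes a little-endian ELF32 image (what an RV32 toolchain produces) and the 0x00000000 reset address described above; the function names are hypothetical.

```python
import struct


def section_addresses(elf: bytes) -> dict[str, int]:
    """Map section name -> sh_addr for a little-endian ELF32 image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF32 file")
    # ELF32 header: e_shoff at byte 32; e_shentsize, e_shnum, e_shstrndx at bytes 46..51.
    (e_shoff,) = struct.unpack_from("<I", elf, 32)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", elf, 46)

    def shdr(i: int) -> tuple:
        # Elf32_Shdr is ten 32-bit words: name, type, flags, addr, offset, size, ...
        return struct.unpack_from("<10I", elf, e_shoff + i * e_shentsize)

    strtab = shdr(e_shstrndx)[4]  # file offset of the section-name string table
    addrs = {}
    for i in range(e_shnum):
        name_off, _type, _flags, addr = shdr(i)[:4]
        start = strtab + name_off
        name = elf[start:elf.index(b"\x00", start)].decode()
        if name:  # index 0 is the unnamed null section
            addrs[name] = addr
    return addrs


def check_layout(addrs: dict[str, int], reset_addr: int = 0x0000_0000) -> list[str]:
    """Problems with the layout the linker script is supposed to produce."""
    problems = []
    if addrs.get(".text.init") != reset_addr:
        problems.append(f".text.init not at reset address {reset_addr:#010x}")
    if ".tohost" not in addrs:
        problems.append("missing .tohost section")
    return problems
```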

§ What we do today (and have been doing since P15)

Our compliance harness already runs the upstream framework with a custom DUT plugin. The naming gets confusing because we call the generated ELFs ACT4 ELFs (after the upstream act generator), so it sounds like a parallel project — it isn’t.

| our piece | what it actually is |
|---|---|
| scripts/p17_act4_batch.py | the runner that invokes the upstream act generator and then runs each ELF on our DUT |
| projects/38_arch_test_official/arch_test/rvmodel_macros.h | the DUT plugin’s halt + IO macros, formerly buried under projects/26_rv32i_act4_probe/act4/ |
| projects/38_arch_test_official/arch_test/link.ld | the DUT plugin’s linker script, laying out text, data, and .tohost starting at 0x00000000 |
| tb_external_mem.sv + top.sv | the DUT |
| upstream tests/rv32i_m/<EXT>/ | unmodified upstream test sources |
| upstream config/sail/sail-rv32-max/sail.json | the Sail reference config, which we patch to a P17-compatible memory map |
| Sail at /tmp/sail-riscv-0.10/ | the reference model; signatures must match its execution |

The halt convention is a small custom touch. Our DUT recognises jal x0, 0 (machine code 0x0000006f) as a halt loop. The plugin’s RVMODEL_HALT_PASS writes 1 to tohost (so Sail also stops), sets x5 = 1 (so the testbench can report PASS without diffing a signature), then drops into the halt loop.

The reason results still say scoped ACT4/Sail is not “we wrote our own runner.” It’s that the test population we run is a curated subset (rv32i/I, rv32i/M, rv32i/Zicsr, rv32i/Zifencei), and that we verify with the in-ELF self-check, the less strict of the two acceptable verification modes. The canonical mode, diffing a dumped signature region against the framework’s *.reference_output, has infrastructure underway (P38 ships a tb_arch_test.sv signature-dump testbench), but that path is not yet the default.
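Scoping the population amounts to a path filter plus a tally. The sketch below assumes upstream tests live under `tests/rv32i_m/<EXT>/` as listed in the table above, and that results arrive as a mapping from test path to pass/fail; `in_scope`, `tally`, and `SCOPED_EXTENSIONS` are hypothetical names, not functions from `scripts/p17_act4_batch.py`.

```python
from collections import Counter
from pathlib import PurePosixPath

# The curated population described above; every other extension directory is skipped.
SCOPED_EXTENSIONS = ("I", "M", "Zicsr", "Zifencei")


def in_scope(test_path: str, root: str = "tests/rv32i_m") -> bool:
    """True if test_path sits below root/<EXT>/ for one of the scoped extensions."""
    parts = PurePosixPath(test_path).parts
    root_parts = PurePosixPath(root).parts
    n = len(root_parts)
    return len(parts) > n + 1 and parts[:n] == root_parts and parts[n] in SCOPED_EXTENSIONS


def tally(results: dict[str, bool]) -> Counter:
    """Count PASS / FAIL over in-scope tests only; out-of-scope results are ignored."""
    return Counter("PASS" if ok else "FAIL" for path, ok in results.items() if in_scope(path))
```

Keeping the scope list in one place makes it easy to see exactly which sub-suites a “scoped” result covers when the population is widened later.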

§ What P38 actually changes

The earlier framing on this page suggested that “switching to the upstream framework” was a future rung. That was wrong. The accurate picture:

§ What the next credibility step would be

The remaining work to push this from “scoped” to something stronger:

  1. Widen the test population. Run rv32i/privilege, rv32i/Zicntr (after P39), rv32i/ExceptionsTraps, and the per-instruction Sm-class sub-suites we currently exclude.
  2. Switch verification to the canonical signature-diff mode. Use tb_arch_test.sv to dump the RVMODEL_DATA_BEGIN / _END region and diff it against *.reference_output. This catches classes of bugs the in-ELF self-check ignores by design (e.g. a store that writes the correct, sign-extended value to the wrong address).
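  3. Run a published profile. Today we run the sail-RVI20U32 / sail-rv32-max configs; running a named profile in full is what compliance language actually means. (There is no RVA20U32: the RVA-family profiles are defined only for RV64, so on RV32 the published profile is RVI20U32, run with its complete test list.)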

None of those are part of P38. They are reasons to keep saying scoped ACT4/Sail on the result tables.
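Step 2 in the list above needs the testbench to turn a memory image into a signature file. Below is a minimal sketch of that formatting step, assuming the dump is a little-endian byte image starting at a known base address, that `begin` and `end` are the addresses marked by `RVMODEL_DATA_BEGIN` / `RVMODEL_DATA_END`, and that the expected file holds one zero-padded hex word per line. It says nothing about how `tb_arch_test.sv` actually writes its dump, and the function name is hypothetical.

```python
def dump_signature(mem: bytes, base: int, begin: int, end: int, word_bytes: int = 4) -> str:
    """Format the region [begin, end) of a memory image as one hex word per line.

    mem is a little-endian image whose first byte sits at address `base`;
    begin/end are the addresses marked by RVMODEL_DATA_BEGIN / RVMODEL_DATA_END.
    """
    if not base <= begin <= end <= base + len(mem):
        raise ValueError("signature region lies outside the memory image")
    if (end - begin) % word_bytes:
        raise ValueError("signature region is not a whole number of words")
    lines = []
    for addr in range(begin, end, word_bytes):
        off = addr - base
        word = int.from_bytes(mem[off:off + word_bytes], "little")
        lines.append(f"{word:0{2 * word_bytes}x}\n")  # zero-padded, lowercase hex
    return "".join(lines)
```

The output of a function like this is what would then be diffed word for word against *.reference_output.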