journal 2026-05-03

P42 boots a real FreeRTOS demo

Tags: p42, freertos, scheduler, debugging

P41 compiled. P42 runs. Three tasks plus FreeRTOS’s idle task, queue-based producer/consumer, timer-driven preemption, watcher with a 200-tick delay. Expected UART output SabcdefghD arrives at 5.1M clocks.

The actual bring-up was less smooth than that summary suggests. Two real bugs cost most of an hour to diagnose.

Bug 1: iverilog hangs on a 256 KiB byte array

First cut of the testbench had an inline logic [7:0] mem [0:262143] with always @* blocks computing rdata from byte slices. iverilog’s elaborator went to 99% CPU for 11 minutes and never produced output.

The fix is to use the same packed-word memory model P17 has been using since the external-memory rung: logic [31:0] mem [0:WORDS-1] with explicit byte-lane masking on writes. iverilog handles that shape fine because it doesn’t need to compute per-byte sensitivity.
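The testbench itself is Verilog, but the byte-lane masking is easy to get subtly wrong, so here is a minimal C model of the same write path for reference. It is a sketch only: the names wstrb, lane_mask, mem_write and mem_read are hypothetical, and WORDS is set to 65536 on the assumption that the memory is still 256 KiB.

```c
#include <stdint.h>

/* Assumption: 256 KiB of memory, stored as 32-bit words (262144 / 4). */
#define WORDS 65536u

/* Word-packed backing store: one entry per 32-bit word, not per byte. */
static uint32_t mem[WORDS];

/* Expand a 4-bit byte enable (hypothetical name: wstrb) into a 32-bit mask.
 * Bit 0 of wstrb enables bits 7:0, bit 1 enables bits 15:8, and so on. */
static uint32_t lane_mask(uint8_t wstrb)
{
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        if (wstrb & (1u << lane)) {
            mask |= 0xFFu << (8 * lane);
        }
    }
    return mask;
}

/* Masked write: only the enabled byte lanes of the addressed word change.
 * Addresses wrap modulo the memory size in this model. */
static void mem_write(uint32_t byte_addr, uint32_t wdata, uint8_t wstrb)
{
    uint32_t idx  = (byte_addr >> 2) % WORDS;
    uint32_t mask = lane_mask(wstrb);
    mem[idx] = (mem[idx] & ~mask) | (wdata & mask);
}

/* Reads return the whole word; byte selection is left to the caller. */
static uint32_t mem_read(uint32_t byte_addr)
{
    return mem[(byte_addr >> 2) % WORDS];
}
```

A store-byte to lane 1 would be mem_write(addr, data, 0x2): the other three bytes of the word keep their old contents.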

This is a reminder: iverilog is still the right tool for this repo, but its sensitivity-list machinery has scaling limits. Big test memories should be word-packed.

Bug 2: gcc, FreeRTOS, and our halt sentinel walk into a bar

This was the real one. After fixing the iverilog hang, the demo printed S and a and then went silent, eventually hitting our 8M-cycle test budget.

The chip had halted (halted == 1, x5 == 0) but at no point did any of our deliberate halt paths run. So who emitted 0x0000006f?

prvIdleTask. Disassembly:

00000304 <prvIdleTask>:
     304:  auipc a3, 0x5
     308:  addi  a3, a3, -388     # &pxReadyTasksLists[0]
     30c:  li    a4, 1
     310:  j     318
     314:  ecall                  # taskYIELD()
     318:  lw    a5, 0(a3)        # load idle-list count
     31c:  bltu  a4, a5, 314      # if (1 < count) yield
     320:  0000006f               # j .

That j . at 0x320 is what gcc leaves behind for the steady state of the for(;;) loop: a branch-to-self that would simply spin forever on an ordinary core. Our chip halts there instead.

Three ingredients combine:

  1. Our DUT halts on the instruction 0x0000006f (jal x0, 0). I set this convention up in P09 for directed-runtime halts.
  2. gcc -O2 emits j . (= 0x0000006f) at the end of any for (;;) function where the loop body has no memory-observable effect.
  3. FreeRTOS’s portYIELD() is __asm volatile ("ecall") with no "memory" clobber. From gcc’s perspective, ecall doesn’t touch any C-visible memory.
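To make the collision in ingredients 1 and 2 concrete, here is a small C sketch that builds the J-type encoding by hand and shows that jal x0, 0 really is 0x0000006f. The helper names encode_jal and is_halt_sentinel are hypothetical; the second one only mirrors the halt convention described above and is not the DUT's actual decode logic.

```c
#include <stdint.h>

/* Encode a RISC-V JAL instruction.
 * J-type layout: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode.
 * offset is the byte offset from the instruction's own address. */
static uint32_t encode_jal(uint32_t rd, int32_t offset)
{
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1u)   << 31)   /* imm[20]    -> bit 31     */
         | (((imm >> 1)  & 0x3FFu) << 21)   /* imm[10:1]  -> bits 30:21 */
         | (((imm >> 11) & 0x1u)   << 20)   /* imm[11]    -> bit 20     */
         | (((imm >> 12) & 0xFFu)  << 12)   /* imm[19:12] -> bits 19:12 */
         | ((rd & 0x1Fu) << 7)              /* rd         -> bits 11:7  */
         | 0x6Fu;                           /* JAL opcode               */
}

/* Hypothetical model of the halt convention: the exact word 0x0000006f. */
static int is_halt_sentinel(uint32_t insn)
{
    return insn == 0x0000006Fu;
}
```

With these, encode_jal(0, 0) gives 0x0000006f, which the sentinel check accepts, while the j 318 at address 0x310 in the listing (offset 8) encodes to 0x0080006f and does not match. Any jump-to-self the compiler emits is indistinguishable from a deliberate halt.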

In prvIdleTask, the loop body is “load ready-list count; if > 1 yield.” With no memory clobber on the yield, gcc sees nothing in the loop that could change the count, so once the check fails it treats the count as fixed. In the listing above, that is the fall-through of the bltu at 0x31c: when the count is not greater than 1, control drops into the j . at 0x320 and never re-reads the count. Our chip mistakes that j . for a halt.

The fix in P42 is software-only: configUSE_IDLE_HOOK = 1 plus a one-instruction vApplicationIdleHook that does __asm__ volatile("nop"). The function call inside the for-loop body prevents the collapse-to-j . pattern, and idle keeps running forever (correctly) on a real loop.
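For reference, the fix has roughly the following shape. This is a sketch rather than the exact P42 sources: in a real tree the define lives in FreeRTOSConfig.h and the hook in application code, and both are shown together here only for brevity.

```c
/* FreeRTOSConfig.h: ask the kernel's idle task to call the hook
 * on every pass through its for (;;) loop. */
#define configUSE_IDLE_HOOK 1

/* Application code: the hook body is a single nop. The point is not the
 * nop itself but the call, which gives the idle loop a real body. */
void vApplicationIdleHook(void)
{
    __asm__ volatile ("nop");
}
```

The hook must never block, since the idle task has to stay runnable; a single nop satisfies that trivially.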

After the fix:

UART tx[0..9] = S a b c d e f g h D
DONE: P42 saw expected UART sequence after 5100726 clocks
PASS: P42 FreeRTOS multi-task demo complete.

What this means for future rungs

The right long-term fix is not an idle hook. It’s a harder halt sentinel in the DUT itself. jal x0, 0 is too easy for the toolchain to emit by accident. Better candidates: a specific ebreak plus a magic CSR write, or a write to a reserved MMIO halt port.

That’s a real RTL-side change, and a future rung. P42 stays software-only and documents the collision so the next person reading the chip’s is_halt_loop logic at least knows why we leaned on an idle hook.
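As a note for that future rung, the MMIO variant could look something like this from the software side. Everything here is hypothetical: the port address, the magic value, and the names dut_request_halt and halt_requested are placeholders, and nothing like this exists in the RTL yet.

```c
#include <stdint.h>

/* Hypothetical reserved halt port and magic value; neither exists in the RTL. */
#define HALT_PORT_ADDR 0x1000FFF0u
#define HALT_MAGIC     0xDEAD0042u

/* Software side: request a halt by storing the magic word to the halt port.
 * The port is passed in as a pointer so the sketch can be exercised on a host. */
static inline void dut_request_halt(volatile uint32_t *halt_port)
{
    *halt_port = HALT_MAGIC;
}

/* Model of the check the DUT would make on each store: both the address
 * and the data must match, so a stray instruction word cannot trigger it. */
static inline int halt_requested(uint32_t store_addr, uint32_t store_data)
{
    return store_addr == HALT_PORT_ADDR && store_data == HALT_MAGIC;
}
```

On target the call would be dut_request_halt((volatile uint32_t *)HALT_PORT_ADDR). Unlike jal x0, 0, this needs a specific store to a specific address with a specific value, which the compiler will not produce by accident.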

What FreeRTOS just proved

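End-to-end on our hardware (in simulation, but real hardware shape), the demo ran three tasks plus the idle task, a queue-based producer/consumer pair, timer-driven preemption, and a watcher with a 200-tick delay.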

That’s a working RTOS, not a “FreeRTOS-shaped binary that links.”

The next rung is P43: harden this image and produce a GDS. Same RTL, same scheduler, but with a final layout you could (in principle) tape out.