P41 compiled. P42 runs. Three tasks plus FreeRTOS’s idle task,
queue-based producer/consumer, timer-driven preemption, watcher with
a 200-tick delay. Expected UART output SabcdefghD arrives at 5.1M
clocks.
The actual bring-up was less smooth than that summary suggests. Two real bugs cost most of an hour to diagnose.
Bug 1: iverilog hangs on a 256 KiB byte array
First cut of the testbench had an inline logic [7:0] mem [0:262143]
with always @* blocks computing rdata from byte slices. iverilog’s
elaborator went to 99% CPU for 11 minutes and never produced output.
The fix is to use the same packed-word memory model P17 has been
using since the external-memory rung: logic [31:0] mem [0:WORDS-1]
with explicit byte-lane masking on writes. iverilog handles that
shape fine because it doesn’t need to compute per-byte sensitivity.
The takeaway: iverilog is still the right tool for this repo, but its sensitivity-list machinery has scaling limits. Large test memories should be word-packed.
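For readers who have not written byte-lane masking before, here is a minimal sketch of the idea as a C model rather than the testbench's SystemVerilog. The 4-bit write strobe wstrb and the names lane_mask, mem_write and mem_read are assumptions made for illustration, not identifiers from the repo. WORDS is set to 65536 to match the 256 KiB memory.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 65536u  /* 256 KiB / 4 bytes per word */

static uint32_t mem[WORDS];  /* word-packed backing store */

/* Expand a 4-bit byte strobe into a 32-bit mask.
 * Bit i of wstrb set means byte lane i (bits 8*i+7 .. 8*i) is written. */
static uint32_t lane_mask(unsigned wstrb)
{
    uint32_t mask = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        if (wstrb & (1u << lane)) {
            mask |= 0xFFu << (8 * lane);
        }
    }
    return mask;
}

/* Masked write: only the strobed lanes of the addressed word change. */
void mem_write(uint32_t byte_addr, uint32_t wdata, unsigned wstrb)
{
    uint32_t idx = byte_addr >> 2;  /* word index; low two address bits ignored */
    assert(idx < WORDS);
    uint32_t mask = lane_mask(wstrb);
    mem[idx] = (mem[idx] & ~mask) | (wdata & mask);
}

/* Reads always return the whole word; the core picks the bytes it wants. */
uint32_t mem_read(uint32_t byte_addr)
{
    uint32_t idx = byte_addr >> 2;
    assert(idx < WORDS);
    return mem[idx];
}
```

A store-byte to lane 1 becomes mem_write(addr, data, 0x2): one read-modify-write on a single word, with no per-byte array element for the simulator to track.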
Bug 2: gcc, FreeRTOS, and our halt sentinel walk into a bar
This was the real one. After fixing the iverilog hang, the demo
printed S and a and then went silent, eventually hitting our
8M-cycle test budget.
The chip had halted (halted == 1, x5 == 0) but at no point did
any of our deliberate halt paths run. So who emitted 0x0000006f?
prvIdleTask. Disassembly:
00000304 <prvIdleTask>:
304: auipc a3, 0x5
308: addi a3, a3, -388 # &pxReadyTasksLists[0]
30c: li a4, 1
310: j 318
314: ecall # taskYIELD()
318: lw a5, 0(a3) # load idle-list count
31c: bltu a4, a5, 314 # if (1 < count) yield
320: 0000006f # j .
That j . at 0x320 is gcc’s noreturn safety net for the for(;;)
loop. Reaching it is supposed to be impossible. Yet our chip
halts there.
Three ingredients combine:
- Our DUT halts on the instruction 0x0000006f (jal x0, 0). I set this convention up in P09 for directed-runtime halts.
- gcc -O2 emits j . (= 0x0000006f) at the end of any for (;;) function where the loop body has no memory-observable effect.
- FreeRTOS’s portYIELD() is __asm volatile ("ecall") with no "memory" clobber. From gcc’s perspective, ecall doesn’t touch any C-visible memory.
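To see why the first two ingredients collide, it helps to work the encoding out. The sketch below builds an RV32 JAL instruction from its fields and confirms that jal x0, 0 (which the assembler prints as j .) is exactly 0x0000006f. The names encode_jal, is_halt_sentinel and HALT_SENTINEL are hypothetical. They are a software stand-in for the comparison, not the chip's RTL.

```c
#include <assert.h>
#include <stdint.h>

#define HALT_SENTINEL 0x0000006Fu  /* jal x0, 0 */

/* Encode RV32 JAL. The J-type immediate is scattered across the word as
 * imm[20] | imm[10:1] | imm[11] | imm[19:12], followed by rd and opcode 0x6f. */
uint32_t encode_jal(unsigned rd, int32_t offset)
{
    assert(rd < 32);
    assert((offset & 1) == 0);                            /* offsets are 2-byte aligned */
    assert(offset >= -(1 << 20) && offset < (1 << 20));   /* 21-bit signed range */
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1u)   << 31)
         | (((imm >> 1)  & 0x3FFu) << 21)
         | (((imm >> 11) & 0x1u)   << 20)
         | (((imm >> 12) & 0xFFu)  << 12)
         | ((uint32_t)rd << 7)
         | 0x6Fu;
}

/* Software stand-in for the DUT's halt check: an exact match on one word. */
int is_halt_sentinel(uint32_t insn)
{
    return insn == HALT_SENTINEL;
}
```

With rd = x0 and offset 0, every field except the opcode is zero, so the sentinel is simply the smallest possible JAL. Any self-jump the compiler emits hits it, while a forward jump such as the j 318 at 0x310 (offset +8) encodes as 0x0080006f and does not.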
In prvIdleTask, the loop body is “load ready-list count; if > 1
yield.” With no memory clobber on the yield, gcc proves the loaded
count is loop-invariant, hoists the load out, and treats the empty
loop body as unreachable past the first iteration. The function
“continues” via a j . epilogue that our chip mistakes for a halt.
The fix in P42 is software-only: configUSE_IDLE_HOOK = 1 plus a
one-instruction vApplicationIdleHook that does __asm__ volatile("nop"). The function call inside the for-loop body
prevents the collapse-to-j . pattern, and idle keeps running
forever (correctly) on a real loop.
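A minimal sketch of what that looks like in C. configUSE_IDLE_HOOK and vApplicationIdleHook are the standard FreeRTOS names. Where the define lives (normally FreeRTOSConfig.h) and the static_assert guard are assumptions added here for illustration, not something copied from the P42 sources.

```c
#include <assert.h>

/* Normally set in FreeRTOSConfig.h: makes prvIdleTask call the hook on every pass. */
#define configUSE_IDLE_HOOK 1

/* Illustrative guard (an assumption, not from P42): fail the build if someone
 * turns the hook back off while the DUT still halts on jal x0, 0. */
static_assert(configUSE_IDLE_HOOK == 1, "idle hook must stay enabled");

/* Called by the idle task each time around its loop. The body is a single nop;
 * the point is the call itself, which keeps the idle loop a real loop. */
void vApplicationIdleHook(void)
{
    __asm__ volatile ("nop");
}
```

The hook must never block, since the idle task has to stay runnable. A bare nop satisfies that trivially.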
After the fix:
UART tx[0..9] = S a b c d e f g h D
DONE: P42 saw expected UART sequence after 5100726 clocks
PASS: P42 FreeRTOS multi-task demo complete.
What this means for future rungs
The right long-term fix is not an idle hook. It’s a harder halt
sentinel in the DUT itself. jal x0, 0 is too easy for the
toolchain to emit by accident. Better candidates: a specific
ebreak plus a magic CSR write, or a write to a reserved MMIO halt
port.
That’s a real RTL-side change, and a future rung. P42 stays
software-only and documents the collision so the next person reading
the chip’s is_halt_loop logic at least knows why we leaned on an
idle hook.
What FreeRTOS just proved
End-to-end on our hardware (in simulation, but with the real hardware’s shape):
- xPortStartFirstTask drops into a task via mret
- ecall → context switch round-trips correctly
- timer interrupt fires at the right cycle count
- xTaskIncrementTick and vTaskSwitchContext drive preemption
- queues block and unblock through the right list operations
- mtime/mtimecmp 32-bit MMIO works as the tick source
That’s a working RTOS, not a “FreeRTOS-shaped binary that links.”
The next rung is P43: harden this image and produce a GDS. Same RTL, same scheduler, but with a final layout you could (in principle) tape out.