No. 05 / project of 147 on the ladder

Tiny ALU + datapath

Introduces: register file, flag register, 2R1W register ports (two read, one write), signed vs unsigned shifts

harden state
  • last run: 2026-04-28
  • cells: 1,607 (non-filler)
  • slack: 2.49 ns (setup)
  • area: 12100 μm² (die) / 9774 μm² (core)
  • signoff: DRC PASS, LVS PASS, antenna PASS

A 12-op ALU with an 8-register file and a flag register (Z N C V) — externally the chip still looks like a combinational box driven by an “instruction” each cycle, but the regs and flags hold their values across cycles. This is the building block project 06 will bolt a state machine on top of to make a tiny CPU.

[Layout view: sky130A, 110 × 110 μm die, 40 MHz target, regfile + ALU]
[3D view: full sky130 stack, z exaggerated 10×, 90k shapes]

Clock target dropped from P02/P04’s 100 MHz to 40 MHz (25 ns period). 1,607 cells; 68 flops handle the regfile + ALU result staging + flag register. Max-slew has warnings in the extreme corner only.

The first attempt at this project targeted 100 MHz like P02/P04 and missed the slow PVT corner setup by −2.41 ns (the critical path is read-port mux → 12-way ALU op mux → flag-bit logic → writeback in one cycle, ~20 ns at slow corner). 50 MHz still missed by −0.37 ns. 40 MHz lands cleanly with +2.49 ns of slack at the slow corner. P06 will reintroduce pipelining and crank the clock back up: when you split decode and execute into separate cycles, each cycle’s combinational path is roughly half as long.

Architecture

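[Figure: datapath block diagram. Instruction inputs (op, ra, rb, rd; imm, use_imm; we, flag_we) drive the regfile (R0..R7), which supplies regs[ra] and regs[rb]. The B mux picks regs[rb] or imm as b. The 12-op ALU produces result and Z N C V; the flag reg feeds carry-in back to the ALU. Outputs: obs, result.]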
P05's datapath. Each cycle, the instruction inputs select two source registers, route them through the ALU, and conditionally write the result back. The flag register only updates when flag_we is high, so flags persist across instructions. That stored C bit is what ADC and SBC read as carry-in.
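
To make the carry-in path concrete, here is a minimal sketch of a 16-bit add done as two 8-bit steps: an ADD on the low bytes that captures C, then an ADC on the high bytes that reads it. It models only the arithmetic, using the same 9-bit widening as the RTL, and does not instantiate the chip. The module name `adc_chain_sketch` and its signals are illustrative assumptions, not part of the project.

```systemverilog
// Sketch only: a 16-bit add as two 8-bit ALU steps (ADD, then ADC).
// Module and signal names are hypothetical, not taken from the project.
module adc_chain_sketch;
  logic [7:0] a_lo, a_hi, b_lo, b_hi;
  logic [8:0] lo_w, hi_w;   // 9-bit widened sums; bit 8 is the carry-out
  logic       c_flag;       // stands in for the C bit of the flag register

  initial begin
    {a_hi, a_lo} = 16'h12FF;
    {b_hi, b_lo} = 16'h0001;

    // Step 1: ADD on the low bytes with flag_we = 1 captures carry-out in C.
    lo_w   = {1'b0, a_lo} + {1'b0, b_lo};
    c_flag = lo_w[8];

    // Step 2: ADC on the high bytes reads the C captured by step 1.
    hi_w = {1'b0, a_hi} + {1'b0, b_hi} + {8'h00, c_flag};

    if ({hi_w[7:0], lo_w[7:0]} !== 16'h1300)
      $display("FAIL: got 0x%04h, expected 0x1300", {hi_w[7:0], lo_w[7:0]});
    else
      $display("PASS: 0x12FF + 0x0001 = 0x%04h", {hi_w[7:0], lo_w[7:0]});
    $finish;
  end
endmodule
```

On the real datapath, any instruction issued between the two steps would need flag_we low, otherwise it overwrites the C that the ADC is about to read.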

What’s new vs. P04

  • Register file. regs[0..7] of 8-bit each. R0 is hardcoded to zero (reads as 0, writes ignored). This is the same zero-register convention MIPS and RISC-V use.
  • Flag register. Z N C V captured into a 4-bit register when flag_we is high. ADC / SBC read C from this register, so the flags persist across instructions.
  • Two read ports + one write port. The ra / rb / rd triple is the common “2R1W” shape that most simple CPU register files are built around.
  • Signed vs unsigned shifts. SHR zero-extends, SAR sign-extends. V flag is set on signed overflow for arithmetic ops.
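
The SHR / SAR distinction is easy to get wrong in SystemVerilog, because `>>>` only sign-extends when its left operand is signed. The sketch below compares the two shifts on 0x81, the same value the testbench uses, and includes the concatenation form the RTL relies on. The module name `shift_sketch` and its signals are hypothetical.

```systemverilog
// Sketch only: logical vs arithmetic right shift on an 8-bit value.
// Names are hypothetical and not part of the project sources.
module shift_sketch;
  logic [7:0] a, shr_y, sar_y, sar_manual, unsigned_ashr;

  initial begin
    a = 8'h81;                          // 1000_0001, negative if read as signed

    shr_y         = a >> 1;             // logical: zero shifts in at the top
    sar_y         = $signed(a) >>> 1;   // arithmetic: sign bit is replicated
    sar_manual    = {a[7], a[7:1]};     // same result, written as a concat
    unsigned_ashr = a >>> 1;            // unsigned operand: zero fill again

    if (shr_y         !== 8'h40) $display("FAIL: SHR gave 0x%02h", shr_y);
    if (sar_y         !== 8'hC0) $display("FAIL: SAR gave 0x%02h", sar_y);
    if (sar_manual    !== 8'hC0) $display("FAIL: concat gave 0x%02h", sar_manual);
    if (unsigned_ashr !== 8'h40) $display("FAIL: a >>> 1 gave 0x%02h", unsigned_ashr);

    $display("SHR=0x%02h SAR=0x%02h concat=0x%02h unsigned>>>=0x%02h",
             shr_y, sar_y, sar_manual, unsigned_ashr);
    $finish;
  end
endmodule
```

Writing SAR as an explicit concatenation sidesteps the signedness rules entirely, which is one plausible reason to prefer that form in RTL.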

Op encoding

op    mnemonic  result
0000  ADD       rd = a + b
0001  SUB       rd = a - b
0010  AND       rd = a & b
0011  OR        rd = a | b
0100  XOR       rd = a ^ b
0101  SHL       rd = a << 1
0110  SHR       rd = a >> 1 (zero-extend)
0111  SAR       rd = $signed(a) >>> 1 (sign-extend)
1000  MOV       rd = a (passthrough)
1001  NOT       rd = ~a
1010  ADC       rd = a + b + carry-in
1011  SBC       rd = a - b - carry-in

b is regs[rb] when use_imm = 0, or imm[7:0] when use_imm = 1.
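
One consequence of this operand select is that a load-immediate needs no dedicated opcode: ADD with ra = 0 and use_imm = 1 yields the immediate, whereas MOV passes only the A operand through. A minimal sketch of that reasoning follows. The helper `sel_b` and the module `loadi_sketch` are hypothetical names that model the mux behavior, not code from the project.

```systemverilog
// Sketch only: why "ADD R0, imm" works as load-immediate and MOV does not.
// sel_b and loadi_sketch are hypothetical names, not project code.
module loadi_sketch;

  // Models the B-operand mux: imm when use_imm is 1, else the register value.
  function automatic logic [7:0] sel_b(input logic       use_imm,
                                       input logic [7:0] imm,
                                       input logic [7:0] reg_b);
    return use_imm ? imm : reg_b;
  endfunction

  logic [7:0] a, b, add_y, mov_y;

  initial begin
    a = 8'h00;                       // ra = 0 reads R0, which is always zero
    b = sel_b(1'b1, 8'h5A, 8'h11);   // use_imm = 1 selects the immediate

    add_y = a + b;                   // ADD: 0 + imm gives imm
    mov_y = a;                       // MOV: passes A through and ignores b

    if (add_y !== 8'h5A) $display("FAIL: ADD gave 0x%02h", add_y);
    if (mov_y !== 8'h00) $display("FAIL: MOV gave 0x%02h", mov_y);
    $display("ADD R0,imm = 0x%02h, MOV R0 = 0x%02h", add_y, mov_y);
    $finish;
  end
endmodule
```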

RTL

projects/05_alu_datapath/src/top.sv system-verilog
// Project 05: tiny ALU / datapath.
//
// One step up from project 01's pure-comb ALU: this one has a *register
// file* and *flag register*, so the ALU result is something you can
// store and feed back into the next operation. Externally it still
// looks like a combinational box driven by an "instruction" each cycle
// (no FSM, no fetch, no PC) — but the regs and flags hold their values
// across cycles, which is what every CPU and most peripherals are
// built out of.
//
// Architecture:
//
//   ┌─────────────┐    ra   ┌──────────┐  a   ┌──────┐ result   ┌────┐
//   │             │────────▶│ regfile  │─────▶│      │─────────▶│    │
//   │   inputs    │    rb   │  R0..R7  │  b   │ ALU  │          │ rd │
//   │             │────────▶│ 8 × 8b   │─────▶│      │ flags    │ we │
//   │  (op, ra,   │    imm  └──────────┘      └──────┘    ┌────▶│    │
//   │   rb, rd,   │   ──── (mux on use_imm) ─────         │     └────┘
//   │   imm, we)  │                                        │
//   └─────────────┘                                  flag_we
//
//                                                     ┌──────────┐
//                                                     │  flags   │
//                                                     │  Z N C V │
//                                                     └──────────┘
//
// Register file:
//   - 8 registers × 8 bits. R0 is hardcoded to zero (reads as 0; writes
//     are silently ignored). RISC convention; very useful as a "throw
//     away" destination and as the "0" operand for MOV-via-ADD.
//   - Asynchronous read on `ra` and `rb`; synchronous write to `rd`
//     when `we` is high.
//
// ALU ops (4-bit `op`):
//   0000 ADD    rd = a + b
//   0001 SUB    rd = a - b
//   0010 AND    rd = a & b
//   0011 OR     rd = a | b
//   0100 XOR    rd = a ^ b
//   0101 SHL    rd = a << 1
//   0110 SHR    rd = a >> 1     (logical)
//   0111 SAR    rd = $signed(a) >>> 1
//   1000 MOV    rd = a          (passthrough, ignores b)
//   1001 NOT    rd = ~a
//   1010 ADC    rd = a + b + carry-in (carry from current flags reg)
//   1011 SBC    rd = a - b - carry-in
//   1100..1111  reserved — treat as MOV
//
// Flags:
//   Z  result == 0
//   N  result[7]                 (sign bit, two's-complement)
//   C  carry-out (ADD/ADC) or borrow-out (SUB/SBC)   ; 0 for logical ops
//   V  signed overflow on ADD/ADC/SUB/SBC            ; 0 for logical ops
//
// The flags register only updates when `flag_we` is high. ADC/SBC
// reads the C flag from the *current* register value, before the
// pending update — i.e., the flags reg holds the value from the last
// flag-writing instruction. (Classical 8-bit CPUs such as the 8080 and
// Z80 read carry the same way.)
//
// Observability:
//   `obs` mirrors register `ra` — set ra=N to peek register N's value
//   without affecting anything. ra is a pure read port, no side effects.
//
// What this project teaches that the earlier ones didn't:
//   - A **register file** (the heart of every CPU's microarchitecture).
//   - **Flag-register state** that lives across cycles.
//   - Multiple read ports + one write port — the standard "2R1W" shape.
//   - Explicit handling of *signed* vs *unsigned* arithmetic (SAR, V).

`default_nettype none

module top (
    input  logic        clk,
    input  logic        rst_n,

    // ---- "instruction" inputs (drive these each cycle) ----
    input  logic [3:0]  op,
    input  logic [2:0]  ra,        // read port A address
    input  logic [2:0]  rb,        // read port B address
    input  logic [2:0]  rd,        // write port address
    input  logic [7:0]  imm,       // immediate (alternative to register B)
    input  logic        use_imm,   // 1 = use imm as B operand, 0 = use reg[rb]
    input  logic        we,        // write-enable for register file
    input  logic        flag_we,   // capture the four flags into the flag register

    // ---- live outputs ----
    output logic [7:0]  result,    // combinational ALU result
    output logic [7:0]  obs,       // mirrors register[ra] (peek any reg)
    output logic [3:0]  flags      // {Z, N, C, V} — registered
);

  // ---- register file ----
  // Eight 8-bit registers. R0 is treated as a constant zero — reads
  // bypass the storage, writes are dropped. We declare storage for
  // all eight to keep the indexing straightforward.
  logic [7:0] regs [0:7];

  wire [7:0] a_data = (ra == 3'd0) ? 8'h00 : regs[ra];
  wire [7:0] b_reg  = (rb == 3'd0) ? 8'h00 : regs[rb];
  wire [7:0] b_data = use_imm ? imm : b_reg;

  // ---- flag register ----
  // {Z, N, C, V}. Updated when flag_we is high.
  logic [3:0] flags_q;

  // ---- ALU (combinational) ----
  // We compute the full set of candidate results in parallel and mux
  // by `op`. ADD/SUB widen to 9 bits so the carry/borrow falls out as
  // bit 8 — same trick project 01 used for its 4-bit add.
  wire [8:0] add_w = {1'b0, a_data} + {1'b0, b_data};
  wire [8:0] sub_w = {1'b0, a_data} - {1'b0, b_data};
  wire       cin   = flags_q[1];      // C bit of the current flag reg
  wire [8:0] adc_w = {1'b0, a_data} + {1'b0, b_data} + {8'h00, cin};
  wire [8:0] sbc_w = {1'b0, a_data} - {1'b0, b_data} - {8'h00, cin};

  // Carry/overflow per op. C and V are 0 for non-arithmetic ops.
  logic [7:0] alu_y;
  logic       c_out;
  logic       v_out;
  always_comb begin
    alu_y = 8'h00;
    c_out = 1'b0;
    v_out = 1'b0;
    unique case (op)
      4'b0000: begin                   // ADD
        alu_y = add_w[7:0];
        c_out = add_w[8];
        v_out = (a_data[7] == b_data[7]) && (alu_y[7] != a_data[7]);
      end
      4'b0001: begin                   // SUB
        alu_y = sub_w[7:0];
        c_out = sub_w[8];              // borrow-out (1 = borrowed)
        v_out = (a_data[7] != b_data[7]) && (alu_y[7] != a_data[7]);
      end
      4'b0010: alu_y =  a_data &  b_data;       // AND
      4'b0011: alu_y =  a_data |  b_data;       // OR
      4'b0100: alu_y =  a_data ^  b_data;       // XOR
      4'b0101: begin                            // SHL
        alu_y = {a_data[6:0], 1'b0};
        c_out = a_data[7];
      end
      4'b0110: begin                            // SHR (logical)
        alu_y = {1'b0, a_data[7:1]};
        c_out = a_data[0];
      end
      4'b0111: begin                            // SAR (arithmetic right)
        alu_y = {a_data[7], a_data[7:1]};
        c_out = a_data[0];
      end
      4'b1000: alu_y = a_data;                  // MOV (passthrough A)
      4'b1001: alu_y = ~a_data;                 // NOT
      4'b1010: begin                            // ADC
        alu_y = adc_w[7:0];
        c_out = adc_w[8];
        v_out = (a_data[7] == b_data[7]) && (alu_y[7] != a_data[7]);
      end
      4'b1011: begin                            // SBC
        alu_y = sbc_w[7:0];
        c_out = sbc_w[8];
        v_out = (a_data[7] != b_data[7]) && (alu_y[7] != a_data[7]);
      end
      default: alu_y = a_data;                  // reserved → MOV
    endcase
  end

  wire z_out = (alu_y == 8'h00);
  wire n_out =  alu_y[7];

  // ---- sequential storage ----
  integer i;
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      for (i = 0; i < 8; i = i + 1) regs[i] <= 8'h00;
      flags_q <= 4'h0;
    end else begin
      if (we && rd != 3'd0) regs[rd] <= alu_y;
      if (flag_we)          flags_q  <= {z_out, n_out, c_out, v_out};
    end
  end

  // ---- outputs ----
  assign result = alu_y;
  assign obs    = a_data;
  assign flags  = flags_q;

endmodule

`default_nettype wire

Testbench

The verifying TB exercises each op with register or immediate operands (ADD and SUB in both forms; MOV only through the write-free peek helper), checks flag bits on representative cases (carry-out, signed overflow, zero, negative), and verifies the gating behavior of we / flag_we.

projects/05_alu_datapath/test/tb.sv system-verilog
// Project 05 testbench — verifying TB for the tiny ALU/datapath.
//
// We drive an "instruction" each cycle: set op/ra/rb/rd/imm/use_imm/
// we/flag_we, advance the clock, then read the destination register
// back through `obs` on the next cycle and check `flags`. ADD and SUB
// are exercised in both register-register and register-immediate
// forms, the other ops in one form each; flag bits are checked on a
// representative case for each arithmetic op.
//
// Style: a small set of helper tasks pretend we have an instruction
// set, then the main initial block covers one operation per section.
// Expected values are explicit constants, worked out by hand in the
// comments next to each check.

`timescale 1ns/1ps
`default_nettype none

module tb;

  // ---- 100 MHz simulation clock (10 ns period; the hardened chip targets 40 MHz) ----
  logic clk = 0;
  always #5 clk = ~clk;

  // ---- DUT I/O ----
  logic        rst_n;
  logic [3:0]  op;
  logic [2:0]  ra;
  logic [2:0]  rb;
  logic [2:0]  rd;
  logic [7:0]  imm;
  logic        use_imm;
  logic        we;
  logic        flag_we;
  logic [7:0]  result;
  logic [7:0]  obs;
  logic [3:0]  flags;

  top dut (
    .clk     (clk),
    .rst_n   (rst_n),
    .op      (op),
    .ra      (ra),
    .rb      (rb),
    .rd      (rd),
    .imm     (imm),
    .use_imm (use_imm),
    .we      (we),
    .flag_we (flag_we),
    .result  (result),
    .obs     (obs),
    .flags   (flags)
  );

  // ALU op constants for readability.
  localparam logic [3:0] OP_ADD = 4'b0000;
  localparam logic [3:0] OP_SUB = 4'b0001;
  localparam logic [3:0] OP_AND = 4'b0010;
  localparam logic [3:0] OP_OR  = 4'b0011;
  localparam logic [3:0] OP_XOR = 4'b0100;
  localparam logic [3:0] OP_SHL = 4'b0101;
  localparam logic [3:0] OP_SHR = 4'b0110;
  localparam logic [3:0] OP_SAR = 4'b0111;
  localparam logic [3:0] OP_MOV = 4'b1000;
  localparam logic [3:0] OP_NOT = 4'b1001;
  localparam logic [3:0] OP_ADC = 4'b1010;
  localparam logic [3:0] OP_SBC = 4'b1011;

  int errors = 0;

  // ---- helpers --------------------------------------------------------

  // One-cycle "step": present inputs on the falling edge, wait for the
  // next rising edge, then settle 1 ns so outputs are stable for checking.
  task automatic step(
    input logic [3:0] op_i,
    input logic [2:0] ra_i,
    input logic [2:0] rb_i,
    input logic [2:0] rd_i,
    input logic [7:0] imm_i,
    input logic       use_imm_i,
    input logic       we_i,
    input logic       flag_we_i
  );
    begin
      @(negedge clk);
      op      = op_i;
      ra      = ra_i;
      rb      = rb_i;
      rd      = rd_i;
      imm     = imm_i;
      use_imm = use_imm_i;
      we      = we_i;
      flag_we = flag_we_i;
      @(posedge clk);
      // small settle so result/flags reflect the post-edge state
      #1;
    end
  endtask

  // Loadi: rd = imm. Implemented as ADD R0 + imm (R0 reads as zero, so
  // the result is just imm). Asserts we; doesn't touch flags.
  // Note: using MOV here wouldn't work — MOV is a-passthrough, not b.
  task automatic loadi(input logic [2:0] rd_i, input logic [7:0] val);
    step(OP_ADD, 3'd0, 3'd0, rd_i, val, 1'b1, 1'b1, 1'b0);
  endtask

  // Peek register N onto `obs` without writing or computing flags.
  task automatic peek(input logic [2:0] r);
    step(OP_MOV, r, 3'd0, 3'd0, 8'h00, 1'b0, 1'b0, 1'b0);
  endtask

  // Issue an ALU op writing rd (and optionally updating flags), with
  // the b operand coming from register rb_i.
  task automatic alu_rr(
    input logic [3:0] op_i,
    input logic [2:0] rd_i,
    input logic [2:0] ra_i,
    input logic [2:0] rb_i,
    input logic       cap_flags
  );
    step(op_i, ra_i, rb_i, rd_i, 8'h00, 1'b0, 1'b1, cap_flags);
  endtask

  // Same, but with an immediate b operand.
  task automatic alu_ri(
    input logic [3:0] op_i,
    input logic [2:0] rd_i,
    input logic [2:0] ra_i,
    input logic [7:0] imm_i,
    input logic       cap_flags
  );
    step(op_i, ra_i, 3'd0, rd_i, imm_i, 1'b1, 1'b1, cap_flags);
  endtask

  // Read register r and assert its value matches expected. Uses peek()
  // to put it on `obs`, then checks. Doesn't disturb anything else.
  task automatic check_reg(input logic [2:0] r, input logic [7:0] exp,
                            input string label);
    begin
      peek(r);
      if (obs !== exp) begin
        $display("FAIL [%s] R%0d: got 0x%02h, expected 0x%02h",
                 label, r, obs, exp);
        errors = errors + 1;
      end
    end
  endtask

  task automatic check_flags(input logic [3:0] exp, input string label);
    begin
      if (flags !== exp) begin
        $display("FAIL [%s] flags: got %b (ZNCV), expected %b",
                 label, flags, exp);
        errors = errors + 1;
      end
    end
  endtask

  // ---- the actual tests ----------------------------------------------

  initial begin
    $dumpfile("tb.vcd");
    $dumpvars(0, tb);

    // sane defaults
    op      = OP_MOV;
    ra      = 3'd0;
    rb      = 3'd0;
    rd      = 3'd0;
    imm     = 8'h00;
    use_imm = 1'b0;
    we      = 1'b0;
    flag_we = 1'b0;
    rst_n   = 1'b0;

    // Hold reset for a few cycles, then release.
    repeat (3) @(posedge clk);
    @(negedge clk); rst_n = 1'b1;

    // After reset, every register should be zero.
    check_reg(3'd1, 8'h00, "post-reset R1");
    check_reg(3'd7, 8'h00, "post-reset R7");
    check_flags(4'b0000, "post-reset flags");

    // R0 must always read zero, even after a write.
    loadi(3'd0, 8'hFF);
    check_reg(3'd0, 8'h00, "R0 stays zero after write");

    // ---- LOADI / MOV ----
    loadi(3'd1, 8'h0F);
    loadi(3'd2, 8'h11);
    loadi(3'd3, 8'h80);
    check_reg(3'd1, 8'h0F, "loadi R1");
    check_reg(3'd2, 8'h11, "loadi R2");
    check_reg(3'd3, 8'h80, "loadi R3");

    // ---- ADD ----
    // R4 = R1 + R2 = 0x0F + 0x11 = 0x20
    alu_rr(OP_ADD, 3'd4, 3'd1, 3'd2, 1'b1);
    check_reg(3'd4, 8'h20, "ADD R4=R1+R2");
    // Z=0 N=0 C=0 V=0
    check_flags(4'b0000, "ADD R4 flags");

    // ADD with carry-out: 0xFF + 0x01 = 0x100 → result 0x00, C=1, Z=1
    loadi(3'd5, 8'hFF);
    alu_ri(OP_ADD, 3'd6, 3'd5, 8'h01, 1'b1);
    check_reg(3'd6, 8'h00, "ADD R6=R5+1 wraps");
    check_flags(4'b1010, "ADD R5+1 flags Z C");      // {Z N C V} = {1,0,1,0}

    // Signed overflow: 0x7F + 0x01 = 0x80, V=1, N=1
    loadi(3'd5, 8'h7F);
    alu_ri(OP_ADD, 3'd6, 3'd5, 8'h01, 1'b1);
    check_reg(3'd6, 8'h80, "ADD 0x7F+1 = 0x80");
    check_flags(4'b0101, "ADD 0x7F+1 flags N V");    // Z=0 N=1 C=0 V=1

    // ---- SUB ----
    // R4 = R2 - R1 = 0x11 - 0x0F = 0x02
    loadi(3'd1, 8'h0F);
    loadi(3'd2, 8'h11);
    alu_rr(OP_SUB, 3'd4, 3'd2, 3'd1, 1'b1);
    check_reg(3'd4, 8'h02, "SUB R4=R2-R1");
    check_flags(4'b0000, "SUB no-borrow flags");

    // SUB with borrow: 0x00 - 0x01 = 0xFF, C=1 (borrow), N=1
    alu_ri(OP_SUB, 3'd4, 3'd0, 8'h01, 1'b1);
    check_reg(3'd4, 8'hFF, "SUB R4=0-1 wraps to 0xFF");
    check_flags(4'b0110, "SUB 0-1 flags N C");       // Z=0 N=1 C=1 V=0

    // SUB equal: result == 0, Z=1
    loadi(3'd1, 8'h42);
    alu_rr(OP_SUB, 3'd4, 3'd1, 3'd1, 1'b1);
    check_reg(3'd4, 8'h00, "SUB R-R=0");
    check_flags(4'b1000, "SUB R-R flags Z");

    // ---- AND / OR / XOR ----
    loadi(3'd1, 8'hF0);
    loadi(3'd2, 8'h0F);
    alu_rr(OP_AND, 3'd3, 3'd1, 3'd2, 1'b1);
    check_reg(3'd3, 8'h00, "AND F0 & 0F = 0");
    check_flags(4'b1000, "AND zero flag");

    alu_rr(OP_OR,  3'd3, 3'd1, 3'd2, 1'b1);
    check_reg(3'd3, 8'hFF, "OR F0|0F = FF");
    check_flags(4'b0100, "OR negative flag");

    alu_ri(OP_XOR, 3'd3, 3'd1, 8'hAA, 1'b1);
    check_reg(3'd3, 8'h5A, "XOR F0^AA = 5A");

    // ---- SHL / SHR / SAR ----
    loadi(3'd1, 8'h81);
    alu_rr(OP_SHL, 3'd2, 3'd1, 3'd0, 1'b1);
    check_reg(3'd2, 8'h02, "SHL 0x81 = 0x02");
    check_flags(4'b0010, "SHL out-bit → C");          // C=top bit before shift

    alu_rr(OP_SHR, 3'd2, 3'd1, 3'd0, 1'b1);
    check_reg(3'd2, 8'h40, "SHR 0x81 = 0x40");
    check_flags(4'b0010, "SHR out-bit → C");          // C=bottom bit before

    alu_rr(OP_SAR, 3'd2, 3'd1, 3'd0, 1'b1);
    check_reg(3'd2, 8'hC0, "SAR 0x81 = 0xC0");
    check_flags(4'b0110, "SAR keeps sign, N=1 C=1");

    // ---- NOT ----
    loadi(3'd1, 8'h55);
    alu_rr(OP_NOT, 3'd2, 3'd1, 3'd0, 1'b1);
    check_reg(3'd2, 8'hAA, "NOT 0x55 = 0xAA");
    check_flags(4'b0100, "NOT 0x55 flag N");

    // ---- ADC / SBC ----
    // First do an ADD that sets C=1, then ADC reads it.
    loadi(3'd1, 8'hFF);
    alu_ri(OP_ADD, 3'd2, 3'd1, 8'h01, 1'b1);  // sets C=1, R2=0
    check_reg(3'd2, 8'h00, "setup C=1: 0xFF+1 wraps");
    // R3 = R0 + R0 + C = 0 + 0 + 1 = 1
    alu_rr(OP_ADC, 3'd3, 3'd0, 3'd0, 1'b1);
    check_reg(3'd3, 8'h01, "ADC R3=0+0+C");
    // After ADC, flags should reflect ADC: Z=0 N=0 C=0 V=0
    check_flags(4'b0000, "ADC flags fresh");

    // Set C=1 again then SBC: R4 = R0 - R0 - 1 = 0xFF, borrow out C=1
    loadi(3'd1, 8'hFF);
    alu_ri(OP_ADD, 3'd2, 3'd1, 8'h01, 1'b1);   // C=1
    alu_rr(OP_SBC, 3'd4, 3'd0, 3'd0, 1'b1);
    check_reg(3'd4, 8'hFF, "SBC R4=0-0-C=0xFF");
    check_flags(4'b0110, "SBC borrow flags");

    // ---- write-enable gating ----
    // we=0 must NOT update the destination register.
    loadi(3'd1, 8'h11);
    step(OP_ADD, 3'd1, 3'd0, 3'd1, 8'h22, 1'b1, 1'b0, 1'b0);
    check_reg(3'd1, 8'h11, "we=0 leaves R1 alone");

    // flag_we=0 must NOT update flags.
    // First force flags to a known value (Z=1 after R-R=0).
    loadi(3'd1, 8'h42);
    alu_rr(OP_SUB, 3'd2, 3'd1, 3'd1, 1'b1);
    check_flags(4'b1000, "flags pre-test = Z");
    // Now do something that would change flags, but with flag_we=0.
    step(OP_ADD, 3'd1, 3'd0, 3'd2, 8'hFF, 1'b1, 1'b1, 1'b0);
    check_flags(4'b1000, "flag_we=0 holds flags");

    // ---- summary ----
    if (errors == 0) $display("PASS: tiny ALU datapath, all checks ok.");
    else             $display("FAIL: %0d errors", errors);

    $finish;
  end

  // safety net
  initial begin
    #1_000_000;
    $display("FAIL: testbench timed out");
    $finish;
  end

endmodule

`default_nettype wire
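
The flag checks in the testbench above compare against hand-written constants. As a cross-check, a small reference function could recompute {Z, N, C, V} for ADD and SUB and be compared against those same constants. The sketch below does this for four of the cases the testbench covers. `ref_flags`, `expect_flags` and `flag_ref_sketch` are hypothetical names and not part of the project; the function mirrors the flag definitions in the RTL header comment rather than any existing helper.

```systemverilog
// Sketch only: a reference model for {Z, N, C, V} on ADD and SUB.
// All names here are hypothetical and not part of the project sources.
module flag_ref_sketch;

  // sub = 0 models ADD, sub = 1 models SUB. Widening to 9 bits makes
  // the carry-out (ADD) or borrow-out (SUB) appear as bit 8.
  function automatic logic [3:0] ref_flags(input logic       sub,
                                           input logic [7:0] a,
                                           input logic [7:0] b);
    logic [8:0] w;
    logic [7:0] y;
    logic       v;
    begin
      w = sub ? ({1'b0, a} - {1'b0, b}) : ({1'b0, a} + {1'b0, b});
      y = w[7:0];
      // Signed overflow: ADD needs equal operand signs, SUB needs
      // differing operand signs; both need the result sign to differ from a.
      v = sub ? ((a[7] != b[7]) && (y[7] != a[7]))
              : ((a[7] == b[7]) && (y[7] != a[7]));
      return {y == 8'h00, y[7], w[8], v};
    end
  endfunction

  int errors = 0;

  task automatic expect_flags(input logic [3:0] got,
                              input logic [3:0] exp,
                              input string      label);
    if (got !== exp) begin
      $display("FAIL [%s] got %b (ZNCV), expected %b", label, got, exp);
      errors = errors + 1;
    end
  endtask

  initial begin
    expect_flags(ref_flags(1'b0, 8'hFF, 8'h01), 4'b1010, "ADD 0xFF+1");
    expect_flags(ref_flags(1'b0, 8'h7F, 8'h01), 4'b0101, "ADD 0x7F+1");
    expect_flags(ref_flags(1'b1, 8'h00, 8'h01), 4'b0110, "SUB 0-1");
    expect_flags(ref_flags(1'b1, 8'h42, 8'h42), 4'b1000, "SUB R-R");
    if (errors == 0) $display("PASS: reference flags match the TB constants.");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

Keeping the constants alongside a reference function means a mistake in either one shows up as a disagreement, which is harder to get from constants alone.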

The demo TB runs a register-trace “program”: the Fibonacci numbers 1, 1, 2, 3, 5, 8, 13 built up in R1..R7 (eight terms if you count R0's hardwired 0), then a couple of bitwise ops, then a 0xFF + 1 to show carry-out:

projects/05_alu_datapath/test/tb_demo.sv system-verilog
// Project 05 demo testbench — runs a small "program" through the ALU
// datapath and prints each step in a register-trace style. Like
// `make demo` for the earlier projects: it doesn't PASS/FAIL anything,
// it just shows the chip doing something.
//
// The "program" fills R1..R7 with the Fibonacci numbers 1..13 (eight
// terms if you count R0's hardwired 0), then does a couple of bitwise
// ops and a 0xFF + 1 add to demonstrate flag behavior:
//
//     R1 = 1                    (loadi)
//     R2 = 1                    (loadi)
//     R3 = R1 + R2     = 2
//     R4 = R2 + R3     = 3
//     R5 = R3 + R4     = 5
//     R6 = R4 + R5     = 8
//     R7 = R5 + R6     = 13
//
//     -- bitwise --
//     R1 = 0xCA, R2 = 0xFE, R3 = R1 & R2, R4 = R1 | R2, R5 = R1 ^ R2
//
// Style choice: keep it pure plain SV, no $display formatting tricks
// beyond the printf-ish `%h`/`%b`.

`timescale 1ns/1ps
`default_nettype none

module tb_demo;

  logic clk = 0;
  always #5 clk = ~clk;

  logic        rst_n;
  logic [3:0]  op;
  logic [2:0]  ra, rb, rd;
  logic [7:0]  imm;
  logic        use_imm, we, flag_we;
  logic [7:0]  result, obs;
  logic [3:0]  flags;

  top dut (
    .clk(clk), .rst_n(rst_n),
    .op(op), .ra(ra), .rb(rb), .rd(rd),
    .imm(imm), .use_imm(use_imm), .we(we), .flag_we(flag_we),
    .result(result), .obs(obs), .flags(flags)
  );

  localparam logic [3:0] OP_ADD=4'b0000, OP_SUB=4'b0001, OP_AND=4'b0010;
  localparam logic [3:0] OP_OR =4'b0011, OP_XOR=4'b0100, OP_MOV=4'b1000;

  task automatic step(
    input logic [3:0] op_i,  input logic [2:0] ra_i,
    input logic [2:0] rb_i,  input logic [2:0] rd_i,
    input logic [7:0] imm_i, input logic use_imm_i,
    input logic       we_i,  input logic flag_we_i
  );
    begin
      @(negedge clk);
      op = op_i; ra = ra_i; rb = rb_i; rd = rd_i;
      imm = imm_i; use_imm = use_imm_i; we = we_i; flag_we = flag_we_i;
      @(posedge clk);
      #1;
    end
  endtask

  // ADD R0 + imm puts imm into rd; using MOV here wouldn't work since
  // MOV is a-passthrough and we want the immediate.
  task automatic loadi(input logic [2:0] r, input logic [7:0] v);
    step(OP_ADD, 3'd0, 3'd0, r, v, 1'b1, 1'b1, 1'b0);
  endtask
  task automatic peek(input logic [2:0] r);
    step(OP_MOV, r, 3'd0, 3'd0, 8'h00, 1'b0, 1'b0, 1'b0);
  endtask

  task automatic peek_print(input logic [2:0] r, input string label);
    begin
      peek(r);
      $display("[alu]    %s R%0d = 0x%02h (%0d)", label, r, obs, obs);
    end
  endtask

  task automatic do_add_rr(input logic [2:0] rd_i,
                            input logic [2:0] ra_i,
                            input logic [2:0] rb_i,
                            input string note);
    begin
      step(OP_ADD, ra_i, rb_i, rd_i, 8'h00, 1'b0, 1'b1, 1'b1);
      $display("[alu]    R%0d = R%0d + R%0d  → 0x%02h  (Z=%0d N=%0d C=%0d V=%0d) %s",
               rd_i, ra_i, rb_i, result,
               flags[3], flags[2], flags[1], flags[0], note);
    end
  endtask

  task automatic do_op_rr(input logic [3:0] op_i,
                           input string opname,
                           input logic [2:0] rd_i,
                           input logic [2:0] ra_i,
                           input logic [2:0] rb_i);
    begin
      step(op_i, ra_i, rb_i, rd_i, 8'h00, 1'b0, 1'b1, 1'b1);
      $display("[alu]    R%0d = R%0d %s R%0d  → 0x%02h  (Z=%0d N=%0d)",
               rd_i, ra_i, opname, rb_i, result, flags[3], flags[2]);
    end
  endtask

  initial begin
    $dumpfile("tb_demo.vcd");
    $dumpvars(0, tb_demo);

    op = 0; ra = 0; rb = 0; rd = 0; imm = 0;
    use_imm = 0; we = 0; flag_we = 0;
    rst_n = 0;
    repeat (3) @(posedge clk);
    @(negedge clk); rst_n = 1;

    $display("[alu]");
    $display("[alu]  -- librelane-playground / project 05 / tiny ALU datapath --");
    $display("[alu]  ops: ADD SUB AND OR XOR SHL SHR SAR MOV NOT ADC SBC");
    $display("[alu]  regfile: 8 × 8b, R0 hardwired to 0");
    $display("[alu]");
    $display("[alu]  fibonacci(8) using R1..R7:");

    loadi(3'd1, 8'h01); peek_print(3'd1, "loadi");
    loadi(3'd2, 8'h01); peek_print(3'd2, "loadi");
    do_add_rr(3'd3, 3'd1, 3'd2, "");
    do_add_rr(3'd4, 3'd2, 3'd3, "");
    do_add_rr(3'd5, 3'd3, 3'd4, "");
    do_add_rr(3'd6, 3'd4, 3'd5, "");
    do_add_rr(3'd7, 3'd5, 3'd6, "");

    $display("[alu]");
    $display("[alu]  bitwise on 0xCA and 0xFE:");
    loadi(3'd1, 8'hCA);
    loadi(3'd2, 8'hFE);
    do_op_rr(OP_AND, "&", 3'd3, 3'd1, 3'd2);
    do_op_rr(OP_OR,  "|", 3'd4, 3'd1, 3'd2);
    do_op_rr(OP_XOR, "^", 3'd5, 3'd1, 3'd2);

    $display("[alu]");
    $display("[alu]  carry-out demo: 0xFF + 0x01:");
    loadi(3'd1, 8'hFF);
    step(OP_ADD, 3'd1, 3'd0, 3'd2, 8'h01, 1'b1, 1'b1, 1'b1);
    $display("[alu]    R2 = R1 + 1     → 0x%02h  (Z=%0d N=%0d C=%0d V=%0d)",
             result, flags[3], flags[2], flags[1], flags[0]);

    $display("[alu]");
    $finish;
  end

endmodule

`default_nettype wire

See also