Project No. 02 of 147 on the ladder

Counter, PWM, and LFSR

introduces: flip-flops, clocks, real STA, the slow/fast clock-period experiment

harden state · last run 2026-04-28
cells: 1,946 non-filler
slack: 1.24 ns setup
area: 10000 (die) / 7807 (core) μm²
signoff
  • DRC: PASS
  • LVS: PASS
  • antenna: PASS

The first sequential design on the ladder. Three small synchronous circuits sharing a clock and a reset — an 8-bit counter, an 8-bit Fibonacci LFSR (max-length, taps 8/6/5/4), and a 1-bit PWM whose duty is threshold / 256. A mode input selects which of them drives the 8-bit out bus; count / lfsr / pwm are also exposed independently for observation.

This is also the project where we switch the RTL from plain Verilog to SystemVerilog: logic, always_ff, always_comb, unique case. Yosys 0.16+ handles the synth subset natively; iverilog needs -g2012. The code reads much the same, but synthesis catches more bugs at compile time.

[Interactive layout viewer: 100 × 100 μm die · sky130A · vectorized from final GDS]
[Interactive 3D viewer: full sky130 stack · z exaggerated 10× · 21k polygons]

Wire length (estimated): 2,691 μm, 3× P01. Static + switching power: 203 nW — finally non-trivial because the flops are toggling every cycle. One max-slew warning in the slow corner (11 violations) — a known flaky check on small designs that the user-mux isolation in TT-shaped projects normally hides.

What’s new vs. project 01

Project 01 had no clock, no flops, and ran STA against a synthetic __VIRTUAL_CLK__. Project 02 has a real clock domain — 16 flip-flops, a 100 MHz target — and this is where the EDA flow’s full sequential machinery kicks in:

  • CTS (clock tree synthesis) actually runs. The flow inserts a small tree of clock buffers driving every flop with bounded skew.
  • STA measures setup/hold timing on real data paths between flops. Setup slack of +1.24 ns means data arrives 1.24 ns before the next clock edge needs it; hold slack of +0.12 ns means data stays stable 0.12 ns longer than the destination flop’s hold window requires.
  • Max-slew violations appear for the first time. 11 nets, all in the slow PVT corner (typical/100°C/1.60 V). This is the flow saying “in the worst PVT corner you’ll see, signal transition times on these nets are too long for safe operation.” It’s not catastrophic — it’s what you’d fix by inserting a buffer. The flow didn’t auto-repair here, which is interesting; we’ll come back to that.
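As a worked example of what the setup number implies, the arithmetic below assumes the 100 MHz target corresponds to a 10 ns clock period and that the reported slack is measured against that period on the worst setup path. The resulting f_max is only a first-order estimate (it ignores clock uncertainty, skew, and corner-to-corner differences), so read it as a sketch and not as an STA result.

```python
# First-order setup-slack arithmetic for the numbers quoted above.
# Assumption: 100 MHz target == 10 ns period, and the +1.24 ns slack
# is measured against that period on the worst setup path.

TARGET_MHZ = 100.0
SETUP_SLACK_NS = 1.24

period_ns = 1e3 / TARGET_MHZ            # clock period in ns
used_ns = period_ns - SETUP_SLACK_NS    # time the worst path actually needs
fmax_mhz = 1e3 / used_ns                # rough ceiling in the same corner

print(f"period        = {period_ns:.2f} ns")
print(f"worst path    = {used_ns:.2f} ns")
print(f"approx. f_max = {fmax_mhz:.1f} MHz")
```

Under these assumptions the worst path uses 8.76 ns of the 10 ns budget, which would put the ceiling a little above 114 MHz.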

The RTL

The whole module is ~50 lines of SystemVerilog: three independent always blocks (the counter, the LFSR, the output mux) plus a handful of continuous assigns (the LFSR feedback, the PWM comparator, the wrap tick, and the two debug taps).

projects/02_counter_pwm_lfsr/src/top.sv system-verilog
// Project 02: counter + PWM + LFSR.
//
// Three small synchronous circuits sharing a clock and reset:
//
//   - 8-bit free-running counter
//   - 8-bit Fibonacci LFSR (taps at bits 8/6/5/4 → max-length
//     pseudo-random sequence of 255 states; the all-zero state is
//     never reached from a nonzero seed)
//   - 1-bit PWM output: high while `count < threshold`, so the duty
//     cycle is `threshold / 256`
//
// `mode` selects which signal drives the 8-bit `out` bus. `count_o` /
// `lfsr_o` / `pwm_o` are also exposed independently so a testbench (or
// silicon debugger) can observe all three concurrently.
//
// `tick` is a one-cycle pulse the cycle before counter rollover (high
// when `count == 8'hFF`); useful as a heartbeat for downstream modules.
//
// SystemVerilog choices in this file:
//   - `logic` everywhere (no wire/reg ambiguity)
//   - `always_ff` / `always_comb` for explicit register / combinational
//     intent. Synthesis will fail if a block declared `always_comb`
//     could infer a latch — that's the whole point.
//   - Active-low asynchronous reset `rst_n`. There is no on-chip reset
//     synchronizer; the testbench releases rst_n on a falling clock
//     edge so the release stays clear of the active (rising) edge.

`default_nettype none

module top (
    input  logic        clk,
    input  logic        rst_n,        // active-low async reset
    input  logic [1:0]  mode,         // 00=count, 01=lfsr, 10=pwm-replicated, 11=zero
    input  logic [7:0]  threshold,    // PWM duty threshold; pwm_o = (count < threshold)
    output logic [7:0]  out,          // mode-selected 8-bit output
    output logic [7:0]  count_o,      // raw counter value (for debug/observe)
    output logic [7:0]  lfsr_o,       // raw LFSR value
    output logic        pwm_o,        // raw 1-bit PWM
    output logic        tick          // 1-cycle pulse when count == 8'hFF
);

  // ---- counter ----
  // Free-running 8-bit counter. Wraps naturally because bit 8 is dropped.
  logic [7:0] count;
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) count <= 8'h00;
    else        count <= count + 8'h01;
  end

  // ---- LFSR ----
  // Fibonacci 8-bit LFSR. Taps at bit positions 8/6/5/4 (numbered from 1)
  // give a maximum-length sequence over 255 nonzero states.
  // Reset to 8'hFF — must be nonzero or the LFSR locks at 0.
  logic [7:0] lfsr;
  logic       lfsr_fb;
  assign lfsr_fb = lfsr[7] ^ lfsr[5] ^ lfsr[4] ^ lfsr[3];
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) lfsr <= 8'hFF;
    else        lfsr <= {lfsr[6:0], lfsr_fb};
  end

  // ---- PWM ----
  // Comparator on the counter — pure combinational. High while the counter
  // is below threshold, low otherwise. Over each 256-cycle period, it
  // is high for exactly `threshold` cycles.
  assign pwm_o = (count < threshold);

  // ---- tick ----
  // Combinational equality check on the counter. High for one full clock
  // cycle at value 0xFF, then count rolls to 0 and tick goes low.
  assign tick = (count == 8'hFF);

  // ---- output mux ----
  // Drive `out` from one of the three observables based on `mode`.
  // PWM mode replicates the 1-bit PWM across all 8 bits so a slow
  // probe or LED display sees a clean on/off pattern.
  always_comb begin
    unique case (mode)
      2'b00:   out = count;
      2'b01:   out = lfsr;
      2'b10:   out = {8{pwm_o}};
      2'b11:   out = 8'h00;
    endcase
  end

  assign count_o = count;
  assign lfsr_o  = lfsr;

endmodule

`default_nettype wire
src/top.sv — 4-bit ALU's grown-up cousin: now with state.

The always_ff @(posedge clk or negedge rst_n) blocks give this project its first asynchronous reset. When rst_n goes low, the flop’s Q jumps to the reset value immediately, without waiting for a clock edge (8'h00 for the counter, 8'hFF for the LFSR; the LFSR seed must be nonzero, or it would lock at 0 forever).

The LFSR taps at bits 8, 6, 5, 4 are the canonical max-length taps for an 8-bit register. With those taps, the register cycles through all 255 nonzero states before repeating. We verify this in simulation: 1024 cycles of running, 255 distinct states observed, zero never seen.
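The max-length claim can be cross-checked outside the simulator with a small behavioral model. The sketch below mirrors the feedback expression and shift in top.sv; the function names `lfsr_step` and `cycle_length` are illustrative and do not appear in the project.

```python
def lfsr_step(state: int) -> int:
    """One clock of the 8-bit Fibonacci LFSR, mirroring top.sv:
    fb = lfsr[7] ^ lfsr[5] ^ lfsr[4] ^ lfsr[3]; next = {lfsr[6:0], fb}."""
    fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) & 0xFF) | fb


def cycle_length(seed: int) -> int:
    """Count clocks until the register first returns to `seed`."""
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


print(cycle_length(0xFF))  # period starting from the RTL's reset value
print(lfsr_step(0x00))     # the all-zero state maps back to itself
```

Starting from the reset value 8'hFF the model returns to its seed after 255 steps, and the all-zero state is a fixed point, which is why the reset value has to be nonzero.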

The testbench

Six phases. The interesting ones:

projects/02_counter_pwm_lfsr/test/tb.sv system-verilog · L33-90
      .out      (out),
      .count_o  (count_o),
      .lfsr_o   (lfsr_o),
      .pwm_o    (pwm_o),
      .tick     (tick)
  );

  int errors = 0;
  int wraps  = 0;
  int pwm_high_cycles = 0;
  int pwm_total_cycles = 0;
  int lfsr_unique;

  // 256-bit bitmap of LFSR values seen (bit N set = value N observed).
  logic [255:0] lfsr_seen = 256'h0;

  // Watchdog so a hang doesn't run forever.
  initial begin
    #200_000;
    $display("FAIL: testbench timed out before $finish");
    $finish;
  end

  initial begin
    $dumpfile("tb.vcd");
    $dumpvars(0, tb);

    // ---- Phase 1: reset ----
    rst_n = 1'b0;
    repeat (4) @(posedge clk);
    if (count_o !== 8'h00) begin
      $display("FAIL phase 1: count_o=%h, want 00", count_o);
      errors = errors + 1;
    end
    if (lfsr_o !== 8'hFF) begin
      $display("FAIL phase 1: lfsr_o=%h, want FF", lfsr_o);
      errors = errors + 1;
    end
    if (out !== 8'h00) begin
      $display("FAIL phase 1: out=%h, want 00 (mode=00, count under reset)", out);
      errors = errors + 1;
    end

    // ---- Phase 2: release reset, run for 4 wraps ----
    @(negedge clk);
    rst_n = 1'b1;

    // 4 wraps × 256 cycles = 1024 samples. Each sample is taken at a
    // posedge, so it reads the `count` / `lfsr` values the flops held
    // going into that edge (nonblocking updates land after this read).
    for (int i = 0; i < 4 * 256; i = i + 1) begin
      @(posedge clk);
      pwm_total_cycles = pwm_total_cycles + 1;
      if (pwm_o)         pwm_high_cycles = pwm_high_cycles + 1;
      if (tick)          wraps = wraps + 1;
      lfsr_seen[lfsr_o] = 1'b1;
    end

    // ---- Phase 3: counter wrap count ----
tb.sv — phase 2 (run for 4 wraps, accumulate stats) and the LFSR coverage check.

Phase 4 is exact: with threshold = 0x40 (= 64), the PWM is high for exactly 64 cycles per 256-cycle period. Over 4 wraps that’s exactly 256 high cycles. Off by one is a fail.
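The expected counts can be reproduced with a tiny behavioral model of the counter, the comparator, and the tick. This is a sketch of the arithmetic only, not the testbench code; `run_counter` is a hypothetical helper name.

```python
def run_counter(threshold: int, cycles: int) -> tuple[int, int]:
    """Model a free-running 8-bit counter starting at 0.

    Returns (pwm_high_cycles, tick_count), where the PWM is high while
    count < threshold and tick is high while count == 0xFF.
    """
    count = 0
    high = 0
    ticks = 0
    for _ in range(cycles):
        if count < threshold:
            high += 1
        if count == 0xFF:
            ticks += 1
        count = (count + 1) & 0xFF  # bit 8 is dropped, so it wraps
    return high, ticks


high, ticks = run_counter(threshold=0x40, cycles=4 * 256)
print(high, ticks)
```

With threshold = 0x40 over 1024 cycles the model gives 256 high cycles and 4 ticks. One consequence of the strict `<` comparison is that an 8-bit threshold tops out at 255 high cycles per period, so this PWM never reaches a 100% duty cycle.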

Phase 5 uses a 256-bit bitmap (one bit per possible LFSR value) to track unique LFSR states. At the end, we count set bits and verify it’s 255 (max length) and that bit 0 is not set (the LFSR never visited the all-zero state).
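The same bitmap bookkeeping is easy to sketch in software, with a Python integer standing in for the 256-bit vector. The LFSR step function is a model of the RTL's feedback expression, and the variable names are illustrative and not taken from tb.sv.

```python
def lfsr_step(state: int) -> int:
    """Model of the RTL: shift left, feed back bits 7, 5, 4, 3 XORed."""
    fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) & 0xFF) | fb


seen = 0        # bit N is set once value N has been observed
state = 0xFF    # reset value of the LFSR
for _ in range(4 * 256):
    seen |= 1 << state
    state = lfsr_step(state)

unique = bin(seen).count("1")   # population count of the bitmap
zero_seen = bool(seen & 1)      # bit 0 marks the all-zero state

print(unique, zero_seen)
```

A bitmap keeps the check constant-size no matter how many cycles are sampled: 1024 samples still collapse to 255 set bits, with bit 0 clear.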

$ make test PROJECT=02_counter_pwm_lfsr
== 02_counter_pwm_lfsr ==
iverilog -g2012 -Wall -o tb.vvp -s tb ../src/top.sv tb.sv
PASS: counter (4 wraps), LFSR (255 states), PWM (256/1024 high), mode mux all OK.
02_counter_pwm_lfsr              PASS

Watching it do something

The verifying testbench is exhaustive but quiet about what the chip is doing. tb_demo.sv is a sibling that just lets the clock run and prints what’s on the output bus mode by mode — a “scope view” of the silicon as time advances. make demo PROJECT=02_counter_pwm_lfsr:

[chip] -- librelane-playground / project 02 / counter+LFSR+PWM --
[chip] clock running at 100 MHz; reset released.

[count] mode=00; 8 consecutive ticks of the free-running counter:
[count]   t=  65000  count=0x03  out=0x03  tick=0
[count]   t=  75000  count=0x04  out=0x04  tick=0
[count]   t=  85000  count=0x05  out=0x05  tick=0
[count]   t=  95000  count=0x06  out=0x06  tick=0
[count]   t= 105000  count=0x07  out=0x07  tick=0
[count]   t= 115000  count=0x08  out=0x08  tick=0
[count]   t= 125000  count=0x09  out=0x09  tick=0
[count]   t= 135000  count=0x0a  out=0x0a  tick=0

[lfsr ] mode=01; 16 consecutive states of the 8-bit Fibonacci LFSR
[lfsr ]         (taps 8/6/5/4 → max-length 255-state cycle):
[lfsr ]   t= 145000  lfsr=0x5e
[lfsr ]   t= 155000  lfsr=0xbc
[lfsr ]   t= 165000  lfsr=0x78
[lfsr ]   t= 175000  lfsr=0xf1
[lfsr ]   t= 185000  lfsr=0xe3
[lfsr ]   t= 195000  lfsr=0xc6
[lfsr ]   t= 205000  lfsr=0x8d
[lfsr ]   ... 8 more ...

[pwm  ] mode=10; threshold=0x40 → expected duty 64/256 = 25%
[pwm  ]   measured: 64 high cycles per 256-cycle period (= 25%)

The counter half is the simplest possible output: a number that goes up by one every clock. The LFSR half is the same circuit operating in pseudo-random mode — those 16 hex values look like noise, but they’re the deterministic output of three XORs and a shift register. The PWM half lets us count physical high cycles across one period and see that yes, threshold=0x40 really does hold the line high for exactly 64 of the 256 cycles. The chip, externally, is doing what the RTL said it would.
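To illustrate how deterministic the LFSR values are, the sketch below replays the printed trace from its first value using a model of the feedback expression in top.sv. The starting value 0x5e is taken from the log above; the helper name is hypothetical.

```python
def lfsr_step(state: int) -> int:
    """Model of the RTL: shift left, feed back bits 7, 5, 4, 3 XORed."""
    fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) & 0xFF) | fb


trace = [0x5E]                  # first LFSR value shown in the demo log
for _ in range(6):
    trace.append(lfsr_step(trace[-1]))

print(" ".join(f"0x{s:02x}" for s in trace))
```

The model yields 0x5e 0xbc 0x78 0xf1 0xe3 0xc6 0x8d, matching the seven LFSR lines printed by the demo.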

What LibreLane did differently

Same 7-phase shape as project 01, but with two real differences:

  • CTS inserted a small clock tree. All 16 flops get the clock from clk through a balanced tree of ~5 buffers. After CTS, every flop’s clock pin sees the rising edge within a few hundred picoseconds of every other.
  • Resizer worked harder. With a real clock, the resizer found paths it actually wanted to optimize, instead of just guarding against an imaginary clock. Cell count grew accordingly.

The 11 max-slew violations come from net segments in the slow corner where transition times stretch past the cell library’s max-slew constraint. Possible mitigations (any one of these would likely be enough):

  • Increase the drive strength in synthesis (force buffer sizing up)
  • Run the flow with MAX_TRANSITION set explicitly
  • Loosen the timing target so the resizer focuses on slew

For now we leave it. It’s a useful artifact: a real-flow violation in a real run, which we’ll come back to in project 04 or 05 when we intentionally push timing.

What just happened?

Plain SystemVerilog → 1946 cells → 10000 μm² of silicon (a 100 × 100 μm square), with 16 real flip-flops in a real clock domain, signoff clean except 11 max-slew violations in the slowest PVT corner. The simulator passed first try because the SV always_ff / always_comb discipline catches sequential-vs-combinational confusion at compile time — there is no “oh, a latch” surprise to worry about.

The 3D viewer above shows what changed structurally compared to project 01: more vertical structures (clock tree buffers visible as small vertical poly stacks at the centers of cell rows), and a denser met1 layer carrying clock plus signal interconnect.

See also