The first sequential design on the ladder. Three small synchronous
circuits sharing a clock and a reset — an 8-bit counter, an 8-bit
Fibonacci LFSR (max-length, taps 8/6/5/4), and a 1-bit PWM whose duty
is threshold / 256. A mode input selects which of them drives the
8-bit out bus; count / lfsr / pwm are also exposed
independently for observation.
This is also the project where we switch the RTL from plain Verilog to
SystemVerilog: logic, always_ff, always_comb, unique case. Yosys 0.16+
handles the synth subset natively; iverilog needs -g2012. The code keeps
the same overall shape as before, but the stricter constructs let the
tools catch more bugs at compile time.
Wire length (estimated): 2,691 μm, 3× P01. Static + switching power: 203 nW — finally non-trivial because the flops are toggling every cycle. One max-slew warning in the slow corner (11 violations) — a known flaky check on small designs that the user-mux isolation in TT-shaped projects normally hides.
What’s new vs. project 01
Project 01 had no clock, no flops, and ran STA against a synthetic
__VIRTUAL_CLK__. Project 02 has a real clock domain — 16 flip-flops,
a 100 MHz target — and this is where the EDA flow’s full sequential
machinery kicks in:
- CTS (clock tree synthesis) actually runs. The flow inserts a small tree of clock buffers driving every flop with bounded skew.
- STA measures setup/hold timing on real data paths between flops. Setup slack of +1.24 ns means data arrives 1.24 ns before the next clock edge needs it; hold slack of +0.12 ns means data stays stable 0.12 ns longer than the destination flop’s hold window requires.
- Max-slew violations appear for the first time. 11 nets, all in the slow PVT corner (typical/100°C/1.60 V). This is the flow saying “in the worst PVT corner you’ll see, signal transition times on these nets are too long for safe operation.” It’s not catastrophic — it’s what you’d fix by inserting a buffer. The flow didn’t auto-repair here, which is interesting; we’ll come back to that.
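To make the slack numbers concrete, here is a minimal sketch of the arithmetic STA performs on a single flop-to-flop path. The delay values are hypothetical, picked only so the results land on the +1.24 ns and +0.12 ns quoted above; they are not taken from the real timing report, and the model ignores clock skew and uncertainty.

```python
# Simplified single-path slack arithmetic (no clock skew, no uncertainty).
# All delay values passed in below are hypothetical illustrations,
# not numbers read from the actual STA report.

CLOCK_PERIOD_NS = 10.0  # 100 MHz target from the text


def setup_slack(clk_to_q_ns, logic_max_ns, setup_ns, period_ns=CLOCK_PERIOD_NS):
    """Required time minus arrival time at the capturing flop."""
    arrival = clk_to_q_ns + logic_max_ns   # latest the data can show up
    required = period_ns - setup_ns        # latest it is allowed to show up
    return required - arrival


def hold_slack(clk_to_q_ns, logic_min_ns, hold_ns):
    """Margin between the earliest new data and the end of the hold window."""
    earliest_arrival = clk_to_q_ns + logic_min_ns
    return earliest_arrival - hold_ns


if __name__ == "__main__":
    print(f"setup slack: {setup_slack(0.40, 8.21, 0.15):+.2f} ns")
    print(f"hold slack:  {hold_slack(0.40, 0.02, 0.30):+.2f} ns")
```

Positive slack in both checks means the path passes. Setup slack shrinks as the clock gets faster, while hold slack does not depend on the period at all.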
The RTL
The whole module is ~50 lines of SystemVerilog. Three independent
always blocks (the counter, the LFSR, the output mux) plus a handful of
continuous assigns (the LFSR feedback, the PWM comparator, the wrap
tick, and the debug outputs).
// Project 02: counter + PWM + LFSR.
//
// Three small synchronous circuits sharing a clock and reset:
//
// - 8-bit free-running counter
// - 8-bit Fibonacci LFSR (taps at bits 8/6/5/4 → max-length
// pseudo-random sequence of 255 states; the all-zero state is never
// reached from a nonzero seed)
// - 1-bit PWM output: high while `count < threshold`, so the duty
// cycle is `threshold / 256`
//
// `mode` selects which signal drives the 8-bit `out` bus. `count_o` /
// `lfsr_o` / `pwm_o` are also exposed independently so a testbench (or
// silicon debugger) can observe all three concurrently.
//
// `tick` is a one-cycle pulse the cycle before counter rollover (high
// when `count == 8'hFF`); useful as a heartbeat for downstream modules.
//
// SystemVerilog choices in this file:
// - `logic` everywhere (no wire/reg ambiguity)
// - `always_ff` / `always_comb` for explicit register / combinational
// intent. Synthesis will fail if a block declared `always_comb`
// could infer a latch — that's the whole point.
// - Active-low async reset `rst_n`, wired straight to the flops with
// no reset synchronizer. (Pure async reset; a clean release is
// handled by the testbench raising rst_n on a falling clock edge.)
`default_nettype none
module top (
input logic clk,
input logic rst_n, // active-low async reset
input logic [1:0] mode, // 00=count, 01=lfsr, 10=pwm-replicated, 11=zero
input logic [7:0] threshold, // PWM duty threshold; pwm_o = (count < threshold)
output logic [7:0] out, // mode-selected 8-bit output
output logic [7:0] count_o, // raw counter value (for debug/observe)
output logic [7:0] lfsr_o, // raw LFSR value
output logic pwm_o, // raw 1-bit PWM
output logic tick // 1-cycle pulse when count == 8'hFF
);
// ---- counter ----
// Free-running 8-bit counter. Wraps naturally because bit 8 is dropped.
logic [7:0] count;
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) count <= 8'h00;
else count <= count + 8'h01;
end
// ---- LFSR ----
// Fibonacci 8-bit LFSR. Taps at bit positions 8/6/5/4 (numbered from 1)
// give a maximum-length sequence over 255 nonzero states.
// Reset to 8'hFF — must be nonzero or the LFSR locks at 0.
logic [7:0] lfsr;
logic lfsr_fb;
assign lfsr_fb = lfsr[7] ^ lfsr[5] ^ lfsr[4] ^ lfsr[3];
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) lfsr <= 8'hFF;
else lfsr <= {lfsr[6:0], lfsr_fb};
end
// ---- PWM ----
// Comparator on the counter — pure combinational. High while the counter
// is below threshold, low otherwise. Over each 256-cycle period, it
// is high for exactly `threshold` cycles.
assign pwm_o = (count < threshold);
// ---- tick ----
// Combinational equality check on the counter. High for one full clock
// cycle at value 0xFF, then count rolls to 0 and tick goes low.
assign tick = (count == 8'hFF);
// ---- output mux ----
// Drive `out` from one of the three observables based on `mode`.
// PWM mode replicates the 1-bit PWM across all 8 bits so a slow
// probe or LED display sees a clean on/off pattern.
always_comb begin
unique case (mode)
2'b00: out = count;
2'b01: out = lfsr;
2'b10: out = {8{pwm_o}};
2'b11: out = 8'h00;
endcase
end
assign count_o = count;
assign lfsr_o = lfsr;
endmodule
`default_nettype wire
The always_ff @(posedge clk or negedge rst_n) blocks are this
project’s first asynchronous reset. When rst_n goes low, the flop’s
Q jumps immediately to the reset value (8'h00 for the counter,
8'hFF for the LFSR — never zero, or the LFSR would lock at 0
forever).
The LFSR taps at bits 8, 6, 5, 4 are the canonical max-length taps for an 8-bit register. With those taps, the register cycles through all 255 nonzero states before repeating. We verify this in simulation: 1024 cycles of running, 255 distinct states observed, zero never seen.
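The same claim can be checked outside the simulator. Below is a minimal software reference model of the RTL's update rule (feedback from bits 7, 5, 4, 3, shift left, feedback into bit 0). The function names are ours, not part of the project; treat it as a sketch for cross-checking, not as the project's test code.

```python
# Software reference model of the RTL's LFSR update:
#   lfsr_fb = lfsr[7] ^ lfsr[5] ^ lfsr[4] ^ lfsr[3]
#   lfsr   <= {lfsr[6:0], lfsr_fb}
# Function names are illustrative, not taken from the project.


def lfsr_step(state):
    """Advance the 8-bit LFSR by one clock."""
    fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) & 0xFF) | fb


def lfsr_period(seed=0xFF):
    """Count clocks until the register returns to its seed."""
    state = lfsr_step(seed)
    steps = 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


if __name__ == "__main__":
    print(lfsr_period())       # length of the cycle starting at the reset value
    print(lfsr_step(0x00))     # the all-zero state maps to itself (lock-up)
```

Starting from the reset value 0xFF, the model returns to 0xFF after 255 steps, and stepping from 0x00 yields 0x00 again, which is why the reset value must be nonzero.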
The testbench
Six phases. The interesting ones:
.out (out),
.count_o (count_o),
.lfsr_o (lfsr_o),
.pwm_o (pwm_o),
.tick (tick)
);
int errors = 0;
int wraps = 0;
int pwm_high_cycles = 0;
int pwm_total_cycles = 0;
int lfsr_unique;
// 256-bit bitmap of LFSR values seen (one bit per possible value).
logic [255:0] lfsr_seen = 256'h0;
// Watchdog so a hang doesn't run forever.
initial begin
#200_000;
$display("FAIL: testbench timed out before $finish");
$finish;
end
initial begin
$dumpfile("tb.vcd");
$dumpvars(0, tb);
// ---- Phase 1: reset ----
rst_n = 1'b0;
repeat (4) @(posedge clk);
if (count_o !== 8'h00) begin
$display("FAIL phase 1: count_o=%h, want 00", count_o);
errors = errors + 1;
end
if (lfsr_o !== 8'hFF) begin
$display("FAIL phase 1: lfsr_o=%h, want FF", lfsr_o);
errors = errors + 1;
end
if (out !== 8'h00) begin
$display("FAIL phase 1: out=%h, want 00 (mode=00, count under reset)", out);
errors = errors + 1;
end
// ---- Phase 2: release reset, run for 4 wraps ----
@(negedge clk);
rst_n = 1'b1;
// 4 wraps × 256 cycles = 1024 samples. Sample on each posedge AFTER
// the clock has updated `count` / `lfsr` for that cycle.
for (int i = 0; i < 4 * 256; i = i + 1) begin
@(posedge clk);
pwm_total_cycles = pwm_total_cycles + 1;
if (pwm_o) pwm_high_cycles = pwm_high_cycles + 1;
if (tick) wraps = wraps + 1;
lfsr_seen[lfsr_o] = 1'b1;
end
// ---- Phase 3: counter wrap count ----
Phase 4 is exact: with threshold = 0x40 (= 64), the PWM is high for
exactly 64 cycles per 256-cycle period. Over 4 wraps that’s exactly
256 high cycles. Off by one is a fail.
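The expected totals for the wrap count and the PWM check follow directly from the RTL's two comparisons. Here is a small sketch that models the counter in software and tallies both; run_counter is a hypothetical helper name, and the model assumes counting starts at the reset value and spans a whole number of 256-cycle periods.

```python
def run_counter(threshold, cycles):
    """Model the free-running counter; return (pwm_high_cycles, tick_count)."""
    count = 0                       # reset value 8'h00
    pwm_high = 0
    ticks = 0
    for _ in range(cycles):
        if count < threshold:       # pwm_o = (count < threshold)
            pwm_high += 1
        if count == 0xFF:           # tick = (count == 8'hFF)
            ticks += 1
        count = (count + 1) & 0xFF  # 8-bit wrap: the carry out is dropped
    return pwm_high, ticks


if __name__ == "__main__":
    high, wraps = run_counter(threshold=0x40, cycles=4 * 256)
    print(high, wraps)
```

For threshold 0x40 over 1024 cycles this gives 256 high cycles and 4 ticks, matching the testbench's expectations. One consequence of the strict less-than: an 8-bit threshold tops out at 255, so the duty cycle can reach 255/256 but never a full 100%.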
Phase 5 uses a 256-bit bitmap (one bit per possible value) to track unique LFSR states. At the end, we count set bits and verify it’s 255 (max length) and that bit 0 is not set (LFSR never visited the all-zero state).
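The bitmap bookkeeping is easy to mirror in software. This is a sketch of the idea rather than the testbench's code: a Python integer stands in for the 256-bit vector, track_states is a hypothetical name, and sampling is assumed to start at the reset value 0xFF.

```python
def lfsr_step(state):
    """One clock of the RTL's LFSR: feedback from bits 7, 5, 4, 3 into bit 0."""
    fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) & 0xFF) | fb


def track_states(seed=0xFF, cycles=1024):
    """Return an integer bitmap with bit v set for every value v that was seen."""
    seen = 0
    state = seed
    for _ in range(cycles):
        seen |= 1 << state          # same role as lfsr_seen[lfsr_o] = 1'b1
        state = lfsr_step(state)
    return seen


if __name__ == "__main__":
    seen = track_states()
    unique = bin(seen).count("1")   # population count of the bitmap
    zero_seen = bool(seen & 1)      # bit 0 corresponds to the all-zero state
    print(unique, zero_seen)
```

A bitmap makes the check independent of visiting order: revisiting a state during the later wraps just sets an already-set bit, so the final population count is the number of distinct states.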
$ make test PROJECT=02_counter_pwm_lfsr
== 02_counter_pwm_lfsr ==
iverilog -g2012 -Wall -o tb.vvp -s tb ../src/top.sv tb.sv
PASS: counter (4 wraps), LFSR (255 states), PWM (256/1024 high), mode mux all OK.
02_counter_pwm_lfsr PASS
Watching it do something
The verifying testbench is exhaustive but quiet about what the chip
is doing. tb_demo.sv is a sibling that just lets the clock run and
prints what’s on the output bus mode by mode — a “scope view” of the
silicon as time advances. make demo PROJECT=02_counter_pwm_lfsr:
[chip] -- librelane-playground / project 02 / counter+LFSR+PWM --
[chip] clock running at 100 MHz; reset released.
[count] mode=00; 8 consecutive ticks of the free-running counter:
[count] t= 65000 count=0x03 out=0x03 tick=0
[count] t= 75000 count=0x04 out=0x04 tick=0
[count] t= 85000 count=0x05 out=0x05 tick=0
[count] t= 95000 count=0x06 out=0x06 tick=0
[count] t= 105000 count=0x07 out=0x07 tick=0
[count] t= 115000 count=0x08 out=0x08 tick=0
[count] t= 125000 count=0x09 out=0x09 tick=0
[count] t= 135000 count=0x0a out=0x0a tick=0
[lfsr ] mode=01; 16 consecutive states of the 8-bit Fibonacci LFSR
[lfsr ] (taps 8/6/5/4 → max-length 255-state cycle):
[lfsr ] t= 145000 lfsr=0x5e
[lfsr ] t= 155000 lfsr=0xbc
[lfsr ] t= 165000 lfsr=0x78
[lfsr ] t= 175000 lfsr=0xf1
[lfsr ] t= 185000 lfsr=0xe3
[lfsr ] t= 195000 lfsr=0xc6
[lfsr ] t= 205000 lfsr=0x8d
[lfsr ] ... 8 more ...
[pwm ] mode=10; threshold=0x40 → expected duty 64/256 = 25%
[pwm ] measured: 64 high cycles per 256-cycle period (= 25%)
The counter mode is the simplest possible output: a number that
goes up by one every clock. The LFSR mode is the same chip
producing a pseudo-random sequence: those 16 hex values look like
noise, but they’re the deterministic output of three XORs and a
shift register. The PWM mode lets us count high cycles
across one period and see that yes, threshold=0x40 really does
hold the line high for exactly 64 of the 256 cycles. The chip,
externally, is doing what the RTL said it would.
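What lands on the out bus in each mode can be summarized in a few lines. The sketch below mirrors the RTL's unique case using the mode encoding from the port comments; mux_out is a hypothetical helper name, not something the demo testbench defines.

```python
def mux_out(mode, count, lfsr, threshold):
    """Software mirror of the RTL output mux; returns the 8-bit `out` value."""
    pwm = count < threshold          # pwm_o = (count < threshold)
    if mode == 0b00:
        return count & 0xFF          # raw counter
    if mode == 0b01:
        return lfsr & 0xFF           # raw LFSR state
    if mode == 0b10:
        return 0xFF if pwm else 0x00 # {8{pwm_o}}: PWM bit replicated on all 8 lines
    return 0x00                      # mode 11 drives zero


if __name__ == "__main__":
    # Values taken from the demo log: count=0x03, lfsr=0x5e, threshold=0x40.
    for mode in range(4):
        print(f"mode={mode:02b} out=0x{mux_out(mode, 0x03, 0x5E, 0x40):02x}")
```

Replicating the PWM bit across the bus means mode 10 only ever shows 0xFF or 0x00, which is what makes it readable on a slow probe or an LED bank.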
What LibreLane did differently
Same 7-phase shape as project 01, but with two real differences:
- CTS inserted a small clock tree. All 16 flops get the clock from clk through a balanced tree of ~5 buffers. After CTS, every flop’s clock pin sees the rising edge within a few hundred picoseconds of every other.
- Resizer worked harder. With a real clock, the resizer found paths it actually wanted to optimize, rather than just guarding against an imaginary clock. Cell count grew accordingly.
The 11 max-slew violations come from net segments in the slow corner where transition times stretch past the cell library’s max-slew constraint. Possible mitigations (any one of which should help):
- Tighten the drive strength in synthesis (force buffer sizing up)
- Run the flow with MAX_TRANSITION set explicitly
- Loosen the timing target so the resizer focuses on slew
For now we leave it. It’s a useful artifact: a real-flow violation in a real run, which we’ll come back to in project 04 or 05 when we intentionally push timing.
What just happened?
Plain SystemVerilog → 1946 cells → 10000 μm² of silicon (a 100 × 100 μm
square), with 16 real flip-flops in a real clock domain, signoff clean
except 11 max-slew violations in the slowest PVT corner. The simulation
passed first try, helped by the SV always_ff / always_comb discipline,
which catches sequential-vs-combinational confusion at compile time, so
there is no “oh, a latch” surprise to worry about.
The 3D viewer above shows what changed structurally compared to project 01: more vertical structures (clock tree buffers visible as small vertical poly stacks at the centers of cell rows), and a denser met1 layer carrying clock plus signal interconnect.
See also
- Project 01 — the combinational ancestor.
- Project 03 → first explicit FSM, first protocol that talks to the outside world.
- Project README — full lesson plan.