No. 01 / project 1 of 147 on the ladder

Combinational logic playground

introduces — gates, muxes, and what synthesis does when there is no clock to chase

harden state: last run 2026-04-27
cells: 81 (non-filler)
slack: 0.54 ns (setup)
area: 3600 μm² (die) / 2432 μm² (core)
signoff
  • DRC: PASS
  • LVS: PASS
  • antenna: PASS

A 4-bit ALU with two operands, a 2-bit op selector, an 8-bit result, and two flags. Pure combinational — no flops, no clock — and the first thing on this ladder to be hardened all the way to a sky130 GDS.

[Interactive layout viewer: 60 × 60 μm die, sky130A, vectorized from the final GDS; a 3D view shows the full sky130 stack with z exaggerated 10× for visibility.]

Wire length (estimated): 897 μm. Static power at the typical corner: 31 nW. With no clock there is no clock tree and no flop toggling, so while the inputs hold steady the chip draws only the leakage of its 81 cells; each input change adds a short burst of dynamic power on top. The full 75-step flow (Yosys synthesis → OpenROAD floorplan, placement, CTS, routing, and STA → Magic and KLayout DRC → Netgen LVS → antenna check) takes about 50 seconds end-to-end.

What the circuit does

Two 4-bit operands a and b, a 2-bit op selector, and three outputs: an 8-bit result y, a carry flag, and a zero flag. The op encoding:

  • op = 00 — addition. y is the 5-bit sum zero-extended into 8 bits. carry is bit 4 of the 5-bit sum (the carry-out from a 4+4 → 5 add).
  • op = 01 — bitwise XOR. y[3:0] = a ^ b, top four bits are zero, carry is forced to 0.
  • op = 10 — bitwise AND. Same shape as XOR.
  • op = 11 — bitwise OR. Same shape as XOR.

zero is just (y == 8'h00), regardless of op. There is no clock, no reset, no enable, no flops anywhere in the design. Outputs change the moment inputs change. That’s it.
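For checking the table by hand or from a script, it can help to keep a software golden model next to the RTL. A minimal Python sketch of the op table above follows; the function name alu_ref is illustrative and not part of the project.

```python
def alu_ref(a: int, b: int, op: int) -> tuple:
    """Return (y, carry, zero) for 4-bit a, b and a 2-bit op, per the op table."""
    if not (0 <= a < 16 and 0 <= b < 16 and 0 <= op < 4):
        raise ValueError("a and b must be 4-bit values, op must be 2-bit")
    if op == 0b00:
        sum5 = a + b              # at most 15 + 15 = 30, so it fits in 5 bits
        y = sum5                  # zero-extended into the 8-bit result
        carry = (sum5 >> 4) & 1   # bit 4 is the carry-out
    elif op == 0b01:
        y, carry = a ^ b, 0       # logic ops leave y[7:4] = 0 and force carry low
    elif op == 0b10:
        y, carry = a & b, 0
    else:
        y, carry = a | b, 0
    zero = int(y == 0)            # same test as (y == 8'h00), for every op
    return y, carry, zero

print(alu_ref(0x7, 0x9, 0b00))  # (16, 1, 0)
```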

The Verilog

projects/01_comb_logic/src/top.v is one module, about 25 effective lines. It’s plain Verilog-2001 with no SystemVerilog conveniences, which keeps every tool in the synthesis flow happy.

`default_nettype none

module top (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire [1:0] op,
    output wire [7:0] y,
    output wire       zero,
    output wire       carry
);

  // Add: 5-bit sum so we have a real carry-out bit.
  wire [4:0] sum5 = {1'b0, a} + {1'b0, b};

  assign y = (op == 2'b00) ? {3'b000, sum5}      :
             (op == 2'b01) ? {4'b0000, a ^ b}    :
             (op == 2'b10) ? {4'b0000, a & b}    :
                             {4'b0000, a | b};

  assign carry = (op == 2'b00) ? sum5[4] : 1'b0;

  assign zero = (y == 8'h00);

endmodule

Three things to notice:

The `default_nettype none at the top is mandatory under our project conventions. It disables Verilog’s implicit-net misfeature, which silently creates a 1-bit wire whenever an undeclared name appears in a port connection or on the left-hand side of an assign. Typo a signal name and the compiler yells at you instead of quietly creating a stray wire that connects to nothing.

The 5-bit sum5 is the only real width-juggling in the design. Two 4-bit operands can sum to 5 bits (15 + 15 = 30), so we zero-extend each to 5 bits before adding. The MSB of sum5 is the carry-out, the same bit a chain of four full adders would produce. Synthesis lowers the + into a small adder built from standard cells; at this size the exact gate structure is whatever Yosys’s optimization passes find cheapest.
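To see why the MSB of sum5 is exactly the carry-out of a full-adder chain, here is a bit-level Python sketch of a 4-bit ripple-carry adder (helper names hypothetical). It models the arithmetic only; the cells synthesis actually emits may be arranged differently.

```python
def full_adder(x: int, y: int, cin: int) -> tuple:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout

def ripple_add4(a: int, b: int) -> int:
    """Add two 4-bit values through a chain of four full adders; return the 5-bit sum."""
    carry = 0
    result = 0
    for i in range(4):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result | (carry << 4)  # the final carry-out lands in bit 4

# The chain agrees with {1'b0, a} + {1'b0, b} for every pair of 4-bit inputs.
assert all(ripple_add4(a, b) == a + b for a in range(16) for b in range(16))
```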

The ternary chain on y is a 4-to-1 multiplexer. Because the four op tests are mutually exclusive and together cover every code, it compiles to the same gates whether you write it as nested ternaries or as a case block inside always @*. We chose the ternary form because it’s shorter on the page, but if you modify the design to play with that equivalence, you’ll see an identical cell count after synthesis.
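The equivalence is a Boolean one: a priority chain over mutually exclusive, exhaustive tests is the same function as a flat select. A small Python sketch of the two forms (names hypothetical), checked against every real ALU input:

```python
def mux_ternary(op, d0, d1, d2, d3):
    # Priority chain, like the nested ?: in top.v: test 00, then 01, then 10.
    return d0 if op == 0b00 else (d1 if op == 0b01 else (d2 if op == 0b10 else d3))

def mux_case(op, d0, d1, d2, d3):
    # Flat 4-to-1 select, like a case (op) block: op picks one input directly.
    return (d0, d1, d2, d3)[op]

# Feed both forms the four ALU results for every (op, a, b) and compare.
for op in range(4):
    for a in range(16):
        for b in range(16):
            inputs = (a + b, a ^ b, a & b, a | b)
            assert mux_ternary(op, *inputs) == mux_case(op, *inputs)
```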

The testbench

Project 01’s testbench is exhaustive: 4 ops × 16 × 16 = 1024 input combinations, each compared against an inline Verilog reference model. For a design this small, exhaustive is cheap and removes any “did we test that case?” doubt.

for (oi = 0; oi < 4; oi = oi + 1) begin
  op = oi[1:0];
  for (ai = 0; ai < 16; ai = ai + 1) begin
    a = ai[3:0];
    for (bi = 0; bi < 16; bi = bi + 1) begin
      b = bi[3:0];

      // Reference model — the same op table, written separately.
      case (op)
        2'b00: begin
          sum5_ref  = {1'b0, a} + {1'b0, b};
          exp_y     = {3'b000, sum5_ref};
          exp_carry = sum5_ref[4];
        end
        2'b01: begin exp_y = {4'b0000, a ^ b}; exp_carry = 1'b0; end
        2'b10: begin exp_y = {4'b0000, a & b}; exp_carry = 1'b0; end
        2'b11: begin exp_y = {4'b0000, a | b}; exp_carry = 1'b0; end
      endcase
      exp_zero = (exp_y == 8'h00);

      #1;  // settle combinational paths before sampling

      if (y !== exp_y || zero !== exp_zero || carry !== exp_carry)
        $display("FAIL ...");
    end
  end
end

A few things this testbench does on purpose:

The reference model lives inline in the testbench but is written in a deliberately different style: a case block instead of ternaries, with a named intermediate sum5_ref. A typo in one of two independently written implementations shows up as a disagreement; two copies of the same expression share the typo and happily agree with each other.
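The same differential idea works outside the simulator. This Python sketch (function names hypothetical) writes the ALU twice, once with word-level operators and once bit by bit with an explicit carry chain, then compares them on all 1024 (op, a, b) combinations. It mirrors the shape of the testbench loop, not its code.

```python
def alu_behavioral(a, b, op):
    # Style 1: word-level operators, close to the RTL.
    y = [a + b, a ^ b, a & b, a | b][op]
    carry = (a + b) >> 4 if op == 0 else 0
    return y, carry, int(y == 0)

def alu_bitwise(a, b, op):
    # Style 2: build y one bit at a time, with the add as an explicit ripple chain.
    y, c = 0, 0
    for i in range(4):
        x, z = (a >> i) & 1, (b >> i) & 1
        if op == 0:
            y |= (x ^ z ^ c) << i
            c = (x & z) | (c & (x ^ z))
        elif op == 1:
            y |= (x ^ z) << i
        elif op == 2:
            y |= (x & z) << i
        else:
            y |= (x | z) << i
    if op == 0:
        y |= c << 4  # carry-out becomes bit 4 of the zero-extended sum
    carry = c if op == 0 else 0
    return y, carry, int(y == 0)

def check_all():
    """Compare both styles on every (op, a, b); return (vector_count, failures)."""
    vectors, failures = 0, []
    for op in range(4):
        for a in range(16):
            for b in range(16):
                vectors += 1
                got, exp = alu_bitwise(a, b, op), alu_behavioral(a, b, op)
                if got != exp:
                    failures.append((op, a, b, got, exp))
    return vectors, failures

vectors, failures = check_all()
print(f"{'PASS' if not failures else 'FAIL'}: {vectors} vectors checked.")  # PASS: 1024 vectors checked.
```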

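The #1 delay before sampling is not optional, even for pure combinational logic. The continuous assignments in top only re-evaluate once the testbench process yields, so comparing y in the same time step as the input change would typically read the previous vector’s result. The same settle-then-sample habit matters even more for sequential designs (Project 02 onward will have flops), so it’s worth getting right from project 01.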

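The !== (case inequality) operator is used instead of != so that an x (unknown) on a DUT output counts as a mismatch. With !=, a comparison involving an x bit can return x, and if treats x as false, so the FAIL branch would be skipped and the bug would pass silently.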
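Why != hides an x is easiest to see in a toy model of Verilog’s four-state comparison rules. This Python sketch is a simplification (z is folded into x; vectors are equal-width lists of '0', '1', 'x', most significant bit first), and its helper names are hypothetical:

```python
X = "x"  # unknown; this toy model folds z into x

def logical_ne(lhs, rhs):
    """Verilog != on equal-width bit lists (MSB first): returns 0, 1, or X."""
    # A known bit that differs settles it: the vectors are definitely unequal.
    if any(l != r and X not in (l, r) for l, r in zip(lhs, rhs)):
        return 1
    # Otherwise any unknown bit makes the relation ambiguous, so the result is x.
    if X in lhs or X in rhs:
        return X
    return 0

def case_ne(lhs, rhs):
    """Verilog !==: x bits are compared literally, so the result is always 0 or 1."""
    return int(list(lhs) != list(rhs))

def if_taken(cond):
    """An if statement runs its branch only for a known, nonzero condition."""
    return cond != X and cond != 0

dut_y = list("0000x001")  # DUT output with one unknown bit
exp_y = list("00001001")  # expected value
print(if_taken(logical_ne(dut_y, exp_y)))  # False: the FAIL branch is skipped
print(if_taken(case_ne(dut_y, exp_y)))     # True: the mismatch is reported
```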

What the simulator says

$ make test PROJECT=01_comb_logic
== 01_comb_logic ==
iverilog -g2012 -Wall -o tb.vvp -s tb ../src/top.v tb.v
warning: Some design elements have no explicit time unit and/or
       : time precision. This may cause confusing timing results.
       : Affected design elements are:
       :   -- module top declared here: ../src/top.v:16
VCD info: dumpfile tb.vcd opened for output.
PASS: 1024 vectors checked.
tb.v:85: $finish called at 1024000 (1ps)
01_comb_logic                    PASS

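The Icarus warning about top having no `timescale is expected and harmless. By project convention, design modules carry no timescale directive; the testbench is the authoritative time domain. Strictly speaking, `timescale applies in compilation order rather than down the hierarchy, so top (compiled before tb.v) falls back to the simulator’s default time unit. That only matters for modules containing # delays, and top has none, so every delay in the run is measured on the testbench’s 1ns/1ps scale.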

The exit timestamp 1024000 (in 1 ps units) is informative: 1024 vectors × one #1 (1 ns) settle each = 1024 ns = 1,024,000 ps. So the simulator swept the entire test space in just over a microsecond of simulated time and a few milliseconds of wall time.
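The timestamp arithmetic as a small Python sketch. Like the paragraph, it assumes a 1ns/1ps testbench timescale and exactly one #1 per vector with no other delays; finish_time_ps is a hypothetical helper.

```python
TIME_UNIT_PS = 1000      # assumed `timescale 1ns/1ps: a #1 delay is 1 ns = 1000 ps
VECTORS = 4 * 16 * 16    # 4 ops x 16 values of a x 16 values of b

def finish_time_ps(vectors: int, settle_units: int = 1) -> int:
    """Simulated time at $finish in ps, if each vector costs one #settle_units delay."""
    return vectors * settle_units * TIME_UNIT_PS

print(VECTORS, finish_time_ps(VECTORS))  # 1024 1024000
```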

Watching it do something

The verifying testbench is exhaustive but mute — PASS: 1024 vectors is correct, but it doesn’t show you the chip doing anything. A sister testbench, tb_demo.v, runs a handful of operations and prints each one as a calculator-style line. make demo PROJECT=01_comb_logic:

[alu]  -- librelane-playground / project 01 / 4-bit ALU --
[alu]  ops: 00=ADD  01=XOR  10=AND  11=OR

[alu]  0x3 + 0x5 = 0x08   carry=0  zero=0
[alu]  0x7 + 0x9 = 0x10   carry=1  zero=0
[alu]  0xf + 0x1 = 0x10   carry=1  zero=0
[alu]  0x0 + 0x0 = 0x00   carry=0  zero=1

[alu]  0xa ^ 0x5 = 0x0f   carry=0  zero=0
[alu]  0xf ^ 0xf = 0x00   carry=0  zero=1

[alu]  0xc & 0x3 = 0x00   carry=0  zero=1
[alu]  0xf & 0xa = 0x0a   carry=0  zero=0

[alu]  0x6 | 0x9 = 0x0f   carry=0  zero=0
[alu]  0x0 | 0x0 = 0x00   carry=0  zero=1

carry=1 on 0xF + 0x1 = 0x10 (and on 0x7 + 0x9) is the bit-4 carry-out of the 5-bit internal sum, exactly the RTL line where we widened to 5 bits on purpose so the carry would have somewhere to go. zero=1 appears on exactly the results equal to 0x00, which is the 8-bit comparator doing its job.
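For readers who want the same calculator-style lines without a simulator, here is a hypothetical Python re-creation of the output format. It is not tb_demo.v, just a model that produces matching lines from the op table:

```python
OP_SYMBOLS = {0b00: "+", 0b01: "^", 0b10: "&", 0b11: "|"}

def alu(a, b, op):
    """(y, carry, zero) for 4-bit a, b and 2-bit op, following the op table."""
    y = (a + b, a ^ b, a & b, a | b)[op]
    carry = (a + b) >> 4 if op == 0b00 else 0
    return y, carry, int(y == 0)

def demo_line(a, b, op):
    """Format one calculator-style line like those in the demo transcript."""
    y, carry, zero = alu(a, b, op)
    return f"[alu]  0x{a:x} {OP_SYMBOLS[op]} 0x{b:x} = 0x{y:02x}   carry={carry}  zero={zero}"

for a, b, op in [(0x3, 0x5, 0b00), (0x7, 0x9, 0b00), (0xa, 0x5, 0b01), (0xc, 0x3, 0b10)]:
    print(demo_line(a, b, op))
```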

What LibreLane did

make harden PROJECT=01_comb_logic invokes LibreLane v3.0.2 against the RTL plus a minimal config. The flow walked 75 steps end-to-end in about fifty seconds. In plain language, those 75 steps fall into seven phases:

Phase 1 — Lint & sanity (steps 1–4)

Verilator and Yosys’s own lint pass read the RTL looking for patterns that synthesize to something you didn’t intend: implicit nets, combinational loops, multiple drivers, inferred latches, missing default branches in case. Project 01 passes clean — the assign-only style makes latches structurally impossible.

Phase 2 — Synthesis (steps 5–9)

Yosys turns Verilog into a netlist of standard cells from the sky130 high-density library (sky130_fd_sc_hd). The 4-to-1 mux becomes a few NAND-style cells, the 4-bit add becomes a small adder network, and the comparators for zero and op == xx become small trees of OR/AND-type gates. Yosys reports zero unmapped instances, meaning every operator was lowered to real cells.
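The zero comparator is logically a NOR of all eight bits of y. This Python sketch (helper names hypothetical) reduces the bits with a balanced tree of 2-input ORs, which takes log2(8) = 3 levels, then inverts; the cells Yosys actually picks may be wider gates arranged differently.

```python
def or_tree(bits):
    """Reduce a list of 0/1 bits pairwise with OR; return (result, tree_levels)."""
    levels = 0
    while len(bits) > 1:
        # Pair up neighbours; an odd bit out passes through to the next level.
        bits = [bits[i] | bits[i + 1] if i + 1 < len(bits) else bits[i]
                for i in range(0, len(bits), 2)]
        levels += 1
    return bits[0], levels

def zero_flag(y):
    """zero = NOT (OR of all 8 bits of y), i.e. the same test as (y == 8'h00)."""
    any_set, _ = or_tree([(y >> i) & 1 for i in range(8)])
    return 1 - any_set

print(zero_flag(0x00), zero_flag(0x10))  # 1 0
```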

Phase 3 — Floorplan (steps 10–18)

OpenROAD is told the die is 60 × 60 μm with a core of roughly 50 × 50 μm and a target utilization of 20–35%. It carves the core into standard-cell rows, builds the power grid (the orange horizontal stripes you see in the layout; those are the VPWR / VGND rails), and inserts well-tap cells (the regular pattern of small green squares). The logic cells themselves are placed in the next phase.

This is where the flow grumbled most: with no CLOCK_PORT and no real flops, the flow’s default timing constraints fell back to a synthetic __VIRTUAL_CLK__ so OpenROAD’s STA engine had something to chew on. Three families of warnings (STA-0366, STA-0419, STA-0450) all stem from that; they’re the flow being honest that it’s running timing analysis on a clockless design.
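One way to reconcile the nominal 50 × 50 μm core with the 2432 μm² core area in the stats panel: standard cells can only sit in whole rows of whole placement sites. This Python sketch assumes the core is trimmed to the largest whole number of sky130 high-density rows (2.72 μm tall) and sites (0.46 μm wide) that fit; that trimming rule is an assumption here, not something the run log states.

```python
import math

ROW_HEIGHT_UM = 2.72  # sky130_fd_sc_hd "unithd" site height
SITE_WIDTH_UM = 0.46  # sky130_fd_sc_hd "unithd" site width

def snapped_core(width_um: float, height_um: float) -> tuple:
    """Whole rows and sites that fit a nominal core: (rows, sites_per_row, area_um2)."""
    rows = math.floor(height_um / ROW_HEIGHT_UM)
    sites = math.floor(width_um / SITE_WIDTH_UM)
    return rows, sites, rows * ROW_HEIGHT_UM * sites * SITE_WIDTH_UM

rows, sites, area = snapped_core(50.0, 50.0)
print(rows, sites, round(area, 1))  # 18 108 2432.3
```

Under that assumption the result lands within rounding of the reported 2432 μm².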

Phase 4 — Placement & resizing (steps 19–37)

Cells get optimized positions (global → detailed placement). The resizer runs against the virtual clock and finds 17 paths it would like to buffer “for safety”. Those 17 timing-repair buffers are pure overhead in this design: they exist to satisfy constraints on a clock that doesn’t exist. We could disable the resizer with a config flag; for project 01 we let it do its harmless thing.

Phase 5 — Routing (steps 38–50)

Global router lays out wire paths; detailed router commits them to specific metal layers. The detailed router found 11 DRC errors on its first iteration and fixed all 11 on the second pass. That’s the expected pattern — it’s an iterative solver, and convergence in two iterations on a small design is a good sign.

Phase 6 — Parasitic extraction & STA (steps 51–55)

OpenRCX extracts wire capacitance and resistance from the routed geometry. STA replays timing with those parasitics included; the worst setup slack at the slow corner (SS, 100 °C, 1.60 V) lands at +0.54 ns. Positive slack means every constrained path meets the imaginary 25 ns clock.

Phase 7 — Signoff (steps 56–75)

The final run of independent verification tools, each checking something different:

  • Magic DRC: geometric design rule checks (minimum widths, spacings, enclosures)
  • KLayout DRC: a second opinion using a different rule deck
  • KLayout XOR: compares the GDS streamed out by Magic with the one streamed out by KLayout; any difference points to a GDS-write bug in one of the tools
  • Magic SPICE extraction → Netgen LVS: extracts a SPICE netlist from the GDS and compares it against the final gate-level netlist
  • Antenna check: looks for long metal runs attached to transistor gates that could collect charge during manufacturing (before any diffusion connection exists to bleed it off) and damage the gate oxide
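The antenna check boils down to a ratio test: how much charge-collecting metal is attached to how much gate area. This Python sketch is a simplified model (real rule decks work per layer, accumulate across layers, and credit protection diodes), and both the 400:1 limit and the net data are hypothetical, not sky130’s actual rules:

```python
def antenna_ratio(metal_area_um2: float, gate_area_um2: float) -> float:
    """Ratio of exposed metal area to the gate area it is wired to."""
    return metal_area_um2 / gate_area_um2

def antenna_violations(nets, max_ratio):
    """Names of nets whose ratio exceeds max_ratio (a hypothetical single limit)."""
    return [name for name, metal, gate in nets if antenna_ratio(metal, gate) > max_ratio]

# Hypothetical nets: (name, exposed metal area in um^2, connected gate area in um^2).
nets = [("short_net", 12.0, 0.5), ("long_net", 450.0, 0.5)]
print(antenna_violations(nets, max_ratio=400.0))  # ['long_net']
```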

Every signoff check came back with zero violations, the cleanest possible result for a first project.

Reading the layout

Open the viewer at the top of this page and double-click to fit. A few landmarks to find at fit zoom:

The horizontal salmon bands running across are the power rails, alternating VPWR and VGND. Cells sit in rows between two adjacent rails, drawing current from VPWR and returning it to VGND; alternate rows are flipped, so each rail is shared by the two rows on either side of it. A rail appears every cell-row height (2.72 μm for the sky130 high-density cells), so cells of any width can be tiled along a row.

The regular grid of small squares in two staggered patterns: those are the well taps and substrate taps. Every so often along each row you need one to tie the n-well to VPWR and the substrate to VGND. Without them the wells and substrate could drift from their intended potentials and invite latch-up. The floorplan step inserts them on a regular pitch before any logic cell is placed.

The larger regions of pink and red are the standard cells themselves — each one is a small piece of logic (a 2-input NAND, a mux, a buffer). Project 01 has 33 of these doing real work, plus 17 timing-repair buffers, plus the taps and fillers.

Zoom in (mouse wheel) anywhere and you’ll see the internal structure of a single cell: thin vertical poly stripes (transistor gates), the wider horizontal diff regions (source/drain), small contact squares (licon1, gold-ish), and the local-interconnect (li1) wires that hop between contacts inside one cell.

Vias connecting cell internals up to the routing layers (met1 and above) appear as small bright squares. Above the cells, longer thin strips are met2 running vertically and met3 running horizontally, forming the actual signal nets that connect cells to each other.

What just happened?

Plain Verilog → 81 cells → 3600 μm² of silicon (a 60 × 60 μm square), with all signoff checks clean, in under a minute of compute. The flow inserted some scaffolding (a virtual clock, 17 buffers) that the design didn’t need but the flow adds by default; we let it. The result is a real GDS that, in principle, could be sent to a sky130 fab.

For project 01 the lesson is that the LibreLane flow has strong opinions about what a “well-formed design” looks like (clocked, registered, testable) and applies them whether your design needs them or not. The flip side is that a clockless ALU like this one still fits inside those opinions without breaking anything; the warnings are louder than the actual problems they describe.

See also