No. 01 / project

Combinational logic playground

gates, muxes, and what synthesis does when there is no clock to chase

A 4-bit ALU with two operands, a 2-bit op selector, an 8-bit result, and two flags. Pure combinational — no flops, no clock — and the first thing on this ladder to be hardened all the way to a sky130 GDS.

[Interactive layout viewer: 60 × 60 μm die, sky130A, vectorized from the final GDS. Drag, scroll to zoom, double-click to fit; press 1 for 1:1, f to fit.]
[Interactive 3D view: full sky130 stack, sky130A, z exaggerated 10× for visibility. Drag to orbit, scroll to zoom, right-drag to pan, double-click to recenter, R to reset.]
  • die area: 3600 μm² (60 × 60 μm)
  • core area: 2432 μm² (20.2% util)
  • stdcells: 81 (≈50 are real logic)
  • wire len: 897 μm (estimated)
  • setup ws: +0.54 ns (slow corner)
  • power: 31 nW (no clock = no switching)
signoff · 75 flow steps · ~50 s · 7/8 clean
  • yosys synth: 0 unmapped
  • lint: 0 latches
  • magic DRC: 0 violations
  • klayout DRC: 0 violations
  • netgen LVS: 0 mismatches
  • antenna: 0 violations
  • max-slew/fan/cap: all 3 corners
  • wire length: no threshold

What the circuit does

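Two 4-bit operands a and b, a 2-bit op selector, and three outputs: an 8-bit result y, a carry flag, and a zero flag. The op encoding, as implemented in the Verilog below:

  • op = 00: add, y = a + b (5-bit sum, zero-extended to 8 bits); carry is bit 4 of the sum
  • op = 01: xor, y = a ^ b; carry = 0
  • op = 10: and, y = a & b; carry = 0
  • op = 11: or, y = a | b; carry = 0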

zero is just (y == 8'h00), regardless of op. There is no clock, no reset, no enable, no flops anywhere in the design. Outputs change the moment inputs change. That’s it.
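
As a software-side cross-check, the behaviour described here can be sketched as a small Python golden model. This is an illustrative sketch and not part of the project: the function name alu_ref is hypothetical, and the op encoding (00 add, 01 xor, 10 and, 11 or) is read off the Verilog source shown in the next section.

```python
def alu_ref(a: int, b: int, op: int) -> tuple[int, int, int]:
    """Hypothetical golden model: returns (y, carry, zero) for 4-bit a, b and 2-bit op."""
    a &= 0xF   # operands are 4 bits wide
    b &= 0xF
    op &= 0x3  # op selector is 2 bits wide
    if op == 0b00:
        y = a + b                # at most 30, so it fits in 5 bits
        carry = (y >> 4) & 1     # bit 4 of the sum is the carry-out
    elif op == 0b01:
        y, carry = a ^ b, 0
    elif op == 0b10:
        y, carry = a & b, 0
    else:
        y, carry = a | b, 0
    zero = int(y == 0)           # zero flag looks at y only, whatever the op
    return y, carry, zero


print(alu_ref(15, 15, 0b00))  # (30, 1, 0)
print(alu_ref(5, 5, 0b01))    # (0, 0, 1)
```

Note that for an add the carry bit also shows up as bit 4 of y, because the 5-bit sum is zero-extended into the 8-bit result.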

The Verilog

projects/01_comb_logic/src/top.v is one module, ~25 effective lines. Plain Verilog 2001, no SystemVerilog conveniences — keeps the synthesis toolchain happy on every backend.

`default_nettype none

module top (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire [1:0] op,
    output wire [7:0] y,
    output wire       zero,
    output wire       carry
);

  // Add: 5-bit sum so we have a real carry-out bit.
  wire [4:0] sum5 = {1'b0, a} + {1'b0, b};

  assign y = (op == 2'b00) ? {3'b000, sum5}      :
             (op == 2'b01) ? {4'b0000, a ^ b}    :
             (op == 2'b10) ? {4'b0000, a & b}    :
                             {4'b0000, a | b};

  assign carry = (op == 2'b00) ? sum5[4] : 1'b0;

  assign zero = (y == 8'h00);

endmodule

Three things to notice:

The `default_nettype none at the top is mandatory under our project conventions. It disables Verilog’s “create a 1-bit wire if you reference an undeclared name” misfeature. Typo a signal name and the compiler will yell at you instead of silently creating a new wire that’s connected to nothing.

The 5-bit sum5 is the only bit of width-juggling in the design. Two 4-bit operands can sum to a 5-bit value (15 + 15 = 30), so we zero-extend each to 5 bits before adding. The MSB of sum5 is the carry-out, the same bit you’d get from the last stage of a chain of full-adders. Yosys will do exactly that, lowering the + into a small ripple-carry chain.

The conditional-chain on y is a 4-to-1 multiplexer. It compiles to the same gates whether you write it as nested ternaries or as a case block inside always @*. We chose the ternary form because it’s smaller in text, but if you mutate the design to play with that equivalence, you’ll see an identical cell count after synthesis.
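
The last two points can be sketched in Python as a bit-level model: a chain of four full adders whose final carry-out becomes bit 4 of the sum, and a 4-to-1 mux built from three 2-to-1 muxes, compared against a priority chain shaped like the nested ternaries. All function names here are hypothetical, and this is not the netlist Yosys produces. It only shows that the two descriptions compute the same function; the claim about identical cell counts can only be confirmed by running synthesis.

```python
def full_adder(x, y, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    return x ^ y ^ cin, (x & y) | (cin & (x ^ y))


def ripple_add4(a, b):
    """Chain four full adders; the last carry-out becomes bit 4 of the 5-bit sum."""
    carry, total = 0, 0
    for i in range(4):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total | (carry << 4)


def mux2(sel, d0, d1):
    """Two-input multiplexer: d1 when sel is 1, else d0."""
    return d1 if sel else d0


def y_mux_tree(a, b, op):
    """4-to-1 mux from three 2-to-1 muxes, selected by op[0] and then op[1]."""
    low = mux2(op & 1, ripple_add4(a, b), a ^ b)   # op = 00 or 01
    high = mux2(op & 1, a & b, a | b)              # op = 10 or 11
    return mux2((op >> 1) & 1, low, high)


def y_chain(a, b, op):
    """Priority chain in the shape of the nested ternaries."""
    return (a + b if op == 0b00 else
            a ^ b if op == 0b01 else
            a & b if op == 0b10 else
            a | b)


# Compare both forms over all 4 x 16 x 16 input combinations.
mismatches = sum(
    y_mux_tree(a, b, op) != y_chain(a, b, op)
    for op in range(4) for a in range(16) for b in range(16)
)
sum5 = ripple_add4(0b1001, 0b1000)         # 9 + 8 = 17
print(mismatches, sum5 & 0xF, sum5 >> 4)   # 0 1 1
```

The 9 + 8 case shows why the fifth bit matters: the low nibble alone reads 1, and the carry-out holds the missing 16.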

The testbench

Project 01’s testbench is exhaustive: 4 ops × 16 × 16 = 1024 input combinations, each compared against an inline Verilog reference model. For a design this small, exhaustive is cheap and removes any “did we test that case?” doubt.

for (oi = 0; oi < 4; oi = oi + 1) begin
  op = oi[1:0];
  for (ai = 0; ai < 16; ai = ai + 1) begin
    a = ai[3:0];
    for (bi = 0; bi < 16; bi = bi + 1) begin
      b = bi[3:0];

      // Reference model — the same op table, written separately.
      case (op)
        2'b00: begin
          sum5_ref  = {1'b0, a} + {1'b0, b};
          exp_y     = {3'b000, sum5_ref};
          exp_carry = sum5_ref[4];
        end
        2'b01: begin exp_y = {4'b0000, a ^ b}; exp_carry = 1'b0; end
        2'b10: begin exp_y = {4'b0000, a & b}; exp_carry = 1'b0; end
        2'b11: begin exp_y = {4'b0000, a | b}; exp_carry = 1'b0; end
      endcase
      exp_zero = (exp_y == 8'h00);

      #1;  // settle combinational paths before sampling

      if (y !== exp_y || zero !== exp_zero || carry !== exp_carry)
        $display("FAIL ...");
    end
  end
end

A few things this testbench does on purpose:

The reference model is written in the same Verilog file but in a deliberately different style — case block instead of ternary, named intermediate sum5_ref. Two implementations that disagree are easier to catch than two copies of the same expression that share a typo.

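The #1 delay before sampling may look redundant for pure combinational logic, but it is not: the DUT’s continuous assignments only re-evaluate once the testbench process suspends, so sampling with no delay would read the outputs for the previous vector. It’s also the right habit for sequential designs (Project 02 onward will have flops). Better to write it correctly from project 01 than to skip it and forget when it matters.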

The !== (case-inequality) operator is used instead of != so a DUT output of x (unknown) counts as a mismatch. With !=, a comparison against x returns x, which an if treats as false, so the failure would go unreported.
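
The difference between the two operators can be sketched with a toy four-state model in Python. This is an approximation written for illustration, with hypothetical helper names; values are equal-width strings over '0', '1', 'x' and 'z', and the model follows the usual rule that != is 1 when a pair of known bits differs, x when the answer is otherwise ambiguous, and 0 when all bits are known and equal.

```python
def logical_neq(lhs: str, rhs: str) -> str:
    """Toy model of Verilog != on equal-width bit strings."""
    assert len(lhs) == len(rhs)
    ambiguous = False
    for p, q in zip(lhs, rhs):
        if p in "01" and q in "01":
            if p != q:
                return "1"       # a known bit differs: definitely not equal
        else:
            ambiguous = True     # x or z involved: this bit cannot be decided
    return "x" if ambiguous else "0"


def case_neq(lhs: str, rhs: str) -> str:
    """Toy model of Verilog !==: x and z compare literally, result is never x."""
    return "1" if lhs != rhs else "0"


def if_taken(cond: str) -> bool:
    """An if takes its branch only when the condition is a known 1."""
    return cond == "1"


dut_y, exp_y = "0000xxxx", "00000101"
print(logical_neq(dut_y, exp_y), if_taken(logical_neq(dut_y, exp_y)))  # x False
print(case_neq(dut_y, exp_y), if_taken(case_neq(dut_y, exp_y)))        # 1 True
```

With !=, the half-unknown output slips past the check; with !==, it is reported as a failure.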

What the simulator says

$ make test PROJECT=01_comb_logic
== 01_comb_logic ==
iverilog -g2012 -Wall -o tb.vvp -s tb ../src/top.v tb.v
warning: Some design elements have no explicit time unit and/or
       : time precision. This may cause confusing timing results.
       : Affected design elements are:
       :   -- module top declared here: ../src/top.v:16
VCD info: dumpfile tb.vcd opened for output.
PASS: 1024 vectors checked.
tb.v:85: $finish called at 1024000 (1ps)
01_comb_logic                    PASS

The Icarus warning about top having no `timescale is intentional and correct. Design modules should not carry a timescale directive — the testbench is the authoritative time domain. Icarus is just observing that the design is inheriting the testbench’s 1ns/1ps scale, which is exactly what you want.

The exit timestamp 1024000 (in 1 ps units) is informative: 1024 vectors × one #1 (1 ns) settle delay each = 1024 ns = 1024000 ps. So the simulator burned through the entire test space in just over a microsecond of simulated time and a few milliseconds of wall time.
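
The same arithmetic as a quick Python check, assuming the 1ns/1ps timescale mentioned above and one #1 delay per vector (variable names are illustrative):

```python
vectors = 4 * 16 * 16     # ops x values of a x values of b
settle_ns = 1             # one #1 delay per vector, in 1 ns time units
ps_per_ns = 1000          # time precision is 1 ps

finish_ps = vectors * settle_ns * ps_per_ns
print(vectors, finish_ps)  # 1024 1024000
```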

What LibreLane did

make harden PROJECT=01_comb_logic invokes LibreLane v3.0.2 against the RTL plus a minimal config. The flow walked 75 steps end-to-end in about fifty seconds. In plain language, those 75 steps fall into seven phases:

Phase 1 — Lint & sanity (steps 1–4)

Verilator and Yosys’s own lint pass read the RTL looking for patterns that synthesize to something you didn’t intend: implicit nets, combinational loops, multiple drivers, inferred latches, missing default branches in case. Project 01 passes clean — the assign-only style makes latches structurally impossible.

Phase 2 — Synthesis (steps 5–9)

Yosys turns Verilog into a netlist of standard cells from the sky130 high-density library. The 4-to-1 mux becomes a few NAND-ish cells; the 4-bit add becomes a small chain of full-adders; the comparators for zero and op==xx become trees of OR/AND. Yosys reports zero unmapped instances, meaning every operator was lowered to real cells.

Phase 3 — Floorplan (steps 10–18)

OpenROAD is told the die is 60 × 60 μm with a 50 × 50 core area, target 20–35% utilization. It scatters the cells across the core, builds power grids (the orange horizontal stripes you see in the layout — those are the VPWR / VGND rails), and inserts well-tap cells (the regular pattern of small green squares).

This is where the flow grumbled most: with no CLOCK_PORT and no real flops, OpenROAD invented a synthetic __VIRTUAL_CLK__ so its STA engine had something to chew on. Three families of warnings (STA-0366, STA-0419, STA-0450) all stem from that — they’re the flow being honest that it’s running timing analysis on a clockless design.

Phase 4 — Placement & resizing (steps 19–37)

Cells get optimized positions (global → detailed placement). The resizer runs against the virtual clock and finds 17 paths it would like to buffer “for safety”. Those 17 timing-repair buffers are pure overhead in this design — they exist to protect a clock that doesn’t. We could disable resizer with a config flag; for project 01 we let it do its harmless thing.

Phase 5 — Routing (steps 38–50)

Global router lays out wire paths; detailed router commits them to specific metal layers. The detailed router found 11 DRC errors on its first iteration and fixed all 11 on the second pass. That’s the expected pattern — it’s an iterative solver, and convergence in two iterations on a small design is a good sign.

Phase 6 — Parasitic extraction & STA (steps 51–55)

OpenRCX extracts wire capacitance and resistance from the routed geometry. STA replays timing with those parasitics included; the worst setup slack at the slow corner (SS, 100°C, 1.60 V) lands at +0.54 ns. Comfortable, against an imaginary 25 ns clock.

Phase 7 — Signoff (steps 56–75)

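The final run of independent verification tools, each checking something different: Magic DRC, KLayout DRC, Netgen LVS, the antenna check, and the max-slew/fanout/capacitance checks across all three corners.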

Every signoff check came back with zero violations. That’s the cleanest possible result for a first project.

Reading the layout

Open the viewer at the top of this page and double-click to fit. A few landmarks to find at fit zoom:

The horizontal salmon bands running across are the power rails: VPWR near the top of each row, VGND along the bottom. Cells get their power by sitting between two adjacent rails, drawing from above and dumping current to ground below. The rails repeat every cell-row height (~2.7 μm in sky130) so cells of any width can be tiled.

The regular grid of small squares in two staggered patterns: those are the well taps and substrate taps. Every few cells you need one to hold the n-well at VDD and the substrate at GND. Without them the wells and substrate would float and devices would misbehave. The floorplan step inserts them on a regular pitch, pre-allocating their positions before any logic is placed.

The larger regions of pink and red are the standard cells themselves — each one is a small piece of logic (a 2-input NAND, a mux, a buffer). Project 01 has 33 of these doing real work, plus 17 timing-repair buffers, plus the taps and fillers.

Zoom in (mouse wheel) anywhere and you’ll see the internal structure of a single cell: thin vertical poly stripes (transistor gates), the wider horizontal diff regions (source/drain), small contact squares (licon1, gold-ish), and the local-interconnect (li1) wires that hop between contacts inside one cell.

Vias connecting cell internals up to the routing layers (met1 and above) appear as small bright squares. Above the cells, longer thin strips are met2 running vertically and met3 running horizontally, forming the actual signal nets that connect cells to each other.
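
The row height mentioned above also offers a plausible explanation for why the reported core area is 2432 μm² rather than a round 2500 μm². A rough sketch of the arithmetic in Python, under two assumptions that do not come from this run's logs: that the 50 × 50 μm core snaps down to whole rows and whole placement sites, and that the placement site is the sky130 high-density one at 0.46 × 2.72 μm.

```python
import math

SITE_W_UM = 0.46    # assumed placement-site width
ROW_H_UM = 2.72     # assumed row height (the "~2.7 um" in the text)
core_w_um = 50.0    # requested core width
core_h_um = 50.0    # requested core height

rows = math.floor(core_h_um / ROW_H_UM)      # whole rows that fit vertically
sites = math.floor(core_w_um / SITE_W_UM)    # whole sites that fit per row
area_um2 = (rows * ROW_H_UM) * (sites * SITE_W_UM)

print(rows, sites, round(area_um2))  # 18 108 2432
```

The result matches the reported figure, which suggests, but does not prove, that this is how the flow arrived at it.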

What just happened?

Plain Verilog → 81 cells → a 60 × 60 μm die (3600 μm² of silicon), with all signoff checks clean, in under a minute of compute. The flow inserted some scaffolding (a virtual clock, 17 buffers) it didn’t need but couldn’t help itself about; we let it. The result is a real GDS that, in principle, could be sent to a sky130 fab.

For project 01 the lesson is that the LibreLane flow has strong opinions about what a “well-formed design” looks like (clocked, registered, testable) and applies them whether your design needs them or not. The good news is that a clockless ALU like this one fits inside those opinions without breaking anything; the warnings are louder than the actual problems they describe.
