A 4-bit ALU with two operands, a 2-bit op selector, an 8-bit result, and two flags. Pure combinational — no flops, no clock — and the first thing on this ladder to be hardened all the way to a sky130 GDS.
- yosys synth: 0 unmapped
- lint: 0 latches
- magic DRC: 0 violations
- klayout DRC: 0 violations
- netgen LVS: 0 mismatches
- antenna: 0 violations
- max-slew/fan/cap: all 3 corners
- wire length: no threshold
What the circuit does
Two 4-bit operands a and b, a 2-bit op selector, and three outputs:
an 8-bit result y, a carry flag, and a zero flag. The op encoding:
- op = 00: addition. y is the 5-bit sum zero-extended into 8 bits. carry is bit 4 of the 5-bit sum (the carry-out from a 4+4 → 5 add).
- op = 01: bitwise XOR. y[3:0] = a ^ b, top four bits are zero, carry is forced to 0.
- op = 10: bitwise AND. Same shape as XOR.
- op = 11: bitwise OR. Same shape as XOR.
zero is just (y == 8'h00), regardless of op. There is no clock, no
reset, no enable, no flops anywhere in the design. Outputs change the
moment inputs change. That’s it.
The Verilog
projects/01_comb_logic/src/top.v is one module, ~25 effective lines.
Plain Verilog 2001, no SystemVerilog conveniences — keeps the synthesis
toolchain happy on every backend.
`default_nettype none
module top (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire [1:0] op,
    output wire [7:0] y,
    output wire       zero,
    output wire       carry
);
    // Add: 5-bit sum so we have a real carry-out bit.
    wire [4:0] sum5 = {1'b0, a} + {1'b0, b};
    assign y = (op == 2'b00) ? {3'b000, sum5} :
               (op == 2'b01) ? {4'b0000, a ^ b} :
               (op == 2'b10) ? {4'b0000, a & b} :
                               {4'b0000, a | b};
    assign carry = (op == 2'b00) ? sum5[4] : 1'b0;
    assign zero  = (y == 8'h00);
endmodule
Three things to notice:
The `default_nettype none at the top is required by our project
conventions. It disables Verilog’s “create a 1-bit wire if you reference
an undeclared name” misfeature. Typo a signal name and the compiler
will yell at you instead of silently creating an undriven 1-bit wire.
The 5-bit sum5 is the only bit of width-juggling in the design. Two
4-bit operands can sum to 5 bits, so we zero-extend each to 5 bits before
adding. The MSB of sum5 is the carry-out, the same bit that comes out of
the last stage of a chain of full-adders. Yosys does essentially that,
lowering the + into a small ripple-carry chain.
The conditional-chain on y is a 4-to-1 multiplexer. It compiles to the
same gates whether you write it as nested ternaries or as a case block
inside always @*. We chose the ternary form because it’s smaller in
text, but if you mutate the design to play with that equivalence, you’ll
see an identical cell count after synthesis.
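For readers who want to try that mutation, here is a minimal sketch of the case-block form. The module name top_case is hypothetical (chosen so it can sit next to the original top in one simulation); the ports and op table are copied from top.v. The default branch stands in for op == 2'b11 and guarantees that every path assigns y, which keeps the block latch-free.

```verilog
`default_nettype none

// Hypothetical alternative to top.v: same ports, same op table,
// with the 4-to-1 mux written as a case block inside always @*.
module top_case (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire [1:0] op,
    output reg  [7:0] y,      // reg because it is assigned procedurally
    output wire       zero,
    output wire       carry
);
    // Same 5-bit add as the original, so the carry-out is a real bit.
    wire [4:0] sum5 = {1'b0, a} + {1'b0, b};

    always @* begin
        case (op)
            2'b00:   y = {3'b000, sum5};
            2'b01:   y = {4'b0000, a ^ b};
            2'b10:   y = {4'b0000, a & b};
            default: y = {4'b0000, a | b};  // op == 2'b11
        endcase
    end

    assign carry = (op == 2'b00) ? sum5[4] : 1'b0;
    assign zero  = (y == 8'h00);
endmodule

// Restore the default so later files are not affected by this one.
`default_nettype wire
```

One difference worth knowing about is simulation-only: if op contains x bits, the case form falls into the default branch, while the nested ternaries merge the candidate values bit by bit. For fully defined inputs the two forms compute the same function.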
The testbench
Project 01’s testbench is exhaustive: 4 ops × 16 × 16 = 1024 input combinations, each compared against an inline Verilog reference model. For a design this small, exhaustive is cheap and removes any “did we test that case?” doubt.
for (oi = 0; oi < 4; oi = oi + 1) begin
    op = oi[1:0];
    for (ai = 0; ai < 16; ai = ai + 1) begin
        a = ai[3:0];
        for (bi = 0; bi < 16; bi = bi + 1) begin
            b = bi[3:0];
            // Reference model: the same op table, written separately.
            case (op)
                2'b00: begin
                    sum5_ref  = {1'b0, a} + {1'b0, b};
                    exp_y     = {3'b000, sum5_ref};
                    exp_carry = sum5_ref[4];
                end
                2'b01: begin exp_y = {4'b0000, a ^ b}; exp_carry = 1'b0; end
                2'b10: begin exp_y = {4'b0000, a & b}; exp_carry = 1'b0; end
                2'b11: begin exp_y = {4'b0000, a | b}; exp_carry = 1'b0; end
            endcase
            exp_zero = (exp_y == 8'h00);
            #1; // settle combinational paths before sampling
            if (y !== exp_y || zero !== exp_zero || carry !== exp_carry)
                $display("FAIL ...");
        end
    end
end
A few things this testbench does on purpose:
The reference model is written in the same Verilog file but in a
deliberately different style — case block instead of ternary, named
intermediate sum5_ref. Two implementations that disagree are easier to
catch than two copies of the same expression that share a typo.
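The #1 delay before sampling may look redundant for pure combinational
logic, but it is not: the DUT’s continuous assignments only re-evaluate
once the testbench process yields, so sampling without a delay would read
the previous vector’s outputs. It is also the right habit for sequential
designs (Project 02 onward will have flops), so better to write it
correctly from project 01 than to skip it and forget when it matters.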
The !== (case-inequality) operator is used instead of != so a DUT
output of x (unknown) counts as a mismatch: != against x returns
x, which an if statement treats as false.
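The difference between != and !== is easy to see in isolation. The following is a minimal standalone sketch, not part of the project's testbench; the module name x_compare_demo and the register dut_y (standing in for an unknown DUT output) are made up for illustration.

```verilog
// Standalone demo: how != and !== treat an unknown (x) value.
module x_compare_demo;
    reg [7:0] dut_y;   // hypothetical stand-in for a DUT output stuck at x
    reg [7:0] exp_y;   // expected value from a reference model

    initial begin
        dut_y = 8'hxx;
        exp_y = 8'h00;

        // Logical inequality: with every bit unknown the result is x,
        // and an if statement takes the else branch on x.
        if (dut_y != exp_y)
            $display("!=  : mismatch reported");
        else
            $display("!=  : mismatch MISSED (comparison evaluated to x)");

        // Case inequality: x and z are compared literally, so the
        // result is always a definite 0 or 1.
        if (dut_y !== exp_y)
            $display("!== : mismatch reported");
        else
            $display("!== : mismatch MISSED");

        $finish;
    end
endmodule
```

Run under a Verilog simulator such as Icarus, the first check takes its else branch and prints the "MISSED" line, while the second prints "mismatch reported". A testbench built on != would therefore pass silently over an output that never left x.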
What the simulator says
$ make test PROJECT=01_comb_logic
== 01_comb_logic ==
iverilog -g2012 -Wall -o tb.vvp -s tb ../src/top.v tb.v
warning: Some design elements have no explicit time unit and/or
: time precision. This may cause confusing timing results.
: Affected design elements are:
: -- module top declared here: ../src/top.v:16
VCD info: dumpfile tb.vcd opened for output.
PASS: 1024 vectors checked.
tb.v:85: $finish called at 1024000 (1ps)
01_comb_logic PASS
The Icarus warning about top having no `timescale is intentional
and correct. Design modules should not carry a timescale directive — the
testbench is the authoritative time domain. Icarus is just observing
that the design is inheriting the testbench’s 1ns/1ps scale, which is
exactly what you want.
The exit timestamp 1024000 (in 1ps units) is informative: 1024 vectors
× the #1 settle (1 ns each) = 1024 ns = 1024000 ps. So the simulator
burned through the entire test space in just over a microsecond of
simulated time and a few milliseconds of wall time.
What LibreLane did
make harden PROJECT=01_comb_logic invokes LibreLane v3.0.2 against the
RTL plus a minimal config. The flow walked 75 steps end-to-end in
about fifty seconds. In plain language, those 75 steps fall into seven
phases:
Phase 1 — Lint & sanity (steps 1–4)
Verilator and Yosys’s own lint pass read the RTL looking for patterns
that synthesize to something you didn’t intend: implicit nets,
combinational loops, multiple drivers, inferred latches, missing default
branches in case. Project 01 passes clean — the assign-only style
makes latches structurally impossible.
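To make the inferred-latch pattern concrete, here is a small sketch of the kind of code a lint pass is looking for, next to a latch-free equivalent. Both modules are hypothetical and neither is part of project 01; the names latch_inferred and latch_free are made up for illustration.

```verilog
// Incomplete assignment: when en is 0 nothing is assigned to q, so q
// has to remember its previous value. Synthesis infers a
// level-sensitive latch to provide that memory.
module latch_inferred (
    input  wire en,
    input  wire d,
    output reg  q
);
    always @* begin
        if (en)
            q = d;
        // no else branch: q holds its old value
    end
endmodule

// Complete assignment: q has a defined value on every path, so this
// is plain combinational logic (an AND gate, in effect).
module latch_free (
    input  wire en,
    input  wire d,
    output wire q
);
    assign q = en ? d : 1'b0;
endmodule
```

A continuous assign like the second module always states the full function of its inputs, which is why an assign-only design has nowhere for this kind of accidental latch to come from.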
Phase 2 — Synthesis (steps 5–9)
Yosys turns Verilog into a netlist of standard cells from the sky130
high-density library. The 4-to-1 mux becomes a few NAND-ish cells; the
4-bit add becomes a small chain of full-adders; the comparators for
zero and op==xx become trees of OR/AND. Yosys reports zero unmapped
instances, meaning every operator was lowered to real cells.
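As a picture of what "a small chain of full-adders" means structurally, here is a hand-written sketch of a 4-bit ripple-carry adder with the same 5-bit result as sum5 in top.v. This is not Yosys output, and the real netlist will look different once the mapper has optimized it; the module names full_adder and ripple_add4 are hypothetical.

```verilog
// One-bit full adder: sum and carry-out from two operand bits and a carry-in.
module full_adder (
    input  wire x,
    input  wire y,
    input  wire cin,
    output wire s,
    output wire cout
);
    assign s    = x ^ y ^ cin;
    assign cout = (x & y) | (cin & (x ^ y));
endmodule

// Four full adders chained through their carries.
// Computes the same value as {1'b0, a} + {1'b0, b}.
module ripple_add4 (
    input  wire [3:0] a,
    input  wire [3:0] b,
    output wire [4:0] sum5    // sum5[4] is the carry-out
);
    wire [4:0] c;             // c[i] is the carry into bit i
    assign c[0] = 1'b0;       // no carry into the least significant bit

    genvar i;
    generate
        for (i = 0; i < 4; i = i + 1) begin : g_fa
            full_adder fa (
                .x(a[i]), .y(b[i]), .cin(c[i]),
                .s(sum5[i]), .cout(c[i+1])
            );
        end
    endgenerate

    assign sum5[4] = c[4];    // final carry becomes the fifth sum bit
endmodule
```

The carry has to ripple through all four stages before sum5[4] is valid, which is why the adder's carry path tends to be the longest path in a small design like this one.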
Phase 3 — Floorplan (steps 10–18)
OpenROAD is told the die is 60 × 60 μm with a 50 × 50 core area, target
20–35% utilization. It scatters the cells across the core, builds power
grids (the orange horizontal stripes you see in the layout — those are
the VPWR / VGND rails), and inserts well-tap cells (the regular
pattern of small green squares).
This is where the flow grumbled most: with no CLOCK_PORT and no real
flops, OpenROAD invented a synthetic __VIRTUAL_CLK__ so its STA engine
had something to chew on. Three families of warnings (STA-0366,
STA-0419, STA-0450) all stem from that — they’re the flow being
honest that it’s running timing analysis on a clockless design.
Phase 4 — Placement & resizing (steps 19–37)
Cells get optimized positions (global → detailed placement). The resizer runs against the virtual clock and finds 17 paths it would like to buffer “for safety”. Those 17 timing-repair buffers are pure overhead in this design: they exist to protect a clock that doesn’t exist. We could disable the resizer with a config flag; for project 01 we let it do its harmless thing.
Phase 5 — Routing (steps 38–50)
Global router lays out wire paths; detailed router commits them to specific metal layers. The detailed router found 11 DRC errors on its first iteration and fixed all 11 on the second pass. That’s the expected pattern — it’s an iterative solver, and convergence in two iterations on a small design is a good sign.
Phase 6 — Parasitic extraction & STA (steps 51–55)
OpenRCX extracts wire capacitance and resistance from the routed geometry. STA replays timing with those parasitics included; the worst setup slack at the slow corner (SS, 100°C, 1.60 V) lands at +0.54 ns. Comfortable, against an imaginary 25 ns clock.
Phase 7 — Signoff (steps 56–75)
The final run of independent verification tools, each checking something different:
- Magic DRC: geometric design rule checks (minimum widths, spacings, enclosures)
- KLayout DRC: second opinion using a different rule deck
- KLayout XOR: compares the GDS to the database the routers produced; catches GDS-write bugs
- Magic SPICE extraction → Netgen LVS: extracts a SPICE netlist from the GDS, compares device-by-device against the synthesized netlist
- Antenna check: looks for long metal wires attached to transistor gates that could collect charge during manufacturing, before a discharge path exists, and damage the gate oxide
Every signoff check came back with zero violations. That’s the cleanest possible result for a first project.
Reading the layout
Open the viewer at the top of this page and double-click to fit. A few landmarks to find at fit zoom:
The horizontal salmon bands running across are the power rails:
VPWR near the top of each row, VGND along the bottom. Cells get
their power by sitting between two adjacent rails, drawing from above
and dumping current to ground below. The rails repeat every cell-row
height (~2.7 μm in sky130) so cells of any width can be tiled.
The regular grid of small squares in two staggered patterns: those are the well taps and substrate taps. Every few cells you need one to hold the n-well at VDD and the substrate at GND. Without them the substrate would float and devices would misbehave. The floorplan step places them on a regular pitch, ahead of cell placement.
The larger regions of pink and red are the standard cells themselves — each one is a small piece of logic (a 2-input NAND, a mux, a buffer). Project 01 has 33 of these doing real work, plus 17 timing-repair buffers, plus the taps and fillers.
Zoom in (mouse wheel) anywhere and you’ll see the internal structure
of a single cell: thin vertical poly stripes (transistor gates), the
wider horizontal diff regions (source/drain), small contact squares
(licon1, gold-ish), and the local-interconnect (li1) wires that
hop between contacts inside one cell.
Vias connecting cell internals up to the routing layers (met1 and
above) appear as small bright squares. Above the cells, longer thin
strips are met2 running vertically and met3 running horizontally,
forming the actual signal nets that connect cells to each other.
What just happened?
Plain Verilog → 81 cells → a 60 × 60 μm die, with all signoff checks clean, in under a minute of compute. The flow inserted some scaffolding it didn’t need (a virtual clock, 17 buffers); we let it. The result is a real GDS that, in principle, could be sent to a sky130 fab.
For project 01 the lesson is that the LibreLane flow has strong opinions about what a “well-formed design” looks like — clocked, registered, testable — and applies them whether your design needs them or not. It also means a clockless ALU like this one fits inside those opinions without breaking anything; the warnings are louder than the actual problems they describe.
See also
- Project README — the full lesson plan, including “what could go wrong” and “next mutations to try.”
- Journal: project 01 RTL pass — sim-time entry.
- Journal: project 01 hardened — getting LibreLane installed and the harden through.