The start page explained what a chip is and what the flow does at the conceptual level. This page is the inventory: which specific program runs at each step, what it actually does, what it consumes, what it produces.
One make harden PROJECT=NN_name walks this entire chain. LibreLane
wraps the sequence; our make target wraps LibreLane. By the end you
have a GDS file you could ship to a fab.
What follows is each stage as it ran on project 01, the smallest fully hardened design on the ladder. Each section shows the actual file the stage consumed and what it produced; click through to see the rest.
Author the RTL
Plain Verilog, behavioural
The starting point is just a Verilog source file. Project 01’s design is a 4-bit ALU: two operands, a 2-bit op selector, and three outputs (an 8-bit result, a carry flag, a zero flag). Pure combinational — no clock, no flops, no state.
`default_nettype none
module top (
input wire [3:0] a,
input wire [3:0] b,
input wire [1:0] op,
output wire [7:0] y,
output wire zero,
output wire carry
);
// Add: 5-bit sum so we have a real carry-out bit.
wire [4:0] sum5 = {1'b0, a} + {1'b0, b};
assign y = (op == 2'b00) ? {3'b000, sum5} :
(op == 2'b01) ? {4'b0000, a ^ b} :
(op == 2'b10) ? {4'b0000, a & b} :
{4'b0000, a | b};
assign carry = (op == 2'b00) ? sum5[4] : 1'b0;
assign zero = (y == 8'h00);
endmodule
`default_nettype wire
Simulate
iverilog · -g2012
Before anything synthesizes, the RTL has to actually do
what we think it does. We compile each project’s testbench with
Icarus Verilog at the IEEE 1800-2012 SystemVerilog level, then
execute it under vvp. P01’s testbench is exhaustive — 4 ops × 16 × 16
= 1024 input combinations, each compared against an inline reference.
Icarus Verilog is an event-driven simulator and is slower than Verilator's compiled, cycle-based models, but it parses real SystemVerilog testbenches without a build step and writes legible VCDs. For a teaching repo that’s the right trade.
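The exhaustive check described above can be sketched in Python. This is only an illustrative reference model derived from the RTL shown earlier, not the testbench's actual inline reference; the function names alu and all_vectors are made up for this sketch.

```python
# Illustrative reference model mirroring the RTL of top.v shown above.
# The names alu() and all_vectors() are hypothetical, not taken from the repo.

def alu(a: int, b: int, op: int) -> tuple[int, int, int]:
    """Return (y, carry, zero) for 4-bit operands a, b and a 2-bit op."""
    assert 0 <= a < 16 and 0 <= b < 16 and 0 <= op < 4
    sum5 = a + b                      # 5-bit sum; bit 4 is the carry-out
    if op == 0b00:
        y = sum5                      # {3'b000, sum5}
    elif op == 0b01:
        y = a ^ b                     # {4'b0000, a ^ b}
    elif op == 0b10:
        y = a & b                     # {4'b0000, a & b}
    else:
        y = a | b                     # {4'b0000, a | b}
    carry = (sum5 >> 4) & 1 if op == 0b00 else 0
    zero = 1 if y == 0 else 0
    return y, carry, zero


def all_vectors() -> list[tuple[int, int, int]]:
    """Every (a, b, op) combination: 4 ops x 16 x 16."""
    return [(a, b, op) for op in range(4) for a in range(16) for b in range(16)]


if __name__ == "__main__":
    vectors = all_vectors()
    print(len(vectors), "vectors")
    print(alu(15, 1, 0b00))
```

A testbench built this way compares the simulated outputs against alu(a, b, op) for each vector, which is why a single PASS line can stand in for the whole input space.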
== 01_comb_logic ==
iverilog -g2012 -Wall -o tb.vvp -s tb ../src/top.v tb.v
warning: Some design elements have no explicit time unit and/or
: time precision. (...)
VCD info: dumpfile tb.vcd opened for output.
PASS: 1024 vectors checked.
tb.v:85: $finish called at 1024000 (1ps)
01_comb_logic PASS
Synthesize
Yosys → standard cells
Yosys turns behavioural RTL into a netlist of fixed
cells from the PDK’s standard-cell library. The 4-to-1
mux becomes a few NAND-ish cells; the 4-bit add becomes a chain of
full-adders; the comparators for zero and the op decode become
trees of OR/AND. Every operator is lowered to a real cell
instance.
Caveat we hit on project 06: Yosys’s read_verilog frontend doesn’t
accept function automatic ... return {...} inside parameter
initializers. Stick to the Verilog-2001 function form (function foo; ... foo = ...) and you’re fine.
sky130_fd_sc_hd__and2_2 _24_ ( .A(b[3]), .B(a[3]), .X(_00_) );
sky130_fd_sc_hd__nor2_2 _25_ ( .A(b[3]), .B(a[3]), .Y(_01_) );
sky130_fd_sc_hd__nor2_2 _26_ ( .A(_00_), .B(_01_), .Y(_02_) );
sky130_fd_sc_hd__and2_2 _27_ ( .A(b[2]), .B(a[2]), .X(_03_) );
sky130_fd_sc_hd__xor2_2 _28_ ( ... );
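A quick way to see what synthesis produced is to tally cell types in the netlist. The sketch below does that over the four complete instances from the excerpt above; the regular expression and the count_cells helper are assumptions made for illustration, not part of the flow.

```python
import re
from collections import Counter

# The four complete instances from the excerpt above (the elided one is left out).
NETLIST_EXCERPT = """
sky130_fd_sc_hd__and2_2 _24_ ( .A(b[3]), .B(a[3]), .X(_00_) );
sky130_fd_sc_hd__nor2_2 _25_ ( .A(b[3]), .B(a[3]), .Y(_01_) );
sky130_fd_sc_hd__nor2_2 _26_ ( .A(_00_), .B(_01_), .Y(_02_) );
sky130_fd_sc_hd__and2_2 _27_ ( .A(b[2]), .B(a[2]), .X(_03_) );
"""

# Matches "<cell type> <instance name> (" at the start of a line.
CELL_RE = re.compile(r"^\s*(sky130_fd_sc_hd__\w+)\s+(\S+)\s*\(", re.MULTILINE)


def count_cells(netlist: str) -> Counter:
    """Count instances per standard-cell type in a gate-level netlist."""
    return Counter(match.group(1) for match in CELL_RE.finditer(netlist))


if __name__ == "__main__":
    for cell, count in sorted(count_cells(NETLIST_EXCERPT).items()):
        print(f"{count:3d}  {cell}")
```

Run against the full netlist instead of the excerpt, the same tally gives a rough picture of how each operator was lowered.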
Floorplan
OpenROAD · DIE_AREA, rows, tracks
OpenROAD reads the netlist and the project’s config.yaml (which sets
DIE_AREA: [0, 0, 60, 60] for P01), then writes the first DEF
of the run. The floorplan establishes: where the die boundary lives,
where the standard-cell rows go (every 2.72 µm at sky130 high-density),
and which routing tracks the router will be allowed to use later.
No cells are placed yet — just the substrate they’ll sit on.
VERSION 5.8 ;
DIVIDERCHAR "/" ;
BUSBITCHARS "[]" ;
DESIGN top ;
UNITS DISTANCE MICRONS 1000 ;
DIEAREA ( 0 0 ) ( 60000 60000 ) ;
ROW ROW_0 unithd 5060 5440 N DO 108 BY 1 STEP 460 0 ;
ROW ROW_1 unithd 5060 8160 FS DO 108 BY 1 STEP 460 0 ;
ROW ROW_2 unithd 5060 10880 N DO 108 BY 1 STEP 460 0 ;
ROW ROW_3 unithd 5060 13600 FS DO 108 BY 1 STEP 460 0 ;
...
TRACKS X 230 DO 130 STEP 460 LAYER li1 ;
TRACKS Y 170 DO 176 STEP 340 LAYER li1 ;
TRACKS X 170 DO 176 STEP 340 LAYER met1 ;
...
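The ROW statements follow a simple pattern: a fixed origin, one row every 2.72 µm, and alternating N / FS orientation. A minimal sketch that regenerates them, with every constant read off the excerpt above and the helper name row_statement invented for illustration:

```python
# Constants read off the DEF excerpt above, in database units
# (UNITS DISTANCE MICRONS 1000 means 1000 units per micron).
DBU_PER_MICRON = 1000
CORE_ORIGIN = (5060, 5440)   # x, y of ROW_0
ROW_HEIGHT = 2720            # 2.72 um between rows
SITE_WIDTH = 460             # STEP 460
SITES_PER_ROW = 108          # DO 108


def row_statement(index: int) -> str:
    """Rebuild the DEF ROW line for row `index` (hypothetical helper)."""
    x0, y0 = CORE_ORIGIN
    y = y0 + index * ROW_HEIGHT
    # Even rows sit upright (N); odd rows are flipped (FS).
    orient = "N" if index % 2 == 0 else "FS"
    return (f"ROW ROW_{index} unithd {x0} {y} {orient} "
            f"DO {SITES_PER_ROW} BY 1 STEP {SITE_WIDTH} 0 ;")


if __name__ == "__main__":
    for i in range(4):
        print(row_statement(i))
    print("row width (um):", SITES_PER_ROW * SITE_WIDTH / DBU_PER_MICRON)
```

The sketch only reproduces the rows shown; how many rows the real floorplan contains depends on margins that the excerpt elides.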
Place
OpenROAD · global → detailed
Now the cells move from “exist” to “exist somewhere”. Global placement
spreads them across the die roughly; detailed placement tightens them
onto the rows and resolves overlaps. The same DEF gets rewritten with
each cell’s ( x y ) coordinate filled in.
COMPONENTS 117 ;
- PHY_EDGE_ROW_0_Left_18 sky130_fd_sc_hd__decap_3 + SOURCE DIST + FIXED ( 5060 5440 ) N ;
- PHY_EDGE_ROW_0_Right_0 sky130_fd_sc_hd__decap_3 + SOURCE DIST + FIXED ( 53360 5440 ) FN ;
...
- _24_ sky130_fd_sc_hd__and2_2 + PLACED ( 28980 8160 ) FS ;
- _25_ sky130_fd_sc_hd__nor2_2 + PLACED ( 30360 8160 ) FS ;
- _26_ sky130_fd_sc_hd__nor2_2 + PLACED ( 31740 8160 ) FS ;
- _27_ sky130_fd_sc_hd__and2_2 + PLACED ( 33120 8160 ) FS ;
...
END COMPONENTS
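Detailed placement snaps every cell onto the row and site grid that the floorplan defined. The sketch below parses the component lines from the excerpt above and converts each coordinate into a (row, site) slot. The grid constants come from the floorplan excerpt on this page; the regular expression and helper names are assumptions for illustration only.

```python
import re

# Grid constants from the floorplan excerpt on this page (DEF database units).
CORE_X, CORE_Y = 5060, 5440
SITE_WIDTH, ROW_HEIGHT = 460, 2720

COMPONENTS_EXCERPT = """
- PHY_EDGE_ROW_0_Left_18 sky130_fd_sc_hd__decap_3 + SOURCE DIST + FIXED ( 5060 5440 ) N ;
- PHY_EDGE_ROW_0_Right_0 sky130_fd_sc_hd__decap_3 + SOURCE DIST + FIXED ( 53360 5440 ) FN ;
- _24_ sky130_fd_sc_hd__and2_2 + PLACED ( 28980 8160 ) FS ;
- _25_ sky130_fd_sc_hd__nor2_2 + PLACED ( 30360 8160 ) FS ;
- _26_ sky130_fd_sc_hd__nor2_2 + PLACED ( 31740 8160 ) FS ;
- _27_ sky130_fd_sc_hd__and2_2 + PLACED ( 33120 8160 ) FS ;
"""

COMPONENT_RE = re.compile(
    r"^-\s+(?P<name>\S+)\s+(?P<cell>\S+)\s+.*?"
    r"\+\s+(?P<status>PLACED|FIXED)\s+"
    r"\(\s*(?P<x>\d+)\s+(?P<y>\d+)\s*\)\s+(?P<orient>\w+)\s*;"
)


def parse_components(text: str) -> list[tuple[str, str, str, int, int, str]]:
    """Return (name, cell, status, x, y, orient) for each component line."""
    found = []
    for line in text.splitlines():
        m = COMPONENT_RE.match(line.strip())
        if m:
            found.append((m["name"], m["cell"], m["status"],
                          int(m["x"]), int(m["y"]), m["orient"]))
    return found


def grid_slot(x: int, y: int) -> tuple[int, int]:
    """Map a DEF coordinate to (row index, site index); reject off-grid points."""
    dx, dy = x - CORE_X, y - CORE_Y
    if dx % SITE_WIDTH or dy % ROW_HEIGHT:
        raise ValueError(f"({x}, {y}) is not on the placement grid")
    return dy // ROW_HEIGHT, dx // SITE_WIDTH


if __name__ == "__main__":
    for name, cell, status, x, y, orient in parse_components(COMPONENTS_EXCERPT):
        print(name, cell, status, grid_slot(x, y), orient)
```

A coordinate that raises ValueError here would mean a cell is sitting between sites or between rows, which is what detailed placement is supposed to rule out.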
Clock-tree synthesis
OpenROAD · skipped on P01
CTS builds the buffered tree that distributes the clock from one input
pin to every flop on the chip with minimal skew. Project 01 has no
clock and no flops, so CTS runs as a no-op. The flow still walks the
step: it inserts a synthetic __VIRTUAL_CLK__ so STA has something
to chew on, which is why the flow log mentions a virtual clock
even though there’s no clk port. From P02 onward this stage actually
does work.
Route
OpenROAD · global → detailed → repair
Global routing assigns each net to a coarse channel; detailed routing commits each segment to a specific metal layer with explicit vias. P01’s detailed router found 11 DRC errors on its first iteration and fixed them all on the second pass — convergence in two passes is the expected shape for a small design.
VIAS 4 ;
- via2_3_1600_480_1_5_320_320 + VIARULE M1M2_PR + CUTSIZE 150 150 + LAYERS met1 via met2 + ENCLOSURE 85 165 55 85 + ROWCOL 1 5 ;
- via3_4_1600_480_1_4_400_400 + VIARULE M2M3_PR + CUTSIZE 200 200 + LAYERS met2 via2 met3 + ENCLOSURE 40 85 65 65 + ROWCOL 1 4 ;
- via4_5_1600_480_1_4_400_400 + VIARULE M3M4_PR + CUTSIZE 200 200 + LAYERS met3 via3 met4 + ENCLOSURE 90 60 100 65 + ROWCOL 1 4 ;
- via5_6_1600_1600_1_1_1600_1600 + VIARULE M4M5_PR + CUTSIZE 800 800 + LAYERS met4 via4 met5 + ENCLOSURE 400 190 310 400 ;
END VIAS
COMPONENTS 117 ;
- _24_ sky130_fd_sc_hd__and2_2 + PLACED ( 28980 8160 ) FS ;
...
Signoff
Magic + KLayout + Netgen + antenna
Three independent tools attack the final geometry: Magic runs the fab-approved DRC rule deck; KLayout runs a different DRC rule deck for a cross-check; Netgen takes the SPICE netlist that Magic extracts from the layout and compares it device-by-device against the synthesized netlist (LVS). An antenna check looks for long metal segments tied to a transistor gate that could collect charge during manufacturing and damage the gate oxide.
Running two DRC tools is not paranoia: they have different rule decks and catch different bugs. Anything that would block at the fab gets caught here.
top
----------------------------------------
[INFO] COUNT: 0

Subcircuit pins:
Circuit 1: top |Circuit 2: top
------------------------------------|------------------------------------
zero |zero
carry |carry
y[7] |y[7]
... (all ports match)
Cell pin lists are equivalent.
Device classes top and top are equivalent.
Final result: Circuits match uniquely.
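Both reports reduce to a yes/no answer, so a script can gate on them. A minimal sketch, assuming the DRC summary carries a "COUNT: N" line and the LVS report ends with the "Circuits match uniquely." sentence exactly as in the excerpts above; the function names are hypothetical and this is not how LibreLane itself decides pass or fail.

```python
import re

COUNT_RE = re.compile(r"COUNT:\s*(\d+)")


def drc_clean(report: str) -> bool:
    """True when every COUNT line in the DRC summary reads zero."""
    counts = [int(n) for n in COUNT_RE.findall(report)]
    return bool(counts) and all(n == 0 for n in counts)


def lvs_clean(report: str) -> bool:
    """True when the LVS report contains the unique-match verdict."""
    return "Circuits match uniquely." in report


if __name__ == "__main__":
    drc = "top\n----------------------------------------\n[INFO] COUNT: 0\n"
    lvs = "Device classes top and top are equivalent.\nFinal result: Circuits match uniquely.\n"
    print("DRC clean:", drc_clean(drc))
    print("LVS clean:", lvs_clean(lvs))
```

Treating a report with no COUNT line as a failure is a deliberate choice in this sketch: an empty or truncated report should not pass signoff by accident.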
GDS
Final geometry · ready for fab
The flow’s terminal artifact is one top.gds file — a binary stream of
every polygon on every layer. SkyWater would feed this to a maskmaker;
on this site we feed it to gds_to_glb.py and gds_to_svg.py to
produce the layout viewers on each project page.
$ ls -la final/gds/top.gds
-rw-r--r-- 1 jadams jadams 32664 final/gds/top.gds
$ file final/gds/top.gds
final/gds/top.gds: GDS data, version 5.8, library 'top'
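The "binary stream" is a flat sequence of records, each with a 2-byte big-endian length, a record-type byte, and a data-type byte. The sketch below walks those records with only the standard library and pulls out the library name. It is an illustration of the GDSII record layout, not the code in gds_to_glb.py or gds_to_svg.py; the helper names are made up, and only a handful of record types are named.

```python
import struct
from typing import Iterator

# A few GDSII record types; the stream format defines many more.
RECORD_NAMES = {
    0x00: "HEADER", 0x01: "BGNLIB", 0x02: "LIBNAME", 0x03: "UNITS",
    0x04: "ENDLIB", 0x05: "BGNSTR", 0x06: "STRNAME", 0x07: "ENDSTR",
    0x08: "BOUNDARY", 0x11: "ENDEL",
}


def iter_records(data: bytes) -> Iterator[tuple[int, bytes]]:
    """Yield (record_type, payload) for each record until ENDLIB."""
    offset = 0
    while offset + 4 <= len(data):
        length, rtype, _dtype = struct.unpack_from(">HBB", data, offset)
        if length < 4:
            raise ValueError(f"bad record length {length} at offset {offset}")
        yield rtype, data[offset + 4:offset + length]
        offset += length
        if rtype == 0x04:  # ENDLIB closes the stream
            break


def library_name(data: bytes) -> str:
    """Return the LIBNAME string, with the even-length padding stripped."""
    for rtype, payload in iter_records(data):
        if rtype == 0x02:
            return payload.rstrip(b"\x00").decode("ascii")
    raise ValueError("no LIBNAME record found")


if __name__ == "__main__":
    # Tiny synthetic stream: HEADER, LIBNAME "top", ENDLIB.
    demo = (struct.pack(">HBBh", 6, 0x00, 0x02, 600)
            + struct.pack(">HBB", 8, 0x02, 0x06) + b"top\x00"
            + struct.pack(">HBB", 4, 0x04, 0x00))
    for rtype, payload in iter_records(demo):
        print(RECORD_NAMES.get(rtype, hex(rtype)), len(payload))
    print(library_name(demo))
```

To try it on a real file, read final/gds/top.gds in binary mode and pass the bytes to library_name.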
The PDK · sky130A
Underneath every step in the flow above is the PDK — the
collection of files that says what cells exist, how big they are, what
DRC rules to enforce, and what the SPICE models look like. This site
uses sky130A from SkyWater Technology, opened by Google + SkyWater
in 2020 and the first open-source PDK with a real fab path. The
high-density standard cell library is sky130_fd_sc_hd.
The PDK itself ships as ~3 GB of libraries. We let ciel (a PDK manager)
keep a versioned cache under ~/.ciel/. When LibreLane needs a cell’s
LEF or .lib, it reaches into that cache.
How we run it
# Walks all 75 flow steps for one project. ~50 seconds wall-time on a
# laptop for a 60 × 60 µm design.
make harden PROJECT=01_comb_logic
The wrapper is scripts/run_librelane.sh. It picks Nix
(nix run github:librelane/librelane) when available, falls back to
Docker, redirects scratch onto a bigger volume, and invokes LibreLane
with the project’s librelane/config.yaml.
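The Nix-first, Docker-fallback choice can be sketched as below. The real wrapper is a shell script whose contents are not shown on this page, so this Python version is only a guess at the decision logic; choose_runner is a hypothetical name and the actual container invocation is deliberately left out.

```python
import shutil
from typing import Callable, Optional


def choose_runner(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Prefer Nix when it is on PATH, otherwise fall back to Docker.

    `which` is injectable so the choice can be exercised without
    either tool installed.
    """
    if which("nix"):
        return "nix"      # the page names: nix run github:librelane/librelane
    if which("docker"):
        return "docker"   # container command not shown on this page
    raise RuntimeError("neither nix nor docker found on PATH")


if __name__ == "__main__":
    try:
        print("runner:", choose_runner())
    except RuntimeError as err:
        print(err)
```

Failing loudly when neither tool is present is an assumption of this sketch; the real script may behave differently.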
Site-only tooling
Three small scripts in scripts/ produce the viewer assets for each
project page from the same final GDS:
gds_to_glb.py: extrudes each sky130 layer to a 3D mesh and writes a glTF scene; run through gltfpack -cc for meshopt compression before shipping. Powers the <ChipViewer> on every project page.
gds_to_svg.py: flattens the GDS to a single big SVG.
svg_to_shapes.py: turns that SVG into a compact JSON polygon list the <LayoutViewer> reads.
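Extruding a layer means turning each 2D polygon into a prism between two heights. The sketch below shows the idea for a single convex polygon. It is not the code in gds_to_glb.py: the function name is invented, the heights are placeholder values rather than real sky130 layer thicknesses, and real layout polygons can be concave, which needs a proper triangulator instead of the fan used here.

```python
Point2 = tuple[float, float]
Point3 = tuple[float, float, float]
Face = tuple[int, int, int]


def extrude_convex(polygon: list[Point2], z_bottom: float,
                   z_top: float) -> tuple[list[Point3], list[Face]]:
    """Extrude a convex, counter-clockwise polygon into a closed prism.

    Returns (vertices, triangles); triangles index into vertices and are
    wound so their normals point outward.
    """
    n = len(polygon)
    if n < 3:
        raise ValueError("need at least three points")
    vertices = ([(x, y, z_bottom) for x, y in polygon]
                + [(x, y, z_top) for x, y in polygon])
    faces: list[Face] = []
    for i in range(1, n - 1):
        faces.append((0, i + 1, i))            # bottom cap, facing -z
        faces.append((n, n + i, n + i + 1))    # top cap, facing +z
    for i in range(n):
        j = (i + 1) % n
        faces.append((i, j, n + j))            # side wall, first triangle
        faces.append((i, n + j, n + i))        # side wall, second triangle
    return vertices, faces


if __name__ == "__main__":
    # A 1.0 x 0.5 rectangle extruded between placeholder heights.
    rect = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5), (0.0, 0.5)]
    verts, tris = extrude_convex(rect, z_bottom=0.0, z_top=0.1)
    print(len(verts), "vertices,", len(tris), "triangles")
```

A rectangle becomes 8 vertices and 12 triangles, which is why even a small design produces a mesh large enough to benefit from compression.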
GTKWave handles waveform debug. Three.js drives the 3D viewer. The
diagrams on this page render through beautiful-mermaid — see the
notes/diagrams writeup.
What’s missing
The flow above produces a GDS but doesn’t actually tape out anything.
Project 10 (Tiny Tapeout) on the roadmap is where a real
chip ships. There’s also no interactive simulator yet — Verilator + a
pty bridge is the planned route to running screen against a running
sim, probably wired up alongside project 07.