No. 08 / project of 147 on the ladder

Macro integration

introduces — hard-IP integration, sync-read SRAM, sim-vs-synth model split

harden state · last run 2026-04-28
cells: 32,098 (non-filler)
slack: 10.37 ns (setup)
area: 360000 μm² (die) / 346472 μm² (core)
signoff
  • DRC: PARTIAL
  • LVS: PASS
  • antenna: PASS

P07’s tiny SoC, with the 16-byte flop-based RAM swapped for a real sky130 OpenRAM SRAM macro: sky130_sram_1kbyte_1rw1r_8x1024_8. The chip now contains a hard-IP block that the place-and-route flow has to abut against standard cells, route power around, and treat as opaque.

[Layout view: 600 × 600 μm die · sky130A · 40 MHz target · CPU + 1 KB SRAM macro + UART + GPIO]
[3D view: full sky130 stack · z exaggerated 10× · SRAM macro internals visible]

Clock target: 40 MHz. 32,098 standard cells (excluding the OpenRAM macro itself). DRC is marked SKIP rather than PASS: the OpenRAM SRAM has known OPC-rule mismatches against the magic and klayout rule decks; details in the next paragraph.

Why DRC is SKIP, not PASS. OpenRAM macros have known optical-proximity DRC mismatches against magic’s rule deck (the cell layouts were generated by an older OPC routine than the one magic checks against). The OpenLane SRAM tutorial documents this and tells you to set ERROR_ON_MAGIC_DRC: false, which is what we did. The errors are real but not actionable from our side: the macro’s GDS is what it is, and a real fab using these macros would override magic’s rule deck for the macro region.

What’s actually different from P07

The page below walks through exactly four changes — that’s the entire delta from P07. Everything else (the CPU, the address decoder, the peripherals, the demo program) is identical.

[Block diagram: CPU · address decoder · SRAM macro (NEW: hard-IP, sync-read) · UART (P07) · GPIO (P07) · bus_rdata mux]
Same SoC shape as P07, but the RAM block changed kind. Everything else is the same — that's the point of the lesson.

Change 1 — the wrapper

P07’s RAM was 16 bytes of flops with an async read. P08’s RAM is a 1 KB OpenRAM macro with a synchronous read on clk0. Same module name to the bus side; the change lives entirely in sram_wrapper:

projects/08_macro_integration/src/top.sv system-verilog · L538-579
// web=0 means write, web=1 means read.
//
// The macro has a second read port we don't use; we tie csb1=1.
// =====================================================================
module sram_wrapper (
    input  logic        clk,
    input  logic        rst_n,
    input  logic [6:0]  addr,
    input  logic [7:0]  wdata,
    input  logic        we,
    input  logic        re,
    output logic [7:0]  rdata
);
  // Only assert chip-select when the bus is doing something.
  wire        csb0 = ~(we | re);
  wire        web0 = ~we;          // 0 = write, 1 = read
  wire [9:0]  addr0 = {3'b000, addr};

  // Power pins (vccd1 / vssd1) are NOT connected here — the macro
  // model `ifdef USE_POWER_PINS`s them, and Verilator rejects tying
  // tristate inout ports to a constant. LibreLane stitches the chip-
  // level PDN onto these pins via the `MACROS` config; for sim, we
  // leave USE_POWER_PINS undefined so the model's port list omits them.
  sky130_sram_1kbyte_1rw1r_8x1024_8 u_macro (
    .clk0    (clk),
    .csb0    (csb0),
    .web0    (web0),
    .wmask0  (1'b1),               // single-byte writes always
    .addr0   (addr0),
    .din0    (wdata),
    .dout0   (rdata),
    // Port 1 (read-only) tied off.
    .clk1    (clk),
    .csb1    (1'b1),
    .addr1   (10'h000),
    .dout1   ()
  );

  // Reset: the macro doesn't honour rst_n; we ignore it. Power-on
  // contents are undefined in the GDS but the behavioural model
  // initializes mem[] to X. Programs must write before they read.
  wire _unused = &{1'b0, rst_n};

The csb0/web0/addr0/din0/dout0 pin names come straight from the OpenRAM-shipped macro. They’re active-low chip-select and write-enable — csb=0, web=0 is “write”, csb=0, web=1 is “read”.
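
The active-low encoding is easy to get backwards, so here is a minimal Python sketch of the wrapper's combinational pin logic. It mirrors the three `wire` assignments in the excerpt above; the function names `wrapper_pins` and `operation` are illustrative and do not exist in the project.

```python
def wrapper_pins(we: int, re: int, addr: int) -> dict:
    """Mirror of sram_wrapper's combinational pin logic (illustrative only)."""
    assert we in (0, 1) and re in (0, 1)
    assert 0 <= addr < 128            # 7-bit bus-side address
    csb0 = 0 if (we or re) else 1     # ~(we | re): active-low chip select
    web0 = 0 if we else 1             # ~we: 0 = write, 1 = read
    addr0 = addr                      # {3'b000, addr}: zero-extended to 10 bits
    return {"csb0": csb0, "web0": web0, "addr0": addr0}


def operation(pins: dict) -> str:
    """Decode what the macro does for a given csb0/web0 pair."""
    if pins["csb0"] == 1:
        return "idle"
    return "write" if pins["web0"] == 0 else "read"


for we, re in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(we, re, operation(wrapper_pins(we, re, 0x7F)))
```

One thing the sketch makes visible: if `we` and `re` were ever asserted together, the macro would see a write, because `web0` depends only on `we`. The bus-master excerpt in Change 2 never drives both in the same state.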

Change 2 — the FSM picks up an extra state

The OpenRAM macro is registered-read: drive csb0 / addr0 in cycle N, dout0 is valid in cycle N+1. P07’s CPU captured bus_rdata on the EXECUTE→WB clock edge — that worked when the bus slave was a flop-based async-read RAM. With a sync-read SRAM, dout0 at that edge is still the OLD value.
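
A toy cycle-level model makes the stale read concrete. This is a simplified sketch, not the OpenRAM model: it assumes inputs are sampled on the rising edge and `dout` updates at that same edge, and it ignores the macro's internal negedge scheduling. The class name `SyncReadSram` and the variable names are illustrative.

```python
class SyncReadSram:
    """Toy registered-read memory: dout changes only at a clock edge."""

    def __init__(self, depth=1024):
        self.mem = [None] * depth   # None stands in for X (uninitialised)
        self.dout = None

    def posedge(self, csb, web, addr, din=None):
        if csb == 0:
            if web == 0:
                self.mem[addr] = din          # write
            else:
                self.dout = self.mem[addr]    # read: visible after this edge


sram = SyncReadSram()
sram.posedge(csb=0, web=0, addr=3, din=0xAA)
sram.posedge(csb=0, web=0, addr=4, din=0x55)
sram.posedge(csb=0, web=1, addr=3)            # an earlier read leaves dout = 0xAA

# LD from address 4. A flop clocked by the edge that ends EXECUTE samples
# dout as it was *before* that edge, so we read dout first, then clock.
captured_p07_style = sram.dout                # P07-style capture point
sram.posedge(csb=0, web=1, addr=4)            # edge ending EXECUTE

# One cycle later (the edge ending MEMWAIT) dout holds the requested byte.
captured_after_memwait = sram.dout
sram.posedge(csb=0, web=1, addr=4)            # address held through MEMWAIT

print(hex(captured_p07_style), hex(captured_after_memwait))
```

The first capture returns the result of the previous read; only the capture one cycle later returns the byte at the requested address.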

The fix is one new FSM state, S_MEMWAIT, that runs only on LD:

[State diagram: FETCH, DECODE, EXECUTE, MEMWAIT, WB, HALT. Edge labels: is_ld, otherwise, not HLT, HLT.]
MEMWAIT only fires for LD. ST writes synchronously like before; ALU/branch ops never touch the bus.

The CPU’s bus-master logic asserts bus_re across both EXECUTE and MEMWAIT so the macro sees a stable address through its negedge clk read scheduler:

projects/08_macro_integration/src/top.sv system-verilog · L441-471

  // Branch resolution (uses the most recently captured flag register).
  wire take_branch = is_jmp
                   || (is_bz  &&  flags_q[3])
                   || (is_bnz && ~flags_q[3]);

  // ----- Bus master signals -----
  // ST drives the bus during EXECUTE (1-cycle write — synchronous in
  // the SRAM, async-write semantics for the GPIO peripheral).
  // LD asserts bus_re across BOTH EXECUTE and MEMWAIT so the SRAM
  // macro sees csb0=0 long enough for its `negedge clk` read scheduler
  // to fire. The MEMWAIT→WB clock edge captures dout0 into result_q.
  always_comb begin
    bus_addr  = 8'h00;
    bus_wdata = 8'h00;
    bus_we    = 1'b0;
    bus_re    = 1'b0;
    if (state == S_EXECUTE) begin
      if (is_st) begin
        bus_addr  = op_a;
        bus_wdata = op_b;
        bus_we    = 1'b1;
      end else if (is_ld) begin
        bus_addr  = op_a;
        bus_re    = 1'b1;
      end
    end else if (state == S_MEMWAIT) begin
      // Hold bus_addr / bus_re through MEMWAIT so the SRAM macro's
      // posedge-latched csb0/web0/addr0 are stable when its `negedge`
      // read scheduler fires mid-cycle.
      bus_addr = op_a;

LD instructions take 5 cycles instead of 4. ST and everything else stay at 4 — the FSM only diverges on is_ld.
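
The cycle counts follow directly from the state sequence. The sketch below walks the non-HLT path only (the HALT transition is left out because the excerpt does not show it) and uses mnemonics inferred from the `is_ld` / `is_st` / `is_jmp` signal names; `instruction_states` and `program_cycles` are illustrative helpers, not project code.

```python
def instruction_states(is_ld: bool) -> list:
    """States visited by one instruction on the non-HLT path."""
    states = ["FETCH", "DECODE", "EXECUTE"]
    if is_ld:
        states.append("MEMWAIT")   # extra state, taken only by LD
    states.append("WB")
    return states


def program_cycles(opcodes) -> int:
    """Total cycles for a straight-line sequence of opcodes."""
    return sum(len(instruction_states(op == "LD")) for op in opcodes)


print(instruction_states(True))    # LD path, 5 states
print(instruction_states(False))   # everything else, 4 states
print(program_cycles(["LD", "ST", "JMP", "LD"]))
```

For the four-instruction sequence in the last line, the total is 5 + 4 + 4 + 5 = 18 cycles, or 450 ns at the 25 ns clock period.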

Change 3 — the address map

P07 had RAM at 0x00..0x0F (16 bytes). P08 exposes 128 bytes of the 1 KB macro through bus_addr[6:0], so the RAM range now spans 0x00..0x7F. Peripherals move up:

addr          name               access
0x00..0x7F    RAM (SRAM macro)   R/W
0x80..0x82    UART               R/W
0xC0..0xC1    GPIO               R/W

The decoder is two more lines than P07’s:

projects/08_macro_integration/src/top.sv system-verilog · L100-115
  // Slave selects — one-hot from the address.
  // RAM at 0x00..0x7F: bus_addr[7] = 0.
  wire ram_sel    = (bus_addr[7] == 1'b0);
  // UART at 0x80..0x82.
  wire uart_sel   = (bus_addr >= 8'h80) && (bus_addr <= 8'h82);
  // GPIO at 0xC0..0xC1.
  wire gpio_sel   = (bus_addr >= 8'hC0) && (bus_addr <= 8'hC1);

  // Read-data mux — combinational, picks the active slave's rdata.
  logic [7:0] ram_rdata;
  logic [7:0] uart_rdata;
  logic [7:0] gpio_rdata;
  always_comb begin
    if      (ram_sel)  bus_rdata = ram_rdata;
    else if (uart_sel) bus_rdata = uart_rdata;
    else if (gpio_sel) bus_rdata = gpio_rdata;
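
A quick way to sanity-check a decoder like this is to sweep all 256 addresses in a software mirror of the select logic. The sketch below copies the three select expressions from the excerpt; `decode` is an illustrative name, and nothing here models what `bus_rdata` returns for unmapped addresses, since that part of the listing is not shown.

```python
def decode(bus_addr: int) -> dict:
    """Software mirror of ram_sel / uart_sel / gpio_sel."""
    assert 0 <= bus_addr <= 0xFF
    return {
        "ram":  ((bus_addr >> 7) & 1) == 0,   # bus_addr[7] == 0
        "uart": 0x80 <= bus_addr <= 0x82,
        "gpio": 0xC0 <= bus_addr <= 0xC1,
    }


# The selects should never overlap: at most one is true per address.
overlaps = [a for a in range(256) if sum(decode(a).values()) > 1]
unmapped = [a for a in range(256) if not any(decode(a).values())]

print("overlapping addresses:", overlaps)
print("RAM bytes:", sum(decode(a)["ram"] for a in range(256)))
print("unmapped addresses:", len(unmapped))
```

The sweep confirms the selects are mutually exclusive and that 128 + 3 + 2 = 133 of the 256 addresses are mapped, leaving 123 that select no slave.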

Change 4 — the librelane config

Hard IP is invisible to standard-cell-only librelane/config.yamls (P01–P07 didn’t need to touch this). P08’s config picks up the MACROS: block plus a few related knobs:

projects/08_macro_integration/librelane/config.yaml yaml
# Project 08 — macro integration with the sky130 1KB OpenRAM SRAM.
#
# This is the first hardening on the ladder that includes a hard-IP
# block. The SRAM macro lives at a fixed location on the floorplan;
# the standard-cell logic places around it. PDN straps land on the
# macro's pre-placed `vccd1` / `vssd1` pins via PDN_MACRO_CONNECTIONS.
#
# We use OpenLane2/LibreLane's modern `MACROS:` schema rather than the
# older `EXTRA_LEFS / EXTRA_GDS_FILES / EXTRA_LIBS` triple. Each macro
# has its own block with `gds`, `lef`, `nl` (gate-level netlist for
# STA), `lib` (process-corner-keyed), and `instances` (placement +
# orientation).
#
# Iteration plan: the OpenLane SRAM tutorial recommends 25 ns clock
# (40 MHz) for SoCs containing one of these macros. P07 settled at
# 14 ns; with the SRAM in the critical path we expect to need a longer
# period than that. Start at 25 ns and tighten if there's slack.

DESIGN_NAME: top

VERILOG_FILES:
  - dir::../src/top.sv

PDK: sky130A
STD_CELL_LIBRARY: sky130_fd_sc_hd

CLOCK_PORT:   clk
CLOCK_PERIOD: 25.0

# Floorplan — 600 × 600 µm. The SRAM macro is 455 × 446 µm, so this
# leaves a ring of standard-cell area around it for the CPU + bus
# logic + UART + GPIO. ~4 µm gap above the core for taps.
FP_SIZING:    absolute
DIE_AREA:     [0, 0, 600, 600]
CORE_AREA:    [5, 5, 595, 595]
FP_CORE_UTIL: 25

# The OpenRAM macros use `vccd1` / `vssd1` as the power net names
# (Caravel convention). The `PDN_MACRO_CONNECTIONS` entry below
# stitches them onto the chip's PDN.
VDD_NETS:
  - vccd1
GND_NETS:
  - vssd1

# Macro definitions. Path globs resolve through ciel's PDK cache.
#
# OpenRAM ships only the TT_1p8V_25C corner for this macro. Real
# silicon flows would pad in corner-derated approximations; for an
# educational harden we re-use the TT lib as a stand-in for every
# corner the flow asks for. Timing slack reported against ss/ff
# corners is therefore optimistic — note this in the README.
MACROS:
  sky130_sram_1kbyte_1rw1r_8x1024_8:
    gds:
      - pdk_dir::libs.ref/sky130_sram_macros/gds/sky130_sram_1kbyte_1rw1r_8x1024_8.gds
    lef:
      - pdk_dir::libs.ref/sky130_sram_macros/lef/sky130_sram_1kbyte_1rw1r_8x1024_8.lef
    nl:
      - pdk_dir::libs.ref/sky130_sram_macros/verilog/sky130_sram_1kbyte_1rw1r_8x1024_8.v
    lib:
      "*":
        - pdk_dir::libs.ref/sky130_sram_macros/lib/sky130_sram_1kbyte_1rw1r_8x1024_8_TT_1p8V_25C.lib
    instances:
      u_ram.u_macro:
        location: [80, 80]
        orientation: N

# PDN macro connections — stitches the chip-level power straps onto
# each macro instance's pre-placed power pins. Format per entry:
#   "<instance_name_regex> <vdd_net> <gnd_net> <vdd_pin> <gnd_pin>"
# The OpenRAM macros use vccd1/vssd1 as their pin names.
PDN_MACRO_CONNECTIONS:
  - "u_ram.u_macro vccd1 vssd1 vccd1 vssd1"

# OpenRAM macros have known optical-proximity DRC mismatches that are
# not our problem. The OpenLane tutorial explicitly tells you to
# disable the gate.
ERROR_ON_MAGIC_DRC: false
# Same story for KLayout's deck.
RUN_KLAYOUT_DRC: false
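
Before launching a harden it is worth checking the floorplan numbers by hand. The sketch below does that arithmetic in Python under three assumptions that are not confirmed by the config itself: that `location` is the macro's lower-left corner, that orientation `N` leaves width and height unswapped, and that the macro is exactly the rounded 455 × 446 µm quoted in the config comment. The helper names are illustrative.

```python
CORE = (5, 5, 595, 595)          # CORE_AREA from the config, in µm
MACRO_ORIGIN = (80, 80)          # instances.u_ram.u_macro.location
MACRO_SIZE = (455, 446)          # rounded size from the config comment


def macro_bbox(origin, size):
    """Bounding box (x0, y0, x1, y1), assuming origin is the lower-left corner."""
    x, y = origin
    w, h = size
    return (x, y, x + w, y + h)


def margins(inner, outer):
    """Clearance between an inner box and the box that encloses it."""
    return {
        "left": inner[0] - outer[0],
        "bottom": inner[1] - outer[1],
        "right": outer[2] - inner[2],
        "top": outer[3] - inner[3],
    }


bbox = macro_bbox(MACRO_ORIGIN, MACRO_SIZE)
m = margins(bbox, CORE)
fits = all(v >= 0 for v in m.values())

print("macro bbox:", bbox)
print("margins to core edge (µm):", m)
print("fits inside core:", fits)
```

Under those assumptions the macro occupies (80, 80) to (535, 526), leaving 75 µm of core on the left and bottom, 60 µm on the right, and 69 µm on top for the standard-cell ring.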

Sim ≠ synth

OpenRAM ships a behavioural Verilog model alongside the GDS/LEF/lib. LibreLane uses it as the gate-level netlist for STA; iverilog can’t schedule one of its non-blocking-with-delay constructs and ends up returning xx for every read. So we ship a port-compatible stub at test/sram_model.v that the simulator gets instead — same module name, same ports, simpler internal semantics.

The synthesis path consumes the OpenRAM model from the PDK; sim gets our stub. This is a common pattern (Cornell ECE 5745 documents the same workaround). The whole stub is about 70 lines:

projects/08_macro_integration/test/sram_model.v system-verilog
// Behavioural simulation stub for the sky130 OpenRAM SRAM macro
// `sky130_sram_1kbyte_1rw1r_8x1024_8`.
//
// Why this file exists rather than the `.v` from the PDK directly:
// the OpenRAM-shipped behavioural model uses
//
//     dout0 <= #(DELAY) mem[addr0_reg];
//
// inside an `always @(negedge clk)`. Iverilog's scheduler does not
// reliably propagate that NBA-with-intra-delay update — `dout0` ends
// up stuck at X even though writes commit and the read condition
// fires. This is documented anti-behaviour in several public chipyard /
// cocotb bug threads (see /notes/diagrams or the upstream tracker).
// In the hardening flow LibreLane takes the macro's `.lef` / `.lib` /
// `.gds` from the PDK (plus the PDK `.v`, passed as `nl` in the MACROS
// config); this file is only ever read by the simulator. It is a
// port-compatible stub with straightforward sync-write/sync-read
// semantics. Same ports, same names, no hidden gotchas.
//
// Macro behavior reproduced here:
//   - 1024 × 8 bit storage on port 0 (1 R/W).
//   - Synchronous read: drive csb0=0/web0=1/addr0=A in cycle N, dout0
//     valid in cycle N+1 (latched on the posedge between).
//   - Synchronous write: drive csb0=0/web0=0/addr0/din0; mem[addr0]
//     latches on the posedge.
//   - Port 1 (read-only) is wired through with the same sync-read
//     shape, addressed by addr1.
//
// We don't model T_HOLD, the dout-X-window, or wmask — wmask0[0] is
// always treated as "write all 8 bits" since our SoC never partial-
// writes.

`default_nettype none

module sky130_sram_1kbyte_1rw1r_8x1024_8 (
`ifdef USE_POWER_PINS
    inout vccd1,
    inout vssd1,
`endif
    // Port 0 (read/write)
    input  wire         clk0,
    input  wire         csb0,    // active-low chip select
    input  wire         web0,    // active-low write enable (web0=1 = read)
    input  wire         wmask0,  // unused: treated as 1
    input  wire [9:0]   addr0,
    input  wire [7:0]   din0,
    output reg  [7:0]   dout0,

    // Port 1 (read-only)
    input  wire         clk1,
    input  wire         csb1,
    input  wire [9:0]   addr1,
    output reg  [7:0]   dout1
);

  reg [7:0] mem [0:1023];

  always @(posedge clk0) begin
    if (!csb0) begin
      if (!web0) mem[addr0] <= din0;        // write
      else       dout0      <= mem[addr0];  // read
    end
  end

  always @(posedge clk1) begin
    if (!csb1) dout1 <= mem[addr1];
  end

  // Tieoff lint warnings for unused signals.
  wire _unused = &{1'b0, wmask0};

endmodule

`default_nettype wire

What just happened?

We integrated a hard-IP block. Four small changes get us from a standard-cell-only SoC to one with a 455 × 446 µm SRAM macro dropped into the floorplan. The flow that hardened P01–P07 picked up the macro through the MACROS: config, abutted standard cells around it, routed PDN straps onto its pre-placed power pins, and treated it as opaque for everything else. P09’s RV32I-min could adopt the same pattern for instruction or data memory.

See also

  • Project 07 → the all-flop SoC this evolves from. Diff between P07 and P08 is exactly the four changes documented above.
  • Project README