P07’s tiny SoC, with the 16-byte flop-based RAM swapped for a real
sky130 OpenRAM SRAM macro: sky130_sram_1kbyte_1rw1r_8x1024_8.
The chip now contains a hard-IP block that the place-and-route flow
has to abut against standard cells, route power around, and treat as
opaque.
Clock target: 40 MHz. 32,098 standard cells (excluding the OpenRAM macro itself). DRC checks marked SKIP rather than PASS — the OpenRAM SRAM has known OPC-rule mismatches against magic + klayout’s deck; detail in the next section.
Why DRC is SKIP, not PASS. OpenRAM macros have known optical-proximity DRC mismatches against magic’s rule deck (the cell layouts were generated by an older OPC routine than the one magic checks against). The OpenLane SRAM tutorial documents this and tells you to set ERROR_ON_MAGIC_DRC: false, which is what we did. The errors are real but not actionable from our side; the macro’s GDS is what it is, and a real fab using these macros would override magic’s rule deck for the macro region.
What’s actually different from P07
The page below walks through exactly four changes — that’s the entire delta from P07. Everything else (the CPU, the address decoder, the peripherals, the demo program) is identical.
Change 1 — the wrapper
P07’s RAM was 16 bytes of flops with an async read. P08’s RAM is a
1 KB OpenRAM macro with a synchronous read on clk0. Same module
name to the bus side; the change lives entirely in sram_wrapper:
// web=0 means write, web=1 means read.
//
// The macro has a second read port we don't use; we tie csb1=1.
// =====================================================================
module sram_wrapper (
input logic clk,
input logic rst_n,
input logic [6:0] addr,
input logic [7:0] wdata,
input logic we,
input logic re,
output logic [7:0] rdata
);
// Only assert chip-select when the bus is doing something.
wire csb0 = ~(we | re);
wire web0 = ~we; // 0 = write, 1 = read
wire [9:0] addr0 = {3'b000, addr};
// Power pins (vccd1 / vssd1) are NOT connected here: the macro
// model wraps them in `ifdef USE_POWER_PINS`, and Verilator rejects
// tying tristate inout ports to a constant. LibreLane stitches the
// chip-level PDN onto these pins via `PDN_MACRO_CONNECTIONS` in the
// config; for sim, we leave USE_POWER_PINS undefined so the model's
// port list omits them.
sky130_sram_1kbyte_1rw1r_8x1024_8 u_macro (
.clk0 (clk),
.csb0 (csb0),
.web0 (web0),
.wmask0 (1'b1), // single-byte writes always
.addr0 (addr0),
.din0 (wdata),
.dout0 (rdata),
// Port 1 (read-only) tied off.
.clk1 (clk),
.csb1 (1'b1),
.addr1 (10'h000),
.dout1 ()
);
// Reset: the macro doesn't honour rst_n; we ignore it. Power-on
// contents are undefined in the GDS but the behavioural model
// initializes mem[] to X. Programs must write before they read.
wire _unused = &{1'b0, rst_n};
The csb0/web0/addr0/din0/dout0 pin names come straight from
the OpenRAM-shipped macro. They’re active-low chip-select and
write-enable — csb=0, web=0 is “write”, csb=0, web=1 is “read”.
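As a cross-check of the active-low convention, here is a minimal Python sketch of the wrapper's pin mapping. The function name `wrapper_pins` and the `op` label are illustrative and not part of the project; the three assignments mirror the `wire` declarations in sram_wrapper.

```python
def wrapper_pins(we: bool, re: bool, addr: int) -> dict:
    """Illustrative model of sram_wrapper's combinational pin mapping."""
    if not 0 <= addr < 128:
        raise ValueError("the wrapper exposes a 7-bit address")
    csb0 = 0 if (we or re) else 1   # csb0 = ~(we | re), active low
    web0 = 0 if we else 1           # web0 = ~we: 0 = write, 1 = read
    addr0 = addr                    # {3'b000, addr}: top three bits stay zero
    if csb0 == 1:
        op = "idle"
    elif web0 == 0:
        op = "write"
    else:
        op = "read"
    return {"csb0": csb0, "web0": web0, "addr0": addr0, "op": op}


if __name__ == "__main__":
    for we, re in [(False, False), (False, True), (True, False), (True, True)]:
        print(f"we={int(we)} re={int(re)} ->", wrapper_pins(we, re, 0x7F))
```

One consequence worth noticing: if `we` and `re` were ever asserted together, the macro would see a write, because `web0` depends only on `we`.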
Change 2 — the FSM picks up an extra state
The OpenRAM macro is registered-read: drive csb0 / addr0 in
cycle N, dout0 is valid in cycle N+1. P07’s CPU captured bus_rdata
on the EXECUTE→WB clock edge — that worked when the bus slave was a
flop-based async-read RAM. With a sync-read SRAM, dout0 at that edge
is still the OLD value.
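The stale-read hazard can be reproduced with a toy cycle model. This is only a sketch: `SyncReadRam` and `load_demo` are hypothetical names, the memory uses a plain posedge-latched read, and `dout` starts at 0 rather than X.

```python
class SyncReadRam:
    """Illustrative registered-read memory: dout only changes on a clock edge."""

    def __init__(self, depth: int = 1024):
        self.mem = [0] * depth
        self.dout = 0  # the real model starts at X; 0 keeps the sketch simple

    def posedge(self, csb: int, web: int, addr: int, din: int = 0) -> None:
        if csb == 0:
            if web == 0:
                self.mem[addr] = din        # synchronous write
            else:
                self.dout = self.mem[addr]  # synchronous read


def load_demo():
    ram = SyncReadRam()
    ram.posedge(csb=0, web=0, addr=0x10, din=0xAA)  # ST 0xAA -> 0x10
    ram.posedge(csb=0, web=0, addr=0x11, din=0xBB)  # ST 0xBB -> 0x11
    ram.posedge(csb=0, web=1, addr=0x10)            # an earlier LD leaves dout = 0xAA

    # LD from 0x11. During EXECUTE (cycle N) the request is only being driven,
    # so a register clocked on the EXECUTE->WB edge samples the previous dout.
    captured_at_execute_edge = ram.dout
    ram.posedge(csb=0, web=1, addr=0x11)            # the edge that ends cycle N
    # In cycle N+1 dout finally reflects address 0x11.
    captured_one_cycle_later = ram.dout
    return captured_at_execute_edge, captured_one_cycle_later


if __name__ == "__main__":
    stale, fresh = load_demo()
    print(f"sampled at end of cycle N: 0x{stale:02X}, in cycle N+1: 0x{fresh:02X}")
```

The first value is the leftover from the previous read, which is exactly what a P07-style capture would have stored.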
The fix is one new FSM state, S_MEMWAIT, that runs only on LD and sits between EXECUTE and WB.
The CPU’s bus-master logic asserts bus_re across both EXECUTE and
MEMWAIT so the macro sees a stable address through its negedge clk
read scheduler:
// Branch resolution (uses the most recently captured flag register).
wire take_branch = is_jmp
|| (is_bz && flags_q[3])
|| (is_bnz && ~flags_q[3]);
// ----- Bus master signals -----
// ST drives the bus during EXECUTE (1-cycle write — synchronous in
// the SRAM, async-write semantics for the GPIO peripheral).
// LD asserts bus_re across BOTH EXECUTE and MEMWAIT so the SRAM
// macro sees csb0=0 long enough for its `negedge clk` read scheduler
// to fire. The MEMWAIT→WB clock edge captures dout0 into result_q.
always_comb begin
bus_addr = 8'h00;
bus_wdata = 8'h00;
bus_we = 1'b0;
bus_re = 1'b0;
if (state == S_EXECUTE) begin
if (is_st) begin
bus_addr = op_a;
bus_wdata = op_b;
bus_we = 1'b1;
end else if (is_ld) begin
bus_addr = op_a;
bus_re = 1'b1;
end
end else if (state == S_MEMWAIT) begin
// Hold bus_addr / bus_re through MEMWAIT so the SRAM macro's
// posedge-latched csb0/web0/addr0 are stable when its `negedge`
// read scheduler fires mid-cycle.
bus_addr = op_a;
bus_re = 1'b1;
end
end
LD instructions take 5 cycles instead of 4. ST and everything else
stay at 4; the FSM only diverges on is_ld.
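To put a number on the extra state, the following sketch counts cycles for a program with a given share of loads. The helper names and the 100-instruction, 20-load mix are made up for illustration; the 4- and 5-cycle figures and the 25 ns period come from this page.

```python
BASE_CYCLES = 4          # every instruction except LD
LD_CYCLES = 5            # LD pays one extra cycle for S_MEMWAIT
CLOCK_PERIOD_NS = 25.0   # 40 MHz clock target


def program_cycles(n_instructions: int, n_loads: int) -> int:
    """Total cycles for a straight-line instruction count with n_loads LDs."""
    if not 0 <= n_loads <= n_instructions:
        raise ValueError("n_loads must be between 0 and n_instructions")
    return BASE_CYCLES * (n_instructions - n_loads) + LD_CYCLES * n_loads


def program_time_ns(n_instructions: int, n_loads: int,
                    period_ns: float = CLOCK_PERIOD_NS) -> float:
    """Wall-clock time at the given clock period."""
    return program_cycles(n_instructions, n_loads) * period_ns


if __name__ == "__main__":
    # Hypothetical mix: 100 instructions, 20 of them loads.
    cycles = program_cycles(100, 20)
    print(cycles, "cycles,", program_time_ns(100, 20), "ns")
```

For that mix the total is 420 cycles (10,500 ns at 25 ns per cycle), a 5% increase over the 400 cycles the same program would take if LD were still 4 cycles.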
Change 3 — the address map
P07 had RAM at 0x00..0x0F (16 bytes). P08 exposes 128 bytes of the
1 KB macro through bus_addr[6:0], so the RAM range now spans
0x00..0x7F. Peripherals move up:
| addr | name | access |
|---|---|---|
| 0x00..0x7F | RAM (SRAM macro) | R/W |
| 0x80..0x82 | UART | R/W |
| 0xC0..0xC1 | GPIO | R/W |
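The map above can be sanity-checked by sweeping all 256 bus addresses through a small model of the select logic. This is a sketch with hypothetical function names (`decode`, `summarize`); the ranges are taken from the table.

```python
def decode(bus_addr: int) -> dict:
    """Illustrative model of the slave selects for one 8-bit bus address."""
    if not 0 <= bus_addr <= 0xFF:
        raise ValueError("bus_addr is 8 bits wide")
    return {
        "ram": ((bus_addr >> 7) & 1) == 0,   # 0x00..0x7F: bit 7 clear
        "uart": 0x80 <= bus_addr <= 0x82,
        "gpio": 0xC0 <= bus_addr <= 0xC1,
    }


def summarize() -> dict:
    """Count how many addresses land on each slave, checking for overlaps."""
    counts = {"ram": 0, "uart": 0, "gpio": 0, "unmapped": 0}
    for addr in range(256):
        hits = [name for name, on in decode(addr).items() if on]
        assert len(hits) <= 1, f"overlapping selects at 0x{addr:02X}"
        counts[hits[0] if hits else "unmapped"] += 1
    return counts


if __name__ == "__main__":
    print(summarize())
```

No address selects two slaves, and 123 of the 256 addresses select none, so strictly speaking the selects are at most one-hot.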
The decoder is two more lines than P07’s:
// Slave selects — one-hot from the address.
// RAM at 0x00..0x7F: bus_addr[7] = 0.
wire ram_sel = (bus_addr[7] == 1'b0);
// UART at 0x80..0x82.
wire uart_sel = (bus_addr >= 8'h80) && (bus_addr <= 8'h82);
// GPIO at 0xC0..0xC1.
wire gpio_sel = (bus_addr >= 8'hC0) && (bus_addr <= 8'hC1);
// Read-data mux — combinational, picks the active slave's rdata.
logic [7:0] ram_rdata;
logic [7:0] uart_rdata;
logic [7:0] gpio_rdata;
always_comb begin
if (ram_sel) bus_rdata = ram_rdata;
else if (uart_sel) bus_rdata = uart_rdata;
else if (gpio_sel) bus_rdata = gpio_rdata;
Change 4 — the librelane config
A standard-cell-only librelane/config.yaml says nothing about hard IP,
which is why P01–P07 never had to touch this. P08’s config picks up the
MACROS: block plus a few related knobs:
# Project 08 — macro integration with the sky130 1KB OpenRAM SRAM.
#
# This is the first hardening on the ladder that includes a hard-IP
# block. The SRAM macro lives at a fixed location on the floorplan;
# the standard-cell logic places around it. PDN straps land on the
# macro's pre-placed `vccd1` / `vssd1` pins via PDN_MACRO_CONNECTIONS.
#
# We use OpenLane2/LibreLane's modern `MACROS:` schema rather than the
# older `EXTRA_LEFS / EXTRA_GDS_FILES / EXTRA_LIBS` triple. Each macro
# has its own block with `gds`, `lef`, `nl` (gate-level netlist for
# STA), `lib` (process-corner-keyed), and `instances` (placement +
# orientation).
#
# Iteration plan: the OpenLane SRAM tutorial recommends a 25 ns clock
# (40 MHz) for SoCs containing one of these macros. P07 settled at
# 14 ns; with the SRAM in the critical path we expect to need a slower
# clock than that. Start at 25 ns and tighten if there's slack.
DESIGN_NAME: top
VERILOG_FILES:
- dir::../src/top.sv
PDK: sky130A
STD_CELL_LIBRARY: sky130_fd_sc_hd
CLOCK_PORT: clk
CLOCK_PERIOD: 25.0
# Floorplan — 600 × 600 µm. The SRAM macro is 455 × 446 µm, so this
# leaves a ring of standard-cell area around it for the CPU + bus
# logic + UART + GPIO. ~4 µm gap above the core for taps.
FP_SIZING: absolute
DIE_AREA: [0, 0, 600, 600]
CORE_AREA: [5, 5, 595, 595]
FP_CORE_UTIL: 25
# The OpenRAM macros use `vccd1` / `vssd1` as the power net names
# (Caravel convention). The `PDN_MACRO_CONNECTIONS` entry below
# stitches them onto the chip's PDN.
VDD_NETS:
- vccd1
GND_NETS:
- vssd1
# Macro definitions. Path globs resolve through ciel's PDK cache.
#
# OpenRAM ships only the TT_1p8V_25C corner for this macro. Real
# silicon flows would pad in corner-derated approximations; for an
# educational harden we re-use the TT lib as a stand-in for every
# corner the flow asks for. Timing slack reported against ss/ff
# corners is therefore optimistic — note this in the README.
MACROS:
  sky130_sram_1kbyte_1rw1r_8x1024_8:
    gds:
      - pdk_dir::libs.ref/sky130_sram_macros/gds/sky130_sram_1kbyte_1rw1r_8x1024_8.gds
    lef:
      - pdk_dir::libs.ref/sky130_sram_macros/lef/sky130_sram_1kbyte_1rw1r_8x1024_8.lef
    nl:
      - pdk_dir::libs.ref/sky130_sram_macros/verilog/sky130_sram_1kbyte_1rw1r_8x1024_8.v
    lib:
      "*":
        - pdk_dir::libs.ref/sky130_sram_macros/lib/sky130_sram_1kbyte_1rw1r_8x1024_8_TT_1p8V_25C.lib
    instances:
      u_ram.u_macro:
        location: [80, 80]
        orientation: N
# PDN macro connections — stitches the chip-level power straps onto
# each macro instance's pre-placed power pins. Format per entry:
# "<instance_name_regex> <vdd_net> <gnd_net> <vdd_pin> <gnd_pin>"
# The OpenRAM macros use vccd1/vssd1 as their pin names.
PDN_MACRO_CONNECTIONS:
- "u_ram.u_macro vccd1 vssd1 vccd1 vssd1"
# OpenRAM macros have known optical-proximity DRC mismatches that are
# not our problem. The OpenLane tutorial explicitly tells you to
# disable the gate.
ERROR_ON_MAGIC_DRC: false
# Same story for KLayout's deck.
RUN_KLAYOUT_DRC: false
Sim ≠ synth
OpenRAM ships a behavioural Verilog model alongside the GDS/LEF/lib.
LibreLane uses it as the gate-level netlist for STA. Simulation is
another matter: iverilog can’t schedule one of its
non-blocking-with-delay constructs and ends up returning xx for every
read. So we ship a port-compatible stub at test/sram_model.v that the
simulator gets instead: same module name, same ports, simpler internal
semantics.
The synthesis path consumes the OpenRAM model from the PDK; sim gets our stub. This is a common pattern (Cornell ECE 5745 documents the same workaround). The whole stub is 70 lines:
// Behavioural simulation stub for the sky130 OpenRAM SRAM macro
// `sky130_sram_1kbyte_1rw1r_8x1024_8`.
//
// Why this file exists rather than the `.v` from the PDK directly:
// the OpenRAM-shipped behavioural model uses
//
// dout0 <= #(DELAY) mem[addr0_reg];
//
// inside an `always @(negedge clk)`. Iverilog's scheduler does not
// reliably propagate that NBA-with-intra-delay update — `dout0` ends
// up stuck at X even though writes commit and the read condition
// fires. This is documented anti-behaviour in several public chipyard /
// cocotb bug threads (see /notes/diagrams or the upstream tracker).
// LibreLane reads the PDK `.v` only through the `nl:` entry in the
// config (as the macro's netlist for STA), next to the `.lef` / `.lib`
// / `.gds`; nothing in the hardening flow simulates it. So for
// simulation we replace it with a port-compatible stub that uses
// straightforward sync-write/sync-read semantics. Same ports, same
// names, no hidden gotchas.
//
// Macro behavior reproduced here:
// - 1024 × 8 bit storage on port 0 (1 R/W).
// - Synchronous read: drive csb0=0/web0=1/addr0=A in cycle N, dout0
// valid in cycle N+1 (latched on the posedge between).
// - Synchronous write: drive csb0=0/web0=0/addr0/din0; mem[addr0]
// latches on the posedge.
// - Port 1 (read-only) is wired through with the same sync-read
// shape, addressed by addr1.
//
// We don't model T_HOLD, the dout-X-window, or wmask — wmask0[0] is
// always treated as "write all 8 bits" since our SoC never partial-
// writes.
`default_nettype none
module sky130_sram_1kbyte_1rw1r_8x1024_8 (
`ifdef USE_POWER_PINS
inout vccd1,
inout vssd1,
`endif
// Port 0 (read/write)
input wire clk0,
input wire csb0, // active-low chip select
input wire web0, // active-low write enable (web0=1 = read)
input wire wmask0, // unused: treated as 1
input wire [9:0] addr0,
input wire [7:0] din0,
output reg [7:0] dout0,
// Port 1 (read-only)
input wire clk1,
input wire csb1,
input wire [9:0] addr1,
output reg [7:0] dout1
);
reg [7:0] mem [0:1023];
always @(posedge clk0) begin
if (!csb0) begin
if (!web0) mem[addr0] <= din0; // write
else dout0 <= mem[addr0]; // read
end
end
always @(posedge clk1) begin
if (!csb1) dout1 <= mem[addr1];
end
// Tieoff lint warnings for unused signals.
wire _unused = &{1'b0, wmask0};
endmodule
`default_nettype wire
What just happened?
We integrated a hard-IP block. Four small changes get us from a
standard-cell-only SoC to one with a 455 × 446 µm SRAM macro
dropped into the floorplan. The flow that hardened P01–P07 picked
up the macro through the MACROS: config, abutted standard cells
around it, routed PDN straps onto its pre-placed power pins, and
treated it as opaque for everything else. P09’s RV32I-min could
adopt the same pattern for instruction or data memory.
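The floorplan figures quoted in the config (600 × 600 µm die, core inset by 5 µm, macro placed at [80, 80]) can be checked with a few lines of arithmetic. This sketch assumes that 455 × 446 µm means width × height, that those rounded figures are close enough, and that with orientation N the `location` is the macro's lower-left corner. The helper names are illustrative.

```python
CORE = (5.0, 5.0, 595.0, 595.0)   # CORE_AREA: x0, y0, x1, y1 in µm
MACRO_ORIGIN = (80.0, 80.0)       # instances.u_ram.u_macro.location
MACRO_SIZE = (455.0, 446.0)       # assumed width x height, rounded figures


def macro_bbox(origin, size):
    """Bounding box of an unrotated macro placed by its lower-left corner."""
    return (origin[0], origin[1], origin[0] + size[0], origin[1] + size[1])


def clearances(core, bbox):
    """Distance from each macro edge to the matching core edge."""
    return {
        "left": bbox[0] - core[0],
        "bottom": bbox[1] - core[1],
        "right": core[2] - bbox[2],
        "top": core[3] - bbox[3],
    }


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


if __name__ == "__main__":
    bbox = macro_bbox(MACRO_ORIGIN, MACRO_SIZE)
    print("macro bbox:", bbox)
    print("clearance to core edge (µm):", clearances(CORE, bbox))
    print(f"macro share of core area: {area(bbox) / area(CORE):.1%}")
```

Under those assumptions the macro fits inside the core with 60 to 75 µm to spare on each side and occupies a little under 60% of the core area, which is the ring of standard-cell space the config comment refers to.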
See also
- Project 07 → the all-flop SoC this evolves from. Diff between P07 and P08 is exactly the four changes documented above.
- Project README