Storage Replication with DRBD for High Availability
When you have data that can't go down — your database, your file server, your critical VMs — you need that data on more than one machine. DRBD (Distributed Replicated Block Device) solves this at the block level. It mirrors a partition or logical volume from one server to another in real time, essentially creating a network RAID 1 between two nodes.
DRBD operates below the filesystem layer. Your application writes to what looks like a normal block device. DRBD intercepts those writes and replicates them to the peer node over the network. If the primary node fails, the secondary already has an identical copy of the data and can take over immediately.
This isn't exotic enterprise technology. DRBD is open source, included in the Linux kernel since 2.6.33, and runs on commodity hardware. For a homelab aiming at real high availability, it's one of the most reliable approaches available.

When DRBD Makes Sense
DRBD is ideal when you need:
- Active-passive failover for VMs, databases, or file services
- Synchronous replication where data consistency matters more than performance
- Simple two-node HA without the complexity of a distributed storage system like Ceph
DRBD is less suitable for:
- Scaling beyond two nodes (DRBD 9 supports more, but the sweet spot is two)
- Large-scale distributed storage (use Ceph or GlusterFS instead)
- Situations where eventual consistency is acceptable (use rsync or Syncthing)
Prerequisites
You need two Linux servers (physical or virtual) with:
- A dedicated network connection between them (ideally a direct link or VLAN, kept separate from your regular LAN traffic)
- An unused partition or logical volume of the same size on each node
- Matching DRBD versions on both nodes
For this guide, we'll use:
- node1: 192.168.10.1 with /dev/sdb1
- node2: 192.168.10.2 with /dev/sdb1
- A dedicated replication network (192.168.10.0/24 in our example); a quick pre-flight check follows below
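This is a minimal sanity check from node1, assuming the addresses and device names above (adjust to your own layout):
# The backing partition exists and isn't mounted or already formatted
lsblk -f /dev/sdb1
# The replication link to node2 is reachable
ping -c 3 192.168.10.2
# Both partitions are the same size (run on each node and compare)
sudo blockdev --getsize64 /dev/sdb1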
Installation
On Debian/Ubuntu:
sudo apt update
sudo apt install -y drbd-utils
On Fedora/RHEL:
sudo dnf install -y drbd drbd-utils
Load the kernel module:
sudo modprobe drbd
echo drbd | sudo tee /etc/modules-load.d/drbd.conf
Verify on both nodes:
cat /proc/drbd
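The tool and module versions should line up on node1 and node2. A quick way to compare (run on both nodes; the exact output format differs between DRBD 8 and 9):
# Userspace tooling version
drbdadm --version
# Kernel module version
modinfo drbd | grep -w version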
Configuring a DRBD Resource
DRBD resources are defined in configuration files under /etc/drbd.d/. Create a resource file on both nodes — the file must be identical on each.
Create /etc/drbd.d/data.res on both nodes:
resource data {
  net {
    protocol C;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  disk {
    resync-rate 100M;
  }
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.10.1:7789;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.10.2:7789;
    meta-disk internal;
  }
}
Key settings:
- protocol C — Synchronous replication. A write is only confirmed after both nodes have it. This is the safest option and the right choice for most homelab HA setups. Protocol A (asynchronous) and B (semi-synchronous) trade safety for performance.
- after-sb-* — Split-brain recovery policies. More on this below.
- resync-rate — Limits bandwidth during initial sync or recovery. Set this to match your dedicated link's capacity.
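One easy mistake to catch early: the names in the on node1 { ... } and on node2 { ... } blocks must match each node's actual hostname (uname -n), or drbdadm won't recognize the local node. You can have drbdadm parse the file and echo back what it understood before going further:
sudo drbdadm dump data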
Initializing the Resource
Run these commands on both nodes:
# Create DRBD metadata
sudo drbdadm create-md data
# Bring up the resource
sudo drbdadm up data
At this point, both nodes are connected but neither has valid data. You need to designate one as the initial primary. On node1:
# Force node1 as the initial sync source
sudo drbdadm primary --force data
This triggers a full sync from node1 to node2. Monitor progress:
watch cat /proc/drbd
Or with the newer tool:
sudo drbdadm status data
You'll see something like:
data role:Primary
  disk:UpToDate
  node2 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:34.50
For a 500 GB disk over a 1 Gbps link, the initial sync takes roughly 70-80 minutes. Over 10 GbE, it's under 10 minutes. Don't interrupt the sync — let it complete before putting the resource into production.
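If you're scripting the setup, you can block until the initial sync finishes. A rough sketch that polls the status output (field names can vary slightly between DRBD versions):
# Wait until no device reports Inconsistent any more
while sudo drbdadm status data | grep -q Inconsistent; do
  sleep 30
done
echo "Initial sync complete"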
Creating a Filesystem
Once sync completes, create a filesystem on the primary node:
sudo mkfs.ext4 /dev/drbd0
Mount it:
sudo mkdir -p /mnt/data
sudo mount /dev/drbd0 /mnt/data
Write some test data:
echo "Hello from DRBD" | sudo tee /mnt/data/test.txt
You can only mount the filesystem on the primary node. The secondary node has the raw block data but doesn't mount it — that's the active-passive model.
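One practical note: don't add /dev/drbd0 to /etc/fstab as a normal boot-time mount, because the node may come up as secondary and the mount will fail. If you want a convenience entry, mark it noauto; a sketch (mount options are up to you):
# /etc/fstab entry, mounted manually or by a cluster manager, never at boot
/dev/drbd0  /mnt/data  ext4  noauto,defaults  0  0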
Failover
To switch which node is primary (planned maintenance, for example):
On the current primary (node1):
sudo umount /mnt/data
sudo drbdadm secondary data
On the new primary (node2):
sudo drbdadm primary data
sudo mount /dev/drbd0 /mnt/data
cat /mnt/data/test.txt # Should show "Hello from DRBD"
The data is there, byte-for-byte identical. This is the core of DRBD-based HA: the secondary always has current data, and promotion is instantaneous because there's no data to copy.
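The switchover is easy to wrap in a small helper on the node giving up the primary role. A sketch; the script name is a placeholder and the paths are this guide's example values:
#!/bin/bash
# demote-data.sh: release the DRBD resource so the peer can take over
set -e
sudo umount /mnt/data
sudo drbdadm secondary data
echo "Demoted. On the peer, run: drbdadm primary data && mount /dev/drbd0 /mnt/data"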
Split-Brain Recovery
Split-brain happens when both nodes think they're primary, usually because the network link between them goes down and an operator (or automated system) promotes the secondary. Now both nodes have divergent data.
The after-sb-* policies in our config handle common scenarios automatically:
- after-sb-0pri — Neither node is primary. discard-zero-changes keeps the changes from whichever node modified data; the unchanged node resyncs from it (if both nodes changed data, DRBD disconnects instead).
- after-sb-1pri — One node is primary. discard-secondary drops the secondary's changes.
- after-sb-2pri — Both nodes are primary. disconnect stops replication so you can fix it manually.
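You'll usually notice a split-brain from the kernel log and from the connection state dropping to StandAlone. A quick check (the exact message wording varies by version):
sudo dmesg | grep -i split-brain
sudo drbdadm status data   # look for connection:StandAlone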
If automatic recovery can't resolve it, you'll need to manually choose which node's data to keep:
# On the node whose data you want to DISCARD:
sudo drbdadm disconnect data
sudo drbdadm secondary data
sudo drbdadm -- --discard-my-data connect data
# On the node whose data you want to KEEP (only needed if it's in StandAlone):
sudo drbdadm connect data
The node with discarded data will resync from the survivor.
Dual-Primary Mode
DRBD can run with both nodes as primary simultaneously. This requires a cluster-aware filesystem like GFS2 or OCFS2 that handles concurrent writes with distributed locking. Regular filesystems like ext4 or XFS will be corrupted almost immediately if both nodes mount them at once.
Enable it in the resource config:
resource data {
  net {
    allow-two-primaries;
  }
  ...
}
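After editing the file on both nodes, apply the change and promote the second node. A sketch, assuming the resource from earlier is already up and in sync:
# On both nodes: re-read the changed configuration
sudo drbdadm adjust data
# On the node that is still secondary: promote it
sudo drbdadm primary data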
Then format with a cluster filesystem:
sudo mkfs.gfs2 -p lock_dlm -t cluster_name:data -j 2 /dev/drbd0
Dual-primary is needed for live migration of VMs between nodes (both need simultaneous access to the VM's disk). If you're using Proxmox or similar with DRBD-backed shared storage, this is the configuration you'll end up with.
Integration with Pacemaker/Corosync
For automated failover, pair DRBD with Pacemaker and Corosync. These clustering tools monitor node health and automatically promote the secondary if the primary fails.
Install the cluster stack:
sudo apt install -y pacemaker corosync
Configure Corosync for your two nodes, then create Pacemaker resources for DRBD:
sudo pcs resource create drbd_data ocf:linbit:drbd \
  drbd_resource=data \
  op monitor interval=15s
sudo pcs resource promotable drbd_data \
  promoted-max=1 promoted-node-max=1 \
  clone-max=2 clone-node-max=1
sudo pcs resource create fs_data ocf:heartbeat:Filesystem \
  device=/dev/drbd0 directory=/mnt/data fstype=ext4
sudo pcs constraint colocation add fs_data with drbd_data-clone INFINITY with-rsc-role=Promoted
sudo pcs constraint order promote drbd_data-clone then start fs_data
Now Pacemaker handles promotion, mounting, and failover automatically. If node1 goes down, Pacemaker promotes node2 and mounts the filesystem within seconds.
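To verify the automation end to end, put the active node in standby and watch the resources move (the standby subcommand lives under pcs node on current releases and under pcs cluster on older ones):
sudo pcs node standby node1
sudo pcs status      # fs_data and the promoted DRBD role should land on node2
sudo pcs node unstandby node1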
Performance Tips
Use a dedicated network for replication. DRBD traffic can saturate a link during initial sync or heavy write workloads. A separate VLAN or direct cable between nodes keeps replication traffic off your main network.
Match your resync rate to your link speed. Don't set resync-rate higher than your network can handle — it won't go faster but can cause congestion.
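On DRBD 8.4 and later you can also override the rate at runtime instead of editing the config, which is handy for letting an off-hours resync run flat out. A sketch based on the options documented in the DRBD user guide; check them against your version:
# Temporarily fix the resync rate for resource "data" and disable the dynamic controller
sudo drbdadm disk-options --c-plan-ahead=0 --resync-rate=110M data
# Revert to the values from the configuration file
sudo drbdadm adjust data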
Use SSDs for the backing device. DRBD adds latency to every write (the network round-trip for synchronous replication). Starting with fast storage minimizes the impact.
Monitor with Prometheus. The DRBD exporter (drbd_exporter) exposes replication state, sync progress, and connection status as Prometheus metrics.
DRBD isn't flashy. It doesn't have a web UI or a marketing page with animations. It's a kernel module that copies blocks between machines, and it's been doing that reliably for over two decades. For a two-node homelab HA setup, it's hard to beat.