
Proxmox VE Clustering for High Availability

Virtualization · 2026-02-09 · 7 min read · proxmox · clustering · high-availability · virtualization · live-migration

A single Proxmox server is a great foundation for a homelab. Two or three Proxmox servers in a cluster unlock the features that make virtualization genuinely powerful: live migration (move running VMs between hosts without downtime), high availability (VMs restart automatically on a surviving node when a host fails), and centralized management of all your virtualization infrastructure from one web interface.

Proxmox VE clustering is built on mature Linux technologies — Corosync for cluster communication, a distributed configuration filesystem (pmxcfs), and the Proxmox HA manager for failover. It works on commodity hardware, doesn't require matching configurations across nodes, and the setup takes about 15 minutes once you understand the requirements.


Prerequisites

Before creating a cluster, make sure your environment meets these requirements:

Network: All nodes need reliable, low-latency connections to each other. Corosync is sensitive to latency, so keep the nodes on the same LAN and, ideally, give cluster traffic its own NIC or VLAN.

Hostnames and DNS: Every node needs a unique hostname that resolves to its cluster IP from every other node, via DNS or /etc/hosts.

Time: Clocks must be synchronized across all nodes. Proxmox ships with chrony enabled; just confirm it is running.

Fresh or compatible installations: All nodes should run the same Proxmox VE major version, and a node joining an existing cluster must not have any VMs or containers yet, since guest IDs would conflict.
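Two quick checks that cover the version and time requirements (both commands ship with Proxmox):

# Confirm every node reports the same Proxmox VE version
pveversion

# Confirm the clock is synchronized (chrony is the default time service)
timedatectl status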

Example Setup

For this guide, the cluster has three nodes: pve1 at 192.168.1.101, pve2 at 192.168.1.102, and pve3 at 192.168.1.103.

Verify /etc/hosts on each node:

192.168.1.101  pve1
192.168.1.102  pve2
192.168.1.103  pve3

Do NOT have an entry like 127.0.1.1 pve1: a hostname that resolves to a loopback address causes Corosync binding issues.
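A quick way to confirm each node resolves its own name to the LAN address rather than loopback:

# Should print the node's cluster IP (e.g. 192.168.1.101), not 127.0.1.1
hostname --ip-address
getent hosts pve1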

Creating the Cluster

On the first node (pve1), create the cluster:

pvecm create homelab-cluster

That's it. One command. Verify it:

pvecm status

You should see a cluster with one node. The web UI (https://192.168.1.101:8006) now shows the cluster name.

Specifying the Cluster Network

If you have a dedicated cluster network interface, specify it during creation:

pvecm create homelab-cluster --link0 192.168.10.101

This binds Corosync to the dedicated interface. For redundancy, add a second link:

pvecm create homelab-cluster --link0 192.168.10.101 --link1 192.168.1.101

Dual links mean the cluster survives the failure of one network path.
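Once the cluster exists, corosync-cfgtool (a standard Corosync utility) reports the state of each link, which is an easy way to confirm both paths are actually connected:

# Show the status of every Corosync link on this node
corosync-cfgtool -s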

Joining Nodes

On pve2 and pve3, join the cluster by pointing at any existing cluster member:

# On pve2
pvecm add 192.168.1.101

# On pve3
pvecm add 192.168.1.101

You'll be prompted for the root password of the target node. After joining, verify:

pvecm status

All three nodes should appear in the membership list with quorum established. The web UI on any node now shows all three nodes and their VMs/containers.

If Joining Fails

Common issues:

The joining node isn't empty: a node must have no VMs or containers before it joins; conflicting guest IDs are not merged.
Hostname resolution: the new node's name resolves to 127.0.1.1, or doesn't resolve at all from the existing members.
Time skew: unsynchronized clocks between nodes.
Firewall: cluster traffic needs SSH (TCP 22), the API (TCP 8006), and Corosync (UDP 5405-5412) open between nodes.
Version mismatch: all nodes should run the same Proxmox VE major version.
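A couple of checks worth running on the joining node before retrying, using the example addresses above:

# The node must be empty: both commands should list nothing
qm list
pct list

# The existing member's API must be reachable
curl -sk https://192.168.1.101:8006 >/dev/null && echo "API reachable"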

Quorum

A cluster with three nodes requires at least two nodes to be online to have quorum (a majority). Without quorum, the cluster becomes read-only to prevent split-brain scenarios.
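pvecm shows the current vote count and quorum state, and can temporarily lower the number of expected votes if you knowingly take nodes down for maintenance (this weakens the split-brain protection, so use it sparingly):

# Show votes, quorum state, and membership
pvecm status

# Tell the cluster that one vote is enough for now (use with caution)
pvecm expected 1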

The Two-Node Problem

Two Proxmox nodes can form a cluster, but if either goes down, the survivor doesn't have quorum and HA won't function. Solutions:

  1. QDevice (Corosync QNet): Add a lightweight third vote from a Raspberry Pi or any Linux machine. It doesn't run Proxmox — it just provides the tie-breaking vote.
# On the QDevice host (any small Linux box)
sudo apt install corosync-qnetd

# On every Proxmox node
apt install corosync-qdevice

# On one Proxmox node, register the QDevice
pvecm qdevice setup 192.168.1.200
  2. Three nodes: Even a modest third node (a mini PC or old laptop running Proxmox) provides genuine three-way quorum.

For homelabs, the QDevice approach is popular because it doesn't require a third full server.

Shared Storage for Live Migration and HA

HA requires that VM disk images be accessible from all nodes simultaneously, and live migration is far simpler and faster with shared storage (without it, disks have to be copied during the move). A VM can only fail over to another node if that node can access the same disk.

Options for shared storage:

NFS

The simplest option. Export a directory from your NAS and add it as storage on all Proxmox nodes:

Datacenter > Storage > Add > NFS
  Server: 192.168.1.50
  Export: /mnt/pool/proxmox
  Content: Disk image, ISO image, Container template

NFS works well for homelab clusters. Performance is adequate for most workloads, and setup is trivial.
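The same storage can also be added from any node's shell with pvesm (the storage ID nas-nfs is just an example name):

# Add the NFS export as cluster-wide shared storage
pvesm add nfs nas-nfs --server 192.168.1.50 --export /mnt/pool/proxmox --content images,iso,vztmpl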

Ceph (Built Into Proxmox)

Proxmox includes Ceph integration. Each node contributes local disks to a distributed storage pool. No external NAS needed. This is the most "proper" solution but requires:

At least three nodes, so the Ceph monitors can form their own quorum
A fast and ideally dedicated network for Ceph traffic (10 GbE recommended)
One or more spare disks per node to dedicate as OSDs, preferably SSDs
Extra RAM and CPU headroom on each node for the Ceph daemons

For a three-node homelab cluster, Ceph with SSDs provides excellent performance and redundancy. Setup is done through the Proxmox web UI under Datacenter > Ceph.
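The equivalent CLI flow looks roughly like this; the pool name, network, and disk device are placeholders for this example:

# On each node: install the Ceph packages
pveceph install

# On the first node: initialize Ceph on the dedicated storage network
pveceph init --network 192.168.30.0/24

# On each node: create a monitor and turn a spare SSD into an OSD
pveceph mon create
pveceph osd create /dev/sdb

# On one node: create a replicated pool and register it as VM storage
pveceph pool create vmpool --add_storages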

iSCSI

Presents a block device from your NAS to all Proxmox nodes. Better raw performance than NFS for I/O-intensive VMs. More complex to set up but well-supported by Proxmox.
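A minimal pvesm example, assuming the NAS already exposes a target (the IQN and storage ID are placeholders); VM disks are then typically carved out of the LUN by layering an LVM storage on top:

# Add the iSCSI target as storage on the cluster
pvesm add iscsi nas-iscsi --portal 192.168.1.50 --target iqn.2005-10.org.freenas.ctl:proxmox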

ZFS over iSCSI

If your NAS runs ZFS, you can expose ZFS volumes as iSCSI targets. Proxmox has a dedicated storage plugin for this.

Enabling High Availability

With shared storage in place, enabling HA for a VM or container is straightforward.

Via the Web UI

  1. Select a VM or container
  2. Go to More > Manage HA
  3. Set the HA group and priority
  4. Choose the requested state (started, stopped, disabled)

Via the Command Line

# Add VM 100 to HA with max_restart of 3
ha-manager add vm:100 --state started --max_restart 3 --max_relocate 1

# Check HA status
ha-manager status

HA Groups

HA groups define which nodes a VM can run on and their priority:

# Create a group
ha-manager groupadd preferred-nodes --nodes pve1,pve2 --nofailback 0
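Node priorities go directly in the node list (higher number wins), and a VM is then attached to the group. The group name below is just an illustration:

# Prefer pve1 over pve2 for resources in this group
ha-manager groupadd prefer-pve1 --nodes "pve1:2,pve2:1"

# Attach VM 100 to the group
ha-manager set vm:100 --group prefer-pve1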

What Happens During a Node Failure

  1. Corosync detects the node is unreachable (after ~30 seconds of missed heartbeats)
  2. The HA manager on a surviving node takes over management responsibility
  3. The failed node is fenced (more on this below)
  4. HA-managed VMs from the failed node are restarted on surviving nodes
  5. Restart happens in priority order with configurable delays

Total failover time is typically 1-3 minutes, depending on fencing method and VM boot time.

Fencing

Fencing ensures that a failed node is truly stopped before its VMs are started elsewhere. Without fencing, you risk two copies of the same VM running simultaneously, which corrupts data.

Proxmox supports several fencing methods:

Hardware Watchdog (Recommended for Homelabs)

Most server hardware has an IPMI/iDRAC/iLO watchdog timer. If the node stops refreshing the watchdog, the hardware forces a reboot.

# Check if a hardware watchdog is available
ls /dev/watchdog*

# Proxmox uses the softdog module as a fallback

Proxmox configures the HA manager to use a watchdog by default. If the HA manager on a node loses cluster communication, the watchdog triggers a reboot after a timeout, ensuring the node doesn't continue running VMs that are being started elsewhere.
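To see which watchdog driver is actually in use, and to select a specific hardware watchdog instead of softdog, the module can be set in /etc/default/pve-ha-manager (iTCO_wdt below is an example; the right module depends on your hardware, and a reboot is needed to apply it):

# Check which watchdog driver is currently loaded
lsmod | grep -E 'softdog|iTCO_wdt|ipmi_watchdog'

# /etc/default/pve-ha-manager
WATCHDOG_MODULE=iTCO_wdt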

IPMI Fencing

For more reliable fencing, configure IPMI so surviving nodes can force-power-off the failed node:

# Test IPMI connectivity
ipmitool -I lanplus -H 192.168.1.201 -U admin -P password power status

Configure in /etc/pve/ha/fence.cfg:

device ipmi pve1 {
    cmd "ipmitool -I lanplus -H 192.168.1.201 -U admin -P password power off"
}

Live Migration

With shared storage, you can move running VMs between nodes with zero downtime:

Via the Web UI

Right-click a VM > Migrate > Select target node > Migrate

Via the Command Line

# Live migrate VM 100 to pve2
qm migrate 100 pve2 --online

Live migration copies the VM's RAM contents to the target node while it continues running, then switches over in the final milliseconds. The VM experiences a brief pause (typically under 100ms) during the switchover.

Requirements for live migration:

Shared storage for the VM's disks, or a local-disk migration that copies them during the move (see the example after this list)
A CPU type the target node can provide: either identical CPUs across nodes or a common virtual CPU type set on the VM
No host-specific hardware attached to the VM, such as PCIe or USB passthrough devices or ISOs on local-only storage
Enough free RAM on the target node to hold the VM
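If a VM's disks sit on node-local storage, qm migrate can still move it online by copying the disks as part of the migration, which takes correspondingly longer:

# Live migrate VM 100 including its local disks
qm migrate 100 pve2 --online --with-local-disks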

Cluster Network Best Practices

Separate cluster traffic from VM traffic. Corosync heartbeats are small but latency-sensitive. If your cluster network shares bandwidth with a large VM backup or migration, missed heartbeats can trigger false failovers.

Use a dedicated VLAN or physical NIC for Corosync. Even a separate 1 GbE link dedicated to cluster traffic is better than sharing a 10 GbE link with everything else.

Use link bonding for redundancy. A single network cable failure shouldn't partition your cluster. Bond two interfaces or use dual Corosync links.

Set up a dedicated migration network. Under Datacenter > Options > Migration Settings, specify a network for live migration traffic. This prevents large migrations from saturating your cluster or production network.
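The same setting lives in /etc/pve/datacenter.cfg; the subnet below is a placeholder for whichever network carries migration traffic:

# /etc/pve/datacenter.cfg
migration: secure,network=192.168.20.0/24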

Maintenance

Removing a Node

If you need to permanently remove a node:

  1. Migrate all VMs and containers off the node
  2. Remove any HA resources and group entries that reference it
  3. Shut the node down (it must not come back online with its old cluster configuration)
  4. On a remaining node, remove it from the cluster: pvecm delnode NODENAME

Updating the Cluster

Update nodes one at a time. Migrate VMs off a node, update it, reboot, verify it rejoins the cluster, then move to the next node. This rolling update approach keeps your services available throughout.
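A minimal per-node sequence, assuming its guests have already been migrated away:

# On the node being updated
apt update && apt full-upgrade -y
reboot

# After it comes back, confirm it rejoined
pvecm status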

A Proxmox cluster transforms your homelab from "a couple of servers" into genuine infrastructure. VMs survive hardware failures, maintenance doesn't require downtime, and you manage everything from a single interface. The setup is straightforward enough to complete in an afternoon, and the operational benefits are immediate.