Clustering & High Availability
Complete guide to Proxmox VE clustering, high availability, and failover configuration
Proxmox VE clustering provides centralized management, resource sharing, and high availability for your virtualized infrastructure. This guide covers cluster setup, configuration, and HA implementation.
Clustering Overview
Proxmox VE clusters enable centralized management of multiple nodes with shared configuration, live migration, and high availability features.
Cluster Benefits
- Centralized management of all nodes from a single web interface
- Shared configuration via the cluster filesystem (pmxcfs)
- Live migration of VMs and containers between nodes
- High availability with automatic failover of managed guests
Cluster Requirements
- Minimum 3 nodes for proper quorum
- Reliable network with low latency (less than 5ms recommended)
- Shared storage for VM/CT migration
- Time synchronization (NTP) across all nodes
- Identical Proxmox VE versions on all nodes (quick verification commands below)
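A quick way to verify the latency, time-sync, and version requirements from any node (pve-node2 and pve-node3 are placeholder hostnames):
# Round-trip latency to the other cluster nodes (should be well under 5 ms)
for node in pve-node2 pve-node3; do ping -c 3 "$node" | tail -1; done
# Time synchronization status
timedatectl | grep -E 'synchronized|NTP'
# Proxmox VE package versions - should match on every node
pveversion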
Cluster Setup
Network Planning
Plan your cluster network carefully. Changes to cluster network configuration after setup can be complex and disruptive.
Simple Setup (not recommended for production):
# All traffic on single network
auto vmbr0
iface vmbr0 inet static
address 192.168.1.100/24
gateway 192.168.1.1
bridge-ports eth0
bridge-stp off
bridge-fd 0
Recommended Setup:
# Management network
auto vmbr0
iface vmbr0 inet static
address 192.168.1.100/24
gateway 192.168.1.1
bridge-ports eth0
bridge-stp off
bridge-fd 0
# Dedicated cluster network
auto eth1
iface eth1 inet static
address 10.0.0.100/24
# No gateway - cluster communication only
Production Setup with Redundancy:
# Bonded management network
auto bond0
iface bond0 inet manual
bond-slaves eth0 eth1
bond-miimon 100
bond-mode active-backup
auto vmbr0
iface vmbr0 inet static
address 192.168.1.100/24
gateway 192.168.1.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
# Dedicated cluster network with redundancy
auto bond1
iface bond1 inet static
address 10.0.0.100/24
bond-slaves eth2 eth3
bond-miimon 100
bond-mode active-backup
Creating a Cluster
On the first node:
- Datacenter → Cluster → Create Cluster
- Configure cluster settings:
- Cluster Name: Choose descriptive name
- Cluster Network: Select network for cluster communication
- Link 0: Primary cluster network
- Link 1: Optional secondary network for redundancy
On additional nodes:
- Datacenter → Cluster → Join Cluster
- Enter cluster information:
- Information: Cluster join information from first node
- Password: Root password of existing cluster node
- Fingerprint: Verify cluster certificate fingerprint
Initialize cluster on first node:
# Create cluster
pvecm create production-cluster
# Optional: Bind the primary cluster link (link0) to a specific address
pvecm create production-cluster --link0 10.0.0.100
# Get cluster information for joining
pvecm status
Join additional nodes:
# Join cluster from additional nodes
pvecm add 10.0.0.100
# Verify cluster membership
pvecm status
pvecm nodes
Cluster Configuration
Corosync Configuration
# Edit /etc/pve/corosync.conf (increment config_version in the totem
# section on every change so corosync picks up the new configuration)
totem {
version: 2
config_version: 3
cluster_name: production-cluster
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
}
nodelist {
node {
ring0_addr: 10.0.0.100
name: pve-node1
nodeid: 1
}
node {
ring0_addr: 10.0.0.101
name: pve-node2
nodeid: 2
}
node {
ring0_addr: 10.0.0.102
name: pve-node3
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
two_node: 0
}
logging {
to_logfile: yes
logfile: /var/log/corosync/corosync.log
to_syslog: yes
}
Cluster Network Redundancy
# Configure redundant cluster links (knet transport)
totem {
version: 2
cluster_name: production-cluster
transport: knet
# Two links for redundancy; per-node addresses go in the nodelist below
interface {
linknumber: 0
knet_link_priority: 1
}
interface {
linknumber: 1
knet_link_priority: 0
}
}
# Node configuration with multiple links
nodelist {
node {
ring0_addr: 10.0.0.100
ring1_addr: 10.0.1.100
name: pve-node1
nodeid: 1
}
node {
ring0_addr: 10.0.0.101
ring1_addr: 10.0.1.101
name: pve-node2
nodeid: 2
}
}
# Test cluster communication
corosync-cfgtool -s
# Check cluster membership
corosync-cmapctl | grep members
# Monitor cluster status
watch pvecm status
# Test link failover
# Disconnect the primary cluster network and verify the secondary link takes over
Shared Storage Configuration
Storage Requirements for Clustering
Shared storage is essential for live migration and high availability. All cluster nodes must have access to the same storage.
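One way to confirm that a storage is defined as shared and reachable from a node (the storage name shared-storage is an example):
# Storage definitions live in the cluster-wide configuration
grep -A 5 'shared-storage' /etc/pve/storage.cfg
# Status as seen from the local node (repeat on each node)
pvesm status --storage shared-storage
# Content visible through the storage
pvesm list shared-storage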
NFS Shared Storage
On NFS server:
# Install NFS server
apt update && apt install nfs-kernel-server
# Create export directories
mkdir -p /srv/nfs/proxmox/{images,backup,iso,templates}
# Configure exports (/etc/exports)
/srv/nfs/proxmox 10.0.0.0/24(rw,sync,no_subtree_check,no_root_squash)
# Apply configuration
exportfs -ra
systemctl restart nfs-kernel-server
systemctl enable nfs-kernel-server
# Verify exports
showmount -e localhost
Add NFS storage to cluster:
# Via command line
pvesm add nfs shared-storage \
--server 192.168.1.200 \
--export /srv/nfs/proxmox \
--content images,iso,vztmpl,backup \
--options vers=3,hard
# Verify storage
pvesm status --storage shared-storage
Via Web Interface:
- Datacenter → Storage → Add → NFS
- Configure NFS settings and content types
# Optimize NFS mount options
pvesm set shared-storage --options vers=3,hard,rsize=32768,wsize=32768
# Client-side tuning
echo 'net.core.rmem_default = 262144' >> /etc/sysctl.conf
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_default = 262144' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
sysctl -p
Ceph Distributed Storage
# Install Ceph on all nodes
pveceph install --version quincy
# Initialize Ceph cluster (on first node)
pveceph init --network 10.0.0.0/24
# Create monitors (on each node)
pveceph mon create
# Create manager (on each node)
pveceph mgr create
# Check cluster status
ceph status
# Create OSD (on each node with storage)
pveceph osd create /dev/sdb
# Create storage pools
pveceph pool create vm-pool --size 3 --min_size 2
pveceph pool create backup-pool --size 2 --min_size 1
# Verify pool creation
ceph osd pool ls
ceph df
# Add Ceph RBD storage to Proxmox
pvesm add rbd ceph-vm --pool vm-pool --content images,rootdir
pvesm add rbd ceph-backup --pool backup-pool --content backup
# Verify Ceph storage
pvesm status
ceph health
High Availability Configuration
HA Manager Overview
Proxmox HA Manager monitors services and automatically restarts or relocates them in case of node failures.
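The HA stack is driven by a cluster resource manager (pve-ha-crm) and a local resource manager (pve-ha-lrm) on each node; their state can be checked before adding resources:
# Current CRM master, node LRM states, and managed services
ha-manager status
# Underlying HA services on the local node
systemctl status pve-ha-crm pve-ha-lrm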
HA Groups Configuration
Create HA Groups:
- Datacenter → HA → Groups → Add
- Configure group settings:
- ID: Group identifier
- Nodes: Node priorities (higher number = higher priority)
- Restricted: Limit to specific nodes
- No Failback: Prevent automatic failback
Example Configuration:
- Group: production
- Nodes: node1:3,node2:2,node3:1
- Restricted: Yes (production workloads only)
# Create HA groups
ha-manager groupadd production --nodes "node1:3,node2:2,node3:1" --restricted
ha-manager groupadd development --nodes "node2:2,node3:2" --nofailback
# List HA groups
ha-manager groupconfig
# Complex group configuration
ha-manager groupadd critical \
--nodes "node1:100,node2:90,node3:80" \
--restricted \
--comment "Critical production services"
# Group with specific constraints
ha-manager groupadd gpu-nodes \
--nodes "gpu-node1:2,gpu-node2:1" \
--restricted \
--comment "GPU-enabled nodes only"HA Resource Configuration
Add VMs to HA:
# Add VM to HA with basic settings
ha-manager add vm:100 --state started --group production
# Advanced HA configuration
ha-manager add vm:101 \
--state started \
--group production \
--max_restart 3 \
--max_relocate 1 \
--comment "Critical database server"
# Adjust the restart policy of an existing HA resource
ha-manager set vm:100 --state started --max_restart 2
Via Web Interface:
- Datacenter → HA → Resources → Add
- Configure HA resource settings
# Add container to HA
ha-manager add ct:200 --state started --group development
# Container with specific requirements
ha-manager add ct:201 \
--state started \
--group production \
--max_restart 5 \
--comment "Web application container"# Check HA status
ha-manager status
# Monitor HA resources
ha-manager config
# View HA logs
journalctl -u pve-ha-lrm
journalctl -u pve-ha-crm
Fencing Configuration
Proper fencing is crucial for preventing split-brain scenarios and ensuring data integrity in HA clusters.
# Proxmox VE HA relies on watchdog-based self-fencing; no fence device
# configuration is required for a standard setup.
# Check which watchdog driver is active (softdog is the generic fallback)
lsmod | grep -E 'softdog|ipmi_watchdog|iTCO_wdt'
# To use a hardware watchdog instead of softdog, set WATCHDOG_MODULE in
# /etc/default/pve-ha-manager and reboot the node
grep WATCHDOG_MODULE /etc/default/pve-ha-manager
# Optional: verify out-of-band (IPMI) access to a node's BMC for manual
# power control during recovery (fence_ipmilan is part of fence-agents)
apt install fence-agents
fence_ipmilan -a 192.168.100.101 -l admin -p secret -o status
# Hardware fence devices (declared in /etc/pve/ha/fence.cfg and enabled via
# the fencing option in /etc/pve/datacenter.cfg) are still considered
# experimental; watchdog fencing is the supported default.
# Custom power-control helper script (not invoked by the HA stack)
#!/bin/bash
# /usr/local/bin/custom-fence.sh
NODE=$1
ACTION=$2
case $ACTION in
"off")
# Custom power-off logic
ssh root@$NODE "shutdown -h now"
;;
"on")
# Custom power-on logic
wakeonlan 00:11:22:33:44:55
;;
"status")
# Check node status
ping -c 1 $NODE >/dev/null 2>&1
;;
esac
Live Migration
Migration Requirements
- Source and target node must be members of the same quorate cluster
- VM disks on shared storage, or migrated along with --with-local-disks
- A CPU type available on both nodes (avoid host unless the CPUs are identical; see the check below)
- Sufficient memory and network bandwidth on the target node
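A quick pre-flight comparison of CPU models between two nodes can look like this (node names are placeholders); if they differ, give the VM a common baseline CPU type instead of host:
# Compare CPU models across nodes before live migration
for node in pve-node1 pve-node2; do
echo "== $node =="
ssh "$node" "grep -m1 'model name' /proc/cpuinfo"
done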
Migration Configuration
# Configure migration settings in /etc/pve/datacenter.cfg (one line per key)
echo 'migration: secure,network=10.0.0.0/24' >> /etc/pve/datacenter.cfg
# Alternatively allow insecure (unencrypted) migration - faster, less secure:
#   migration: insecure,network=10.0.0.0/24
# Set bandwidth limit in KiB/s
echo 'bwlimit: migration=100000' >> /etc/pve/datacenter.cfg
Via Web Interface:
- Datacenter → Options → Migration
- Configure migration type and network
# Dedicated migration network
auto eth2
iface eth2 inet static
address 10.0.2.100/24
# Migration traffic only
# Point migration at the dedicated network (edit the existing migration
# line in /etc/pve/datacenter.cfg rather than appending a duplicate):
#   migration: secure,network=10.0.2.0/24
# Optimize migration performance
# Combine bandwidth limits into a single bwlimit line (values in KiB/s)
echo 'bwlimit: migration=1000000,clone=500000,default=100000' >> /etc/pve/datacenter.cfg
# For live migration, give VMs a CPU type that every node can provide;
# the host type only migrates reliably between identical CPUs
Performing Migrations
# Live migrate VM
qm migrate 100 node2 --online
# Live migrate with specific options
qm migrate 100 node2 --online --with-local-disks --targetstorage shared-storage
# Migrate container
pct migrate 200 node2 --restart   # containers use restart mode, not live migration
Via Web Interface:
- Right-click VM/CT → Migrate
- Select target node and options
# Offline migration (VM stopped)
qm migrate 100 node2
# Migrate with storage
qm migrate 100 node2 --with-local-disks --targetstorage local-lvm
# Move VM disk to different storage
qm move-disk 100 scsi0 new-storage
# Move with format conversion
qm move-disk 100 scsi0 new-storage --format qcow2
Cluster Maintenance
Node Maintenance
# Prepare a node for maintenance
# Note: a 3-node cluster keeps quorum with one node down; only lower the
# expected votes (pvecm expected 2) if quorum would otherwise be lost
# Migrate all VMs/CTs from node
for vm in $(qm list | awk 'NR>1 {print $1}'); do
qm migrate $vm node2 --online
done
# Shutdown node safely
shutdown -h now
# Automated node evacuation script
#!/bin/bash
# Run this on the node being evacuated: qm/pct list only show local guests
TARGET_NODE="node2"
# Migrate all local VMs
qm list | awk 'NR>1 {print $1}' | while read -r vmid; do
echo "Migrating VM $vmid to $TARGET_NODE"
qm migrate "$vmid" "$TARGET_NODE" --online
done
# Migrate all local containers (running containers use restart mode)
pct list | awk 'NR>1 {print $1}' | while read -r ctid; do
echo "Migrating CT $ctid to $TARGET_NODE"
pct migrate "$ctid" "$TARGET_NODE" --restart
done
# Update cluster nodes one by one
# 1. Migrate VMs/CTs away from node
# 2. Update node packages
apt update && apt upgrade
# 3. Reboot if kernel updated
reboot
# 4. Verify node rejoins cluster
pvecm status
# 5. Repeat for the next node
Backup and Recovery
# Backup cluster configuration
tar -czf cluster-backup-$(date +%Y%m%d).tar.gz /etc/pve/
# Backup individual node configuration
tar -czf node-backup-$(date +%Y%m%d).tar.gz \
/etc/pve/nodes/$(hostname)/ \
/etc/network/interfaces \
/etc/hosts
# Restore cluster configuration if needed (use with care: /etc/pve is the
# mounted pmxcfs cluster filesystem, so a full restore normally means
# recovering the configuration database, not just extracting files)
tar -xzf cluster-backup-20240209.tar.gz -C /
systemctl restart pve-cluster
Troubleshooting
Common Cluster Issues
Always backup cluster configuration before attempting major troubleshooting steps.
# Check cluster status
pvecm status
corosync-quorumtool -s
# Temporary quorum fix (emergency only)
pvecm expected 1
# Permanent fix: add more nodes or configure QDevice
# Install corosync-qnetd on the external QDevice host and
# corosync-qdevice on every cluster node
apt install corosync-qdevice
# Configure the QDevice from a cluster node (IP of the external host)
pvecm qdevice setup 192.168.1.250
# Test cluster communication
corosync-cfgtool -s
# Check cluster membership
corosync-cmapctl | grep members
# Monitor cluster traffic
tcpdump -i eth1 udp portrange 5405-5412
# Restart cluster services
systemctl restart corosync
systemctl restart pve-cluster
# Identify split-brain condition
pvecm status # Check on all nodes
# Stop cluster services on minority partition
systemctl stop pve-cluster
systemctl stop corosync
# On majority partition, update expected votes
pvecm expected 2
# Rejoin minority nodes
systemctl start corosync
systemctl start pve-cluster
# Verify cluster recovery
pvecm status
Performance Monitoring
#!/bin/bash
# cluster-monitor.sh - simple cluster performance monitor
while true; do
echo "=== Cluster Status $(date) ==="
pvecm status
echo "=== Resource Usage ==="
for node in node1 node2 node3; do
echo "Node: $node"
ssh $node "uptime; free -h | head -2; df -h / | tail -1"
echo
done
sleep 60
done
Best Practices
Cluster Design
- Odd number of nodes: Prevents split-brain scenarios
- Dedicated cluster network: Isolate cluster traffic
- Redundant networking: Multiple network paths
- Shared storage: Essential for migration and HA
- Time synchronization: NTP on all nodes
- Regular backups: Cluster configuration and data (see the cron sketch below)
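As a sketch of the regular-backup item, a simple cron entry (path and schedule are examples) can archive the cluster configuration daily:
# /etc/cron.d/pve-config-backup (example)
0 2 * * * root tar -czf /root/pve-config-$(date +\%Y\%m\%d).tar.gz /etc/pve/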
Security Considerations
- Network isolation: Separate cluster and management networks
- Firewall rules: Restrict cluster communication ports (see the sketch after this list)
- Certificate management: Regular certificate updates
- Access control: Limit cluster administration access
- Monitoring: Continuous cluster health monitoring
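As a minimal sketch of the firewall item above, assuming a dedicated cluster network of 10.0.0.0/24 and a management network of 192.168.1.0/24, plain iptables rules could look like the following (the Proxmox firewall, managed via /etc/pve/firewall/cluster.fw, is the cluster-wide way to achieve the same):
# Corosync (knet) cluster traffic uses UDP ports 5405-5412
iptables -A INPUT -s 10.0.0.0/24 -p udp --dport 5405:5412 -j ACCEPT
# Web UI (8006) and SSH (22) only from the management network
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 8006 -j ACCEPT
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 22 -j ACCEPT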
Operational Procedures
- Change management: Document all cluster changes
- Testing procedures: Regular failover testing
- Monitoring alerts: Automated cluster health alerts
- Maintenance windows: Scheduled maintenance procedures
- Recovery procedures: Documented disaster recovery plans
Proper cluster planning and maintenance ensure reliable, high-performance virtualization infrastructure with minimal downtime.