Clustering & High Availability
Complete guide to Proxmox VE clustering, high availability, and failover configuration
Proxmox VE clustering provides centralized management, resource sharing, and high availability for your virtualized infrastructure. This guide covers cluster setup, configuration, and HA implementation.
Clustering Overview
Proxmox VE clusters enable centralized management of multiple nodes with shared configuration, live migration, and high availability features.
Cluster Benefits
- Centralized management of all nodes from a single web interface
- Shared configuration via the cluster filesystem (pmxcfs)
- Live migration of VMs and containers between nodes
- High availability with automatic failover of managed guests
Cluster Requirements
- Minimum 3 nodes for proper quorum
- Reliable network with low latency (less than 5ms recommended)
- Shared storage for VM/CT migration
- Time synchronization (NTP) across all nodes
- Identical Proxmox VE versions on all nodes (quick verification commands below)
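A quick way to verify the latency, time-sync, and version requirements from any node (pve-node2 and pve-node3 are placeholder hostnames):
# Round-trip latency to the other cluster nodes (should be well under 5 ms)
for node in pve-node2 pve-node3; do ping -c 3 "$node" | tail -1; done
# Time synchronization status
timedatectl | grep -E 'synchronized|NTP'
# Proxmox VE package versions - should match on every node
pveversion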
Cluster Setup
Network Planning
Plan your cluster network carefully. Changes to cluster network configuration after setup can be complex and disruptive.
Simple Setup (not recommended for production):
# All traffic on single network
auto vmbr0
iface vmbr0 inet static
address 192.168.1.100/24
gateway 192.168.1.1
bridge-ports eth0
bridge-stp off
bridge-fd 0
Recommended Setup:
# Management network
auto vmbr0
iface vmbr0 inet static
address 192.168.1.100/24
gateway 192.168.1.1
bridge-ports eth0
bridge-stp off
bridge-fd 0
# Dedicated cluster network
auto eth1
iface eth1 inet static
address 10.0.0.100/24
# No gateway - cluster communication only
Production Setup with Redundancy:
# Bonded management network
auto bond0
iface bond0 inet manual
bond-slaves eth0 eth1
bond-miimon 100
bond-mode active-backup
auto vmbr0
iface vmbr0 inet static
address 192.168.1.100/24
gateway 192.168.1.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
# Dedicated cluster network with redundancy
auto bond1
iface bond1 inet static
address 10.0.0.100/24
bond-slaves eth2 eth3
bond-miimon 100
bond-mode active-backup
Creating a Cluster
On the first node:
- Datacenter → Cluster → Create Cluster
- Configure cluster settings:
- Cluster Name: Choose descriptive name
- Cluster Network: Select network for cluster communication
- Link 0: Primary cluster network
- Link 1: Optional secondary network for redundancy
On additional nodes:
- Datacenter → Cluster → Join Cluster
- Enter cluster information:
- Information: Cluster join information from first node
- Password: Root password of existing cluster node
- Fingerprint: Verify cluster certificate fingerprint
Initialize cluster on first node:
# Create cluster
pvecm create production-cluster
# Optional: Bind the primary cluster link (link0) to a specific address
pvecm create production-cluster --link0 10.0.0.100
# Get cluster information for joining
pvecm status
Join additional nodes:
# Join cluster from additional nodes
pvecm add 10.0.0.100
# Verify cluster membership
pvecm status
pvecm nodes
Cluster Configuration
Corosync Configuration
# Edit /etc/pve/corosync.conf (increment config_version in the totem
# section on every change so corosync picks up the new configuration)
totem {
version: 2
config_version: 3
cluster_name: production-cluster
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
}
nodelist {
node {
ring0_addr: 10.0.0.100
name: pve-node1
nodeid: 1
}
node {
ring0_addr: 10.0.0.101
name: pve-node2
nodeid: 2
}
node {
ring0_addr: 10.0.0.102
name: pve-node3
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
two_node: 0
}
logging {
to_logfile: yes
logfile: /var/log/corosync/corosync.log
to_syslog: yes
}
Cluster Network Redundancy
# Configure redundant cluster links (knet transport)
totem {
version: 2
cluster_name: production-cluster
transport: knet
# Two links for redundancy; per-node addresses go in the nodelist below
interface {
linknumber: 0
knet_link_priority: 1
}
interface {
linknumber: 1
knet_link_priority: 0
}
}
# Node configuration with multiple links
nodelist {
node {
ring0_addr: 10.0.0.100
ring1_addr: 10.0.1.100
name: pve-node1
nodeid: 1
}
node {
ring0_addr: 10.0.0.101
ring1_addr: 10.0.1.101
name: pve-node2
nodeid: 2
}
}
# Test cluster communication
corosync-cfgtool -s
# Check cluster membership
corosync-cmapctl | grep members
# Monitor cluster status
watch pvecm status
# Test link failover
# Disconnect the primary cluster network and verify the secondary link takes over
Shared Storage Configuration
Storage Requirements for Clustering
Shared storage is essential for live migration and high availability. All cluster nodes must have access to the same storage.
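One way to confirm that a storage is defined as shared and reachable from a node (the storage name shared-storage is an example):
# Storage definitions live in the cluster-wide configuration
grep -A 5 'shared-storage' /etc/pve/storage.cfg
# Status as seen from the local node (repeat on each node)
pvesm status --storage shared-storage
# Content visible through the storage
pvesm list shared-storage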
NFS Shared Storage
On NFS server:
# Install NFS server
apt update && apt install nfs-kernel-server
# Create export directories
mkdir -p /srv/nfs/proxmox/{images,backup,iso,templates}
# Configure exports (/etc/exports)
/srv/nfs/proxmox 10.0.0.0/24(rw,sync,no_subtree_check,no_root_squash)
# Apply configuration
exportfs -ra
systemctl restart nfs-kernel-server
systemctl enable nfs-kernel-server
# Verify exports
showmount -e localhost
Add NFS storage to cluster:
# Via command line
pvesm add nfs shared-storage \
--server 192.168.1.200 \
--export /srv/nfs/proxmox \
--content images,iso,vztmpl,backup \
--options vers=3,hard
# Verify storage
pvesm status --storage shared-storage
Via Web Interface:
- Datacenter → Storage → Add → NFS
- Configure NFS settings and content types
# Optimize NFS mount options
pvesm set shared-storage --options vers=3,hard,rsize=32768,wsize=32768
# Client-side tuning
echo 'net.core.rmem_default = 262144' >> /etc/sysctl.conf
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_default = 262144' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
sysctl -p
Ceph Distributed Storage
# Install Ceph on all nodes
pveceph install --version quincy
# Initialize Ceph cluster (on first node)
pveceph init --network 10.0.0.0/24
# Create monitors (on each node)
pveceph mon create
# Create manager (on each node)
pveceph mgr create
# Check cluster status
ceph status
# Create OSD (on each node with storage)
pveceph osd create /dev/sdb
# Create storage pools
pveceph pool create vm-pool --size 3 --min_size 2
pveceph pool create backup-pool --size 2 --min_size 1
# Verify pool creation
ceph osd pool ls
ceph df
# Add Ceph RBD storage to Proxmox
pvesm add rbd ceph-vm --pool vm-pool --content images,rootdir
pvesm add rbd ceph-backup --pool backup-pool --content backup
# Verify Ceph storage
pvesm status
ceph health
High Availability Configuration
HA Manager Overview
Proxmox HA Manager monitors services and automatically restarts or relocates them in case of node failures.
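The HA stack is driven by a cluster resource manager (pve-ha-crm) and a local resource manager (pve-ha-lrm) on each node; their state can be checked before adding resources:
# Current CRM master, node LRM states, and managed services
ha-manager status
# Underlying HA services on the local node
systemctl status pve-ha-crm pve-ha-lrm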
HA Groups Configuration
Create HA Groups:
- Datacenter → HA → Groups → Add
- Configure group settings:
- ID: Group identifier
- Nodes: Node priorities (higher number = higher priority)
- Restricted: Limit to specific nodes
- No Failback: Prevent automatic failback
Example Configuration:
- Group: production
- Nodes: node1:3,node2:2,node3:1
- Restricted: Yes (production workloads only)
# Create HA groups
ha-manager groupadd production --nodes "node1:3,node2:2,node3:1" --restricted
ha-manager groupadd development --nodes "node2:2,node3:2" --nofailback
# List HA groups
ha-manager groupconfig
# Complex group configuration
ha-manager groupadd critical \
--nodes "node1:100,node2:90,node3:80" \
--restricted \
--comment "Critical production services"
# Group with specific constraints
ha-manager groupadd gpu-nodes \
--nodes "gpu-node1:2,gpu-node2:1" \
--restricted \
--comment "GPU-enabled nodes only"HA Resource Configuration
Add VMs to HA:
# Add VM to HA with basic settings
ha-manager add vm:100 --state started --group production
# Advanced HA configuration
ha-manager add vm:101 \
--state started \
--group production \
--max_restart 3 \
--max_relocate 1 \
--comment "Critical database server"
# Adjust the restart policy of an existing HA resource
ha-manager set vm:100 --state started --max_restart 2
Via Web Interface:
- Datacenter → HA → Resources → Add
- Configure HA resource settings
# Add container to HA
ha-manager add ct:200 --state started --group development
# Container with specific requirements
ha-manager add ct:201 \
--state started \
--group production \
--max_restart 5 \
--comment "Web application container"# Check HA status
ha-manager status
# Monitor HA resources
ha-manager config
# View HA logs
journalctl -u pve-ha-lrm
journalctl -u pve-ha-crm
Fencing Configuration
Proper fencing is crucial for preventing split-brain scenarios and ensuring data integrity in HA clusters.
# Proxmox VE HA relies on watchdog-based self-fencing; no fence device
# configuration is required for a standard setup.
# Check which watchdog driver is active (softdog is the generic fallback)
lsmod | grep -E 'softdog|ipmi_watchdog|iTCO_wdt'
# To use a hardware watchdog instead of softdog, set WATCHDOG_MODULE in
# /etc/default/pve-ha-manager and reboot the node
grep WATCHDOG_MODULE /etc/default/pve-ha-manager
# Optional: verify out-of-band (IPMI) access to a node's BMC for manual
# power control during recovery (fence_ipmilan is part of fence-agents)
apt install fence-agents
fence_ipmilan -a 192.168.100.101 -l admin -p secret -o status
# Hardware fence devices (declared in /etc/pve/ha/fence.cfg and enabled via
# the fencing option in /etc/pve/datacenter.cfg) are still considered
# experimental; watchdog fencing is the supported default.
# Custom power-control helper script (not invoked by the HA stack)
#!/bin/bash
# /usr/local/bin/custom-fence.sh
NODE=$1
ACTION=$2
case $ACTION in
"off")
# Custom power-off logic
ssh root@$NODE "shutdown -h now"
;;
"on")
# Custom power-on logic
wakeonlan 00:11:22:33:44:55
;;
"status")
# Check node status
ping -c 1 $NODE >/dev/null 2>&1
;;
esac
Live Migration
Migration Requirements
- Source and target node must be members of the same quorate cluster
- VM disks on shared storage, or migrated along with --with-local-disks
- A CPU type available on both nodes (avoid host unless the CPUs are identical; see the check below)
- Sufficient memory and network bandwidth on the target node
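A quick pre-flight comparison of CPU models between two nodes can look like this (node names are placeholders); if they differ, give the VM a common baseline CPU type instead of host:
# Compare CPU models across nodes before live migration
for node in pve-node1 pve-node2; do
echo "== $node =="
ssh "$node" "grep -m1 'model name' /proc/cpuinfo"
done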
Migration Configuration
# Configure migration settings in /etc/pve/datacenter.cfg (one line per key)
echo 'migration: secure,network=10.0.0.0/24' >> /etc/pve/datacenter.cfg
# Alternatively allow insecure (unencrypted) migration - faster, less secure:
#   migration: insecure,network=10.0.0.0/24
# Set bandwidth limit in KiB/s
echo 'bwlimit: migration=100000' >> /etc/pve/datacenter.cfg
Via Web Interface:
- Datacenter → Options → Migration
- Configure migration type and network
# Dedicated migration network
auto eth2
iface eth2 inet static
address 10.0.2.100/24
# Migration traffic only
# Point migration at the dedicated network (edit the existing migration
# line in /etc/pve/datacenter.cfg rather than appending a duplicate):
#   migration: secure,network=10.0.2.0/24
# Optimize migration performance
# Combine bandwidth limits into a single bwlimit line (values in KiB/s)
echo 'bwlimit: migration=1000000,clone=500000,default=100000' >> /etc/pve/datacenter.cfg
# For live migration, give VMs a CPU type that every node can provide;
# the host type only migrates reliably between identical CPUs
Performing Migrations
# Live migrate VM
qm migrate 100 node2 --online
# Live migrate with specific options
qm migrate 100 node2 --online --with-local-disks --targetstorage shared-storage
# Migrate container
pct migrate 200 node2 --restart   # containers use restart mode, not live migration
Via Web Interface:
- Right-click VM/CT → Migrate
- Select target node and options
# Offline migration (VM stopped)
qm migrate 100 node2
# Migrate with storage
qm migrate 100 node2 --with-local-disks --targetstorage local-lvm
# Move VM disk to different storage
qm move-disk 100 scsi0 new-storage
# Move with format conversion
qm move-disk 100 scsi0 new-storage --format qcow2
Cluster Maintenance
Node Maintenance
# Prepare a node for maintenance
# Note: a 3-node cluster keeps quorum with one node down; only lower the
# expected votes (pvecm expected 2) if quorum would otherwise be lost
# Migrate all VMs/CTs from node
for vm in $(qm list | awk 'NR>1 {print $1}'); do
qm migrate $vm node2 --online
done
# Shutdown node safely
shutdown -h now
# Automated node evacuation script
#!/bin/bash
# Run this on the node being evacuated: qm/pct list only show local guests
TARGET_NODE="node2"
# Migrate all local VMs
qm list | awk 'NR>1 {print $1}' | while read -r vmid; do
echo "Migrating VM $vmid to $TARGET_NODE"
qm migrate "$vmid" "$TARGET_NODE" --online
done
# Migrate all local containers (running containers use restart mode)
pct list | awk 'NR>1 {print $1}' | while read -r ctid; do
echo "Migrating CT $ctid to $TARGET_NODE"
pct migrate "$ctid" "$TARGET_NODE" --restart
done
# Update cluster nodes one by one
# 1. Migrate VMs/CTs away from node
# 2. Update node packages
apt update && apt upgrade
# 3. Reboot if kernel updated
reboot
# 4. Verify node rejoins cluster
pvecm status
# 5. Repeat for the next node
Backup and Recovery
# Backup cluster configuration
tar -czf cluster-backup-$(date +%Y%m%d).tar.gz /etc/pve/
# Backup individual node configuration
tar -czf node-backup-$(date +%Y%m%d).tar.gz \
/etc/pve/nodes/$(hostname)/ \
/etc/network/interfaces \
/etc/hosts
# Restore cluster configuration if needed (use with care: /etc/pve is the
# mounted pmxcfs cluster filesystem, so a full restore normally means
# recovering the configuration database, not just extracting files)
tar -xzf cluster-backup-20240209.tar.gz -C /
systemctl restart pve-cluster
Troubleshooting
Common Cluster Issues
Always backup cluster configuration before attempting major troubleshooting steps.
# Check cluster status
pvecm status
corosync-quorumtool -s
# Temporary quorum fix (emergency only)
pvecm expected 1
# Permanent fix: add more nodes or configure QDevice
# Install corosync-qnetd on the external QDevice host and
# corosync-qdevice on every cluster node
apt install corosync-qdevice
# Configure the QDevice from a cluster node (IP of the external host)
pvecm qdevice setup 192.168.1.250
# Test cluster communication
corosync-cfgtool -s
# Check cluster membership
corosync-cmapctl | grep members
# Monitor cluster traffic
tcpdump -i eth1 udp portrange 5405-5412
# Restart cluster services
systemctl restart corosync
systemctl restart pve-cluster
# Identify split-brain condition
pvecm status # Check on all nodes
# Stop cluster services on minority partition
systemctl stop pve-cluster
systemctl stop corosync
# On majority partition, update expected votes
pvecm expected 2
# Rejoin minority nodes
systemctl start corosync
systemctl start pve-cluster
# Verify cluster recovery
pvecm status
Performance Monitoring
#!/bin/bash
# cluster-monitor.sh - simple cluster performance monitor
while true; do
echo "=== Cluster Status $(date) ==="
pvecm status
echo "=== Resource Usage ==="
for node in node1 node2 node3; do
echo "Node: $node"
ssh $node "uptime; free -h | head -2; df -h / | tail -1"
echo
done
sleep 60
done
Best Practices
Cluster Design
- Odd number of nodes: Prevents split-brain scenarios
- Dedicated cluster network: Isolate cluster traffic
- Redundant networking: Multiple network paths
- Shared storage: Essential for migration and HA
- Time synchronization: NTP on all nodes
- Regular backups: Cluster configuration and data (see the cron sketch below)
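As a sketch of the regular-backup item, a simple cron entry (path and schedule are examples) can archive the cluster configuration daily:
# /etc/cron.d/pve-config-backup (example)
0 2 * * * root tar -czf /root/pve-config-$(date +\%Y\%m\%d).tar.gz /etc/pve/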
Security Considerations
- Network isolation: Separate cluster and management networks
- Firewall rules: Restrict cluster communication ports (see the sketch after this list)
- Certificate management: Regular certificate updates
- Access control: Limit cluster administration access
- Monitoring: Continuous cluster health monitoring
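As a minimal sketch of the firewall item above, assuming a dedicated cluster network of 10.0.0.0/24 and a management network of 192.168.1.0/24, plain iptables rules could look like the following (the Proxmox firewall, managed via /etc/pve/firewall/cluster.fw, is the cluster-wide way to achieve the same):
# Corosync (knet) cluster traffic uses UDP ports 5405-5412
iptables -A INPUT -s 10.0.0.0/24 -p udp --dport 5405:5412 -j ACCEPT
# Web UI (8006) and SSH (22) only from the management network
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 8006 -j ACCEPT
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 22 -j ACCEPT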
Operational Procedures
- Change management: Document all cluster changes
- Testing procedures: Regular failover testing
- Monitoring alerts: Automated cluster health alerts
- Maintenance windows: Scheduled maintenance procedures
- Recovery procedures: Documented disaster recovery plans
Proper cluster planning and maintenance ensure reliable, high-performance virtualization infrastructure with minimal downtime.