Tag: cluster

Reinstall Proxmox VE node in cluster

After node pve01 in the Proxmox VE cluster crashed, a new pve01 was reinstalled on the same hardware.

Install PVE using ISO

This just follows the normal installation steps.

Trial and error

After many attempts, the following steps ended up working to add the replacement node.

  • On any existing node (not the replaced node itself), run the following to delete the old node from the cluster
pvecm delnode <old_node>
  • Remove the old node's known-host entries from all other nodes
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "<old_node>"
ssh-keygen -f "/root/.ssh/known_hosts" -R "<old_node>"
  • On the new node, run
pvecm add <existing_node>
pvecm updatecerts
  • Update the vote count for the new node (optional)

Edit /etc/pve/corosync.conf and change the vote number (see the sketch after this list).

  • Import old local pools
zpool import -f <old_local_pool>
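
For the optional vote change above, a minimal sketch of the relevant nodelist entry in /etc/pve/corosync.conf; the node name, id and address below are placeholders, and config_version in the totem section must also be increased whenever this file is edited:

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 2    # raise this value to give the node more votes
    ring0_addr: 192.168.1.11
  }
}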

Change expected votes

Run the following commands on an existing node in the cluster to check the status and set the expected votes.

pvecm status
pvecm expected 3

Remove old node

pvecm delnode pve01

Remove old SSH known hosts

ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "pve01"
ssh-keygen -f "/root/.ssh/known_hosts" -R "pve01"

or manually edit the two files.
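
Since the stale host key has to be removed on every remaining node, a small shell loop can save some typing; the node names pve02 and pve03 below are placeholders for the other cluster members:

for n in pve02 pve03; do
  ssh root@$n 'ssh-keygen -f /etc/ssh/ssh_known_hosts -R pve01; ssh-keygen -f /root/.ssh/known_hosts -R pve01'
done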

Add node

Run the following command on the NEW NODE

pvecm add <existing_node>

Sync certs

pvecm updatecerts

Test SSH key authentication

Make sure SSH key authentication is working between the nodes.
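
For example, a non-interactive login from the new node to another member should return the remote hostname without prompting for a password (pve02 is only an example name):

ssh -o BatchMode=yes root@pve02 hostname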

Copy UI certificate

cp /etc/pve/nodes/pve02/pveproxy-ssl.* /etc/pve/nodes/pve01
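
If the web UI still serves the old certificate afterwards, restarting the proxy service on the reinstalled node should pick up the copied files (this extra step is an assumption, not part of the original notes):

systemctl restart pveproxy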

Remove local-zfs filesystem

If the previous node was using ZFS and the reinstalled node now uses ext4, the local-zfs storage needs to be removed from this node's configuration.

vi /etc/pve/storage.cfg
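
For reference, the section to delete typically looks like the default local-zfs definition below; the pool name may differ on other setups:

zfspool: local-zfs
        pool rpool/data
        sparse
        content images,rootdir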

If the cluster service needs to be disabled to make the change, the following commands can be used

systemctl stop pve-cluster
/usr/bin/pmxcfs -l

Restart the cluster service on pve01

systemctl restart pve-cluster

References

Cluster Manager
Wiki - Cluster Manager
Correct procedure for zpool removal

Changing IP address for all nodes in Proxmox Cluster

Steps

  • Change the IP on all nodes in the following files: /etc/network/interfaces and /etc/hosts
  • Change all IP addresses in /etc/pve/corosync.conf (see the sketch after this list)
  • Reboot all nodes.
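
The addresses to update in corosync.conf are the ring0_addr values in the nodelist, one per node; the entry below is only a sketch with placeholder values, and config_version in the totem section has to be increased as well so the change is accepted:

node {
  name: pve02
  nodeid: 2
  quorum_votes: 1
  ring0_addr: 192.168.2.12    # the node's new IP address
}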

Troubleshooting

If the above fails during synchronization, use the following commands to fix it.

  • Stop cluster services on the node that wasn't synchronized
systemctl stop corosync.service
systemctl stop pve-cluster
  • Update the corosync.conf file manually
vi /etc/corosync/corosync.conf
  • Restart cluster services
systemctl start corosync.service
systemctl start pve-cluster

Verify the configuration file and the cluster status again

cat /etc/corosync/corosync.conf
pvecm status

Proxmox VM migration failed – no local-zfs rpool

When trying to migrate a VM from one node to another, the following error was encountered

Failed to sync data - could not activate storage 'local-zfs', zfs error: cannot import 'rpool': no such pool available

The reason is that the two nodes have different storage pools.

Solution

Restrict the local-zfs storage pool to the source node as below.

  • Select Datacenter -> Storage
  • Select storage pool local-zfs, and click Edit
  • Change Nodes from All (No restrictions) to the node the storage belongs to
  • Click OK to save the option
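
The same restriction can also be applied from the shell; a sketch using pvesm, assuming the ZFS pool only exists on node pve02:

pvesm set local-zfs --nodes pve02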

References

Migration of VM between nodes failed - could not activate storage 'local-zfs', zfs error: cannot imp

Add a Proxmox Node to Cluster

When using the web UI to add a node into the cluster, the following error occurred

ERROR: TFA-enabled login currently works only with a TTY. at /usr/share/perl5/PVE/APIClient/LWP.pm line 100

Solution

Use the command line below to add the node via the shell

pvecm add <target ip> -link0 <source ip>

If a key validation error occurs, try the node name instead

pvecm add <target_dns_name>

References

Convert Proxmox Cluster Node to Standalone Local Mode

Convert Proxmox Cluster Node to Standalone PVE

When adding the Proxmox node into an existing cluster, the IP-to-DNS reverse lookup returned a different name because of a misconfiguration. As a result, the new node thought it was already a member of the cluster, while the other nodes did not.

Solution

Convert the node back to local mode

Convert the node

Stop the corosync and pve-cluster services on the node:

systemctl stop pve-cluster
systemctl stop corosync

Start the cluster file system again in local mode:

pmxcfs -l

Delete the corosync configuration files:

rm /etc/pve/corosync.conf
rm -r /etc/corosync/*

Start the file system again as a normal service:

killall pmxcfs
systemctl start pve-cluster

The node is now separated from the cluster.

Remove the node from the cluster

Delete it from any remaining node of the cluster if it had already joined as a cluster node

pvecm delnode oldnode

If the command fails due to a loss of quorum in the remaining node, you can set the expected votes to 1 as a workaround:

pvecm expected 1

And then repeat the pvecm delnode command.

Clean up the cluster files

This ensures that the node can be added to another cluster again without problems.

rm /var/lib/corosync/*

Remove /etc/pve/nodes/<node_name> from other nodes.
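
Because /etc/pve is the shared pmxcfs mount, removing the directory once on any remaining node is enough, for example:

rm -r /etc/pve/nodes/<node_name>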

Stop remote access

Remove the old node's SSH key from the /etc/pve/priv/authorized_keys file
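
The key line usually ends with root@<node_name>, so it can be deleted with an editor or, as a sketch, with sed; verify the pattern matches only the old node before running it:

sed -i '/root@<node_name>$/d' /etc/pve/priv/authorized_keys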

References

Remove a cluster node