Author: Bian Xi

Failed to start Corosync Cluster Engine After PVE Reboot

Error

After rebooting PVE, the following error occurred.

# systemctl status pve-cluster
...
Jun 26 09:33:26 pve01 pmxcfs[3506]: [quorum] crit: quorum_initialize failed: 2
Jun 26 09:33:26 pve01 pmxcfs[3506]: [quorum] crit: can't initialize service
Jun 26 09:33:26 pve01 pmxcfs[3506]: [confdb] crit: cmap_initialize failed: 2
Jun 26 09:33:26 pve01 pmxcfs[3506]: [confdb] crit: can't initialize service
Jun 26 09:33:26 pve01 pmxcfs[3506]: [dcdb] crit: cpg_initialize failed: 2
Jun 26 09:33:26 pve01 pmxcfs[3506]: [dcdb] crit: can't initialize service
Jun 26 09:33:26 pve01 pmxcfs[3506]: [status] crit: cpg_initialize failed: 2
Jun 26 09:33:26 pve01 pmxcfs[3506]: [status] crit: can't initialize service
...
# journalctl -u corosync.service
...
Jun 26 09:26:17 pve01 corosync[1826]:   [MAIN  ] failed to parse node address 'pve01.xx.xx'
Jun 26 09:26:17 pve01 corosync[1826]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1417.
Jun 26 09:26:17 pve01 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Jun 26 09:26:17 pve01 systemd[1]: corosync.service: Failed with result 'exit-code'.
Jun 26 09:26:17 pve01 systemd[1]: Failed to start Corosync Cluster Engine.
...

Fix

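In corosync.conf on pve01, change the node's ring0_addr from the hostname pve01.xx.xx to its IP address. The log shows that corosync could not parse the hostname at startup and exited with status 8; using the IP address removes the dependency on name resolution during boot.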
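The same edit can be sketched as a shell filter. This is a minimal sketch, assuming the usual Proxmox corosync.conf layout where each node { ... } block contains a name: line and a ring0_addr: line; set_ring0_addr and bump_config_version are hypothetical helper names. Corosync expects config_version in the totem section to be increased whenever the file changes, so the second helper increments it.

```shell
# Hypothetical helpers: both read corosync.conf on stdin and write the result to stdout.

# set_ring0_addr NODE ADDR: replace ring0_addr inside the node block whose name is NODE.
set_ring0_addr() {
  awk -v node="$1" -v addr="$2" '
    # Start buffering when a "node {" block opens ("nodelist {" does not match).
    /^[ \t]*node[ \t]*[{]/ { inblk = 1; n = 0; name = "" }
    inblk {
      buf[++n] = $0
      if ($1 == "name:") name = $2
      # At the closing brace, print the block, rewriting ring0_addr if the name matched.
      if ($0 ~ /^[ \t]*[}]/) {
        for (i = 1; i <= n; i++) {
          line = buf[i]
          if (name == node && line ~ /^[ \t]*ring0_addr:/)
            sub(/ring0_addr:.*/, "ring0_addr: " addr, line)
          print line
        }
        inblk = 0
      }
      next
    }
    { print }
  '
}

# bump_config_version: increase config_version by one, leave every other line as-is.
bump_config_version() {
  awk '$1 == "config_version:" { sub(/config_version:.*/, "config_version: " ($2 + 1)) } { print }'
}
```

For example, `set_ring0_addr pve01 192.168.1.201 < /etc/corosync/corosync.conf | bump_config_version > /tmp/corosync.conf.new`, where 192.168.1.201 is a placeholder for pve01's real address. Review the result before copying it into place. While pve01 has no quorum, /etc/pve is read-only, so the local copy /etc/corosync/corosync.conf (the file corosync itself reads) is the one that can be edited; the cluster-wide /etc/pve/corosync.conf should get the same change so it is not overwritten later.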

Reinstall Proxmox VE node in cluster

After node pve01 in the Proxmox VE cluster crashed, a new pve01 was reinstalled on the same hardware.

Install PVE using ISO

This just follows the normal installation steps.

Trial and error

After many attempts, I ended up using the following steps to add the replacement node.

  • On any remaining node (not the node being replaced), run the following to delete the old node from the cluster
pvecm delnode <old_node>
  • Remove the old node's SSH known-host entries from all other nodes
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "<old_node>"
ssh-keygen -f "/root/.ssh/known_hosts" -R "<old_node>"
  • On the new node, run
pvecm add <existing_node>
pvecm updatecerts
  • Update the vote for the new node (optional)

Edit /etc/pve/corosync.conf and change the node's quorum_votes value.

  • Import the old local ZFS pools
zpool import -f <old_local_pool>

Change expected votes

On an existing node in the cluster, run the following commands to check the status and set the expected votes

pvecm status
pvecm expected 3

Remove old node

pvecm delnode pve01

Remove old SSH known hosts

ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "pve01"
ssh-keygen -f "/root/.ssh/known_hosts" -R "pve01"

Alternatively, edit the two files manually.

Add node

Run the following command on the NEW node

pvecm add <existing_node>

Sync certs

pvecm updatecerts

Test SSH key authentication

Make sure SSH key authentication works between the nodes.

Copy UI certificate

cp /etc/pve/nodes/pve02/pveproxy-ssl.* /etc/pve/nodes/pve01

Remove local-zfs filesystem

If the previous node used ZFS but the new installation uses ext4, the local-zfs storage needs to be removed from the storage configuration.

vi /etc/pve/storage.cfg
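Instead of editing in vi, the entry can be dropped with a small filter. This is a minimal sketch, assuming each entry in storage.cfg starts with an unindented `type: id` header line followed by indented option lines; remove_storage is a hypothetical helper name. Note that /etc/pve/storage.cfg is shared by the whole cluster, so removing local-zfs removes it for every node; if other nodes still use it, restricting its Nodes option is the safer change.

```shell
# Hypothetical helper: remove_storage ID
# Reads storage.cfg on stdin and writes it to stdout without the entry whose
# header is "<type>: ID" (the header, its option lines and its trailing blank line).
remove_storage() {
  awk -v id="$1" '
    # Unindented, non-comment lines are entry headers such as "zfspool: local-zfs".
    /^[^ \t#]/ { skip = ($2 == id) }
    !skip { print }
  '
}
```

For example, `remove_storage local-zfs < /etc/pve/storage.cfg > /tmp/storage.cfg.new`, then check the result and copy it back. `pvesm remove local-zfs` should do the same through the Proxmox CLI.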

To run the node without the cluster (pmxcfs in local mode), the following commands can be used

systemctl stop pve-cluster
/usr/bin/pmxcfs -l

Restart the cluster service on pve01

systemctl restart pve-cluster

Change Expected Votes for Proxmox Cluster

If two or more nodes in the Proxmox cluster are down, the cluster may lose quorum, and users cannot log in to the Proxmox web page. To log in, the number of expected votes needs to be changed.

This change is only temporary, and the expected votes cannot be set lower than the current total votes.

Steps

Run the following command to check the status

pvecm status

Run the following command to change the expected votes

pvecm expected 2
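As a rough illustration of why the expected votes matter: corosync's votequorum normally requires a strict majority of the expected votes, quorum = expected_votes / 2 + 1 (integer division). Below is a minimal sketch of that arithmetic, ignoring special options such as two_node or last_man_standing; quorum_for and is_quorate are hypothetical helper names.

```shell
# Hypothetical helper: quorum_for EXPECTED
# Prints the number of votes needed for quorum (strict majority, integer division).
quorum_for() {
  echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: is_quorate TOTAL EXPECTED
# Succeeds if the votes currently present (TOTAL) reach quorum for EXPECTED votes.
is_quorate() {
  [ "$1" -ge "$(quorum_for "$2")" ]
}
```

For example, in a hypothetical four-node cluster with two nodes down, 2 votes remain but `quorum_for 4` is 3, so the cluster is not quorate; after `pvecm expected 2`, `quorum_for 2` is 2 and the remaining votes are enough.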

Change node vote

Change quorum_votes in /etc/pve/corosync.conf to set a different quorum vote for each node.

A node's vote can be 0 if the node is only a test node in the cluster.
A node's vote can be more than 1 if the node has a more important role, such as running TrueNAS.
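The vote change can be sketched as a shell filter. This is a minimal sketch, assuming each node { ... } block in corosync.conf has name: and quorum_votes: lines, as in Proxmox-generated files; set_quorum_votes is a hypothetical helper name. It only changes the vote, so config_version in the totem section still needs to be increased by hand.

```shell
# Hypothetical helper: set_quorum_votes NODE VOTES
# Reads corosync.conf on stdin and writes it to stdout with quorum_votes changed
# in the node block whose name is NODE. Other node blocks are left unchanged.
set_quorum_votes() {
  # Votes must be a non-negative integer (0 for a test node, >1 for an important node).
  case $2 in
    '' | *[!0-9]*) echo "votes must be a non-negative integer: $2" >&2; return 1 ;;
  esac
  awk -v node="$1" -v votes="$2" '
    # Buffer each "node {" block until its closing brace.
    /^[ \t]*node[ \t]*[{]/ { inblk = 1; n = 0; name = "" }
    inblk {
      buf[++n] = $0
      if ($1 == "name:") name = $2
      if ($0 ~ /^[ \t]*[}]/) {
        for (i = 1; i <= n; i++) {
          line = buf[i]
          if (name == node && line ~ /^[ \t]*quorum_votes:/)
            sub(/quorum_votes:.*/, "quorum_votes: " votes, line)
          print line
        }
        inblk = 0
      }
      next
    }
    { print }
  '
}
```

For example, `set_quorum_votes pve03 0 < /etc/pve/corosync.conf > /tmp/corosync.conf.new` would turn a hypothetical node pve03 into a zero-vote test node; review the file before moving it into place.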

Moving Proxmox VE Server to another Machine

I moved the Proxmox VE server to another machine because I had issues booting the Proxmox installation USB disk on a MacBook Pro. So I decided to boot the MacBook Pro from the existing Proxmox VE server's USB disk instead.

Requirement

The previous Proxmox Virtual Environment USB disk must be a UEFI-bootable disk, because the MacBook Pro is a UEFI machine.

After boot

The network configuration in /etc/network/interfaces needs to be changed because the network interface name is different.

First, change the interface name, which can be found with the ip a command. Two lines need to be updated: the iface line and the bridge-ports line.

auto lo
iface lo inet loopback

iface enp0s10 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.205/24
        gateway 192.168.1.254
        bridge-ports enp0s10
        bridge-stp off
        bridge-fd 0

The Wi-Fi interface can be disabled by commenting it out if it is not used.

#iface wlp1s0 inet manual
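The interface rename can be sketched as a filter that replaces the old name wherever it appears as a whole word, which covers both the iface line and the bridge-ports line. This is a minimal sketch; rename_iface is a hypothetical helper name.

```shell
# Hypothetical helper: rename_iface OLD NEW
# Reads /etc/network/interfaces on stdin and writes it to stdout with every
# whitespace-separated word equal to OLD replaced by NEW. Indentation is kept.
rename_iface() {
  awk -v old="$1" -v new="$2" '{
    out = ""; rest = $0
    # Walk the line word by word, copying the whitespace in between unchanged.
    while (match(rest, /[^ \t]+/)) {
      word = substr(rest, RSTART, RLENGTH)
      if (word == old) word = new
      out = out substr(rest, 1, RSTART - 1) word
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}
```

For example, `rename_iface enp0s10 enp2s0 < /etc/network/interfaces > /tmp/interfaces.new`, where enp2s0 stands for whatever name ip a reports on the new machine; check the result, copy it back, and reboot.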

Migrate USB UEFI boot with iSCSI root Ubuntu to Proxmox VM

To convert the Ubuntu server into a Proxmox Virtual Environment VM, the following migration is required.

Ubuntu configuration

The Ubuntu server has the following configuration:

  • Boots from a USB device with /boot and /boot/efi filesystems.
  • Connects to the iSCSI host using the GRUB2 configuration
  • Root file system / is on an iSCSI disk

Conversion

Create Proxmox VM

  • Create VM with 2GB disk
  • BIOS type is UEFI
  • Add EFI disk
  • Add Ubuntu Live CD and boot from CD

Create partition

Duplicate the USB device's partition layout onto the 2GB VM disk.

Create filesystems

mkfs.vfat /dev/sda1
mkfs.btrfs /dev/sda2

Duplicate UUID

Duplicate UUID for /boot/efi

If the UUID of /boot/efi is not duplicated, /etc/fstab will need to be changed after reboot.
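blkid shows a FAT filesystem UUID as two groups of four hex digits (for example 2E24-EC82), while `mkfs.vfat -i` expects the same 32-bit volume ID as eight hex digits without the dash. This is a minimal sketch of that conversion, assuming the UUID is read on the original system with `blkid -s UUID -o value` on the old /boot/efi partition; fat_volume_id is a hypothetical helper name.

```shell
# Hypothetical helper: fat_volume_id UUID
# Turns a blkid FAT UUID like "2E24-EC82" into "2E24EC82" for mkfs.vfat -i.
fat_volume_id() {
  case $1 in
    [0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]-[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f])
      # Drop the dash: keep the part before it and the part after it.
      echo "${1%-*}${1#*-}"
      ;;
    *)
      echo "not a FAT UUID: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `mkfs.vfat -i "$(fat_volume_id 2E24-EC82)" /dev/sda1` would format the VM's EFI partition with the original volume ID instead of the plain `mkfs.vfat /dev/sda1`; 2E24-EC82 is only an example value.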

Duplicate UUID for /boot

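Use the following steps to duplicate the partitions and UUIDs

  • Retrieve the partition layout from the USB-booted Ubuntu

    sfdisk -d /dev/sda
  • Create the partitions on the 2GB VM disk

  • Duplicate the UUID of partition /boot/efi

  • Duplicate the UUID of partition /boot, where <uuid> is the UUID of the original /boot filesystem (the filesystem must not be mounted)

    btrfstune -U <uuid> /dev/sda2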

Change the network interface name in the iSCSI configuration in GRUB

  • Retrieve the network interface name
ip a
  • Mount the boot filesystem
mount /dev/sda2 /boot
  • Edit /boot/grub/grub.cfg

Change all interface names in grub.cfg.

linux /vmlinuz-5.4.0-113-generic ... ip=192.168.1.99::192.168.1.254:255.255.255.0:fish:ensXX::192.168.1.55
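In the kernel ip= parameter the fields are client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf:..., so the interface name (ensXX above) is the sixth colon-separated field. This is a minimal sketch that rewrites only that field on every line of grub.cfg; set_ip_device is a hypothetical helper name.

```shell
# Hypothetical helper: set_ip_device NEWDEV
# Reads grub.cfg on stdin and writes it to stdout with the device field
# (6th colon-separated field) of every ip= kernel parameter set to NEWDEV.
set_ip_device() {
  awk -v dev="$1" '{
    out = ""; rest = $0
    # Walk the line word by word, copying the whitespace in between unchanged.
    while (match(rest, /[^ \t]+/)) {
      tok = substr(rest, RSTART, RLENGTH)
      if (tok ~ /^ip=/) {
        # Split after "ip="; empty fields such as "::" are preserved by split().
        n = split(substr(tok, 4), f, ":")
        if (n >= 6) {
          f[6] = dev
          tok = "ip=" f[1]
          for (i = 2; i <= n; i++) tok = tok ":" f[i]
        }
      }
      out = out substr(rest, 1, RSTART - 1) tok
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}
```

For example, `set_ip_device ens18 < /boot/grub/grub.cfg > /tmp/grub.cfg.new`, where ens18 stands for whatever name ip a reports in the VM. grub.cfg is regenerated by update-grub, so if the ip= parameter is defined in /etc/default/grub it should be updated there as well once the VM boots.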

Reboot VM

Change partition UUID in Ubuntu

Generate UUID

uuidgen

Change one partition

sgdisk -u 1:<uuid> /dev/sda

Change multiple partitions

Run the following command to dump the partition table

sfdisk -d /dev/sda > /tmp/sda.dsk

Edit the uuid= values in /tmp/sda.dsk.

Run the following command to re-import the modified partition table

sfdisk /dev/sda < /tmp/sda.dsk
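In the sfdisk -d dump, each partition line looks like `/dev/sda1 : start=..., size=..., type=..., uuid=...`. This is a minimal sketch that rewrites the uuid= value of one partition line instead of editing the file by hand; set_part_uuid is a hypothetical helper name.

```shell
# Hypothetical helper: set_part_uuid DEVICE UUID
# Reads an sfdisk dump on stdin and writes it to stdout with uuid= replaced
# on DEVICE's line only; all other lines pass through unchanged.
set_part_uuid() {
  [ -n "$2" ] || { echo "usage: set_part_uuid DEVICE UUID" >&2; return 1; }
  awk -v dev="$1" -v uuid="$2" '
    # Partition lines start with the device path followed by " : ".
    $1 == dev && $2 == ":" { sub(/uuid=[^,]*/, "uuid=" uuid) }
    { print }
  '
}
```

For example, `set_part_uuid /dev/sda1 "$(uuidgen)" < /tmp/sda.dsk > /tmp/sda.new.dsk`, repeated for each partition to change, followed by `sfdisk /dev/sda < /tmp/sda.new.dsk`.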

Proxmox VM migration failed – no local-zfs rpool

When trying to migrate a VM from one node to another, the following error was encountered

Failed to sync data - could not activate storage 'local-zfs', zfs error: cannot import 'rpool': no such pool available

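The reason is that the two nodes have different storage pools: local-zfs is available to all nodes, but the target node does not have the rpool ZFS pool that local-zfs points to.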

Solution

Change the local-zfs storage pool of the source node as below.

  • Select Datacenter -> Storage
  • Select storage pool local-zfs, and click Edit
  • Change Nodes from All (No restrictions) to the node the storage belongs to
  • Click OK to save the option
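The same restriction is stored as a nodes line in the local-zfs entry of /etc/pve/storage.cfg, and `pvesm set local-zfs --nodes <node>` should apply it from the shell. This is a minimal sketch of the file edit as a filter, assuming the unindented `type: id` header layout of storage.cfg; set_storage_nodes is a hypothetical helper name.

```shell
# Hypothetical helper: set_storage_nodes ID NODES
# Reads storage.cfg on stdin and writes it to stdout with "nodes NODES"
# placed directly under the header of entry ID, replacing any old nodes line.
set_storage_nodes() {
  awk -v id="$1" -v nodes="$2" '
    # Header lines start unindented, e.g. "zfspool: local-zfs".
    /^[^ \t#]/ {
      inblk = ($2 == id)
      print
      if (inblk) print "\tnodes " nodes
      next
    }
    # Drop any existing nodes line of the target entry; it was replaced above.
    inblk && $1 == "nodes" { next }
    { print }
  '
}
```

For example, `set_storage_nodes local-zfs pve02 < /etc/pve/storage.cfg > /tmp/storage.cfg.new`, where pve02 stands for the node that actually has rpool.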

Add a Proxmox Node to Cluster

When using the web UI to add a node to the cluster, the following error occurred

ERROR: TFA-enabled login currently works only with a TTY. at /usr/share/perl5/PVE/APIClient/LWP.pm line 100

Solution

Use the command line below in a shell to add the node

pvecm add <target ip> -link0 <source ip>

If you get a key validation error, try the node name instead

pvecm add <target_dns_name>
