Tag: proxmox

Lost network after PVE rebooted

Error

After PVE rebooted, the network interfaces were detected but no link came up: the ip address command showed all physical interfaces as down, and the interface LEDs turned off while the OS was loading.

Running ifup returned a permission denied error. Running python3 /usr/sbin/ifup -a instead returned the error "another instance of this application is already running".

Tracing it with strace python3 /usr/sbin/ifup -a showed that the command tried to access the folder /run/network, which did not exist.

Solution

After each reboot, create the folder /run/network, then run python3 /usr/sbin/ifup -a to bring up the network manually.

Note: This is only a temporary solution, because /run is a tmpfs and /run/network disappears again at the next reboot. I will troubleshoot further when I have time.
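The manual workaround can be wrapped in a small script. This is a sketch, not a root-cause fix. RUN_DIR and IFUP_CMD are hypothetical overrides that let you try the function outside a PVE host. The systemd-tmpfiles line, which would recreate /run/network at every boot, is an untested assumption and not part of the original fix.

```shell
#!/bin/sh
# Sketch of the manual workaround. RUN_DIR and IFUP_CMD are hypothetical
# overrides so the function can be exercised outside a PVE host.
RUN_DIR="${RUN_DIR:-/run/network}"
IFUP_CMD="${IFUP_CMD:-python3 /usr/sbin/ifup -a}"

restore_network() {
    # /run is a tmpfs, so the directory ifup needs is gone after each reboot.
    mkdir -p "$RUN_DIR" || return 1
    # Unquoted on purpose: IFUP_CMD holds a command plus its arguments.
    $IFUP_CMD
}

# Print a systemd-tmpfiles entry that would recreate the directory at boot
# (untested assumption, not part of the original fix).
tmpfiles_line() {
    printf 'd %s 0755 root root -\n' "$RUN_DIR"
}
```

As root, call restore_network after a reboot. To try the tmpfiles.d idea, run tmpfiles_line > /etc/tmpfiles.d/run-network.conf. The file name is hypothetical; any name ending in .conf works.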

Reinstall Proxmox VE node in cluster

After node pve01 in the Proxmox VE cluster crashed, a new pve01 was reinstalled on the same hardware.

Install PVE using ISO

This just follows the normal installation steps.

Trial and error

After many attempts, I ended up using the following steps to add the replacement node.

  • On any existing node other than the one being replaced, run the following to delete the old node from the cluster:
pvecm delnode <old_node>
  • On all other nodes, remove the old node from the SSH known hosts files:
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "<old_node>"
ssh-keygen -f "/root/.ssh/known_hosts" -R "<old_node>"
  • On the new node, run:
pvecm add <existing_node>
pvecm updatecerts
  • Update the vote count for the new node (optional)

Edit /etc/pve/corosync.conf, change the node's quorum_votes value, and increase config_version.

  • Import old local pools
zpool import -f <old_local_pool>
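The node replacement steps above can be scripted roughly as follows. This sketch assumes pve01 is the replaced node and pve02 is a surviving node. The run helper and the DRY_RUN variable are hypothetical additions that print the commands instead of executing them, so you can check the order before touching the cluster.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN set, print commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on a surviving cluster node, never on the node being replaced.
remove_old_node() {
    old_node="$1"
    run pvecm delnode "$old_node" || return 1
    run ssh-keygen -f /etc/ssh/ssh_known_hosts -R "$old_node"
    run ssh-keygen -f /root/.ssh/known_hosts -R "$old_node"
}

# Run on the freshly installed node.
join_cluster() {
    existing_node="$1"
    run pvecm add "$existing_node" || return 1
    run pvecm updatecerts
}
```

On pve02, run remove_old_node pve01. On the new pve01, run join_cluster pve02. If you set DRY_RUN=1 in the shell first, the functions only print the commands.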

Change expected votes

On an existing node in the cluster, run the following commands to check the cluster status and set the expected votes:

pvecm status
pvecm expected 3
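When choosing the value for pvecm expected, it helps to know how many votes the cluster then needs. Assuming corosync's default votequorum behaviour (no two_node or last_man_standing options), quorum is a strict majority of the expected votes, floor(expected / 2) + 1. The helper below only illustrates that arithmetic. pvecm status also reports the current Quorum value.

```shell
#!/bin/sh
# Votes needed for quorum under default votequorum rules:
# floor(expected_votes / 2) + 1. Shell integer division truncates.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}
```

For example, quorum_for 3 prints 2, so a cluster expecting three votes stays quorate with two nodes online.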

Remove old node

pvecm delnode pve01

Remove old SSH known hosts

ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "pve01"
ssh-keygen -f "/root/.ssh/known_hosts" -R "pve01"

Alternatively, edit the two files manually.

Add node

Run the following command on the NEW node:

pvecm add <existing_node>

Sync certs

pvecm updatecerts

Test SSH key authentication

Make sure SSH key authentication between the nodes is working.

Copy UI certificate

cp /etc/pve/nodes/pve02/pveproxy-ssl.* /etc/pve/nodes/pve01

Remove local-zfs storage

If the previous node used ZFS but the new installation uses ext4, the local-zfs storage must be removed from the storage configuration:

vi /etc/pve/storage.cfg
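Editing storage.cfg with vi works. The CLI equivalent is pvesm remove local-zfs, which removes only the storage definition and leaves the data alone. For a scripted, reviewable edit, the awk filter below prints storage.cfg without one stanza. It assumes the usual layout: each stanza starts with a "<type>: <id>" line at column 0, and its options are indented. The function name is hypothetical.

```shell
#!/bin/sh
# Print storage.cfg (read from stdin) without the stanza whose ID is $1.
# A stanza header looks like "zfspool: local-zfs"; option lines are indented,
# so everything up to the next header belongs to the skipped stanza.
remove_storage_stanza() {
    awk -v id="$1" '
        /^[a-z]+: / { split($0, head, ": "); skip = (head[2] == id) }
        !skip { print }
    '
}
```

To use it, run remove_storage_stanza local-zfs < /etc/pve/storage.cfg > /root/storage.cfg.new. Review the result, then copy it back over /etc/pve/storage.cfg.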

If you need to disable the cluster on this node, the following commands stop the cluster service and start pmxcfs in local mode:

systemctl stop pve-cluster
/usr/bin/pmxcfs -l

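Restart pve01 cluster

If pmxcfs was started in local mode, stop it first, then restart the cluster service:

killall pmxcfs
systemctl restart pve-cluster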

References

Cluster Manager
Wiki - Cluster Manager
Correct procedure for zpool removal

Changing IP address for all nodes in Proxmox Cluster

Steps

  • On all nodes, change the IP address in /etc/network/interfaces and /etc/hosts.
  • Change all IP addresses in /etc/pve/corosync.conf and increase config_version in the totem section.
  • Reboot all nodes.
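The IP changes can be done with sed instead of by hand. This sketch assumes one old and one new IPv4 address per replace_ip call. replace_ip avoids rewriting longer addresses that merely start with the old one (for example .23 inside .232). bump_config_version increases config_version, because Proxmox VE only applies an edited corosync.conf when that number goes up. Both function names are hypothetical.

```shell
#!/bin/sh
# Replace IPv4 address $1 with $2 on stdin. The address must not be preceded
# or followed by a digit or dot, so 192.168.1.23 does not match 192.168.1.232.
# Limitation: two addresses separated by a single character (e.g. "a,b")
# only get the first one replaced.
replace_ip() {
    old_re="$(printf '%s' "$1" | sed 's/\./\\./g')"
    sed -E "s/(^|[^0-9.])$old_re([^0-9.]|$)/\1$2\2/g"
}

# Increase the number on the config_version line by one.
bump_config_version() {
    awk '/^[ \t]*config_version:/ { sub(/[0-9]+/, $2 + 1) } { print }'
}
```

For example (placeholder addresses), run replace_ip 192.168.1.21 10.0.0.21 < /etc/pve/corosync.conf | bump_config_version > /root/corosync.conf.new. Chain one replace_ip per node before the single bump. Review the file, copy it to /etc/pve/corosync.conf.new, and mv it over /etc/pve/corosync.conf so the cluster never sees a half-edited file.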

Troubleshooting

If the above fails during synchronization, use the following commands to fix it.

  • Stop cluster services on the node that wasn't synchronized
systemctl stop corosync.service
systemctl stop pve-cluster
  • Update the corosync.conf file manually
vi /etc/corosync/corosync.conf
  • Restart cluster services
systemctl start corosync.service
systemctl start pve-cluster

Verify the configuration file and the cluster status again:

cat /etc/corosync/corosync.conf
pvecm status

Fix Spice Certificate Issue in Proxmox

After changing the display to SPICE and adding a SPICE USB device, the following error appeared when starting the VM:

swtpm_setup: Not overwriting existing state file.
kvm: warning: Spice: reds.c:2893:reds_init_ssl: Could not load certificates from /etc/pve/local/pve-ssl.pem
kvm: warning: Spice: error:0909006C:PEM routines:get_name:no start line
kvm: warning: Spice: error:140DC009:SSL routines:use_certificate_chain_file:PEM lib
kvm: failed to initialize spice server
stopping swtpm instance (pid 55260) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code 1

Updating the certificates also failed with the following errors:

root@pve01:~# pvecm updatecerts --force
(re)generate node files
generate new node certificate
Signature ok
subject=OU = PVE Cluster Node, O = Proxmox Virtual Environment, CN = pve01.xxx.net
Getting CA Private Key
CA certificate and CA private key do not match
139954545105792:error:06067099:digital envelope routines:EVP_PKEY_copy_parameters:different parameters:../crypto/evp/p_lib.c:93:
139954545105792:error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch:../crypto/x509/x509_cmp.c:303:
unable to generate pve ssl certificate:
command 'faketime yesterday openssl x509 -req -in /tmp/pvecertreq-56235.tmp -days 161 -out /etc/pve/nodes/pve01/pve-ssl.pem -CAkey /etc/pve/priv/pve-root-ca.key -CA /etc/pve/pve-root-ca.pem -CAserial /etc/pve/priv/pve-root-ca.srl -extfile /tmp/pvesslconf-56235.tmp' failed: exit code 1

In this case, remove the CA and node certificate files and regenerate them:

root@pve01:~# rm -f /etc/pve/pve-root-ca.pem /etc/pve/priv/pve-root-ca.* /etc/pve/local/pve-ssl.*
root@pve01:~# pvecm updatecerts -f
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
root@pve01:~# pvecm updatecerts -f
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
root@pve01:~# 

The problem is now fixed.
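If this happens again, you can confirm the mismatch that pvecm reported ("CA certificate and CA private key do not match") before deleting anything. Compare the public key embedded in the certificate with the one derived from the private key. This is a generic OpenSSL check, not a Proxmox tool, and the function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if certificate $1 and private key $2 share the same public key,
# 1 if they differ, 2 if either file cannot be parsed by openssl.
cert_matches_key() {
    cert_pub="$(openssl x509 -noout -pubkey -in "$1")" || return 2
    key_pub="$(openssl pkey -pubout -in "$2")" || return 2
    [ "$cert_pub" = "$key_pub" ]
}
```

For example: cert_matches_key /etc/pve/pve-root-ca.pem /etc/pve/priv/pve-root-ca.key || echo mismatch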

References

pveproxy fails to load local certificate chain after upgrade to pve 6

Convert Ubuntu VM to Proxmox

This describes how to convert an Ubuntu VM from VMware to Proxmox.

VM creation

The following hardware options can be considered:

  • BIOS: SeaBIOS (the GRUB menu should be visible)
  • Machine: Default (i440fx)
  • SCSI Controller: VirtIO SCSI (may stay unused, since the disk is attached as sata0)
  • Hard Disk (sata0): disk_image_file
  • Network Device (net0): vmxnet3=<mac_address> (the VMware default; other types can also be used)

Convert the VMware disk to a Proxmox disk and attach it to the new VM:

qm importdisk 121 ubuntu.vmdk pool240ssd --format qcow2

Attach the disk as sata0.
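After qm importdisk, the disk shows up as an unused disk (for example as unused0 in the output of qm config 121). You can attach it as sata0 and make it the first boot device from the CLI. This sketch assumes a PVE release whose --boot option accepts the order= syntax. Copy the volume ID from qm config, because its exact name depends on the storage type. pool240ssd:vm-121-disk-0 in the usage line is only a placeholder. The run/DRY_RUN helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN set, print commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = VM ID, $2 = volume ID of the imported (unused) disk as shown by qm config.
attach_imported_disk() {
    vmid="$1"
    volume="$2"
    run qm set "$vmid" --sata0 "$volume" || return 1
    # Boot from the attached disk first.
    run qm set "$vmid" --boot "order=sata0"
}
```

For example (placeholder volume): attach_imported_disk 121 pool240ssd:vm-121-disk-0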

Boot

After booting, the system shows a GUI error screen; press Ctrl + Alt + F3 to switch to a text console.

Note: Hold Shift during boot to show the GRUB menu if required.

Network

Find the UUID of the new network connection:

nmcli conn

Rename the NetworkManager connection file:

cd /etc/NetworkManager/system-connections
mv Wired\ connection\ 1-<old_uuid>.nmconnection Wired\ connection\ 1-<new_uuid>.nmconnection

Update the .nmconnection file, for example:

[connection]
id=<new_interface_name>
uuid=<new_uuid>
type=ethernet
autoconnect-priority=-999
interface-name=<new_interface_name>
permissions=
timestamp=1628151710

[ethernet]
mac-address-blacklist=

[ipv4]
address1=192.168.1.232/24,192.168.1.254
dns=192.168.1.250;8.8.8.8;
dns-search=
method=manual

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
method=disabled

[proxy]
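The rename and the edit can be combined into one function. This sketch assumes the file naming shown above ("Wired connection 1-<uuid>.nmconnection") and that only the uuid, id and interface-name keys need to change. The function name and its arguments are hypothetical. It also sets the file mode to 600, because NetworkManager ignores keyfiles that other users can read.

```shell
#!/bin/sh
# Rename the keyfile to the new UUID and point it at the new interface.
# Arguments: connection directory, old UUID, new UUID, new interface name.
retarget_connection() {
    dir="$1"; old_uuid="$2"; new_uuid="$3"; new_if="$4"
    old_file="$dir/Wired connection 1-$old_uuid.nmconnection"
    new_file="$dir/Wired connection 1-$new_uuid.nmconnection"

    mv "$old_file" "$new_file" || return 1
    # Write the edited copy outside the connections directory first.
    tmp_file="$(mktemp)" || return 1
    sed -e "s/^uuid=.*/uuid=$new_uuid/" \
        -e "s/^id=.*/id=$new_if/" \
        -e "s/^interface-name=.*/interface-name=$new_if/" \
        "$new_file" > "$tmp_file" || return 1
    mv "$tmp_file" "$new_file" || return 1
    # NetworkManager ignores keyfiles that other users can read.
    chmod 600 "$new_file"
}
```

For example: retarget_connection /etc/NetworkManager/system-connections <old_uuid> <new_uuid> <new_interface_name>. Afterwards run nmcli connection reload so NetworkManager rereads the file.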

Errors

No login GUI

After booting, only a white screen with an error message appeared. This was fixed by running apt update and apt upgrade.

First, edit /etc/apt/sources.list and replace all repository URLs with old-releases.ubuntu.com, where the packages of end-of-life Ubuntu releases are kept.

Then run the following commands:

apt update
apt upgrade -y
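The sources.list edit can be done with one sed expression. This sketch assumes the file uses archive.ubuntu.com, security.ubuntu.com, or a country mirror such as us.archive.ubuntu.com; other mirrors are left untouched. The function name is hypothetical.

```shell
#!/bin/sh
# Rewrite Ubuntu archive/security mirrors (including two-letter country
# mirrors) to old-releases.ubuntu.com. Reads stdin, writes stdout.
to_old_releases() {
    sed -E 's#https?://([a-z]{2}\.)?(archive|security)\.ubuntu\.com#http://old-releases.ubuntu.com#g'
}
```

For example, run cp /etc/apt/sources.list /etc/apt/sources.list.bak, then to_old_releases < /etc/apt/sources.list.bak > /etc/apt/sources.list. After that, run the apt commands above.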

Convert Oracle Linux 7.9 to Proxmox

This describes how to convert an Oracle Linux 7.9 VM from VMware to Proxmox.

VM creation

The following hardware options can be considered:

  • BIOS: SeaBIOS (the GRUB menu should be visible)
  • Machine: Default (i440fx)
  • SCSI Controller: VirtIO SCSI (may stay unused, since the disk is attached as sata0)
  • Hard Disk (sata0): disk_image_file
  • Network Device (net0): vmxnet3=<mac_address> (the VMware default; other types can also be used)

Convert the VMware disk to a Proxmox disk and attach it to the new VM:

qm importdisk 121 oracle18c.vmdk pool240ssd --format qcow2

Attach the disk as sata0.

Boot

From the GRUB menu, select the entry Oracle Linux Server (0-rescue-ed95572bd80641d79f83cd91e03c0283 with Linux) 7.9 to boot into rescue mode.

Note: All other entries failed with errors and could not boot.

Kernel

Find Kernel Package

Log in as a valid user, then find the kernel to be used:

rpm -q -a | grep kernel | sort

This returned the following list:

kernel-3.10.0-1160.25.1.el7.x86_64
kernel-3.10.0-1160.36.2.el7.x86_64
kernel-3.10.0-1160.el7.x86_64
kernel-tools-3.10.0-1160.36.2.el7.x86_64
kernel-tools-libs-3.10.0-1160.36.2.el7.x86_64
kernel-uek-5.4.17-2102.203.6.el7uek.x86_64
kernel-uek-5.4.17-2102.204.4.2.el7uek.x86_64
kernel-uek-5.4.17-2102.204.4.4.el7uek.x86_64

Choose the latest one, which is also an Unbreakable Enterprise Kernel (UEK).

Recreate Grub kernel files

Find the scripts in the kernel package:

rpm -q kernel-uek-5.4.17-2102.204.4.4.el7uek.x86_64 --scripts

The following posttrans scriptlet is shown:

...
posttrans scriptlet (using /bin/sh):
/usr/sbin/new-kernel-pkg --package kernel --mkinitrd --dracut --depmod --update 5.4.17-2102.204.4.4.el7uek.x86_64 || exit $?
/usr/sbin/new-kernel-pkg --package kernel --rpmposttrans 5.4.17-2102.204.4.4.el7uek.x86_64 || exit $?
...

Run the above commands to rebuild the initramfs and GRUB entries for that kernel, then reboot and select the recreated kernel from the menu.

Note: The error Unable to open file: /etc/keys/x509_ima.der (-2) can be ignored.
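The kernel selection and the posttrans commands can be scripted. This sketch assumes GNU sort (for -V) and the rpm output format shown above. latest_uek, rebuild_kernel_files and the DRY_RUN switch are hypothetical names. The new-kernel-pkg arguments are copied from the scriptlet.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN set, print commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the newest UEK version from "rpm -q -a" output on stdin.
# Only kernel-uek-<digit> matches, so kernel-uek-devel etc. are skipped.
latest_uek() {
    grep '^kernel-uek-[0-9]' | sed 's/^kernel-uek-//' | sort -V | tail -n 1
}

# Re-run the commands from the package's posttrans scriptlet for version $1.
rebuild_kernel_files() {
    ver="$1"
    run /usr/sbin/new-kernel-pkg --package kernel --mkinitrd --dracut --depmod --update "$ver" || return $?
    run /usr/sbin/new-kernel-pkg --package kernel --rpmposttrans "$ver"
}
```

As root in rescue mode, run rebuild_kernel_files "$(rpm -q -a | latest_uek)".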

Network

You can configure the network interface the same way as in VMware, which avoids having to reconfigure the network settings inside the guest.

Interface Type

Check the VMware .vmx file to find the network adapter type, then set the same type in Proxmox:

ethernet0.virtualDev = "vmxnet3"

Mac Address

You can set the MAC address to the value from the VMware configuration:

ethernet0.generatedAddress = "00:11:22:33:44:55"

Interface Name

Find the interface configuration file in /etc/sysconfig/network-scripts, for example:

/etc/sysconfig/network-scripts/ifcfg-ens192

The interface name is ens192 (the file name without the ifcfg- prefix).

Create the file /etc/udev/rules.d/70-custom-ifnames.rules with the following contents:

SUBSYSTEM=="net",ACTION=="add",ATTR{address}=="00:11:22:33:44:55",ATTR{type}=="1",NAME="ens192"

Then reboot the server and check the interface name and IP address with the ip a command.
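The MAC address and the udev rule can be generated from the .vmx file. This sketch assumes the adapter is ethernet0 and reads either ethernet0.generatedAddress or ethernet0.address (used when a static MAC is set in VMware). The MAC is lower-cased to match the form the kernel reports in sysfs. The function names are hypothetical.

```shell
#!/bin/sh
# Print the MAC of adapter $2 (e.g. ethernet0) from .vmx file $1, lower-cased.
# Keys like generatedAddressOffset or addressType do not match the pattern.
vmx_mac() {
    sed -n -E "s/^$2\.(generatedAddress|address)[[:space:]]*=[[:space:]]*\"([^\"]+)\".*/\2/p" "$1" |
        head -n 1 | tr 'A-F' 'a-f'
}

# Print a udev rule that names the interface with MAC $1 as $2.
udev_rule() {
    printf 'SUBSYSTEM=="net",ACTION=="add",ATTR{address}=="%s",ATTR{type}=="1",NAME="%s"\n' "$1" "$2"
}
```

For example, udev_rule "$(vmx_mac vm.vmx ethernet0)" ens192 prints the line to put into /etc/udev/rules.d/70-custom-ifnames.rules.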

References

Consistent network interface device naming

Remove orphan disks in Proxmox

If you cancel a disk move in Proxmox, an orphaned disk may be left behind. It is not shown in the VM's hardware configuration, and it cannot be removed from the storage section either: trying to remove it fails with an error saying the disk is attached to a VM.

To remove it, rescan the VM's disks first.

Rescan

The following command reattaches the orphaned disk to the VM as an unused disk, after which it can be removed from the VM's Hardware tab.

qm rescan --vmid <vm_id>

Note: The rescan must be run on the Proxmox node that holds the VM configuration; otherwise, an error saying the VM configuration file could not be found appears.
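The rescan has to run on the node that owns the VM, so it helps to look that node up first. In a cluster, /etc/pve/nodes/<node>/qemu-server/<vmid>.conf exists only under the owning node, so a glob is enough. This is a sketch. vm_node and the PVE_NODES_DIR override are hypothetical, and the ssh usage assumes the usual root SSH access between cluster nodes.

```shell
#!/bin/sh
# PVE_NODES_DIR is a hypothetical override for testing.
PVE_NODES_DIR="${PVE_NODES_DIR:-/etc/pve/nodes}"

# Print the name of the node whose directory holds the config of VM $1.
vm_node() {
    for conf in "$PVE_NODES_DIR"/*/qemu-server/"$1".conf; do
        # If nothing matches, the glob stays literal and the file won't exist.
        [ -e "$conf" ] || continue
        basename "$(dirname "$(dirname "$conf")")"
        return 0
    done
    return 1
}
```

For example: ssh root@"$(vm_node <vm_id>)" qm rescan --vmid <vm_id>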

References

https://forum.proxmox.com/threads/cancelled-disk-move-orphaned-disk.96650/

Detect and Fix Proxmox file integrity issue

Files can easily become corrupted if Proxmox is installed on a USB device.

Detection

Because Proxmox is based on Debian, run the following command to detect corrupted files:

# dpkg --verify
??5?????? c /etc/apt/sources.list.d/pve-enterprise.list
??5?????? c /etc/lvm/lvm.conf
??5?????? c /etc/issue
# 

The files above can be ignored, for the following reasons:

  • /etc/apt/sources.list.d/pve-enterprise.list: the enterprise repository is not used
  • /etc/lvm/lvm.conf: the LVM configuration file contains system-specific information, such as UUIDs
  • /etc/issue: this file contains the host's IP address

Fix

Find the packages that own the corrupted files:

# dpkg -S <list of file_name>

Reinstall the packages:

apt --reinstall install <list of package_name>
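Checking the dpkg --verify output and mapping files to packages can be chained. This sketch assumes the output format shown above (a status string, a c marker for configuration files, then the path) and file names without spaces. It skips configuration files, on the assumption that changes there are usually intentional, as with the three files above. The function names are hypothetical.

```shell
#!/bin/sh
# Read "dpkg --verify" output on stdin and print the paths of changed files,
# skipping configuration files (marked with "c" in the second column).
changed_files() {
    awk '$2 != "c" { print $NF }'
}

# Read "dpkg -S" output on stdin ("pkg: /path" or "pkg1, pkg2: /path") and
# print each package name once. Diversion lines are ignored, and architecture
# suffixes such as ":amd64" are dropped.
pkg_names() {
    grep -v '^diversion by ' | cut -d: -f1 | tr ',' '\n' | tr -d ' ' | sort -u
}
```

dpkg --verify | changed_files | xargs -r dpkg -S | pkg_names lists the packages to reinstall. After reviewing the list, pass it to apt --reinstall install.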

Import existing zpool as Proxmox storage

Steps

Import zpool

zpool import <existing_pool_name> <new_pool_name>

Create storage

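Create the storage in the GUI (Datacenter > Storage > Add > ZFS).

You have to connect to the web GUI of the node where the zpool was imported in order to create the storage under Datacenter; otherwise, the pool is not listed.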
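The storage can also be created from the CLI, which makes the node restriction explicit. This sketch assumes the pool has already been imported on the node given as the third argument. images,rootdir are the content types a ZFS pool storage typically holds. add_zfs_storage and DRY_RUN are hypothetical, and the names in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN set, print commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = storage ID shown under Datacenter, $2 = imported pool (or dataset),
# $3 = node that has the pool imported; the storage is limited to that node.
add_zfs_storage() {
    run pvesm add zfspool "$1" --pool "$2" --content images,rootdir --nodes "$3"
}
```

For example (placeholder names): add_zfs_storage tank-store tank pve01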

References

ZFS Pool Import - Proxmox single host reinstall without full backup
zpool not shown when add storage