Blog

Reinstall Proxmox VE node in cluster

After node pve01 in my Proxmox VE cluster crashed, I reinstalled pve01 on the same hardware.

Install PVE using ISO

This just follows the normal installation steps.

Trial and error

After many attempts, I ended up using the following steps to add the replacement node.

  • On any remaining node (not the node being replaced), run the following to delete the old node from the cluster
pvecm delnode <old_node>
  • Remove the old node's known-host entries on all other nodes
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "<old_node>"
ssh-keygen -f "/root/.ssh/known_hosts" -R "<old_node>"
  • On the new node, run
pvecm add <existing_node>
pvecm updatecerts
  • Update the vote count for the new node (optional)

Edit /etc/pve/corosync.conf and change the node's vote number (quorum_votes). Also increment config_version whenever this file is edited.

  • Import the old local ZFS pools
zpool import -f <old_local_pool>
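The summary above can be kept as a small script so the order of steps is not lost. This is a minimal sketch, not a tested procedure: node and pool names are placeholders, and DRY_RUN is a hypothetical switch added here so the commands are only printed until you set DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch of the node replacement sequence. Node and pool names are
# placeholders. DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to execute it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on a surviving node, never on the node being replaced.
remove_old_node() {
    local old_node="$1"
    run pvecm delnode "$old_node"
    # Stale host keys have to be removed on every remaining node.
    run ssh-keygen -f /etc/ssh/ssh_known_hosts -R "$old_node"
    run ssh-keygen -f /root/.ssh/known_hosts -R "$old_node"
}

# Run on the freshly installed node.
join_new_node() {
    local existing_node="$1"
    run pvecm add "$existing_node"
    run pvecm updatecerts
}

# Optional, on the new node: re-import a pool left on the old local disks.
import_old_pool() {
    run zpool import -f "$1"
}
```

For example, run `remove_old_node pve01` on pve02 to review the output, then `DRY_RUN=0 join_new_node pve02` on the reinstalled pve01.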

Change expected votes

Run the following commands on an existing cluster node to check the cluster status and set the expected votes:

pvecm status
pvecm expected 3

Remove old node

pvecm delnode pve01

Remove old SSH known hosts

ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "pve01"
ssh-keygen -f "/root/.ssh/known_hosts" -R "pve01"

Alternatively, edit the two files manually.

Add node

Run the following command on the NEW node:

pvecm add <existing_node>

Sync certs

pvecm updatecerts

Test SSH key authentication

Make sure SSH key authentication works between all cluster nodes.
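One way to test this is to run a trivial command on every other node with BatchMode=yes, which makes ssh fail instead of prompting for a password, so a missing key shows up as FAIL. A minimal sketch; the node names are examples, and the SSH variable is a hypothetical override added only so the loop can be tried without real hosts.

```bash
#!/usr/bin/env bash
# Sketch: confirm key-based root SSH to each node works without a password.
SSH="${SSH:-ssh}"   # hypothetical override, normally just "ssh"

check_ssh_keys() {
    local node rc=0
    for node in "$@"; do
        # BatchMode=yes: never prompt; fail if key authentication fails
        if "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "root@$node" true; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_ssh_keys pve01 pve02 pve03
```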

Copy UI certificate

cp /etc/pve/nodes/pve02/pveproxy-ssl.* /etc/pve/nodes/pve01

Remove local-zfs storage

If the previous installation used ZFS and the new one uses ext4, the local-zfs storage entry needs to be removed from the storage configuration.

vi /etc/pve/storage.cfg
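Besides editing the file by hand, `pvesm remove local-zfs` deletes a storage definition from the command line. If you prefer to review the change first, the sketch below prints storage.cfg with one stanza dropped. It assumes the usual layout, where a stanza starts with `<type>: <id>` at column 0 and its properties are indented; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print a storage.cfg-style file without the stanza for one storage ID.
# A stanza header looks like "zfspool: local-zfs"; property lines are indented.
drop_storage() {
    local id="$1" file="$2"
    awk -v id="$id" '
        /^[^ \t#][^:]*: / {          # stanza header at column 0
            split($0, h, ": ")
            skip = (h[2] == id)      # skip this stanza if the ID matches
        }
        !skip { print }
    ' "$file"
}

# Review before writing anything back, e.g.:
#   drop_storage local-zfs /etc/pve/storage.cfg | diff /etc/pve/storage.cfg -
```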

If the cluster needs to be disabled, the following commands stop the cluster service and start the cluster filesystem (pmxcfs) in local mode:

systemctl stop pve-cluster
/usr/bin/pmxcfs -l

Restart the cluster service on pve01

systemctl restart pve-cluster

Install Synology NAS managed Let's Encrypt Certificate in NGINX

Certificate Management

A Synology NAS can manage Let's Encrypt certificates, and a certificate can be exported as a ZIP file for use in an NGINX HTTPS configuration.

  1. Go to Control Panel -> Security -> Certificate
  2. Select the certificate to export
  3. Select Export Certificate from the right-click menu
  4. Save the exported file

Existing certificates can be renewed with the right-click -> Renew option.

Note: every domain in the certificate must resolve to this Synology NAS and be reachable on ports 80 and 443; otherwise certificate generation fails.

The downloaded ZIP file contains the following files:

  • cert.pem
  • chain.pem
  • privkey.pem

NGINX configuration

  1. Concatenate cert.pem and chain.pem, in that order, into a cert-with-chain.pem (or fullchain.pem) file

  2. Copy cert-with-chain.pem and privkey.pem into the NGINX conf.d folder

  3. Check that the NGINX configuration references them as below

ssl_certificate     conf.d/cert-with-chain.pem;
ssl_certificate_key conf.d/privkey.pem;
  4. Restart NGINX
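Steps 1 and 2 can be scripted as below. This is a sketch: the source and destination directories are placeholders for wherever the ZIP was unpacked and your NGINX conf.d folder. The leaf certificate goes first and the chain second; a newline is added only if cert.pem lacks a trailing one, so the two PEM blocks never end up on the same line.

```bash
#!/usr/bin/env bash
# Sketch: build cert-with-chain.pem from a Synology export and copy the key.
install_synology_cert() {
    local src="$1" dest="$2"
    {
        cat "$src/cert.pem"
        # add a newline if cert.pem does not end with one
        if [ -n "$(tail -c1 "$src/cert.pem")" ]; then echo; fi
        cat "$src/chain.pem"
    } > "$dest/cert-with-chain.pem"
    cp "$src/privkey.pem" "$dest/privkey.pem"
    chmod 600 "$dest/privkey.pem"   # keep the private key readable by owner only
}

# Afterwards check the configuration before restarting, e.g. nginx -t
```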

Verification

Browser

The certificate information window should show the issue date of the new certificate.

Command line

The following command can be used for verification:

openssl s_client -connect <domain_name>:<port>

If the following errors appear, the server is sending only its own certificate without the intermediates. Concatenate chain.pem into cert.pem, because the full chain is required.

verify error:num=20:unable to get local issuer certificate
verify error:num=21:unable to verify the first certificate
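A quick offline check for this case is to count the certificates in the file NGINX serves: a full chain holds the server certificate plus at least one intermediate, so a count of 1 points to the errors above. A minimal sketch; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count certificates in a PEM file and flag a missing chain.
count_pem_certs() {
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true"
    # keeps the function from failing in that case
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1" || true
}

check_fullchain() {
    local n
    n="$(count_pem_certs "$1")"
    if [ "$n" -ge 2 ]; then
        echo "ok: $n certificates in $1"
    else
        echo "incomplete chain: $n certificate(s) in $1" >&2
        return 1
    fi
}

# Usage: check_fullchain conf.d/cert-with-chain.pem
```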

Using certbot to apply for a Let's Encrypt certificate

To use the NGINX plugin, certbot needs either its own NGINX server or permission to modify the existing NGINX configuration.

Steps

Preparation

  • Shut down any application listening on ports 80 and 443
docker stop nginx
  • Install the required packages

Note: skip this step if the packages are already installed.

apt install certbot
apt install python3-certbot-nginx
  • Request the certificate

Note: there is no need to start the nginx service; certbot starts it automatically.

certbot certonly --nginx -d <domain1> -d <domain2> -d <domain3>
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Certificate not yet due for renewal

You have an existing certificate that has exactly the same domains or certificate name you requested and isn't close to expiry.
(ref: /etc/letsencrypt/renewal/<domain1>.conf)

What would you like to do?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: Attempt to reinstall this existing certificate
2: Renew & replace the certificate (may be subject to CA rate limits)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 2
Renewing an existing certificate for <domain1> and <domain2>

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/<domain1>/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/<domain1>/privkey.pem
This certificate expires on 2023-05-11.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.

Deploying certificate
Successfully deployed certificate for <domain1> to /etc/nginx/sites-enabled/default
Successfully deployed certificate for <domain2> to /etc/nginx/sites-enabled/default
Your existing certificate has been successfully renewed, and the new certificate has been installed.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
 * Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
 * Donating to EFF:                    https://eff.org/donate-le
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  • Certificate location

The certificate files are in the following directory:

ls /etc/letsencrypt/live/<domain1>/
  • Stop the host nginx service started by certbot
systemctl stop nginx
systemctl disable nginx
  • Set up the Docker certificates

Copy privkey.pem and fullchain.pem into the Docker container's configuration directory.
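The files under /etc/letsencrypt/live/<domain>/ are symlinks into ../archive/, so a plain copy into a Docker volume can leave dangling links. The sketch below uses cp -L to copy the real file contents; the destination directory is a placeholder for your container's mounted configuration directory.

```bash
#!/usr/bin/env bash
# Sketch: copy certbot's live files (symlinks) as regular files into a
# Docker configuration directory.
copy_le_certs() {
    local live_dir="$1" dest="$2"
    mkdir -p "$dest"
    # -L: follow the symlinks and copy the files they point to
    cp -L "$live_dir/fullchain.pem" "$live_dir/privkey.pem" "$dest/"
    chmod 600 "$dest/privkey.pem"
}

# Usage: copy_le_certs /etc/letsencrypt/live/<domain1> /path/to/nginx/conf.d
```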

Troubleshooting

All domains on the command line must resolve to the host running certbot and be reachable on ports 80 and 443; otherwise the certificate cannot be created.

Another way

Running certbot in Docker could be better: no additional packages need to be installed on the host, and certbot can be stopped with a docker command.
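A minimal sketch of that approach, using the official certbot/certbot image in standalone mode. Assumptions: port 80 is free (stop the NGINX container first), the HTTP-01 challenge is used, and certificates are kept in /etc/letsencrypt on the host. DRY_RUN is a hypothetical switch that only prints the command by default.

```bash
#!/usr/bin/env bash
# Sketch: request a certificate with the certbot Docker image (standalone).
certbot_docker() {
    local -a args=(docker run -it --rm --name certbot -p 80:80
        -v /etc/letsencrypt:/etc/letsencrypt
        -v /var/lib/letsencrypt:/var/lib/letsencrypt
        certbot/certbot certonly --standalone)
    local d
    for d in "$@"; do
        args+=(-d "$d")   # one -d per domain
    done
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${args[*]}"
    else
        "${args[@]}"
    fi
}

# Usage: DRY_RUN=0 certbot_docker <domain1> <domain2>
```

The container is removed when it exits (--rm), and the issued files land in /etc/letsencrypt/live/ on the host as with the packaged certbot.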

Fix Synology `Allocation Status` Crashed Error

I use JBOD with checksums turned on for my backup volume, because I don't expect to lose both the source data and the backup at the same time. A problem on one disk in a JBOD volume can crash the whole volume, which then becomes read-only. On closer inspection, only one disk showed Allocation Status as Crashed, while its Health Status was Healthy.

In the past, because the faulty volume was read-only, I had to create new folders with new names, copy all data into them, rebuild the disk array, and move the data back to the newly created volume. This also required reconfiguring permissions and services such as NFS, Time Machine and rsync, and could take days to complete.

This time, I tried to recover the volume using a few commands.

Steps

Recreate Array

  • Log in to the Synology command line as root

  • Find the array

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
      1943862912 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md12 : active raid5 sdjc7[5] sdjb7[6] sdjd7[3] sdja7[7] sdje7[8]
      1953467648 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md9 : active raid5 sdjc6[9] sdjb6[8] sdja6[6] sdjd6[7] sdje6[5]
      703225088 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md6 : active raid5 sdjc5[6] sdjd5[5] sdjb5[9] sdja5[8] sdje5[7]
      1230960384 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md4 : active linear sdg3[0] sdh3[2](E) sdf3[1]
      2915921472 blocks super 1.2 64k rounding [3/3] [UUE]

md10 : active raid5 sdja8[2] sdje8[3] sdjc8[4]
      1953485824 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md7 : active raid5 sdib6[4] sdie6[5] sdic6[3] sdia6[2] sdid6[1]
      3906971648 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md3 : active raid5 sdie5[5] sdia5[4] sdid5[3] sdib5[7] sdic5[6]
      7794733824 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md8 : active raid5 sdie7[0] sdib7[3] sdic7[2] sdia7[1]
      2930228736 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sdh2[5] sdg2[4] sdf2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [8/6] [UUUUUU__]

md0 : active raid1 sdh1[3] sdg1[4] sdf1[2] sda1[0] sdb1[1] sdc1[6]
      2490176 blocks [8/6] [UUUUU_U_]

unused devices: <none>
  • Collect RAID info
# mdadm --examine /dev/sdh3
/dev/sdh3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6783225a:318612f7:3473d58a:09a977b2
           Name : ds1812:4  (local to host ds1812)
  Creation Time : Wed Dec 28 07:04:52 2022
     Raid Level : linear
   Raid Devices : 3

 Avail Dev Size : 3897584768 (1858.51 GiB 1995.56 GB)
  Used Dev Size : 0
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=65 sectors
          State : clean
    Device UUID : 14704640:a5536257:40c4ae47:2f008c53

    Update Time : Sat Jan 21 00:36:29 2023
       Checksum : 8685d50c - correct
         Events : 5

       Rounding : 64K

   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
root@ds1812:~#
  • Unmount the volume; if that fails, use the force and kill options
# umount -f -k /volume3
  • Stop the array
# mdadm --stop /dev/md4
  • Recreate the array with the original member order (by Device Role) and Array UUID, and answer y to the prompt
# mdadm --create --force /dev/md4 --metadata=1.2 --raid-devices=3 --level=linear /dev/sdg3 /dev/sdf3 /dev/sdh3 --uuid=6783225a:318612f7:3473d58a:09a977b2
mdadm: ... appears to be part of a raid array:
       ...
Continue creating array? y
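The member order and UUID passed to mdadm --create come from the mdadm --examine output: for a linear array the members must be listed by their Device Role (0, 1, 2), which here matches sdg3[0] sdf3[1] sdh3[2] in /proc/mdstat. A small sketch for pulling those fields out of the examine output; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: extract fields needed by "mdadm --create" from "mdadm --examine"
# output read on stdin.
md_uuid() {   # prints the Array UUID
    awk -F' : ' '/Array UUID/ { print $2 }'
}
md_role() {   # prints N from "Device Role : Active device N"
    awk '/Device Role/ { print $NF }'
}

# Example (as root, on the real members): list them in role order
#   for d in /dev/sdf3 /dev/sdg3 /dev/sdh3; do
#       echo "$(mdadm --examine "$d" | md_role) $d"
#   done | sort -n
```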

The array is now recreated and should be in the correct state:

# cat /proc/mdstat

Check the filesystem and mount it again

The filesystem is btrfs, so use the following command to check it (an exit status of 0 means no errors were found):

# btrfsck /dev/md4
Syno caseless feature on.
Checking filesystem on /dev/md4
UUID: 7a3a3941-e0c4-4505-8981-d309fb9482a5
checking extents
checking free space tree
checking fs roots
checking csums
checking root refs
found 2037124587520 bytes used err is 0
total csum bytes: 1986978456
total tree bytes: 2458648576
total fs tree bytes: 62947328
total extent tree bytes: 50741248
btree space waste bytes: 294577149
file data blocks allocated: 6689106694144
 referenced 1995731652608
root@ds1812:/# echo $?
0

Mount the filesystem again; the Synology error beep should now stop.

mount /volume3

Shell command to remove `(1)` from filename

To compare a large number of files with (1) in the file name against the originals without (1), such as ABCD(1).txt and ABCD.txt, the following commands can be used. Note that these are independent commands, not sequential steps.

Use bash substring

  • Find all *(1)* files that have an original file in the same folder.
find . -name "*\(1\)*" | while IFS= read -r line
do
    if test -e "${line/(1)/}"; then
        echo "$line"
    fi
done

They can then be cleaned up one by one.

  • Move them to another directory

  • Rename them to the original name when no original file exists in the same folder

find . -name "*\(1\)*" | while IFS= read -r line
do
    if test ! -e "${line/(1)/}"; then
        mv -- "$line" "${line/(1)/}"
    fi
done
  • Compare them with the original files in the same folder

Note: this method only works when the original file name does not itself contain (1).
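For the comparison bullet, the sketch below lists every *(1)* file whose content is byte-for-byte identical to the original in the same folder, i.e. the copies that are safe to delete. It uses cmp -s, which prints nothing and only sets the exit status; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list "name(1).ext" files identical to "name.ext" in the same folder.
list_identical_copies() {
    local root="${1:-.}" dup orig
    find "$root" -type f -name '*(1)*' | while IFS= read -r dup; do
        orig="${dup/(1)/}"   # drop the first "(1)", as in the commands above
        if [ -f "$orig" ] && cmp -s -- "$dup" "$orig"; then
            printf '%s\n' "$dup"
        fi
    done
}

# Usage: list_identical_copies . > identical.txt
```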

Use sed

The following sample script can be used for the same task. Uncomment the pattern you need; \s requires GNU sed.

#!/bin/bash

find . -type f | while IFS= read -r line
do
        dname="$(dirname -- "$line")"
        bname="$(basename -- "$line")"
        # pattern='s/\(([0-9])\)\./\1/'         # remove "." if match "(1).", \1 == ([0-9])
        # pattern='s/(\([0-9]\))\./\1/'         # remove "(", ")" and "." if match "(1).", \1 == [0-9]
        # pattern='s/([0-9]).//'                # remove "(1)"+any_char
        # pattern='s/[0-9]\.//'                 # remove a digit followed by "."
        # pattern='s/([0-9])\././'              # remove "(1)" before "."
        pattern='s/\s*([0-9])\././'             # remove any spaces+"(1)" before "."
        # pattern='s/\s*\././'                  # remove any spaces before "."
        # pattern='s/^\./11./'                  # add "11" in front if it starts with "."
        # pattern='s/^01\./10./'                # replace starting "01." with "10."
        # pattern='s/^0\([2-9]\)\./1\1./'       # replace starting "0N." with "1N." (N = 2-9)
        nname="$(printf '%s\n' "$bname" | sed -e "$pattern")"
        # echo "$bname"; echo "$nname"

        # Rename only if the name changed and the target does not exist in the same folder
        if [ "$nname" != "$bname" ] && [ ! -e "$dname/$nname" ]; then
                echo "$bname"; echo "$nname"
                mv -- "$dname/$bname" "$dname/$nname"
        fi
done

Use vim

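  • Use the following command to generate a list of mv commands
find . -name "*(1).*" -exec echo mv ~{}~ ~{}~ \; > list
  • Use vim to edit the file
vi list
  • Remove the last (1) on each line; the greedy .* together with \zs (which sets the start of the match) makes the match land on the last (1)
:%s/.*\zs(1)//
  • Replace each ~ with ", then save the file
:%s/\~/"/g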
  • Run the script
sh list
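The ~ placeholder trick breaks on names that contain ~, " or $. A pure-bash alternative builds the same list with printf %q quoting and removes only the last (1) with parameter expansion. This is a sketch; mv -n is an added safety choice so an existing file is never overwritten, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: write one "mv" line per *(1)* file, quoted with printf %q
# (a bash feature), removing only the LAST "(1)" from the name.
make_mv_list() {
    local root="${1:-.}" tag='(1)' f before after
    find "$root" -type f -name '*(1)*' | while IFS= read -r f; do
        before="${f%"$tag"*}"    # everything before the last "(1)"
        after="${f##*"$tag"}"    # everything after the last "(1)"
        # mv -n: never overwrite a file that already has the target name
        printf 'mv -n -- %q %q\n' "$f" "$before$after"
    done
}

# Usage: make_mv_list . > list, review the list, then run: bash list
```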

Query RAM Type in Windows 11

To query the RAM type of each slot, run the following command:

wmic memorychip get
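wmic reports the memory type as a number. Requesting `wmic memorychip get DeviceLocator,Speed,SMBIOSMemoryType` narrows the output, and the sketch below translates the SMBIOSMemoryType code using common values from the SMBIOS memory device type table (for example 26 = DDR4, 34 = DDR5). It assumes the values are checked in a POSIX-like shell such as Git Bash or WSL; only frequent codes are listed and anything else prints other(<code>).

```bash
#!/usr/bin/env bash
# Sketch: map an SMBIOSMemoryType code to a memory type name.
smbios_mem_type() {
    case "$1" in
        18) echo DDR ;;
        19) echo DDR2 ;;
        24) echo DDR3 ;;
        26) echo DDR4 ;;
        29) echo LPDDR3 ;;
        30) echo LPDDR4 ;;
        34) echo DDR5 ;;
        35) echo LPDDR5 ;;
        *)  echo "other($1)" ;;
    esac
}

# Usage: smbios_mem_type 26
```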

Note: RAM rated for a higher clock speed normally works in a computer that only supports a lower speed; it simply runs at the lower speed.

Unable to query DNS with DOMAIN from `dnsmasq` server

When running nslookup, the dnsmasq server could not answer queries for the name with the domain appended, only for the short host name. The following message may appear:

# nslookup www
....
dnsmasq server can't find www.example.com: NXDOMAIN

Solution

The reason is that the entries in the dnsmasq hosts file (banner_add_hosts by default) have no domain name:

192.168.1.1  www

In the dnsmasq.conf file

The following lines are required. The expand-hosts option appends the domain set on the domain line to the short host names in the hosts file, so www is also answered as www.example.com.

domain=example.com,192.168.1.0/24
expand-hosts
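To apply this without duplicating lines on repeated runs, the lines can be appended only when missing. A minimal sketch; the file path and domain are the examples from above, the helper name is hypothetical, and dnsmasq still has to be restarted afterwards (how depends on the router or host).

```bash
#!/usr/bin/env bash
# Sketch: append a line to a config file only if it is not already present.
ensure_line() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null && return 0
    # make sure the file ends with a newline before appending
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   ensure_line /etc/dnsmasq.conf 'domain=example.com,192.168.1.0/24'
#   ensure_line /etc/dnsmasq.conf 'expand-hosts'
```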

Move MicroSD boot proxmox to eMMC

Steps

  • Manually duplicate the partition layout from the MicroSD card to the eMMC using fdisk; skip the BIOS boot partition because the EFI partition is used.
  • Unmount the old /boot/efi partition, then copy the EFI partition from the MicroSD card with dd; this keeps the filesystem UUID
  • Create a PV on the eMMC data partition and add it to the pve VG
  • Move all data from the old MicroSD partition to the eMMC partition
    pvmove /dev/<MicroSD partition>
  • Check the layout and UUIDs with the following command
    lsblk -o +UUID
  • Remove the MicroSD PV from the pve VG using vgreduce, then run pvremove on the MicroSD partition
  • Mount the new /boot/efi partition, then run grub-install to recreate the grub.cfg file
  • Remove the MicroSD card from the system, then reboot
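The LVM and EFI part of the steps above can be sketched as a script. The device names are HYPOTHETICAL (check yours with lsblk first), the eMMC partitions are assumed to already exist with at least the same sizes, and DRY_RUN is a hypothetical switch that only prints the commands by default.

```bash
#!/usr/bin/env bash
# Sketch of the MicroSD -> eMMC move. Device names are examples only.
SD_EFI="${SD_EFI:-/dev/mmcblk1p2}"      # old EFI partition on the MicroSD
SD_PV="${SD_PV:-/dev/mmcblk1p3}"        # old LVM partition on the MicroSD
EMMC_EFI="${EMMC_EFI:-/dev/mmcblk0p2}"  # new EFI partition on the eMMC
EMMC_PV="${EMMC_PV:-/dev/mmcblk0p3}"    # new LVM partition on the eMMC

run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

migrate_to_emmc() {
    run umount /boot/efi
    run dd if="$SD_EFI" of="$EMMC_EFI" bs=1M   # block copy keeps the filesystem UUID
    run pvcreate "$EMMC_PV"
    run vgextend pve "$EMMC_PV"
    run pvmove "$SD_PV"                         # move all extents off the MicroSD
    run lsblk -o +UUID                          # check layout and UUIDs
    run vgreduce pve "$SD_PV"
    run pvremove "$SD_PV"
    run mount /boot/efi
    run grub-install
}

# Usage: review "migrate_to_emmc", then run "DRY_RUN=0 migrate_to_emmc" as root
```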
