Month: October 2021

The most insane issue with TrueNAS

The most insane issue with TrueNAS

This morning, I saw the login screen of my TrueNAS, so decided to have a look. After login, the TrueNAS rebooted...

This is really a design issue, both shutdown and reboot are not well designed, the URL can be reused without any warning prompt.

In fact, I knew this issue, but only careful enough just after reboot or shutdown performed. After yesterday's reboot, I didn't try to login using UI URL.

Although I careful enough, this issue leads me avoid using Back button of browser, because the URL can be in history.

The solution can be very easy, just change GET method to POST method in both reboot and shutdown pages with addition variable. But when will they make such change as it is already a mature product for years.

Plan install a NextCloud server at home

Plan install a NextCloud server at home

Planning to have a NextCloud server at home. There are many solutions.

Solutions

Synology

Install NextCloud as docker in Synology ds1812+ or ds2419+, but

  • Unable to use other system next time.
  • Expose all data in NAS
  • CPU and memory limited

But consider use it as storage via iSCSI.

TrueNAS

Install NextCloud on TrueNAS is officially supported. It is installed as docker container in TrueNAS, but

  • Use more power as the system running on a normal PC
  • Migration might be hard

Raspberry Pi

Install NextCloud on Raspberry Pi, a few options there

  • Install directly on OS, but will mix up with OS as well
  • Install as docker, needs to have correct structure to be maintained
  • Install as NextCloudPi, limited storage on SD card

First plan

After considered all above, first plan will be

  • Install on a Pi which has 1GB ethernet port
  • Install BerryBoot and boot from Synology iSCSI disk
  • Configure storage utilize Synology NAS

Pros

  • Separate storage and application
  • Low power
  • Easy to replace the hardware
  • Got official update regularly

Second plan

After read some documents on line, feel the raspberry pi at current stage still isn't good for NextCloud. The major issue is the CPU speed, which also caused 1GB ethernet performance drop to 40MB/s.

I think I had better use my core 2 due machine to install NextCloud. Maybe I will try it on Raspberry Pi 4 with 8GB RAM later as well.

References

NextCloud Website
BerryBoot Installation
BerryBoot iSCSI Installation
Berryboot install NextCloudPi on an external drive step by step
Nextcloud now officially supported on TrueNAS
NextCloud Plugin on TrueNAS

ZFS useful commands

ZFS useful commands

Create pool

Storage providers

Storage provides are spinning disks or SSDs.

ls -al /dev/ada?

Vdevs

Vdevs are grouping of storage providers into various RAID configurations.

RAID 0 or Stripes

Create stripes pool

zpool create OurFirstZpool ada1 ada2 ada3

RAID 1 or Mirror

Create mirror vdev and add into pool

zpool create tank mirror ada1 ada2 ada3

Create another group of mirror vdev and add into existing pool

zpool add tank mirror ada4 ada5 ada6

Detach a disk from vdev

zpool detach tank ada4

RAID-Z1, RAID-Z2 and RAID-Z3

Create RAID-Z1 vdev and add into pool

zpool create tank raidz1 ada1 ada2 ada3

Create a RAID-Z1 vdev and add into existing pool

zpool add tank raidz1 ada4 ada5 ada6

Zpools

Zpools are aggregation of vdevs into a single storage pools

Create pool

zpool create OurFirstZpool ada1 ada2 ada3

Verify pool

zpool status

Add a new disk (vdev) to increase space

zpool add OurFirstZpool ada4

Z-Filesystems

Z-Filesystems are datasets with cool features like compression and reservation.

Create dataset

zfs create OurFirstZpool/dataset1

List dataset

zfs list

Zvols

ZFS Concept

ZFS Concept

Pool

ZFS pool (Zpool) is a collection of one or more virtual devices (vdevs), vdev is a group of physical disks. They have following facts.

  • The redundancy level for vdevs can be a single drive, mirror, RAID-Z1, RAID-Z2, and RAID-Z3.
  • After creating a Zpool, it may not be possible to add additional disks to the vdev except mirrors.
  • Add additional vdevs to expand the Zpool is possible.
  • The storage space allocated to the Zpool cannot be decreased.
  • The drives in vdevs that are parts of the Zpool can be exchanged.

If there is a need to change the layout of the Zpool, the data should be backed up and the Zpool destroyed.

Datasets

Datasets is the space emulating a regular file system.

Datasets can be nested, which can possess different settings for snapshots, compression, deduplication and so on.

Volumes

Volumes (zvols) is the space emulating a block devices.

Data Integrity

No overwritten

The copy-on-write mechanism is to keep old data on the disk.

Checksum

Checksum information is written when data is written into disk, then verified when read data from disk. When checksum mismatch detected, use redundant data is used for correction.

Different checksum algorithms are used

  • Fletcher-based checksum
  • SHA-256 hash

ZFS RAID

  • Single - Zpool has a vdev consisting of a single disk, similar to RAID0.
  • Mirror – similar to RAID1.
  • RAIDZ1 – similar to RAID5 but without the write hole issue.
  • RAIDZ2 – similar to RAID6, with 2 disks redundancy.
  • RAIDZ3 – similar to RAID6, with 3 disks redundancy.

RAID write hole in a RAID5/RAID1 occurs when one of the member disks doesn't match the others and by the nature of single-redundant RAID5/RAID1 it is impossible to tell which of the disks is bad.

Errors

Checksum mismatch

ZFS is a self-healing system. If mismatched checksum is detected, ZFS tries to retrieve the data from other disks. If data correct, the system will amend the incorrect data and checksum.

Disk failure

If a disk in a Zpool fails, the pool is set to the degraded state, then data on the failed device is calculated and written to first the spare disk replaces the failed one. This is called resilvering. Once the restoration operation is complete, the status of the Zpool changes back to online. In case of when multiple disks have failed and if there are not enough redundant devices, the Zpool changes its state into unavailable.

Migrate to different system

In old system, export zpool, which unmounts Zpool’s datasets or zvols.

In new system, import zpool, which mount Zpool's datasets or zvols.

Maintenance

Scrubbing

The scrubbing is consistency check operation, and try to repair corrupted data.

No defragmentation

There is no online defragmentation in ZFS, so try to keep zpools below 70% utilization instead.

Copy-on-write

On ZFS, the data changes are stored on a different location than the original location on a disk and then the metadata is updated in that place on the disk. This mechanism guarantees that the old data is safely preserved in case of power loss or system crash that in other cases would result in loss of data.

Snapshots

The snapshot contains information about the original version of the file system to be retained. Snapshots do not require additional disk space within the pool. Once the data rendered in a snapshot is modified, the snapshot will take the disk space since it will now be pointing to the old data.

Clones

The clone is a writeable version of a snapshot. Overwriting the blocks in the cloned volume or file system results in decrementing the reference count on the previous block. The original snapshot that the clone is depending on, can not be deleted.

Rollback

Rollback command is to go back to a previous version of a dataset or a volume. Note that the rollback command cannot revert changes from other snapshots than the most recent one. If to do so, all intermediate snapshots will be automatically destroyed.

Promote

Promote command is to replace an existing volume with its clone.

References

ZFS Essentials – What is pooled storage?
ZFS Essentials – Copy-on-write & snapshots
ZFS Essentials – Data integrity & RAIDZ
RAID Recovery Guide

Disable Copy-On-Write on BTRFS

Disable Copy-On-Write on BTRFS

The issue of COW (copy on write), is fragmentation, because it always write new device block. This is good for SSD, but not good on traditional devices. Even on SSD, if the block size is big, the data to be write would be much larger than actual updated data size. Because of this issue, recommented disable Copy-On-Write on database and VM filesystems.

Methods

Mounting

Disable it by mounting with nodatacow.

Following facts to be considered

  • This implies nodatasum as well
  • COW may still happen if a snapshot is taken
  • COW will still be maintained for existing files
  • COW status can be modified only for empty or newly created files.

File Attribute

For an empty file, add the NOCOW file attribute (use chattr utility with +C)

touch file1
chattr +C file1

For a directory with NOCOW attribute set, new files in it will inherit this attribute.

chattr +C directory1

For old files, copy the original data into the pre-created file, delete original and rename back.

touch vm-image.raw
chattr +C vm-image.raw
fallocate -l10g vm-image.raw

Subvolume (Untested)

Subvolume can not be set nocow separately. This is official answer.

But the files created inherit the attributes from directory, if separately mount subvolume on the directory which has nocow attribute, then the newly created files will inherit nocow attribute as well, regardless of the original volume.

Create directory

mkdir /var/lib/nocow
chattr +C /var/lib/nocow

Create subvolume

mount -o autodefrag,compress=lzo,noatime,space_cache /dev/mapper/zpool1 /mnt/zpool1
btrfs subvolume create /mnt/zpool1/nocow

Mount subvolume

/dev/mapper/zpool1     /var/lib/nocow  btrfs       rw,noatime,compress=lzo,space_cache,autodefrag,subvol=nocow  0 0

Drawback

No checksum, no integrity.

Nodatacow bypasses the very mechanisms that are meant to provide consistency in the filesystem, because the CoW operations are achieved by constructing a completely new metadata tree containing both changes (references to the data, and the csum metadata), and then atomically changing the superblock to point to the new tree.

With nodatacow, writing data and checksum on the physical medium, cause two writes separately. This could cause the data and the checksum mismatch due to I/O error, file corruption could happen.

References

BTRFS FAQ
Setting up a btrfs subvolume with noCOW

ZFS cache and log

ZFS cache and log

There are two kinds of cache, read cache and write cache.

Read cache

Called ARC and L2ARC.

ARC (Adaptive Replacement Cache)

In memory, caching the information that would require in the near future, while discarding the ones that will be needed furthest ahead in time.

This can be set using kernel/module parameter, such as zfs_arc_max.

L2ARC (Level 2 ARC)

In cache device, extension of ARC. Can be created using following command

zpool add tank cache ada3

Note: tank is the pool name, ada3 is the block device used for caching

Write cache

Called ZIL (ZFS Intent Log).

Asynchronous

By default, ZFS will cache write data in memory before write to disk, this is called asynchronous mode.

Synchronous

Synchronous will make sure data written to disk before continue, this can be set using following command

zfs set sync=always mypool/dataset1

ZFS Intent Log (ZIL)

This is the temporary space to store data before write into main disks, this can be used to speed up write operation. The write operation is considered as completed once data written into ZIL device, which is called SLOG (Separate Intent Log) devices, can be defined as follow

zpool add tank log ada3

Note: tank is the pool name, ada3 is the block device used for slog

If worrying SLOG device faulty, it can be mirrored too.

zpool add tank log mirror ada3 ada4

References

Configuring ZFS Cache for High Speed IO
ZFS Performance with Databases (Cached)

Snapshot and Copy on Write

Snapshot and Copy on Write

Two type of snapshots, they are using different ways.

Keep original data

No write to original data, all new data will be in delta file.

VMware

All new data will be in delta file, the original disk file will not be changed. This is very usefull especially in VDI environment, all VDI servers will base on same images and no impact to original disk file.

When deleting a snapshot, the snapshot files are consolidated and written to the parent snapshot disk. If parent is base disk, and all the data from the delta disk will be merged with the virtual machine base disk.

QEMU / KVM: COW mode

COW mode is available on some formats of virtual machine disk as QCOW2. When using the COW mode, no changes are applied to the disk image. All changes are recorded in a separate file preserving the original image. Several COW files can point to the same image to test several configurations simultaneously without jeopardizing the basic system.

QEMU / KVM allows to incorporate changes from a COW file to the original image

Overwrite original disk

Write latest data into original disk, and the original data move to delta disk.

RedHat LVM snapshot

The data in snapshot volume is original data.

ZFS and btrfs

Due to native copy-on-write feature, file usage reference structure always points to new data, and the old data is saved in old reference structure.

Compare

Original Disk Pros Cons
Keep * Support multiple childs without too much performance overhead Deleting snapshot takes time
When disk full, no more write can be done
Overwrite Less overhead - only when first time writing data on new location
Reverting snapshot takes time
* Deleting snapshot fast
Native COW No overhead
Fast dropping snapshot
Fast reverting snapshot
No impact to service when snapshot full, but snapshot corrupt
Unable to delete file when disk full
Disk fragmented easily
* Unable to cache disk write operation

References

Deleting Snapshots
QEMU / KVM: Using the Copy-On-Write mode
Why would I want to disable Copy-On-Write while creating QEMU Images?

Error of txg_sync blocked for more than 120 seconds

Error of txg_sync blocked for more than 120 seconds

Following error was appearing in my dmesg monitoring screen.

txg_sync blocked for more than 120 seconds --> excessive load

If I'm not wrong, it could be caused by slow harddisk speed, because the TrueNAS zfs cache is about 61GB, can take longer time to flush back to hard disk.

Same as other filesystem, zfs has writeback caching (aka write-behind caching), which will flush data back to hard disk in specific interval. zfs has synchronous and asynchronous mode, they are a bit different that readonly, writethrough and writeback mode.

Except above, zfs has different behaviors on copy on write (COW) as below.

  • Always write to new block due to copy on write
  • Big file for random writing, such as VM disk file, can be fragmented
  • Can not reduce the write operation even if keep writing same block

Therefore, copy on write should be disabled for VM images. But if so, snapshot function could be lost.

Reference

Read-Through, Write-Through, Write-Behind Caching and Refresh-Ahead

Change max arc size on TrueNAS SCALE

Change max arc size on TrueNAS SCALE

After upgrade memory to 64GB, the memory usage is less than 32GB even run two VMs together. To utilize all memory, increase zfs cache size is one of the solution can be done.

c_max

The max arc size is defined as a module parameter, which can be viewed by following command

truenas# grep c_max /proc/spl/kstat/zfs/arcstats
c_max                           4    62277025792
truenas# cat /sys/module/zfs/parameters/zfs_arc_max
62277025792
truenas#

To justify this value, following command can be used, but it is not a persistent way.

echo 60129542144 > /sys/module/zfs/parameters/zfs_arc_max

Suggestion from others

Many suggestions can be found, some of them maybe workable, for example

Create module option file

echo "options zfs zfs_arc_max=34359738368" > /etc/modprobe.d/zfs.conf

But they may not suitable for a NAS OS which can not be backed up using configuration backup provided by NAS OS.

  • The upgrade of OS can simply overwrite or delete the file
  • The file can be lost during OS rebuilting process.

Update sysctl (not workable)

Suggestion is update vfs.zfs.arc_max using sysctl, along with disable autotune, but it is only workable for kernel parameters, but no zfs parameters could be found, the zfs is loaded as module.

Implemenation

The parameter needs to be modified using TrueNAS web interface, to ensure that it will be saved during configuration export via System Settings => General => Manage Configuration => Download File.

So, following command is added into System Settings => Advanced => Init/Shutdown Scripts with When set to Post Init

echo 60129542144 > /sys/module/zfs/parameters/zfs_arc_max

Verification

Verify the setting as below.

arc_summary | grep size

Note: The number is in bytes

Reduce the number

In order to reduce the number without reboot, following command needs to be executed to reduce the cache immediately

echo 3 > /proc/sys/vm/drop_caches

References

Why I cannot modify "vfs.zfs.arc_max" in WebUI?
QEMU / KVM: Using the Copy-On-Write mode

Ubuntu grub-efi-amd64-signed error after do-release-upgrade

Ubuntu grub-efi-amd64-signed error after do-release-upgrade

Following error occurred whenever run apt upgrade after perform do-release-upgrade

dpkg: error processing package grub-efi-amd64-signed (–configure):
installed grub-efi-amd64-signed package post-installation script subprocess returned error exit status 32

Solution

Reinstall all grub group packages using following commands

sudo apt-get purge grub\*
sudo apt-get install grub-efi
sudo apt-get autoremove
sudo update-grub