Category: zfs

List zfs Filesystems By Creation Date

List zfs Filesystems By Creation Date

An Ubuntu system using zfs as the OS filesystem accumulates many snapshots. To remove the old ones, first list them by creation date using the following command

zfs list -H -t snapshot -o name -S creation

To remove the old snapshots, for example the oldest 18, pipe the list into zfs destroy using the following command

zfs list -H -t snapshot -o name -S creation | tail -18 | xargs -n 1 zfs destroy

References

How to delete all but last [n] ZFS snapshots?

Resize bpool on ubuntu VM with zfs

Resize bpool on ubuntu VM with zfs

I got two kinds of disk space issues on bpool.

  • apt upgrade could not take a snapshot
ERROR couldn't save system state: Minimum free space to take a snapshot and preserve ZFS performance is 20%.
Free space on pool "bpool" is 19%.
  • do-release-upgrade could not be performed

Steps

  • Add iSCSI LUN

  • Change grub configuration

  • Partition iSCSI LUN

  • Attach the new partitions into the zpools (see the attach/detach sketch after this list)

  • Detach the old partitions from the zpools

  • Repartition the rpool and bpool partitions on the old disk

  • Add the repartitioned partitions back to rpool and bpool

  • Run update-grub2

  • Detach iSCSI rpool and bpool

  • Run the following command to set autoexpand

zpool set autoexpand=on bpool
  • Run partprobe or zpool online
zpool online -e bpool <partition_id>
  • Set autoexpand off
zpool set autoexpand=off bpool
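
A minimal sketch of the attach and detach steps above, where <old_bpool_partition>, <old_rpool_partition>, <iscsi_bpool_partition> and <iscsi_rpool_partition> are placeholders for the actual partition ids:

# Mirror each pool onto the new iSCSI partition
zpool attach bpool <old_bpool_partition> <iscsi_bpool_partition>
zpool attach rpool <old_rpool_partition> <iscsi_rpool_partition>
# Wait until resilvering has finished before detaching anything
zpool status
# Drop the old partitions so the old disk can be repartitioned
zpool detach bpool <old_bpool_partition>
zpool detach rpool <old_rpool_partition>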

Troubleshooting

Removed local boot partition

I also got an unable-to-boot error after removing the local bpool partition, because grub could not find the BOOT filesystem, which was now on the iSCSI LUN.

To fix this issue, use the following steps (a command sketch follows the list)

  • Boot from CDROM
  • Install open-iscsi package
  • Add iSCSI LUN
  • Use zpool import bpool to import bpool from iSCSI
  • Attach local boot partition back to bpool again
  • Reboot
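
A hedged command sketch of these steps, where <portal_ip> and the partition ids are placeholders:

# In the live CD environment
apt install open-iscsi
# Discover and log in to the iSCSI target
iscsiadm -m discovery -t sendtargets -p <portal_ip>
iscsiadm -m node --login
# Import the boot pool from the iSCSI LUN (-f may be needed if it was not cleanly exported)
zpool import -f bpool
# Re-attach the local boot partition as a mirror of the iSCSI one
zpool attach bpool <iscsi_bpool_partition> <local_bpool_partition>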

Used sfdisk to copy the partition table

Copying the partition table with sfdisk created an issue: after the second iSCSI LUN was added, the two partitions had the same blkid.
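
One possible way to avoid the duplicate identifiers (an assumption, not verified on this setup) is to randomize the GUIDs on the copied disk with sgdisk, where /dev/sdX is a placeholder for the copied disk:

# Randomize the disk GUID and all partition GUIDs of the copied disk
sgdisk -G /dev/sdX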

References

HOWTO replace zfs bpool and rpool with larger disk - Ubuntu 20.04 (Virtualbox)
ZFS on Linux resize rpool

ZFS useful commands

ZFS useful commands

Create pool

Storage providers

Storage providers are spinning disks or SSDs.

ls -al /dev/ada?

Vdevs

Vdevs are groupings of storage providers into various RAID configurations.

RAID 0 or Stripes

Create a striped pool

zpool create OurFirstZpool ada1 ada2 ada3

RAID 1 or Mirror

Create a pool with a mirror vdev

zpool create tank mirror ada1 ada2 ada3

Create another mirror vdev and add it to the existing pool

zpool add tank mirror ada4 ada5 ada6

Detach a disk from vdev

zpool detach tank ada4

RAID-Z1, RAID-Z2 and RAID-Z3

Create a pool with a RAID-Z1 vdev

zpool create tank raidz1 ada1 ada2 ada3

Create another RAID-Z1 vdev and add it to the existing pool

zpool add tank raidz1 ada4 ada5 ada6

Zpools

Zpools are aggregations of vdevs into a single storage pool.

Create pool

zpool create OurFirstZpool ada1 ada2 ada3

Verify pool

zpool status

Add a new disk (vdev) to increase space

zpool add OurFirstZpool ada4

Z-Filesystems

Z-Filesystems are datasets with cool features like compression and reservation.

Create dataset

zfs create OurFirstZpool/dataset1

List dataset

zfs list
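
Since datasets carry per-dataset properties such as compression and reservation, they can be set and checked as below; the lz4 algorithm and 10G reservation are only example values:

# Enable compression and reserve space for the dataset created above
zfs set compression=lz4 OurFirstZpool/dataset1
zfs set reservation=10G OurFirstZpool/dataset1
# Check the resulting properties
zfs get compression,reservation OurFirstZpool/dataset1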

Zvols
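
Zvols are datasets that emulate block devices (see Volumes in the ZFS Concept section below). A minimal sketch, where the 10G size and the name vol1 are only examples:

zfs create -V 10G OurFirstZpool/vol1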

Change max arc size on TrueNAS SCALE

Change max arc size on TrueNAS SCALE

After upgrading the memory to 64GB, memory usage stayed below 32GB even with two VMs running together. To utilize all the memory, increasing the zfs cache size is one solution that can be done.

c_max

The max ARC size is defined as a module parameter, which can be viewed using the following commands

truenas# grep c_max /proc/spl/kstat/zfs/arcstats
c_max                           4    62277025792
truenas# cat /sys/module/zfs/parameters/zfs_arc_max
62277025792
truenas#

To adjust this value, the following command can be used, but the change is not persistent.

echo 60129542144 > /sys/module/zfs/parameters/zfs_arc_max

Suggestions from others

Many suggestions can be found, and some of them may be workable, for example

Create module option file

echo "options zfs zfs_arc_max=34359738368" > /etc/modprobe.d/zfs.conf

But this may not be suitable for a NAS OS, because the file is not covered by the configuration backup provided by the NAS OS.

  • An OS upgrade can simply overwrite or delete the file
  • The file can be lost during an OS rebuilding process

Update sysctl (not workable)

Another suggestion is to update vfs.zfs.arc_max using sysctl and disable autotune. However, sysctl only works for kernel parameters; no zfs parameters could be found, because zfs is loaded as a module.

Implementation

The parameter needs to be modified using the TrueNAS web interface, to ensure that it is saved during configuration export via System Settings => General => Manage Configuration => Download File.

So the following command is added under System Settings => Advanced => Init/Shutdown Scripts, with When set to Post Init

echo 60129542144 > /sys/module/zfs/parameters/zfs_arc_max

Verification

Verify the setting as below.

arc_summary | grep size

Note: The number is in bytes

Reduce the number

To reduce the value without a reboot, the following command needs to be executed so that the cache is released immediately

echo 3 > /proc/sys/vm/drop_caches

References

Why I cannot modify "vfs.zfs.arc_max" in WebUI?
QEMU / KVM: Using the Copy-On-Write mode

ZFS Concept

ZFS Concept

Pool

A ZFS pool (Zpool) is a collection of one or more virtual devices (vdevs); a vdev is a group of physical disks. They have the following properties.

  • The redundancy level for a vdev can be a single drive, mirror, RAID-Z1, RAID-Z2, or RAID-Z3.
  • After creating a Zpool, it may not be possible to add additional disks to a vdev, except for mirrors.
  • Adding additional vdevs to expand the Zpool is possible.
  • The storage space allocated to the Zpool cannot be decreased.
  • The drives in the vdevs that are part of the Zpool can be replaced.

If there is a need to change the layout of the Zpool, the data should be backed up and the Zpool destroyed.

Datasets

A dataset is space that emulates a regular file system.

Datasets can be nested, and each dataset can have different settings for snapshots, compression, deduplication and so on.
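
A minimal sketch of nested datasets with different settings, where the pool and dataset names are only examples:

# Parent and child datasets
zfs create tank/home
zfs create tank/home/user1
# Each dataset can carry its own settings
zfs set compression=lz4 tank/home
zfs set compression=off tank/home/user1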

Volumes

Volumes (zvols) are space that emulates block devices.

Data Integrity

No overwriting

The copy-on-write mechanism keeps old data on the disk instead of overwriting it in place.

Checksum

Checksum information is written when data is written to disk, and verified when the data is read back. When a checksum mismatch is detected, redundant data is used for correction.

Different checksum algorithms are used (see the property example after this list)

  • Fletcher-based checksum
  • SHA-256 hash
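
The checksum algorithm is a per-dataset property; a minimal sketch, where the dataset name is an example:

# Switch the checksum algorithm for one dataset
zfs set checksum=sha256 tank/dataset1
zfs get checksum tank/dataset1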

ZFS RAID

  • Single – the Zpool has a vdev consisting of a single disk, similar to RAID0.
  • Mirror – similar to RAID1.
  • RAIDZ1 – similar to RAID5 but without the write hole issue.
  • RAIDZ2 – similar to RAID6, with 2 disks of redundancy.
  • RAIDZ3 – similar to RAID6 but with 3 disks of redundancy.

The RAID write hole in RAID5/RAID1 occurs when one of the member disks does not match the others; by the nature of single-redundancy RAID5/RAID1, it is impossible to tell which disk is bad.

Errors

Checksum mismatch

ZFS is a self-healing system. If a checksum mismatch is detected, ZFS tries to retrieve the data from the redundant disks. If that data is correct, the system repairs the incorrect data and its checksum.

Disk failure

If a disk in a Zpool fails, the pool is set to the degraded state. The data on the failed device is then recalculated and written to the spare disk that replaces the failed one; this is called resilvering. Once the restoration operation is complete, the status of the Zpool changes back to online. If multiple disks have failed and there are not enough redundant devices, the Zpool changes its state to unavailable.

Migrate to different system

On the old system, export the zpool, which unmounts the Zpool's datasets or zvols.

On the new system, import the zpool, which mounts the Zpool's datasets or zvols.
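
A minimal sketch of the export and import pair, where tank is an example pool name:

# On the old system
zpool export tank
# On the new system; running zpool import with no argument lists importable pools
zpool import tank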

Maintenance

Scrubbing

Scrubbing is a consistency check operation that also tries to repair corrupted data.
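
A scrub can be started and monitored as below, where tank is an example pool name:

zpool scrub tank
zpool status tank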

No defragmentation

There is no online defragmentation in ZFS, so try to keep zpools below 70% utilization instead.

Copy-on-write

On ZFS, changed data is written to a different location on the disk than the original data, and only then is the metadata updated to point to the new location. This mechanism guarantees that the old data is safely preserved in case of a power loss or system crash that would otherwise result in data loss.

Snapshots

A snapshot keeps the information needed to retain the original version of the file system. Snapshots initially do not require additional disk space within the pool. Once the data referenced by a snapshot is modified, the snapshot starts to consume disk space, since it still points to the old data.
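
A minimal sketch, where the dataset and snapshot names are only examples:

# Take a snapshot; it consumes no extra space until the referenced data changes
zfs snapshot tank/dataset1@before-upgrade
zfs list -t snapshot tank/dataset1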

Clones

A clone is a writable version of a snapshot. Overwriting blocks in the cloned volume or file system decrements the reference count on the previous blocks. The original snapshot that the clone depends on cannot be deleted.
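
A minimal sketch, continuing the example snapshot above:

# Create a writable clone from an existing snapshot
zfs clone tank/dataset1@before-upgrade tank/dataset1-clone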

Rollback

The rollback command reverts a dataset or a volume to a previous version. Note that the rollback command cannot revert to snapshots other than the most recent one directly; to roll back further, all intermediate snapshots will be automatically destroyed.
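
A minimal sketch, where the snapshot names are only examples; the -r option destroys the more recent snapshots when rolling back past the latest one:

# Roll back to the most recent snapshot
zfs rollback tank/dataset1@before-upgrade
# Roll back further, destroying the intermediate snapshots
zfs rollback -r tank/dataset1@older-snapshot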

Promote

The promote command is used to replace an existing volume or file system with its clone.
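
A minimal sketch, continuing the clone example above:

# Make the clone independent of its origin so the original can be replaced
zfs promote tank/dataset1-clone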

References

ZFS Essentials – What is pooled storage?
ZFS Essentials – Copy-on-write & snapshots
ZFS Essentials – Data integrity & RAIDZ
RAID Recovery Guide

ZFS cache and log

ZFS cache and log

There are two kinds of cache, read cache and write cache.

Read cache

These are called the ARC and the L2ARC.

ARC (Adaptive Replacement Cache)

The ARC is in memory; it caches the information that will be required in the near future, while discarding entries that will be needed furthest ahead in time.

Its maximum size can be set using a kernel module parameter such as zfs_arc_max.

L2ARC (Level 2 ARC)

The L2ARC is on a cache device and is an extension of the ARC. It can be created using the following command

zpool add tank cache ada3

Note: tank is the pool name, ada3 is the block device used for caching

Write cache

This is called the ZIL (ZFS Intent Log).

Asynchronous

By default, ZFS caches write data in memory before writing it to disk; this is called asynchronous mode.

Synchronous

Synchronous mode makes sure data is written to disk before continuing; it can be set using the following command

zfs set sync=always mypool/dataset1

ZFS Intent Log (ZIL)

The ZIL is temporary space used to store data before it is written to the main disks, which can speed up write operations. A write is considered complete once the data is written to the ZIL device. A dedicated ZIL device is called a SLOG (Separate Intent Log) device and can be defined as follows

zpool add tank log ada3

Note: tank is the pool name, ada3 is the block device used for slog

If a faulty SLOG device is a concern, it can be mirrored too.

zpool add tank log mirror ada3 ada4

References

Configuring ZFS Cache for High Speed IO
ZFS Performance with Databases (Cached)

Error of txg_sync blocked for more than 120 seconds

Error of txg_sync blocked for more than 120 seconds

The following error kept appearing on my dmesg monitoring screen.

txg_sync blocked for more than 120 seconds --> excessive load

If I'm not wrong, it could be caused by slow hard disk speed, because the TrueNAS zfs cache is about 61GB and can take a long time to flush back to the hard disks.

Like other filesystems, zfs has writeback caching (aka write-behind caching), which flushes data back to the hard disks at a specific interval. zfs has synchronous and asynchronous modes, which are a bit different from the read-only, writethrough and writeback modes of other caches.

Besides the above, zfs behaves differently because of copy on write (COW), as below.

  • Always writes to a new block due to copy on write
  • Big files with random writes, such as VM disk files, can become fragmented
  • The write operations cannot be reduced even when the same block is written repeatedly

Therefore, copy on write should be disabled for VM images, but doing so would lose the snapshot function.

Reference

Read-Through, Write-Through, Write-Behind Caching and Refresh-Ahead

Remove ubuntu zfs snapshots

Remove ubuntu zfs snapshots

There are so many snapshots when using zfs in ubuntu.

Issue

When I tried to do a release upgrade, I got the following error

# do-release-upgrade
...
...
Not enough free disk space 

The upgrade has aborted. The upgrade needs a total of 256 M free 
space on disk '/boot'. Please free at least an additional 91.4 M of 
disk space on '/boot'. You can remove old kernels using 'sudo apt 
autoremove' and you could also set COMPRESS=xz in 
/etc/initramfs-tools/initramfs.conf to reduce the size of your 
initramfs. 
...

This error message has occurred many times before, but those systems either had a very small /boot partition or kept many old kernels. In the first case, complete repartitioning and moving the root filesystem are required.

Space on /boot

Examining the disk space for bpool, I found that zfs reported 675MB used in bpool, but the actual usage is only 242MB.

root@ubuntu:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
bpool   960M   675M   285M        -         -    30%    70%  1.00x    ONLINE  -
rpool  17.5G  7.99G  9.51G        -         -    21%    45%  1.00x    ONLINE  -
root@ubuntu:~# zfs list bpool
NAME    USED  AVAIL     REFER  MOUNTPOINT
bpool   675M   157M       96K  /boot
root@ubuntu:~# du -cshx /boot
242M    /boot
242M    total
root@ubuntu:~# 

Then I found many snapshots in both bpool and the data pool

root@ubuntu:~# zfs list -t snapshot | head
NAME                                                               USED  AVAIL     REFER  MOUNTPOINT
bpool/BOOT/ubuntu_e8m8h0@autozsys_ywm1ok                             0B      -      238M  -
bpool/BOOT/ubuntu_e8m8h0@autozsys_ms74md                             0B      -      238M  -
bpool/BOOT/ubuntu_e8m8h0@autozsys_ugu9z7                            80K      -      242M  -
bpool/BOOT/ubuntu_e8m8h0@autozsys_r3xqau                            72K      -      242M  -
bpool/BOOT/ubuntu_e8m8h0@autozsys_nkagbh                             0B      -      242M  -
bpool/BOOT/ubuntu_e8m8h0@autozsys_xdbwsy                             0B      -      242M  -
bpool/BOOT/ubuntu_e8m8h0@autozsys_zrt7vi                            72K      -      242M  -
bpool/BOOT/ubuntu_e8m8h0@autozsys_jbmnwk                            72K      -      242M  -
bpool/BOOT/ubuntu_e8m8h0@autozsys_0e5p2e                            64K      -      242M  -
root@ubuntu:~# 
root@ubuntu:~# zfs list -t snapshot | wc
    301    1505   27701

Too many! I'm not sure how many snapshots ubuntu likes to create.

Removing snapshots

List all snapshots for /boot

root@ubuntu:~# df /boot
Filesystem               1K-blocks   Used Available Use% Mounted on
bpool/BOOT/ubuntu_e8m8h0    408192 247808    160384  61% /boot
root@ubuntu:~# zfs list -H -o name -t snapshot bpool/BOOT/ubuntu_e8m8h0
bpool/BOOT/ubuntu_e8m8h0@autozsys_ywm1ok
bpool/BOOT/ubuntu_e8m8h0@autozsys_ms74md
bpool/BOOT/ubuntu_e8m8h0@autozsys_ugu9z7
bpool/BOOT/ubuntu_e8m8h0@autozsys_r3xqau
bpool/BOOT/ubuntu_e8m8h0@autozsys_nkagbh
bpool/BOOT/ubuntu_e8m8h0@autozsys_xdbwsy
bpool/BOOT/ubuntu_e8m8h0@autozsys_zrt7vi
bpool/BOOT/ubuntu_e8m8h0@autozsys_jbmnwk
bpool/BOOT/ubuntu_e8m8h0@autozsys_0e5p2e
bpool/BOOT/ubuntu_e8m8h0@autozsys_b17dwn
bpool/BOOT/ubuntu_e8m8h0@autozsys_uad1rb
bpool/BOOT/ubuntu_e8m8h0@autozsys_mxhvc9
bpool/BOOT/ubuntu_e8m8h0@autozsys_9athz8
bpool/BOOT/ubuntu_e8m8h0@autozsys_61umv1
bpool/BOOT/ubuntu_e8m8h0@autozsys_1q65cz
root@ubuntu:~# 

Then remove them

zfs list -H -o name -t snapshot bpool/BOOT/ubuntu_e8m8h0 | xargs -n 1 zfs destroy
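
If unsure which snapshots would go, the same pipeline can first be run as a dry run; the -n and -v flags of zfs destroy only print what would be destroyed:

zfs list -H -o name -t snapshot bpool/BOOT/ubuntu_e8m8h0 | xargs -n 1 zfs destroy -nv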

Now, it is ok to upgrade

root@ubuntu:~# zfs list -o space bpool
NAME   AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
bpool   589M   243M        0B     96K             0B       243M
root@ubuntu:~# 

TODO: Move dataset to another zpool in TrueNAS

Move dataset to another zpool in TrueNAS

In Synology, moving a shared folder to another volume is quite easy and can be done via the UI. In TrueNAS, I could not find such a task to select.

Duplicate dataset from snapshot

The workable solution is to use the zfs command in an SSH session to duplicate the dataset, then export the old pool and import the new one.

First make a snapshot named poolX/dataset@initial on the source pool.
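
A minimal sketch of creating that snapshot, following the article's poolX/dataset naming:

zfs snapshot poolX/dataset@initial

Then use the following command to duplicate the dataset snapshot to the new zpool.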

zfs send poolX/dataset@initial | zfs recv -F poolY/dataset

Update new dataset

Then make another snapshot, poolX/dataset@incremental, and use the following command to send the incremental update of the dataset to the new zpool.

zfs send -i initial poolX/dataset@incremental | zfs recv poolY/dataset

Activate new dataset

To make the new dataset usable, a snapshot rollback needs to be performed on the new dataset.
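
A minimal sketch of that rollback, assuming the incremental snapshot from the previous step:

zfs rollback poolY/dataset@incremental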

Update share

Change the share to point to the new pool.

Update client

This is only required if the client depends on the server's filesystem structure, such as with NFS.

References

Migrate to smaller disk
Note: the pv (Pipe Viewer) command is not installed in TrueNAS by default.