Disable Copy-On-Write on BTRFS

Posted on 25/10/202125/10/2021 by Bian Xi Leave a Comment

Disable Copy-On-Write on BTRFS

The issue of COW (copy on write), is fragmentation, because it always write new device block. This is good for SSD, but not good on traditional devices. Even on SSD, if the block size is big, the data to be write would be much larger than actual updated data size. Because of this issue, recommented disable Copy-On-Write on database and VM filesystems.

Methods

Mounting

Disable it by mounting with nodatacow.

Following facts to be considered

This implies nodatasum as well
COW may still happen if a snapshot is taken
COW will still be maintained for existing files
COW status can be modified only for empty or newly created files.

File Attribute

For an empty file, add the NOCOW file attribute (use chattr utility with +C)

touch file1
chattr +C file1

For a directory with NOCOW attribute set, new files in it will inherit this attribute.

chattr +C directory1

For old files, copy the original data into the pre-created file, delete original and rename back.

touch vm-image.raw
chattr +C vm-image.raw
fallocate -l10g vm-image.raw

Subvolume (Untested)

Subvolume can not be set nocow separately. This is official answer.

But the files created inherit the attributes from directory, if separately mount subvolume on the directory which has nocow attribute, then the newly created files will inherit nocow attribute as well, regardless of the original volume.

Create directory

mkdir /var/lib/nocow
chattr +C /var/lib/nocow

Create subvolume

mount -o autodefrag,compress=lzo,noatime,space_cache /dev/mapper/zpool1 /mnt/zpool1
btrfs subvolume create /mnt/zpool1/nocow

Mount subvolume

/dev/mapper/zpool1     /var/lib/nocow  btrfs       rw,noatime,compress=lzo,space_cache,autodefrag,subvol=nocow  0 0

Drawback

No checksum, no integrity.

Nodatacow bypasses the very mechanisms that are meant to provide consistency in the filesystem, because the CoW operations are achieved by constructing a completely new metadata tree containing both changes (references to the data, and the csum metadata), and then atomically changing the superblock to point to the new tree.

With nodatacow, writing data and checksum on the physical medium, cause two writes separately. This could cause the data and the checksum mismatch due to I/O error, file corruption could happen.

References

BTRFS FAQ
Setting up a btrfs subvolume with noCOW

Snapshot and Copy on Write

Posted on 25/10/202125/10/2021 by Bian Xi Leave a Comment

Snapshot and Copy on Write

Two type of snapshots, they are using different ways.

Keep original data

No write to original data, all new data will be in delta file.

VMware

All new data will be in delta file, the original disk file will not be changed. This is very usefull especially in VDI environment, all VDI servers will base on same images and no impact to original disk file.

When deleting a snapshot, the snapshot files are consolidated and written to the parent snapshot disk. If parent is base disk, and all the data from the delta disk will be merged with the virtual machine base disk.

QEMU / KVM: COW mode

COW mode is available on some formats of virtual machine disk as QCOW2. When using the COW mode, no changes are applied to the disk image. All changes are recorded in a separate file preserving the original image. Several COW files can point to the same image to test several configurations simultaneously without jeopardizing the basic system.

QEMU / KVM allows to incorporate changes from a COW file to the original image

Overwrite original disk

Write latest data into original disk, and the original data move to delta disk.

RedHat LVM snapshot

The data in snapshot volume is original data.

ZFS and btrfs

Due to native copy-on-write feature, file usage reference structure always points to new data, and the old data is saved in old reference structure.

Compare

Original Disk	Pros	Cons
Keep	* Support multiple childs without too much performance overhead	Deleting snapshot takes time When disk full, no more write can be done
Overwrite	Less overhead - only when first time writing data on new location Reverting snapshot takes time	* Deleting snapshot fast
Native COW	No overhead Fast dropping snapshot Fast reverting snapshot No impact to service when snapshot full, but snapshot corrupt	Unable to delete file when disk full Disk fragmented easily * Unable to cache disk write operation

References

Deleting Snapshots
QEMU / KVM: Using the Copy-On-Write mode
Why would I want to disable Copy-On-Write while creating QEMU Images?

Error of txg_sync blocked for more than 120 seconds

Posted on 24/10/202124/10/2021 by Bian Xi Leave a Comment

Error of txg_sync blocked for more than 120 seconds

Following error was appearing in my dmesg monitoring screen.

txg_sync blocked for more than 120 seconds --> excessive load

If I'm not wrong, it could be caused by slow harddisk speed, because the TrueNAS zfs cache is about 61GB, can take longer time to flush back to hard disk.

Same as other filesystem, zfs has writeback caching (aka write-behind caching), which will flush data back to hard disk in specific interval. zfs has synchronous and asynchronous mode, they are a bit different that readonly, writethrough and writeback mode.

Except above, zfs has different behaviors on copy on write (COW) as below.

Always write to new block due to copy on write
Big file for random writing, such as VM disk file, can be fragmented
Can not reduce the write operation even if keep writing same block

Therefore, copy on write should be disabled for VM images. But if so, snapshot function could be lost.

Reference

Read-Through, Write-Through, Write-Behind Caching and Refresh-Ahead

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Category: cow

Disable Copy-On-Write on BTRFS

Methods

Mounting

File Attribute

Subvolume (Untested)

Drawback

References

Snapshot and Copy on Write

Keep original data

VMware

QEMU / KVM: COW mode

Overwrite original disk

RedHat LVM snapshot

ZFS and btrfs

Compare

References

Error of txg_sync blocked for more than 120 seconds

Reference