Btrfs vs. OpenZFS: Why Snapshot Space Accounting Is Hard

Both Btrfs and OpenZFS are copy-on-write filesystems with native snapshots. On the surface they look similar. Under the hood, they handle one fundamental question very differently: who owns a block of data, and how do you account for it when snapshots share it?

I ran into this the hard way. Enabling Btrfs quotas on a system with hundreds of snapshots crashed my Kubernetes cluster every hour — reliably, for the same reason, until I figured out what was happening.


The Problem with Btrfs Quotas

Btrfs snapshots work by sharing data blocks between the original subvolume and the snapshot. When you write new data to the original, Btrfs does not overwrite the shared block in place. It writes the new version to a fresh location (copy-on-write), so the snapshot keeps referencing the old version. Efficient and fast.

The trouble starts when you ask: how much space does each subvolume or snapshot actually use?

To answer that, Btrfs has to figure out which blocks are referenced exclusively by one subvolume versus shared with others. It does this by walking back-references — following pointers from each block back to all the subvolumes that reference it. With many snapshots (a backup tool like btrbk creates hundreds), this walk becomes expensive.
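
The idea can be sketched with a toy model in Python. This is not how Btrfs stores anything on disk: the subvolume names, extent IDs, and sizes are invented for illustration. The point is only that "exclusive" cannot be decided for one subvolume without knowing every referrer of every extent it touches.

```python
from collections import defaultdict

# Hypothetical toy model: each subvolume or snapshot is a set of extent IDs.
# Real Btrfs keeps back-references in its extent tree; this only mimics the
# bookkeeping that full qgroup accounting has to derive from them.
subvolumes = {
    "root": {1, 2, 5},       # live subvolume
    "snap-01": {1, 2, 3},    # older snapshot, still holds extent 3
    "snap-02": {1, 2, 4},    # newer snapshot, still holds extent 4
}
extent_size = {1: 4096, 2: 4096, 3: 8192, 4: 4096, 5: 16384}


def account(subvolumes, extent_size):
    """Return {name: (referenced_bytes, exclusive_bytes)} for every subvolume."""
    # Step 1: build the reverse map, extent -> set of referrers.
    referrers = defaultdict(set)
    for name, extents in subvolumes.items():
        for extent in extents:
            referrers[extent].add(name)

    # Step 2: an extent is exclusive only if exactly one subvolume references it.
    usage = {}
    for name, extents in subvolumes.items():
        referenced = sum(extent_size[e] for e in extents)
        exclusive = sum(extent_size[e] for e in extents if len(referrers[e]) == 1)
        usage[name] = (referenced, exclusive)
    return usage


usage = account(subvolumes, extent_size)
for name, (referenced, exclusive) in usage.items():
    print(f"{name}: referenced={referenced} exclusive={exclusive}")
```

Even in this toy version, the work grows with the total number of references, so every additional snapshot adds to the cost of answering the question for all the others.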

Btrfs calls this process a qgroup rescan. It runs when quotas are first enabled and again whenever the accounting is flagged as inconsistent, which on my system happened automatically after snapshot deletions. While it ran, fsync was blocked on the entire filesystem.

On my setup, this rescan took up to 15 minutes. etcd (the key-value store inside MicroShift) requires fsync to confirm writes durably. When fsync stalls for more than 5 seconds, etcd aborts with DeadlineExceeded, the Kubernetes API server hangs, and MicroShift crashes.
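
To see whether a filesystem is stalling in this way, one option is a small probe that times a single fsync. The following is a minimal sketch in Python, not a replacement for etcd's own metrics. The function name fsync_latency and the constant STALL_THRESHOLD_S are made up for this example; the threshold simply mirrors the 5-second figure above.

```python
import os
import tempfile
import time

# Assumption for this sketch: mirrors the 5-second figure mentioned above.
STALL_THRESHOLD_S = 5.0


def fsync_latency(directory, payload=b"x" * 4096):
    """Write one small block into `directory`, fsync it, return seconds taken."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        start = time.monotonic()
        os.fsync(fd)                      # blocks until the data is durable
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)


# Probe the system temp directory here; point it at the filesystem that
# holds the etcd data to measure what etcd would experience.
latency = fsync_latency(tempfile.gettempdir())
stalled = latency > STALL_THRESHOLD_S
print(f"fsync took {latency:.4f}s, stalled={stalled}")
```

Running this in a loop during the backup tool's cleanup window would show whether the stall lines up with snapshot deletion.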

The pattern was unmistakable once I saw it: MicroShift crashed every hour at the same time — exactly 15 minutes after the backup tool’s hourly cleanup run deleted old snapshots.

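Fix: btrfs quota disable /

With quotas off there is no qgroup accounting left to rescan, so deleting snapshots no longer triggers the stall.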


Why ZFS Doesn’t Have This Problem

OpenZFS solves space accounting at the data model level, not retroactively.

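Every block written to ZFS carries a birth transaction group (birth_txg): a transaction group number, not a wall-clock timestamp, stored in the block pointer at write time to record which transaction created the block. By comparing it with the transaction group in which a snapshot was taken, ZFS can tell directly from the block metadata whether that snapshot can still reference the block. No pointer-chasing, no global scan required.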

Space accounting is updated atomically as part of every write transaction. The counters are always up to date. There is no “inconsistent” state that needs to be corrected.
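
A toy model in Python makes the comparison concrete. This is a simplified sketch of the idea, not OpenZFS code: the Block and Dataset classes, their field names, and the txg values are invented for illustration, and real ZFS tracks considerably more state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    ident: int
    size: int
    birth_txg: int  # transaction group in which the block was written


@dataclass
class Dataset:
    prev_snap_txg: int = 0       # txg of the most recent snapshot, 0 if none
    referenced_bytes: int = 0    # updated in the same step as every write/free
    deadlist: list = field(default_factory=list)

    def write(self, block):
        self.referenced_bytes += block.size

    def free(self, block):
        """Drop a block from the live dataset; return True if it is reclaimable now."""
        self.referenced_bytes -= block.size
        if block.birth_txg > self.prev_snap_txg:
            # Born after the latest snapshot: no snapshot can reference it.
            return True
        # Born before the snapshot: the snapshot still needs it, so only note it.
        self.deadlist.append(block)
        return False


ds = Dataset()
a = Block(ident=1, size=4096, birth_txg=10)
ds.write(a)
ds.prev_snap_txg = 20                        # a snapshot is taken at txg 20
b = Block(ident=2, size=8192, birth_txg=30)
ds.write(b)

freed_a = ds.free(a)   # older than the snapshot, so it is kept
freed_b = ds.free(b)   # newer than the snapshot, so it can go at once
print(freed_a, freed_b, ds.referenced_bytes, [blk.ident for blk in ds.deadlist])
```

The decision for each block is a single integer comparison, and the counter changes in the same step as the write or free that caused it.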

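When a snapshot is deleted, ZFS needs to figure out which blocks become free. Rather than scanning the whole filesystem, it consults deadlists: lists, maintained as blocks are freed, of blocks that an earlier snapshot still references but the later dataset or snapshot no longer does. Checking the birth_txg of those entries against the neighbouring snapshots identifies the blocks that only the deleted snapshot was holding. Processing the deadlist happens asynchronously in the background. The deletion returns immediately; the space reclamation catches up on its own.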
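
The deletion step can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not OpenZFS code: it assumes each dataset's deadlist holds the blocks that its previous snapshot references but it no longer does, and the function name destroy_snapshot and all block and txg values are hypothetical.

```python
from collections import namedtuple

Block = namedtuple("Block", "ident birth_txg")


def destroy_snapshot(prev_snap_txg, snap_deadlist, next_deadlist):
    """Toy model of destroying a snapshot that sits between two others.

    prev_snap_txg: creation txg of the snapshot before the one being destroyed
    snap_deadlist: deadlist of the snapshot being destroyed
    next_deadlist: deadlist of the following snapshot (or the live dataset)
    Returns (blocks_to_free, new_next_deadlist).
    """
    # Entries born after the previous snapshot were visible only to the
    # snapshot being destroyed, so they can be reclaimed.
    to_free = [b for b in next_deadlist if b.birth_txg > prev_snap_txg]
    # Older entries are still referenced by the previous snapshot.
    still_held = [b for b in next_deadlist if b.birth_txg <= prev_snap_txg]
    # The destroyed snapshot's own deadlist is inherited by its successor.
    return to_free, snap_deadlist + still_held


# Snapshots P (txg 10) and S (txg 20), followed by the live dataset.
s_deadlist = [Block(1, 5)]                   # in P, already gone when S was taken
live_deadlist = [Block(2, 7), Block(3, 15)]  # in S, since freed by the live dataset

to_free, new_live_deadlist = destroy_snapshot(10, s_deadlist, live_deadlist)
print([b.ident for b in to_free], [b.ident for b in new_live_deadlist])
```

Only the two deadlists are read; nothing else in the pool has to be visited to decide what the deleted snapshot was holding.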

The result: no blocking, no global rescan, no fsync stall.


Comparison

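                            | Btrfs full qgroups       | Btrfs squota             | Btrfs (no quotas) | OpenZFS
----------------------------+--------------------------+--------------------------+-------------------+-----------------
How ownership is determined | backref walk (expensive) | creator subvolume (O(1)) | -                 | birth_txg (O(1))
When accounting happens     | retroactively (rescan)   | inline                   | -                 | inline, atomic
Snapshot delete             | blocking rescan          | no rescan                | -                 | async deadlist
Numbers always correct      | no (inconsistent flag)   | no (snapshots ≈ 0)       | -                 | yes
Safe alongside etcd         | no                       | yes                      | yes               | yes
Snapshot size measurable    | yes                      | no                       | no                | yes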

The Btrfs Simple Quota Alternative

Since Linux kernel 6.7, Btrfs offers a second quota mode: simple quotas (squota), enabled with btrfs quota enable --simple /.

Squota takes an approach inspired by ZFS: attribute each extent permanently to the subvolume that first wrote it. Accounting is O(1) per write, no backref walking, no rescan on snapshot deletion.

The trade-off is significant for snapshot-heavy setups: because ownership is “frozen” at creation time, snapshots appear to use almost zero exclusive space — all extents are attributed to the original subvolume they were cloned from. You can’t use squota to answer “how much space are my snapshots consuming?”
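
A small Python model shows where the numbers go. It is a sketch of the attribution rule described above, not of the kernel implementation: the helper names write_extent and snapshot, the subvolume names, and the sizes are all invented for illustration.

```python
from collections import defaultdict

usage = defaultdict(int)   # bytes charged to each subvolume
owner = {}                 # extent id -> (subvolume that first wrote it, size)
contents = {}              # subvolume -> set of extent ids it references


def write_extent(subvol, extent, size, replaces=None):
    """Charge a new extent to the writing subvolume, once and for all."""
    owner[extent] = (subvol, size)
    usage[subvol] += size
    refs = contents.setdefault(subvol, set())
    refs.discard(replaces)      # the overwritten extent leaves this subvolume
    refs.add(extent)


def snapshot(source, name):
    """A snapshot shares every extent and is charged nothing."""
    contents[name] = set(contents[source])
    usage[name] += 0            # make the zero entry visible


for extent in (1, 2, 3):
    write_extent("root", extent, 4096)
snapshot("root", "snap-01")
write_extent("root", 4, 4096, replaces=1)    # overwrite after the snapshot

# Extent 1 is now kept alive only by the snapshot, yet root still pays for it.
held_only_by_snapshot = contents["snap-01"] - contents["root"]
print(dict(usage), held_only_by_snapshot, owner[1][0])
```

In this model root is charged for four extents while referencing three, and the snapshot that keeps the fourth alive reports zero.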

For container image layering (where the base image always outlives derived layers), squota works well. For backup-style snapshots of a live system, the numbers are misleading.


Takeaway

Btrfs and ZFS made different architectural choices about how snapshots and space accounting fit together. ZFS designed space accounting into its transaction model from the beginning: every block records the transaction group that wrote it. Btrfs bolted quota accounting on afterwards, requiring expensive after-the-fact reconstruction.

This isn’t a criticism of Btrfs in general. It’s a mature filesystem with strong compression, subvolume flexibility, and good tooling. But if you’re running a latency-sensitive application like etcd alongside a snapshot-heavy backup strategy, the qgroup rescan behaviour is a real operational constraint worth understanding before you hit it.