ZFS is a fundamentally different file system because it is more than just a file system. ZFS combines the roles of file system and volume manager, allowing additional storage devices to be added to a live system, with the new space becoming available to all of the existing file systems in that pool immediately. By combining the traditionally separate roles, ZFS is able to overcome previous limitations that prevented RAID groups from being able to grow. Each top level device in a pool is called a vdev, which can be a simple disk or a RAID transformation such as a mirror or RAID-Z array. ZFS file systems (called datasets) each have access to the combined free space of the entire pool. As blocks are allocated from the pool, the space available to each file system decreases. This approach avoids the common pitfall with extensive partitioning, where free space becomes fragmented across the partitions.
| pool | A storage pool is the most basic building block of ZFS. A pool is made up of one or more vdevs, the underlying devices that store the data. A pool is then used to create one or more file systems (datasets) or block devices (volumes). These datasets and volumes share the pool of remaining free space. Each pool is uniquely identified by a name and a GUID. The features available are determined by the ZFS version number on the pool. | 
| vdev Types | A pool is made up of one or more vdevs, which themselves can be a single disk or a group of disks, in the case of a RAID transform. When multiple vdevs are used, ZFS spreads data across the vdevs to increase performance and maximize usable space. | 
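As a brief sketch of how vdevs are specified when building a pool (the pool name mypool and the disk names ada0 through ada3 are placeholders), a pool can be created from one mirror vdev and later grown by adding a second mirror vdev; zpool status then lists each vdev and its member disks:

# zpool create mypool mirror ada0 ada1
# zpool add mypool mirror ada2 ada3
# zpool status mypool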
| Transaction Group (TXG) | Transaction Groups are the way changed blocks are grouped together and eventually written to the pool. Transaction groups are the atomic unit that ZFS uses to assert consistency. Each transaction group is assigned a unique 64-bit consecutive identifier. There can be up to three active transaction groups at a time, one in each of these three states: open, quiescing, and syncing. All administrative functions, such as taking a snapshot, are written as part of the transaction group. When a synctask is created, it is added to the currently open transaction group, and that group is advanced as quickly as possible to the syncing state to reduce the latency of administrative commands. | 
| Adaptive Replacement Cache (ARC) | ZFS uses an Adaptive Replacement Cache (ARC), rather than a more traditional Least Recently Used (LRU) cache. An LRU cache is a simple list of items in the cache, sorted by when each object was most recently used. New items are added to the top of the list. When the cache is full, items from the bottom of the list are evicted to make room for more active objects. An ARC consists of four lists: the Most Recently Used (MRU) and Most Frequently Used (MFU) objects, plus a ghost list for each. These ghost lists track recently evicted objects to prevent them from being added back to the cache. This increases the cache hit ratio by avoiding objects that have a history of only being used occasionally. Another advantage of using both an MRU and MFU is that scanning an entire file system would normally evict all data from an MRU or LRU cache in favor of this freshly accessed content. With ZFS, there is also an MFU that only tracks the most frequently used objects, and the cache of the most commonly accessed blocks remains. | 
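As an illustrative sketch, the current ARC size and its upper limit can be inspected with sysctl(8); on systems running newer OpenZFS the limit tunable may instead be exposed as vfs.zfs.arc.max:

# sysctl kstat.zfs.misc.arcstats.size
# sysctl vfs.zfs.arc_max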
| L2ARC | L2ARC is the second level
	      of the ZFS caching system.  The
	      primary ARC is stored in
	      RAM.  Since the amount of
	      available RAM is often limited,
	      ZFS can also use
	      cache vdevs.
	      Solid State Disks (SSDs) are often
	      used as these cache devices due to their higher speed
	      and lower latency compared to traditional spinning
	      disks.  L2ARC is entirely optional,
	      but having one will significantly increase read speeds
	      for files that are cached on the SSD
	      instead of having to be read from the regular disks.
	      L2ARC can also speed up deduplication
	      because a DDT that does not fit in
	      RAM but does fit in the
	      L2ARC will be much faster than a
	      DDT that must be read from disk.  The
	      rate at which data is added to the cache devices is
	      limited to prevent prematurely wearing out
	      SSDs with too many writes.  Until the
	      cache is full (the first block has been evicted to make
	      room), writing to the L2ARC is
	      limited to the sum of the write limit and the boost
	      limit, and afterwards limited to the write limit.  A
	      pair of sysctl(8) values control these rate limits.
	      vfs.zfs.l2arc_write_max
	      controls how many bytes are written to the cache per
	      second, while vfs.zfs.l2arc_write_boost
	      adds to this limit during the
	      “Turbo Warmup Phase” (Write Boost). | 
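For example (the pool and device names are placeholders), a cache vdev can be added to an existing pool, and the two rate limits described above can be inspected with sysctl(8):

# zpool add mypool cache ada2
# sysctl vfs.zfs.l2arc_write_max
# sysctl vfs.zfs.l2arc_write_boost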
| ZIL | The ZIL (ZFS Intent Log) accelerates synchronous transactions by using storage devices, such as SSDs, that are faster than those used in the main storage pool. When an application requests a synchronous write (a guarantee that the data has been safely stored to disk rather than merely cached to be written later), the data is written to the faster ZIL storage, then later flushed out to the regular disks. This greatly reduces latency and improves performance. Only synchronous workloads like databases will benefit from a ZIL. Regular asynchronous writes such as copying files will not use the ZIL at all. | 
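For example (the pool and device names are placeholders), a dedicated, mirrored log device can be added to a pool so that the ZIL lives on faster storage:

# zpool add mypool log mirror ada3 ada4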
| Copy-On-Write | Unlike a traditional file system, when data is overwritten on ZFS, the new data is written to a different block rather than overwriting the old data in place. Only when this write is complete is the metadata then updated to point to the new location. In the event of a shorn write (a system crash or power loss in the middle of writing a file), the entire original contents of the file are still available and the incomplete write is discarded. This also means that ZFS does not require a fsck(8) after an unexpected shutdown. | 
| Dataset | Dataset is the generic term
	      for a ZFS file system, volume,
	      snapshot or clone.  Each dataset has a unique name in
	      the format
	      poolname/path@snapshot.
	      The root of the pool is technically a dataset as well.
	      Child datasets are named hierarchically like
	      directories.  For example,
	      mypool/home, the home
	      dataset, is a child of mypool
	      and inherits properties from it.  This can be expanded
	      further by creating
	      mypool/home/user.  This
	      grandchild dataset will inherit properties from the
	      parent and grandparent.  Properties on a child can be
	      set to override the defaults inherited from the parents
	      and grandparents.  Administration of datasets and their
	      children can be
	      delegated. | 
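A short sketch of hierarchical datasets and property inheritance (the names are placeholders, and lz4 is just one of the available compression algorithms); the grandchild dataset inherits the compression setting from mypool/home unless it is overridden:

# zfs create mypool/home
# zfs create mypool/home/user
# zfs set compression=lz4 mypool/home
# zfs get -r compression mypool/home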
| File system | A ZFS dataset is most often used as a file system. Like most other file systems, a ZFS file system is mounted somewhere in the system's directory hierarchy and contains files and directories of its own with permissions, flags, and other metadata. | 
| Volume | In addition to regular file system datasets, ZFS can also create volumes, which are block devices. Volumes have many of the same features, including copy-on-write, snapshots, clones, and checksumming. Volumes can be useful for running other file system formats on top of ZFS, such as UFS, for virtualization, or for exporting iSCSI extents. | 
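For example (the names and the 10 G size are placeholders), a volume can be created and then formatted with another file system such as UFS via its device node under /dev/zvol:

# zfs create -V 10G mypool/vol0
# newfs /dev/zvol/mypool/vol0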
| Snapshot | The
	      copy-on-write
	      (COW) design of
	      ZFS allows for nearly instantaneous,
	      consistent snapshots with arbitrary names.  After taking
	      a snapshot of a dataset, or a recursive snapshot of a
	      parent dataset that will include all child datasets, new
	      data is written to new blocks, but the old blocks are
	      not reclaimed as free space.  The snapshot contains
	      the original version of the file system, and the live
	      file system contains any changes made since the snapshot
	      was taken.  No additional space is used.  As new data is
	      written to the live file system, new blocks are
	      allocated to store this data.  The apparent size of the
	      snapshot will grow as the blocks are no longer used in
	      the live file system, but only in the snapshot.  These
	      snapshots can be mounted read only to allow for the
	      recovery of previous versions of files.  It is also
	      possible to
	      rollback a live
	      file system to a specific snapshot, undoing any changes
	      that took place after the snapshot was taken.  Each
	      block in the pool has a reference counter which keeps
	      track of how many snapshots, clones, datasets, or
	      volumes make use of that block.  As files and snapshots
	      are deleted, the reference count is decremented.  When a
	      block is no longer referenced, it is reclaimed as free
	      space.  Snapshots can also be marked with a
	      hold.  When a
	      snapshot is held, any attempt to destroy it will return
	      an EBUSY error.  Each snapshot can
	      have multiple holds, each with a unique name.  The
	      release command
	      removes the hold so the snapshot can be deleted.  Snapshots
	      can be taken on volumes, but they can only be cloned or
	      rolled back, not mounted independently. | 
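A sketch of the snapshot operations described above (the dataset, snapshot, and hold names are placeholders); -r takes a recursive snapshot of a dataset and all of its children:

# zfs snapshot mypool/home@backup
# zfs snapshot -r mypool@backup
# zfs hold keep mypool/home@backup
# zfs release keep mypool/home@backup
# zfs rollback mypool/home@backup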
| Clone | Snapshots can also be cloned. A clone is a writable version of a snapshot, allowing the file system to be forked as a new dataset. As with a snapshot, a clone initially consumes no additional space. As new data is written to a clone and new blocks are allocated, the apparent size of the clone grows. When blocks are overwritten in the cloned file system or volume, the reference count on the previous block is decremented. The snapshot upon which a clone is based cannot be deleted because the clone depends on it. The snapshot is the parent, and the clone is the child. Clones can be promoted, reversing this dependency and making the clone the parent and the previous parent the child. This operation requires no additional space. Because the amount of space used by the parent and child is reversed, existing quotas and reservations might be affected. | 
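For example (the names are placeholders), a snapshot can be cloned into a new writable dataset, and the clone can later be promoted so that the original dataset becomes its child:

# zfs clone mypool/home@backup mypool/newhome
# zfs promote mypool/newhome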
| Checksum | Every block that is allocated is also checksummed. The checksum algorithm used is a per-dataset property; see zfs set. The checksum of each block is transparently validated as it is read, allowing ZFS to detect silent corruption. If the data that is read does not match the expected checksum, ZFS will attempt to recover the data from any available redundancy, like mirrors or RAID-Z. Validation of all checksums can be triggered with a scrub. Checksum algorithms include fletcher2, fletcher4, and sha256. The fletcher algorithms are faster, but sha256 is a strong cryptographic hash and has a much lower chance of collisions at the cost of some performance. Checksums can be disabled, but it is not recommended. | 
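For example (the dataset and pool names are placeholders), the checksum algorithm can be changed per dataset, and a full validation of every block can be triggered on the pool:

# zfs set checksum=sha256 mypool/data
# zpool scrub mypool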
| Compression | Each dataset has a compression property, which defaults to off. This property can be set to one of a number of compression algorithms. This will cause all new data that is written to the dataset to be compressed. Beyond a reduction in space used, read and write throughput often increases because fewer blocks are read or written. | 
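For example (the dataset name is a placeholder, and lz4 is just one of the available algorithms), compression can be enabled on a dataset and its effectiveness checked later with the read-only compressratio property:

# zfs set compression=lz4 mypool/usr/home
# zfs get compressratio mypool/usr/home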
| Copies | When set to a value greater than 1, the
	      copies property instructs
	      ZFS to maintain multiple copies of
	      each block in the
	      File System
	      or
	      Volume.  Setting
	      this property on important datasets provides additional
	      redundancy from which to recover a block that does not
	      match its checksum.  In pools without redundancy, the
	      copies feature is the only form of redundancy.  The
	      copies feature can recover from a single bad sector or
	      other forms of minor corruption, but it does not protect
	      the pool from the loss of an entire disk. | 
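For example (the dataset name is a placeholder); like compression, the setting only affects data written after the property is changed:

# zfs set copies=2 mypool/important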
| Deduplication | Checksums make it possible to detect duplicate
	      blocks of data as they are written.  With deduplication,
	      the reference count of an existing, identical block is
	      increased, saving storage space.  To detect duplicate
	      blocks, a deduplication table (DDT)
	      is kept in memory.  The table contains a list of unique
	      checksums, the location of those blocks, and a reference
	      count.  When new data is written, the checksum is
	      calculated and compared to the list.  If a match is
	      found, the existing block is used.  The
	      SHA256 checksum algorithm is used
	      with deduplication to provide a secure cryptographic
	      hash.  Deduplication is tunable.  If
	      dedup is on, then
	      a matching checksum is assumed to mean that the data is
	      identical.  If dedup is set to
	      verify, then the data in the two
	      blocks will be checked byte-for-byte to ensure it is
	      actually identical.  If the data is not identical, the
	      hash collision will be noted and the two blocks will be
	      stored separately.  Because the DDT must
	      store the hash of each unique block, it consumes a very
	      large amount of memory.  A general rule of thumb is
	      5-6 GB of RAM per 1 TB of deduplicated data.
	      In situations where it is not practical to have enough
	      RAM to keep the entire
	      DDT in memory, performance will
	      suffer greatly as the DDT must be
	      read from disk before each new block is written.
	      Deduplication can use L2ARC to store
	      the DDT, providing a middle ground
	      between fast system memory and slower disks.  Consider
	      using compression instead, which often provides nearly
	      as much space savings without the additional memory
	      requirement. | 
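For example (the pool and dataset names are placeholders), the likely space savings can be estimated with zdb(8) before committing, and deduplication can then be enabled with byte-for-byte verification:

# zdb -S mypool
# zfs set dedup=verify mypool/data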
| Scrub | Instead of a consistency check like fsck(8),
	      ZFS has scrub.
	      scrub reads all data blocks stored on
	      the pool and verifies their checksums against the known
	      good checksums stored in the metadata.  A periodic check
	      of all the data stored on the pool ensures the recovery
	      of any corrupted blocks before they are needed.  A scrub
	      is not required after an unclean shutdown, but is
	      recommended at least once every three months.  The
	      checksum of each block is verified as blocks are read
	      during normal use, but a scrub makes certain that even
	      infrequently used blocks are checked for silent
	      corruption.  Data security is improved, especially in
	      archival storage situations.  The relative priority of
	      scrub can be adjusted with vfs.zfs.scrub_delay
	      to prevent the scrub from degrading the performance of
	      other workloads on the pool. | 
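For example (the pool name is a placeholder), a scrub can be started by hand and its progress checked afterwards:

# zpool scrub mypool
# zpool status mypool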
| Dataset Quota | ZFS provides very fast and
	      accurate dataset, user, and group space accounting in
	      addition to quotas and space reservations.  This gives
	      the administrator fine grained control over how space is
	      allocated and allows space to be reserved for critical
	      file systems.
	      ZFS supports different types of quotas: the dataset quota, the reference quota (refquota), the user quota, and the group quota. Quotas limit the amount of space that a dataset and all of its descendants, including snapshots of the dataset, child datasets, and the snapshots of those datasets, can consume. Note: Quotas cannot be set on volumes, as the volsize property acts as an implicit quota. | 
| Reference Quota | A reference quota limits the amount of space a dataset can consume by enforcing a hard limit. However, this hard limit includes only space that the dataset references and does not include space used by descendants, such as file systems or snapshots. | 
| User Quota | User quotas are useful to limit the amount of space that can be used by the specified user. | 
| Group Quota | The group quota limits the amount of space that a specified group can consume. | 
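A sketch of the four quota types described above (the dataset, user, and group names are placeholders):

# zfs set quota=10G storage/home
# zfs set refquota=10G storage/home/bob
# zfs set userquota@joe=5G storage/home
# zfs set groupquota@staff=20G storage/home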
| Dataset Reservation | The reservation property makes
	      it possible to guarantee a minimum amount of space for a
	      specific dataset and its descendants.  If a 10 GB
	      reservation is set on
	      storage/home/bob, and another
	      dataset tries to use all of the free space, at least
	      10 GB of space is reserved for this dataset.  If a
	      snapshot is taken of
	      storage/home/bob, the space used by
	      that snapshot is counted against the reservation.  The
	      refreservation
	      property works in a similar way, but it
	      excludes descendants like
	      snapshots.
	      Reservations of any sort are useful in many situations, such as planning and testing the suitability of disk space allocation in a new system, or ensuring that enough space is available on file systems for audio logs or system recovery procedures and files.  | 
| Reference Reservation | The refreservation property
	      makes it possible to guarantee a minimum amount of
	      space for the use of a specific dataset
	      excluding its descendants.  This
	      means that if a 10 GB reservation is set on
	      storage/home/bob, and another
	      dataset tries to use all of the free space, at least
	      10 GB of space is reserved for this dataset.  In
	      contrast to a regular
	      reservation,
	      space used by snapshots and descendant datasets is not
	      counted against the reservation.  For example, if a
	      snapshot is taken of
	      storage/home/bob, enough disk space
	      must exist outside of the
	      refreservation amount for the
	      operation to succeed.  Descendants of the main dataset
	      are not counted in the refreservation
	      amount and so do not encroach on the space set. | 
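For example (the names and sizes are placeholders), the two kinds of reservation are set in the same way:

# zfs set reservation=10G storage/home/bob
# zfs set refreservation=10G storage/home/bob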
| Resilver | When a disk fails and is replaced, the new disk must be filled with the data that was lost. The process of using the redundancy information (mirror copies or parity) from the remaining drives to calculate and write the missing data to the new drive is called resilvering. | 
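For example (the pool and device names are placeholders), replacing a failed disk starts a resilver automatically, and its progress is reported by zpool status:

# zpool replace mypool ada1 ada4
# zpool status mypool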
| Online | A pool or vdev in the Online
	      state has all of its member devices connected and fully
	      operational.  Individual devices in the
	      Online state are functioning
	      normally. | 
| Offline | Individual devices can be put in an
	      Offline state by the administrator if
	      there is sufficient redundancy to avoid putting the pool
	      or vdev into a
	      Faulted state.
	      An administrator may choose to offline a disk in
	      preparation for replacing it, or to make it easier to
	      identify. | 
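For example (the pool and device names are placeholders), a disk can be taken offline and brought back online by the administrator:

# zpool offline mypool ada1
# zpool online mypool ada1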
| Degraded | A pool or vdev in the Degraded
	      state has one or more disks that have been disconnected
	      or have failed.  The pool is still usable, but if
	      additional devices fail, the pool could become
	      unrecoverable.  Reconnecting the missing devices or
	      replacing the failed disks will return the pool to an
	      Online state
	      after the reconnected or new device has completed the
	      Resilver
	      process. | 
| Faulted | A pool or vdev in the Faulted
	      state is no longer operational.  The data on it can no
	      longer be accessed.  A pool or vdev enters the
	      Faulted state when the number of
	      missing or failed devices exceeds the level of
	      redundancy in the vdev.  If missing devices can be
	      reconnected, the pool will return to an
	      Online state.  If
	      there is insufficient redundancy to compensate for the
	      number of failed disks, then the contents of the pool
	      are lost and must be restored from backups. | 