Proxmox VE 9 (PVE9) introduces enhanced snapshot functionality designed to improve compatibility with legacy SAN infrastructures. This technote provides a detailed analysis of the implementation, including its architecture, operational behavior, and known limitations.

Background: Snapshot Support in Earlier Versions

Before PVE9, snapshot support for traditional enterprise storage required file-based systems capable of handling QCOW2 disk images. Among shared storage options, NFS became the preferred vendor-neutral solution due to its wide compatibility and ease of deployment. The dynamic, grow-on-demand behavior of QCOW2 images made the format a poor fit for traditional SAN environments, where LUNs are statically provisioned.

What’s New: External QCOW Snapshots on Shared Thick LVM

PVE9 integrates QCOW2 imaging with dynamically provisioned thick LVM volumes on traditional iSCSI- and FC-attached SANs. Although QCOW2 is typically used on a filesystem, it can also be placed directly on a raw block device. Shared storage snapshots are implemented using “external” or “chained” QCOW2 overlays, with each snapshot creating a new logical volume to store delta data.

Functional Improvements

  • Support for VM snapshots on shared SAN LUNs: Virtual machine snapshots are now available on traditional SAN LUNs.
  • Fast VM rollback performance: Rollbacks from snapshots are significantly faster, improving recovery times.

Known Limitations

  • Performance degradation relative to raw LVM-backed disks on the same storage, which worsens as snapshot chains grow.
  • Data consistency and information leak risks, because newly provisioned logical volumes are not zero-initialized.
  • LUNs must be sized for thick provisioning of every disk and snapshot, and snapshot merges can further inflate utilization.
  • No linked clones from templates, and TPM state disks cannot be included in snapshots.

These limitations are examined in detail in the Limitations & Considerations section later in this technote.

Understanding QCOW (QEMU Copy-On-Write)

The QCOW (QEMU Copy-On-Write) image format is a feature-rich virtual disk format developed for use with the QEMU hypervisor. It provides a range of advanced storage capabilities, including internal and external snapshots, thin provisioning, and image layering. The features in QCOW were designed specifically for virtualization environments that need storage efficiency and version control.

Image Format and Data Layout

A QCOW image file consists of several structured components. An image header defines the format version and feature flags. A two-level metadata structure, composed of Level 1 (L1) and Level 2 (L2) tables, maps guest virtual disk offsets to locations within the image file. The actual guest data is stored in fixed-size units known as data clusters, which by default are 64 KiB in size. Additional structures include a reference count table and associated reference count blocks, which track how often each data cluster is used.
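
To make this layout concrete, the short Python sketch below decodes the fixed fields at the start of a QCOW2 image. It follows the published QCOW2 header layout (all integers big-endian) and is illustrative only; it is not part of any PVE tooling, and the device path in the usage comment is hypothetical.

  # Minimal QCOW2 header decoder (illustrative only).
  import struct

  HDR = struct.Struct(">IIQIIQIIQQIIQ")   # 72-byte fixed portion of the header, big-endian

  def read_qcow2_header(path):
      with open(path, "rb") as f:
          (magic, version, backing_off, backing_len, cluster_bits, virt_size,
           crypt, l1_size, l1_off, refcount_off, refcount_clusters,
           nb_snapshots, snap_off) = HDR.unpack(f.read(HDR.size))
      assert magic == 0x514649FB, "not a QCOW image"            # 'Q' 'F' 'I' 0xfb
      return {
          "version": version,                   # 2 or 3
          "cluster_size": 1 << cluster_bits,    # 64 KiB by default
          "virtual_size": virt_size,            # guest-visible disk size in bytes
          "l1_entries": l1_size,                # number of L1 table entries
          "internal_snapshots": nb_snapshots,   # count of internal snapshots
          "has_backing_file": backing_off != 0, # external snapshots use a backing image
      }

  # Example (hypothetical device path): read_qcow2_header("/dev/vg0/vm-100-disk-0")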

Virtual-to-Physical Mapping

QCOW uses a hierarchical two-level mapping to translate guest-visible virtual addresses into physical file offsets. Each L1 entry points to a corresponding L2 table, and each L2 entry maps to a data cluster where the actual content resides. Optionally, L2 entries can include a bitmap enabling subcluster allocation tracking.

QCOW’s metadata indirection enables support for large virtual disks while avoiding the need to preallocate storage. Storage efficiency is achieved through on-demand allocation implemented by the underlying filesystem.
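
For the default 64 KiB cluster size, the index arithmetic behind this lookup is compact. The sketch below is illustrative only (subcluster extensions and the flag bits stored in real L2 entries are ignored); it shows how a guest byte offset splits into an L1 index, an L2 index, and an offset within the data cluster.

  # Guest offset -> (L1 index, L2 index, offset in cluster) for 64 KiB clusters.
  CLUSTER_BITS = 16                    # 64 KiB clusters
  CLUSTER_SIZE = 1 << CLUSTER_BITS
  L2_BITS = CLUSTER_BITS - 3           # 8-byte L2 entries -> 8192 entries per L2 table
  L2_ENTRIES = 1 << L2_BITS

  def split_guest_offset(offset):
      in_cluster = offset & (CLUSTER_SIZE - 1)
      l2_index = (offset >> CLUSTER_BITS) & (L2_ENTRIES - 1)
      l1_index = offset >> (CLUSTER_BITS + L2_BITS)
      return l1_index, l2_index, in_cluster

  # One L2 table covers 8192 * 64 KiB = 512 MiB of guest address space, so a 1 TiB
  # disk needs 2048 L1 entries and up to 2048 L2 tables (about 128 MiB of L2 metadata).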

Thin Provisioning and Sparse Allocation

The key efficiency mechanism of QCOW is thin provisioning. Sparse virtual machine disk images are possible by leaving L2 entries in the QCOW metadata unassigned. The absence of an L2 entry (or of the corresponding subcluster allocation bit) tells QCOW that the corresponding virtual address has never been written within the image. Actual storage savings are realized through the delayed allocation behavior of the underlying filesystem (e.g., XFS, ext4, ZFS). In other words, data that is never written does not consume space.
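
The savings ultimately come from how the underlying filesystem handles holes. The generic sketch below (not PVE-specific; it uses a throwaway temporary file) creates a sparse file with a large apparent size and shows that only the written region consumes blocks.

  # Sparse allocation demo: large apparent size, almost nothing allocated.
  import os

  path = "/tmp/sparse-demo.img"
  with open(path, "wb") as f:
      f.truncate(10 * 1024**3)          # 10 GiB apparent size, nothing written
      f.seek(4096)
      f.write(b"x" * 4096)              # one 4 KiB write

  st = os.stat(path)
  print("apparent size:", st.st_size)           # 10 GiB
  print("allocated    :", st.st_blocks * 512)   # a few KiB on most filesystems
  os.remove(path)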

Copy-On-Write Semantics

A core design characteristic of QCOW is its copy-on-write behavior, which governs how writes are processed. When a snapshot is taken, subsequent writes do not overwrite the existing data. Instead, new clusters are allocated, and the write is directed to these fresh locations.

Partial-cluster (or partial-subcluster) writes cause QCOW to perform a copy-on-write operation: the original data is read, modified, and written to a new cluster, preserving the integrity of the snapshot state. The copy-on-write mechanism enables versioning but results in serialized I/O amplification, particularly when workloads involve small or unaligned writes.
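
The read-modify-write cycle behind a partial write can be modeled in a few lines. The sketch below is a deliberately simplified, in-memory model of the behavior just described; real QCOW also updates L2 entries and reference counts, which are omitted here.

  # Simplified copy-on-write for a partial-cluster write after a snapshot.
  CLUSTER = 64 * 1024

  def cow_partial_write(read_old_cluster, alloc_new_cluster, offset_in_cluster, data):
      """read_old_cluster() -> 64 KiB bytes; alloc_new_cluster(bytes) -> new location."""
      buf = bytearray(read_old_cluster())                          # 1. read the original cluster
      buf[offset_in_cluster:offset_in_cluster + len(data)] = data  # 2. modify it in memory
      return alloc_new_cluster(bytes(buf))                         # 3. write it to a fresh cluster

  # A 4 KiB guest write to a snapshotted cluster therefore costs a 64 KiB read plus a
  # 64 KiB write: the serialized I/O amplification mentioned above.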

Snapshots

QCOW supports internal and external snapshots, each with distinct characteristics and trade-offs.

Internal snapshots are stored within a single QCOW file. They are implemented by duplicating the L1 and L2 metadata structures and adjusting reference counts to coordinate and preserve access to shared data clusters. This approach enables snapshot creation and rollback using basic L1/L2 table manipulation. Before PVE9, all QCOW snapshots in Proxmox VE were internal.

External snapshots leverage multiple independent QCOW files. Each snapshot is a separate overlay that references a “backing” image. Writes are directed to the top-most overlay, while reads traverse the backing file chain until either an allocated cluster is found or the chain terminates. One advantage of external snapshots is that rollback is fast and can be implemented with a simple file delete. Another advantage, which is particularly relevant to QCOW on SAN, is that the logical size of each QCOW file is bounded by the changes it contains, establishing a predictable maximum size. A significant disadvantage of this approach is that as the chain of snapshots grows, performance degrades, because resolving reads may traverse multiple levels of metadata structures distributed across multiple files. Further, deleting external snapshots can require large data merges, causing unexpected increases in storage utilization.
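
The read path through a chain of external snapshots can be expressed as a simple loop. The sketch below is a conceptual model only: each overlay is a dict keyed by cluster index, standing in for the real L1/L2 metadata, and it is not QEMU's actual block layer.

  # Conceptual read/write path across a chain of external QCOW overlays.
  # chain[0] is the base image, chain[-1] is the active (top-most) overlay.
  CLUSTER = 64 * 1024

  def read_cluster(chain, cluster_index):
      for overlay in reversed(chain):          # newest first
          if cluster_index in overlay:         # allocated in this overlay?
              return overlay[cluster_index]
      return bytes(CLUSTER)                    # chain exhausted: unwritten reads as zeros

  def write_cluster(chain, cluster_index, data):
      chain[-1][cluster_index] = data          # writes always land in the top overlay

  base, snap1, active = {}, {}, {}
  chain = [base, snap1, active]
  write_cluster(chain, 0, b"new data")         # stored in `active`; `base` and `snap1` untouched
  # A read of cluster 7 walks active -> snap1 -> base before falling back to zeros.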

The table below provides a brief comparison of QCOW snapshot strategies.

                        Internal Snapshot         External Snapshot
  Footprint             Single File               Multiple Files
  Cache Footprint       N Snapshots : 1 Cache     N Snapshots : N Caches
  Metadata Performance  N Snapshots : 1 Lookup    N Snapshots : up to N Lookups
  Rollback              Shuffles Metadata         Deletes A File
  Snapshot Delete       Shuffles Metadata         Data Merge

Understanding LVM (Logical Volume Manager)

LVM (Logical Volume Manager) is a Linux device mapper framework that provides a flexible way to manage disk storage. Unlike traditional partitioning schemes that allocate fixed partitions, LVM provides dynamic logical volume management.

Key LVM Concepts

LVM introduces the following key concepts (a brief provisioning sketch follows the list):

  • Physical Volume (PV): A disk or partition used as part of a storage pool.
  • Volume Group (VG): A collection of physical volumes that combine to form a storage pool.
  • Logical Volume (LV): An allocation within the volume group, which can be used as a block device for storing data (like a virtual disk for VMs).
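
As a brief illustration of these concepts, the example below creates a PV, a VG, and an LV with the standard lvm2 command-line tools invoked from Python. It requires root privileges and the lvm2 package, and the multipath device name and volume names are placeholders.

  # Illustrative PV -> VG -> LV provisioning; device and volume names are placeholders.
  import subprocess

  def run(*cmd):
      subprocess.run(cmd, check=True)

  run("pvcreate", "/dev/mapper/mpath0")                        # mark the SAN LUN as a PV
  run("vgcreate", "vg0", "/dev/mapper/mpath0")                 # pool it into a VG
  run("lvcreate", "-n", "vm-100-disk-0", "-L", "32G", "vg0")   # carve out an LV
  # The resulting block device is available as /dev/vg0/vm-100-disk-0.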

Using Thick LVM as Shared Storage

PVE provisions virtual disks using logical volumes (LVs) from a volume group (VG). With thick LVM volumes, LVM only tracks the physical extents (contiguous regions of physical storage) that make up the logical volume’s address space, using lightweight persistent metadata. Technically speaking, the logical volume’s address space is “directly mapped,” meaning that logical addresses are translated to physical addresses in a linear fashion, typically using an in-memory data structure such as an array or linked list.
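
Conceptually, a thick (linear) logical volume is just an ordered list of physical extents, and address translation is simple arithmetic. The sketch below is an illustrative model of that direct mapping; the extent numbers are made up, and it is not LVM's actual in-kernel representation (which uses device-mapper linear targets).

  # Model of a thick (linear) LV as an ordered list of (pv, physical_extent) segments.
  EXTENT = 4 * 1024 * 1024                 # default LVM extent size: 4 MiB

  lv_map = [("pv0", 100), ("pv0", 101), ("pv0", 512)]   # three extents, not necessarily adjacent

  def lv_to_phys(lv_offset):
      idx, within = divmod(lv_offset, EXTENT)
      pv, pv_extent = lv_map[idx]          # O(1) lookup; no copy-on-write bookkeeping
      return pv, pv_extent * EXTENT + within

  # lv_to_phys(9 * 1024 * 1024) -> ("pv0", 512 * 4 MiB + 1 MiB)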

LVM is not a clustered volume manager; it was not designed to support coordinated, multi-writer access to its metadata without additional synchronization mechanisms. In a clustered environment, PVE enables shared access to LVM storage by ensuring that only one node is responsible for updating the LVM metadata at any given time. The other nodes use coordinated polling to detect metadata changes and refresh their local view of the storage. This creates a single-writer, multi-reader model: only one node writes the LVM metadata, while all nodes can read it.
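
The coordination model can be sketched abstractly. The toy example below is a conceptual illustration only, not the actual PVE or LVM locking code: a single writer bumps a generation counter on every metadata change, while reader nodes poll the counter and refresh their cached view when it moves.

  # Toy model of single-writer / multi-reader metadata coordination.
  import threading
  import time

  class SharedMetadata:
      def __init__(self):
          self.generation = 0              # bumped on every metadata change
          self.volumes = {}                # name -> size: the "metadata"
          self._lock = threading.Lock()

      def writer_create_lv(self, name, size):
          # Only the single elected node is allowed to call this.
          with self._lock:
              self.volumes[name] = size
              self.generation += 1

  def reader_poll(meta, rounds=5, interval=0.5):
      # Every other node: poll for changes and refresh the local view when seen.
      seen = -1
      for _ in range(rounds):
          if meta.generation != seen:
              seen = meta.generation
              local_view = dict(meta.volumes)
              print("refreshed local view:", local_view)
          time.sleep(interval)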

TECHNICAL DESCRIPTION

Putting it all Together: QCOW2 with External Snapshots on Thick LVM

The new storage feature set in PVE9 combines dynamically provisioned thick LVM logical volumes, QCOW2, and external snapshots to enable virtual machine snapshot support on shared SAN-backed block storage (e.g., iSCSI, FC), as shown below.


  ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ PVE HOST                                                                                           │
  │             ┌──QEMU/QCOW──┬────────────────LINUX/LVM───────────────┬──MULTIPATH SHARED STORAGE───┐ │
  │ ┌─────────┐ ▼ ┌─────────┐ ▼ ┌──────────┐ ┌──────────┐ ┌──────────┐ ▼ ┌──────────┐ ┌────────────┐ ▼ │
  │ │ VIRTUAL │   │ QCOW2   │   │ LOGICAL  │ │ VOLUME   │ │ PHYSICAL │   │ MULTIPATH├─► SCSI [SDA] ├───┼─►
  │ │ MACHINE ┼───► IMAGE   ┼───► VOLUME   ┼─► GROUP    ┼─► VOLUME   ┼───► DEVICE   │ ├────────────┤   │
  │ │ [A]     │   │ [QCOW0] │   │ [LV0]    │ │ [VG0]    │ │ [PV0]    │   │ [MPATH0] ├─► SCSI [SDB] ├───┼─►
  │ └─────────┘   └─────────┘   └──────────┘ └──────────┘ └──────────┘   └──────────┘ └────────────┘   │
  └────────────────────────────────────────────────────────────────────────────────────────────────────┘

At the core of this feature is the use of thick LVM logical volumes as raw block devices for QCOW2 images. Each PVE virtual disk (and snapshot) is stored as a QCOW2 image layered on a thick LVM logical volume. To accommodate the dynamic nature of QCOW2 image size, PVE allocates oversized logical volumes, each large enough to hold the fully written virtual disk plus its QCOW2 metadata.

Thick LVM logical volumes are allocated, on demand, within a multi-host shared volume group using a single (i.e., elected) node to avoid LVM metadata corruption. While LVM itself is not a clustered volume manager, Proxmox enforces single-writer semantics for LVM metadata updates: one node performs changes while others poll for updates, allowing safe, coordinated access in a multi-node environment.

Virtual machine disk images are stored as QCOW2 images layered directly on thick LVM logical volumes. When a snapshot is created, Proxmox allocates a new full-sized thick LVM volume, places a new QCOW2 image on top, sets the previous image as its backing file, and updates the VM configuration to route I/O to the new image. Single-writer access to the QCOW2 images is implemented by the Proxmox cluster, which ensures that each VM can only run on one node at a time.
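
The snapshot sequence can be approximated with standard tools. The sketch below is a rough illustration only: it assumes a volume group named vg0 with made-up disk names and sizes, and the real PVE implementation handles sizing, cluster-wide locking, and the VM configuration update internally.

  # Rough illustration of taking an external snapshot of a QCOW2 disk on thick LVM.
  import subprocess

  def run(*cmd):
      subprocess.run(cmd, check=True)

  old_lv = "/dev/vg0/vm-100-disk-0"        # current image; becomes the read-only backing file
  new_lv_name = "vm-100-disk-0-snap1"      # placeholder naming scheme
  new_lv = f"/dev/vg0/{new_lv_name}"

  # 1. Allocate a new full-sized thick LV, oversized to leave room for QCOW2 metadata.
  run("lvcreate", "-n", new_lv_name, "-L", "34G", "vg0")

  # 2. Format the new LV as a QCOW2 image backed by the previous image.
  run("qemu-img", "create", "-f", "qcow2", "-b", old_lv, "-F", "qcow2", new_lv)

  # 3. Update the VM configuration so subsequent I/O goes to the new overlay
  #    (done by PVE through its config and QEMU block-device handling; omitted here).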

Each disk snapshot is a separate QCOW2 image stored on its own thick LVM volume, forming a linear chain of overlays. Reads traverse this chain to resolve the most recently written version of a block, while writes are directed to the topmost layer. A diagram of a virtual disk with a single snapshot appears below.


 ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐
 │ PVE HOST                                                                                           │
 │             ┌──QEMU/QCOW──┬────────────────LINUX/LVM───────────────┬──MULTIPATH SHARED STORAGE───┐ │
 │ ┌─────────┐ ▼ ┌─────────┐ ▼ ┌──────────┐ ┌──────────┐ ┌──────────┐ ▼ ┌──────────┐ ┌────────────┐ ▼ │
 │ │ VIRTUAL │   │ QCOW    │   │ LOGICAL  │ │ VOLUME   │ │ PHYSICAL │   │ MULTIPATH├─► SCSI [SDA] ├───┼─►
 │ │ MACHINE ┼───► IMAGE   ┼───► VOLUME   ┼─► GROUP    ┼─► VOLUME   ┼───► DEVICE   │ ├────────────┤   │
 │ │ [A]     │   │ [QCOW1] │   │ [LV1]    │ │ [VG0]    │ │ [PV0]    │   │ [MPATH0] ├─► SCSI [SDB] ├───┼─►
 │ └─────────┘   └────┬────┘   └──────────┘ └────▲─────┘ └──────────┘   └──────────┘ └────────────┘   │
 │                 BACKING                       │                                                    │
 │               ┌────▼────┐   ┌──────────┐      │                                                    │
 │               │ QCOW    │   │ LOGICAL  │      │                                                    │
 │               │ IMAGE   ├───► VOLUME   ┼──────┘                                                    │
 │               │ [QCOW0] │   │ [LV0]    │                                                           │
 │               └─────────┘   └──────────┘                                                           │
 └────────────────────────────────────────────────────────────────────────────────────────────────────┘

Limitations & Considerations

Performance Degradation

Testing revealed a performance loss of 30% to 90% for base QCOW2-backed disks without snapshots, compared to raw disks provisioned through the same plugin and tested against the same storage system.

Using standard FIO tests on disks smaller than 256 GiB, where QCOW metadata fits entirely in memory, we observed a 60% drop in bandwidth and a 30% decline in IOPS.

On larger disks, where QCOW metadata cannot be fully cached, performance fell by 90% across all non-sequential patterns.

Adding snapshots worsened QCOW2 performance, likely due to its chained snapshot architecture, where I/O amplification grows in proportion to the number of snapshots in the chain.

Data Consistency and Information Leak Risks

The QCOW2 format requires that its backing storage return zeros when reading from any offset that has never been written. Sparse files on a filesystem inherently meet this requirement. Raw block devices, however, do not guarantee zero-filled reads for unwritten regions.

In PVE9, there is no mechanism to ensure that a newly provisioned LVM logical volume is initialized to zeros before use. As a result, QCOW2 images stored directly on LVM are vulnerable to corruption after a power loss. In such cases, QCOW2 metadata may reference regions of the device containing residual data from prior use, potentially exposing stale information to the guest and violating data consistency and security guarantees.

LUN Sizing for Thick Allocations

Currently, this feature requires that you size the LUN on your SAN to accommodate thick provisioning of all disks and snapshots. You must also account for QCOW metadata and fragmentation (estimated at 10%). For example, if you need a 1 TiB disk with 24 hours of hourly snapshots, you should provision at least a ((1 + 24) × 1 TiB) × 110% = 27.5 TiB LUN.
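
The planning arithmetic is easy to encode. The tiny helper below simply restates the rule of thumb above; the 10% overhead figure is this technote's estimate, not a measured constant.

  # Rule-of-thumb LUN sizing for thick-provisioned disks plus snapshots.
  def required_lun_tib(disk_tib, snapshots, overhead_pct=10):
      """Each snapshot needs a full-sized LV; add ~10% for QCOW metadata and fragmentation."""
      return (1 + snapshots) * disk_tib * (100 + overhead_pct) / 100

  print(required_lun_tib(1, 24))   # 27.5 TiB, matching the example above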

Linked Clones From Templates

While the external snapshot features in QCOW can be used to implement linked clones, the provisioning logic and state-tracking requirements are surprisingly complex. This feature is not currently implemented in PVE9. However, it is technically possible and may appear in future releases.

Snapshot Merges Inflate Utilization

Most enterprise SANs implement thin provisioning, which can help offset the storage overhead associated with thick LVM allocation. However, SAN-level space reclamation is not currently supported by the volume management code.

When a snapshot is created, subsequent writes are redirected to a new LVM logical volume. On a thin-provisioned SAN, storage for these writes is allocated on demand. When a snapshot is deleted, the system merges data from the previous QCOW image and LVM logical volume into the new QCOW image and logical volume. This merge process allocates additional storage in the new logical volume but does not release any storage from the old one.

The combination of LVM volume rotation and on-demand write allocation during merge operations can amplify storage consumption, sometimes multiplicatively.
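
A small numeric illustration (hypothetical figures) shows the effect: deleting one snapshot copies live data from the old image into the new logical volume, whose extents the thin-provisioned SAN allocates on demand, while nothing in the old volume is discarded.

  # Hypothetical thin-SAN utilization before and after deleting one snapshot.
  base_data_tib  = 0.6   # data allocated in the older image/LV
  delta_data_tib = 0.2   # data written to the newer image/LV since the snapshot

  before_delete = base_data_tib + delta_data_tib   # 0.8 TiB allocated on the SAN
  merged_copy   = base_data_tib                    # worst case: all base data is re-written
  after_delete  = before_delete + merged_copy      # 1.4 TiB: old extents are never discarded

  print(before_delete, after_delete)               # 0.8 1.4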

TPM Incompatibility

QEMU does not recognize QCOW2 as a valid disk format for TPM storage. If a TPM disk resides on a shared LVM storage pool, Proxmox VE (PVE) will be unable to create snapshots of the VM. This limitation may lead to compatibility issues with certain Microsoft products.

Avoid NVMe/TCP and NVMe/Fabrics

Single shared LUN architectures funnel all disk operations from all guests into a single shared task set on the SAN: a well-known and documented performance bottleneck. When a SCSI device reaches its command queuing capacity, SCSI commands are failed with TASK_SET_FULL.

Linux implements a feature called “Dynamic Queue Limits” that allows a SCSI initiator to cooperatively limit queue depth in response to the TASK_SET_FULL congestion notification.

Queue limiting is an essential multi-host fairness feature that is not implemented for NVMe. We recommend avoiding NVMe/TCP and NVMe/Fabrics in production environments without a detailed understanding of the command queueing and I/O handling characteristics of your SAN.

CONCLUSION

Proxmox VE 9’s SAN snapshot support introduces valuable features like VM snapshots on traditional SAN LUNs and delivers fast rollback performance.

As of August 2025, limitations remain around performance, data consistency, and storage overhead that must be resolved before this feature can be recommended for production use.

We are confident that ongoing development will address these challenges, making future releases more reliable and better suited to the demands of enterprise environments.

ADDITIONAL RESOURCES