Overview
Kubernetes v1.36 marks a major milestone for storage management: Volume Group Snapshots have graduated to General Availability (GA). The feature takes crash-consistent snapshots of multiple volumes at the same point in time. This matters for stateful applications that rely on several persistent volumes (e.g., a database storing data on one volume and logs on another). With GA status, you can use group snapshots in production workflows and get write-order consistency across volumes without pausing your applications.
In this tutorial, you'll learn everything you need to get started: from prerequisites and API objects to step-by-step creation and restoration of volume group snapshots. We'll also cover common pitfalls to help you avoid misconfigurations. By the end, you'll be able to leverage group snapshots for reliable backup and disaster recovery.
Prerequisites
Before diving in, ensure your environment meets these requirements:
- Kubernetes cluster running v1.36 or later – The GA feature gate (VolumeGroupSnapshot) is enabled by default. If you're on an older version, upgrade first.
- CSI driver with group snapshot support – Not all CSI drivers implement this capability, so check your storage provider's documentation. The driver must implement the CSI GroupController service and advertise the CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability.
- kubectl – The command-line tool, configured to talk to your cluster.
- Volume Snapshot Controller – Must be installed, together with the group snapshot CRDs (both are part of the external-snapshotter project). The controller handles the VolumeGroupSnapshot lifecycle.
- PersistentVolumeClaims (PVCs) – At least two existing PVCs that you want to snapshot together. They must be in the same namespace and be provisioned by the same group-snapshot-capable CSI driver.
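You can confirm the cluster-side prerequisites from the command line before continuing. The following is a minimal sketch. It assumes the group snapshot CRDs use their standard names. It also assumes the snapshot controller runs as a Deployment named snapshot-controller in kube-system, which is a common layout but not a universal one, so both values are parameters. The function names are illustrative. Whether your CSI driver supports group snapshots still has to come from your vendor's documentation.

```shell
# Pre-flight checks for volume group snapshots (sketch).
# Assumption: kubectl is already configured for the target cluster.

# Succeeds only if all three group snapshot CRDs are installed.
check_group_snapshot_crds() {
  local crd missing=0
  for crd in volumegroupsnapshotclasses volumegroupsnapshots volumegroupsnapshotcontents; do
    if ! kubectl get crd "${crd}.groupsnapshot.storage.k8s.io" >/dev/null 2>&1; then
      echo "missing CRD: ${crd}.groupsnapshot.storage.k8s.io" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds once the snapshot controller Deployment has finished rolling out.
# Assumed defaults (override if your install differs):
#   namespace kube-system, Deployment snapshot-controller
check_snapshot_controller() {
  local ns="${1:-kube-system}" deploy="${2:-snapshot-controller}"
  kubectl rollout status "deployment/${deploy}" -n "$ns" --timeout=60s
}
```

A typical invocation is `check_group_snapshot_crds && check_snapshot_controller`. If your distribution installs the controller elsewhere, pass the namespace and Deployment name explicitly.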
Step-by-Step Instructions
1. Understand the API Objects
Three custom resources orchestrate group snapshots:
- VolumeGroupSnapshot – Namespaced, user-created request to snapshot a group of PVCs.
- VolumeGroupSnapshotContent – Cluster-scoped resource that represents the group snapshot on the storage system. With dynamic provisioning, the snapshot controller creates it and binds it to the VolumeGroupSnapshot.
- VolumeGroupSnapshotClass – Defines the CSI driver, driver-specific parameters, and deletion policy used to create group snapshots.
These objects live in the groupsnapshot.storage.k8s.io API group.
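The manifests in this tutorial hard-code an apiVersion, so check which versions of this API group your cluster actually serves. By Kubernetes API conventions, a GA API is normally served as v1, so a GA cluster may not serve the beta version used in the examples. Here is a small sketch built on `kubectl api-versions`, which prints one group/version per line. The function name is illustrative.

```shell
# Print the groupsnapshot.storage.k8s.io versions the cluster serves,
# for example "groupsnapshot.storage.k8s.io/v1beta1".
# Exits non-zero (via grep) if the API group is not served at all.
group_snapshot_api_versions() {
  kubectl api-versions | grep '^groupsnapshot\.storage\.k8s\.io/'
}
```

Set `apiVersion` in the manifests to one of the printed values. Empty output usually means the CRDs are not installed.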
2. Create a VolumeGroupSnapshotClass
First, define a class that tells Kubernetes which CSI driver to use and any driver-specific parameters. Save the following YAML as vgsc.yaml:
apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshotClass
metadata:
  name: csi-group-snapclass
  annotations:
    # marks this class as the default for group snapshots
    groupsnapshot.storage.kubernetes.io/is-default-class: "true"
driver: csi.example.com # replace with your CSI driver name
deletionPolicy: Delete # or Retain to keep snapshots after the group snapshot is deleted
parameters:
  # driver-specific parameters (optional); this key is a hypothetical placeholder
  someParameter: value
Apply it:
kubectl apply -f vgsc.yaml
3. Label PVCs for Grouping
Group snapshots select PVCs using label selectors. Add a common label to all PVCs you want in the group. For example:
kubectl label pvc data-pvc app=myapp
kubectl label pvc logs-pvc app=myapp
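The label selector is the only link between the group snapshot and its PVCs, so confirm what it matches before you create the snapshot. This sketch runs the same label query the selector will use and fails if it finds fewer PVCs than expected. The function name is ours, not part of any tool.

```shell
# Show the PVCs a label selector matches in one namespace, and fail if
# fewer than the expected number are found.
# Usage: require_pvcs_for_selector <namespace> <selector> <expected-count>
require_pvcs_for_selector() {
  local ns="$1" selector="$2" expected="$3" matches count
  # -o name prints one "persistentvolumeclaim/<name>" line per match
  matches=$(kubectl get pvc -n "$ns" -l "$selector" -o name 2>/dev/null)
  # grep -c . counts non-empty lines and prints 0 when there are none
  count=$(printf '%s\n' "$matches" | grep -c .)
  [ -n "$matches" ] && printf '%s\n' "$matches"
  if [ "$count" -lt "$expected" ]; then
    echo "selector '$selector' matched $count PVC(s) in '$ns', expected $expected" >&2
    return 1
  fi
}
```

For the example above, `require_pvcs_for_selector default app=myapp 2` should list data-pvc and logs-pvc and succeed.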
4. Create a VolumeGroupSnapshot
Now request the group snapshot. Create a file vgs.yaml:
apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  name: my-group-snapshot
  namespace: default
spec:
  volumeGroupSnapshotClassName: csi-group-snapclass
  source:
    selector:
      matchLabels:
        app: myapp
Apply it:
kubectl apply -f vgs.yaml
The snapshot controller automatically creates a matching VolumeGroupSnapshotContent object, plus one VolumeSnapshot per selected PVC. Verify with:
kubectl get volumegroupsnapshots
kubectl get volumegroupsnapshotcontents
5. Verify Snapshot Readiness
Check that the snapshot is ready to use:
kubectl describe volumegroupsnapshot my-group-snapshot
In the output, Ready To Use under Status (the status.readyToUse field) should be true. If it stays false, check the events at the end of the describe output and the CSI driver logs.
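Instead of re-running describe by hand, you can block until the field becomes true with `kubectl wait` and a JSONPath condition (supported by kubectl 1.23 and later). The sketch below prints the describe output, including events, if the timeout expires. The function name and the 5-minute default are arbitrary choices.

```shell
# Wait until a VolumeGroupSnapshot reports status.readyToUse=true.
# Usage: wait_group_snapshot_ready <name> [namespace] [timeout]
wait_group_snapshot_ready() {
  local name="$1" ns="${2:-default}" timeout="${3:-5m}"
  if ! kubectl wait "volumegroupsnapshot/${name}" -n "$ns" \
      --for=jsonpath='{.status.readyToUse}'=true --timeout="$timeout"; then
    # On timeout, show status and recent events to help troubleshooting
    echo "group snapshot ${ns}/${name} not ready after ${timeout}" >&2
    kubectl describe "volumegroupsnapshot/${name}" -n "$ns" >&2
    return 1
  fi
}
```

For example, `wait_group_snapshot_ready my-group-snapshot default 10m` gives large volumes more time before reporting a failure.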
6. Restore PVCs from the Group Snapshot
To recover data, create new PVCs that reference the individual snapshots within the group. First, list the individual VolumeSnapshot objects that Kubernetes created automatically (each mapped to one original PVC):
kubectl get volumesnapshots
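You'll see generated names such as snapshot-<group-snapshot-uid>-<pvc-uid>. The exact format depends on the snapshot controller version, so check which source PVC each snapshot belongs to before restoring. Then create a new PVC using one of those snapshots as a data source. For example: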
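apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-restored
  namespace: default # must be the namespace of the VolumeSnapshot
spec:
  storageClassName: standard # use a class backed by the same CSI driver
  dataSource:
    name: snapshot-abc123-def456 # replace with the generated VolumeSnapshot name
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi # at least the snapshot's restore size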
Repeat for each volume. The new PVCs will be pre-populated with crash-consistent data from the point-in-time snapshot group.
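Writing the manifest by hand for every volume is error-prone. One option is a small template function that renders one restore PVC per snapshot so you can pipe it to `kubectl apply`. This is a sketch with a hypothetical function name. It keeps the access mode fixed at ReadWriteOnce, as in the example above, and takes all other values as arguments.

```shell
# Render a PVC manifest that restores from one VolumeSnapshot.
# Usage: render_restore_pvc <new-pvc> <namespace> <volumesnapshot> <storage-class> <size>
render_restore_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: $2
spec:
  storageClassName: $4
  dataSource:
    name: $3
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $5
EOF
}
```

Usage: `render_restore_pvc data-restored default snapshot-abc123-def456 standard 10Gi | kubectl apply -f -`. Call it once per snapshot in the group, with a distinct PVC name each time.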
7. (Optional) Delete a Group Snapshot
When no longer needed:
kubectl delete volumegroupsnapshot my-group-snapshot
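This removes the VolumeGroupSnapshot object. What happens to the bound VolumeGroupSnapshotContent and the snapshots on the storage system depends on its deletionPolicy, which comes from the VolumeGroupSnapshotClass. Delete removes them as well, while Retain keeps them.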
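Because the outcome depends on the policy of the bound content, check it before you delete anything. This sketch reads status.boundVolumeGroupSnapshotContentName from the group snapshot, then reads spec.deletionPolicy from that content. These field names come from the beta API, so confirm them with `kubectl explain` on your cluster. The function name is illustrative.

```shell
# Print the deletionPolicy (Delete or Retain) of the content bound to a
# VolumeGroupSnapshot. Usage: group_snapshot_deletion_policy <name> [namespace]
group_snapshot_deletion_policy() {
  local name="$1" ns="${2:-default}" content
  content=$(kubectl get volumegroupsnapshot "$name" -n "$ns" \
    -o jsonpath='{.status.boundVolumeGroupSnapshotContentName}') || return 1
  if [ -z "$content" ]; then
    echo "${ns}/${name} is not bound to a VolumeGroupSnapshotContent yet" >&2
    return 1
  fi
  # VolumeGroupSnapshotContent is cluster-scoped, so no -n flag here
  kubectl get volumegroupsnapshotcontent "$content" -o jsonpath='{.spec.deletionPolicy}'
}
```

Running `group_snapshot_deletion_policy my-group-snapshot` before the delete command tells you whether the storage-side snapshots will survive.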
Common Mistakes
Mistake #1: Forgetting to Label PVCs
The selector in the VolumeGroupSnapshot spec must match labels on the PVCs. If labels are missing or misspelled, the group snapshot will not include the volumes you intended, and possibly none at all. Double-check labels before creating the group snapshot.
Mistake #2: Using an Unsupported CSI Driver
Not all CSI drivers implement the group snapshot capability. Confirm with your storage vendor. Attempting to use a driver that doesn't support it will result in an error in the snapshot controller logs.
Mistake #3: Inconsistent Namespace
All PVCs must be in the same namespace as the VolumeGroupSnapshot. Cross-namespace group snapshots are not supported.
Mistake #4: Not Installing the Snapshot Controller
The GA feature requires the external-snapshotter controller (version 8.0+). Without it, the VolumeGroupSnapshot will remain pending. Verify the controller is running in your cluster.
Mistake #5: Assuming Immediate Readiness
Group snapshots may take time, especially for large volumes. Check status.readyToUse and review CSI driver logs if delays occur.
Summary
Volume Group Snapshots in Kubernetes v1.36 GA provide a robust way to create crash-consistent backups across multiple volumes. By using label selectors, you can snapshot any set of PVCs in a namespace without quiescing the application. This tutorial covered the core API objects, step-by-step creation and restoration, and common pitfalls. With these skills, you can confidently protect stateful workloads that span multiple persistent volumes.