NetApp Snapshots™ allow for rapid, point-in-time copies to be made of a storage volume. Unlike AWS snapshots, the process is optimized so that it does not require a full copy of the source data, thereby allowing for snapshots to be created very quickly and to also be highly space efficient. These storage snapshots can serve as backups for your data, allowing for read-only access by client hosts and applications.
NetApp Snapshots™ are an integral part of Cloud Volumes, both Cloud Volumes ONTAP and Cloud Volumes Service, providing all the benefits mentioned above and a lot more. In this snapshot technology deep dive, we will examine the way in which NetApp Cloud Volumes snapshots work in detail and compare them to the native storage snapshot facilities provided by AWS and Azure.
In the next post in this series, Snapshots Deep Dive: AWS Snapshots and Azure Snapshots, we will examine in detail how AWS and Azure snapshots work.
Cloud Volumes Snapshots
Cloud Volumes snapshot technology is fundamentally linked to NetApp’s WAFL (Write Anywhere File Layout) data organization scheme. WAFL uses equally sized data blocks arranged in a tree of inodes to manage the physical storage used by a volume. A file that is small enough may consume only a single data block; as storage requirements increase, however, one inode points to a set of other inodes that then contain the actual file data. This level of indirection (the depth of the tree) increases further as the amount of stored data continues to grow. The root inode at the top of the tree acts as the entry point into this data storage structure.
When a snapshot is created, Cloud Volumes simply makes a copy of the root inode. This single 4KB copy is enough to protect all the data that is to be held in the snapshot, which makes the process very quick and extremely space efficient, regardless of whether your volume is a few megabytes in size or hundreds of terabytes. The snapshot acts as an online, read-only copy of the source data at the point in time at which it was created and is accessible just like a regular storage volume. As such, snapshots can form the backbone of a NetApp backup of your system.
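The idea of snapshotting by copying only the root can be sketched with a toy tree. This is a minimal illustration, not WAFL itself: the `Inode`, `DataBlock`, `take_snapshot`, and `count_blocks` names are hypothetical, and Python object references stand in for on-disk block pointers.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class DataBlock:
    """A leaf of the tree holding file data (toy stand-in for a 4KB block)."""
    payload: bytes


@dataclass(frozen=True)
class Inode:
    """An interior node that points to other inodes or to data blocks."""
    children: Tuple[Union["Inode", DataBlock], ...]


def count_blocks(node) -> int:
    """Count the data blocks reachable from a node."""
    if isinstance(node, DataBlock):
        return 1
    return sum(count_blocks(child) for child in node.children)


def take_snapshot(root: Inode) -> Inode:
    """Copy only the root node; every child is shared by reference."""
    return Inode(children=root.children)


# Build a small two-level tree: 3 inodes, each pointing to 4 data blocks.
second_level = tuple(
    Inode(children=tuple(DataBlock(bytes([j * 4 + i])) for i in range(4)))
    for j in range(3)
)
root = Inode(children=second_level)

snap = take_snapshot(root)

# The snapshot is a distinct root object that reaches all 12 data blocks,
# yet no inode or data block below the root was duplicated.
print(snap is not root, count_blocks(snap))
print(all(s is r for s, r in zip(snap.children, root.children)))
```

The cost of `take_snapshot` in this sketch depends only on the size of the root node, not on how many data blocks sit beneath it, which mirrors why the article describes snapshot creation as independent of volume size.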
The Active Filesystem
It’s clear from the WAFL description above that Cloud Volumes snapshots initially require very little storage. The amount of storage used, however, increases as changes are made to the active filesystem. Changes can occur in three different ways: new data is written to the filesystem, existing data is updated or data is deleted. The first case is the easiest to deal with, as fresh blocks in the volume will be allocated for the new data and the snapshot copy will remain largely unaffected.
For block-level updates, Cloud Volumes will not update an existing block that is referenced by any snapshots, as to do so would violate their point-in-time guarantees. Instead, the updated data is written to a newly allocated block in the active filesystem, with inodes updated to reference this new block instead of the old one. This is where the active filesystem and the snapshot start to diverge: the old blocks are now retained only for the snapshot, and so consume additional storage space. The net effect is that the space required by the snapshot increases as the amount of data it shares with the active filesystem decreases.
In a similar way, if data is deleted from the active filesystem but the blocks used are also locked by a snapshot, the related storage will not actually be freed up for reuse. When the last snapshot to reference these blocks is removed, all data blocks that were being used for the purpose of maintaining the point-in-time copy are also released automatically and all the additional space used for old blocks is freed up.
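The three kinds of change described above (new writes, updates, and deletes) and the release of blocks when a snapshot is removed can be sketched with a toy model. This is an illustration under stated assumptions, not the Cloud Volumes implementation: `ToyVolume` and its methods are hypothetical names, a dictionary of logical-to-physical block mappings stands in for the inode tree, and a simple reachability scan stands in for the real space accounting.

```python
class ToyVolume:
    """Toy model: updates go to new blocks; old blocks live while a snapshot needs them."""

    def __init__(self):
        self.blocks = {}     # physical block id -> data
        self.active = {}     # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, lbn, data):
        """Write new or updated data to a freshly allocated physical block."""
        pid = self._next_id
        self._next_id += 1
        self.blocks[pid] = data
        self.active[lbn] = pid  # the old block, if any, is never overwritten
        self._reclaim()

    def delete(self, lbn):
        """Remove a logical block from the active filesystem."""
        del self.active[lbn]
        self._reclaim()

    def snapshot(self, name):
        """Record the current block map; no data is copied."""
        self.snapshots[name] = dict(self.active)

    def delete_snapshot(self, name):
        del self.snapshots[name]
        self._reclaim()

    def _snapshot_refs(self):
        held = set()
        for block_map in self.snapshots.values():
            held.update(block_map.values())
        return held

    def _reclaim(self):
        """Free every physical block referenced by neither the active map nor a snapshot."""
        live = set(self.active.values()) | self._snapshot_refs()
        for pid in list(self.blocks):
            if pid not in live:
                del self.blocks[pid]

    def snapshot_only_blocks(self):
        """Number of blocks kept alive solely because a snapshot references them."""
        return len(self._snapshot_refs() - set(self.active.values()))


vol = ToyVolume()
for lbn in range(4):
    vol.write(lbn, b"v1")
vol.snapshot("hourly.0")

# Each entry is (physical blocks in use, blocks held only by snapshots).
history = [(len(vol.blocks), vol.snapshot_only_blocks())]

vol.write(0, b"v2")           # update: old block stays for the snapshot
history.append((len(vol.blocks), vol.snapshot_only_blocks()))

vol.delete(1)                 # delete: block is still locked by the snapshot
history.append((len(vol.blocks), vol.snapshot_only_blocks()))

vol.write(9, b"new")          # new data: snapshot is unaffected
history.append((len(vol.blocks), vol.snapshot_only_blocks()))

vol.delete_snapshot("hourly.0")  # last reference gone: old blocks are freed
history.append((len(vol.blocks), vol.snapshot_only_blocks()))

print(history)  # [(4, 0), (5, 1), (5, 2), (6, 2), (4, 0)]
```

In this sketch the snapshot costs nothing at creation, grows by one block for each update or delete that touches data it shares with the active filesystem, and gives all of that space back once it is removed.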
We can see from this that Cloud Volumes snapshots are space efficient not only when they are created but also as data changes. With NetApp snapshots explained, let’s look at how some other snapshot technologies work. Amazon EBS snapshots, by comparison, must make an initial full copy of the data that acts as a baseline for subsequent incremental snapshots. For Azure Managed Disks, each Azure snapshot is a complete copy of all the storage in current use. These full copies require additional storage, as well as time to create, which can cause other technical issues.
For example, Amazon recommends suspending file writes while creating snapshots in order to ensure a complete and consistent copy of the source data, which may not be feasible for an active production system. Cloud Volumes, by contrast, has no such requirement and is able to maintain up to 255 snapshots of a hot, active filesystem without any degradation in performance.
Consistent Snapshots, Tiered Storage, Snapshot Backup and Recovery and More
When working with certain enterprise applications, such as database systems, host filesystems, and VMs, it is often necessary to make use of application-consistent snapshots. Such storage snapshots are created in conjunction with the running application that is using the underlying storage. This ensures that in-flight I/O operations are handled appropriately and provides a much greater level of protection against problems such as data corruption. NetApp SnapCenter® provides integration with many different enterprise applications and platforms, such as Microsoft SQL Server, Oracle, Microsoft Exchange, MySQL, SAP, Microsoft SharePoint, Windows, Unix, Microsoft Hyper-V, and VMware, to allow you to create application-consistent snapshots when your data resides in Cloud Volumes. It’s an easier, safer way to create database snapshots.
Another major benefit of Cloud Volumes snapshots is something that is sometimes overlooked when evaluating storage snapshot data backup methods: the ability to rapidly restore a snapshot. Using NetApp snapshot restore technology SnapRestore®, snapshots can be restored just as quickly as they are created, for any size of source data. The mechanism used to achieve this is brilliantly simple: Cloud Volumes replaces the root inode of the active filesystem with the root inode of the snapshot to be restored. By performing just this metadata change, the version of the storage tree in the snapshot becomes the new writable version of the active filesystem, effectively returning it to that previous point-in-time. This is achieved without the need for copying any data at all.
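The restore mechanism described above, replacing the active root with the snapshot's root, can be sketched as follows. This is a toy model under stated assumptions rather than SnapRestore itself: `ToyVolume` and its methods are hypothetical, a dictionary of name-to-data references stands in for the root inode, and Python `bytes` objects stand in for the data blocks beneath it.

```python
class ToyVolume:
    """Toy model of restore-by-root-swap: only metadata changes, data is never copied."""

    def __init__(self):
        self.active_root = {}  # name -> reference to data (stand-in for the root inode)
        self.snapshots = {}    # label -> saved copy of a root

    def write(self, name, data: bytes):
        """Point the active root at new data; snapshot roots are left untouched."""
        self.active_root[name] = data

    def delete(self, name):
        del self.active_root[name]

    def snapshot(self, label):
        """Copy only the root; the data it references is shared."""
        self.snapshots[label] = dict(self.active_root)

    def restore(self, label):
        """Replace the active root with a copy of the snapshot's root."""
        self.active_root = dict(self.snapshots[label])


original = b"rows-v1" * 1000  # pretend this is a large table

vol = ToyVolume()
vol.write("table", original)
vol.snapshot("before_upgrade")

# Simulate a bad change to the active filesystem.
vol.write("table", b"corrupted")
vol.write("tmp", b"scratch")

vol.restore("before_upgrade")

# The active filesystem is back at the snapshot's point in time, and the
# restored entry is the very same data object: nothing was copied.
print(sorted(vol.active_root), vol.active_root["table"] is original)

# The restored filesystem is writable again, and the snapshot stays intact.
vol.write("table", b"rows-v2")
print(vol.snapshots["before_upgrade"]["table"] is original)
```

Because `restore` touches only the root mapping in this sketch, its cost is independent of how much data the snapshot references, which is the property the article attributes to restoring a snapshot of any size.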
Another advantage of NetApp snapshots is the ability to tier them to inexpensive object storage: Amazon S3 if you are using Cloud Volumes for AWS storage, or Azure Blob storage if you are deploying with Azure storage. Should you ever need the snapshot, it can automatically be restored to the more performant disks on Amazon EBS or Azure Premium/Standard storage.
NetApp snapshots form the basis for a number of other Cloud Volumes features, including FlexClone® technology, which uses a snapshot of an existing volume to instantly create a space-efficient writable copy that is perfect for DevOps and test environments. SnapMirror® uses snapshot replication to incrementally synchronize volume data between Cloud Volumes ONTAP deployments. And similarly, SnapVault® is used to create a data archive for the local snapshots created on a Cloud Volumes ONTAP system.
As we have seen through the course of this article, Cloud Volumes snapshots greatly enhance your ability to instantly create and restore point-in-time backups of your data, and to use these copies to fulfill other requirements, such as setting up test environments and data replication to other locations.
Whether it’s for corporate data backup, creating database snapshots, or more, this very flexible snapshot technology is also highly space efficient, which helps you manage your cloud operating costs.