
Automating Your Disk Backup and Data Archive Part 1: AWS Database Backup with AWS Snapshots

In today’s world of highly available systems, automatic failover and disaster recovery, there’s still an important role for data backups, like AWS snapshots, to play in data protection. Though backups might not be used to recover live systems as often as they were in the past, they still help to recover from ransomware attacks, security issues and user errors, cure data corruption issues, create development and test environments, and act as a last resort when all else fails.


In this first part of a three-part series on database backups, we are going to consider the different ways AWS database backup and recovery can be performed in both hybrid cloud and cloud-only environments. We’ll also look at how these backups can be stored redundantly across Availability Zones and Regions, to ensure they’re always available when needed.


Up ahead in this series, we’ll discuss Azure database backups, and then conclude by looking at how Cloud Volumes ONTAP manages data backup and archiving. Offering an alternative approach, NetApp’s unified data management solution can create consistent backups and protect them on both the AWS and Azure clouds.

Database Backups in the Cloud: AWS Backup

Database systems are a good example of a workload where backup and restore are routinely used on AWS.


Most enterprise databases are continually updated as new transactions are executed to create or modify data, so they tend to grow to large sizes. Backups can be performed in one of two ways: by making a file-based copy of the database data, or by taking a snapshot backup, which backs up all database data in a single, usually rapid, operation on the underlying storage.

Hybrid Cloud

In a hybrid cloud environment, on-premises systems are used in conjunction with cloud resources. If the primary database instance runs on-premises and needs to be backed up to the cloud, then a file-based backup approach is appropriate. (Block-based snapshots, like those used by Cloud Volumes ONTAP, are also applicable here, but those will be covered in a later article.)


Database file backups usually come in the form of full backups, which are a dump of all data in a database, and transaction log backups, which back up all new transactions that have occurred since the last transaction log backup.


Some platforms also support differential backups, which back up all data changed since the last full backup, or, for some database platforms, since the last differential backup. Using these different backup types together, an administrator can restore a database to a specific point in time.
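To make the point-in-time idea concrete, here is a minimal sketch of how a restore chain could be chosen from a backup catalog. It assumes differentials are cumulative since the last full backup (the first behavior described above) and that a log backup finishing at or after the target time contains the target. The Backup class and the restore_chain function are hypothetical names for illustration, not part of any database product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str           # "full", "diff", or "log"
    finished: datetime  # when the backup completed


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Return the backups to restore, in order, to reach `target`."""
    ordered = sorted(backups, key=lambda b: b.finished)

    # Start from the most recent full backup taken at or before the target.
    fulls = [b for b in ordered if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup exists before the target time")
    base = fulls[-1]
    chain = [base]
    start = base.finished

    # Assumption: differentials are cumulative, so only the latest one
    # between the base full backup and the target is needed.
    diffs = [b for b in ordered
             if b.kind == "diff" and base.finished < b.finished <= target]
    if diffs:
        chain.append(diffs[-1])
        start = diffs[-1].finished

    # Replay log backups after that point, stopping at the first one that
    # covers the target time. If no log reaches the target, the chain simply
    # ends at the last available log.
    for b in ordered:
        if b.kind == "log" and b.finished > start:
            chain.append(b)
            if b.finished >= target:
                break
    return chain
```

The last log backup in the chain would be replayed only up to the target time; how that stop point is expressed depends on the database platform.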


Backup files can be moved to the cloud to create an off-site copy, which can then be accessed in the event of a local, site-wide failure. Usually, the backup files would be created locally and then moved to the cloud when the backup is complete. An object store or a file share could be used as the destination.


On AWS, this would be Amazon S3, Amazon EFS, or the new Amazon FSx. A mechanism would be required, however, to reliably transport the backup files from the on-premises source to the cloud destination, and back again during a recovery operation.
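As one possible transport mechanism for the Amazon S3 case, the sketch below uploads a finished backup file and records a SHA-256 checksum as object metadata so the file can be verified after it is downloaded for a restore. The function names, the "db-backups" prefix, and the "sha256" metadata key are illustrative assumptions. The s3_client argument is expected to behave like a boto3 S3 client, whose upload_file method is the only AWS call used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large backups are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_backup(s3_client, backup_file: Path, bucket: str,
                  prefix: str = "db-backups") -> str:
    """Upload a completed backup file and return the object key used."""
    key = f"{prefix}/{backup_file.name}"  # hypothetical key layout
    s3_client.upload_file(
        Filename=str(backup_file),
        Bucket=bucket,
        Key=key,
        # Store the checksum alongside the object for later verification.
        ExtraArgs={"Metadata": {"sha256": sha256_of(backup_file)}},
    )
    return key
```

In practice the client would come from boto3.client("s3"), and the bucket name would be whatever the backup policy designates. Passing the client in keeps the function easy to exercise against a stub.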

Cloud Database Systems

Databases residing entirely in the cloud can make use of additional features, such as snapshots, backup redundancy and managed database services.

AWS Database Snapshot Backups

For block-level cloud storage solutions, snapshot backups can be used to provide fast, storage-efficient, and complete AWS database backups.

Snapshot backups come in two varieties:

  • Application-consistent
  • Crash-consistent

Application-consistent snapshots are created in cooperation with the application using the storage and provide higher guarantees of consistency. This is because the application, in our case the database system, is given the opportunity to finalize pending I/O operations before the storage system creates the snapshot.


With a crash-consistent snapshot, a copy of the underlying storage is made without the knowledge of the running application. This produces a copy of the storage similar to what would result if a system suddenly lost power, hence the name. When using crash-consistent snapshots with database systems, the database engine will need to perform recovery on its data in order to make the database usable again. Although this type of recovery is a normal part of the database startup process, storage backups taken with crash-consistent snapshots can sometimes fail to recover, leading to a corrupted database.


Amazon EBS provides a snapshot capability. This is how it works: the first AWS snapshot backup is essentially a complete copy of the data, and each further snapshot copies only the blocks that have changed since the last snapshot.
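The incremental behavior can be illustrated with a toy model. This is a conceptual sketch only, not how Amazon EBS is implemented: a volume is represented as a dictionary of block numbers to contents, and deleted blocks are deliberately not modeled. All names are hypothetical.

```python
from typing import Dict, List

Volume = Dict[int, bytes]  # block number -> block contents


def restore(snapshots: List[Volume]) -> Volume:
    """Rebuild the volume state by layering snapshots, oldest first."""
    state: Volume = {}
    for snap in snapshots:
        state.update(snap)
    return state


def take_snapshot(volume: Volume, previous: List[Volume]) -> Volume:
    """Capture only the blocks that differ from what earlier snapshots hold."""
    baseline = restore(previous)
    return {n: data for n, data in volume.items() if baseline.get(n) != data}


volume = {0: b"a", 1: b"b", 2: b"c"}
first = take_snapshot(volume, [])        # no history: every block is copied
volume[1] = b"B"                         # one block changes on the live volume
second = take_snapshot(volume, [first])  # only the changed block is copied
```

Here the first snapshot holds all three blocks, the second holds just block 1, and layering the two reproduces the current volume.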


To create an application-consistent snapshot for an Amazon EBS backup, scripts would need to be executed to freeze the database system and flush its buffers before the snapshot is taken. After the snapshot completes, the database system would be returned to normal operations.
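The freeze, snapshot, and resume sequence could be orchestrated along the following lines. This is a minimal sketch: freeze and thaw are hypothetical callables standing in for whatever database-specific scripts quiesce and resume writes, and ec2_client is assumed to behave like a boto3 EC2 client, whose create_snapshot call returns a dictionary containing "SnapshotId".

```python
from typing import Callable


def consistent_snapshot(ec2_client, volume_id: str,
                        freeze: Callable[[], None],
                        thaw: Callable[[], None]) -> str:
    """Quiesce the database, start an EBS snapshot, then resume writes."""
    freeze()  # hypothetical hook: flush buffers and pause database writes
    try:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description="application-consistent database backup",
        )
    finally:
        # Always resume the database, even if the snapshot request fails.
        thaw()
    return response["SnapshotId"]
```

The try/finally is the important design choice here: a failed snapshot request should never leave the database frozen. Whether writes can safely resume as soon as the request returns, rather than when the snapshot finishes, is an assumption worth confirming against the Amazon EBS documentation for your setup.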


AWS allows you to spin up new storage volumes based on snapshots that have been previously created. These storage clones are created instantly and lazily loaded in the background, which means that users do not need to wait for a complete restore of the new volumes before they can start using them.
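Creating a volume from an existing snapshot might look like the sketch below. The clone_volume name and the "purpose" tag are illustrative assumptions, and ec2_client is assumed to behave like a boto3 EC2 client, whose create_volume call accepts a snapshot ID and an Availability Zone and returns a dictionary containing "VolumeId".

```python
def clone_volume(ec2_client, snapshot_id: str, availability_zone: str,
                 purpose: str = "dev-test") -> str:
    """Create a new EBS volume from a snapshot and return its volume ID."""
    response = ec2_client.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
        # Hypothetical tag so clones can be found and cleaned up later.
        TagSpecifications=[{
            "ResourceType": "volume",
            "Tags": [{"Key": "purpose", "Value": purpose}],
        }],
    )
    return response["VolumeId"]
```

The returned volume can then be attached to a test instance in the same Availability Zone. Tagging clones at creation time makes it easier to track the storage they consume.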


This can be very useful for creating development/test environments and for other DevOps processes. But keep in mind that these clones are complete copies of the data and storing them will consume large amounts of storage space. As we’ll see in part three, Cloud Volumes ONTAP’s FlexClone® technology provides a much more space-efficient way to clone for DevOps, testing, and more.

Backup Redundancy

When creating snapshots for an Amazon EBS volume, the snapshot data is stored in Amazon S3, which means that the backup data is stored redundantly across Availability Zones within a Region. Snapshots can also be copied to other Regions, so this data can be stored across multiple Regions as well.
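Storing a snapshot in a second Region could be sketched as follows. The function name is hypothetical. dest_ec2_client is assumed to behave like a boto3 EC2 client created for the destination Region, since the copy_snapshot request is issued in the Region that will receive the copy.

```python
def copy_snapshot_to_region(dest_ec2_client, snapshot_id: str,
                            source_region: str) -> str:
    """Request a copy of an EBS snapshot and return the new snapshot's ID."""
    response = dest_ec2_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]
```

For example, a client built with boto3.client("ec2", region_name="eu-west-1") could be used to pull a copy of a snapshot that lives in us-east-1. The Region names here are placeholders.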

Managed Database Services


Amazon RDS is a managed database service in the AWS cloud that includes an automated database backup service. An Amazon RDS backup is achieved by performing a volume snapshot of the underlying storage, as described above. There are some considerations to keep in mind when using Amazon RDS.

Conclusion

Though their role has changed in recent years, data backups are still an important part of any data protection strategy, as they provide an invaluable means of recording the state of a system at a point in time.


In this article we have looked at how both file-based backups and snapshot backups are used for AWS database backups. Though snapshot backups can be better at dealing with larger databases, there are both pros and cons to the AWS solutions for this. In Amazon EBS, the initial snapshot must be a complete copy of the data, with subsequent snapshots being incremental with respect to the data that has been modified in the live volume.

Of course, there are more options than AWS database backups.


In the next part of this series we’ll look at Azure’s database backup solutions, and finally see how NetApp’s Cloud Volumes ONTAP allows you to easily manage database backup and archiving. Cloud Volumes ONTAP provides a single platform for enterprise storage and data management, capable of performing application-consistent snapshot backups that can be stored redundantly across Availability Zones or regions for use in AWS disaster recovery, AWS high availability, and more.



To get started with managing data backup and archive with Cloud Volumes ONTAP right now, sign up for a free 30-day trial.
