Enterprise workloads deployed on public and hybrid cloud platforms require enterprise-grade disaster recovery capabilities, but the solutions offered by the underlying cloud service providers, where they exist at all, aren’t comprehensive. In many cases, cloud workloads are available only within a single Azure or AWS region by default. For many enterprises, that simply isn’t enough to ensure the availability of their business data.
In this blog post we’ll show how NetApp Cloud Volumes ONTAP can use NetApp SnapMirror® data replication technology to set up cross-region failover and failback as part of a cloud disaster recovery solution on AWS and Azure.
What is Storage Failover and Failback?
Storage failover is the process of shifting data operations from an impacted primary storage platform to a secondary storage platform, typically located at a remote site. Failback reverses the failover process: it replicates the up-to-date data held on the secondary platform after failover back to the primary storage platform, which is then put back into operation.
The failover capability is an integral requirement for most mission-critical enterprise storage solutions, which have to be constantly available. A key prerequisite for failover is continuous data replication from the primary storage solution to the secondary. Both storage failover and failback operations typically rely on automation to make every step of the process as seamless as possible for the end user.
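The failover and failback sequence described above can be modelled as a small state machine. The sketch below is a simplified Python illustration, not ONTAP’s actual state model: the state names, the `break` and `reverse_resync` action strings, and the `Relationship` class are all hypothetical, and the model covers only the two operations this post uses.

```python
from enum import Enum


class MirrorState(Enum):
    """Simplified relationship states used only for this illustration."""
    REPLICATING = "replicating"  # primary -> secondary; destination is read-only
    BROKEN_OFF = "broken-off"    # failed over; destination is writable
    REVERSED = "reversed"        # after reverse resync; old secondary -> old primary


# Allowed (state, action) -> next-state transitions in this simplified model
TRANSITIONS = {
    (MirrorState.REPLICATING, "break"): MirrorState.BROKEN_OFF,
    (MirrorState.BROKEN_OFF, "reverse_resync"): MirrorState.REVERSED,
}


class Relationship:
    """Tracks the direction and state of one hypothetical replication relationship."""

    def __init__(self, source: str, destination: str) -> None:
        self.source = source
        self.destination = destination
        self.state = MirrorState.REPLICATING

    def apply(self, action: str) -> MirrorState:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} while {self.state.value}")
        self.state = TRANSITIONS[key]
        if self.state is MirrorState.REVERSED:
            # Reverse resync swaps roles: the old secondary now feeds the old primary
            self.source, self.destination = self.destination, self.source
        return self.state

    def destination_writable(self) -> bool:
        # Only a broken-off destination accepts writes in this model
        return self.state is MirrorState.BROKEN_OFF


rel = Relationship("primary-region", "secondary-region")
rel.apply("break")           # failover: secondary volumes become writable
rel.apply("reverse_resync")  # failback: data now flows secondary -> primary
print(rel.source, "->", rel.destination)  # secondary-region -> primary-region
```

Rejecting out-of-order actions (for example a reverse resync before the mirror has been broken) is the main point of modelling it this way: automation fails loudly instead of running steps in the wrong order.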
Cross-Region Data Replication: Requirements and Challenges
Cross-region replication is an operational necessity for any enterprise disaster recovery solution. Replicating data from the primary to the secondary storage platform at a frequency dictated by operational requirements such as the RPO (Recovery Point Objective) is typically required for compliance. Replicating data across regional boundaries is also mandatory to protect against a genuine regional disaster, such as a large-scale natural disaster.
When implementing such cross-region data replication, some of the key requirements and challenges include:
- Simple data replication without unnecessary complexities and third-party tools.
- Limiting performance impact on the production storage instance.
- Replicating the data in an efficient and secure way.
- Flexibility to transfer data under different conditions (e.g. during off-peak traffic times).
Cloud Volumes ONTAP, NetApp SnapMirror, and SnapVault
NetApp ONTAP in the on-premises data center and Cloud Volumes ONTAP in the cloud provide a common data fabric that lets enterprise customers store their data in an identical manner, with no platform lock-in, across hybrid or multicloud deployments. SnapMirror technology then enables native platform-to-platform data replication within or across geographical regions with minimal complexity.
SnapMirror is underpinned by NetApp Snapshot™ technology. Since these snapshots do not carry performance penalties within the storage system, Cloud Volumes ONTAP can replicate data between two instances, or between classic NetApp data center storage platforms and Cloud Volumes ONTAP, with little to no impact on performance. After an initial baseline (full) copy has been transferred from source to destination, SnapMirror by design replicates only the changed blocks (deltas). Combined with Cloud Volumes ONTAP’s storage efficiencies, this keeps replication bandwidth requirements and the associated costs to a minimum.
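To get a feel for why baseline-then-delta replication keeps bandwidth needs low, the helper below estimates transfer times. It is a back-of-the-envelope sketch: the 2:1 efficiency ratio, the 100 Mbit/s link and the 10 GiB daily change rate are assumed figures chosen for illustration, and the calculation ignores protocol overhead.

```python
def transfer_hours(logical_gib: float, efficiency_ratio: float, rate_mbit_s: float) -> float:
    """Estimate hours to replicate data after storage efficiency savings.

    logical_gib: logical data size in GiB
    efficiency_ratio: e.g. 2.0 means the data shrinks to half its size (assumed)
    rate_mbit_s: sustained link throughput in megabits per second
    """
    if efficiency_ratio <= 0 or rate_mbit_s <= 0:
        raise ValueError("efficiency_ratio and rate_mbit_s must be positive")
    bits = logical_gib / efficiency_ratio * 1024**3 * 8
    return bits / (rate_mbit_s * 1_000_000) / 3600


baseline = transfer_hours(1024, 2.0, 100)  # one-off full copy of 1 TiB
delta = transfer_hours(10, 2.0, 100)       # a day's worth of changed data
print(f"baseline ~{baseline:.1f} h, daily delta ~{delta * 60:.0f} min")
# prints: baseline ~12.2 h, daily delta ~7 min
```

The baseline is paid once; every later update only moves the deltas, which is why ongoing replication traffic stays small relative to the volume size.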
Like SnapMirror, NetApp SnapVault® is underpinned by NetApp Snapshot technology. However, unlike SnapMirror, which replicates all of the source volume’s content to a target destination, SnapVault quickly backs up data as read-only Snapshot copies from one or more ONTAP platforms to a central ONTAP platform for long-term retention. In the event of data loss or corruption, backed-up data can be restored from the secondary SnapVault instance. By design, SnapVault also benefits from the same efficiencies as SnapMirror replication.
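Long-term retention of this kind is typically driven by labels on Snapshot copies (for example daily or weekly) with a count to keep per label. The function below is a hypothetical sketch of that pruning logic, not ONTAP’s implementation; the labels, retention counts and tuple layout are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def snapshots_to_delete(snapshots, keep_per_label):
    """Return names of Snapshot copies that fall outside a per-label retention count.

    snapshots: iterable of (name, label, created) tuples
    keep_per_label: mapping such as {"daily": 7, "weekly": 52} (illustrative values)
    Copies whose label has no retention rule are left untouched.
    """
    by_label = defaultdict(list)
    for name, label, created in snapshots:
        if label in keep_per_label:
            by_label[label].append((created, name))

    expired = []
    for label, copies in by_label.items():
        copies.sort(reverse=True)  # newest first
        expired.extend(name for _, name in copies[keep_per_label[label]:])
    return sorted(expired)


vault = [
    ("daily.1", "daily", datetime(2024, 1, 1)),
    ("daily.2", "daily", datetime(2024, 1, 2)),
    ("daily.3", "daily", datetime(2024, 1, 3)),
    ("weekly.1", "weekly", datetime(2024, 1, 1)),
]
print(snapshots_to_delete(vault, {"daily": 2, "weekly": 4}))  # ['daily.1']
```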
Cloud Volumes ONTAP is a fully software-defined storage (SDS) solution, and its capabilities can be driven through the OnCommand® Cloud Manager GUI, the Cloud Manager APIs, or OnCommand System Manager. Data replication can be scheduled or run on demand with these tools, giving customers choice and flexibility.
There are some key prerequisites, design decisions, and considerations to be aware of before setting up cross-region replication for AWS disaster recovery or Azure disaster recovery with Cloud Volumes ONTAP and SnapMirror.
Please refer to the NetApp documentation for detailed prerequisites for both Cloud Manager and Cloud Volumes ONTAP, as well as for the most up-to-date information on Cloud Volumes ONTAP availability across the major public cloud platforms.
SnapMirror prerequisites should also be in place for cross-region replication: compatible ONTAP versions (at minimum, the destination should run the same or a later ONTAP version than the source), appropriate firewall rules, and network routing between the Cloud Volumes ONTAP instances.
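A quick pre-check of the version requirement can be scripted. The sketch below compares dotted ONTAP release strings numerically (so 9.10 correctly ranks above 9.8) and ignores patch suffixes such as P3. It encodes only the conservative same-or-later rule; NetApp’s published SnapMirror compatibility matrix is the authoritative source, and the function names are hypothetical.

```python
import re
from typing import Tuple


def parse_ontap_version(version: str) -> Tuple[int, ...]:
    """Turn '9.8' or '9.10.1P3' into a comparable tuple such as (9, 10, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised ONTAP version: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad '9.8' to (9, 8, 0)
    return tuple(parts)


def destination_version_ok(source: str, destination: str) -> bool:
    """Conservative rule of thumb: destination release same as or later than source."""
    return parse_ontap_version(destination) >= parse_ontap_version(source)


print(destination_version_ok("9.8", "9.10.1P3"))  # True: 9.10 is newer than 9.8
```

Comparing integer tuples rather than raw strings matters here: a plain string comparison would wrongly rank "9.10" below "9.8".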
When the source and destination are on the same cloud platform, Cloud Volumes ONTAP instances can be configured to use inter-region VPC peering (AWS) or Global VNet peering (Azure) for cross-region replication traffic. With this approach, inter-region traffic traverses the cloud provider’s private backbone rather than the public internet, which provides a more secure, higher-bandwidth, lower-latency network path between the two regions.
There are three types of cross-region replication policies available: Mirror, Backup, and Mirror & Backup.
Mirror: Creates a version-flexible SnapMirror relationship that creates a new Snapshot copy and replicates it, along with all existing Snapshot copies, to the destination volume. Typically used for DR; the contents of the source and destination volumes are identical.
Figure: Mirror cross-region replication.
Backup: Creates a SnapVault relationship that replicates specific Snapshot copies for long-term backup retention.
Figure: Backup cross-region replication.
Mirror & Backup: Combines both of the above options in a single policy, with the added benefit of using geo-replication to hold offsite backups for each region.
Setting Up Cross-Region Failover and Failback Processes in Cloud Volumes ONTAP and SnapMirror
This section covers the key implementation steps for setting up SnapMirror replication between a Cloud Volumes ONTAP deployment and an on-premises ONTAP system in a remote region. Note that the same steps apply between two Cloud Volumes ONTAP instances, two on-premises ONTAP systems, or two instances running in different clouds.
While the steps here use Cloud Manager for illustration, the same steps can be automated through the Cloud Manager REST API, using DevOps tools such as Ansible or any other programming or scripting language.
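As a hedged illustration of driving Cloud Manager from code, the snippet below builds (but does not send) an authenticated JSON POST with Python’s standard `urllib.request`. The endpoint path `/hypothetical/replication`, the bearer-token header and the payload keys are placeholders, not the real Cloud Manager API; consult the Cloud Manager REST API reference for the actual endpoints and authentication scheme.

```python
import json
import urllib.request


def build_replication_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for a HYPOTHETICAL replication endpoint (not a real API path)."""
    url = base_url.rstrip("/") + "/hypothetical/replication"  # placeholder path
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_replication_request(
    "https://cloudmanager.example.com",
    "example-token",
    {"sourceVolume": "vol1", "destinationWorkingEnvironment": "cvo-dr"},  # hypothetical keys
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Separating request construction from sending keeps the logic testable without network access; tools such as Ansible wrap the same idea in declarative tasks.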
- Ensure all the ONTAP instances are discovered and visible via Cloud Manager.
- Ensure that the appropriate aggregates are set up on the secondary ONTAP instance that will be used as the replication target.
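The two checks above can also run as a pre-flight step in an automation script. In this sketch the `WorkingEnvironment` record and its fields are hypothetical stand-ins for whatever your inventory or API returns. Comparing free aggregate space with the volume’s logical size is a deliberately conservative assumption that ignores thin provisioning and efficiency savings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkingEnvironment:
    """Hypothetical inventory record for an ONTAP system."""
    name: str
    discovered: bool  # visible in Cloud Manager?
    aggregate_free_gib: Dict[str, float] = field(default_factory=dict)


def preflight(source: WorkingEnvironment, destination: WorkingEnvironment,
              volume_size_gib: float) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    for env in (source, destination):
        if not env.discovered:
            problems.append(f"{env.name} is not discovered in Cloud Manager")
    # Conservative: require one destination aggregate with room for the full logical size
    if not any(free >= volume_size_gib for free in destination.aggregate_free_gib.values()):
        problems.append(f"no aggregate on {destination.name} has {volume_size_gib} GiB free")
    return problems


src = WorkingEnvironment("onprem-prod", True, {"aggr1": 2048})
dst = WorkingEnvironment("cvo-dr", True, {"aggr1": 500})
print(preflight(src, dst, 100))  # []
```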
In Cloud Manager, set up replication by dragging the source working environment and dropping it onto the destination working environment.
Now select the source volume to be replicated and click “Continue.”
In the Destination Volume Name and Tiering section, follow the instructions to choose the name and disk type of the destination volume on the destination Cloud Volumes ONTAP instance. A specific destination aggregate can be selected in the advanced options.
In the next window, select the maximum transfer rate to set SnapMirror bandwidth throttling. For cross-region data replication, it’s recommended to set the maximum transfer rate to a value equal to or less than the available bandwidth between the source and destination ONTAP instances.
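Converting a measured link speed into a throttle value is simple arithmetic. The sketch assumes the throttle is expressed in kilobytes per second and that replication may use 80% of the link by default; both are assumptions to check against the unit and policy your tooling actually uses.

```python
def max_transfer_rate_kb_s(available_mbit_s: int, headroom_percent: int = 80) -> int:
    """Suggest a replication throttle (kilobytes per second) below the measured link bandwidth.

    available_mbit_s: measured inter-region bandwidth in megabits per second
    headroom_percent: share of that bandwidth replication may use (an assumed policy)
    """
    if not 0 < headroom_percent <= 100:
        raise ValueError("headroom_percent must be in (0, 100]")
    # 1 Mbit/s = 1,000,000 bits/s = 125,000 bytes/s = 125 KB/s
    return available_mbit_s * 125 * headroom_percent // 100


print(max_transfer_rate_kb_s(100))  # 10000 KB/s for a 100 Mbit/s link at 80%
```

Leaving headroom below the full link speed protects other cross-region traffic that shares the same path.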
Next, select the most appropriate replication policy. For the purpose of this post, we used the Mirror policy.
Select the appropriate SnapMirror replication schedule, keeping in mind your RPO requirements.
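One way to reason about the schedule: in the worst case a disaster strikes just before an update finishes, so the data-loss window is roughly the schedule interval plus the transfer time. The helper below picks the longest candidate interval that still fits the RPO under that assumption. The candidate intervals are illustrative and are not Cloud Manager’s actual schedule list.

```python
def pick_schedule(rpo_minutes: float, transfer_minutes: float,
                  candidates=(5, 15, 60, 1440)) -> int:
    """Longest interval (minutes) whose worst-case loss, interval + transfer, fits the RPO."""
    fitting = [c for c in candidates if c + transfer_minutes <= rpo_minutes]
    if not fitting:
        raise ValueError("no candidate schedule meets the RPO")
    return max(fitting)


print(pick_schedule(rpo_minutes=60, transfer_minutes=10))  # prints 15
```

Choosing the longest interval that fits keeps transfers less frequent, which reduces overhead while still honouring the RPO.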
Now review the settings and click “Go” to set up the SnapMirror relationship and start the initial replication. You can follow the replication status in the Timeline tab in Cloud Manager.
The Working Environments tab in Cloud Manager will show the replication configuration with arrows indicating the replication direction.
Replication status between the ONTAP instances can be viewed under the Replication Status tab. On-demand SnapMirror updates and various other operations can also be carried out from this view, all within Cloud Manager. A REST API is also available for DevOps deployments.
When using Cloud Volumes ONTAP for cross-region replication, Cloud Manager provides several cost-optimization options. To reduce compute costs, Cloud Volumes ONTAP can be scheduled to power on and off automatically, which can come in handy for a secondary instance that may not need to be online 24x7.
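When weighing an on/off schedule, keep in mind that a powered-off secondary cannot receive SnapMirror updates, so the replication schedule and RPO must fit inside its on-hours. The sketch below uses a hypothetical weekday 07:00 to 19:00 window that does not cross midnight. It checks whether the instance should be running at a given time and how many hours per week it would run.

```python
from datetime import date, datetime, time


def is_scheduled_on(now: datetime, on_hours: tuple, days=range(5)) -> bool:
    """True if `now` falls inside the on-window; days use Monday=0 numbering."""
    start, end = on_hours
    return now.weekday() in days and start <= now.time() < end


def weekly_on_hours(on_hours: tuple, days=range(5)) -> float:
    """Hours per week the instance runs under this simple same-day schedule."""
    start, end = on_hours
    span = datetime.combine(date.min, end) - datetime.combine(date.min, start)
    return span.total_seconds() / 3600 * len(days)


window = (time(7), time(19))
print(is_scheduled_on(datetime(2024, 1, 1, 8, 30), window))  # Monday 08:30 -> True
print(f"{weekly_on_hours(window):.0f} of 168 hours per week")  # 60 of 168 hours per week
```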
Another benefit of Cloud Volumes ONTAP is the cost savings achieved with storage efficiency features such as thin provisioning, deduplication, compression, and tiering data to lower-cost object storage on Amazon S3 or Azure Blob.
Invoking cross-region data failover
When the primary ONTAP instance is unavailable, you can invoke disaster recovery for the storage platform in Cloud Manager by breaking the existing SnapMirror relationship on the Replication Status screen.
Once the SnapMirror relationship is broken, the SnapMirror destination volumes are made writable, and the secondary Cloud Volumes ONTAP instance and its replicated volumes are ready to be accessed by production workloads. The production applications and user connections can then fail over from the primary storage instance to the secondary, completing the full disaster recovery failover from the primary region to the secondary region.
Invoking cross-region failback
Once the primary ONTAP instance is back up and running, storage failback can also be initiated through the Replication Status screen on Cloud Manager using the reverse resync option. This ensures SnapMirror will replicate the most recent copy of data from the secondary ONTAP instance back to the primary instance.
Once the reverse resync is completed and the data is fully reverse-replicated, the application & user failback activities can commence as required.
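In an automated runbook, the "reverse resync is completed" condition is usually checked by polling. The sketch below polls a caller-supplied status function until the relationship reports an idle, mirrored state or a timeout expires. The status keys and values (`mirror_state`, `transfer_status`, `snapmirrored`, `idle`) are assumptions loosely modelled on ONTAP terminology, not a documented Cloud Manager response format. `sleep` and `clock` are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Dict


def wait_for_reverse_resync(get_status: Callable[[], Dict[str, str]],
                            timeout_s: float = 3600.0,
                            poll_s: float = 30.0,
                            sleep: Callable[[float], None] = time.sleep,
                            clock: Callable[[], float] = time.monotonic) -> Dict[str, str]:
    """Poll until the (hypothetical) status reports an idle, mirrored relationship."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        # Assumed field names and values, loosely modelled on ONTAP mirror/relationship status
        if status.get("mirror_state") == "snapmirrored" and status.get("transfer_status") == "idle":
            return status
        if clock() >= deadline:
            raise TimeoutError("reverse resync did not complete before the timeout")
        sleep(poll_s)
```

Only after this returns would the runbook move on to application and user failback; a timeout stops the process instead of failing back onto incomplete data.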
Automated failover and failback
An enterprise storage failover is typically a small part of a wider disaster recovery plan that fails over not just the storage from one region to another, but also the surrounding application stacks and their prerequisites, as well as end-user access to the failed-over environment. Such complex failovers and failbacks typically rely on end-to-end automation through application-specific code or third-party monitoring and orchestration engines. To support this, Cloud Volumes ONTAP storage failover and failback can also be initiated through the Cloud Manager REST API, so that they can be orchestrated in the correct order as part of a wider, fully automated enterprise failover and failback workflow.
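This orchestration pattern can be sketched as an ordered runbook that stops at the first failed step, so applications are never started against storage that did not fail over. The `client` object and its `break_mirror` method are hypothetical placeholders for calls to the Cloud Manager REST API, and `start_apps` stands in for your application-level failover.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_runbook(steps: List[Step], log: Callable[[str], None]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; stop at the first failure and return (completed, failed_step)."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # any failure halts the runbook
            log(f"FAILED: {name}: {exc}")
            return completed, name
        log(f"ok: {name}")
        completed.append(name)
    return completed, None


def failover_steps(client, volumes: List[str], start_apps: Callable[[], None]) -> List[Step]:
    """Storage first (break each mirror), then the application stack."""
    # `client.break_mirror` is a HYPOTHETICAL wrapper around a REST API call
    steps: List[Step] = [
        (f"break mirror for {vol}", lambda vol=vol: client.break_mirror(vol))
        for vol in volumes
    ]
    steps.append(("start applications in secondary region", start_apps))
    return steps


class FakeClient:
    """Stand-in for a real API client, used here for a dry run."""
    def break_mirror(self, volume: str) -> None:
        print("breaking", volume)


done, failed = run_runbook(
    failover_steps(FakeClient(), ["vol1", "vol2"], lambda: print("starting apps")),
    log=print,
)
```

A failback runbook would follow the same shape with reverse resync, a wait for completion, and then application failback as its steps.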
NetApp Cloud Volumes ONTAP can be used with NetApp SnapMirror technology for cross-AZ and cross-region data replication on both AWS and Azure. Data remains in its native format with no platform lock-in, and Cloud Volumes ONTAP’s built-in storage efficiencies keep storage consumption on the cloud platform to a minimum.
This software-driven data replication can be used directly from the easy-to-use Cloud Manager UI, or from third-party DevOps and automation tools through the Cloud Manager REST API, as part of an automated runbook integrated into a wider cloud disaster recovery plan for full end-to-end automation.