Some of the biggest driving factors for companies today are fault tolerance and business continuity. Companies need to be able to recover from catastrophes (outages caused by natural events or operational failures) quickly and with as little downtime and monetary loss as possible.
To this end, it is essential to have a solid Business Continuity and Disaster Recovery (BCDR) strategy in place by employing a world-leading Disaster Recovery as a Service (DRaaS) solution.
Azure Site Recovery (ASR), Microsoft’s DRaaS solution, was named an industry leader by Gartner in 2019 for its completeness of vision and ability to execute. It is offered as a cloud-native service, but it’s versatile enough to cater to on-premises, hybrid, and multicloud environments as well.
This blog post will explore some of ASR’s capabilities, inner workings, and use cases to demonstrate what makes it a world-class DRaaS solution.
What is ASR in Azure?
Azure Site Recovery (ASR) is a DRaaS offered by Azure for use in cloud and hybrid cloud architectures. A near-constant data replication process keeps the replicated copies in sync with the source. The application-consistent snapshot feature of Azure Site Recovery ensures that the data is in a usable state after the failover. The service enables customers to use Azure as a disaster recovery site on a pay-as-you-go model without having to invest in additional infrastructure.
As a disaster recovery platform, ASR offers support for multiple scenarios:
- Replication of physical servers, on-premises or hosted by third-party service providers, to Azure
- Replication of Windows and Linux VMs hosted in VMware and Hyper-V to Azure
- Replication of Windows VMs hosted in AWS to Azure
- Replication of Windows and Linux VMs in Azure Stack to Azure
Note: ASR also supports replication of VMs in Hyper-V and VMware to a secondary site; however, these scenarios are being deprecated and will no longer be supported by March 2023.
The Advantages of ASR
ASR offers cloud-based DRaaS in the event of planned and unplanned outages. Let's explore some of the key benefits of the service.
Cost effective: ASR charges for every protected instance, in addition to the storage cost for the replicated data. The service is free for the first 31 days, after which the protection charges kick in. The data being transferred to storage is compressed with an average compression ratio of 50%, which further reduces the storage cost. No compute, network infrastructure, facility rental, or software licensing fees are required during ongoing protection.
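To see how these pricing elements combine, here is a minimal Python sketch of a monthly cost estimate. It is an illustration under stated assumptions, not an official pricing calculator: the class and function names are hypothetical, the prices are placeholder inputs you would take from the current Azure price list, the 50% compression ratio is read as "stored data is about half the source size," and storage is assumed to be billed even during the free 31-day period.

```python
from dataclasses import dataclass


@dataclass
class AsrCostInputs:
    """Hypothetical inputs for a rough ASR cost estimate."""
    protected_instances: int
    source_data_gb: float
    instance_price: float          # assumed monthly price per protected instance
    storage_price_per_gb: float    # assumed monthly price per GB stored
    compression_ratio: float = 0.5  # 50% average compression, per the text above
    free_days: int = 31             # protection is free for the first 31 days


def monthly_cost(inputs: AsrCostInputs, days_since_protection_started: int) -> dict:
    """Return a breakdown of the estimated monthly charges."""
    # Compressed data occupies (1 - ratio) of the original size.
    stored_gb = inputs.source_data_gb * (1 - inputs.compression_ratio)
    storage = stored_gb * inputs.storage_price_per_gb

    # Assumption: only the per-instance protection fee is waived in the free period.
    in_free_period = days_since_protection_started <= inputs.free_days
    protection = 0.0 if in_free_period else inputs.protected_instances * inputs.instance_price

    return {"protection": protection, "storage": storage, "total": protection + storage}


if __name__ == "__main__":
    example = AsrCostInputs(protected_instances=10, source_data_gb=2000,
                            instance_price=25.0, storage_price_per_gb=0.02)
    print(monthly_cost(example, days_since_protection_started=10))
    print(monthly_cost(example, days_since_protection_started=45))
```

Keeping the prices as inputs rather than constants makes it easy to rerun the estimate whenever the price list or the number of protected instances changes.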
Data resilience: The replicated data is stored in Azure storage, which is resilient by default. With locally-redundant storage (LRS), three copies of the data are kept within a single data center to protect against hardware failures such as drive or rack failures. For further protection, customers can choose to use geo-redundant storage (GRS) to protect from regional outages.
Heterogeneous workload: ASR supports protection of Windows and Linux workloads hosted on physical servers on-premises, VMs hosted in VMware/Hyper-V, and machines in third-party hosting platforms or clouds. It can also protect VMs in Azure from regional outages. The ASR console provides a unified view of the replication status of all your different workloads and allows you to carry out maintenance tasks, such as tweaking recovery plans.
App consistent: ASR captures the in-memory data and transactions along with the disk data and ensures that the recovery points are application-consistent. On Windows, this is enabled through the Volume Shadow Copy Service (VSS), and on Linux it is done using custom application scripts.
BCDR integration: ASR provides seamless integration with native application BCDR features such as SQL Always-On and Oracle Data Guard. This makes it possible for organizations to adopt the service without major overhauls in their application ecosystem.
Non-disruptive testing: To further prepare your system in case of a failure, ASR can run non-disruptive failover and DR drills. This helps in end-to-end testing of DR plans without impacting the ongoing replication.
RPO and RTO targets: ASR supports replication frequencies as low as 30 seconds and can be tailored to meet organization-specific RPO and RTO targets. Integrating automation runbooks and Azure Traffic Manager with your recovery plans can reduce the RTO further. Recovery plans are highly customizable to allow quick and sequenced failover and recovery of multi-tiered apps such as databases and web services.
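To make the idea of sequenced failover concrete, here is a minimal Python sketch of how a recovery plan might order machines by tier so that databases come up before the web services that depend on them. It is an illustration only and does not call the ASR API: the RecoveryGroup class, the failover_sequence function, and the machine names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryGroup:
    """Hypothetical model of one tier in a recovery plan."""
    order: int            # lower numbers fail over first
    name: str
    machines: List[str]


def failover_sequence(groups: List[RecoveryGroup]) -> List[str]:
    """Return machine names in the order they would be brought up."""
    sequence: List[str] = []
    for group in sorted(groups, key=lambda g: g.order):
        sequence.extend(group.machines)
    return sequence


if __name__ == "__main__":
    plan = [
        RecoveryGroup(order=2, name="web", machines=["web-01", "web-02"]),
        RecoveryGroup(order=1, name="database", machines=["sql-01"]),
    ]
    # The database tier is listed second but fails over first.
    print(failover_sequence(plan))
```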
Replicate Data to the Cloud With ASR
This section provides a walkthrough of how to replicate data to the cloud using ASR. As with every DRaaS and migration project, your company will first need an agile plan to ensure a successful DRaaS strategy.
1. Planning Stage
There are several factors that govern a DRaaS strategy: RTO and RPO goals, storage (IOPS and storage account), capacity planning, network bandwidth, network reconfiguration, and daily change rate.
The Azure Site Recovery Deployment Planner can help you analyze VMware and Hyper-V source environments and plan for capacity and scale in the target Azure environment.
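As a back-of-the-envelope complement to the planning factors above, the following Python sketch estimates the sustained bandwidth needed to replicate a given daily change rate. It is a rough approximation and not a substitute for the Deployment Planner: the function name and parameters are hypothetical, it assumes changes are spread evenly across the replication window, and it reuses the 50% average compression figure mentioned earlier in this post.

```python
def required_bandwidth_mbps(daily_change_gb: float,
                            compression_ratio: float = 0.5,
                            replication_window_hours: float = 24.0) -> float:
    """Estimate the sustained bandwidth (in Mbps) needed to ship daily churn.

    Assumes changed data is spread evenly across the replication window
    and uses decimal units (1 GB = 8,000 megabits).
    """
    if replication_window_hours <= 0:
        raise ValueError("replication_window_hours must be positive")
    if not 0 <= compression_ratio < 1:
        raise ValueError("compression_ratio must be in [0, 1)")

    compressed_gb = daily_change_gb * (1 - compression_ratio)
    megabits = compressed_gb * 8 * 1000
    seconds = replication_window_hours * 3600
    return megabits / seconds


if __name__ == "__main__":
    # 540 GB of daily change, compressed by half, replicated over 24 hours.
    print(f"{required_bandwidth_mbps(540):.1f} Mbps")
```

Real workloads rarely change data evenly throughout the day, so peak requirements can be considerably higher than this average.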
One aspect of ASR to keep in mind at this point is network planning. Customers can choose to retain existing IP addresses, but that would require failover of the entire subnet in addition to the machine. Alternatively, a new network range from Azure can be used if that works for the application architecture after failover.
Make sure to review the support matrix to understand the prerequisites and Azure Site Recovery limitations when replicating VMs and physical machines to Azure. It is also prudent to verify the kinds of workloads that can leverage app-agnostic protection; the full list is available in the ASR documentation.
Pro tip: Look out for limitations like supported operating systems, the 4 TB limit for managed disks, and the 8 TB limit for disks on storage on each protected VM. Also, look out for additional charges for storage account usage, storage transactions, and outbound data transfers when configuring ASR.
2. Prepare and Configure
Now that we have a solid plan based on source environment analysis and capacity planning, we can start preparing our environments for replication. The first step is to prepare the source.
ASR supports several source environments, such as VMware (with or without vCenter), Hyper-V VMs (with or without SCVMM), physical servers, and Azure VMs. It can also be used for DR of machines in other cloud service providers like AWS or in third-party hosting services, using the same process that is used for protecting physical servers. It is important to note that the requirements differ depending on the source environment.
For example, VMware VMs would require additional resources such as a configuration server, process server, and mobility services to help manage, coordinate, and send the encrypted and compressed data chunks to the Recovery Services destination.
The next step is to prepare the target environment in Azure. The very first thing to do is create a Recovery Services vault in Azure, which will house the replication settings and manage the replication. The next step is to create the storage account and virtual network that will house the replicated on-premises machines (note: for the storage accounts you’ll have to decide between standard and premium account types, and choose between the LRS and GRS replication options based on your RPO).
Lastly, it is time to configure and enable replication. After the source and target have been prepped, you need to create a replication policy that aligns with your RTO and RPO objectives. Then select the virtual machines to be replicated and apply the replication policy that you defined.
Finally, start the initial replication (note: this process can take quite some time). After the initial replication is complete, ASR replicates only the changed data in incremental chunks at an interval defined by your replication policy.
3. Failover and Failback
Now that you have performed the replication, it is time to validate the setup and determine what changes, if any, you would need to make before executing a failover.
There are three types of failover: test failover, planned failover, and unplanned failover. A test failover has no impact on production, but a planned or unplanned failover involves shifting production to the replication site, such as Azure or another host.
A test failover can be done either through a recovery plan (to orchestrate failover of multiple machines) or manually for each VM through the Azure console.
If you executed a planned failover, don’t forget to reprotect the machines after they have failed over. Once your source site is back up, you can fail back the VMs using the process server, master target server, and a failback policy.
4. Manage, Monitor, and Troubleshoot
It is advisable to keep monitoring your replication to ensure that your RPO targets continue to be met. You can tweak replication settings or add scale-out process servers to meet these objectives.
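The kind of check involved in RPO monitoring can be sketched in a few lines of Python. This is a simplified illustration, not how ASR itself reports health: the function name, the input dictionary of last recovery point timestamps, and the machine names are hypothetical, and in practice the timestamps would come from whatever monitoring data you export.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def rpo_breaches(last_recovery_points: Dict[str, datetime],
                 rpo_target: timedelta,
                 now: Optional[datetime] = None) -> Dict[str, timedelta]:
    """Return machines whose latest recovery point is older than the RPO target.

    The result maps each breaching machine name to its current lag.
    """
    now = now or datetime.now(timezone.utc)
    breaches: Dict[str, timedelta] = {}
    for machine, recovery_point in last_recovery_points.items():
        lag = now - recovery_point
        if lag > rpo_target:
            breaches[machine] = lag
    return breaches


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    points = {
        "web-01": current - timedelta(minutes=2),
        "sql-01": current - timedelta(minutes=20),
    }
    for name, lag in rpo_breaches(points, timedelta(minutes=15), now=current).items():
        print(f"{name} is {lag} behind, exceeding the RPO target")
```

Passing the current time as a parameter keeps the check deterministic, which makes it easier to test and to replay against historical data.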
Apart from providing job alerts on the Azure console, ASR also has its own Event Log Source that can be useful for troubleshooting replication failures. Here is a guide on what event sources and ports need to be looked at while troubleshooting these failures.
Protect VMs in Azure using Azure Site Recovery
Azure Site Recovery can be used to protect VMs in Azure by replicating them from one region to another. The quick start steps for enabling this protection are listed below:
- From the Azure portal, browse to Virtual machine -> Operations -> Disaster Recovery.
- Select the target region from the geographic cluster. Click on “Next: Advanced Settings.”
- Select the target environment settings. Select the subscription for your VM, its resource group, virtual network, and availability configuration (single instance, availability set, or availability zone). You can choose an existing resource group and virtual network or create new ones.
- Select the cache storage that will be used to temporarily store data in the source region before it is replicated to the target. You can also select the disks that will be replicated and the target disk type, i.e., Standard SSD/HDD or Premium SSD. The subscription where the vault exists, the name of the vault, and the replication policy to be used can also be selected here. Click on “Review+start replication.”
- On the next page, click on “Start replication.”
- Once the initial replication is completed, the protection status can be checked from Virtual machine -> Operations -> Disaster Recovery.
Owing to its cost-effectiveness, ease of use, and support for an extensive list of workloads, ASR has established itself as a world leader in BCDR solutions. But for an additional level of protection, customers can also leverage NetApp Cloud Volumes ONTAP in Azure to augment their BCDR plan for workloads hosted on-premises as well as in hybrid cloud environments.
As NetApp’s cloud-based version of the successful ONTAP data management platform, Cloud Volumes ONTAP adds value through built-in storage efficiency, high availability, and data replication features. With SnapMirror® data replication technology, Cloud Volumes ONTAP can be used to replicate data volumes across on-premises and hybrid cloud environments seamlessly and automatically, so data is always kept in sync. SnapMirror also allows users to fail over the data to secondary sites during unplanned outages or disasters. Once the primary site is back up, data can be replicated back to enable failback. And the thin-provisioning, data compression, and data deduplication storage efficiencies ensure your DR data is always stored cost-efficiently. FlexClone® technology helps you create instant writable clones of volumes with zero storage penalty, making DR testing faster, more effective, and less expensive.
Implementing the right DRaaS solution is non-negotiable to ensure business continuity and protect your workloads from unplanned eventualities. The right Azure Site Recovery architecture along with advanced services from Cloud Volumes ONTAP can make this journey easy.
Read more to learn about Disaster Recovery with Cloud Volumes ONTAP:
- Seamless Disaster Recovery Failover with ONTAP Cloud
- Pay Less for Cloud-Based Disaster Recovery with Cloud Storage Efficiencies and Manageability
- Cloud Disaster Recovery: Case Studies with Cloud Volumes ONTAP