Windows Server Failover Clustering on AWS with NetApp Cloud Volumes ONTAP

Windows Server Failover Clustering, commonly referred to as WSFC (historically Microsoft Cluster Service, or MSCS), has long been a key part of a typical high availability (HA) platform for applications in the data center. This functionality carries over to the cloud, where it helps ensure AWS high availability.

WSFC provides a native HA platform that protects many native Microsoft and third-party applications running on Windows Server instances against node/server failures. It does this by using active/passive servers that access a shared storage platform. When coupled with the storage HA that the underlying storage infrastructure typically provides, WSFC delivers resilience against both server node and storage node failures to increase application uptime.

This article takes a deep dive into the typical benefits of WSFC for enterprise organizations and how those customers can achieve the same capability when moving to cloud platforms such as AWS using NetApp Cloud Volumes ONTAP.

What Is Windows Server Failover Clustering?

Windows Server Failover Clustering (WSFC) provides native high availability and disaster recovery for applications and services running on Windows Server instances. By grouping independent server instances with a shared storage platform (typically shared SAN storage), WSFC acts as an abstraction layer, ensuring that the shared storage is accessible to all server instances (nodes) in a failover cluster group without data corruption.

WSFC uses the concept of active/passive nodes: at any given point in time, the active instance has write access to the shared data volumes, while the passive (standby) instance has read capabilities only. WSFC also acts as a gatekeeper that monitors the health of the nodes. Should the health of the active node deteriorate, the passive instance is seamlessly given write access to the shared data volumes in use by the application or service. See these resources for more details on the WSFC architecture and using failover clusters: SQL Server Failover Introduction and this Intro to Failover Clustering.
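
The active/passive handoff described above can be illustrated with a toy model. This is a minimal sketch, not how WSFC is implemented: the `Node` and `ToyFailoverCluster` names are hypothetical, health is a simple flag rather than a real heartbeat, and quorum is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node; 'healthy' stands in for real health checks."""
    name: str
    healthy: bool = True


class ToyFailoverCluster:
    """Toy model: exactly one node owns write access to the shared volume."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        self.nodes = list(nodes)
        self.owner = self.nodes[0]  # the first node starts as the active node

    def check_health(self):
        """Hand ownership to the first healthy node if the owner has failed."""
        if self.owner.healthy:
            return self.owner
        for candidate in self.nodes:
            if candidate.healthy:
                self.owner = candidate
                return self.owner
        raise RuntimeError("no healthy node is available to own the volume")

    def can_write(self, node):
        """Only the current, healthy owner may write to the shared volume."""
        return node is self.owner and node.healthy


node_a = Node("node-a")
node_b = Node("node-b")
cluster = ToyFailoverCluster([node_a, node_b])
print(cluster.owner.name)   # node-a is active at first

node_a.healthy = False      # simulate a failure of the active node
cluster.check_health()
print(cluster.owner.name)   # node-b has taken over write access
```

The point of the sketch is the invariant, not the mechanics: at any moment only one node may write, and the cluster service, not the application, decides which one.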

WSFC itself guards applications against server/node failures; application use cases that require 100% uptime have typically combined WSFC with storage high availability (which is usually available through enterprise storage solutions with redundant storage controllers or nodes).

Microsoft Exchange Server and Microsoft SQL Server are two examples of popular applications that are often deployed on WSFC due to their criticality to many organizations. In addition to other Microsoft solutions such as Scale-Out and Clustered Windows File Servers, a number of third-party applications such as Oracle GoldenGate and SAP ASCS/SCS on Windows have also been popular deployments on WSFC in order to achieve high availability. Besides applications, a number of Microsoft Windows and third-party services such as Microsoft Message Queuing (MSMQ) and IBM MQ have historically also relied on WSFC in order to deploy in a highly available manner.

WSFC has been popular mainly because it is so accessible. As it is natively available within Windows Server itself, any Microsoft customer can configure and implement clustering for their supported applications to increase application uptime. This removes the additional cost of third-party clustering solutions, which also makes WSFC popular with small- to medium-sized corporate customers around the world.

WSFC on AWS Cloud

Challenges with Windows Server Failover Clustering in AWS

Despite its popularity, WSFC and other clustered applications that require underlying shared disk storage (such as SAN storage) have not always been possible to implement in the AWS cloud. This was mainly due to the lack of such shared storage solutions on AWS, especially for resilient HA deployments across multiple Availability Zones (AZs). This native platform limitation has presented challenges, mainly for deployments that require enterprise-grade application node high availability within an AWS region. Such customers cannot deploy or migrate their existing WSFC applications to AWS without re-architecting them or adding data replication software on top of the native Amazon EBS storage assigned to individual Amazon EC2 nodes.

In order to address these limitations, AWS recently announced the AWS Multi-Attach EBS storage type. Multi-Attach EBS is intended to provide an AWS-native, shared block storage solution where provisioned EBS disks can be accessed by multiple Amazon EC2 instances at the same time. However, Multi-Attach EBS volumes are currently subject to a number of limitations that affect their use for most common enterprise clustered applications, such as those utilizing WSFC. Key limitations of Multi-Attach EBS include limited regional availability, restrictions on the type and number of disks per instance, and limited compatibility with various EC2 instance types.
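
To make these constraints concrete, here is a minimal sketch of a pre-flight check for a planned cluster. It is not an AWS API: `ClusterNode` and `multi_attach_problems` are hypothetical names, and the rules encoded (io1 volumes only, Nitro-based instances only, every instance in the same AZ as the EBS volume) are assumptions taken from the limitations described in this article, so they should be verified against current AWS documentation.

```python
from dataclasses import dataclass

# Assumption: these rules mirror the limitations described in this article
# (io1 only, Nitro-based instances, one Availability Zone per EBS volume).
# They are illustrative; check current AWS documentation before relying on them.
MULTI_ATTACH_VOLUME_TYPES = {"io1"}


@dataclass
class ClusterNode:
    """Hypothetical description of an EC2 instance acting as a WSFC node."""
    name: str
    availability_zone: str
    nitro_based: bool


def multi_attach_problems(volume_type, volume_az, nodes):
    """Return a list of reasons the volume cannot be shared by all nodes."""
    problems = []
    if volume_type not in MULTI_ATTACH_VOLUME_TYPES:
        problems.append(f"volume type {volume_type!r} is not eligible for Multi-Attach")
    for node in nodes:
        if node.availability_zone != volume_az:
            problems.append(
                f"{node.name} is in {node.availability_zone}, but the volume is in {volume_az}"
            )
        if not node.nitro_based:
            problems.append(f"{node.name} is not a Nitro-based instance")
    return problems


# A two-node cluster spread across two AZs, as a multi-AZ WSFC design would need.
nodes = [
    ClusterNode("wsfc-node-1", "us-east-1a", nitro_based=True),
    ClusterNode("wsfc-node-2", "us-east-1b", nitro_based=True),
]
for problem in multi_attach_problems("io1", "us-east-1a", nodes):
    print(problem)
```

With the example values, the check flags the second node because it sits in a different AZ from the volume, which is exactly the multi-AZ case that enterprise WSFC deployments tend to need.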

Windows Server Failover Clusters with NetApp Cloud Volumes ONTAP

NetApp ONTAP has been behind many enterprise WSFC deployments in the data center for almost three decades as the shared storage platform of choice. NetApp Cloud Volumes ONTAP is a fully fledged version of the same ONTAP software running natively in AWS (as well as in Azure and GCP).

Cloud Volumes ONTAP on AWS runs on native EBS storage volumes and presents that storage as highly available shared storage in both file (NFS, SMB) and block (iSCSI) formats to be consumed by various EC2 instances. By design, it delivers the performance and advanced data management services needed to satisfy even the most demanding applications in the cloud. Cloud Volumes ONTAP also enables enterprise customers to provision a unified data storage and management platform spanning multiple AZs for enterprise-grade storage high availability.

WSFC on AWS with NetApp Cloud Volumes ONTAP iSCSI LUNs

Cloud Volumes ONTAP addresses the challenges customers encounter when planning to deploy or migrate highly available application clusters built on WSFC. iSCSI LUNs provisioned via Cloud Volumes ONTAP offer shared storage volumes that can be presented directly to the guest operating system running inside an Amazon EC2 instance. These can then be connected through the Microsoft iSCSI initiator as shared SAN storage for the purpose of deploying a WSFC cluster. The iSCSI LUNs can be used by any EC2 instance of your choice to design and deploy WSFC cluster nodes of any size, supporting various deployment choices for SQL Server, Exchange, SAP, Oracle, and other applications on AWS.

Cloud Volumes ONTAP Multi-AZ High Availability Configuration

Cloud Volumes ONTAP can be deployed on AWS either in a standalone configuration (inside a single AZ) or in an HA configuration. The HA deployment can span multiple AZs within a region, which ensures that the storage platform itself is resilient to data center failures on AWS. Coupled with the ability within WSFC to deploy cluster nodes across multiple AZs within a region, customers can now design highly available enterprise deployments of their critical applications spanning multiple AWS data centers.
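
The reasoning behind a multi-AZ layout can be checked with a few lines of code. The sketch below is an illustration under simplifying assumptions: `survives_single_az_failure` is a hypothetical helper, the AZ names are placeholders, and it only asks whether at least one WSFC node and one storage node remain after any single AZ is lost. It does not model cluster quorum, witnesses, or networking.

```python
def survives_single_az_failure(compute_azs, storage_azs):
    """Return True if losing any one AZ still leaves a cluster node and a storage node.

    compute_azs: the AZ of each WSFC node, one entry per node.
    storage_azs: the AZ of each storage node, one entry per node.
    """
    if not compute_azs or not storage_azs:
        return False
    for failed_az in set(compute_azs) | set(storage_azs):
        compute_left = [az for az in compute_azs if az != failed_az]
        storage_left = [az for az in storage_azs if az != failed_az]
        if not compute_left or not storage_left:
            return False
    return True


# WSFC nodes and an HA storage pair both spread across two AZs.
print(survives_single_az_failure(["az-a", "az-b"], ["az-a", "az-b"]))  # True

# WSFC nodes span two AZs, but storage is standalone in a single AZ.
print(survives_single_az_failure(["az-a", "az-b"], ["az-a"]))  # False
```

The second case shows why spreading only the compute nodes is not enough: if the storage lives in one AZ, that AZ remains a single point of failure for the whole cluster.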

Furthermore, given the enterprise-grade availability and data management features natively available within Cloud Volumes ONTAP, WSFC customers on AWS can also benefit from additional capabilities such as low-latency performance, cost-effective data protection via built-in NetApp Snapshot™ technology, instant cloning via ONTAP FlexClone®, built-in data replication to other regions via ONTAP SnapMirror® for disaster recovery, and built-in data tiering to Amazon S3 for infrequently used data.

As a mature, enterprise-grade data storage and management solution, Cloud Volumes ONTAP also does not have the same limitations as AWS Multi-Attach disks when implementing WSFC. Unlike Multi-Attach EBS disks, Cloud Volumes ONTAP is not limited to AWS Nitro-based instances and is generally available across most AWS regions, providing more choice and scalability options to customers. Cloud Volumes ONTAP iSCSI volumes are also not restricted to the Provisioned IOPS SSD (io1) type; they can run on various AWS storage types, including HDD and General Purpose SSD (gp2). This enables a broader range of TCO-optimized WSFC use cases, such as scale-out Windows file servers, without the need for expensive EBS storage types. In addition, NetApp Cloud Volumes ONTAP brings built-in storage efficiency features such as data deduplication and compression, which reduce the consumed size of the underlying EBS storage volumes and therefore the TCO for WSFC deployments.
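
The effect of storage efficiency on cost is simple arithmetic, sketched below. All numbers are hypothetical: the 2:1 efficiency ratio and the price per GiB-month are placeholders for illustration, not measured results or AWS list prices, and the function names are invented for this example.

```python
def consumed_gib(logical_gib, efficiency_ratio):
    """EBS capacity actually consumed after deduplication and compression.

    efficiency_ratio is logical size divided by consumed size (2.0 means 2:1).
    """
    if efficiency_ratio < 1:
        raise ValueError("efficiency_ratio must be at least 1")
    return logical_gib / efficiency_ratio


def monthly_storage_cost(logical_gib, efficiency_ratio, price_per_gib_month):
    """Monthly cost of the consumed capacity at a flat per-GiB price."""
    return consumed_gib(logical_gib, efficiency_ratio) * price_per_gib_month


LOGICAL_GIB = 10_000          # hypothetical application data set
PRICE_PER_GIB_MONTH = 0.125   # hypothetical price, not an AWS list price

without_efficiency = monthly_storage_cost(LOGICAL_GIB, 1.0, PRICE_PER_GIB_MONTH)
with_efficiency = monthly_storage_cost(LOGICAL_GIB, 2.0, PRICE_PER_GIB_MONTH)
print(f"without efficiency: {without_efficiency:.2f} per month")
print(f"with 2:1 efficiency: {with_efficiency:.2f} per month")
```

Under these assumed inputs, halving the consumed capacity halves the storage bill; real savings depend on how well a given data set deduplicates and compresses.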

WSFC with Cloud Volumes ONTAP in an AWS High Availability Configuration: An Enterprise Customer Case Study

One company that was able to leverage WSFC with Cloud Volumes ONTAP in AWS is a large business and financial software company that develops and sells financial, accounting, and tax preparation software and has up to 40% of the market share in the US. This company got around the native limitations of implementing WSFC on AWS by using NetApp Cloud Volumes ONTAP, which allowed it to meet the cloud migration agenda set out by its executive board.

The customer leveraged iSCSI data LUNs from several Cloud Volumes ONTAP high availability instances, with over 1 PB of data, to design and deploy a number of clustered Oracle GoldenGate instances on AWS with nodes spread across multiple AZs for cross-site HA. The organization uses these Oracle GoldenGate instances to replicate data from its on-premises Oracle databases to Amazon RDS, as well as to replicate Oracle RDS between different AWS regions.

In addition, NetApp Cloud Volumes ONTAP has allowed the company to deploy a number of custom-made applications, used for critical internal operations, to the AWS cloud in a highly available manner using Microsoft failover clusters. The ability to migrate these applications without application transformation has saved the company significant cloud migration costs and helped it meet vital project deadlines.

The SVP for the company wrote:

“This is an amazing cross-functional, and complex infrastructure engineering accomplishment, enabling our aggressive goals of migration of all systems by solving problems where AWS is not ready. Moving to AWS would not have been possible if we didn’t have Cloud Volumes ONTAP iSCSI.”

Conclusion

NetApp Cloud Volumes ONTAP brings decades of storage innovation and experience, along with a rich set of data services, to AWS so that technologies such as Windows Server Failover Clustering can be implemented seamlessly on AWS for critical enterprise applications.

Customers also gain additional benefits inherent to Cloud Volumes ONTAP, such as cloud data storage cost efficiency and data management services, so they can meet their enterprise data availability requirements without sacrificing TCO efficiency.

Aviv Degani, Cloud Solutions Architecture Manager, NetApp
