Given the importance of IT systems, it would be easy to assume that they all run on highly available infrastructure. Yet for a number of reasons, chiefly cost and complexity, that is often not the case, and many businesses simply accept the exposure as a business risk.
The risk is largely unnecessary. The effort required to make a system highly available is lower than might be expected, and the costs of not doing so are far from imaginary: an outage in a cloud provider region quickly draws big headlines and leaves customers unhappy. That can add up to substantial business losses.
How can you design a highly available deployment on Google Cloud? Google Cloud Platform offers a number of services that are global or multi-region by default and that can help engineering teams build robust systems. It is therefore worth exploring the native Google Cloud Platform capabilities for designing highly available systems from the ground up.
Google Cloud High Availability Infrastructure
In-depth knowledge of what is available will save you precious time and effort when designing and developing scalable and resilient applications.
Google, a leading voice in Site Reliability Engineering, had 24 Google Cloud regions at the time of writing, with a few more planned to open soon. Each Google Cloud region is an independent geographical area that contains two or more Google Cloud availability zones. A good practice is to deploy across multiple zones to improve resilience against the failure of a single zone, for example the loss of one data center. Using multiple zones within a region increases overall system availability, and failures of an entire region are uncommon. When those rare region failures do happen, however, they can have disastrous consequences and a very negative, highly visible public impact.
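To make the benefit of spreading a workload across zones concrete, here is a minimal Python sketch of the underlying arithmetic. It assumes that zones fail independently of each other and uses an illustrative per-zone availability of 99.9%; neither the figure nor the function names come from Google. Because the model treats zones as independent, it says nothing about a region-wide outage that takes down every zone at once, which is exactly the case a multi-region design is meant to cover.

```python
def combined_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is up."""
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The deployment is down only when every zone is down at the same time.
    return 1.0 - (1.0 - zone_availability) ** zones


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime implied by an availability fraction."""
    minutes_per_year = 365 * 24 * 60
    return (1.0 - availability) * minutes_per_year


if __name__ == "__main__":
    # 0.999 is an illustrative per-zone figure, not a published Google number.
    for zones in (1, 2, 3):
        availability = combined_availability(0.999, zones)
        print(
            f"{zones} zone(s): {availability:.9f} availability, "
            f"{downtime_minutes_per_year(availability):.4f} min/year downtime"
        )
```

Under these assumptions, each additional zone cuts the expected downtime by a factor of a thousand. Real deployments gain less than that, since zone failures are not fully independent and the application itself can fail.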
Google Cloud Platform has multiple services that can offer multi-region availability by default. Good examples are Cloud Key Management Service, which lets you manage your data encryption and decryption keys, and Cloud Load Balancing, which provides load balancing for both internal and internet-facing applications. A few services are not just multi-region but global by default, such as Cloud CDN (Google's content delivery network) and Cloud DNS, one of the few cloud services that provides a 100% uptime SLA.
Data Resiliency and Availability
The Storage Challenge
Data storage and stateful workloads are the hardest part of designing a highly available system. Keeping data replicated and in sync while geographically distant services read and write it is very difficult to implement when traffic and data volumes are high.
Google Cloud Platform has a few data storage services that provide built-in multi-region support: Cloud Firestore (a NoSQL database for web and mobile applications), Bigtable (a NoSQL database for large analytical and operational workloads), Spanner (a managed relational database), and BigQuery (a data warehouse).
Vendor Lock-In Considerations
These multi-regional database services provide a huge benefit by taking away the operational burden of making data highly available, but they are based on Google’s own proprietary technology and may therefore result in vendor lock-in. With some managed services, such as Google Cloud SQL, customers can use open-source engines such as PostgreSQL and MySQL, which makes it easier to migrate away from Google Cloud and run the same database engine elsewhere. However, these services are usually bound to a single region, and you will need to implement any multi-regional capabilities yourself. It is a tradeoff that requires careful thought and planning.
For object, block, and file storage, the high availability options vary. Google Cloud Storage supports multi-regional capabilities, while Cloud Persistent Disk and Cloud Filestore offer replication within a region only. Redundancy that spans just the zones of one region does not protect you if that whole region has an outage. With data storage in Google Cloud priced at a few cents per gigabyte, it is well worth replicating data to other Google Cloud regions, even if that means developing the logic yourself for the block and file storage services, which don’t have built-in capabilities to replicate between regions.
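As a rough idea of what developing that replication logic yourself involves, the following Python sketch copies new or changed files one way from a primary directory to a replica directory. It is a simplified illustration, not a production design: the two directories stand in for volumes mounted from two different regions (an assumption, for example two file shares mounted over NFS), the function names are hypothetical, and the sketch ignores deletions, files that are being written during the copy, retries, and network failures.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(primary: Path, replica: Path) -> list[str]:
    """Copy new or changed files from primary to replica.

    Returns the relative paths that were copied. Files deleted from the
    primary are NOT removed from the replica in this sketch.
    """
    copied = []
    for source in sorted(primary.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary)
        target = replica / relative
        if target.is_file() and file_digest(target) == file_digest(source):
            continue  # replica already holds identical content
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies data and file metadata
        copied.append(relative.as_posix())
    return copied
```

Run on a schedule, a job like this bounds how stale the replica can get to roughly the interval between runs plus the copy time. Comparing checksums keeps repeated runs cheap in network transfer to the replica, at the cost of reading both copies in full.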
More Options for High Availability Infrastructure on Google Cloud
As you can see, there is a lot more than meets the eye when it comes to designing and running highly available workloads in Google Cloud. Understanding the native building blocks is important. It is equally important to understand that many of the services with built-in multi-region high availability come at the expense of possible Google lock-in.
Is there another way to reap the Google Cloud Platform benefits and keep using the technologies you already use elsewhere? Yes! Taking Microsoft SQL Server as an example, there are different ways you can run it in Google Cloud. Leveraging a managed storage service such as NetApp Cloud Volumes ONTAP for Google Cloud enables you to achieve storage high availability with RTO 0 and RPO < 60 seconds while using Google computing services to run your operations. Plus, with the additional features of Cloud Volumes ONTAP, such as cost efficiency, data protection, and hybridity across multiple cloud providers, you gain access to a whole new range of options when designing your system. This makes data and cloud migrations a lot easier while still giving you all the existing Google Cloud infrastructure and benefits.
The best part? The same storage management capabilities of Cloud Volumes ONTAP are also available on AWS and Azure.