As the version control repository of choice for many thousands of software development teams, GitLab provides a unified platform for managing source code, executing Continuous Integration/Continuous Delivery (CI/CD) pipelines, issue tracking, sharing wiki-based project documentation, and much more. The majority of these development teams cannot function without ready access to GitLab services.
Being so vital to the software development lifecycle makes it imperative to ensure that GitLab is always available, and as the data being stored by GitLab is the intellectual property of an organization, protecting this data against localized or site-wide storage failures is of paramount concern. Deploying GitLab storage using NetApp’s Cloud Volumes ONTAP makes all of this very easy to achieve.
In this article we will look at the pros and cons of the various storage options for deploying GitLab HA in AWS, and show how the enterprise data management features of Cloud Volumes ONTAP make it the ideal solution.
What Is GitLab?
GitLab is an all-in-one solution for developing software, including Git source control, merge requests, CI/CD pipelines, issue tracking, wikis, and much more. Due to its centrality in the software development process, access to GitLab is vital, with performance being a prime concern. For large teams, this usually means scaling the deployment across many servers.
Deploying GitLab at Scale
GitLab supports both single and multi-node deployments that can process hundreds of concurrent requests per second and handle thousands of active users. In a single-node environment, the constituent services used by GitLab, such as PostgreSQL, Redis, and Consul, all run on the same machine. To scale for a larger user base of twenty thousand or fifty thousand users, each of these services can be spread across multiple nodes.
For GitLab storage there are a number of different options available.
It should be kept in mind that data access in a GitLab cluster is I/O intensive for both reads and writes, and so a high-performance solution is required at this level to deliver the best user experience. As well as storage for Git repositories, GitLab storage is also required for media files, uploads, and build artifacts, which are supplemental to the source code. GitLab’s recommendation here is to store this data using some form of object storage.
GitLab Data Storage
In this section, we will explore the available options for managing your GitLab data, and will review performance characteristics, features, and support for GitLab high availability.
GitLab administrators can deploy data storage for source code repositories and other data by means of block storage, such as Amazon EBS, which would be either directly mounted to the GitLab server, or accessed via the Gitaly service on a separate machine. This provides flexibility in terms of the type and performance of the disks used, such as General Purpose SSD (gp2) or Provisioned IOPS SSD (io1). Also, Amazon EBS storage is redundant within an Availability Zone, and so there is built-in redundancy within the provisioned storage.
Gitaly does not currently support high availability, which can be a limiting factor when deploying GitLab HA. It should also be noted that while Amazon EBS storage is redundant within an Availability Zone, extending this redundancy across Availability Zones, or across regions, would require some form of custom replication.
Amazon S3 storage provides cost-effective and highly durable object storage within the AWS cloud, making it an ideal solution for many different types of data storage, such as backups, media files, and data archives. Though GitLab can make use of Amazon S3 for these types of files, Amazon S3 cannot be used for GitLab’s main data stores (i.e., the source code repositories), which means that another solution must be used alongside Amazon S3 for this purpose.
The use of NFS storage is both widespread and widely understood, making it a very good option for deploying shared data storage. As the same file system can be mounted by multiple servers concurrently, NFS makes it easy to scale out and support high availability. GitLab can make direct use of NFS storage for both repository data and object storage.
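As a rough illustration of the same file system being mounted by several GitLab application servers, the following Python sketch generates one identical `/etc/fstab` entry per node. The server name, export path, mount point, node names, and mount options are all hypothetical placeholders chosen for this example rather than values taken from GitLab or NetApp documentation, so check them against the recommendations for your environment before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NfsMount:
    """Describes one NFS export and where it is mounted (all values are examples)."""
    server: str       # hypothetical NFS server address
    export: str       # hypothetical exported path on the server
    mount_point: str  # local directory on each GitLab application node
    # Assumed mount options for illustration only; tune for your deployment.
    options: tuple = ("defaults", "vers=4.1", "hard", "noatime", "nofail", "_netdev")

    def fstab_line(self) -> str:
        """Render the mount as a single /etc/fstab entry."""
        opts = ",".join(self.options)
        return f"{self.server}:{self.export} {self.mount_point} nfs4 {opts} 0 0"


def fstab_for_nodes(nodes, mount):
    """Every application node mounts the same export, so each gets the same line."""
    return {node: mount.fstab_line() for node in nodes}


if __name__ == "__main__":
    shared = NfsMount(
        server="nfs.example.internal",           # placeholder host name
        export="/gitlab_data",                   # placeholder export
        mount_point="/var/opt/gitlab/git-data",  # placeholder mount point
    )
    for node, line in fstab_for_nodes(["gitlab-app-1", "gitlab-app-2"], shared).items():
        print(f"{node}: {line}")
```

Because every node receives the same entry, adding another application server only means applying the same mount to the new machine.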
Meeting GitLab’s strict performance requirements while also providing high availability can make it prohibitively complex to roll out your own NFS server cluster. Managed services such as Amazon EFS are an alternative, but GitLab strongly discourages their use due to performance issues. In order to make use of NFS storage, you would need an enterprise-grade, highly available, and high-performance NFS solution in the cloud, which is precisely what NetApp’s Cloud Volumes ONTAP delivers.
GitLab NFS Storage Using Cloud Volumes ONTAP
Using NetApp Cloud Volumes ONTAP, you can take advantage of NetApp storage solutions within AWS, Azure, or Google Cloud, building on the native compute and storage resources provided by each cloud environment. This substantially improves the flexibility, performance, high availability, and cost effectiveness of deploying cloud storage over using native cloud storage resources directly.
NetApp is a recognized industry leader for NFS solutions, and using Cloud Volumes ONTAP allows you to easily deploy GitLab on this very same technology. In AWS, you can choose any of the available Amazon EBS disk types for your new NFS shares, and even combine them within a RAID group for extra performance. Cloud Volumes ONTAP can also be deployed on a variety of Amazon EC2 instance types, including General Purpose, Compute Optimized, and Memory Optimized.
Cloud Volumes ONTAP contains many features that help support performance, data protection, high availability, and DevOps benefits for a GitLab deployment:
- Flash Cache: Cloud Volumes ONTAP uses a sophisticated caching mechanism that reduces the need to retrieve frequently accessed data from Amazon EBS. This can dramatically improve data access performance, especially in the case of GitLab, where users are pulling data from a common set of source code repositories.
- Instant Data Cloning: Based on Snapshot copies, the FlexClone® data cloning feature provides GitLab users with ample DevOps benefits for GitLab CI/CD pipeline execution. Using FlexClone, any test data volume or database hosted on Cloud Volumes ONTAP, whether on an NFS share or directly on block storage, can be instantly cloned to create a space-efficient, writable copy of the data, without doing a block-for-block copy. This clone can even be written to during integration tests and then thrown away at the end of the pipeline.
- Data Compression: With transparent server-side support for data compression, Cloud Volumes ONTAP not only helps to reduce your cloud data storage costs and footprint, and therefore operational costs, but also helps to improve storage performance by reducing the level of I/O required for data access.
- Amazon S3 Tiering: Data tiering allows a data volume in Cloud Volumes ONTAP to automatically tier less-frequently accessed data to Amazon S3, giving you the advantage of highly cost-effective storage for build artifacts, media files, etc. This actually works better than directly storing these files on Amazon S3, as Cloud Volumes ONTAP will move these files to Amazon EBS when they are being accessed, thereby improving access performance, and move them back to Amazon S3 as required.
- NetApp Snapshot™ Copies: Cloud Volumes ONTAP gives you the ability to create an instant snapshot backup of all your GitLab source repositories, without degrading storage performance or making redundant copies of the data. Snapshot backup copies can be restored back to the original volume, or cloned to a new writable volume in order to perform a selective restore, using NetApp’s FlexClone.
- High Availability: Cloud Volumes ONTAP HA allows for a secondary storage node to be deployed to the same or a different Availability Zone in order to achieve AWS high availability. Each write on one node is synchronously written to the other, which not only improves data redundancy, but also performance, as data can be read from or written to either node in tandem.
- SnapMirror® Replication: SnapMirror data replication allows you to replicate data volumes between instances of Cloud Volumes ONTAP, which may reside in different regions. This gives you the ability to protect your intellectual property even in a disaster recovery (DR) scenario.
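To make the idea of a space-efficient, writable clone more concrete, here is a toy Python model in which a clone shares unchanged blocks with a point-in-time snapshot and stores only the blocks it modifies. This is purely a conceptual sketch: the `Snapshot`, `Volume`, and `clone` names are invented for this example, and it does not represent how FlexClone or Snapshot copies are actually implemented in Cloud Volumes ONTAP.

```python
from types import MappingProxyType


class Snapshot:
    """Read-only, point-in-time view of a volume's block map (toy model)."""

    def __init__(self, block_map):
        # Only the map of block numbers is copied; the block data is shared.
        self._block_map = MappingProxyType(dict(block_map))

    def read(self, block_no):
        return self._block_map[block_no]

    def block_map(self):
        return self._block_map


class Volume:
    """Writable volume that optionally sits on top of a snapshot."""

    def __init__(self, base=None):
        self._base = base   # Snapshot this volume was cloned from, if any
        self._delta = {}    # blocks written since the volume was created

    def write(self, block_no, data):
        self._delta[block_no] = data

    def read(self, block_no):
        if block_no in self._delta:
            return self._delta[block_no]
        if self._base is not None:
            return self._base.read(block_no)
        raise KeyError(block_no)

    def snapshot(self):
        merged = {}
        if self._base is not None:
            merged.update(self._base.block_map())
        merged.update(self._delta)
        return Snapshot(merged)

    def private_blocks(self):
        """Number of blocks stored by this volume itself."""
        return len(self._delta)


def clone(snapshot):
    """Create a writable clone that initially stores no blocks of its own."""
    return Volume(base=snapshot)


if __name__ == "__main__":
    source = Volume()
    source.write(0, b"schema")
    source.write(1, b"rows")

    test_copy = clone(source.snapshot())
    test_copy.write(1, b"rows changed by integration test")

    print(test_copy.private_blocks())  # blocks stored by the clone itself
    print(source.read(1))              # the source volume is unaffected
```

In a CI/CD pipeline this corresponds to cloning a test data volume at the start of a job, letting the integration tests write to the clone, and discarding it afterwards without touching the original data.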
Building out a highly scalable GitLab HA deployment with multiple GitLab application servers requires an enterprise-grade shared storage solution. Whereas other NFS solutions, such as Amazon EFS, can struggle to meet the strict performance requirements of GitLab, NetApp’s Cloud Volumes ONTAP is a remarkably good fit for this use case, and provides many ancillary benefits in terms of data protection and high availability.