As a DevOps engineer, you’ll be regularly asked by other developers to create dev/test environments they can use to develop new application features.
The most important part of such an environment is usually also the most difficult and time-consuming to provision: the database.
How can you rapidly create an exact duplicate of a production database, complete with all its data, when the copy may only be needed for a few hours?
For large databases, backup and restore is simply not a scalable option. Even with backup compression, the situation only improves to a certain extent and at the cost of much higher CPU usage during both the backup and restore operations.
On top of this, in larger teams, multiple developers will each need their own copy of the data to work with, which only exacerbates the problem. Developers lose a lot of time waiting for copies, and every copy adds its own storage overhead.
ONTAP provides an excellent solution to this problem, which can be used in both on-premises and cloud installations. SnapMirror can be used to move data quickly and efficiently into and out of the cloud, and FlexClone can be used to create temporary writable copies of data without affecting the source.
In this article, we’ll look at how FlexClone works and how it can be used to rapidly create a test copy of a PostgreSQL database hosted in EC2 and using Cloud Volumes ONTAP to manage its storage.
ONTAP FlexClone for DevOps Environments
FlexClone allows you to instantly create a writable copy of an existing volume, irrespective of its size.
These clones are very storage efficient: a new clone initially requires only 4 KB, and after that it consumes only the storage needed for changed data, with no impact on the source.
Clones are easy to re-create when up-to-date data is required, and multiple clones can exist in parallel at the same time.
A FlexClone volume is based on a snapshot of an existing volume, which captures the state of the volume’s data blocks at a specific point in time. This is used as a starting point for the data the FlexClone will serve out. Changes to the data are managed, as they normally would be for snapshots, by using copy-on-write.
If data in the source volume changes, copy-on-write ensures that blocks that are locked by a snapshot are not updated, but instead new blocks are allocated to write the changed data. This ensures no impact to the state of the FlexClone.
When data is changed in the FlexClone, the same principle applies; however, the new blocks are written into the FlexClone’s volume instead of the source volume. When the clone is no longer required, its volume can simply be deleted, thereby freeing any extra storage it was using.
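The block-sharing behavior described above can be illustrated with a small toy model. This is only a sketch of the idea, not ONTAP code: the `Volume` and `Clone` classes, the dictionary-based block map, and the sample data are all assumptions made for illustration.

```python
"""Toy model of a snapshot-based clone that shares blocks with its source."""


class Volume:
    def __init__(self, blocks=None):
        # Maps logical block number -> block contents.
        self.blocks = dict(blocks or {})

    def snapshot(self):
        # Copies only the block map; the block contents themselves are shared.
        return dict(self.blocks)

    def write(self, block_no, data):
        # The snapshot keeps pointing at the old contents; the live volume
        # points at newly written contents.
        self.blocks[block_no] = data


class Clone:
    def __init__(self, snapshot):
        self._base = snapshot  # shared with the source, never modified
        self._delta = {}       # blocks written through the clone

    def read(self, block_no):
        # Prefer the clone's own blocks, fall back to the shared snapshot.
        return self._delta.get(block_no, self._base.get(block_no))

    def write(self, block_no, data):
        self._delta[block_no] = data

    def extra_blocks(self):
        # Storage the clone consumes beyond the shared snapshot.
        return len(self._delta)


source = Volume({0: "alice", 1: "bob", 2: "carol"})
clone = Clone(source.snapshot())

source.write(1, "bob-v2")      # a production change after the snapshot
clone.write(2, "carol-test")   # a test change made through the clone

print(clone.read(1))           # bob
print(source.blocks[2])        # carol
print(clone.extra_blocks())    # 1
```

In this model the clone costs nothing beyond the blocks it changes, and discarding the `Clone` object discards only its own delta, which mirrors why deleting a FlexClone frees just the extra storage it was using.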
To recap, FlexClones have the following advantages:
- Instantaneous creation for source volumes of any size
- Storage efficiency through the use of copy-on-write
- Very easy to clean up and re-create
These properties make them ideal for creating DevOps and database test environments, as we will show in the next section.
Creating a Cloned Test Database
When the storage for a database is hosted in one or more ONTAP volumes, we can use FlexClone to create a writable copy of the data and mount it to a different server. LUN management and iSCSI are prerequisites for achieving this; however, describing them is outside the scope of this article, which focuses on how to create a clone and start a database on it. FlexClone works in exactly the same way for NFS and CIFS workloads.
Let’s imagine we have a production PostgreSQL database that holds customer information. In our environment, this database uses a single volume hosted on Cloud Volumes ONTAP. If we connected to and queried the database, we would see the following:
Creating a Snapshot for DevOps and Database Test Environments
The first step in creating our FlexClone is to create a crash-consistent snapshot of the volume holding our database files. This type of snapshot, as opposed to a transactionally consistent snapshot, is performed without the involvement of the database server.
When snapshots are used for database backups, the database server will flush all pending I/Os to disk and suspend further writes in order for the storage system to create a clean snapshot backup. This interaction between the storage system and the database server can be managed by software such as NetApp SnapCenter.
When creating a snapshot for a test database instance, such strong guarantees of consistency are not always required. On startup, a database server will always perform recovery on the database files in order to deal with a situation, such as a power outage or abort of the database process, where the server did not shut down normally.
For our crash-consistent snapshot, the database server will do exactly the same thing: it will roll back any transactions that were in flight at the time of the snapshot, and roll forward any transactions that were captured in the transaction log but not yet applied to the data files.
Once we have our snapshot, we can then create a FlexClone. Both operations can be completed through SSH, which allows us to script the operations and simplify the process of repeating them.
::> volume snapshot create -vserver svm_production -volume database -snapshot test.20170428
::> volume clone create -vserver svm_production -flexclone database_test_20170428 -parent-volume database -parent-snapshot test.20170428
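Because both operations run over SSH, they are straightforward to script. The following is a minimal sketch of one way to do that, not a definitive implementation. The function names, the `dry_run` switch, and the management host name are hypothetical; the naming scheme follows the date-stamped names used above, and the clone name is passed with the `-flexclone` parameter.

```python
import shlex
import subprocess
from datetime import date


def clone_commands(vserver, volume, stamp):
    """Return the ONTAP CLI commands that snapshot a volume and clone it."""
    snapshot = f"test.{stamp}"
    clone = f"{volume}_test_{stamp}"
    return [
        f"volume snapshot create -vserver {vserver} -volume {volume} "
        f"-snapshot {snapshot}",
        f"volume clone create -vserver {vserver} -flexclone {clone} "
        f"-parent-volume {volume} -parent-snapshot {snapshot}",
    ]


def run_over_ssh(host, commands, dry_run=True):
    """Run each command on the cluster over SSH, or just print it."""
    for command in commands:
        argv = ["ssh", host, command]
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops the script if a step fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    stamp = date.today().strftime("%Y%m%d")
    # Hypothetical management address; replace with your own.
    run_over_ssh(
        "admin@cluster-mgmt.example.com",
        clone_commands("svm_production", "database", stamp),
    )
```

With `dry_run=True` the script only prints the SSH invocations, which makes it easy to review the commands before letting them run against a real system.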
In addition to the CLI, this operation can also be performed through the Cloud Manager web-based UI.
We now have an exact duplicate of our database, which could be terabytes in size, created within seconds. The new volume contains LUNs that we can connect to a test database server over iSCSI.
After connecting the storage to the EC2 host, we must mount it to the data directory that our test instance of PostgreSQL will look for on startup.
By default, the PostgreSQL server configuration file is stored in the data directory.
A better approach is to move the configuration to a different location, such as somewhere under /etc; otherwise, our clone will contain the production configuration file.
If this is the case, we will need to overwrite that file with our test environment configuration.
Another point to note is that, because the ONTAP snapshot was taken while the database server was running, the cloned data directory will contain a postmaster.pid file, which will prevent the test server from starting unless it is removed.
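These two cleanup steps can be automated before the test instance is started. The sketch below assumes the configuration file is named postgresql.conf and sits directly in the data directory; the function name and the example paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def prepare_cloned_data_dir(data_dir, test_config=None):
    """Make a cloned PostgreSQL data directory ready for a test instance.

    Returns True if a leftover postmaster.pid file was removed.
    """
    data_dir = Path(data_dir)

    # The snapshot was taken while production was running, so the clone
    # carries the production server's PID file.
    pid_file = data_dir / "postmaster.pid"
    removed_pid = pid_file.exists()
    pid_file.unlink(missing_ok=True)

    # If the configuration lives in the data directory, replace the
    # production settings with the test environment's settings.
    if test_config is not None:
        shutil.copyfile(test_config, data_dir / "postgresql.conf")

    return removed_pid
```

A call such as `prepare_cloned_data_dir("/var/lib/pgsql/data", "/etc/postgresql-test/postgresql.conf")` would then be made after mounting the clone and before starting the test server. It should only ever be run against the cloned directory, never against a data directory that a live server is using.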
Once this has all been taken care of, we can start our test instance and query the database. If we query the test instance, we’ll see the same data as in production. If we change the data in the test clone it won’t affect production and, likewise, if we change production data the clone is unaffected.
On the left-hand side we can see our test server, and on the right a connection to production. As we can see, changes made on either side do not affect the other.
When we are finished with our test, we can shut down the test database server and detach the LUNs. The following commands can then be used to clean up the clone and snapshot we created:
::> volume offline -vserver svm_production -volume database_test_20170428
::> volume delete -vserver svm_production -volume database_test_20170428
::> volume snapshot delete -vserver svm_production -volume database -snapshot test.20170428
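Teardown can be scripted in the same way as creation. The sketch below only builds the command strings in the order they need to run; the function name is hypothetical, and it assumes the clone must be taken offline before it can be deleted.

```python
def teardown_commands(vserver, volume, stamp):
    """Return ONTAP CLI commands that remove a test clone and its snapshot."""
    snapshot = f"test.{stamp}"
    clone = f"{volume}_test_{stamp}"
    return [
        # Assumption: the clone is taken offline so that it can be deleted.
        f"volume offline -vserver {vserver} -volume {clone}",
        f"volume delete -vserver {vserver} -volume {clone}",
        # The snapshot is removed last, once no clone depends on it.
        f"volume snapshot delete -vserver {vserver} -volume {volume} "
        f"-snapshot {snapshot}",
    ]


for command in teardown_commands("svm_production", "database", "20170428"):
    print(command)
```

Keeping the order fixed in one place matters because the snapshot backing a clone cannot be removed while that clone still exists.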
Test and Development Efficiency
This article demonstrates how useful FlexClones are for the rapid creation of DevOps and database test environments. As FlexClones can work with any ONTAP volume, they can also be used for file shares in the test environment.
Though the above examples made use of SSH, Cloud Volumes ONTAP installations can also use the Cloud Manager API, complete with a Swagger interface, to access ONTAP programmatically and thereby integrate the setup and teardown of FlexClones into a larger DevOps process.
NetApp Cloud Manager, which is used to deploy and manage Cloud Volumes ONTAP, is available for a free 30-day trial through the AWS and Azure marketplaces. This would be the place to start when building your own DevOps or Disaster Recovery environment in the cloud.