Amazon EFS (Elastic File System) provides cloud-based NFS file share services that are highly available and scalable. It is very quick to get started, allowing you to create a new filesystem share in minutes. Each filesystem can be accessed by hundreds or thousands of client machines and applications concurrently. This has a wide range of uses from consolidating data from multiple sources for data analysis to creating content management systems.
On AWS, migrating data to and from Amazon EFS can still pose challenges. Although Amazon EFS File Sync can be used to copy files into Amazon EFS from Amazon EC2 and on-premises systems, data cannot be copied out in the reverse direction. For bi-directional transfer from on-premises systems, an AWS Direct Connect connection is required in order to mount the filesystem locally, as VPN connections are not supported.
Cloud Sync has been designed to address these data transfer and data synchronization challenges. As part of NetApp’s Cloud Data Services, Cloud Sync makes moving data between any supported source and destination a simple task while keeping it synchronized in a robust and secure manner.
With Cloud Sync, you can ingest data into Amazon EFS from any NFS share, whether hosted on NetApp systems or not, as well as work with other types of storage, such as CIFS shares and Amazon S3. Cloud Sync also facilitates data synchronization in the reverse direction out of Amazon EFS back to NFS systems.
In this article we will start by describing how Cloud Sync works and the features it provides for migrating data to and from Amazon EFS, and then explore how Amazon EFS and Cloud Sync can be used together to protect Amazon EFS filesystems.
Using Amazon EFS with NetApp Cloud Sync
Data Migration with Cloud Sync
Cloud Sync is a SaaS (Software-as-a-Service) solution for data migration between any source and destination platform. After performing an initial baseline copy of the full data set, Cloud Sync will incrementally synchronize only the data that has changed, which makes it very efficient, especially when working with large datasets.
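To make the baseline-then-incremental idea concrete, here is a minimal Python sketch. It is not Cloud Sync's actual implementation: the `Manifest` type and the `plan_incremental_sync` function are hypothetical names introduced for illustration, and comparing size and modification time is only one possible way to detect a changed file.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest: relative path -> (size in bytes, modification time).
Manifest = Dict[str, Tuple[int, float]]


def plan_incremental_sync(
    source: Manifest, last_synced: Manifest
) -> Tuple[List[str], List[str]]:
    """Return (to_copy, to_delete) for a single synchronization pass.

    A file is copied when it is new or when its metadata differs from what
    was recorded at the previous sync. A file is deleted at the destination
    when it no longer exists at the source.
    """
    to_copy = sorted(
        path for path, meta in source.items() if last_synced.get(path) != meta
    )
    to_delete = sorted(path for path in last_synced if path not in source)
    return to_copy, to_delete
```

With an empty `last_synced` manifest, every file is selected, which corresponds to the initial baseline copy. On later passes only new or modified files are selected, so the work grows with the size of the change rather than the size of the dataset.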
As Cloud Sync is a service solution, there are no software or agent installations to perform and users can start migrating data within minutes after signing up. A simple and fully-featured web-based UI guides you through the process of setting up synchronization relationships, and from the dashboard you can view the current status of all relationships, check audit logs, and perform other administration functions. DevOps users also have the option to integrate Cloud Sync with a wider workflow by making use of its RESTful API.
The Cloud Sync service works by performing data migration and synchronization operations through the use of a data broker instance. This instance can be created in the cloud, using Amazon EC2 on AWS, or using an on-premises or Microsoft Azure virtual machine. In each case, the Cloud Sync UI simplifies the process by helping you create the data broker.
Selecting a source and target in Cloud Sync.
Using Cloud Sync has many benefits:
- Fast: Cloud Sync has been engineered to use parallel processing to maximize the performance of data synchronization operations. In a performance comparison against a number of popular tools, including rsync, AWS CLI and S3cmd, Cloud Sync was the fastest to synchronize 1 TB of data, coming in as much as 10x faster than some of the other tools under test.
- Efficient: After the initial data copy, Cloud Sync will synchronize only the data that has changed at the source since the last sync schedule. This makes keeping the data synchronized much more efficient than re-copying the full data set every time.
- Flexible: Cloud Sync supports heterogeneous synchronization between a multitude of different protocols. For example, as well as being able to migrate data to and from Amazon EFS and NFS, you can also migrate between NFS, CIFS, Amazon S3, NetApp StorageGRID® WebScale, and any other object store that supports the S3 protocol. This gives you the versatility to transform data across a range of storage formats.
- Robust: Cloud Sync is a complete solution for data migration, transformation and synchronization, and addresses all aspects of the data transfer process, including setup, scheduling, auditing, reporting, and error handling. For example, if a Cloud Sync transfer is interrupted due to a temporary network outage, it can be restarted from where it left off after the issue has been resolved. By comparison, DIY solutions created with command line tools and scripts can be prone to failure or unexpected behavior, and do not usually support the same level of features.
- Secure: Although Cloud Sync operates as a service solution, your data is kept within your VPN or VPC at all times. All data transfer occurs only between data broker instances and the source and destination.
- Cost Effective: Cloud Sync charges are based on the number of synchronization relationships you create, as opposed to the amount of data transferred. This means you can transfer an unlimited amount of data while keeping costs manageable.
- Easy to Use: The Cloud Sync web-based GUI and wizard-style interface make setting up, monitoring, and managing synchronization relationships very easy to accomplish, even for non-technical users.
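The restart behavior described under "Robust" can be illustrated with a small checkpointing sketch in Python. This is an assumption-laden illustration, not how Cloud Sync works internally: `resumable_transfer`, the `copy_one` callback, and the `done` checkpoint set are all hypothetical names.

```python
from typing import Callable, Iterable, Set


def resumable_transfer(
    paths: Iterable[str], copy_one: Callable[[str], None], done: Set[str]
) -> Set[str]:
    """Copy every path not already recorded in `done`.

    Each successful copy is recorded in `done` immediately. If `copy_one`
    raises (for example during a network outage), the exception propagates,
    but `done` still holds everything copied so far, so calling this
    function again resumes from where the previous run stopped.
    """
    for path in paths:
        if path in done:
            continue  # already transferred in an earlier run
        copy_one(path)
        done.add(path)
    return done
```

In a real tool the checkpoint would need to be persisted somewhere durable rather than held in memory. A hand-written script that omits this bookkeeping has to start over after every interruption, which is the kind of gap the DIY comparison above refers to.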
Protecting Amazon EFS data with Cloud Sync
While creating and using Amazon EFS filesystems is very quick and easy to do, backing up existing shares that have been around for a while is a little more involved. The approach documented by AWS involves the deployment of an AWS Data Pipeline that uses a workflow provided by Amazon through GitHub.
This solution effectively copies the live Amazon EFS data to a secondary, backup Amazon EFS filesystem. It allows you to maintain a rolling set of backups and an additional workflow template is provided to restore the data back to the production environment.
Use of this backup workflow requires allocation of Amazon EC2 resources, as well as an Amazon S3 bucket to hold required assets. AWS Data Pipeline also has related costs, and because both the live and backup Amazon EFS file systems must be stored, cloud storage costs will also increase significantly. Each backup operation is a complete replica of the source data, and in order to create a consistent copy, Amazon recommends performing the backup at a period of low activity.
Amazon EFS File Sync allows for data to be migrated into Amazon EFS from NFS file shares hosted on-premises or in Amazon EC2. Amazon EFS File Sync does not support copying data back out of Amazon EFS, and therefore cannot be used as a backup solution. Amazon EFS File Sync uses an intermediary agent to manage synchronization operations, which provides a text-based console interface for configuration and setup.
Cloud Sync has the advantage of being a service offering and therefore requires no manual setup. The modern, web-based user interface provides users with easy access to all functionality, including the setup of the data broker instances used to facilitate data migrations. As a general solution to data migration and synchronization, Cloud Sync supports a broader scope and can be used to work with other storage systems, such as CIFS shares and object stores, or perform cross region replication, for example.
After the initial full backup, Cloud Sync is able to keep the data synchronized incrementally, by copying over only the changes made since the last sync. Not only does this make the synchronization process much faster and far more efficient, it is also very useful when the destination system supports point-in-time snapshot copies. By creating such a snapshot each time Cloud Sync finishes synchronizing, you get a rolling set of backups.
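The rolling set of backups can be pictured as a simple retention rule applied after each sync. The Python sketch below is illustrative only: `rotate_snapshots` and the `keep` parameter are hypothetical, and creating or deleting the snapshots themselves would be done with whatever tooling the destination storage system provides.

```python
from typing import List


def rotate_snapshots(existing: List[str], new_snapshot: str, keep: int) -> List[str]:
    """Return the snapshot names to retain after adding `new_snapshot`.

    `existing` is assumed to be ordered oldest first. Only the `keep` most
    recent snapshots are retained; older ones would be deleted by the caller.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    snapshots = existing + [new_snapshot]
    return snapshots[-keep:]
```

For example, with three daily snapshots retained, adding a fourth drops the oldest one, so the destination always holds the last three points in time.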
As we’ve seen in the course of this article, Cloud Sync makes data synchronization easier, faster, and more robust than homegrown solutions. When working with Amazon EFS in particular, Cloud Sync can be a lifesaver when migrating to and from Amazon EFS and when a backup solution is required, providing added flexibility and ease of use.