A Deep Look at Uploading Data to Amazon S3

April 28, 2017

Topics: Cloud Volumes ONTAP, Cloud Sync, AWS

Amazon Simple Storage Service (S3) is one of the most popular data storage services offered by AWS. Organizations of all sizes can benefit from making the move to S3, from small startups that need to move just a few GBs of data to large enterprises or video storage systems looking to move petabytes (PBs) of data.

If your organization wants to move data to S3, you should get to know the various options AWS has to offer, since each one is suited to a different data size.

This article will walk you through the options AWS offers for moving data to S3 so you can find the one that best suits the amount of data that you have to move.

Solutions for Small to Mid-size Organizations

Consider the use case of an organization that has an abundance of documents and images on their hands. In addition, the organization has to maintain all its financial documents as well as employee data for audit and statutory purposes.

This data is measured in GBs. Stored in-house, it takes a huge amount of resources to maintain with the right kind of storage, retrieval, durability, and availability. Moved to AWS S3, all of those issues can be solved easily and at a much lower cost.

When you want to migrate a few GBs of data like this, the ideal tools for you to get to know are S3 CLI, AWS Import/Export service, Storage Gateway, and Transfer Acceleration.

1. S3 CLI

The S3 CLI is a simple but effective migration tool. It begins with creating an AWS Identity and Access Management (IAM) user identity, which is recommended for proper access to resources, followed by installing and configuring the AWS CLI.

Next, you write custom scripts that call the appropriate APIs through the SDKs, specifying the destination bucket. The Multipart Upload APIs will be your best friend in this scenario.
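To make the multipart idea concrete, here is a minimal sketch of how a script might split a file into parts before handing each part to the SDK. It only plans the byte ranges and does not call AWS. The function name `plan_parts` and the 8 MiB default part size are illustrative assumptions. The 5 MiB minimum part size and the 10,000-part ceiling are S3's documented multipart limits.

```python
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per multipart upload


def plan_parts(file_size: int, part_size: int = 8 * MIB) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples that cover the whole file.

    Hypothetical helper for illustration; the default part size is an assumption.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the file would otherwise need more than 10,000 parts.
    while (file_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


if __name__ == "__main__":
    for number, offset, length in plan_parts(20 * MIB):
        print(f"part {number}: bytes {offset}-{offset + length - 1}")
```

Each planned range would then be read from disk and sent as one part. Because the parts are independent, a failed part can be retried on its own rather than restarting the whole file.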

One of the ideal cases for using S3 CLI is when you want to continuously migrate files such as logs and backup data from an application server. For this, you can write automated scripts or even use the AWS SDK to upload the data to S3 at a predefined frequency.

The CLI also provides an option to sync between a source and a destination, so that only new or changed files are copied.

For more information on how to sync using CLI, click here.
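The decision a one-way sync has to make for each file can be sketched in a few lines. The rule used here (copy a file if it is missing at the destination, differs in size, or is newer at the source) is a simplification of how sync tools typically behave, not a description of the CLI's internals. `FileInfo`, `files_to_upload`, and the sample keys are hypothetical names for illustration.

```python
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    size: int      # bytes
    mtime: float   # last-modified time, seconds since the epoch


def files_to_upload(local: Dict[str, FileInfo], remote: Dict[str, FileInfo]) -> List[str]:
    """Return the keys that a one-way sync from local to remote would copy."""
    pending = []
    for key, src in local.items():
        dst = remote.get(key)
        if dst is None:                 # missing at the destination
            pending.append(key)
        elif src.size != dst.size:      # content length differs
            pending.append(key)
        elif src.mtime > dst.mtime:     # source changed after the last upload
            pending.append(key)
    return sorted(pending)


if __name__ == "__main__":
    local = {
        "logs/app.log": FileInfo(2048, 1700000500.0),
        "logs/old.log": FileInfo(512, 1700000000.0),
        "backup/db.dump": FileInfo(4096, 1700000100.0),
    }
    remote = {
        "logs/app.log": FileInfo(1024, 1700000400.0),
        "logs/old.log": FileInfo(512, 1700000200.0),
    }
    print(files_to_upload(local, remote))
```

In this sample, the unchanged `logs/old.log` is skipped, which is what makes a sync cheaper than re-uploading everything on every run.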

2. Import/Export

The AWS Import/Export service enables you to transfer your data to S3 by shipping a physical device to AWS. This service is used to transfer less than 16 TB of data and is ideally suited for the following scenarios: direct data interchange, off-site backups, and disaster recovery.

Direct Data Interchange - If you regularly receive data on portable storage devices from your business partners, you can have them send those devices straight to AWS for import into your S3 storage.
Off-Site Backup - Send full or incremental backups to Amazon S3 for reliable and redundant off-site storage.
Disaster Recovery - If you need to quickly retrieve a large backup stored in Amazon S3, use AWS Import/Export to transfer the data to a portable storage device and have it delivered to your site.

For an organization that needs to perform a one-time migration, like in the use case mentioned above, Import/Export would be the right solution.

To get started with AWS Import/Export jobs, click here. Beginners might also find these links about how to import data to S3 and export data from S3 helpful.

3. Storage Gateway

The AWS Storage Gateway service is used to tap into the AWS cloud from your on-premises servers, providing you with hybrid cloud storage. It uses a multi-protocol storage appliance with highly efficient network connectivity.

There are a few scenarios when Storage Gateway is used:

File Gateway: Connecting as a file server resulting in a network file share for on-premises servers and applications.
Connecting as a Local Disk: This connection will transfer your data to S3 for backup while keeping a copy of the data stored on the local disk for high performance. The data can be held in either cached volumes or stored volumes.

Cached volumes store your data in S3 and keep a copy of frequently accessed data subsets locally. You can consider it the same as the cache of an operating system. Because of its low latency, enterprises use cached volumes for performing manipulations on frequently accessed data.

Stored volumes are used when you want to access complete data sets with low latency. In this configuration your on-premises gateway stores all your data locally and then asynchronously backs it up to Amazon S3 through point-in-time snapshots.

Tape Gateway: Connecting as a virtual tape library (VTL) using existing backup and recovery software to perform native backup jobs. Your data will be synced with S3 and later moved to Glacier.
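The cached-volume idea described above (the full data set lives in S3, with frequently accessed blocks also kept locally) can be pictured with a toy model. This is only a conceptual sketch: the class name `CachedVolumeModel`, the least-recently-used eviction policy, and the in-memory dictionary standing in for S3 are all assumptions, and a real gateway is considerably more involved.

```python
from collections import OrderedDict


class CachedVolumeModel:
    """Toy model: every block lives in the remote store, hot blocks also live locally."""

    def __init__(self, cache_blocks: int) -> None:
        self.remote = {}              # stands in for S3 (assumption for illustration)
        self.local = OrderedDict()    # small local cache, oldest entry first
        self.cache_blocks = cache_blocks
        self.remote_reads = 0

    def _remember(self, block_id: int, data: bytes) -> None:
        self.local[block_id] = data
        self.local.move_to_end(block_id)        # mark as most recently used
        while len(self.local) > self.cache_blocks:
            self.local.popitem(last=False)      # evict the least recently used block

    def write(self, block_id: int, data: bytes) -> None:
        self.remote[block_id] = data            # the primary copy goes to the remote store
        self._remember(block_id, data)

    def read(self, block_id: int) -> bytes:
        if block_id in self.local:              # low-latency local hit
            self.local.move_to_end(block_id)
            return self.local[block_id]
        self.remote_reads += 1                  # cache miss: fetch from the remote store
        data = self.remote[block_id]
        self._remember(block_id, data)
        return data


if __name__ == "__main__":
    vol = CachedVolumeModel(cache_blocks=2)
    for block_id, payload in enumerate([b"a", b"b", b"c"], start=1):
        vol.write(block_id, payload)
    vol.read(3)   # served locally
    vol.read(1)   # evicted earlier, so fetched from the remote store
    print("remote reads:", vol.remote_reads)
```

A stored volume would invert the roles in this picture: the local disk would hold everything, and the remote store would receive periodic snapshots.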

In the use case mentioned above, cached volumes would be a good solution for the organization, because every new record it creates is also stored on AWS as a backup.

AWS Storage Gateway is an ideal solution for enterprises that want to make use of the hybrid cloud. It is also helpful when enterprises want to completely replace their traditional tape library with the cloud.

4. Transfer Acceleration

Amazon S3 Transfer Acceleration enables fast, secure transfer of files over long distances between your client and your Amazon S3 bucket.

It leverages Amazon CloudFront’s edge locations: as data arrives at an AWS edge location, the data is routed to your Amazon S3 bucket over an optimized network path.

AWS Transfer Acceleration is useful for organizations with applications that have customers around the globe uploading data to centralized buckets. Transfer Acceleration also benefits organizations who regularly transfer data in GBs across continents and upload data using AWS SDK/CLI.
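From the client's point of view, using Transfer Acceleration mostly means sending requests to the bucket's accelerate endpoint, which has the form `bucketname.s3-accelerate.amazonaws.com`, once acceleration has been enabled on the bucket. The sketch below only builds that URL. The helper name `accelerate_url` and the sample bucket and key are hypothetical. The check reflects the documented requirement that the bucket name be DNS-compliant and contain no periods.

```python
import re
from urllib.parse import quote

# Lowercase letters, digits and hyphens only; 3 to 63 characters; no periods.
_ACCELERATE_BUCKET = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def accelerate_url(bucket: str, key: str) -> str:
    """Build the HTTPS URL of an object on the Transfer Acceleration endpoint."""
    if not _ACCELERATE_BUCKET.match(bucket):
        raise ValueError(f"bucket name {bucket!r} is not usable with Transfer Acceleration")
    # quote() percent-encodes spaces and other unsafe characters but keeps "/".
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


if __name__ == "__main__":
    # "my-videos" and the key are made-up values for illustration.
    print(accelerate_url("my-videos", "uploads/clip 01.mp4"))
```

In practice the SDKs and the CLI can be configured to use this endpoint, so application code rarely needs to assemble the hostname by hand.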

Solutions for Large Enterprise Organizations

Let’s take the use case of a company that has a huge set of videos available online, including online classes that participants can access at their convenience.

The challenge for this company is migrating petabytes, or even exabytes, of data.

When you want to migrate hundreds of PBs or exabytes of data to S3, the ideal tools are Snowball and Snowmobile.

1. Snowball

Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud.

Snowball was introduced by AWS to overcome the difficulties of large-scale data transfers, including high network costs, long transfer times, and security concerns.

It can cost as little as one-fifth as much as sending the data over the Internet, and it is generally used for cloud migration, disaster recovery, and data center decommissioning.

Snowball is also useful when a lot of information is sent between you and your customers, clients, or business partners.

If you want to store more data and also process it, Snowball Edge will interest you.

The device has built-in compute, and multiple Snowball Edge devices can be clustered together for purposes such as durability and sending data in batches.

2. Snowmobile

If you are planning to move your data centers to a different location, then this service is for you. Snowmobile is used to transfer exabytes of data to AWS.

The Snowmobile itself is a massive semi-trailer truck with a storage capacity of up to 100 PB. The Snowmobile comes directly to your site, allowing your organization’s network to connect to the Snowmobile equipment and copy data onto it.

In a matter of weeks Snowmobile can move an amount of data to a new location that would take years – even decades – to transmit over a hardline.
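A quick back-of-the-envelope calculation shows where the "years, even decades" figure comes from. The sketch below estimates how long one full Snowmobile load of 100 PB would take over a dedicated line. The link speeds (1 Gbps and 10 Gbps) and the assumption of full, uninterrupted utilization are illustrative choices, not figures from AWS.

```python
PB = 10 ** 15      # decimal petabyte, in bytes
GBPS = 10 ** 9     # one gigabit per second, in bits per second


def transfer_days(data_bytes: float, link_bits_per_second: float, utilization: float = 1.0) -> float:
    """Days needed to push data_bytes through a link at the given utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    seconds = (data_bytes * 8) / (link_bits_per_second * utilization)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for gbps in (1, 10):  # assumed link speeds
        days = transfer_days(100 * PB, gbps * GBPS)
        print(f"100 PB over {gbps} Gbps: {days:,.0f} days (~{days / 365:.1f} years)")
```

Under these assumptions a 1 Gbps line needs roughly 25 years and a 10 Gbps line about two and a half years, and real links rarely sustain full utilization, so the practical numbers would be worse.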

Of course, all the data is encrypted with AWS Key Management Service (KMS). The service comes with strong security features such as dedicated security personnel, GPS tracking, alarm monitoring, 24/7 video surveillance, and an optional escort security vehicle while in transit.

Third-Party Migration Tools

The real challenge comes when you use a hybrid cloud infrastructure and you need to sync data between on-premises or cloud locations and AWS.

You might want to use S3 to perform data analytics on data that is stored on-premises.

That data first has to be synced with AWS S3 so that AWS services can operate on it, and afterwards the results are sent back to your on-premises data center.

If you are an open source enthusiast who likes to build and modify solutions on your own, you can use scripts in combination with tools such as Rclone or rsync.

There is also NetApp’s Cloud Sync, which provides precisely the hybrid cloud storage solutions you require. Cloud Sync synchronizes your data from on-premises or the cloud to S3, using NFS or CIFS shares.

By moving your data quickly and securely to S3, you can conveniently use AWS services like Amazon EMR (Elastic MapReduce), Redshift, and RDS. Cloud Sync will sync your data back to its origin once your results are ready.

Cloud Sync offers a 14-day free trial and can be accessed here.

Conclusion

These days, migrating data to the cloud, and especially to S3, is an increasingly popular trend thanks to the various tools AWS offers for migrating data. Which tool to use depends on the needs of the organization and the size of the migration.

Sometimes, an organization just has to use its available AWS resource pool to create an optimal solution for data migration, but other times it is necessary to find a technology partner who can make the move for them, such as NetApp’s Cloud Sync.

Gali Kovacs