Amazon Elastic MapReduce (Amazon EMR) is a scalable big data analytics service on AWS. When using Amazon EMR clusters, there are a few caveats that can lead to high costs. When EMR is used alongside Amazon S3, users are charged for common HTTP calls, including GET, SELECT, PUT, POST, and other operations. When Amazon EBS is used as storage for EMR, implementing bootstrap actions and manually tracking the automatic disk space increases that take place when volumes reach 90% capacity adds management overhead.
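To make the EBS tracking burden concrete, here is a minimal sketch in Python of the kind of check an operator might script by hand to see whether a volume has reached the 90% mark. It is an illustration only, not part of EMR or of any NetApp tooling: the function names and the default mount path are assumptions, and only the 90% threshold comes from the text above.

```python
import shutil

# The 90% capacity mark mentioned above. Everything else in this sketch
# (function names, default mount path) is a hypothetical illustration.
CAPACITY_THRESHOLD = 0.90


def usage_fraction(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of the volume that is in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def has_reached_threshold(used_bytes: int, total_bytes: int,
                          threshold: float = CAPACITY_THRESHOLD) -> bool:
    """True when usage is at or above the threshold."""
    return usage_fraction(used_bytes, total_bytes) >= threshold


def check_mount(path: str = "/") -> bool:
    """Check a mounted volume; the default path "/" is an assumption."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return has_reached_threshold(usage.used, usage.total)
```

Calling `check_mount()` on each node and mount point, on a schedule, is the sort of manual bookkeeping the paragraph above describes.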
Using the NetApp In-Place Analytics Module (NIPAM), Amazon EMR users can run analytics jobs on their current NFS repositories on AWS with Cloud Volumes ONTAP, or burst their on-premises data instantly to Cloud Volumes ONTAP by using FlexCache. With Cloud Volumes ONTAP, Amazon EMR users gain cost-cutting storage efficiencies, zero API costs, data mobility, and automated data tiering between Amazon EBS and Amazon S3, so cold data is stored at low cost when Amazon EMR isn’t running analytics jobs. For Cloud Volumes ONTAP users, this integration with Amazon EMR provides an easy way to analyze all the NFS data stored in the cloud.
Set up an Amazon EMR Cluster for your analytics workload
Create a Cloud Volumes ONTAP deployment
Install NIPAM and connect Cloud Volumes ONTAP to Amazon EMR
A single storage back end to service both enterprise workloads and your Amazon EMR architecture.
Robust data reliability with NetApp Snapshot™ copies, SnapMirror® data replication, and AWS high availability pair deployments.
No API costs when running EMR on data hosted by Cloud Volumes ONTAP.
Tiering cold data automatically between Amazon EBS disks and low-cost Amazon S3 object storage as needed.
Storage efficiencies that drastically reduce data footprint and associated storage costs.
NetApp FlexClone® data cloning technology allows you to instantly deploy volume clones on which you can run variations of analytics while keeping your main volumes dedicated to production workloads.