Cloud Volumes ONTAP

Amazon EMR

Optimizing data costs for Amazon EMR analytic workloads

Start Free Trial


High Costs and Data Management Overhead

Amazon Elastic MapReduce (Amazon EMR) is a scalable Big Data analytics service on AWS. When using Amazon EMR clusters, a few caveats can lead to high costs. When EMR runs against Amazon S3, users are charged for common HTTP requests, including GET, SELECT, PUT, POST, and other operations. When Amazon EBS is used as storage for EMR, implementing bootstrap actions and manually tracking the automatic disk space increases that take place when volumes reach 90% capacity adds management overhead.
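To see how per-request charges add up, here is a minimal sketch of the arithmetic. The request counts and per-1,000-request rates are hypothetical placeholders, not current AWS prices, and the function name is made up for this illustration. Substitute the published rates for your region and your own job statistics.

```python
def estimate_request_cost(num_requests: int, price_per_1000: float) -> float:
    """Return the charge for num_requests at a given price per 1,000 requests."""
    return num_requests / 1000 * price_per_1000


# Placeholder rates (assumptions for illustration only, not real AWS pricing).
PUT_PRICE_PER_1000 = 0.005
GET_PRICE_PER_1000 = 0.0004

# Hypothetical request counts for a single analytics job.
job_puts = 2_000_000
job_gets = 50_000_000

put_cost = estimate_request_cost(job_puts, PUT_PRICE_PER_1000)
get_cost = estimate_request_cost(job_gets, GET_PRICE_PER_1000)
total_cost = put_cost + get_cost

print(f"PUT requests: ${put_cost:.2f}")
print(f"GET requests: ${get_cost:.2f}")
print(f"Total per job: ${total_cost:.2f}")
```

Because the charge scales linearly with request volume, a job that runs many times a day multiplies this figure accordingly.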


Enterprise-Grade Data Management

Using the NetApp In-Place Analytics Module (NIPAM), Amazon EMR users can run analytics jobs on their current NFS repositories on AWS with Cloud Volumes ONTAP, or burst their on-prem data instantly to Cloud Volumes ONTAP by using FlexCache. With Cloud Volumes ONTAP, Amazon EMR users gain cost-cutting storage efficiencies, zero API costs, data mobility, and automated data tiering between Amazon EBS and Amazon S3, so cold data is stored at low cost when Amazon EMR isn’t running analytics jobs. For Cloud Volumes ONTAP users, this integration with Amazon EMR provides an easy way to analyze all the NFS data stored in the cloud.


How it Works

Set up an Amazon EMR Cluster for your analytics workload 

Create a Cloud Volumes ONTAP deployment

Install NIPAM and connect Cloud Volumes ONTAP to Amazon EMR

Deploy Now



Easy Management

A single storage back end that serves both enterprise workloads and your Amazon EMR architecture.

Robust data reliability with NetApp Snapshot™ copies, SnapMirror® data replication, and AWS high availability pair deployments.



No API costs when running EMR on data hosted by Cloud Volumes ONTAP.

Automatic tiering of cold data between Amazon EBS disks and low-cost Amazon S3 object storage as needed.

Storage efficiencies that drastically reduce data footprint and associated storage costs.



NetApp FlexClone® data cloning technology allows you to instantly deploy volume clones on which you can run variations of analytics while keeping your main volumes dedicated to production workloads.




Get Block and File Storage for the price of Object Storage

See Full Pricing