Cloud Volumes ONTAP

AWS EMR

Optimizing data costs for AWS EMR analytic workloads

CHALLENGE

High Costs and Data Management Overhead

EMR is a scalable big data analytics service on AWS. When you use EMR, a few caveats can lead to high costs. When EMR runs against S3, you are charged for HTTP requests, including GET, SELECT, PUT, POST, and other operations. When EBS is the storage for EMR, you need to implement a bootstrap action on the EMR cluster that monitors disk space and seamlessly increases the volume size when a volume reaches 90% used capacity. Sizing therefore has to be tracked continuously, which adds management overhead.
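As a rough illustration of the kind of check such a bootstrap action would run, the sketch below tests whether a mount point has crossed the 90% used-capacity threshold. It covers only the monitoring half; the actual volume resize is deliberately left out. The function names and the default path are hypothetical, chosen for this example only.

```python
import shutil

# The 90% used-capacity trigger described in the text.
THRESHOLD = 0.90


def used_fraction(total_bytes: int, used_bytes: int) -> float:
    """Return the fraction of capacity in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_resize(total_bytes: int, used_bytes: int,
                 threshold: float = THRESHOLD) -> bool:
    """True when usage has reached or passed the threshold."""
    return used_fraction(total_bytes, used_bytes) >= threshold


def check_mount(path: str = "/") -> bool:
    """Check a mounted filesystem (hypothetical default path)."""
    usage = shutil.disk_usage(path)
    return needs_resize(usage.total, usage.used)
```

A monitoring script would call `check_mount` on a schedule for each data volume and trigger a resize when it returns `True`. That recurring check is the management overhead this section refers to.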

SOLUTION

Enterprise-Grade Data Management

Using NIPAM (NetApp-In-Place-Analytics Module), EMR users can run analytics jobs on their existing NFS repositories on AWS with Cloud Volumes ONTAP, or burst their on-premises data instantly to Cloud Volumes ONTAP by using FlexCache. With Cloud Volumes ONTAP, EMR users gain cost-cutting storage efficiencies, zero API costs, data mobility, AI-driven data mapping via Cloud Compliance, and automated data tiering between Amazon EBS and Amazon S3, so cold data is stored at low cost when EMR isn’t running analytics jobs. For Cloud Volumes ONTAP users, this integration with EMR provides an easy way to analyze all the NFS data stored in the cloud.

How it Works

Create an EMR cluster for your analytics workload

Create a Cloud Volumes ONTAP (CVO) deployment

Install NIPAM and connect CVO to EMR

Benefits

Easy Management

A single storage back end to service both enterprise and analytics data.

Robust data reliability with NetApp storage snapshots, SnapMirror® data replication, and AWS High Availability pair deployments.

Cost

No API costs when running EMR on data hosted by Cloud Volumes ONTAP.

Automatic tiering of cold data between Amazon EBS disks and low-cost Amazon S3 object storage as needed.

Storage efficiencies that drastically reduce data footprint and associated storage costs.
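To put the API cost point in concrete terms, the sketch below estimates the request charges a job would incur when reading and writing directly against object storage. It is a simple arithmetic aid, not a pricing calculator: the function names are hypothetical, and the per-1,000-request prices are inputs you must take from the current AWS price list for your region.

```python
def request_cost(num_requests: int, price_per_1000: float) -> float:
    """Cost of a number of requests billed per 1,000 requests."""
    if num_requests < 0 or price_per_1000 < 0:
        raise ValueError("inputs must be non-negative")
    return num_requests / 1000 * price_per_1000


def job_request_cost(get_requests: int, put_requests: int,
                     get_price_per_1000: float,
                     put_price_per_1000: float) -> float:
    """Total request charge for one job (prices supplied by the caller)."""
    return (request_cost(get_requests, get_price_per_1000)
            + request_cost(put_requests, put_price_per_1000))
```

Multiplying the per-job result by the number of job runs per month gives a rough estimate of the request charges that the "no API costs" benefit removes.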

Agility

NetApp FlexClone® data cloning technology allows you to instantly deploy volume clones on which you can run variations of analytics while keeping your main volumes dedicated to production workloads.

Privacy

Always-on AI-driven privacy controls and reporting to meet strict security and compliance demands.

Resources

Blog

Running EMR Clusters on AWS and Cloud Volumes ONTAP

Pricing

Get Block and File Storage for the price of Object Storage
