NetApp Cloud Volumes Service for AWS: Benchmarks

What does Cloud Volumes Service for AWS offer your workloads?

Fully-managed file services for NFS, SMB or dual protocol support. No need to manage Windows or Linux servers just to provide file services
On-demand scaling, without having to create new data stores
Consistently high performance, over 460k IOPS
Designed for four nines of availability and nine nines of durability
Cost savings with the ability to adjust performance tiers on the fly without having to move data
Protect your database with zero-impact incremental snapshots; keep and pay only for new writes
Enable innovation with rapid copies available within seconds
Sync or migrate your data from on-premises to AWS cloud without having to reformat your applications
Faster time to market by accelerating development and test using quick copies from snapshots

File Services
Oracle
Hadoop Spark
MySQL
AWS EFS
AWS Regions
HPC Workloads
EDA
SMB Multichannel

File services workloads - IOPS

Using the FIO load generator and multiple EC2 instances, we found a single volume capable of delivering over 460K 4KiB reads and 125K 4KiB writes per second. For single-instance NFS I/O limits, with only minimal NFS tuning (see nconnect mount options), we were able to drive over 200K 4KiB reads and about 120K 4KiB writes per second. NetApp is working with other Linux distributions to see the nconnect feature (present today in SLES15 and the 5.3 Linux kernel) adopted more generally. Without the nconnect feature, testing has shown a single client capable of driving ~80K 4KiB reads or writes per second.
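To relate these operation rates to bandwidth, the short sketch below multiplies each quoted IOPS figure by the 4KiB operation size. The function name and dictionary labels are illustrative, and the result is simple arithmetic on the quoted numbers, not a measured throughput.

```python
KIB = 1024
MIB = 1024 * KIB


def iops_to_mib_per_s(iops: float, op_size_bytes: int = 4 * KIB) -> float:
    """Convert an operation rate at a fixed operation size into MiB/s."""
    return iops * op_size_bytes / MIB


# IOPS figures quoted in the text (all at a 4KiB operation size).
quoted_iops = {
    "multi-instance reads": 460_000,
    "multi-instance writes": 125_000,
    "single instance reads with nconnect": 200_000,
    "single instance without nconnect": 80_000,
}

for label, iops in quoted_iops.items():
    print(f"{label}: {iops_to_mib_per_s(iops):,.1f} MiB/s")
```

At 4KiB per operation, 460K reads per second works out to roughly 1,800MiB/s of payload, which is why small-block IOPS tests and sequential throughput tests are reported separately.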

Read the full baseline performance story

File services workloads - throughput

Our sequential tests used the same tools and procedures. In this case, again using the nconnect mount option (nconnect=16) and a single client, we saw ~1,700MiB/s of sequential reads and 570MiB/s of sequential writes. The former is possible because nconnect creates multiple network connections (flows) for the single storage endpoint, thereby getting around the AWS 5Gbps per network flow limit. The latter, 570MiB/s of writes, is due to AWS instance-level restrictions imposed upon egress over direct connect. From a multi-client configuration, ~4,500MiB/s of sequential reads and ~1,500MiB/s of sequential writes were observed. For sequential workloads, a 64KiB operation size as well as a 64KiB rsize and wsize were selected.
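The arithmetic behind the per-flow limit can be sketched as follows. This assumes that 5Gbps means 5 x 10^9 bits per second and ignores protocol overhead; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

MIB = 1024 ** 2


def gbps_to_mib_per_s(gbps: float) -> float:
    """Convert decimal gigabits per second into MiB/s (no protocol overhead)."""
    return gbps * 1e9 / 8 / MIB


def min_flows_needed(target_mib_per_s: float, per_flow_gbps: float = 5.0) -> int:
    """Smallest number of network flows whose combined limit covers the target."""
    return math.ceil(target_mib_per_s / gbps_to_mib_per_s(per_flow_gbps))


per_flow_limit = gbps_to_mib_per_s(5.0)
print(f"One 5Gbps flow carries at most about {per_flow_limit:.0f} MiB/s")
print(f"Flows needed for 1,700 MiB/s of reads: {min_flows_needed(1700)}")
```

Under these assumptions a single flow tops out near 596MiB/s, so the observed ~1,700MiB/s of reads needs at least three flows, which a mount with nconnect=16 can supply.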

Oracle IOPS with EBS and Cloud Volumes Service

The graphs compare the performance when using EBS and when using NetApp Cloud Volumes Service for AWS. The graph on the right shows that Oracle is able to drive 250,000 file system IOPS at 2ms when using the c5.18xlarge instance and a single volume provisioned from the Cloud Volumes Service, or 144,000 file system operations at below 2ms using the c5.9xlarge.

Oracle IOPS at various read/write ratios

The graph to the left provides more performance examples of how Oracle workloads behave on Cloud Volumes Service for AWS when run on the same c5.18xlarge EC2 instance shown above. Note that Cloud Volumes Service provides a high level of IOPS at 2 ms or below across a range of read/write ratios. Also note that in all cases adding a second volume increases the IOPS provided.

For more information, see Oracle Performance and Storage Comparison in AWS: Cloud Volumes Service, EBS, EFS.

Hadoop Spark gets 3,100MB/s

The graph on the right shows performance when using Amazon S3 and NetApp Cloud Volumes Service for AWS (service levels Standard and Premium). It shows that Spark is able to achieve an average throughput of 3,100MB/s against a single Cloud Volumes Service volume when run on 15 C5.9xlarge Amazon EC2 instances.

Although the price of the Premium service level ($0.20/GB/month) is higher than both the Standard service level ($0.10/GB/month) and the upfront costs of Amazon S3 (capacity + egress), the increased bandwidth results in both an overall cost reduction and improved run time, making the Premium service level more cost-efficient overall.

API costs make up a large portion of the Amazon S3 price. GET requests for Standard Access Tier are priced at $0.0004 per 1,000, so the cost of continuously using Amazon S3 for primary analytics clusters can add up to ~$170,000 annually.
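As a rough illustration of how per-request pricing accumulates, the sketch below uses the quoted GET price to work out how many requests an annual bill of ~$170,000 would represent. It assumes, purely for illustration, that the whole figure is GET charges; the text does not state the request volume or the cost breakdown, so the result is an upper bound on GET traffic rather than a reported number.

```python
GET_PRICE_PER_1000 = 0.0004  # USD, Standard access tier, as quoted in the text
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def get_request_cost(requests: float) -> float:
    """Cost in USD of a given number of GET requests."""
    return requests / 1000 * GET_PRICE_PER_1000


def requests_for_budget(budget_usd: float) -> float:
    """Number of GET requests that a budget pays for (inverse of the above)."""
    return budget_usd / GET_PRICE_PER_1000 * 1000


annual_requests = requests_for_budget(170_000)
sustained_rate = annual_requests / SECONDS_PER_YEAR
print(f"Requests per year: {annual_requests:.3e}")
print(f"Sustained rate: {sustained_rate:,.0f} GET requests per second")
```

Under that assumption the bill corresponds to a sustained rate in the low tens of thousands of GET requests per second, which is plausible for an analytics cluster that reads continuously.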

Read this in-depth blog on Spark performance using Cloud Volumes Service for AWS.

MySQL Workload – Latency Relative to Throughput

For load testing MySQL in Cloud Volumes Service for AWS, we selected an industry-standard OLTP benchmarking tool and kept increasing the user count until throughput flatlined. By design, OLTP workload generators heavily stress the compute and concurrency limitations of the database engine; stressing the storage is not the objective. As a result, the tool used, rather than the storage, was the limiting factor in the graphs.

The 450MiB/s throughput observed in this benchmark test of MySQL on Cloud Volumes Service for AWS approaches the 5Gbps per network flow limit imposed by AWS. The metrics in the following graph (450MiB/s maximum throughput) are taken from nfsiostat on the database server and, as such, represent the perspective of the NFS client.

For this test, the following configuration was used:

Instance type: C5.9xlarge
MySQL Version: 10.3.2
Linux Version: Red Hat Enterprise Linux 7.6
Workload Distribution to storage: 70/30 read/write with 4KiB operation database page size*
Volume Count: 1 database volume (8TiB Extreme), 1 log volume (1TiB Standard)
Allocated Storage Bandwidth: database volume 1024MiB/s, log volume 16MiB/s
Database Size: 1.25TiB
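The configuration above implies a bandwidth allocation per TiB for each volume, and the observed 450MiB/s can be expressed in Gbps for comparison with the per-flow limit. The sketch below only rearranges the numbers listed above; the dictionary keys and function names are illustrative, and Gbps is assumed to mean 10^9 bits per second.

```python
MIB = 1024 ** 2

# Volume sizes and allocated bandwidth as listed in the test configuration.
volumes = {
    "database (Extreme)": {"size_tib": 8, "allocated_mib_per_s": 1024},
    "log (Standard)": {"size_tib": 1, "allocated_mib_per_s": 16},
}


def bandwidth_per_tib(size_tib: float, allocated_mib_per_s: float) -> float:
    """Allocated bandwidth divided by provisioned capacity, in MiB/s per TiB."""
    return allocated_mib_per_s / size_tib


def mib_per_s_to_gbps(mib_per_s: float) -> float:
    """Convert MiB/s into decimal gigabits per second."""
    return mib_per_s * MIB * 8 / 1e9


for name, vol in volumes.items():
    rate = bandwidth_per_tib(vol["size_tib"], vol["allocated_mib_per_s"])
    print(f"{name}: {rate:.0f} MiB/s per TiB")

observed = 450  # MiB/s, maximum throughput reported by nfsiostat
print(f"Observed {observed} MiB/s is about {mib_per_s_to_gbps(observed):.2f} Gbps")
print(f"Share of database volume allocation: {observed / 1024:.0%}")
```

Read this way, the database volume was allocated more than twice the bandwidth the benchmark actually drew, which is consistent with the statement that the tool rather than the storage was the limiting factor.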

I/O Comparison of Cloud Volumes Service and AWS EFS

The graphs demonstrate how Cloud Volumes Service for AWS and EFS compare when running random and sequential workloads.

The following instance and volume limits applied in the test:

Instance limits:
Elastic File System: Maximum 250MB/s per instance throughput
Cloud Volumes: 1GB/s maximum per instance throughput (512MB/s read + 512MB/s write)

Volume limits:
Elastic File System: Maximum 7,000 IOPS per volume (as documented by AWS)
Cloud Volumes: ~200,000 maximum IOPS per volume as tested

AWS Regional Performance

Here are the observed latencies in milliseconds between Amazon EC2 instances and Cloud Volumes Service for AWS based on the regions that the service is available in today.

Genomics HPC workload on Cloud Volumes Service for AWS - Sequential Reads

Cloud Volumes Service for AWS was tested against competing products to move a database designed for genomic workloads to the cloud. The sequential read benchmark had 1 hour to complete, with a goal of ~10TiB/hr (2,900MiBps). The test itself comprised 2,500 files representing 2000TiB of content.

The throughput achieved using a Cloud Volumes Service volume was 2,887MiBps, or 9.91TiB/hr, which is 2.1x the rate of the four self-managed NFS servers and 3x that of the EFS volumes configured with Provisioned Throughput. Cloud Volumes Service for AWS achieved these results while also providing snapshot copies, which the other options either could not provide or could not provide without impacting performance.

While the chart indicates a throughput of 2,887MiBps, the test data shows that only a handful of the 2,500 workers took longer than the rest and pulled the figure down. In fact, most of the workers achieved a throughput of roughly 3,500MiBps.
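The conversion between MiBps and TiB/hr used in this section can be checked with a few lines. The competitor figures printed at the end are not reported in the text; they are derived here from the quoted 2.1x and 3x ratios and should be read as approximations.

```python
MIB_PER_TIB = 1024 ** 2
SECONDS_PER_HOUR = 3600


def mibps_to_tib_per_hr(mibps: float) -> float:
    """Convert a sustained rate in MiB per second into TiB per hour."""
    return mibps * SECONDS_PER_HOUR / MIB_PER_TIB


achieved = 2887  # MiBps, as reported for Cloud Volumes Service
print(f"Goal:       {mibps_to_tib_per_hr(2900):.2f} TiB/hr")
print(f"Achieved:   {mibps_to_tib_per_hr(achieved):.2f} TiB/hr")
print(f"Most workers (3,500 MiBps): {mibps_to_tib_per_hr(3500):.2f} TiB/hr")

# Implied rates for the alternatives, back-calculated from the quoted ratios.
print(f"Self-managed NFS (implied): about {achieved / 2.1:.0f} MiBps")
print(f"Provisioned Throughput EFS (implied): about {achieved / 3:.0f} MiBps")
```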

Genomics HPC workload on Cloud Volumes Service for AWS - Random Queries

As additional data points, the graph on the left shows the results of the second use case - that of a SQL-type query of the 2,500 genomic files. A lower time to completion indicates strong performance. Cloud Volumes Service was able to access data from 100,000 individuals in less than an hour, while also providing snapshot copies like in the previous case.

Read this blog and accompanying report to learn more about this test and the benefits the genomics company gets with Cloud Volumes Service.

EDA Workload - Latency vs. Operations per Second Rate

The graphs show the performance of a synthetic EDA workload in NetApp Cloud Volumes Service for AWS. Using six NFS cloud volumes and 36 SLES15 r5.4xlarge instance clients, the workload achieved:

130,000 IOPS (2.8GiB/s throughput) at 2ms latency,
250,000 IOPS (4.4GiB/s throughput) at 4ms latency,
303,000 IOPS (5.3GiB/s throughput) at 9ms latency.
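Dividing each throughput figure by its operation rate gives the average payload per operation at each latency point. This is a back-of-the-envelope sketch on the three quoted data points, not part of the published test; since the operation count for a workload like this includes metadata calls that carry no payload, the average sits below the size of the data operations alone.

```python
KIB_PER_GIB = 1024 ** 2

# (IOPS, throughput in GiB/s, latency in ms) as quoted in the list above.
data_points = [
    (130_000, 2.8, 2),
    (250_000, 4.4, 4),
    (303_000, 5.3, 9),
]


def avg_op_size_kib(iops: float, throughput_gib_per_s: float) -> float:
    """Average payload per operation in KiB, across all operation types."""
    return throughput_gib_per_s * KIB_PER_GIB / iops


for iops, gib_per_s, latency_ms in data_points:
    size = avg_op_size_kib(iops, gib_per_s)
    print(f"{iops:,} IOPS at {latency_ms}ms: about {size:.1f} KiB per operation")
```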

Layout-wise, the test generated 5.52 million files spread across 552K directories. The complete workload is a mixture of concurrently running frontend (verification phase) and backend (tapeout phase) workloads, which together represent the typical behavior of a mixture of EDA-type applications.

The frontend workload represents frontend processing and as such is metadata intensive (think file stat and access calls). Although metadata operations make up the majority, this phase also includes a mixture of sequential and random read and write operations. Though the metadata operations are effectively without size, the read and write operations range between sub-1K and 16K, with the majority of reads between 4K and 16K and most of the writes 4K or less.

The backend workload, on the other hand, represents I/O patterns typical of the tapeout phase of chip design. It is this phase that produces the final output files from files already present on disk. Unlike the frontend phase, this workload consists entirely of sequential read and write operations, with a mixture of 32K and 64K operation sizes.

Graphically speaking, most of the throughput shown in the graph comes from the sequential backend workload, while most of the I/O operations come from the small random frontend phases. Both ran in parallel.

SMB Multichannel (Scale Up) Workload - IOPS

Random I/O: With the introduction of SMB Multichannel, we saw an enormous increase in performance. I/O increased up to 150% (Chart 1). Throughput increased up to 500% (see Chart 2). Read the FAQ for full information on SMB Multichannel on Cloud Volumes Service for AWS.

The tests show that we were able to achieve up to 74,700 random read IOPS on a single instance and 60,000 random write IOPS on a single instance.

SMB Multichannel (Scale Up) Workload - Throughput

Sequential I/O: With SMB Multichannel, throughput on Cloud Volumes Service for AWS increased by up to 500%.

In these tests, we were able to achieve up to 2,764 MiB/s of sequential read throughput and 578 MiB/s of sequential write throughput using FIO on a single instance.