HPC on Azure: Best Practices for Successful Deployments

September 22, 2020

Topics: Azure NetApp Files

What Is High Performance Computing (HPC) on Azure?

High performance computing (HPC) systems run large and complex computing tasks with aggregated resources. These systems are made up of clusters of servers, devices, or workstations working together to process your workload in parallel.

While traditional HPC deployments are on-premises, many cloud vendors are beginning to offer HPC-compatible Infrastructure as a Service (IaaS) offerings, and Azure is one of them. With Azure resources, you can create a pure cloud HPC deployment or you can create hybrid deployments supplemented by cloud resources.

In Azure, you can run both massively parallel workloads and tightly coupled workloads that rely on high-speed interconnects. The result is shorter processing times and the ability to perform significantly more complex operations with compute-optimized virtual machines (VMs) or GPU-enabled instances.

In this article, you will learn:

  • The standard components of an Azure HPC deployment
  • Which native services help you manage Azure HPC deployments
  • How Azure HPC serves the semiconductor industry
  • Best practices for Azure HPC deployments

Azure HPC Components

While HPC deployments in Azure can vary according to your specific workload needs and budget, there are some standard components in any deployment. These include:

  • Azure Resource Manager—enables you to deploy applications to your clusters via script files or templates.
  • HPC head node—enables you to schedule jobs and workloads to your worker nodes. This is a virtual machine (VM) that you use to manage HPC clusters.
  • Virtual Network—enables you to create an isolated network for your clusters and storage through secure connections with ExpressRoute or IPsec VPN. You can integrate established DNS servers and IP addresses in your network and granularly control traffic between subnets.
  • Virtual Machine Scale Sets—enables you to provision VMs for your clusters and includes features for autoscaling, multi-zone deployments, and load balancing. You can use scale sets to run several databases, including MongoDB, Cassandra, and Hadoop.
  • Storage—enables you to mount persistent storage for your clusters in the form of blob, disk, file, hybrid, or data lake storage.

Related content: read our guide to HPC storage.

Managing Azure HPC Deployments

Azure offers a few native services to help you manage your HPC deployments. These tools provide flexibility for your management and can help you schedule workloads in Azure as well as in hybrid resources.

Microsoft HPC Pack
A set of utilities that enables you to configure and manage VM clusters, monitor operations, and schedule workloads. HPC Pack includes features to help you migrate on-premises workloads or to continue operating with a hybrid deployment. The utility does not provision or manage VMs or network infrastructure for you.

Azure CycleCloud
An interface for the scheduler of your choice. You can use Azure CycleCloud with a range of native and third-party options, including HPC Pack, Grid Engine, Slurm, and Symphony. CycleCloud enables you to manage and orchestrate workloads, define access controls with Active Directory, and customize cluster policies.

Azure Batch
A managed tool that you can use to autoscale deployments and set policies for job scheduling. The Azure Batch service handles provisioning, assignment, runtimes, and monitoring of your workloads. To use it, you just need to upload your workloads and configure your VM pool.
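The workflow Azure Batch manages for you can be sketched in a few lines. This is a local, illustrative simulation, not the real Azure Batch SDK; the class, VM size cap, and autoscale policy below are all assumptions made for the example.

```python
# Hypothetical sketch (not the Azure Batch SDK): configure a VM pool, submit
# tasks, and let the pool autoscale to a simple policy, as the managed
# service does on your behalf.
class BatchPoolSketch:
    def __init__(self, vm_size, max_nodes):
        self.vm_size = vm_size      # e.g. a compute-optimized size
        self.max_nodes = max_nodes  # pool ceiling you configure up front
        self.nodes = 0
        self.queue = []

    def submit(self, *tasks):
        """Upload workloads; the service handles provisioning from here."""
        self.queue.extend(tasks)
        self._autoscale()

    def _autoscale(self):
        # Illustrative policy: one node per queued task, capped at the
        # pool maximum you configured.
        self.nodes = min(len(self.queue), self.max_nodes)

pool = BatchPoolSketch(vm_size="Standard_H16", max_nodes=10)
pool.submit("render_frame_1", "render_frame_2", "render_frame_3")
print(pool.nodes)  # → 3
```

In the real service, the autoscale policy is expressed as a formula on the pool, and provisioning, assignment, and monitoring happen without any code like the above on your side.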

Azure for the Semiconductor Industry
Azure provides a high performance computing (HPC) platform with high availability and scalability, available to users worldwide. The platform infrastructure is secure and provides fully managed supercomputing services.

Azure HPC workloads support machine learning, visualization, and rendering, all of which can be leveraged for applications in the semiconductor industry, including cloud-based semiconductor design. The same capabilities extend to other HPC domains, such as oil and gas workloads and cloud-based genomic sequencing.

Related content: read our guide to HPC use cases.

Best Practices for Azure HPC Deployments

When using HPC in Azure, these best practices can help you get the performance and value you expect.

Distribute Deployments Across Cloud Services

Distributing large deployments across cloud services can help you avoid limitations created by overloading or relying on a single service. By splitting your deployment into smaller segments, you can:

  • Stop idle instances after job completion without interrupting other processes
  • Flexibly start and stop node clusters
  • More easily find available nodes in your clusters
  • Use multiple data centers to ensure disaster recovery

When splitting services, aim for a maximum of 500 VMs or 1000 cores per service. If you deploy more resources than this, you may run into issues with IP address assignments and timeouts. You can reliably split deployments across up to 32 services. Larger splits are untested.
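The sizing guidance above can be expressed as a small planning helper. This is a hypothetical utility, not an Azure API; only the limits (500 VMs or 1,000 cores per service, 32 services) come from this article.

```python
# Hypothetical helper: split a large HPC deployment into service-sized
# segments that respect the guideline limits described above.
MAX_VMS_PER_SERVICE = 500      # guideline from this article
MAX_CORES_PER_SERVICE = 1000   # guideline from this article
MAX_SERVICES = 32              # largest reliably tested split

def plan_service_split(total_vms, cores_per_vm):
    """Return (services needed, VMs per service), or raise if untested."""
    # The core limit caps how many VMs a single service can hold.
    vms_by_cores = MAX_CORES_PER_SERVICE // cores_per_vm
    vms_per_service = min(MAX_VMS_PER_SERVICE, vms_by_cores)
    services = -(-total_vms // vms_per_service)  # ceiling division
    if services > MAX_SERVICES:
        raise ValueError(
            f"{services} services exceeds the tested limit of {MAX_SERVICES}"
        )
    return services, vms_per_service

# Example: 4,000 single-core VMs fit in 8 services of 500 VMs each.
print(plan_service_split(4000, cores_per_vm=1))  # → (8, 500)
```

Note how the core limit, not the VM limit, becomes the binding constraint for larger VM sizes: at 4 cores per VM, each service holds only 250 VMs.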

Use Multiple Azure Storage Accounts for Node Deployments

Similar to spreading deployments across services, it’s recommended to attach multiple storage accounts to each deployment. This can provide better performance for large deployments, applications restricted by input/output operations, and custom applications.

When setting up your storage accounts, you should have one account for node provisioning and another for moving job or task data. This ensures that both provisioning and data movement are consistent and low latency.
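A minimal sketch of that separation, with routing by purpose. The account names and operation labels here are made up for illustration; only the two-account split itself comes from the recommendation above.

```python
# Hypothetical sketch: keep node-provisioning traffic and job/task data on
# separate storage accounts, as recommended above. Names are illustrative.
ACCOUNTS = {
    "provisioning": "hpcprovisionstore",  # node images, bootstrap artifacts
    "job_data": "hpcjobdatastore",        # task inputs/outputs moved mid-run
}

def account_for(operation):
    """Route a storage operation to the right account by its purpose."""
    if operation in ("node_image", "bootstrap", "vm_provision"):
        return ACCOUNTS["provisioning"]
    return ACCOUNTS["job_data"]

print(account_for("vm_provision"))  # → hpcprovisionstore
print(account_for("task_output"))  # → hpcjobdatastore
```

Because provisioning bursts and job I/O never share an account's throughput limits, neither can starve the other.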

Increase Proxy Node Instances to Match Deployment Size

Proxy nodes enable communication between head nodes you are operating on-premises and Azure worker nodes. These nodes are attached automatically when you deploy workers in Azure.

If you are running large jobs that meet or exceed the resources provided by the proxy nodes, consider increasing the number you have running. Increasing is especially important as your deployment gets bigger.
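One way to make "increase with deployment size" concrete is a simple ratio rule. The 1:100 ratio and the floor of two proxies below are illustrative assumptions for the sketch, not Azure-documented figures.

```python
# Hypothetical sizing rule: scale the proxy tier with the number of Azure
# worker nodes. The 1:100 ratio is an assumption for illustration only.
WORKERS_PER_PROXY = 100

def proxy_nodes_needed(worker_count, minimum=2):
    """Grow the proxy node count as the worker deployment grows."""
    needed = -(-worker_count // WORKERS_PER_PROXY)  # ceiling division
    return max(minimum, needed)  # keep a small floor for redundancy

print(proxy_nodes_needed(150))   # → 2
print(proxy_nodes_needed(1200))  # → 12
```

Whatever ratio you settle on, the point is to re-evaluate it as the deployment grows rather than leaving the automatically attached default in place.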

Connect to Your Head Node With the HPC Client Utilities

The HPC Pack client utilities are the preferred method for connecting to your head node, particularly if you are running large jobs. You can install these utilities on your users’ workstations and remotely access the head node as needed rather than using Remote Desktop Services (RDS). These utilities are especially helpful if many users are connecting at once.

HPC on Azure with Azure NetApp Files

Azure NetApp Files is a Microsoft Azure file storage service built on NetApp technology, giving you the file capabilities in Azure that even your core business applications require.

Bring enterprise-grade data management and storage to Azure so you can manage your workloads and applications with ease, and move all of your file-based applications to the cloud.

Azure NetApp Files solves availability and performance challenges for enterprises that want to move mission-critical applications to the cloud, including HPC, SAP, Linux, Oracle and SQL Server workloads, Windows Virtual Desktop, and more.

In particular, Azure NetApp Files allows you to migrate more applications to Azure, even your business-critical workloads, with extreme file throughput and sub-millisecond response times.

Learn More About HPC on Azure

Read more in our series of guides about HPC on Azure.

Migrate Legacy Apps to Cloud
Many organizations rely on legacy applications, those applications that are dependent on traditional IT structures. This reliance can make migrations to the cloud seem impossible or problematic for many companies. However, you can lift and shift applications to the cloud, gaining the benefits of cloud operations for other workloads while protecting your legacy investment.

In this article you’ll learn about what’s involved when moving legacy applications to the cloud, how to address cloud migration concerns, and how to speed your migration process with Azure NetApp Files.

Read “Migrate Legacy Apps to Cloud” here.

Solve Azure HPC Challenges eBook
In this eBook you’ll learn how to meet the requirements for an HPC file system without code changes, how to reduce your computational time, and how three EDA and oil and gas operations have successfully used Azure HPC.

Read “Solve Azure HPC Challenges eBook” here.

Solve Azure EDA Workload Challenges Guide
In this guide you’ll learn how semiconductor developers can gain low-latency, high performance and agility with Azure, what solutions are available to help you scale while controlling performance and costs in the cloud, and how Azure NetApp Files can optimize EDA workloads.

Read “Solve Azure EDA Workload Challenges Guide” here.

How Azure NetApp Files Supports HPC Workloads in Azure
To maximize the benefits of HPC resources in Azure, you need storage resources that can match HPC performance and resilience. While you can combine multiple Azure storage services to achieve this, it may be easier to centralize your storage management with Azure NetApp Files.

In this article you’ll learn how Azure NetApp Files complements HPC workloads, how to manage Azure NetApp Files, and how Azure Storage with NetApp works.

Read “How Azure NetApp Files Supports HPC Workloads in Azure” here.

Energy Leader Repsol Sees a Surge in Performance in Azure NetApp Files
Energy sector operations demand performance and reliability far beyond those of many other industries. This industry is one of the main users of HPC deployments, and companies in it, like Repsol, demand innovative, effective solutions like Azure NetApp Files.

In this article you’ll learn how Repsol leveraged Azure NetApp Files to manage mission-critical HPC applications and increase workload performance.

Read “Energy Leader Repsol Sees a Surge in Performance in Azure NetApp Files” here.

Chip Design and the Azure Cloud: An Azure NetApp Files Story
Chip design processes require high bandwidth and low latency to meet the fast workload processing times that the industry demands. The more parallel jobs a design firm can run, the faster their time to market and the more competitive they are.

In this article you’ll learn how Azure NetApp Files supports the workloads required for chip design and see what benchmarks the service is able to meet.

Read “Chip Design and the Azure Cloud: An Azure NetApp Files Story” here.

What Is Cloud Performance and How to Implement It in Your Organization
The performance of your cloud resources is critical to ensure business continuity. Learn how to ensure performance is not interrupted despite peaks in demand. This article explains which metrics to use, which testing is relevant to performance, and which techniques you can implement.

Read “What Is Cloud Performance and How to Implement It in Your Organization” here.


See Our Additional Guides on Key IaaS Topics

Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of IaaS.

Cloud Migration

Learn about cloud migration and what major challenges to expect when implementing a cloud migration strategy in your organization. 

See top articles in our cloud migration strategy guide:

AWS Migration

Learn about Amazon’s basic framework for migration, and how to plan for common challenges that affect almost every migration project.

See top articles in our AWS migration guide:

AWS High Availability

Discover how highly available systems are reliable and resilient, and see how AWS can help you achieve high availability for cloud workloads across three dimensions.

See top articles in our AWS high availability guide:

AWS EBS

Learn what AWS EBS is and how to perform common EBS operations, including five highly useful EBS features that can help you optimize performance and billing.

See top articles in our guide to AWS EBS:

AWS Cost

Learn how Amazon Web Services (AWS) prices its cloud services and what you can do to optimize your costs in the Amazon cloud.

See top articles in our AWS cost optimization guide:

AWS EFS

Learn about AWS EFS, your backup options, how to optimize performance, see a brief comparison of EFS vs EBS vs S3, and discover how Cloud Volumes ONTAP can help.

See top articles in our guide to AWS EFS:

Azure Migration

Learn about key considerations when implementing Azure migration: migration models, state assessment, storage configuration, security, and maintenance.

See top articles in our Azure migration guide:

Azure Cost Management

Learn about tools and practices that can help you manage and optimize costs on the Microsoft Azure cloud.

See top articles in our Azure cost management guide:

Azure High Availability

High availability is one of the major benefits of cloud services. The guarantee that your data will remain accessible is critical to supporting high priority workloads and applications and is the reason many move to the cloud in the first place.

This guide explains what high availability is and how to optimize Azure high availability.

See top articles in our Azure high availability guide:

SAP on Azure

Learn about all SAP solutions offered as a service on Azure, including HANA, S/4HANA, NetWeaver and Hybris, migration considerations and best practices.

See top articles in our guide to SAP on Azure:

Linux on Azure

Learn how to use Linux on Azure, including guides for cloud-based enterprise Linux deployments and performance tips.

See top articles in our guide to Linux on Azure:

Kubernetes in Azure

Learn how to run Kubernetes clusters and containerized applications in Azure, using the Azure Kubernetes Service (AKS), Azure Container Instances (ACI), and related services.

VDI on Azure

Learn what options are available for VDI on Azure. Understand how the architecture works and discover best practices for VDI deployments.

See top articles in our guide to VDI on Azure:

Google Cloud Migration

Learn how to migrate your workloads and data to Google Cloud, including in-depth comparisons between GCP and other cloud providers, tools, strategies, costs, and more.

See top articles in our guide on Google Cloud migration:

VMware Cloud

Learn how VMware partners with public cloud providers to help users run virtualized workloads in a cloud environment.

See top articles in our guide on VMware Cloud:

AWS FSx

Learn about Amazon FSx, a fully managed service that lets you run managed Windows Server and Lustre file systems to support high performance and high throughput data scenarios.

Google Cloud Pricing

Learn how Google Cloud prices its cloud services and what you can do to optimize and reduce your costs in Google Cloud.

Kubernetes on AWS

Learn how to run Kubernetes clusters and containerized applications in AWS, using the Elastic Kubernetes Service (EKS), Amazon Fargate, and related services.

Cloud Data Services