DevOps vs SRE: Can SRE Make DevOps Better?

What is Site Reliability Engineering (SRE)?

Site reliability engineering (SRE) focuses on keeping production environments stable and reliable. To this end, site reliability engineers apply automation to IT operations tasks such as support and maintenance.

The purpose of SRE is to fix bugs quickly and eliminate repetitive manual work. Site reliability engineers collaborate with developers to design and build tools that improve the reliability of enterprise services, and with IT operations teams to manage and support those tools.

SRE teams strive to improve the reliability, manageability, and ease of operation of software services.

What is DevOps?

The term DevOps combines "development" and "operations" and represents the unification of these two teams. The goal of DevOps teams is to shorten the software development lifecycle, improve development velocity, and promote higher quality through collaboration and integration between development and operations teams.

 

DevOps promotes collaboration and eliminates silos in the software development pipeline. To release software faster, DevOps teams adopt iterative methodologies and microservices architectures, and deploy systems using cloud automation and cloud native technologies such as containers and serverless.

DevOps leaders work to create a cultural change that helps team members adjust and adapt to the new process.

What is the Effect of SRE Initiatives on DevOps?

SRE and DevOps are commonly used together at organizations and can have a positive effect on each other. Usually, DevOps provides a framework for managing software development and operating software services, and SRE helps the organization focus on specific operational objectives.

Reducing Organizational Silos

The SRE approach considers operations as a software engineering problem. This means engineers work on solving reliability issues that were previously not considered within the responsibility of either Dev or Ops.

SRE fosters joint responsibility for system stability and performance. It helps the organization create systematic processes and tooling to detect and respond to software defects. It takes a goal-oriented approach, which causes the DevOps organization to clearly define and continuously improve metrics like load time, time to detection and resolution, and occurrence of production defects.
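
As a rough illustration of the metrics mentioned above, the following Python sketch computes mean time to detection and mean time to resolution from a list of incident timestamps. The incident records and function names are hypothetical and exist only for this example; a real team would pull these values from its incident tracking system.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (started, detected, resolved) timestamps.
incidents = [
    (datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 5), datetime(2024, 1, 3, 10, 45)),
    (datetime(2024, 1, 9, 22, 30), datetime(2024, 1, 9, 22, 45), datetime(2024, 1, 9, 23, 30)),
]


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def time_to_detect(records):
    """Mean time from the start of an incident until it was detected."""
    return mean_minutes([detected - started for started, detected, _ in records])


def time_to_resolve(records):
    """Mean time from the start of an incident until it was resolved."""
    return mean_minutes([resolved - started for started, _, resolved in records])


mttd = time_to_detect(incidents)
mttr = time_to_resolve(incidents)
print(f"Mean time to detection: {mttd:.1f} min")
print(f"Mean time to resolution: {mttr:.1f} min")
```

Tracking these numbers over time gives the team an objective way to see whether detection and response are actually improving.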

Accepting Failure as Normal

SRE teams do not merely point out deficiencies in operational processes and pass the responsibility to others in the team. Rather, they create a framework for reducing these deficiencies effectively. The SRE process uses the concept of a risk budget, more commonly called an error budget, which lets organizations innovate quickly while allowing a margin of error. SRE also assumes that ongoing innovation is not compatible with 100% performance and availability targets.
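
The budget described above (often called an error budget in SRE practice) can be made concrete with simple arithmetic. The sketch below assumes an availability target expressed as a fraction and a 30-day window; the function names and the example figures are illustrative assumptions, not values from any particular service.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability target allows over a window."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after subtracting the downtime already spent (negative if overspent)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# Assumed example: a 99.9% availability target over 30 days
# allows roughly 43.2 minutes of downtime.
budget = error_budget_minutes(0.999)
left = budget_remaining(0.999, downtime_minutes=30)
print(f"Budget: {budget:.1f} min, remaining: {left:.1f} min")
```

A target of 100% leaves a budget of zero, which is why such targets leave no room for releasing changes.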

An SRE perspective requires strong collaboration between IT and the business when setting targets for service level indicators (SLIs) and service level objectives (SLOs). Each violation should be fed back to IT teams so they can act on it. Additionally, targets should be re-evaluated and adjusted as IT and business circumstances change. The SRE framework also requires a blameless postmortem investigation of failure incidents.
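
To show how an SLI relates to an SLO, here is a minimal Python sketch that treats the SLI as the fraction of successful requests and flags a violation when it falls below the objective. The `SloReport` type, the `evaluate_slo` function, and the request counts are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float       # measured indicator, e.g. fraction of good requests
    slo: float       # agreed objective for that indicator
    violated: bool   # True when the measurement falls short of the objective


def evaluate_slo(good_requests: int, total_requests: int, slo: float) -> SloReport:
    """Compare a request-based SLI against its SLO."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    sli = good_requests / total_requests
    return SloReport(sli=sli, slo=slo, violated=sli < slo)


# Assumed example: 9,900 good requests out of 10,000 against a 99.5% objective.
report = evaluate_slo(9_900, 10_000, slo=0.995)
if report.violated:
    print(f"SLO violated: SLI {report.sli:.3%} is below {report.slo:.3%}")
```

A report like this is the kind of signal that would be routed back to the responsible team and reviewed when targets are re-evaluated.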

Implementing Gradual Change

Like DevOps, SRE promotes continuous improvement. To facilitate this aspect, SRE requires changes to be small and frequent. Ideally, this approach should ensure that negative repercussions have less impact and help teams to quickly test and implement low-risk improvements.

To implement gradual change, SRE teams use automated testing, typically as part of the CI/CD pipeline. Ideally, objective measurements of change should be defined in a way that ensures operational goals are gradually improving, while reducing the cost of failure.
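
One way to turn "objective measurements of change" into an automated check is a simple gate that compares an operational metric before and after a change. The following Python sketch is an assumption about how such a gate might look, not a description of any specific CI/CD product; the function name, the error-rate metric, and the tolerance value are all hypothetical.

```python
def change_is_acceptable(
    baseline_error_rate: float,
    candidate_error_rate: float,
    tolerance: float = 0.001,
) -> bool:
    """Return True if the candidate is no worse than the baseline plus a tolerance."""
    return candidate_error_rate <= baseline_error_rate + tolerance


# Assumed example: the current release fails 1.0% of requests.
small_regression = change_is_acceptable(0.010, 0.0105)  # within tolerance
large_regression = change_is_acceptable(0.010, 0.020)   # outside tolerance
print(small_regression, large_regression)
```

A pipeline step could call a check like this after automated tests run and stop the rollout when it returns False, which keeps the cost of any single failed change small.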

Leveraging Tooling and Automation

DevOps promotes the adoption of automation and technology, and the technology stack often differs between teams. SRE, on the other hand, promotes consistent use of tools and technology across projects, because integration issues and incompatibility between technologies can create unnecessary silos.

SRE requires that any adopted technology suit the skill set of each team and service area. However, this does not mean teams must use one particular set of tools for a specific task. The focus is primarily on standardization: ensuring that all tools and their associated ITSM activities can be managed through the same API or automation framework.
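
The idea of managing different tools through the same API can be sketched with a shared interface. In the Python example below, the `ManagedTool` protocol and the two tool classes are hypothetical placeholders; the point is only that automation code can treat any tool the same way as long as it exposes the agreed methods.

```python
from typing import Protocol


class ManagedTool(Protocol):
    """Hypothetical common interface every adopted tool is expected to expose."""

    name: str

    def health_check(self) -> bool: ...


class MonitoringTool:
    name = "monitoring"

    def health_check(self) -> bool:
        # Placeholder: a real implementation would query the tool itself.
        return True


class DeployTool:
    name = "deploy"

    def health_check(self) -> bool:
        # Placeholder: a real implementation would query the tool itself.
        return True


def check_all(tools: list[ManagedTool]) -> dict[str, bool]:
    """Run the same check against every tool, whichever team chose it."""
    return {tool.name: tool.health_check() for tool in tools}


status = check_all([MonitoringTool(), DeployTool()])
print(status)
```

Teams remain free to pick the tool that suits them, while the shared interface keeps operations consistent across projects.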

Does Your Company Need DevOps, SRE, or Both?

SRE and DevOps are not conflicting factions. They both use the same tools and work towards the same goal, but with a different focus.

Here are a few considerations that can help you decide if your organization needs DevOps, SRE, or a mix of both:

  • Culture: SRE culture prioritizes stability over speed of change, while DevOps emphasizes agility at each stage of the product development cycle.
  • Applicability: SRE teams are based on the Google model, which was developed to handle billions of requests every day. SRE is ideal for large, technology-oriented enterprises that prioritize service availability. DevOps is useful for any company that needs to respond rapidly to market requirements.
  • Business value: SRE provides business value by improving system reliability and the user experience, especially for large-scale services. DevOps contributes to the organization by improving quality and driving innovation that meets customer needs.
  • New vs. existing services: SRE is heavily focused on maintaining and improving existing software services. DevOps is more strongly focused on creating new services, products, and features.

DevOps-Compatible Storage with Cloud Volumes ONTAP

NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure and Google Cloud. Cloud Volumes ONTAP capacity can scale into the petabytes, and it supports various use cases such as file services, databases, DevOps or any other enterprise workload, with a strong set of features including high availability, data protection, storage efficiencies, Kubernetes integration, and more.

Yifat Perry, Product Marketing Lead
