Monitoring applications effectively is a challenge often neglected by developers and operations teams, one that takes a back seat to developing the application or service itself. Deadlines, inexperience, company culture, management, and misleading or incorrect metrics can all undermine the ability to predict problems and detract from otherwise good work.
As the uptake of microservices architecture grows, support, logging, and monitoring become more complex. Since a service usually reaches an end user through dozens of applications with different technologies connecting amongst themselves, a tiny failure could stop the entire process. That is why developers frequently choose orchestration tools that handle all the services and servers in a single structure.
These days, when we’re talking about container orchestration tools, we’re probably talking about Kubernetes (K8s). Kubernetes has all but taken over the tech world in just a few years. One of the best features of K8s is that it removes the complexity of handling distributed processing and presents the user with an abstraction from a single point of view. However, the question is: How can a tool like this be monitored?
In this article, we highlight the importance of monitoring a cluster running Kubernetes, discuss which metrics can be used, and compare tools that can do the job.
Kubernetes: Solutions and New Challenges
As the name suggests, a container orchestration tool is designed to manage many containers that need to be synchronized. The tool decides the physical location of each container deployed and how many instances are available. If there is more than one instance, a load balancer may be needed, and the tool must know how to create one. If a machine or application goes down, the tool must bring it back or create a new one.
So far, so good. But managing Kubernetes can be tough. DIY deployment is not for the faint of heart and requires significant knowledge of Linux, networking, operating systems, and security. It’s crucial to have a team dedicated to managing it if that’s the path you choose, especially for production services.
Fortunately, there are options to lessen the pain. For example, NetApp Kubernetes Service can give organizations the value of K8s while avoiding the many headaches involved in managing it.
Even so, another challenge remains: monitoring. Since Kubernetes manages the entire cluster, operations teams need a useful set of metrics to evaluate the cluster, pods, and applications, and a way to condense them into an easy-to-visualize dashboard.
Monitoring Key Metrics
Kubernetes monitoring can be segmented into two significant categories: cluster monitoring and pod monitoring.
Cluster monitoring involves evaluating the health of an entire set of machines. Some key monitoring metrics in this category are:
Available nodes: The number of nodes available allows the administrator to evaluate how the entire cluster is performing.
Node resources: Each Kubernetes node has its own resource utilization to track, such as CPU usage, memory page faults, and available disk space.
Pods deployed: The number of pods running on each node exposes cluster vulnerabilities and shows how K8s has handled each deployment. For example, a key system should have more than one pod running on different nodes, with traffic distributed by a load balancer.
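The cluster-level checks above can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring implementation: the NODES and PODS records, their field names, and the three helper functions are all hypothetical stand-ins for data that would normally come from the Kubernetes API.

```python
from collections import Counter, defaultdict

# Hypothetical sample data; a real tool would read this from the Kubernetes API.
NODES = [
    {"name": "node-a", "ready": True},
    {"name": "node-b", "ready": True},
    {"name": "node-c", "ready": False},
]
PODS = [
    {"name": "checkout-1", "app": "checkout", "node": "node-a"},
    {"name": "checkout-2", "app": "checkout", "node": "node-b"},
    {"name": "billing-1", "app": "billing", "node": "node-a"},
    {"name": "billing-2", "app": "billing", "node": "node-a"},
]


def available_nodes(nodes):
    """Names of the nodes currently reporting as ready."""
    return [n["name"] for n in nodes if n["ready"]]


def pods_per_node(pods):
    """Number of pods scheduled on each node."""
    return dict(Counter(p["node"] for p in pods))


def single_node_apps(pods):
    """Apps whose pods all sit on one node, so a single node failure stops them."""
    nodes_by_app = defaultdict(set)
    for p in pods:
        nodes_by_app[p["app"]].add(p["node"])
    return sorted(app for app, nodes in nodes_by_app.items() if len(nodes) < 2)


print(available_nodes(NODES))
print(pods_per_node(PODS))
print(single_node_apps(PODS))
```

In this sample, the billing app has two pods but both run on node-a, which is the kind of vulnerability the pods-deployed metric is meant to expose.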
Monitoring pods is also essential. After all, it doesn’t matter if the cluster is available and healthy if there are no services running. Pod metrics can be classified into three categories: Kubernetes metrics, container metrics, and application metrics.
A pod's Kubernetes metrics are the metrics reported by the orchestrator itself. These metrics allow the administrator to track ongoing deployments or how many instances are running. If the number of running pods is lower than expected, for instance, it may indicate that the cluster is out of resources.
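As a rough sketch of that last check, the comparison between expected and running pods might look like the following. The deployment records and the field names "desired" and "ready" are assumptions made for this example, not the orchestrator's own schema.

```python
def underprovisioned(deployments):
    """Map each deployment name to the number of pods it is missing.

    Only deployments with fewer ready pods than desired are included.
    """
    return {
        d["name"]: d["desired"] - d["ready"]
        for d in deployments
        if d["ready"] < d["desired"]
    }


# Hypothetical sample data.
deployments = [
    {"name": "checkout", "desired": 3, "ready": 3},
    {"name": "billing", "desired": 4, "ready": 2},
]

print(underprovisioned(deployments))
```

A non-empty result is a prompt to investigate, not a diagnosis: missing pods can point to exhausted cluster resources, but a rollout still in progress would look the same.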
Like nodes, containers have their own resource utilization metrics to track, and their resources can likewise be constrained. It’s crucial to know how allocated resources are being utilized at the container level to identify problems and bottlenecks.
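One way to picture container-level tracking is to compare each container's usage against its configured limit. The sketch below assumes memory figures in MiB, a hypothetical record layout, and an arbitrary 90% threshold; none of these come from a specific tool.

```python
def utilization(usage, limit):
    """Fraction of the limit in use, or None when no limit is set."""
    if not limit:
        return None
    return usage / limit


def containers_near_limit(containers, threshold=0.9):
    """Names of containers using at least `threshold` of their memory limit."""
    flagged = []
    for c in containers:
        ratio = utilization(c["memory_mib"], c.get("memory_limit_mib"))
        if ratio is not None and ratio >= threshold:
            flagged.append(c["name"])
    return flagged


# Hypothetical sample data; memory values are in MiB.
containers = [
    {"name": "web", "memory_mib": 480, "memory_limit_mib": 512},
    {"name": "cache", "memory_mib": 128, "memory_limit_mib": 512},
    {"name": "batch", "memory_mib": 900},  # no limit configured
]

print(containers_near_limit(containers))
```

Containers without a limit are skipped here rather than flagged; whether an unconstrained container deserves its own alert is a policy decision left out of this sketch.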
Application metrics, of course, depend on the application itself. For example, Java applications can expose JMX ports to monitor metrics, or an e-commerce application can expose the number of online users or how many sales were made in the last hour. Much like in the case of any legacy application, these metrics and their KPIs are determined by the application owners and developers and provide insight into how the application is performing.
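To make the e-commerce example concrete, here is a minimal in-process sketch of a "sales in the last hour" metric. The SalesCounter class and its methods are hypothetical, and the code assumes that sales are recorded in chronological order with timestamps in seconds.

```python
from collections import deque


class SalesCounter:
    """Counts sales recorded within a sliding time window."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one sale; timestamps must arrive in non-decreasing order."""
        self._timestamps.append(timestamp)

    def count(self, now):
        """Number of sales made in the `window` seconds before `now`."""
        cutoff = now - self.window
        # Drop sales that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


counter = SalesCounter()
for ts in (0, 1000, 4000):
    counter.record(ts)

print(counter.count(now=4200))
```

How such a value is published to a monitoring system depends on the stack in use and is outside the scope of this sketch.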
As with most technologies, simply collecting K8s metrics is a relatively easy task. Interpreting them, on the other hand, may not be.
Kubernetes introduces multiple new layers of abstraction and virtualization to your already complex environment. Simply alerting on key K8s metrics will allow K8s experts to act when problems do occur, but what about your application and infrastructure administrators who are not K8s experts? If your operations teams don’t understand Kubernetes in intimate detail, metrics alone won’t be enough: you need a tool that provides topology mapping so that administrators can easily zoom in to find the impacted resource when a problem does occur.
To that end, in NetApp Cloud Insights, we’ve developed a simple topology view of Kubernetes clusters to allow even non-experts to identify issues and dependencies at the container level. This centralizes cluster, node, and application metrics alongside wider metrics from resources and services both on-prem and in the cloud.
All-in-one monitoring tools are an invaluable resource to an operations team because they allow you to visualize and alert on the entire application stack in one place. For example, it’s possible to aggregate frontend response times with backend server loads and the number of instances of a container in order to quickly identify and solve an issue. A system-wide issue may not be due to a single problem, but to a conjunction of problems.
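The idea of a conjunction of problems can be illustrated with a small sketch that evaluates several signals from one snapshot together. The field names and the thresholds (500 ms response time, 80% backend load) are assumptions chosen for the example.

```python
def diagnose(sample, max_response_ms=500, max_load=0.8):
    """List every condition in the snapshot that is outside its expected range."""
    findings = []
    if sample["frontend_response_ms"] > max_response_ms:
        findings.append("slow frontend")
    if sample["backend_load"] > max_load:
        findings.append("backend overloaded")
    if sample["instances"] < sample["expected_instances"]:
        findings.append("missing container instances")
    return findings


# Hypothetical snapshot combining frontend, backend, and container metrics.
snapshot = {
    "frontend_response_ms": 1200,
    "backend_load": 0.95,
    "instances": 2,
    "expected_instances": 4,
}

print(diagnose(snapshot))
```

Seen in isolation, each finding suggests a different fix. Seen together, they suggest that too few container instances are overloading the backend, which in turn slows the frontend.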
There was a time when monitoring all metrics to keep a system healthy was a task for multiple teams. First, there was the application team, with its dashboards and custom metrics. Then there were ops teams handling multiple servers and allocating resources for each service available. Lastly, the IT teams monitored physical machines and hardware changes.
With monitoring tools like Cloud Insights, which easily integrates all metrics into a single dashboard, it’s simple to monitor Kubernetes health, along with the health of the rest of your infrastructure and services, without dedicated teams.