What's New in K8S 1.23?
With 1.23, its third release of 2021, Kubernetes added a total of 47 enhancements across all three levels: 11 stable enhancements, 16 graduating to beta (enabled by default), 19 new alpha features (which require a feature gate to be enabled), and one deprecation. How will this affect Kubernetes storage usage?
Version 1.23 introduces new and updated functionality such as Pod Security Admission, dual-stack IPv4/IPv6 networking, the Kubelet Container Runtime Interface, the general availability of generic ephemeral inline volumes, and much more. There are also some deprecations, such as the klog-specific flags. This blog will discuss some of the significant changes in the v1.23 release.
Skip Volume Ownership Change
This feature lets you control how a volume's ownership is changed when it is mounted inside a container. Previously, whenever a volume was mounted into a pod, the permissions of every file on it were recursively changed to match the fsGroup value. On a very large volume this process is extremely slow, and for permission-sensitive applications it can break the application outright.
This update introduces fsGroupChangePolicy, which accepts two values, OnRootMismatch and Always. If set to OnRootMismatch, volume permissions are changed only if the ownership of the top-level directory does not match the fsGroup value. If set to Always, the permissions are always changed when the volume is mounted, as before.
securityContext:
  fsGroupChangePolicy: "OnRootMismatch" # or "Always"
In-Tree to CSI Driver Migration for AWS EBS, GCE PD, and Azure Disk
KEP: AWS (https://feature.k8s.io/1487)
This is a continued effort to move from in-tree plugins to CSI drivers, all while maintaining the original API (for example, the in-tree AWS EBS plugin now calls out to the EBS CSI driver). This feature is in beta for AWS, GCP, and Azure.
This update is one part of a larger push to migrate CSI drivers outside of Kubernetes (from in-tree to out-of-tree).
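As a sketch of how this looks in practice, the migration for each driver is controlled by feature gates on the kube-controller-manager and kubelet (the gate names below come from the upstream CSI migration work; in 1.23 the beta gates are on by default, so setting them explicitly is only needed on clusters where they were disabled):

```
# Feature gates enabling the in-tree-to-CSI migration for AWS EBS.
# CSIMigrationGCE and CSIMigrationAzureDisk are the analogous gates
# for GCE PD and Azure Disk.
--feature-gates=CSIMigration=true,CSIMigrationAWS=true
```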
Generic Inline Ephemeral Volumes
Generic ephemeral inline volumes are now GA. This feature lets any storage driver that already supports dynamic provisioning be used for ephemeral volumes. It is similar to emptyDir, which provides an empty scratch directory per pod, but it can be backed by local or network-attached storage. Depending on the driver and its parameters, such volumes can be provisioned with initial data and support features like snapshots, cloning, and resizing.
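For illustration, here is a minimal pod spec using a generic ephemeral volume; the storage class name and size are placeholder values, and any CSI driver with dynamic provisioning support can back the volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: "/scratch"
          name: scratch-volume
  volumes:
    - name: scratch-volume
      ephemeral:
        # A PVC is created from this template when the pod starts
        # and deleted together with the pod.
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "standard"   # placeholder storage class
            resources:
              requests:
                storage: 1Gi
```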
Recovering from resize failures
One of the current issues with persistent volume claims (PVCs) is that if you try to expand one to a size the storage provider cannot satisfy, the expansion fails and you are stuck with the error. With the 1.23 update, you can now reduce the requested size of the PVC again and recover from the failed expansion.
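As a sketch (in 1.23 this is an alpha feature behind the RecoverVolumeExpansionFailure feature gate), recovering simply means editing the PVC's requested size back down; the sizes here are hypothetical:

```yaml
# The expansion from 10Gi to 100Gi failed (e.g., provider quota exceeded),
# so the request is lowered to 20Gi -- still larger than the original 10Gi.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
```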
Always honor reclaim policy
This enhancement ensures that the PV delete reclaim policy is always honored. It fixes an issue where, with a persistent volume (PV) bound to a persistent volume claim (PVC), the order in which you deleted the PV and PVC determined whether the delete reclaim policy was actually honored.
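For reference, the reclaim policy in question is the persistentVolumeReclaimPolicy field on the PV itself; the CSI driver and volume handle below are hypothetical examples:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete   # honored regardless of deletion order
  csi:
    driver: ebs.csi.aws.com               # hypothetical CSI driver
    volumeHandle: vol-0123456789abcdef0   # hypothetical volume ID
```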
Auto Remove PVCs created by StatefulSets
The new feature solves a long-standing problem with abandoned PVCs.
In the past, when a StatefulSet automatically created PVCs, those PVCs weren't deleted when the StatefulSet was deleted. Users had to remove them manually. With this new auto-remove feature, PVCs created by a StatefulSet can now be removed automatically.
The deletion itself can happen at two different points: when the StatefulSet is deleted, or when it is scaled down and the corresponding pods are removed.
There is an optional field persistentVolumeClaimRetentionPolicy added to StatefulSet through which you can decide whether to delete or retain the PVCs:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  ...
spec:
  ...
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete # or Retain
Custom Resource Definition (CRD) Validation Expression Language
Custom resource definitions (CRDs) can now be validated using the Common Expression Language (CEL), which makes CRDs more self-contained: the validation rules are written as code directly in the definition of the CRD object.
Previously, users needed admission webhooks to validate custom resources, which is complicated. For example, to enforce a rule as simple as "minReplicas must not be larger than maxReplicas," you still had to build and maintain a webhook. The new CRD validation expression language makes this much easier.
Common Expression Language (CEL) rules run in the kube-apiserver and let you do lightweight validation without setting up a webhook.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
...
schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        x-kubernetes-validations:
          - rule: "self.minReplicas <= self.maxReplicas"
            message: "minReplicas cannot be larger than maxReplicas"
        properties:
          minReplicas:
            type: integer
          maxReplicas:
            type: integer
Add Server-Side Unknown Field Validation
When this feature is enabled, users who send a Kubernetes object in a request containing unknown or duplicate fields will receive a warning from the server that these fields need to be addressed. Currently, you can use kubectl --validate=true on the client side to make the request fail if an object contains unknown fields.
Because kubectl --validate=true is a client-side feature, it has several limitations: every client has to implement the validation itself. With the introduction of this feature on the server side (kube-apiserver), validation now occurs on the server end. The valid modes are Ignore (ignore unknown and duplicate fields), Warn (respond with a warning for unknown and duplicate fields), and Strict (fail the request when unknown or duplicate fields are present).
CronJobs
CronJobs have been stable since Kubernetes version 1.21. The work in version 1.23 is the cleanup of the old controller.
TTL Controller
The TTL (time to live) controller is now stable. It acts like a garbage collector, cleaning up finished jobs and their pods. To use it, you add the field .spec.ttlSecondsAfterFinished to a job.
The TTL controller watches all jobs; for each finished job, it compares the job's completion time plus .spec.ttlSecondsAfterFinished with the current time, and once the TTL has expired it deletes the job and its corresponding pods.
For example, in the case below, the job my-cleanup-job will be deleted automatically 150 seconds after it finishes.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-cleanup-job
spec:
  ttlSecondsAfterFinished: 150
  ...
minReadySeconds on StatefulSets
This feature allows an end user to specify the minimum number of seconds a newly created pod must be ready, without any of its containers crashing, before the StatefulSet considers it available.
This is an existing feature in Deployments, DaemonSets, and ReplicaSets, so this update gives StatefulSets parity.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  ...
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 3
  minReadySeconds: 10
Ephemeral Containers
With the addition of this feature, we can now run a short-lived container that executes within an existing pod's namespaces. Users initiate these containers with the kubectl debug command, which can inspect the state of running pods and containers for debugging and troubleshooting purposes. For example:
kubectl debug -it ephemeral-demo --image=busybox --target=<target container name>
Kubelet Container Runtime Interface(CRI)
The Kubelet Container Runtime Interface (CRI) is now in beta, which means the CRI v1 API is now the default. The CRI allows the kubelet to use various container runtimes, including CRI-O or containerd, as alternatives to Docker.
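As an illustration (the socket path is an assumption for a typical containerd install and varies by distribution), pointing the 1.23 kubelet at containerd via the CRI looks like this:

```
# Run the kubelet against containerd instead of Docker
kubelet --container-runtime=remote \
  --container-runtime-endpoint=unix:///run/containerd/containerd.sock
```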
Pod Priority Based Graceful Node Shutdown
With the addition of this feature, when the kubelet performs a graceful node shutdown, it considers pod priority values to determine the order in which pods should be stopped and how long each gets to shut down.
For example, a pod with priority 1000000000 will get 10 seconds to stop, a pod with priority 100000 will get 20 seconds, and a pod with priority 0 will get 30 seconds. As you can see, different pods get different stop times depending on their priority class.
shutdownGracePeriodByPodPriority:
  - priority: 1000000000
    shutdownGracePeriodSeconds: 10
  - priority: 100000
    shutdownGracePeriodSeconds: 20
  - priority: 0
    shutdownGracePeriodSeconds: 30
gRPC Probe to Pod
This feature adds gRPC (which runs over HTTP/2) as an option for liveness, readiness, and startup probes.
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  containers:
    - name: ...
      livenessProbe:
        grpc:
          port: 8080
        initialDelaySeconds: 10
Pod Security Admission
Users may already be aware that Pod Security Policy (PSP) has been deprecated since version 1.21 and is currently targeted for removal in 1.25. Pod Security Admission is its replacement: an admission controller that evaluates pods against a predefined set of Pod Security Standards to admit or deny them in a given namespace.
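For example, Pod Security Admission is driven by namespace labels; a namespace like the hypothetical one below would reject pods that do not meet the baseline standard:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace   # hypothetical namespace
  labels:
    # Enforce the "baseline" Pod Security Standard for pods in this namespace
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.23
```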
New kubectl events Command
With this update, kubectl events adds features beyond kubectl get events: for example, listing a timeline of events for the last N minutes, and proper default sorting (a limitation of kubectl get events --watch, which cannot sort events properly).
One of the main challenges with adding functionality to the kubectl get events command is that it lives under kubectl get, so any change impacts the entire kubectl get command tree. The new command is dedicated to working with events while still supporting all the functionality provided by kubectl get events.
Deprecation of klog specific flags
This release deprecates the klog-specific flags as Kubernetes simplifies logging across its components. This was discussed as part of the structured logging effort to make logging simpler and easier to maintain and extend.
Dual-stack IPv4/IPv6 Networking
Dual-stack IPv4/IPv6 networking is now stable. It was first introduced as alpha in 1.15 and refactored in 1.20. Before 1.20, you needed a separate Service per IP family; starting with 1.20, the Service API supports dual-stack, and in 1.23 it is stable. Dual-stack support covers pods, nodes, and services.
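As a quick sketch (the name and selector are placeholders), a dual-stack Service sets ipFamilyPolicy and, optionally, the preferred order of IP families:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service     # placeholder name
spec:
  # Request both IPv4 and IPv6 cluster IPs where the cluster supports it
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv4
    - IPv6
  selector:
    app: my-app        # placeholder selector
  ports:
    - port: 80
```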
In this article we covered some of the new features to keep in mind when upgrading your Kubernetes cluster. There are a lot of exciting improvements in the 1.23 release; you can read about all of them in more detail on the Kubernetes release page. And stay tuned, both there and here, as 1.24 has a target release date of April 19, 2022.