What's New in K8S 1.23?
With 1.23, its third release of 2021, Kubernetes added a total of 47 enhancements across all three levels: 11 stable enhancements, 16 graduating to beta (enabled by default), 19 new alpha features (which require a feature gate to be enabled), and one deprecation. How will this affect Kubernetes storage usage?
Version 1.23 introduces new and updated functionality such as Pod Security Admission, dual-stack IPv4/IPv6 networking, the Kubelet Container Runtime Interface, the general availability of generic ephemeral inline volumes, and much more. There are also some deprecations, such as the klog-specific flags. This blog will discuss some of the significant changes in the v1.23 release.
Skip Volume Ownership Change
This feature lets you control how a volume's ownership is changed when it is mounted inside a container. Previously, whenever a volume was mounted into a pod, the permissions of every file on it were recursively changed to match the fsGroup value. On a very large volume this process is extremely slow, and for permission-sensitive applications it can break the application outright.
This update introduces fsGroupChangePolicy, which accepts two values, OnRootMismatch and Always. If set to OnRootMismatch, volume permissions are changed only if the ownership of the top-level directory does not match the fsGroup value. If set to Always, the permissions are always changed when the volume is mounted, as before.
securityContext:
  fsGroupChangePolicy: "OnRootMismatch" # or "Always"
In-Tree to CSI Driver Migration for AWS EBS, GCE PD, and Azure Disk
KEP: AWS (https://feature.k8s.io/1487)
This is a continued effort to move from in-tree plugins to CSI drivers, all while maintaining the original API (for example, the in-tree AWS EBS plugin now calls out to the EBS CSI driver). This feature is in beta for AWS, GCP, and Azure.
This update is one part of a larger push to migrate CSI drivers outside of Kubernetes (from in-tree to out-of-tree).
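As a sketch of how this looks in practice, the migration for each driver is controlled by feature gates on the kube-controller-manager and kubelet (the gate names below come from the upstream CSI migration work; in 1.23 the beta gates are on by default, so setting them explicitly is only needed on clusters where they were disabled):

```
# Feature gates enabling the in-tree-to-CSI migration for AWS EBS.
# CSIMigrationGCE and CSIMigrationAzureDisk are the analogous gates
# for GCE PD and Azure Disk.
--feature-gates=CSIMigration=true,CSIMigrationAWS=true
```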
Generic Inline Ephemeral Volumes
Generic ephemeral inline volumes are now GA. This feature lets any storage driver that already supports dynamic provisioning be used for ephemeral volumes. It is similar to emptyDir, which provides an empty scratch directory per pod, but it can be backed by local or network-attached storage. Depending on the driver and its parameters, such volumes can be provisioned with initial data and support features like snapshots, cloning, and resizing.
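For illustration, here is a minimal pod spec using a generic ephemeral volume; the storage class name and size are placeholder values, and any CSI driver with dynamic provisioning support can back the volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: "/scratch"
          name: scratch-volume
  volumes:
    - name: scratch-volume
      ephemeral:
        # A PVC is created from this template when the pod starts
        # and deleted together with the pod.
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "standard"   # placeholder storage class
            resources:
              requests:
                storage: 1Gi
```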
Recovering from resize failures
One of the current issues with persistent volume claims (PVCs) is that if you try to expand one to a size the storage provider cannot satisfy, the expansion fails and you are stuck with the error. With the 1.23 update, you can now reduce the requested size of the PVC again and recover from the failed expansion.
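As a sketch (in 1.23 this is an alpha feature behind the RecoverVolumeExpansionFailure feature gate), recovering simply means editing the PVC's requested size back down; the sizes here are hypothetical:

```yaml
# The expansion from 10Gi to 100Gi failed (e.g., provider quota exceeded),
# so the request is lowered to 20Gi -- still larger than the original 10Gi.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
```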
Always honor reclaim policy
This enhancement ensures that the PV delete reclaim policy is always honored. It fixes an issue where, with a persistent volume (PV) bound to a persistent volume claim (PVC), the order in which you deleted the PV and PVC determined whether the delete reclaim policy was actually honored.
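For reference, the reclaim policy in question is the persistentVolumeReclaimPolicy field on the PV itself; the CSI driver and volume handle below are hypothetical examples:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete   # honored regardless of deletion order
  csi:
    driver: ebs.csi.aws.com               # hypothetical CSI driver
    volumeHandle: vol-0123456789abcdef0   # hypothetical volume ID
```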
Auto Remove PVCs created by StatefulSets
The new feature solves a long-standing problem with abandoned PVCs.
In the past, when a StatefulSet automatically created PVCs, those PVCs weren't deleted when the StatefulSet was deleted. Users had to remove them manually. With this new auto-remove feature, PVCs created by a StatefulSet can now be removed automatically.
The deletion itself can happen at two different points: when the StatefulSet is deleted, or when it is scaled down and the corresponding pods are removed.
There is an optional field persistentVolumeClaimRetentionPolicy added to StatefulSet through which you can decide whether to delete or retain the PVCs:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  ...
spec:
  ...
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete # or Retain
Custom Resource Definition (CRD) Validation Expression Language
Custom resource definitions (CRDs) can now be validated using the Common Expression Language (CEL), which makes CRDs more self-contained: the validation rules are written as code directly in the definition of the CRD object.
Previously, users needed admission webhooks to validate custom resources, which is complicated. For example, to enforce a rule as simple as "minReplicas must not be larger than maxReplicas," you still had to build and maintain a webhook. The new CRD validation expression language makes this much easier.
Common Expression Language (CEL) rules run in the kube-apiserver and let you do lightweight validation without setting up a webhook.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
...
schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        x-kubernetes-validations:
          - rule: "self.minReplicas <= self.maxReplicas"
            message: "minReplicas cannot be larger than maxReplicas"
        properties:
          minReplicas:
            type: integer
          maxReplicas:
            type: integer
Add Server-Side Unknown Field Validation
When this feature is enabled, users who send a Kubernetes object in a request containing unknown or duplicate fields will receive a warning from the server that these fields need to be addressed. Currently, you can use kubectl --validate=true on the client side to make the request fail if an object contains unknown fields.
Because kubectl --validate=true is a client-side feature, it has several limitations: every client has to implement the validation itself. With the introduction of this feature on the server side (kube-apiserver), validation now occurs on the server end. The valid modes are Ignore (ignore unknown and duplicate fields), Warn (respond with a warning for unknown and duplicate fields), and Strict (fail the request when unknown or duplicate fields are present).
CronJobs
CronJobs have been stable since Kubernetes version 1.21. The work in version 1.23 is the cleanup of the old controller.
TTL Controller
The TTL (time to live) controller is now stable. It acts like a garbage collector, cleaning up finished jobs and their pods. To use it, you add the field .spec.ttlSecondsAfterFinished to a job.
The TTL controller watches all jobs; for each finished job, it compares the job's completion time plus .spec.ttlSecondsAfterFinished with the current time, and once the TTL has expired it deletes the job and its corresponding pods.
For example, in the case below, the job my-cleanup-job will be deleted automatically 150 seconds after it finishes.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-cleanup-job
spec:
  ttlSecondsAfterFinished: 150
  ...
minReadySeconds on StatefulSets
This feature allows an end user to specify the minimum number of seconds a newly created pod must be ready, without any of its containers crashing, before the StatefulSet considers it available.
This is an existing feature in Deployments, DaemonSets, and ReplicaSets, so this update gives StatefulSets parity.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  ...
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 3
  minReadySeconds: 10
Ephemeral Containers
With the addition of this feature, we can now run a short-lived container that executes within an existing pod's namespaces. Users initiate these containers with the kubectl debug command, which can inspect the state of running pods and containers for debugging and troubleshooting purposes. For example:
kubectl debug -it ephemeral-demo --image=busybox --target=<target container name>
Kubelet Container Runtime Interface(CRI)
The Kubelet Container Runtime Interface (CRI) is now in beta, which means the CRI v1 API is now the default. The CRI allows the kubelet to use various container runtimes, including CRI-O or containerd, as alternatives to Docker.
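As an illustration (the socket path is an assumption for a typical containerd install and varies by distribution), pointing the 1.23 kubelet at containerd via the CRI looks like this:

```
# Run the kubelet against containerd instead of Docker
kubelet --container-runtime=remote \
  --container-runtime-endpoint=unix:///run/containerd/containerd.sock
```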
Pod Priority Based Graceful Node Shutdown
With the addition of this feature, when the kubelet performs a graceful node shutdown, it considers pod priority values to determine the order in which pods should be stopped and how long each gets to shut down.
For example, a pod with priority 1000000000 will get 10 seconds to stop, a pod with priority 100000 will get 20 seconds, and a pod with priority 0 will get 30 seconds. As you can see, different pods get different stop times depending on their priority class.
shutdownGracePeriodByPodPriority:
  - priority: 1000000000
    shutdownGracePeriodSeconds: 10
  - priority: 100000
    shutdownGracePeriodSeconds: 20
  - priority: 0
    shutdownGracePeriodSeconds: 30
gRPC Probe to Pod
This feature adds gRPC (which runs over HTTP/2) as an option for liveness, readiness, and startup probes.
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  containers:
    - name: ...
      livenessProbe:
        grpc:
          port: 8080
        initialDelaySeconds: 10
Pod Security Admission
Users may already be aware that Pod Security Policy (PSP) has been deprecated since version 1.21 and is currently targeted for removal in 1.25. Pod Security Admission is its replacement: an admission controller that evaluates pods against a predefined set of Pod Security Standards to admit or deny them in a given namespace.
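For example, Pod Security Admission is driven by namespace labels; a namespace like the hypothetical one below would reject pods that do not meet the baseline standard:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace   # hypothetical namespace
  labels:
    # Enforce the "baseline" Pod Security Standard for pods in this namespace
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.23
```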
New kubectl events Command
With this update, kubectl events adds features beyond kubectl get events: for example, listing a timeline of events for the last N minutes, and proper default sorting (a limitation of kubectl get events --watch, which cannot sort events properly).
One of the main challenges with adding functionality to the kubectl get events command is that it lives under kubectl get, so any change impacts the entire kubectl get command tree. The new command is dedicated to working with events while still supporting all the functionality provided by kubectl get events.
Deprecation of klog specific flags
This release deprecates the klog-specific flags as Kubernetes simplifies logging across its components. This was discussed as part of the structured logging effort to make logging simpler and easier to maintain and extend.
Dual-stack IPv4/IPv6 Networking
Dual-stack IPv4/IPv6 networking is now stable. It was first introduced as alpha in 1.15 and refactored in 1.20. Before 1.20, you needed a separate Service per IP family; starting with 1.20, the Service API supports dual-stack, and in 1.23 it is stable. Dual-stack support covers pods, nodes, and services.
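As a quick sketch (the name and selector are placeholders), a dual-stack Service sets ipFamilyPolicy and, optionally, the preferred order of IP families:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service     # placeholder name
spec:
  # Request both IPv4 and IPv6 cluster IPs where the cluster supports it
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv4
    - IPv6
  selector:
    app: my-app        # placeholder selector
  ports:
    - port: 80
```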
In this article we covered some of the new features to keep in mind when upgrading your Kubernetes cluster. There are a lot of exciting improvements in the 1.23 release; you can read about all of them in more detail on the Kubernetes release page. And stay tuned, both there and here, as 1.24 has a target release date of April 19, 2022.