Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Upgrade Kubernetes cluster on GKE

GKE’s default upgrade policy

GKE has helpful documentation on how it handles standard cluster upgrades. Most notably, the default upgrade strategy is called “surge”, which mimics what we refer to as “rolling upgrades” in other parts of our documentation. The basic steps remain consistent but are handled automatically be GKE:

  1. A node is cordoned so that new pods are not scheduled to it

  2. The node is drained. For the surge strategy, PodDisruptionBudget and GracefulTerminationPeriod settings are respected for up to one hour.

  3. The control plane reschedules pods managed by controllers onto other nodes. Pods that cannot be rescheduled remain in a Pending state until rescheduling is possible.

Due to these strategies, this means that changing the Kubernetes version via our terraform code produces a non-destructive plan, i.e., the plan states that the nodes can be updated in place, rather than being destroyed and recreated.

Upgrading a cluster

Setup your environment:

export CLUSTER_NAME=<cluster-name>
cd terraform/gcp
terraform init
terraform workspace select $CLUSTER_NAME

Zonal vs. regional clusters

Check supported versions in terraform

There is a terraform output variable that provides the most up-to-date version supported by GKE for a range of predefined Kubernetes version prefixes:

terraform output regular_channel_latest_k8s_versions

You may wish to edit the prefixes checked in the variables.tf file.

Upgrading the control plane

Within the k8s_versions block of the cluster’s .tfvars file, increment the min_master_version variable according to the regular_channel_latest_k8s_versions output.

k8s_versions = {
  min_master_version : "1.32.1-gke.1357001",  # Increment this
  core_nodes_version : "1.32.1-gke.1357001",
  notebook_nodes_version : "1.32.1-gke.1357001",
}

Then plan and apply the changes:

terraform plan -var-file=projects/$CLUSTER_NAME.tfvars
terraform apply -var-file=projects/$CLUSTER_NAME.tfvars

When upgrading the control plane, it is best to upgrade one minor version at a time until you reach your desired version, e.g., 1.30 -> 1.31 -> 1.32.

Upgrading node pools

The process for upgrading the worker node pools is very similar to that for upgrading the control plane.

First, upgrade the core node pool by updating the core_nodes_version variable in the k8s_versions block of the cluster’s .tfvars file, and plan and apply the changes.

Then upgrade the server node pools by updating the notebook_node_version variable, and the dask_nodes_version variable if present, and plan and apply the changes.

Unlike with the control plane, you can jump straight from the current version to the desired version.