New Kubernetes cluster on GCP or Azure#

This guide will walk through the process of adding a new cluster to our terraform configuration.

You can find out more about terraform in the Terraform section and in Terraform's own documentation.

Attention

Currently, we do not deploy clusters to AWS solely using terraform. Please see New Kubernetes cluster on AWS for AWS-specific deployment guidelines.

Cluster Design#

This guide will assume you have already followed the guidance in Cluster design considerations to select the appropriate infrastructure.

Create a Terraform variables file for the cluster#

The first step is to create a .tfvars file in the projects subdirectory for the appropriate cloud provider (for example, terraform/gcp/projects).

Give it a descriptive name that, at a glance, provides context on the location and/or purpose of the cluster.

For GCP, the minimum inputs this file requires are:

  • prefix: Prefix for all objects created by terraform. Primary identifier to ‘group’ together resources.

  • project_id: GCP Project ID to create resources in. Should be the id, rather than display name of the project.

  • regional_cluster: Set to true to provision a GKE Regional Highly Available cluster. This costs ~$70 a month, but is worth it for the added reliability in most cases, except when cost saving is an absolute requirement. Defaults to true.

  • zone: Zone where cluster nodes and filestore for home directory are created.

  • region: Region where cluster master (if regional_cluster is true) is run, as well as any storage buckets created with user_buckets.

See the variables file for other inputs this file can take and their descriptions.

Example .tfvars file for GCP:

prefix           = "my-awesome-project"
project_id       = "my-awesome-project-id"
zone             = "us-central1-c"
region           = "us-central1"
regional_cluster = true
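
Before opening the Pull Request, it can help to confirm that the file defines every input listed above. The following is a minimal sketch and not part of our tooling: `check_tfvars` is a hypothetical helper, and it only looks for `key =` at the start of a line rather than parsing HCL properly.

```shell
# Hypothetical helper: report which expected keys are missing from a .tfvars file.
# Usage: check_tfvars FILE KEY [KEY...]
# Returns 0 if every key is assigned at the start of some line, 1 otherwise.
check_tfvars() {
  file=$1
  shift
  missing=0
  for key in "$@"; do
    # Match lines such as `prefix = "..."`, allowing padding around the key.
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "missing input: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage with the GCP inputs (the path is a placeholder):
# check_tfvars projects/CLUSTER.tfvars prefix project_id zone region regional_cluster
```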

For Azure, the minimum inputs this file requires are:

  • subscription_id: Azure subscription ID to create resources in. Should be the id, rather than the display name of the subscription.

  • resourcegroup_name: The name of the Resource Group to be created by terraform, where the cluster and other resources will be deployed into.

  • global_container_registry_name: The name of an Azure Container Registry to be created by terraform to use for our image. This must be unique across all of Azure. You can use the following Azure CLI command to check your desired name is available:

    az acr check-name --name ACR_NAME --output table
    
  • global_storage_account_name: The name of a storage account to be created by terraform to use for Azure File Storage. This must be unique across all of Azure. You can use the following Azure CLI command to check your desired name is available:

    az storage account check-name --name STORAGE_ACCOUNT_NAME --output table
    
  • ssh_pub_key: The public half of an SSH key that will be authorised to login to nodes.

See the variables file for other inputs this file can take and their descriptions.

Naming Convention Guidelines for Container Registries and Storage Accounts

Names for Azure container registries and storage accounts must conform to the following guidelines:

  • alphanumeric strings between 5 and 50 characters for container registries, e.g., myContainerRegistry007

  • strings of lowercase letters and numbers between 3 and 24 characters for storage accounts, e.g., mystorageaccount314
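
These rules can be checked locally before running the az commands. The sketch below assumes Azure's documented limits of 5 to 50 alphanumeric characters for container registries and 3 to 24 lowercase letters and digits for storage accounts. The function names are hypothetical, and the checks say nothing about whether a name is still available.

```shell
# Hypothetical helpers that mirror the naming rules for Azure resources.

# Container registry: 5-50 alphanumeric characters.
valid_registry_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9]{5,50}$'
}

# Storage account: 3-24 lowercase letters and digits.
valid_storage_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

valid_registry_name "myContainerRegistry007" && echo "registry name ok"
valid_storage_name "mystorageaccount314" && echo "storage name ok"
valid_storage_name "MyStorageAccount" || echo "storage name rejected"
```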

Note

A failure will occur if you try to create a storage account whose name is not entirely lowercase.

We recommend the following conventions using lowercase:

  • {CLUSTER_NAME}hubregistry for container registries

  • {CLUSTER_NAME}hubstorage for storage accounts
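
To illustrate the convention, the sketch below derives both names from a cluster name. It assumes that lowercasing and dropping non-alphanumeric characters (such as hyphens) is an acceptable way to normalise the cluster name, which this guide does not specify. The function names are hypothetical.

```shell
# Hypothetical helpers: build the recommended names from a cluster name.
normalise_cluster_name() {
  # Lowercase, then drop everything except a-z and 0-9 (an assumption, see above).
  printf '%s' "$1" | LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C tr -cd 'a-z0-9'
}

registry_name() {
  printf '%shubregistry\n' "$(normalise_cluster_name "$1")"
}

storage_name() {
  printf '%shubstorage\n' "$(normalise_cluster_name "$1")"
}

registry_name "My-Cluster"   # prints: myclusterhubregistry
storage_name "My-Cluster"    # prints: myclusterhubstorage
```

Note that the hubstorage suffix is 10 characters long, so a normalised cluster name longer than 14 characters would exceed the 24-character limit for storage accounts.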

Note

Changes in Azure’s own requirements might break our recommended convention. If any such failure occurs, please report it.

Following the recommended naming convention makes it less likely that we take up a name that the Hub Community may need, for example when we deploy to Azure subscriptions not owned or managed by 2i2c.

Example .tfvars file for Azure:

subscription_id                = "my-awesome-subscription-id"
resourcegroup_name             = "my-awesome-resource-group"
global_container_registry_name = "myawesomehubregistry"
global_storage_account_name    = "myawesomestorageaccount"
ssh_pub_key                    = "ssh-rsa my-public-ssh-key"

Once you have created this file, open a Pull Request to the infrastructure repo for review. See our review and merge guidelines for how this process should pan out.

Initialising Terraform#

Our default terraform state is located centrally in our two-eye-two-see-org GCP project, so you must authenticate gcloud with your @2i2c.org account before initialising terraform. The terraform state includes all cloud providers, not just GCP.

gcloud auth application-default login

Then you can change into the terraform subdirectory for the appropriate cloud provider and initialise terraform.

# For GCP
cd terraform/gcp
terraform init -backend-config=backends/default-backend.hcl -reconfigure

# For Azure
cd terraform/azure
terraform init

Note

There are other backend config files stored in terraform/backends. Each configures a different storage bucket for reading and writing the remote terraform state, for projects that we cannot access from GCP with our @2i2c.org email accounts. This saves us the pain of handling multiple authentications, as these storage buckets live within the project we are deploying to.

For example, to work with Pangeo you would initialise terraform like so:

terraform init -backend-config=backends/pangeo-backend.hcl -reconfigure

Creating a new terraform workspace#

We use terraform workspaces so that the state of one .tfvars file does not influence another. Create a new workspace with the command below, giving it the same name as the .tfvars file.

terraform workspace new WORKSPACE_NAME
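
Since the workspace shares its name with the .tfvars file, the name can be derived from the file path instead of being typed twice. This is a small sketch that assumes the workspace name is the filename without its .tfvars extension; `workspace_name_for` is a hypothetical helper and the example path is a placeholder.

```shell
# Hypothetical helper: strip the directory and the .tfvars extension.
workspace_name_for() {
  basename "$1" .tfvars
}

workspace_name_for projects/my-cluster.tfvars   # prints: my-cluster

# Possible usage (not run here):
# terraform workspace new "$(workspace_name_for projects/my-cluster.tfvars)"
```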

Note

Workspaces are defined per backend. If you can’t find the workspace you’re looking for, double check you’ve enabled the correct backend.

Plan and Apply Changes#

Note

When deploying to Google Cloud, make sure the Compute Engine, Kubernetes Engine, Artifact Registry, and Cloud Logging APIs are enabled on the project before deploying!

First, make sure you are in the new workspace that you just created.

terraform workspace show
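
This check can also be scripted so that plan and apply refuse to run from the wrong workspace. A minimal sketch: `require_workspace` is a hypothetical guard that compares the expected name with the output of terraform workspace show, which is passed in as an argument so that the function itself never calls terraform.

```shell
# Hypothetical guard: succeed only when the current workspace is the expected one.
# Usage: require_workspace EXPECTED CURRENT
require_workspace() {
  if [ "$1" != "$2" ]; then
    echo "expected workspace '$1' but current workspace is '$2'" >&2
    return 1
  fi
}

# Possible usage (not run here):
# require_workspace WORKSPACE_NAME "$(terraform workspace show)" &&
#   terraform plan -var-file=projects/CLUSTER.tfvars
```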

Plan your changes with the terraform plan command, passing the .tfvars file as a variable file.

terraform plan -var-file=projects/CLUSTER.tfvars

Check over the output of this command to ensure nothing is being created/deleted that you didn’t expect. Copy-paste the plan into your open Pull Request so a fellow 2i2c engineer can double check it too.

If you’re both satisfied with the plan, merge the Pull Request and apply the changes to deploy the cluster.

terraform apply -var-file=projects/CLUSTER.tfvars

Congratulations, you’ve just deployed a new cluster!

Exporting and Encrypting the Cluster Access Credentials#

To begin deploying and operating hubs on your new cluster, we need to export the credentials created by terraform, encrypt them using sops, and store them in the secrets directory of the infrastructure repo.

Check you are still in the correct terraform workspace:

terraform workspace show

If you need to change workspace, you can do so as follows:

terraform workspace list  # List all available workspaces
terraform workspace select WORKSPACE_NAME

Then, output the credentials created by terraform to a file under the appropriate cluster directory: /config/clusters/$CLUSTER_NAME.

Note

Create the cluster directory if it doesn’t already exist with:

export CLUSTER_NAME=<cluster-name>
mkdir -p ../../config/clusters/$CLUSTER_NAME

# For GCP: export the deployer key as JSON
terraform output -raw ci_deployer_key > ../../config/clusters/$CLUSTER_NAME/deployer-credentials.secret.json

# For Azure: export the kubeconfig as YAML
terraform output -raw kubeconfig > ../../config/clusters/$CLUSTER_NAME/deployer-credentials.secret.yaml

Then encrypt the key using sops.

Note

You must be logged into Google with your @2i2c.org account at this point so sops can read the encryption key from the two-eye-two-see project.

First, make sure you are in the root of the repository

cd ../..
# Replace {{ json | yaml }} with json or yaml, matching the file you exported above
sops --output config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.{{ json | yaml }} --encrypt config/clusters/$CLUSTER_NAME/deployer-credentials.secret.{{ json | yaml }}
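
The output path differs from the input path only by the enc- prefix on the filename. The sketch below makes that mapping explicit. `encrypted_path` is a hypothetical helper, and the sops invocation in the comment simply repeats the command above for the JSON case.

```shell
# Hypothetical helper: prefix the filename (not the directory) with "enc-".
encrypted_path() {
  printf '%s/enc-%s\n' "$(dirname "$1")" "$(basename "$1")"
}

encrypted_path config/clusters/my-cluster/deployer-credentials.secret.json
# prints: config/clusters/my-cluster/enc-deployer-credentials.secret.json

# Possible usage (not run here):
# src=config/clusters/$CLUSTER_NAME/deployer-credentials.secret.json
# sops --output "$(encrypted_path "$src")" --encrypt "$src"
```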

The encrypted key (the file with the enc- prefix) can now be committed to the infrastructure repo and used to deploy and manage hubs hosted on that cluster.

Create a cluster.yaml file#

See also

We use cluster.yaml files to describe a specific cluster and all the hubs deployed onto it. See Configuration structure for more information.

Create a cluster.yaml file under the config/clusters/$CLUSTER_NAME folder and populate it with the info below, depending on the cloud provider. For GCP:

name: <cluster-name>  # This should also match the name of the folder: config/clusters/$CLUSTER_NAME
provider: gcp
gcp:
  # The location of the *encrypted* key we exported from terraform
  key: enc-deployer-credentials.secret.json
  # The name of the GCP project the cluster is deployed in
  project: <gcp-project-name>
  # The name of the cluster *as it appears in the GCP console*! Sometimes our
  # terraform code appends '-cluster' to the 'name' field, so double check this.
  cluster: <cluster-name-in-gcp>
  # The GCP zone the cluster is deployed in. For multi-regional clusters, you
  # may have to strip the last identifier, i.e., 'us-central1-a' becomes 'us-central1'
  zone: <gcp-zone>
  billing:
    # Set to true if billing for this cluster is paid for by the 2i2c card
    paid_by_us: true
    bigquery:
      # contains information about bigquery billing export (https://cloud.google.com/billing/docs/how-to/export-data-bigquery)
      # for calculating how much this cluster costs us. Required if `paid_by_us` is
      # set to true.
      project: <id-of-gcp-project-where-bigquery-dataset-lives>
      dataset: <id-of-bigquery-dataset>
      billing_id: <id-of-billing-account-associated-with-this-project>
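
The comment on the zone field above mentions stripping the last identifier of a zone name. In shell this is a single parameter expansion, sketched below. The helper name `region_of_zone` is hypothetical.

```shell
# Hypothetical helper: drop the trailing "-<identifier>" part of a GCP zone name.
region_of_zone() {
  # ${1%-*} removes the shortest suffix that starts with a hyphen.
  printf '%s\n' "${1%-*}"
}

region_of_zone us-central1-a   # prints: us-central1
```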

Billing information

For projects where we pay the cloud bill and then pass costs through, you need to fill in the information under gcp.billing.bigquery and set gcp.billing.paid_by_us to true. Partnerships should be able to tell you whether we are doing cloud cost pass-through; eventually, this should be provided by a single source of truth for all contracts.

You can find the values to fill in as follows:

  1. Go to the Billing tab on the Google Cloud Console.

  2. Make sure the correct project is selected in the top bar. You might have to select the ‘All’ tab in the project chooser if you do not see the project right away.

  3. Click ‘Go to billing account’

  4. In the default view (Overview) that opens, you can find the value for billing_id in the right sidebar, under “Billing Account”. It should be of the form XXXXXX-XXXXXX-XXXXXX.

  5. Select “Billing export” on the left navigation bar, and you will find the values for project and dataset under “Detailed cost usage”.

  6. If “Detailed cost usage” is not set up, you should enable it.
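
A quick format check can catch copy-paste mistakes in billing_id before they reach cluster.yaml. This sketch assumes each X in XXXXXX-XXXXXX-XXXXXX is an uppercase letter or a digit, which this guide does not state explicitly. `looks_like_billing_id` is a hypothetical helper and the sample values are made up.

```shell
# Hypothetical helper: three groups of six uppercase letters or digits (assumed charset).
looks_like_billing_id() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Z0-9]{6}-[A-Z0-9]{6}-[A-Z0-9]{6}$'
}

looks_like_billing_id "0A1B2C-3D4E5F-6A7B8C" && echo "format ok"
looks_like_billing_id "my-billing-account" || echo "format rejected"
```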

Warning

For Azure, we use the kubeconfig-based config below only when we do not have permissions on the Azure subscription to create a Service Principal with terraform.

name: <cluster-name>  # This should also match the name of the folder: config/clusters/$CLUSTER_NAME
provider: kubeconfig
kubeconfig:
  # The location of the *encrypted* key we exported from terraform
  file: enc-deployer-credentials.secret.yaml

For Azure clusters where terraform can create a Service Principal:

name: <cluster-name>  # This should also match the name of the folder: config/clusters/$CLUSTER_NAME
provider: azure
azure:
  # The location of the *encrypted* key we exported from terraform
  key: enc-deployer-credentials.secret.json
  # The name of the cluster *as it appears in the Azure Portal*! Sometimes our
  # terraform code adjusts the contents of the 'name' field, so double check this.
  cluster: <cluster-name>
  # The name of the resource group the cluster has been deployed into. This is
  # the same as the resourcegroup_name variable in the .tfvars file.
  resource_group: <resource-group-name>

Commit this file to the repo.
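
Because the name field should match the folder name, a small check before committing can catch mismatches. A minimal sketch: `cluster_name_matches_dir` is a hypothetical helper that reads the first top-level name: line with sed instead of parsing YAML properly, and it assumes the value is unquoted.

```shell
# Hypothetical helper: compare the `name:` field with the enclosing folder name.
# Usage: cluster_name_matches_dir path/to/cluster.yaml
cluster_name_matches_dir() {
  # Take the first top-level `name:` line, dropping any trailing comment and spaces.
  name=$(sed -n 's/^name:[[:space:]]*\([^#[:space:]]*\).*$/\1/p' "$1" | head -n 1)
  dir=$(basename "$(dirname "$1")")
  [ "$name" = "$dir" ]
}

# Possible usage (not run here):
# cluster_name_matches_dir config/clusters/$CLUSTER_NAME/cluster.yaml || echo "name mismatch"
```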

Adding the new cluster to CI/CD#

To ensure the new cluster is appropriately handled by our CI/CD system, please add it as an entry in the following places:

Cluster is now ready, what are the next steps#

Important

The cluster is now ready for the next steps:

  1. Deploy support components.

  2. Deploy a new hub