New Kubernetes cluster on AWS
We use eksctl to provision our k8s clusters on AWS and terraform to provision supporting infrastructure, such as storage buckets.
Install required tools locally
Follow the instructions outlined in Manually deploy a config change to set up the local environment and prepare sops to encrypt and decrypt files.
Install the awscli tool (you can use pip or conda to install it in the environment) and configure it to use the provided AWS user credentials. More about setting up the credentials in Setup credentials.
Install the latest version of eksctl. Mac users can get it from homebrew with brew install eksctl. Make sure the version is at least 0.97; you can check by running eksctl version.
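To check the version requirement in a script, you can compare the output of eksctl version against the 0.97 minimum. The sketch below is a minimal example that assumes eksctl prints a version string such as 0.97.0. The helper name version_at_least is hypothetical and only compares the MAJOR.MINOR components.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the version string in $1 is
# at least the "MAJOR.MINOR" minimum given in $2. Only the first MAJOR.MINOR
# pair found in $1 is compared; patch versions are ignored.
version_at_least() {
  # Pull the first "X.Y" out of the input, e.g. "0.97.0" -> "0.97"
  have=$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+' | head -n 1)
  [ -n "$have" ] || return 1

  have_major=${have%%.*}
  have_minor=${have#*.}
  want_major=${2%%.*}
  want_minor=${2#*.}

  # A higher or lower major version decides the comparison on its own
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
    return
  fi
  [ "$have_minor" -ge "$want_minor" ]
}
```

For example: version_at_least "$(eksctl version)" 0.97 || echo "Please upgrade eksctl". Comparing the numbers component by component avoids the trap of a plain string comparison, where 0.150 would sort before 0.97.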
Create a new cluster
Depending on whether this project uses AWS SSO or not, you can use the following links to figure out how to authenticate to this project from your terminal.
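Whichever authentication method applies, it helps to confirm that your terminal is talking to the intended AWS account before creating anything. A minimal sketch, assuming the awscli is installed: aws sts get-caller-identity works with both IAM user keys and SSO sessions. The helper name check_aws_account is hypothetical, and the expected account ID is a placeholder you supply.

```shell
#!/bin/sh
# Hypothetical helper: confirm the terminal is authenticated to the expected
# AWS account. $1 is the 12-digit account ID you expect for this project.
check_aws_account() {
  expected="$1"
  # get-caller-identity succeeds for any valid credentials (IAM user or SSO)
  actual=$(aws sts get-caller-identity --query Account --output text) || {
    echo "Not authenticated to AWS" >&2
    return 1
  }
  if [ "$actual" != "$expected" ]; then
    echo "Authenticated to account $actual, expected $expected" >&2
    return 1
  fi
  echo "Authenticated to AWS account $actual"
}
```

For example: check_aws_account 123456789012, where the account ID is a placeholder.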
Generate cluster files
We automatically generate the files required to set up a new cluster:
A .jsonnet file for use with eksctl
An ssh key for use with eksctl
A .tfvars file for use with terraform
You can generate these with:
python3 deployer generate-cluster <cluster-name> aws
This will generate the following files:
eksctl/<cluster-name>.jsonnet with a default cluster configuration, deployed to
A sops-encrypted ssh private key that can be used to ssh into the kubernetes nodes.
eksctl/ssh-keys/<cluster-name>.pub, an ssh public key that eksctl places on the nodes so that the matching private key can be used to log in to them.
terraform/aws/projects/<cluster-name>.tfvars, a terraform variables file that will set up most of the non-EKS infrastructure.
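A quick way to confirm that the generator ran successfully is to check that the generated files exist. The sketch below assumes it is run from the repository root and uses the paths listed above. It deliberately skips the sops-encrypted private key, because this guide does not spell out that file's path. The helper name check_generated_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that the files produced by
# `python3 deployer generate-cluster <cluster-name> aws` exist.
# Run from the repository root. $1 is the cluster name.
check_generated_files() {
  cluster="$1"
  missing=0
  for f in \
    "eksctl/$cluster.jsonnet" \
    "eksctl/ssh-keys/$cluster.pub" \
    "terraform/aws/projects/$cluster.tfvars"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Non-zero if any expected file was not found
  return "$missing"
}
```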
Create and render an eksctl config file
We use an eksctl config file in YAML to specify how our cluster should be built. Since it can get repetitive, we use jsonnet to declaratively specify this config. You can find the .jsonnet files for the current clusters in the eksctl/ directory.
The previous step should have created a baseline .jsonnet file that you can modify as you like. The eksctl docs have a reference for all the possible options. You should make sure to change at least the following:
Region / Zone: make sure you are creating your cluster in the correct region!
Size of nodes in instance groups, for both notebook nodes and dask nodes. In particular, make sure you have enough quota to launch these instances in your selected region.
Kubernetes version: older .jsonnet files might pin older versions, but you should pick a newer version when you create a new cluster.
Once you have a .jsonnet file, you can render it into a config file that eksctl can read:
jsonnet <your-cluster>.jsonnet > <your-cluster>.eksctl.yaml
There’s no requirement to commit the *.eksctl.yaml file to the repository, since we can regenerate it using the above command.
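One caveat with the plain jsonnet ... > ....eksctl.yaml redirection is that the shell truncates the YAML file before jsonnet runs, so a jsonnet error leaves you with an empty config. A minimal sketch that only replaces the rendered file when jsonnet succeeds is below. It assumes it is run from the eksctl directory; the helper name render_eksctl_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: render <cluster>.jsonnet into <cluster>.eksctl.yaml,
# replacing the output file only if jsonnet succeeds.
render_eksctl_config() {
  cluster="$1"
  tmp="$cluster.eksctl.yaml.tmp"
  if jsonnet "$cluster.jsonnet" > "$tmp"; then
    # Move into place only after a successful render
    mv "$tmp" "$cluster.eksctl.yaml"
  else
    rm -f "$tmp"
    echo "jsonnet failed; $cluster.eksctl.yaml left unchanged" >&2
    return 1
  fi
}
```

jsonnet emits JSON, and since JSON is valid YAML, eksctl can read the rendered file as-is.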
Create the cluster
Now you’re ready to create the cluster!
eksctl create cluster --config-file <your-cluster>.eksctl.yaml
Make sure to run this command inside the eksctl directory, otherwise it cannot discover the ssh public key referenced in the config.
This might take a few minutes.
If any errors are reported in the config (there is a schema validation step), fix them in the .jsonnet file, re-render the config, and try again.
Once it is done, you can test access to the new cluster by getting credentials via:
aws eks update-kubeconfig --name=<your-cluster-name> --region=<your-cluster-region>
kubectl should be able to find your cluster now! kubectl get node should show you at least one core node running.
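The two access checks above can be combined into one step that fails loudly when no nodes are registered. A minimal sketch, assuming aws and kubectl are on your PATH; kubectl get node --no-headers prints one line per node, so counting non-empty lines counts nodes. The helper name check_cluster_access is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fetch kubeconfig credentials for the new cluster and
# confirm at least one node is registered. $1 is the cluster name, $2 the region.
check_cluster_access() {
  aws eks update-kubeconfig --name="$1" --region="$2" || return 1
  listing=$(kubectl get node --no-headers) || return 1
  # Count non-empty lines; each line is one node
  nodes=$(printf '%s\n' "$listing" | grep -c .)
  if [ "$nodes" -lt 1 ]; then
    echo "No nodes found in cluster $1" >&2
    return 1
  fi
  echo "Cluster $1 has $nodes node(s)"
}
```

For example: check_cluster_access <your-cluster-name> <your-cluster-region>.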
Deploy Terraform-managed infrastructure
Our AWS terraform code is now used to deploy supporting infrastructure for the EKS cluster, including:
An IAM identity account for use with our CI/CD system
Appropriately networked EFS storage to serve as an NFS server for hub home directories
Optionally, a shared database
Optionally, user buckets
We still store terraform state in GCP, so you also need to have gcloud set up and authenticated already.
The steps in Generate cluster files will have created a default .tfvars file. This file can either be used as-is or edited to enable the optional features listed above.
Initialise terraform for use with AWS:
cd terraform/aws
terraform init
Create a new terraform workspace
terraform workspace new <your-cluster-name>
Deploy the terraform-managed infrastructure
terraform plan -var-file projects/<your-cluster-name>.tfvars
Review the plan carefully, then apply it and confirm when prompted:
terraform apply -var-file projects/<your-cluster-name>.tfvars
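Running terraform apply with -var-file re-plans before applying, so in principle the changes applied could differ from the plan you reviewed. A minimal sketch of the same steps using a saved plan file is below; terraform apply on a saved plan applies exactly that plan without re-planning or prompting again. It assumes it is run from terraform/aws, and the helper name deploy_cluster_infra and the <cluster>.tfplan file name are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: init, select or create the workspace, plan to a file,
# ask for confirmation, then apply exactly the reviewed plan.
# Run from terraform/aws. $1 is the cluster name.
deploy_cluster_infra() {
  cluster="$1"
  terraform init || return 1
  # Reuse the workspace if it already exists, otherwise create it
  terraform workspace select "$cluster" 2>/dev/null ||
    terraform workspace new "$cluster" || return 1
  terraform plan -var-file "projects/$cluster.tfvars" -out "$cluster.tfplan" || return 1
  printf 'Apply this plan? [y/N] '
  read -r answer
  [ "$answer" = "y" ] || { echo "Aborted" >&2; return 1; }
  # A saved plan already contains the variables, so no -var-file is needed
  terraform apply "$cluster.tfplan"
}
```

The saved plan file can contain sensitive values, so it should not be committed to the repository.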
Export account credentials with finely scoped permissions for automatic deployment
In the previous step, we will have created an AWS IAM user with just enough permissions for automatic deployment of hubs from CI/CD. Since these credentials are checked in to our git repository and made public, they should have the least amount of permissions possible.
First, make sure you are in the right terraform directory:
cd terraform/aws
Fetch credentials for automatic deployment
terraform output -raw continuous_deployer_creds > ../../config/clusters/<your-cluster-name>/deployer-credentials.secret.json
Encrypt the file storing the credentials
sops --output ../../config/clusters/<your-cluster-name>/enc-deployer-credentials.secret.json --encrypt ../../config/clusters/<your-cluster-name>/deployer-credentials.secret.json
Double check that the config/clusters/<your-cluster-name>/enc-deployer-credentials.secret.json file is actually encrypted by sops before checking it in to the git repo. Otherwise this can be a serious security leak!
Grant the freshly created IAM user access to the kubernetes cluster. As this requires passing in some parameters that match the created cluster, we have a terraform output that can give you the exact command to run.
terraform output -raw eksctl_iam_command
Run the eksctl create iamidentitymapping command returned by terraform output. That should give the continuous deployer user access.
The command should look like this:
eksctl create iamidentitymapping \
   --cluster <your-cluster-name> \
   --region <your-cluster-region> \
   --arn arn:aws:iam::<aws-account-id>:user/hub-continuous-deployer \
   --username hub-continuous-deployer \
   --group system:masters
Create a minimal cluster.yaml file (config/clusters/<your-cluster-name>/cluster.yaml), and provide enough information for the deployer to find the correct credentials.
name: <your-cluster-name>
provider: aws
aws:
  key: enc-deployer-credentials.secret.json
  clusterType: eks
  clusterName: <your-cluster-name>
  region: <your-region>
hubs:
The path in aws.key is relative to the location of the cluster.yaml file.
Test the access by running deployer use-cluster-credentials <cluster-name> and then kubectl get node. It should show you the provisioned node on the cluster if everything works out ok.
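Before committing the encrypted credentials file from the steps above, a quick heuristic check can catch the case where the plaintext file was accidentally copied instead. sops adds a top-level "sops" metadata block to JSON files and replaces values with ENC[AES256_GCM,...] strings, so both markers should be present. This is a minimal sketch and a heuristic only; still open the file and look. The helper name looks_sops_encrypted is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: heuristic check that a JSON file looks sops-encrypted.
# Succeeds only if the file has a "sops" metadata key AND at least one
# AES256_GCM-encrypted value.
looks_sops_encrypted() {
  file="$1"
  grep -q '"sops"[[:space:]]*:' "$file" && grep -q 'ENC\[AES256_GCM,' "$file"
}
```

For example: looks_sops_encrypted ../../config/clusters/<your-cluster-name>/enc-deployer-credentials.secret.json || echo "NOT ENCRYPTED, do not commit".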
Grant eksctl access to other users
This section is still required even if the account is managed by SSO, though a user could also run python deployer use-cluster-credentials to gain access.
AWS EKS has a strange access control problem, where the IAM user who creates the cluster has full access without any visible settings changes, and nobody else does. You need to explicitly grant access to other users. Find the usernames of the 2i2c engineers on this particular AWS account, and run the following command to give them access:
You can modify the command output by running terraform output -raw eksctl_iam_command as described in Export account credentials with finely scoped permissions for automatic deployment.
eksctl create iamidentitymapping \
   --cluster <your-cluster-name> \
   --region <your-cluster-region> \
   --arn arn:aws:iam::<aws-account-id>:user/<iam-user-name> \
   --username <iam-user-name> \
   --group system:masters
This gives all the users full access to the entire kubernetes cluster. They can
fetch local config with
aws eks update-kubeconfig --name=<your-cluster-name> --region=<your-cluster-region>
after this step is done.
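When several engineers need access, the identity-mapping command above can be run once per IAM user. A minimal sketch follows; the helper name grant_cluster_access is hypothetical, and the IAM usernames and account ID are placeholders you supply. It stops at the first failing mapping.

```shell
#!/bin/sh
# Hypothetical helper: map each IAM user to system:masters on the cluster.
# Usage: grant_cluster_access <cluster> <region> <aws-account-id> <iam-user>...
grant_cluster_access() {
  cluster="$1" region="$2" account="$3"
  shift 3
  for user in "$@"; do
    eksctl create iamidentitymapping \
      --cluster "$cluster" \
      --region "$region" \
      --arn "arn:aws:iam::$account:user/$user" \
      --username "$user" \
      --group system:masters || return 1
  done
}
```

For example: grant_cluster_access <your-cluster-name> <your-cluster-region> <aws-account-id> alice bob, where alice and bob are placeholder IAM usernames.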
This should eventually be converted to use an IAM Role instead, so that we need not give each individual user access but can just grant access to the role, and then manage which users can assume it.
Export the EFS IP address for home directories
The terraform run in the previous step will also have created an EFS instance to store the hub home directories, and set up the network correctly to mount it.
Get the address a hub on this cluster should use for connecting to NFS with terraform output nfs_server_dns, and set it in the hub’s config under nfs.pv.serverIP (nested under basehub when necessary) in the appropriate values file.
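To avoid copying the address by hand, you can print the config fragment directly from the terraform output. A minimal sketch, assuming it is run from terraform/aws in the cluster's workspace. It uses terraform output -raw so the address is printed without surrounding quotes. The helper name nfs_config_snippet is hypothetical; pass basehub as the first argument for hubs whose config nests the setting under basehub.

```shell
#!/bin/sh
# Hypothetical helper: print the nfs.pv.serverIP fragment for a hub's config.
# Pass "basehub" as $1 to nest the fragment under basehub.
nfs_config_snippet() {
  server=$(terraform output -raw nfs_server_dns) || return 1
  if [ "${1:-}" = "basehub" ]; then
    printf 'basehub:\n  nfs:\n    pv:\n      serverIP: %s\n' "$server"
  else
    printf 'nfs:\n  pv:\n    serverIP: %s\n' "$server"
  fi
}
```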
Add the cluster to be automatically deployed
The CI deploy-hubs workflow contains the list of clusters being automatically deployed by our CI/CD system. Make sure there is an entry for the new AWS cluster.
A note on the support chart for AWS clusters
When you deploy the support chart on an AWS cluster, you must enable the cluster-autoscaler sub-chart, otherwise the node groups will not automatically scale. Include the following in your support chart values file:
cluster-autoscaler:
  enabled: true
  autoDiscovery:
    clusterName: <cluster-name>
  awsRegion: <aws-region>