Enable user access to cloud features
Users of our hubs often need specific cloud permissions so they can use features of the cloud provider their hub runs on, without having to do a lot of provider-specific setup themselves. This helps keep their code as cloud-provider agnostic as possible, while also improving the security posture of our hubs.
This page lists various features we offer around access to cloud resources, and how to enable them.
How it works
GCP
On Google Cloud Platform, we use Workload Identity to map a particular Kubernetes Service Account to a particular Google Cloud Service Account. All pods using that Kubernetes Service Account (users’ Jupyter notebook pods as well as Dask worker pods) get the permissions assigned to the Google Cloud Service Account. This Google Cloud Service Account is managed via terraform.
AWS
On AWS, we use IRSA (IAM Roles for Service Accounts) to map a particular Kubernetes Service Account to a particular AWS IAM Role. All pods using that Kubernetes Service Account (users’ Jupyter notebook pods as well as Dask worker pods) get the permissions assigned to the AWS IAM Role. This AWS IAM Role is managed via terraform.
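Both mechanisms reduce to an annotation on the Kubernetes Service Account whose value names the cloud identity. The minimal Python sketch below is for illustration only. It maps each cloud to the annotation key used in the terraform output examples on this page. The names ANNOTATION_KEYS and service_account_annotation are hypothetical, and in practice terraform computes these values for you.

```python
# Hypothetical helper, for illustration only: terraform produces these values
# in practice. The keys match the annotations shown in this page's examples.
ANNOTATION_KEYS = {
    # GKE Workload Identity: the value is a Google Cloud Service Account email
    "gcp": "iam.gke.io/gcp-service-account",
    # EKS IRSA: the value is an AWS IAM Role ARN
    "aws": "eks.amazonaws.com/role-arn",
}


def service_account_annotation(cloud: str, identity: str) -> dict[str, str]:
    """Return the annotation that maps a Kubernetes Service Account to `identity`."""
    try:
        key = ANNOTATION_KEYS[cloud]
    except KeyError:
        raise ValueError(f"unsupported cloud: {cloud!r}") from None
    return {key: identity}


print(service_account_annotation(
    "gcp", "meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com"))
print(service_account_annotation(
    "aws", "arn:aws:iam::740010314650:role/uwhackweeks-staging"))
```

A lookup table keeps the per-cloud difference in one place. Everything downstream, such as the values file, only needs the resulting key/value pair.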
Enabling specific cloud access permissions
1. In the .tfvars file for the cluster this hub runs on, create (or modify) the hub_cloud_permissions variable. The config looks like:

   hub_cloud_permissions = {
     "<hub-name-slug>": {
       requestor_pays : true,
       bucket_admin_access : ["bucket-1", "bucket-2"],
       hub_namespace : "<hub-name>"
     }
   }
   where:

   - <hub-name-slug> is the name of the hub, restricted in length: it and the cluster name together can’t be more than 29 characters. terraform will complain if you go over this limit, so in general just use the name of the hub and shorten it only if terraform complains.
   - (GCP only) requestor_pays enables permissions for user pods and Dask worker pods to identify as the project while making requests to Google Cloud Storage buckets marked as ‘requestor pays’. See the Google Cloud Storage documentation on Requester Pays for more details.
   - bucket_admin_access lists bucket names (as specified in the user_buckets terraform variable) that all users on this hub should have full read/write access to. It is used along with the user_buckets terraform variable to enable the scratch buckets feature.
   - (GCP only) hub_namespace is the full name of the hub, since hubs are put in Kubernetes Namespaces with the same names as the hubs. It is specified explicitly here because <hub-name-slug> may be truncated on GCP.
2. Run terraform apply -var-file=projects/<cluster-var-file>.tfvars, and look at the plan carefully. It should only create or modify IAM-related objects (such as roles and service accounts) and not touch anything else. When it looks good, accept the changes and apply them. This provisions a Google Cloud Service Account (on GCP) or an AWS IAM Role (on AWS) if needed, and grants it the appropriate permissions.

3. We need to connect the Kubernetes Service Account used by the Jupyter and Dask pods with this Google Cloud Service Account or AWS IAM Role. This is done by setting an annotation on the Kubernetes Service Account.
4. Run terraform output kubernetes_sa_annotations. This should show you a list of hubs and the annotation that must be set on each of them.

   On GCP:

   $ terraform output kubernetes_sa_annotations
   {
     "prod" = "iam.gke.io/gcp-service-account: meom-ige-prod@meom-ige-cnrs.iam.gserviceaccount.com"
     "staging" = "iam.gke.io/gcp-service-account: meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com"
   }

   On AWS:

   $ terraform output kubernetes_sa_annotations
   {
     "prod" = "eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-prod"
     "staging" = "eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-staging"
   }

   This shows the annotations for all the hubs in this cluster that are configured to provide cloud access. You only need to care about the hub you are currently dealing with.
5. (If needed) create a .values.yaml file specific to this hub under config/clusters/<cluster-name>, and add it under helm_chart_values_files for the appropriate hub in config/clusters/<cluster-name>/cluster.yaml.

6. Specify the annotation from step 4, nested under userServiceAccount.annotations.

   On GCP:

   userServiceAccount:
     annotations:
       iam.gke.io/gcp-service-account: meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com

   On AWS:

   userServiceAccount:
     annotations:
       eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-staging
   Note

   If the hub is a daskhub, nest the config under a basehub key.

7. Get this change deployed. Users should now be able to use the cloud features you enabled, such as requestor pays. Users whose servers are already running might have to restart them for the change to take effect.
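The bookkeeping in the steps above can be sketched in Python. This is a hypothetical helper, not part of our tooling, and it makes three assumptions:

- The 29-character limit is simply the sum of the cluster name and slug lengths. terraform remains the authority on the real limit.
- The input is the JSON form of the output, as printed by terraform output -json kubernetes_sa_annotations.
- Each annotation string can be split on its first ": ".

The sketch then nests the annotation under userServiceAccount.annotations, adding a basehub key for a daskhub.

```python
import json

# Assumption: the limit applies to len(cluster name) + len(slug).
# terraform itself will error out if the real limit is exceeded.
MAX_COMBINED_LENGTH = 29


def slug_fits(cluster_name: str, hub_slug: str) -> bool:
    """Check a hub slug against the combined cluster-name + slug length limit."""
    return len(cluster_name) + len(hub_slug) <= MAX_COMBINED_LENGTH


def annotation_for_hub(terraform_output_json: str, hub: str) -> dict[str, str]:
    """Turn one entry of `terraform output -json kubernetes_sa_annotations`
    into a {annotation-key: value} dict."""
    annotations = json.loads(terraform_output_json)
    # Split on the first ": " only; ARNs contain ':' but never ": ".
    key, _, value = annotations[hub].partition(": ")
    return {key: value}


def user_service_account_values(annotation: dict[str, str], daskhub: bool) -> dict:
    """Build the hub's values-file contents, nesting under basehub for a daskhub."""
    values = {"userServiceAccount": {"annotations": annotation}}
    return {"basehub": values} if daskhub else values


# Example input shaped like the GCP output shown above, in JSON form.
output = json.dumps({
    "prod": "iam.gke.io/gcp-service-account: meom-ige-prod@meom-ige-cnrs.iam.gserviceaccount.com",
    "staging": "iam.gke.io/gcp-service-account: meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com",
})
annotation = annotation_for_hub(output, "staging")
print(json.dumps(user_service_account_values(annotation, daskhub=False), indent=2))
```

Since JSON is also valid YAML, the printed output could be pasted into the hub's .values.yaml file. Writing the same structure by hand in YAML, as in the examples above, works just as well.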
Granting access to cloud buckets in other cloud accounts / projects
Sometimes, users on a hub we manage need access to a storage bucket managed by an external third party - often a different research group. This can help with access to raw data, collaboration, etc.
This section outlines how to grant this access. Currently, this functionality is implemented only on AWS, but we can add it for other cloud providers when needed.
AWS
On AWS, we need to set up cross-account S3 access.
1. Find the ARN of the AWS IAM Role that the hub’s user service account is mapped to. You can find it under userServiceAccount.annotations.eks.amazonaws.com/role-arn in the values.yaml file for your hub. It should look something like arn:aws:iam::<account-id>:role/<hub-name>.
2. In the AWS account that owns the S3 bucket, create a bucket policy (a resource-based IAM policy attached to the bucket) that grants the hub appropriate access to the bucket. For example, the following policy grants users of the hub read-only access to the bucket:

   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": { "AWS": "<arn-of-service-account-from-step-1>" },
         "Action": [ "s3:GetObject", "s3:GetObjectVersion", "s3:ListBucket" ],
         "Resource": [ "arn:aws:s3:::<name-of-bucket>", "arn:aws:s3:::<name-of-bucket>/*" ]
       }
     ]
   }

   You can add more permissions to this policy if needed.
   Note

   You can list as many buckets as you want, but each bucket needs two entries, one with /* and one without, so that both listing the bucket and fetching data from it work.

3. In the .tfvars file for the cluster hosting the hub, add extra_iam_policy as a key for the hub under hub_cloud_permissions. This key sets any additional IAM permissions granted to the users of the hub. In this case, copy the exact policy that was applied to the bucket in step 2, but remove the "Principal" key. It would look something like:

   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [ "s3:GetObject", "s3:GetObjectVersion", "s3:ListBucket" ],
         "Resource": [ "arn:aws:s3:::<name-of-bucket>", "arn:aws:s3:::<name-of-bucket>/*" ]
       }
     ]
   }
4. Apply the terraform config, and test whether S3 bucket access works on the hub!
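The relationship between the bucket policy and the extra_iam_policy value can be sketched in Python. This assumes read-only access, as in the example policy above. bucket_policy and without_principals are hypothetical helper names. The sketch only builds the JSON documents and does not call AWS. How the resulting JSON is embedded in the .tfvars file depends on how the extra_iam_policy variable is typed in our terraform config, which is not shown here.

```python
import copy
import json


def bucket_policy(role_arn: str, buckets: list[str]) -> dict:
    """Read-only bucket policy for the external AWS account (hypothetical helper)."""
    resources = []
    for bucket in buckets:
        # The bucket ARN lets s3:ListBucket work; the /* ARN covers its objects.
        resources += [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject", "s3:GetObjectVersion", "s3:ListBucket"],
                "Resource": resources,
            }
        ],
    }


def without_principals(policy: dict) -> dict:
    """Copy of `policy` with every "Principal" key removed, for extra_iam_policy."""
    stripped = copy.deepcopy(policy)  # leave the bucket policy itself untouched
    for statement in stripped["Statement"]:
        statement.pop("Principal", None)
    return stripped


policy = bucket_policy("arn:aws:iam::<account-id>:role/<hub-name>", ["<name-of-bucket>"])
print(json.dumps(policy, indent=2))                      # attach to the bucket
print(json.dumps(without_principals(policy), indent=2))  # goes in extra_iam_policy
```

Deriving the second document from the first keeps the Action and Resource lists identical in both places. The two policies only grant access when they agree.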