Enable user access to cloud features

Enable user access to cloud features#

Users of our hubs often need to be granted specific cloud permissions so they can use features of the cloud provider they are on, without having to do a bunch of cloud-provider specific setup themselves. This helps keep code cloud provider agnostic as much as possible, while also improving the security posture of our hubs.

This page lists various features we offer around access to cloud resources, and how to enable them.

How it works#

GCP#

On Google Cloud Platform, we use Workload Identity to map a particular Kubernetes Service Account to a particular Google Cloud Service Account. All pods using the Kubernetes Service Account (user’s jupyter notebook pods as well as dask worker pods) will have the permissions assigned to the Google Cloud Service Account. This Google Cloud Service Account is managed via terraform.

AWS#

On AWS, we use IRSA to map a particular Kubernetes Service Account to a particular AWS IAM Role. All pods using the Kubernetes Service Account (user’s jupyter notebook pods as well as dask worker pods) will have the permissions assigned to the AWS IAM Role. This AWS IAM Role is managed via terraform.

Enabling specific cloud access permissions#

  1. In the .tfvars file for the project in which this hub is based off create (or modify) the hub_cloud_permissions variable.

    Warning

    allow_access_to_external_requester_pays_buckets is not yet supported on AWS!

    The config is like:

    hub_cloud_permissions = {
        "<hub-name-slug>" : {
            allow_access_to_external_requester_pays_buckets : true,
            bucket_admin_access : ["bucket-1", "bucket-2"]
            hub_namespace : "<hub-name>"
        }
    }
    
    hub_cloud_permissions = {
        "<hub-name-slug>" : {
            bucket_admin_access : ["bucket-1", "bucket-2"]
            hub_namespace : "<hub-name>"
        }
    }
    

    where:

    1. <hub-name-slug> is the name of the hub, but restricted in length. This and the cluster name together can’t be more than 29 characters. terraform will complain if you go over this limit, so in general just use the name of the hub and shorten it only if terraform complains.

    2. (GCP only) allow_access_to_external_requester_pays_buckets enables permissions for user pods and dask worker pods to identify as the project while making requests to other Google Cloud Storage buckets, outside of this project, that have ‘Requester Pays’ enabled. More details here.

    3. bucket_admin_access lists bucket names (as specified in user_buckets terraform variable) all users on this hub should have full read/write access to. Used along with the user_buckets terraform variable to enable the scratch buckets feature.

    4. (GCP only) hub_namespace is the full name of the hub, as hubs are put in Kubernetes Namespaces that are the same as their names. This is explicitly specified here because <hub-name-slug> could possibly be truncated on GCP.

  2. Run terraform apply -var-file=projects/<cluster-var-file>.tfvars, and look at the plan carefully. It should only be creating or modifying IAM related objects (such as roles and service accounts), and not really touch anything else. When it looks good, accept the changes and apply it. This provisions a Google Cloud Service Account (if needed) and grants it the appropriate permissions.

  3. We will need to connect the Kubernetes Service Account used by the jupyter and dask pods with this Google Cloud Service Account. This is done by setting an annotation on the Kubernetes Service Account.

  4. Run terraform output kubernetes_sa_annotations, this should show you a list of hubs and the annotation required to be set on them:

    $ terraform output kubernetes_sa_annotations
    {
      "prod" = "iam.gke.io/gcp-service-account: meom-ige-prod@meom-ige-cnrs.iam.gserviceaccount.com"
      "staging" = "iam.gke.io/gcp-service-account: meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com"
    }
    
    $ terraform output kubernetes_sa_annotations
    {
      "prod" = "eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-prod"
      "staging" = "eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-staging"
    }
    

    This shows all the annotations for all the hubs configured to provide cloud access in this cluster. You only need to care about the hub you are currently dealing with.

  5. (If needed) create a .values.yaml file specific to this hub under config/clusters/<cluster-name>, and add it under helm_chart_values_files for the appropriate hub in config/clusters/<cluster-name>/cluster.yaml.

  6. Specify the annotation from step 4, nested under userServiceAccount.annotations.

    userServiceAccount:
      annotations:
        iam.gke.io/gcp-service-account: meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com"
    
    userServiceAccount:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-staging
    

    Note

    If the hub is a daskhub, nest the config under a basehub key

  7. Get this change deployed, and users should now be able to use the requester pays feature! Currently running users might have to restart their pods for the change to take effect.