Manage a hub’s user environment

Update the default user environment

The default user environment is specified in its own GitHub repository, located at 2i2c-org/2i2c-hubs-image.

The image is built with jupyterhub/repo2docker-action and pushed to the 2i2c-hubs-image repository on quay.io.

To update this environment:

  1. Make your changes to the environment by editing the files in the 2i2c-hubs-image repository and opening a pull request. The pull request also lets you test your changes on Binder to make sure everything works as expected.

  2. Once the PR is merged (or any commit lands on the main branch), the new image will be built and pushed to the registry.

  3. Get the latest tag from the tags section of the repository on quay.io.

  4. Update jupyterhub.singleuser.image.tag in helm-charts/basehub/values.yaml with this tag (see the example below).
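
    For example, the relevant part of helm-charts/basehub/values.yaml might look roughly like this (a sketch; the tag is a placeholder for whatever the latest tag on quay.io is):

     jupyterhub:
       singleuser:
         image:
           # only the tag needs to change when a new default image is published
           tag: "<latest-tag-from-quay.io>"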

Use a custom user image

Community hubs use an image we curate as the default. This can be replaced with your own custom image fairly easily. However, custom images should be maintained by the hub admins, and we will not be able to provide much support for their contents.

Image requirements

On top of whatever else you include, your image must meet the following requirements:

  1. The jupyterhub package must be installed such that the jupyterhub-singleuser command works when executed inside the image. 99% of the time, this just means installing the jupyterhub Python package.

  2. Everything must be able to run as a non-root user, most likely with uid 1000 (see the sanity check below).
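
A quick way to sanity-check both requirements is to run the image locally as uid 1000 and confirm the command is available. This is just a sketch; it assumes a POSIX shell exists in the image, and the image name and tag are placeholders:

    docker run --rm -u 1000 <your-image>:<tag> sh -c 'command -v jupyterhub-singleuser && id -u'
    # should print the path to jupyterhub-singleuser, followed by 1000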

There are a few options to help make this easier.

  1. Use repo2docker to build your image (see the example below).

  2. Use a Pangeo-curated Docker image. Their ‘onbuild’ variants make it easy to customize them.

  3. Use a Jupyter-curated Docker image.

We will not install anything on top of your image, so you have full control over what goes into your environment.
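
If you go the repo2docker route, you can build the same image locally to test it before publishing. This is a sketch; the image name and tag are placeholders:

    pip install jupyter-repo2docker
    # build an image from the current repository without starting a container
    jupyter-repo2docker --no-run --image-name <your-image>:<tag> .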

Which image registry to use?

While you can push your image to Docker Hub, it now enforces fairly strict rate limits, which could disrupt your hub if a new image cannot be pulled. We recommend the following image registries:

  1. quay.io. Owned by Red Hat / IBM. The easiest to get started with, and recommended as the default.

  2. Google Artifact Registry, if you already have infrastructure running on Google Cloud.

  3. AWS Elastic Container Registry, if you already have infrastructure running on AWS.

  4. GitHub Container Registry. It integrates better with GitHub, but does not yet have a clear policy on rate limits.
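
As an illustration, pushing a locally built image to quay.io might look like the following (the organization, image name, and tag are placeholders):

    # log in once with your quay.io credentials
    docker login quay.io
    # tag the local image with the registry path, then push it
    docker tag <local-image>:<tag> quay.io/<org>/<image>:<tag>
    docker push quay.io/<org>/<image>:<tag>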

Configuring your hub to use the custom image

We define arbitrary Zero to JupyterHub on Kubernetes values in config/clusters/CLUSTER_NAME/HUB_NAME.values.yaml files, which we can use to set the image name and tag. Here is an example showing only the relevant values:

jupyterhub:
  singleuser:
    image:
      name: pangeo/pangeo-notebook
      tag: "2020.12.08"

This can be any image name & tag available in a public container registry.

Whenever you push a new image, you should open a PR that updates the tag here. The hub will only get the new image once that PR is merged.

Another way to update the image is to use the configurator.

Split up an image for use with the repo2docker-action

Sometimes a user image is defined inside a larger repository, and we want to extract it into a standalone repository so it can be used with the repo2docker-action. We also want to retain the image’s full history, so we can look back and see why things are the way they are, and credit the people who contributed over time. This page documents the git-fu required to make this happen.

  1. Clone the base repository you are extracting the image from. We will be performing destructive operations on this clone, so it has to be a fresh clone.

    git clone <base-repo-url> <hub-image> --origin source
    

    We name the directory <hub-image>, as that will be the name of the repository containing just the user image. We ask git to name the remote it creates source, since we will later create a new GitHub repository that will be origin.

  2. Make sure git-filter-repo is installed. It is available via pip, brew, and most other package managers.
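
    Either of the following should work, for example:

     pip install git-filter-repo
     # or, with Homebrew:
     brew install git-filter-repo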

  3. Run git-filter-repo to remove everything from the repo except the image directory.

     git filter-repo --subdirectory-filter <path-to-image-directory> --force
    

    The repo root directory now contains the contents of <path-to-image-directory>, as well as full git history for any commits that touched it! This way, we do not lose history or attribution.

  4. Create a user image repository on GitHub based on this template. The template makes it much easier to set up the repo2docker-action on it.

  5. Add the new user image repository as a remote you can push commits to and pull commits from. This will now be the primary location of the repository, so let’s call it origin.

    git remote add origin git@github.com:<your-org-or-user-name>/<your-repo-name>.git
    
  6. Fetch the new repo and check out the main branch, as that is the final destination for our image contents.

     git fetch origin
     git checkout main
    
  7. Remove the environment.yml file from the repository - it is present only as an example, and we are bringing our own.

    git rm environment.yml
    git commit -m 'Remove unused environment.yml file'
    
  8. Merge the branch we prepared earlier (containing just our image contents) into this main branch, telling git not to object to the two branches having unrelated histories.

    git merge staging --allow-unrelated-histories  -m 'Bringing in image directory from deployment repo'
    git push origin main
    

    In this case, staging was the name of the branch in the source repo, and main is the name of the main branch in the new user image repo.

  9. Now follow the instructions in the README of your new repository to finish setting up the repo2docker-action and start using the image on your hub!