Configuration structure#
This page describes the basic structure of our hub configuration.
Inheritance of configuration#
These configuration sources are merged in the following order to produce config for each hub.
| Source | Maintainer | Notes |
|---|---|---|
| Upstream defaults from z2jh | z2jh maintainers | |
| `helm-charts/basehub/values.yaml` in this repo | 2i2c engineers | Common to all hubs run from this repo |
| `config/clusters/<cluster-name>/<hub-name>.values.yaml` in this repo | 2i2c engineers | There is one folder per cluster, and each cluster can have multiple hubs deployed, defined by a `cluster.yaml` file |
| Configurator schema defaults | 2i2c engineers | If there is a default set in the schema for available options in the configurator, it will always override the config elsewhere in our YAML files |
| Configurator | Hub admins | If hub admins 'unset' a value, it should fall back to what's configured via our YAML files |
Location of common configuration#
Finding the value of a particular piece of configuration can be tricky, since there are many places to look. Here are some common pieces of config people want to know the values of, and where to find them.
Memory limits#
The default memory limit and guarantee for all users across all our hubs is set in `helm-charts/basehub/values.yaml`, under `jupyterhub.singleuser.memory`. This is sometimes overridden on a per-hub basis in the config for the hub under `config/clusters`.
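As a rough sketch of what that setting looks like in a values file (the numbers below are illustrative, not the actual defaults):

```yaml
# Illustrative values only; the real defaults live in helm-charts/basehub/values.yaml
jupyterhub:
  singleuser:
    memory:
      limit: 2G       # maximum memory a user server may use
      guarantee: 1G   # memory reserved for each user server at scheduling time
```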
2i2c staff lists#
The 2i2c team keeps a central list of staff usernames that it automatically adds as administrators to each of the hubs that we deploy. When a hub is deployed, some custom code is run as part of `jupyterhub_config.py`, and the usernames from the appropriate staff list are added to the hub's Admin and Allowed users. You can find the list of staff usernames in the `basehub` helm chart, along with the custom code used to inject them.
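Conceptually, the staff list is just a list of usernames kept in the chart's values. As a purely hypothetical sketch (the key names below are invented for illustration; the real keys live in the `basehub` helm chart):

```yaml
# Hypothetical key names for illustration only; see the basehub helm chart for the real ones
staffUsers:
  github:
    - staff-member-1
    - staff-member-2
```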
Configuration structure#
We have a `config/clusters` directory, which contains one folder per cluster; each folder holds the configuration files describing that cluster and the JupyterHubs running on it. See the 2i2c cluster folder for an example.
Each cluster folder must contain one of each of the following files:

- `cluster.yaml`: This file contains cluster-wide configuration, including: the provider the cluster is hosted on, the path to a credentials file that grants programmatic access to the cluster, a list of JupyterHubs running on the cluster, and the paths to the configuration files that individually describe those JupyterHubs. We follow the convention that the `name` field defined in this file should match the name of the cluster folder it resides within.
- Deployer credentials: A `sops`-encrypted YAML or JSON file that provides programmatic access to a cluster by updating a `KUBECONFIG` file. These credential files follow the naming convention `enc-deployer-credentials.secret.{{ json | yaml }}` when encrypted.
- (Optional) Grafana API token: This token allows us to programmatically create a range of useful dashboards in a cluster's Grafana deployment, allowing us to inspect and track the usage of all the JupyterHubs deployed to that cluster. See the Grafana dashboards documentation for more information on how this token is used. These token files follow the naming convention `enc-grafana-token.secret.yaml` when encrypted.
Additionally, the cluster folder can contain any number of helm chart values files describing the individual JupyterHubs running on the cluster. These files follow the naming convention `<hub-name>.values.yaml`, where `<hub-name>` matches a `hubs.[*].name` entry in the `cluster.yaml` file, and the path to the values file is listed under the associated `helm_chart_values_files` key, relative to the `cluster.yaml` file.
See the config for the staging hub on the 2i2c cluster for an example.
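As a rough, abridged sketch of how these pieces fit together (field values and provider-specific keys here are illustrative; refer to the linked example for the real structure):

```yaml
# Abridged, illustrative cluster.yaml; values are made up for this sketch
name: example-cluster            # matches the name of the cluster folder
provider: gcp                    # the provider the cluster is hosted on
gcp:
  key: enc-deployer-credentials.secret.json   # sops-encrypted deployer credentials
hubs:
  - name: staging                # a hubs.[*].name entry
    helm_chart_values_files:     # paths relative to this cluster.yaml
      - staging.values.yaml
```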
Secret helm chart values files
A hub's helm chart values file can be encrypted as well, following the naming convention `enc-<hub-name>.secret.values.yaml`.
Where we run dedicated clusters that only host a `staging` and a `prod` hub, we aggregate all helm chart values shared by both hubs into a `common.values.yaml` file, and then describe the helm chart values specific to either `staging` or `prod` in a `staging.values.yaml` or `prod.values.yaml` file respectively.
See the Pangeo config for an example.
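Sketching that layout (only the relevant `hubs` entries are shown; this mirrors the naming convention above rather than any specific cluster):

```yaml
# Illustrative excerpt of a cluster.yaml for a dedicated staging + prod cluster
hubs:
  - name: staging
    helm_chart_values_files:
      - common.values.yaml       # values shared by both hubs
      - staging.values.yaml      # values specific to the staging hub
  - name: prod
    helm_chart_values_files:
      - common.values.yaml
      - prod.values.yaml         # values specific to the prod hub
```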
Conventions for our configuration structure#
When designing our configuration structure, we apply the following conventions.
1. The `cluster.yaml` file of any given cluster is the only file whose filepath our deployment infrastructure should have to calculate. Other files required to deploy a given hub onto that cluster should be explicitly defined relative to the position of the `cluster.yaml` file.

2. We should move configuration files around as needed to fit convention 1 in the most logical manner, without traversing a lot of directories up/down where possible. We want to avoid (though we do not strictly enforce this) overly-complicated relative filepaths, such as:

   ```yaml
   helm_chart_values_files:
     - this/is/a/directory/file.values.yaml
     - ../../../../../who/knows/where/this/is/file.values.yaml
   ```

   While the first path could be considered reasonable since it is explicit, the second can become confusing due to the repetition of the `..` operator. If we find ourselves writing complicated relative filepaths like this, we should reassess and move that config somewhere more logical.

3. In the spirit of convention 2, we should not house our secrets under a separate folder that mimics the structure of `config/clusters`. Instead, we define all config for a cluster, encrypted or not, in the same place, and use filenames to track whether a file should be encrypted and whether it is currently encrypted. See Secrets and private keys for more information on this.
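Putting convention 3 together with the naming conventions above, encrypted and unencrypted files sit side by side in the cluster folder; a hypothetical folder might look like:

```
config/clusters/example-cluster/
├── cluster.yaml
├── enc-deployer-credentials.secret.json
├── enc-grafana-token.secret.yaml
├── common.values.yaml
├── staging.values.yaml
├── enc-staging.secret.values.yaml
├── prod.values.yaml
└── enc-prod.secret.values.yaml
```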