Event infrastructure preparation checklist#
The main aspects of a hub to consider adjusting when preparing it for an event are listed below:
1. Quotas#
We must ensure that the quotas from the cloud provider are high enough to handle expected usage. The number of users attending the event might be very large, or their expected resource usage might be high, or both. Either way, we need to check that the existing quotas will accommodate the new numbers.
Action to take
follow the AWS quota guide for information about how to check the quotas in an AWS project
follow the GCP quota guide for information about how to check the quotas in a GCP project
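As a rough aid for this check, the sketch below estimates how many vCPUs an event might need and compares that with a quota. The function name, the 200-user figure, the 8 users per node, the 4 vCPUs per node, and the quota of 128 vCPUs are all hypothetical values chosen for illustration; the real quota must be read from the cloud provider as described in the guides.

```python
import math


def vcpus_needed(expected_users: int, users_per_node: int, vcpus_per_node: int) -> int:
    """Estimate the vCPUs required if every expected user is active at once."""
    nodes = math.ceil(expected_users / users_per_node)
    return nodes * vcpus_per_node


# Hypothetical event: 200 users, 8 users per node, 4 vCPUs per node.
needed = vcpus_needed(expected_users=200, users_per_node=8, vcpus_per_node=4)
quota = 128  # hypothetical CPU quota, as read from the cloud provider
print(f"need {needed} vCPUs, quota is {quota}, enough: {needed <= quota}")
```

The estimate ignores core nodes and other workloads on the cluster, so it is a lower bound rather than a full capacity plan.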
3. Pre-warm the hub to reduce wait times#
There are two mechanisms that we can use to pre-warm a hub before an event:
making sure some nodes are ready when users arrive
This can be done using node sharing via profile lists or by setting a minimum node count.
Note
You can read more about what to consider when setting resource allocation options in profile lists in Resource Allocation on Profile Lists.
Specifically for events, the node sharing benefits via profile lists vs. setting a minimum node count are:
no terraform/eks infrastructure changes
they shouldn’t require modifying terraform/eks code in order to change the underlying cluster architecture, thanks to our instance type choice, which should cover most usage needs
more cost flexibility
we can set up the infrastructure a few days before the event by opening a PR, and then merge it as close to the event as possible. Deploying an infrastructure change for an event a few days ahead isn’t as costly as starting “x” nodes in advance, which requires an engineer to be available to make terraform changes as close to the event as possible in order to limit costs
less engineering intervention needed
the instructors are empowered to “pre-warm” the hub by starting notebook servers on nodes they wish to have ready.
making sure the user image is not huge; otherwise, pre-pulling it must be considered
3.1. Node sharing via profile lists#
Important
Currently, this is the recommended way to handle an event on a hub. However, for communities that don’t already use profile lists, setting one up just before an event might be confusing. In that case, we might want to consider setting a minimum node count instead.
During events, we want to tilt the balance towards reducing server startup time. The docs at Resource Allocation on Profile Lists have more information about all the factors that should be considered during resource allocation.
Assuming this hub already has a profile list, before an event, you should check the following:
Information is available
Make sure the information in the event GitHub issue was filled in, especially the number of expected users before an event and their expected resource needs (if that can be known by the community beforehand).
Given the current setup, calculate how many users will fit on a node.
Check that the current number of users/node respects the following general event wishlist.
Minimize startup time
have at least 3-4 people on a node, as few users per node cause longer startup times, but no more than ~100
don’t have more than 30% of the users waiting for a node to come up
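The users-per-node calculation can be sketched as follows. This is a minimal illustration that considers memory guarantees only. The allocatable memory figure and the per-user guarantee are hypothetical and should be replaced with the real values for the node pool. It only covers the bounds of 3 to ~100 users per node; the 30% waiting rule depends on how users arrive and is not modelled here.

```python
def users_per_node(node_allocatable_bytes: int, mem_guarantee_bytes: int) -> int:
    """Number of user pods that fit on one node, going by memory guarantee only."""
    return node_allocatable_bytes // mem_guarantee_bytes


def within_event_wishlist(users: int, low: int = 3, high: int = 100) -> bool:
    """True when the users-per-node count sits between the wishlist bounds."""
    return low <= users <= high


# Hypothetical values: check the real allocatable memory of the node.
NODE_ALLOCATABLE = 29_298_290_688  # bytes available to user pods (assumed)
MEM_GUARANTEE = 3_662_286_336      # bytes guaranteed per user (assumed)

fit = users_per_node(NODE_ALLOCATABLE, MEM_GUARANTEE)
print(f"{fit} users per node, within wishlist: {within_event_wishlist(fit)}")
```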
Action to take
If the current number of users per node doesn’t respect the rules above, you should adjust the instance type so that it does. Note that if you are changing the instance type, you should also consider re-writing the allocation options, especially if you are going with a smaller machine than the original one.
deployer generate resource-allocation choices <instance type>
Don’t oversubscribe resources
The oversubscription factor is how much larger a limit is than the actual request (i.e. the minimum guaranteed amount of a resource that is reserved for a container). A larger factor allows more efficient node packing: most users don’t use resources up to their limit, so more users can fit on a node.
However, a bigger oversubscription factor also means that users who use more resources than they are guaranteed can get their kernels killed or their CPU throttled at unpredictable times, depending on what other users are doing. This inconsistent behavior is confusing to end users and the hub, so we should try to avoid it during events.
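To make the definition concrete, the factor is simply the limit divided by the guarantee. The helper below is a hypothetical illustration and not part of the deployer; the numbers are example values for a single allocation choice.

```python
def oversubscription_factor(guarantee: float, limit: float) -> float:
    """How many times larger the limit is than the guaranteed request."""
    if guarantee <= 0:
        raise ValueError("guarantee must be positive")
    return limit / guarantee


# Example values for one allocation choice (illustrative only).
mem_factor = oversubscription_factor(guarantee=3662286336, limit=3662286336)
cpu_factor = oversubscription_factor(guarantee=0.435625, limit=3.485)
print(f"memory factor: {mem_factor:.2f}, CPU factor: {cpu_factor:.2f}")
```

A memory factor of 1 means a user can never be pushed out by a neighbour's memory usage, which is the behaviour we want during an event.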
Action to take
For an event, you should consider an oversubscription factor of 1.
if the instance type remains unchanged, then just adjust the limit to match the memory guarantee if not already the case
if the instance type also changes, then you can use the deployer generate resource-allocation command, passing it the new instance type and optionally the number of choices. You can then use its output to:
either replace all allocation options with the ones for the new node type
or pick the choice(s) that will be used during the event based on expected usage and just don’t show the others
Example
For example, if the community expects to only use ~3GB of memory during an event, and no other users are expected to use the hub for the duration of the event, then you can choose to only make available that one option.
Assuming they had 4 options on an n2-highmem-2 machine and we wish to move them to an n2-highmem-4 for the event, we could run:
deployer generate resource-allocation choices n2-highmem-4 --num-allocations 4
which will output:
# pick this option to present the single ~3GB memory option for the event
mem_3_4:
  display_name: 3.4 GB RAM, upto 3.485 CPUs
  kubespawner_override:
    mem_guarantee: 3662286336
    mem_limit: 3662286336
    cpu_guarantee: 0.435625
    cpu_limit: 3.485
    node_selector:
      node.kubernetes.io/instance-type: n2-highmem-4
  default: true
mem_6_8:
  display_name: 6.8 GB RAM, upto 3.485 CPUs
  kubespawner_override:
    mem_guarantee: 7324572672
    mem_limit: 7324572672
    cpu_guarantee: 0.87125
    cpu_limit: 3.485
    node_selector:
      node.kubernetes.io/instance-type: n2-highmem-4
(...2 more options)
And we would have this in the profileList configuration:
profileList:
  - display_name: Workshop
    description: Workshop environment
    default: true
    kubespawner_override:
      image: python:6ee57a9
    profile_options:
      requests:
        display_name: Resource Allocation
        choices:
          mem_3_4:
            display_name: 3.4 GB RAM, upto 3.485 CPUs
            kubespawner_override:
              mem_guarantee: 3662286336
              mem_limit: 3662286336
              cpu_guarantee: 0.435625
              cpu_limit: 3.485
              node_selector:
                node.kubernetes.io/instance-type: n2-highmem-4
Warning
The deployer generate resource-allocation command:
can only generate options where guarantees (requests) equal limits!
only supports the instance types located in the node-capacity-info.json file
3.2. Setting a minimum node count on a specific node pool#
Warning
This section is a Work in Progress!
3.3. Pre-pulling the image#
Warning
This section is a Work in Progress!
Relevant discussions:
Important
To get a deeper understanding of the resource allocation topic, you can read up on these issues and documentation pieces:
3.4. Add user placeholders#
Introducing user placeholders#
Whilst one strategy for minimising startup times is to increase the number of user pods that fit onto a single node (thereby reducing the number of times a node needs to be scaled up across the event), this approach will still introduce waiting points when a scale up is required.
A useful mechanism for reducing the likelihood of a blocking scale up is the use of user-placeholders. This technique involves scheduling pods that represent a “seat” or “placeholder” for anticipated user servers. Once users join the cluster, these pods are evicted by the scheduler to make room for the real user pods.
To illustrate this, consider a hub with a dedicated user nodepool. Each node can support 64 singleuser pods. Let’s imagine that the number of user placeholder replicas is set to 32:
Initially, there will be 32 user placeholder pods running on a single user node.
For the next 32 users that join, no scaling up will occur, and once all of the users have joined, the node will be “full” with 32 users and 32 placeholders.
Once the 33rd user joins, one of the user placeholders will be evicted, triggering a scale up to maintain the requirement of 32 user placeholder replicas.
Intuitively, one might imagine that the 34th user (and the 35th, etc.) will immediately be able to spawn their server pod, as the scheduler continues to evict placeholder pods. In practice, this is not what happens. Instead, once the first placeholder pod is evicted and the autoscaler triggers a scale up, subsequent user pods are scheduled directly on the not-yet-ready node. This introduces head-of-line blocking for 31 of these 32 subsequent users.
Choosing placeholder resources#
As such, it is more effective to create a single placeholder pod that represents $N$ users. For example, if we wish to reserve capacity for 32 users, and each user is guaranteed 1GiB of memory, then we would create a placeholder pod with a 32GiB memory request. In the extreme case, we might wish to reserve an entire node. We must be careful not to request more RAM than is available after the kube-system pods have been started on the node. In practice, this might mean leaving ~2GiB of memory free (though this should be confirmed through testing).
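The sizing rule can be sketched as a small calculation. The helper name, the 64 GiB node size, and the 2 GiB reserve are assumptions for illustration; the real headroom needed by system pods should be measured on the cluster.

```python
GIB = 1024 ** 3


def placeholder_request_bytes(n_users: int, per_user_guarantee: int,
                              node_memory: int, reserved: int) -> int:
    """Memory request for one placeholder pod standing in for n_users,
    capped at what a node can offer once system pods are accounted for."""
    wanted = n_users * per_user_guarantee
    ceiling = node_memory - reserved
    return min(wanted, ceiling)


# 32 users at 1 GiB each, on an assumed 64 GiB node with ~2 GiB kept for system pods.
half_node = placeholder_request_bytes(32, 1 * GIB, 64 * GIB, 2 * GIB)
# Extreme case: try to reserve the whole node; the request is capped below 64 GiB.
whole_node = placeholder_request_bytes(64, 1 * GIB, 64 * GIB, 2 * GIB)
print(half_node // GIB, "GiB;", whole_node // GIB, "GiB")
```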
In event conditions, we might anticipate a particular rate at which users attempt to start their servers. Measurements of this value can be derived from previous event instances. If one such event needs to support 60 users every 3 minutes, and it takes 10 minutes for a node to spin up, we will need to have room for $60/3 * 10 = 200$ users. We can compute the appropriate number of resources accordingly.
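The arithmetic above can be written as a short helper, which makes it easy to try other arrival rates or node startup times. The function name and the 1 GiB per-user guarantee used to convert users into memory are hypothetical.

```python
def users_during_scale_up(users_per_batch: int, batch_minutes: float,
                          node_spin_up_minutes: float) -> float:
    """Users expected to arrive while a new node is still coming up."""
    arrival_rate = users_per_batch / batch_minutes  # users per minute
    return arrival_rate * node_spin_up_minutes


# 60 users every 3 minutes, with a node taking 10 minutes to become ready.
headroom_users = users_during_scale_up(60, 3, 10)

# Assumed memory guarantee of 1 GiB per user (hypothetical).
headroom_gib = headroom_users * 1
print(f"room for {headroom_users:.0f} users, about {headroom_gib:.0f} GiB of memory")
```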
Auxiliary considerations#
Simply ensuring that a node is ready to accept user pods is not sufficient to ensure that users do not experience delays when the user node pool is scaled up. In order to start user pods, the cluster needs to pull their respective OCI container images from the container registry. If we do not do this ahead of time, the first user to require a particular image will need to wait for it to be pulled to the cluster, as will all other image users.
We can anticipate this by pre-pulling the various images required at node startup time, using the continuous image puller (see 3.3. Pre-pulling the image). This should only be used on dedicated nodepools in which the number of images is small (as pulling a set of images delays the time until the node is considered available to user pods).
An example configuration might look something like:
jupyterhub:
  prePuller:
    continuous:
      enabled: true
  scheduling:
    userPlaceholder:
      # Keep at least half of a 64 GiB node free
      replicas: 1
      resources:
        requests:
          memory: 32Gi