Resizing Prometheus’ disk

We may need to resize the disk where Prometheus stores its metrics data as the amount of data we collect grows.

On GCP clusters, the storage classes are set by default to permit auto-expansion. Therefore, simply defining a new persistent volume size in the support chart values and redeploying should suffice. However, this may not be the case on other cloud providers, and the steps below will walk you through resizing the disk manually.
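
For reference, on a GCP cluster the whole process usually amounts to nothing more than the following (a minimal sketch; the exact location of the cluster's support values file depends on your repository layout):

# Bump prometheus.server.persistentVolume.size in the cluster's support
# values file, then redeploy the support chart
deployer deploy-support <cluster-name>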

Resizing the disk

Before running any of the commands below, set up your terminal and authenticate against the cluster:

# Set the KUBE_EDITOR env var to point to a text editor you're comfortable with
export KUBE_EDITOR="/usr/bin/nano"

# Set the name of the cluster to work against
export CLUSTER_NAME=...

# Authenticate against the cluster
deployer use-cluster-credentials $CLUSTER_NAME

  1. Set the desired size of the Prometheus server persistent volume in the relevant support.values.yaml file.

    prometheus:
      server:
        persistentVolume:
          size: <desired-size>
    
  2. Check the reclaim policy on the persistent volume.

    # List all the PVs. They are not namespaced.
    kubectl get pv
    
  3. Edit the persistent volume’s reclaim policy to be Retain if it is not already. This will prevent us from losing the data Prometheus has already collected. (A non-interactive alternative using kubectl patch is sketched after this list.)

    kubectl edit pv <pv-name>
    
  4. Check the value of ALLOWVOLUMEEXPANSION of the default storage class, identified by (default) next to its name.

    kubectl get storageclass
    
  5. Set ALLOWVOLUMEEXPANSION to true if it is not. This will allow the persistent volumes to be dynamically resized.

    kubectl patch storageclass <storage-class-name> --patch '{"allowVolumeExpansion": true}'
    

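If you prefer non-interactive commands, steps 3 and 5 above can also be done with kubectl patch. A minimal sketch, where <pv-name> and <storage-class-name> are the names found in steps 2 and 4:

# Step 3: set the reclaim policy to Retain so the data survives the PVC being recreated
kubectl patch pv <pv-name> --patch '{"spec": {"persistentVolumeReclaimPolicy": "Retain"}}'

# Step 5: allow volume expansion on the default storage class
kubectl patch storageclass <storage-class-name> --patch '{"allowVolumeExpansion": true}'

# Confirm both changes took effect
kubectl get pv <pv-name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}{"\n"}'
kubectl get storageclass <storage-class-name> -o jsonpath='{.allowVolumeExpansion}{"\n"}'
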
Note

At this point, you could try to redeploy the support chart and see if it succeeds. If it doesn’t, continue with the steps below.
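
A quick way to check, using the PVC name referenced in the steps below:

deployer deploy-support $CLUSTER_NAME

# If the resize went through, the PVC should report the new capacity
kubectl -n support get pvc support-prometheus-server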

  1. Delete the persistent volume claim for the Prometheus server. Persistent volume claims cannot be patched, so we will need to recreate it.

    # List all PVCs in the support namespace
    kubectl -n support get pvc
    
    # Delete the prometheus server PVC
    kubectl -n support delete pvc support-prometheus-server
    
  2. In another terminal with the CLUSTER_NAME variable set, redeploy the support chart. It should fail with the PVC in a Pending state.

    deployer deploy-support $CLUSTER_NAME
    
  3. Edit the persistent volume to have the same UID and resource version as the newly created PVC under spec.claimRef. (A scripted version of this step is sketched after this list.)

    # Get the UID and resource version of the PVC
    kubectl -n support get pvc support-prometheus-server -o yaml
    
    # Edit the PV to reference these values under `spec.claimRef`
    kubectl edit pv <pv-name>
    
  4. Delete the Prometheus server pod and check that it comes back up.

    kubectl -n support delete pod support-prometheus-server-<hash>
    kubectl -n support get pods --watch
    
  5. Redeploy the support chart again; this time it should succeed.

    deployer deploy-support $CLUSTER_NAME
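
If you would rather script the claimRef edit in step 3 above than hand-edit the persistent volume, here is a minimal sketch (the PV_NAME, PVC_UID and PVC_RV shell variables are illustrative names, not something the deployer sets for you):

# Name of the Prometheus server PV, as listed by `kubectl get pv`
PV_NAME=<pv-name>

# Read the UID and resource version of the newly created PVC
PVC_UID=$(kubectl -n support get pvc support-prometheus-server -o jsonpath='{.metadata.uid}')
PVC_RV=$(kubectl -n support get pvc support-prometheus-server -o jsonpath='{.metadata.resourceVersion}')

# Point the PV's claimRef at the new PVC so the two can bind again
kubectl patch pv $PV_NAME --type merge --patch \
  "{\"spec\": {\"claimRef\": {\"uid\": \"$PVC_UID\", \"resourceVersion\": \"$PVC_RV\"}}}"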