Configure HostPath Storage For Prometheus on OpenShift v3.11

This post covers the configuration of persistent HostPath storage for the OpenShift monitoring stack.

In OpenShift v3.11, Prometheus cluster monitoring is now fully supported and deployed by default. As the monitoring stack is fairly new, there are many ongoing discussions about which storage plugins are supported, and the documentation does not cover the detailed steps of setting up the storage. Configurability is also limited with the Operator in place, which "protects" the configuration.

Through various implementations that required HostPath, I have figured out a way to make HostPath storage work with the Operator.

Configuration Steps

Assumptions

  • The HostPath SCC has already been created (see the sketch below)
  • The SCC allows the hostPath directory volume plugin
  • Access to this SCC has been granted to all users
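
For reference, here is a minimal sketch of such an SCC and the grant. The SCC name hostpath is illustrative, and granting it to the system:authenticated group (i.e. all users) may be broader than your security policy allows; adapt accordingly:

      $ vi scc-hostpath.yml
      apiVersion: security.openshift.io/v1
      kind: SecurityContextConstraints
      metadata:
        name: hostpath
      # Allow pods to mount hostPath directory volumes
      allowHostDirVolumePlugin: true
      runAsUser:
        type: RunAsAny
      seLinuxContext:
        type: RunAsAny
      fsGroup:
        type: RunAsAny
      supplementalGroups:
        type: RunAsAny

      $ oc create -f scc-hostpath.yml
      $ oc adm policy add-scc-to-group hostpath system:authenticated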

Inventory file

It is mandatory to set the following variables in the inventory file:

  • openshift_cluster_monitoring_operator_node_selector={'role':'metrics'}
    I have created a label, role=metrics, to land the pods on the nodes dedicated to Prometheus/Alertmanager. These nodes should contain the HostPath directories for persistent storage.
  • openshift_cluster_monitoring_operator_prometheus_storage_capacity=50G
    The persistent volume claim size for each of the Prometheus instances. This variable applies only if openshift_cluster_monitoring_operator_prometheus_storage_enabled is set to true. Defaults to 50Gi.
  • openshift_cluster_monitoring_operator_alertmanager_storage_capacity=5Gi
    The persistent volume claim size for each of the Alertmanager instances. This variable applies only if openshift_cluster_monitoring_operator_alertmanager_storage_enabled is set to true. Defaults to 2Gi.
  • openshift_cluster_monitoring_operator_prometheus_storage_enabled=true
    Enables persistent storage for Prometheus.
  • openshift_cluster_monitoring_operator_alertmanager_storage_enabled=true
    Enables persistent storage for Alertmanager.
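
Put together, the relevant block of my inventory file looks like this (all values are taken from the list above; [OSEv3:vars] is the standard openshift-ansible variable group):

      [OSEv3:vars]
      openshift_cluster_monitoring_operator_node_selector={'role':'metrics'}
      openshift_cluster_monitoring_operator_prometheus_storage_enabled=true
      openshift_cluster_monitoring_operator_prometheus_storage_capacity=50G
      openshift_cluster_monitoring_operator_alertmanager_storage_enabled=true
      openshift_cluster_monitoring_operator_alertmanager_storage_capacity=5Gi

The dedicated nodes must also carry the role=metrics label for the node selector to match, for example:

      $ oc label node <NODE> role=metrics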

Run the playbook

$ ansible-playbook -i <INVENTORY> /usr/share/ansible/openshift-ansible/playbooks/openshift-monitoring/config.yml

Node HostPath configuration

  • Find the namespace ID:
      $ oc get namespace openshift-monitoring -o yaml
      apiVersion: v1
      kind: Namespace
      metadata:
        annotations:
          openshift.io/description: Openshift Monitoring
          openshift.io/display-name: ""
          openshift.io/node-selector: ""
          openshift.io/sa.scc.mcs: s0:c11,c0
          openshift.io/sa.scc.supplemental-groups: 1000110000/10000
          openshift.io/sa.scc.uid-range: 1000110000/10000
        creationTimestamp: 2019-04-29T03:48:23Z
        labels:
          openshift.io/cluster-monitoring: "true"
        name: openshift-monitoring
        resourceVersion: "15977564"
        selfLink: /api/v1/namespaces/openshift-monitoring
        uid: a3173b3e-6a31-11e9-b915-000c29451a06
      spec:
        finalizers:
        - kubernetes
      status:
        phase: Active

In the example above, my namespace ID is 1000110000 (taken from the openshift.io/sa.scc.uid-range annotation).
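
If you only need the UID range, you can also pull the annotation directly (a convenience one-liner; the dots in the annotation key must be escaped):

      $ oc get namespace openshift-monitoring -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'
      1000110000/10000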

  • Change the ownership and SELinux label of the HostPath directories:
      $ chown 1000110000:root /metrics/
      $ semanage fcontext -a -t svirt_sandbox_file_t /metrics
      $ restorecon -v /metrics/
      $ chown 1000110000:root /alertmanager/
      $ semanage fcontext -a -t svirt_sandbox_file_t /alertmanager
      $ restorecon -v /alertmanager/
    
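Note that these directories must already exist on the node before their ownership and labels can be changed; create them first if needed:

      $ mkdir -p /metrics /alertmanager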

Post Installation - Prometheus

After the playbook has completed successfully, you will see that the pods are starting:

 $ oc get pods
 NAME                                          READY     STATUS    RESTARTS   AGE
 cluster-monitoring-operator-6566bf44b-sb8wk   1/1       Running   0          1m
 prometheus-operator-77bc9b6c68-gw92h          1/1       Running   0          28s
  
 $ oc get pods
 NAME                                          READY     STATUS    RESTARTS   AGE
 cluster-monitoring-operator-6566bf44b-sb8wk   1/1       Running   0          2m
 grafana-67cb69b946-684vc                      2/2       Running   0          55s
 prometheus-k8s-0                              0/4       Pending   0          5s
 prometheus-operator-77bc9b6c68-gw92h          1/1       Running   0          1m

By default, when openshift_cluster_monitoring_operator_prometheus_storage_enabled is set to true, the deployment creates a default storage class, and prometheus-k8s-0 will remain in the Pending state until it has successfully claimed its PV.
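
To see why the claim is pending, inspect its events:

      $ oc describe pvc prometheus-k8s-db-prometheus-k8s-0 -n openshift-monitoring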

  • The first thing to do is to delete the PVC created by the deployment:
      $ oc get pvc
      NAME                                 STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      prometheus-k8s-db-prometheus-k8s-0   Pending                                                      4m
    
      $ oc delete pvc prometheus-k8s-db-prometheus-k8s-0
      persistentvolumeclaim "prometheus-k8s-db-prometheus-k8s-0" deleted
    
  • Create the new PV for Prometheus:
      $ vi pv-prometheus-0.yml
    
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: prometheus-k8s-db-prometheus-k8s-0
      spec:
        capacity:
          storage: 100G
        accessModes:
          - ReadWriteOnce
        hostPath:
          path: /metrics
        persistentVolumeReclaimPolicy: Retain
        claimRef:
          name: prometheus-k8s-db-prometheus-k8s-0
          namespace: openshift-monitoring
    
      $ oc create -f pv-prometheus-0.yml
    

    Note that /metrics is the HostPath attached to the node.

  • Create the new PVC for Prometheus:
      $ vi pvc-prometheus-0.yml
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: prometheus-k8s-db-prometheus-k8s-0
      spec:
        resources:
          requests:
            storage: 100G
        accessModes:
          - ReadWriteOnce
    
      $ oc create -f pvc-prometheus-0.yml -n openshift-monitoring
    
  • After this is done, the second Prometheus pod will start to spin up, so repeat the same steps, creating a new PV and PVC named prometheus-k8s-db-prometheus-k8s-1:
      $ vi pv-prometheus-1.yml
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: prometheus-k8s-db-prometheus-k8s-1
      spec:
        capacity:
          storage: 100G
        accessModes:
          - ReadWriteOnce
        hostPath:
          path: /metrics
        persistentVolumeReclaimPolicy: Retain
        claimRef:
          name: prometheus-k8s-db-prometheus-k8s-1
          namespace: openshift-monitoring
    
      $ oc create -f pv-prometheus-1.yml
    
      $ vi pvc-prometheus-1.yml
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: prometheus-k8s-db-prometheus-k8s-1
      spec:
        resources:
          requests:
            storage: 100G
        accessModes:
          - ReadWriteOnce
    
      $ oc create -f pvc-prometheus-1.yml -n openshift-monitoring
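
With both PVs and PVCs in place, the two Prometheus pods should claim their volumes and leave the Pending state; you can follow the progress with:

      $ oc get pods -n openshift-monitoring -w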

Post Installation - Alertmanager

Once the two Prometheus pods have spun up successfully, the Alertmanager pods will start spinning up right after:

  $ oc get pods
  NAME                                          READY     STATUS    RESTARTS   AGE
  alertmanager-main-0                           0/3       Pending   0          42s
  cluster-monitoring-operator-6566bf44b-sb8wk   1/1       Running   0          14m
  grafana-67cb69b946-684vc                      2/2       Running   0          13m
  prometheus-k8s-0                              4/4       Running   1          1m
  prometheus-k8s-1                              4/4       Running   3          6m
  prometheus-operator-77bc9b6c68-gw92h          1/1       Running   0          13m

Again, alertmanager-main-0 will remain in the Pending state until it has successfully claimed its PV.

  • Delete the Alertmanager PVC created by deployment:
      $ oc get pvc
      NAME                                       STATUS    VOLUME                               CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      alertmanager-main-db-alertmanager-main-0   Pending                                                                                 41s
      prometheus-k8s-db-prometheus-k8s-0         Bound     prometheus-k8s-db-prometheus-k8s-0   100G       RWO                           7m
      prometheus-k8s-db-prometheus-k8s-1         Bound     prometheus-k8s-db-prometheus-k8s-1   100G       RWO                           4m
    
      $ oc delete pvc alertmanager-main-db-alertmanager-main-0
      persistentvolumeclaim "alertmanager-main-db-alertmanager-main-0" deleted
    
  • Create the new PV for Alertmanager:
      $ vi pv-alertmanager-0.yml
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: alertmanager-main-db-alertmanager-main-0
      spec:
        capacity:
          storage: 10G
        accessModes:
          - ReadWriteOnce
        hostPath:
          path: /alertmanager
        persistentVolumeReclaimPolicy: Retain
        claimRef:
          name: alertmanager-main-db-alertmanager-main-0
          namespace: openshift-monitoring
    
      $ oc create -f pv-alertmanager-0.yml
    

    Note that /alertmanager is the HostPath attached to the node.

  • Create the PVC for Alertmanager:
      $ vi pvc-alertmanager-0.yml
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: alertmanager-main-db-alertmanager-main-0
      spec:
        resources:
          requests:
            storage: 10G
        accessModes:
          - ReadWriteOnce
    
      $ oc create -f pvc-alertmanager-0.yml -n openshift-monitoring
    

After this is done, the second and third Alertmanager pods will start to spin up, so repeat the same steps, creating a new PV and PVC for alertmanager-main-db-alertmanager-main-1 and alertmanager-main-db-alertmanager-main-2 respectively, as sketched below.
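
Since the additional manifests differ from the -0 ones only in the trailing index, a quick shell loop can generate and create them (a convenience sketch; double-check the substitution against your files before relying on it):

      $ for i in 1 2; do
          sed "s/main-0/main-${i}/g" pv-alertmanager-0.yml  | oc create -f -
          sed "s/main-0/main-${i}/g" pvc-alertmanager-0.yml | oc create -n openshift-monitoring -f -
        done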

The end result for the PVs and PVCs in openshift-monitoring:

  $ oc get pv
  NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                                       STORAGECLASS   REASON    AGE
  alertmanager-main-db-alertmanager-main-0   10G        RWO            Retain           Bound     openshift-monitoring/alertmanager-main-db-alertmanager-main-0                            5m
  alertmanager-main-db-alertmanager-main-1   10G        RWO            Retain           Bound     openshift-monitoring/alertmanager-main-db-alertmanager-main-1                            1m
  alertmanager-main-db-alertmanager-main-2   10G        RWO            Retain           Bound     openshift-monitoring/alertmanager-main-db-alertmanager-main-2                            8s
  prometheus-k8s-db-prometheus-k8s-0         100G       RWO            Retain           Bound     openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0                                  13m
  prometheus-k8s-db-prometheus-k8s-1         100G       RWO            Retain           Bound     openshift-monitoring/prometheus-k8s-db-prometheus-k8s-1                                  11m

  $ oc get pvc
  NAME                                       STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  alertmanager-main-db-alertmanager-main-0   Bound     alertmanager-main-db-alertmanager-main-0   10G        RWO                           3m
  alertmanager-main-db-alertmanager-main-1   Bound     alertmanager-main-db-alertmanager-main-1   10G        RWO                           1m
  alertmanager-main-db-alertmanager-main-2   Bound     alertmanager-main-db-alertmanager-main-2   10G        RWO                           6s
  prometheus-k8s-db-prometheus-k8s-0         Bound     prometheus-k8s-db-prometheus-k8s-0         100G       RWO                           13m
  prometheus-k8s-db-prometheus-k8s-1         Bound     prometheus-k8s-db-prometheus-k8s-1         100G       RWO                           10m
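
Finally, all monitoring pods should settle into the Running state:

      $ oc get pods -n openshift-monitoring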
