Managing Prometheus on Kubernetes with Prometheus Operator

Kubernetes and Prometheus are a popular combination. CoreOS created a Kubernetes Operator which streamlines operating and configuring Prometheus in Kubernetes. This post is an introduction to Prometheus Operator. It is assumed the reader is familiar with the fundamentals of Kubernetes and Prometheus.

Custom Resource Definitions

Prometheus Operator extends Kubernetes with four Custom Resource Definitions (CRDs):

Prometheus
ServiceMonitor
PrometheusRule
Alertmanager

Prometheus and Alertmanager resources define the desired Prometheus and Alertmanager deployments. The operator ensures that the desired deployments exist: it recreates a Prometheus deployment if it was deleted, and it deletes the associated Prometheus deployment when a Prometheus resource is deleted. The same is true for Alertmanagers.
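
As a rough illustration of what such resources can look like, here is a minimal sketch of a Prometheus and an Alertmanager resource. The resource names, the label team: example, the service account prometheus, and the Service alertmanager-example with a port named web are assumptions made for this example; they are not part of the original setup and have to exist in your cluster for the sketch to work.

```yaml
# Minimal sketch; all names and labels below are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 1
  # Assumed to exist and to have RBAC permissions for target discovery.
  serviceAccountName: prometheus
  # Only ServiceMonitors carrying this (hypothetical) label are picked up.
  serviceMonitorSelector:
    matchLabels:
      team: example
  alerting:
    alertmanagers:
      # Assumes a Service named alertmanager-example with a port named "web".
      - namespace: default
        name: alertmanager-example
        port: web
---
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 1
```

Applying both documents with kubectl apply should be enough for the operator to create the corresponding workloads. Deleting either resource removes the workload again.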

ServiceMonitor resources configure discovery of scraping targets via label selectors. The operator automatically generates a Prometheus scrape configuration based on the ServiceMonitors and prompts Prometheus to reload the configuration. For example, the following ServiceMonitor selects all endpoints belonging to services with the label app: example-app. For all of these endpoints, “/metrics” on the named port “monitoring” will be scraped. By default, a ServiceMonitor only targets services in the namespace it was created in. Cross-namespace monitoring scenarios are supported, too.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: monitoring
      path: /metrics
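
For the cross-namespace case, a ServiceMonitor can carry a namespaceSelector. The following is a sketch under assumptions: the namespaces monitoring, team-a and team-b are hypothetical, and the Prometheus instance is assumed to have the RBAC permissions needed to discover services in those namespaces.

```yaml
# Sketch of a cross-namespace ServiceMonitor; namespace names are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app-cross-namespace
  namespace: monitoring
spec:
  # Look for matching services in these namespaces instead of only
  # the ServiceMonitor's own namespace.
  namespaceSelector:
    matchNames:
      - team-a
      - team-b
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: monitoring
      path: /metrics
```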

PrometheusRule resources define Prometheus alerting and recording rules. Prometheus Operator adds these rules to the Prometheus configuration.
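
A PrometheusRule might look like the sketch below. The alert name, the expression, the severity label, and the metadata label role: alert-rules are assumptions for illustration. The expression assumes the scraped targets end up with the job label example-app, and the metadata label only matters if the Prometheus resource selects rules by that label through its ruleSelector.

```yaml
# Sketch of an alerting rule; names, labels and thresholds are illustrative.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-app-rules
  labels:
    # Hypothetical label; must match the ruleSelector of the Prometheus resource.
    role: alert-rules
spec:
  groups:
    - name: example-app.rules
      rules:
        - alert: ExampleAppDown
          # Fires when a target of the (assumed) job example-app
          # has been unreachable for five minutes.
          expr: up{job="example-app"} == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: An example-app target is down.
```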

The diagram below illustrates the interactions between all components.

Overview of the interactions between the components (Prometheus, Prometheus Operator, Alertmanager, etc.)

Advantages of Prometheus Operator

According to CoreOS, “an Operator represents human operational knowledge in software.” Automating operational tasks reduces avoidable mistakes, improves reproducibility, and enables new patterns. For example, Prometheus Operator enables developers to easily spawn ad-hoc Prometheus and Alertmanager instances for testing purposes.

With ServiceMonitors, it is possible to define conventions that allow new systems to be monitored without any additional configuration. Adding a predefined label to a service and naming the port that exposes the metrics endpoint a certain way is all that is necessary for service discovery. Any system that can’t adhere to the conventions, for example because the developers don’t control the application, can be complemented with its own ServiceMonitor resource.
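
Such a convention could be sketched as follows. The label monitoring: enabled, the port name monitoring, the service new-app, and the port number 8080 are hypothetical choices for this example, not an established standard. One catch-all ServiceMonitor encodes the convention, and every service that follows it is discovered without further configuration.

```yaml
# Catch-all ServiceMonitor encoding a (hypothetical) monitoring convention.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: convention
spec:
  # Consider services in all namespaces; assumes Prometheus has the
  # required RBAC permissions cluster-wide.
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      monitoring: enabled
  endpoints:
    - port: monitoring
      path: /metrics
---
# A new service opts in by following the convention: label plus port name.
apiVersion: v1
kind: Service
metadata:
  name: new-app
  labels:
    monitoring: enabled
spec:
  selector:
    app: new-app
  ports:
    - name: monitoring
      port: 8080
      targetPort: 8080
```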

Another pattern that is simplified by Prometheus Operator is splitting Prometheis by use for scalability. When one Prometheus instance isn’t enough anymore, it is recommended to use multiple Prometheus instances, each responsible for monitoring a subset of services. For example, one Prometheus might ingest infrastructure metrics such as node exporter metrics while another is responsible for metrics of custom applications. Another possible approach is to use one Prometheus instance per team.
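
One way to express such a split is to give each Prometheus resource its own serviceMonitorSelector. In this sketch, the instance names and the label key prometheus with the values infrastructure and applications are assumptions; each ServiceMonitor would need to carry the matching label in its metadata to be assigned to one of the instances.

```yaml
# Sketch: two Prometheus instances, each selecting a different set of
# ServiceMonitors by a (hypothetical) label.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: infrastructure
spec:
  replicas: 1
  # Picks up ServiceMonitors labeled for infrastructure, e.g. node exporter.
  serviceMonitorSelector:
    matchLabels:
      prometheus: infrastructure
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: applications
spec:
  replicas: 1
  # Picks up ServiceMonitors labeled for custom applications.
  serviceMonitorSelector:
    matchLabels:
      prometheus: applications
```

The same approach works for the per-team split by using a team name as the label value.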

Finally, Prometheus configuration is quite verbose. It is a blessing that writing Prometheus configuration by hand can be avoided.

Summary

This post is a basic introduction to Prometheus Operator. We looked at the CRDs Prometheus Operator uses and discussed the operator’s advantages. If you want to learn more about Prometheus Operator, take a look at the “Further Reading” section below.