What is Prometheus Operator?
Prometheus Operator is an open-source community project that provides Kubernetes native deployment and management of Prometheus and related monitoring components.
It facilitates the creation of Alertmanager and Prometheus pods whose configuration is kept up to date with the state of the Kubernetes cluster, as described by the following CRDs exposed by the operator:
1) ServiceMonitor: a scrape configuration for Prometheus for one or more Kubernetes Services
2) PodMonitor: a scrape configuration for Prometheus for one or more Kubernetes Pods
3) Probe: a scrape configuration for Prometheus for one or more Ingresses or static targets
4) PrometheusRule: a set of rule groups for Prometheus that each contain either alerting rules or recording rules
5) AlertmanagerConfig: a subsection of the Alertmanager configuration
There are two issues with deploying Prometheus Operator in multi-tenant environments where only a single Prometheus or Alertmanager deployment is desired:
1) Granting users permission to create these custom resources can let them set up monitoring or alerting for resources they do not have access to, since Monitors can scrape across namespaces by default. This ability can be turned off, but doing so breaks certain integrations with other features such as Istio.
2) Granting users access to the UIs can let them query series that they should not have access to.
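To make the second issue concrete: a shared Prometheus returns every matching series to whoever can reach its query API. The sketch below shows the kind of per-namespace filtering that something sitting in front of Prometheus would need to apply to a query response. It assumes the standard JSON shape of the Prometheus HTTP query API and that tenancy can be derived from a `namespace` label; the function name and the allowed-namespace set are hypothetical and not taken from any existing plugin.

```python
from typing import Any, Dict, Set


def filter_series_by_namespace(response: Dict[str, Any], allowed: Set[str]) -> Dict[str, Any]:
    """Return a copy of a Prometheus query response containing only permitted series.

    Assumption: a series belongs to the namespace named in its `namespace`
    label. Series without that label are dropped, which is the cautious choice.
    """
    data = response.get("data", {})
    if data.get("resultType") not in ("vector", "matrix"):
        # Scalar and string results carry no labels to filter on.
        return response
    kept = [
        series
        for series in data.get("result", [])
        if series.get("metric", {}).get("namespace") in allowed
    ]
    # Build new dicts so the caller's original response is left untouched.
    return {**response, "data": {**data, "result": kept}}
```

Filtering responses after the fact is only one option; rewriting the query to add namespace matchers before it reaches Prometheus is another, and avoids computing results the user may not see.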
Create a way for a user deploying a single Prometheus, Alertmanager, and Grafana to authorize other users to perform actions on them based on Kubernetes-native RBAC.
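One Kubernetes-native way to ask "may this user do X?" is a SubjectAccessReview, which the API server evaluates against the cluster's RBAC rules. As a rough sketch, the request body for checking access to a Service or Pod could be built as below. The helper names are hypothetical, and sending the object to the API server is left out.

```python
from typing import Any, Dict, List


def build_subject_access_review(
    user: str,
    groups: List[str],
    namespace: str,
    resource: str,
    verb: str = "get",
    name: str = "",
) -> Dict[str, Any]:
    """Build a SubjectAccessReview body for a core-group resource such as
    "services" or "pods". Hypothetical helper for illustration only."""
    attrs: Dict[str, Any] = {
        "namespace": namespace,
        "verb": verb,
        "group": "",  # Services and Pods live in the core ("") API group
        "resource": resource,
    }
    if name:
        attrs["name"] = name  # restrict the check to one named object
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {"user": user, "groups": groups, "resourceAttributes": attrs},
    }


def is_allowed(review_response: Dict[str, Any]) -> bool:
    """Read the verdict from the object returned by the API server."""
    return bool(review_response.get("status", {}).get("allowed", False))
```

Which verb should gate each action (for example `get` versus `list` on Services before allowing a ServiceMonitor) is a design decision this sketch leaves open.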
Goal for this Hackweek
1) Create Admission Controllers for Prometheus Operator-based deployments (including Rancher Monitoring V1, Rancher Monitoring V2, and kube-prometheus-stack) that determine whether a user has permission to create a Prometheus Operator custom resource based on their Kubernetes RBAC. For example, you must have access to a Service to create a ServiceMonitor, you must have access to a Pod to create a PodMonitor, and any PrometheusRule you create must be scoped to metrics collected from namespaces / resources you have access to.
2) Create an optional Revocation Operator that removes ServiceMonitors and PodMonitors from a cluster when the user who created them no longer has access to a Service or Pod within the cluster. Once a resource is revoked, the Operator should send an alert directly to a configured Alertmanager to notify the cluster admin of the removal, including YAML that allows them to add the resource back into the cluster.
3) Create an authorization plugin for Prometheus (similar to rancher/prometheus-auth) that verifies whether a user is authenticated with the k8s API and authorized to request the results of a given series.
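The admission check in the first goal could look roughly like the sketch below: work out which namespaces a ServiceMonitor or PodMonitor targets, ask whether the requesting user has access there, and answer with an AdmissionReview response. This is a minimal illustration under stated assumptions, not the project's implementation. The `can_access` callback is hypothetical and stands in for an RBAC lookup (for instance a SubjectAccessReview), and the empty string is used here to mean "all namespaces".

```python
from typing import Any, Callable, Dict, List

# Hypothetical callback: (username, groups, namespace) -> True if the user may
# access the target resources there. "" stands for all namespaces.
AccessCheck = Callable[[str, List[str], str], bool]


def target_namespaces(monitor: Dict[str, Any], own_namespace: str) -> List[str]:
    """Resolve spec.namespaceSelector of a ServiceMonitor or PodMonitor."""
    selector = monitor.get("spec", {}).get("namespaceSelector") or {}
    if selector.get("any"):
        return [""]  # selects every namespace in the cluster
    names = selector.get("matchNames") or []
    # With no selector, the monitor only targets its own namespace.
    return list(names) if names else [own_namespace]


def review_monitor(admission_review: Dict[str, Any], can_access: AccessCheck) -> Dict[str, Any]:
    """Allow the request only if the user can access every targeted namespace."""
    request = admission_review["request"]
    user = request["userInfo"].get("username", "")
    groups = request["userInfo"].get("groups", [])
    namespaces = target_namespaces(request["object"], request.get("namespace", ""))
    denied = [ns or "<all namespaces>" for ns in namespaces
              if not can_access(user, groups, ns)]
    response: Dict[str, Any] = {"uid": request["uid"], "allowed": not denied}
    if denied:
        response["status"] = {"message": "no access to targets in: " + ", ".join(denied)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

A real webhook would also need to check the label selector against individual Services or Pods and handle updates; the sketch checks at namespace granularity only.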
Looking for help from people with experience in Prometheus, Grafana, or Alertmanager who have ideas on how to better support multi-tenant environments!
This project is part of:
Hack Week 20