Project Description

What is Prometheus Operator?

Prometheus Operator is an open-source community project that provides Kubernetes-native deployment and management of Prometheus and related monitoring components.

It facilitates the creation of Alertmanager and Prometheus pods whose configuration is kept up to date with the state of the Kubernetes cluster, as described by the following CRDs exposed by the operator:

1) ServiceMonitor: a scrape configuration for Prometheus for one or more Kubernetes Services

2) PodMonitor: a scrape configuration for Prometheus for one or more Kubernetes Pods

3) Probe: a scrape configuration for Prometheus for one or more Ingresses or static targets

4) PrometheusRule: a set of rule groups for Prometheus that each contain either alerting rules or recording rules

5) AlertmanagerConfig: a subsection of the Alertmanager configuration
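To make the shape of these custom resources concrete, here is a minimal sketch that builds a ServiceMonitor manifest as a plain Python dict. The `service_monitor` helper and its example arguments are hypothetical and not part of this project; the field names (`selector.matchLabels`, `endpoints[].port`, `endpoints[].interval`) follow the `monitoring.coreos.com/v1` API.

```python
import json


def service_monitor(name, namespace, match_labels, port, interval="30s"):
    """Return a ServiceMonitor manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Services carrying these labels are selected for scraping.
            "selector": {"matchLabels": dict(match_labels)},
            # "port" refers to a named port on the selected Service.
            "endpoints": [{"port": port, "interval": interval}],
        },
    }


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    manifest = service_monitor("example-app", "team-a", {"app": "example-app"}, "web")
    print(json.dumps(manifest, indent=2))
```

The resulting JSON can be applied to a cluster like any other manifest, since JSON is valid YAML.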

Problems

There are two issues with deploying Prometheus Operator in multi-tenant environments where only a single Prometheus or Alertmanager deployment is desired:

1) Granting users permission to create these custom resources can let them set up monitoring or alerting for resources they do not have access to, since Monitors can scrape across namespaces by default. This ability can be turned off, but doing so breaks certain integrations with other features such as Istio.

2) Granting users access to the UIs can let them query for series that they should not have access to.
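The first problem can be illustrated with a small sketch that works out which namespaces a ServiceMonitor or PodMonitor reaches through its `namespaceSelector` (`any: true` selects every namespace, `matchNames` selects the listed ones, and an empty selector stays in the monitor's own namespace). The helper names and the example namespaces are hypothetical.

```python
def target_namespaces(monitor, all_namespaces):
    """Return the set of namespaces a monitor's namespaceSelector covers."""
    selector = monitor.get("spec", {}).get("namespaceSelector") or {}
    if selector.get("any"):
        # "any: true" selects every namespace in the cluster.
        return set(all_namespaces)
    names = selector.get("matchNames")
    if names:
        return set(names)
    # An empty selector limits discovery to the monitor's own namespace.
    return {monitor["metadata"]["namespace"]}


def foreign_namespaces(monitor, all_namespaces):
    """Return the namespaces reached beyond the monitor's own namespace."""
    own = monitor["metadata"]["namespace"]
    return target_namespaces(monitor, all_namespaces) - {own}
```

A monitor created in `team-a` with `namespaceSelector: {any: true}` would report every other namespace as foreign, which is exactly the access that a tenant restricted to `team-a` should not gain.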

Proposal

Create a way for a user deploying a single Prometheus, Alertmanager, and Grafana to authorize other users to perform actions on them based on Kubernetes-native RBAC.
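One way to ask Kubernetes RBAC whether a user may act on a resource is a SubjectAccessReview. The sketch below only builds the request body and reads the answer; it does not talk to a cluster, and the helper names are assumptions made for illustration rather than part of the proposal.

```python
def subject_access_review(user, groups, verb, resource, namespace, name="", group=""):
    """Build a SubjectAccessReview body asking whether `user` may perform
    `verb` on `resource` in `namespace` (illustrative helper)."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "groups": list(groups),
            "resourceAttributes": {
                "group": group,        # "" is the core API group (Services, Pods)
                "resource": resource,  # plural resource name, e.g. "services"
                "verb": verb,
                "namespace": namespace,
                "name": name,          # "" means any object of that resource
            },
        },
    }


def is_allowed(review_response):
    """Read the decision from the object returned by the API server."""
    return bool(review_response.get("status", {}).get("allowed", False))
```

Posting the body to the API server returns the same object with `status.allowed` filled in, so a component acting on behalf of a user can reuse the cluster's existing RBAC rules instead of maintaining its own policy.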

Goal for this Hackweek

1) Create Admission Controllers for Prometheus Operator-based deployments (including Rancher Monitoring V1, Rancher Monitoring V2, and kube-prometheus-stack) that determine whether a user has permission to create a Prometheus Operator custom resource based on their Kubernetes RBAC. For example, you must have access to a Service to create a ServiceMonitor, you must have access to a Pod to create a PodMonitor, and any PrometheusRule you create must be scoped to metrics collected from namespaces or resources you have access to.

2) Create an optional Revocation Operator that removes ServiceMonitors and PodMonitors from a cluster when the user who created them no longer has access to the corresponding Service or Pod within the cluster. Once a resource is revoked, the Operator should send an alert directly to a configured Alertmanager to notify the cluster admin of the removal, including YAML that allows them to add the resource back into the cluster.

3) Create an authorization plugin for Prometheus (similar to rancher/prometheus-auth) that verifies whether a user is authenticated with the Kubernetes API and authorized to request the results of a given series.
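The admission check from the first goal could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `can_get` is a hypothetical callable standing in for a SubjectAccessReview lookup, the policy (the user must be able to `get` the matching resource in every namespace the monitor targets) is one possible reading of the goal, and only the decision logic of a validating webhook is shown, not the HTTP server around it.

```python
# Resource a user must be able to read before creating each monitor kind.
REQUIRED_RESOURCE = {"ServiceMonitor": "services", "PodMonitor": "pods"}


def review(admission_review, can_get):
    """Decide an AdmissionReview request and return the AdmissionReview response.

    can_get(user, resource, namespace) -> bool is a hypothetical stand-in for
    an RBAC check; namespace "" is taken to mean "all namespaces".
    """
    request = admission_review["request"]
    obj = request["object"]
    user = request["userInfo"]["username"]
    resource = REQUIRED_RESOURCE.get(obj["kind"])

    allowed, message = True, ""
    if resource is not None:
        selector = obj.get("spec", {}).get("namespaceSelector") or {}
        if selector.get("any"):
            namespaces = [""]  # cluster-wide access is required
        else:
            namespaces = selector.get("matchNames") or [request["namespace"]]
        for namespace in namespaces:
            if not can_get(user, resource, namespace):
                allowed = False
                scope = namespace or "all namespaces"
                message = f"{user} cannot get {resource} in {scope}"
                break

    response = {"uid": request["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {"message": message}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Kinds that are not listed in `REQUIRED_RESOURCE` are admitted unchanged in this sketch; scoping PrometheusRule expressions to permitted namespaces would need PromQL parsing and is left out.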

Resources

Looking for help from people with experience in Prometheus, Grafana, or Alertmanager who have ideas on how to better support multi-tenant environments!

Looking for hackers with the skills:

prometheus alertmanager grafana monitoring alerting

This project is part of:

Hack Week 20

Activity

  • almost 4 years ago: aiyengar2 originated this project.

    Similar Projects

    Update my own python audio and video time-lapse and motion capture apps and publish by dmair

    Project Description

    Many years ago, in my own time, I wrote a Qt Python application to periodically capture frames from a V4L2 video device (e.g. a webcam) and used it to create daily weather time-lapse videos from windows at my home. I have maintained it at home in my own time and this year added motion detection, making it a functional video security tool, but with no guarantees. I also wrote, in my own time, a Linux audio monitoring app in Python using Qt that captures live signal strength along with a 24-hour history of audio signal level/range and audio spectrum. I recently added background noise filtering to the app. In due course I aim to include voice detection; currently I'm assuming this will be via Google's public audio interface. Neither of these is a professional home security app, but between them they permit a user to freely monitor video and audio data from a home in a manageable way. Both projects are on GitHub but out of date with respect to my personal work, so I would like to organize and update the GitHub versions of these projects.

    Goal for this Hackweek

    It would probably help to migrate all the v4l2py-based video code to linuxpy.video, which looks like a rewrite of large areas of the video code. It would also be good to remove a lot of Python lint that is several years old. The main goal is to push the recent changes, with better organized code, to GitHub. If there is enough time, I'd like to take the in-line Qt QSettings persistent state code used per app and write a Python class that encapsulates the Qt QSettings class in a value_of(name)/name=value manner for shared use across projects, so that persistent state can be read or written anywhere within the apps through a simple interface.

    Resources

    I'm not specifically looking for help but welcome other input.


    Saline (state deployment control and monitoring tool for SUSE Manager/Uyuni) by vizhestkov

    Project Description

    Saline is an addition to Salt used in SUSE Manager/Uyuni that aims to provide better control and visibility for state deployment in large-scale environments.

    In its current state, the published version can be used only as a Prometheus exporter and is missing some of the key features implemented in the PoC (not published). It can currently provide metrics related to Salt events and the state apply process on the minions, but control over this process is not implemented yet.

    Continue with implementation of the missing features and improve the existing implementation:

    • authentication (need to decide how it should be/or not related to salt auth)

    • web service providing the control of states deployment

    Goal for this Hackweek

    • Implement missing key features

    • Implement the tool for state deployment control with CLI

    Resources

    https://github.com/openSUSE/saline