Monasca can currently be used to monitor an OpenStack cloud, but outside the OpenStack world Prometheus seems to be more widely used.

Goal

  • learn Prometheus
  • check which metrics can be collected, and identify which exporters would need to be written or extended to get enough information about the OpenStack services
  • have fun
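
As a rough illustration of what an exporter has to produce, here is a minimal sketch that renders gauge samples in the Prometheus text exposition format. The metric name openstack_service_up, its "service" label, and the helper names are hypothetical and chosen only for this example; a real exporter would more likely be built on an existing client library.

```python
def escape_label_value(value: str) -> str:
    # Label values escape backslash, double quote and newline in the text format.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric as Prometheus text exposition format.

    samples is a list of (labels, value) pairs, where labels is a dict
    mapping label names to string values.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: 1 if an OpenStack service answered a health check, else 0.
example = render_gauge(
    "openstack_service_up",
    "Whether the service responded to a health check.",
    [({"service": "nova-api"}, 1), ({"service": "neutron-server"}, 0)],
)
```

Serving this text over HTTP on a /metrics path is what lets a Prometheus server scrape it.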

Looking for hackers with the skills:

openstack monitoring prometheus

This project is part of:

Hack Week 16

Activity

  • over 7 years ago: tbechtold added keyword "openstack" to this project.
  • over 7 years ago: tbechtold added keyword "monitoring" to this project.
  • over 7 years ago: tbechtold added keyword "prometheus" to this project.
  • over 7 years ago: tbechtold originated this project.

  • Comments

    • joadavis
      over 7 years ago by joadavis

      If you pick this up for the next hackweek, give a shout out on the #cloud-lmm channel in rocket.chat for any assistance or findings. :)

    Similar Projects

    Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios

    Description

    Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.

    This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.

    The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.

    Goals

    By the end of Hack Week, we aim to have a single, working Python script that:

    1. Connects to Prometheus and executes a query to fetch detailed test failure history.
    2. Processes the raw data into a format suitable for the Gemini API.
    3. Successfully calls the Gemini API with the data and a clear prompt.
    4. Parses the AI's response to extract a simple list of flaky tests.
    5. Saves the list to a JSON file that can be displayed in Grafana.
    6. Feeds a new panel in our dashboard that lists the flaky tests.
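
    Steps 1, 2 and 5 could be sketched roughly as follows, using only the Python standard library and the Prometheus HTTP API range-query endpoint (/api/v1/query_range). The server URL, the function names and the output file name are assumptions made for this example; the Gemini call (steps 3 and 4) is deliberately left out.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: replace with the real server
METRIC = "jenkins_build_test_case_failure_age"  # metric named in the description


def build_range_query_url(base_url, query, start, end, step):
    """Build a URL for the Prometheus /api/v1/query_range endpoint."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url}/api/v1/query_range?{params}"


def fetch_range(url, timeout=30):
    """Step 1: run the query and return the decoded JSON payload."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten_matrix(payload):
    """Step 2: turn a range-query (matrix) response into plain records.

    The full label set is kept as-is, because which labels identify a
    test case depends on how the metric is exported.
    """
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    series = []
    for item in payload["data"]["result"]:
        samples = [[float(ts), float(value)] for ts, value in item["values"]]
        series.append({"labels": item["metric"], "samples": samples})
    return series


def save_json(records, path):
    """Step 5: write the records (or the final flaky-test list) to a file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
```

    A run would then look like save_json(flatten_matrix(fetch_range(build_range_query_url(PROMETHEUS_URL, METRIC, start, end, "1h"))), "failures.json"), where start and end are Unix timestamps and "failures.json" is a placeholder file name.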

    Resources