Description
Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.
This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.
The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.
Goals
By the end of Hack Week, we aim to have a single, working Python script that:
- Connects to Prometheus and executes a query to fetch detailed test failure history.
- Processes the raw data into a format suitable for the Gemini API.
- Successfully calls the Gemini API with the data and a clear prompt.
- Parses the AI's response to extract a simple list of flaky tests.
- Saves the list to a JSON file that can be displayed in Grafana.
- Feeds a new panel in our dashboard that lists the flaky tests.
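The formatting, parsing, and saving steps above could look roughly like the sketch below. It is only an illustration under stated assumptions: the input shape (a dict of test name to pass/fail history), the prompt wording, and the helper names build_prompt, parse_flaky_list, and save_flaky_list are all hypothetical and not taken from the actual script. The Gemini call itself is deliberately left out.

```python
import json


def build_prompt(history):
    """Render the failure history as compact text for the model.

    `history` is an assumed shape: test name -> list of outcomes,
    oldest first, where a truthy value means the run failed.
    """
    lines = [
        f"{name}: {''.join('F' if failed else 'P' for failed in runs)}"
        for name, runs in sorted(history.items())
    ]
    return (
        "Each line is a test followed by its recent results "
        "(P = pass, F = fail), oldest first.\n"
        "Reply with a JSON array containing only the names of tests "
        "that look flaky.\n\n" + "\n".join(lines)
    )


def parse_flaky_list(reply, known_tests):
    """Extract a JSON array of test names from the model reply.

    Tolerates surrounding text or code fences, and drops any name
    that was not part of the request.
    """
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        names = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return []
    return [n for n in names if isinstance(n, str) and n in known_tests]


def save_flaky_list(names, path):
    """Write the final list as a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(names, fh, indent=2)
```

Filtering the reply against the known test names is a cheap guard against the model inventing a test that was never in the input.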
Resources
- Jenkins Prometheus Exporter: https://github.com/uyuni-project/jenkins-exporter/
- Data Source: Our internal Prometheus server.
- Key Metric: jenkins_build_test_case_failure_age{jobname, buildid, suite, case, status, failedsince}.
- Existing Query for Reference: count by (suite) (max_over_time(jenkins_build_test_case_failure_age{status=~"FAILED|REGRESSION", jobname="$jobname"}[$__range])).
- AI Model: The Google Gemini API.
- Example of how to interact with the Gemini API: https://github.com/srbarrios/FailTale/
- Visualization: Our internal Grafana Dashboard.
- Internal IaC: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring
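As a rough sketch of how the key metric could be pulled from Prometheus, the snippet below builds a request for the standard /api/v1/query_range HTTP endpoint and flattens the matrix response per test case. The base URL, the one-hour step, and the function names are assumptions made for illustration; the real detector may query the data differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

METRIC = "jenkins_build_test_case_failure_age"


def build_query(jobname):
    """PromQL selector for failed or regressed test cases of one job."""
    return f'{METRIC}{{status=~"FAILED|REGRESSION", jobname="{jobname}"}}'


def build_range_url(base_url, jobname, start, end, step="1h"):
    """URL for the Prometheus range-query endpoint (step is an assumed default)."""
    params = urlencode(
        {"query": build_query(jobname), "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def parse_matrix(payload):
    """Map (suite, case) -> time-ordered list of (timestamp, value)."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    series = {}
    for item in payload["data"]["result"]:
        labels = item["metric"]
        key = (labels.get("suite", ""), labels.get("case", ""))
        samples = series.setdefault(key, [])
        samples.extend((float(ts), float(value)) for ts, value in item["values"])
    for samples in series.values():
        samples.sort()
    return series


def fetch_history(base_url, jobname, start, end, step="1h"):
    """Run the range query and return the parsed series."""
    url = build_range_url(base_url, jobname, start, end, step)
    with urlopen(url, timeout=30) as resp:
        return parse_matrix(json.load(resp))
```

Series that differ only in labels such as buildid are merged under the same (suite, case) key, which is one possible way to get a single history per test.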
Outcome
- Jenkins Flaky Test Detector: https://github.com/srbarrios/jenkins-flaky-tests-detector and its container
- IaC on MLM Team: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring/jenkinsflakytestsdetector?reftype=heads, https://gitlab.suse.de/galaxy/infrastructure/-/blob/master/srv/salt/monitoring/grafana/dashboards/flaky-tests.json?ref_type=heads, and others.
- Grafana Dashboard: https://grafana.mgr.suse.de/d/flaky-tests/flaky-tests-detection
This project is part of:
Hack Week 25
Activity
Comments
- about 1 month ago by oscar-barrios
The code of the flaky detector is here: https://github.com/srbarrios/jenkins-flaky-tests-detector
I also published a Docker container to use it here: https://github.com/srbarrios/jenkins-flaky-tests-detector/pkgs/container/jenkins-flaky-tests-detector
The plan now is to write a Salt state in our internal MLM infra that runs this container, which exposes the results through a web server running in the container, so that I can then parse them in Grafana.
- about 1 month ago by oscar-barrios
I created the new Grafana dashboard for Uyuni here: https://grafana.mgr.suse.de/d/flaky-tests/flaky-tests-detection?orgId=1&from=now-6h&to=now&timezone=browser&refresh=1m
The next step is to build it in a way that lets me get the flaky tests for all the Jenkins job test results that we monitor in MLM.
- about 1 month ago by oscar-barrios
Now we can select any of the running test suites, and get a list of the most probable flaky tests :)
- 30 days ago by oscar-barrios
I will consider this Hack Week project done for now, so that I can move on to my second Hack Week project. The outcome has been good. I must admit that I also vibe coded some parts using Gemini 3. Also, the script that analyzes the Prometheus series does not rely only on an LLM call; it first does a triage based on a simple algorithm, which saves resources by asking the AI only about ambiguous and complex test failures.
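The comment does not describe the triage algorithm, so the following is only a guess at what such a pre-filter could look like, not the detector's actual logic. It assumes each test's history is available as a list of pass/fail outcomes (oldest first) and counts how often the result flips. The threshold of four flips and the category names are hypothetical.

```python
def count_flips(runs):
    """Number of times consecutive runs changed between pass and fail."""
    return sum(1 for prev, cur in zip(runs, runs[1:]) if prev != cur)


def triage(runs, flaky_flips=4):
    """Classify a history of outcomes (truthy = failed), oldest first.

    Only the 'ambiguous' bucket would need to be sent to the LLM.
    """
    if not runs or not any(runs):
        return "stable"        # never failed
    if all(runs):
        return "broken"        # failed every time: likely a real bug
    flips = count_flips(runs)
    if flips >= flaky_flips:
        return "flaky"         # keeps alternating between pass and fail
    if flips == 1:
        # A single change of state: either it started failing or it got fixed.
        return "regression" if runs[-1] else "fixed"
    return "ambiguous"         # too few flips to decide without more context
```

With this kind of filter, a history such as pass, fail, fail, pass is left for the model to judge, while clear-cut cases never consume an API call.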
Similar Projects
Uyuni Health-check Grafana AI Troubleshooter by ygutierrez
Description
This project explores the feasibility of using the open-source Grafana LLM plugin to enhance the Uyuni Health-check tool with LLM capabilities. The idea is to integrate a chat-based "AI Troubleshooter" directly into existing dashboards, allowing users to ask natural-language questions about errors, anomalies, or performance issues.
Goals
- Investigate if and how the grafana-llm-app plug-in can be used within the Uyuni Health-check tool.
- Investigate if this plug-in can be used to query LLMs for troubleshooting scenarios.
- Evaluate support for local LLMs and external APIs through the plugin.
- Evaluate if and how the Uyuni MCP server could be integrated as another source of information.
Resources
Set Up an Ephemeral Uyuni Instance by mbussolotto
Description
To test, check, and verify the latest changes in the master branch, we want to easily set up an ephemeral environment.
Goals
- Create an ephemeral environment manually
- Create an ephemeral environment automatically
Resources
https://github.com/uyuni-project/uyuni
https://www.uyuni-project.org/uyuni-docs/en/uyuni/index.html
Enable more features in mcp-server-uyuni by j_renner
Description
I would like to contribute to mcp-server-uyuni (the MCP server for Uyuni / Multi-Linux Manager), exposing additional features as tools. There are lots of relevant features to be found throughout the API, for example:
- System operations and info
- System groups
- Maintenance windows
- Ansible
- Reporting
- ...
At the end of the week I managed to enable basic system group operations:
- List all system groups visible to the user
- Create new system groups
- List systems assigned to a group
- Add and remove systems from groups
Goals
- Set up test environment locally with the MCP server and client + a recent MLM server [DONE]
- Identify features and use cases offering a benefit with limited effort required for enablement [DONE]
- Create a PR to the repo [DONE]
Resources
Ansible to Salt integration by vizhestkov
Description
We already have initial integration of Ansible in Salt with the possibility to run playbooks from the salt-master on the salt-minion used as an Ansible Control node.
In this project I want to check if it is possible to make Ansible work over the Salt transport: basically, run playbooks with Ansible through the existing, established Salt (ZeroMQ) transport without using ssh at all.
It could be a good solution for end users to reuse Ansible playbooks or run the Ansible modules they are used to, without complex configuration, on existing Salt (or Uyuni/SUSE Multi Linux Manager) infrastructure.
Goals
- [v] Prepare the testing environment with Salt and Ansible installed
- [v] Discover Ansible codebase to figure out possible ways of integration
- [v] Create Salt/Uyuni inventory module
- [v] Make basic modules work without a separate ssh connection, reusing the existing Salt connection
- [v] Test some of the most basic playbooks
Resources
mgr-ansible-ssh - Intelligent, Lightweight CLI for Distributed Remote Execution by deve5h
Description
By the end of Hack Week, the target will be to deliver a minimal functional version 1 (MVP) of a custom command-line tool named mgr-ansible-ssh (a unified wrapper for BOTH ad-hoc shell & playbooks) that allows operators to:
- Execute arbitrary shell commands on thousands of remote machines simultaneously using Ansible Runner with artifacts saved locally.
- Pass runtime options such as inventory file, remote command string/ playbook execution, parallel forks, limits, dry-run mode, or no-std-ansible-output.
- Leverage existing SSH trust relationships without additional setup.
- Provide a clean, intuitive CLI interface with --help for ease of use. It should provide consistent UX & CI-friendly interface.
- Establish a foundation that can later be extended with advanced features such as logging, grouping, interactive shell mode, safe-command checks, and parallel execution tuning.
The MVP should enable day-to-day operations to efficiently target thousands of machines with a single, consistent interface.
Goals
Primary Goals (MVP):
Build a functional CLI tool (mgr-ansible-ssh) capable of executing shell commands on multiple remote hosts using Ansible Runner. Test the tool across a large distributed environment (1000+ machines) to validate its performance and reliability.
Looking forward to significantly reducing the zypper deployment time across all 351 RMT VM servers in our MLM cluster by eliminating the dependency on the taskomatic service, bringing execution down to a fraction of the current duration. The tool should also support multiple runtime flags, such as:
mgr-ansible-ssh: Remote command execution wrapper using Ansible Runner
Usage: mgr-ansible-ssh [--help] [--version] [--inventory INVENTORY]
[--run RUN] [--playbook PLAYBOOK] [--limit LIMIT]
[--forks FORKS] [--dry-run] [--no-ansible-output]
Required Arguments
--inventory, -i Path to Ansible inventory file to use
Any One of the Arguments Is Required
--run, -r Execute the specified shell command on target hosts
--playbook, -p Execute the specified Ansible playbook on target hosts
Optional Arguments
--help, -h Show the help message and exit
--version, -v Show the version and exit
--limit, -l Limit execution to specific hosts or groups
--forks, -f Number of parallel Ansible forks
--dry-run Run in Ansible check mode (requires -p or --playbook)
--no-ansible-output Suppress Ansible stdout output
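A minimal argparse sketch of the interface listed above might look like this. It only covers argument parsing, not the Ansible Runner call. Several details are assumptions rather than part of the proposal: that --run and --playbook are mutually exclusive, the default of 5 forks, and the placeholder version string.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="mgr-ansible-ssh",
        description="Remote command execution wrapper using Ansible Runner",
    )
    # argparse adds --help/-h on its own; the version string is a placeholder.
    parser.add_argument("--version", "-v", action="version", version="%(prog)s 0.1.0")
    parser.add_argument("--inventory", "-i", required=True,
                        help="Path to Ansible inventory file to use")

    # Assumption: exactly one of --run / --playbook must be given.
    action = parser.add_mutually_exclusive_group(required=True)
    action.add_argument("--run", "-r",
                        help="Execute the specified shell command on target hosts")
    action.add_argument("--playbook", "-p",
                        help="Execute the specified Ansible playbook on target hosts")

    parser.add_argument("--limit", "-l",
                        help="Limit execution to specific hosts or groups")
    parser.add_argument("--forks", "-f", type=int, default=5,
                        help="Number of parallel Ansible forks (assumed default: 5)")
    parser.add_argument("--dry-run", action="store_true",
                        help="Run in Ansible check mode (requires --playbook)")
    parser.add_argument("--no-ansible-output", action="store_true",
                        help="Suppress Ansible stdout output")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Check mode only makes sense for playbooks, as stated in the usage text.
    if args.dry_run and not args.playbook:
        parser.error("--dry-run requires --playbook")
    return args
```

For example, parse_args(["-i", "hosts.ini", "-r", "uptime", "-f", "50"]) yields a namespace with run set to "uptime" and forks set to 50, while combining --run with --dry-run exits with a usage error.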
Secondary/Stretched Goals (if time permits):
- Add pretty output formatting (success/failure summary per host).
- Implement basic logging of executed commands and results.
- Introduce safety checks for risky commands (shutdown, rm -rf, etc.).
- Package the tool so it can be installed with pip or stored internally.
Resources
Collaboration is welcome from anyone interested in CLI tooling, automation, or distributed systems. Skills that would be particularly valuable include:
- Python especially around CLI dev (argparse, click, rich)
Try AI training with ROCm and LoRA by bmwiedemann
Description
I want to set up a Radeon RX 9060 XT 16 GB at home with ROCm on Slowroll.
Goals
I want to test how fast AI inference can get with the GPU and if I can use LoRA to re-train an existing free model for some task.
Resources
- https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
- https://build.opensuse.org/project/show/science:GPU:ROCm
- https://src.opensuse.org/ROCm/
- https://www.suse.com/c/lora-fine-tuning-llms-for-text-classification/
Results
got inference working with llama.cpp:
export LLAMACPP_ROCM_ARCH=gfx1200
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$LLAMACPP_ROCM_ARCH \
-DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
-Dhipblas_DIR=/usr/lib64/cmake/hipblaslt/ \
&& cmake --build build --config Release -j8
m=models/gpt-oss-20b-mxfp4.gguf
cd $P/llama.cpp && build/bin/llama-server --model $m --threads 8 --port 8005 --host 0.0.0.0 --device ROCm0 --n-gpu-layers 999
Without the --device option it faulted. Maybe because my APU also appears there?
I updated/fixed various related packages: https://src.opensuse.org/ROCm/rocm-examples/pulls/1 https://src.opensuse.org/ROCm/hipblaslt/pulls/1 SR 1320959
benchmark
I benchmarked inference with llama.cpp + gpt-oss-20b-mxfp4.gguf and ROCm offloading to a Radeon RX 9060 XT 16GB. I varied the number of layers that went to the GPU:
- 0 layers 14.49 tokens/s (8 CPU cores)
- 9 layers 17.79 tokens/s 34% VRAM
- 15 layers 22.39 tokens/s 51% VRAM
- 20 layers 27.49 tokens/s 64% VRAM
- 24 layers 41.18 tokens/s 74% VRAM
- 25+ layers 86.63 tokens/s 75% VRAM (only 200% CPU load)
So there is a significant performance-boost if the whole model fits into the GPU's VRAM.
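To put the figures above in relative terms, the short calculation below divides each throughput by the CPU-only baseline. The numbers are copied from the list; the variable names are just for illustration.

```python
# Tokens per second from the benchmark above, keyed by GPU layer count.
baseline_tps = 14.49  # 0 layers on the GPU (8 CPU cores)
results = {9: 17.79, 15: 22.39, 20: 27.49, 24: 41.18, 25: 86.63}

# Speedup of each configuration relative to CPU-only inference.
speedups = {layers: tps / baseline_tps for layers, tps in results.items()}

# Gain from the last step alone (24 layers -> 25 or more layers).
last_step_gain = results[25] / results[24]

for layers, factor in speedups.items():
    print(f"{layers:>2} layers: {factor:.2f}x over CPU-only")
print(f"24 -> 25+ layers: {last_step_gain:.2f}x")
```

Fully offloading the model is roughly six times faster than CPU-only inference, and about half of that gain comes from the final step, when the last layers move to the GPU.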
Update M2Crypto by mcepl
There are a couple of projects I work on that need my attention and need to be put into shape:
Goal for this Hackweek
- Put M2Crypto into better shape (most issues closed, all pull requests processed)
- More fun to learn jujutsu
- Play more with Gemini to see how much it helps (or not).
- Perhaps, also (just slightly related), help to fix vis to work with LuaJIT, particularly to make vis-lspc working.
Liz - Prompt autocomplete by ftorchia
Description
Liz is the Rancher AI assistant for cluster operations.
Goals
We want to help users when sending new messages to Liz, by adding an autocomplete feature to complete their requests based on the context.
Example:
- User prompt: "Can you show me the list of p"
- Autocomplete suggestion: "Can you show me the list of p...od in local cluster?"
Example:
- User prompt: "Show me the logs of #rancher-"
- Chat console: It shows a drop-down widget, next to the # character, with the list of available pod names starting with "rancher-".
Technical Overview
- The AI agent should expose a new ws/autocomplete endpoint to proxy autocomplete messages to the LLM.
- The UI extension should be able to display prompt suggestions and allow users to apply the autocomplete to the Prompt via keyboard shortcuts.
Resources
Move Uyuni Test Framework from Selenium to Playwright + AI by oscar-barrios

Description
This project aims to migrate the existing Uyuni Test Framework from Selenium to Playwright. The move will improve the stability, speed, and maintainability of our end-to-end tests by leveraging Playwright's modern features. We'll be rewriting the current Selenium code in Ruby to Playwright code in TypeScript, which includes updating the test framework runner, step definitions, and configurations. This is also necessary because we're moving from Cucumber Ruby to CucumberJS.
If you're still curious about the AI in the title, it was just a way to grab your attention. Thanks for your understanding.
Nah, let's be honest
AI helped a lot to vibe code a good part of the Ruby methods of the test framework, moving them to TypeScript, along with the migration from Capybara to Playwright. I've been using "Cline" as a plugin for the WebStorm IDE, with the Gemini API behind it.
Goals
- Migrate Core tests including Onboarding of clients
- Improve test reliability: Measure and confirm a significant reduction of flakiness.
- Implement a robust framework: Establish a well-structured and reusable Playwright test framework using CucumberJS.
Resources
- Existing Uyuni Test Framework (Cucumber Ruby + Capybara + Selenium)
- My Template for CucumberJS + Playwright in TypeScript
- Started Hackweek Project