Description
For now installing Uyuni on Kubernetes requires running mgradm on a cluster node... which is not what users would do in the Kubernetes world. The idea is to implement an installation based only on helm charts and probably an operator.
Goals
Install Uyuni from Rancher UI.
Resources
mgradmcode: https://github.com/uyuni-project/uyuni-tools- Uyuni operator: https://github.com/cbosdo/uyuni-operator
Looking for hackers with the skills:
This project is part of:
Hack Week 24
Activity
Comments
-
about 1 year ago by cbosdonnat | Reply
At the end of the hackweek 24, the result is very encouraging:
- The server setup can now run in a Job instead of inside the running deployment
- The server installs correctly and the deployment is ready
- Salt systems can bootstrap when using
LoadBalancerservices on k3s. - Uninstalling the custom server resource cleans everything out of the box.
- The only things the user needs is to define the secrets and SSL certificates or the issuers for cert-manager, as well as an uyuni server custom resource.
The code:
- The operator: https://github.com/cbosdo/uyuni-operator
- The Uyuni-tools changes used by the operator: PR #507
- The Uyuni setup cleanup needed to run the setup in a separate pod: PR #9508
What's next:
- Implement migration from an old RPM-based server
- Implement update / upgrade of the server
- Play with more network setups
- Test with more kubernetes distros
-
about 1 year ago by cbosdonnat | Reply
Marked the project as completed as the initial stage is complete. PRs will eventually be polished and merged
-
about 1 year ago by cbosdonnat | Reply
Demo YAML file and video are available in https://github.com/cbosdo/uyuni-operator/tree/main/docs
Similar Projects
mgr-ansible-ssh - Intelligent, Lightweight CLI for Distributed Remote Execution by deve5h
Description
By the end of Hack Week, the target will be to deliver a minimal functional version 1 (MVP) of a custom command-line tool named mgr-ansible-ssh (a unified wrapper for BOTH ad-hoc shell & playbooks) that allows operators to:
- Execute arbitrary shell commands on thousand of remote machines simultaneously using Ansible Runner with artifacts saved locally.
- Pass runtime options such as inventory file, remote command string/ playbook execution, parallel forks, limits, dry-run mode, or no-std-ansible-output.
- Leverage existing SSH trust relationships without additional setup.
- Provide a clean, intuitive CLI interface with --help for ease of use. It should provide consistent UX & CI-friendly interface.
- Establish a foundation that can later be extended with advanced features such as logging, grouping, interactive shell mode, safe-command checks, and parallel execution tuning.
The MVP should enable day-to-day operations to efficiently target thousands of machines with a single, consistent interface.
Goals
Primary Goals (MVP):
Build a functional CLI tool (mgr-ansible-ssh) capable of executing shell commands on multiple remote hosts using Ansible Runner. Test the tool across a large distributed environment (1000+ machines) to validate its performance and reliability.
Looking forward to significantly reducing the zypper deployment time across all 351 RMT VM servers in our MLM cluster by eliminating the dependency on the taskomatic service, bringing execution down to a fraction of the current duration. The tool should also support multiple runtime flags, such as:
mgr-ansible-ssh: Remote command execution wrapper using Ansible Runner
Usage: mgr-ansible-ssh [--help] [--version] [--inventory INVENTORY]
[--run RUN] [--playbook PLAYBOOK] [--limit LIMIT]
[--forks FORKS] [--dry-run] [--no-ansible-output]
Required Arguments
--inventory, -i Path to Ansible inventory file to use
Any One of the Arguments Is Required
--run, -r Execute the specified shell command on target hosts
--playbook, -p Execute the specified Ansible playbook on target hosts
Optional Arguments
--help, -h Show the help message and exit
--version, -v Show the version and exit
--limit, -l Limit execution to specific hosts or groups
--forks, -f Number of parallel Ansible forks
--dry-run Run in Ansible check mode (requires -p or --playbook)
--no-ansible-output Suppress Ansible stdout output
Secondary/Stretched Goals (if time permits):
- Add pretty output formatting (success/failure summary per host).
- Implement basic logging of executed commands and results.
- Introduce safety checks for risky commands (shutdown, rm -rf, etc.).
- Package the tool so it can be installed with pip or stored internally.
Resources
Collaboration is welcome from anyone interested in CLI tooling, automation, or distributed systems. Skills that would be particularly valuable include:
- Python especially around CLI dev (argparse, click, rich)
Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios
Description
Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.
This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.
The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.
Goals
By the end of Hack Week, we aim to have a single, working Python script that:
- Connects to Prometheus and executes a query to fetch detailed test failure history.
- Processes the raw data into a format suitable for the Gemini API.
- Successfully calls the Gemini API with the data and a clear prompt.
- Parses the AI's response to extract a simple list of flaky tests.
- Saves the list to a JSON file that can be displayed in Grafana.
- New panel in our Dashboard listing the Flaky tests
Resources
- Jenkins Prometheus Exporter: https://github.com/uyuni-project/jenkins-exporter/
- Data Source: Our internal Prometheus server.
- Key Metric:
jenkins_build_test_case_failure_age{jobname, buildid, suite, case, status, failedsince}. - Existing Query for Reference:
count by (suite) (max_over_time(jenkins_build_test_case_failure_age{status=~"FAILED|REGRESSION", jobname="$jobname"}[$__range])). - AI Model: The Google Gemini API.
- Example about how to interact with Gemini API: https://github.com/srbarrios/FailTale/
- Visualization: Our internal Grafana Dashboard.
- Internal IaC: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring
Outcome
- Jenkins Flaky Test Detector: https://github.com/srbarrios/jenkins-flaky-tests-detector and its container
- IaC on MLM Team: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring/jenkinsflakytestsdetector?reftype=heads, https://gitlab.suse.de/galaxy/infrastructure/-/blob/master/srv/salt/monitoring/grafana/dashboards/flaky-tests.json?ref_type=heads, and others.
- Grafana Dashboard: https://grafana.mgr.suse.de/d/flaky-tests/flaky-tests-detection @ @ text
Set Uyuni to manage edge clusters at scale by RDiasMateus
Description
Prepare a Poc on how to use MLM to manage edge clusters. Those cluster are normally equal across each location, and we have a large number of them.
The goal is to produce a set of sets/best practices/scripts to help users manage this kind of setup.
Goals
step 1: Manual set-up
Goal: Have a running application in k3s and be able to update it using System Update Controler (SUC)
- Deploy Micro 6.2 machine
Deploy k3s - single node
- https://docs.k3s.io/quick-start
Build/find a simple web application (static page)
- Build/find a helmchart to deploy the application
Deploy the application on the k3s cluster
Install App updates through helm update
Install OS updates using MLM
step 2: Automate day 1
Goal: Trigger the application deployment and update from MLM
- Salt states For application (with static data)
- Deploy the application helmchart, if not present
- install app updates through helmchart parameters
- Link it to GIT
- Define how to link the state to the machines (based in some pillar data? Using configuration channels by importing the state? Naming convention?)
- Use git update to trigger helmchart app update
- Recurrent state applying configuration channel?
step 3: Multi-node cluster
Goal: Use SUC to update a multi-node cluster.
- Create a multi-node cluster
- Deploy application
- call the helm update/install only on control plane?
- Install App updates through helm update
- Prepare a SUC for OS update (k3s also? How?)
- https://github.com/rancher/system-upgrade-controller
- https://documentation.suse.com/cloudnative/k3s/latest/en/upgrades/automated.html
- Update/deploy the SUC?
- Update/deploy the SUC CRD with the update procedure
Uyuni Saltboot rework by oholecek
Description
When Uyuni switched over to the containerized proxies we had to abandon salt based saltboot infrastructure we had before. Uyuni already had integration with a Cobbler provisioning server and saltboot infra was re-implemented on top of this Cobbler integration.
What was not obvious from the start was that Cobbler, having all it's features, woefully slow when dealing with saltboot size environments. We did some improvements in performance, introduced transactions, and generally tried to make this setup usable. However the underlying slowness remained.
Goals
This project is not something trying to invent new things, it is just finally implementing saltboot infrastructure directly with the Uyuni server core.
Instead of generating grub and pxelinux configurations by Cobbler for all thousands of systems and branches, we will provide a GET access point to retrieve grub or pxelinux file during the boot:
/saltboot/group/grub/$fqdn and similar for systems /saltboot/system/grub/$mac
Next we adapt our tftpd translator to query these points when asked for default or mac based config.
Lastly similar thing needs to be done on our apache server when HTTP UEFI boot is used.
Resources
Uyuni Health-check Grafana AI Troubleshooter by ygutierrez
Description
This project explores the feasibility of using the open-source Grafana LLM plugin to enhance the Uyuni Health-check tool with LLM capabilities. The idea is to integrate a chat-based "AI Troubleshooter" directly into existing dashboards, allowing users to ask natural-language questions about errors, anomalies, or performance issues.
Goals
- Investigate if and how the
grafana-llm-appplug-in can be used within the Uyuni Health-check tool. - Investigate if this plug-in can be used to query LLMs for troubleshooting scenarios.
- Evaluate support for local LLMs and external APIs through the plugin.
- Evaluate if and how the Uyuni MCP server could be integrated as another source of information.
Resources
Self-Scaling LLM Infrastructure Powered by Rancher by ademicev0
Self-Scaling LLM Infrastructure Powered by Rancher

Description
The Problem
Running LLMs can get expensive and complex pretty quickly.
Today there are typically two choices:
- Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
- Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.
What if there was a middle ground?
What if infrastructure scaled itself instead of making you scale it?
Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?
Project Repository: github.com/alexander-demicev/llmserverless
What This Project Does
A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.
A complete, self-scaling LLM infrastructure that:
- Scales to zero when idle (no idle costs)
- Scales up automatically when requests come in
- Adds more nodes when needed, removes them when demand drops
- Runs on any infrastructure - laptop, bare metal, or cloud
Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.
How It Works
A combination of open source tools working together:
Flow:
- Users interact with OpenWebUI (chat interface)
- Requests go to LiteLLM Gateway
- LiteLLM routes requests to:
- Ollama (Knative) for local model inference (auto-scales pods)
- Or cloud APIs for fallback
Rancher/k8s Trouble-Maker by tonyhansen
Project Description
When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.
Goals for Hackweek 25
- Update to modern Rancher and verify that existing tests still work
- Change testing logic to populate secrets instead of requiring a secondary script
- Add new tests
Goals for Hackweek 24 (Complete)
- Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix
- Create at least 5 modules that can be applied to the cluster and require troubleshooting
Resources
- https://github.com/celidon/rancher-troublemaker
- https://github.com/rancher/terraform-provider-rancher2
- https://github.com/rancher/tf-rancher-up
- https://github.com/rancher/quickstart
Kubernetes-Based ML Lifecycle Automation by lmiranda
Description
This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.
The pipeline will automate the lifecycle of a machine learning model, including:
- Data ingestion/collection
- Model training as a Kubernetes Job
- Model artifact storage in an S3-compatible registry (e.g. Minio)
- A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
- A lightweight inference service that loads and serves the latest model
- Monitoring of model performance and service health through Prometheus/Grafana
The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.
Goals
By the end of Hack Week, the project should:
Produce a fully functional ML pipeline running on Kubernetes with:
- Data collection job
- Training job container
- Storage and versioning of trained models
- Automated deployment of new model versions
- Model inference API service
- Basic monitoring dashboards
Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.
Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).
Prepare a short demo explaining the end-to-end process and how new models flow through the system.
Resources
Updates
- Training pipeline and datasets
- Inference Service py
A CLI for Harvester by mohamed.belgaied
Harvester does not officially come with a CLI tool, the user is supposed to interact with Harvester mostly through the UI. Though it is theoretically possible to use kubectl to interact with Harvester, the manipulation of Kubevirt YAML objects is absolutely not user friendly. Inspired by tools like multipass from Canonical to easily and rapidly create one of multiple VMs, I began the development of Harvester CLI. Currently, it works but Harvester CLI needs some love to be up-to-date with Harvester v1.0.2 and needs some bug fixes and improvements as well.
Project Description
Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as:
harvester vm create my-vm --count 5
to create 5 VMs named my-vm-01 to my-vm-05.
Harvester CLI is functional but needs a number of improvements: up-to-date functionality with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an opensuse VM instead of an ubuntu VM, solve some bugs, etc.
Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli
Done in previous Hackweeks
- Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
- Automatically package Harvester CLI for OpenSUSE / Redhat RPMs or DEBs: DONE
Goal for this Hackweek
The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.
Some nice additions might be: * Improve handling of namespaced objects * Add features, such as network management or Load Balancer creation ? * Add more unit tests and, why not, e2e tests * Improve CI * Improve the overall code quality * Test the program and create issues for it
Issue list is here: https://github.com/belgaied2/harvester-cli/issues
Resources
The project is written in Go, and using client-go the Kubernetes Go Client libraries to communicate with the Harvester API (which is Kubernetes in fact).
Welcome contributions are:
- Testing it and creating issues
- Documentation
- Go code improvement
What you might learn
Harvester CLI might be interesting to you if you want to learn more about:
- GitHub Actions
- Harvester as a SUSE Product
- Go programming language
- Kubernetes API
- Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)
Cluster API Provider for Harvester by rcase
Project Description
The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.
The project has been bootstrapped in HackWeek 23, and its code is available here.
Work done in HackWeek 2023
- Have a early working version of the provider available on Rancher Sandbox : *DONE *
- Demonstrated the created cluster can be imported using Rancher Turtles: DONE
- Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo
DONE in HackWeek 24:
- Add more Unit Tests
- Improve Status Conditions for some phases
- Add cloud provider config generation
- Testing with Harvester v1.3.2
- Template improvements
- Issues creation
DONE in 2025 (out of Hackweek)
- Support of ClusterClass
- Add to
clusterctlcommunity providers, you can add it directly withclusterctl - Testing on newer versions of Harvester v1.4.X and v1.5.X
- Support for
clusterctl generate cluster ... - Improve Status Conditions to reflect current state of Infrastructure
- Improve CI (some bugs for release creation)
Goals for HackWeek 2025
- FIRST and FOREMOST, any topic is important to you
- Add e2e testing
- Certify the provider for Rancher Turtles
- Add Machine pool labeling
- Add PCI-e passthrough capabilities.
- Other improvement suggestions are welcome!
Thanks to @isim and Dominic Giebert for their contributions!
Resources
Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.
This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:
Rewrite Distrobox in go (POC) by fabriziosestito
Description
Rewriting Distrobox in Go.
Main benefits:
- Easier to maintain and to test
- Adapter pattern for different container backends (LXC, systemd-nspawn, etc.)
Goals
- Build a minimal starting point with core commands
- Keep the CLI interface compatible: existing users shouldn't notice any difference
- Use a clean Go architecture with adapters for different container backends
- Keep dependencies minimal and binary size small
- Benchmark against the original shell script
Resources
- Upstream project: https://github.com/89luca89/distrobox/
- Distrobox site: https://distrobox.it/
- ArchWiki: https://wiki.archlinux.org/title/Distrobox
A CLI for Harvester by mohamed.belgaied
Harvester does not officially come with a CLI tool, the user is supposed to interact with Harvester mostly through the UI. Though it is theoretically possible to use kubectl to interact with Harvester, the manipulation of Kubevirt YAML objects is absolutely not user friendly. Inspired by tools like multipass from Canonical to easily and rapidly create one of multiple VMs, I began the development of Harvester CLI. Currently, it works but Harvester CLI needs some love to be up-to-date with Harvester v1.0.2 and needs some bug fixes and improvements as well.
Project Description
Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as:
harvester vm create my-vm --count 5
to create 5 VMs named my-vm-01 to my-vm-05.
Harvester CLI is functional but needs a number of improvements: up-to-date functionality with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an opensuse VM instead of an ubuntu VM, solve some bugs, etc.
Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli
Done in previous Hackweeks
- Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
- Automatically package Harvester CLI for OpenSUSE / Redhat RPMs or DEBs: DONE
Goal for this Hackweek
The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.
Some nice additions might be: * Improve handling of namespaced objects * Add features, such as network management or Load Balancer creation ? * Add more unit tests and, why not, e2e tests * Improve CI * Improve the overall code quality * Test the program and create issues for it
Issue list is here: https://github.com/belgaied2/harvester-cli/issues
Resources
The project is written in Go, and using client-go the Kubernetes Go Client libraries to communicate with the Harvester API (which is Kubernetes in fact).
Welcome contributions are:
- Testing it and creating issues
- Documentation
- Go code improvement
What you might learn
Harvester CLI might be interesting to you if you want to learn more about:
- GitHub Actions
- Harvester as a SUSE Product
- Go programming language
- Kubernetes API
- Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)
SUSE Health Check Tools by roseswe
SUSE HC Tools Overview
A collection of tools written in Bash or Go 1.24++ to make life easier with handling of a bunch of tar.xz balls created by supportconfig.
Background: For SUSE HC we receive a bunch of supportconfig tar balls to check them for misconfiguration, areas for improvement or future changes.
Main focus on these HC are High Availability (pacemaker), SLES itself and SAP workloads, esp. around the SUSE best practices.
Goals
- Overall improvement of the tools
- Adding new collectors
- Add support for SLES16
Resources
csv2xls* example.sh go.mod listprodids.txt sumtext* trails.go README.md csv2xls.go exceltest.go go.sum m.sh* sumtext.go vercheck.py* config.ini csvfiles/ getrpm* listprodids* rpmdate.sh* sumxls* verdriver* credtest.go example.py getrpm.go listprodids.go sccfixer.sh* sumxls.go verdriver.go
docollall.sh* extracthtml.go gethostnamectl* go.sum numastat.go cpuvul* extractcluster.go firmwarebug* gethostnamectl.go m.sh* numastattest.go cpuvul.go extracthtml* firmwarebug.go go.mod numastat* xtr_cib.sh*
$ getrpm -r pacemaker
>> Product ID: 2795 (SUSE Linux Enterprise Server for SAP Applications 15 SP7 x86_64), RPM Name:
+--------------+----------------------------+--------+--------------+--------------------+
| Package Name | Version | Arch | Release | Repository |
+--------------+----------------------------+--------+--------------+--------------------+
| pacemaker | 2.1.10+20250718.fdf796ebc8 | x86_64 | 150700.3.3.1 | sle-ha/15.7/x86_64 |
| pacemaker | 2.1.9+20250410.471584e6a2 | x86_64 | 150700.1.9 | sle-ha/15.7/x86_64 |
+--------------+----------------------------+--------+--------------+--------------------+
Total packages found: 2
Create a Cloud-Native policy engine with notifying capabilities to optimize resource usage by gbazzotti
Description
The goal of this project is to begin the initial phase of development of an all-in-one Cloud-Native Policy Engine that notifies resource owners when their resources infringe predetermined policies. This was inspired by a current issue in the CES-SRE Team where other solutions seemed to not exactly correspond to the needs of the specific workloads running on the Public Cloud Team space.
The initial architecture can be checked out on the Repository listed under Resources.
Among the features that will differ this project from other monitoring/notification systems:
- Pre-defined sensible policies written at the software-level, avoiding a learning curve by requiring users to write their own policies
- All-in-one functionality: logging, mailing and all other actions are not required to install any additional plugins/packages
- Easy account management, being able to parse all required configuration by a single JSON file
- Eliminate integrations by not requiring metrics to go through a data-agreggator
Goals
- Create a minimal working prototype following the workflow specified on the documentation
- Provide instructions on installation/usage
- Work on email notifying capabilities
Resources
Create a go module to wrap happy-compta.fr by cbosdonnat
Description
https://happy-compta.fr is a tool for french work councils simple book keeping. While it does the job, it has no API to work with and it is tedious to enter loads of operations.
Goals
Write a go client module to be used as an API to programmatically manipulate the tool.
Writing an example tool to load data from a CSV file would be good too.
