Description
For now, there is no possible HA setup for Uyuni. The idea is to explore setting up a read-only shadow instance of an Uyuni and make it as useful as possible.
Possible things to look at:
- live sync of the database, probably using the WAL. Some of the tables may have to be skipped or some features disabled on the RO instance (taskomatic, PXT sessions…)
- Can we use a load balancer that routes read-only queries to either instance and the other to the RW one? For example, packages or PXE data can be served by both, the API GET requests too. The rest would be RW.
Goals
- Prepare a document explaining how to do it.
- PR with the needed code changes to support it
Looking for hackers with the skills:
This project is part of:
Hack Week 25
Activity
Comments
-
about 2 months ago by epenchev | Reply
Hi, I think there are a few solutions that might help.
Since I'm dealing a lot with HA and databases, would like to share my thoughts.
One possible solution would be to go with pgpool-II - Scaling PostgreSQL Master-Replica Load Balancing and Automatic Failover.
Such approach is described very much in details -> https://medium.com/@deysouvik700/scaling-postgresql-with-pgpool-ii-master-replica-load-balancing-and-automatic-failover-091983d4dd9a. In the example architecture the PgPool proxy itself is a single point of failure. The example setup could be extended by adding an additional proxy instance. Both proxy instances could be managed by keepalived + VirtualIP config. Of course there are other resources you can refer to as well.
Another possible solution which is kind of more automated would be to go with cnpg. This would require however to have a K8s cluster for your statefull PostgreSQL workload. So ideally you would need at least 3 Nodes HA K8s cluster. This is the minimal setup and all 3 nodes should be (control plane + worker roles) otherwise the standard setup will go up to 5 nodes (control planes and additional worker nodes.) With cnpg you can create multiple services (rw, ro, r) within you cluster and point clients to them https://cloudnative-pg.io/documentation/1.27/service_management/ and https://cloudnative-pg.io/documentation/1.27/architecture/.
Something more experimental that I'm working on recently and hoping to be way easier in operational perspective is https://github.com/kqlite/kqlite. It's a SQLite over the PostgreSQL wire protocol, with support for replication and clustering. However this limits the scope of database functionality down to SQLite only. Unfortunately using any PostgreSQL specific features and data types will not work with kqlite .
P.S. Also there is plenty of documentation on going with the standard approach patroni + HAProxy + etcd.
-
about 2 months ago by cbosdonnat | Reply
The first issue will be the replication of the DB itself. Since we have sequences and those are not logically replicated, we will have to check the possible options there.
-
Similar Projects
mgr-ansible-ssh - Intelligent, Lightweight CLI for Distributed Remote Execution by deve5h
Description
By the end of Hack Week, the target will be to deliver a minimal functional version 1 (MVP) of a custom command-line tool named mgr-ansible-ssh (a unified wrapper for BOTH ad-hoc shell & playbooks) that allows operators to:
- Execute arbitrary shell commands on thousand of remote machines simultaneously using Ansible Runner with artifacts saved locally.
- Pass runtime options such as inventory file, remote command string/ playbook execution, parallel forks, limits, dry-run mode, or no-std-ansible-output.
- Leverage existing SSH trust relationships without additional setup.
- Provide a clean, intuitive CLI interface with --help for ease of use. It should provide consistent UX & CI-friendly interface.
- Establish a foundation that can later be extended with advanced features such as logging, grouping, interactive shell mode, safe-command checks, and parallel execution tuning.
The MVP should enable day-to-day operations to efficiently target thousands of machines with a single, consistent interface.
Goals
Primary Goals (MVP):
Build a functional CLI tool (mgr-ansible-ssh) capable of executing shell commands on multiple remote hosts using Ansible Runner. Test the tool across a large distributed environment (1000+ machines) to validate its performance and reliability.
Looking forward to significantly reducing the zypper deployment time across all 351 RMT VM servers in our MLM cluster by eliminating the dependency on the taskomatic service, bringing execution down to a fraction of the current duration. The tool should also support multiple runtime flags, such as:
mgr-ansible-ssh: Remote command execution wrapper using Ansible Runner
Usage: mgr-ansible-ssh [--help] [--version] [--inventory INVENTORY]
[--run RUN] [--playbook PLAYBOOK] [--limit LIMIT]
[--forks FORKS] [--dry-run] [--no-ansible-output]
Required Arguments
--inventory, -i Path to Ansible inventory file to use
Any One of the Arguments Is Required
--run, -r Execute the specified shell command on target hosts
--playbook, -p Execute the specified Ansible playbook on target hosts
Optional Arguments
--help, -h Show the help message and exit
--version, -v Show the version and exit
--limit, -l Limit execution to specific hosts or groups
--forks, -f Number of parallel Ansible forks
--dry-run Run in Ansible check mode (requires -p or --playbook)
--no-ansible-output Suppress Ansible stdout output
Secondary/Stretched Goals (if time permits):
- Add pretty output formatting (success/failure summary per host).
- Implement basic logging of executed commands and results.
- Introduce safety checks for risky commands (shutdown, rm -rf, etc.).
- Package the tool so it can be installed with pip or stored internally.
Resources
Collaboration is welcome from anyone interested in CLI tooling, automation, or distributed systems. Skills that would be particularly valuable include:
- Python especially around CLI dev (argparse, click, rich)
Uyuni Health-check Grafana AI Troubleshooter by ygutierrez
Description
This project explores the feasibility of using the open-source Grafana LLM plugin to enhance the Uyuni Health-check tool with LLM capabilities. The idea is to integrate a chat-based "AI Troubleshooter" directly into existing dashboards, allowing users to ask natural-language questions about errors, anomalies, or performance issues.
Goals
- Investigate if and how the
grafana-llm-appplug-in can be used within the Uyuni Health-check tool. - Investigate if this plug-in can be used to query LLMs for troubleshooting scenarios.
- Evaluate support for local LLMs and external APIs through the plugin.
- Evaluate if and how the Uyuni MCP server could be integrated as another source of information.
Resources
Enable more features in mcp-server-uyuni by j_renner
Description
I would like to contribute to mcp-server-uyuni, the MCP server for Uyuni / Multi-Linux Manager) exposing additional features as tools. There is lots of relevant features to be found throughout the API, for example:
- System operations and infos
- System groups
- Maintenance windows
- Ansible
- Reporting
- ...
At the end of the week I managed to enable basic system group operations:
- List all system groups visible to the user
- Create new system groups
- List systems assigned to a group
- Add and remove systems from groups
Goals
- Set up test environment locally with the MCP server and client + a recent MLM server [DONE]
- Identify features and use cases offering a benefit with limited effort required for enablement [DONE]
- Create a PR to the repo [DONE]
Resources
Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios
Description
Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.
This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.
The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.
Goals
By the end of Hack Week, we aim to have a single, working Python script that:
- Connects to Prometheus and executes a query to fetch detailed test failure history.
- Processes the raw data into a format suitable for the Gemini API.
- Successfully calls the Gemini API with the data and a clear prompt.
- Parses the AI's response to extract a simple list of flaky tests.
- Saves the list to a JSON file that can be displayed in Grafana.
- New panel in our Dashboard listing the Flaky tests
Resources
- Jenkins Prometheus Exporter: https://github.com/uyuni-project/jenkins-exporter/
- Data Source: Our internal Prometheus server.
- Key Metric:
jenkins_build_test_case_failure_age{jobname, buildid, suite, case, status, failedsince}. - Existing Query for Reference:
count by (suite) (max_over_time(jenkins_build_test_case_failure_age{status=~"FAILED|REGRESSION", jobname="$jobname"}[$__range])). - AI Model: The Google Gemini API.
- Example about how to interact with Gemini API: https://github.com/srbarrios/FailTale/
- Visualization: Our internal Grafana Dashboard.
- Internal IaC: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring
Outcome
- Jenkins Flaky Test Detector: https://github.com/srbarrios/jenkins-flaky-tests-detector and its container
- IaC on MLM Team: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring/jenkinsflakytestsdetector?reftype=heads, https://gitlab.suse.de/galaxy/infrastructure/-/blob/master/srv/salt/monitoring/grafana/dashboards/flaky-tests.json?ref_type=heads, and others.
- Grafana Dashboard: https://grafana.mgr.suse.de/d/flaky-tests/flaky-tests-detection @ @ text
Uyuni Saltboot rework by oholecek
Description
When Uyuni switched over to the containerized proxies we had to abandon salt based saltboot infrastructure we had before. Uyuni already had integration with a Cobbler provisioning server and saltboot infra was re-implemented on top of this Cobbler integration.
What was not obvious from the start was that Cobbler, having all it's features, woefully slow when dealing with saltboot size environments. We did some improvements in performance, introduced transactions, and generally tried to make this setup usable. However the underlying slowness remained.
Goals
This project is not something trying to invent new things, it is just finally implementing saltboot infrastructure directly with the Uyuni server core.
Instead of generating grub and pxelinux configurations by Cobbler for all thousands of systems and branches, we will provide a GET access point to retrieve grub or pxelinux file during the boot:
/saltboot/group/grub/$fqdn and similar for systems /saltboot/system/grub/$mac
Next we adapt our tftpd translator to query these points when asked for default or mac based config.
Lastly similar thing needs to be done on our apache server when HTTP UEFI boot is used.
Resources
Casky – Lightweight C Key-Value Engine with Crash Recovery by pperego
Description
Casky is a lightweight, crash-safe key-value store written in C, designed for fast storage and retrieval of data with a minimal footprint. Built using Test-Driven Development (TDD), Casky ensures reliability while keeping the codebase clean and maintainable. It is inspired by Bitcask and aims to provide a simple, embeddable storage engine that can be integrated into microservices, IoT devices, and other C-based applications.
Objectives:
- Implement a minimal key-value store with append-only file storage.
- Support crash-safe persistence and recovery.
- Expose a simple public API: store(key, value), load(key), delete(key).
- Follow TDD methodology for robust and testable code.
- Provide a foundation for future extensions, such as in-memory caching, compaction, and eventual integration with vector-based databases like PixelDB.
Why This Project is Interesting:
Casky combines low-level C programming with modern database concepts, making it an ideal playground to explore storage engines, crash safety, and performance optimization. It’s small enough to complete during Hackweek, yet it provides a solid base for future experiments and more complex projects.
Goals
- Working prototype with append-only storage and memtable.
- TDD test suite covering core functionality and recovery.
- Demonstration of basic operations: insert, load, delete.
- Optional bonus: LRU caching, file compaction, performance benchmarks.
Future Directions:
After Hackweek, Casky can evolve into a backend engine for projects like PixelDB, supporting vector storage and approximate nearest neighbor search, combining low-level performance with cutting-edge AI retrieval applications.
Resources
The Bitcask paper: https://riak.com/assets/bitcask-intro.pdf The Casky repository: https://github.com/thesp0nge/casky
Day 1
[0.10.0] - 2025-12-01
Added
- Core in-memory KeyDir and EntryNode structures
- API functions: caskyopen, caskyclose, caskyput, caskyget, casky_delete
- Hash function: caskydjb2hash_xor
- Error handling via casky_errno
- Unit tests for all APIs using standard asserts
- Test cleanup of temporary files
Changed
- None (first MVP)
Fixed
- None (first MVP)
Day 2
[0.20.0] - 2025-12-02
Port the classic browser game HackTheNet to PHP 8 by dgedon
Description
The classic browser game HackTheNet from 2004 still runs on PHP 4/5 and MySQL 5 and needs a port to PHP 8 and e.g. MariaDB.
Goals
- Port the game to PHP 8 and MariaDB 11
- Create a container where the game server can simply be started/stopped
Resources
- https://github.com/nodeg/hackthenet
Sim racing track database by avicenzi
Description
Do you wonder which tracks are available in each sim racing game? Wonder no more.
Goals
Create a simple website that includes details about sim racing games.
The website should be static and built with Alpine.JS and TailwindCSS. Data should be consumed from JSON, easily done with Alpine.JS.
The main goal is to gather track information, because tracks vary by game. Older games might have older layouts, and newer games might have up-to-date layouts. Some games include historical layouts, some are laser scanned. Many tracks are available as DLCs.
Initially include official tracks from:
- ACC
- iRacing
- PC2
- LMU
- Raceroom
- Rennsport
These games have a short list of tracks and DLCs.
Resources
The hardest part is collecting information about tracks in each game. Active games usually have information on their website or even on Steam. Older games might be on Fandom or a Wiki. Real track information can be extracted from Wikipedia or the track website.
Work on kqlite (Lightweight remote SQLite with high availability and auto failover). by epenchev
Description
Continue the work on kqlite (Lightweight remote SQLite with high availability and auto failover).
It's a solution for applications that require High Availability but don't need all the features of a complete RDBMS and can fit SQLite in their use case.
Also kqlite can be considered to be used as a lightweight storage backend for K8s (https://docs.k3s.io/datastore) and the Edge, and allowing to have only 2 Nodes for HA.
Goals
Push kqlite to a beta version.
kqlite as library for Go programs.
Resources
https://github.com/kqlite/kqlite
Collection and organisation of information about Bulgarian schools by iivanov
Description
To achieve this it will be necessary:
- Collect/download raw data from various government and non-governmental organizations
- Clean up raw data and organise it in some kind database.
- Create tool to make queries easy.
- Or perhaps dump all data into AI and ask questions in natural language.
Goals
By selecting particular school information like this will be provided:
- School scores on national exams.
- School scores from the external evaluations exams.
- School town, municipality and region.
- Employment rate in a town or municipality.
- Average health of the population in the region.
Resources
Some of these are available only in bulgarian.
- https://danybon.com/klasazia
- https://nvoresults.com/index.html
- https://ri.mon.bg/active-institutions
- https://www.nsi.bg/nrnm/ekatte/archive
Results
- Information about all Bulgarian schools with their scores during recent years cleaned and organised into SQL tables
- Information about all Bulgarian villages, cities, municipalities and districts cleaned and organised into SQL tables
- Information about all Bulgarian villages and cities census since beginning of this century cleaned and organised into SQL tables.
- Information about all Bulgarian municipalities about religion, ethnicity cleaned and organised into SQL tables.
- Data successfully loaded to locally running Ollama with help to Vanna.AI
- Seems to be usable.
TODO
- Add more statistical information about municipalities and ....
Code and data