Project Description
The goal is to have a language model, that is able to answer technical questions on Uyuni. Uyuni documentation is too large for in-context processing, so finetuning is the way to go.
Goal for this Hackweek
Finetune a model based on llama-2-7b.
Resources
No Hackers yet
This project is part of:
Hack Week 23
Comments
Be the first to comment!
Similar Projects
Extended private brain - RAG my own scripts and data into offline LLM AI by tjyrinki_suse
Description
For purely studying purposes, I'd like to find out if I could teach an LLM some of my own accumulated knowledge, to use it as a sort of extended brain.
I might use qwen3-coder or something similar as a starting point.
Everything would be done 100% offline without network available to the container, since I prefer to see when network is needed, and make it so it's never needed (other than initial downloads).
Goals
- Learn something about RAG, LLM, AI.
- Find out if everything works offline as intended.
- As an end result have a new way to access my own existing know-how, but so that I can query the wisdom in them.
- Be flexible to pivot in any direction, as long as there are new things learned.
Resources
To be found on the fly.
Timeline
Day 1 (of 4)
- Tried out a RAG demo, expanded on feeding it my own data
- Experimented with qwen3-coder to add a persistent chat functionality, and keeping vectors in a pickle file
- Optimizations to keep everything within context window
- Learn and add a bit of PyTest
Day 2
- More experimenting and more data
- Study ChromaDB
- Add a Web UI that works from another computer even though the container sees network is down
Day 3
- The above RAG is working well enough for demonstration purposes.
- Pivot to trying out OpenCode, configuring local Ollama qwen3-coder there, to analyze the RAG demo.
- Figured out how to configure Ollama template to be usable under OpenCode. OpenCode locally is super slow to just running qwen3-coder alone.
Day 4 (final day)
- Battle with OpenCode that was both slow and kept on piling up broken things.
- Call it success as after all the agentic AI was working locally.
- Clean up the mess left behind a bit.
Blog Post
Summarized the findings at blog post.
Update M2Crypto by mcepl
There are couple of projects I work on, which need my attention and putting them to shape:
Goal for this Hackweek
- Put M2Crypto into better shape (most issues closed, all pull requests processed)
- More fun to learn jujutsu
- Play more with Gemini, how much it help (or not).
- Perhaps, also (just slightly related), help to fix vis to work with LuaJIT, particularly to make vis-lspc working.
Kubernetes-Based ML Lifecycle Automation by lmiranda
Description
This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.
The pipeline will automate the lifecycle of a machine learning model, including:
- Data ingestion/collection
- Model training as a Kubernetes Job
- Model artifact storage in an S3-compatible registry (e.g. Minio)
- A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
- A lightweight inference service that loads and serves the latest model
- Monitoring of model performance and service health through Prometheus/Grafana
The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.
Goals
By the end of Hack Week, the project should:
Produce a fully functional ML pipeline running on Kubernetes with:
- Data collection job
- Training job container
- Storage and versioning of trained models
- Automated deployment of new model versions
- Model inference API service
- Basic monitoring dashboards
Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.
Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).
Prepare a short demo explaining the end-to-end process and how new models flow through the system.
Resources
Updates
- Training pipeline and datasets
- Inference Service py
Backporting patches using LLM by jankara
Description
Backporting Linux kernel fixes (either for CVE issues or as part of general git-fixes workflow) is boring and mostly mechanical work (dealing with changes in context, renamed variables, new helper functions etc.). The idea of this project is to explore usage of LLM for backporting Linux kernel commits to SUSE kernels using LLM.
Goals
- Create safe environment allowing LLM to run and backport patches without exposing the whole filesystem to it (for privacy and security reasons).
- Write prompt that will guide LLM through the backporting process. Fine tune it based on experimental results.
- Explore success rate of LLMs when backporting various patches.
Resources
- Docker
- Gemini CLI
Repository
Current version of the container with some instructions for use are at: https://gitlab.suse.de/jankara/gemini-cli-backporter
SUSE Observability MCP server by drutigliano
Description
The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.
This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.
Goals
- Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
- Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
- Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
- Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.
Hackweek STEP
- Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.
Scope
- Implement read-only MCP server that can:
- Connect to a live SUSE Observability instance and authenticate (with API token)
- Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
- Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
- Return the data as a structured JSON payload compliant with the MCP specification.
Deliverables
- MCP Server v0.1 A running Golang MCP server with at least one tool.
- A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.
Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.
Resources
- https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
- https://www.datadoghq.com/blog/datadog-remote-mcp-server
- https://modelcontextprotocol.io/specification/2025-06-18/index
- https://modelcontextprotocol.io/docs/develop/build-server
Basic implementation
- https://github.com/drutigliano19/suse-observability-mcp-server
Results
Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.
Example execution
Set Uyuni to manage edge clusters at scale by RDiasMateus
Description
Prepare a Poc on how to use MLM to manage edge clusters. Those cluster are normally equal across each location, and we have a large number of them.
The goal is to produce a set of sets/best practices/scripts to help users manage this kind of setup.
Goals
step 1: Manual set-up
Goal: Have a running application in k3s and be able to update it using System Update Controler (SUC)
- Deploy Micro 6.2 machine
Deploy k3s - single node
- https://docs.k3s.io/quick-start
Build/find a simple web application (static page)
- Build/find a helmchart to deploy the application
Deploy the application on the k3s cluster
Install App updates through helm update
Install OS updates using MLM
step 2: Automate day 1
Goal: Trigger the application deployment and update from MLM
- Salt states For application (with static data)
- Deploy the application helmchart, if not present
- install app updates through helmchart parameters
- Link it to GIT
- Define how to link the state to the machines (based in some pillar data? Using configuration channels by importing the state? Naming convention?)
- Use git update to trigger helmchart app update
- Recurrent state applying configuration channel?
step 3: Multi-node cluster
Goal: Use SUC to update a multi-node cluster.
- Create a multi-node cluster
- Deploy application
- call the helm update/install only on control plane?
- Install App updates through helm update
- Prepare a SUC for OS update (k3s also? How?)
- https://github.com/rancher/system-upgrade-controller
- https://documentation.suse.com/cloudnative/k3s/latest/en/upgrades/automated.html
- Update/deploy the SUC?
- Update/deploy the SUC CRD with the update procedure
Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil
Join the Gitter channel! https://gitter.im/uyuni-project/hackweek
Uyuni is a configuration and infrastructure management tool that saves you time and headaches when you have to manage and update tens, hundreds or even thousands of machines. It also manages configuration, can run audits, build image containers, monitor and much more!
Currently there are a few distributions that are completely untested on Uyuni or SUSE Manager (AFAIK) or just not tested since a long time, and could be interesting knowing how hard would be working with them and, if possible, fix whatever is broken.
For newcomers, the easiest distributions are those based on DEB or RPM packages. Distributions with other package formats are doable, but will require adapting the Python and Java code to be able to sync and analyze such packages (and if salt does not support those packages, it will need changes as well). So if you want a distribution with other packages, make sure you are comfortable handling such changes.
No developer experience? No worries! We had non-developers contributors in the past, and we are ready to help as long as you are willing to learn. If you don't want to code at all, you can also help us preparing the documentation after someone else has the initial code ready, or you could also help with testing :-)
The idea is testing Salt (including bootstrapping with bootstrap script) and Salt-ssh clients
To consider that a distribution has basic support, we should cover at least (points 3-6 are to be tested for both salt minions and salt ssh minions):
- Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
- Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)
- Package management (install, remove, update...)
- Patching
- Applying any basic salt state (including a formula)
- Salt remote commands
- Bonus point: Java part for product identification, and monitoring enablement
- Bonus point: sumaform enablement (https://github.com/uyuni-project/sumaform)
- Bonus point: Documentation (https://github.com/uyuni-project/uyuni-docs)
- Bonus point: testsuite enablement (https://github.com/uyuni-project/uyuni/tree/master/testsuite)
If something is breaking: we can try to fix it, but the main idea is research how supported it is right now. Beyond that it's up to each project member how much to hack :-)
- If you don't have knowledge about some of the steps: ask the team
- If you still don't know what to do: switch to another distribution and keep testing.
This card is for EVERYONE, not just developers. Seriously! We had people from other teams helping that were not developers, and added support for Debian and new SUSE Linux Enterprise and openSUSE Leap versions :-)
In progress/done for Hack Week 25
Guide
We started writin a Guide: Adding a new client GNU Linux distribution to Uyuni at https://github.com/uyuni-project/uyuni/wiki/Guide:-Adding-a-new-client-GNU-Linux-distribution-to-Uyuni, to make things easier for everyone, specially those not too familiar wht Uyuni or not technical.
openSUSE Leap 16.0
The distribution will all love!
https://en.opensuse.org/openSUSE:Roadmap#DRAFTScheduleforLeap16.0
Curent Status We started last year, it's complete now for Hack Week 25! :-D
[W]Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file) NOTE: Done, client tools for SLMicro6 are using as those for SLE16.0/openSUSE Leap 16.0 are not available yet[W]Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)[W]Package management (install, remove, update...). Works, even reboot requirement detection
Move Uyuni Test Framework from Selenium to Playwright + AI by oscar-barrios

Description
This project aims to migrate the existing Uyuni Test Framework from Selenium to Playwright. The move will improve the stability, speed, and maintainability of our end-to-end tests by leveraging Playwright's modern features. We'll be rewriting the current Selenium code in Ruby to Playwright code in TypeScript, which includes updating the test framework runner, step definitions, and configurations. This is also necessary because we're moving from Cucumber Ruby to CucumberJS.
If you're still curious about the AI in the title, it was just a way to grab your attention. Thanks for your understanding.
Nah, let's be honest
AI helped a lot to vibe code a good part of the Ruby methods of the Test framework, moving them to Typescript, along with the migration from Capybara to Playwright. I've been using "Cline" as plugin for WebStorm IDE, using Gemini API behind it.
Goals
- Migrate Core tests including Onboarding of clients
- Improve test reliabillity: Measure and confirm a significant reduction of flakiness.
- Implement a robust framework: Establish a well-structured and reusable Playwright test framework using the CucumberJS
Resources
- Existing Uyuni Test Framework (Cucumber Ruby + Capybara + Selenium)
- My Template for CucumberJS + Playwright in TypeScript
- Started Hackweek Project
Ansible to Salt integration by vizhestkov
Description
We already have initial integration of Ansible in Salt with the possibility to run playbooks from the salt-master on the salt-minion used as an Ansible Control node.
In this project I want to check if it possible to make Ansible working on the transport of Salt. Basically run playbooks with Ansible through existing established Salt (ZeroMQ) transport and not using ssh at all.
It could be a good solution for the end users to reuse Ansible playbooks or run Ansible modules they got used to with no effort of complex configuration with existing Salt (or Uyuni/SUSE Multi Linux Manager) infrastructure.
Goals
- [v] Prepare the testing environment with Salt and Ansible installed
- [v] Discover Ansible codebase to figure out possible ways of integration
- [v] Create Salt/Uyuni inventory module
- [v] Make basic modules to work with no using separate ssh connection, but reusing existing Salt connection
- [v] Test some most basic playbooks
Resources
Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios
Description
Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.
This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.
The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.
Goals
By the end of Hack Week, we aim to have a single, working Python script that:
- Connects to Prometheus and executes a query to fetch detailed test failure history.
- Processes the raw data into a format suitable for the Gemini API.
- Successfully calls the Gemini API with the data and a clear prompt.
- Parses the AI's response to extract a simple list of flaky tests.
- Saves the list to a JSON file that can be displayed in Grafana.
- New panel in our Dashboard listing the Flaky tests
Resources
- Jenkins Prometheus Exporter: https://github.com/uyuni-project/jenkins-exporter/
- Data Source: Our internal Prometheus server.
- Key Metric:
jenkins_build_test_case_failure_age{jobname, buildid, suite, case, status, failedsince}. - Existing Query for Reference:
count by (suite) (max_over_time(jenkins_build_test_case_failure_age{status=~"FAILED|REGRESSION", jobname="$jobname"}[$__range])). - AI Model: The Google Gemini API.
- Example about how to interact with Gemini API: https://github.com/srbarrios/FailTale/
- Visualization: Our internal Grafana Dashboard.
- Internal IaC: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring
Outcome
- Jenkins Flaky Test Detector: https://github.com/srbarrios/jenkins-flaky-tests-detector and its container
- IaC on MLM Team: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring/jenkinsflakytestsdetector?reftype=heads, https://gitlab.suse.de/galaxy/infrastructure/-/blob/master/srv/salt/monitoring/grafana/dashboards/flaky-tests.json?ref_type=heads, and others.
- Grafana Dashboard: https://grafana.mgr.suse.de/d/flaky-tests/flaky-tests-detection @ @ text