Project Description

The goal is to have a language model, that is able to answer technical questions on Uyuni. Uyuni documentation is too large for in-context processing, so finetuning is the way to go.

Goal for this Hackweek

Finetune a model based on llama-2-7b.

Resources

github repo

Looking for hackers with the skills:

ai uyuni

This project is part of:

Hack Week 23

Activity

  • about 2 years ago: nadvornik added keyword "ai" to this project.
  • about 2 years ago: nadvornik added keyword "uyuni" to this project.
  • about 2 years ago: nadvornik originated this project.

  • Comments

    Be the first to comment!

    Similar Projects

    Background Coding Agent by mmanno

    Description

    I had only bad experiences with AI one-shots. However, monitoring agent work closely and interfering often did result in productivity gains.

    Now, other companies are using agents in pipelines. That makes sense to me, just like CI, we want to offload work to pipelines: Our engineering teams are consistently slowed down by "toil": low-impact, repetitive maintenance tasks. A simple linter rule change, a dependency bump, rebasing patch-sets on top of newer releases or API deprecation requires dozens of manual PRs, draining time from feature development.

    So far we have been writing deterministic, script-based automation for these tasks. And it turns out to be a common trap. These scripts are brittle, complex, and become a massive maintenance burden themselves.

    Can we make prompts and workflows smart enough to succeed at background coding?

    Goals

    We will build a platform that allows engineers to execute complex code transformations using prompts.

    By automating this toil, we accelerate large-scale migrations and allow teams to focus on high-value work.

    Our platform will consist of three main components:

    • "Change" Definition: Engineers will define a transformation as a simple, declarative manifest:
      • The target repositories.
      • A wrapper to run a "coding agent", e.g., "gemini-cli".
      • The task as a natural language prompt.
    • "Change" Management Service: A central service that orchestrates the jobs. It will receive Change definitions and be responsible for the job lifecycle.
    • Execution Runners: We could use existing sandboxed CI runners (like GitHub/GitLab runners) to execute each job or spawn a container.

    MVP

    • Define the Change manifest format.
    • Build the core Management Service that can accept and queue a Change.
    • Connect management service and runners, dynamically dispatch jobs to runners.
    • Create a basic runner script that can run a hard-coded prompt against a test repo and open a PR.

    Stretch Goals:

    • Multi-layered approach, Workflow Agents trigger Coding Agents:
      1. Workflow Agent: Gather information about the task interactively from the user.
      2. Coding Agent: Once the interactive agent has refined the task into a clear prompt, it hands this prompt off to the "coding agent." This background agent is responsible for executing the task and producing the actual pull request.
    • Use MCP:
      1. Workflow Agent gathers context information from Slack, Github, etc.
      2. Workflow Agent triggers a Coding Agent.
    • Create a "Standard Task" library with reliable prompts.
      1. Rebasing rancher-monitoring to a new version of kube-prom-stack
      2. Update charts to use new images
      3. Apply changes to comply with a new linter
      4. Bump complex Go dependencies, like k8s modules
      5. Backport pull requests to other branches
    • Add “review agents” that review the generated PR.

    See also


    Self-Scaling LLM Infrastructure Powered by Rancher by ademicev0

    Self-Scaling LLM Infrastructure Powered by Rancher

    logo


    Description

    The Problem

    Running LLMs can get expensive and complex pretty quickly.

    Today there are typically two choices:

    1. Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
    2. Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.

    What if there was a middle ground?

    What if infrastructure scaled itself instead of making you scale it?

    Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?

    Project Repository: github.com/alexander-demicev/llmserverless


    What This Project Does

    A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.

    A complete, self-scaling LLM infrastructure that:

    • Scales to zero when idle (no idle costs)
    • Scales up automatically when requests come in
    • Adds more nodes when needed, removes them when demand drops
    • Runs on any infrastructure - laptop, bare metal, or cloud

    Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.

    How It Works

    A combination of open source tools working together:

    Flow:

    • Users interact with OpenWebUI (chat interface)
    • Requests go to LiteLLM Gateway
    • LiteLLM routes requests to:
      • Ollama (Knative) for local model inference (auto-scales pods)
      • Or cloud APIs for fallback


    Docs Navigator MCP: SUSE Edition by mackenzie.techdocs

    MCP Docs Navigator: SUSE Edition

    Description

    Docs Navigator MCP: SUSE Edition is an AI-powered documentation navigator that makes finding information across SUSE, Rancher, K3s, and RKE2 documentation effortless. Built as a Model Context Protocol (MCP) server, it enables semantic search, intelligent Q&A, and documentation summarization using 100% open-source AI models (no API keys required!). The project also allows you to bring your own keys from Anthropic and Open AI for parallel processing.

    Goals

    • [ X ] Build functional MCP server with documentation tools
    • [ X ] Implement semantic search with vector embeddings
    • [ X ] Create user-friendly web interface
    • [ X ] Optimize indexing performance (parallel processing)
    • [ X ] Add SUSE branding and polish UX
    • [ X ] Stretch Goal: Add more documentation sources
    • [ X ] Stretch Goal: Implement document change detection for auto-updates

    Coming Soon!

    • Community Feedback: Test with real users and gather improvement suggestions

    Resources


    GenAI-Powered Systemic Bug Evaluation and Management Assistant by rtsvetkov

    Motivation

    What is the decision critical question which one can ask on a bug? How this question affects the decision on a bug and why?

    Let's make GenAI look on the bug from the systemic point and evaluate what we don't know. Which piece of information is missing to take a decision?

    Description

    To build a tool that takes a raw bug report (including error messages and context) and uses a large language model (LLM) to generate a series of structured, Socratic-style or Systemic questions designed to guide a the integration and development toward the root cause, rather than just providing a direct, potentially incorrect fix.

    Goals

    Set up a Python environment

    Set the environment and get a Gemini API key. 2. Collect 5-10 realistic bug reports (from open-source projects, personal projects, or public forums like Stack Overflow—include the error message and the initial context).

    Build the Dialogue Loop

    1. Write a basic Python script using the Gemini API.
    2. Implement a simple conversational loop: User Input (Bug) -> AI Output (Question) -> User Input (Answer to AI's question) -> AI Output (Next Question). Code Implementation

    Socratic/Systemic Strategy Implementation

    1. Refine the logic to ensure the questions follow a Socratic and Systemic path (e.g., from symptom-> context -> assumptions -> -> critical parts -> ).
    2. Implement Function Calling (an advanced feature of the Gemini API) to suggest specific actions to the user, like "Run a ping test" or "Check the database logs."
    3. Implement Bugzillla call to collect the
    4. Implement Questioning Framework as LLVM pre-conditioning
    5. Define set of instructions
    6. Assemble the Tool

    Resources

    What are Systemic Questions?

    Systemic questions explore the relationships, patterns, and interactions within a system rather than focusing on isolated elements.
    In IT, they help uncover hidden dependencies, feedback loops, assumptions, and side-effects during debugging or architecture analysis.

    Gitlab Project

    gitlab.suse.de/sle-prjmgr/BugDecisionCritical_Question


    Kubernetes-Based ML Lifecycle Automation by lmiranda

    Description

    This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.

    The pipeline will automate the lifecycle of a machine learning model, including:

    • Data ingestion/collection
    • Model training as a Kubernetes Job
    • Model artifact storage in an S3-compatible registry (e.g. Minio)
    • A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
    • A lightweight inference service that loads and serves the latest model
    • Monitoring of model performance and service health through Prometheus/Grafana

    The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.

    Goals

    By the end of Hack Week, the project should:

    1. Produce a fully functional ML pipeline running on Kubernetes with:

      • Data collection job
      • Training job container
      • Storage and versioning of trained models
      • Automated deployment of new model versions
      • Model inference API service
      • Basic monitoring dashboards
    2. Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.

    3. Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).

    4. Prepare a short demo explaining the end-to-end process and how new models flow through the system.

    Resources

    Project Repository

    Updates

    1. Training pipeline and datasets
    2. Inference Service py


    Uyuni read-only replica by cbosdonnat

    Description

    For now, there is no possible HA setup for Uyuni. The idea is to explore setting up a read-only shadow instance of an Uyuni and make it as useful as possible.

    Possible things to look at:

    • live sync of the database, probably using the WAL. Some of the tables may have to be skipped or some features disabled on the RO instance (taskomatic, PXT sessions…)
    • Can we use a load balancer that routes read-only queries to either instance and the other to the RW one? For example, packages or PXE data can be served by both, the API GET requests too. The rest would be RW.

    Goals

    • Prepare a document explaining how to do it.
    • PR with the needed code changes to support it


    Uyuni Health-check Grafana AI Troubleshooter by ygutierrez

    Description

    This project explores the feasibility of using the open-source Grafana LLM plugin to enhance the Uyuni Health-check tool with LLM capabilities. The idea is to integrate a chat-based "AI Troubleshooter" directly into existing dashboards, allowing users to ask natural-language questions about errors, anomalies, or performance issues.

    Goals

    • Investigate if and how the grafana-llm-app plug-in can be used within the Uyuni Health-check tool.
    • Investigate if this plug-in can be used to query LLMs for troubleshooting scenarios.
    • Evaluate support for local LLMs and external APIs through the plugin.
    • Evaluate if and how the Uyuni MCP server could be integrated as another source of information.

    Resources

    Grafana LMM plug-in

    Uyuni Health-check


    Ansible to Salt integration by vizhestkov

    Description

    We already have initial integration of Ansible in Salt with the possibility to run playbooks from the salt-master on the salt-minion used as an Ansible Control node.

    In this project I want to check if it possible to make Ansible working on the transport of Salt. Basically run playbooks with Ansible through existing established Salt (ZeroMQ) transport and not using ssh at all.

    It could be a good solution for the end users to reuse Ansible playbooks or run Ansible modules they got used to with no effort of complex configuration with existing Salt (or Uyuni/SUSE Multi Linux Manager) infrastructure.

    Goals

    • [v] Prepare the testing environment with Salt and Ansible installed
    • [v] Discover Ansible codebase to figure out possible ways of integration
    • [v] Create Salt/Uyuni inventory module
    • [v] Make basic modules to work with no using separate ssh connection, but reusing existing Salt connection
    • [v] Test some most basic playbooks

    Resources

    GitHub page

    Video of the demo


    Set Up an Ephemeral Uyuni Instance by mbussolotto

    Description

    To test, check, and verify the latest changes in the master branch, we want to easily set up an ephemeral environment.

    Goals

    • Create an ephemeral environment manually
    • Create an ephemeral environment automatically

      Resources

    • https://github.com/uyuni-project/uyuni

    • https://www.uyuni-project.org/uyuni-docs/en/uyuni/index.html


    Set Uyuni to manage edge clusters at scale by RDiasMateus

    Description

    Prepare a Poc on how to use MLM to manage edge clusters. Those cluster are normally equal across each location, and we have a large number of them.

    The goal is to produce a set of sets/best practices/scripts to help users manage this kind of setup.

    Goals

    step 1: Manual set-up

    Goal: Have a running application in k3s and be able to update it using System Update Controler (SUC)

    • Deploy Micro 6.2 machine
    • Deploy k3s - single node

      • https://docs.k3s.io/quick-start
    • Build/find a simple web application (static page)

      • Build/find a helmchart to deploy the application
    • Deploy the application on the k3s cluster

    • Install App updates through helm update

    • Install OS updates using MLM

    step 2: Automate day 1

    Goal: Trigger the application deployment and update from MLM

    • Salt states For application (with static data)
      • Deploy the application helmchart, if not present
      • install app updates through helmchart parameters
    • Link it to GIT
      • Define how to link the state to the machines (based in some pillar data? Using configuration channels by importing the state? Naming convention?)
      • Use git update to trigger helmchart app update
    • Recurrent state applying configuration channel?

    step 3: Multi-node cluster

    Goal: Use SUC to update a multi-node cluster.

    • Create a multi-node cluster
    • Deploy application
      • call the helm update/install only on control plane?
    • Install App updates through helm update
    • Prepare a SUC for OS update (k3s also? How?)
      • https://github.com/rancher/system-upgrade-controller
      • https://documentation.suse.com/cloudnative/k3s/latest/en/upgrades/automated.html
      • Update/deploy the SUC?
      • Update/deploy the SUC CRD with the update procedure