Self-Scaling LLM Infrastructure Powered by Rancher

logo


Description

The Problem

Running LLMs can get expensive and complex pretty quickly.

Today there are typically two choices:

  1. Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
  2. Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.

What if there was a middle ground?

What if infrastructure scaled itself instead of making you scale it?

Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?

Project Repository: github.com/alexander-demicev/llmserverless


What This Project Does

A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.

A complete, self-scaling LLM infrastructure that:

  • Scales to zero when idle (no idle costs)
  • Scales up automatically when requests come in
  • Adds more nodes when needed, removes them when demand drops
  • Runs on any infrastructure - laptop, bare metal, or cloud

Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.

How It Works

A combination of open source tools working together:

Flow:

  • Users interact with OpenWebUI (chat interface)
  • Requests go to LiteLLM Gateway
  • LiteLLM routes requests to:
    • Ollama (Knative) for local model inference (auto-scales pods)
    • Or cloud APIs for fallback
  • Cluster Autoscaler scales nodes up/down as needed
  • Fleet keeps everything in sync via GitOps

Goals

  • Provide a middle ground between cloud APIs and self-hosted LLMs
  • Enable cost-efficient, privacy-preserving, and flexible LLM deployments
  • Make LLM infrastructure easy to deploy and manage (Helm chart, GitOps)
  • Support local development and production scaling
  • Experiment with hybrid routing, serverless scaling, and GitOps automation

Resources

Features

  • Packaged as a Helm chart: The entire stack is delivered as a Helm chart for easy deployment. See the project repository for setup instructions.
  • Scale to Zero: No requests? No pods. No pods? No nodes (well, minimum 1). LLM infrastructure costs nothing when idle.
  • Hybrid Routing: Simple requests can use public APIs, while complex or private queries are handled in-house, balancing cost and privacy.
  • GitOps Native: Everything is Fleet bundles.
  • Local Development Ready: Uses KIND + Docker provider for local dev. Same architecture that scales to production.

Tech Stack

  • Rancher 2.13 - Cluster management (Turtles is now built-in!)
  • Cluster API - Infrastructure as Kubernetes resources
  • Knative Serving - Serverless pod autoscaling
  • Ollama - Run LLMs locally
  • LiteLLM - Unified LLM API gateway
  • OpenWebUI - Chat interface
  • Fleet - GitOps deployment

What's Next

This is a hackweek project, but here are ideas for the future:

  • GPU node pools for production workloads
  • Cloud provider templates (AWS/Azure/GCP)
  • Smarter routing based on prompt complexity
  • Cost tracking dashboard
  • Response caching

Setup & Usage

For all setup and usage instructions, please refer to the project repository.

Why This Matters

LLMs are becoming a core part of many applications. But infrastructure options are still catching up.

This project explores a middle path:

  • Privacy - run models locally, keep data in-house
  • Cost efficiency - scale to zero, pay only for actual usage
  • Flexibility - mix local and cloud models based on needs
  • Simplicity - one command to deploy, GitOps to manage

It's an experiment in making LLM infrastructure more accessible and practical.


Updates

  • Update 1: Pushed some project prototype I had before along with changes needed to run it on most recent Rancher version
  • Update 2: Added multiple improvements for POC

Hackweek Results and Conclusion

Project Repository: github.com/alexander-demicev/llmserverless

The main conclusion is that it’s already possible to build something like this using the existing Rancher provisioning and management features. However, there are still a few questions and areas to improve for the future:

  • The POC is based on Kubeadm, it can and should be migrated to RKE2.
  • The SUSE AI stack wasn’t used for the sake of time efficiency, the goal was to assemble something that might currently be missing from it.
  • Cluster Autoscaler is getting support in Rancher, so the POC should be updated to use the autoscaler setup recommended by Rancher.
  • I’m not sure Knative is the best tool for self-scaling, maybe Keda would be a better alternative? I found Knative a bit complicated to configure and use, and it might be an overhead for the scope we have.

Looking for hackers with the skills:

rancher ai ll llm kubernetes

This project is part of:

Hack Week 25

Activity

  • 18 days ago: pgonin liked this project.
  • 20 days ago: ademicev0 liked this project.
  • 20 days ago: ademicev0 added keyword "llm" to this project.
  • 20 days ago: ademicev0 added keyword "kubernetes" to this project.
  • 20 days ago: ademicev0 added keyword "rancher" to this project.
  • 20 days ago: ademicev0 added keyword "ai" to this project.
  • 20 days ago: ademicev0 added keyword "ll" to this project.
  • 20 days ago: ademicev0 started this project.
  • 20 days ago: ademicev0 originated this project.

  • Comments

    Be the first to comment!

    Similar Projects

    Cluster API Provider for Harvester by rcase

    Project Description

    The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.

    The project has been bootstrapped in HackWeek 23, and its code is available here.

    Work done in HackWeek 2023

    • Have a early working version of the provider available on Rancher Sandbox : *DONE *
    • Demonstrated the created cluster can be imported using Rancher Turtles: DONE
    • Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo

    DONE in HackWeek 24:

    DONE in 2025 (out of Hackweek)

    • Support of ClusterClass
    • Add to clusterctl community providers, you can add it directly with clusterctl
    • Testing on newer versions of Harvester v1.4.X and v1.5.X
    • Support for clusterctl generate cluster ...
    • Improve Status Conditions to reflect current state of Infrastructure
    • Improve CI (some bugs for release creation)

    Goals for HackWeek 2025

    • FIRST and FOREMOST, any topic is important to you
    • Add e2e testing
    • Certify the provider for Rancher Turtles
    • Add Machine pool labeling
    • Add PCI-e passthrough capabilities.
    • Other improvement suggestions are welcome!

    Thanks to @isim and Dominic Giebert for their contributions!

    Resources

    Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.

    This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:


    SUSE Virtualization (Harvester): VM Import UI flow by wombelix

    Description

    SUSE Virtualization (Harvester) has a vm-import-controller that allows migrating VMs from VMware and OpenStack, but users need to write manifest files and apply them with kubectl to use it. This project is about adding the missing UI pieces to the harvester-ui-extension, making VM Imports accessible without requiring Kubernetes and YAML knowledge.

    VMware and OpenStack admins aren't automatically familiar with Kubernetes and YAML. Implementing the UI part for the VM Import feature makes it easier to use and more accessible. The Harvester Enhancement Proposal (HEP) VM Migration controller included a UI flow implementation in its scope. Issue #2274 received multiple comments that an UI integration would be a nice addition, and issue #4663 was created to request the implementation but eventually stalled.

    Right now users need to manually create either VmwareSource or OpenstackSource resources, then write VirtualMachineImport manifests with network mappings and all the other configuration options. Users should be able to do that and track import status through the UI without writing YAML.

    Work during the Hack Week will be done in this fork in a branch called suse-hack-week-25, making progress publicly visible and open for contributions. When everything works out and the branch is in good shape, it will be submitted as a pull request to harvester-ui-extension to get it included in the next Harvester release.

    Testing will focus on VMware since that's what is available in the lab environment (SUSE Virtualization 1.6 single-node cluster, ESXi 8.0 standalone host). Given that this is about UI and surfacing what the vm-import-controller handles, the implementation should work for OpenStack imports as well.

    This project is also a personal challenge to learn vue.js and get familiar with Rancher Extensions development, since harvester-ui-extension is built on that framework.

    Goals

    • Learn Vue.js and Rancher Extensions fundamentals required to finish the project
    • Read and learn from other Rancher UI Extensions code, especially understanding the harvester-ui-extension code base
    • Understand what the vm-import-controller and its CRDs require, identify ready to use components in the Rancher UI Extension API that can be leveraged
    • Implement UI logic for creating and managing VmwareSource / OpenstackSource and VirtualMachineImport resources with all relevant configuration options and credentials
    • Implemnt UI elements to display VirtualMachineImport status and errors

    Resources

    HEP and related discussion

    SUSE Virtualization VM Import Documentation

    Rancher Extensions Documentation

    Rancher UI Plugin Examples

    Vue Router Essentials

    Vue Router API

    Vuex Documentation


    Rancher Cluster Lifecycle Visualizer by jferraz

    Description

    Rancher’s v2 provisioning system represents each downstream cluster with several Kubernetes custom resources across multiple API groups, such as clusters.provisioning.cattle.io and clusters.management.cattle.io. Understanding why a cluster is stuck in states like "Provisioning", "Updating", or "Unavailable" often requires jumping between these resources, reading conditions, and correlating them with agent connectivity and known failure modes. This project will build a Cluster Lifecycle Visualizer: a small, read-only controller that runs in the Rancher management cluster and generates a single, human-friendly view per cluster. It will watch Rancher cluster CRDs, derive a simplified lifecycle phase, keep a history of phase transitions from installation time onward, and attach a short, actionable recommendation string that hints at what the operator should check or do next.

    Goals

    • Provide a compact lifecycle summary for each Rancher-managed cluster (e.g. Provisioning, WaitingForClusterAgent, Active, Updating, Error) derived from provisioning.cattle.io/v1 Cluster and management.cattle.io/v3 Cluster status and conditions.
    • Maintain a phase history for each cluster, allowing operators to see how its state evolved over time since the visualizer was installed.
    • Attach a recommended action to the current phase using a small ruleset based on common Rancher failure modes (for example, cluster agent not connected, cluster still stabilizing after an upgrade, or generic error states), to improve the day-to-day debugging experience.
    • Deliver an easy-to-install, read-only component (single YAML or small Helm chart) that Rancher users can deploy to their management cluster and inspect via kubectl get/describe, without UI changes or direct access to downstream clusters.
    • Use idiomatic Go, wrangler, and Rancher APIs.

    Resources

    • Rancher Manager documentation on RKE2 and K3s cluster configuration and provisioning flows.
    • Rancher API Go types for provisioning.cattle.io/v1 and management.cattle.io/v3 (from the rancher/rancher repository or published Go packages).
    • Existing Rancher architecture docs and internal notes about cluster provisioning, cluster agents, and node agents.
    • A local Rancher management cluster (k3s or RKE2) with a few test downstream clusters to validate phase detection, history tracking, and recommendations.


    The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

    Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. A GitHub robot mascot trying to lasso a blue bull with a Kubernetes logo tatooed on it


    The Plan

    Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

    Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:


    The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

    The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

    Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.


    If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

    Why?

    We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

    The CONCLUSION!!!

    A add-emoji State of the Union add-emoji document was compiled to summarize lessons learned this week. For more gory details, just read on the diary below! add-emoji


    Liz - Prompt autocomplete by ftorchia

    Description

    Liz is the Rancher AI assistant for cluster operations.

    Goals

    We want to help users when sending new messages to Liz, by adding an autocomplete feature to complete their requests based on the context.

    Example:

    • User prompt: "Can you show me the list of p"
    • Autocomplete suggestion: "Can you show me the list of p...od in local cluster?"

    Example:

    • User prompt: "Show me the logs of #rancher-"
    • Chat console: It shows a drop-down widget, next to the # character, with the list of available pod names starting with "rancher-".

    Technical Overview

    1. The AI agent should expose a new ws/autocomplete endpoint to proxy autocomplete messages to the LLM.
    2. The UI extension should be able to display prompt suggestions and allow users to apply the autocomplete to the Prompt via keyboard shortcuts.

    Resources

    GitHub repository


    AI-Powered Unit Test Automation for Agama by joseivanlopez

    The Agama project is a multi-language Linux installer that leverages the distinct strengths of several key technologies:

    • Rust: Used for the back-end services and the core HTTP API, providing performance and safety.
    • TypeScript (React/PatternFly): Powers the modern web user interface (UI), ensuring a consistent and responsive user experience.
    • Ruby: Integrates existing, robust YaST libraries (e.g., yast-storage-ng) to reuse established functionality.

    The Problem: Testing Overhead

    Developing and maintaining code across these three languages requires a significant, tedious effort in writing, reviewing, and updating unit tests for each component. This high cost of testing is a drain on developer resources and can slow down the project's evolution.

    The Solution: AI-Driven Automation

    This project aims to eliminate the manual overhead of unit testing by exploring and integrating AI-driven code generation tools. We will investigate how AI can:

    1. Automatically generate new unit tests as code is developed.
    2. Intelligently correct and update existing unit tests when the application code changes.

    By automating this crucial but monotonous task, we can free developers to focus on feature implementation and significantly improve the speed and maintainability of the Agama codebase.

    Goals

    • Proof of Concept: Successfully integrate and demonstrate an authorized AI tool (e.g., gemini-cli) to automatically generate unit tests.
    • Workflow Integration: Define and document a new unit test automation workflow that seamlessly integrates the selected AI tool into the existing Agama development pipeline.
    • Knowledge Sharing: Establish a set of best practices for using AI in code generation, sharing the learned expertise with the broader team.

    Contribution & Resources

    We are seeking contributors interested in AI-powered development and improving developer efficiency. Whether you have previous experience with code generation tools or are eager to learn, your participation is highly valuable.

    If you want to dive deep into AI for software quality, please reach out and join the effort!

    • Authorized AI Tools: Tools supported by SUSE (e.g., gemini-cli)
    • Focus Areas: Rust, TypeScript, and Ruby components within the Agama project.

    Interesting Links


    GenAI-Powered Systemic Bug Evaluation and Management Assistant by rtsvetkov

    Motivation

    What is the decision critical question which one can ask on a bug? How this question affects the decision on a bug and why?

    Let's make GenAI look on the bug from the systemic point and evaluate what we don't know. Which piece of information is missing to take a decision?

    Description

    To build a tool that takes a raw bug report (including error messages and context) and uses a large language model (LLM) to generate a series of structured, Socratic-style or Systemic questions designed to guide a the integration and development toward the root cause, rather than just providing a direct, potentially incorrect fix.

    Goals

    Set up a Python environment

    Set the environment and get a Gemini API key. 2. Collect 5-10 realistic bug reports (from open-source projects, personal projects, or public forums like Stack Overflow—include the error message and the initial context).

    Build the Dialogue Loop

    1. Write a basic Python script using the Gemini API.
    2. Implement a simple conversational loop: User Input (Bug) -> AI Output (Question) -> User Input (Answer to AI's question) -> AI Output (Next Question). Code Implementation

    Socratic/Systemic Strategy Implementation

    1. Refine the logic to ensure the questions follow a Socratic and Systemic path (e.g., from symptom-> context -> assumptions -> -> critical parts -> ).
    2. Implement Function Calling (an advanced feature of the Gemini API) to suggest specific actions to the user, like "Run a ping test" or "Check the database logs."
    3. Implement Bugzillla call to collect the
    4. Implement Questioning Framework as LLVM pre-conditioning
    5. Define set of instructions
    6. Assemble the Tool

    Resources

    What are Systemic Questions?

    Systemic questions explore the relationships, patterns, and interactions within a system rather than focusing on isolated elements.
    In IT, they help uncover hidden dependencies, feedback loops, assumptions, and side-effects during debugging or architecture analysis.

    Gitlab Project

    gitlab.suse.de/sle-prjmgr/BugDecisionCritical_Question


    Bugzilla goes AI - Phase 1 by nwalter

    Description

    This project, Bugzilla goes AI, aims to boost developer productivity by creating an autonomous AI bug agent during Hackweek. The primary goal is to reduce the time employees spend triaging bugs by integrating Ollama to summarize issues, recommend next steps, and push focused daily reports to a Web Interface.

    Goals

    To reduce employee time spent on Bugzilla by implementing an AI tool that triages and summarizes bug reports, providing actionable recommendations to the team via Web Interface.

    Project Charter

    Bugzilla goes AI Phase 1

    Description

    Project Achievements during Hackweek

    In this file you can read about what we achieved during Hackweek.

    Project Achievements


    Enable more features in mcp-server-uyuni by j_renner

    Description

    I would like to contribute to mcp-server-uyuni, the MCP server for Uyuni / Multi-Linux Manager) exposing additional features as tools. There is lots of relevant features to be found throughout the API, for example:

    • System operations and infos
    • System groups
    • Maintenance windows
    • Ansible
    • Reporting
    • ...

    At the end of the week I managed to enable basic system group operations:

    • List all system groups visible to the user
    • Create new system groups
    • List systems assigned to a group
    • Add and remove systems from groups

    Goals

    • Set up test environment locally with the MCP server and client + a recent MLM server [DONE]
    • Identify features and use cases offering a benefit with limited effort required for enablement [DONE]
    • Create a PR to the repo [DONE]

    Resources


    The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

    Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. A GitHub robot mascot trying to lasso a blue bull with a Kubernetes logo tatooed on it


    The Plan

    Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

    Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:


    The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

    The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

    Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.


    If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

    Why?

    We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

    The CONCLUSION!!!

    A add-emoji State of the Union add-emoji document was compiled to summarize lessons learned this week. For more gory details, just read on the diary below! add-emoji


    SUSE Observability MCP server by drutigliano

    Description

    The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.

    This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.

    Goals

    • Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
    • Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
    • Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
    • Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.

     Hackweek STEP

    • Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.

     Scope

    • Implement read-only MCP server that can:
      • Connect to a live SUSE Observability instance and authenticate (with API token)
      • Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
      • Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
      • Return the data as a structured JSON payload compliant with the MCP specification.

    Deliverables

    • MCP Server v0.1 A running Golang MCP server with at least one tool.
    • A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.

    Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.

    Resources

    • https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
    • https://www.datadoghq.com/blog/datadog-remote-mcp-server
    • https://modelcontextprotocol.io/specification/2025-06-18/index
    • https://modelcontextprotocol.io/docs/develop/build-server

     Basic implementation

    • https://github.com/drutigliano19/suse-observability-mcp-server

    Results

    Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.

    Example execution


    Bugzilla goes AI - Phase 1 by nwalter

    Description

    This project, Bugzilla goes AI, aims to boost developer productivity by creating an autonomous AI bug agent during Hackweek. The primary goal is to reduce the time employees spend triaging bugs by integrating Ollama to summarize issues, recommend next steps, and push focused daily reports to a Web Interface.

    Goals

    To reduce employee time spent on Bugzilla by implementing an AI tool that triages and summarizes bug reports, providing actionable recommendations to the team via Web Interface.

    Project Charter

    Bugzilla goes AI Phase 1

    Description

    Project Achievements during Hackweek

    In this file you can read about what we achieved during Hackweek.

    Project Achievements


    Extended private brain - RAG my own scripts and data into offline LLM AI by tjyrinki_suse

    Description

    For purely studying purposes, I'd like to find out if I could teach an LLM some of my own accumulated knowledge, to use it as a sort of extended brain.

    I might use qwen3-coder or something similar as a starting point.

    Everything would be done 100% offline without network available to the container, since I prefer to see when network is needed, and make it so it's never needed (other than initial downloads).

    Goals

    1. Learn something about RAG, LLM, AI.
    2. Find out if everything works offline as intended.
    3. As an end result have a new way to access my own existing know-how, but so that I can query the wisdom in them.
    4. Be flexible to pivot in any direction, as long as there are new things learned.

    Resources

    To be found on the fly.

    Timeline

    Day 1 (of 4)

    • Tried out a RAG demo, expanded on feeding it my own data
    • Experimented with qwen3-coder to add a persistent chat functionality, and keeping vectors in a pickle file
    • Optimizations to keep everything within context window
    • Learn and add a bit of PyTest

    Day 2

    • More experimenting and more data
    • Study ChromaDB
    • Add a Web UI that works from another computer even though the container sees network is down

    Day 3

    • The above RAG is working well enough for demonstration purposes.
    • Pivot to trying out OpenCode, configuring local Ollama qwen3-coder there, to analyze the RAG demo.
    • Figured out how to configure Ollama template to be usable under OpenCode. OpenCode locally is super slow to just running qwen3-coder alone.

    Day 4 (final day)

    • Battle with OpenCode that was both slow and kept on piling up broken things.
    • Call it success as after all the agentic AI was working locally.
    • Clean up the mess left behind a bit.

    Blog Post

    Summarized the findings at blog post.


    Song Search with CLAP by gcolangiuli

    Description

    Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface

    SUSE Hackweek AI Song Search

    Goals

    Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:

    • Music Tagging;
    • Free text search;
    • Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.

    The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.

    Result

    In this MVP we implemented:

    • Async Song Analysis with Clap model
    • Free Text Search of the songs
    • Similar song search based on vector representation
    • Containerised version with web interface

    We also documented what went well and what can be improved in the use of AI.

    You can have a look at the result here:

    Future implementation can be related to performance improvement and stability of the analysis.

    References


    Explore LLM evaluation metrics by thbertoldi

    Description

    Learn the best practices for evaluating LLM performance with an open-source framework such as DeepEval.

    Goals

    Curate the knowledge learned during practice and present it to colleagues.

    -> Maybe publish a blog post on SUSE's blog?

    Resources

    https://deepeval.com

    https://docs.pactflow.io/docs/bi-directional-contract-testing


    A CLI for Harvester by mohamed.belgaied

    Harvester does not officially come with a CLI tool, the user is supposed to interact with Harvester mostly through the UI. Though it is theoretically possible to use kubectl to interact with Harvester, the manipulation of Kubevirt YAML objects is absolutely not user friendly. Inspired by tools like multipass from Canonical to easily and rapidly create one of multiple VMs, I began the development of Harvester CLI. Currently, it works but Harvester CLI needs some love to be up-to-date with Harvester v1.0.2 and needs some bug fixes and improvements as well.

    Project Description

    Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as: harvester vm create my-vm --count 5 to create 5 VMs named my-vm-01 to my-vm-05.

    asciicast

    Harvester CLI is functional but needs a number of improvements: up-to-date functionality with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an opensuse VM instead of an ubuntu VM, solve some bugs, etc.

    Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli

    Done in previous Hackweeks

    • Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
    • Automatically package Harvester CLI for OpenSUSE / Redhat RPMs or DEBs: DONE

    Goal for this Hackweek

    The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.

    Some nice additions might be: * Improve handling of namespaced objects * Add features, such as network management or Load Balancer creation ? * Add more unit tests and, why not, e2e tests * Improve CI * Improve the overall code quality * Test the program and create issues for it

    Issue list is here: https://github.com/belgaied2/harvester-cli/issues

    Resources

    The project is written in Go, and using client-go the Kubernetes Go Client libraries to communicate with the Harvester API (which is Kubernetes in fact). Welcome contributions are:

    • Testing it and creating issues
    • Documentation
    • Go code improvement

    What you might learn

    Harvester CLI might be interesting to you if you want to learn more about:

    • GitHub Actions
    • Harvester as a SUSE Product
    • Go programming language
    • Kubernetes API
    • Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)


    OpenPlatform Self-Service Portal by tmuntan1

    Description

    In SUSE IT, we developed an internal developer platform for our engineers using SUSE technologies such as RKE2, SUSE Virtualization, and Rancher. While it works well for our existing users, the onboarding process could be better.

    To improve our customer experience, I would like to build a self-service portal to make it easy for people to accomplish common actions. To get started, I would have the portal create Jira SD tickets for our customers to have better information in our tickets, but eventually I want to add automation to reduce our workload.

    Goals

    • Build a frontend website (Angular) that helps customers create Jira SD tickets.
    • Build a backend (Rust with Axum) for the backend, which would do all the hard work for the frontend.

    Resources (SUSE VPN only)

    • development site: https://ui-dev.openplatform.suse.com/login?returnUrl=%2Fopenplatform%2Fforms
    • https://gitlab.suse.de/itpe/core/open-platform/op-portal/backend
    • https://gitlab.suse.de/itpe/core/open-platform/op-portal/frontend


    Cluster API Provider for Harvester by rcase

    Project Description

    The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.

    The project has been bootstrapped in HackWeek 23, and its code is available here.

    Work done in HackWeek 2023

    • Have a early working version of the provider available on Rancher Sandbox : *DONE *
    • Demonstrated the created cluster can be imported using Rancher Turtles: DONE
    • Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo

    DONE in HackWeek 24:

    DONE in 2025 (out of Hackweek)

    • Support of ClusterClass
    • Add to clusterctl community providers, you can add it directly with clusterctl
    • Testing on newer versions of Harvester v1.4.X and v1.5.X
    • Support for clusterctl generate cluster ...
    • Improve Status Conditions to reflect current state of Infrastructure
    • Improve CI (some bugs for release creation)

    Goals for HackWeek 2025

    • FIRST and FOREMOST, any topic is important to you
    • Add e2e testing
    • Certify the provider for Rancher Turtles
    • Add Machine pool labeling
    • Add PCI-e passthrough capabilities.
    • Other improvement suggestions are welcome!

    Thanks to @isim and Dominic Giebert for their contributions!

    Resources

    Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.

    This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:


    Rancher/k8s Trouble-Maker by tonyhansen

    Project Description

    When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.

    Goals for Hackweek 25

    • Update to modern Rancher and verify that existing tests still work
    • Change testing logic to populate secrets instead of requiring a secondary script
    • Add new tests

    Goals for Hackweek 24 (Complete)

    • Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix
    • Create at least 5 modules that can be applied to the cluster and require troubleshooting

    Resources

    • https://github.com/celidon/rancher-troublemaker
    • https://github.com/rancher/terraform-provider-rancher2
    • https://github.com/rancher/tf-rancher-up
    • https://github.com/rancher/quickstart


    Exploring Modern AI Trends and Kubernetes-Based AI Infrastructure by jluo

    Description

    Build a solid understanding of the current landscape of Artificial Intelligence and how modern cloud-native technologies—especially Kubernetes—support AI workloads.

    Goals

    Use Gemini Learning Mode to guide the exploration, surface relevant concepts, and structure the learning journey:

    • Gain insight into the latest AI trends, tools, and architectural concepts.
    • Understand how Kubernetes and related cloud-native technologies are used in the AI ecosystem (model training, deployment, orchestration, MLOps).

    Resources

    • Red Hat AI Topic Articles

      • https://www.redhat.com/en/topics/ai
    • Kubeflow Documentation

      • https://www.kubeflow.org/docs/
    • Q4 2025 CNCF Technology Landscape Radar report:

      • https://www.cncf.io/announcements/2025/11/11/cncf-and-slashdata-report-finds-leading-ai-tools-gaining-adoption-in-cloud-native-ecosystems/
      • https://www.cncf.io/wp-content/uploads/2025/11/cncfreporttechradar_111025a.pdf
    • Agent-to-Agent (A2A) Protocol

      • https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/