Description
Liz is the Rancher AI assistant for cluster operations.
Goals
We want to help users when sending new messages to Liz, by adding an autocomplete feature to complete their requests based on the context.
Example:
- User prompt: "Can you show me the list of p"
- Autocomplete suggestion: "Can you show me the list of p...od in local cluster?"
Example:
- User prompt: "Show me the logs of #rancher-"
- Chat console: It shows a drop-down widget, next to the # character, with the list of available pod names starting with "rancher-".
Technical Overview
- The AI agent should expose a new ws/autocomplete endpoint to proxy autocomplete messages to the LLM.
- The UI extension should be able to display prompt suggestions and allow users to apply the autocomplete to the Prompt via keyboard shortcuts.
Resources
This project is part of:
Hack Week 25
Activity
Similar Projects
GenAI-Powered Systemic Bug Evaluation and Management Assistant by rtsvetkov
Motivation
What is the decision critical question which one can ask on a bug? How this question affects the decision on a bug and why?
Let's make GenAI look on the bug from the systemic point and evaluate what we don't know. Which piece of information is missing to take a decision?
Description
To build a tool that takes a raw bug report (including error messages and context) and uses a large language model (LLM) to generate a series of structured, Socratic-style or Systemic questions designed to guide a the integration and development toward the root cause, rather than just providing a direct, potentially incorrect fix.
Goals
Set up a Python environment
Set the environment and get a Gemini API key. 2. Collect 5-10 realistic bug reports (from open-source projects, personal projects, or public forums like Stack Overflow—include the error message and the initial context).
Build the Dialogue Loop
- Write a basic Python script using the Gemini API.
- Implement a simple conversational loop: User Input (Bug) -> AI Output (Question) -> User Input (Answer to AI's question) -> AI Output (Next Question). Code Implementation
Socratic/Systemic Strategy Implementation
- Refine the logic to ensure the questions follow a Socratic and Systemic path (e.g., from symptom-> context -> assumptions -> -> critical parts -> ).
- Implement Function Calling (an advanced feature of the Gemini API) to suggest specific actions to the user, like "Run a ping test" or "Check the database logs."
- Implement Bugzillla call to collect the
- Implement Questioning Framework as LLVM pre-conditioning
- Define set of instructions
- Assemble the Tool
Resources
What are Systemic Questions?
Systemic questions explore the relationships, patterns, and interactions within a system rather than focusing on isolated elements.
In IT, they help uncover hidden dependencies, feedback loops, assumptions, and side-effects during debugging or architecture analysis.
Gitlab Project
gitlab.suse.de/sle-prjmgr/BugDecisionCritical_Question
Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios
Description
Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.
This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.
The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.
Goals
By the end of Hack Week, we aim to have a single, working Python script that:
- Connects to Prometheus and executes a query to fetch detailed test failure history.
- Processes the raw data into a format suitable for the Gemini API.
- Successfully calls the Gemini API with the data and a clear prompt.
- Parses the AI's response to extract a simple list of flaky tests.
- Saves the list to a JSON file that can be displayed in Grafana.
- New panel in our Dashboard listing the Flaky tests
Resources
- Jenkins Prometheus Exporter: https://github.com/uyuni-project/jenkins-exporter/
- Data Source: Our internal Prometheus server.
- Key Metric:
jenkins_build_test_case_failure_age{jobname, buildid, suite, case, status, failedsince}. - Existing Query for Reference:
count by (suite) (max_over_time(jenkins_build_test_case_failure_age{status=~"FAILED|REGRESSION", jobname="$jobname"}[$__range])). - AI Model: The Google Gemini API.
- Example about how to interact with Gemini API: https://github.com/srbarrios/FailTale/
- Visualization: Our internal Grafana Dashboard.
- Internal IaC: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring
Outcome
- Jenkins Flaky Test Detector: https://github.com/srbarrios/jenkins-flaky-tests-detector and its container
- IaC on MLM Team: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring/jenkinsflakytestsdetector?reftype=heads, https://gitlab.suse.de/galaxy/infrastructure/-/blob/master/srv/salt/monitoring/grafana/dashboards/flaky-tests.json?ref_type=heads, and others.
- Grafana Dashboard: https://grafana.mgr.suse.de/d/flaky-tests/flaky-tests-detection @ @ text
Background Coding Agent by mmanno
Description
I had only bad experiences with AI one-shots. However, monitoring agent work closely and interfering often did result in productivity gains.
Now, other companies are using agents in pipelines. That makes sense to me, just like CI, we want to offload work to pipelines: Our engineering teams are consistently slowed down by "toil": low-impact, repetitive maintenance tasks. A simple linter rule change, a dependency bump, rebasing patch-sets on top of newer releases or API deprecation requires dozens of manual PRs, draining time from feature development.
So far we have been writing deterministic, script-based automation for these tasks. And it turns out to be a common trap. These scripts are brittle, complex, and become a massive maintenance burden themselves.
Can we make prompts and workflows smart enough to succeed at background coding?
Goals
We will build a platform that allows engineers to execute complex code transformations using prompts.
By automating this toil, we accelerate large-scale migrations and allow teams to focus on high-value work.
Our platform will consist of three main components:
- "Change" Definition: Engineers will define a transformation as a simple, declarative manifest:
- The target repositories.
- A wrapper to run a "coding agent", e.g., "gemini-cli".
- The task as a natural language prompt.
- The target repositories.
- "Change" Management Service: A central service that orchestrates the jobs. It will receive Change definitions and be responsible for the job lifecycle.
- Execution Runners: We could use existing sandboxed CI runners (like GitHub/GitLab runners) to execute each job or spawn a container.
MVP
- Define the Change manifest format.
- Build the core Management Service that can accept and queue a Change.
- Connect management service and runners, dynamically dispatch jobs to runners.
- Create a basic runner script that can run a hard-coded prompt against a test repo and open a PR.
Stretch Goals:
- Multi-layered approach, Workflow Agents trigger Coding Agents:
- Workflow Agent: Gather information about the task interactively from the user.
- Coding Agent: Once the interactive agent has refined the task into a clear prompt, it hands this prompt off to the "coding agent." This background agent is responsible for executing the task and producing the actual pull request.
- Workflow Agent: Gather information about the task interactively from the user.
- Use MCP:
- Workflow Agent gathers context information from Slack, Github, etc.
- Workflow Agent triggers a Coding Agent.
- Workflow Agent gathers context information from Slack, Github, etc.
- Create a "Standard Task" library with reliable prompts.
- Rebasing rancher-monitoring to a new version of kube-prom-stack
- Update charts to use new images
- Apply changes to comply with a new linter
- Bump complex Go dependencies, like k8s modules
- Backport pull requests to other branches
- Rebasing rancher-monitoring to a new version of kube-prom-stack
- Add “review agents” that review the generated PR.
See also
SUSE Observability MCP server by drutigliano
Description
The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.
This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.
Goals
- Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
- Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
- Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
- Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.
Hackweek STEP
- Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.
Scope
- Implement read-only MCP server that can:
- Connect to a live SUSE Observability instance and authenticate (with API token)
- Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
- Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
- Return the data as a structured JSON payload compliant with the MCP specification.
Deliverables
- MCP Server v0.1 A running Golang MCP server with at least one tool.
- A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.
Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.
Resources
- https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
- https://www.datadoghq.com/blog/datadog-remote-mcp-server
- https://modelcontextprotocol.io/specification/2025-06-18/index
- https://modelcontextprotocol.io/docs/develop/build-server
Basic implementation
- https://github.com/drutigliano19/suse-observability-mcp-server
Results
Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.
Example execution
Try AI training with ROCm and LoRA by bmwiedemann
Description
I want to setup a Radeon RX 9600 XT 16 GB at home with ROCm on Slowroll.
Goals
I want to test how fast AI inference can get with the GPU and if I can use LoRA to re-train an existing free model for some task.
Resources
- https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
- https://build.opensuse.org/project/show/science:GPU:ROCm
- https://src.opensuse.org/ROCm/
- https://www.suse.com/c/lora-fine-tuning-llms-for-text-classification/
Results
got inference working with llama.cpp:
export LLAMACPP_ROCM_ARCH=gfx1200
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$LLAMACPP_ROCM_ARCH \
-DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
-Dhipblas_DIR=/usr/lib64/cmake/hipblaslt/ \
&& cmake --build build --config Release -j8
m=models/gpt-oss-20b-mxfp4.gguf
cd $P/llama.cpp && build/bin/llama-server --model $m --threads 8 --port 8005 --host 0.0.0.0 --device ROCm0 --n-gpu-layers 999
Without the --device option it faulted. Maybe because my APU also appears there?
I updated/fixed various related packages: https://src.opensuse.org/ROCm/rocm-examples/pulls/1 https://src.opensuse.org/ROCm/hipblaslt/pulls/1 SR 1320959
benchmark
I benchmarked inference with llama.cpp + gpt-oss-20b-mxfp4.gguf and ROCm offloading to a Radeon RX 9060 XT 16GB. I varied the number of layers that went to the GPU:
- 0 layers 14.49 tokens/s (8 CPU cores)
- 9 layers 17.79 tokens/s 34% VRAM
- 15 layers 22.39 tokens/s 51% VRAM
- 20 layers 27.49 tokens/s 64% VRAM
- 24 layers 41.18 tokens/s 74% VRAM
- 25+ layers 86.63 tokens/s 75% VRAM (only 200% CPU load)
So there is a significant performance-boost if the whole model fits into the GPU's VRAM.
SUSE Virtualization (Harvester): VM Import UI flow by wombelix
Description
SUSE Virtualization (Harvester) has a vm-import-controller that allows migrating VMs from VMware and OpenStack, but users need to write manifest files and apply them with kubectl to use it. This project is about adding the missing UI pieces to the harvester-ui-extension, making VM Imports accessible without requiring Kubernetes and YAML knowledge.
VMware and OpenStack admins aren't automatically familiar with Kubernetes and YAML. Implementing the UI part for the VM Import feature makes it easier to use and more accessible. The Harvester Enhancement Proposal (HEP) VM Migration controller included a UI flow implementation in its scope. Issue #2274 received multiple comments that an UI integration would be a nice addition, and issue #4663 was created to request the implementation but eventually stalled.
Right now users need to manually create either VmwareSource or OpenstackSource resources, then write VirtualMachineImport manifests with network mappings and all the other configuration options. Users should be able to do that and track import status through the UI without writing YAML.
Work during the Hack Week will be done in this fork in a branch called suse-hack-week-25, making progress publicly visible and open for contributions. When everything works out and the branch is in good shape, it will be submitted as a pull request to harvester-ui-extension to get it included in the next Harvester release.
Testing will focus on VMware since that's what is available in the lab environment (SUSE Virtualization 1.6 single-node cluster, ESXi 8.0 standalone host). Given that this is about UI and surfacing what the vm-import-controller handles, the implementation should work for OpenStack imports as well.
This project is also a personal challenge to learn vue.js and get familiar with Rancher Extensions development, since harvester-ui-extension is built on that framework.
Goals
- Learn Vue.js and Rancher Extensions fundamentals required to finish the project
- Read and learn from other Rancher UI Extensions code, especially understanding the
harvester-ui-extensioncode base - Understand what the
vm-import-controllerand its CRDs require, identify ready to use components in the Rancher UI Extension API that can be leveraged - Implement UI logic for creating and managing
VmwareSource/OpenstackSourceandVirtualMachineImportresources with all relevant configuration options and credentials - Implemnt UI elements to display
VirtualMachineImportstatus and errors
Resources
HEP and related discussion
- https://github.com/harvester/harvester/blob/master/enhancements/20220726-vm-migration.md
- https://github.com/harvester/harvester/issues/2274
- https://github.com/harvester/harvester/issues/4663
SUSE Virtualization VM Import Documentation
Rancher Extensions Documentation
Rancher UI Plugin Examples
Vue Router Essentials
Vue Router API
Vuex Documentation
Improvements to osc (especially with regards to the Git workflow) by mcepl
Description
There is plenty of hacking on osc, where we could spent some fun time. I would like to see a solution for https://github.com/openSUSE/osc/issues/2006 (which is sufficiently non-serious, that it could be part of HackWeek project).
Song Search with CLAP by gcolangiuli
Description
Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface
Goals
Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:
- Music Tagging;
- Free text search;
- Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.
The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.
Result
In this MVP we implemented:
- Async Song Analysis with Clap model
- Free Text Search of the songs
- Similar song search based on vector representation
- Containerised version with web interface
We also documented what went well and what can be improved in the use of AI.
You can have a look at the result here:
Future implementation can be related to performance improvement and stability of the analysis.
References
- CLAP: The main model being researched;
- huggingface: Pre-trained models for CLAP;
- Free Music Archive: Creative Commons songs that can be used for testing;
Help Create A Chat Control Resistant Turnkey Chatmail/Deltachat Relay Stack - Rootless Podman Compose, OpenSUSE BCI, Hardened, & SELinux by 3nd5h1771fy
Description
The Mission: Decentralized & Sovereign Messaging
FYI: If you have never heard of "Chatmail", you can visit their site here, but simply put it can be thought of as the underlying protocol/platform decentralized messengers like DeltaChat use for their communications. Do not confuse it with the honeypot looking non-opensource paid for prodect with better seo that directs you to chatmailsecure(dot)com
In an era of increasing centralized surveillance by unaccountable bad actors (aka BigTech), "Chat Control," and the erosion of digital privacy, the need for sovereign communication infrastructure is critical. Chatmail is a pioneering initiative that bridges the gap between classic email and modern instant messaging, offering metadata-minimized, end-to-end encrypted (E2EE) communication that is interoperable and open.
However, unless you are a seasoned sysadmin, the current recommended deployment method of a Chatmail relay is rigid, fragile, difficult to properly secure, and effectively takes over the entire host the "relay" is deployed on.
Why This Matters
A simple, host agnostic, reproducible deployment lowers the entry cost for anyone wanting to run a privacy‑preserving, decentralized messaging relay. In an era of perpetually resurrected chat‑control legislation threats, EU digital‑sovereignty drives, and many dangers of using big‑tech messaging platforms (Apple iMessage, WhatsApp, FB Messenger, Instagram, SMS, Google Messages, etc...) for any type of communication, providing an easy‑to‑use alternative empowers:
- Censorship resistance - No single entity controls the relay; operators can spin up new nodes quickly.
- Surveillance mitigation - End‑to‑end OpenPGP encryption ensures relay operators never see plaintext.
- Digital sovereignty - Communities can host their own infrastructure under local jurisdiction, aligning with national data‑policy goals.
By turning the Chatmail relay into a plug‑and‑play container stack, we enable broader adoption, foster a resilient messaging fabric, and give developers, activists, and hobbyists a concrete tool to defend privacy online.
Goals
As I indicated earlier, this project aims to drastically simplify the deployment of Chatmail relay. By converting this architecture into a portable, containerized stack using Podman and OpenSUSE base container images, we can allow anyone to deploy their own censorship-resistant, privacy-preserving communications node in minutes.
Our goal for Hack Week: package every component into containers built on openSUSE/MicroOS base images, initially orchestrated with a single container-compose.yml (podman-compose compatible). The stack will:
- Run on any host that supports Podman (including optimizations and enhancements for SELinux‑enabled systems).
- Allow network decoupling by refactoring configurations to move from file-system constrained Unix sockets to internal TCP networking, allowing containers achieve stricter isolation.
- Utilize Enhanced Security with SELinux by using purpose built utilities such as udica we can quickly generate custom SELinux policies for the container stack, ensuring strict confinement superior to standard/typical Docker deployments.
- Allow the use of bind or remote mounted volumes for shared data (
/var/vmail, DKIM keys, TLS certs, etc.). - Replace the local DNS server requirement with a remote DNS‑provider API for DKIM/TXT record publishing.
By delivering a turnkey, host agnostic, reproducible deployment, we lower the barrier for individuals and small communities to launch their own chatmail relays, fostering a decentralized, censorship‑resistant messaging ecosystem that can serve DeltaChat users and/or future services adopting this protocol
Resources
- The links included above
- https://chatmail.at/doc/relay/
- https://delta.chat/en/help
- Project repo -> https://codeberg.org/EndShittification/containerized-chatmail-relay
Improve/rework household chore tracker `chorazon` by gniebler
Description
I wrote a household chore tracker named chorazon, which is meant to be deployed as a web application in the household's local network.
It features the ability to set up different (so far only weekly) schedules per task and per person, where tasks may span several days.
There are "tokens", which can be collected by users. Tasks can (and usually will) have rewards configured where they yield a certain amount of tokens. The idea is that they can later be redeemed for (surprise) gifts, but this is not implemented yet. (So right now one needs to edit the DB manually to subtract tokens when they're redeemed.)
Days are not rolled over automatically, to allow for task completion control.
We used it in my household for several months, with mixed success. There are many limitations in the system that would warrant a revisit.
It's written using the Pyramid Python framework with URL traversal, ZODB as the data store and Web Components for the frontend.
Goals
- Add admin screens for users, tasks and schedules
- Add models, pages etc. to allow redeeming tokens for gifts/surprises
- …?
Resources
tbd (Gitlab repo)
Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil
Join the Gitter channel! https://gitter.im/uyuni-project/hackweek
Uyuni is a configuration and infrastructure management tool that saves you time and headaches when you have to manage and update tens, hundreds or even thousands of machines. It also manages configuration, can run audits, build image containers, monitor and much more!
Currently there are a few distributions that are completely untested on Uyuni or SUSE Manager (AFAIK) or just not tested since a long time, and could be interesting knowing how hard would be working with them and, if possible, fix whatever is broken.
For newcomers, the easiest distributions are those based on DEB or RPM packages. Distributions with other package formats are doable, but will require adapting the Python and Java code to be able to sync and analyze such packages (and if salt does not support those packages, it will need changes as well). So if you want a distribution with other packages, make sure you are comfortable handling such changes.
No developer experience? No worries! We had non-developers contributors in the past, and we are ready to help as long as you are willing to learn. If you don't want to code at all, you can also help us preparing the documentation after someone else has the initial code ready, or you could also help with testing :-)
The idea is testing Salt (including bootstrapping with bootstrap script) and Salt-ssh clients
To consider that a distribution has basic support, we should cover at least (points 3-6 are to be tested for both salt minions and salt ssh minions):
- Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
- Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)
- Package management (install, remove, update...)
- Patching
- Applying any basic salt state (including a formula)
- Salt remote commands
- Bonus point: Java part for product identification, and monitoring enablement
- Bonus point: sumaform enablement (https://github.com/uyuni-project/sumaform)
- Bonus point: Documentation (https://github.com/uyuni-project/uyuni-docs)
- Bonus point: testsuite enablement (https://github.com/uyuni-project/uyuni/tree/master/testsuite)
If something is breaking: we can try to fix it, but the main idea is research how supported it is right now. Beyond that it's up to each project member how much to hack :-)
- If you don't have knowledge about some of the steps: ask the team
- If you still don't know what to do: switch to another distribution and keep testing.
This card is for EVERYONE, not just developers. Seriously! We had people from other teams helping that were not developers, and added support for Debian and new SUSE Linux Enterprise and openSUSE Leap versions :-)
In progress/done for Hack Week 25
Guide
We started writin a Guide: Adding a new client GNU Linux distribution to Uyuni at https://github.com/uyuni-project/uyuni/wiki/Guide:-Adding-a-new-client-GNU-Linux-distribution-to-Uyuni, to make things easier for everyone, specially those not too familiar wht Uyuni or not technical.
openSUSE Leap 16.0
The distribution will all love!
https://en.opensuse.org/openSUSE:Roadmap#DRAFTScheduleforLeap16.0
Curent Status We started last year, it's complete now for Hack Week 25! :-D
[W]Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file) NOTE: Done, client tools for SLMicro6 are using as those for SLE16.0/openSUSE Leap 16.0 are not available yet[W]Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)[W]Package management (install, remove, update...). Works, even reboot requirement detection
Self-Scaling LLM Infrastructure Powered by Rancher by ademicev0
Self-Scaling LLM Infrastructure Powered by Rancher

Description
The Problem
Running LLMs can get expensive and complex pretty quickly.
Today there are typically two choices:
- Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
- Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.
What if there was a middle ground?
What if infrastructure scaled itself instead of making you scale it?
Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?
Project Repository: github.com/alexander-demicev/llmserverless
What This Project Does
A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.
A complete, self-scaling LLM infrastructure that:
- Scales to zero when idle (no idle costs)
- Scales up automatically when requests come in
- Adds more nodes when needed, removes them when demand drops
- Runs on any infrastructure - laptop, bare metal, or cloud
Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.
How It Works
A combination of open source tools working together:
Flow:
- Users interact with OpenWebUI (chat interface)
- Requests go to LiteLLM Gateway
- LiteLLM routes requests to:
- Ollama (Knative) for local model inference (auto-scales pods)
- Or cloud APIs for fallback
SUSE Virtualization (Harvester): VM Import UI flow by wombelix
Description
SUSE Virtualization (Harvester) has a vm-import-controller that allows migrating VMs from VMware and OpenStack, but users need to write manifest files and apply them with kubectl to use it. This project is about adding the missing UI pieces to the harvester-ui-extension, making VM Imports accessible without requiring Kubernetes and YAML knowledge.
VMware and OpenStack admins aren't automatically familiar with Kubernetes and YAML. Implementing the UI part for the VM Import feature makes it easier to use and more accessible. The Harvester Enhancement Proposal (HEP) VM Migration controller included a UI flow implementation in its scope. Issue #2274 received multiple comments that an UI integration would be a nice addition, and issue #4663 was created to request the implementation but eventually stalled.
Right now users need to manually create either VmwareSource or OpenstackSource resources, then write VirtualMachineImport manifests with network mappings and all the other configuration options. Users should be able to do that and track import status through the UI without writing YAML.
Work during the Hack Week will be done in this fork in a branch called suse-hack-week-25, making progress publicly visible and open for contributions. When everything works out and the branch is in good shape, it will be submitted as a pull request to harvester-ui-extension to get it included in the next Harvester release.
Testing will focus on VMware since that's what is available in the lab environment (SUSE Virtualization 1.6 single-node cluster, ESXi 8.0 standalone host). Given that this is about UI and surfacing what the vm-import-controller handles, the implementation should work for OpenStack imports as well.
This project is also a personal challenge to learn vue.js and get familiar with Rancher Extensions development, since harvester-ui-extension is built on that framework.
Goals
- Learn Vue.js and Rancher Extensions fundamentals required to finish the project
- Read and learn from other Rancher UI Extensions code, especially understanding the
harvester-ui-extensioncode base - Understand what the
vm-import-controllerand its CRDs require, identify ready to use components in the Rancher UI Extension API that can be leveraged - Implement UI logic for creating and managing
VmwareSource/OpenstackSourceandVirtualMachineImportresources with all relevant configuration options and credentials - Implemnt UI elements to display
VirtualMachineImportstatus and errors
Resources
HEP and related discussion
- https://github.com/harvester/harvester/blob/master/enhancements/20220726-vm-migration.md
- https://github.com/harvester/harvester/issues/2274
- https://github.com/harvester/harvester/issues/4663
SUSE Virtualization VM Import Documentation
Rancher Extensions Documentation
Rancher UI Plugin Examples
Vue Router Essentials
Vue Router API
Vuex Documentation
Cluster API Provider for Harvester by rcase
Project Description
The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.
The project has been bootstrapped in HackWeek 23, and its code is available here.
Work done in HackWeek 2023
- Have a early working version of the provider available on Rancher Sandbox : *DONE *
- Demonstrated the created cluster can be imported using Rancher Turtles: DONE
- Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo
DONE in HackWeek 24:
- Add more Unit Tests
- Improve Status Conditions for some phases
- Add cloud provider config generation
- Testing with Harvester v1.3.2
- Template improvements
- Issues creation
DONE in 2025 (out of Hackweek)
- Support of ClusterClass
- Add to
clusterctlcommunity providers, you can add it directly withclusterctl - Testing on newer versions of Harvester v1.4.X and v1.5.X
- Support for
clusterctl generate cluster ... - Improve Status Conditions to reflect current state of Infrastructure
- Improve CI (some bugs for release creation)
Goals for HackWeek 2025
- FIRST and FOREMOST, any topic is important to you
- Add e2e testing
- Certify the provider for Rancher Turtles
- Add Machine pool labeling
- Add PCI-e passthrough capabilities.
- Other improvement suggestions are welcome!
Thanks to @isim and Dominic Giebert for their contributions!
Resources
Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.
This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:
A CLI for Harvester by mohamed.belgaied
Harvester does not officially come with a CLI tool, the user is supposed to interact with Harvester mostly through the UI. Though it is theoretically possible to use kubectl to interact with Harvester, the manipulation of Kubevirt YAML objects is absolutely not user friendly. Inspired by tools like multipass from Canonical to easily and rapidly create one of multiple VMs, I began the development of Harvester CLI. Currently, it works but Harvester CLI needs some love to be up-to-date with Harvester v1.0.2 and needs some bug fixes and improvements as well.
Project Description
Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as:
harvester vm create my-vm --count 5
to create 5 VMs named my-vm-01 to my-vm-05.
Harvester CLI is functional but needs a number of improvements: up-to-date functionality with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an opensuse VM instead of an ubuntu VM, solve some bugs, etc.
Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli
Done in previous Hackweeks
- Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
- Automatically package Harvester CLI for OpenSUSE / Redhat RPMs or DEBs: DONE
Goal for this Hackweek
The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.
Some nice additions might be: * Improve handling of namespaced objects * Add features, such as network management or Load Balancer creation ? * Add more unit tests and, why not, e2e tests * Improve CI * Improve the overall code quality * Test the program and create issues for it
Issue list is here: https://github.com/belgaied2/harvester-cli/issues
Resources
The project is written in Go, and using client-go the Kubernetes Go Client libraries to communicate with the Harvester API (which is Kubernetes in fact).
Welcome contributions are:
- Testing it and creating issues
- Documentation
- Go code improvement
What you might learn
Harvester CLI might be interesting to you if you want to learn more about:
- GitHub Actions
- Harvester as a SUSE Product
- Go programming language
- Kubernetes API
- Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)
The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio
Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. 
The Plan
Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!
Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:
❥ The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.
❥ The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.
❥ Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.
If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.
Why?
We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.
The CONCLUSION!!!
A
State of the Union
document was compiled to summarize lessons learned this week. For more gory details, just read on the diary below!
