Project Description
Containers are great for scaling stateless services and making them robust. However, stateful services like databases continue to pose challenges. In particular, upgrading a running database service to a new version with near zero downtime is not trivial. Wouldn't it be great if a helm chart or K8S operator could automatically take care of this for you? This would allow you to fire up a database in your K8S cluster and consume it as-a-service. And put us one step closer to building a private cloud that runs anywhere, powered by pure open source.
Goal for this Hackweek
- Boot up an Amazon Kubernetes Service instance, deploy Rancher and register the cluster with Rancher
- Install a helm chart for an older version of a stateful service on the cluster, e.g. MariaDB, PostgreSQL, MongoDB or similar
- Put some data into the service and run a simulated load on it
- Naively upgrade the service to a newer version via helm and observe what, if anything, breaks
- Read database vendor instructions how human operators can achieve near zero downtime upgrades of the service
- Try and script these instructions to perform an automatic near zero downtime upgrade on the cluster
- Try and derive a general framework for near zero downtime upgrades of arbitrary stateful services (maybe as an extension to helm; needs to offer the right kind of service-specific extension points)
Resources
- Starting as an independent exploration
- Happy for anyone interested in Rancher, MariaDB and Helm to join!
No Hackers yet
Looking for hackers with the skills:
rancher containers mariadb postgresql mongodb near-zerodowntime upgrade
This project is part of:
Hack Week 20
Activity
Comments
Be the first to comment!
Similar Projects
Self-Scaling LLM Infrastructure Powered by Rancher by ademicev0
Self-Scaling LLM Infrastructure Powered by Rancher

Description
The Problem
Running LLMs can get expensive and complex pretty quickly.
Today there are typically two choices:
- Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
- Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.
What if there was a middle ground?
What if infrastructure scaled itself instead of making you scale it?
Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?
Project Repository: github.com/alexander-demicev/llmserverless
What This Project Does
A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.
A complete, self-scaling LLM infrastructure that:
- Scales to zero when idle (no idle costs)
- Scales up automatically when requests come in
- Adds more nodes when needed, removes them when demand drops
- Runs on any infrastructure - laptop, bare metal, or cloud
Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.
How It Works
A combination of open source tools working together:
Flow:
- Users interact with OpenWebUI (chat interface)
- Requests go to LiteLLM Gateway
- LiteLLM routes requests to:
- Ollama (Knative) for local model inference (auto-scales pods)
- Or cloud APIs for fallback
Liz - Prompt autocomplete by ftorchia
Description
Liz is the Rancher AI assistant for cluster operations.
Goals
We want to help users when sending new messages to Liz, by adding an autocomplete feature to complete their requests based on the context.
Example:
- User prompt: "Can you show me the list of p"
- Autocomplete suggestion: "Can you show me the list of p...od in local cluster?"
Example:
- User prompt: "Show me the logs of #rancher-"
- Chat console: It shows a drop-down widget, next to the # character, with the list of available pod names starting with "rancher-".
Technical Overview
- The AI agent should expose a new ws/autocomplete endpoint to proxy autocomplete messages to the LLM.
- The UI extension should be able to display prompt suggestions and allow users to apply the autocomplete to the Prompt via keyboard shortcuts.
Resources
The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio
Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. 
The Plan
Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!
Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:
❥ The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.
❥ The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.
❥ Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.
If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.
Why?
We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.
The Outputs
❥ A "State of the Agentic Union" for SUSE engineers, detailing what works, what explodes, and how much coffee we can drink while the robots do the rebasing.
❥ Honest, Daily Updates With All the Gory Details
Rancher/k8s Trouble-Maker by tonyhansen
Project Description
When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.
Goals for Hackweek 25
- Update to modern Rancher and verify that existing tests still work
- Change testing logic to populate secrets instead of requiring a secondary script
- Add new tests
Goals for Hackweek 24 (Complete)
- Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix
- Create at least 5 modules that can be applied to the cluster and require troubleshooting
Resources
- https://github.com/celidon/rancher-troublemaker
- https://github.com/rancher/terraform-provider-rancher2
- https://github.com/rancher/tf-rancher-up
- https://github.com/rancher/quickstart
Cluster API Provider for Harvester by rcase
Project Description
The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.
The project has been bootstrapped in HackWeek 23, and its code is available here.
Work done in HackWeek 2023
- Have a early working version of the provider available on Rancher Sandbox : *DONE *
- Demonstrated the created cluster can be imported using Rancher Turtles: DONE
- Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo
DONE in HackWeek 24:
- Add more Unit Tests
- Improve Status Conditions for some phases
- Add cloud provider config generation
- Testing with Harvester v1.3.2
- Template improvements
- Issues creation
DONE in 2025 (out of Hackweek)
- Support of ClusterClass
- Add to
clusterctlcommunity providers, you can add it directly withclusterctl - Testing on newer versions of Harvester v1.4.X and v1.5.X
- Support for
clusterctl generate cluster ... - Improve Status Conditions to reflect current state of Infrastructure
- Improve CI (some bugs for release creation)
Goals for HackWeek 2025
- FIRST and FOREMOST, any topic is important to you
- Add e2e testing
- Certify the provider for Rancher Turtles
- Add Machine pool labeling
- Add PCI-e passthrough capabilities.
- Other improvement suggestions are welcome!
Thanks to @isim and Dominic Giebert for their contributions!
Resources
Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.
This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:
Help Create A Chat Control Resistant Turnkey Chatmail/Deltachat Relay Stack - Rootless Podman Compose, OpenSUSE BCI, Hardened, & SELinux by 3nd5h1771fy
Description
The Mission: Decentralized & Sovereign Messaging
FYI: If you have never heard of "Chatmail", you can visit their site here, but simply put it can be thought of as the underlying protocol/platform decentralized messengers like DeltaChat use for their communications. Do not confuse it with the honeypot looking non-opensource paid for prodect with better seo that directs you to chatmailsecure(dot)com
In an era of increasing centralized surveillance by unaccountable bad actors (aka BigTech), "Chat Control," and the erosion of digital privacy, the need for sovereign communication infrastructure is critical. Chatmail is a pioneering initiative that bridges the gap between classic email and modern instant messaging, offering metadata-minimized, end-to-end encrypted (E2EE) communication that is interoperable and open.
However, unless you are a seasoned sysadmin, the current recommended deployment method of a Chatmail relay is rigid, fragile, difficult to properly secure, and effectively takes over the entire host the "relay" is deployed on.
Why This Matters
A simple, host agnostic, reproducible deployment lowers the entry cost for anyone wanting to run a privacy‑preserving, decentralized messaging relay. In an era of perpetually resurrected chat‑control legislation threats, EU digital‑sovereignty drives, and many dangers of using big‑tech messaging platforms (Apple iMessage, WhatsApp, FB Messenger, Instagram, SMS, Google Messages, etc...) for any type of communication, providing an easy‑to‑use alternative empowers:
- Censorship resistance - No single entity controls the relay; operators can spin up new nodes quickly.
- Surveillance mitigation - End‑to‑end OpenPGP encryption ensures relay operators never see plaintext.
- Digital sovereignty - Communities can host their own infrastructure under local jurisdiction, aligning with national data‑policy goals.
By turning the Chatmail relay into a plug‑and‑play container stack, we enable broader adoption, foster a resilient messaging fabric, and give developers, activists, and hobbyists a concrete tool to defend privacy online.
Goals
As I indicated earlier, this project aims to drastically simplify the deployment of Chatmail relay. By converting this architecture into a portable, containerized stack using Podman and OpenSUSE base container images, we can allow anyone to deploy their own censorship-resistant, privacy-preserving communications node in minutes.
Our goal for Hack Week: package every component into containers built on openSUSE/MicroOS base images, initially orchestrated with a single container-compose.yml (podman-compose compatible). The stack will:
- Run on any host that supports Podman (including optimizations and enhancements for SELinux‑enabled systems).
- Allow network decoupling by refactoring configurations to move from file-system constrained Unix sockets to internal TCP networking, allowing containers achieve stricter isolation.
- Utilize Enhanced Security with SELinux by using purpose built utilities such as udica we can quickly generate custom SELinux policies for the container stack, ensuring strict confinement superior to standard/typical Docker deployments.
- Allow the use of bind or remote mounted volumes for shared data (
/var/vmail, DKIM keys, TLS certs, etc.). - Replace the local DNS server requirement with a remote DNS‑provider API for DKIM/TXT record publishing.
By delivering a turnkey, host agnostic, reproducible deployment, we lower the barrier for individuals and small communities to launch their own chatmail relays, fostering a decentralized, censorship‑resistant messaging ecosystem that can serve DeltaChat users and/or future services adopting this protocol
Resources
- The links included above
- https://chatmail.at/doc/relay/
- https://delta.chat/en/help
- Project repo -> https://codeberg.org/EndShittification/containerized-chatmail-relay
Technical talks at universities by agamez
Description
This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.
For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.
Goals
- Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
- Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
- Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.
Resources
- Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
- SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.
Rewrite Distrobox in go (POC) by fabriziosestito
Description
Rewriting Distrobox in Go.
Main benefits:
- Easier to maintain and to test
- Adapter pattern for different container backends (LXC, systemd-nspawn, etc.)
Goals
- Build a minimal starting point with core commands
- Keep the CLI interface compatible: existing users shouldn't notice any difference
- Use a clean Go architecture with adapters for different container backends
- Keep dependencies minimal and binary size small
- Benchmark against the original shell script
Resources
- Upstream project: https://github.com/89luca89/distrobox/
- Distrobox site: https://distrobox.it/
- ArchWiki: https://wiki.archlinux.org/title/Distrobox
Uyuni read-only replica by cbosdonnat
Description
For now, there is no possible HA setup for Uyuni. The idea is to explore setting up a read-only shadow instance of an Uyuni and make it as useful as possible.
Possible things to look at:
- live sync of the database, probably using the WAL. Some of the tables may have to be skipped or some features disabled on the RO instance (taskomatic, PXT sessions…)
- Can we use a load balancer that routes read-only queries to either instance and the other to the RW one? For example, packages or PXE data can be served by both, the API GET requests too. The rest would be RW.
Goals
- Prepare a document explaining how to do it.
- PR with the needed code changes to support it