Project Description

Releasing rancher is currently a complex process and requires release captains to make alot of manual commits. Much of this work is very procedural. Our goal is to automate the more mundane tasks to make it faster and easier for new release captains to get started.

Goal for this Hackweek

To greatly simplify the rancher release process and automate a lot of the manual work that goes into releasing rancher.

Looking for hackers with the skills:

rancher releases

This project is part of:

Hack Week 20

Activity

  • over 4 years ago: pgonin liked this project.
  • over 4 years ago: jblainchristen liked this project.
  • over 4 years ago: rsanna liked this project.
  • over 4 years ago: rsanna joined this project.
  • over 4 years ago: jpayne added keyword "rancher" to this project.
  • over 4 years ago: jpayne added keyword "releases" to this project.
  • over 4 years ago: jpayne joined this project.
  • over 4 years ago: svansteenis started this project.
  • over 4 years ago: jpayne originated this project.

  • Comments

    • hfschmidt
      over 4 years ago by hfschmidt | Reply

      This is interesting! In SCC (scc.suse.com), customers will be able to open their support cases. So SCC will also need to keep track of Rancher / Longhorn versions, so that customers can select the correct version when they open their support case.

    Similar Projects

    The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

    Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. A GitHub robot mascot trying to lasso a blue bull with a Kubernetes logo tatooed on it


    The Plan

    Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

    Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:


    The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

    The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

    Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.


    If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

    Why?

    We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

    The CONCLUSION!!!

    A add-emoji State of the Union add-emoji document was compiled to summarize lessons learned this week. For more gory details, just read on the diary below! add-emoji


    Self-Scaling LLM Infrastructure Powered by Rancher by ademicev0

    Self-Scaling LLM Infrastructure Powered by Rancher

    logo


    Description

    The Problem

    Running LLMs can get expensive and complex pretty quickly.

    Today there are typically two choices:

    1. Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
    2. Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.

    What if there was a middle ground?

    What if infrastructure scaled itself instead of making you scale it?

    Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?

    Project Repository: github.com/alexander-demicev/llmserverless


    What This Project Does

    A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.

    A complete, self-scaling LLM infrastructure that:

    • Scales to zero when idle (no idle costs)
    • Scales up automatically when requests come in
    • Adds more nodes when needed, removes them when demand drops
    • Runs on any infrastructure - laptop, bare metal, or cloud

    Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.

    How It Works

    A combination of open source tools working together:

    Flow:

    • Users interact with OpenWebUI (chat interface)
    • Requests go to LiteLLM Gateway
    • LiteLLM routes requests to:
      • Ollama (Knative) for local model inference (auto-scales pods)
      • Or cloud APIs for fallback


    Rancher/k8s Trouble-Maker by tonyhansen

    Project Description

    When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.

    Goals for Hackweek 25

    • Update to modern Rancher and verify that existing tests still work
    • Change testing logic to populate secrets instead of requiring a secondary script
    • Add new tests

    Goals for Hackweek 24 (Complete)

    • Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix
    • Create at least 5 modules that can be applied to the cluster and require troubleshooting

    Resources

    • https://github.com/celidon/rancher-troublemaker
    • https://github.com/rancher/terraform-provider-rancher2
    • https://github.com/rancher/tf-rancher-up
    • https://github.com/rancher/quickstart


    Liz - Prompt autocomplete by ftorchia

    Description

    Liz is the Rancher AI assistant for cluster operations.

    Goals

    We want to help users when sending new messages to Liz, by adding an autocomplete feature to complete their requests based on the context.

    Example:

    • User prompt: "Can you show me the list of p"
    • Autocomplete suggestion: "Can you show me the list of p...od in local cluster?"

    Example:

    • User prompt: "Show me the logs of #rancher-"
    • Chat console: It shows a drop-down widget, next to the # character, with the list of available pod names starting with "rancher-".

    Technical Overview

    1. The AI agent should expose a new ws/autocomplete endpoint to proxy autocomplete messages to the LLM.
    2. The UI extension should be able to display prompt suggestions and allow users to apply the autocomplete to the Prompt via keyboard shortcuts.

    Resources

    GitHub repository


    Rancher Cluster Lifecycle Visualizer by jferraz

    Description

    Rancher’s v2 provisioning system represents each downstream cluster with several Kubernetes custom resources across multiple API groups, such as clusters.provisioning.cattle.io and clusters.management.cattle.io. Understanding why a cluster is stuck in states like "Provisioning", "Updating", or "Unavailable" often requires jumping between these resources, reading conditions, and correlating them with agent connectivity and known failure modes. This project will build a Cluster Lifecycle Visualizer: a small, read-only controller that runs in the Rancher management cluster and generates a single, human-friendly view per cluster. It will watch Rancher cluster CRDs, derive a simplified lifecycle phase, keep a history of phase transitions from installation time onward, and attach a short, actionable recommendation string that hints at what the operator should check or do next.

    Goals

    • Provide a compact lifecycle summary for each Rancher-managed cluster (e.g. Provisioning, WaitingForClusterAgent, Active, Updating, Error) derived from provisioning.cattle.io/v1 Cluster and management.cattle.io/v3 Cluster status and conditions.
    • Maintain a phase history for each cluster, allowing operators to see how its state evolved over time since the visualizer was installed.
    • Attach a recommended action to the current phase using a small ruleset based on common Rancher failure modes (for example, cluster agent not connected, cluster still stabilizing after an upgrade, or generic error states), to improve the day-to-day debugging experience.
    • Deliver an easy-to-install, read-only component (single YAML or small Helm chart) that Rancher users can deploy to their management cluster and inspect via kubectl get/describe, without UI changes or direct access to downstream clusters.
    • Use idiomatic Go, wrangler, and Rancher APIs.

    Resources

    • Rancher Manager documentation on RKE2 and K3s cluster configuration and provisioning flows.
    • Rancher API Go types for provisioning.cattle.io/v1 and management.cattle.io/v3 (from the rancher/rancher repository or published Go packages).
    • Existing Rancher architecture docs and internal notes about cluster provisioning, cluster agents, and node agents.
    • A local Rancher management cluster (k3s or RKE2) with a few test downstream clusters to validate phase detection, history tracking, and recommendations.