Project Description
With the desire for Rancher Manager to scale to managing 1000s of clusters (10,000 i hear you say) we could try and have 1 instance of Rancher Manager doing it all. But could we have a Manager of Managers? How could we support multi-tenancy where each Rancher Manager has different versions etc?
One project that could be interesting to realizing this vision is KCP. It’s taking the ideas of "virtual clusters" (and projects like vcluster) and looking at providing a more lightweight solution where you don't need a full virtual cluster within another cluster whilst still supporting multi-tenancy, hierarchical workspaces, cross workspace operators and various other features.
Goal for this Hackweek
The purpose of this project is to practically research the following:
- Is the KCP project usable (when I originally looked at 1 year ago it was very hard to grok and get working)
- Have KCP managing the workloads for multiple clusters (we can use k3d for this)
- (Stretch goal) Can we get Rancher Manager (or cluster agent) working against KCP
At the end of the week, we should know if KCP is a project that would be helpful to the future of Rancher Manager. And whether it's worth us getting involved with the project.
KCP could also be useful to Fleet, but this will be out of scope for hack week.
Resources
Looking for hackers with the skills:
This project is part of:
Hack Week 22
Activity
Comments
Be the first to comment!
Similar Projects
Cluster API Provider for Harvester by rcase
Project Description
The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.
The project has been bootstrapped in HackWeek 23, and its code is available here.
Work done in HackWeek 2023
- Have a early working version of the provider available on Rancher Sandbox : *DONE *
- Demonstrated the created cluster can be imported using Rancher Turtles: DONE
- Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo
Goals for HackWeek 2024
- Add support for ClusterClass
- Add e2e testing
- Add more Unit Tests
- Improve Status Conditions to reflect current state of Infrastructure
- Improve CI (some bugs for release creation)
- Testing with newer Harvester version (v1.3.X and v1.4.X)
- Due to the length and complexity of the templates, maybe package some of them as Helm Charts.
- Other improvement suggestions are welcome!
DONE in HackWeek 24:
- Add more Unit Tests
- Improve Status Conditions for some phases
- Add cloud provider config generation
- Testing with Harvester v1.3.2
- Template improvements
- Issues creation
Thanks to @isim and Dominic Giebert for their contributions!
Resources
Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.
This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:
- Cluster infrastructure provider contract
- Machine infrastructure provider contract
- Provider implementers guide
Rancher microfrontend extensions by ftorchia
Description
Rancher UI Extensions allow users, developers, partners, and customers to extend and enhance the Rancher UI. Extensions are Helm charts that can only be installed once into a cluster. The charts contain a UI built package that is downloaded and linked to the Host UI at runtime; this means that the extension pkg needs to be implemented using the same technology and have the same APIs as Rancher UI.
Goals
We want to create a new type of Rancher extension, based on microfrontend pattern. The extension is served in a docker container in the k8s clusters and embedded in the host UI; this would guarantee us to be able to create extensions unrelated to the rancher UI architecture, in any technology.
Non Goals
We want to apply the microfrontend pattern to the product-level extensions; we don't want to apply it to cluster-level extensions.
Resources
rancher-extension-microfrontend, Rancher extensions
Small healthcheck tool for Longhorn by mbrookhuis
Project Description
We have often problems (e.g. pods not starting) that are related to PVCs not running, cluster (nodes) not all up or deployments not running or completely running. This all prevents administration activities. Having something that can regular be run to validate the status of the cluster would be helpful, and not as of today do a lot of manual tasks.
As addition (read enough time), we could add changing reservation, adding new disks, etc. --> This didn't made it. But the scripts can easily be adopted.
This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.
Goal for this Hackweek
At the end we should have a small python tool that is doing a (very) basic health check on nodes, deployments and PVCs. First attempt was to make it in golang, but that was taking to much time.
Overview
This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:
node check: This will check all nodes, and display the status and the k3s version. If the status of the nodes is not "Ready" (this should be only reported), the cluster will be reported as having problems
deployment check: This check will list all deployments, and display the number of expected replicas and the used replica. If there are unused replicas this will be displayed. The cluster will be reported as having problems.
pvc check: This check will list of all pvc's, and display the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.
If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.
The script has 1 mandatory parameter and that is the kubeconf of the cluster or of a node off the cluster.
The code is writen for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that will contain all needed packages. Also, the script can be run on the cluster itself or any other linux server.
Installation
To install this project, perform the following steps:
- Create the directory /opt/k8s-check
mkdir /opt/k8s-check
- Copy all the file to this directory and make the following changes:
chmod +x k8s-check.py
CVE portal for SUSE Rancher products by gmacedo
Description
Currently it's a bit difficult for users to quickly see the list of CVEs affecting images in Rancher, RKE2, Harvester and Longhorn releases. Users need to individually look for each CVE in the SUSE CVE database page - https://www.suse.com/security/cve/ . This is not optimal, because those CVE pages are a bit hard to read and contain data for all SLE and BCI products too, making it difficult to easily see only the CVEs affecting the latest release of Rancher, for example. We understand that certain costumers are only looking for CVE data for Rancher and not SLE or BCI.
Goals
The objective is to create a simple to read and navigate page that contains only CVE data related to Rancher, RKE2, Harvester and Longhorn, where it's easy to search by a CVE ID, an image name or a release version. The page should also provide the raw data as an exportable CSV file.
It must be an MVP with the minimal amount of effort/time invested, but still providing great value to our users and saving the wasted time that the Rancher Security team needs to spend by manually sharing such data. It might not be long lived, as it can be replaced in 2-3 years with a better SUSE wide solution.
Resources
- The page must be simple and easy to read.
- The UI/UX must be as straightforward as possible with minimal visual noise.
- The content must be created automatically from the raw data that we already have internally.
- It must be updated automatically on a daily basis and on ad-hoc runs (when needed).
- The CVE status must be aligned with VEX.
- The raw data must be exportable as CSV file.
- Ideally it will be written in Go or pure Shell script with basic HTML and no external dependencies in CSS or JS.
Introducing "Bottles": A Proof of Concept for Multi-Version CRD Management in Kubernetes by aruiz
Description
As we delve deeper into the complexities of managing multiple CRD versions within a single Kubernetes cluster, I want to introduce "Bottles" - a proof of concept that aims to address these challenges.
Bottles propose a novel approach to isolating and deploying different CRD versions in a self-contained environment. This would allow for greater flexibility and efficiency in managing diverse workloads.
Goals
- Evaluate Feasibility: determine if this approach is technically viable, as well as identifying possible obstacles and limitations.
- Reuse existing technology: leverage existing products whenever possible, e.g. build on top of Kubewarden as admission controller.
- Focus on Rancher's use case: the ultimate goal is to be able to use this approach to solve Rancher users' needs.
Resources
Core concepts:
- ConfigMaps: Bottles could be defined and configured using ConfigMaps.
- Admission Controller: An admission controller will detect "bootled" CRDs being installed and replace the resource name used to store them.
- Aggregated API Server: By analyzing the author of a request, the aggregated API server will determine the correct bottle and route the request accordingly, making it transparent for the user.
Integrate Backstage with Rancher Manager by nwmacd
Description
Backstage (backstage.io) is an open-source, CNCF project that allows you to create your own developer portal. There are many plugins for Backstage.
This could be a great compliment to Rancher Manager.
Goals
Learn and experiment with Backstage and look at how this could be integrated with Rancher Manager. Goal is to have some kind of integration completed in this Hack week.
Progress
Screen shot of home page at the end of Hackweek:
Day One
- Got Backstage running locally, understanding configuration with HTTPs.
- Got Backstage embedded in an IFRAME inside of Rancher
- Added content into the software catalog (see: https://backstage.io/docs/features/techdocs/getting-started/)
- Understood more about the entity model
Day Two
- Connected Backstage to the Rancher local cluster and configured the Kubernetes plugin.
- Created Rancher theme to make the light theme more consistent with Rancher
Days Three and Day Four
Created two backend plugins for Backstage:
- Catalog Entity Provider - this imports users from Rancher into Backstage
- Auth Provider - uses the proxied sign-in pattern to check the Rancher session cookie, to user that to authenticate the user with Rancher and then log them into Backstage by connecting this to the imported User entity from the catalog entity provider plugin.
With this in place, you can single-sign-on between Rancher and Backstage when it is deployed within Rancher. Note this is only when running locally for development at present
Day Five
- Start to build out a production deployment for all of the above
- Made some progress, but hit issues with the authentication and proxying when running proxied within Rancher, which needs further investigation
Multi-pod, autoscalable Elixir application in Kubernetes using K8s resources by socon
Description
Elixir / Erlang use their own solutions to create clusters that work together. Kubernetes provide its own orchestration. Due to the nature of the BEAM, it looks a very promising technology for applications that run in Kubernetes and requite to be always on, specifically if they are created as web pages using Phoenix.
Goals
- Investigate and provide solutions that work in Phoenix LiveView using Kubernetes resources, so a multi-pod application can be used
- Provide an end to end example that creates and deploy a container from source code.
Resources
https://github.com/dwyl/phoenix-liveview-counter-tutorial https://github.com/propedeutica/elixir-k8s-counter
Install Uyuni on Kubernetes in cloud-native way by cbosdonnat
Description
For now installing Uyuni on Kubernetes requires running mgradm
on a cluster node... which is not what users would do in the Kubernetes world. The idea is to implement an installation based only on helm charts and probably an operator.
Goals
Install Uyuni from Rancher UI.
Resources
mgradm
code: https://github.com/uyuni-project/uyuni-tools- Uyuni operator: https://github.com/cbosdo/uyuni-operator
Metrics Server viewer for Kubernetes by bkampen
This project is finished please visit the github repo below for the tool.
Description
Build a CLI tools which can visualize Kubernetes metrics from the metrics-server, so you're able to watch these without installing Prometheus and Grafana on a cluster.
Goals
- Learn more about metrics-server
- Learn more about the inner workings of Kubernetes.
- Learn more about Go
Resources
https://github.com/bvankampen/metrics-viewer
SUSE AI Meets the Game Board by moio
Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
Results: Infrastructure Achievements
We successfully built and automated a containerized stack to support our AI experiments. This included:
- a Fully-Automated, One-Command, GPU-accelerated Kubernetes setup: we created an OpenTofu based script, tofu-tag, to deploy SUSE's RKE2 Kubernetes running on CUDA-enabled nodes in AWS, powered by openSUSE with GPU drivers and gpu-operator
- Containerization of the TAG and PyTAG frameworks: TAG (Tabletop AI Games) and PyTAG were patched for seamless deployment in containerized environments. We automated the container image creation process with GitHub Actions. Our forks (PRs upstream upcoming):
./deploy.sh
and voilà - Kubernetes running PyTAG (k9s
, above) with GPU acceleration (nvtop
, below)
Results: Game Design Insights
Our project focused on modeling and analyzing two card games of our own design within the TAG framework:
- Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
- AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
- Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement .
- more about Bamboo on Dario's site
- more about R3 on Silvio's site (italian, translation coming)
- more about Totoro on Silvio's site
A family picture of our card games in progress. From the top: Bamboo, Totoro, R3
Results: Learning, Collaboration, and Innovation
Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:
- "Trio programming" with AI assistance: Our "trio programming" approach—two developers and GitHub Copilot—was a standout success, especially in handling slightly-repetitive but not-quite-exactly-copypaste tasks. Java as a language tends to be verbose and we found it to be fitting particularly well.
- AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
- GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
- Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.
Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!
The Context: AI + Board Games