Project Description
We have often problems (e.g. pods not starting) that are related to PVCs not running, cluster (nodes) not all up or deployments not running or completely running. This all prevents administration activities. Having something that can regular be run to validate the status of the cluster would be helpful, and not as of today do a lot of manual tasks.
As addition (read enough time), we could add changing reservation, adding new disks, etc. --> This didn't made it. But the scripts can easily be adopted.
This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.
Goal for this Hackweek
At the end we should have a small python tool that is doing a (very) basic health check on nodes, deployments and PVCs. First attempt was to make it in golang, but that was taking to much time.
Overview
This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:
node check: This will check all nodes, and display the status and the k3s version. If the status of the nodes is not "Ready" (this should be only reported), the cluster will be reported as having problems
deployment check: This check will list all deployments, and display the number of expected replicas and the used replica. If there are unused replicas this will be displayed. The cluster will be reported as having problems.
pvc check: This check will list of all pvc's, and display the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.
If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.
The script has 1 mandatory parameter and that is the kubeconf of the cluster or of a node off the cluster.
The code is writen for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that will contain all needed packages. Also, the script can be run on the cluster itself or any other linux server.
Installation
To install this project, perform the following steps:
- Create the directory /opt/k8s-check
mkdir /opt/k8s-check
- Copy all the file to this directory and make the following changes:
chmod +x k8s-check.py
note:
If you want to run this program in you on python environment or the default of the server, please perform the following actions:
- install the needed modules:
pip install tabulate kubernetes
- change the first line of k8s-check.py to:
\#!/usr/bin/env python3
Usage
As mentioned above, for the executing of the script the kubeconfig is needed. If this happens on the node, please use the local kubeconfig (e.g. for k3s this will be: /etc/rancher/k3s/k3s.yaml). If the script is run on a different server, copy the kubeconfig to this server. Executing the following will do the magic:
/opt/k8s-check --kubeconfig
repository link
https://github.com/mbrookhuis/k8s-check
keywords: longhorn monitoring python
Looking for hackers with the skills:
This project is part of:
Hack Week 24
Activity
Comments
Be the first to comment!
Similar Projects
Extending KubeVirtBMC's capability by adding Redfish support by zchang
Description
In Hack Week 23, we delivered a project called KubeBMC (renamed to KubeVirtBMC now), which brings the good old-fashioned IPMI ways to manage virtual machines running on KubeVirt-powered clusters. This opens the possibility of integrating existing bare-metal provisioning solutions like Tinkerbell with virtualized environments. We even received an inquiry about transferring the project to the KubeVirt organization. So, a proposal was filed, which was accepted by the KubeVirt community, and the project was renamed after that. We have many tasks on our to-do list. Some of them are administrative tasks; some are feature-related. One of the most requested features is Redfish support.
Goals
Extend the capability of KubeVirtBMC by adding Redfish support. Currently, the virtbmc component only exposes IPMI endpoints. We need to implement another simulator to expose Redfish endpoints, as we did with the IPMI module. We aim at a basic set of functionalities:
- Power management
- Boot device selection
- Virtual media mount (this one is not so basic
)
Resources
Setup Kanidm as OIDC provider on Kubernetes by jkuzilek
Description
I am planning to upgrade my homelab Kubernetes cluster to the next level and need an OIDC provider for my services, including K8s itself.
Goals
- Successfully configure and deploy Kanidm on homelab cluster
- Integrate with K8s auth
- Integrate with other services (Envoy Gateway, Container Registry, future deployment of Forgejo?)
Resources
SUSE AI Meets the Game Board by moio
Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
Results: Infrastructure Achievements
We successfully built and automated a containerized stack to support our AI experiments. This included:
- a Fully-Automated, One-Command, GPU-accelerated Kubernetes setup: we created an OpenTofu based script, tofu-tag, to deploy SUSE's RKE2 Kubernetes running on CUDA-enabled nodes in AWS, powered by openSUSE with GPU drivers and gpu-operator
- Containerization of the TAG and PyTAG frameworks: TAG (Tabletop AI Games) and PyTAG were patched for seamless deployment in containerized environments. We automated the container image creation process with GitHub Actions. Our forks (PRs upstream upcoming):
./deploy.sh
and voilà - Kubernetes running PyTAG (k9s
, above) with GPU acceleration (nvtop
, below)
Results: Game Design Insights
Our project focused on modeling and analyzing two card games of our own design within the TAG framework:
- Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
- AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
- Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement .
- more about Bamboo on Dario's site
- more about R3 on Silvio's site (italian, translation coming)
- more about Totoro on Silvio's site
A family picture of our card games in progress. From the top: Bamboo, Totoro, R3
Results: Learning, Collaboration, and Innovation
Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:
- "Trio programming" with AI assistance: Our "trio programming" approach—two developers and GitHub Copilot—was a standout success, especially in handling slightly-repetitive but not-quite-exactly-copypaste tasks. Java as a language tends to be verbose and we found it to be fitting particularly well.
- AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
- GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
- Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.
Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!
The Context: AI + Board Games
ClusterOps - Easily install and manage your personal kubernetes cluster by andreabenini
Description
ClusterOps is a Kubernetes installer and operator designed to streamline the initial configuration
and ongoing maintenance of kubernetes clusters. The focus of this project is primarily on personal
or local installations. However, the goal is to expand its use to encompass all installations of
Kubernetes for local development purposes.
It simplifies cluster management by automating tasks and providing just one user-friendly YAML-based
configuration config.yml
.
Overview
- Simplified Configuration: Define your desired cluster state in a simple YAML file, and ClusterOps will handle the rest.
- Automated Setup: Automates initial cluster configuration, including network settings, storage provisioning, special requirements (for example GPUs) and essential components installation.
- Ongoing Maintenance: Performs routine maintenance tasks such as upgrades, security updates, and resource monitoring.
- Extensibility: Easily extend functionality with custom plugins and configurations.
- Self-Healing: Detects and recovers from common cluster issues, ensuring stability, idempotence and reliability. Same operation can be performed multiple times without changing the result.
- Discreet: It works only on what it knows, if you are manually configuring parts of your kubernetes and this configuration does not interfere with it you can happily continue to work on several parts and use this tool only for what is needed.
Features
- distribution and engine independence. Install your favorite kubernetes engine with your package
manager, execute one script and you'll have a complete working environment at your disposal.
- Basic config approach. One single
config.yml
file with configuration requirements (add/remove features): human readable, plain and simple. All fancy configs managed automatically (ingress, balancers, services, proxy, ...). - Local Builtin ContainerHub. The default installation provides a fully configured ContainerHub available locally along with the kubernetes installation. This configuration allows the user to build, upload and deploy custom container images as they were provided from external sources. Internet public sources are still available but local development can be kept in this localhost server. Builtin ClusterOps operator will be fetched from this ContainerHub registry too.
- Kubernetes official dashboard installed as a plugin, others planned too (k9s for example).
- Kubevirt plugin installed and properly configured. Unleash the power of classic virtualization (KVM+QEMU) on top of Kubernetes and manage your entire system from there, libvirtd and virsh libs are required.
- One operator to rule them all. The installation script configures your machine automatically during installation and adds one kubernetes operator to manage your local cluster. From there the operator takes care of the cluster on your behalf.
- Clean installation and removal. Just test it, when you are done just use the same program to uninstall everything without leaving configs (or pods) behind.
Planned features (Wishlist / TODOs)
- Containerized Data Importer (CDI). Persistent storage management add-on for Kubernetes to provide a declarative way of building and importing Virtual Machine Disks on PVCs for
Install Uyuni on Kubernetes in cloud-native way by cbosdonnat
Description
For now installing Uyuni on Kubernetes requires running mgradm
on a cluster node... which is not what users would do in the Kubernetes world. The idea is to implement an installation based only on helm charts and probably an operator.
Goals
Install Uyuni from Rancher UI.
Resources
mgradm
code: https://github.com/uyuni-project/uyuni-tools- Uyuni operator: https://github.com/cbosdo/uyuni-operator
Selenium with Python by xguo
Description
Try to create test case about Selenium base on Python
Goals
- Knowledge about Selenium with Python
- Create new test case about Selenium
Resources
https://selenium-python.readthedocs.io/ https://www.selenium.dev/
Symbol Relations by hli
Description
There are tools to build function call graphs based on parsing source code, for example, cscope
.
This project aims to achieve a similar goal by directly parsing the disasembly (i.e. objdump) of a compiled binary. The assembly code is what the CPU sees, therefore more "direct". This may be useful in certain scenarios, such as gdb/crash debugging.
Detailed description and Demos can be found in the README file:
Supports x86 for now (because my customers only use x86 machines), but support for other architectures can be added easily.
Tested with python3.6
Goals
Any comments are welcome.
Resources
https://github.com/lhb-cafe/SymbolRelations
symrellib.py: mplements the symbol relation graph and the disassembly parser
symrel_tracer*.py: implements tracing (-t option)
symrel.py: "cli parser"
Enhance UV openQA helper script by mdonis
Description
A couple months ago an UV openQA helper script was created to help/automate the searching phase inside openQA for a given MU to test. The script searches inside all our openQA job groups (qam-sle) related with a given MU and generates an output suitable to add (copy & paste) inside the update log.
This is still a WIP and could use some enhancements.
Goals
- Move script from bash to python: this would be useful in case we want to include this into MTUI in the future. The script will be separate from MTUI for now. The idea is to have this as a CLI tool using the click library or something similar.
- Add option to look for jobs in other sections inside aggregated updates: right now, when looking for regression tests under aggregated updates for a given MU, the script only looks inside the Core MU job group. This is where most of the regression tests we need are located, but some MUs have their regression tests under the YaST/Containers/Security MU job groups. We should keep the Core MU group as a default, but add an option to be able to look into other job groups under aggregated updates.
- Remove the
-a
option: this option is used to indicate the update ID and is mandatory right now. This is a bit weird and goes against posix stardards. It was developed this way in order to avoid using positional parameters. This problem should be fixed if we move the script to python.
Some other ideas to consider:
- Look into the QAM dashboard API. This has more info on each MU, could use this to link general openQA build results, whether the related RR is approved or not, etc
- Make it easier to see if there's regression tests for a package in an openQA test build. Check if there's a possibility to search for tests that have the package name in them inside each testsuite.
- Unit testing?
More ideas TBD
Resources
https://github.com/os-autoinst/scripts/blob/master/openqa-search-maintenance-core-jobs
https://confluence.suse.com/display/maintenanceqa/Guide+on+how+to+test+Updates
Post-Hackweek update
All major features were implemented. Unit tests are still in progress, and project will be moved to the SUSE github org once everything's done. https://github.com/mjdonis/oqa-search
Introducing "Bottles": A Proof of Concept for Multi-Version CRD Management in Kubernetes by aruiz
Description
As we delve deeper into the complexities of managing multiple CRD versions within a single Kubernetes cluster, I want to introduce "Bottles" - a proof of concept that aims to address these challenges.
Bottles propose a novel approach to isolating and deploying different CRD versions in a self-contained environment. This would allow for greater flexibility and efficiency in managing diverse workloads.
Goals
- Evaluate Feasibility: determine if this approach is technically viable, as well as identifying possible obstacles and limitations.
- Reuse existing technology: leverage existing products whenever possible, e.g. build on top of Kubewarden as admission controller.
- Focus on Rancher's use case: the ultimate goal is to be able to use this approach to solve Rancher users' needs.
Resources
Core concepts:
- ConfigMaps: Bottles could be defined and configured using ConfigMaps.
- Admission Controller: An admission controller will detect "bootled" CRDs being installed and replace the resource name used to store them.
- Aggregated API Server: By analyzing the author of a request, the aggregated API server will determine the correct bottle and route the request accordingly, making it transparent for the user.
CVE portal for SUSE Rancher products by gmacedo
Description
Currently it's a bit difficult for users to quickly see the list of CVEs affecting images in Rancher, RKE2, Harvester and Longhorn releases. Users need to individually look for each CVE in the SUSE CVE database page - https://www.suse.com/security/cve/ . This is not optimal, because those CVE pages are a bit hard to read and contain data for all SLE and BCI products too, making it difficult to easily see only the CVEs affecting the latest release of Rancher, for example. We understand that certain costumers are only looking for CVE data for Rancher and not SLE or BCI.
Goals
The objective is to create a simple to read and navigate page that contains only CVE data related to Rancher, RKE2, Harvester and Longhorn, where it's easy to search by a CVE ID, an image name or a release version. The page should also provide the raw data as an exportable CSV file.
It must be an MVP with the minimal amount of effort/time invested, but still providing great value to our users and saving the wasted time that the Rancher Security team needs to spend by manually sharing such data. It might not be long lived, as it can be replaced in 2-3 years with a better SUSE wide solution.
Resources
- The page must be simple and easy to read.
- The UI/UX must be as straightforward as possible with minimal visual noise.
- The content must be created automatically from the raw data that we already have internally.
- It must be updated automatically on a daily basis and on ad-hoc runs (when needed).
- The CVE status must be aligned with VEX.
- The raw data must be exportable as CSV file.
- Ideally it will be written in Go or pure Shell script with basic HTML and no external dependencies in CSS or JS.
Cluster API Provider for Harvester by rcase
Project Description
The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.
The project has been bootstrapped in HackWeek 23, and its code is available here.
Work done in HackWeek 2023
- Have a early working version of the provider available on Rancher Sandbox : *DONE *
- Demonstrated the created cluster can be imported using Rancher Turtles: DONE
- Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo
Goals for HackWeek 2024
- Add support for ClusterClass
- Add e2e testing
- Add more Unit Tests
- Improve Status Conditions to reflect current state of Infrastructure
- Improve CI (some bugs for release creation)
- Testing with newer Harvester version (v1.3.X and v1.4.X)
- Due to the length and complexity of the templates, maybe package some of them as Helm Charts.
- Other improvement suggestions are welcome!
DONE in HackWeek 24:
- Add more Unit Tests
- Improve Status Conditions for some phases
- Add cloud provider config generation
- Testing with Harvester v1.3.2
- Template improvements
- Issues creation
Thanks to @isim and Dominic Giebert for their contributions!
Resources
Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.
This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:
- Cluster infrastructure provider contract
- Machine infrastructure provider contract
- Provider implementers guide
Rancher/k8s Trouble-Maker by tonyhansen
Project Description
When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.
Goal for this Hackweek
Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix Create at least 5 modules that can be applied to the cluster and require troubleshooting
Resources
https://github.com/rancher/terraform-provider-rancher2 https://github.com/rancher/tf-rancher-up
Rancher microfrontend extensions by ftorchia
Description
Rancher UI Extensions allow users, developers, partners, and customers to extend and enhance the Rancher UI. Extensions are Helm charts that can only be installed once into a cluster. The charts contain a UI built package that is downloaded and linked to the Host UI at runtime; this means that the extension pkg needs to be implemented using the same technology and have the same APIs as Rancher UI.
Goals
We want to create a new type of Rancher extension, based on microfrontend pattern. The extension is served in a docker container in the k8s clusters and embedded in the host UI; this would guarantee us to be able to create extensions unrelated to the rancher UI architecture, in any technology.
Non Goals
We want to apply the microfrontend pattern to the product-level extensions; we don't want to apply it to cluster-level extensions.
Resources
rancher-extension-microfrontend, Rancher extensions
Build Edge Image Builder ISO with SUSE Manager by mweiss2
Description
With SUSE Manager, we can build OS Images using KIWI and container images. As we have Edge Image Builder, we want to see if it is possible to use SUSE Manager to build/customize OS Images by integrating Edge Image Builder as well.
Goals
To make the process easier for customers, a single-build pipeline that automatically adds the combustion and artifact files from the EIB process is desirable.
- Kiwi and EIB need to come from a Git Repository.
- Kiwi and EIB need to be running as containers.
- Configuration options for the images used for Kiwi and EIB build.
- X86 and ARM64 Support.
- SUSE Manager 4.3 and 5.X Support.
- SLES 15 SP6 / SL Micro 6.0 and SL Micro 6.1 Support.
Outcome
- Change the Kiwi build process to use Podman with the Kiwi image registry.suse.com/bci/kiwi:10.1.10
- Change the Edge Image Builder to produce a combustion-only ISO
- Extract the contents and write them to a dedicated /OEM partition integrated via Kiwi into the ISO Kiwi creates.
Sources and PRs
- https://github.com/Martin-Weiss/kiwi-image-micro-gpu-60
- https://github.com/suse-edge/edge-image-builder/pull/618
- https://github.com/uyuni-project/uyuni/pull/9507
Edge Image Builder and mkosi for Uyuni by oholecek
Description
One part of Uyuni system management tool is ability to build custom images. Currently Uyuni supports only Kiwi image builder.
Kiwi however is not the only image building system out there and with the goal to also become familiar with other systems, this projects aim to add support for Edge Image builder and systemd's mkosi systems.
Goals
Uyuni is able to
- provision EIB and mkosi build hosts
- build EIB and mkosi images and store them
Resources
- Uyuni - https://github.com/uyuni-project/uyuni
- Edge Image builder - https://github.com/suse-edge/edge-image-builder
- mkosi - https://github.com/systemd/mkosi