Project Description
When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.
Goal for this Hackweek
Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix Create at least 5 modules that can be applied to the cluster and require troubleshooting
Resources
https://github.com/rancher/terraform-provider-rancher2 https://github.com/rancher/tf-rancher-up
Looking for hackers with the skills:
This project is part of:
Hack Week 24
Activity
Comments
-
13 days ago by tonyhansen | Reply
First commit and the basics are done. Testing Terraform and then to figure out how to intentionally break things.
https://github.com/celidon/rancher-troublemaker
Similar Projects
terraform-provider-feilong by e_bischoff
Project Description
People need to test operating systems and applications on s390 platform.
Installation from scratch solutions include:
- just deploy and provision manually (with the help of
ftpboot
script, if you are at SUSE) - use
s3270
terminal emulation (used byopenQA
people?) - use
LXC
from IBM to start CP commands and analyze the results - use
zPXE
to do some PXE-alike booting (used by theorthos
team?) - use
tessia
to install from scratch using autoyast - use
libvirt
for s390 to do some nested virtualization on some already deployed z/VM system - directly install a Linux kernel on a LPAR and use
kvm
+libvirt
from there
Deployment from image solutions include:
- use
ICIC
web interface (openstack
in disguise, contributed by IBM) - use
ICIC
from theopenstack
terraform
provider (used byRancher
QA) - use
zvm_ansible
to controlSMAPI
- connect directly to
SMAPI
low-level socket interface
IBM Cloud Infrastructure Center (ICIC
) harnesses the Feilong API, but you can use Feilong
without installing ICIC
, provided you set up a "z/VM cloud connector" into one of your VMs following this schema.
What about writing a terraform Feilong
provider, just like we have the terraform
libvirt
provider? That would allow to transparently call Feilong
from your main.tf files to deploy and destroy resources on your system/z.
Other Feilong-based solutions include:
- make
libvirt
Feilong-aware - simply call
Feilong
from shell scripts withcurl
- use
zvmconnector
client python library from Feilong - use
zthin
part of Feilong to directly commandSMAPI
.
Goal for Hackweek 23
My final goal is to be able to easily deploy and provision VMs automatically on a z/VM system, in a way that people might enjoy even outside of SUSE.
My technical preference is to write a terraform provider plugin, as it is the approach that involves the least software components for our deployments, while remaining clean, and compatible with our existing development infrastructure.
Goals for Hackweek 24
Feilong provider works and is used internally by SUSE Manager team. Let's push it forward!
Let's add support for fiberchannel disks and multipath.
Goals for Hackweek 25
- Finish support for fiberchannel disks and multipath
- Fix problems with registration on hashicorp providers registry
Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil
Join the Gitter channel! https://gitter.im/uyuni-project/hackweek
Uyuni is a configuration and infrastructure management tool that saves you time and headaches when you have to manage and update tens, hundreds or even thousands of machines. It also manages configuration, can run audits, build image containers, monitor and much more!
Currently there are a few distributions that are completely untested on Uyuni or SUSE Manager (AFAIK) or just not tested since a long time, and could be interesting knowing how hard would be working with them and, if possible, fix whatever is broken.
For newcomers, the easiest distributions are those based on DEB or RPM packages. Distributions with other package formats are doable, but will require adapting the Python and Java code to be able to sync and analyze such packages (and if salt does not support those packages, it will need changes as well). So if you want a distribution with other packages, make sure you are comfortable handling such changes.
No developer experience? No worries! We had non-developers contributors in the past, and we are ready to help as long as you are willing to learn. If you don't want to code at all, you can also help us preparing the documentation after someone else has the initial code ready, or you could also help with testing :-)
The idea is testing Salt and Salt-ssh clients, but NOT traditional clients, which are deprecated.
To consider that a distribution has basic support, we should cover at least (points 3-6 are to be tested for both salt minions and salt ssh minions):
- Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
- Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)
- Package management (install, remove, update...)
- Patching
- Applying any basic salt state (including a formula)
- Salt remote commands
- Bonus point: Java part for product identification, and monitoring enablement
- Bonus point: sumaform enablement (https://github.com/uyuni-project/sumaform)
- Bonus point: Documentation (https://github.com/uyuni-project/uyuni-docs)
- Bonus point: testsuite enablement (https://github.com/uyuni-project/uyuni/tree/master/testsuite)
If something is breaking: we can try to fix it, but the main idea is research how supported it is right now. Beyond that it's up to each project member how much to hack :-)
- If you don't have knowledge about some of the steps: ask the team
- If you still don't know what to do: switch to another distribution and keep testing.
This card is for EVERYONE, not just developers. Seriously! We had people from other teams helping that were not developers, and added support for Debian and new SUSE Linux Enterprise and openSUSE Leap versions :-)
Pending
FUSS
FUSS is a complete GNU/Linux solution (server, client and desktop/standalone) based on Debian for managing an educational network.
https://fuss.bz.it/
Seems to be a Debian 12 derivative, so adding it could be quite easy.
[x]
Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)[P]
Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator) --> Works from UI, need to test the rest.[P]
Package management (install, remove, update...) --> Installing a new package works, needs to test the rest.[ ]
Patching (if patch information is available, could require writing some code to parse it, but IIRC we have support for Ubuntu already)[ ]
Applying any basic salt state (including a formula)[ ]
Salt remote commands[ ]
Bonus point: Java part for product identification, and monitoring enablement
SUSE AI Meets the Game Board by moio
Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
Results: Infrastructure Achievements
We successfully built and automated a containerized stack to support our AI experiments. This included:
- a Fully-Automated, One-Command, GPU-accelerated Kubernetes setup: we created an OpenTofu based script, tofu-tag, to deploy SUSE's RKE2 Kubernetes running on CUDA-enabled nodes in AWS, powered by openSUSE with GPU drivers and gpu-operator
- Containerization of the TAG and PyTAG frameworks: TAG (Tabletop AI Games) and PyTAG were patched for seamless deployment in containerized environments. We automated the container image creation process with GitHub Actions. Our forks (PRs upstream upcoming):
./deploy.sh
and voilà - Kubernetes running PyTAG (k9s
, above) with GPU acceleration (nvtop
, below)
Results: Game Design Insights
Our project focused on modeling and analyzing two card games of our own design within the TAG framework:
- Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
- AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
- Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement .
- more about Bamboo on Dario's site
- more about R3 on Silvio's site (italian, translation coming)
- more about Totoro on Silvio's site
A family picture of our card games in progress. From the top: Bamboo, Totoro, R3
Results: Learning, Collaboration, and Innovation
Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:
- "Trio programming" with AI assistance: Our "trio programming" approach—two developers and GitHub Copilot—was a standout success, especially in handling slightly-repetitive but not-quite-exactly-copypaste tasks. Java as a language tends to be verbose and we found it to be fitting particularly well.
- AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
- GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
- Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.
Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!
The Context: AI + Board Games
Contribute to terraform-provider-libvirt by pinvernizzi
Description
The SUSE Manager (SUMA) teams' main tool for infrastructure automation, Sumaform, largely relies on terraform-provider-libvirt. That provider is also widely used by other teams, both inside and outside SUSE.
It would be good to help the maintainers of this project and give back to the community around it, after all the amazing work that has been already done.
If you're interested in any of infrastructure automation, Terraform, virtualization, tooling development, Go (...) it is also a good chance to learn a bit about them all by putting your hands on an interesting, real-use-case and complex project.
Goals
- Get more familiar with Terraform provider development and libvirt bindings in Go
- Solve some issues and/or implement some features
- Get in touch with the community around the project
Resources
- CONTRIBUTING readme
- Go libvirt library in use by the project
- Terraform plugin development
- "Good first issue" list
Small healthcheck tool for Longhorn by mbrookhuis
Project Description
We have often problems (e.g. pods not starting) that are related to PVCs not running, cluster (nodes) not all up or deployments not running or completely running. This all prevents administration activities. Having something that can regular be run to validate the status of the cluster would be helpful, and not as of today do a lot of manual tasks.
As addition (read enough time), we could add changing reservation, adding new disks, etc. --> This didn't made it. But the scripts can easily be adopted.
This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.
Goal for this Hackweek
At the end we should have a small python tool that is doing a (very) basic health check on nodes, deployments and PVCs. First attempt was to make it in golang, but that was taking to much time.
Overview
This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:
node check: This will check all nodes, and display the status and the k3s version. If the status of the nodes is not "Ready" (this should be only reported), the cluster will be reported as having problems
deployment check: This check will list all deployments, and display the number of expected replicas and the used replica. If there are unused replicas this will be displayed. The cluster will be reported as having problems.
pvc check: This check will list of all pvc's, and display the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.
If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.
The script has 1 mandatory parameter and that is the kubeconf of the cluster or of a node off the cluster.
The code is writen for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that will contain all needed packages. Also, the script can be run on the cluster itself or any other linux server.
Installation
To install this project, perform the following steps:
- Create the directory /opt/k8s-check
mkdir /opt/k8s-check
- Copy all the file to this directory and make the following changes:
chmod +x k8s-check.py
Introducing "Bottles": A Proof of Concept for Multi-Version CRD Management in Kubernetes by aruiz
Description
As we delve deeper into the complexities of managing multiple CRD versions within a single Kubernetes cluster, I want to introduce "Bottles" - a proof of concept that aims to address these challenges.
Bottles propose a novel approach to isolating and deploying different CRD versions in a self-contained environment. This would allow for greater flexibility and efficiency in managing diverse workloads.
Goals
- Evaluate Feasibility: determine if this approach is technically viable, as well as identifying possible obstacles and limitations.
- Reuse existing technology: leverage existing products whenever possible, e.g. build on top of Kubewarden as admission controller.
- Focus on Rancher's use case: the ultimate goal is to be able to use this approach to solve Rancher users' needs.
Resources
Core concepts:
- ConfigMaps: Bottles could be defined and configured using ConfigMaps.
- Admission Controller: An admission controller will detect "bootled" CRDs being installed and replace the resource name used to store them.
- Aggregated API Server: By analyzing the author of a request, the aggregated API server will determine the correct bottle and route the request accordingly, making it transparent for the user.
Longhorn UI Extension (POC) by yiya.chen
Description
The goal is to create a Longhorn UI extension within Rancher using existing resources.
Longhorn’s UI is built using React, while Rancher’s UI extensions are built using Vue. Developers will explore different approaches to integrate and extend Longhorn’s UI within Rancher’s Vue-based ecosystem, aiming to create a seamless, functional UI extension.
Goals
- Build a Longhorn UI extension (look and feel)
- Support theme switching to align with Rancher’s UI
Results
- https://github.com/a110605/longhorn-hackday
- https://github.com/a110605/longhorn-ui/tree/darkmode
- https://github.com/houhoucoop/hackweek/tree/main/hackweek24
Resources
- Longhorn UI: https://github.com/longhorn/longhorn-ui
- Rancher UI Extension: https://extensions.rancher.io/extensions/next/home
- darkreader: https://www.npmjs.com/package/darkreader
- veaury: https://github.com/gloriasoft/veaury
- module federation: https://webpack.js.org/concepts/module-federation/
A CLI for Harvester by mohamed.belgaied
[comment]: # Harvester does not officially come with a CLI tool, the user is supposed to interact with Harvester mostly through the UI [comment]: # Though it is theoretically possible to use kubectl to interact with Harvester, the manipulation of Kubevirt YAML objects is absolutely not user friendly. [comment]: # Inspired by tools like multipass from Canonical to easily and rapidly create one of multiple VMs, I began the development of Harvester CLI. Currently, it works but Harvester CLI needs some love to be up-to-date with Harvester v1.0.2 and needs some bug fixes and improvements as well.
Project Description
Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as:
harvester vm create my-vm --count 5
to create 5 VMs named my-vm-01
to my-vm-05
.
Harvester CLI is functional but needs a number of improvements: up-to-date functionality with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an opensuse VM instead of an ubuntu VM, solve some bugs, etc.
Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli
Done in previous Hackweeks
- Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
- Automatically package Harvester CLI for OpenSUSE / Redhat RPMs or DEBs: DONE
Goal for this Hackweek
The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.
Some nice additions might be: * Improve handling of namespaced objects * Add features, such as network management or Load Balancer creation ? * Add more unit tests and, why not, e2e tests * Improve CI * Improve the overall code quality * Test the program and create issues for it
Issue list is here: https://github.com/belgaied2/harvester-cli/issues
Resources
The project is written in Go, and using client-go
the Kubernetes Go Client libraries to communicate with the Harvester API (which is Kubernetes in fact).
Welcome contributions are:
- Testing it and creating issues
- Documentation
- Go code improvement
What you might learn
Harvester CLI might be interesting to you if you want to learn more about:
- GitHub Actions
- Harvester as a SUSE Product
- Go programming language
- Kubernetes API
Rancher microfrontend extensions by ftorchia
Description
Rancher UI Extensions allow users, developers, partners, and customers to extend and enhance the Rancher UI. Extensions are Helm charts that can only be installed once into a cluster. The charts contain a UI built package that is downloaded and linked to the Host UI at runtime; this means that the extension pkg needs to be implemented using the same technology and have the same APIs as Rancher UI.
Goals
We want to create a new type of Rancher extension, based on microfrontend pattern. The extension is served in a docker container in the k8s clusters and embedded in the host UI; this would guarantee us to be able to create extensions unrelated to the rancher UI architecture, in any technology.
Non Goals
We want to apply the microfrontend pattern to the product-level extensions; we don't want to apply it to cluster-level extensions.
Resources
rancher-extension-microfrontend, Rancher extensions
SUSE AI Meets the Game Board by moio
Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
Results: Infrastructure Achievements
We successfully built and automated a containerized stack to support our AI experiments. This included:
- a Fully-Automated, One-Command, GPU-accelerated Kubernetes setup: we created an OpenTofu based script, tofu-tag, to deploy SUSE's RKE2 Kubernetes running on CUDA-enabled nodes in AWS, powered by openSUSE with GPU drivers and gpu-operator
- Containerization of the TAG and PyTAG frameworks: TAG (Tabletop AI Games) and PyTAG were patched for seamless deployment in containerized environments. We automated the container image creation process with GitHub Actions. Our forks (PRs upstream upcoming):
./deploy.sh
and voilà - Kubernetes running PyTAG (k9s
, above) with GPU acceleration (nvtop
, below)
Results: Game Design Insights
Our project focused on modeling and analyzing two card games of our own design within the TAG framework:
- Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
- AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
- Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement .
- more about Bamboo on Dario's site
- more about R3 on Silvio's site (italian, translation coming)
- more about Totoro on Silvio's site
A family picture of our card games in progress. From the top: Bamboo, Totoro, R3
Results: Learning, Collaboration, and Innovation
Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:
- "Trio programming" with AI assistance: Our "trio programming" approach—two developers and GitHub Copilot—was a standout success, especially in handling slightly-repetitive but not-quite-exactly-copypaste tasks. Java as a language tends to be verbose and we found it to be fitting particularly well.
- AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
- GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
- Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.
Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!
The Context: AI + Board Games
Technical talks at universities by agamez
Description
This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.
For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.
Goals
- Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
- Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
- Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.
Resources
- Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
- SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.
Learn enough Golang and hack on CoreDNS by jkuzilek
Description
I'm implementing a split-horizon DNS for my home Kubernetes cluster to be able to access my internal (and external) services over the local network through public domains. I managed to make a PoC with the k8s_gateway plugin for CoreDNS. However, I soon found out it responds with IPs for all Gateways assigned to HTTPRoutes, publishing public IPs as well as the internal Loadbalancer ones.
To remedy this issue, a simple filtering mechanism has to be implemented.
Goals
- Learn an acceptable amount of Golang
- Implement GatewayClass (and IngressClass) filtering for k8s_gateway
- Deploy on homelab cluster
- Profit?
Resources
- https://github.com/ori-edge/k8s_gateway/issues/36
- https://github.com/coredns/coredns/issues/2465#issuecomment-593910983
EDIT: Feature mostly complete. An unfinished PR lies here. Successfully tested working on homelab cluster.
Integrate Backstage with Rancher Manager by nwmacd
Description
Backstage (backstage.io) is an open-source, CNCF project that allows you to create your own developer portal. There are many plugins for Backstage.
This could be a great compliment to Rancher Manager.
Goals
Learn and experiment with Backstage and look at how this could be integrated with Rancher Manager. Goal is to have some kind of integration completed in this Hack week.
Progress
Screen shot of home page at the end of Hackweek:
Day One
- Got Backstage running locally, understanding configuration with HTTPs.
- Got Backstage embedded in an IFRAME inside of Rancher
- Added content into the software catalog (see: https://backstage.io/docs/features/techdocs/getting-started/)
- Understood more about the entity model
Day Two
- Connected Backstage to the Rancher local cluster and configured the Kubernetes plugin.
- Created Rancher theme to make the light theme more consistent with Rancher
Days Three and Day Four
Created two backend plugins for Backstage:
- Catalog Entity Provider - this imports users from Rancher into Backstage
- Auth Provider - uses the proxied sign-in pattern to check the Rancher session cookie, to user that to authenticate the user with Rancher and then log them into Backstage by connecting this to the imported User entity from the catalog entity provider plugin.
With this in place, you can single-sign-on between Rancher and Backstage when it is deployed within Rancher. Note this is only when running locally for development at present
Day Five
- Start to build out a production deployment for all of the above
- Made some progress, but hit issues with the authentication and proxying when running proxied within Rancher, which needs further investigation
Small healthcheck tool for Longhorn by mbrookhuis
Project Description
We have often problems (e.g. pods not starting) that are related to PVCs not running, cluster (nodes) not all up or deployments not running or completely running. This all prevents administration activities. Having something that can regular be run to validate the status of the cluster would be helpful, and not as of today do a lot of manual tasks.
As addition (read enough time), we could add changing reservation, adding new disks, etc. --> This didn't made it. But the scripts can easily be adopted.
This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.
Goal for this Hackweek
At the end we should have a small python tool that is doing a (very) basic health check on nodes, deployments and PVCs. First attempt was to make it in golang, but that was taking to much time.
Overview
This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:
node check: This will check all nodes, and display the status and the k3s version. If the status of the nodes is not "Ready" (this should be only reported), the cluster will be reported as having problems
deployment check: This check will list all deployments, and display the number of expected replicas and the used replica. If there are unused replicas this will be displayed. The cluster will be reported as having problems.
pvc check: This check will list of all pvc's, and display the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.
If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.
The script has 1 mandatory parameter and that is the kubeconf of the cluster or of a node off the cluster.
The code is writen for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that will contain all needed packages. Also, the script can be run on the cluster itself or any other linux server.
Installation
To install this project, perform the following steps:
- Create the directory /opt/k8s-check
mkdir /opt/k8s-check
- Copy all the file to this directory and make the following changes:
chmod +x k8s-check.py