Project Description

When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.

Goal for this Hackweek

Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix Create at least 5 modules that can be applied to the cluster and require troubleshooting

Resources

https://github.com/rancher/terraform-provider-rancher2 https://github.com/rancher/tf-rancher-up

Looking for hackers with the skills:

terraform rancher kubernetes

This project is part of:

Hack Week 24

Activity

  • 5 months ago: FruityWelsh liked this project.
  • 5 months ago: FruityWelsh joined this project.
  • 5 months ago: tonyhansen started this project.
  • 5 months ago: tonyhansen added keyword "terraform" to this project.
  • 5 months ago: tonyhansen added keyword "rancher" to this project.
  • 5 months ago: tonyhansen added keyword "kubernetes" to this project.
  • 5 months ago: tonyhansen originated this project.

  • Comments

    • tonyhansen
      3 months ago by tonyhansen | Reply

      First commit and the basics are done. Testing Terraform and then to figure out how to intentionally break things.

      https://github.com/celidon/rancher-troublemaker

    • tonyhansen
      2 months ago by tonyhansen | Reply

      Up and running!

    Similar Projects

    SUSE AI Meets the Game Board by moio

    Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
    A chameleon playing chess in a train car, as a metaphor of SUSE AI applied to games


    Results: Infrastructure Achievements

    We successfully built and automated a containerized stack to support our AI experiments. This included:

    A screenshot of k9s and nvtop showing PyTAG running in Kubernetes with GPU acceleration

    ./deploy.sh and voilà - Kubernetes running PyTAG (k9s, above) with GPU acceleration (nvtop, below)

    Results: Game Design Insights

    Our project focused on modeling and analyzing two card games of our own design within the TAG framework:

    • Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
    • AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
    • Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement .

    Cards from the three games

    A family picture of our card games in progress. From the top: Bamboo, Totoro, R3

    Results: Learning, Collaboration, and Innovation

    Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:

    • "Trio programming" with AI assistance: Our "trio programming" approach—two developers and GitHub Copilot—was a standout success, especially in handling slightly-repetitive but not-quite-exactly-copypaste tasks. Java as a language tends to be verbose and we found it to be fitting particularly well.
    • AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
    • GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
    • Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.

    Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!

    The Context: AI + Board Games


    Contribute to terraform-provider-libvirt by pinvernizzi

    Description

    The SUSE Manager (SUMA) teams' main tool for infrastructure automation, Sumaform, largely relies on terraform-provider-libvirt. That provider is also widely used by other teams, both inside and outside SUSE.

    It would be good to help the maintainers of this project and give back to the community around it, after all the amazing work that has been already done.

    If you're interested in any of infrastructure automation, Terraform, virtualization, tooling development, Go (...) it is also a good chance to learn a bit about them all by putting your hands on an interesting, real-use-case and complex project.

    Goals

    • Get more familiar with Terraform provider development and libvirt bindings in Go
    • Solve some issues and/or implement some features
    • Get in touch with the community around the project

    Resources


    Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil

    Join the Gitter channel! https://gitter.im/uyuni-project/hackweek

    Uyuni is a configuration and infrastructure management tool that saves you time and headaches when you have to manage and update tens, hundreds or even thousands of machines. It also manages configuration, can run audits, build image containers, monitor and much more!

    Currently there are a few distributions that are completely untested on Uyuni or SUSE Manager (AFAIK) or just not tested since a long time, and could be interesting knowing how hard would be working with them and, if possible, fix whatever is broken.

    For newcomers, the easiest distributions are those based on DEB or RPM packages. Distributions with other package formats are doable, but will require adapting the Python and Java code to be able to sync and analyze such packages (and if salt does not support those packages, it will need changes as well). So if you want a distribution with other packages, make sure you are comfortable handling such changes.

    No developer experience? No worries! We had non-developers contributors in the past, and we are ready to help as long as you are willing to learn. If you don't want to code at all, you can also help us preparing the documentation after someone else has the initial code ready, or you could also help with testing :-)

    The idea is testing Salt and Salt-ssh clients, but NOT traditional clients, which are deprecated.

    To consider that a distribution has basic support, we should cover at least (points 3-6 are to be tested for both salt minions and salt ssh minions):

    1. Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
    2. Onboarding (salt minion from UI, salt minion from bootstrap scritp, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator)
    3. Package management (install, remove, update...)
    4. Patching
    5. Applying any basic salt state (including a formula)
    6. Salt remote commands
    7. Bonus point: Java part for product identification, and monitoring enablement
    8. Bonus point: sumaform enablement (https://github.com/uyuni-project/sumaform)
    9. Bonus point: Documentation (https://github.com/uyuni-project/uyuni-docs)
    10. Bonus point: testsuite enablement (https://github.com/uyuni-project/uyuni/tree/master/testsuite)

    If something is breaking: we can try to fix it, but the main idea is research how supported it is right now. Beyond that it's up to each project member how much to hack :-)

    • If you don't have knowledge about some of the steps: ask the team
    • If you still don't know what to do: switch to another distribution and keep testing.

    This card is for EVERYONE, not just developers. Seriously! We had people from other teams helping that were not developers, and added support for Debian and new SUSE Linux Enterprise and openSUSE Leap versions :-)

    Pending

    FUSS

    FUSS is a complete GNU/Linux solution (server, client and desktop/standalone) based on Debian for managing an educational network.

    https://fuss.bz.it/

    Seems to be a Debian 12 derivative, so adding it could be quite easy.

    • [W] Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
    • [W] Onboarding (salt minion from UI, salt minion from bootstrap script, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator) --> Working for all 3 options (salt minion UI, salt minion bootstrap script and salt-ssh minion from the UI).
    • [W] Package management (install, remove, update...) --> Installing a new package works, needs to test the rest.
    • [I] Patching (if patch information is available, could require writing some code to parse it, but IIRC we have support for Ubuntu already). No patches detected. Do we support patches for Debian at all?
    • [W] Applying any basic salt state (including a formula)
    • [W] Salt remote commands
    • [ ] Bonus point: Java part for product identification, and monitoring enablement


    terraform-provider-feilong by e_bischoff

    Project Description

    People need to test operating systems and applications on s390 platform.

    Installation from scratch solutions include:

    • just deploy and provision manually add-emoji (with the help of ftpboot script, if you are at SUSE)
    • use s3270 terminal emulation (used by openQA people?)
    • use LXC from IBM to start CP commands and analyze the results
    • use zPXE to do some PXE-alike booting (used by the orthos team?)
    • use tessia to install from scratch using autoyast
    • use libvirt for s390 to do some nested virtualization on some already deployed z/VM system
    • directly install a Linux kernel on a LPAR and use kvm + libvirt from there

    Deployment from image solutions include:

    • use ICIC web interface (openstack in disguise, contributed by IBM)
    • use ICIC from the openstack terraform provider (used by Rancher QA)
    • use zvm_ansible to control SMAPI
    • connect directly to SMAPI low-level socket interface

    IBM Cloud Infrastructure Center (ICIC) harnesses the Feilong API, but you can use Feilong without installing ICIC, provided you set up a "z/VM cloud connector" into one of your VMs following this schema.

    What about writing a terraform Feilong provider, just like we have the terraform libvirt provider? That would allow to transparently call Feilong from your main.tf files to deploy and destroy resources on your system/z.

    Other Feilong-based solutions include:

    • make libvirt Feilong-aware
    • simply call Feilong from shell scripts with curl
    • use zvmconnector client python library from Feilong
    • use zthin part of Feilong to directly command SMAPI.

    Goal for Hackweek 23

    My final goal is to be able to easily deploy and provision VMs automatically on a z/VM system, in a way that people might enjoy even outside of SUSE.

    My technical preference is to write a terraform provider plugin, as it is the approach that involves the least software components for our deployments, while remaining clean, and compatible with our existing development infrastructure.

    Goals for Hackweek 24

    Feilong provider works and is used internally by SUSE Manager team. Let's push it forward!

    Let's add support for fiberchannel disks and multipath.

    Goals for Hackweek 25

    • Finish support for fiberchannel disks and multipath
    • Fix problems with registration on hashicorp providers registry


    Small healthcheck tool for Longhorn by mbrookhuis

    Project Description

    We have often problems (e.g. pods not starting) that are related to PVCs not running, cluster (nodes) not all up or deployments not running or completely running. This all prevents administration activities. Having something that can regular be run to validate the status of the cluster would be helpful, and not as of today do a lot of manual tasks.

    As addition (read enough time), we could add changing reservation, adding new disks, etc. --> This didn't made it. But the scripts can easily be adopted.

    This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.

    Goal for this Hackweek

    At the end we should have a small python tool that is doing a (very) basic health check on nodes, deployments and PVCs. First attempt was to make it in golang, but that was taking to much time.

    Overview

    This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:

    • node check: This will check all nodes, and display the status and the k3s version. If the status of the nodes is not "Ready" (this should be only reported), the cluster will be reported as having problems

    • deployment check: This check will list all deployments, and display the number of expected replicas and the used replica. If there are unused replicas this will be displayed. The cluster will be reported as having problems.

    • pvc check: This check will list of all pvc's, and display the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.

    If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.

    The script has 1 mandatory parameter and that is the kubeconf of the cluster or of a node off the cluster.

    The code is writen for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that will contain all needed packages. Also, the script can be run on the cluster itself or any other linux server.

    Installation

    To install this project, perform the following steps:

    • Create the directory /opt/k8s-check

    mkdir /opt/k8s-check

    • Copy all the file to this directory and make the following changes:

    chmod +x k8s-check.py


    Cluster API Provider for Harvester by rcase

    Project Description

    The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.

    The project has been bootstrapped in HackWeek 23, and its code is available here.

    Work done in HackWeek 2023

    • Have a early working version of the provider available on Rancher Sandbox : *DONE *
    • Demonstrated the created cluster can be imported using Rancher Turtles: DONE
    • Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo

    Goals for HackWeek 2024

    • Add support for ClusterClass
    • Add e2e testing
    • Add more Unit Tests
    • Improve Status Conditions to reflect current state of Infrastructure
    • Improve CI (some bugs for release creation)
    • Testing with newer Harvester version (v1.3.X and v1.4.X)
    • Due to the length and complexity of the templates, maybe package some of them as Helm Charts.
    • Other improvement suggestions are welcome!

    DONE in HackWeek 24:

    Thanks to @isim and Dominic Giebert for their contributions!

    Resources

    Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.

    This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:


    Enabling Rancher as an OIDC Provider by rcabello

    Description

    Kubernetes supports OpenID Connect (OIDC) natively as an authentication mechanism, enabling token-based user authentication. This can be configured through flags in the Kubernetes API server or by using AuthenticationConfiguration.

    The purpose of this project is to enable Rancher to function as an OIDC provider, allowing Rancher's local cluster to act as an OIDC identity provider for downstream clusters. This setup will allow users to authenticate directly with downstream clusters without relying on Rancher’s proxy and impersonation mechanisms.

    Rancher will continue to support all authentication providers. When a user attempts to log in via the Rancher OIDC provider, they will be redirected to the authentication provider configured in Rancher.

    This approach also facilitates integration with third-party tools (e.g StackState)

    Goals

    • Implement Rancher as an OIDC provider using the ORY Fosite library, focusing only on the essential functionality required for basic integration.
    • Enable downstream clusters to authenticate using JWT tokens issued by Rancher.
    • Configure StackState to authenticate using Rancher as an OIDC provider.

    Resources

    https://github.com/ory/fosite


    CVE portal for SUSE Rancher products by gmacedo

    Description

    Currently it's a bit difficult for users to quickly see the list of CVEs affecting images in Rancher, RKE2, Harvester and Longhorn releases. Users need to individually look for each CVE in the SUSE CVE database page - https://www.suse.com/security/cve/ . This is not optimal, because those CVE pages are a bit hard to read and contain data for all SLE and BCI products too, making it difficult to easily see only the CVEs affecting the latest release of Rancher, for example. We understand that certain costumers are only looking for CVE data for Rancher and not SLE or BCI.

    Goals

    The objective is to create a simple to read and navigate page that contains only CVE data related to Rancher, RKE2, Harvester and Longhorn, where it's easy to search by a CVE ID, an image name or a release version. The page should also provide the raw data as an exportable CSV file.

    It must be an MVP with the minimal amount of effort/time invested, but still providing great value to our users and saving the wasted time that the Rancher Security team needs to spend by manually sharing such data. It might not be long lived, as it can be replaced in 2-3 years with a better SUSE wide solution.

    Resources

    • The page must be simple and easy to read.
    • The UI/UX must be as straightforward as possible with minimal visual noise.
    • The content must be created automatically from the raw data that we already have internally.
    • It must be updated automatically on a daily basis and on ad-hoc runs (when needed).
    • The CVE status must be aligned with VEX.
    • The raw data must be exportable as CSV file.
    • Ideally it will be written in Go or pure Shell script with basic HTML and no external dependencies in CSS or JS.


    Integrate Backstage with Rancher Manager by nwmacd

    Description

    Backstage (backstage.io) is an open-source, CNCF project that allows you to create your own developer portal. There are many plugins for Backstage.

    This could be a great compliment to Rancher Manager.

    Goals

    Learn and experiment with Backstage and look at how this could be integrated with Rancher Manager. Goal is to have some kind of integration completed in this Hack week.

    Progress

    Screen shot of home page at the end of Hackweek:

    Home

    Day One

    • Got Backstage running locally, understanding configuration with HTTPs.
    • Got Backstage embedded in an IFRAME inside of Rancher
    • Added content into the software catalog (see: https://backstage.io/docs/features/techdocs/getting-started/)
    • Understood more about the entity model

    Day Two

    • Connected Backstage to the Rancher local cluster and configured the Kubernetes plugin.
    • Created Rancher theme to make the light theme more consistent with Rancher

    Home

    Days Three and Day Four

    • Created two backend plugins for Backstage:

      1. Catalog Entity Provider - this imports users from Rancher into Backstage
      2. Auth Provider - uses the proxied sign-in pattern to check the Rancher session cookie, to user that to authenticate the user with Rancher and then log them into Backstage by connecting this to the imported User entity from the catalog entity provider plugin.
    • With this in place, you can single-sign-on between Rancher and Backstage when it is deployed within Rancher. Note this is only when running locally for development at present

    Home

    Home

    Day Five

    • Start to build out a production deployment for all of the above
    • Made some progress, but hit issues with the authentication and proxying when running proxied within Rancher, which needs further investigation


    Multi-pod, autoscalable Elixir application in Kubernetes using K8s resources by socon

    Description

    Elixir / Erlang use their own solutions to create clusters that work together. Kubernetes provide its own orchestration. Due to the nature of the BEAM, it looks a very promising technology for applications that run in Kubernetes and requite to be always on, specifically if they are created as web pages using Phoenix.

    Goals

    • Investigate and provide solutions that work in Phoenix LiveView using Kubernetes resources, so a multi-pod application can be used
    • Provide an end to end example that creates and deploy a container from source code.

    Resources

    https://github.com/dwyl/phoenix-liveview-counter-tutorial https://github.com/propedeutica/elixir-k8s-counter


    kubectl clone: Seamlessly Clone Kubernetes Resources Across Multiple Rancher Clusters and Projects by dpunia

    Description

    kubectl clone is a kubectl plugin that empowers users to clone Kubernetes resources across multiple clusters and projects managed by Rancher. It simplifies the process of duplicating resources from one cluster to another or within different namespaces and projects, with optional on-the-fly modifications. This tool enhances multi-cluster resource management, making it invaluable for environments where Rancher orchestrates numerous Kubernetes clusters.

    Goals

    1. Seamless Multi-Cluster Cloning
      • Clone Kubernetes resources across clusters/projects with one command.
      • Simplifies management, reduces operational effort.

    Resources

    1. Rancher & Kubernetes Docs

      • Rancher API, Cluster Management, Kubernetes client libraries.
    2. Development Tools

      • Kubectl plugin docs, Go programming resources.

    Building and Installing the Plugin

    1. Set Environment Variables: Export the Rancher URL and API token:
    • export RANCHER_URL="https://rancher.example.com"
    • export RANCHER_TOKEN="token-xxxxx:xxxxxxxxxxxxxxxxxxxx"
    1. Build the Plugin: Compile the Go program:
    • go build -o kubectl-clone ./pkg/
    1. Install the Plugin: Move the executable to a directory in your PATH:
    • mv kubectl-clone /usr/local/bin/

    Ensure the file is executable:

    • chmod +x /usr/local/bin/kubectl-clone
    1. Verify the Plugin Installation: Test the plugin by running:
    • kubectl clone --help

    You should see the usage information for the kubectl-clone plugin.

    Usage Examples

    1. Clone a Deployment from One Cluster to Another:
    • kubectl clone --source-cluster c-abc123 --type deployment --name nginx-deployment --target-cluster c-def456 --new-name nginx-deployment-clone
    1. Clone a Service into Another Namespace and Modify Labels:


    Introducing "Bottles": A Proof of Concept for Multi-Version CRD Management in Kubernetes by aruiz

    Description

    As we delve deeper into the complexities of managing multiple CRD versions within a single Kubernetes cluster, I want to introduce "Bottles" - a proof of concept that aims to address these challenges.

    Bottles propose a novel approach to isolating and deploying different CRD versions in a self-contained environment. This would allow for greater flexibility and efficiency in managing diverse workloads.

    Goals

    • Evaluate Feasibility: determine if this approach is technically viable, as well as identifying possible obstacles and limitations.
    • Reuse existing technology: leverage existing products whenever possible, e.g. build on top of Kubewarden as admission controller.
    • Focus on Rancher's use case: the ultimate goal is to be able to use this approach to solve Rancher users' needs.

    Resources

    Core concepts:

    • ConfigMaps: Bottles could be defined and configured using ConfigMaps.
    • Admission Controller: An admission controller will detect "bootled" CRDs being installed and replace the resource name used to store them.
    • Aggregated API Server: By analyzing the author of a request, the aggregated API server will determine the correct bottle and route the request accordingly, making it transparent for the user.


    SUSE AI Meets the Game Board by moio

    Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
    A chameleon playing chess in a train car, as a metaphor of SUSE AI applied to games


    Results: Infrastructure Achievements

    We successfully built and automated a containerized stack to support our AI experiments. This included:

    A screenshot of k9s and nvtop showing PyTAG running in Kubernetes with GPU acceleration

    ./deploy.sh and voilà - Kubernetes running PyTAG (k9s, above) with GPU acceleration (nvtop, below)

    Results: Game Design Insights

    Our project focused on modeling and analyzing two card games of our own design within the TAG framework:

    • Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
    • AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
    • Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement .

    Cards from the three games

    A family picture of our card games in progress. From the top: Bamboo, Totoro, R3

    Results: Learning, Collaboration, and Innovation

    Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:

    • "Trio programming" with AI assistance: Our "trio programming" approach—two developers and GitHub Copilot—was a standout success, especially in handling slightly-repetitive but not-quite-exactly-copypaste tasks. Java as a language tends to be verbose and we found it to be fitting particularly well.
    • AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
    • GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
    • Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.

    Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!

    The Context: AI + Board Games


    ddflare: (Dynamic)DNS management via Cloudflare API in Kubernetes by fgiudici

    Description

    ddflare is a project started a couple of weeks ago to provide DDNS management using v4 Cloudflare APIs: Cloudflare offers management via APIs and access tokens, so it is possible to register a domain and implement a DynDNS client without any other external service but their API.

    Since ddflare allows to set any IP to any domain name, one could manage multiple A and ALIAS domain records. Wouldn't be cool to allow full DNS control from the project and integrate it with your Kubernetes cluster?

    Goals

    Main goals are:

    1. add containerized image for ddflare
    2. extend ddflare to be able to add and remove DNS records (and not just update existing ones)
    3. add documentation, covering also a sample pod deployment for Kubernetes
    4. write a ddflare Kubernetes operator to enable domain management via Kubernetes resources (using kubebuilder)

    Available tasks and improvements tracked on ddflare github.

    Resources

    • https://github.com/fgiudici/ddflare
    • https://developers.cloudflare.com/api/
    • https://book.kubebuilder.io