SUSE Manager doesn't scale out, and stacking it into yet another pyramid of susemanagers won't help here, so real architectural changes need to be made to achieve true scale-out of this product. This hackweek project is about how to turn SUSE Manager into a cluster.

Areas to be tackled:

  1. Distributed FS for storage
  2. Distributed messaging bus
  3. Distributed KV for metadata
  4. Control Node prototype (this component turns a SUSE Manager into a stateless-ish event-driven node, so that in the future SUSE Manager can be deployed into a container -> Kubernetes)
  5. Cluster Director prototype (this component orchestrates the entire cluster via events)
  6. API Gateway, providing 100% compatible API across all SUSE Manager nodes (this is done, demoed at SUMA Winter Summit 2020)
  7. Slightly modified SUSE Manager peripherals (reposync, DB etc).
  8. Client daemon, which is used to bind a single registered system to a cluster node according to Cluster Director service (on shrink, grow, rebalance and disaster recovery events).

The idea is to solve a set of core problems of how to turn the technologically outdated Uyuni Server / SUMA into a modern cloud-native cluster node, so that after a reasonable time this could be turned into a real product for SUSE Manager.


Progress

Day One (Monday, 10 Feb)

Storm seems over. Realised this website doesn't allow editing comments :astonished:.

Got working initial Cluster Daemon that runs REST API and talks to the distributed KV database.

Got working Client Daemon for every Client System. So far it can:

  • Pool the Client System to the cluster for staging
  • Talk to Cluster Director (CD) and get the status
  • Switch/reconfigure Salt Minion according to the CD's directives
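
Switching a Salt Minion over to another node boils down to rewriting the `master` option in the minion configuration and restarting the service. Below is a minimal sketch of the rewrite step only, assuming a single-valued `master:` line in plain `key: value` text. The helper name `setMaster` and the host names are hypothetical, and the actual Client Daemon may well do it differently.

```go
package main

import (
	"fmt"
	"strings"
)

// setMaster returns the minion configuration text with its top-level
// "master:" option pointing to newMaster. If the option is missing,
// it is appended. Hypothetical helper, not the Client Daemon's code.
func setMaster(conf, newMaster string) string {
	lines := strings.Split(conf, "\n")
	replaced := false
	for i, line := range lines {
		if strings.HasPrefix(line, "master:") {
			lines[i] = "master: " + newMaster
			replaced = true
		}
	}
	if !replaced {
		lines = append(lines, "master: "+newMaster)
	}
	return strings.Join(lines, "\n")
}

func main() {
	conf := "id: client-42\nmaster: node-a.example.com\nlog_level: info"
	fmt.Println(setMaster(conf, "node-b.example.com"))
}
```

A multi-master list or a commented-out `#master:` line is not handled here; a real implementation would more likely parse the YAML properly.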

Got initial tool (Python) that calls Cluster API. So far it is too simple to describe.

Got initial Cluster Director with OpenAPI spec (Swagger) running.

TODO:

  • [x] Running distributed K/V database
    • [x] Verify it is buildable/package-able
  • [x] Running distributed message bus
    • [x] Verify it is buildable/package-able
  • [x] Manage cluster Zones
  • [x] mgr-clbd-admin tool
    • [x] List nodes
    • [x] List zones
    • [x] Format JSON input
    • [x] Call API via JSON input
  • [x] Add OpenAPI to Cluster Director daemon
    • [x] Swagger UI running
    • [x] APIs are automatically generated (updated Makefile)
    • [x] I can try that in browser
  • [x] Describe all the APIs I've got so far

The day is over. Did many refactorings, found that Gin doesn't do form parsing from the body on the DELETE method (but solved it), and haven't yet finished Zones management (which is damn easy now, but argghh still!).


Day Two (Tuesday, 11 Feb)

Turned mgr-clbd-admin into a repo subproject of the Cluster Director daemon. There should be a set of common Jinja-based formatters for the common return types, but that's "bells-n-whistles" I will take care of later. Right now a raw JSON dump is good 'nuff.

Update (15:20): Group photo taken, many refactorings, Zones management done. Time to write Node Controller.

Done (partially):

  • [*] Node Controller (initial)
    • [x] "Wrap around head" overall code design
    • [x] SSH communication over RSA keypair to the staging Cluster Node
    • [x] Bi-directional pub/sub (initial)
    • [*] Configuration file
    • [x] SSH check remote host
    • [x] SSH disable host verification option
    • [*] Events are emitted from an arbitrary SUSE Manager to the Cluster bus via Node Controller (simple-n-stupid PoC ATM)
    • [*] Commands from Cluster Director received and mapped to the emitter facility (write one for XML-RPC APIs)
    • [*] Cluster Node staging
      • [x] Initial overall PoC code
      • [x] Execute nanostates[1]
        • [x] SSH Runner runs a nanostate on a remote machine
        • [x] Refactor SSH Runner output type before it is too late. It is complex enough to be very bad as map[interface{}]interface{}.
        • [x] Implement local runner (for a client on localhost)
      • [*] Write a few nanostate scenarios to complete Node staging:
        • [*] Reset/prepare PostgreSQL database
        • [*] Prepare/mount distributed File System

NodeController is about to execute nanostates. It is like a "nano-ansible" in the pocket, a fusion of Salt and Ansible ideas in a small package, which is not intended to be as broad a CMS as those two. Essentially, it just runs a series of commands over SSH against a specific release of a supported Cluster Node (AKA SUSE Manager) and does some rudimentary "things" on it, once it has been installed and set up. These are things like getting the machine-id, backing up some configuration files, doing standard operations with the database, starting/stopping/restarting some services etc: everything that can be done via the plain command line and is mostly used for informational purposes.

This alone removes any need for an internal configuration management system just to stage a Cluster Node and add it to the swarm.


Day Three (Wednesday, 12 Feb)

An Unwanted Accident... (almost)

While working on the Cluster Node code for staging and playing with SSH sessions and channels, I just accidentally "rewrote" Salt, combining the best practices of both Salt and Ansible. Of course, it is far-far-far-far away from what Salt can do today.

Or... is it? Let's see.

So the main plan was to manage cluster nodes and their components and nothing else. For that nobody needs a full-blown configuration management infrastructure, right? And so it happens that the Cluster has a Cluster Director (which is supposed to scale out on its own), and it is essentially what a Salt Master would be. And, consequently, since it is hooked up to an abstract Message Bus (you write an adapter connector for Kafka if you need many millions, but so far I am sure NATS will do for millions of #susemanagers), it talks to a Node Controller on each #susemanager node, which is... right, an analogy to a Salt Minion. And here you go: bi-directional pub/sub that can perform something when a particular message arrives.
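
To make the "abstract Message Bus plus adapters" idea a bit more tangible, here is a minimal sketch. The `Bus` interface, the in-memory adapter and the subject names are all hypothetical stand-ins: a NATS or Kafka adapter would implement the same two methods, while the real project's interfaces may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// Bus is a hypothetical abstraction of the message bus. Adapters for
// NATS, Kafka etc. would implement these two methods.
type Bus interface {
	Publish(subject string, data []byte)
	Subscribe(subject string, handler func(data []byte))
}

// memBus is an in-process stand-in that delivers messages synchronously.
type memBus struct {
	mu       sync.RWMutex
	handlers map[string][]func([]byte)
}

func newMemBus() *memBus {
	return &memBus{handlers: map[string][]func([]byte){}}
}

func (b *memBus) Subscribe(subject string, h func([]byte)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[subject] = append(b.handlers[subject], h)
}

func (b *memBus) Publish(subject string, data []byte) {
	// Copy the handler list so handlers may publish again without deadlock.
	b.mu.RLock()
	hs := append([]func([]byte){}, b.handlers[subject]...)
	b.mu.RUnlock()
	for _, h := range hs {
		h(data)
	}
}

func main() {
	var bus Bus = newMemBus()

	// Node Controller side: react to a command and report back.
	bus.Subscribe("cluster.commands", func(cmd []byte) {
		bus.Publish("cluster.events", []byte("done: "+string(cmd)))
	})
	// Cluster Director side: listen for events coming from the nodes.
	bus.Subscribe("cluster.events", func(ev []byte) {
		fmt.Println(string(ev))
	})

	bus.Publish("cluster.commands", []byte("stage-node"))
}
```

Both sides only ever see the `Bus` interface, which is what makes swapping the transport a matter of writing another adapter.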

Security? Everything is happening over TLS anyway. But then another layer is based on pure OpenSSL: whatever the Cluster Director needs to pass secretly to a specific Cluster Node, it sends to a channel where every message is also encrypted on its own. Each Cluster Node is subscribed to TWO channels:

  1. General public
  2. Private (by key fingerprint)

Cluster Director sends everything in plain text to the public channel, but secret communication runs over the private channel, encrypted with the public key of the recipient. The complexity of returners, pillars etc is no longer needed this way.

It didn't take me long to craft a simple architecture that even foresees embedding Lua or Starlark into the state system. In fact, it would be even better than Salt, because one wouldn't have if/else imperative clutter in the declarative state (!). What does a state look like? Currently like this (again, this is the hackweek of one solo "cowboy", not even pre-alpha):

id: some-test
description: This is a test state
state:
  gather-machine-summary:
    - shell:
        - get-id: "cat /etc/machine-id"
        - uptime: "uptime"
        - hostname: "hostname"

The shell is a module I just wrote. It takes a series of commands and runs them, returning something like this:

{
    "get-id": "12e43783e54f25bb3f505cfeeff94045",
    "uptime": "13:54:07 up 18:59,  1 user,  load average: 0,10, 0,18, 0,17",
    "hostname": "rabbit"
}

It also happens that all the above can be run locally or remotely. Or one-to-many remotely over SSH.
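
Conceptually, a shell module like this is a loop over named commands with a pluggable runner. The sketch below is a guess at the shape, not the actual code: the names `Command`, `Runner`, `localRunner` and `runShell` are invented, and an SSH runner would simply be another function with the same signature.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Command is one named entry of the shell module,
// e.g. get-id: "cat /etc/machine-id".
type Command struct{ Name, Line string }

// Runner executes a command line somewhere (locally, over SSH, ...)
// and returns its output.
type Runner func(line string) (string, error)

// localRunner runs the command line through sh on this machine.
func localRunner(line string) (string, error) {
	out, err := exec.Command("sh", "-c", line).Output()
	return strings.TrimSpace(string(out)), err
}

// runShell runs the commands in order and maps each name to its output.
// It stops at the first failing command.
func runShell(run Runner, cmds []Command) (map[string]string, error) {
	result := make(map[string]string, len(cmds))
	for _, c := range cmds {
		out, err := run(c.Line)
		if err != nil {
			return result, fmt.Errorf("%s: %w", c.Name, err)
		}
		result[c.Name] = out
	}
	return result, nil
}

func main() {
	res, err := runShell(localRunner, []Command{
		{"uptime", "uptime"},
		{"hostname", "hostname"},
	})
	fmt.Println(res, err)
}
```

Keeping the runner as a plain function makes "local, remote or one-to-many" a matter of which function gets passed in.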

But... Wait a sec. Only $SHELL? Can it do a bit more than this? Wait-wait. So if I can already run arbitrary stuff (which Ansible and Salt do anyway), then what stops me from calling pure Ansible binary modules and just accessing that whole pile of crazy modules they've already got working? Nothing! Just scp them there and stockpile them on the client. In fact, just install the entire Ansible and run it there as is. That is the same way Saltsible works, after all.

Basically, I ended up with a message-driven cluster architecture that happens to be compatible with Ansible modules. Kind of. Not yet completely, but bringing that to 100% compatibility is a no-brainer; it just needs a few more days to get done. Which is not my goal or priority at the moment anyway.

And then imagine having Lua or Starlark (a Python dialect) embedded: you could do the above another way:

import:
  - ssh_functions
id: some-test
description: This is a test state
state:
  gather-machine-summary:
    - shell: 
      - {get_id()}
      - {get_uptime()}
      - {get_hostname()}

These would be functions somewhere in a file ssh_functions.lua. The same functions could also do this:

import:
  - ssh_functions
id: some-test
description: This is a test state
state:
  create-user:
    - shell: 
      - {add_user(uid=get_salt_pillar("uid"))}

Well, you've got the idea. Anyway, I will focus on the unchecked boxes from yesterday above and finish this to get at least something working. So I won't crank this up to eleven. At least not at the moment.
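
Without any embedded interpreter yet, the `{function()}` idea can be approximated by plain placeholder expansion. This is a toy sketch under a strong assumption: each function takes no arguments and simply returns a command line for the shell module. The names `expand` and `funcs` are invented, and a real Lua or Starlark embedding would replace the Go map.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches zero-argument calls such as {get_id()}.
var placeholder = regexp.MustCompile(`\{(\w+)\(\)\}`)

// expand replaces every known {name()} with the command line returned by
// funcs[name]. Unknown placeholders are left untouched.
func expand(line string, funcs map[string]func() string) string {
	return placeholder.ReplaceAllStringFunc(line, func(m string) string {
		name := placeholder.FindStringSubmatch(m)[1]
		if f, ok := funcs[name]; ok {
			return f()
		}
		return m
	})
}

func main() {
	// Stand-in for what ssh_functions.lua would provide.
	funcs := map[string]func() string{
		"get_id":     func() string { return "cat /etc/machine-id" },
		"get_uptime": func() string { return "uptime" },
	}
	fmt.Println(expand("{get_id()}", funcs))
	fmt.Println(expand("{get_uptime()}", funcs))
}
```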

Update (17:40) Nanostates are happily running, based on scenarios, on remote machines. Sort of declarative orchestration. Not yet asynchronous. A few minutes left, maybe I will implement the local runner?..

Update (somewhere evening) Refactored runners and implemented the SSH runner as well as the local runner. Nah, but the possibility of running Ansible modules FOR FREE is still bugging me! Instead of reimplementing PostgreSQL start/stop and prepare/mount of the distributed file system in a shell command line, how much time would it take to hook up the whole Ansible modular system and use it from the nanostates? It is a Hackweek, after all!


Day Four (Thursday, 13 Feb)

What can one do in basically four days, having almost nothing? A lot! So far, my leftovers from yesterday:

TODO:

  • [x] Integrate Ansible
    • [x] Runs binary modules
    • [x] Runs Python modules
  • [ ] Write a few nanostate scenarios to complete Node staging:
    • [ ] Reset/prepare PostgreSQL database
    • [ ] Prepare/mount distributed File System
  • [ ] Cluster Node staging
    • [ ] Integrate staging part together with the Node Controller
  • [ ] Node Controller (initial)
    • [x] Configuration file
    • [x] PostgreSQL event emitter
    • [x] Events are emitted from an arbitrary SUSE Manager to the Cluster bus via Node Controller (simple-n-stupid PoC ATM)
    • [ ] Commands from Cluster Director received and mapped to the emitter facility (write one for XML-RPC APIs)

OK, well... Ansible would certainly be the right next step to look at, but ATM I'd rather save time and focus on emitting messages from the PostgreSQL database, which is deep inside SUSE Manager's guts. So toss in a few basic shell commands for Node staging and that's it.

Update (12:00) SCNR...

Took this official Ansible module. Then wrote a nanostate snippet:

- ansible.helloworld:
    name: "Cluster"

Result:

    {
      "Module": "ansible.helloworld",
      "Errcode": 0,
      "Errmsg": "",
      "Response": [
        {
          "Host": "localhost",
          "Response": {
            "ansible.helloworld": {
              "Stdout": "",
              "Stderr": "",
              "Errmsg": "",
              "Errcode": 0,
              "Json": {
                "changed": false,
                "failed": false,
                "msg": "Hello, Cluster!"
              }
            }
          }
        }
      ]
    }
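
The capitalised keys in this output suggest plain Go structs underneath. As an illustration of why typed output beats `map[interface{}]interface{}`, here is how the response above could be modelled. The struct names are guesses derived from the JSON, not the project's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModuleResult is the outcome of one module call on one host.
type ModuleResult struct {
	Stdout  string
	Stderr  string
	Errmsg  string
	Errcode int
	Json    map[string]interface{} // module-specific payload
}

// HostResponse groups module results by module name for a single host.
type HostResponse struct {
	Host     string
	Response map[string]ModuleResult
}

// RunnerResponse is the top-level envelope returned by a runner.
type RunnerResponse struct {
	Module   string
	Errcode  int
	Errmsg   string
	Response []HostResponse
}

// sample is a trimmed copy of the output shown above.
const sample = `{"Module":"ansible.helloworld","Errcode":0,"Errmsg":"",
 "Response":[{"Host":"localhost","Response":{"ansible.helloworld":
 {"Stdout":"","Stderr":"","Errmsg":"","Errcode":0,
  "Json":{"changed":false,"failed":false,"msg":"Hello, Cluster!"}}}}]}`

// parse decodes a runner response from its JSON form.
func parse(raw []byte) (*RunnerResponse, error) {
	var r RunnerResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	return &r, nil
}

func main() {
	r, err := parse([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, host := range r.Response {
		for module, res := range host.Response {
			fmt.Println(host.Host, module, res.Errcode, res.Json["msg"])
		}
	}
}
```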

Of course, this inherited Ansible's main illness: dont_run_this_twice.yaml. Calling nanostates "nanostates" is too grand at the moment: they don't check the state, they just fire whatever is in them "into the woods". But the goal of the project is neither to write another Configuration Management system, n̶e̶i̶t̶h̶e̶r̶ ̶s̶c̶a̶l̶e̶-̶o̶u̶t̶ ̶A̶n̶s̶i̶b̶l̶e̶ (oops, that just happened unplanned), nor to fix Ansible's imperative behaviour and build declarative runners around it (which is not really a problem, BTW).

Oh well. Fun. Now messaging bus story: Postgres, here I come!

Update (somewhere evening) PostgreSQL is happily spitting out every change to its tables, whichever way it was made in Uyuni Server. The XML-RPC APIs, on the other hand, are very slow. I was exploring ways to implement plugins in Go, so that I don't have to bundle everything into one binary. The gRPC way is the only one that is reliable and nicely decouple-able. The "native" Go plugins are an interesting tech preview and work nicely (as long as the same $GOPATH and the same compiler are used), but sadly they still seem quite far from production status. Plugins are supposed to be written by different vendors, and that does not seem to be supported right now.

I am right now solving the problem of how the Node Controller will reconcile a network transaction across the entire cluster, making 100% sure that either all nodes or none have been updated. As always, there are several ways of doing it, but I have to find out which one suits best.
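
One of those several ways is a classic two-phase commit. It is sketched here only to illustrate the all-or-none requirement, not as the approach that was chosen; the `Node` interface and the `fakeNode` type are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a participant in a cluster-wide update (hypothetical interface).
type Node interface {
	Prepare(update string) error // stage the update, may refuse
	Commit()                     // make the staged update effective
	Rollback()                   // drop the staged update
}

// applyAll commits the update on every node, or on none of them.
func applyAll(nodes []Node, update string) error {
	prepared := make([]Node, 0, len(nodes))
	// Phase one: everybody has to agree.
	for _, n := range nodes {
		if err := n.Prepare(update); err != nil {
			for _, p := range prepared {
				p.Rollback()
			}
			return fmt.Errorf("update aborted: %w", err)
		}
		prepared = append(prepared, n)
	}
	// Phase two: everybody applies.
	for _, n := range nodes {
		n.Commit()
	}
	return nil
}

// fakeNode is an in-memory participant used for the demonstration.
type fakeNode struct {
	name            string
	fail            bool
	staged, applied string
}

func (f *fakeNode) Prepare(u string) error {
	if f.fail {
		return errors.New(f.name + " cannot prepare")
	}
	f.staged = u
	return nil
}
func (f *fakeNode) Commit()   { f.applied, f.staged = f.staged, "" }
func (f *fakeNode) Rollback() { f.staged = "" }

func main() {
	a, b := &fakeNode{name: "a"}, &fakeNode{name: "b", fail: true}
	fmt.Println(applyAll([]Node{a, b}, "v2"), a.applied == "")
	b.fail = false
	fmt.Println(applyAll([]Node{a, b}, "v2"), a.applied, b.applied)
}
```

The sketch ignores the hard parts (a coordinator crash between the phases, a node dying after it has voted yes), which is exactly where the alternatives start to differ.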


The Last Day of The Hackweek (Friday, 14 Feb)

At least it isn't Friday the 13th. Starting from touch main.go, here is what I've got over these days:

Someone Did It

But I chose it and put it together. I chose that, because I can also support it and bugfix it.

  • Running equivalent to etcd, which scales out way better than etcd. Check out TiKV. If you know Rust, you will have lots of fun.
  • Running MySQL compatibility layer on top of it. Performance is about 10 times slower than MySQL's InnoDB, but in this case performance isn't an issue at all. What matters is that this thing scales out infinitely; it is just a bit space-hungry. Check out TiDB.
  • Running distributed storage and mountable filesystem. If the SES guys one day support "SUSE Manager on Ceph nodes", it will be just fantastic. Until then, other solutions. Check out SeaweedFS and IPFS. IPFS is running the Tumbleweed repo at SUSE.
  • Running message bus that is supposed to scale out the same way Apache Kafka does. The reason not to use Apache Kafka is trivial: its infrastructure is much harder to maintain. But harder infrastructure maintenance on its own does not rule Kafka out! You want it? No problemo: just add another adapter for Apache Kafka and swap it in for the currently used NATS. In fact, NATS perfectly co-exists with Kafka in some infrastructures. Check out NATS.

I Did It

  • Running Client System Daemon (i.e. "runs on a registered client system") whose main role is to ask the Cluster what node to use, automatically reconfigure Salt Minion and other configuration, and then re-point the client system to a new Cluster Node (AKA Uyuni Server), if that is needed. It also recovers the client system back to the cluster once a Cluster Node has gone up in smoke.
  • "one to many and many to one" API Gateway, which allows spacecmd and similar tools to "just work" across multiple nodes. Granted, it wasn't written during this Hackweek and is 99.999% compatible (I was too lazy to get back and implement overloaded XML-RPC signatures for REST, and I am also returning nil instead of an empty dictionary: probably a bug, but... meh... later). This thing also runs Swagger UI for OpenAPI specs against all XML-RPC APIs of SUSE Manager.
  • Very basic Cluster Director that can manage zones and add cluster nodes. It also runs OpenAPI/Swagger UI. Very basic, because it has no features yet. But that doesn't mean it doesn't have a more-or-less solid architecture.
  • Library that runs Ansible in Salt fashion (via bi-directional pub/sub, which rules out returners/pillars as unnecessary). I am going to use that internally instead of both Salt and Ansible on their own. Again, it is simpler to call an Ansible module reusing the existing scaled-out infrastructure, rather than run-and-take-care-of yet another component. And I don't have to maintain Ansible: it is perfectly tested anyway.
  • Library that resembles SaltSSH by running Ansible modules (both Python and binary). I consider it done, because I could (doesn't mean I should).
  • Very unfinished initial Node Controller Daemon, which listens to Uyuni Server events and emits messages to the bus for further operations.
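
The "one to many and many to one" part of the API Gateway can be pictured as a fan-out/merge. The following is only a sketch of that pattern: the `Backend` type stands in for a real XML-RPC call to one node, all names are hypothetical, and the actual gateway is certainly more involved.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Backend stands in for calling one node's API and getting a list back.
type Backend func(method string) ([]string, error)

// fanOut calls the same method on all nodes concurrently and merges the
// results into one sorted list. It reports the first error it has seen.
func fanOut(nodes []Backend, method string) ([]string, error) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		merged   = []string{} // non-nil, so it serialises as [] and not null
		firstErr error
	)
	for _, call := range nodes {
		wg.Add(1)
		go func(call Backend) {
			defer wg.Done()
			res, err := call(method)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			merged = append(merged, res...)
		}(call)
	}
	wg.Wait()
	sort.Strings(merged)
	return merged, firstErr
}

func main() {
	nodeA := func(string) ([]string, error) { return []string{"web1", "db1"}, nil }
	nodeB := func(string) ([]string, error) { return []string{"app1"}, nil }
	systems, err := fanOut([]Backend{nodeA, nodeB}, "system.listSystems")
	fmt.Println(systems, err)
}
```

Starting from an empty slice instead of nil also sidesteps the nil-versus-empty result issue mentioned above, at least for lists.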

Phew. Not bad for basically four days, I'd say! All that stuff I wrote in Go. I'd say it does make sense to use that language, if you don't want to write Java or Python.

What are my nearest plans?

  • Finish the "loop" and have all components running and talking to each other, with client nodes transferred seamlessly.
  • Achieve network transaction on updating Cluster Nodes.
  • Write some Ansible modules, likely in plain old C and Rust, add their caching on the client so it will perform well, add seamless module updates. Generic Ansible doesn't hurt, but I don't need it for Cluster needs.
  • Modular/pluggable system in Go, so this whole project can be adaptable to other products, not just SUSE Manager.

Presentation Slides

I've put together an outline of all that on my Google Drive. Enjoy.

...and stay tuned

Looking for hackers with the skills:

distributedsystems cluster cloud kubernetes golang go rust

This project is part of:

Hack Week 19

  • Comments

    • bmaryniuk
      almost 6 years ago by bmaryniuk

      Day One

      TODO:

      • [x] Running distributed K/V database
        • [x] Verify it is buildable/package-able
      • [x] Running distributed message bus
        • [x] Verify it is buildable/package-able
      • [ ] Manage cluster Zones
      • [ ]

      Summary

      Got running Client Daemon. It can:

      • Talk to Cluster Director (CD) and ask for status
      • Switch/reconfigure Salt Minion according to the CD's directives

    • keichwa
      almost 6 years ago by keichwa

      Yes, and presentation slides!

    Similar Projects

    Work on kqlite (Lightweight remote SQLite with high availability and auto failover). by epenchev

    Description

    Continue the work on kqlite (Lightweight remote SQLite with high availability and auto failover).
    It's a solution for applications that require High Availability but don't need all the features of a complete RDBMS and can fit SQLite in their use case.
    Also kqlite can be considered to be used as a lightweight storage backend for K8s (https://docs.k3s.io/datastore) and the Edge, and allowing to have only 2 Nodes for HA.

    Goals

    Push kqlite to a beta version.
    kqlite as library for Go programs.

    Resources

    https://github.com/kqlite/kqlite


    Create a Cloud-Native policy engine with notifying capabilities to optimize resource usage by gbazzotti

    Description

    The goal of this project is to begin the initial phase of development of an all-in-one Cloud-Native Policy Engine that notifies resource owners when their resources infringe predetermined policies. This was inspired by a current issue in the CES-SRE Team where other solutions seemed to not exactly correspond to the needs of the specific workloads running on the Public Cloud Team space.

    The initial architecture can be checked out on the Repository listed under Resources.

    Among the features that will differ this project from other monitoring/notification systems:

    • Pre-defined sensible policies written at the software-level, avoiding a learning curve by requiring users to write their own policies
    • All-in-one functionality: logging, mailing and all other actions are not required to install any additional plugins/packages
    • Easy account management, being able to parse all required configuration by a single JSON file
    • Eliminate integrations by not requiring metrics to go through a data-agreggator

    Goals

    • Create a minimal working prototype following the workflow specified on the documentation
    • Provide instructions on installation/usage
    • Work on email notifying capabilities

    Resources


    Mammuthus - The NFS-Ganesha inside Kubernetes controller by vcheng

    Description

    As the user-space NFS provider, the NFS-Ganesha is wieldy use with serval projects. e.g. Longhorn/Rook. We want to create the Kubernetes Controller to make configuring NFS-Ganesha easy. This controller will let users configure NFS-Ganesha through different backends like VFS/CephFS.

    Goals

    1. Create NFS-Ganesha Package on OBS: nfs-ganesha5, nfs-ganesha6
    2. Create NFS-Ganesha Container Image on OBS: Image
    3. Create a Kubernetes controller for NFS-Ganesha and support the VFS configuration on demand. Mammuthus

    Resources

    NFS-Ganesha


    A CLI for Harvester by mohamed.belgaied

    Harvester does not officially come with a CLI tool, the user is supposed to interact with Harvester mostly through the UI. Though it is theoretically possible to use kubectl to interact with Harvester, the manipulation of Kubevirt YAML objects is absolutely not user friendly. Inspired by tools like multipass from Canonical to easily and rapidly create one of multiple VMs, I began the development of Harvester CLI. Currently, it works but Harvester CLI needs some love to be up-to-date with Harvester v1.0.2 and needs some bug fixes and improvements as well.

    Project Description

    Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as: harvester vm create my-vm --count 5 to create 5 VMs named my-vm-01 to my-vm-05.

    asciicast

    Harvester CLI is functional but needs a number of improvements: up-to-date functionality with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an opensuse VM instead of an ubuntu VM, solve some bugs, etc.

    Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli

    Done in previous Hackweeks

    • Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
    • Automatically package Harvester CLI for OpenSUSE / Redhat RPMs or DEBs: DONE

    Goal for this Hackweek

    The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.

    Some nice additions might be: * Improve handling of namespaced objects * Add features, such as network management or Load Balancer creation ? * Add more unit tests and, why not, e2e tests * Improve CI * Improve the overall code quality * Test the program and create issues for it

    Issue list is here: https://github.com/belgaied2/harvester-cli/issues

    Resources

    The project is written in Go, and using client-go the Kubernetes Go Client libraries to communicate with the Harvester API (which is Kubernetes in fact). Welcome contributions are:

    • Testing it and creating issues
    • Documentation
    • Go code improvement

    What you might learn

    Harvester CLI might be interesting to you if you want to learn more about:

    • GitHub Actions
    • Harvester as a SUSE Product
    • Go programming language
    • Kubernetes API
    • Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)


    Rancher/k8s Trouble-Maker by tonyhansen

    Project Description

    When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.

    Goals for Hackweek 25

    • Update to modern Rancher and verify that existing tests still work
    • Change testing logic to populate secrets instead of requiring a secondary script
    • Add new tests

    Goals for Hackweek 24 (Complete)

    • Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix
    • Create at least 5 modules that can be applied to the cluster and require troubleshooting

    Resources

    • https://github.com/celidon/rancher-troublemaker
    • https://github.com/rancher/terraform-provider-rancher2
    • https://github.com/rancher/tf-rancher-up
    • https://github.com/rancher/quickstart


    Technical talks at universities by agamez

    Description

    This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.

    For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.

    Goals

    • Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
    • Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
    • Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.

    Resources

    • Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
    • SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.


    Cluster API Provider for Harvester by rcase

    Project Description

    The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.

    The project has been bootstrapped in HackWeek 23, and its code is available here.

    Work done in HackWeek 2023

    • Have a early working version of the provider available on Rancher Sandbox : *DONE *
    • Demonstrated the created cluster can be imported using Rancher Turtles: DONE
    • Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo

    DONE in HackWeek 24:

    DONE in 2025 (out of Hackweek)

    • Support of ClusterClass
    • Add to clusterctl community providers, you can add it directly with clusterctl
    • Testing on newer versions of Harvester v1.4.X and v1.5.X
    • Support for clusterctl generate cluster ...
    • Improve Status Conditions to reflect current state of Infrastructure
    • Improve CI (some bugs for release creation)

    Goals for HackWeek 2025

    • FIRST and FOREMOST, any topic is important to you
    • Add e2e testing
    • Certify the provider for Rancher Turtles
    • Add Machine pool labeling
    • Add PCI-e passthrough capabilities.
    • Other improvement suggestions are welcome!

    Thanks to @isim and Dominic Giebert for their contributions!

    Resources

    Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.

    This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:


    go-git: unlocking SHA256-based repository cloning ahead of git v3 by pgomes

    Description

    The go-git library implements the git internals in pure Go, so that any Go application can handle not only Git repositories, but also lower-level primitives (e.g. packfiles, idxfiles, etc) without needing to shell out to the git binary.

    The focus for this Hackweek is to fast track key improvements for the project ahead of the upstream release of Git V3, which may take place at some point next year.

    Goals

    Stretch goals

    Resources

    • https://github.com/go-git/go-git/
    • https://go-git.github.io/docs/


    Create a Cloud-Native policy engine with notifying capabilities to optimize resource usage by gbazzotti

    Description

    The goal of this project is to begin the initial phase of development of an all-in-one Cloud-Native Policy Engine that notifies resource owners when their resources infringe predetermined policies. This was inspired by a current issue in the CES-SRE Team where other solutions seemed to not exactly correspond to the needs of the specific workloads running on the Public Cloud Team space.

    The initial architecture can be checked out on the Repository listed under Resources.

    Among the features that will differ this project from other monitoring/notification systems:

    • Pre-defined sensible policies written at the software-level, avoiding a learning curve by requiring users to write their own policies
    • All-in-one functionality: logging, mailing and all other actions are not required to install any additional plugins/packages
    • Easy account management, being able to parse all required configuration by a single JSON file
    • Eliminate integrations by not requiring metrics to go through a data-agreggator

    Goals

    • Create a minimal working prototype following the workflow specified on the documentation
    • Provide instructions on installation/usage
    • Work on email notifying capabilities

    Resources


    Mammuthus - The NFS-Ganesha inside Kubernetes controller by vcheng

    Description

    As the user-space NFS provider, the NFS-Ganesha is wieldy use with serval projects. e.g. Longhorn/Rook. We want to create the Kubernetes Controller to make configuring NFS-Ganesha easy. This controller will let users configure NFS-Ganesha through different backends like VFS/CephFS.

    Goals

    1. Create NFS-Ganesha Package on OBS: nfs-ganesha5, nfs-ganesha6
    2. Create NFS-Ganesha Container Image on OBS: Image
    3. Create a Kubernetes controller for NFS-Ganesha and support the VFS configuration on demand. Mammuthus

    Resources

    NFS-Ganesha


    Rewrite Distrobox in go (POC) by fabriziosestito

    Description

    Rewriting Distrobox in Go.

    Main benefits:

    • Easier to maintain and to test
    • Adapter pattern for different container backends (LXC, systemd-nspawn, etc.)

    Goals

    • Build a minimal starting point with core commands
    • Keep the CLI interface compatible: existing users shouldn't notice any difference
    • Use a clean Go architecture with adapters for different container backends
    • Keep dependencies minimal and binary size small
    • Benchmark against the original shell script

    Resources

    • Upstream project: https://github.com/89luca89/distrobox/
    • Distrobox site: https://distrobox.it/
    • ArchWiki: https://wiki.archlinux.org/title/Distrobox


    SUSE Health Check Tools by roseswe

    SUSE HC Tools Overview

    A collection of tools written in Bash or Go 1.24+ that make it easier to handle a batch of tar.xz archives created by supportconfig.

    Background: For SUSE HC we receive a batch of supportconfig tarballs and check them for misconfiguration, areas for improvement or future changes.

    The main focus of these HCs is High Availability (pacemaker), SLES itself and SAP workloads, especially around the SUSE best practices.

    Goals

    • Overall improvement of the tools
    • Adding new collectors
    • Add support for SLES16

    Resources

    csv2xls* example.sh go.mod listprodids.txt sumtext* trails.go README.md csv2xls.go exceltest.go go.sum m.sh* sumtext.go vercheck.py* config.ini csvfiles/ getrpm* listprodids* rpmdate.sh* sumxls* verdriver* credtest.go example.py getrpm.go listprodids.go sccfixer.sh* sumxls.go verdriver.go

    docollall.sh* extracthtml.go gethostnamectl* go.sum numastat.go cpuvul* extractcluster.go firmwarebug* gethostnamectl.go m.sh* numastattest.go cpuvul.go extracthtml* firmwarebug.go go.mod numastat* xtr_cib.sh*

    $ getrpm -r pacemaker
    >> Product ID: 2795 (SUSE Linux Enterprise Server for SAP Applications 15 SP7 x86_64), RPM Name:
    +--------------+----------------------------+--------+--------------+--------------------+
    | Package Name | Version                    | Arch   | Release      | Repository         |
    +--------------+----------------------------+--------+--------------+--------------------+
    | pacemaker    | 2.1.10+20250718.fdf796ebc8 | x86_64 | 150700.3.3.1 | sle-ha/15.7/x86_64 |
    | pacemaker    | 2.1.9+20250410.471584e6a2  | x86_64 | 150700.1.9   | sle-ha/15.7/x86_64 |
    +--------------+----------------------------+--------+--------------+--------------------+
    Total packages found: 2


    Cluster API Provider for Harvester by rcase

    Project Description

    The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.

    The project has been bootstrapped in HackWeek 23, and its code is available here.

    Work done in HackWeek 2023

    • Have an early working version of the provider available on Rancher Sandbox: DONE
    • Demonstrated the created cluster can be imported using Rancher Turtles: DONE
    • Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo

    DONE in HackWeek 24:

    DONE in 2025 (out of Hackweek)

    • Support of ClusterClass
    • Added to the clusterctl community providers, so you can install it directly with clusterctl
    • Testing on newer versions of Harvester v1.4.X and v1.5.X
    • Support for clusterctl generate cluster ...
    • Improve Status Conditions to reflect current state of Infrastructure
    • Improve CI (some bugs for release creation)

    Goals for HackWeek 2025

    • FIRST and FOREMOST, any topic that is important to you
    • Add e2e testing
    • Certify the provider for Rancher Turtles
    • Add Machine pool labeling
    • Add PCI-e passthrough capabilities.
    • Other improvement suggestions are welcome!

    Thanks to @isim and Dominic Giebert for their contributions!

    Resources

    Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.

    This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:


    Play with the userfaultfd(2) system call and download on demand using HTTP Range Requests with Golang by rbranco

    Description

    userfaultfd(2) is a cool system call for handling page faults in user space. It should allow me to list the contents of an ISO or similar archive without downloading the whole thing. In theory, the userfaultfd(2) part can also be done with the PROT_NONE mprotect + SIGSEGV trick for complete Unix portability, though that is reportedly slower.

    Goals

    1. Create my own library for userfaultfd(2) in Golang.
    2. Create my own library for HTTP Range Requests.
    3. Complete portability with Unix.
    4. Benchmarks.
    5. Contribute some tests to LTP.
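
    The download-on-demand half of the idea can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the planned library: fetchRange and newDemoServer are hypothetical helper names, and the "remote archive" is a local httptest server whose handler uses http.ServeContent (which honours Range headers). The userfaultfd(2) side is not shown.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchRange downloads length bytes starting at offset using an HTTP
// Range request, and fails if the server does not answer with 206.
func fetchRange(url string, offset, length int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Byte ranges are inclusive on both ends.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+length-1))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("server did not honour the Range header: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// newDemoServer stands in for a remote archive (assumption for the demo).
func newDemoServer(payload []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "demo.iso", time.Time{}, bytes.NewReader(payload))
	}))
}

func main() {
	srv := newDemoServer([]byte("0123456789abcdef"))
	defer srv.Close()

	chunk, err := fetchRange(srv.URL, 4, 6)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", chunk) // prints "456789"
}
```

    A page-fault handler could call something like fetchRange with the offset of the faulting page, so only the parts of the archive that are actually read get downloaded.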

    Resources

    1. https://docs.kernel.org/admin-guide/mm/userfaultfd.html
    2. https://github.com/loopholelabs/userfaultfd-go
    3. https://github.com/DHowett/ranger
    4. https://www.cons.org/cracauer/cracauer-userfaultfd.html


    A CLI for Harvester by mohamed.belgaied

    Harvester does not officially come with a CLI tool; users are expected to interact with Harvester mostly through the UI. Although it is theoretically possible to use kubectl to interact with Harvester, manipulating Kubevirt YAML objects is not user-friendly at all. Inspired by tools like multipass from Canonical, which make it easy to rapidly create one or multiple VMs, I began developing Harvester CLI. It currently works, but it needs some love to be up to date with Harvester v1.0.2, as well as some bug fixes and improvements.

    Project Description

    Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as: harvester vm create my-vm --count 5 to create 5 VMs named my-vm-01 to my-vm-05.
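
    The naming scheme in that example (my-vm-01 to my-vm-05 for --count 5) can be illustrated with a few lines of Go. This is only a sketch of the pattern described above, not code taken from Harvester CLI; vmNames is a hypothetical helper.

```go
package main

import "fmt"

// vmNames returns count names of the form <base>-NN, numbered from 01.
// It returns nil when count is smaller than 1.
func vmNames(base string, count int) []string {
	if count < 1 {
		return nil
	}
	names := make([]string, 0, count)
	for i := 1; i <= count; i++ {
		names = append(names, fmt.Sprintf("%s-%02d", base, i))
	}
	return names
}

func main() {
	for _, name := range vmNames("my-vm", 5) {
		fmt.Println(name)
	}
}
```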

    Harvester CLI is functional but needs a number of improvements: up-to-date functionality with Harvester v1.0.2 (some minor issues right now), changing the default behaviour to create an openSUSE VM instead of an Ubuntu VM, fixing some bugs, etc.

    Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli

    Done in previous Hackweeks

    • Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
    • Automatically package Harvester CLI for openSUSE / Red Hat RPMs or DEBs: DONE

    Goal for this Hackweek

    The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.

    Some nice additions might be:

    • Improve handling of namespaced objects
    • Add features, such as network management or Load Balancer creation?
    • Add more unit tests and, why not, e2e tests
    • Improve CI
    • Improve the overall code quality
    • Test the program and create issues for it

    Issue list is here: https://github.com/belgaied2/harvester-cli/issues

    Resources

    The project is written in Go and uses client-go, the Kubernetes Go client library, to communicate with the Harvester API (which is in fact the Kubernetes API). Welcome contributions are:

    • Testing it and creating issues
    • Documentation
    • Go code improvement

    What you might learn

    Harvester CLI might be interesting to you if you want to learn more about:

    • GitHub Actions
    • Harvester as a SUSE Product
    • Go programming language
    • Kubernetes API
    • Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)


    AI-Powered Unit Test Automation for Agama by joseivanlopez

    The Agama project is a multi-language Linux installer that leverages the distinct strengths of several key technologies:

    • Rust: Used for the back-end services and the core HTTP API, providing performance and safety.
    • TypeScript (React/PatternFly): Powers the modern web user interface (UI), ensuring a consistent and responsive user experience.
    • Ruby: Integrates existing, robust YaST libraries (e.g., yast-storage-ng) to reuse established functionality.

    The Problem: Testing Overhead

    Developing and maintaining code across these three languages requires a significant, tedious effort in writing, reviewing, and updating unit tests for each component. This high cost of testing is a drain on developer resources and can slow down the project's evolution.

    The Solution: AI-Driven Automation

    This project aims to eliminate the manual overhead of unit testing by exploring and integrating AI-driven code generation tools. We will investigate how AI can:

    1. Automatically generate new unit tests as code is developed.
    2. Intelligently correct and update existing unit tests when the application code changes.

    By automating this crucial but monotonous task, we can free developers to focus on feature implementation and significantly improve the speed and maintainability of the Agama codebase.

    Goals

    • Proof of Concept: Successfully integrate and demonstrate an authorized AI tool (e.g., gemini-cli) to automatically generate unit tests.
    • Workflow Integration: Define and document a new unit test automation workflow that seamlessly integrates the selected AI tool into the existing Agama development pipeline.
    • Knowledge Sharing: Establish a set of best practices for using AI in code generation, sharing the learned expertise with the broader team.

    Contribution & Resources

    We are seeking contributors interested in AI-powered development and improving developer efficiency. Whether you have previous experience with code generation tools or are eager to learn, your participation is highly valuable.

    If you want to dive deep into AI for software quality, please reach out and join the effort!

    • Authorized AI Tools: Tools supported by SUSE (e.g., gemini-cli)
    • Focus Areas: Rust, TypeScript, and Ruby components within the Agama project.

    Interesting Links


    RMT.rs: High-Performance Registration Path for RMT using Rust by gbasso

    Description

    The SUSE Repository Mirroring Tool (RMT) is a critical component for managing software updates and subscriptions, especially for our Public Cloud Team (PCT). In a cloud environment, hundreds or even thousands of new SUSE instances (VPS/EC2) can be provisioned simultaneously. Each new instance attempts to register against an RMT server, creating a "thundering herd" scenario.

    We have observed that the current RMT server, written in Ruby, faces performance issues under this high-concurrency registration load. This can lead to request overhead, slow registration times, and outright registration failures, delaying the readiness of new cloud instances.

    This Hackweek project aims to explore a solution by re-implementing the performance-critical registration path in Rust. The goal is to leverage Rust's high performance, memory safety, and first-class concurrency handling to create an alternative registration endpoint that is fast, reliable, and can gracefully manage massive, simultaneous request spikes.

    The new Rust module will be integrated into the existing RMT Ruby application, allowing us to directly compare the performance of both implementations.

    Goals

    The primary objective is to build and benchmark a high-performance Rust-based alternative for the RMT server registration endpoint.

    Key goals for the week:

    1. Analyze & Identify: Dive into the SUSE/rmt Ruby codebase to identify and map out the exact critical path for server registration (e.g., controllers, services, database interactions).
    2. Develop in Rust: Implement a functionally equivalent version of this registration logic in Rust.
    3. Integrate: Explore and implement a method for Ruby/Rust integration to "hot-wire" the new Rust module into the RMT application. This may involve using FFI, or libraries like rb-sys or magnus.
    4. Benchmark: Create a benchmarking script (e.g., using k6, ab, or a custom tool) that simulates the high-concurrency registration load from thousands of clients.
    5. Compare & Present: Conduct a comparative performance analysis (requests per second, latency, success/error rates, CPU/memory usage) between the original Ruby path and the new Rust path. The deliverable will be this data and a summary of the findings.
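
    For the "custom tool" option in goal 4, a load generator could start out as small as the sketch below. It rests on several assumptions: the /register path and the 201 response are placeholders rather than the real RMT API, the target is a local stub server, and runLoad, loadResult and newStubServer are hypothetical names. A real benchmark would also record per-request latency.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// loadResult summarises one burst of simulated registrations.
type loadResult struct {
	OK, Failed int64
	Elapsed    time.Duration
}

// runLoad fires `clients` concurrent POST requests at url and counts
// how many came back with 201 Created.
func runLoad(url string, clients int) loadResult {
	var ok, failed int64
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Post(url, "application/json", nil)
			if err != nil {
				atomic.AddInt64(&failed, 1)
				return
			}
			resp.Body.Close()
			if resp.StatusCode == http.StatusCreated {
				atomic.AddInt64(&ok, 1)
			} else {
				atomic.AddInt64(&failed, 1)
			}
		}()
	}
	wg.Wait()
	return loadResult{OK: ok, Failed: failed, Elapsed: time.Since(start)}
}

// newStubServer stands in for the registration endpoint (assumption).
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		w.WriteHeader(http.StatusCreated)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	res := runLoad(srv.URL+"/register", 200)
	fmt.Printf("ok=%d failed=%d elapsed=%s\n", res.OK, res.Failed, res.Elapsed)
}
```

    Pointing the same burst first at the Ruby path and then at the Rust path would give a first rough success-rate and throughput comparison before moving to k6 or ab.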

    Resources

    • RMT Source Code (Ruby):
      • https://github.com/SUSE/rmt
    • RMT Documentation:
      • https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-rmt.html
    • Tooling & Stacks:
      • RMT/Ruby development environment (for running the base RMT)
      • Rust development environment (rustup, cargo)
    • Potential Integration Libraries:
      • rb-sys: https://github.com/oxidize-rb/rb-sys
      • Magnus: https://github.com/matsadler/magnus
    • Benchmarking Tools:
      • k6 (https://k6.io/)
      • ab (ApacheBench)


    Modal editor in Rust by acervesato

    Description

    Write a modal editor in Rust, inspired by vim, with the following features:

    • vim basic motion commands + insert/visual mode
    • multiple buffers with tabs
    • status bar

    It should be terminal-only, built with the ratatui library and crossterm.

    Goals

    The goal is to start with a functional prototype that can be extended in the future with the following features (in random order):

    • treesitter support + styles
    • fuzzy finder
    • grep finder
    • integration with git
    • tree viewer
    • internal terminal floating window
    • mailing list workflow integration

    Resources


    Looking at Rust if it could be an interesting programming language by jsmeix

    Get some basic understanding of Rust security related features from a general point of view.

    This Hack Week project is not to learn Rust to become a Rust programmer. This might happen later but it is not the goal of this Hack Week project.

    The goal of this Hack Week project is to evaluate if Rust could be an interesting programming language.

    An interesting programming language must make it easier to write code that is correct, and that stays correct as others maintain and enhance it over time, than to write code that is not.