Since SUSE Manager doesn't scale out, and stacking it into yet another pyramid of susemanagers won't help here, real architectural changes need to be made to achieve true scale-out of this product. This hackweek project is about how to turn SUSE Manager into a cluster.
Areas to be tackled:
- Distributed FS for storage
- Distributed messaging bus
- Distributed KV for metadata
- Control Node prototype (this component turns a SUSE Manager into a stateless-ish, event-driven node, so that in the future SUSE Manager can be deployed into a container -> Kubernetes)
- Cluster Director prototype (this component orchestrates the entire cluster via events)
- API Gateway, providing 100% compatible API across all SUSE Manager nodes (this is done, demoed at SUMA Winter Summit 2020)
- Slightly modified SUSE Manager peripherals (reposync, DB, etc.).
- Client daemon, which is used to bind a single registered system to a cluster node according to the Cluster Director service (on shrink, grow, rebalance and disaster recovery events).
The idea is to solve a set of core problems around turning the technologically outdated Uyuni Server / SUMA into a modern cloud-native cluster node, so that after a reasonable time this could be turned into a real product for SUSE Manager.
Progress
Day One (Monday, 10 Feb)
Storm seems over. Realised this website doesn't allow editing comments :astonished:.
Got a working initial Cluster Daemon that runs a REST API and talks to the distributed KV database.
Got a working Client Daemon for every Client System. So far it can:
- Pool the Client System to the cluster for staging
- Talk to Cluster Director (CD) and get the status
- Switch/reconfigure the Salt Minion according to the CD's directives (a sketch of that step follows this list)
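A minimal sketch, in Go (the language the rest of the project is written in), of what that Salt Minion switch could boil down to. The config path and service name are the stock Salt defaults; the function name and the target master value are hypothetical, not the actual Client Daemon code:

    // reconfigure_minion.go: hypothetical helper, not the actual Client Daemon code.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "regexp"
    )

    // repointMinion rewrites the "master:" entry in the Salt Minion config and
    // restarts the service so the client re-attaches to the given Cluster Node.
    func repointMinion(newMaster string) error {
        const cfg = "/etc/salt/minion"
        data, err := os.ReadFile(cfg)
        if err != nil {
            return err
        }
        re := regexp.MustCompile(`(?m)^#?\s*master:.*$`)
        out := re.ReplaceAll(data, []byte("master: "+newMaster))
        if err := os.WriteFile(cfg, out, 0o644); err != nil {
            return err
        }
        // Restart the minion so the new master takes effect (assumes systemd).
        return exec.Command("systemctl", "restart", "salt-minion").Run()
    }

    func main() {
        if err := repointMinion("node-1.cluster.example.com"); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }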
Got initial tool (Python) that calls Cluster API. So far it is too simple to describe.
Got initial Cluster Director with OpenAPI spec (Swagger) running.
TODO:
- [x] Running distributed K/V database
  - [x] Verify it is buildable/package-able
- [x] Running distributed message bus
  - [x] Verify it is buildable/package-able
- [x] Manage cluster Zones
- [x] mgr-clbd-admin tool
  - [x] List nodes
  - [x] List zones
  - [x] Format JSON input
  - [x] Call API via JSON input
- [x] Add OpenAPI to Cluster Director daemon
  - [x] Swagger UI running
  - [x] APIs are automatically generated (updated Makefile)
  - [x] I can try that in browser
- [x] Describe all the APIs I've got so far
The day is over. Did many refactorings, found that Gin doesn't do form parsing from the body on the DELETE method but solved it, and haven't yet finished Zones management (which is damn easy now, but argghh still!).
Day Two (Tuesday, 11 Feb)
Turned mgr-clbd-admin into a repo subproject of the Cluster Director daemon. There should be a set of common Jinja-based formatters for those common return types, but that's "bells-n-whistles" I will take care of later. Right now a raw JSON dump is good 'nuff.
Update (15:20): Group photo taken, many refactorings, Zones management done. Time to write Node Controller.
Done (partially):
- [*] Node Controller (initial)
  - [x] "Wrap around head" overall code design
  - [x] SSH communication over RSA keypair to the staging Cluster Node
  - [x] Bi-directional pub/sub (initial)
  - [*] Configuration file
  - [x] SSH check remote host
  - [x] SSH disable host verification option
  - [*] Events are emitted from an arbitrary SUSE Manager to the Cluster bus via Node Controller (simple-n-stupid PoC ATM)
  - [*] Commands from Cluster Director received and mapped to the emitter facility (write one for XML-RPC APIs)
- [*] Cluster Node staging
  - [x] Initial overall PoC code
  - [x] Execute nanostates
  - [x] SSH Runner runs a nanostate on a remote machine
  - [x] Refactor SSH Runner output type before it is too late. It is complex enough to be very bad as map[interface{}]interface{}.
  - [x] Implement local runner (for a client on localhost)
- [*] Write a few nanostate scenarios to complete Node staging:
  - [*] Reset/prepare PostgreSQL database
  - [*] Prepare/mount distributed File System
NodeController is about to execute nanostates. A nanostate is like "nano-ansible" in the pocket, a fusion of Salt and Ansible ideas in a small package, which is not intended to be as broad a CMS as those two. Essentially, it just runs a series of commands over SSH against a specific release of a supported Cluster Node (AKA SUSE Manager) and does some rudimentary "things" on it once it is installed and set up. These are things like getting the machine-id, backing up some configuration files, performing standard operations on the database, starting/stopping/restarting some services, etc.: everything that can be done via the plain command line and is mostly used for informational purposes.
This alone removes any need for an internal configuration management system just to stage a Cluster Node and add it to the swarm.
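For illustration only, here is a minimal sketch of such an over-SSH command run using golang.org/x/crypto/ssh. It is not the actual Node Controller code, just the shape of it; host, user and key path are placeholders, and host key verification is disabled, matching the option ticked in the list above:

    // sshrun.go: hypothetical sketch of a nanostate-style remote command run.
    package main

    import (
        "fmt"
        "log"
        "os"

        "golang.org/x/crypto/ssh"
    )

    // runRemote executes a single shell command on a Cluster Node over SSH
    // using an RSA keypair and returns the combined stdout/stderr output.
    func runRemote(host, user, keyPath, cmd string) (string, error) {
        key, err := os.ReadFile(keyPath)
        if err != nil {
            return "", err
        }
        signer, err := ssh.ParsePrivateKey(key)
        if err != nil {
            return "", err
        }
        cfg := &ssh.ClientConfig{
            User:            user,
            Auth:            []ssh.AuthMethod{ssh.PublicKeys(signer)},
            HostKeyCallback: ssh.InsecureIgnoreHostKey(), // the "disable host verification" option
        }
        client, err := ssh.Dial("tcp", host+":22", cfg)
        if err != nil {
            return "", err
        }
        defer client.Close()

        session, err := client.NewSession()
        if err != nil {
            return "", err
        }
        defer session.Close()

        out, err := session.CombinedOutput(cmd)
        return string(out), err
    }

    func main() {
        out, err := runRemote("node-1.example.com", "root", "/root/.ssh/id_rsa", "cat /etc/machine-id")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Print(out)
    }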
Day Three (Wednesday, 12 Feb)
An Unwanted Accident... (almost)
While working on the Cluster Node staging code and playing with SSH sessions and channels, I just accidentally "rewrote" Salt, combining the best practices of both Salt and Ansible. Of course, it is far-far-far-far away from what Salt can do today.
Or... is it?
Let's see.
So the main plan was to manage cluster nodes and their components, and nothing else. For that, nobody needs a full-blown configuration management infrastructure, right? And so it happens that the Cluster has a Cluster Director (which is supposed to scale out on its own), but it is essentially just what a Salt Master would be. And, consequently, since it is hooked up to an abstract Message Bus (you write an adapter connector for Kafka if you need many millions, but so far I am sure NATS will do for millions of #susemanagers), it talks to a Node Controller on each #susemanager node, which is... right, the analogue of a Salt Minion. And here you go: a bi-directional pub/sub that can perform something when a particular message arrives.
Security? Everything is happening over TLS anyway. But then another layer is based on pure OpenSSL: whatever the Cluster Director needs to pass secretly to a specific Cluster Node, it sends to a channel where every message is also encrypted on its own. Each Cluster Node is subscribed to TWO channels:
- General public
- Private (by key fingerprint)
Cluster Director sends everything in plain text to the public channel, but secret communication runs over the private channel, encrypted with the public key of the recipient. The complexity of returners, pillars, etc. is no longer needed this way.
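A rough sketch of that two-channel idea, assuming the nats.go client for the bus and RSA-OAEP for the per-message encryption. The channel naming, the fingerprint value and the key handling are all assumptions, not the project's actual wire format:

    // pubsub.go: hypothetical sketch of the public/private channel scheme.
    package main

    import (
        "crypto/rand"
        "crypto/rsa"
        "crypto/sha256"
        "log"

        "github.com/nats-io/nats.go"
    )

    // encryptForNode encrypts a payload with the recipient node's public key,
    // so only that node can read what travels over its private channel.
    func encryptForNode(pub *rsa.PublicKey, payload []byte) ([]byte, error) {
        return rsa.EncryptOAEP(sha256.New(), rand.Reader, pub, payload, nil)
    }

    func main() {
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        // Plain-text traffic everyone may read.
        if err := nc.Publish("cluster.public", []byte("rebalance: zone-1")); err != nil {
            log.Fatal(err)
        }

        // In reality the node's public key and fingerprint come from staging;
        // a throwaway keypair keeps this sketch self-contained.
        nodeKey, err := rsa.GenerateKey(rand.Reader, 2048)
        if err != nil {
            log.Fatal(err)
        }
        fingerprint := "ab:cd:ef"

        secret, err := encryptForNode(&nodeKey.PublicKey, []byte("db credentials for node"))
        if err != nil {
            log.Fatal(err)
        }
        // Secret traffic goes to the node's private, fingerprint-keyed channel.
        if err := nc.Publish("cluster.private."+fingerprint, secret); err != nil {
            log.Fatal(err)
        }
        nc.Flush()
    }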
It didn't take me long to craft a simple architecture that foresees even embedding Lua or Starlight into the state system. In fact, it would be even better than Salt, because one wouldn't have if/else imperative clutter in the declarative state (!). What does a state look like? Currently like this (again, it is a hackweek of one solo "cowboy", not even pre-alpha):
id: some-test
description: This is a test state
state:
  gather-machine-summary:
    - shell:
        - get-id: "cat /etc/machine-id"
        - uptime: "uptime"
        - hostname: "hostname"
The shell is a module I just wrote. It takes a series of commands, runs them, and returns something like this:
{
  "get-id": "12e43783e54f25bb3f505cfeeff94045",
  "uptime": "13:54:07 up 18:59, 1 user, load average: 0,10, 0,18, 0,17",
  "hostname": "rabbit"
}
It also happens that all of the above can be run locally or remotely. Or one-to-many remotely over SSH.
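A minimal sketch of what such a shell module boils down to: run a list of named commands and map each name to its trimmed output, as in the JSON above. The type and function names here are made up for illustration:

    // shellmod.go: hypothetical sketch of a "shell" nanostate module.
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "os/exec"
        "strings"
    )

    type command struct {
        Name string // e.g. "get-id"
        Cmd  string // e.g. "cat /etc/machine-id"
    }

    // runShell executes each command locally through the shell and maps the
    // command name to its trimmed output.
    func runShell(cmds []command) (map[string]string, error) {
        out := make(map[string]string, len(cmds))
        for _, c := range cmds {
            raw, err := exec.Command("sh", "-c", c.Cmd).CombinedOutput()
            if err != nil {
                return nil, fmt.Errorf("%s: %w", c.Name, err)
            }
            out[c.Name] = strings.TrimSpace(string(raw))
        }
        return out, nil
    }

    func main() {
        res, err := runShell([]command{
            {"get-id", "cat /etc/machine-id"},
            {"uptime", "uptime"},
            {"hostname", "hostname"},
        })
        if err != nil {
            log.Fatal(err)
        }
        data, _ := json.MarshalIndent(res, "", "  ")
        fmt.Println(string(data))
    }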
But... Wait a sec. Only $SHELL? Can it do a bit more than this? Wait, wait. So if I can already run arbitrary stuff (which is what Ansible and Salt do anyway), then what stops me from calling pure Ansible binary modules and getting access to that whole pile of crazy modules they already have working? Nothing! Just scp them there and stockpile them on the client. In fact, just install the entire Ansible and run it there as-is. It is the same way Saltsible works, after all.
Basically, I ended up with a message-driven cluster architecture that happens to be compatible with Ansible modules. Kind of. Not completely yet, but bringing it to 100% compatibility is a no-brainer, just a few more extra days of work. Which is not my goal and priority at the moment anyway.
So then imagine that with Lua or Starlight (a Python dialect) embedded, you could do the above another way:
import:
  - ssh_functions
id: some-test
description: This is a test state
state:
  gather-machine-summary:
    - shell:
        - {get_id()}
        - {get_uptime()}
        - {get_hostname()}
These would be functions somewhere in a file ssh_functions.lua. The same functions could also do this:
import:
  - ssh_functions
id: some-test
description: This is a test state
state:
  create-user:
    - shell:
        - {add_user(uid=get_salt_pillar("uid"))}
Well, you've got the idea. Anyway, I will focus on the unchecked boxes from yesterday and finish this so that at least something works end to end. And thus I won't extrapolate this up to eleven. At least not at the moment.
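For what it is worth, embedding Lua in Go for exactly this is nearly trivial. Below is a sketch using the gopher-lua package (one possible choice, not something the project actually ships) that loads ssh_functions.lua and resolves one of those function calls into a command for the shell module:

    // luastate.go: hypothetical sketch of resolving a {get_uptime()}-style call
    // by invoking a Lua function defined in ssh_functions.lua.
    package main

    import (
        "fmt"
        "log"

        lua "github.com/yuin/gopher-lua"
    )

    func callLua(script, fn string) (string, error) {
        L := lua.NewState()
        defer L.Close()

        if err := L.DoFile(script); err != nil {
            return "", err
        }
        // Call the named function with no arguments and one return value.
        if err := L.CallByParam(lua.P{
            Fn:      L.GetGlobal(fn),
            NRet:    1,
            Protect: true,
        }); err != nil {
            return "", err
        }
        ret := L.Get(-1) // returned value, e.g. a shell command string
        L.Pop(1)
        return ret.String(), nil
    }

    func main() {
        // ssh_functions.lua might contain:
        //   function get_uptime() return "uptime" end
        cmd, err := callLua("ssh_functions.lua", "get_uptime")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(cmd) // the command the shell module would then run
    }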
Update (17:40) Nanostates are happily running the scenarios passed to them on remote machines.
Sort of declarative orchestration. Not yet asynchronous. A few minutes left, maybe I will implement the local runner?..
Update (somewhere in the evening) Refactored the runners and implemented the SSH runner as well as the local runner. Nah, but the possibility of running Ansible modules FOR FREE is still bugging me!
Instead of reimplementing PostgreSQL start/stop and distributed file system prepare/mount as shell one-liners, how much time would it take to hook up the whole Ansible module system and use it from the nanostates? It is a Hackweek, after all!
Day Four (Thursday, 13 Feb)
What can one do in basically four days, starting with almost nothing? A lot!
So far, my leftovers from yesterday:
TODO:
- [x] Integrate Ansible
  - [x] Runs binary modules
  - [x] Runs Python modules
- [ ] Write a few nanostate scenarios to complete Node staging:
  - [ ] Reset/prepare PostgreSQL database
  - [ ] Prepare/mount distributed File System
- [ ] Cluster Node staging
  - [ ] Integrate staging part together with the Node Controller
- [ ] Node Controller (initial)
  - [x] Configuration file
  - [x] PostgreSQL event emitter
  - [x] Events are emitted from an arbitrary SUSE Manager to the Cluster bus via Node Controller (simple-n-stupid PoC ATM)
  - [ ] Commands from Cluster Director received and mapped to the emitter facility (write one for XML-RPC APIs)
OK, well... Ansible would certainly be the right next thing to look at, but ATM I'd rather save time and focus on emitting messages from the PostgreSQL database, which sits deep inside SUSE Manager's guts. So toss in a few basic shell commands for Node staging and that's it.
Update (12:00) SCNR...
Took this official Ansible module. Then wrote a nanostate snippet:
- ansible.helloworld:
    name: "Cluster"
Result:
{
  "Module": "ansible.helloworld",
  "Errcode": 0,
  "Errmsg": "",
  "Response": [
    {
      "Host": "localhost",
      "Response": {
        "ansible.helloworld": {
          "Stdout": "",
          "Stderr": "",
          "Errmsg": "",
          "Errcode": 0,
          "Json": {
            "changed": false,
            "failed": false,
            "msg": "Hello, Cluster!"
          }
        }
      }
    }
  ]
}
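For the curious, calling a binary Ansible module outside of Ansible mostly means handing it a JSON args file and reading JSON back from stdout; Python modules need a bit more ceremony around ANSIBLE_MODULE_ARGS. A hypothetical sketch of the local, binary-module case (module path and argument names are assumptions):

    // ansiblecall.go: hypothetical sketch of invoking a binary Ansible module
    // locally by handing it a JSON args file and parsing its JSON stdout.
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "os"
        "os/exec"
    )

    // runBinaryModule writes the module arguments to a temp file, executes the
    // module with that file path as its only argument and decodes whatever JSON
    // the module prints.
    func runBinaryModule(modulePath string, args map[string]interface{}) (map[string]interface{}, error) {
        argsFile, err := os.CreateTemp("", "ansible-args-*.json")
        if err != nil {
            return nil, err
        }
        defer os.Remove(argsFile.Name())

        if err := json.NewEncoder(argsFile).Encode(args); err != nil {
            return nil, err
        }
        argsFile.Close()

        out, err := exec.Command(modulePath, argsFile.Name()).Output()
        if err != nil {
            return nil, err
        }
        var result map[string]interface{}
        if err := json.Unmarshal(out, &result); err != nil {
            return nil, err
        }
        return result, nil
    }

    func main() {
        res, err := runBinaryModule("./library/helloworld", map[string]interface{}{"name": "Cluster"})
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%+v\n", res) // e.g. map[changed:false failed:false msg:Hello, Cluster!]
    }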
Of course, this inherited Ansible's main illness: dont_run_this_twice.yaml. Calling nanostates "nanostates" is too loud at the moment: they don't check any state, they just fire whatever is in them "into the woods". But the goal of the project is not to write another Configuration Management system, n̶e̶i̶t̶h̶e̶r̶ ̶s̶c̶a̶l̶e̶-̶o̶u̶t̶ ̶A̶n̶s̶i̶b̶l̶e̶ (oops, that just happened unplanned), nor to fix Ansible's imperative behaviour and build declarative runners around it (which is not really a problem, BTW).
Oh well. Fun. Now messaging bus story: Postgres, here I come!
Update (somewhere in the evening) PostgreSQL is happily spitting out every change to its tables, through whatever path, in Uyuni Server. The XML-RPC APIs are very slow, on the other hand. I was exploring ways to implement plugins in Go, so that I don't have to bundle everything into one binary. The gRPC way is the only reliable and nicely decouple-able one. The "native" Go plugins are an interesting tech preview and work nicely (as long as the same $GOPATH and the same compiler version are used), but sadly they still seem quite far away from production status: plugins are supposed to be written by different vendors, and that case doesn't really seem to be supported right now.
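The post does not spell out the mechanism, but trigger-driven pg_notify plus a LISTEN-ing Go process is one plausible way to get those table changes onto the bus. A sketch with lib/pq; the channel name, DSN and trigger wiring are assumptions:

    // pgevents.go: hypothetical sketch that forwards PostgreSQL NOTIFY payloads
    // (e.g. emitted by row-level triggers on Uyuni tables) to the cluster bus.
    package main

    import (
        "log"
        "time"

        "github.com/lib/pq"
        "github.com/nats-io/nats.go"
    )

    func main() {
        const dsn = "postgres://user:pass@localhost/susemanager?sslmode=disable"

        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        listener := pq.NewListener(dsn, 10*time.Second, time.Minute,
            func(ev pq.ListenerEventType, err error) {
                if err != nil {
                    log.Println("listener event error:", err)
                }
            })
        // Triggers on the watched tables would call something like:
        //   SELECT pg_notify('uyuni_changes', row_to_json(NEW)::text);
        if err := listener.Listen("uyuni_changes"); err != nil {
            log.Fatal(err)
        }

        for n := range listener.Notify {
            if n == nil { // nil is sent after a reconnect
                continue
            }
            // Re-emit the change event onto the cluster message bus.
            if err := nc.Publish("cluster.events.db", []byte(n.Extra)); err != nil {
                log.Println("publish failed:", err)
            }
        }
    }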
I am currently working out how the Node Controller will reconcile a network transaction across the entire cluster, making 100% sure that either all nodes or none of them have been updated. As always, there are several ways of doing it, but I have to find out which one suits best.
The Last Day of The Hackweek (Friday, 14 Feb)
At least it isn't Friday the 13th. Starting from touch main.go, here is what I've got after these days:
Someone Did It
But I chose it and put it together. I chose it because I can also support it and bugfix it.
- Running an equivalent to etcd, which scales out way better than etcd. Check out TiKV. If you know Rust, you will have lots of fun.
- Running a MySQL compatibility layer on top of it. Performance is about 10x slower than MySQL's InnoDB, but in this case performance isn't an issue at all. What matters is that this thing scales out infinitely; it is just a bit space-hungry. Check out TiDB.
- Running distributed storage and a mountable filesystem. If the SES guys one day support "SUSE Manager on Ceph nodes", it will be just fantastic. Until then — other solutions. Check out SeaweedFS and IPFS. IPFS is running the Tumbleweed repo at SUSE.
- Running a message bus that is supposed to scale out the same way Apache Kafka does. The reason not to use Apache Kafka is very trivial: its infrastructure is much harder to maintain. But that alone is not a reason, and hard infrastructure maintenance on its own does not rule Kafka out! You want it? No problemo: just add another adapter for Apache Kafka and replace the currently used NATS. In fact, NATS perfectly co-exists with Kafka in some infrastructures. Check out NATS.
I Did It
- Running Client System Daemon (i.e. it "runs on a registered client system") whose main role is to ask the Cluster which node to use, automatically reconfigure the Salt Minion and other configuration, and then re-point the client system to a new Cluster Node (AKA Uyuni Server) if that is needed. It also recovers the client system back into the cluster once its Cluster Node puffs into smoke.
- "one to many and many to one" API Gateway, which allows
spacecmdand similar tools to "just work" across multiple nodes. Granted, it wasn't written during this Hackweek and is 99.999% compatible (I was too lazy to get back and implement overloaded XML-RPC signatures for REST, as well I am returningnilinstead of an empty dictionary — probably a bug, but... meh... later). This thing also runs Swagger UI for OpenAPI specs against all XML-RPC API for SUSE Manager.
- A very basic Cluster Director that can manage zones and add cluster nodes. It also runs an OpenAPI/Swagger UI. Very basic, because it has no features yet. But that doesn't mean it doesn't have a more or less solid architecture.
- A library that runs Ansible in the Salt fashion (via bi-directional pub/sub, which rules out returners/pillars as unnecessary). I am going to use that internally instead of both Salt and Ansible on their own. Again, it is simpler to call an Ansible module over the existing scaled-out infrastructure than to run and take care of yet another component. And I don't have to maintain Ansible: it is perfectly tested anyway.
- A library that resembles SaltSSH by running Ansible modules (both Python and binary). I consider it done because I could do it (doesn't mean I should).
- A very unfinished initial Node Controller Daemon, which listens to Uyuni Server events and emits messages to the bus for further operations.
Phew. Not bad for basically four days, I'd say! All that stuff I wrote in Go. I'd say it does make sense to use that language if you don't want to write Java or Python.
What are my nearest plans?
- Finish the "loop" and have all components running and talking to each other, with client nodes transferred seamlessly.
- Achieve network transaction on updating Cluster Nodes.
- Write some Ansible modules, likely in plain old C and Rust, add caching of them on the client so it performs well, and add seamless module updates. Generic Ansible doesn't hurt, but I don't need it for the Cluster's needs.
- A modular/pluggable system in Go, so this whole project can be adapted to other products, not just SUSE Manager.
Presentation Slides
I've put together an outline of all that in my Google Drive. Enjoy.
...and stay tuned