Since SUSE Manager doesn't scale out, and stacking it into yet another pyramid of SUSE Managers won't help here, real architectural changes need to be made to achieve true scale-out of this product. This Hack Week project is about how to turn SUSE Manager into a cluster.
Areas to be tackled:
- Distributed FS for storage
- Distributed messaging bus
- Distributed KV for metadata
- Control Node prototype (this component turns SUSE Manager into a stateless-ish, event-driven node, with SUSE Manager eventually deployed into a container -> Kubernetes)
- Cluster Director prototype (this component orchestrates the entire cluster via events)
- API Gateway, providing a 100% compatible API across all SUSE Manager nodes (this is done, demoed at the SUMA Winter Summit 2020)
- Slightly modified SUSE Manager peripherals (reposync, DB etc.)
- Client daemon, which binds a single registered system to a cluster node according to the Cluster Director service (on shrink, grow, rebalance and disaster-recovery events)
The idea is to solve a set of core problems around turning the technologically outdated Uyuni Server / SUMA into a modern, cloud-native cluster node, so that after a reasonable amount of time this could be turned into a real product for SUSE Manager.
Progress
Day One (Monday, 10 Feb)
Storm seems over. Realised this website doesn't allow editing comments :astonished:.
Got a working initial Cluster Daemon that runs a REST API and talks to the distributed KV database.
Got a working Client Daemon for every Client System. So far it can (a rough sketch of the flow follows the list):
- Pool the Client System into the cluster for staging
- Talk to Cluster Director (CD) and get the status
- Switch/reconfigure Salt Minion according to the CD's directives
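Purely as an illustration of that flow, here is a minimal sketch of a client daemon asking the Cluster Director for its assigned node and re-pointing the Salt Minion; the endpoint, JSON shape and file paths are assumptions, not the actual code:

```go
// Hypothetical sketch: ask the Cluster Director which node serves this system,
// then re-point the local Salt Minion to it. Endpoint and payload are assumed.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"os/exec"
)

type directive struct {
	Node string `json:"node"` // FQDN of the assigned Cluster Node (assumed field)
}

func main() {
	resp, err := http.Get("http://cluster-director.local:8080/api/v1/assignment") // made-up URL
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var d directive
	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		panic(err)
	}

	// Re-point the Salt Minion to the assigned node and restart it.
	conf := fmt.Sprintf("master: %s\n", d.Node)
	if err := os.WriteFile("/etc/salt/minion.d/cluster.conf", []byte(conf), 0o644); err != nil {
		panic(err)
	}
	if err := exec.Command("systemctl", "restart", "salt-minion").Run(); err != nil {
		panic(err)
	}
}
```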
Got an initial tool (Python) that calls the Cluster API. So far it is too simple to describe.
Got an initial Cluster Director with an OpenAPI spec (Swagger) running.
TODO:
- [x] Running distributed K/V database
  - [x] Verify it is buildable/package-able
- [x] Running distributed message bus
  - [x] Verify it is buildable/package-able
- [x] Manage cluster Zones
- [x] `mgr-clbd-admin` tool
  - [x] List nodes
  - [x] List zones
  - [x] Format JSON input
  - [x] Call API via JSON input
- [x] Add OpenAPI to Cluster Director daemon
  - [x] Swagger UI running
  - [x] APIs are automatically generated (updated `Makefile`)
  - [x] I can try that in browser
- [x] Describe all the APIs I've got so far
The day is over. Did many refactorings; found that Gin doesn't do form parsing from the request body on the DELETE method, but solved it. Haven't yet finished Zones management (which is damn easy now, but argghh, still!).
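For the record, the root cause is in the standard library: `http.Request.ParseForm` only reads the request body for POST, PUT and PATCH, so Gin's form binding sees nothing on DELETE. A minimal sketch of one possible workaround (route and field names are made up, not the actual Cluster Director code):

```go
// Hypothetical Gin handler: parse an application/x-www-form-urlencoded body
// on DELETE manually, since the default form binding ignores it.
package main

import (
	"io/ioutil"
	"net/http"
	"net/url"

	"github.com/gin-gonic/gin"
)

func deleteZone(c *gin.Context) {
	raw, err := ioutil.ReadAll(c.Request.Body)
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}
	form, err := url.ParseQuery(string(raw)) // body is treated as form data
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusOK, gin.H{"removed": form.Get("name")})
}

func main() {
	r := gin.Default()
	r.DELETE("/zones", deleteZone) // made-up route
	r.Run(":8080")
}
```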
Day Two (Tuesday, 11 Feb)
Turned `mgr-clbd-admin` into a repo subproject of the Cluster Director daemon. There should be a set of common Jinja-based formatters for those common return types, but that's "bells-n-whistles" I will take care of later. Right now a raw JSON dump is good 'nuff.
Update (15:20): Group photo taken, many refactorings, Zones management done. Time to write Node Controller.
Done (partially):
- [*] Node Controller (initial)
  - [x] "Wrap around head" overall code design
  - [x] SSH communication over RSA keypair to the staging Cluster Node
  - [x] Bi-directional pub/sub (initial)
  - [*] Configuration file
  - [x] SSH check remote host
  - [x] SSH disable host verification option
  - [*] Events are emitted from an arbitrary SUSE Manager to the Cluster bus via the Node Controller (simple-n-stupid PoC ATM)
  - [*] Commands from the Cluster Director received and mapped to the emitter facility (write one for the XML-RPC APIs)
- [*] Cluster Node staging
  - [x] Initial overall PoC code
  - [x] Execute nanostates [1]
  - [x] SSH Runner runs a nanostate on a remote machine
  - [x] Refactor SSH Runner output type before it is too late. It is complex enough to be very bad as `map[interface{}]interface{}`.
  - [x] Implement local runner (for a client on `localhost`)
- [*] Write a few nanostate scenarios to complete Node staging:
  - [*] Reset/prepare the PostgreSQL database
  - [*] Prepare/mount the distributed File System
The Node Controller is about to execute nanostates. A nanostate is like "nano-Ansible in the pocket": a fusion of Salt and Ansible ideas into a small package that is not intended to be as broad a CMS as those two. Essentially, it just runs a series of commands over SSH against a specific release of a supported Cluster Node (AKA SUSE Manager) and does some rudimentary "things" on it once it is installed and set up. These are things like getting `machine-id`, backing up some configuration files, performing standard operations on the database, starting/stopping/restarting some services etc.: everything that can be done via the plain command line and is mostly used for informational purposes.
This alone removes any need for an internal configuration management system just to stage a Cluster Node and add it to the swarm.
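As a side note on the "refactor the SSH Runner output type" item above, a typed result could look roughly like this; the names are assumptions, not the actual code:

```go
// Hypothetical typed result for a nanostate run, replacing map[interface{}]interface{}.
package runner

// CommandResult holds the outcome of a single command within a nanostate module.
type CommandResult struct {
	Name    string // e.g. "get-id"
	Stdout  string
	Stderr  string
	Errcode int
}

// HostResult groups the per-command results for one target host.
type HostResult struct {
	Host     string
	Module   string // e.g. "shell"
	Commands []CommandResult
}

// StateResult is what an SSH or local runner returns for a whole nanostate.
type StateResult struct {
	ID    string // nanostate id, e.g. "some-test"
	Hosts []HostResult
}
```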
Day Three (Wednesday, 12 Feb)
An Unwanted Accident... (almost)
While working on the Cluster Node staging code and playing with SSH sessions and channels, I just accidentally "rewrote" Salt, combining the best practices of both Salt and Ansible. Of course, it is far, far, far away from what Salt can do today.
Or... is it? Let's see.
So the main plan was to manage cluster nodes and their components and nothing else. For that, nobody needs a full-blown configuration management infrastructure, right? And it so happens that the Cluster has a Cluster Director (which is supposed to scale out on its own), but it is essentially what a Salt Master would be. Consequently, since it is hooked up to an abstract Message Bus (you would write an adapter connector for Kafka if you need many millions, but so far I am sure NATS will do for millions of #susemanagers), it talks to a Node Controller on each #susemanager node, which is... right, an analogy to a Salt Minion. And here you go: a bi-directional pub/sub that can perform something when a particular message arrives.
Security? Everything is happening over TLS anyway. But then there is another layer based on pure OpenSSL: whatever the Cluster Director needs to pass secretly to a specific Cluster Node, it sends to a channel where every message is also encrypted on its own. Each Cluster Node is subscribed to TWO channels:
- General public
- Private (by key fingerprint)
The Cluster Director sends everything in plain text to the public channel, but secret communication runs over the private channel, encrypted with the public key of the recipient. The complexity of returners, pillars etc. is no longer needed this way.
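A minimal sketch of that two-channel idea, assuming NATS as the bus and RSA-OAEP for the per-message layer (Go's `crypto/rsa` stands in for OpenSSL here); subject naming and key handling are made up:

```go
// Hypothetical sketch: publish a secret to a node's private channel, encrypted
// with that node's public RSA key on top of the TLS transport.
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"

	"github.com/nats-io/nats.go"
)

func sendSecret(nc *nats.Conn, nodePub *rsa.PublicKey, fingerprint string, payload []byte) error {
	ciphertext, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, nodePub, payload, nil)
	if err != nil {
		return err
	}
	// The private channel is keyed by the recipient's public key fingerprint.
	return nc.Publish(fmt.Sprintf("cluster.private.%s", fingerprint), ciphertext)
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// Throwaway keypair for the demo; in reality the node's public key
	// would be collected during staging.
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	if err := sendSecret(nc, &key.PublicKey, "ab:cd:ef", []byte("db-password=...")); err != nil {
		panic(err)
	}
}
```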
It didn't take me long to craft a simple architecture that even foresees embedding Lua or Starlark into the state system. In fact, it would be even better than Salt, because one wouldn't have `if/else` imperative clutter in the declarative state (!). What does a state look like? Currently like this (again, it is a hackweek of one solo "cowboy", not even pre-alpha):
```yaml
id: some-test
description: This is a test state
state:
  gather-machine-summary:
    - shell:
        - get-id: "cat /etc/machine-id"
        - uptime: "uptime"
        - hostname: "hostname"
```
The `shell` is a module I just wrote. It takes a series of commands and runs them, returning something like this:
```json
{
  "get-id": "12e43783e54f25bb3f505cfeeff94045",
  "uptime": "13:54:07 up 18:59, 1 user, load average: 0,10, 0,18, 0,17",
  "hostname": "rabbit"
}
```
It also happens that all of the above can be run locally or remotely. Or one-to-many remotely over SSH.
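Purely for illustration, a rough sketch of loading such a nanostate into typed Go structures with `gopkg.in/yaml.v2`; the shape is inferred from the example above and is not the actual code:

```go
// Hypothetical loader for the nanostate format shown above.
package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

type Nanostate struct {
	ID          string `yaml:"id"`
	Description string `yaml:"description"`
	// state name -> list of modules -> module name -> list of {command-id: command line}
	State map[string][]map[string][]map[string]string `yaml:"state"`
}

func main() {
	src := `
id: some-test
description: This is a test state
state:
  gather-machine-summary:
    - shell:
        - get-id: "cat /etc/machine-id"
        - uptime: "uptime"
`
	var ns Nanostate
	if err := yaml.Unmarshal([]byte(src), &ns); err != nil {
		panic(err)
	}
	for state, modules := range ns.State {
		for _, module := range modules {
			for name, commands := range module {
				fmt.Println(state, name, commands) // e.g. gather-machine-summary shell [...]
			}
		}
	}
}
```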
But... wait a sec. Only `$SHELL`? Can it do a bit more than this? Wait, wait. If I can already run arbitrary stuff (which is what Ansible and Salt do anyway), then what stops me from calling pure Ansible binary modules and getting access to that whole pile of crazy modules they've already got working? Nothing! Just `scp` them over and stockpile them on the client. In fact, just install the entire Ansible and run it there as is. It is the same way Saltsible works, after all.
Basically, I ended up with a message-driven cluster architecture that happens to be compatible with Ansible modules. Kind of. Not yet completely, but bringing that to 100% compatibility is a no-brainer; it just needs a few more extra days to get done. Which is not my goal and priority at the moment anyway.
So then imagine that, with Lua or Starlark (a Python dialect) embedded, you could do the above another way:
```yaml
import:
  - ssh_functions

id: some-test
description: This is a test state
state:
  gather-machine-summary:
    - shell:
        - {get_id()}
        - {get_uptime()}
        - {get_hostname()}
```
These would be functions somewhere in a file `ssh_functions.lua`. The same functions could also do this:
```yaml
import:
  - ssh_functions

id: some-test
description: This is a test state
state:
  create-user:
    - shell:
        - {add_user(uid=get_salt_pillar("uid"))}
```
Well, you get the idea. Anyway, I will focus on the unchecked boxes from yesterday and will finish at least something of this working, and thus won't extrapolate this up to eleven. At least not at the moment.
Update (17:40) Nanostates are happily running the scenarios passed to them on remote machines. Sort of declarative orchestration. Not yet asynchronous. A few minutes left, maybe I will implement the local runner?..
Update (somewhere evening) Refactored the runners and implemented the SSH runner as well as the local runner. Nah, but the possibility of running Ansible modules FOR FREE is still bugging me! Instead of reimplementing PostgreSQL start/stop and distributed file system prepare/mount as shell command lines, how much time would it take to hook up the whole Ansible modular system and use it from the nanostates? It is a Hackweek, after all!
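A sketch of how the local and SSH runners could share one interface; names and signatures are assumptions, not the actual code:

```go
// Hypothetical common interface for the SSH and local nanostate runners.
package runner

import "os/exec"

// Runner executes the commands of a nanostate somewhere.
type Runner interface {
	Run(commands map[string]string) (map[string]string, error)
}

// LocalRunner runs the commands on localhost.
type LocalRunner struct{}

func (LocalRunner) Run(commands map[string]string) (map[string]string, error) {
	out := make(map[string]string)
	for name, cmdline := range commands {
		// Keep it simple: run each command through the shell and collect its output.
		data, err := exec.Command("/bin/sh", "-c", cmdline).CombinedOutput()
		if err != nil {
			return nil, err
		}
		out[name] = string(data)
	}
	return out, nil
}

// An SSHRunner would implement the same interface over an SSH session
// (e.g. golang.org/x/crypto/ssh); omitted here for brevity.
```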
Day Four (Thursday, 13 Feb)
What can one do in basically four days, starting with almost nothing? A lot! So far, my leftovers from yesterday:
TODO:
- [x] Integrate Ansible
  - [x] Runs binary modules
  - [x] Runs Python modules
- [ ] Write a few nanostate scenarios to complete Node staging:
  - [ ] Reset/prepare the PostgreSQL database
  - [ ] Prepare/mount the distributed File System
- [ ] Cluster Node staging
  - [ ] Integrate the staging part together with the Node Controller
- [ ] Node Controller (initial)
  - [x] Configuration file
  - [x] PostgreSQL event emitter
  - [x] Events are emitted from an arbitrary SUSE Manager to the Cluster bus via the Node Controller (simple-n-stupid PoC ATM)
  - [ ] Commands from the Cluster Director received and mapped to the emitter facility (write one for the XML-RPC APIs)
OK, well... Ansible would certainly be the right next step to look at, but ATM I'd rather save time and focus on emitting messages from the PostgreSQL database, which sits deep inside SUSE Manager's guts. So toss in a few basic shell commands for Node staging and that's it.
Update (12:00) SCNR...
Took this official Ansible module. Then wrote a nanostate snippet:
```yaml
- ansible.helloworld:
    name: "Cluster"
```
Result:
```json
{
  "Module": "ansible.helloworld",
  "Errcode": 0,
  "Errmsg": "",
  "Response": [
    {
      "Host": "localhost",
      "Response": {
        "ansible.helloworld": {
          "Stdout": "",
          "Stderr": "",
          "Errmsg": "",
          "Errcode": 0,
          "Json": {
            "changed": false,
            "failed": false,
            "msg": "Hello, Cluster!"
          }
        }
      }
    }
  ]
}
```
Of course, this inherited Ansible's main illness: `dont_run_this_twice.yaml`. Calling nanostates "states" is too loud at the moment: they don't check the state, they just fire whatever is in them "into the woods". But the goal of the project is neither to write another Configuration Management system, ~~nor to scale out Ansible~~ (oops, that just happened unplanned), nor to fix Ansible's imperative behaviour and build declarative runners around it (which is not really a problem, BTW).
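For context on why this is cheap to wire up: an Ansible binary module receives the path to a JSON args file as its only argument and prints JSON to stdout, so calling one directly is just an `exec`. A rough sketch, with a made-up module path:

```go
// Hypothetical sketch of calling an Ansible binary module directly.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

func runBinaryModule(modulePath string, args map[string]interface{}) (map[string]interface{}, error) {
	argsFile, err := os.CreateTemp("", "ansible-args-*.json")
	if err != nil {
		return nil, err
	}
	defer os.Remove(argsFile.Name())

	if err := json.NewEncoder(argsFile).Encode(args); err != nil {
		return nil, err
	}
	argsFile.Close()

	// The module reads its args from the file and prints its result as JSON.
	out, err := exec.Command(modulePath, argsFile.Name()).Output()
	if err != nil {
		return nil, err
	}

	var result map[string]interface{}
	if err := json.Unmarshal(out, &result); err != nil {
		return nil, err
	}
	return result, nil
}

func main() {
	// "./helloworld" stands for the compiled helloworld module from the example above.
	result, err := runBinaryModule("./helloworld", map[string]interface{}{"name": "Cluster"})
	if err != nil {
		panic(err)
	}
	fmt.Println(result["msg"]) // expected: "Hello, Cluster!"
}
```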
Oh well. Fun. Now messaging bus story: Postgres, here I come!
Update (somewhere evening) PostgreSQL is happily spitting out every change to its tables, whichever way the change happens in Uyuni Server. The XML-RPC APIs are very slow, on the other hand. I was exploring ways to implement plugins in Go, so that I don't have to bundle everything into one binary. The gRPC way is the only reliable and nicely decouple-able one. The "native" Go plugins are an interesting tech preview and work nicely (as long as the same `$GOPATH` and the same compiler are used), but sadly they still seem quite far from production status: plugins are supposed to be written by different vendors, and that case doesn't really seem to be supported right now.
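The post doesn't say how the table changes are captured; one common way, sketched here purely as an assumption, is a trigger calling `pg_notify` plus a Go listener based on `github.com/lib/pq`:

```go
// Hedged sketch: emit row changes from PostgreSQL via LISTEN/NOTIFY.
// Trigger, channel and table names are assumptions, not the actual schema.
package main

import (
	"database/sql"
	"fmt"
	"time"

	"github.com/lib/pq"
)

const dsn = "postgres://user:pass@localhost/susemanager?sslmode=disable" // made-up DSN

func main() {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Install a trigger that notifies on every change to a table of interest.
	if _, err := db.Exec(`
		CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
		BEGIN
			PERFORM pg_notify('uyuni_events', TG_TABLE_NAME || ':' || TG_OP);
			RETURN NEW;
		END;
		$$ LANGUAGE plpgsql;

		DROP TRIGGER IF EXISTS suseminioninfo_notify ON suseminioninfo;
		CREATE TRIGGER suseminioninfo_notify
			AFTER INSERT OR UPDATE OR DELETE ON suseminioninfo
			FOR EACH ROW EXECUTE PROCEDURE notify_change();
	`); err != nil {
		panic(err)
	}

	// Forward the notifications; the real Node Controller would publish them to the bus.
	listener := pq.NewListener(dsn, time.Second, time.Minute, nil)
	if err := listener.Listen("uyuni_events"); err != nil {
		panic(err)
	}
	for n := range listener.Notify {
		if n == nil {
			continue // nil is delivered after a re-connect
		}
		fmt.Println("emit to cluster bus:", n.Extra) // e.g. "suseminioninfo:UPDATE"
	}
}
```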
I am right now working out how the Node Controller will reconcile a network transaction across the entire cluster, making 100% sure that either all nodes or none have been updated. As always, there are several ways of doing it, but I have to find out which one suits best.
The Last Day of The Hackweek (Friday, 14 Feb)
At least it isn't Friday the 13th. Starting from `touch main.go`, here is what I've got after these days:
Someone Did It
But I chose it and put it together. I chose it because I can also support it and bugfix it.
- Running an equivalent to `etcd` which scales out way better than `etcd`. Check out TiKV. If you know Rust, you will have lots of fun.
- Running a MySQL compatibility layer on top of it. Performance is about 10x slower than MySQL's InnoDB, but in this case performance isn't an issue at all. What matters is that this thing scales out infinitely; it is just a bit space-hungry. Check out TiDB.
- Running distributed storage and a mountable filesystem. If the SES guys one day support "SUSE Manager on Ceph nodes", it will be just fantastic. Until then: other solutions. Check out SeaweedFS and IPFS. IPFS is already serving a Tumbleweed repo at SUSE.
- Running a message bus that is supposed to scale out the same way Apache Kafka does. The reason not to use Apache Kafka is very trivial: its infrastructure is much harder to maintain. But hard infrastructure maintenance on its own does not rule Kafka out! You want it? No problemo: just add another adapter for Apache Kafka and replace the currently used NATS (see the sketch after this list). In fact, NATS perfectly co-exists with Kafka in some infrastructures. Check out NATS.
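A sketch of what such a swappable bus adapter could look like; the interface itself is an assumption, and only a NATS-backed implementation is shown:

```go
// Hypothetical bus adapter: the cluster code talks to this interface, so NATS
// could later be swapped for Kafka by adding another implementation.
package bus

import "github.com/nats-io/nats.go"

// Adapter is the minimal contract the Cluster Director and Node Controllers need.
type Adapter interface {
	Publish(channel string, message []byte) error
	Subscribe(channel string, handler func(message []byte)) error
	Close()
}

// NATSAdapter implements Adapter on top of NATS.
type NATSAdapter struct {
	conn *nats.Conn
}

func NewNATSAdapter(url string) (*NATSAdapter, error) {
	conn, err := nats.Connect(url)
	if err != nil {
		return nil, err
	}
	return &NATSAdapter{conn: conn}, nil
}

func (a *NATSAdapter) Publish(channel string, message []byte) error {
	return a.conn.Publish(channel, message)
}

func (a *NATSAdapter) Subscribe(channel string, handler func(message []byte)) error {
	_, err := a.conn.Subscribe(channel, func(m *nats.Msg) { handler(m.Data) })
	return err
}

func (a *NATSAdapter) Close() {
	a.conn.Close()
}
```

A Kafka-backed implementation would then only need to satisfy the same three methods.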
I Did It
- Running Client System Daemon (i.e. it "runs on the registered client system") whose main role is to ask the Cluster which node to use, automatically reconfigure the Salt Minion and other configuration, and then re-point the client system to a new Cluster Node (AKA Uyuni Server) if needed. It also brings the client system back into the cluster once its Cluster Node has gone up in smoke.
- "one to many and many to one" API Gateway, which allows
spacecmd
and similar tools to "just work" across multiple nodes. Granted, it wasn't written during this Hackweek and is 99.999% compatible (I was too lazy to get back and implement overloaded XML-RPC signatures for REST, as well I am returningnil
instead of an empty dictionary — probably a bug, but... meh... later). This thing also runs Swagger UI for OpenAPI specs against all XML-RPC API for SUSE Manager. - Very basic Cluster Director that can manage zones in and add cluster nodes. It as well runs OpenAPI/Swagger UI. Very basic, because it has no features yet. But doesn't mean it doesn't have more-less solid architecture.
- Library that runs Ansible in Salt fashion (via bi-directional pub/sub, which rules out returners/pillars as unnecessary). I am going to use that internally instead of both Salt and Ansible on their own. Again, it is simpler to call an Ansible module reusing the existing scaled-out infrastructure than to run and take care of yet more components. And I don't have to maintain Ansible: it is perfectly tested anyway.
- Library that resembles Salt SSH by running Ansible modules (both Python and binary). I consider that I've done it because I could (which doesn't mean I should).
- Very unfinished Initial Node Controller Daemon, which listens to Uyuni Server events and emits messages to the bus for further operations.
Phew. Not bad for basically four days, I'd say! All of that stuff I wrote in Go. I'd say it does make sense to use that language if you don't want to write Java or Python.
What are my nearest plans?
- Finish the "loop" and have all components running and talking to each other, with client nodes transferred seamlessly.
- Achieve a network transaction when updating Cluster Nodes.
- Write some Ansible modules, likely in plain old C and Rust, add caching of them on the client so it performs well, and add seamless module updates. Generic Ansible doesn't hurt, but I don't need it for the Cluster's needs.
- A modular/pluggable system in Go, so this whole project can be adapted to other products, not just SUSE Manager.
Presentation Slides
I've put together an outline of all of that in my Google Drive. Enjoy.
...and stay tuned