The goal of this project is to get an overview of the state-of-the-art technology for training and deploying machine learning projects with Kubernetes, and to apply that to a SUSE CaaSP cluster.

With that in mind, we will train and deploy a model for summarizing GitHub issues:

https://github.com/kubeflow/examples/tree/master/github_issue_summarization

This example will make use of the following technologies:

kubeflow: Machine Learning Toolkit for Kubernetes
Keras: The Python Deep Learning library
Seldon Core: Machine Learning Deployment for Kubernetes
TensorFlow: An open source machine learning framework for everyone
cri-o: Lightweight Container Runtime for Kubernetes
Kubernetes
SUSE CaaSP: SUSE Container as a Service Platform
Nvidia container engine
Cuda

For this project, I will use a workstation with an NVIDIA GeForce GTX 1060, which is supported by CUDA, and I will install SUSE CaaSP on it.

Looking for hackers with the skills:

machinelearning kubeflow keras seldoncore tensorflow cri-o kubernetes caasp nvidia cuda gpu containers

This project is part of:

Hack Week 17 Hack Week 18

Activity

over 6 years ago: mmontecchi liked this project.

over 6 years ago: mvarlese liked this project.

over 6 years ago: a_faerber liked this project.

over 7 years ago: mbrugger liked this project.

over 7 years ago: jordimassaguerpla started this project.

over 7 years ago: jordimassaguerpla added keyword "machinelearning" to this project.

over 7 years ago: jordimassaguerpla added keyword "kubeflow" to this project.

over 7 years ago: jordimassaguerpla added keyword "keras" to this project.

over 7 years ago: jordimassaguerpla added keyword "seldoncore" to this project.

over 7 years ago: jordimassaguerpla added keyword "tensorflow" to this project.

over 7 years ago: jordimassaguerpla added keyword "cri-o" to this project.

over 7 years ago: jordimassaguerpla added keyword "kubernetes" to this project.

over 7 years ago: jordimassaguerpla added keyword "caasp" to this project.

over 7 years ago: jordimassaguerpla added keyword "nvidia" to this project.

over 7 years ago: jordimassaguerpla added keyword "cuda" to this project.

over 7 years ago: jordimassaguerpla added keyword "gpu" to this project.

over 7 years ago: jordimassaguerpla added keyword "containers" to this project.

Comments

over 7 years ago by jordimassaguerpla | Reply

I think I was a bit too ambitious when I wrote this description :) ... but it has been fun anyway.

This is what I accomplished:
- Setting up a SUSE CaaSP cluster where the admin and the master were running on top of KVM and the worker was a workstation with an NVIDIA GPU. The first trick was to set up the virtual machines to use the Ethernet network interface from the host (macvtap). For whatever reason I could not set this up with virt-manager run as a "normal user", but I could if I started virt-manager from YaST (with root permissions... might that be the reason?). The second trick was to restrict the master to 2GB of RAM and the admin to 4GB, so I could run this on my laptop (thanks @ereslibre !). Finally, the third trick was to add "hostname=UNIQUE_HOSTNAME" as a linuxrc parameter when installing each machine (otherwise they would all be named linux.lan :) ).
- Building nvidia packages for CaaSP. The nvidia packages built for SLE12SP3 by SUSE, but provided by nvidia at http://download.nvidia.com/suse/sles12sp3, had been built for an older kernel than the one released in CaaSP. Thus, when installing those packages, the nvidia kernel modules could not be loaded. For this reason, I built them for the latest kernel in openSUSE Leap 42.3, and installed them at the same time as I upgraded the kernel to the one in openSUSE Leap 42.3 (see [0] for why openSUSE Leap 42.3). You can download them from this project.
- Installing and fixing nvidia-runtime-hooks and libnvidia-containers: There is no package for SUSE, so I took the ones from CentOS 7; the trick was to run a centos7 container and follow the instructions from https://nvidia.github.io/libnvidia-container/, but add the "--downloadonly" option to yum. Luckily, the packages installed without any error ... but they were not really working! Using "strace nvidia-container-cli info" I realized the problem was the permissions of the /dev/nvidia* files. Thus, running "chmod 0666 /dev/nvidia*" fixed the installation... but you have to do this on every reboot (actually, every time the nvidia module is loaded). The trick was to use "transactional-update shell" to make all these changes :) . Note that I am not installing nvidia-container-runtime, but only the hook. That is because we will use cri-o and not docker, and for cri-o we don't need to install nvidia-container-runtime.
As "proof":

> nvidia-container-cli info

NVRM version: 390.67
CUDA version: 9.1

Device Index: 0
Device Minor: 0
Model: GeForce GTX 1060 3GB
GPU UUID: GPU-f96a76d4-7ba9-07cc-2774-bb7a55ef3e68
Bus Location: 00000000:02:00.0
Architecture: 6.1
- Setting up the cri-o hook to use libnvidia-container: I just had to follow the instructions here: https://github.com/kubernetes-incubator/cri-o/issues/1222. I couldn't really verify this, but I am quite confident it worked, as kubelet was starting and parsing the hook.

And this is where I failed:
- Using a chained forward proxy to add the workstation to a SUSE CaaSP cluster that was running in a SUSE Cloud cluster: I tried configuring 2 proxies with apache2, using mod_proxy, mod_proxy_http and mod_proxy_connect, where both were configured as forward proxies and the second one used the "ProxyRemote" configuration to "chain" with the first one. Then I placed the first one inside the SUSE Cloud cluster, as a virtual machine, and the second one on my laptop. The trick worked, and I was able to access the autoyast file from the admin node in the SUSE Cloud cluster (http://adminnode/autoyast) when installing the workstation via the DVD, even though the admin node was not accessible outside the SUSE Cloud cluster, and the SUSE Cloud cluster is inside the VPN, where the workstation is not (but the laptop is). It sounds a bit complicated, but the solution was actually quite simple. However, salt-minion does not use HTTP but ZeroMQ, so it was not going through the proxies.
- Building nvidia-container and libnvidia-container packages for SUSE: I tried getting the spec file from GitHub, but it required so much tuning that it would have taken me the whole hackweek (or more) to get them building for SUSE, so I ended up using the ones from CentOS 7.
- Setting up k8s to schedule jobs that require a GPU: Even though cri-o seemed correctly configured, jobs were not being scheduled. The docs I found on the internet referred to adding the "--experimental-nvidia-gpus=1" option to kubelet, but this is not possible because kubelet does not recognize this option and fails to start. Then I read in the k8s docs about enabling this via a device plugin: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/. This required enabling feature gates, which are not enabled by default. Here I think I failed because I didn't know how to do it and unfortunately I ran out of time ... However, while I was writing this report, flavio pointed me to https://wiki.microfocus.com/index.php/SUSECaaSPlatform/FAQ#HowtoenableKubernetesfeature_gates (thanks @flavio_castelli !) where you can see how to enable the feature gates. This is where we should resume the work if we have some time at some point.
- Running a kubeflow deployment: I didn't have time to reach this point. This was the last step and a project of its own. Next hackweek, maybe...

[0] Why openSUSE Leap 42.3? SLE12SP3 shares its common code with openSUSE Leap 42.3, and for the hackweek I wanted to build the nvidia package in the Open Build Service, https://build.opensuse.org. Using openSUSE Leap 42.3 (plus its update repo) was easier than trying to build it for the exact kernel that has been shipped in CaaSPv3.
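
The /dev/nvidia* permission workaround mentioned in the list above has to be repeated whenever the nvidia module is loaded, so it may help to wrap it in a small function. This is only a sketch: the function name `fix_nvidia_perms` and its optional directory argument are assumptions made for illustration, not something taken from the nvidia packages.

```shell
# Hypothetical helper: make the NVIDIA device nodes world read/writable,
# mirroring the manual "chmod 0666 /dev/nvidia*" workaround.
# The optional argument (default /dev) only exists so the function can be
# tried against a scratch directory.
fix_nvidia_perms() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/nvidia*; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$node" ] || continue
        chmod 0666 "$node"
    done
}
```

Running `fix_nvidia_perms` as root after the module is loaded would be the equivalent of the manual command. How to trigger it automatically on each boot is left open here.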

over 7 years ago by jordimassaguerpla | Reply

and thanks to @vrothberg for helping me out with cri-o/podman.

over 7 years ago by jordimassaguerpla | Reply

The url for how to enable the feature gates got formatted weirdly ... This is the url

https://wiki.microfocus.com/index.php/SUSE_CaaS_Platform/FAQ#How_to_enable_Kubernetes_feature_gates

and I think this is an internal document, so for the ones that do not have access:

How to enable Kubernetes feature gates

Feature gates are the mechanism Kubernetes uses to enable experimental features in advance.

It's possible to enable Kubernetes feature gates on SUSE CaaS Platform 3.

Please note: feature gates are experimental features, hence they won't be supported by SUSE.

Let's assume a user wants to use two feature gates:
```
DevicePlugins
ReadOnlyAPIDataVolumes
```
The user would have to log into the admin node and execute this command:
```
docker exec $(docker ps | grep velum-dashboard | awk '{print $1}') entrypoint.sh bundle exec rails runner "Pillar.apply(kubernetes_feature_gates: 'DevicePlugins=true,ReadOnlyAPIDataVolumes=true')"
```
And then issue an orchestration. This can be done using the following command on the admin node:
```
docker exec $(docker ps | grep salt-master | awk '{print $1}') salt-run state.orchestrate orch.kubernetes
```
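
The value passed to `kubernetes_feature_gates` above is a comma-separated list of `Name=true` pairs. A minimal sketch of building that string from a list of gate names follows; the function name `feature_gates_string` is hypothetical and is not part of Velum or Salt.

```shell
# Hypothetical helper: join gate names into the "Name=true,Name=true" form
# used in the Pillar.apply call.
feature_gates_string() {
    result=""
    for gate in "$@"; do
        # Prepend a comma only when something has already been collected.
        result="${result:+$result,}${gate}=true"
    done
    printf '%s\n' "$result"
}
```

For the two gates in the example, `feature_gates_string DevicePlugins ReadOnlyAPIDataVolumes` produces the same string that appears inside the `Pillar.apply` call.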

over 7 years ago by jordimassaguerpla | Reply

Following the instructions in the previous comment, I was able to enable the device plugin.

However, deploying the nvidia plugin did not create a pod as expected, so I opened an issue upstream asking for more information.

https://github.com/NVIDIA/k8s-device-plugin/issues/62

over 7 years ago by jordimassaguerpla | Reply

A namespaced RoleBinding would add host path mount privileges without granting excess privileges over all namespaces:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nvidia-device-plugin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nvidia-device-plugin-psp-privileged
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: suse:caasp:psp:privileged
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: nvidia-device-plugin
  namespace: kube-system

And then, in your DaemonSet spec, set `serviceAccount: nvidia-device-plugin`.

This creates the ServiceAccount+RoleBinding in the kube-system
namespace - if you're deploying into another NS, swap out `kube-system` 
for the namespace you're using.
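
Swapping the namespace by hand in three places is easy to get wrong, so here is a sketch that renders the same ServiceAccount and RoleBinding for a given namespace. The function name `render_rbac` is an assumption for illustration; the manifest content is the one shown above.

```shell
# Hypothetical helper: print the ServiceAccount + RoleBinding manifest for a
# namespace (default: kube-system).
render_rbac() {
    ns="${1:-kube-system}"
    cat <<EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nvidia-device-plugin
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nvidia-device-plugin-psp-privileged
  namespace: ${ns}
roleRef:
  kind: ClusterRole
  name: suse:caasp:psp:privileged
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: nvidia-device-plugin
  namespace: ${ns}
EOF
}
```

The output could then be piped into kubectl, for example `render_rbac my-namespace | kubectl apply -f -`, where `my-namespace` is a placeholder.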

Thanks to Ludovic and Kiall

over 6 years ago by jordimassaguerpla | Reply

Trying it again with CaaSPv4 :) on hackweek 18

over 6 years ago by jordimassaguerpla | Reply

So I was able to make SUSE CaaSP schedule jobs that need an nvidia GPU to the node that has an nvidia GPU :)

Here is the documentation:

> Disclaimer
>
> This is a hackweek project, so this is not ready for production use. It contains hacks and workarounds just to "make it work".
>
> This has been tested with SUSE CaaSP Beta 3 (public beta). I used that "kubernetes distribution" because, given that I am involved in that project, I am familiar with it (and I love it :) )
>
> However, the instructions here should also be valid for openSUSE Kubic and in general for any kubernetes+cri-o distribution.

How to setup SUSE CaaSP to work with GPU

...

Installing SUSE CaaSP

We need a cluster, that is obvious :), so let's start by installing two nodes with SUSE CaaSP 4.0 Beta3, powered by SUSE Linux Enterprise Server 15 SP1, which will become our worker and master nodes:
- gpu: A bare metal workstation with an NVIDIA Quadro K2000 graphics card
- master: A virtual machine

You can do that by following the SUSE CaaSP Beta 3 deployment instructions.

> Tip
> Make sure you have a user "sles" which can run "sudo" without a password, and that both nodes have each other's public ssh keys in "~sles/.ssh/authorized_keys". Also, as stated in the deployment guide, make sure that you have the ssh-agent running, and add the hostnames to /etc/hosts so both nodes are reachable by hostname. Then, disable firewalld and enable sshd.

> Tip
> Do not configure a swap partition, and set the VM to use 2 CPUs. Otherwise, SUSE CaaSP will fail to install.
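
The swap and CPU requirements from the tip above can be checked before starting the installation. The following is a sketch under stated assumptions: the function names are hypothetical, the file argument exists only to allow trying the check against a sample file, and the default minimum of 2 CPUs is taken from the tip.

```shell
# Hypothetical helper: succeed if any swap area is active.
# /proc/swaps has one header line; every further line is an active swap area.
swap_active() {
    swaps_file="${1:-/proc/swaps}"
    awk 'NR > 1 { n++ } END { exit !(n > 0) }' "$swaps_file"
}

# Hypothetical helper: succeed if at least $1 CPUs (default 2) are available.
has_min_cpus() {
    [ "$(nproc)" -ge "${1:-2}" ]
}
```

On a node that is ready for installation, `swap_active` is expected to fail and `has_min_cpus` to succeed.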

Finally, we need both machines to be in the same network. For this, I set up the VM to use a macvtap host device. You can find more info on macvtap here. However, I just did that with virt-manager on openSUSE Leap 15.1*.

(*): This is not precise. Actually, for whatever reason, I could not set this up with virt-manager run as a "normal user", but I could if I started virt-manager from YaST (with root permissions... might that be the reason?)

> Tip
> Do you want to know if the cluster is properly set up? Run the k8s conformance tests.

Once we have a SUSE CaaSP cluster running, we can proceed to install the NVIDIA drivers.

Installing nvidia graphics driver kernel module

So we have a workstation with an NVIDIA GPU compatible with CUDA (in our case a Quadro K2000). Now it is time to install the right drivers so we can use it.

Drivers can be installed from the NVIDIA download servers by running `zypper ar https://developer.nvidia.com/cuda-gpus nvidia`, then `zypper ref`, then `zypper install nvidia-gfxG05-kmp-default`.

You can check whether the drivers are loaded with `lsmod | grep nvidia`.
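
For scripting, the same check can be done against /proc/modules, which is where lsmod reads its data from. This is a minimal sketch; the function name `module_loaded` and the optional file argument are assumptions for illustration. Unlike `grep nvidia`, it matches the module name exactly.

```shell
# Hypothetical helper: succeed if the named kernel module is loaded.
# The first whitespace-separated field of each /proc/modules line is the
# module name.
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
```

For example, `module_loaded nvidia && echo "driver loaded"` prints the message only when the module is present.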

Now that we have the drivers, we can install the NVIDIA driver for computing with GPUs using CUDA.

Installing NVIDIA driver for computing with GPUs using CUDA

After installing the nvidia drivers, we need to install nvidia-computeG05. Since we set up the nvidia repo in the previous step, all we need to do is:
```
zypper install nvidia-computeG05
```
You can check that this works by running nvidia-smi, and you should get this output:

Wed Jun 26 14:30:59 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
| 31%   47C    P8    N/A /  N/A |      0MiB /  1998MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
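
If you want to use the driver version from that banner in a script, it can be extracted from the nvidia-smi output. This is only a sketch: the function name `driver_version` is hypothetical, and it assumes the banner keeps the "Driver Version: X" wording shown above.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the first
# driver version found in it.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Usage would be `nvidia-smi | driver_version`. Alternatively, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the version directly, one line per GPU.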

Installing libnvidia

Installing nvidia-container-cli

Installing nvidia-container-runtime-hook

Creating a Service Daemon Device Nvidia

Testing

TODOs
- Package
- Package text

over 6 years ago by jordimassaguerpla | Reply

I thought I was going to be able to edit the comment afterwards; sorry for this mess. I will instead create a GitHub page and link it from here.

over 6 years ago by jordimassaguerpla | Reply

Docs: https://github.com/jordimassaguerpla/SUSEhackweek18/blob/master/HowtosetupSUSECaaSPkubernetescrio_GPU.md

over 6 years ago by jordimassaguerpla | Reply

docs

over 6 years ago by jordimassaguerpla | Reply

- 01-How to setup SUSE CaaSP kubernetes crio GPU
- 02-How to deploy kubeflow
- Deploying a KF example with Keras, Tensorflow and Seldon

over 6 years ago by jordimassaguerpla | Reply

The third document needs some "love" but I can say this time I was able to make it and use all these technologies:
- kubeflow: Machine Learning Toolkit for Kubernetes
- Keras: The Python Deep Learning library
- Seldon Core: Machine Learning Deployment for Kubernetes
- TensorFlow: An open source machine learning framework for everyone
- cri-o: Lightweight Container Runtime for Kubernetes
- Kubernetes
- SUSE CaaSP: SUSE Container as a Service Platform
- Nvidia container engine
- Cuda

Similar Projects

machinelearning

Song Search with CLAP by gcolangiuli

Description

Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface

Goals

Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:

Music Tagging;
Free text search;
Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.

The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.

Result

In this MVP we implemented:

Async Song Analysis with Clap model
Free Text Search of the songs
Similar song search based on vector representation
Containerised version with web interface

We also documented what went well and what can be improved in the use of AI.

You can have a look at the result here:

Future implementation can be related to performance improvement and stability of the analysis.

References

CLAP: The main model being researched;
huggingface: Pre-trained models for CLAP;
Free Music Archive: Creative Commons songs that can be used for testing;

kubernetes

Rancher/k8s Trouble-Maker by tonyhansen

Project Description

When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.

Goals for Hackweek 25

Update to modern Rancher and verify that existing tests still work
Change testing logic to populate secrets instead of requiring a secondary script
Add new tests

Goals for Hackweek 24 (Complete)

Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix
Create at least 5 modules that can be applied to the cluster and require troubleshooting

Resources

https://github.com/celidon/rancher-troublemaker
https://github.com/rancher/terraform-provider-rancher2
https://github.com/rancher/tf-rancher-up
https://github.com/rancher/quickstart

Exploring Modern AI Trends and Kubernetes-Based AI Infrastructure by jluo

Description

Build a solid understanding of the current landscape of Artificial Intelligence and how modern cloud-native technologies—especially Kubernetes—support AI workloads.

Goals

Use Gemini Learning Mode to guide the exploration, surface relevant concepts, and structure the learning journey:

Gain insight into the latest AI trends, tools, and architectural concepts.
Understand how Kubernetes and related cloud-native technologies are used in the AI ecosystem (model training, deployment, orchestration, MLOps).

Resources

Red Hat AI Topic Articles
- https://www.redhat.com/en/topics/ai
Kubeflow Documentation
- https://www.kubeflow.org/docs/
Q4 2025 CNCF Technology Landscape Radar report:
- https://www.cncf.io/announcements/2025/11/11/cncf-and-slashdata-report-finds-leading-ai-tools-gaining-adoption-in-cloud-native-ecosystems/
- https://www.cncf.io/wp-content/uploads/2025/11/cncfreporttechradar_111025a.pdf
Agent-to-Agent (A2A) Protocol
- https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/

Preparing KubeVirtBMC for project transfer to the KubeVirt organization by zchang

Description

KubeVirtBMC is preparing to transfer the project to the KubeVirt organization. One requirement is to enhance the modeling design's security. The current v1alpha1 API (the VirtualMachineBMC CRD) was designed during the proof-of-concept stage. It's immature and inherently insecure due to its cross-namespace object references, exposing security concerns from an RBAC perspective.

The other long-awaited feature is the ability to mount virtual media so that virtual machines can boot from remote ISO images.

Goals

Deliver the v1beta1 API and its corresponding controller implementation
Enable the Redfish virtual media mount function for KubeVirt virtual machines

Resources

A CLI for Harvester by mohamed.belgaied

Harvester does not officially come with a CLI tool, the user is supposed to interact with Harvester mostly through the UI. Though it is theoretically possible to use kubectl to interact with Harvester, the manipulation of Kubevirt YAML objects is absolutely not user friendly. Inspired by tools like multipass from Canonical to easily and rapidly create one of multiple VMs, I began the development of Harvester CLI. Currently, it works but Harvester CLI needs some love to be up-to-date with Harvester v1.0.2 and needs some bug fixes and improvements as well.

Project Description

Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as: harvester vm create my-vm --count 5 to create 5 VMs named my-vm-01 to my-vm-05.

Harvester CLI is functional but needs a number of improvements: up-to-date functionality with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an opensuse VM instead of an ubuntu VM, solve some bugs, etc.

Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli

Done in previous Hackweeks

Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
Automatically package Harvester CLI for OpenSUSE / Redhat RPMs or DEBs: DONE

Goal for this Hackweek

The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.

Some nice additions might be: * Improve handling of namespaced objects * Add features, such as network management or Load Balancer creation ? * Add more unit tests and, why not, e2e tests * Improve CI * Improve the overall code quality * Test the program and create issues for it

Issue list is here: https://github.com/belgaied2/harvester-cli/issues

Resources

The project is written in Go, and using client-go the Kubernetes Go Client libraries to communicate with the Harvester API (which is Kubernetes in fact). Welcome contributions are:

Testing it and creating issues
Documentation
Go code improvement

What you might learn

Harvester CLI might be interesting to you if you want to learn more about:

GitHub Actions
Harvester as a SUSE Product
Go programming language
Kubernetes API
Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)

The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it.

The Plan

Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:

❥ The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

❥ The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

❥ Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.

If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

Why?

We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

The CONCLUSION!!!

A State of the Union document was compiled to summarize the lessons learned this week. For more gory details, just read on in the diary below!

containers

Technical talks at universities by agamez

Description

This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.

For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.

Goals

Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.

Resources

Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.

Port the classic browser game HackTheNet to PHP 8 by dgedon

Description

The classic browser game HackTheNet from 2004 still runs on PHP 4/5 and MySQL 5 and needs to be ported to PHP 8 and a current database such as MariaDB.

Goals

Port the game to PHP 8 and MariaDB 11
Create a container where the game server can simply be started/stopped

Resources

https://github.com/nodeg/hackthenet

Rewrite Distrobox in go (POC) by fabriziosestito

Description

Rewriting Distrobox in Go.

Main benefits:

Easier to maintain and to test
Adapter pattern for different container backends (LXC, systemd-nspawn, etc.)

Goals

Build a minimal starting point with core commands
Keep the CLI interface compatible: existing users shouldn't notice any difference
Use a clean Go architecture with adapters for different container backends
Keep dependencies minimal and binary size small
Benchmark against the original shell script
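
The adapter idea from the goals above could look roughly like the following sketch. It is not the project's actual design: the Backend interface, the cliBackend type and the selectBackend and commandLine helpers are hypothetical names, and podman and docker are used as example backends only because their "create" and "exec" subcommands are well known. A backend such as LXC or systemd-nspawn would be another type implementing the same interface. The adapters only build argument lists and never execute anything, which keeps them easy to unit test.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Backend is a hypothetical adapter interface: each container backend
// translates a generic request into its own command line.
type Backend interface {
	Name() string
	CreateArgs(box, image string) []string
	EnterArgs(box string) []string
}

// cliBackend adapts runtimes whose CLI follows the "create"/"exec"
// layout shared by podman and docker.
type cliBackend struct{ binary string }

func (b cliBackend) Name() string { return b.binary }

func (b cliBackend) CreateArgs(box, image string) []string {
	return []string{b.binary, "create", "--name", box, image}
}

func (b cliBackend) EnterArgs(box string) []string {
	return []string{b.binary, "exec", "-it", box, "/bin/sh"}
}

// backends is the registry the CLI would consult; adding a backend
// means adding one entry here.
var backends = map[string]Backend{
	"podman": cliBackend{binary: "podman"},
	"docker": cliBackend{binary: "docker"},
}

// selectBackend returns the adapter registered under name, or an error
// listing the known backends in sorted order.
func selectBackend(name string) (Backend, error) {
	b, ok := backends[name]
	if !ok {
		known := make([]string, 0, len(backends))
		for k := range backends {
			known = append(known, k)
		}
		sort.Strings(known)
		return nil, fmt.Errorf("unknown backend %q (known: %s)", name, strings.Join(known, ", "))
	}
	return b, nil
}

// commandLine renders an argument list for display only (no shell quoting).
func commandLine(args []string) string { return strings.Join(args, " ") }

func main() {
	b, err := selectBackend("podman")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(commandLine(b.CreateArgs("demo-box", "registry.opensuse.org/opensuse/tumbleweed:latest")))
	fmt.Println(commandLine(b.EnterArgs("demo-box")))
}
```

Because the core commands would depend only on the interface, a fake backend can stand in for a real runtime in tests.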

Resources

Upstream project: https://github.com/89luca89/distrobox/
Distrobox site: https://distrobox.it/
ArchWiki: https://wiki.archlinux.org/title/Distrobox

Help Create A Chat Control Resistant Turnkey Chatmail/Deltachat Relay Stack - Rootless Podman Compose, OpenSUSE BCI, Hardened, & SELinux by 3nd5h1771fy

Description

The Mission: Decentralized & Sovereign Messaging

FYI: If you have never heard of "Chatmail", you can visit their site here, but simply put, it can be thought of as the underlying protocol/platform that decentralized messengers like DeltaChat use for their communications. Do not confuse it with the honeypot-looking, non-open-source, paid product with better SEO that directs you to chatmailsecure(dot)com.

In an era of increasing centralized surveillance by unaccountable bad actors (aka BigTech), "Chat Control," and the erosion of digital privacy, the need for sovereign communication infrastructure is critical. Chatmail is a pioneering initiative that bridges the gap between classic email and modern instant messaging, offering metadata-minimized, end-to-end encrypted (E2EE) communication that is interoperable and open.

However, unless you are a seasoned sysadmin, the current recommended deployment method of a Chatmail relay is rigid, fragile, difficult to properly secure, and effectively takes over the entire host the "relay" is deployed on.

Why This Matters

A simple, host-agnostic, reproducible deployment lowers the entry cost for anyone wanting to run a privacy‑preserving, decentralized messaging relay. In an era of perpetually resurrected chat‑control legislation threats, EU digital‑sovereignty drives, and the many dangers of using big‑tech messaging platforms (Apple iMessage, WhatsApp, FB Messenger, Instagram, SMS, Google Messages, etc.) for any type of communication, providing an easy‑to‑use alternative empowers:

Censorship resistance - No single entity controls the relay; operators can spin up new nodes quickly.
Surveillance mitigation - End‑to‑end OpenPGP encryption ensures relay operators never see plaintext.
Digital sovereignty - Communities can host their own infrastructure under local jurisdiction, aligning with national data‑policy goals.

By turning the Chatmail relay into a plug‑and‑play container stack, we enable broader adoption, foster a resilient messaging fabric, and give developers, activists, and hobbyists a concrete tool to defend privacy online.

Goals

As I indicated earlier, this project aims to drastically simplify the deployment of a Chatmail relay. By converting this architecture into a portable, containerized stack using Podman and openSUSE base container images, we can allow anyone to deploy their own censorship-resistant, privacy-preserving communications node in minutes.

Our goal for Hack Week: package every component into containers built on openSUSE/MicroOS base images, initially orchestrated with a single container-compose.yml (podman-compose compatible). The stack will:

Run on any host that supports Podman (including optimizations and enhancements for SELinux‑enabled systems).
Allow network decoupling by refactoring configurations to move from file-system-constrained Unix sockets to internal TCP networking, allowing containers to achieve stricter isolation.
Use enhanced security with SELinux: with purpose-built utilities such as udica, we can quickly generate custom SELinux policies for the container stack, ensuring strict confinement superior to typical Docker deployments.
Allow the use of bind or remote mounted volumes for shared data (/var/vmail, DKIM keys, TLS certs, etc.).
Replace the local DNS server requirement with a remote DNS‑provider API for DKIM/TXT record publishing.

By delivering a turnkey, host-agnostic, reproducible deployment, we lower the barrier for individuals and small communities to launch their own chatmail relays, fostering a decentralized, censorship‑resistant messaging ecosystem that can serve DeltaChat users and/or future services adopting this protocol.
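
As a rough illustration of the DKIM/TXT publishing goal, the sketch below prepares the record name and value that would be handed to a DNS provider. The project text does not name a provider or its API, so the sketch stops before any network call. The selector "mail", the domain "chat.example.org", the placeholder key and the helper names are all assumptions. The record layout (selector._domainkey.domain, "v=DKIM1; k=rsa; p=...") follows the DKIM specification, and the 255-byte limit per TXT string comes from the DNS wire format.

```go
package main

import (
	"fmt"
	"strings"
)

// maxTXTChunk is the maximum length of a single character-string
// inside a DNS TXT record.
const maxTXTChunk = 255

// dkimRecordName returns the DNS name under which a DKIM public key
// is published for the given selector and domain.
func dkimRecordName(selector, domain string) string {
	return selector + "._domainkey." + strings.TrimSuffix(domain, ".")
}

// dkimRecordValue formats the TXT payload for an RSA public key that
// is already base64-encoded.
func dkimRecordValue(publicKeyBase64 string) string {
	return "v=DKIM1; k=rsa; p=" + publicKeyBase64
}

// splitTXT cuts a value into chunks of at most 255 bytes, because many
// provider APIs expect long TXT values as several strings.
func splitTXT(value string) []string {
	var chunks []string
	for len(value) > maxTXTChunk {
		chunks = append(chunks, value[:maxTXTChunk])
		value = value[maxTXTChunk:]
	}
	return append(chunks, value)
}

func main() {
	// Hypothetical selector and domain; the key is a placeholder, not a real key.
	name := dkimRecordName("mail", "chat.example.org")
	value := dkimRecordValue(strings.Repeat("A", 392))

	fmt.Println(name)
	for i, chunk := range splitTXT(value) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(chunk))
	}
}
```

Whether a given provider wants the value pre-split or as one long string varies, so the splitting step may need to be adapted or dropped.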

Resources