Kubernetes API caching layer according to Stable Diffusion

Make it faster!

There are use cases that put the Kubernetes API under heavy load - using Rancher at scale can be one of them.

Also, there are use cases in which a connection to the Kubernetes API might not always be present, or with good bandwidth - using Rancher for edge use cases can be one of them.

This project aims to create a local cache serving data from the Kubernetes API - with good performance and displaying last-good-results on a flaky connection.

Goal for this Hackweek

Implement Proof-Of-Concept client-go components backed by SQLite.

https://github.com/moio/vai

Resources

Golang and ideally Kubernetes hackers are more than welcome!

Looking for hackers with the skills:

kubernetes k8s api golang go performance testautomation scalability

This project is part of:

Hack Week 22

Activity

  • almost 3 years ago: lizhang liked this project.
  • almost 3 years ago: moio added keyword "k8s" to this project.
  • almost 3 years ago: moio added keyword "api" to this project.
  • almost 3 years ago: moio added keyword "golang" to this project.
  • almost 3 years ago: moio added keyword "go" to this project.
  • almost 3 years ago: moio added keyword "performance" to this project.
  • almost 3 years ago: moio added keyword "testautomation" to this project.
  • almost 3 years ago: moio added keyword "scalability" to this project.
  • almost 3 years ago: moio added keyword "kubernetes" to this project.
  • almost 3 years ago: moio liked this project.
  • almost 3 years ago: paulgonin liked this project.
  • almost 3 years ago: moio started this project.
  • almost 3 years ago: moio originated this project.

  • Comments

    • moio
      almost 3 years ago by moio | Reply

      Day 1 question: is a separate daemon design better than creating an Informer backed by a SQL cache.Store?

    • moio
      almost 3 years ago by moio | Reply

      Day 1 answer: no. Pivoting project to the creation of a SQL-based Indexer

    • moio
      almost 3 years ago by moio | Reply

      Day 2 progress: SQL-backed Store works. https://github.com/moio/vai

    • moio
      almost 3 years ago by moio | Reply

      Day 3 progress: SQL-backed Indexer works

    • moio
      almost 3 years ago by moio | Reply

      Day 4 question: where would it fit best? Steve or Lasso, and where?

    • moio
      almost 3 years ago by moio | Reply

      Day 4 answer: Steve, as an alternative to the current LRU cache of k8s API responses

    • moio
      almost 3 years ago by moio | Reply

      Day 5 progress: SQL-backed ThreadSafeStore works. History-preserving VersionedStore also works

    Similar Projects

    Exploring Modern AI Trends and Kubernetes-Based AI Infrastructure by jluo

    Description

    Build a solid understanding of the current landscape of Artificial Intelligence and how modern cloud-native technologies—especially Kubernetes—support AI workloads.

    Goals

    Use Gemini Learning Mode to guide the exploration, surface relevant concepts, and structure the learning journey:

    • Gain insight into the latest AI trends, tools, and architectural concepts.
    • Understand how Kubernetes and related cloud-native technologies are used in the AI ecosystem (model training, deployment, orchestration, MLOps).

    Resources

    • Red Hat AI Topic Articles

      • https://www.redhat.com/en/topics/ai
    • Kubeflow Documentation

      • https://www.kubeflow.org/docs/
    • Q4 2025 CNCF Technology Landscape Radar report:

      • https://www.cncf.io/announcements/2025/11/11/cncf-and-slashdata-report-finds-leading-ai-tools-gaining-adoption-in-cloud-native-ecosystems/
      • https://www.cncf.io/wp-content/uploads/2025/11/cncfreporttechradar_111025a.pdf
    • Agent-to-Agent (A2A) Protocol

      • https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/


    Rancher/k8s Trouble-Maker by tonyhansen

    Project Description

    When studying for my RHCSA, I found trouble-maker, which is a program that breaks a Linux OS and requires you to fix it. I want to create something similar for Rancher/k8s that can allow for troubleshooting an unknown environment.

    Goals for Hackweek 25

    • Update to modern Rancher and verify that existing tests still work
    • Change testing logic to populate secrets instead of requiring a secondary script
    • Add new tests

    Goals for Hackweek 24 (Complete)

    • Create a basic framework for creating Rancher/k8s cluster lab environments as needed for the Break/Fix
    • Create at least 5 modules that can be applied to the cluster and require troubleshooting

    Resources

    • https://github.com/celidon/rancher-troublemaker
    • https://github.com/rancher/terraform-provider-rancher2
    • https://github.com/rancher/tf-rancher-up
    • https://github.com/rancher/quickstart


    Self-Scaling LLM Infrastructure Powered by Rancher by ademicev0

    Self-Scaling LLM Infrastructure Powered by Rancher

    logo


    Description

    The Problem

    Running LLMs can get expensive and complex pretty quickly.

    Today there are typically two choices:

    1. Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
    2. Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.

    What if there was a middle ground?

    What if infrastructure scaled itself instead of making you scale it?

    Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?

    Project Repository: github.com/alexander-demicev/llmserverless


    What This Project Does

    A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.

    A complete, self-scaling LLM infrastructure that:

    • Scales to zero when idle (no idle costs)
    • Scales up automatically when requests come in
    • Adds more nodes when needed, removes them when demand drops
    • Runs on any infrastructure - laptop, bare metal, or cloud

    Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.

    How It Works

    A combination of open source tools working together:

    Flow:

    • Users interact with OpenWebUI (chat interface)
    • Requests go to LiteLLM Gateway
    • LiteLLM routes requests to:
      • Ollama (Knative) for local model inference (auto-scales pods)
      • Or cloud APIs for fallback


    Kubernetes-Based ML Lifecycle Automation by lmiranda

    Description

    This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.

    The pipeline will automate the lifecycle of a machine learning model, including:

    • Data ingestion/collection
    • Model training as a Kubernetes Job
    • Model artifact storage in an S3-compatible registry (e.g. Minio)
    • A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
    • A lightweight inference service that loads and serves the latest model
    • Monitoring of model performance and service health through Prometheus/Grafana

    The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.

    Goals

    By the end of Hack Week, the project should:

    1. Produce a fully functional ML pipeline running on Kubernetes with:

      • Data collection job
      • Training job container
      • Storage and versioning of trained models
      • Automated deployment of new model versions
      • Model inference API service
      • Basic monitoring dashboards
    2. Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.

    3. Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).

    4. Prepare a short demo explaining the end-to-end process and how new models flow through the system.

    Resources

    Project Repository

    Updates

    1. Training pipeline and datasets
    2. Inference Service py


    The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

    Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. A GitHub robot mascot trying to lasso a blue bull with a Kubernetes logo tatooed on it


    The Plan

    Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

    Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:


    The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

    The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

    Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.


    If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

    Why?

    We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

    The CONCLUSION!!!

    A add-emoji State of the Union add-emoji document was compiled to summarize lessons learned this week. For more gory details, just read on the diary below! add-emoji


    Bugzilla goes AI - Phase 1 by nwalter

    Description

    This project, Bugzilla goes AI, aims to boost developer productivity by creating an autonomous AI bug agent during Hackweek. The primary goal is to reduce the time employees spend triaging bugs by integrating Ollama to summarize issues, recommend next steps, and push focused daily reports to a Web Interface.

    Goals

    To reduce employee time spent on Bugzilla by implementing an AI tool that triages and summarizes bug reports, providing actionable recommendations to the team via Web Interface.

    Project Charter

    https://docs.google.com/document/d/1HbAvgrg8T3pd1FIx74nEfCObCljpO77zz5In_Jpw4as/edit?usp=sharing## Description

    Project Achievements during Hackweek

    In this file you can read about what we achieved during Hackweek.

    https://docs.google.com/document/d/14gtG9-ZvVpBgkh33Z4AM6iLFWqZcicQPD41MM-Pg0/edit?usp=sharing


    HTTP API for nftables by crameleon

    Background

    The idea originated in https://progress.opensuse.org/issues/164060 and is about building RESTful API which translates authorized HTTP requests to operations in nftables, possibly utilizing libnftables-json(5).

    Originally, I started developing such an interface in Go, utilizing https://github.com/google/nftables. The conversion of string networks to nftables set elements was problematic (unfortunately no record of details), and I started a second attempt in Python, which made interaction much simpler thanks to native nftables Python bindings.

    Goals

    1. Find and track the issue with google/nftables
    2. Revisit and polish the Go or Python code (prefer Go, but possibly depends on implementing missing functionality), primarily the server component
    3. Finish functionality to interact with nftables sets (retrieving and updating elements), which are of interest for the originating issue
    4. Align test suite
    5. Packaging

    Resources

    • https://git.netfilter.org/nftables/tree/py/src/nftables.py
    • https://git.com.de/Georg/nftables-http-api (to be moved to GitHub)
    • https://build.opensuse.org/package/show/home:crameleon:containers/pytest-nftables-container

    Results

    • Started new https://github.com/tacerus/nftables-http-api.
    • First Go nftables issue was related to set elements needing to be added with different start and end addresses - coincidentally, this was recently discovered by someone else, who added a useful helper function for this: https://github.com/google/nftables/pull/342.
    • Further improvements submitted: https://github.com/google/nftables/pull/347.

    Side results

    Upon starting to unify the structure and implementing more functionality, missing JSON output support was noticed for some subcommands in libnftables. Submitted patches here as well:

    • https://lore.kernel.org/netfilter-devel/20251203131736.4036382-2-georg@syscid.com/T/#u


    Q2Boot - A handy QEMU VM launcher by amanzini

    Description

    Q2Boot (Qemu Quick Boot) is a command-line tool that wraps QEMU to provide a streamlined experience for launching virtual machines. It automatically configures common settings like KVM acceleration, virtio drivers, and networking while allowing customization through both configuration files and command-line options.

    The project originally was a personal utility in D, now recently rewritten in idiomatic Go. It lives at repository https://github.com/ilmanzo/q2boot

    Goals

    Improve the project, testing with different scenarios , address issues and propose new features. It will benefit of some basic integration testing by providing small sample disk images.

    Updates

    • Dec 1, 2025 : refactor command line options, added structured logging. Released v0.0.2
    • Dec 2, 2025 : added external monitor via telnet option
    • Dec 4, 2025 : released v0.0.3 with architecture auto-detection
    • Dec 5, 2025 : filing new issues and general polishment. Designing E2E testing

    Resources


    go-git: unlocking SHA256-based repository cloning ahead of git v3 by pgomes

    Description

    The go-git library implements the git internals in pure Go, so that any Go application can handle not only Git repositories, but also lower-level primitives (e.g. packfiles, idxfiles, etc) without needing to shell out to the git binary.

    The focus for this Hackweek is to fast track key improvements for the project ahead of the upstream release of Git V3, which may take place at some point next year.

    Goals

    Stretch goals

    Resources

    • https://github.com/go-git/go-git/
    • https://go-git.github.io/docs/


    Rewrite Distrobox in go (POC) by fabriziosestito

    Description

    Rewriting Distrobox in Go.

    Main benefits:

    • Easier to maintain and to test
    • Adapter pattern for different container backends (LXC, systemd-nspawn, etc.)

    Goals

    • Build a minimal starting point with core commands
    • Keep the CLI interface compatible: existing users shouldn't notice any difference
    • Use a clean Go architecture with adapters for different container backends
    • Keep dependencies minimal and binary size small
    • Benchmark against the original shell script

    Resources

    • Upstream project: https://github.com/89luca89/distrobox/
    • Distrobox site: https://distrobox.it/
    • ArchWiki: https://wiki.archlinux.org/title/Distrobox


    Contribute to terraform-provider-libvirt by pinvernizzi

    Description

    The SUSE Manager (SUMA) teams' main tool for infrastructure automation, Sumaform, largely relies on terraform-provider-libvirt. That provider is also widely used by other teams, both inside and outside SUSE.

    It would be good to help the maintainers of this project and give back to the community around it, after all the amazing work that has been already done.

    If you're interested in any of infrastructure automation, Terraform, virtualization, tooling development, Go (...) it is also a good chance to learn a bit about them all by putting your hands on an interesting, real-use-case and complex project.

    Goals

    • Get more familiar with Terraform provider development and libvirt bindings in Go
    • Solve some issues and/or implement some features
    • Get in touch with the community around the project

    Resources


    Updatecli Autodiscovery supporting WASM plugins by olblak

    Description

    Updatecli is a Golang Update policy engine that allow to write Update policies in YAML manifest. Updatecli already has a plugin ecosystem for common update strategies such as automating Dockerfile or Kubernetes manifest from Git repositories.

    This is what we call autodiscovery where Updatecli generate manifest and apply them dynamically based on some context.

    Obviously, the Updatecli project doesn't accept plugins specific to an organization.

    I saw project using different languages such as python, C#, or JS to generate those manifest.

    It would be great to be able to share and reuse those specific plugins

    During the HackWeek, I'll hang on the Updatecli matrix channel

    https://matrix.to/#/#Updatecli_community:gitter.im

    Goals

    Implement autodiscovery plugins using WASM. I am planning to experiment with https://github.com/extism/extism

    To build a simple WASM autodiscovery plugin and run it from Updatecli

    Resources

    • https://github.com/extism/extism
    • https://github.com/updatecli/updatecli
    • https://www.updatecli.io/docs/core/autodiscovery/
    • https://matrix.to/#/#Updatecli_community:gitter.im


    Cluster API Provider for Harvester by rcase

    Project Description

    The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.

    The project has been bootstrapped in HackWeek 23, and its code is available here.

    Work done in HackWeek 2023

    • Have a early working version of the provider available on Rancher Sandbox : *DONE *
    • Demonstrated the created cluster can be imported using Rancher Turtles: DONE
    • Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo

    DONE in HackWeek 24:

    DONE in 2025 (out of Hackweek)

    • Support of ClusterClass
    • Add to clusterctl community providers, you can add it directly with clusterctl
    • Testing on newer versions of Harvester v1.4.X and v1.5.X
    • Support for clusterctl generate cluster ...
    • Improve Status Conditions to reflect current state of Infrastructure
    • Improve CI (some bugs for release creation)

    Goals for HackWeek 2025

    • FIRST and FOREMOST, any topic is important to you
    • Add e2e testing
    • Certify the provider for Rancher Turtles
    • Add Machine pool labeling
    • Add PCI-e passthrough capabilities.
    • Other improvement suggestions are welcome!

    Thanks to @isim and Dominic Giebert for their contributions!

    Resources

    Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.

    This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:


    SUSE Health Check Tools by roseswe

    SUSE HC Tools Overview

    A collection of tools written in Bash or Go 1.24++ to make life easier with handling of a bunch of tar.xz balls created by supportconfig.

    Background: For SUSE HC we receive a bunch of supportconfig tar balls to check them for misconfiguration, areas for improvement or future changes.

    Main focus on these HC are High Availability (pacemaker), SLES itself and SAP workloads, esp. around the SUSE best practices.

    Goals

    • Overall improvement of the tools
    • Adding new collectors
    • Add support for SLES16

    Resources

    csv2xls* example.sh go.mod listprodids.txt sumtext* trails.go README.md csv2xls.go exceltest.go go.sum m.sh* sumtext.go vercheck.py* config.ini csvfiles/ getrpm* listprodids* rpmdate.sh* sumxls* verdriver* credtest.go example.py getrpm.go listprodids.go sccfixer.sh* sumxls.go verdriver.go

    docollall.sh* extracthtml.go gethostnamectl* go.sum numastat.go cpuvul* extractcluster.go firmwarebug* gethostnamectl.go m.sh* numastattest.go cpuvul.go extracthtml* firmwarebug.go go.mod numastat* xtr_cib.sh*

    $ getrpm -r pacemaker >> Product ID: 2795 (SUSE Linux Enterprise Server for SAP Applications 15 SP7 x86_64), RPM Name: +--------------+----------------------------+--------+--------------+--------------------+ | Package Name | Version | Arch | Release | Repository | +--------------+----------------------------+--------+--------------+--------------------+ | pacemaker | 2.1.10+20250718.fdf796ebc8 | x86_64 | 150700.3.3.1 | sle-ha/15.7/x86_64 | | pacemaker | 2.1.9+20250410.471584e6a2 | x86_64 | 150700.1.9 | sle-ha/15.7/x86_64 | +--------------+----------------------------+--------+--------------+--------------------+ Total packages found: 2


    HTTP API for nftables by crameleon

    Background

    The idea originated in https://progress.opensuse.org/issues/164060 and is about building RESTful API which translates authorized HTTP requests to operations in nftables, possibly utilizing libnftables-json(5).

    Originally, I started developing such an interface in Go, utilizing https://github.com/google/nftables. The conversion of string networks to nftables set elements was problematic (unfortunately no record of details), and I started a second attempt in Python, which made interaction much simpler thanks to native nftables Python bindings.

    Goals

    1. Find and track the issue with google/nftables
    2. Revisit and polish the Go or Python code (prefer Go, but possibly depends on implementing missing functionality), primarily the server component
    3. Finish functionality to interact with nftables sets (retrieving and updating elements), which are of interest for the originating issue
    4. Align test suite
    5. Packaging

    Resources

    • https://git.netfilter.org/nftables/tree/py/src/nftables.py
    • https://git.com.de/Georg/nftables-http-api (to be moved to GitHub)
    • https://build.opensuse.org/package/show/home:crameleon:containers/pytest-nftables-container

    Results

    • Started new https://github.com/tacerus/nftables-http-api.
    • First Go nftables issue was related to set elements needing to be added with different start and end addresses - coincidentally, this was recently discovered by someone else, who added a useful helper function for this: https://github.com/google/nftables/pull/342.
    • Further improvements submitted: https://github.com/google/nftables/pull/347.

    Side results

    Upon starting to unify the structure and implementing more functionality, missing JSON output support was noticed for some subcommands in libnftables. Submitted patches here as well:

    • https://lore.kernel.org/netfilter-devel/20251203131736.4036382-2-georg@syscid.com/T/#u


    Create a go module to wrap happy-compta.fr by cbosdonnat

    Description

    https://happy-compta.fr is a tool for french work councils simple book keeping. While it does the job, it has no API to work with and it is tedious to enter loads of operations.

    Goals

    Write a go client module to be used as an API to programmatically manipulate the tool.

    Writing an example tool to load data from a CSV file would be good too.


    Contribute to terraform-provider-libvirt by pinvernizzi

    Description

    The SUSE Manager (SUMA) teams' main tool for infrastructure automation, Sumaform, largely relies on terraform-provider-libvirt. That provider is also widely used by other teams, both inside and outside SUSE.

    It would be good to help the maintainers of this project and give back to the community around it, after all the amazing work that has been already done.

    If you're interested in any of infrastructure automation, Terraform, virtualization, tooling development, Go (...) it is also a good chance to learn a bit about them all by putting your hands on an interesting, real-use-case and complex project.

    Goals

    • Get more familiar with Terraform provider development and libvirt bindings in Go
    • Solve some issues and/or implement some features
    • Get in touch with the community around the project

    Resources


    RMT.rs: High-Performance Registration Path for RMT using Rust by gbasso

    Description

    The SUSE Repository Mirroring Tool (RMT) is a critical component for managing software updates and subscriptions, especially for our Public Cloud Team (PCT). In a cloud environment, hundreds or even thousands of new SUSE instances (VPS/EC2) can be provisioned simultaneously. Each new instance attempts to register against an RMT server, creating a "thundering herd" scenario.

    We have observed that the current RMT server, written in Ruby, faces performance issues under this high-concurrency registration load. This can lead to request overhead, slow registration times, and outright registration failures, delaying the readiness of new cloud instances.

    This Hackweek project aims to explore a solution by re-implementing the performance-critical registration path in Rust. The goal is to leverage Rust's high performance, memory safety, and first-class concurrency handling to create an alternative registration endpoint that is fast, reliable, and can gracefully manage massive, simultaneous request spikes.

    The new Rust module will be integrated into the existing RMT Ruby application, allowing us to directly compare the performance of both implementations.

    Goals

    The primary objective is to build and benchmark a high-performance Rust-based alternative for the RMT server registration endpoint.

    Key goals for the week:

    1. Analyze & Identify: Dive into the SUSE/rmt Ruby codebase to identify and map out the exact critical path for server registration (e.g., controllers, services, database interactions).
    2. Develop in Rust: Implement a functionally equivalent version of this registration logic in Rust.
    3. Integrate: Explore and implement a method for Ruby/Rust integration to "hot-wire" the new Rust module into the RMT application. This may involve using FFI, or libraries like rb-sys or magnus.
    4. Benchmark: Create a benchmarking script (e.g., using k6, ab, or a custom tool) that simulates the high-concurrency registration load from thousands of clients.
    5. Compare & Present: Conduct a comparative performance analysis (requests per second, latency, success/error rates, CPU/memory usage) between the original Ruby path and the new Rust path. The deliverable will be this data and a summary of the findings.

    Resources

    • RMT Source Code (Ruby):
      • https://github.com/SUSE/rmt
    • RMT Documentation:
      • https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-rmt.html
    • Tooling & Stacks:
      • RMT/Ruby development environment (for running the base RMT)
      • Rust development environment (rustup, cargo)
    • Potential Integration Libraries:
      • rb-sys: https://github.com/oxidize-rb/rb-sys
      • Magnus: https://github.com/matsadler/magnus
    • Benchmarking Tools:
      • k6 (https://k6.io/)
      • ab (ApacheBench)


    dynticks-testing: analyse perf / trace-cmd output and aggregate data by m.crivellari

    Description

    dynticks-testing is a project started years ago by Frederic Weisbecker. One of the feature is to check the actual configuration (isolcpus, irqaffinity etc etc) and give feedback on it.

    An important goal of this tool is to parse the output of trace-cmd / perf and provide more readable data, showing the duration of every events grouped by PID (showing also the CPU number, if the tasks has been migrated etc).

    An example of data captured on my laptop (incomplete!!):

              -0     [005] dN.2. 20310.270699: sched_wakeup:         WaylandProxy:46380 [120] CPU:005
              -0     [005] d..2. 20310.270702: sched_switch:         swapper/5:0 [120] R ==> WaylandProxy:46380 [120]
    ...
        WaylandProxy-46380 [004] d..2. 20310.295397: sched_switch:         WaylandProxy:46380 [120] S ==> swapper/4:0 [120]
              -0     [006] d..2. 20310.295397: sched_switch:         swapper/6:0 [120] R ==> firefox:46373 [120]
             firefox-46373 [006] d..2. 20310.295408: sched_switch:         firefox:46373 [120] S ==> swapper/6:0 [120]
              -0     [004] dN.2. 20310.295466: sched_wakeup:         WaylandProxy:46380 [120] CPU:004
    

    Output of noise_parse.py:

    Task: WaylandProxy Pid: 46380 cpus: {4, 5} (Migrated!!!)
            Wakeup Latency                                Nr:        24     Duration:          89
            Sched switch: kworker/12:2                    Nr:         1     Duration:           6
    

    My first contribution is around Nov. 2024!

    Goals

    • add more features (eg cpuset)
    • test / bugfix

    Resources

    Progresses

    isolcpus and cpusets implemented and merged in master: dynticks-testing.git commit


    RMT.rs: High-Performance Registration Path for RMT using Rust by gbasso

    Description

    The SUSE Repository Mirroring Tool (RMT) is a critical component for managing software updates and subscriptions, especially for our Public Cloud Team (PCT). In a cloud environment, hundreds or even thousands of new SUSE instances (VPS/EC2) can be provisioned simultaneously. Each new instance attempts to register against an RMT server, creating a "thundering herd" scenario.

    We have observed that the current RMT server, written in Ruby, faces performance issues under this high-concurrency registration load. This can lead to request overhead, slow registration times, and outright registration failures, delaying the readiness of new cloud instances.

    This Hackweek project aims to explore a solution by re-implementing the performance-critical registration path in Rust. The goal is to leverage Rust's high performance, memory safety, and first-class concurrency handling to create an alternative registration endpoint that is fast, reliable, and can gracefully manage massive, simultaneous request spikes.

    The new Rust module will be integrated into the existing RMT Ruby application, allowing us to directly compare the performance of both implementations.

    Goals

    The primary objective is to build and benchmark a high-performance Rust-based alternative for the RMT server registration endpoint.

    Key goals for the week:

    1. Analyze & Identify: Dive into the SUSE/rmt Ruby codebase to identify and map out the exact critical path for server registration (e.g., controllers, services, database interactions).
    2. Develop in Rust: Implement a functionally equivalent version of this registration logic in Rust.
    3. Integrate: Explore and implement a method for Ruby/Rust integration to "hot-wire" the new Rust module into the RMT application. This may involve using FFI, or libraries like rb-sys or magnus.
    4. Benchmark: Create a benchmarking script (e.g., using k6, ab, or a custom tool) that simulates the high-concurrency registration load from thousands of clients.
    5. Compare & Present: Conduct a comparative performance analysis (requests per second, latency, success/error rates, CPU/memory usage) between the original Ruby path and the new Rust path. The deliverable will be this data and a summary of the findings.

    Resources

    • RMT Source Code (Ruby):
      • https://github.com/SUSE/rmt
    • RMT Documentation:
      • https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-rmt.html
    • Tooling & Stacks:
      • RMT/Ruby development environment (for running the base RMT)
      • Rust development environment (rustup, cargo)
    • Potential Integration Libraries:
      • rb-sys: https://github.com/oxidize-rb/rb-sys
      • Magnus: https://github.com/matsadler/magnus
    • Benchmarking Tools:
      • k6 (https://k6.io/)
      • ab (ApacheBench)