Project Description
MONAI Deploy aims to become the de-facto standard for developing packaging, testing, deploying, and running medical AI applications in clinical production. MONAI Deploy creates a set of intermediate steps where researchers and physicians can build confidence in the techniques and approaches used with AI — allowing for an iterative workflow.
Contributors to MONAI include Nvidia, Mayo Clinic, King's College London, NHS, Standford University, ... and many others.
The core piece of MONAI Deploy is the Monai Application Package (MAP), which contains the ML model in a "runnable state". It is implemented as a container, and built with docker-nvidia. This is relevant because you need a GPU to build the MAP. You'll see later why this is relevant. This container can be later run on a k8s cluster.
Before you create a MAP, you need to train the ML models. They are trained with MONAI Core, another framework which is a piece of the whole MONAI "puzzle". Those models can be published in the MONAI Model Zoo. They are published using a very specific format, which is called "a bundle".
In the monai deploy app sdk project in github, you can see several examples on how to package a "model bundle" into a MAP. Plus in the documentation you can find a step by step guide on how to build them, meaning how to create the MAP (the container). Examples, both code and documentation, use the ML models in the MONAI model Zoo.
MONAI model Zoo is free, you can search for models and use them for your research. However, there is not such a repository for MAPs, even the docs and examples show how to build those models into MAPS.
And this is the motivation of the project, to create this "link", and release into a registry, at least one MAP based on a model in MONAI Model Zoo.
Goal for this Hackweek
The specific goal is to implement a Continuous Integration workflow that builds a MAP (Monai Application Package), based on the example in code and documentation. Specifically, it is to implement a github action workflow that releases it into github container registry.
Implementation
The github action workflow will be added to a fork of the monai-deploy-app-sdk project, given we will be using the code that is already in the examples directory. Later a Pull Request can be created to the upstream project.
A limitation of this project is that we need to run the github action in a GPU node. ASFAIK github does not support that, so we need to run this on an external runner. For that I will be using MS Azure cloud to host a vm with GPU. For 3 reasons: first, it should be faster to clone from github from azure; second, I will try to use the free 90 days; third, I want to get familiar with Azure.
Finally, most probably I will use terraform to deploy the node in Azure.
This way, every time we want to release a new model in the MAP format, we will deploy a vm in Azure, do the build with the GPU, release into the github container registry, and remove the vm.
Resources
https://monai.io/ https://monai.io/model-zoo.html https://docs.monai.io/projects/monai-deploy-app-sdk/en/latest/gettingstarted/tutorials/monaibundle_app.html https://github.com/Project-MONAI/monai-deploy-app-sdk/tree/main/examples/apps
Looking for hackers with the skills:
ml mlops ai artificial-intelligence gpu azure cloud monai medical containers github_actions github-ci ci
This project is part of:
Hack Week 23
Activity
Comments
-
about 2 years ago by jordimassaguerpla | Reply
I was able to create a terraform file and a workflow file but then I was not able to make the build work.
Here the terraform file:
https://github.com/jordimassaguerpla/monai-deploy-app-sdk/blob/main/main.tf
Here the workflow file:
https://github.com/jordimassaguerpla/monai-deploy-app-sdk/blob/main/.github/workflows/buildandpush_models.yml
-
about 2 years ago by jordimassaguerpla | Reply
I think the issue is that it tries to load the container, but I had not installed nvidia-docker2, and thus it can't load the container.
-
about 2 years ago by jordimassaguerpla | Reply
Here the fix for libseccomp, so nvidia-container-toolkit can be installed: https://build.opensuse.org/request/show/1128309 Here the fix for nvidia-holoscan: https://github.com/nvidia-holoscan/holoscan-sdk/pull/14
With these 2 fixes and by increasing the Disc in Azure to 64GB, I was able to build the ML model as a container :)
-
about 2 years ago by jordimassaguerpla | Reply
And voilà, here the MONAI Application Package ready to be used:
https://github.com/jordimassaguerpla/monai-deploy-app-sdk/pkgs/container/monai-deploy-app-sdk%2Fsimple_app-x64-workstation-dgpu-linux-amd64
Similar Projects
Kubernetes-Based ML Lifecycle Automation by lmiranda
Description
This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.
The pipeline will automate the lifecycle of a machine learning model, including:
- Data ingestion/collection
- Model training as a Kubernetes Job
- Model artifact storage in an S3-compatible registry (e.g. Minio)
- A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
- A lightweight inference service that loads and serves the latest model
- Monitoring of model performance and service health through Prometheus/Grafana
The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.
Goals
By the end of Hack Week, the project should:
Produce a fully functional ML pipeline running on Kubernetes with:
- Data collection job
- Training job container
- Storage and versioning of trained models
- Automated deployment of new model versions
- Model inference API service
- Basic monitoring dashboards
Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.
Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).
Prepare a short demo explaining the end-to-end process and how new models flow through the system.
Resources
Updates
- Training pipeline and datasets
- Inference Service py
Exploring Modern AI Trends and Kubernetes-Based AI Infrastructure by jluo
Description
Build a solid understanding of the current landscape of Artificial Intelligence and how modern cloud-native technologies—especially Kubernetes—support AI workloads.
Goals
Use Gemini Learning Mode to guide the exploration, surface relevant concepts, and structure the learning journey:
- Gain insight into the latest AI trends, tools, and architectural concepts.
- Understand how Kubernetes and related cloud-native technologies are used in the AI ecosystem (model training, deployment, orchestration, MLOps).
Resources
Red Hat AI Topic Articles
- https://www.redhat.com/en/topics/ai
Kubeflow Documentation
- https://www.kubeflow.org/docs/
Q4 2025 CNCF Technology Landscape Radar report:
- https://www.cncf.io/announcements/2025/11/11/cncf-and-slashdata-report-finds-leading-ai-tools-gaining-adoption-in-cloud-native-ecosystems/
- https://www.cncf.io/wp-content/uploads/2025/11/cncfreporttechradar_111025a.pdf
Agent-to-Agent (A2A) Protocol
- https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
Kubernetes-Based ML Lifecycle Automation by lmiranda
Description
This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.
The pipeline will automate the lifecycle of a machine learning model, including:
- Data ingestion/collection
- Model training as a Kubernetes Job
- Model artifact storage in an S3-compatible registry (e.g. Minio)
- A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
- A lightweight inference service that loads and serves the latest model
- Monitoring of model performance and service health through Prometheus/Grafana
The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.
Goals
By the end of Hack Week, the project should:
Produce a fully functional ML pipeline running on Kubernetes with:
- Data collection job
- Training job container
- Storage and versioning of trained models
- Automated deployment of new model versions
- Model inference API service
- Basic monitoring dashboards
Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.
Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).
Prepare a short demo explaining the end-to-end process and how new models flow through the system.
Resources
Updates
- Training pipeline and datasets
- Inference Service py
Try out Neovim Plugins supporting AI Providers by enavarro_suse
Description
Experiment with several Neovim plugins that integrate AI model providers such as Gemini and Ollama.
Goals
Evaluate how these plugins enhance the development workflow, how they differ in capabilities, and how smoothly they integrate into Neovim for day-to-day coding tasks.
Resources
- Neovim 0.11.5
- AI-enabled Neovim plugins:
- avante.nvim: https://github.com/yetone/avante.nvim
- Gp.nvim: https://github.com/Robitx/gp.nvim
- parrot.nvim: https://github.com/frankroeder/parrot.nvim
- gemini.nvim: https://dotfyle.com/plugins/kiddos/gemini.nvim
- ...
- Accounts or API keys for AI model providers.
- Local model serving setup (e.g., Ollama)
- Test projects or codebases for practical evaluation:
- OBS: https://build.opensuse.org/
- OBS blog and landing page: https://openbuildservice.org/
- ...
GenAI-Powered Systemic Bug Evaluation and Management Assistant by rtsvetkov
Motivation
What is the decision critical question which one can ask on a bug? How this question affects the decision on a bug and why?
Let's make GenAI look on the bug from the systemic point and evaluate what we don't know. Which piece of information is missing to take a decision?
Description
To build a tool that takes a raw bug report (including error messages and context) and uses a large language model (LLM) to generate a series of structured, Socratic-style or Systemic questions designed to guide a the integration and development toward the root cause, rather than just providing a direct, potentially incorrect fix.
Goals
Set up a Python environment
Set the environment and get a Gemini API key. 2. Collect 5-10 realistic bug reports (from open-source projects, personal projects, or public forums like Stack Overflow—include the error message and the initial context).
Build the Dialogue Loop
- Write a basic Python script using the Gemini API.
- Implement a simple conversational loop: User Input (Bug) -> AI Output (Question) -> User Input (Answer to AI's question) -> AI Output (Next Question). Code Implementation
Socratic/Systemic Strategy Implementation
- Refine the logic to ensure the questions follow a Socratic and Systemic path (e.g., from symptom-> context -> assumptions -> -> critical parts -> ).
- Implement Function Calling (an advanced feature of the Gemini API) to suggest specific actions to the user, like "Run a ping test" or "Check the database logs."
- Implement Bugzillla call to collect the
- Implement Questioning Framework as LLVM pre-conditioning
- Define set of instructions
- Assemble the Tool
Resources
What are Systemic Questions?
Systemic questions explore the relationships, patterns, and interactions within a system rather than focusing on isolated elements.
In IT, they help uncover hidden dependencies, feedback loops, assumptions, and side-effects during debugging or architecture analysis.
Gitlab Project
gitlab.suse.de/sle-prjmgr/BugDecisionCritical_Question
Self-Scaling LLM Infrastructure Powered by Rancher by ademicev0
Self-Scaling LLM Infrastructure Powered by Rancher

Description
The Problem
Running LLMs can get expensive and complex pretty quickly.
Today there are typically two choices:
- Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
- Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.
What if there was a middle ground?
What if infrastructure scaled itself instead of making you scale it?
Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?
Project Repository: github.com/alexander-demicev/llmserverless
What This Project Does
A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.
A complete, self-scaling LLM infrastructure that:
- Scales to zero when idle (no idle costs)
- Scales up automatically when requests come in
- Adds more nodes when needed, removes them when demand drops
- Runs on any infrastructure - laptop, bare metal, or cloud
Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.
How It Works
A combination of open source tools working together:
Flow:
- Users interact with OpenWebUI (chat interface)
- Requests go to LiteLLM Gateway
- LiteLLM routes requests to:
- Ollama (Knative) for local model inference (auto-scales pods)
- Or cloud APIs for fallback
MCP Trace Suite by r1chard-lyu
Description
This project plans to create an MCP Trace Suite, a system that consolidates commonly used Linux debugging tools such as bpftrace, perf, and ftrace.
The suite is implemented as an MCP Server. This architecture allows an AI agent to leverage the server to diagnose Linux issues and perform targeted system debugging by remotely executing and retrieving tracing data from these powerful tools.
- Repo: https://github.com/r1chard-lyu/systracesuite
- Demo: Slides
Goals
Build an MCP Server that can integrate various Linux debugging and tracing tools, including bpftrace, perf, ftrace, strace, and others, with support for future expansion of additional tools.
Perform testing by intentionally creating bugs or issues that impact system performance, allowing an AI agent to analyze the root cause and identify the underlying problem.
Resources
- Gemini CLI: https://geminicli.com/
- eBPF: https://ebpf.io/
- bpftrace: https://github.com/bpftrace/bpftrace/
- perf: https://perfwiki.github.io/main/
- ftrace: https://github.com/r1chard-lyu/tracium/
Update M2Crypto by mcepl
There are couple of projects I work on, which need my attention and putting them to shape:
Goal for this Hackweek
- Put M2Crypto into better shape (most issues closed, all pull requests processed)
- More fun to learn jujutsu
- Play more with Gemini, how much it help (or not).
- Perhaps, also (just slightly related), help to fix vis to work with LuaJIT, particularly to make vis-lspc working.
SUSE Observability MCP server by drutigliano
Description
The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.
This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.
Goals
- Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
- Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
- Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
- Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.
Hackweek STEP
- Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.
Scope
- Implement read-only MCP server that can:
- Connect to a live SUSE Observability instance and authenticate (with API token)
- Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
- Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
- Return the data as a structured JSON payload compliant with the MCP specification.
Deliverables
- MCP Server v0.1 A running Golang MCP server with at least one tool.
- A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.
Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.
Resources
- https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
- https://www.datadoghq.com/blog/datadog-remote-mcp-server
- https://modelcontextprotocol.io/specification/2025-06-18/index
- https://modelcontextprotocol.io/docs/develop/build-server
Basic implementation
- https://github.com/drutigliano19/suse-observability-mcp-server
Results
Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.
Example execution
Provide personal account sign-in for Himmelblau (Azure Entra Id) by dmulder
Description
Himmelblau currently does not support personal account sign-in, but only sign-in for business/school accounts. Adding personal account sign-in would broaden the userbase, and potentially attract more users to Linux from Windows (since they could easily migrate their existing Windows account, etc).
Goals
Implement personal account sign-in for libhimmelblau.
Resources
gitlab.com/samba-team/libhimmelblau
Create a Cloud-Native policy engine with notifying capabilities to optimize resource usage by gbazzotti
Description
The goal of this project is to begin the initial phase of development of an all-in-one Cloud-Native Policy Engine that notifies resource owners when their resources infringe predetermined policies. This was inspired by a current issue in the CES-SRE Team where other solutions seemed to not exactly correspond to the needs of the specific workloads running on the Public Cloud Team space.
The initial architecture can be checked out on the Repository listed under Resources.
Among the features that will differ this project from other monitoring/notification systems:
- Pre-defined sensible policies written at the software-level, avoiding a learning curve by requiring users to write their own policies
- All-in-one functionality: logging, mailing and all other actions are not required to install any additional plugins/packages
- Easy account management, being able to parse all required configuration by a single JSON file
- Eliminate integrations by not requiring metrics to go through a data-agreggator
Goals
- Create a minimal working prototype following the workflow specified on the documentation
- Provide instructions on installation/usage
- Work on email notifying capabilities
Resources
Rewrite Distrobox in go (POC) by fabriziosestito
Description
Rewriting Distrobox in Go.
Main benefits:
- Easier to maintain and to test
- Adapter pattern for different container backends (LXC, systemd-nspawn, etc.)
Goals
- Build a minimal starting point with core commands
- Keep the CLI interface compatible: existing users shouldn't notice any difference
- Use a clean Go architecture with adapters for different container backends
- Keep dependencies minimal and binary size small
- Benchmark against the original shell script
Resources
- Upstream project: https://github.com/89luca89/distrobox/
- Distrobox site: https://distrobox.it/
- ArchWiki: https://wiki.archlinux.org/title/Distrobox
Help Create A Chat Control Resistant Turnkey Chatmail/Deltachat Relay Stack - Rootless Podman Compose, OpenSUSE BCI, Hardened, & SELinux by 3nd5h1771fy
Description
The Mission: Decentralized & Sovereign Messaging
FYI: If you have never heard of "Chatmail", you can visit their site here, but simply put it can be thought of as the underlying protocol/platform decentralized messengers like DeltaChat use for their communications. Do not confuse it with the honeypot looking non-opensource paid for prodect with better seo that directs you to chatmailsecure(dot)com
In an era of increasing centralized surveillance by unaccountable bad actors (aka BigTech), "Chat Control," and the erosion of digital privacy, the need for sovereign communication infrastructure is critical. Chatmail is a pioneering initiative that bridges the gap between classic email and modern instant messaging, offering metadata-minimized, end-to-end encrypted (E2EE) communication that is interoperable and open.
However, unless you are a seasoned sysadmin, the current recommended deployment method of a Chatmail relay is rigid, fragile, difficult to properly secure, and effectively takes over the entire host the "relay" is deployed on.
Why This Matters
A simple, host agnostic, reproducible deployment lowers the entry cost for anyone wanting to run a privacy‑preserving, decentralized messaging relay. In an era of perpetually resurrected chat‑control legislation threats, EU digital‑sovereignty drives, and many dangers of using big‑tech messaging platforms (Apple iMessage, WhatsApp, FB Messenger, Instagram, SMS, Google Messages, etc...) for any type of communication, providing an easy‑to‑use alternative empowers:
- Censorship resistance - No single entity controls the relay; operators can spin up new nodes quickly.
- Surveillance mitigation - End‑to‑end OpenPGP encryption ensures relay operators never see plaintext.
- Digital sovereignty - Communities can host their own infrastructure under local jurisdiction, aligning with national data‑policy goals.
By turning the Chatmail relay into a plug‑and‑play container stack, we enable broader adoption, foster a resilient messaging fabric, and give developers, activists, and hobbyists a concrete tool to defend privacy online.
Goals
As I indicated earlier, this project aims to drastically simplify the deployment of Chatmail relay. By converting this architecture into a portable, containerized stack using Podman and OpenSUSE base container images, we can allow anyone to deploy their own censorship-resistant, privacy-preserving communications node in minutes.
Our goal for Hack Week: package every component into containers built on openSUSE/MicroOS base images, initially orchestrated with a single container-compose.yml (podman-compose compatible). The stack will:
- Run on any host that supports Podman (including optimizations and enhancements for SELinux‑enabled systems).
- Allow network decoupling by refactoring configurations to move from file-system constrained Unix sockets to internal TCP networking, allowing containers achieve stricter isolation.
- Utilize Enhanced Security with SELinux by using purpose built utilities such as udica we can quickly generate custom SELinux policies for the container stack, ensuring strict confinement superior to standard/typical Docker deployments.
- Allow the use of bind or remote mounted volumes for shared data (
/var/vmail, DKIM keys, TLS certs, etc.). - Replace the local DNS server requirement with a remote DNS‑provider API for DKIM/TXT record publishing.
By delivering a turnkey, host agnostic, reproducible deployment, we lower the barrier for individuals and small communities to launch their own chatmail relays, fostering a decentralized, censorship‑resistant messaging ecosystem that can serve DeltaChat users and/or future services adopting this protocol
Resources
- The links included above
- https://chatmail.at/doc/relay/
- https://delta.chat/en/help
- Project repo -> https://codeberg.org/EndShittification/containerized-chatmail-relay
Technical talks at universities by agamez
Description
This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.
For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.
Goals
- Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
- Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
- Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.
Resources
- Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
- SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.
DNS management with DNSControl by itorres
Description
We use several systems to manage DNS at SUSE and openSUSE: BIND, external providers, PowerDNS... each of them is managed in a different way either with raw zones (BIND) or Terraform (external providers).
DNSControl is an opinionated tool to manage DNS as code while being provider agnostic. It's developed and used by StackExchange, was spearheaded by Tom Limoncelly and is already being used to manage DNS for openSUSE.
Implementing DNSControl should allow us to have a single DNS operations interface that end users can leverage.
This would reduce complexity for end users as they can use a single simplified ECMAScript based DSL instead of BIND zones for internal and HCL config for external.
Operations for our IT organization would be greatly reduced. DNSControl itself has several internal checks that reduce our need to do linting and we can concentrate on implementing logical checks based on ownership.
This simplifies reviews a lot and the integration with BIND and providers allows our IT organization to implement an apply on merge.
At an organizational level it will separate our DNS tasks from other IT operations, speeding up DNS changes and allowing us to delegate DNS reviews to service desk or even customer teams through CODEOWNERS.
Goals
- Create a test subdomain in one of our internal BIND servers to be managed with DNSControl.
- Create an internal DNSControl repository to implement gitops for DNS.
- Deploy DNS changes strictly through gitops.
Extended goals
- Implement CODEOWNERS.
- Replicate main goals for external DNS.
Resources
- DNSControl documentation and introduction
- Opinions guiding DNSControl
- Package in OBS
- openSUSE repo to manage DNS with DNS Control