Description
A prerequisite for running applications in a cloud environment is the presence of a container registry. Another common scenario is users performing machine learning workloads in such environments. However, these types of workloads require dedicated infrastructure to run properly. We can leverage these two facts to help users save resources by storing their machine learning models in OCI registries, similar to how we handle some WebAssembly modules. This approach will save users the resources typically required for a machine learning model repository for the applications they need to run.
Goals
Allow PyTorch users to save and load machine learning models in OCI registries.
Resources
This project is part of:
Hack Week 24
Activity
Comments
Be the first to comment!
Similar Projects
Update M2Crypto by mcepl
There are couple of projects I work on, which need my attention and putting them to shape:
Goal for this Hackweek
- Put M2Crypto into better shape (most issues closed, all pull requests processed)
- More fun to learn jujutsu
- Play more with Gemini, how much it help (or not).
- Perhaps, also (just slightly related), help to fix vis to work with LuaJIT, particularly to make vis-lspc working.
Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios
Description
Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.
This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.
The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.
Goals
By the end of Hack Week, we aim to have a single, working Python script that:
- Connects to Prometheus and executes a query to fetch detailed test failure history.
- Processes the raw data into a format suitable for the Gemini API.
- Successfully calls the Gemini API with the data and a clear prompt.
- Parses the AI's response to extract a simple list of flaky tests.
- Saves the list to a JSON file that can be displayed in Grafana.
- New panel in our Dashboard listing the Flaky tests
Resources
- Jenkins Prometheus Exporter: https://github.com/uyuni-project/jenkins-exporter/
- Data Source: Our internal Prometheus server.
- Key Metric:
jenkins_build_test_case_failure_age{jobname, buildid, suite, case, status, failedsince}. - Existing Query for Reference:
count by (suite) (max_over_time(jenkins_build_test_case_failure_age{status=~"FAILED|REGRESSION", jobname="$jobname"}[$__range])). - AI Model: The Google Gemini API.
- Example about how to interact with Gemini API: https://github.com/srbarrios/FailTale/
- Visualization: Our internal Grafana Dashboard.
- Internal IaC: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring
Outcome
- Jenkins Flaky Test Detector: https://github.com/srbarrios/jenkins-flaky-tests-detector and its container
- IaC on MLM Team: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring/jenkinsflakytestsdetector?reftype=heads, https://gitlab.suse.de/galaxy/infrastructure/-/blob/master/srv/salt/monitoring/grafana/dashboards/flaky-tests.json?ref_type=heads, and others.
- Grafana Dashboard: https://grafana.mgr.suse.de/d/flaky-tests/flaky-tests-detection @ @ text
Docs Navigator MCP: SUSE Edition by mackenzie.techdocs

Description
Docs Navigator MCP: SUSE Edition is an AI-powered documentation navigator that makes finding information across SUSE, Rancher, K3s, and RKE2 documentation effortless. Built as a Model Context Protocol (MCP) server, it enables semantic search, intelligent Q&A, and documentation summarization using 100% open-source AI models (no API keys required!). The project also allows you to bring your own keys from Anthropic and Open AI for parallel processing.
Goals
- [ X ] Build functional MCP server with documentation tools
- [ X ] Implement semantic search with vector embeddings
- [ X ] Create user-friendly web interface
- [ X ] Optimize indexing performance (parallel processing)
- [ X ] Add SUSE branding and polish UX
- [ X ] Stretch Goal: Add more documentation sources
- [ X ] Stretch Goal: Implement document change detection for auto-updates
Coming Soon!
- Community Feedback: Test with real users and gather improvement suggestions
Resources
- Repository: Docs Navigator MCP: SUSE Edition GitHub
- UI Demo: Live UI Demo of Docs Navigator MCP: SUSE Edition
GenAI-Powered Systemic Bug Evaluation and Management Assistant by rtsvetkov
Motivation
What is the decision critical question which one can ask on a bug? How this question affects the decision on a bug and why?
Let's make GenAI look on the bug from the systemic point and evaluate what we don't know. Which piece of information is missing to take a decision?
Description
To build a tool that takes a raw bug report (including error messages and context) and uses a large language model (LLM) to generate a series of structured, Socratic-style or Systemic questions designed to guide a the integration and development toward the root cause, rather than just providing a direct, potentially incorrect fix.
Goals
Set up a Python environment
Set the environment and get a Gemini API key. 2. Collect 5-10 realistic bug reports (from open-source projects, personal projects, or public forums like Stack Overflow—include the error message and the initial context).
Build the Dialogue Loop
- Write a basic Python script using the Gemini API.
- Implement a simple conversational loop: User Input (Bug) -> AI Output (Question) -> User Input (Answer to AI's question) -> AI Output (Next Question). Code Implementation
Socratic/Systemic Strategy Implementation
- Refine the logic to ensure the questions follow a Socratic and Systemic path (e.g., from symptom-> context -> assumptions -> -> critical parts -> ).
- Implement Function Calling (an advanced feature of the Gemini API) to suggest specific actions to the user, like "Run a ping test" or "Check the database logs."
- Implement Bugzillla call to collect the
- Implement Questioning Framework as LLVM pre-conditioning
- Define set of instructions
- Assemble the Tool
Resources
What are Systemic Questions?
Systemic questions explore the relationships, patterns, and interactions within a system rather than focusing on isolated elements.
In IT, they help uncover hidden dependencies, feedback loops, assumptions, and side-effects during debugging or architecture analysis.
Gitlab Project
gitlab.suse.de/sle-prjmgr/BugDecisionCritical_Question
SUSE Edge Image Builder MCP by eminguez
Description
Based on my other hackweek project, SUSE Edge Image Builder's Json Schema I would like to build also a MCP to be able to generate EIB config files the AI way.
Realistically I don't think I'll be able to have something consumable at the end of this hackweek but at least I would like to start exploring MCPs, the difference between an API and MCP, etc.
Goals
- Familiarize myself with MCPs
- Unrealistic: Have an MCP that can generate an EIB config file
Resources
Result
https://github.com/e-minguez/eib-mcp
I've extensively used antigravity and its agent mode to code this. This heavily uses https://hackweek.opensuse.org/25/projects/suse-edge-image-builder-json-schema for the MCP to be built.
I've ended up learning a lot of things about "prompting", json schemas in general, some golang, MCPs and AI in general :)
Example:
Generate an Edge Image Builder configuration for an ISO image based on slmicro-6.2.iso, targeting x86_64 architecture. The output name should be 'my-edge-image' and it should install to /dev/sda. It should deploy
a 3 nodes kubernetes cluster with nodes names "node1", "node2" and "node3" as:
* hostname: node1, IP: 1.1.1.1, role: initializer
* hostname: node2, IP: 1.1.1.2, role: agent
* hostname: node3, IP: 1.1.1.3, role: agent
The kubernetes version should be k3s 1.33.4-k3s1 and it should deploy a cert-manager helm chart (the latest one available according to https://cert-manager.io/docs/installation/helm/). It should create a user
called "suse" with password "suse" and set ntp to "foo.ntp.org". The VIP address for the API should be 1.2.3.4
Generates:
``` apiVersion: "1.0" image: arch: x86_64 baseImage: slmicro-6.2.iso imageType: iso outputImageName: my-edge-image kubernetes: helm: charts: - name: cert-manager repositoryName: jetstack
Kubernetes-Based ML Lifecycle Automation by lmiranda
Description
This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.
The pipeline will automate the lifecycle of a machine learning model, including:
- Data ingestion/collection
- Model training as a Kubernetes Job
- Model artifact storage in an S3-compatible registry (e.g. Minio)
- A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
- A lightweight inference service that loads and serves the latest model
- Monitoring of model performance and service health through Prometheus/Grafana
The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.
Goals
By the end of Hack Week, the project should:
Produce a fully functional ML pipeline running on Kubernetes with:
- Data collection job
- Training job container
- Storage and versioning of trained models
- Automated deployment of new model versions
- Model inference API service
- Basic monitoring dashboards
Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.
Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).
Prepare a short demo explaining the end-to-end process and how new models flow through the system.
Resources
Updates
- Training pipeline and datasets
- Inference Service py
Exploring Modern AI Trends and Kubernetes-Based AI Infrastructure by jluo
Description
Build a solid understanding of the current landscape of Artificial Intelligence and how modern cloud-native technologies—especially Kubernetes—support AI workloads.
Goals
Use Gemini Learning Mode to guide the exploration, surface relevant concepts, and structure the learning journey:
- Gain insight into the latest AI trends, tools, and architectural concepts.
- Understand how Kubernetes and related cloud-native technologies are used in the AI ecosystem (model training, deployment, orchestration, MLOps).
Resources
Red Hat AI Topic Articles
- https://www.redhat.com/en/topics/ai
Kubeflow Documentation
- https://www.kubeflow.org/docs/
Q4 2025 CNCF Technology Landscape Radar report:
- https://www.cncf.io/announcements/2025/11/11/cncf-and-slashdata-report-finds-leading-ai-tools-gaining-adoption-in-cloud-native-ecosystems/
- https://www.cncf.io/wp-content/uploads/2025/11/cncfreporttechradar_111025a.pdf
Agent-to-Agent (A2A) Protocol
- https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
Create a Cloud-Native policy engine with notifying capabilities to optimize resource usage by gbazzotti
Description
The goal of this project is to begin the initial phase of development of an all-in-one Cloud-Native Policy Engine that notifies resource owners when their resources infringe predetermined policies. This was inspired by a current issue in the CES-SRE Team where other solutions seemed to not exactly correspond to the needs of the specific workloads running on the Public Cloud Team space.
The initial architecture can be checked out on the Repository listed under Resources.
Among the features that will differ this project from other monitoring/notification systems:
- Pre-defined sensible policies written at the software-level, avoiding a learning curve by requiring users to write their own policies
- All-in-one functionality: logging, mailing and all other actions are not required to install any additional plugins/packages
- Easy account management, being able to parse all required configuration by a single JSON file
- Eliminate integrations by not requiring metrics to go through a data-agreggator
Goals
- Create a minimal working prototype following the workflow specified on the documentation
- Provide instructions on installation/usage
- Work on email notifying capabilities
Resources