Description
Backporting Linux kernel fixes (either for CVE issues or as part of general git-fixes workflow) is boring and mostly mechanical work (dealing with changes in context, renamed variables, new helper functions etc.). The idea of this project is to explore usage of LLM for backporting Linux kernel commits to SUSE kernels using LLM.
Goals
- Create safe environment allowing LLM to run and backport patches without exposing the whole filesystem to it (for privacy and security reasons).
- Write prompt that will guide LLM through the backporting process. Fine tune it based on experimental results.
- Explore success rate of LLMs when backporting various patches.
Resources
- Docker
- Gemini CLI
Repository
Current version of the container with some instructions for use are at: https://gitlab.suse.de/jankara/gemini-cli-backporter
This project is part of:
Hack Week 25
Comments
-
2 months ago by jankara | Reply
I've created Docker container for this purpose and some scripting around it to save some typing (see the repository for details). Gemini successfully backported upstream commit 8ecb790ea8c3 ("ext4: avoid potential buffer over-read in parseapplysbmountoptions()") to SLE12-SP5 without any manual intervention. That is a good success because the commit requires applying changes to different place (and different function - the code has been significantly refactored due to mount API conversion). Backport of follow up commit ee5a977b4e ("ext4: fix string copying in parseapplysbmountoptions()") required hint to Gemini that 8ecb790ea8c3 was already backported. I have updated prompt to instruct Gemini how to search for already applied commits to possibly figure this out itself. I didn't have time to test that. Backport of commit 1d3ad18394 ("ext4: detect invalid INLINE_DATA + EXTENTS flag combination") was smooth as well but there the problem was just slightly modified context.
On the other hand backport of commit b86433721f4 ("blk-mq: fix potential deadlock while nr_requests grown") to SL-16.0 was too much. Gemini managed to create a patch that would apply (and likely compile) but the adaptation had multiple functional issues. The problem here was that the commit was part of a larger sequence of fixes to this area that significantly refactored the code and data structures. Also I've run out of credits for the PRO model during backporting so part of the backport was done by FLASH model which is apparently not clever enough.
To summarize the prompt certainly needs more work to better handle situations where more commits need backporting (and more thought what Gemini should do in that case - when it should decide the backport is just too complex and bail out?). Also PRO model seems to be significantly better than FLASH model but backport of one or two patches is enough to burn all your free credits for the day which somewhat slows down experimentation.
Another research direction is trying Claude Code which has CLI client as well and is deemed to be more advanced in coding tasks than Gemini. However a quick research seems to indicate that for Claude CLI access one needs a paid version so experiments require some investment...
Similar Projects
Kubernetes-Based ML Lifecycle Automation by lmiranda
Description
This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.
The pipeline will automate the lifecycle of a machine learning model, including:
- Data ingestion/collection
- Model training as a Kubernetes Job
- Model artifact storage in an S3-compatible registry (e.g. Minio)
- A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
- A lightweight inference service that loads and serves the latest model
- Monitoring of model performance and service health through Prometheus/Grafana
The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.
Goals
By the end of Hack Week, the project should:
Produce a fully functional ML pipeline running on Kubernetes with:
- Data collection job
- Training job container
- Storage and versioning of trained models
- Automated deployment of new model versions
- Model inference API service
- Basic monitoring dashboards
Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.
Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).
Prepare a short demo explaining the end-to-end process and how new models flow through the system.
Resources
Updates
- Training pipeline and datasets
- Inference Service py
GenAI-Powered Systemic Bug Evaluation and Management Assistant by rtsvetkov
Motivation
What is the decision critical question which one can ask on a bug? How this question affects the decision on a bug and why?
Let's make GenAI look on the bug from the systemic point and evaluate what we don't know. Which piece of information is missing to take a decision?
Description
To build a tool that takes a raw bug report (including error messages and context) and uses a large language model (LLM) to generate a series of structured, Socratic-style or Systemic questions designed to guide a the integration and development toward the root cause, rather than just providing a direct, potentially incorrect fix.
Goals
Set up a Python environment
Set the environment and get a Gemini API key. 2. Collect 5-10 realistic bug reports (from open-source projects, personal projects, or public forums like Stack Overflow—include the error message and the initial context).
Build the Dialogue Loop
- Write a basic Python script using the Gemini API.
- Implement a simple conversational loop: User Input (Bug) -> AI Output (Question) -> User Input (Answer to AI's question) -> AI Output (Next Question). Code Implementation
Socratic/Systemic Strategy Implementation
- Refine the logic to ensure the questions follow a Socratic and Systemic path (e.g., from symptom-> context -> assumptions -> -> critical parts -> ).
- Implement Function Calling (an advanced feature of the Gemini API) to suggest specific actions to the user, like "Run a ping test" or "Check the database logs."
- Implement Bugzillla call to collect the
- Implement Questioning Framework as LLVM pre-conditioning
- Define set of instructions
- Assemble the Tool
Resources
What are Systemic Questions?
Systemic questions explore the relationships, patterns, and interactions within a system rather than focusing on isolated elements.
In IT, they help uncover hidden dependencies, feedback loops, assumptions, and side-effects during debugging or architecture analysis.
Gitlab Project
gitlab.suse.de/sle-prjmgr/BugDecisionCritical_Question
Try out Neovim Plugins supporting AI Providers by enavarro_suse
Description
Experiment with several Neovim plugins that integrate AI model providers such as Gemini and Ollama.
Goals
Evaluate how these plugins enhance the development workflow, how they differ in capabilities, and how smoothly they integrate into Neovim for day-to-day coding tasks.
Resources
- Neovim 0.11.5
- AI-enabled Neovim plugins:
- avante.nvim: https://github.com/yetone/avante.nvim
- Gp.nvim: https://github.com/Robitx/gp.nvim
- parrot.nvim: https://github.com/frankroeder/parrot.nvim
- gemini.nvim: https://dotfyle.com/plugins/kiddos/gemini.nvim
- ...
- Accounts or API keys for AI model providers.
- Local model serving setup (e.g., Ollama)
- Test projects or codebases for practical evaluation:
- OBS: https://build.opensuse.org/
- OBS blog and landing page: https://openbuildservice.org/
- ...
Docs Navigator MCP: SUSE Edition by mackenzie.techdocs

Description
Docs Navigator MCP: SUSE Edition is an AI-powered documentation navigator that makes finding information across SUSE, Rancher, K3s, and RKE2 documentation effortless. Built as a Model Context Protocol (MCP) server, it enables semantic search, intelligent Q&A, and documentation summarization using 100% open-source AI models (no API keys required!). The project also allows you to bring your own keys from Anthropic and Open AI for parallel processing.
Goals
- [ X ] Build functional MCP server with documentation tools
- [ X ] Implement semantic search with vector embeddings
- [ X ] Create user-friendly web interface
- [ X ] Optimize indexing performance (parallel processing)
- [ X ] Add SUSE branding and polish UX
- [ X ] Stretch Goal: Add more documentation sources
- [ X ] Stretch Goal: Implement document change detection for auto-updates
Coming Soon!
- Community Feedback: Test with real users and gather improvement suggestions
Resources
- Repository: Docs Navigator MCP: SUSE Edition GitHub
- UI Demo: Live UI Demo of Docs Navigator MCP: SUSE Edition
Uyuni Health-check Grafana AI Troubleshooter by ygutierrez
Description
This project explores the feasibility of using the open-source Grafana LLM plugin to enhance the Uyuni Health-check tool with LLM capabilities. The idea is to integrate a chat-based "AI Troubleshooter" directly into existing dashboards, allowing users to ask natural-language questions about errors, anomalies, or performance issues.
Goals
- Investigate if and how the
grafana-llm-appplug-in can be used within the Uyuni Health-check tool. - Investigate if this plug-in can be used to query LLMs for troubleshooting scenarios.
- Evaluate support for local LLMs and external APIs through the plugin.
- Evaluate if and how the Uyuni MCP server could be integrated as another source of information.
Resources
issuefs: FUSE filesystem representing issues (e.g. JIRA) for the use with AI agents code-assistants by llansky3
Description
Creating a FUSE filesystem (issuefs) that mounts issues from various ticketing systems (Github, Jira, Bugzilla, Redmine) as files to your local file system.
And why this is good idea?
- User can use favorite command line tools to view and search the tickets from various sources
- User can use AI agents capabilities from your favorite IDE or cli to ask question about the issues, project or functionality while providing relevant tickets as context without extra work.
- User can use it during development of the new features when you let the AI agent to jump start the solution. The issuefs will give the AI agent the context (AI agents just read few more files) about the bug or requested features. No need for copying and pasting issues to user prompt or by using extra MCP tools to access the issues. These you can still do but this approach is on purpose different.

Goals
- Add Github issue support
- Proof the concept/approach by apply the approach on itself using Github issues for tracking and development of new features
- Add support for Bugzilla and Redmine using this approach in the process of doing it. Record a video of it.
- Clean-up and test the implementation and create some documentation
- Create a blog post about this approach
Resources
There is a prototype implementation here. This currently sort of works with JIRA only.
Bugzilla goes AI - Phase 1 by nwalter
Description
This project, Bugzilla goes AI, aims to boost developer productivity by creating an autonomous AI bug agent during Hackweek. The primary goal is to reduce the time employees spend triaging bugs by integrating Ollama to summarize issues, recommend next steps, and push focused daily reports to a Web Interface.
Goals
To reduce employee time spent on Bugzilla by implementing an AI tool that triages and summarizes bug reports, providing actionable recommendations to the team via Web Interface.
Project Charter
Description
Project Achievements during Hackweek
In this file you can read about what we achieved during Hackweek.
Extended private brain - RAG my own scripts and data into offline LLM AI by tjyrinki_suse
Description
For purely studying purposes, I'd like to find out if I could teach an LLM some of my own accumulated knowledge, to use it as a sort of extended brain.
I might use qwen3-coder or something similar as a starting point.
Everything would be done 100% offline without network available to the container, since I prefer to see when network is needed, and make it so it's never needed (other than initial downloads).
Goals
- Learn something about RAG, LLM, AI.
- Find out if everything works offline as intended.
- As an end result have a new way to access my own existing know-how, but so that I can query the wisdom in them.
- Be flexible to pivot in any direction, as long as there are new things learned.
Resources
To be found on the fly.
Timeline
Day 1 (of 4)
- Tried out a RAG demo, expanded on feeding it my own data
- Experimented with qwen3-coder to add a persistent chat functionality, and keeping vectors in a pickle file
- Optimizations to keep everything within context window
- Learn and add a bit of PyTest
Day 2
- More experimenting and more data
- Study ChromaDB
- Add a Web UI that works from another computer even though the container sees network is down
Day 3
- The above RAG is working well enough for demonstration purposes.
- Pivot to trying out OpenCode, configuring local Ollama qwen3-coder there, to analyze the RAG demo.
- Figured out how to configure Ollama template to be usable under OpenCode. OpenCode locally is super slow to just running qwen3-coder alone.
Day 4 (final day)
- Battle with OpenCode that was both slow and kept on piling up broken things.
- Call it success as after all the agentic AI was working locally.
- Clean up the mess left behind a bit.
Blog Post
Summarized the findings at blog post.
SUSE Observability MCP server by drutigliano
Description
The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.
This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.
Goals
- Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
- Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
- Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
- Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.
Hackweek STEP
- Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.
Scope
- Implement read-only MCP server that can:
- Connect to a live SUSE Observability instance and authenticate (with API token)
- Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
- Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
- Return the data as a structured JSON payload compliant with the MCP specification.
Deliverables
- MCP Server v0.1 A running Golang MCP server with at least one tool.
- A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.
Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.
Resources
- https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
- https://www.datadoghq.com/blog/datadog-remote-mcp-server
- https://modelcontextprotocol.io/specification/2025-06-18/index
- https://modelcontextprotocol.io/docs/develop/build-server
Basic implementation
- https://github.com/drutigliano19/suse-observability-mcp-server
Results
Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.
Example execution
Song Search with CLAP by gcolangiuli
Description
Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface
Goals
Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:
- Music Tagging;
- Free text search;
- Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.
The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.
Result
In this MVP we implemented:
- Async Song Analysis with Clap model
- Free Text Search of the songs
- Similar song search based on vector representation
- Containerised version with web interface
We also documented what went well and what can be improved in the use of AI.
You can have a look at the result here:
Future implementation can be related to performance improvement and stability of the analysis.
References
- CLAP: The main model being researched;
- huggingface: Pre-trained models for CLAP;
- Free Music Archive: Creative Commons songs that can be used for testing;
bpftrace contribution by mkoutny
Description
bpftrace is a great tool, no need to sing odes to it here. It can access any kernel data and process them in real time. It provides helpers for some common Linux kernel structures but not all.
Goals
- set up bpftrace toolchain
- learn about bpftrace implementation and internals
- implement support for
percpu_counters - look into some of the first issues
- send a refined PR (on Thu)
Resources
pudc - A PID 1 process that barks to the internet by mssola
Description
As a fun exercise in order to dig deeper into the Linux kernel, its interfaces, the RISC-V architecture, and all the dragons in between; I'm building a blog site cooked like this:
- The backend is written in a mixture of C and RISC-V assembly.
- The backend is actually PID1 (for real, not within a container).
- We poll and parse incoming HTTP requests ourselves.
- The frontend is a mere HTML page with htmx.
The project is meant to be Linux-specific, so I'm going to use io_uring, pidfs, namespaces, and Linux-specific features in order to drive all of this.
I'm open for suggestions and so on, but this is meant to be a solo project, as this is more of a learning exercise for me than anything else.
Goals
- Have a better understanding of different Linux features from user space down to the kernel internals.
- Most importantly: have fun.
Resources
early stage kdump support by mbrugger
Project Description
When we experience a early boot crash, we are not able to analyze the kernel dump, as user-space wasn't able to load the crash system. The idea is to make the crash system compiled into the host kernel (think of initramfs) so that we can create a kernel dump really early in the boot process.
Goal for the Hackweeks
- Investigate if this is possible and the implications it would have (done in HW21)
- Hack up a PoC (done in HW22 and HW23)
- Prepare RFC series (giving it's only one week, we are entering wishful thinking territory here).
update HW23
- I was able to include the crash kernel into the kernel Image.
- I'll need to find a way to load that from
init/main.c:start_kernel()probably afterkcsan_init() - I workaround for a smoke test was to hack
kexec_file_load()systemcall which has two problems:- My initramfs in the porduction kernel does not have a new enough kexec version, that's not a blocker but where the week ended
- As the crash kernel is part of init.data it will be already stale once I can call
kexec_file_load()from user-space.
The solution is probably to rewrite the POC so that the invocation can be done from init.text (that's my theory) but I'm not sure if I can reuse the kexec infrastructure in the kernel from there, which I rely on heavily.
update HW24
- Day1
- rebased on v6.12 with no problems others then me breaking the config
- setting up a new compilation and qemu/virtme env
- getting desperate as nothing works that used to work
- Day 2
- getting to call the invocation of loading the early kernel from
__initafterkcsan_init()
- getting to call the invocation of loading the early kernel from
Day 3
- fix problem of memdup not being able to alloc so much memory... use 64K page sizes for now
- code refactoring
- I'm now able to load the crash kernel
- When using virtme I can boot into the crash kernel, also it doesn't boot completely (major milestone!), crash in
elfcorehdr_read_notes()
Day 4
- crash systems crashes (no pun intended) in
copy_old_mempage()link; will need to understand elfcorehdr... - call path
vmcore_init() -> parse_crash_elf_headers() -> elfcorehdr_read() -> read_from_oldmem() -> copy_oldmem_page() -> copy_to_iter()
- crash systems crashes (no pun intended) in
Day 5
- hacking
arch/arm64/kernel/crash_dump.c:copy_old_mempage()to see if crash system really starts. It does. - fun fact: retested with more reserved memory and with UEFI FW, host kernel crashes in init but directly starts the crash kernel, so it works (somehow) \o/
- hacking
update HW25
- Day 1
- rebased crash-kernel on v6.12.59 (for now), still crashing
dynticks-testing: analyse perf / trace-cmd output and aggregate data by m.crivellari
Description
dynticks-testing is a project started years ago by Frederic Weisbecker. One of the feature is to check the actual configuration (isolcpus, irqaffinity etc etc) and give feedback on it.
An important goal of this tool is to parse the output of trace-cmd / perf and provide more readable data, showing the duration of every events grouped by PID (showing also the CPU number, if the tasks has been migrated etc).
An example of data captured on my laptop (incomplete!!):
-0 [005] dN.2. 20310.270699: sched_wakeup: WaylandProxy:46380 [120] CPU:005
-0 [005] d..2. 20310.270702: sched_switch: swapper/5:0 [120] R ==> WaylandProxy:46380 [120]
...
WaylandProxy-46380 [004] d..2. 20310.295397: sched_switch: WaylandProxy:46380 [120] S ==> swapper/4:0 [120]
-0 [006] d..2. 20310.295397: sched_switch: swapper/6:0 [120] R ==> firefox:46373 [120]
firefox-46373 [006] d..2. 20310.295408: sched_switch: firefox:46373 [120] S ==> swapper/6:0 [120]
-0 [004] dN.2. 20310.295466: sched_wakeup: WaylandProxy:46380 [120] CPU:004
Output of noise_parse.py:
Task: WaylandProxy Pid: 46380 cpus: {4, 5} (Migrated!!!)
Wakeup Latency Nr: 24 Duration: 89
Sched switch: kworker/12:2 Nr: 1 Duration: 6
My first contribution is around Nov. 2024!
Goals
- add more features (eg cpuset)
- test / bugfix
Resources
- Frederic's public repository: https://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git/
- https://docs.kernel.org/timers/no_hz.html#testing
Progresses
isolcpus and cpusets implemented and merged in master: dynticks-testing.git commit
Improve UML page fault handler by ptesarik
Description
Improve UML handling of segmentation faults in kernel mode. Although such page faults are generally caused by a kernel bug, it is annoying if they cause an infinite loop, or panic the kernel. More importantly, a robust implementation allows to write KUnit tests for various guard pages, preventing potential kernel self-protection regressions.
Goals
Convert the UML page fault handler to use oops_* helpers, go through a few review rounds and finally get my patch series merged in 6.14.
Resources
Wrong initial attempt: https://lore.kernel.org/lkml/20231215121431.680-1-petrtesarik@huaweicloud.com/T/