Project here: https://confluence.suse.com/display/AAI/HackWeek19 Will keep working out of HackWeek as "best effort" personal project to make it evolve and keep learning.
What this project is about?
Data Scientist ofter starts working on their laptop before moving into company resources. As in many other cases they have to solve many challenges by themselves before actually start working on "their stuff". The idea is to build a prototype we will eventually try to evolve in a product that answers the following pre-requisites:
- Rapid Time to work: I, as Data Scientist or Data Engineer, need to install the playground quickly and be ready to work
- Everything at the right place: I as Data Scientist or Data Engineer want an easy way to find things and use them
- No time to waste: I as Data Scientist or Data Engineer want to be able to replicate the model synchronizing it with another infrastructure through a "click and done" model
- No complexity rule: I as Data Scientist or Data Engineer want to avoid waste time in complex configurations or debug things. Complexity needs to hided to me
Project Team requirements
Because this is a first attempt to prototype I have to ask for some "not official" rules to be applied:
- Max 7/9 people in the team with a max of 3 Engineers
- If you apply you have to make yourself available from 10 am to 5 pm CET (if you're on a different time zone you have to consider we'll have a lot of team discussion so could be challenging)
- This is a 5 days sprint approach where everyone needs to be open, collaborative, bold, creative.
FAQ
- I'm not an engineer or an expert: Great this project require (possibly) at least 1 person from marketing, sales-engineering, services, support
- Am I required to code?: No, but you're required to share your ideas and views, while the end goal is to build a prototype (that's why we need a couple of engineers) the scope is to have something to show and demonstrate we may build something useful for the Data Scientist community
- Woah this seems to be a super serious project: Nah it's a fun experiment to learn how much we may push our limit through rapid prototyping and "be different"
- So how do I signup?: easy just join the team here on hackweek and/or contact me alessandro.festa@suse.com for further details.
Looking for hackers with the skills:
ai artificial-intelligence machinelearning prototype agile projectmanagement innovation
This project is part of:
Hack Week 19
Activity
Comments
-
about 6 years ago by hennevogel | Reply
Can you explain what kind of output you would expect? Like an application? A set of packages? Some IaC description?
-
about 6 years ago by afesta | Reply
This is something we have to decide during the hack week, usually a prototype based on a target of the challenge decided by the team. If this will be simple artifacts made of a sum of existing items, an application or a set of packages has to be decided. The scope is to foster innovation under a very fast cycle (5 days) and get a result that allows us to learn if: is doable, what we need to address to make it a real product and how long could take. Don't expect huge development or impossible challenges, this is about pure innovation and ideas.. and build a way to demonstrate our idea.
-
-
about 6 years ago by afesta | Reply
This is something we have to decide during the hack week, usually a prototype based on a target of the challenge decided by the team. If this will be simple artifacts made of a sum of existing items, an application or a set of packages has to be decided. The scope is to foster innovation under a very fast cycle (5 days) and get a result that allows us to learn if: is doable, what we need to address to make it a real product and how long could take. Don't expect huge development or impossible challenges, this is about pure innovation and ideas.. and build a way to demonstrate our idea.
-
almost 6 years ago by bmwiedemann | Reply
If you have a need for this project for 2x NVIDIA Tesla T4, 16GB - ping me.
-
almost 6 years ago by rsblendido | Reply
Is this about Kubeflow?
-
almost 6 years ago by afesta | Reply
Could be. I mean the only "constraint" is that ideally should work on a laptop and Kubeflow works on K8's but if you use something like MLRun you may overcome many challenges. The ultimate goal of the project is to provide Data scientists a playground so that they do not need to learn and install and configure everything but it's easy enough to start from your laptop (and eventually) move it to a server/cloud environment.
-
-
Similar Projects
Backporting patches using LLM by jankara
Description
Backporting Linux kernel fixes (either for CVE issues or as part of general git-fixes workflow) is boring and mostly mechanical work (dealing with changes in context, renamed variables, new helper functions etc.). The idea of this project is to explore usage of LLM for backporting Linux kernel commits to SUSE kernels using LLM.
Goals
- Create safe environment allowing LLM to run and backport patches without exposing the whole filesystem to it (for privacy and security reasons).
- Write prompt that will guide LLM through the backporting process. Fine tune it based on experimental results.
- Explore success rate of LLMs when backporting various patches.
Resources
- Docker
- Gemini CLI
Repository
Current version of the container with some instructions for use are at: https://gitlab.suse.de/jankara/gemini-cli-backporter
Enable more features in mcp-server-uyuni by j_renner
Description
I would like to contribute to mcp-server-uyuni, the MCP server for Uyuni / Multi-Linux Manager) exposing additional features as tools. There is lots of relevant features to be found throughout the API, for example:
- System operations and infos
- System groups
- Maintenance windows
- Ansible
- Reporting
- ...
At the end of the week I managed to enable basic system group operations:
- List all system groups visible to the user
- Create new system groups
- List systems assigned to a group
- Add and remove systems from groups
Goals
- Set up test environment locally with the MCP server and client + a recent MLM server [DONE]
- Identify features and use cases offering a benefit with limited effort required for enablement [DONE]
- Create a PR to the repo [DONE]
Resources
MCP Trace Suite by r1chard-lyu
Description
This project plans to create an MCP Trace Suite, a system that consolidates commonly used Linux debugging tools such as bpftrace, perf, and ftrace.
The suite is implemented as an MCP Server. This architecture allows an AI agent to leverage the server to diagnose Linux issues and perform targeted system debugging by remotely executing and retrieving tracing data from these powerful tools.
- Repo: https://github.com/r1chard-lyu/systracesuite
- Demo: Slides
Goals
Build an MCP Server that can integrate various Linux debugging and tracing tools, including bpftrace, perf, ftrace, strace, and others, with support for future expansion of additional tools.
Perform testing by intentionally creating bugs or issues that impact system performance, allowing an AI agent to analyze the root cause and identify the underlying problem.
Resources
- Gemini CLI: https://geminicli.com/
- eBPF: https://ebpf.io/
- bpftrace: https://github.com/bpftrace/bpftrace/
- perf: https://perfwiki.github.io/main/
- ftrace: https://github.com/r1chard-lyu/tracium/
Multi-agent AI assistant for Linux troubleshooting by doreilly
Description
Explore multi-agent architecture as a way to avoid MCP context rot.
Having one agent with many tools bloats the context with low-level details about tool descriptions, parameter schemas etc which hurts LLM performance. Instead have many specialised agents, each with just the tools it needs for its role. A top level supervisor agent takes the user prompt and delegates to appropriate sub-agents.
Goals
Create an AI assistant with some sub-agents that are specialists at troubleshooting Linux subsystems, e.g. systemd, selinux, firewalld etc. The agents can get information from the system by implementing their own tools with simple function calls, or use tools from MCP servers, e.g. a systemd-agent can use tools from systemd-mcp.
Example prompts/responses:
user$ the system seems slow
assistant$ process foo with pid 12345 is using 1000% cpu ...
user$ I can't connect to the apache webserver
assistant$ the firewall is blocking http ... you can open the port with firewall-cmd --add-port ...
Resources
Language Python. The Python ADK is more mature than Golang.
https://google.github.io/adk-docs/
https://github.com/djoreilly/linux-helper
Uyuni Health-check Grafana AI Troubleshooter by ygutierrez
Description
This project explores the feasibility of using the open-source Grafana LLM plugin to enhance the Uyuni Health-check tool with LLM capabilities. The idea is to integrate a chat-based "AI Troubleshooter" directly into existing dashboards, allowing users to ask natural-language questions about errors, anomalies, or performance issues.
Goals
- Investigate if and how the
grafana-llm-appplug-in can be used within the Uyuni Health-check tool. - Investigate if this plug-in can be used to query LLMs for troubleshooting scenarios.
- Evaluate support for local LLMs and external APIs through the plugin.
- Evaluate if and how the Uyuni MCP server could be integrated as another source of information.
Resources
SUSE Observability MCP server by drutigliano
Description
The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.
This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.
Goals
- Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
- Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
- Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
- Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.
Hackweek STEP
- Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.
Scope
- Implement read-only MCP server that can:
- Connect to a live SUSE Observability instance and authenticate (with API token)
- Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
- Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
- Return the data as a structured JSON payload compliant with the MCP specification.
Deliverables
- MCP Server v0.1 A running Golang MCP server with at least one tool.
- A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.
Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.
Resources
- https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
- https://www.datadoghq.com/blog/datadog-remote-mcp-server
- https://modelcontextprotocol.io/specification/2025-06-18/index
- https://modelcontextprotocol.io/docs/develop/build-server
Basic implementation
- https://github.com/drutigliano19/suse-observability-mcp-server
Results
Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.
Example execution
Song Search with CLAP by gcolangiuli
Description
Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface
Goals
Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:
- Music Tagging;
- Free text search;
- Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.
The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.
Result
In this MVP we implemented:
- Async Song Analysis with Clap model
- Free Text Search of the songs
- Similar song search based on vector representation
- Containerised version with web interface
We also documented what went well and what can be improved in the use of AI.
You can have a look at the result here:
Future implementation can be related to performance improvement and stability of the analysis.
References
- CLAP: The main model being researched;
- huggingface: Pre-trained models for CLAP;
- Free Music Archive: Creative Commons songs that can be used for testing;
Review SCC team internal development processes by calmeidadeoliveira
Description
Continue with the Hackweek 2024, with focus on reviewing existing processes / ways of working and creating workflows:
Goals
- Check all the processes from [1] and [3]
- move them to confluence [4], make comments, corrections, etc.
- present the result to the SCC team and ask for reviews
Resources
[1] https://github.com/SUSE/scc-docs/blob/master/team/workflow/kanban-process.md [2] https://github.com/SUSE/scc-docs/tree/master/team [3] https://github.com/SUSE/scc-docs/tree/master/team/workflow
Confluence
[4] https://confluence.suse.com/spaces/scc/pages/1537703975/Processes+and+ways+of+working