SUSE Hack Week: Research how LLMs could help to Linux developers and/or users

Description

Large language models like ChatGPT have demonstrated remarkable capabilities across a variety of applications. However, their potential for enhancing the Linux development and user ecosystem remains largely unexplored. This project seeks to bridge that gap by researching practical applications of LLMs to improve workflows in areas such as backporting, packaging, log analysis, system migration, and more. By identifying patterns that LLMs can leverage, we aim to uncover new efficiencies and automation strategies that can benefit developers, maintainers, and end users alike.

Goals

Evaluate Existing LLM Capabilities: Research and document the current state of LLM usage in open-source and Linux development projects, noting successes and limitations.
Prototype Tools and Scripts: Develop proof-of-concept scripts or tools that leverage LLMs to perform specific tasks like automated log analysis, assisting with backporting patches, or generating packaging metadata.
Assess Performance and Reliability: Test the tools' effectiveness on real-world Linux data and analyze their accuracy, speed, and reliability.
Identify Best Use Cases: Pinpoint which tasks are most suitable for LLM support, distinguishing between high-impact and impractical applications.
Document Findings and Recommendations: Summarize results with clear documentation and suggest next steps for potential integration or further development.

Resources

Local LLM Implementations: Access to locally hosted LLMs such as LLaMA, GPT-J, or similar open-source models that can be run and fine-tuned on local hardware.
Computing Resources: Workstations or servers capable of running LLMs locally, equipped with sufficient GPU power for training and inference.
Sample Data: Logs, source code, patches, and packaging data from openSUSE or SUSE repositories for model training and testing.
Public LLMs for Benchmarking: Access to APIs from platforms like OpenAI or Hugging Face for comparative testing and performance assessment.
Existing NLP Tools: Libraries such as spaCy, Hugging Face Transformers, and PyTorch for building and interacting with local LLMs.
Technical Documentation: Tutorials and resources focused on setting up and optimizing local LLMs for tasks relevant to Linux development.
Collaboration: Engagement with community experts and teams experienced in AI and Linux for feedback and joint exploration.

Join this project Leave this project

Looking for hackers with the skills:

This project is part of:

Hack Week 24

Activity

about 1 year ago: PSuarezHernandez liked this project.

about 1 year ago: jiriwiesner liked this project.

about 1 year ago: anicka added keyword "ai" to this project.

about 1 year ago: moio liked this project.

about 1 year ago: livdywan liked this project.

about 1 year ago: mwilck liked this project.

about 1 year ago: bfilho liked this project.

about 1 year ago: vlefebvre liked this project.

about 1 year ago: wfrisch liked this project.

about 1 year ago: anicka started this project.

about 1 year ago: anicka originated this project.

Comments

about 1 year ago by wfrisch | Reply

If someone could recreate Google's Project Naptime, or at least something similar to it, that would be very interesting:

Two key features:
- Tool use in general
- Tool-assisted verification of LLM results

about 1 year ago by jiriwiesner | Reply

I would like to ask an LLM instance about the inner workings on the Linux kernel code. It is a common task of mine to look for a bug in a subsystem or a layer that can easily have tens of thousands of lines of code (e.g. bsc 1216813). I know having an understanding of the Linux code is what we do as developers but my understanding and knowledge is always limited because I simply do not have the time to read all of the code possibly involved in an issue. If the LLM was trained to process the source code of a specific version of Linux a developer could then ask involved questions about the code using the terms found in the code base. It should basically be something that allows a developer find the interesting parts of the code better than when using just grep.
- about 1 year ago by anicka | Reply
  
  Actually, it looks like that off-the-shelf ChatGPT 4 can be already quite helpful in such tasks.
  
  But training something like code llama on our kernels is something I indeed want to look into next time because if there is any way how to leverage LLMs in our bugfixing or backporting, this is it.

Similar Projects

ai

Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios

Description

Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.

This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.

The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.

Goals

By the end of Hack Week, we aim to have a single, working Python script that:

Connects to Prometheus and executes a query to fetch detailed test failure history.
Processes the raw data into a format suitable for the Gemini API.
Successfully calls the Gemini API with the data and a clear prompt.
Parses the AI's response to extract a simple list of flaky tests.
Saves the list to a JSON file that can be displayed in Grafana.
New panel in our Dashboard listing the Flaky tests

Resources

Jenkins Prometheus Exporter: https://github.com/uyuni-project/jenkins-exporter/
Data Source: Our internal Prometheus server.
Key Metric: jenkins_build_test_case_failure_age{jobname, buildid, suite, case, status, failedsince}.
Existing Query for Reference: count by (suite) (max_over_time(jenkins_build_test_case_failure_age{status=~"FAILED|REGRESSION", jobname="$jobname"}[$__range])).
AI Model: The Google Gemini API.
Example about how to interact with Gemini API: https://github.com/srbarrios/FailTale/
Visualization: Our internal Grafana Dashboard.
Internal IaC: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring

Update M2Crypto by mcepl

There are couple of projects I work on, which need my attention and putting them to shape:

M2Crypto

Goal for this Hackweek

Put M2Crypto into better shape (most issues closed, all pull requests processed)
More fun to learn jujutsu
Play more with Gemini, how much it help (or not).
Perhaps, also (just slightly related), help to fix vis to work with LuaJIT, particularly to make vis-lspc working.

Bugzilla goes AI - Phase 1 by nwalter

Description

This project, Bugzilla goes AI, aims to boost developer productivity by creating an autonomous AI bug agent during Hackweek. The primary goal is to reduce the time employees spend triaging bugs by integrating Ollama to summarize issues, recommend next steps, and push focused daily reports to a Web Interface.

Goals

To reduce employee time spent on Bugzilla by implementing an AI tool that triages and summarizes bug reports, providing actionable recommendations to the team via Web Interface.

Project Charter

https://docs.google.com/document/d/1HbAvgrg8T3pd1FIx74nEfCObCljpO77zz5In_Jpw4as/edit?usp=sharing## Description

SUSE Observability MCP server by drutigliano

Description

The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.

This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.

Goals

Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.

Hackweek STEP

Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.

Scope

Implement read-only MCP server that can:
- Connect to a live SUSE Observability instance and authenticate (with API token)
- Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
- Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
- Return the data as a structured JSON payload compliant with the MCP specification.

Deliverables

MCP Server v0.1 A running Python web server (e.g., using FastAPI) with at least one tool.
A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.

Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.

Resources

https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
https://www.datadoghq.com/blog/datadog-remote-mcp-server
https://modelcontextprotocol.io/specification/2025-06-18/index
https://modelcontextprotocol.io/docs/develop/build-server

Basic implementation

https://github.com/drutigliano19/suse-observability-mcp-server

Gemini-Powered Socratic Bug Evaluation and Management Assistant by rtsvetkov

Description

To build a tool or system that takes a raw bug report (including error messages and context) and uses a large language model (LLM) to generate a series of structured, Socratic-style questions designed to guide a the integration and development toward the root cause, rather than just providing a direct, potentially incorrect fix.

Goals

Set up a Python environment

Set the environment and get a Gemini API key. 2. Collect 5-10 realistic bug reports (from open-source projects, personal projects, or public forums like Stack Overflow—include the error message and the initial context).

Build the Dialogue Loop

Write a basic Python script using the Gemini API.
Implement a simple conversational loop: User Input (Bug) -> AI Output (Question) -> User Input (Answer to AI's question) -> AI Output (Next Question). Code Implementation

Socratic Strategy Implementation

Refine the logic to ensure the questions follow a Socratic path (e.g., from symptom-> context -> assumptions -> root cause).
Implement Function Calling (an advanced feature of the Gemini API) to suggest specific actions to the user, like "Run a ping test" or "Check the database logs."

Description

Goals

Resources

Looking for hackers with the skills:

This project is part of:

Activity

Comments

about 1 year ago by wfrisch | Reply

about 1 year ago by jiriwiesner | Reply

about 1 year ago by anicka | Reply

Similar Projects

ai

Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios

Description

Goals

Resources

Update M2Crypto by mcepl

Goal for this Hackweek

Bugzilla goes AI - Phase 1 by nwalter

Description

Goals

Project Charter

SUSE Observability MCP server by drutigliano

Description

Goals

Hackweek STEP

Scope

Deliverables

Resources

Basic implementation

Gemini-Powered Socratic Bug Evaluation and Management Assistant by rtsvetkov

Description

Goals

Set up a Python environment

Build the Dialogue Loop

Socratic Strategy Implementation

Resources