Description

Start experimenting the generative SUSE-AI chat bot, asking questions on different areas of knowledge or science and possibly analyze the quality of the LLM model response, specific and comparative, checking the answers provided by different LLM models to a same query, using proper quality metrics or tools or methodologies.

Try to define basic guidelines and requirements for quality test automation of AI-generated responses.

First approach of investigation can be based on manual testing: methodologies, findings and data can be useful then to organize valid automated testing.

Goals

  • Identify criteria and measuring scales for assessment of a text content.
  • Define quality of an answer/text based on defined criteria .
  • Identify some knowledge sectors and a proper list of problems/questions per sector.
  • Manually run query session and apply evaluation criteria to answers.
  • Draft requirements for test automation of AI answers.

Resources

  • Announcement of SUSE-AI for Hack Week in Slack
  • Openplatform and related 3 LLM models gemma:2b, llama3.1:8b, qwen2.5-coder:3b.

Notes

  • Foundation models (FMs):
    are large deep learning neural networks, trained on massive datasets, that have changed the way data scientists approach machine learning (ML). Rather than develop artificial intelligence (AI) from scratch, data scientists use a foundation model as a starting point to develop ML models that power new applications more quickly and cost-effectively.

  • Large language models (LLMs):
    are a category of foundation models pre-trained on immense amounts of data acquiring abilities by learning statistical relationships from vast amounts of text during a self- and semi-supervised training process, making them capable of understanding and generating natural language and other types of content , to perform a wide range of tasks.
    LLMs can be used for generative AI (artificial intelligence) to produce content based on input prompts in human language.

Validation of a AI-generated answer is not an easy task to perform, as manually as automated.
An LLM answer text shall contain a given level of informations: correcness, completeness, reasoning description etc.
We shall rely in properly applicable and measurable criteria of validation to get an assessment in a limited amount of time and resources.

Looking for hackers with the skills:

ai llm

This project is part of:

Hack Week 24

Activity

  • about 1 year ago: mdati added keyword "llm" to this project.
  • about 1 year ago: mdati added keyword "ai" to this project.
  • about 1 year ago: mdati liked this project.
  • about 1 year ago: mdati started this project.
  • about 1 year ago: mdati originated this project.

  • Comments

    • livdywan
      about 1 year ago by livdywan | Reply

      You might want to add an ai tag

    Similar Projects

    The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

    Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. A GitHub robot mascot trying to lasso a blue bull with a Kubernetes logo tatooed on it


    The Plan

    Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

    Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:


    The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

    The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

    Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.


    If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

    Why?

    We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

    The CONCLUSION!!!

    A add-emoji State of the Union add-emoji document was compiled to summarize lessons learned this week. For more gory details, just read on the diary below! add-emoji


    Multi-agent AI assistant for Linux troubleshooting by doreilly

    Description

    Explore multi-agent architecture as a way to avoid MCP context rot.

    Having one agent with many tools bloats the context with low-level details about tool descriptions, parameter schemas etc which hurts LLM performance. Instead have many specialised agents, each with just the tools it needs for its role. A top level supervisor agent takes the user prompt and delegates to appropriate sub-agents.

    Goals

    Create an AI assistant with some sub-agents that are specialists at troubleshooting Linux subsystems, e.g. systemd, selinux, firewalld etc. The agents can get information from the system by implementing their own tools with simple function calls, or use tools from MCP servers, e.g. a systemd-agent can use tools from systemd-mcp.

    Example prompts/responses:

    user$ the system seems slow
    assistant$ process foo with pid 12345 is using 1000% cpu ...
    
    user$ I can't connect to the apache webserver
    assistant$ the firewall is blocking http ... you can open the port with firewall-cmd --add-port ...
    

    Resources

    Language Python. The Python ADK is more mature than Golang.

    https://google.github.io/adk-docs/

    https://github.com/djoreilly/linux-helper


    Update M2Crypto by mcepl

    There are couple of projects I work on, which need my attention and putting them to shape:

    Goal for this Hackweek

    • Put M2Crypto into better shape (most issues closed, all pull requests processed)
    • More fun to learn jujutsu
    • Play more with Gemini, how much it help (or not).
    • Perhaps, also (just slightly related), help to fix vis to work with LuaJIT, particularly to make vis-lspc working.


    Bugzilla goes AI - Phase 1 by nwalter

    Description

    This project, Bugzilla goes AI, aims to boost developer productivity by creating an autonomous AI bug agent during Hackweek. The primary goal is to reduce the time employees spend triaging bugs by integrating Ollama to summarize issues, recommend next steps, and push focused daily reports to a Web Interface.

    Goals

    To reduce employee time spent on Bugzilla by implementing an AI tool that triages and summarizes bug reports, providing actionable recommendations to the team via Web Interface.

    Project Charter

    Bugzilla goes AI Phase 1

    Description

    Project Achievements during Hackweek

    In this file you can read about what we achieved during Hackweek.

    Project Achievements


    SUSE Observability MCP server by drutigliano

    Description

    The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.

    This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.

    Goals

    • Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
    • Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
    • Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
    • Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.

     Hackweek STEP

    • Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.

     Scope

    • Implement read-only MCP server that can:
      • Connect to a live SUSE Observability instance and authenticate (with API token)
      • Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
      • Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
      • Return the data as a structured JSON payload compliant with the MCP specification.

    Deliverables

    • MCP Server v0.1 A running Golang MCP server with at least one tool.
    • A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.

    Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.

    Resources

    • https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
    • https://www.datadoghq.com/blog/datadog-remote-mcp-server
    • https://modelcontextprotocol.io/specification/2025-06-18/index
    • https://modelcontextprotocol.io/docs/develop/build-server

     Basic implementation

    • https://github.com/drutigliano19/suse-observability-mcp-server

    Results

    Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.

    Example execution


    SUSE Observability MCP server by drutigliano

    Description

    The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.

    This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.

    Goals

    • Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
    • Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
    • Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
    • Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.

     Hackweek STEP

    • Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.

     Scope

    • Implement read-only MCP server that can:
      • Connect to a live SUSE Observability instance and authenticate (with API token)
      • Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
      • Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
      • Return the data as a structured JSON payload compliant with the MCP specification.

    Deliverables

    • MCP Server v0.1 A running Golang MCP server with at least one tool.
    • A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.

    Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.

    Resources

    • https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
    • https://www.datadoghq.com/blog/datadog-remote-mcp-server
    • https://modelcontextprotocol.io/specification/2025-06-18/index
    • https://modelcontextprotocol.io/docs/develop/build-server

     Basic implementation

    • https://github.com/drutigliano19/suse-observability-mcp-server

    Results

    Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.

    Example execution


    Explore LLM evaluation metrics by thbertoldi

    Description

    Learn the best practices for evaluating LLM performance with an open-source framework such as DeepEval.

    Goals

    Curate the knowledge learned during practice and present it to colleagues.

    -> Maybe publish a blog post on SUSE's blog?

    Resources

    https://deepeval.com

    https://docs.pactflow.io/docs/bi-directional-contract-testing


    Song Search with CLAP by gcolangiuli

    Description

    Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface

    SUSE Hackweek AI Song Search

    Goals

    Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:

    • Music Tagging;
    • Free text search;
    • Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.

    The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.

    Result

    In this MVP we implemented:

    • Async Song Analysis with Clap model
    • Free Text Search of the songs
    • Similar song search based on vector representation
    • Containerised version with web interface

    We also documented what went well and what can be improved in the use of AI.

    You can have a look at the result here:

    Future implementation can be related to performance improvement and stability of the analysis.

    References


    Extended private brain - RAG my own scripts and data into offline LLM AI by tjyrinki_suse

    Description

    For purely studying purposes, I'd like to find out if I could teach an LLM some of my own accumulated knowledge, to use it as a sort of extended brain.

    I might use qwen3-coder or something similar as a starting point.

    Everything would be done 100% offline without network available to the container, since I prefer to see when network is needed, and make it so it's never needed (other than initial downloads).

    Goals

    1. Learn something about RAG, LLM, AI.
    2. Find out if everything works offline as intended.
    3. As an end result have a new way to access my own existing know-how, but so that I can query the wisdom in them.
    4. Be flexible to pivot in any direction, as long as there are new things learned.

    Resources

    To be found on the fly.

    Timeline

    Day 1 (of 4)

    • Tried out a RAG demo, expanded on feeding it my own data
    • Experimented with qwen3-coder to add a persistent chat functionality, and keeping vectors in a pickle file
    • Optimizations to keep everything within context window
    • Learn and add a bit of PyTest

    Day 2

    • More experimenting and more data
    • Study ChromaDB
    • Add a Web UI that works from another computer even though the container sees network is down

    Day 3

    • The above RAG is working well enough for demonstration purposes.
    • Pivot to trying out OpenCode, configuring local Ollama qwen3-coder there, to analyze the RAG demo.
    • Figured out how to configure Ollama template to be usable under OpenCode. OpenCode locally is super slow to just running qwen3-coder alone.

    Day 4 (final day)

    • Battle with OpenCode that was both slow and kept on piling up broken things.
    • Call it success as after all the agentic AI was working locally.
    • Clean up the mess left behind a bit.

    Blog Post

    Summarized the findings at blog post.


    "what is it" file and directory analysis via MCP and local LLM, for console and KDE by rsimai

    Description

    Users sometimes wonder what files or directories they find on their local PC are good for. If they can't determine from the filename or metadata, there should an easy way to quickly analyze the content and at least guess the meaning. An LLM could help with that, through the use of a filesystem MCP and to-text-converters for typical file types. Ideally this is integrated into the desktop environment but works as well from a console. All data is processed locally or "on premise", no artifacts remain or leave the system.

    Goals

    • The user can run a command from the console, to check on a file or directory
    • The filemanager contains the "analyze" feature within the context menu
    • The local LLM could serve for other use cases where privacy matters

    TBD

    • Find or write capable one-shot and interactive MCP client
    • Find or write simple+secure file access MCP server
    • Create local LLM service with appropriate footprint, containerized
    • Shell command with options
    • KDE integration (Dolphin)
    • Package
    • Document

    Resources