Description

AI has the potential to help with something many of us spend a lot of time doing: making sense of openQA logs when a job fails.

User Story

Allison Average has a puzzled look on their face while staring at log files that seem to make little sense. Is this a known issue, something completely new or maybe related to infrastructure changes?

Goals

  • Leverage a chat interface to help Allison
  • Create a model from scratch based on data from openQA
  • Proof of concept for automated analysis of openQA test results

Bonus

  • Use AI to suggest solutions to merge conflicts
    • This would need a merge conflict editor that can suggest solving the conflict
  • Use image recognition for needles

Resources

Timeline

Day 1

  • Conversing with open-webui to teach me how to create a model based on openQA test results

Day 2

Highlights

  • I briefly compared models to see if they would make me more productive. Between llama, gemma and mistral there was no significant difference in the results for my case.
  • Convincing the chat interface to produce code specific to my use case required very explicit instructions.
  • Asking for advice on how to use open-webui itself better was frustratingly unfruitful, for both trivial and more advanced questions.
  • Documentation on the source materials used by LLMs and by tools for this purpose seems virtually non-existent, specifically on whether a logo can be generated based on particular licenses.

Outcomes

  • Chat interface-supported development provides good starting points, and open-webui, being open source, is more flexible than Gemini, although some fancy features such as grounding and generated podcasts are currently missing.
  • Allison still has to be very experienced with openQA to use a chat interface for test review. Publicly available system prompts would make that easier, though.
  • The proof of concept for a model based on test results (Testimony) looks promising, although for real-world use more effort needs to be put into improving the dataset and selecting relevant features.

Looking for hackers with the skills:

ai openqa tensorflow testing python

This project is part of:

Hack Week 24

Activity

  • 11 months ago: livdywan added keyword "python" to this project.
  • 11 months ago: livdywan added keyword "testing" to this project.
  • 11 months ago: livdywan started this project.
  • 11 months ago: livdywan added keyword "ai" to this project.
  • 11 months ago: livdywan added keyword "openqa" to this project.
  • 11 months ago: livdywan added keyword "tensorflow" to this project.
  • 11 months ago: livdywan originated this project.


    Similar Projects

    SUSE Observability MCP server by drutigliano

    Description

    The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.

    This MCP Server abstracts the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.

    Goals

    • Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
    • Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
    • Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
    • Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.
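
    The normalization step described above could look roughly like the following Python sketch. It is not taken from the linked repository: the record fields (`urn`, `type`, `health`), the URN layout, the `HEALTH_PHRASES` mapping and both function names are assumptions made purely for illustration.

```python
import json

# Hypothetical mapping from raw health states to plain-language phrases.
HEALTH_PHRASES = {
    "CLEAR": "healthy",
    "DEVIATING": "degraded",
    "CRITICAL": "failing",
}


def describe_component(component):
    """Turn one raw component record into a small, readable summary dict."""
    # Assumption: the last colon-separated URN segment is a usable name.
    name = component["urn"].rsplit(":", 1)[-1]
    status = HEALTH_PHRASES.get(component["health"], "in an unknown state")
    return {
        "component": name,
        "type": component["type"],
        "status": status,
        "summary": f"{component['type']} '{name}' is {status}",
    }


def build_payload(question, components):
    """Bundle the user question and the normalized facts into one JSON string."""
    facts = [describe_component(c) for c in components]
    return json.dumps({"question": question, "facts": facts}, indent=2)
```

    A caller would pass the records fetched from the observability backend to `build_payload` and place the returned string into the LLM prompt.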

    Resources

    • https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
    • https://www.datadoghq.com/blog/datadog-remote-mcp-server
    • https://modelcontextprotocol.io/specification/2025-06-18/index

     Basic implementation

    • https://github.com/drutigliano19/suse-observability-mcp-server


    Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios

    Description

    Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.

    This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.

    The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.

    Goals

    By the end of Hack Week, we aim to have a single, working Python script that:

    1. Connects to Prometheus and executes a query to fetch detailed test failure history.
    2. Processes the raw data into a format suitable for the Gemini API.
    3. Successfully calls the Gemini API with the data and a clear prompt.
    4. Parses the AI's response to extract a simple list of flaky tests.
    5. Saves the list to a JSON file that can be displayed in Grafana.
    6. Supplies the data for a new panel in our dashboard that lists the flaky tests.
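
    The data-handling parts of steps 1, 2 and 4 can be sketched in Python as follows. This is a minimal illustration, not the project's script: the `case` label name, the prompt wording and all function names are assumptions, and the actual HTTP request and Gemini call are left out. The URL path and the response shape follow the Prometheus HTTP API range query format.

```python
import json
from urllib.parse import urlencode


def build_range_query_url(base_url, metric, start, end, step="1h"):
    """Build a Prometheus HTTP API range-query URL for one metric."""
    params = urlencode({"query": metric, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def failure_history(response, label="case"):
    """Map each test name to its sampled values from a matrix result.

    `label` is a hypothetical label name; the real metric may use another.
    """
    history = {}
    for series in response["data"]["result"]:
        name = series["metric"].get(label, "unknown")
        history[name] = [float(value) for _, value in series["values"]]
    return history


def build_prompt(history):
    """Format the history as a prompt asking for a JSON list of flaky tests."""
    lines = [f"{name}: {values}" for name, values in sorted(history.items())]
    return (
        "Each line shows a test and its failure age over time.\n"
        "Reply with a JSON list of the test names that look flaky.\n"
        + "\n".join(lines)
    )


def parse_flaky_list(reply):
    """Extract the first JSON list from a model reply that may contain extra text."""
    start, end = reply.index("["), reply.rindex("]") + 1
    return json.loads(reply[start:end])
```

    The list returned by `parse_flaky_list` can then be written to a file with `json.dump` for Grafana to pick up.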

    Resources