Description

Start experimenting the generative SUSE-AI chat bot, asking questions on different areas of knowledge or science and possibly analyze the quality of the LLM model response, specific and comparative, checking the answers provided by different LLM models to a same query, using proper quality metrics or tools or methodologies.

Try to define basic guidelines and requirements for quality test automation of AI-generated responses.

First approach of investigation can be based on manual testing: methodologies, findings and data can be useful then to organize valid automated testing.

Goals

  • Identify criteria and measuring scales for assessment of a text content.
  • Define quality of an answer/text based on defined criteria .
  • Identify some knowledge sectors and a proper list of problems/questions per sector.
  • Manually run query session and apply evaluation criteria to answers.
  • Draft requirements for test automation of AI answers.

Resources

  • Announcement of SUSE-AI for Hack Week in Slack
  • Openplatform and related 3 LLM models gemma:2b, llama3.1:8b, qwen2.5-coder:3b.

Notes

  • Foundation models (FMs):
    are large deep learning neural networks, trained on massive datasets, that have changed the way data scientists approach machine learning (ML). Rather than develop artificial intelligence (AI) from scratch, data scientists use a foundation model as a starting point to develop ML models that power new applications more quickly and cost-effectively.

  • Large language models (LLMs):
    are a category of foundation models pre-trained on immense amounts of data acquiring abilities by learning statistical relationships from vast amounts of text during a self- and semi-supervised training process, making them capable of understanding and generating natural language and other types of content , to perform a wide range of tasks.
    LLMs can be used for generative AI (artificial intelligence) to produce content based on input prompts in human language.

Validation of a AI-generated answer is not an easy task to perform, as manually as automated.
An LLM answer text shall contain a given level of informations: correcness, completeness, reasoning description etc.
We shall rely in properly applicable and measurable criteria of validation to get an assessment in a limited amount of time and resources.

Looking for hackers with the skills:

ai llm

This project is part of:

Hack Week 24

Activity

  • 2 months ago: mdati added keyword "llm" to this project.
  • 2 months ago: mdati added keyword "ai" to this project.
  • 3 months ago: mdati liked this project.
  • 3 months ago: mdati started this project.
  • 3 months ago: mdati originated this project.

  • Comments

    • livdywan
      3 months ago by livdywan | Reply

      You might want to add an ai tag

    Similar Projects

    Use local/private LLM for semantic knowledge search by digitaltomm

    Description

    Use a local LLM, based on SUSE AI (ollama, openwebui) to power geeko search (public instance: https://geeko.port0.org/).

    Goals

    Build a SUSE internal instance of https://geeko.port0.org/ that can operate on internal resources, crawling confluence.suse.com, gitlab.suse.de, etc.

    Resources

    Repo: https://github.com/digitaltom/semantic-knowledge-search

    Public instance: https://geeko.port0.org/

    Results

    Internal instance:

    I have an internal test instance running which has indexed a couple of internal wiki pages from the SCC team. It's using the ollama (llama3.1:8b) backend of suse-ai.openplatform.suse.com to create embedding vectors for indexed resources and to create a chat response. The semantic search for documents is done with a vector search inside of sqlite, using sqlite-vec.

    image


    COOTWbot by ngetahun

    Project Description

    At SCC, we have a rotating task of COOTW (Commanding Office of the Week). This task involves responding to customer requests from jira and slack help channels, monitoring production systems and doing small chores. Usually, we have documentation to help the COOTW answer questions and quickly find fixes. Most of these are distributed across github, trello and SUSE Support documentation. The aim of this project is to explore the magic of LLMs and create a conversational bot.

    Goal for this Hackweek

    • Build data ingestion Data source:
      • SUSE KB docs
      • scc github docs
      • scc trello knowledge board
    • Test out new RAG architecture

    • https://gitlab.suse.de/ngetahun/cootwbot


    Automated Test Report reviewer by oscar-barrios

    Description

    In SUMA/Uyuni team we spend a lot of time reviewing test reports, analyzing each of the test cases failing, checking if the test is a flaky test, checking logs, etc.

    Goals

    Speed up the review by automating some parts through AI, in a way that we can consume some summary of that report that could be meaningful for the reviewer.

    Resources

    No idea about the resources yet, but we will make use of:

    • HTML/JSON Report (text + screenshots)
    • The Test Suite Status GithHub board (via API)
    • The environment tested (via SSH)
    • The test framework code (via files)


    Use AI tools to convert legacy perl scripts to bash by nadvornik

    Description

    Use AI tools to convert legacy perl scripts to bash

    Goals

    Uyuni project contains legacy perl scripts used for setup. The perl dependency could be removed, to reduce the container size. The goal of this project is to research use of AI tools for this task.

    Resources

    Aider

    Results:

    Aider is not the right tool for this. It works ok for small changes, but not for complete rewrite from one language to another.

    I got better results with direct API use from script.


    AI for product management by a_jaeger

    Description

    Learn about AI and how it can help myself

    What are the jobs that a PM does where AI can help - and how?

    Goals

    • Investigate how AI can help with different tasks
    • Check out different AI tools, which one is best for which job
    • Summarize learning

    Resources

    • Reading some blog posts by PMs that looked into it
    • Popular and less popular AI tools

    Work is done SUSE internally at https://confluence.suse.com/display/~a_jaeger/Hackweek+25+-+AI+for+a+PM and subpages.


    Run local LLMs with Ollama and explore possible integrations with Uyuni by PSuarezHernandez

    Description

    Using Ollama you can easily run different LLM models in your local computer. This project is about exploring Ollama, testing different LLMs and try to fine tune them. Also, explore potential ways of integration with Uyuni.

    Goals

    • Explore Ollama
    • Test different models
    • Fine tuning
    • Explore possible integration in Uyuni

    Resources

    • https://ollama.com/
    • https://huggingface.co/
    • https://apeatling.com/articles/part-2-building-your-training-data-for-fine-tuning/


    COOTWbot by ngetahun

    Project Description

    At SCC, we have a rotating task of COOTW (Commanding Office of the Week). This task involves responding to customer requests from jira and slack help channels, monitoring production systems and doing small chores. Usually, we have documentation to help the COOTW answer questions and quickly find fixes. Most of these are distributed across github, trello and SUSE Support documentation. The aim of this project is to explore the magic of LLMs and create a conversational bot.

    Goal for this Hackweek

    • Build data ingestion Data source:
      • SUSE KB docs
      • scc github docs
      • scc trello knowledge board
    • Test out new RAG architecture

    • https://gitlab.suse.de/ngetahun/cootwbot