SUSE Hack Week: Gen-AI chatbots and test-automation of generated responses

Description

Start experimenting with the generative SUSE-AI chatbot by asking questions in different areas of knowledge or science, and analyze the quality of the LLM responses, both individually and comparatively: check the answers that different LLM models give to the same query, using appropriate quality metrics, tools, or methodologies.
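As a starting point for the comparative part, here is a minimal sketch of one possible metric: a simplified ROUGE-1-style unigram F1 between each model's answer and a tester-written reference answer. The reference text, the sample answers, and the function names are hypothetical. Lexical overlap of this kind measures similarity of wording only, not factual correctness, so it can complement manual judgement but not replace it.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric word tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rank_models(answers: dict[str, str], reference: str) -> list[tuple[str, float]]:
    """Score each model's answer to the same query and sort best-first."""
    scores = {model: unigram_f1(text, reference) for model, text in answers.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical reference and answers; real answers would come from the chatbot
    reference = "Water boils at 100 degrees Celsius at sea level."
    answers = {
        "gemma:2b": "Water boils at 100 degrees Celsius.",
        "llama3.1:8b": "At sea level, water boils at 100 degrees Celsius.",
        "qwen2.5-coder:3b": "It depends on pressure.",
    }
    for model, score in rank_models(answers, reference):
        print(f"{model}: {score:.2f}")
```

Running it ranks the llama3.1:8b answer first (1.00), then gemma:2b (0.80), then qwen2.5-coder:3b (0.00). Note that the last answer is not necessarily wrong, which illustrates why overlap scores need a human-defined reference and careful interpretation.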

Try to define basic guidelines and requirements for automated quality testing of AI-generated responses.

The first phase of the investigation can be based on manual testing: the resulting methodologies, findings, and data can then be used to design valid automated tests.
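One way to keep manual findings reusable for automation is to record every judgement in a uniform structure and aggregate it. The sketch below assumes a 0-5 scoring scale and uses hypothetical field names; the per-model, per-criterion means it produces could later serve as baselines or thresholds for automated checks.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class ManualEvaluation:
    """One manual judgement: a tester scores one model's answer to one question."""
    sector: str      # knowledge sector, e.g. "physics" (hypothetical)
    question: str
    model: str
    criterion: str   # e.g. "correctness"
    score: int       # assumed 0-5 scale


def aggregate(evals: list[ManualEvaluation]) -> dict[tuple[str, str], float]:
    """Mean score per (model, criterion) across all manually evaluated questions."""
    buckets: dict[tuple[str, str], list[int]] = defaultdict(list)
    for e in evals:
        buckets[(e.model, e.criterion)].append(e.score)
    return {key: mean(scores) for key, scores in buckets.items()}


if __name__ == "__main__":
    evals = [
        ManualEvaluation("physics", "Why is the sky blue?", "gemma:2b", "correctness", 3),
        ManualEvaluation("physics", "What is entropy?", "gemma:2b", "correctness", 4),
        ManualEvaluation("physics", "Why is the sky blue?", "llama3.1:8b", "correctness", 5),
    ]
    for (model, criterion), avg in aggregate(evals).items():
        print(model, criterion, avg)
```

Keeping the knowledge sector on every record also allows the same data to be grouped per sector, to see whether a model is weaker in some areas than in others.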

Goals

Identify criteria and measuring scales for assessing text content.
Define the quality of an answer/text based on those criteria.
Identify some knowledge sectors and a suitable list of problems/questions per sector.
Manually run query sessions and apply the evaluation criteria to the answers.
Draft requirements for test automation of AI answers.
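The first two goals can be made concrete as a weighted rubric. A minimal sketch, assuming three criteria taken from the Notes (correctness, completeness, reasoning), a 0-5 measuring scale per criterion, and hypothetical weights that the project has not fixed:

```python
# Hypothetical criteria and weights; not defined by the project
RUBRIC = {
    "correctness": 0.5,
    "completeness": 0.3,
    "reasoning": 0.2,
}
SCALE_MAX = 5  # assumed 0-5 measuring scale per criterion


def quality_score(scores: dict[str, int], rubric: dict[str, float] = RUBRIC) -> float:
    """Weighted average of per-criterion scores, normalized to the range 0.0-1.0."""
    missing = rubric.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for name, value in scores.items():
        if not 0 <= value <= SCALE_MAX:
            raise ValueError(f"{name}={value} outside 0..{SCALE_MAX}")
    total_weight = sum(rubric.values())
    weighted = sum(rubric[name] * scores[name] for name in rubric)
    return weighted / (total_weight * SCALE_MAX)


if __name__ == "__main__":
    print(round(quality_score({"correctness": 5, "completeness": 4, "reasoning": 3}), 2))
```

For example, scores of 5, 4 and 3 give (0.5·5 + 0.3·4 + 0.2·3) / 5, about 0.86. Normalizing by the total weight keeps the result in 0-1 even if the weights are later changed and no longer sum to 1.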

Resources

Announcement of SUSE-AI for Hack Week in Slack
Openplatform and its three related LLM models: gemma:2b, llama3.1:8b, qwen2.5-coder:3b.

Notes

Foundation models (FMs):
are large deep learning neural networks, trained on massive datasets, that have changed the way data scientists approach machine learning (ML). Rather than develop artificial intelligence (AI) from scratch, data scientists use a foundation model as a starting point to develop ML models that power new applications more quickly and cost-effectively.
Large language models (LLMs):
are a category of foundation models pre-trained on immense amounts of data. They acquire their abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process, which makes them capable of understanding and generating natural language and other types of content to perform a wide range of tasks.
LLMs can be used for generative AI (artificial intelligence) to produce content based on input prompts in human language.

Validating an AI-generated answer is not an easy task, whether it is performed manually or automatically.
An LLM answer should meet a given level of quality in several respects: correctness, completeness, description of the reasoning, etc.
We should rely on applicable and measurable validation criteria to obtain an assessment within a limited amount of time and resources.
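As an example of a criterion that is cheap to measure automatically, completeness can be approximated by checking whether an answer mentions a set of key facts written by a tester. This is a minimal sketch: the key facts, the synonym lists, the 0.8 threshold, and the function names are all hypothetical, and plain phrase matching will miss paraphrases, so it fits best as a first filter before manual review.

```python
import re


def fact_coverage(answer: str, key_facts: list[list[str]]) -> float:
    """Fraction of key facts found in the answer.

    Each key fact is a list of accepted phrasings; the fact counts as covered
    if any phrasing appears as a whole word or phrase, case-insensitively.
    """
    if not key_facts:
        return 1.0
    text = answer.lower()
    covered = 0
    for variants in key_facts:
        if any(re.search(r"\b" + re.escape(v.lower()) + r"\b", text) for v in variants):
            covered += 1
    return covered / len(key_facts)


def check_answer(answer: str, key_facts: list[list[str]], threshold: float = 0.8) -> bool:
    """Pass/fail gate for an automated test run (threshold is an assumption)."""
    return fact_coverage(answer, key_facts) >= threshold


# Example written as a pytest-style test; the expected facts are hand-written
def test_sky_blue_answer():
    answer = "The sky looks blue because of Rayleigh scattering: shorter wavelengths scatter more."
    facts = [
        ["rayleigh scattering"],
        ["short wavelength", "shorter wavelengths", "blue light"],
        ["sunlight"],
    ]
    assert fact_coverage(answer, facts) == 2 / 3
    assert not check_answer(answer, facts)  # "sunlight" is missing


if __name__ == "__main__":
    test_sky_blue_answer()
```

Because each fact accepts several phrasings, the tester can widen the synonym list after manual sessions reveal wording the models commonly use, without changing the test logic.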

Looking for hackers with the skills:

ai llm

This project is part of:

Hack Week 24

Activity

7 months ago: mdati added keyword "llm" to this project.

7 months ago: mdati added keyword "ai" to this project.

8 months ago: mdati liked this project.

8 months ago: mdati started this project.

8 months ago: mdati originated this project.

Comments

8 months ago by livdywan

You might want to add an ai tag