I once had a bad dream.
It started well, on a sunny day. I had just fixed an issue and pushed it to my fork to create a Pull Request. I was happy. It felt awesome to have found such an elegant fix. Two lines of code.
But then, something happened. A cloud appeared in the sky and partially hid the sun. GitHub triggered the end-to-end test suite. You could see the waiting icon, you could almost hear the engines starting... lightning and thunder cracked, the sky turned dark, and the e2e tests started on our private Jenkins server.
One hour passed... two hours... the e2e tests still running... four hours, and not even 50% done. Finally I had to go get some dinner and some rest, so I decided to look at it the next day.
Sleep was not good. I dreamed the tests would fail and the deadline would be missed... I couldn't sleep anymore, so I woke up early in the morning, before sunrise. The computer, still open, was lighting up the room. Outside, the storm had turned into heavy rain. And the tests had not passed! Four and a half hours after starting them, so half an hour after I left for dinner, there was a failed test.
Finding out what went wrong took two hours... it was not even related to the code, but an infrastructure issue: a timeout when connecting to a mirror that was being restarted while the tests ran. Apparently we had hit a maintenance window.
Damn! So let's just "restart" the tests. This time I set an alarm for four hours later and switched to another task, just to avoid the anxiety of the previous evening watching the results. Four hours passed, and the tests were good. It was not raining anymore, a ray of sun was filtering through the window, hope was in the air. Four more hours, and the tests still had not failed. Dinner and bed again, alarm set for early in the morning. First thing in the morning I looked at the tests... still running, and it felt good. All tests had passed and there was only one final "cleanup" stage left. Shower, breakfast, the sun was shining again!
Finally, all is green, so let's go and push the merge button... oh oh... clouds hide the sun again... where is the merge button?? No way, there is a conflict with the code in master and I can't merge!!! I can't merge!!! Shocked, I looked into the history... John had merged a PR overnight that touched the same file... hopeless, I started crying at my desk...
Scared?
This is not so different from what could happen if we ran the whole e2e test suite on SUSE Manager (Uyuni) Pull Requests. However, not running any e2e tests also has a bad consequence: we find the issues after the code has been merged into master, and then we spend days figuring out what could have caused them, reverting commits, and starting again.
If we could run just a subset of the e2e tests, we could shorten the run enough to execute it at the PR level, so no broken code could get merged.
The question is: how do we select which tests to run?
This project is about using Machine Learning to do predictive test selection, that is, selecting which tests to run based on the history of previous test runs.
Inspired by: https://engineering.fb.com/2018/11/21/developer-tools/predictive-test-selection/
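To make the idea concrete, here is a minimal sketch of what predictive test selection could look like, loosely in the spirit of the Facebook post: learn from historical runs how often each test fails when a given file changes, and only run the top-ranked tests for a PR. Everything below (the record format, file and test names, the three features) is made up for illustration; a real implementation would mine the Jenkins history and use much richer features.

```python
# Minimal sketch of predictive test selection (illustrative, not Uyuni's real data model).
# Idea: learn from historical test runs which tests are likely to fail for a given
# set of changed files, then only run the top-ranked tests on a PR.

from collections import defaultdict
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical history: one record per (changed_files, test, failed) observation.
history = [
    ({"web/api.py", "web/views.py"}, "test_api_login", True),
    ({"web/api.py"}, "test_api_login", False),
    ({"db/schema.sql"}, "test_migrations", True),
    ({"docs/README.md"}, "test_api_login", False),
    # ... thousands of real records in practice
]

# Co-failure statistics: how often did this test fail when this file changed?
fail_count = defaultdict(int)
run_count = defaultdict(int)
for files, test, failed in history:
    for f in files:
        run_count[(f, test)] += 1
        if failed:
            fail_count[(f, test)] += 1

def features(files, test):
    """Tiny feature vector for a (change, test) pair."""
    rates = [fail_count[(f, test)] / run_count[(f, test)]
             for f in files if run_count[(f, test)] > 0]
    return [
        max(rates, default=0.0),                     # worst co-failure rate
        sum(rates) / len(rates) if rates else 0.0,   # mean co-failure rate
        len(files),                                  # size of the change
    ]

X = [features(files, test) for files, test, _ in history]
y = [failed for _, _, failed in history]
model = GradientBoostingClassifier().fit(X, y)

def select_tests(changed_files, all_tests, budget=10):
    """Rank tests by predicted failure probability and keep the top `budget`."""
    scored = [(model.predict_proba([features(changed_files, t)])[0][1], t)
              for t in all_tests]
    return [t for _, t in sorted(scored, reverse=True)[:budget]]

print(select_tests({"web/api.py"}, ["test_api_login", "test_migrations"], budget=1))
```

The `budget` parameter is the knob that trades PR runtime against the risk of missing a failure.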
This project is part of:
Hack Week 20
Comments
almost 4 years ago by llansky3
This could have some wider potential - clever picking of tests to shorten the time yet maximize the findings (and learning from test execution history over time to predict). I can see how that saves tons of money in cases where tests on real industrial machines need to be executed (e.g. the cost of running an engine on a test bed is 5-10k$ per day, so there is a huge difference between needing to run for 1 day or 3).
Similar Projects
COOTWbot by ngetahun
Project Description
At SCC, we have a rotating task of COOTW (Commanding Officer of the Week). This task involves responding to customer requests from Jira and Slack help channels, monitoring production systems and doing small chores. Usually, we have documentation to help the COOTW answer questions and quickly find fixes, but most of it is distributed across GitHub, Trello and the SUSE Support documentation. The aim of this project is to explore the magic of LLMs and create a conversational bot.
Goal for this Hackweek
- Build data ingestion. Data sources:
  - SUSE KB docs
  - SCC GitHub docs
  - SCC Trello knowledge board
- Test out a new RAG architecture
https://gitlab.suse.de/ngetahun/cootwbot
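A conversational bot over scattered documentation usually boils down to retrieval-augmented generation (RAG): embed the ingested snippets, retrieve the ones closest to the question, and put them into the LLM prompt. A rough sketch of the retrieval half, assuming the docs have already been ingested as plain-text snippets (the snippets, model name, and prompt format below are placeholders, not the project's actual choices):

```python
# Minimal RAG sketch (illustrative): retrieve the most relevant doc snippets
# for a COOTW question and stuff them into an LLM prompt.

from sentence_transformers import SentenceTransformer, util

# Placeholder snippets standing in for ingested KB/GitHub/Trello content.
snippets = [
    "To reset a customer's SCC token, open the admin panel and ...",
    "Production alerts for the registration service go to the ops channel ...",
    "The Trello board tracks recurring weekly chores ...",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(snippets, convert_to_tensor=True)

def build_prompt(question, top_k=2):
    """Embed the question, pick the top_k closest snippets, build the prompt."""
    q_vec = embedder.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_vec, doc_vecs)[0]
    best = scores.argsort(descending=True)[:top_k]
    context = "\n".join(snippets[int(i)] for i in best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do I reset an SCC token?"))
```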
Drag Race - comparative performance testing for pull requests by balanza
Description
«Sophia, a backend developer, submitted a pull request with optimizations for a critical database query. Once she pushed her code, an automated load test ran, comparing her query against the main branch. Moments later, she saw a new comment automatically added to her PR: the comparison results showed reduced execution time and improved efficiency. Smiling, Sophia messaged her team, “Performance gains confirmed!”»
Goals
- Have a convenient and ergonomic framework to describe test scenarios, including environment and seed
- Compare results from different tests
- Have a GitHub action that executes such tests in a CI environment
Resources
The MVP will be built on top of Preevy and K6.
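As an illustration of the comparison step (not necessarily how the MVP will do it): run the same k6 scenario on both branches with `k6 run --summary-export=<file>.json`, then diff the exported metrics and render the text for the PR comment. The field names below follow k6's summary export format; the verdict threshold is arbitrary.

```python
# Illustrative sketch of the "compare and comment" step: parse two k6 summary
# exports (one per branch) and render a markdown table for the PR comment.

import json

def load_duration(path, metric="http_req_duration"):
    """Pull one metric's stats (in ms) out of a k6 --summary-export file."""
    with open(path) as f:
        return json.load(f)["metrics"][metric]

main, pr = load_duration("main.json"), load_duration("pr.json")

lines = ["| stat | main | PR | delta |", "|---|---|---|---|"]
for stat in ("avg", "p(95)"):
    delta = pr[stat] - main[stat]
    lines.append(f"| {stat} | {main[stat]:.1f}ms | {pr[stat]:.1f}ms | {delta:+.1f}ms |")

verdict = "Performance gains confirmed!" if pr["p(95)"] <= main["p(95)"] else "Possible regression."
print("\n".join(lines + ["", verdict]))  # this text would be posted as the PR comment
```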
Yearly Quality Engineering Ask me Anything - AMA for not-engineering by szarate
Goal
Get a closer look at how developers work in the Engineering (R&D) team at SUSE, and close the collaboration gap between GSI and Engineering.
Why?
Santiago can go over different development workflows and do a deep dive into how Quality Engineering works (think of my QE team, the advocates for your customers). The idea of this session is to help open the doors to opportunities for collaboration, and to broaden our understanding of SUSE as a whole.
Objectives
- Give $audience a small window into how some things at Engineering are done, with questions answered either on the spot or within days
- Give Santiago Zarate from Quality Engineering a look into how $audience sees the engineering departments, and find out possibilities for further collaboration
How?
By running an "Ask me Anything" session: an open Q&A format where participants ask the host multiple questions.
How to make it happen?
I'm happy to join a call, or we can do it async (online/in person is more fun). Ping me over email or Slack and let's make the magic happen! It doesn't need to be during Hack Week, but we gotta kickstart the idea during Hack Week ;)
Rules
The rules are simple: the more questions, the more fun it will be. While this will be only a window into Engineering, it can also be the place to help all of us get to a similar level of understanding of the processes behind our respective areas of the organization.
Dynamics
The host will be monitoring the questions on some pre-agreed page and will try to answer to the best of their knowledge; if a question is too difficult or the host doesn't have the answer, they will do their best to provide one at a later date.
Attendees are encouraged to add questions beforehand; in case there aren't any, we will look at how Quality Engineering tests new products or performs regression tests.
Agenda
- Introduction of Santiago Zarate, Product Owner of Quality Engineering Core team
- Introduction of the Group/Team/Persons interested
- Ice breaker
- AMA time! Add your questions $PAGE
- Looking at QE workflows: how
  - a maintenance update is tested before being released to our customers
  - products in development are tested before becoming generally available
- Engineering Opportunity Board
Gen-AI chatbots and test-automation of generated responses by mdati
Description
Start experimenting with the generative SUSE-AI chatbot, asking questions on different areas of knowledge or science, and possibly analyze the quality of the LLM responses, both per model and comparatively, by checking the answers that different LLM models provide to the same query, using proper quality metrics, tools, or methodologies.
Try to define basic guidelines and requirements for quality test automation of AI-generated responses.
A first phase of investigation can be based on manual testing: the methodologies, findings, and data can then be used to organize valid automated testing.
Goals
- Identify criteria and measuring scales for the assessment of text content.
- Define the quality of an answer/text based on those criteria.
- Identify some knowledge sectors and a proper list of problems/questions per sector.
- Manually run query sessions and apply the evaluation criteria to the answers (a scoring sketch follows below).
- Draft requirements for test automation of AI answers.
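One lightweight way to make the manual scoring comparable across models and reviewers is a weighted rubric. A minimal sketch, where the criteria, weights, and grades are placeholders to be replaced by the outcome of the first two goals:

```python
# Tiny sketch of a weighted rubric for manual answer evaluation.
# Criteria, weights, and grades are placeholders for illustration.

CRITERIA = {          # weight per criterion, chosen so they sum to 1.0
    "correctness": 0.4,
    "completeness": 0.3,
    "reasoning": 0.2,
    "clarity": 0.1,
}

def weighted_score(ratings):
    """ratings: criterion -> grade on a 0-5 scale, given by a human reviewer.
    Returns an overall score between 0.0 and 1.0."""
    return sum(CRITERIA[c] * grade / 5 for c, grade in ratings.items())

# Example: the same question rated for two models.
print(weighted_score({"correctness": 5, "completeness": 3, "reasoning": 4, "clarity": 5}))  # e.g. gemma:2b
print(weighted_score({"correctness": 4, "completeness": 4, "reasoning": 3, "clarity": 4}))  # e.g. llama3.1:8b
```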
Resources
- Announcement of SUSE-AI for Hack Week in Slack
- Openplatform and the 3 related LLM models: gemma:2b, llama3.1:8b, qwen2.5-coder:3b.
Notes
Foundation models (FMs): large deep learning neural networks, trained on massive datasets, that have changed the way data scientists approach machine learning (ML). Rather than developing artificial intelligence (AI) from scratch, data scientists use a foundation model as a starting point to develop ML models that power new applications more quickly and cost-effectively.
Large language models (LLMs): a category of foundation models pre-trained on immense amounts of data, acquiring abilities by learning statistical relationships from vast amounts of text during a self- and semi-supervised training process. This makes them capable of understanding and generating natural language and other types of content, and of performing a wide range of tasks.
LLMs can be used for generative AI (artificial intelligence) to produce content based on input prompts in human language.
Validating an AI-generated answer is not an easy task, whether done manually or automated.
An LLM answer shall contain a given level of information: correctness, completeness, description of the reasoning, etc.
We shall rely on properly applicable and measurable validation criteria to get an assessment within a limited amount of time and resources.
Run local LLMs with Ollama and explore possible integrations with Uyuni by PSuarezHernandez
Description
Using Ollama you can easily run different LLM models on your local computer. This project is about exploring Ollama, testing different LLMs, and trying to fine-tune them. Also, explore potential ways of integrating it with Uyuni.
Goals
- Explore Ollama
- Test different models
- Fine-tuning
- Explore possible integrations with Uyuni
Resources
- https://ollama.com/
- https://huggingface.co/
- https://apeatling.com/articles/part-2-building-your-training-data-for-fine-tuning/
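For reference, a minimal sketch of talking to a locally running Ollama server from Python. Ollama exposes a REST API on port 11434 by default; the model tag and prompt here are just examples, and any Uyuni integration would build on calls like this one:

```python
# Minimal sketch: query a local Ollama server (default port 11434) from Python.

import requests

def ask(model, prompt):
    """Send a single non-streaming generation request to Ollama."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("llama3.1:8b", "Summarize what Uyuni does in one sentence."))
```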
Use AI tools to convert legacy perl scripts to bash by nadvornik
Description
Use AI tools to convert legacy perl scripts to bash
Goals
The Uyuni project contains legacy Perl scripts used for setup. The Perl dependency could be removed to reduce the container size. The goal of this project is to research the use of AI tools for this task.
Resources
Results:
Aider is not the right tool for this. It works OK for small changes, but not for a complete rewrite from one language to another.
I got better results with direct API use from a script.
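The post does not say which API or model was used, but "direct API use from a script" could look like the following sketch, assuming an OpenAI-compatible chat completions endpoint (the endpoint URL, environment variables, and model name are placeholders):

```python
# Sketch of driving a Perl-to-bash conversion through a chat completions API.
# Endpoint, credentials, and model name are placeholders, not the actual setup.

import os
import sys
import requests

def perl_to_bash(perl_source):
    resp = requests.post(
        os.environ["LLM_API_URL"],  # e.g. an OpenAI-compatible /v1/chat/completions
        headers={"Authorization": f"Bearer {os.environ['LLM_API_KEY']}"},
        json={
            "model": "gpt-4o",  # placeholder model name
            "messages": [
                {"role": "system",
                 "content": "Rewrite the following Perl script as an equivalent "
                            "bash script. Output only the bash code."},
                {"role": "user", "content": perl_source},
            ],
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Usage: python convert.py setup-script.pl > setup-script.sh
with open(sys.argv[1]) as f:
    print(perl_to_bash(f.read()))
```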
AI for product management by a_jaeger
Description
Learn about AI and how it can help me.
What are the jobs that a PM does where AI can help - and how?
Goals
- Investigate how AI can help with different tasks
- Check out different AI tools, which one is best for which job
- Summarize learning
Resources
- Reading some blog posts by PMs that looked into it
- Popular and less popular AI tools
Work is done SUSE internally at https://confluence.suse.com/display/~a_jaeger/Hackweek+25+-+AI+for+a+PM and subpages.
Research how LLMs could help to Linux developers and/or users by anicka
Description
Large language models like ChatGPT have demonstrated remarkable capabilities across a variety of applications. However, their potential for enhancing the Linux development and user ecosystem remains largely unexplored. This project seeks to bridge that gap by researching practical applications of LLMs to improve workflows in areas such as backporting, packaging, log analysis, system migration, and more. By identifying patterns that LLMs can leverage, we aim to uncover new efficiencies and automation strategies that can benefit developers, maintainers, and end users alike.
Goals
- Evaluate Existing LLM Capabilities: Research and document the current state of LLM usage in open-source and Linux development projects, noting successes and limitations.
- Prototype Tools and Scripts: Develop proof-of-concept scripts or tools that leverage LLMs to perform specific tasks like automated log analysis, assisting with backporting patches, or generating packaging metadata.
- Assess Performance and Reliability: Test the tools' effectiveness on real-world Linux data and analyze their accuracy, speed, and reliability.
- Identify Best Use Cases: Pinpoint which tasks are most suitable for LLM support, distinguishing between high-impact and impractical applications.
- Document Findings and Recommendations: Summarize results with clear documentation and suggest next steps for potential integration or further development.
Resources
- Local LLM Implementations: Access to locally hosted LLMs such as LLaMA, GPT-J, or similar open-source models that can be run and fine-tuned on local hardware.
- Computing Resources: Workstations or servers capable of running LLMs locally, equipped with sufficient GPU power for training and inference.
- Sample Data: Logs, source code, patches, and packaging data from openSUSE or SUSE repositories for model training and testing.
- Public LLMs for Benchmarking: Access to APIs from platforms like OpenAI or Hugging Face for comparative testing and performance assessment.
- Existing NLP Tools: Libraries such as spaCy, Hugging Face Transformers, and PyTorch for building and interacting with local LLMs.
- Technical Documentation: Tutorials and resources focused on setting up and optimizing local LLMs for tasks relevant to Linux development.
- Collaboration: Engagement with community experts and teams experienced in AI and Linux for feedback and joint exploration.
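As a feel for the "prototype tools" goal, a proof-of-concept for automated log analysis can be as small as the sketch below, which feeds a chunk of log output to a locally hosted instruct model through Hugging Face Transformers (the model name is a placeholder; any local instruct model would do):

```python
# Proof-of-concept sketch for automated log analysis with a local model.
# The model is a placeholder; swap in whatever runs on the local hardware.

from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

log_chunk = """\
kernel: EXT4-fs error (device sda1): ext4_find_entry:1455: inode #131073
systemd[1]: apache2.service: Failed with result 'exit-code'.
"""

prompt = (
    "You are a Linux sysadmin. Given these log lines, list the most likely "
    f"root causes and one suggested next step:\n\n{log_chunk}"
)
# The pipeline returns the prompt plus the model's continuation.
out = generator(prompt, max_new_tokens=200, do_sample=False)
print(out[0]["generated_text"])
```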