Simplify the codebase by using a more modern toolkit to accelerate maintenance and future development.
Enjoy Hakkaweek!
Problem
Uyuni's codebase has a 10+ year history, with contributions from dozens of individuals at varying points in their technical experience, understanding of the project, goals and timelines.
Inevitably, solutions to some of the common problems have grown organically, with different people solving slightly different instances of the same problem in different and possibly incomplete ways. This makes maintenance and future evolution of Uyuni more difficult, as developers and designers need to take the specifics of all such solutions into consideration.
One such area is the handling of asynchronous tasks within Java code, where currently we have at least 4 different implementations, none of which is really complete.
Spirit of this HackWeek effort
Write a proof-of-concept to replace as many existing, incomplete implementations as possible using one piece of existing, proven open source technology which already fulfills all known requirements (akka.io).
Should such a proof-of-concept succeed, integrating it into the project would:
- Simplify the code base, and possibly user experience, given there would be 1 solution instead of 4
- Modernize our project, by using a new toolkit instead of several older, homegrown solutions
- Accelerate new development of Uyuni, by leveraging existing know-how of Akka, and not requiring developers to know all current solutions
Failure... is definitely an option!
Learning why this can't work will be as valuable as having it working!
Not learning and not having fun... are NOT options!
That's HackWeek folks!
Hacking details
For asynchronous task handling we currently have:
- Taskomatic's basic facilities, including one-shot tasks
- Some specific task facilities (e.g. ErrataCacheDriver, SSHPushDriver)
- Tomcat's MessageQueue facility
- Tomcat's Salt event handler (PGEventListener)
Open problems that in various degrees affect all of the above are:
- enforcing dependencies between asynchronous tasks
- enforcing dependencies between asynchronous tasks and other parts of the codebase
- establishing priority of tasks
- tuning, performance
- observability
akka.io seems to have all needed mechanisms:
- dependencies can be expressed by actors firing messages to other actors when done
- any other part of the code base can ask an actor to do something by sending it a message
- Akka has a concept of priority messages to actors
- Akka seems to be regarded as high-performance and has lots of tuning knobs
- Akka has a lot of utilities for monitoring
Moreover Akka can run actors on different processes or even hosts transparently to the code. We could exploit this to improve the Tomcat/Taskomatic interface or even offload some tasks to separate machines.
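To make the "messages plus priorities" idea from the list above more concrete, here is a minimal sketch of an actor-like object draining a priority mailbox. It is plain `java.util.concurrent` code, not Akka's API: the class names, the numeric priority convention (lower means more urgent) and the example payloads are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: an "actor" that handles queued messages by priority.
// This only illustrates the concept; it does not use or mimic Akka's API.
final class PriorityMailboxSketch {

    // A message with a priority (lower value = more urgent) and a sequence
    // number, so that equal-priority messages keep their arrival order.
    static final class Envelope implements Comparable<Envelope> {
        private static final AtomicLong SEQUENCE = new AtomicLong();
        final int priority;
        final String payload;
        final long seq = SEQUENCE.getAndIncrement();

        Envelope(int priority, String payload) {
            this.priority = priority;
            this.payload = payload;
        }

        @Override
        public int compareTo(Envelope other) {
            int byPriority = Integer.compare(priority, other.priority);
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    static final class Actor {
        private final PriorityBlockingQueue<Envelope> mailbox = new PriorityBlockingQueue<>();
        private final List<String> handled = new ArrayList<>();

        // Any part of the code base can ask the actor to do something.
        void tell(int priority, String payload) {
            mailbox.add(new Envelope(priority, payload));
        }

        // Handle everything currently queued, most urgent first.
        List<String> drain() {
            Envelope next;
            while ((next = mailbox.poll()) != null) {
                handled.add(next.payload);
            }
            return handled;
        }
    }

    static List<String> demo() {
        Actor actor = new Actor();
        actor.tell(5, "refresh errata cache");
        actor.tell(5, "sync channel");
        actor.tell(0, "minion start event");
        return actor.drain();
    }

    public static void main(String[] args) {
        // prints [minion start event, refresh errata cache, sync channel]
        System.out.println(demo());
    }
}
```

The urgent message is handled first even though it was sent last, while the two equal-priority messages keep their sending order.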
Battle plan
- spend one day researching Akka and understanding how it works
- attack the simplest subsystem first, try to have it working somehow with Akka
- attack as many other subsystems as possible
- last half day: prepare demo for Monday
This project is part of:
Hack Week 19
Comments
- over 5 years ago by moio
Diary - day 1: learning
We learned many concepts:
- some immediately relevant for this HackWeek: Actor, ActorSystem, ActorContext, typed Actor, Actor hierarchy, Behavior, Mailbox, Dispatcher, Routing, at-most-once delivery
- others not immediately relevant, still interesting: Supervision, Artery, Akka Clustering, Akka Streams, backpressure, the Phi Accrual Failure Detector, Akka Persistence/Event Sourcing, consistent hashing
- on top of that: some new related tech bits: vavr.io, aeron.io, logback, Lightbend Telemetry, HOCON
We also played with some "hello world" examples.
- over 5 years ago by moio
Diary - day 2: MessageQueue classes
- "hello world" examples moved inside the Uyuni code base and ran in Uyuni
- replaced "hello world" examples with code to run three MessageQueue tasks, which work after the port to Akka!
- added reflection-based glue code to automatically register Actors, minimizing boilerplate for next iterations
- in the process, developed several refactoring patches to simplify our codebase that can be accepted even outside of the scope of this effort (and a bugfix and a contribution to an unrelated PR)
- learnt about the Reflections library
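The reflection-based registration mentioned above could look roughly like the following sketch. It is not the code from the PR: the `@Handles` annotation, the `Handler` interface and the message types are hypothetical names, and the candidate classes are listed by hand here, whereas a classpath scanner such as the Reflections library would normally discover them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of annotation-driven handler registration.
final class HandlerRegistrySketch {

    // Marks a handler class with the message type it wants to receive.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Handles {
        Class<?> value();
    }

    interface Handler {
        String handle(Object message);
    }

    // Illustrative message types; the names are made up for this sketch.
    static final class RegisterMinion {
        final String minionId;

        RegisterMinion(String minionId) {
            this.minionId = minionId;
        }
    }

    static final class RefreshHardware { }

    @Handles(RegisterMinion.class)
    static final class RegisterMinionHandler implements Handler {
        @Override
        public String handle(Object message) {
            return "registered " + ((RegisterMinion) message).minionId;
        }
    }

    @Handles(RefreshHardware.class)
    static final class RefreshHardwareHandler implements Handler {
        @Override
        public String handle(Object message) {
            return "hardware refreshed";
        }
    }

    // Instantiate every annotated candidate and index it by message type.
    static Map<Class<?>, Handler> register(List<Class<? extends Handler>> candidates) {
        Map<Class<?>, Handler> registry = new HashMap<>();
        for (Class<? extends Handler> candidate : candidates) {
            Handles annotation = candidate.getAnnotation(Handles.class);
            if (annotation == null) {
                continue; // not annotated, nothing to wire up
            }
            try {
                registry.put(annotation.value(), candidate.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + candidate, e);
            }
        }
        return registry;
    }

    static String dispatch(Map<Class<?>, Handler> registry, Object message) {
        return registry.get(message.getClass()).handle(message);
    }

    static String demo() {
        List<Class<? extends Handler>> candidates =
                List.of(RegisterMinionHandler.class, RefreshHardwareHandler.class);
        return dispatch(register(candidates), new RegisterMinion("minion-1"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "registered minion-1"
    }
}
```

With this shape, adding a new task type only means writing one annotated class; no central registration list needs to be edited.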
Always-up-to-date-Uyuni PR with latest results available
To try:
- deploy an Uyuni Server (Master branch)
- use manager-build.xml to resolve Ivy dependencies and deploy
- over 5 years ago by moio
Diary - day 3: traditional MessageQueue is obsoleted
- created a better mechanism to start asynchronous tasks after a Transaction has ended. Current code requires polling every 10ms!
- more use cases of MessageQueue were moved to Akka. By end of day only the ones for Salt management remain
- configuration mechanism added to decide concurrency limits (per message type!)
- some dead code removed
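As an illustration of per-message-type concurrency limits, the sketch below keeps one `Semaphore` per message type. This is only an assumption about how such a limit could be enforced in plain Java; the actual mechanism in the PR (and how Akka dispatchers or routers are configured there) may differ, and the message type names are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: a concurrency cap per message type, backed by semaphores.
final class ConcurrencyLimitsSketch {
    private final Map<String, Semaphore> permits = new HashMap<>();

    // limits maps a message type name to its maximum number of concurrent tasks.
    ConcurrencyLimitsSketch(Map<String, Integer> limits) {
        limits.forEach((type, limit) -> permits.put(type, new Semaphore(limit)));
    }

    // Returns true if a slot was free and the task may start now.
    // Message types without a configured limit are never throttled.
    boolean tryStart(String messageType) {
        Semaphore semaphore = permits.get(messageType);
        return semaphore == null || semaphore.tryAcquire();
    }

    // Must be called when a task started via tryStart has completed.
    void finish(String messageType) {
        Semaphore semaphore = permits.get(messageType);
        if (semaphore != null) {
            semaphore.release();
        }
    }

    static List<Boolean> demo() {
        ConcurrencyLimitsSketch limits =
                new ConcurrencyLimitsSketch(Map.of("ApplyStates", 2, "RefreshHardware", 1));
        List<Boolean> started = new ArrayList<>();
        started.add(limits.tryStart("ApplyStates"));     // first slot
        started.add(limits.tryStart("ApplyStates"));     // second slot
        started.add(limits.tryStart("ApplyStates"));     // limit of 2 reached
        started.add(limits.tryStart("RefreshHardware")); // independent limit
        limits.finish("ApplyStates");                    // one task completes
        started.add(limits.tryStart("ApplyStates"));     // slot is free again
        return started;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [true, true, false, true, true]
    }
}
```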
- over 5 years ago by moio
Diary - day 4: MessageQueue and Salt event thread pool are removed
- many more use cases are switched to Akka (total is ~35)
- MessageQueue and related classes are removed completely, together with all surrounding likely unsafe code
- deferring mechanism moved from race-condition-prone plain Java to an Actor
- hash routing implemented in a straightforward patch compared to a much more complicated one we had, touching 3 programming languages
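The idea behind hash routing can be sketched as follows: every message carries a key (assumed here to be a minion id), and messages with the same key always land in the same worker's mailbox, so they are processed in order. This is a simple modulo-hash illustration in plain Java, not Akka's router implementation and not the patch itself; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: route messages to a fixed worker based on a key hash.
final class HashRoutingSketch {
    private final List<List<String>> workerMailboxes = new ArrayList<>();

    HashRoutingSketch(int workers) {
        for (int i = 0; i < workers; i++) {
            workerMailboxes.add(new ArrayList<>());
        }
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    int workerFor(String key) {
        return Math.floorMod(key.hashCode(), workerMailboxes.size());
    }

    // Append the message to the mailbox of the worker that owns the key.
    void route(String key, String message) {
        workerMailboxes.get(workerFor(key)).add(key + ": " + message);
    }

    List<String> mailbox(int worker) {
        return workerMailboxes.get(worker);
    }

    public static void main(String[] args) {
        HashRoutingSketch router = new HashRoutingSketch(4);
        router.route("minion-1", "start event");
        router.route("minion-2", "job return");
        router.route("minion-1", "job return");
        // Both minion-1 messages sit in the same mailbox, in sending order.
        System.out.println(router.mailbox(router.workerFor("minion-1")));
    }
}
```

A plain modulo scheme like this reshuffles most keys when the number of workers changes; consistent hashing is the usual way to soften that.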
- over 5 years ago by moio
Diary - HackWeekEnd: (a very first) distributed Uyuni works!
- we feel like we just can't leave this code alone
- created a worker node alongside the Uyuni Server
- flagged the Minion Start Event worker (Akka-ified on day 4) to run in the worker
- fixed bugs and added hacks until registration works, moving load from the Server to the worker!
- adding more workers distributes load round-robin!
Soooo...