Let's make reposync faster
Every day,
Multiple times a day,
Every SUSE Manager customer,
Every Red Hat Satellite customer,
Every Spacewalk user,
And every Uyuni user...
...spends a lot of CPU and wall clock time in reposyncing.
Intro
A lot of that time is wasted by an old, overcomplicated and, above all, inefficient algorithm that contributes heavily to heat dissipation and to the depletion of user patience!
HackWeek hackers, we can change that!
Past attempts only partially succeeded: https://trello.com/c/inl9Wu0p/40-reduce-global-warming, https://trello.com/c/dYAR0J8K/13-reduce-global-warming-take-2
But we have better tools now!
Tooling
py-spy to the rescue: introduction
Install with:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install py-spy
Trace a running spacewalk-repo-sync with:
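py-spy --nonblocking --pid `ps aux | grep spacewalk-repo-sync | grep -v grep | awk '{print $2}'` --flame output.svg --duration 10
(newer py-spy releases produce flame graphs through the record subcommand instead: py-spy record --nonblocking --pid <PID> --duration 10 -o output.svg)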
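The `ps aux | grep | awk` pipeline can expand to several PIDs when more than one sync runs at a time, which `--pid` does not expect. Below is a minimal Python sketch of a stricter lookup, assuming Linux `/proc` and assuming spacewalk-repo-sync runs either directly or through a `python*` interpreter. `find_reposync_pids` is a hypothetical helper, not part of Spacewalk, Uyuni or py-spy.

```python
import os


def find_reposync_pids(name="spacewalk-repo-sync", proc_root="/proc"):
    """Return the sorted PIDs of processes running `name` (hypothetical helper).

    Assumes Linux /proc, and that the script is either exec'd as argv[0] or run
    by a python* interpreter, with the script as its first non-option argument.
    Unlike `ps aux | grep`, a `grep spacewalk-repo-sync` process does not match.
    """
    wanted = name.encode()
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv = f.read().split(b"\0")  # arguments are NUL-separated
        except OSError:
            continue  # the process exited meanwhile, or we lack permission
        prog = os.path.basename(argv[0])
        script = next((a for a in argv[1:] if a and not a.startswith(b"-")), b"")
        if prog == wanted or (prog.startswith(b"python") and os.path.basename(script) == wanted):
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid in find_reposync_pids():
        # One flame graph per process, reusing the py-spy flags from the command above.
        print(f"py-spy --nonblocking --pid {pid} --flame output-{pid}.svg --duration 10")
```

Writing one SVG per PID keeps parallel syncs from overwriting each other's flame graphs.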
Look at the results with:
python -m SimpleHTTPServer 8666
(on Python 3: python3 -m http.server 8666)
And point your browser to http://<NAME_HERE>:8666/output.svg. Here is one such example:
[example flame graph]
Current remarks:
- we currently spend a lot of time in lookup functions
- lookup functions run a SELECT before every INSERT
- this is especially bad for checksums, capabilities and a few other cases
- the design was inherited from Oracle and can probably be changed!
This project is part of:
Hack Week 18
Comments
almost 4 years ago by ebischoff
See also this FATE request: "Have a synchronization that does not take hours (or days)"
almost 4 years ago by joachimwerner
Related, but probably out of scope for your Hack Week project: once we've optimized the syncing code, I think we could also reduce the scope of what needs to be synced for many customers. Especially for pilots, but also in real life, many of the older updates (e.g. several complete kernels, several Java updates) are never going to be needed, but are still synced. We should investigate how we can offer something like a "JeR" ("Just enough Repo") to speed things up even more. This could be done server-side (provide alternative repo metadata for a "current stuff only" repo) or client-side (but then some dependency resolution magic is probably needed).
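A server-side "current stuff only" view could start from something as simple as keeping the newest N builds of each package name. Below is a minimal Python sketch, assuming packages are plain (name, version, release) tuples. `Package`, `version_key` and `just_enough_repo` are hypothetical, and the version key is a deliberate simplification of RPM's rpmvercmp (no epoch, tilde or caret handling). It also ignores the dependency resolution the comment mentions.

```python
import re
from collections import defaultdict
from typing import List, NamedTuple


class Package(NamedTuple):
    name: str
    version: str
    release: str


def version_key(s: str):
    """Simplified RPM-like key: digit runs compare numerically and rank above
    letter runs. NOT a full rpmvercmp (no epoch, tilde or caret handling)."""
    return [(1, int(tok)) if tok.isdigit() else (0, tok) for tok in re.findall(r"\d+|[A-Za-z]+", s)]


def just_enough_repo(packages: List[Package], keep: int = 1) -> List[Package]:
    """Keep only the `keep` newest version-release pairs of every package name."""
    by_name = defaultdict(list)
    for pkg in packages:
        by_name[pkg.name].append(pkg)
    result = []
    for name in sorted(by_name):
        newest_first = sorted(
            by_name[name],
            key=lambda p: (version_key(p.version), version_key(p.release)),
            reverse=True,
        )
        result.extend(newest_first[:keep])
    return result
```

Comparing digit runs as integers is what makes release 150.10 newer than 150.9, which a plain string sort would get wrong.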
Similar Projects
OSSelot collaboration on legal reviews of community packages by lkocman
Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil
Give some love to spec-cleaner by kstreitova
zypper plugin for discovering reverse dependencies by bzoltan1
Linux Immersive Learning System (LILS) by dgarcia
Vai: a Kubernetes API accelerator/cache by moio
Visualization of historical sar(1) archives by ggherdovich
Learn PostgreSQL advanced features by okurz
Port the Minion job queue to TypeScript by kraih