Let's make reposync faster

Every day,

Multiple times a day,

Every SUSE Manager customer,

Every Red Hat Satellite customer,

Every Spacewalk user,

And every Uyuni user...

...spends a lot of CPU and wall-clock time on reposyncing.

Intro

A lot of that time is wasted by an old, overcomplicated and, most of all, inefficient algorithm that contributes heavily to heat dissipation and user patience depletion!

HackWeek hackers, we can change that!

Past attempts only partially succeeded: https://trello.com/c/inl9Wu0p/40-reduce-global-warming, https://trello.com/c/dYAR0J8K/13-reduce-global-warming-take-2

But we have better tools now!

Tooling

py-spy to the rescue: introduction

Install with:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install py-spy

Trace a running spacewalk-repo-sync with:

py-spy --nonblocking --pid `ps aux | grep spacewalk-repo-sync | grep -v grep | awk '{print $2}'` --flame output.svg --duration 10

Look at the results with:

python -m SimpleHTTPServer 8666

And point your browser to http://<NAME_HERE>:8666/output.svg. Here is one such example:

Flame Graph

Current remarks:

  • we currently spend a lot of time in lookup functions
  • lookup functions SELECT rows at every INSERT
  • this is especially bad for checksums, capabilities and some other cases
  • design comes from Oracle and can probably be changed!
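
To make the remarks above concrete, here is a minimal sketch of the two access patterns. It is not the actual reposync code: the `checksum` table, its columns and both function names are hypothetical, and SQLite (from the Python standard library) stands in for PostgreSQL so the example runs on its own. In PostgreSQL the same idea would typically be written with `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checksum (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL)"
    )
    return conn


def lookup_per_row(conn, checksums):
    """Per-row pattern: one SELECT per value, plus an INSERT when it is missing."""
    ids = {}
    for value in checksums:
        row = conn.execute(
            "SELECT id FROM checksum WHERE value = ?", (value,)
        ).fetchone()
        if row is None:
            cur = conn.execute(
                "INSERT INTO checksum (value) VALUES (?)", (value,)
            )
            ids[value] = cur.lastrowid
        else:
            ids[value] = row[0]
    return ids


def lookup_batched(conn, checksums):
    """Batched pattern: insert all missing values at once, then one SELECT."""
    unique = sorted(set(checksums))
    if not unique:
        return {}
    # Rows that already exist are skipped thanks to the UNIQUE constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO checksum (value) VALUES (?)",
        [(v,) for v in unique],
    )
    # A real implementation would chunk very large batches to stay within
    # the database's limit on bound parameters.
    placeholders = ",".join("?" * len(unique))
    rows = conn.execute(
        f"SELECT value, id FROM checksum WHERE value IN ({placeholders})",
        unique,
    ).fetchall()
    return dict(rows)
```

Both functions return the same value-to-id mapping; the batched variant simply avoids a round trip per row, which is where the flame graph suggests the time goes.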

Looking for hackers with the skills:

python performance databases postgresql

This project is part of:

Hack Week 18

Activity

  • over 3 years ago: moio started this project.
  • over 3 years ago: moio originated this project.

  • Comments

    • ebischoff
      over 3 years ago by ebischoff

      See also this fate request "Have a synchronization that does not take hours (or days)"

    • joachimwerner
      about 3 years ago by joachimwerner

      Related, but probably out of scope for your hack week project: once we've optimized the syncing code, I think we could also reduce the scope of what needs to be synced for many customers. Especially for pilots, but also in real life, many of the older updates (e.g. several complete kernels, several Java updates) are never going to be needed, but are still synced. We should investigate how we can offer something like a "JeR" ("Just enough Repo") to speed things up even more. This could be done server-side (provide alternative repo metadata for a "current stuff only" repo) or client-side (but then some dependency resolution magic is probably needed).

    • chasecrum
      about 3 years ago by chasecrum

      Any update on how this turned out?
