Project Description

When zypper downloads many RPMs as part of an update, the download phase can be greatly sped up.

The goal of this project is to hack together a faster version of zypper that speeds up the download process. If there are results, we can then start discussing how to bring the ideas upstream.

Current Situation

At the moment, all downloads are done in series, never in parallel. Each request to download.opensuse.org leads to an HTTP 302 response that redirects to a mirror.

When connections like this are not reused, there are two bottlenecks:

  1. Having to connect at all usually means a DNS lookup (1 RT), opening a new TCP socket (1 RT), and an HTTP request answered with a redirect (1 RT), followed by another DNS lookup (1 RT), another TCP socket (1 RT), and a second HTTP request (1 RT) before the transfer starts. Even at a mere 30 ms RTT, these six round trips add up to a 180 ms delay.
  2. The initial TCP congestion window is not always preserved (yes, the kernel can cache TCP metrics) and starts off much smaller than what the user's available bandwidth allows. Since most packages are below 1 MB in size, the transfer completes before the window has grown to match the available bandwidth, so transfers are slow. The higher the latency between mirror and client, the longer the transfer takes in this case.
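
The two effects above can be put into a small back-of-the-envelope model. This is a rough sketch, not a measurement: it assumes 1460-byte segments, an initial window of 10 segments, a window that doubles every round trip, and no packet loss. The function names are made up for illustration.

```python
import math

# Round trips spent before the first payload byte arrives when nothing is
# reused: DNS, TCP connect and HTTP request (answered with a 302) for
# download.opensuse.org, then DNS, TCP connect and HTTP request for the mirror.
SETUP_ROUND_TRIPS = 6


def setup_delay_ms(rtt_ms, round_trips=SETUP_ROUND_TRIPS):
    """Delay caused purely by connection setup and the redirect."""
    return rtt_ms * round_trips


def slow_start_round_trips(size_bytes, mss=1460, initial_window=10):
    """Round trips needed to deliver size_bytes while the window is still growing.

    Assumptions (not from the project description): 1460-byte segments,
    an initial window of 10 segments, doubling per round trip, no loss,
    and no receive-window limit.
    """
    segments = math.ceil(size_bytes / mss)
    window, sent, rounds = initial_window, 0, 0
    while sent < segments:
        sent += window   # one window's worth of segments per round trip
        window *= 2      # idealised slow-start growth
        rounds += 1
    return rounds
```

With these assumptions, `setup_delay_ms(30)` gives 180 and `slow_start_round_trips(1_000_000)` gives 7, so for a 1 MB package roughly 6 of about 13 round trips pass before any payload arrives. TLS handshakes, which add further round trips, are left out.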

Solution approach

Here is one idea for solving this:

  1. Do a single query to download.opensuse.org (that is, MirrorCache) to resolve all 302 redirects over one connection or in one go.
  2. Sort the downloads by mirror and use a single HTTP connection per mirror that stays alive for all requests during the download phase.
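
Step 2 can be sketched as follows, assuming the redirect targets from step 1 are already known. The helper name `group_by_mirror` and the example hosts are hypothetical. How MirrorCache could be queried in bulk is not covered here.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_mirror(resolved_urls):
    """Group final download URLs by the mirror (host[:port]) they point to."""
    groups = defaultdict(list)
    for url in resolved_urls:
        groups[urlsplit(url).netloc].append(url)
    return dict(groups)


# Example with made-up mirror hosts:
batches = group_by_mirror([
    "https://mirror-a.example/repo/foo.rpm",
    "https://mirror-b.example/repo/bar.rpm",
    "https://mirror-a.example/repo/baz.rpm",
])
```

Each batch could then be fetched over one kept-alive connection to its mirror.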

It could make sense (if libcurl supports it) to combine those steps by maintaining one reusable HTTP connection for each scheme/host/port combination (that is, each server) and always using that connection for each request.
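
A minimal sketch of the "one connection per scheme/host/port" idea, written with Python's standard `http.client` rather than libcurl, purely to illustrate the bookkeeping. The class `ConnectionPool` is hypothetical. libcurl has its own connection reuse when handles are reused, but whether zypper could rely on that here would need to be checked.

```python
import http.client
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


class ConnectionPool:
    """Keeps one connection object per (scheme, host, port)."""

    def __init__(self):
        self._connections = {}

    @staticmethod
    def origin(url):
        """Return the (scheme, host, port) key for a URL."""
        parts = urlsplit(url)
        port = parts.port or DEFAULT_PORTS[parts.scheme]
        return (parts.scheme, parts.hostname, port)

    def get(self, url):
        """Return the connection for the URL's server, creating it once."""
        key = self.origin(url)
        if key not in self._connections:
            scheme, host, port = key
            if scheme == "https":
                conn = http.client.HTTPSConnection(host, port)
            else:
                conn = http.client.HTTPConnection(host, port)
            # The socket is opened lazily on the first request, not here.
            self._connections[key] = conn
        return self._connections[key]

    def close(self):
        for conn in self._connections.values():
            conn.close()
        self._connections.clear()
```

Every download then asks the pool for its connection, so requests to the same server share one socket as long as the server keeps it alive.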

Another idea could be to "just" have multiple threads, each working through the set of required package downloads in parallel. While this is a naive approach, it alone will bring improvements and better bandwidth utilisation, even without connection reuse.
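
The naive threaded approach could look roughly like this. It is a sketch using Python's standard library, not zypper or libzypp code. The names `download_all` and `fetch_bytes` and the default of four workers are assumptions.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_bytes(url):
    """Download one URL (redirects are followed) and return its body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, fetch=fetch_bytes, max_workers=4):
    """Run fetch(url) for every URL on a small thread pool.

    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Passing the fetch function in as a parameter keeps the scheduling separate from the transport, so a connection-reusing fetcher could be swapped in later.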

How any changes here affect CLI or YaST output is still to be determined during the project.

Resources

  • https://en.opensuse.org/MirrorCache
  • https://en.opensuse.org/Portal:Zypper

Looking for hackers with the skills:

zypper libcurl http mirrorcache

This project is part of:

No hackweek.

Activity

  • over 1 year ago: tschmitz joined this project.
  • over 1 year ago: dheidler started this project.
  • over 1 year ago: dheidler added keyword "zypper" to this project.
  • over 1 year ago: dheidler added keyword "libcurl" to this project.
  • over 1 year ago: dheidler added keyword "http" to this project.
  • over 1 year ago: dheidler added keyword "mirrorcache" to this project.
  • over 1 year ago: dheidler originated this project.

  • Comments

    • dheidler
      over 1 year ago by dheidler

      https://github.com/Firstyear/mirrorsorcerer#why-mirrorsorcerer---technical-details
      https://github.com/openSUSE/libzypp/pull/386

    Similar Projects

    Ansible for add-on management by lmanfredi

    Description

    Machines can contain various combinations of add-ons and are often modified over time.

    The list of repos can change, so I would like to create an automation that can reset the status to a given state, based on the metadata available for these machines.

    Goals

    Create an Ansible automation that takes care of add-on (repo list) configuration, using metadata as reference.

    Resources

    Results

    Created WIP project Ansible-add-on-openSUSE


    "autoremove" functionality for zypper by e_bischoff

    The purpose would be to have the equivalent of Ubuntu's "apt-get autoremove" functionality.

    When you install package P, it might pull in dependencies D1, D2, ..., Dn automatically.

    When you later uninstall P, the dependencies D1, D2, ..., Dn might remain on your system.

    If you keep installing and uninstalling packages, after a while your system ends up cluttered with things you don't need.

    The idea would be to mark all dependencies that were installed but not explicitly requested as "installed automatically". Then a command like "zypper autoremove" could remove them all at once if they are no longer needed.
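
The marking idea can be illustrated with a tiny model: everything reachable from an explicitly requested package is needed, and the rest is a removal candidate. This is only a sketch over a hypothetical data model (plain sets and a dict). It does not reflect how the libzypp solver tracks packages.

```python
def unneeded_packages(installed, explicit, requires):
    """Return installed packages not reachable from explicitly requested ones.

    installed: set of installed package names
    explicit:  set of names the user asked for directly
    requires:  dict mapping a package name to the names it depends on
    """
    needed = set()
    stack = [name for name in explicit if name in installed]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        # Follow dependencies, ignoring ones that are not installed.
        stack.extend(dep for dep in requires.get(name, ()) if dep in installed)
    return installed - needed


# P was removed earlier; D1 and D2 came in as its dependencies, Q still needs D2.
leftovers = unneeded_packages(
    installed={"D1", "D2", "Q"},
    explicit={"Q"},
    requires={"Q": ["D2"]},
)
```

Here only D1 is left over, because D2 is still required by the explicitly installed Q.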

    Approaches seen online

    After scouring forums for solutions, these are some aliases used to replicate the functionality:

    1. zypper packages --unneeded | awk -F'|' 'NR <= 4 {next} {print $3}' | grep -v Name | sudo xargs zypper remove --clean-deps

    2. This one is a script and has bashisms:

       mapfile -t unneeded < <(zypper --quiet pa --unneeded | awk '$1 == "i" { print $5, "-", $7 }')
       (( ${#unneeded[@]} )) && sudo zypper --quiet rm --clean-deps --details "${unneeded[@]}"

    3. sudo zypper rm $(zypper pa --unneeded | awk '/i / {print $3}' FS='|' | uniq | tr -d ' ')

    Based on testing, zypper packages --orphaned lists packages that are not in any repo, even if the user has explicitly installed them. So --orphaned may not be the way to go; the focus should instead be on --unneeded.