SUSE Hack Week: Speed up downloading RPM packages from mirror in zypper

Project Description

When zypper downloads many RPMs as part of an update, the download phase can be greatly sped up.

The goal of this project is to hack together a faster version of zypper that speeds up the download process. If there are results, we can then start discussing how to bring the ideas upstream.

Current Situation

At the moment all downloads are done in series, never in parallel. Each request to download.opensuse.org leads to an HTTP 302 response that redirects to a mirror.

When connections like this are not reused, there are two bottlenecks:

Connection setup: each download usually needs a DNS lookup (1 RTT), a new TCP connection (1 RTT), an HTTP request answered by a redirect (1 RTT), a second DNS lookup for the mirror (1 RTT), another TCP connection (1 RTT), and the actual HTTP request (1 RTT) before the transfer starts. Even at a mere 30 ms RTT, this adds up to a 180 ms delay.
TCP slow start: the initial TCP congestion window is not always preserved between connections (although the kernel can save TCP metrics) and starts off much smaller than the user's available bandwidth would allow. Since most packages are below 1 MB in size, the transfer completes before the window has grown to match the available bandwidth, so transfers are slow. The higher the latency between mirror and client, the longer the transfer takes in this case.
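
The arithmetic behind the two points above can be sketched as follows. This is a rough model, not a measurement. The six round trips come from the list above. The initial window of 10 segments, the 1460-byte segment size, and the assumption that the window doubles every round trip without loss are illustrative assumptions, not values taken from zypper or from any mirror.

```python
import math

def setup_delay_ms(rtt_ms, round_trips=6):
    """Delay before the first payload byte, counting the six round trips listed above."""
    return rtt_ms * round_trips

def slow_start_rounds(size_bytes, init_segments=10, mss=1460):
    """Round trips needed to deliver size_bytes under idealised slow start.

    Assumptions (not from the project page): the window starts at
    init_segments, doubles every round trip, and there is no packet loss.
    """
    segments = math.ceil(size_bytes / mss)
    window, sent, rounds = init_segments, 0, 0
    while sent < segments:
        sent += window   # one window's worth of segments per round trip
        window *= 2      # idealised exponential growth
        rounds += 1
    return rounds

if __name__ == "__main__":
    print(setup_delay_ms(30))            # setup cost at 30 ms RTT
    print(slow_start_rounds(1_000_000))  # round trips for a 1 MB package
```

Under this model, a package just under 1 MB on a fresh connection costs several round trips on top of the setup delay, which is why the latency to the mirror dominates for small packages.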

Solution approach

Here is one idea to solve this:

Do a single query to download.opensuse.org (aka MirrorCache) to figure out all 302 redirects in one connection or in one go.
Sort the downloads by mirror and use a single HTTP connection per mirror that stays alive for all requests during the download phase.
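
The two steps above could look roughly like the following sketch. It assumes that the redirector answers a HEAD request with the same 302 status and Location header it would send for a GET, which has not been verified here. The function names resolve_redirects and group_by_mirror are hypothetical and not part of zypper or libzypp. The conn argument is expected to behave like a Python http.client.HTTPConnection that is already pointed at the redirector.

```python
from collections import defaultdict
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def resolve_redirects(conn, paths):
    """Ask the redirector for every path over one connection.

    Returns a dict path -> mirror URL. Paths that are not redirected
    are left out of the result.
    """
    resolved = {}
    for path in paths:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # drain the response so the connection can be reused
        if resp.status in REDIRECT_CODES:
            resolved[path] = resp.getheader("Location")
    return resolved

def group_by_mirror(mirror_urls):
    """Group URLs by (scheme, host, port); each group lists request paths in input order."""
    default_ports = {"http": 80, "https": 443}
    groups = defaultdict(list)
    for url in mirror_urls:
        parts = urlsplit(url)
        port = parts.port or default_ports.get(parts.scheme)
        path = parts.path + ("?" + parts.query if parts.query else "")
        groups[(parts.scheme, parts.hostname, port)].append(path)
    return dict(groups)
```

Each resulting group can then be downloaded over one kept-alive connection to that mirror.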

It could make sense (if libcurl supports that) to combine those steps by maintaining one reusable HTTP connection for each scheme/host/port combination (each server) and simply always using that reusable connection for each request.
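
The keying idea can be illustrated with a small connection pool. This is not libcurl code and makes no claim about how libcurl behaves. It is a minimal sketch using Python's standard http.client, and the class name ConnectionPool is hypothetical. Whether the connection actually stays open between requests also depends on the server honouring keep-alive.

```python
import http.client
from urllib.parse import urlsplit

class ConnectionPool:
    """Keeps one persistent HTTP connection per (scheme, host, port)."""

    def __init__(self):
        self._conns = {}
        self.opened = 0  # number of connection objects created so far

    def _connection_for(self, parts):
        default_port = 443 if parts.scheme == "https" else 80
        key = (parts.scheme, parts.hostname, parts.port or default_port)
        conn = self._conns.get(key)
        if conn is None:
            cls = (http.client.HTTPSConnection if parts.scheme == "https"
                   else http.client.HTTPConnection)
            conn = cls(key[1], key[2], timeout=30)  # connects lazily on first request
            self._conns[key] = conn
            self.opened += 1
        return conn

    def fetch(self, url):
        """GET url over the pooled connection for its server; returns (status, body)."""
        parts = urlsplit(url)
        conn = self._connection_for(parts)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()  # the body must be read fully before the next request
        return resp.status, body

    def close(self):
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()
```

With such a pool, the caller does not need to sort downloads by mirror first: every request simply picks up whichever connection already exists for its server.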

Another idea could be to "just" have multiple threads, each walking through the set of required package downloads in parallel. While this is a naive approach, it alone would bring improvements and better bandwidth utilisation, even without connection reuse.
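
A minimal sketch of this naive approach, assuming a fixed pool of four worker threads (an arbitrary number chosen for illustration). The helper names fetch_url and download_all are hypothetical. The default fetcher uses urllib.request.urlopen, which follows redirects and opens a fresh connection per call, so no connection reuse takes place here.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch_url(url):
    """Download one URL and return its body; each call opens a new connection."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

def download_all(urls, fetch=fetch_url, workers=4):
    """Fetch every URL with a small pool of worker threads.

    Results are returned in the same order as the input URLs.
    The fetch argument can be replaced, for example in tests.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Because the workers overlap their connection setup and slow-start phases, the per-package latency is paid concurrently rather than one package after another.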

How any changes here affect CLI output or YaST output is still to be determined during the project.

Resources

https://en.opensuse.org/MirrorCache
https://en.opensuse.org/Portal:Zypper

Looking for hackers with the skills:

zypper libcurl http mirrorcache

This project is part of:

No hackweek.

Activity

about 2 years ago: tschmitz joined this project.

about 2 years ago: dheidler started this project.

about 2 years ago: dheidler added keyword "zypper" to this project.

about 2 years ago: dheidler added keyword "libcurl" to this project.

about 2 years ago: dheidler added keyword "http" to this project.

about 2 years ago: dheidler added keyword "mirrorcache" to this project.

about 2 years ago: dheidler originated this project.

Comments

about 2 years ago by dheidler

https://github.com/Firstyear/mirrorsorcerer#why-mirrorsorcerer---technical-details
https://github.com/openSUSE/libzypp/pull/386
