Project Description
When zypper downloads many RPMs as part of an update, the download phase can be greatly sped up.
The goal of this project is to hack together a faster version of zypper that speeds up the download process. If the results are promising, we can then start discussing how to bring the ideas upstream.
Current Situation
At the moment all downloads are done in series, never in parallel. Each request to download.opensuse.org leads to an HTTP 302 response that redirects to a mirror.
When connections are not reused like this, there are two bottlenecks:
- Having to connect at all usually means a DNS lookup (1 RT), opening a new TCP socket (1 RT), and an HTTP request answered with a redirect (1 RT), followed by a second DNS lookup (1 RT), a second TCP socket (1 RT), and the HTTP request to the mirror (1 RT) before the transfer starts. Even at a mere 30 ms RTT, these six round trips add up to a 180 ms delay.
- The TCP congestion window is not always preserved between connections (yes, the kernel does save TCP metrics) and starts off much smaller than the user's available bandwidth would allow. Since most packages are below 1 MB in size, the transfer completes before the window has grown to match the available bandwidth, and hence transfers are slow. The higher the latency between mirror and client, the longer the transfer takes in this case.
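To get a feel for the numbers, here is a rough back-of-the-envelope model in Python. It is only a sketch under stated assumptions: the function names are made up for illustration, the initial window of 10 segments and the 1460-byte segment size are typical values rather than anything measured on a real mirror, and the window is assumed to double every round trip without loss.

```python
def setup_delay_ms(rtt_ms: float, round_trips: int = 6) -> float:
    """Delay caused by the round trips listed above, before any payload flows."""
    return rtt_ms * round_trips


def slow_start_round_trips(size_bytes: int,
                           init_segments: int = 10,  # assumption: typical initial window
                           mss: int = 1460) -> int:  # assumption: typical segment size
    """Round trips needed to deliver size_bytes if the window starts at
    init_segments * mss and doubles every round trip (idealised slow start,
    no loss, no receiver limit)."""
    sent = 0
    window = init_segments * mss
    rounds = 0
    while sent < size_bytes:
        sent += window
        window *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    rtt = 30  # ms, the example RTT used above
    print("setup delay:", setup_delay_ms(rtt), "ms")
    for size in (100_000, 1_000_000):
        print(size, "bytes:", slow_start_round_trips(size), "round trips of payload")
```

In this model every additional round trip costs a full RTT, which is why small packages on high-latency links never get close to the available bandwidth.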
Solution approach
Here is one idea to solve this:
- Do a single query to download.opensuse.org (aka MirrorCache) to figure out all 302 redirects over one connection, or in one go.
- Sort the downloads by mirror and use a single HTTP connection for each mirror that stays alive for all requests during the download phase.
It could make sense (if libcurl supports that) to combine those steps by maintaining one reusable HTTP connection for each scheme/host/port combination (each server) and simply always using that connection for every request to that server.
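The grouping by scheme/host/port can be sketched in a few lines of Python. This is not zypper or libcurl code, just an illustration of the idea; `connection_key` and `group_by_server` are hypothetical helper names, and the mirror host names in the usage example are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Default ports, so that "https://host/" and "https://host:443/" share a key.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_key(url: str) -> tuple:
    """Return the (scheme, host, port) triple that identifies one server."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def group_by_server(urls) -> dict:
    """Group already-resolved mirror URLs so each group can share one connection."""
    groups = defaultdict(list)
    for url in urls:
        groups[connection_key(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    resolved = [  # placeholder URLs, as they might look after the 302 redirects
        "https://mirror-a.example.org/repo/foo.rpm",
        "https://mirror-b.example.org/repo/bar.rpm",
        "https://mirror-a.example.org:443/repo/baz.rpm",
    ]
    for key, group in group_by_server(resolved).items():
        print(key, len(group), "downloads")
```

Each group could then be fetched over a single keep-alive connection, so the DNS lookup, the TCP handshake and the window ramp-up are paid once per mirror rather than once per package.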
Another idea could be to "just" have multiple threads, each working through the set of required package downloads in parallel. While this is a naive approach, it alone will bring improvements and better bandwidth utilisation, even without connection reuse.
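A minimal sketch of the naive threaded approach, using only the Python standard library. It is meant to show the shape of the idea, not how zypper would implement it: `fetch` and `download_all` are hypothetical names, the worker count of 4 is an arbitrary assumption, and there is no retry, checksum or error handling.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download one URL; urlopen follows the 302 redirect to the mirror itself."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_all(urls, workers: int = 4, fetch_one=fetch) -> dict:
    """Fetch all URLs with a small thread pool and return {url: payload}.

    fetch_one is injectable so the function can be tried without network access.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(fetch_one, urls)))
```

Calling `download_all(package_urls)` with a list of package URLs would overlap the per-connection latency of several downloads. Every request still pays the full connection setup, so this complements connection reuse rather than replacing it.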
How any changes here affect the CLI output or the YaST output is still to be determined during the project.
Resources
- https://en.opensuse.org/MirrorCache
- https://en.opensuse.org/Portal:Zypper
This project is part of:
No hackweek.