Project Description

For a few months now, the openSUSE community has had the ability to host the openSUSE RPM repositories on a commercial CDN, and the rollout is slowly progressing. However, there are still potential bottlenecks and optimisation opportunities remaining. My goal for this Hack Week is to investigate them and make reasonable progress on resolving them.

Topics that are in scope and being investigated:

  • Switch repository metadata from gzip to the more modern zstd (faster to decompress, smaller sizes); see the benchmark sketch after this list
  • Investigate zypper ref performance overheads
  • Understand zsync and zchunk and benchmark their tradeoffs
  • Leverage the CDN for delivering repository metadata and a mirrorlist, so that round trips to download.o.org can be reduced for users outside Europe
  • Investigate the performance of HTTP/3 to see whether it would benefit us
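
To get a first feel for the gzip vs. zstd tradeoff mentioned in the first item, a small benchmark along the following lines could be used. It is only a sketch: it assumes the same metadata file has already been downloaded in both formats (the file names are placeholders) and that the third-party zstandard Python module (python3-zstandard) is installed.

    #!/usr/bin/env python3
    """Rough gzip vs. zstd decompression timing for repository metadata.

    Expects the same metadata downloaded in both formats, e.g. primary.xml.gz
    and primary.xml.zst from a repository's repodata/ directory.
    """
    import gzip
    import io
    import sys
    import time

    import zstandard  # third-party, packaged as python3-zstandard


    def best_of(func, data, repeat=5):
        # return the fastest of a few runs to smooth out noise
        times = []
        for _ in range(repeat):
            start = time.perf_counter()
            func(data)
            times.append(time.perf_counter() - start)
        return min(times)


    def unzstd(data):
        # stream_reader also copes with frames that do not store the content size
        return zstandard.ZstdDecompressor().stream_reader(io.BytesIO(data)).read()


    if __name__ == "__main__":
        gz_path, zst_path = sys.argv[1], sys.argv[2]
        for label, path, func in (("gzip", gz_path, gzip.decompress),
                                  ("zstd", zst_path, unzstd)):
            data = open(path, "rb").read()
            ms = best_of(func, data) * 1000
            print(f"{label}: {len(data):>10} bytes compressed, "
                  f"best decompression {ms:7.1f} ms")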

Goal for this Hackweek

Focus on openSUSE Tumbleweed

Looking for hackers with the skills:

opensuse cdn zypper rpm performance benchmark

This project is part of:

Hack Week 23

Activity

  • about 2 years ago: mlschroe joined this project.
  • about 2 years ago: favogt joined this project.
  • about 2 years ago: favogt liked this project.
  • about 2 years ago: dirkmueller added keyword "cdn" to this project.
  • about 2 years ago: dirkmueller added keyword "zypper" to this project.
  • about 2 years ago: dirkmueller added keyword "rpm" to this project.
  • about 2 years ago: dirkmueller added keyword "performance" to this project.
  • about 2 years ago: dirkmueller added keyword "benchmark" to this project.
  • about 2 years ago: dirkmueller added keyword "opensuse" to this project.
  • about 2 years ago: dirkmueller started this project.
  • about 2 years ago: dirkmueller originated this project.

  • Comments

    • dirkmueller
      about 2 years ago by dirkmueller

      It turns out that createrepo_c was already preparing the switch to Zstd and Zchunk, so the bulk of the work has been to fix various places in the Open Build Service and product building logic to handle that. The submissions are planned to go live on Nov 13th.

      Overall this provides a 10-30% reduction in download size for repository data and shaves a few hundred milliseconds off decompression time.
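
      Once this is live, one quick way to check what a repository actually publishes is to look at its repomd.xml index. The snippet below is only an illustrative sketch (the repository URL is just an example); it lists every metadata entry with its file name, so .zst or zchunk entries are easy to spot.

        #!/usr/bin/env python3
        """List the metadata files a repository publishes (type, href, size)."""
        import urllib.request
        import xml.etree.ElementTree as ET

        # example repository; point this at whichever repo you want to inspect
        REPO = "https://download.opensuse.org/tumbleweed/repo/oss"
        NS = {"repo": "http://linux.duke.edu/metadata/repo"}

        with urllib.request.urlopen(f"{REPO}/repodata/repomd.xml") as resp:
            root = ET.fromstring(resp.read())

        for data in root.findall("repo:data", NS):
            href = data.find("repo:location", NS).get("href")
            size = data.find("repo:size", NS)
            suffix = f"  ({size.text} bytes)" if size is not None else ""
            print(f"{data.get('type'):20s} {href}{suffix}")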

    • dirkmueller
      about 2 years ago by dirkmueller

      @mlschroe made a patch to libsolv that removes ~500 ms from the parsing time for the Tumbleweed repositories: https://github.com/openSUSE/libsolv/commit/23cbed3219bd07b5c3fa1ed8a6f2fa6c478c0fdb

    • dirkmueller
      about 2 years ago by dirkmueller

      I've spent several days profiling and tuning zchunk support for the Tumbleweed use case (updating from Tumbleweed snapshots) and made some upstream contributions to createrepo_c and zchunk to allow for these tunings.

    • dirkmueller
      about 2 years ago by dirkmueller

      @mlschroe worked on reviving zsync support in libzypp's default multicurl implementation.

    • dirkmueller
      about 2 years ago by dirkmueller

      I've submitted the necessary changes to enable curl with HTTP/3 support. However, this requires either switching to GnuTLS (which the security team doesn't like) or including the quictls patches for OpenSSL. I submitted the latter and am waiting for maintainer review.
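
      For a rough before/after comparison once an HTTP/3-capable curl is installed, something like the sketch below could be used. It simply shells out to curl; the URL is only an example, and the --http3 flag only works with a curl build that actually has HTTP/3 support.

        #!/usr/bin/env python3
        """Compare total transfer time with and without curl's --http3 flag."""
        import subprocess

        # example URL; any file served by the CDN / mirror under test will do
        URL = "https://download.opensuse.org/tumbleweed/repo/oss/repodata/repomd.xml"


        def fetch(extra_args):
            # %{http_version} and %{time_total} are standard curl --write-out variables
            result = subprocess.run(
                ["curl", "-s", "-o", "/dev/null",
                 "-w", "%{http_version} %{time_total}", *extra_args, URL],
                capture_output=True, text=True, check=True)
            return result.stdout


        print("default:", fetch([]))
        print("http3:  ", fetch(["--http3"]))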

    Similar Projects

    Create openSUSE images for Arm and RISC-V boards by avicenzi

    Project Description

    Create openSUSE images (or test generic EFI images) for Arm and RISC-V boards that are not yet supported.

    Goal for Hackweek

    Create bootable images of Tumbleweed for SBCs that currently have no images available or are untested.

    Consider generic EFI images where possible, as some boards can hold a bootloader.

    Document in the openSUSE Wiki how to flash and use the image for a given board.

    Hack Week 25

    Hack Week 24

    Hack Week 23

    Hack Week 22

    Hack Week 21

    Resources


    File search subcommand for zypper by mook_work

    Description

    Zypper currently only supports searching for files in a few pre-defined prefixes (/usr/bin, /usr/sbin, etc.) by using rpm Provides. This means that it is difficult to find files that are not in that explicit list.

    Doing this in zypper itself seems difficult.

    Goals

    Create a zypper subcommand (plugin) that can download file lists as needed to search for file contents. As a stretch goal, have a secondary subcommand that can list files in packages regardless of whether they are already installed.

    At this point, making the resulting code usable as part of zypper itself is not a goal, as it would probably take more time than available.
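
    For context, zypper subcommands are standalone executables named zypper-<name> that zypper finds in its subcommand directory or on $PATH (see the SUBCOMMANDS link under Resources). A possible plugin skeleton is sketched below; the subcommand name and the idea of grepping an already-downloaded filelists.xml.gz (instead of fetching file lists on demand) are placeholder assumptions for illustration, not the project's actual design.

        #!/usr/bin/env python3
        """Hypothetical 'zypper search-file' subcommand skeleton.

        Installed as an executable named zypper-search-file on $PATH, zypper
        would run it for 'zypper search-file <filelists.xml.gz> <regex>'.
        Searching a pre-downloaded filelists.xml.gz is only a stand-in for the
        real "download file lists as needed" logic.
        """
        import gzip
        import re
        import sys
        import xml.etree.ElementTree as ET

        NS = "{http://linux.duke.edu/metadata/filelists}"


        def search(filelists_path, pattern):
            rx = re.compile(pattern)
            with gzip.open(filelists_path, "rb") as f:
                # iterparse keeps memory usage low for the large filelists document
                for _, elem in ET.iterparse(f):
                    if elem.tag == NS + "package":
                        for entry in elem.findall(NS + "file"):
                            if rx.search(entry.text or ""):
                                print(f"{elem.get('name')}: {entry.text}")
                        elem.clear()  # free the processed package element


        if __name__ == "__main__":
            if len(sys.argv) != 3:
                sys.exit("usage: zypper search-file <filelists.xml.gz> <regex>")
            search(sys.argv[1], sys.argv[2])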

    Resources

    Upstream issue: https://github.com/openSUSE/zypper/issues/469

    Stretch goal: https://github.com/openSUSE/zypper/issues/164

    Subcommands: https://manpages.opensuse.org/Tumbleweed/zypper/zypper.8.en.html#SUBCOMMANDS


    Kudos aka openSUSE Recognition Platform by lkocman

    Description

    Relevant blog post at news-o-o

    I started the Kudos application shortly after Leap 16.0 to create a simple, friendly way to recognize people for their work and contributions to openSUSE. There's so much more to our community than just submitting requests in OBS or Gitea: we have translations (not only in Weblate), wiki edits, forum and social media moderation, infrastructure maintenance, booth participation, talks, manual testing, openQA test suites, and more!

    Goals

    • Kudos under github.com/openSUSE/kudos with build previews (via Netlify)

    • Have a kudos.opensuse.org instance running in production

    • Build an easy-to-contribute recognition platform for the openSUSE community: a place where everyone can send and receive appreciation for their work, across all areas of contribution.

    • In the future, we could even explore reward options such as vouchers for t-shirts or other community swag, small tokens of appreciation to make recognition more tangible.

    Resources

    (Do not create new badge requests during hackweek, unless you'll make the badge during hackweek)


    A CLI for Harvester by mohamed.belgaied

    Harvester does not officially come with a CLI tool; the user is supposed to interact with Harvester mostly through the UI. Though it is theoretically possible to use kubectl to interact with Harvester, manipulating KubeVirt YAML objects is absolutely not user friendly. Inspired by tools like Multipass from Canonical, which make it easy to rapidly create one or multiple VMs, I began the development of Harvester CLI. It currently works, but Harvester CLI needs some love to be up to date with Harvester v1.0.2, and needs some bug fixes and improvements as well.

    Project Description

    Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as: harvester vm create my-vm --count 5 to create 5 VMs named my-vm-01 to my-vm-05.


    Harvester CLI is functional but needs a number of improvements: bringing functionality up to date with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an openSUSE VM instead of an Ubuntu VM, fixing some bugs, etc.

    Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli

    Done in previous Hackweeks

    • Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
    • Automatically package Harvester CLI for openSUSE / Red Hat RPMs or DEBs: DONE

    Goal for this Hackweek

    The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.

    Some nice additions might be:

    • Improve handling of namespaced objects
    • Add features, such as network management or Load Balancer creation?
    • Add more unit tests and, why not, e2e tests
    • Improve CI
    • Improve the overall code quality
    • Test the program and create issues for it

    Issue list is here: https://github.com/belgaied2/harvester-cli/issues

    Resources

    The project is written in Go and uses client-go, the Kubernetes Go client library, to communicate with the Harvester API (which is, in fact, Kubernetes). Welcome contributions are:

    • Testing it and creating issues
    • Documentation
    • Go code improvement

    What you might learn

    Harvester CLI might be interesting to you if you want to learn more about:

    • GitHub Actions
    • Harvester as a SUSE Product
    • Go programming language
    • Kubernetes API
    • Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)


    Create a page with all devel:languages:perl packages and their versions by tinita

    Description

    Perl projects now live in git: https://src.opensuse.org/perl

    It would be useful to have an easy way to check which version of which Perl module is in devel:languages:perl. We also have meta overrides and patches for various modules, and it would be good to have them in a central place, so they are easier to look up and we can share them with other vendors.

    I did some initial data dump here a while ago: https://github.com/perlpunk/cpan-meta

    But I never had the time to automate this.

    I can also use the data to check if there are necessary updates (currently it uses data from download.opensuse.org, so there is some delay and it depends on building).

    Goals

    • Have a script that updates a central repository (e.g. https://src.opensuse.org/perl/_metadata) with metadata by looking at https://src.opensuse.org/perl/_ObsPrj (check if there are any changes from the last run)
    • Create an HTML page with the list of packages (use JavaScript and some table library to make it easily searchable)

    Resources

    Results

    Day 1

    Day 2

    • The HTML page now has links to src.opensuse.org and the date of the last update, plus a short info at the top
    • Code is now 100% covered by tests: https://app.codecov.io/gh/perlpunk/opensuse-perl-meta
    • I used the modern perl class feature, which makes perl classes even nicer and shorter. See example
    • Tests
      • I tried out the mocking feature of the modern Test2::V0 library which provides call tracking. See example
      • I tried out comparing data structures with the new Test2::V0 library. It lets you compare parts of the structure with the like function, which only compares the data that is mentioned in the expected data. example

    Day 3

    • Added various things to the table
      • Dependencies column
      • Show popup with info for cpanspec, patches and dependencies
      • Added last date / commit to the data export.

    Plan: With the added date / commit we can now daily check _ObsPrj for changes and only fetch the data for changed packages.

    Day 4


    Switch software-o-o to store repomd in a database by hennevogel

    Description

    The openSUSE Software portal is a web app to explore binary packages of openSUSE distributions. Kind of like a package manager / app store.

    https://software.opensuse.org/

    This app has been around forever (August 2007) and its architecture is a bit brittle. It acts as a frontend to the OBS distributions and published binary search APIs, calculates and caches a lot of stuff in memory, and needs code changes nearly every openSUSE release to keep up.

    As you can imagine, it's a heavy user of the OBS API, especially when caches are cold.

    Goals

    I want to change the app to cache repomd data in a (Postgres) database structure; a schema sketch follows this list:

    • Distributions have many Repositories
    • Repositories have many Packages
    • Packages have many Patches
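
    Purely to illustrate these relations (the real app would use Rails models on Postgres; table, column and sample values below are made up), a throwaway sketch with Python's built-in sqlite3 could look like this:

        #!/usr/bin/env python3
        """Illustration of the Distribution -> Repository -> Package -> Patch model."""
        import sqlite3

        db = sqlite3.connect(":memory:")
        db.executescript("""
        CREATE TABLE distributions (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE repositories  (id INTEGER PRIMARY KEY, name TEXT,
                                    distribution_id INTEGER REFERENCES distributions(id));
        CREATE TABLE packages      (id INTEGER PRIMARY KEY, name TEXT, version TEXT,
                                    repository_id INTEGER REFERENCES repositories(id));
        CREATE TABLE patches       (id INTEGER PRIMARY KEY, name TEXT,
                                    package_id INTEGER REFERENCES packages(id));

        INSERT INTO distributions VALUES (1, 'openSUSE Tumbleweed');
        INSERT INTO repositories  VALUES (1, 'repo-oss', 1);
        INSERT INTO packages      VALUES (1, 'vim', '9.1', 1);
        """)

        # "search for a Package of a Distribution in its Repositories" stays in the DB
        rows = db.execute("""
            SELECT d.name, r.name, p.name, p.version
              FROM packages p
              JOIN repositories r  ON p.repository_id   = r.id
              JOIN distributions d ON r.distribution_id = d.id
             WHERE p.name LIKE ?""", ("vim%",)).fetchall()
        print(rows)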

    The UI workflows will be as follows:

    • As an admin I set up a Distribution and its Repositories
    • As an admin I sync all Repositories' repomd files into the database
    • As a user I browse a Distribution by category
    • As a user I search for a Package of a Distribution in its Repositories
    • As a user I extend the search to Packages built on OBS for this Distribution

    This has a couple of pros:

    • Less traffic on the OBS API as the usual Packages are inside the database
    • Easier base for adding features to this page, like comments, ratings, openSUSE-specific screenshots, etc.
    • Separating the Distribution package search from searching through OBS will hopefully make it clearer for newbies that enabling extra repositories is kind of dangerous.

    And one con:

    • You can't search for packages built for foreign distributions with this app anymore (although we could consume their repomd etc., but I doubt we have that audience on an opensuse.org domain...)

    TODO

    • Introduce a PG database
    • Add clockworkd as scheduler and delayed_job as ActiveJob backend
    • Introduce ActiveStorage
    • Build initial data model
    • Introduce repomd to database sync
      • Adapt repomd sync to Leap 16.0 repomd layout changes (single arch, no update repo)
      • Make repomd sync idempotent
    • Introduce database search
    • Set up foreman to run rails s and rake jobs:workoff
    • Adapt UI
      • Build Category Browsing
      • Build Admin Distribution CRUD interface


    RMT.rs: High-Performance Registration Path for RMT using Rust by gbasso

    Description

    The SUSE Repository Mirroring Tool (RMT) is a critical component for managing software updates and subscriptions, especially for our Public Cloud Team (PCT). In a cloud environment, hundreds or even thousands of new SUSE instances (VPS/EC2) can be provisioned simultaneously. Each new instance attempts to register against an RMT server, creating a "thundering herd" scenario.

    We have observed that the current RMT server, written in Ruby, faces performance issues under this high-concurrency registration load. This can lead to request overhead, slow registration times, and outright registration failures, delaying the readiness of new cloud instances.

    This Hackweek project aims to explore a solution by re-implementing the performance-critical registration path in Rust. The goal is to leverage Rust's high performance, memory safety, and first-class concurrency handling to create an alternative registration endpoint that is fast, reliable, and can gracefully manage massive, simultaneous request spikes.

    The new Rust module will be integrated into the existing RMT Ruby application, allowing us to directly compare the performance of both implementations.

    Goals

    The primary objective is to build and benchmark a high-performance Rust-based alternative for the RMT server registration endpoint.

    Key goals for the week:

    1. Analyze & Identify: Dive into the SUSE/rmt Ruby codebase to identify and map out the exact critical path for server registration (e.g., controllers, services, database interactions).
    2. Develop in Rust: Implement a functionally equivalent version of this registration logic in Rust.
    3. Integrate: Explore and implement a method for Ruby/Rust integration to "hot-wire" the new Rust module into the RMT application. This may involve using FFI, or libraries like rb-sys or magnus.
    4. Benchmark: Create a benchmarking script (e.g., using k6, ab, or a custom tool) that simulates the high-concurrency registration load from thousands of clients; a minimal custom-tool sketch follows this list.
    5. Compare & Present: Conduct a comparative performance analysis (requests per second, latency, success/error rates, CPU/memory usage) between the original Ruby path and the new Rust path. The deliverable will be this data and a summary of the findings.
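
    For goal 4, the "custom tool" option could be as small as the asyncio sketch below. The base URL, endpoint path and payload are placeholder assumptions (the real registration path is exactly what goal 1 identifies), and it needs the third-party aiohttp package.

        #!/usr/bin/env python3
        """Minimal concurrent-registration load sketch (requires 'aiohttp').

        The endpoint and payload are placeholders; substitute the actual
        registration path identified while reading the RMT code base.
        """
        import asyncio
        import time

        import aiohttp

        BASE_URL = "http://localhost:4224"           # assumption: local RMT instance
        ENDPOINT = "/connect/subscriptions/systems"  # placeholder registration path
        CLIENTS = 1000


        async def register(session, i):
            payload = {"hostname": f"bench-client-{i}", "hwinfo": {"cpus": 2}}
            try:
                async with session.post(BASE_URL + ENDPOINT, json=payload) as resp:
                    return resp.status
            except aiohttp.ClientError:
                return None


        async def main():
            start = time.perf_counter()
            # aiohttp's default connection limit (100) throttles the burst a bit;
            # pass aiohttp.TCPConnector(limit=...) for a harsher test
            async with aiohttp.ClientSession() as session:
                results = await asyncio.gather(
                    *(register(session, i) for i in range(CLIENTS)))
            elapsed = time.perf_counter() - start
            ok = sum(1 for status in results if status is not None and status < 400)
            print(f"{CLIENTS} registrations in {elapsed:.1f}s, {ok} succeeded, "
                  f"{CLIENTS / elapsed:.0f} req/s")


        if __name__ == "__main__":
            asyncio.run(main())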

    Resources

    • RMT Source Code (Ruby):
      • https://github.com/SUSE/rmt
    • RMT Documentation:
      • https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-rmt.html
    • Tooling & Stacks:
      • RMT/Ruby development environment (for running the base RMT)
      • Rust development environment (rustup, cargo)
    • Potential Integration Libraries:
      • rb-sys: https://github.com/oxidize-rb/rb-sys
      • Magnus: https://github.com/matsadler/magnus
    • Benchmarking Tools:
      • k6 (https://k6.io/)
      • ab (ApacheBench)


    dynticks-testing: analyse perf / trace-cmd output and aggregate data by m.crivellari

    Description

    dynticks-testing is a project started years ago by Frederic Weisbecker. One of its features is to check the actual configuration (isolcpus, irqaffinity, etc.) and give feedback on it.

    An important goal of this tool is to parse the output of trace-cmd / perf and provide more readable data, showing the duration of every event grouped by PID (also showing the CPU number, whether the task has been migrated, etc.).

    An example of data captured on my laptop (incomplete!!):

              -0     [005] dN.2. 20310.270699: sched_wakeup:         WaylandProxy:46380 [120] CPU:005
              -0     [005] d..2. 20310.270702: sched_switch:         swapper/5:0 [120] R ==> WaylandProxy:46380 [120]
    ...
        WaylandProxy-46380 [004] d..2. 20310.295397: sched_switch:         WaylandProxy:46380 [120] S ==> swapper/4:0 [120]
              -0     [006] d..2. 20310.295397: sched_switch:         swapper/6:0 [120] R ==> firefox:46373 [120]
             firefox-46373 [006] d..2. 20310.295408: sched_switch:         firefox:46373 [120] S ==> swapper/6:0 [120]
              -0     [004] dN.2. 20310.295466: sched_wakeup:         WaylandProxy:46380 [120] CPU:004
    

    Output of noise_parse.py:

    Task: WaylandProxy Pid: 46380 cpus: {4, 5} (Migrated!!!)
            Wakeup Latency                                Nr:        24     Duration:          89
            Sched switch: kworker/12:2                    Nr:         1     Duration:           6
    

    My first contribution was around Nov. 2024!
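
    The real parsing lives in noise_parse.py in the dynticks-testing repository; purely as an illustration of the aggregation described above, a stripped-down sketch that pairs sched_switch events per task and sums up on-CPU time (reading the trace from stdin, and assuming the line format shown in the example) could look like this:

        #!/usr/bin/env python3
        """Toy aggregation of sched_switch lines: per-task on-CPU time and CPU set."""
        import re
        import sys
        from collections import defaultdict

        LINE = re.compile(
            r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>[\d.]+):\s+sched_switch:\s+"
            r"(?P<prev>\S+):(?P<prev_pid>\d+)\s+\[\d+\]\s+\S+\s+==>\s+"
            r"(?P<next>\S+):(?P<next_pid>\d+)")

        switched_in = {}              # pid -> timestamp of the last switch-in
        runtime = defaultdict(float)  # pid -> accumulated on-CPU seconds
        cpus = defaultdict(set)       # pid -> set of CPUs the task ran on
        names = {}                    # pid -> task name

        for line in sys.stdin:
            m = LINE.search(line)
            if not m:
                continue
            ts, cpu = float(m["ts"]), int(m["cpu"])
            prev_pid, next_pid = int(m["prev_pid"]), int(m["next_pid"])
            # the next task starts running now
            switched_in[next_pid] = ts
            cpus[next_pid].add(cpu)
            names[next_pid] = m["next"]
            # the previous task stops running; account its slice if we saw it start
            if prev_pid in switched_in:
                runtime[prev_pid] += ts - switched_in.pop(prev_pid)

        for pid, secs in sorted(runtime.items(), key=lambda kv: -kv[1]):
            migrated = " (Migrated!!!)" if len(cpus[pid]) > 1 else ""
            print(f"Task: {names.get(pid, '?')} Pid: {pid} "
                  f"cpus: {sorted(cpus[pid])}{migrated} on-CPU: {secs * 1000:.3f} ms")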

    Goals

    • Add more features (e.g. cpuset)
    • Test / bugfix

    Resources

    Progresses

    isolcpus and cpuset support implemented and merged into master: dynticks-testing.git commit