Project Description

For a few months now, the openSUSE community has had the ability to host the openSUSE RPM repositories on a commercial CDN, and the rollout is slowly progressing. However, there are still potential bottlenecks and optimisation opportunities remaining. My goal for this Hack Week is to investigate them and make reasonable progress on resolving them.

Topics that are in scope and being investigated:

  • Switch repository metadata from gzip to the more modern zstd (faster to decompress, smaller sizes); see the benchmark sketch after this list
  • Investigate zypper ref performance overheads
  • Understand zsync and zchunk and benchmark their tradeoffs
  • Leverage the CDN for delivering repository metadata and a mirrorlist, so that round trips to download.o.org can be reduced for users outside Europe
  • Investigate the performance of HTTP/3 to see whether it would benefit us
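
To get a first feel for the gzip vs. zstd tradeoff mentioned in the first item, a small benchmark along the following lines could be used. It is only a sketch: it assumes the same metadata file has already been downloaded in both formats (the file names are placeholders) and that the third-party zstandard Python module (python3-zstandard) is installed.

    #!/usr/bin/env python3
    """Rough gzip vs. zstd decompression timing for repository metadata.

    Expects the same metadata downloaded in both formats, e.g. primary.xml.gz
    and primary.xml.zst from a repository's repodata/ directory.
    """
    import gzip
    import io
    import sys
    import time

    import zstandard  # third-party, packaged as python3-zstandard


    def best_of(func, data, repeat=5):
        # return the fastest of a few runs to smooth out noise
        times = []
        for _ in range(repeat):
            start = time.perf_counter()
            func(data)
            times.append(time.perf_counter() - start)
        return min(times)


    def unzstd(data):
        # stream_reader also copes with frames that do not store the content size
        return zstandard.ZstdDecompressor().stream_reader(io.BytesIO(data)).read()


    if __name__ == "__main__":
        gz_path, zst_path = sys.argv[1], sys.argv[2]
        for label, path, func in (("gzip", gz_path, gzip.decompress),
                                  ("zstd", zst_path, unzstd)):
            data = open(path, "rb").read()
            ms = best_of(func, data) * 1000
            print(f"{label}: {len(data):>10} bytes compressed, "
                  f"best decompression {ms:7.1f} ms")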

Goal for this Hackweek

Focus on openSUSE Tumbleweed

Looking for hackers with the skills:

opensuse cdn zypper rpm performance benchmark

This project is part of:

Hack Week 23

Activity

  • about 2 years ago: mlschroe joined this project.
  • about 2 years ago: favogt joined this project.
  • about 2 years ago: favogt liked this project.
  • about 2 years ago: dirkmueller added keyword "cdn" to this project.
  • about 2 years ago: dirkmueller added keyword "zypper" to this project.
  • about 2 years ago: dirkmueller added keyword "rpm" to this project.
  • about 2 years ago: dirkmueller added keyword "performance" to this project.
  • about 2 years ago: dirkmueller added keyword "benchmark" to this project.
  • about 2 years ago: dirkmueller added keyword "opensuse" to this project.
  • about 2 years ago: dirkmueller started this project.
  • about 2 years ago: dirkmueller originated this project.

  • Comments

    • dirkmueller
      about 2 years ago by dirkmueller

      It turns out that createrepo_c was already preparing the switch to Zstd and Zchunk, so the bulk of the work has been to fix various places in the Open Build Service and product building logic to handle that. The submissions are planned to go live on Nov 13th.

      Overall this provides a 10-30% reduction in download size for repository data and shaves a few hundred milliseconds off decompression time.
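
      Once this is live, one quick way to check what a repository actually publishes is to look at its repomd.xml index. The snippet below is only an illustrative sketch (the repository URL is just an example); it lists every metadata entry with its file name, so .zst or zchunk entries are easy to spot.

        #!/usr/bin/env python3
        """List the metadata files a repository publishes (type, href, size)."""
        import urllib.request
        import xml.etree.ElementTree as ET

        # example repository; point this at whichever repo you want to inspect
        REPO = "https://download.opensuse.org/tumbleweed/repo/oss"
        NS = {"repo": "http://linux.duke.edu/metadata/repo"}

        with urllib.request.urlopen(f"{REPO}/repodata/repomd.xml") as resp:
            root = ET.fromstring(resp.read())

        for data in root.findall("repo:data", NS):
            href = data.find("repo:location", NS).get("href")
            size = data.find("repo:size", NS)
            suffix = f"  ({size.text} bytes)" if size is not None else ""
            print(f"{data.get('type'):20s} {href}{suffix}")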

    • dirkmueller
      about 2 years ago by dirkmueller

      @mlschroe made a patch to libsolv that removes ~500 ms from the parsing time for the Tumbleweed repositories: https://github.com/openSUSE/libsolv/commit/23cbed3219bd07b5c3fa1ed8a6f2fa6c478c0fdb

    • dirkmueller
      about 2 years ago by dirkmueller

      I've spent several days profiling and tuning zchunk support for the Tumbleweed use case (updating from Tumbleweed snapshots) and made some upstream contributions to createrepo_c and zchunk to allow for these tunings.

    • dirkmueller
      about 2 years ago by dirkmueller

      @mlschroe worked on reviving zsync support in libzypp's default multicurl implementation.

    • dirkmueller
      about 2 years ago by dirkmueller

      I've submitted the necessary changes to enable curl with HTTP/3 support. However, this requires either switching to GnuTLS (which the security team doesn't like) or including the quictls patches for OpenSSL. I submitted the latter and am waiting for maintainer review.
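
      For a rough before/after comparison once an HTTP/3-capable curl is installed, something like the sketch below could be used. It simply shells out to curl; the URL is only an example, and the --http3 flag only works with a curl build that actually has HTTP/3 support.

        #!/usr/bin/env python3
        """Compare total transfer time with and without curl's --http3 flag."""
        import subprocess

        # example URL; any file served by the CDN / mirror under test will do
        URL = "https://download.opensuse.org/tumbleweed/repo/oss/repodata/repomd.xml"


        def fetch(extra_args):
            # %{http_version} and %{time_total} are standard curl --write-out variables
            result = subprocess.run(
                ["curl", "-s", "-o", "/dev/null",
                 "-w", "%{http_version} %{time_total}", *extra_args, URL],
                capture_output=True, text=True, check=True)
            return result.stdout


        print("default:", fetch([]))
        print("http3:  ", fetch(["--http3"]))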

    Similar Projects

    Create openSUSE images for Arm and RISC-V boards by avicenzi

    Project Description

    Create openSUSE images (or test generic EFI images) for Arm and RISC-V boards that are not yet supported.

    Goal for Hackweek

    Create bootable images of Tumbleweed for SBCs that currently have no images available or are untested.

    Consider generic EFI images where possible, as some boards can hold a bootloader.

    Document in the openSUSE Wiki how to flash and use the image for a given board.

    Hack Week 25

    Hack Week 24

    Hack Week 23

    Hack Week 22

    Hack Week 21

    Resources


    File search subcommand for zypper by mook_work

    Description

    Zypper currently only supports searching for files in a few pre-defined prefixes (/usr/bin, /usr/sbin, etc.) by using rpm Provides. This means that it is difficult to find files that are not in that explicit list.

    Doing this in zypper itself seems difficult.

    Goals

    Create a zypper subcommand (plugin) that can download file lists as needed to search for file contents. As a stretch goal, have a secondary subcommand that can list files in packages regardless of whether they are already installed.

    At this point, making the resulting code usable as part of zypper itself is not a goal, as it would probably take more time than available.
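
    For context, zypper subcommands are standalone executables named zypper-<name> that zypper finds in its subcommand directory or on $PATH (see the SUBCOMMANDS link under Resources). A possible plugin skeleton is sketched below; the subcommand name and the idea of grepping an already-downloaded filelists.xml.gz (instead of fetching file lists on demand) are placeholder assumptions for illustration, not the project's actual design.

        #!/usr/bin/env python3
        """Hypothetical 'zypper search-file' subcommand skeleton.

        Installed as an executable named zypper-search-file on $PATH, zypper
        would run it for 'zypper search-file <filelists.xml.gz> <regex>'.
        Searching a pre-downloaded filelists.xml.gz is only a stand-in for the
        real "download file lists as needed" logic.
        """
        import gzip
        import re
        import sys
        import xml.etree.ElementTree as ET

        NS = "{http://linux.duke.edu/metadata/filelists}"


        def search(filelists_path, pattern):
            rx = re.compile(pattern)
            with gzip.open(filelists_path, "rb") as f:
                # iterparse keeps memory usage low for the large filelists document
                for _, elem in ET.iterparse(f):
                    if elem.tag == NS + "package":
                        for entry in elem.findall(NS + "file"):
                            if rx.search(entry.text or ""):
                                print(f"{elem.get('name')}: {entry.text}")
                        elem.clear()  # free the processed package element


        if __name__ == "__main__":
            if len(sys.argv) != 3:
                sys.exit("usage: zypper search-file <filelists.xml.gz> <regex>")
            search(sys.argv[1], sys.argv[2])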

    Resources

    Upstream issue: https://github.com/openSUSE/zypper/issues/469

    Stretch goal: https://github.com/openSUSE/zypper/issues/164

    Subcommands: https://manpages.opensuse.org/Tumbleweed/zypper/zypper.8.en.html#SUBCOMMANDS


    Kudos aka openSUSE Recognition Platform by lkocman

    Description

    Relevant blog post at news-o-o

    I started the Kudos application shortly after Leap 16.0 to create a simple, friendly way to recognize people for their work and contributions to openSUSE. There's so much more to our community than just submitting requests in OBS or Gitea: we have translations (not only in Weblate), wiki edits, forum and social media moderation, infrastructure maintenance, booth participation, talks, manual testing, openQA test suites, and more!

    Goals

    • Kudos under github.com/openSUSE/kudos with build previews (via Netlify)

    • Have a kudos.opensuse.org instance running in production

    • Build an easy-to-contribute recognition platform for the openSUSE community: a place where everyone can send and receive appreciation for their work, across all areas of contribution.

    • In the future, we could even explore reward options such as vouchers for t-shirts or other community swag, small tokens of appreciation to make recognition more tangible.

    Resources

    (Do not create new badge requests during hackweek, unless you'll make the badge during hackweek)


    A CLI for Harvester by mohamed.belgaied

    Harvester does not officially come with a CLI tool; the user is supposed to interact with Harvester mostly through the UI. Though it is theoretically possible to use kubectl to interact with Harvester, manipulating KubeVirt YAML objects is absolutely not user friendly. Inspired by tools like Multipass from Canonical, which make it easy to rapidly create one or multiple VMs, I began the development of Harvester CLI. It currently works, but Harvester CLI needs some love to be up to date with Harvester v1.0.2, and needs some bug fixes and improvements as well.

    Project Description

    Harvester CLI is a command line interface tool written in Go, designed to simplify interfacing with a Harvester cluster as a user. It is especially useful for testing purposes as you can easily and rapidly create VMs in Harvester by providing a simple command such as: harvester vm create my-vm --count 5 to create 5 VMs named my-vm-01 to my-vm-05.


    Harvester CLI is functional but needs a number of improvements: bringing functionality up to date with Harvester v1.0.2 (some minor issues right now), modifying the default behaviour to create an openSUSE VM instead of an Ubuntu VM, fixing some bugs, etc.

    Github Repo for Harvester CLI: https://github.com/belgaied2/harvester-cli

    Done in previous Hackweeks

    • Create a Github actions pipeline to automatically integrate Harvester CLI to Homebrew repositories: DONE
    • Automatically package Harvester CLI for openSUSE / Red Hat RPMs or DEBs: DONE

    Goal for this Hackweek

    The goal for this Hackweek is to bring Harvester CLI up-to-speed with latest Harvester versions (v1.3.X and v1.4.X), and improve the code quality as well as implement some simple features and bug fixes.

    Some nice additions might be:

    • Improve handling of namespaced objects
    • Add features, such as network management or Load Balancer creation?
    • Add more unit tests and, why not, e2e tests
    • Improve CI
    • Improve the overall code quality
    • Test the program and create issues for it

    Issue list is here: https://github.com/belgaied2/harvester-cli/issues

    Resources

    The project is written in Go and uses client-go, the Kubernetes Go client library, to communicate with the Harvester API (which is, in fact, Kubernetes). Welcome contributions are:

    • Testing it and creating issues
    • Documentation
    • Go code improvement

    What you might learn

    Harvester CLI might be interesting to you if you want to learn more about:

    • GitHub Actions
    • Harvester as a SUSE Product
    • Go programming language
    • Kubernetes API
    • Kubevirt API objects (Manipulating VMs and VM Configuration in Kubernetes using Kubevirt)


    Create a page with all devel:languages:perl packages and their versions by tinita

    Description

    Perl projects now live in git: https://src.opensuse.org/perl

    It would be useful to have an easy way to check which version of which Perl module is in devel:languages:perl. We also have meta overrides and patches for various modules, and it would be good to have them in a central place, so they are easier to look up and we can share them with other vendors.

    I did some initial data dump here a while ago: https://github.com/perlpunk/cpan-meta

    But I never had the time to automate this.

    I can also use the data to check if there are necessary updates (currently it uses data from download.opensuse.org, so there is some delay and it depends on building).

    Goals

    • Have a script that updates a central repository (e.g. https://src.opensuse.org/perl/_metadata) with metadata by looking at https://src.opensuse.org/perl/_ObsPrj (check if there are any changes from the last run)
    • Create an HTML page with the list of packages (use JavaScript and some table library to make it easily searchable)

    Resources

    Results

    Day 1

    Day 2

    • The HTML page now has links to src.opensuse.org and the date of the last update, plus a short info at the top
    • Code is now 100% covered by tests: https://app.codecov.io/gh/perlpunk/opensuse-perl-meta
    • I used the modern perl class feature, which makes perl classes even nicer and shorter. See example
    • Tests
      • I tried out the mocking feature of the modern Test2::V0 library which provides call tracking. See example
      • I tried out comparing data structures with the new Test2::V0 library. It lets you compare parts of the structure with the like function, which only compares the data that is mentioned in the expected data. example

    Day 3

    • Added various things to the table
      • Dependencies column
      • Show popup with info for cpanspec, patches and dependencies
      • Added last date / commit to the data export.

    Plan: With the added date / commit we can now daily check _ObsPrj for changes and only fetch the data for changed packages.

    Day 4


    Switch software-o-o to store repomd in a database by hennevogel

    Description

    The openSUSE Software portal is a web app to explore binary packages of openSUSE distributions. Kind of like a package manager / app store.

    https://software.opensuse.org/

    This app has been around forever (August 2007) and its architecture is a bit brittle. It acts as a frontend to the OBS distributions and published binary search APIs, calculates and caches a lot of stuff in memory, and needs code changes nearly every openSUSE release to keep up.

    As you can imagine, it's a heavy user of the OBS API, especially when caches are cold.

    Goals

    I want to change the app to cache repomd data in a (Postgres) database structure; a schema sketch follows this list:

    • Distributions have many Repositories
    • Repositories have many Packages
    • Packages have many Patches
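
    Purely to illustrate these relations (the real app would use Rails models on Postgres; table, column and sample values below are made up), a throwaway sketch with Python's built-in sqlite3 could look like this:

        #!/usr/bin/env python3
        """Illustration of the Distribution -> Repository -> Package -> Patch model."""
        import sqlite3

        db = sqlite3.connect(":memory:")
        db.executescript("""
        CREATE TABLE distributions (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE repositories  (id INTEGER PRIMARY KEY, name TEXT,
                                    distribution_id INTEGER REFERENCES distributions(id));
        CREATE TABLE packages      (id INTEGER PRIMARY KEY, name TEXT, version TEXT,
                                    repository_id INTEGER REFERENCES repositories(id));
        CREATE TABLE patches       (id INTEGER PRIMARY KEY, name TEXT,
                                    package_id INTEGER REFERENCES packages(id));

        INSERT INTO distributions VALUES (1, 'openSUSE Tumbleweed');
        INSERT INTO repositories  VALUES (1, 'repo-oss', 1);
        INSERT INTO packages      VALUES (1, 'vim', '9.1', 1);
        """)

        # "search for a Package of a Distribution in its Repositories" stays in the DB
        rows = db.execute("""
            SELECT d.name, r.name, p.name, p.version
              FROM packages p
              JOIN repositories r  ON p.repository_id   = r.id
              JOIN distributions d ON r.distribution_id = d.id
             WHERE p.name LIKE ?""", ("vim%",)).fetchall()
        print(rows)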

    The UI workflows will be as follows:

    • As an admin I set up a Distribution and its Repositories
    • As an admin I sync all Repositories' repomd files into the database
    • As a user I browse a Distribution by category
    • As a user I search for a Package of a Distribution in its Repositories
    • As a user I extend the search to Packages built on OBS for this Distribution

    This has a couple of pros:

    • Less traffic on the OBS API as the usual Packages are inside the database
    • Easier base for adding features to this page, like comments, ratings, openSUSE-specific screenshots, etc.
    • Separating the Distribution package search from searching through OBS will hopefully make it clearer for newbies that enabling extra repositories is kind of dangerous.

    And one con:

    • You can't search for packages built for foreign distributions with this app anymore (although we could consume their repomd etc., but I doubt we have that audience on an opensuse.org domain...)

    TODO

    • Introduce a PG database
    • Add clockworkd as scheduler and delayed_job as ActiveJob backend
    • Introduce ActiveStorage
    • Build initial data model
    • Introduce repomd to database sync
      • Adapt repomd sync to Leap 16.0 repomd layout changes (single arch, no update repo)
      • Make repomd sync idempotent
    • Introduce database search
    • Set up foreman to run rails s and rake jobs:workoff
    • Adapt UI
      • Build Category Browsing
      • Build Admin Distribution CRUD interface


    RMT.rs: High-Performance Registration Path for RMT using Rust by gbasso

    Description

    The SUSE Repository Mirroring Tool (RMT) is a critical component for managing software updates and subscriptions, especially for our Public Cloud Team (PCT). In a cloud environment, hundreds or even thousands of new SUSE instances (VPS/EC2) can be provisioned simultaneously. Each new instance attempts to register against an RMT server, creating a "thundering herd" scenario.

    We have observed that the current RMT server, written in Ruby, faces performance issues under this high-concurrency registration load. This can lead to request overhead, slow registration times, and outright registration failures, delaying the readiness of new cloud instances.

    This Hackweek project aims to explore a solution by re-implementing the performance-critical registration path in Rust. The goal is to leverage Rust's high performance, memory safety, and first-class concurrency handling to create an alternative registration endpoint that is fast, reliable, and can gracefully manage massive, simultaneous request spikes.

    The new Rust module will be integrated into the existing RMT Ruby application, allowing us to directly compare the performance of both implementations.

    Goals

    The primary objective is to build and benchmark a high-performance Rust-based alternative for the RMT server registration endpoint.

    Key goals for the week:

    1. Analyze & Identify: Dive into the SUSE/rmt Ruby codebase to identify and map out the exact critical path for server registration (e.g., controllers, services, database interactions).
    2. Develop in Rust: Implement a functionally equivalent version of this registration logic in Rust.
    3. Integrate: Explore and implement a method for Ruby/Rust integration to "hot-wire" the new Rust module into the RMT application. This may involve using FFI, or libraries like rb-sys or magnus.
    4. Benchmark: Create a benchmarking script (e.g., using k6, ab, or a custom tool) that simulates the high-concurrency registration load from thousands of clients; a minimal custom-tool sketch follows this list.
    5. Compare & Present: Conduct a comparative performance analysis (requests per second, latency, success/error rates, CPU/memory usage) between the original Ruby path and the new Rust path. The deliverable will be this data and a summary of the findings.
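
    For goal 4, the "custom tool" option could be as small as the asyncio sketch below. The base URL, endpoint path and payload are placeholder assumptions (the real registration path is exactly what goal 1 identifies), and it needs the third-party aiohttp package.

        #!/usr/bin/env python3
        """Minimal concurrent-registration load sketch (requires 'aiohttp').

        The endpoint and payload are placeholders; substitute the actual
        registration path identified while reading the RMT code base.
        """
        import asyncio
        import time

        import aiohttp

        BASE_URL = "http://localhost:4224"           # assumption: local RMT instance
        ENDPOINT = "/connect/subscriptions/systems"  # placeholder registration path
        CLIENTS = 1000


        async def register(session, i):
            payload = {"hostname": f"bench-client-{i}", "hwinfo": {"cpus": 2}}
            try:
                async with session.post(BASE_URL + ENDPOINT, json=payload) as resp:
                    return resp.status
            except aiohttp.ClientError:
                return None


        async def main():
            start = time.perf_counter()
            # aiohttp's default connection limit (100) throttles the burst a bit;
            # pass aiohttp.TCPConnector(limit=...) for a harsher test
            async with aiohttp.ClientSession() as session:
                results = await asyncio.gather(
                    *(register(session, i) for i in range(CLIENTS)))
            elapsed = time.perf_counter() - start
            ok = sum(1 for status in results if status is not None and status < 400)
            print(f"{CLIENTS} registrations in {elapsed:.1f}s, {ok} succeeded, "
                  f"{CLIENTS / elapsed:.0f} req/s")


        if __name__ == "__main__":
            asyncio.run(main())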

    Resources

    • RMT Source Code (Ruby):
      • https://github.com/SUSE/rmt
    • RMT Documentation:
      • https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-rmt.html
    • Tooling & Stacks:
      • RMT/Ruby development environment (for running the base RMT)
      • Rust development environment (rustup, cargo)
    • Potential Integration Libraries:
      • rb-sys: https://github.com/oxidize-rb/rb-sys
      • Magnus: https://github.com/matsadler/magnus
    • Benchmarking Tools:
      • k6 (https://k6.io/)
      • ab (ApacheBench)


    dynticks-testing: analyse perf / trace-cmd output and aggregate data by m.crivellari

    Description

    dynticks-testing is a project started years ago by Frederic Weisbecker. One of its features is to check the actual configuration (isolcpus, irqaffinity, etc.) and give feedback on it.

    An important goal of this tool is to parse the output of trace-cmd / perf and provide more readable data, showing the duration of every event grouped by PID (also showing the CPU number, whether the task has been migrated, etc.).

    An example of data captured on my laptop (incomplete!!):

              -0     [005] dN.2. 20310.270699: sched_wakeup:         WaylandProxy:46380 [120] CPU:005
              -0     [005] d..2. 20310.270702: sched_switch:         swapper/5:0 [120] R ==> WaylandProxy:46380 [120]
    ...
        WaylandProxy-46380 [004] d..2. 20310.295397: sched_switch:         WaylandProxy:46380 [120] S ==> swapper/4:0 [120]
              -0     [006] d..2. 20310.295397: sched_switch:         swapper/6:0 [120] R ==> firefox:46373 [120]
             firefox-46373 [006] d..2. 20310.295408: sched_switch:         firefox:46373 [120] S ==> swapper/6:0 [120]
              -0     [004] dN.2. 20310.295466: sched_wakeup:         WaylandProxy:46380 [120] CPU:004
    

    Output of noise_parse.py:

    Task: WaylandProxy Pid: 46380 cpus: {4, 5} (Migrated!!!)
            Wakeup Latency                                Nr:        24     Duration:          89
            Sched switch: kworker/12:2                    Nr:         1     Duration:           6
    

    My first contribution was around Nov. 2024!
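
    The real parsing lives in noise_parse.py in the dynticks-testing repository; purely as an illustration of the aggregation described above, a stripped-down sketch that pairs sched_switch events per task and sums up on-CPU time (reading the trace from stdin, and assuming the line format shown in the example) could look like this:

        #!/usr/bin/env python3
        """Toy aggregation of sched_switch lines: per-task on-CPU time and CPU set."""
        import re
        import sys
        from collections import defaultdict

        LINE = re.compile(
            r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>[\d.]+):\s+sched_switch:\s+"
            r"(?P<prev>\S+):(?P<prev_pid>\d+)\s+\[\d+\]\s+\S+\s+==>\s+"
            r"(?P<next>\S+):(?P<next_pid>\d+)")

        switched_in = {}              # pid -> timestamp of the last switch-in
        runtime = defaultdict(float)  # pid -> accumulated on-CPU seconds
        cpus = defaultdict(set)       # pid -> set of CPUs the task ran on
        names = {}                    # pid -> task name

        for line in sys.stdin:
            m = LINE.search(line)
            if not m:
                continue
            ts, cpu = float(m["ts"]), int(m["cpu"])
            prev_pid, next_pid = int(m["prev_pid"]), int(m["next_pid"])
            # the next task starts running now
            switched_in[next_pid] = ts
            cpus[next_pid].add(cpu)
            names[next_pid] = m["next"]
            # the previous task stops running; account its slice if we saw it start
            if prev_pid in switched_in:
                runtime[prev_pid] += ts - switched_in.pop(prev_pid)

        for pid, secs in sorted(runtime.items(), key=lambda kv: -kv[1]):
            migrated = " (Migrated!!!)" if len(cpus[pid]) > 1 else ""
            print(f"Task: {names.get(pid, '?')} Pid: {pid} "
                  f"cpus: {sorted(cpus[pid])}{migrated} on-CPU: {secs * 1000:.3f} ms")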

    Goals

    • Add more features (e.g. cpuset)
    • Test / bugfix

    Resources

    Progresses

    isolcpus and cpuset support implemented and merged into master: dynticks-testing.git commit