SUSE Hack Week: Git stochastic bisection

Bisection is a well-known method for locating the commit that caused a regression in a code repository, and git-bisect is a widely used tool for this in git repositories. However, the failure is often probabilistic in nature: either we lack a reliable reproducer, so failing to reproduce a problem on a particular commit does not mean the problem is absent there, or the measurement is inherently variable, as with performance regressions. Bisection of such failures is problematic because a single false result is enough to send the bisection into an unrelated part of the code history. In these cases we usually have to greatly extend the runtime of the reproducer, or do multiple test runs or multiple bisection runs, to minimize the chance of error.

The aim of the project is to implement stochastic bisection for git, i.e., a method that accounts for the fact that test results at each point of the code history have some error rate, and that suggests which points to test so that the commit that introduced the regression is found with high probability in the smallest possible number of tests. We can then use this method to bisect performance problems in our performance testing grid, Marvin.

Goals for this Hackweek:

research state of the art in stochastic problem finding (it is a method used in various fields of engineering)
design algorithm computing next point in history to test given previous test results and their confidence
research how git-bisect works internally
integrate the algorithm with git-bisect
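
A rough Python sketch of what "computing the next point in history to test" could look like. It is not the project's actual algorithm: it keeps a probability for each commit being the first bad one, updates it with Bayes' rule after every test, and tests next where the cumulative probability is closest to one half. The function names and the error rates (p_fail_bad, p_fail_good) are assumptions made for illustration, as is the simplification that the last commit in the range is known to contain the regression.

```python
import random


def update(posterior, tested, failed, p_fail_bad=0.7, p_fail_good=0.05):
    """Bayes update of P(first bad commit == b) after one test at index `tested`.

    p_fail_bad / p_fail_good are assumed probabilities that the test fails
    when the regression is / is not present in the tested commit.
    """
    weighted = []
    for b, prior in enumerate(posterior):
        regression_present = tested >= b
        p_fail = p_fail_bad if regression_present else p_fail_good
        likelihood = p_fail if failed else 1.0 - p_fail
        weighted.append(prior * likelihood)
    total = sum(weighted)
    return [w / total for w in weighted]


def next_commit(posterior):
    """Pick the commit where the cumulative posterior mass is closest to 1/2."""
    best, best_gap, cumulative = 0, float("inf"), 0.0
    for i, p in enumerate(posterior):
        cumulative += p  # probability that commit i already has the regression
        gap = abs(cumulative - 0.5)
        if gap < best_gap:
            best, best_gap = i, gap
    return best


def bisect(run_test, n_commits, confidence=0.95, max_tests=1000):
    """Test commits until one is the first bad commit with the given confidence.

    run_test(i) returns True if the test failed on commit i.
    Returns (most likely first bad commit, number of tests run).
    """
    posterior = [1.0 / n_commits] * n_commits
    culprit = 0
    for tests in range(1, max_tests + 1):
        i = next_commit(posterior)
        posterior = update(posterior, i, run_test(i))
        culprit = max(range(n_commits), key=posterior.__getitem__)
        if posterior[culprit] >= confidence:
            return culprit, tests
    return culprit, max_tests


if __name__ == "__main__":
    rng = random.Random(0)
    true_culprit = 37  # hypothetical ground truth for the simulation

    def flaky_test(i):
        # Simulated reproducer with the same error rates the model assumes.
        return rng.random() < (0.7 if i >= true_culprit else 0.05)

    print(bisect(flaky_test, 100))
```

Testing at the posterior median is the simplest rule. With strongly asymmetric error rates, choosing the commit with the largest expected information gain may need fewer tests, at the cost of a more expensive selection step.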


Looking for hackers with the skills:

git bisection performance statistics

This project is part of:

Hack Week 20

Activity

almost 5 years ago: jankara started this project.

almost 5 years ago: jankara originated this project.

Similar Projects

git

Mail client with mailing list workflow support in Rust by acervesato

Description

The goal is to create a mail user interface in Rust that supports the mailing-list patch workflow. I know, aerc is already there, but I would like to create something simpler, without integrated protocols: just a plain user interface that uses some crates to read and create emails, which are fetched and sent via external tools.

I already know Rust, but not the async support, which is needed in this case in order to handle events inside the mail folder and to send notifications.

Goals

simple user interface in the style of aerc, with some vim keybindings for motions and search
automatically run external tools (like mbsync) to check for new emails
automatically run commands for notifications
apply patch set from ML
tree-sitter support with styles

Resources

ratatui: user interface (https://ratatui.rs/)
notify: folder watcher (https://docs.rs/notify/latest/notify/)
mail-parser: parser for emails (https://crates.io/crates/mail-parser)
mail-builder: create emails in proper format (https://docs.rs/mail-builder/latest/mail_builder/)
gitpatch: ML support (https://crates.io/crates/gitpatch)
tree-sitter-rust: support for mail format (https://crates.io/crates/tree-sitter)

Create a page with all devel:languages:perl packages and their versions by tinita

Description

Perl projects now live in git: https://src.opensuse.org/perl

It would be useful to have an easy way to check which version of which perl module is in devel:languages:perl. Also, we have meta overrides and patches for various modules, and it would be good to have them in a central place, so that they are easier to look up and we can share them with other vendors.

I did some initial data dump here a while ago: https://github.com/perlpunk/cpan-meta

But I never had the time to automate this.

I can also use the data to check if there are necessary updates (currently it uses data from download.opensuse.org, so there is some delay and it depends on building).

Goals

Have a script that updates a central repository (e.g. https://src.opensuse.org/perl/_metadata) with metadata by looking at https://src.opensuse.org/perl/_ObsPrj (check if there are any changes from the last run)
Create an HTML page with the list of packages (use JavaScript and some table library to make it easily searchable)

Resources

Results

Day 1

First part of the code which retrieves data from https://src.opensuse.org/perl/_ObsPrj with submodules and creates a YAML and a JSON file.
Repo: https://github.com/perlpunk/opensuse-perl-meta
Also a first version of the HTML is live: https://perlpunk.github.io/opensuse-perl-meta/

Day 2

The HTML page now has links to src.opensuse.org and the date of the last update, plus a short info text at the top
Code is now 100% covered by tests: https://app.codecov.io/gh/perlpunk/opensuse-perl-meta
I used the modern perl class feature, which makes perl classes even nicer and shorter. See example
Tests
- I tried out the mocking feature of the modern Test2::V0 library which provides call tracking. See example
- I tried out comparing data structures with the new Test2::V0 library. It lets you compare parts of the structure with the like function, which only compares the data that is mentioned in the expected data. example

Day 3

Added various things to the table
- Dependencies column
- Show popup with info for cpanspec, patches and dependencies
- Added last date / commit to the data export.

Plan: With the added date / commit we can now daily check _ObsPrj for changes and only fetch the data for changed packages.
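
The incremental check in this plan could be sketched as follows. The project itself is written in Perl; Python is used here only for illustration, and the data shape (a mapping from package name to commit), the function name and the sample values are assumptions, not the project's actual export format.

```python
def changed_packages(previous, current):
    """Compare {package: commit} maps from the last run and the current run.

    Returns (to_fetch, removed): packages that are new or whose commit
    changed, and packages that disappeared since the last run.
    """
    to_fetch = sorted(
        name for name, commit in current.items() if previous.get(name) != commit
    )
    removed = sorted(name for name in previous if name not in current)
    return to_fetch, removed


if __name__ == "__main__":
    # Made-up package names and commit ids, for illustration only.
    last_run = {"perl-Foo": "aaa111", "perl-Bar": "bbb222", "perl-Old": "ccc333"}
    this_run = {"perl-Foo": "aaa111", "perl-Bar": "ddd444", "perl-New": "eee555"}
    print(changed_packages(last_run, this_run))
```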

Day 4

go-git: unlocking SHA256-based repository cloning ahead of git v3 by pgomes

Description

The go-git library implements the git internals in pure Go, so that any Go application can handle not only Git repositories, but also lower-level primitives (e.g. packfiles, idxfiles, etc) without needing to shell out to the git binary.

The focus for this Hackweek is to fast track key improvements for the project ahead of the upstream release of Git V3, which may take place at some point next year.

Goals

Add support for cloning SHA256 repositories.
Decrease memory churn for very large repositories (e.g. Linux Kernel repository).
Cut the first alpha version for go-git/v6.

Stretch goals

Review and update the official documentation.
Optimise use of go-git in Fleet.
Create RFC/example for go-git plugins to improve extensibility.
Investigate performance bottlenecks for Blame and Status.

Resources

https://github.com/go-git/go-git/
https://go-git.github.io/docs/

git-fs: file system representation of a git repository by fgonzalez

Description

This project aims to create a Linux equivalent to the git/fs concept from git9. Now, I'm aware that git provides worktrees, but they are not enough for many use cases. Having a read-only representation of the whole repository simplifies scripting by quite a bit and, most importantly, reduces disk space usage. For instance, during kernel livepatching development, we need to process and analyze the source code of hundreds of kernel versions simultaneously. This is rather painful with git-worktrees, as each kernel branch requires no less than 1G of disk space.

As for the technical details, I'll implement the file system using FUSE. The project itself should not take much time to complete, but let's see where it takes me.

I'll try to keep the same design as git9, so the file system will look something like:


/mnt/git
      +-- ctl
      +-- HEAD
      |    +-- tree
      |    |    +--files
      |    |    +--in
      |    |    +--head
      |    |
      |    +-- hash
      |    +-- msg
      |    +-- parent
      |
      +-- branch
      |      |
      |      +-- heads
      |      |      +-- master
      |      |            +-- [commit files, see HEAD]
      |      +-- remotes
      |             +-- origin
      |                     +-- master
      |                            +-- [commit files, see HEAD]
      +-- object
            +-- 00051fd3f066e8c05ae7d3cf61ee363073b9535f # blob contents
            +-- 00051fd3f066e8c05ae7d3cf61ee363073b9535c
                  +-- [tree contents, see HEAD/tree]
            +-- 3f5dbc97ae6caba9928843ec65fb3089b96c9283
                  +-- [commit files, see HEAD]

So, if you wanted to look at the commit message of the current branch, you could simply do:

cat /mnt/git/HEAD/msg

No collaboration needed. This is a solo project.

Goals

Implement a working prototype.
Measure and improve the performance if possible. This step will be the most crucial one. User space filesystems are slower by nature.

Resources

https://orib.dev/git9.html

https://docs.kernel.org/filesystems/fuse/fuse.html

performance

dynticks-testing: analyse perf / trace-cmd output and aggregate data by m.crivellari

Description

dynticks-testing is a project started years ago by Frederic Weisbecker. One of its features is to check the actual configuration (isolcpus, irqaffinity, etc.) and give feedback on it.

An important goal of this tool is to parse the output of trace-cmd / perf and provide more readable data, showing the duration of every event grouped by PID (also showing the CPU number, whether the task has been migrated, etc.).

An example of data captured on my laptop (incomplete!!):

          -0     [005] dN.2. 20310.270699: sched_wakeup:         WaylandProxy:46380 [120] CPU:005
          -0     [005] d..2. 20310.270702: sched_switch:         swapper/5:0 [120] R ==> WaylandProxy:46380 [120]
...
    WaylandProxy-46380 [004] d..2. 20310.295397: sched_switch:         WaylandProxy:46380 [120] S ==> swapper/4:0 [120]
          -0     [006] d..2. 20310.295397: sched_switch:         swapper/6:0 [120] R ==> firefox:46373 [120]
         firefox-46373 [006] d..2. 20310.295408: sched_switch:         firefox:46373 [120] S ==> swapper/6:0 [120]
          -0     [004] dN.2. 20310.295466: sched_wakeup:         WaylandProxy:46380 [120] CPU:004

Output of noise_parse.py:

Task: WaylandProxy Pid: 46380 cpus: {4, 5} (Migrated!!!)
        Wakeup Latency                                Nr:        24     Duration:          89
        Sched switch: kworker/12:2                    Nr:         1     Duration:           6
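
To illustrate the kind of aggregation involved, here is a minimal Python sketch that groups sched_switch events by PID. It is not the logic of noise_parse.py: the regular expression, the function name and the reported fields are assumptions, and it only sums the time between a task being switched in and switched out again. The sample reuses the two firefox lines from the trace above.

```python
import re
from collections import defaultdict

# Matches the CPU, timestamp and both sides of a sched_switch event.
SWITCH = re.compile(
    r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<sec>\d+)\.(?P<usec>\d{6}):\s+sched_switch:\s+"
    r"(?P<prev>.+?):(?P<prev_pid>\d+) \[\d+\] \S+ ==> "
    r"(?P<next>.+?):(?P<next_pid>\d+) \[\d+\]"
)

SAMPLE = [
    "          -0     [006] d..2. 20310.295397: sched_switch:         "
    "swapper/6:0 [120] R ==> firefox:46373 [120]",
    "         firefox-46373 [006] d..2. 20310.295408: sched_switch:         "
    "firefox:46373 [120] S ==> swapper/6:0 [120]",
]


def aggregate(lines):
    """Sum on-CPU time (microseconds) and collect CPUs per PID."""
    switched_in = {}  # pid -> timestamp (us) when the task last got a CPU
    stats = defaultdict(
        lambda: {"comm": "", "cpus": set(), "runs": 0, "on_cpu_us": 0}
    )
    for line in lines:
        m = SWITCH.search(line)
        if not m:
            continue  # not a sched_switch event
        now = int(m["sec"]) * 1_000_000 + int(m["usec"])
        cpu = int(m["cpu"])
        prev_pid, next_pid = int(m["prev_pid"]), int(m["next_pid"])
        # PID 0 is the per-CPU idle task (swapper); skip it.
        if prev_pid != 0:
            entry = stats[prev_pid]
            entry["comm"] = m["prev"]
            entry["cpus"].add(cpu)
            if prev_pid in switched_in:
                entry["on_cpu_us"] += now - switched_in.pop(prev_pid)
                entry["runs"] += 1
        if next_pid != 0:
            switched_in[next_pid] = now
            stats[next_pid]["comm"] = m["next"]
            stats[next_pid]["cpus"].add(cpu)
    return dict(stats)


if __name__ == "__main__":
    for pid, entry in aggregate(SAMPLE).items():
        migrated = " (Migrated!!!)" if len(entry["cpus"]) > 1 else ""
        print(f"Task: {entry['comm']} Pid: {pid} cpus: {entry['cpus']}{migrated}")
        print(f"        On CPU  Nr: {entry['runs']}  Duration: {entry['on_cpu_us']}")
```

Timestamps are converted to integer microseconds rather than floats so that differences between nearby events stay exact.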

My first contribution was around Nov. 2024!

Goals

add more features (e.g. cpuset)
test / bugfix

Resources

Frederic's public repository: https://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git/
https://docs.kernel.org/timers/no_hz.html#testing

Progress

isolcpus and cpusets implemented and merged in master: dynticks-testing.git commit

RMT.rs: High-Performance Registration Path for RMT using Rust by gbasso

Description

The SUSE Repository Mirroring Tool (RMT) is a critical component for managing software updates and subscriptions, especially for our Public Cloud Team (PCT). In a cloud environment, hundreds or even thousands of new SUSE instances (VPS/EC2) can be provisioned simultaneously. Each new instance attempts to register against an RMT server, creating a "thundering herd" scenario.

We have observed that the current RMT server, written in Ruby, faces performance issues under this high-concurrency registration load. This can lead to request overhead, slow registration times, and outright registration failures, delaying the readiness of new cloud instances.

This Hackweek project aims to explore a solution by re-implementing the performance-critical registration path in Rust. The goal is to leverage Rust's high performance, memory safety, and first-class concurrency handling to create an alternative registration endpoint that is fast, reliable, and can gracefully manage massive, simultaneous request spikes.

The new Rust module will be integrated into the existing RMT Ruby application, allowing us to directly compare the performance of both implementations.

Goals

The primary objective is to build and benchmark a high-performance Rust-based alternative for the RMT server registration endpoint.

Key goals for the week:

Analyze & Identify: Dive into the SUSE/rmt Ruby codebase to identify and map out the exact critical path for server registration (e.g., controllers, services, database interactions).
Develop in Rust: Implement a functionally equivalent version of this registration logic in Rust.
Integrate: Explore and implement a method for Ruby/Rust integration to "hot-wire" the new Rust module into the RMT application. This may involve using FFI, or libraries like rb-sys or magnus.
Benchmark: Create a benchmarking script (e.g., using k6, ab, or a custom tool) that simulates the high-concurrency registration load from thousands of clients.
Compare & Present: Conduct a comparative performance analysis (requests per second, latency, success/error rates, CPU/memory usage) between the original Ruby path and the new Rust path. The deliverable will be this data and a summary of the findings.
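
As a stand-in for the "custom tool" option in the Benchmark goal, the following Python sketch shows how the metrics named above (requests per second, latency, error rate) could be collected under concurrent load. It makes no assumptions about the RMT API: send is a hypothetical callable that performs one registration request and returns a truthy value on success, and all names and defaults are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send):
    """Run one request; return (succeeded, latency in seconds)."""
    start = time.perf_counter()
    try:
        ok = bool(send())
    except Exception:
        ok = False  # any exception counts as a failed registration
    return ok, time.perf_counter() - start


def run_load(send, requests=1000, concurrency=100):
    """Fire `requests` calls with up to `concurrency` in flight and summarize."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(send), range(requests)))
    elapsed = time.perf_counter() - started

    latencies = sorted(latency for _, latency in results)
    successes = sum(1 for ok, _ in results if ok)
    n = len(latencies)
    p95_rank = (95 * n + 99) // 100  # nearest-rank percentile, integer ceiling
    return {
        "requests_per_second": n / elapsed,
        "error_rate": 1.0 - successes / n,
        "p50_latency_s": latencies[(n - 1) // 2],
        "p95_latency_s": latencies[p95_rank - 1],
    }


if __name__ == "__main__":
    # Dummy request that just sleeps; replace with a real HTTP call.
    print(run_load(lambda: time.sleep(0.001) or True, requests=200, concurrency=20))
```

Running the same script against the Ruby path and the Rust path with identical parameters would give directly comparable numbers; dedicated tools such as k6 or ab provide the same metrics with less effort.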

Resources

RMT Source Code (Ruby):
- https://github.com/SUSE/rmt
RMT Documentation:
- https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-rmt.html
Tooling & Stacks:
- RMT/Ruby development environment (for running the base RMT)
- Rust development environment (rustup, cargo)
Potential Integration Libraries:
- rb-sys: https://github.com/oxidize-rb/rb-sys
- Magnus: https://github.com/matsadler/magnus
Benchmarking Tools:
- k6 (https://k6.io/)
- ab (ApacheBench)