Bisection is a well-known method for finding the commit that caused a regression in a code repository, and git-bisect is a widely used tool for this in git repositories. However, failures are often probabilistic in nature. Sometimes we don't have a reliable reproducer, so failing to reproduce a problem on a particular commit does not mean the problem is absent there. In other cases the measurement itself is inherently variable, as with performance regressions. Bisecting such failures is problematic because a single false result is enough to send the bisection into an unrelated part of the code history. In these cases we usually have to heavily extend the runtime of the reproducer, or do multiple test runs or multiple bisection runs, to minimize the chance of error.
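A rough model shows how fast a single false result adds up. A classic bisection over N candidate commits needs about ceil(log2 N) tests. Assume each test is independently wrong with probability p. Any single wrong answer discards the half of the range that contains the culprit, so the whole bisection succeeds with probability (1 - p)^ceil(log2 N). The sketch below uses symmetric, independent errors, which is a simplification: an unreliable reproducer only errs on bad commits. It also estimates the cost of the "multiple test runs" workaround using a majority vote over an odd number of runs. All helper names are illustrative.

```python
import math


def bisection_steps(num_commits: int) -> int:
    """Tests a classic bisection needs to narrow num_commits candidates to one."""
    return math.ceil(math.log2(num_commits))


def success_probability(num_commits: int, error_rate: float) -> float:
    """Probability that every test of a classic bisection is correct, assuming
    each test is wrong independently with probability error_rate. A single wrong
    answer discards the half of the range that contains the culprit."""
    return (1.0 - error_rate) ** bisection_steps(num_commits)


def majority_error(error_rate: float, runs: int) -> float:
    """Probability that a majority vote over an odd number of runs is wrong."""
    return sum(
        math.comb(runs, k) * error_rate**k * (1.0 - error_rate) ** (runs - k)
        for k in range(runs // 2 + 1, runs + 1)
    )
```

Take 1000 commits (10 tests) and a 10% error rate. Then `success_probability(1000, 0.1)` is `0.9**10`, roughly 0.35. A majority over 3 runs lowers the per-step error to `majority_error(0.1, 3)` = 0.028, which raises the overall success probability to roughly 0.75. The price is three times as many test runs.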
The aim of the project is to implement stochastic bisection for git. This means a method that accounts for the fact that test results at each point of code history have some error rate. The method should suggest which points in code history to test so that the commit introducing the regression is found with high probability in the smallest possible number of tests. We can then use it to bisect performance problems in our performance testing grid, Marvin.
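One way to make "test results with some error rate" precise is a Bayesian model in the spirit of the probabilistic bisection algorithm. This sketch assumes a linear history and independent test outcomes. The notation is chosen for this sketch only:

- commits are indexed 0 (oldest candidate) to n-1 (known bad);
- H_c is the hypothesis that commit c introduced the regression;
- α is the probability that a bad commit passes the test;
- β is the probability that a good commit fails it;
- π_k is the belief after k tests.

For an unreliable reproducer β = 0. For noisy performance measurements, both α and β are nonzero.

```latex
% Under H_c, commit i is bad iff i >= c; i_k is the commit tested at step k, r_k its result.
P(r_k = \mathrm{bad} \mid H_c) =
\begin{cases}
  1 - \alpha & \text{if } i_k \ge c,\\
  \beta      & \text{if } i_k < c,
\end{cases}
\qquad
P(r_k = \mathrm{good} \mid H_c) = 1 - P(r_k = \mathrm{bad} \mid H_c)

\pi_{k+1}(c) = \frac{\pi_k(c)\, P(r_k \mid H_c)}
                    {\sum_{c'=0}^{n-1} \pi_k(c')\, P(r_k \mid H_{c'})}

i_{k+1} = \operatorname*{arg\,min}_{0 \le i \le n-2}
          \Bigl|\, \sum_{c=0}^{i} \pi_{k+1}(c) - \tfrac{1}{2} \Bigr|,
\qquad \text{stop once } \max_c \pi_k(c) \ge 1 - \delta
```

With α = β = 0, each test removes exactly the hypotheses that are inconsistent with it, and the median rule reduces to ordinary bisection. δ is the accepted probability of blaming the wrong commit.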
Goals for this Hackweek:
- research the state of the art in stochastic problem finding (a technique used in various fields of engineering)
- design an algorithm that computes the next point in history to test, given previous test results and their confidence
- research how git-bisect works internally
- integrate the algorithm with git-bisect
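The goals above can be sketched as follows. This is one possible design in the spirit of the probabilistic bisection algorithm, not the project's implementation, and all function names are hypothetical. It assumes:

- a linear history (no merges);
- test outcomes that are independent across runs;
- known error rates `fn_rate` and `fp_rate`.

Commits are indexed 0 (the oldest candidate after the known-good commit) to n-1 (the known-bad commit). The next test point is the commit whose result would split the remaining probability mass closest to half. The search stops once a single commit reaches the requested confidence. Mapping these indices onto real commits inside git-bisect is not shown.

```python
import random
from typing import Callable, List


def update_posterior(post: List[float], tested: int, reported_bad: bool,
                     fn_rate: float, fp_rate: float) -> List[float]:
    """Bayes update. post[c] = P(commit c introduced the regression); under that
    hypothesis commit i is bad iff i >= c. fn_rate = P(pass | bad commit),
    fp_rate = P(fail | good commit)."""
    new = []
    for c, p in enumerate(post):
        if tested >= c:  # hypothesis c says the tested commit is bad
            like = 1.0 - fn_rate if reported_bad else fn_rate
        else:            # hypothesis c says the tested commit is good
            like = fp_rate if reported_bad else 1.0 - fp_rate
        new.append(p * like)
    total = sum(new)
    return [p / total for p in new]


def next_commit(post: List[float]) -> int:
    """Commit whose test splits the posterior mass closest to 1/2. Testing i
    separates hypotheses c <= i from c > i; the last commit is known bad,
    so it is never a useful test point."""
    best, best_gap, mass = 0, float("inf"), 0.0
    for i in range(len(post) - 1):
        mass += post[i]
        gap = abs(mass - 0.5)
        if gap < best_gap:
            best, best_gap = i, gap
    return best


def stochastic_bisect(n: int, run_test: Callable[[int], bool], fn_rate: float,
                      fp_rate: float, confidence: float = 0.95,
                      max_tests: int = 1000) -> int:
    """Return the most likely culprit among n candidates; run_test(i) returns
    True when the (possibly wrong) test reports commit i as bad."""
    post = [1.0 / n] * n
    for _ in range(max_tests):
        best = max(range(n), key=post.__getitem__)
        if post[best] >= confidence:
            return best
        i = next_commit(post)
        post = update_posterior(post, i, run_test(i), fn_rate, fp_rate)
    return max(range(n), key=post.__getitem__)


if __name__ == "__main__":
    # Simulated reproducer: misses the regression 30% of the time and
    # never fails on a good commit (fp_rate = 0).
    rng = random.Random(0)
    culprit = 37

    def flaky_test(i: int) -> bool:
        return i >= culprit and rng.random() >= 0.3

    print(stochastic_bisect(100, flaky_test, fn_rate=0.3, fp_rate=0.0))
```

With `fn_rate = fp_rate = 0` every split is exact and the procedure behaves like ordinary bisection. With noise, the same commit may be tested more than once, which lets a later result outvote a single false one.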
This project is part of:
Hack Week 20
Similar Projects
go-git: unlocking SHA256-based repository cloning ahead of git v3 by pgomes
Description
The go-git library implements the git internals in pure Go, so that any Go application can handle not only Git repositories, but also lower-level primitives (e.g. packfiles, idxfiles, etc) without needing to shell out to the git binary.
The focus for this Hackweek is to fast-track key improvements for the project ahead of the upstream release of Git v3, which may take place at some point next year.
Goals
- Add support for cloning SHA256 repositories.
- Decrease memory churn for very large repositories (e.g. Linux Kernel repository).
- Cut the first alpha version of go-git/v6.
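Supporting SHA-256 repositories largely means handling 64-hex-character object IDs instead of 40-character ones.

- For a blob, git derives the ID by hashing a short header (`blob <size>` followed by a NUL byte) plus the content.
- A SHA-256 repository applies the same construction with SHA-256 instead of SHA-1.
- Trees and commits are hashed the same way, but their content embeds other object IDs, so their bytes differ between the two formats.

The Python sketch below only illustrates the hashing. It is not go-git's API, and `git_object_id` is a hypothetical helper.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes, algo: str = "sha1") -> str:
    """Object ID as git computes it: hash of b"<type> <size>\\0" + content.
    algo is "sha1" for classic repositories or "sha256" for SHA-256 ones."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.new(algo, header + content).hexdigest()
```

`git_object_id("blob", b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known ID of the empty blob. With `algo="sha256"` the result is 64 hex characters long.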
Stretch goals
- Review and update the official documentation.
- Optimise use of go-git in Fleet.
- Create RFC/example for go-git plugins to improve extensibility.
- Investigate performance bottlenecks for Blame and Status.
Resources
- https://github.com/go-git/go-git/
- https://go-git.github.io/docs/
git-fs: file system representation of a git repository by fgonzalez
Description
This project aims to create a Linux equivalent of the git/fs concept from git9. I'm aware that git provides worktrees, but they are not enough for many use cases. A read-only representation of the whole repository simplifies scripting quite a bit and, most importantly, reduces disk space usage. For instance, during kernel livepatching development we need to process and analyze the source code of hundreds of kernel versions simultaneously. This is rather painful with git worktrees, as each kernel branch requires at least 1 GB of disk space.
As for the technical details, I'll implement the file system using FUSE. The project itself should not take much time to complete, but let's see where it takes me.
I'll try to keep the same design as git9, so the file system will look something like:
/mnt/git
+-- ctl
+-- HEAD
| +-- tree
| | +--files
| | +--in
| | +--head
| |
| +-- hash
| +-- msg
| +-- parent
|
+-- branch
| |
| +-- heads
| | +-- master
| | +-- [commit files, see HEAD]
| +-- remotes
| +-- origin
| +-- master
| +-- [commit files, see HEAD]
+-- object
+-- 00051fd3f066e8c05ae7d3cf61ee363073b9535f # blob contents
+-- 00051fd3f066e8c05ae7d3cf61ee363073b9535c
+-- [tree contents, see HEAD/tree]
+-- 3f5dbc97ae6caba9928843ec65fb3089b96c9283
+-- [commit files, see HEAD]
So, if you wanted to look at the commit message of the current branch, you could simply do:
cat /mnt/git/HEAD/msg
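Every read on the mount has to be answered from git data. The sketch below maps a path under the mount to a git command that would produce the file's contents. It assumes a naive design that shells out to git on each read; given the performance concern, a real implementation would more likely read objects directly and cache them. `git_command_for` is a hypothetical helper, not the project's code. Branch names containing slashes are not handled.

```python
from typing import List


def git_command_for(path: str) -> List[str]:
    """Map a path relative to the mount (e.g. "HEAD/msg") to a git command
    whose output would be that file's contents. Illustrative only."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) == 2 and parts[0] == "object":
        return ["git", "cat-file", "-p", parts[1]]
    # Resolve which commit the path refers to, then what is asked about it.
    if parts[:1] == ["HEAD"]:
        rev, rest = "HEAD", parts[1:]
    elif parts[:2] == ["branch", "heads"] and len(parts) >= 3:
        rev, rest = "refs/heads/" + parts[2], parts[3:]
    elif parts[:2] == ["branch", "remotes"] and len(parts) >= 4:
        rev, rest = f"refs/remotes/{parts[2]}/{parts[3]}", parts[4:]
    else:
        raise FileNotFoundError(path)
    if rest == ["hash"]:
        return ["git", "rev-parse", rev]
    if rest == ["msg"]:
        return ["git", "log", "-1", "--format=%B", rev]
    if rest == ["parent"]:
        return ["git", "rev-parse", rev + "^"]
    if rest[:1] == ["tree"] and len(rest) > 1:
        return ["git", "show", f"{rev}:{'/'.join(rest[1:])}"]
    raise FileNotFoundError(path)
```

For example, `git_command_for("HEAD/msg")` yields `["git", "log", "-1", "--format=%B", "HEAD"]`, matching the `cat` example above.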
No collaboration needed. This is a solo project.
Goals
Implement a working prototype.
Measure and improve the performance if possible. This step will be the most crucial one, as user-space file systems are inherently slower.
Resources
https://docs.kernel.org/filesystems/fuse/fuse.html
Mail client with mailing list workflow support in Rust by acervesato
Description
The goal is to create a mail user interface in the Rust programming language that supports the mailing-list patch workflow. I know aerc already exists, but I would like to create something simpler, without integrated protocols: just a plain user interface that uses some crates to read and create emails, which are fetched and sent via external tools.
I already know Rust, but not its async support, which is needed here to handle events inside the mail folder and to send notifications.
Goals
- simple user interface in the style of aerc, with some vim keybindings for motions and search
- automatic run of external tools (like mbsync) for checking emails
- automatic run of commands for notifications
- apply patch set from ML
- tree-sitter support with styles
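Applying a patch set from a mailing list starts with putting the messages of one series in order. The sketch below does this from the `[PATCH n/m]` subject tags. It is written in Python for brevity and is not the gitpatch crate's API. It assumes all subjects belong to a single series and version. Replies, untagged patches and the 0/m cover letter are skipped. Both helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches tags such as "[PATCH 2/3]", "[PATCH v2 3/5]" or "[RFC PATCH 1/2]".
SUBJECT_RE = re.compile(r"\[[^\]]*?PATCH[^\]]*?\b(\d+)/(\d+)\]")


def series_position(subject: str) -> Optional[Tuple[int, int]]:
    """Return (index, total) from a patch subject, or None without an n/m tag."""
    m = SUBJECT_RE.search(subject)
    return (int(m.group(1)), int(m.group(2))) if m else None


def order_series(subjects: List[str]) -> List[str]:
    """Order one series for applying: skip replies and the 0/N cover letter."""
    numbered = []
    for s in subjects:
        if s.lower().startswith("re:"):
            continue
        pos = series_position(s)
        if pos and pos[0] > 0:
            numbered.append((pos[0], s))
    return [s for _, s in sorted(numbered)]
```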
Resources
- ratatui: user interface (https://ratatui.rs/)
- notify: folder watcher (https://docs.rs/notify/latest/notify/)
- mail-parser: parser for emails (https://crates.io/crates/mail-parser)
- mail-builder: create emails in proper format (https://docs.rs/mail-builder/latest/mail_builder/)
- gitpatch: ML support (https://crates.io/crates/gitpatch)
- tree-sitter-rust: support for mail format (https://crates.io/crates/tree-sitter)
Create a page with all devel:languages:perl packages and their versions by tinita
Description
Perl projects now live in git: https://src.opensuse.org/perl
It would be useful to have an easy way to check which version of which Perl module is in devel:languages:perl. We also have meta overrides and patches for various modules, and it would be good to keep them in a central place, so they are easier to look up and we can share them with other vendors.
I did some initial data dump here a while ago: https://github.com/perlpunk/cpan-meta
But I never had the time to automate this.
I can also use the data to check whether updates are needed (currently it uses data from download.opensuse.org, so there is some delay and it depends on building).
Goals
- Have a script that updates a central repository (e.g. https://src.opensuse.org/perl/_metadata) with metadata by looking at https://src.opensuse.org/perl/_ObsPrj (checking whether there are any changes since the last run)
- Create an HTML page with the list of packages (using JavaScript and some table library to make it easily searchable)
Resources
Results
Day 1
- First part of the code which retrieves data from https://src.opensuse.org/perl/_ObsPrj with submodules and creates a YAML and a JSON file.
- Repo: https://github.com/perlpunk/opensuse-perl-meta
- Also a first version of the HTML is live: https://perlpunk.github.io/opensuse-perl-meta/
Day 2
- The HTML page now has links to src.opensuse.org and the date of the last update, plus a short info section at the top
- Code is now 100% covered by tests: https://app.codecov.io/gh/perlpunk/opensuse-perl-meta
- I used the modern Perl class feature, which makes Perl classes even nicer and shorter. See example and tests
- I tried out the mocking feature of the modern Test2::V0 library, which provides call tracking. See example
- I tried out comparing data structures with the new Test2::V0 library. It lets you compare parts of a structure with the like function, which only compares the data mentioned in the expected structure. See example
Day 3
- Added various things to the table
- Dependencies column
- Show popup with info for cpanspec, patches and dependencies
- Added last date / commit to the data export.
Plan: With the added date / commit we can now daily check _ObsPrj for changes and only fetch the data for changed packages.
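The planned incremental check boils down to comparing two snapshots. The sketch below assumes each run produces a mapping from package name to the commit recorded for it in _ObsPrj. The project's code is Perl; this is a Python illustration, and `diff_snapshots` is a hypothetical helper. Only added and changed packages would then need to be fetched again.

```python
from typing import Dict, NamedTuple, Set


class Changes(NamedTuple):
    added: Set[str]
    removed: Set[str]
    changed: Set[str]


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> Changes:
    """Compare package -> commit maps from the previous and the current run."""
    prev_keys, cur_keys = set(previous), set(current)
    return Changes(
        added=cur_keys - prev_keys,
        removed=prev_keys - cur_keys,
        changed={p for p in prev_keys & cur_keys if previous[p] != current[p]},
    )
```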
Day 4
RMT.rs: High-Performance Registration Path for RMT using Rust by gbasso
Description
The SUSE Repository Mirroring Tool (RMT) is a critical component for managing software updates and subscriptions, especially for our Public Cloud Team (PCT). In a cloud environment, hundreds or even thousands of new SUSE instances (VPS/EC2) can be provisioned simultaneously. Each new instance attempts to register against an RMT server, creating a "thundering herd" scenario.
We have observed that the current RMT server, written in Ruby, faces performance issues under this high-concurrency registration load. This can lead to request overhead, slow registration times, and outright registration failures, delaying the readiness of new cloud instances.
This Hackweek project aims to explore a solution by re-implementing the performance-critical registration path in Rust. The goal is to leverage Rust's high performance, memory safety, and first-class concurrency handling to create an alternative registration endpoint that is fast, reliable, and can gracefully manage massive, simultaneous request spikes.
The new Rust module will be integrated into the existing RMT Ruby application, allowing us to directly compare the performance of both implementations.
Goals
The primary objective is to build and benchmark a high-performance Rust-based alternative for the RMT server registration endpoint.
Key goals for the week:
- Analyze & Identify: Dive into the SUSE/rmt Ruby codebase to identify and map out the exact critical path for server registration (e.g., controllers, services, database interactions).
- Develop in Rust: Implement a functionally equivalent version of this registration logic in Rust.
- Integrate: Explore and implement a method for Ruby/Rust integration to "hot-wire" the new Rust module into the RMT application. This may involve using FFI, or libraries like rb-sys or magnus.
- Benchmark: Create a benchmarking script (e.g., using k6, ab, or a custom tool) that simulates the high-concurrency registration load from thousands of clients.
- Compare & Present: Conduct a comparative performance analysis (requests per second, latency, success/error rates, CPU/memory usage) between the original Ruby path and the new Rust path. The deliverable will be this data and a summary of the findings.
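For the comparison step, each run's per-request measurements need to be reduced to a few numbers that can be compared between the Ruby and Rust paths. The sketch below does that aggregation using nearest-rank percentiles. It assumes failed registrations are recorded as `None`. `summarize` and `percentile` are hypothetical helpers; load tools such as k6 already print latency percentiles on their own.

```python
import math
from typing import Dict, List, Optional


def percentile(sorted_values: List[float], q: int) -> float:
    """Nearest-rank percentile (q in 1..100) of an already sorted list."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms: List[Optional[float]], duration_s: float) -> Dict[str, float]:
    """Summarize one benchmark run; a None entry marks a failed registration."""
    ok = sorted(x for x in latencies_ms if x is not None)
    total = len(latencies_ms)
    stats = {
        "requests_per_s": total / duration_s,
        "error_rate": (total - len(ok)) / total,
    }
    for q in (50, 95, 99):
        stats[f"p{q}_ms"] = percentile(ok, q) if ok else math.nan
    return stats
```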
Resources
- RMT Source Code (Ruby): https://github.com/SUSE/rmt
- RMT Documentation: https://documentation.suse.com/sles/15-SP7/html/SLES-all/book-rmt.html
- Tooling & Stacks:
  - RMT/Ruby development environment (for running the base RMT)
  - Rust development environment (rustup, cargo)
- Potential Integration Libraries:
  - rb-sys: https://github.com/oxidize-rb/rb-sys
  - Magnus: https://github.com/matsadler/magnus
- Benchmarking Tools:
  - k6 (https://k6.io/)
  - ab (ApacheBench)
dynticks-testing: analyse perf / trace-cmd output and aggregate data by m.crivellari
Description
dynticks-testing is a project started years ago by Frederic Weisbecker. One of its features is checking the actual configuration (isolcpus, irqaffinity, etc.) and giving feedback on it.
An important goal of this tool is to parse the output of trace-cmd / perf and provide more readable data, showing the duration of every event grouped by PID (along with the CPU number, whether the task has been migrated, etc.).
An example of data captured on my laptop (incomplete!!):
<idle>-0 [005] dN.2. 20310.270699: sched_wakeup: WaylandProxy:46380 [120] CPU:005
<idle>-0 [005] d..2. 20310.270702: sched_switch: swapper/5:0 [120] R ==> WaylandProxy:46380 [120]
...
WaylandProxy-46380 [004] d..2. 20310.295397: sched_switch: WaylandProxy:46380 [120] S ==> swapper/4:0 [120]
<idle>-0 [006] d..2. 20310.295397: sched_switch: swapper/6:0 [120] R ==> firefox:46373 [120]
firefox-46373 [006] d..2. 20310.295408: sched_switch: firefox:46373 [120] S ==> swapper/6:0 [120]
<idle>-0 [004] dN.2. 20310.295466: sched_wakeup: WaylandProxy:46380 [120] CPU:004
Output of noise_parse.py:
Task: WaylandProxy Pid: 46380 cpus: {4, 5} (Migrated!!!)
Wakeup Latency Nr: 24 Duration: 89
Sched switch: kworker/12:2 Nr: 1 Duration: 6
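The following hypothetical sketch illustrates the kind of per-PID aggregation described above. It is not noise_parse.py, which is not shown here. It reads sched_switch lines in the format above and records, per task:

- the CPUs it ran on, so a set with more than one CPU means the task migrated;
- how often it was switched out;
- its total run time in microseconds, counted only when the task is switched in and out on the same CPU.

Timestamps are taken to be in seconds, as in the sample lines. Other events, such as sched_wakeup, are ignored.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

LINE_RE = re.compile(
    r"^\s*(?P<comm>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<rest>.*)$"
)
SWITCH_RE = re.compile(
    r"(?P<prev_comm>.+?):(?P<prev_pid>\d+) \[\d+\] \S+ ==> "
    r"(?P<next_comm>.+?):(?P<next_pid>\d+) \[\d+\]"
)


@dataclass
class TaskStats:
    comm: str
    cpus: Set[int] = field(default_factory=set)
    switches_out: int = 0
    run_time_us: int = 0

    @property
    def migrated(self) -> bool:
        return len(self.cpus) > 1


def aggregate(lines: Iterable[str]) -> Dict[int, TaskStats]:
    stats: Dict[int, TaskStats] = {}
    running: Dict[int, Tuple[int, float]] = {}  # cpu -> (pid, switch-in time)
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["event"] != "sched_switch":
            continue
        s = SWITCH_RE.search(m["rest"])
        if not s:
            continue
        cpu, ts = int(m["cpu"]), float(m["ts"])
        prev_pid, next_pid = int(s["prev_pid"]), int(s["next_pid"])
        if prev_pid != 0:  # pid 0 is the idle task
            st = stats.setdefault(prev_pid, TaskStats(s["prev_comm"]))
            st.cpus.add(cpu)
            st.switches_out += 1
            start = running.get(cpu)
            if start and start[0] == prev_pid:
                st.run_time_us += round((ts - start[1]) * 1e6)
        if next_pid != 0:
            stats.setdefault(next_pid, TaskStats(s["next_comm"])).cpus.add(cpu)
        running[cpu] = (next_pid, ts)
    return stats
```

Fed the sched_switch lines from the example, this reports WaylandProxy on CPUs {4, 5} (migrated) and an 11 µs run of firefox on CPU 6.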
My first contribution was around Nov. 2024!
Goals
- add more features (eg cpuset)
- test / bugfix
Resources
- Frederic's public repository: https://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git/
- https://docs.kernel.org/timers/no_hz.html#testing
Progress
isolcpus and cpusets implemented and merged in master: dynticks-testing.git commit