Bisection is a well-known method for locating the commit that introduced a regression in a code repository, and git-bisect is the commonly used tool for this in git repositories. However, the failure is often probabilistic in nature: either we do not have a reliable reproducer, so failing to reproduce the problem on a particular commit does not mean the problem is absent there, or the measurement is inherently variable, as with performance regressions. Bisection of such failures is problematic because a single false result is enough to send the bisection into an unrelated part of the code history. In these cases we usually have to greatly extend the reproducer's runtime, or do multiple test runs or multiple bisection runs, to minimize the chance of error.
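To give a rough feel for how quickly a single false result becomes likely, here is a back-of-the-envelope sketch. It assumes a simplified model that is not part of the project itself: a plain bisection over `num_commits` commits takes about log2(num_commits) steps, and each test independently gives the wrong answer with probability `error_rate`. The function name is purely illustrative.

```python
import math

def plain_bisect_success(num_commits: int, error_rate: float) -> float:
    """Probability that every step of a plain bisection gets a correct answer.

    Assumed model: ceil(log2(num_commits)) steps, each wrong independently
    with probability error_rate.
    """
    steps = math.ceil(math.log2(num_commits))
    return (1.0 - error_rate) ** steps

# 1024 commits, 10% error per test: about a one-in-three chance of a clean run.
print(round(plain_bisect_success(1024, 0.1), 3))  # 0.349
```

Real failures are often one-sided (a bad commit may look good, but rarely the other way around), so this only illustrates the trend, not an exact figure.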
The aim of this project is to implement stochastic bisection for git, i.e., a method that accounts for the fact that test results at each point of code history have some error rate, and that proposes which points in history to test so that the commit that, with high probability, introduced the regression is found in the smallest possible number of tests. We can then use this method to bisect performance problems in our performance testing grid, Marvin.
Goals for this Hackweek:
- research the state of the art in stochastic problem finding (a method used in various fields of engineering)
- design an algorithm that computes the next point in history to test, given previous test results and their confidence
- research how git-bisect works internally
- integrate the algorithm with git-bisect
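
As an illustration of the kind of algorithm the second goal asks for, the following is a minimal sketch of one well-known approach, Bayesian (probabilistic) bisection. It is not necessarily what this project will end up implementing. All names and numbers are assumptions made for the example: commits are indexed 0..n-1 from oldest to newest, `p_detect` is the assumed chance that a test fails on a commit containing the regression, `p_false` is the assumed chance that it fails on a clean commit, and `run_test` is a hypothetical callback that tests one commit and returns True when the failure was observed.

```python
from typing import Callable, List

def update(posterior: List[float], tested: int, observed_bad: bool,
           p_detect: float = 0.7, p_false: float = 0.01) -> List[float]:
    """Bayes update of P(first bad commit == b) after testing commit `tested`."""
    new = []
    for b, prior in enumerate(posterior):
        bug_present = tested >= b  # the tested commit would contain the regression
        p_bad = p_detect if bug_present else p_false
        likelihood = p_bad if observed_bad else 1.0 - p_bad
        new.append(prior * likelihood)
    total = sum(new)
    return [x / total for x in new]

def next_commit(posterior: List[float]) -> int:
    """Pick the commit whose probability of containing the bug is closest to 1/2."""
    best, best_gap, cumulative = 0, 1.0, 0.0
    for i, p in enumerate(posterior):
        cumulative += p  # P(first bad <= i) == P(bug present at commit i)
        gap = abs(cumulative - 0.5)
        if gap < best_gap:
            best, best_gap = i, gap
    return best

def locate(num_commits: int, run_test: Callable[[int], bool],
           confidence: float = 0.95, max_tests: int = 200) -> int:
    """Test commits until one candidate reaches the requested confidence."""
    posterior = [1.0 / num_commits] * num_commits  # uniform prior
    for _ in range(max_tests):
        if max(posterior) >= confidence:
            break
        commit = next_commit(posterior)
        posterior = update(posterior, commit, run_test(commit))
    return posterior.index(max(posterior))
```

For a quick check, `locate(16, lambda c: c >= 5)` uses an idealised, noise-free `run_test` and should settle on commit 5. With a noisy test the same loop simply retests nearby commits until the posterior is confident enough.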
This project is part of:
Hack Week 20