(Idea originally suggested by Mel Gorman)
Currently, a lot of the analysis in MMTests is very light: it mostly just presents ratios for the user to interpret, and the graph generation is typically poor. For example, the graphs do not even have error bars, because error bars made them too cluttered at the time. This is a problem when the results are not very stable and we don't know whether we have gathered enough data to be confident in them.
The statistical analysis should thus be improved. Currently it is based mostly on Perl and gnuplot. I would like to try connecting the results processing to R, which I know better and which should support better analysis out of the box. I intend to start by printing confidence intervals in the textual output, to give an initial idea of how reliable the results are, and to add support for various types of graphs of both the raw data and their statistical summary. Then I'll see where it goes.
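As a rough illustration of the kind of interval that could be printed next to each result, here is a minimal Python sketch (not MMTests code) of a 95% confidence interval for the mean based on Student's t distribution. In R, this is the interval that `t.test(x)$conf.int` reports. To stay dependency-free, the sketch hardcodes the t critical values for up to 30 degrees of freedom and, as a simplifying assumption, falls back to the normal value 1.96 beyond that. The function name and the sample values are made up.

```python
import math
import statistics

# Two-sided 95% critical values (0.975 quantiles) of Student's t, df = 1..30.
T_975 = (12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
         2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
         2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042)


def mean_ci95(samples):
    """Return (mean, low, high): a 95% t-based confidence interval for the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples for an interval")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    df = n - 1
    # Beyond the table, use the normal approximation (slightly too narrow).
    t = T_975[df - 1] if df <= len(T_975) else 1.960
    return mean, mean - t * sem, mean + t * sem


if __name__ == "__main__":
    walltimes = [10.2, 10.5, 9.9, 10.4, 10.1]  # made-up sample values
    mean, low, high = mean_ci95(walltimes)
    print(f"mean {mean:.3f}  95% CI [{low:.3f}, {high:.3f}]")
```

A wide interval relative to the difference between two kernels is exactly the signal that more iterations are needed before drawing conclusions.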
Another welcome improvement would be to tie the analysis of the results to the decision whether to repeat the test, so that it is repeated until enough data is gathered and the desired confidence is achieved.
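One simple way to close that loop is sketched below in Python, assuming a hypothetical `run_once()` callable that runs the benchmark once and returns a single measurement. The stopping rule used here (relative standard error of the mean below a threshold, bounded by minimum and maximum run counts) is just one possible choice; a criterion based on confidence interval width would plug in the same way.

```python
import math
import random
import statistics


def run_until_stable(run_once, target_rse=0.01, min_runs=3, max_runs=50):
    """Repeat run_once() until the relative standard error of the mean drops
    below target_rse, or max_runs is reached. Returns the collected samples."""
    samples = []
    while len(samples) < max_runs:
        samples.append(run_once())
        if len(samples) < max(min_runs, 2):  # stdev needs at least two samples
            continue
        mean = statistics.mean(samples)
        sem = statistics.stdev(samples) / math.sqrt(len(samples))
        if mean != 0 and sem / abs(mean) < target_rse:
            break  # confident enough: stop gathering data
    return samples


if __name__ == "__main__":
    rng = random.Random(42)

    def noisy_benchmark():
        return rng.gauss(100.0, 5.0)  # stand-in for one real benchmark run

    results = run_until_stable(noisy_benchmark)
    print(f"stopped after {len(results)} runs, mean {statistics.mean(results):.2f}")
```

The `max_runs` cap matters for benchmarks whose results never settle: without it the loop could keep running indefinitely.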
Status after Hack Week
I had hoped to achieve more, but it took some time to figure out how the reporting is currently done and how to plug in my changes without too much disruption. I also finally realized that just looking at the Perl code won't help if I don't know even the very basics of how variables work, etc. :) I found this tutorial quite helpful.
So after Hack Week, R is integrated for producing a few simple plots and the textual summary of the WalltimeOutliers datatype. The details follow.
The initial goal was to add R support as an optional way to produce both textual comparisons of results and plots. The subgoals included:
- transparent fallback: since the current plotting is quite complex and would take time to fully reproduce in R, the idea was that when R is requested, the scripts would still detect not-yet-supported plots and fall back to the old scripts.
- remove most of the processing done in Perl: currently the Perl scripts not only extract data from various types of benchmark output, but also precalculate e.g. the candlestick plot data for gnuplot. Ideally, the extracted data would be transformed into one table per benchmark, which would be fully loaded and processed in R, both for text summaries and for plots.
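The "one table per benchmark" idea can be pictured as a long-format table with one row per measurement, which R loads directly into a data frame (for example with `read.csv()`). The Python below is only an illustration: the nested input structure and the column names `kernel`, `operation`, `iteration` and `value` are assumptions, not the format extract-mmtests.pl actually produces.

```python
import csv
import io

COLUMNS = ["kernel", "operation", "iteration", "value"]


def to_long_table(results):
    """Flatten {kernel: {operation: [samples...]}} into one long-format table
    with one row per (kernel, operation, iteration, value)."""
    rows = []
    for kernel, operations in results.items():
        for operation, samples in operations.items():
            for iteration, value in enumerate(samples, start=1):
                rows.append({"kernel": kernel, "operation": operation,
                             "iteration": iteration, "value": value})
    return rows


def write_table(rows, stream):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    extracted = {  # made-up numbers for two kernels
        "baseline": {"elapsed": [10.2, 10.5, 9.9]},
        "patched": {"elapsed": [9.8, 9.7, 10.0]},
    }
    out = io.StringIO()
    write_table(to_long_table(extracted), out)
    print(out.getvalue())
```

Keeping all kernels in a single table is what lets R summarize, compare and plot them in one pass, instead of relying on per-plot precalculation in Perl.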
Processing and printing textual summaries
These goals resulted in the following changes:
- compare-kernels.sh has a new --R parameter for requesting R processing (both text and plots).
- bin/compare-mmtests-R.sh is a new script that is called instead of bin/compare-mmtests.pl when R is requested. It uses extract-mmtests.pl --print-header to create tables of raw data, and prepares an R script which includes the bin/lib/R/stats.R library, loads the raw data and produces a summary CSV file. Currently only the "WalltimeOutliers" datatype is supported; for other datatypes the script falls back to the Perl version.
- bin/compare-mmtests.pl and bin/lib/MMTests/Extract.pm were extended to support parsing and printing the R-made summary CSV file instead of extracting and summarizing the raw data. This is triggered by a new --R-summary=/path/to/summary.csv parameter. The differences between the kernels' results (e.g. "pndiff") are still computed by Perl, but this could easily be moved to R as well, since the R script has the results from all kernels loaded at once, which allows more detailed comparisons.
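To illustrate what having all kernels loaded at once makes possible, here is a Python sketch that groups long-format rows by kernel and operation, computes basic statistics for a summary CSV, and derives the percentage difference of each kernel's mean from a baseline kernel. The column names, the choice of statistics and the difference formula are all assumptions; in particular, this plain relative difference is not necessarily how MMTests defines "pndiff".

```python
import csv
import io
import statistics
from collections import defaultdict

SUMMARY_COLUMNS = ["kernel", "operation", "n", "min", "mean", "stddev", "max"]


def summarize(rows):
    """Group long-format rows by (kernel, operation) and compute basic stats."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["kernel"], row["operation"])].append(float(row["value"]))
    return [{"kernel": kernel, "operation": operation, "n": len(values),
             "min": min(values), "mean": statistics.mean(values),
             "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
             "max": max(values)}
            for (kernel, operation), values in groups.items()]


def write_summary(summary, stream):
    """Write the summary rows as a CSV file with a header line."""
    writer = csv.DictWriter(stream, fieldnames=SUMMARY_COLUMNS)
    writer.writeheader()
    writer.writerows(summary)


def pct_diff_from_baseline(summary, baseline):
    """Percentage difference of each kernel's mean from the baseline's mean,
    per operation (a plain relative difference, used here for illustration)."""
    base = {s["operation"]: s["mean"] for s in summary if s["kernel"] == baseline}
    return {(s["kernel"], s["operation"]):
            100.0 * (s["mean"] - base[s["operation"]]) / base[s["operation"]]
            for s in summary if s["operation"] in base}


if __name__ == "__main__":
    raw = io.StringIO(  # made-up raw data in long format
        "kernel,operation,iteration,value\n"
        "baseline,elapsed,1,10.0\n"
        "baseline,elapsed,2,12.0\n"
        "patched,elapsed,1,9.9\n"
        "patched,elapsed,2,9.9\n"
    )
    summary = summarize(csv.DictReader(raw))
    out = io.StringIO()
    write_summary(summary, out)
    print(out.getvalue())
    for (kernel, op), pct in pct_diff_from_baseline(summary, "baseline").items():
        print(f"{kernel:10s} {op:10s} {pct:+.2f}%")
```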
The plotting was extended as follows.
- bin/graph-mmtests.sh has a new --R parameter (passed from compare-kernels.sh), which makes it extract the raw data and call the new bin/plot-R script instead of bin/plot.
- bin/plot-R prepares a temporary R script which includes the bin/lib/R/plot.R library, loads the raw data and produces a plot. Currently the boxplot/candlestick and run-sequence graph types are supported; other types fall back to Perl+gnuplot plotting.
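The fallback logic boils down to one decision per plot. Here is a hypothetical Python sketch of it: the plot-type names in the set are illustrative, and only the choice between bin/plot-R and bin/plot is modeled, not how either script is actually invoked.

```python
import sys

# Plot types the R backend can draw; anything else falls back to gnuplot.
# These names are illustrative, not MMTests' actual plot type identifiers.
R_SUPPORTED_PLOTS = {"boxplot", "candlestick", "run-sequence"}


def pick_plot_script(plot_type, use_r):
    """Return the plotting script to run for plot_type."""
    if use_r:
        if plot_type in R_SUPPORTED_PLOTS:
            return "bin/plot-R"
        # Transparent fallback: tell the user, but still produce a plot.
        print(f"R: plot type {plot_type!r} not supported yet, using gnuplot",
              file=sys.stderr)
    return "bin/plot"  # the old Perl+gnuplot path is always available


if __name__ == "__main__":
    for plot_type in ("boxplot", "run-sequence", "histogram"):
        print(plot_type, "->", pick_plot_script(plot_type, use_r=True))
```

Keeping the supported set in one place means that teaching R a new plot type only requires adding it there, while everything else keeps working through gnuplot.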
In addition, the following first steps were taken towards more detailed plots for examining benchmarks with unstable results (such as ku_latency).
- compare-kernels.sh has a new --plot-details parameter which (together with --R) produces additional plots. For now these are run-sequence plots for each kernel separately.
- bin/graph-mmtests.sh has new (--R only) parameters: --separate-tests for producing separate plots per kernel, and --plottype to override the plot type declared by the benchmark. These are passed down to plot-R.
- To speed up producing multiple plots from the same data (including the smooth and ps/png versions), compare-kernels.sh creates a temporary RTMPDIR directory which is preserved across calls to bin/graph-mmtests.sh. Only the first plot has to extract the raw data: the first R script that loads it saves it in native R format to $RTMPDIR/$SUBREPORT.Rdata, and subsequent plots detect and load this file instead of extracting and loading the data again.
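The RTMPDIR caching is a plain extract-once, load-many pattern. The sketch below shows it in Python, with `pickle` standing in for R's native `save()`/`load()` and a hypothetical `extract` callable standing in for the raw data extraction; the `.pickle` file name is an assumption that mirrors `$RTMPDIR/$SUBREPORT.Rdata`.

```python
import os
import pickle
import tempfile


def load_report(rtmpdir, subreport, extract):
    """Return the raw data for subreport, extracting it only on the first call.

    The first call runs extract() and saves the result to
    <rtmpdir>/<subreport>.pickle; later calls load that file instead.
    """
    cache = os.path.join(rtmpdir, f"{subreport}.pickle")
    if os.path.exists(cache):
        with open(cache, "rb") as f:
            return pickle.load(f)
    data = extract()
    tmp = f"{cache}.tmp"
    with open(tmp, "wb") as f:
        pickle.dump(data, f)
    os.replace(tmp, cache)  # atomic rename: readers never see a partial file
    return data


if __name__ == "__main__":
    calls = []

    def slow_extract():
        calls.append(1)  # count how often the expensive extraction runs
        return {"baseline": [10.2, 10.5], "patched": [9.8, 9.7]}

    with tempfile.TemporaryDirectory() as rtmpdir:  # stands in for $RTMPDIR
        for _ in range(4):  # e.g. plain, smooth, ps and png versions of a plot
            load_report(rtmpdir, "elapsed", slow_extract)
    print(f"extraction ran {len(calls)} time(s) for 4 plots")
```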
There is still much to do before R can fully replace Perl/gnuplot (if desired) and offer benefits over it through better plots and statistics:
- feature parity with Perl/gnuplot: currently the R scripts don't handle many types of plots and datatypes, or even results with subheadings (more than one type of result coming from a single benchmark). Ideally, this work should keep the approach of dumping all results to R and processing everything there, instead of preprocessing them in Perl.
- more plots for exploratory data analysis
- confidence intervals and other stats
- replacing kernel comparison (e.g. "pndiff") with an R-based solution
This project is part of:
Hack Week 10