The AFL fuzzer is an interesting tool that uses compile-time instrumentation and genetic algorithms to automatically produce test cases that will trigger different code paths in binary...
http://lcamtuf.coredump.cx/afl/
This project is part of:
Hack Week 13
about 9 years ago by metan
American Fuzzy lop (AFL)
What is it?
AFL is a brute-force software fuzzer that feeds modified input files to a given binary in order to provoke crashes or other unexpected results. The novelty of AFL's approach is the binary instrumentation that gives feedback to the fuzzer. The feedback is used to keep only a minimal subset of the newly generated inputs for further mutation[1], as well as to trim the inputs to a minimal size. This allows AFL to quickly discover interesting input files without knowing anything about the overall file structure. As a matter of fact, it is even able to synthesise a correct JPEG file header when started with a zero-filled file and an instrumented JPEG parser.
[1] Minimal with regard to code path coverage.
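The feedback loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not AFL's actual algorithm: `target`, `mutate` and `fuzz` are hypothetical names, the "parser" is a three-byte magic check, and coverage is a plain set of branch labels. It shows why keeping inputs that reach new code lets the fuzzer build a valid header byte by byte from a zero-filled seed.

```python
import random


def target(data: bytes) -> frozenset:
    """Toy 'instrumented parser': returns the branch labels this input reached."""
    path = {"entry"}
    if len(data) >= 1 and data[0] == 0xFF:
        path.add("b0")
        if len(data) >= 2 and data[1] == 0xD8:
            path.add("b1")
            if len(data) >= 3 and data[2] == 0xFF:
                path.add("b2")
    return frozenset(path)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, rounds: int, rng: random.Random) -> list:
    """Keep a mutated input only if it reaches a branch not seen before."""
    queue = [seed]
    seen = set(target(seed))
    for _ in range(rounds):
        child = mutate(rng.choice(queue), rng)
        coverage = target(child)
        if not coverage <= seen:  # at least one new branch was reached
            seen |= coverage
            queue.append(child)
    return queue


if __name__ == "__main__":
    for entry in fuzz(bytes(4), 100_000, random.Random(1)):
        print(entry.hex())
```

Without the feedback, hitting all three magic bytes at once by random mutation would take on the order of 256^3 attempts; with it, each byte is found separately and the partial result is kept.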
How does it work?
AFL instruments the target binary at the assembler level, using gcc wrappers, to keep track of the execution path through the binary. To make this possible, the target binary or library must be recompiled with instrumentation, which is usually the most troublesome part of the process.
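As a rough illustration of what the injected instrumentation records, the sketch below models the edge-counting scheme described in AFL's technical notes: each basic block gets an ID, and a shared map entry indexed by the current ID XOR the shifted previous ID is incremented. The Python class and the block IDs here are hypothetical stand-ins for code that, in AFL itself, is emitted into the compiled binary.

```python
MAP_SIZE = 1 << 16  # 64 kB map, the size used in AFL's description


class CoverageMap:
    """Toy model of the shared hit-count map updated by instrumented code."""

    def __init__(self):
        self.hits = bytearray(MAP_SIZE)
        self.prev = 0

    def visit(self, block_id: int) -> None:
        """Record the transition from the previous block to this one."""
        idx = (block_id ^ self.prev) % MAP_SIZE
        self.hits[idx] = (self.hits[idx] + 1) & 0xFF  # 8-bit counter wraps
        # Shifting keeps A->B distinct from B->A and A->A distinct from B->B.
        self.prev = block_id >> 1


def trace(block_ids) -> set:
    """Return the set of map indices touched by a sequence of block IDs."""
    cov = CoverageMap()
    for block_id in block_ids:
        cov.visit(block_id)
    return {i for i, count in enumerate(cov.hits) if count}


if __name__ == "__main__":
    A, B = 0x1234, 0x5678  # hypothetical block IDs
    print(trace([A, B]) != trace([B, A]))
```

Because the map records transitions rather than single blocks, two inputs that visit the same blocks in a different order still count as different paths.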
Once this is settled, the rest of the process is as easy as starting the fuzzer with a path to the starting set of input files, a directory to store the output in, and the binary we want to fuzz. The starting set should be fairly minimal and the files should be small, ideally less than 1 kB, since the larger the files, the longer it takes to stumble upon interesting mutations that cause different code paths in the target binary.
How can we use it?
Running the fuzzer in our automated testing framework does not seem feasible, for two reasons. The first is the need to recompile everything we want to fuzz, combined with the fact that the instrumentation works only on i686 and x86_64. The second is speed: even with a carefully chosen starting set of input files, a complete cycle for libjpeg still takes 24 hours on a machine with 24 Intel Xeon E5649 CPUs.
What could be done, on the other hand, is to use AFL to create a corpus to be used for further testing. The run with instrumented libjpeg created roughly 5000 different input files that trigger different code paths in the library. Feeding these files to a program such as 'jpegtopnm' and watching for segfaults and hangs takes just one minute.
Shortcomings
The mutation engine in AFL is designed primarily to work with binary formats, which restricts the use of the tool to these. There is support for a dictionary that, when filled with keywords, should make AFL usable with human-readable file formats as well, but I haven't tried that.
File formats that contain header checksums and reject a file when the checksum does not match tend to block the majority of the inputs, since most random mutations are rejected before they reach the interesting code. The AFL author advises disabling the checksum tests in the source code before it is recompiled with instrumentation.
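The effect is easy to reproduce with a toy format. The sketch below is not taken from any real parser: it assumes a made-up layout of payload followed by a four-byte CRC32, and all function names are hypothetical. It compares how many single-bit mutations get past the check with verification on and off.

```python
import random
import zlib


def make_file(payload: bytes) -> bytes:
    """Toy format: payload followed by its CRC32 as four big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def parse(blob: bytes, verify_checksum: bool = True) -> bool:
    """Return True if the file gets past the checksum gate."""
    if len(blob) < 4:
        return False
    payload, stored = blob[:-4], int.from_bytes(blob[-4:], "big")
    if verify_checksum and zlib.crc32(payload) != stored:
        return False  # rejected before any interesting code runs
    return True  # the real parsing work would start here


def acceptance_rate(seed_file: bytes, trials: int, rng: random.Random,
                    verify_checksum: bool) -> float:
    """Fraction of single-bit mutations of seed_file that parse() accepts."""
    accepted = 0
    for _ in range(trials):
        buf = bytearray(seed_file)
        buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
        accepted += parse(bytes(buf), verify_checksum)
    return accepted / trials


if __name__ == "__main__":
    seed = make_file(b"hello fuzzing")
    print(acceptance_rate(seed, 1000, random.Random(0), True))
    print(acceptance_rate(seed, 1000, random.Random(0), False))
```

With the check enabled, every single-bit flip is rejected, because CRC32 detects all single-bit errors; with the check disabled, every mutated file reaches the parser, which is what the fuzzer needs.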