Neural fuzzing: applying DNN to software security testing
By William Blum, Microsoft Development Lead
November 14, 2017
Microsoft researchers have developed a new method for discovering
software security vulnerabilities that uses machine learning and deep
neural networks to help the system root out bugs better by learning
from past experience. This new research project, called neural fuzzing,
is designed to augment traditional fuzzing techniques, and early
experiments have demonstrated promising results.
Software security testing is a hard task that is traditionally done by
security experts through costly and targeted code audits, or by using
very specialized and complex security tools to detect and assess
vulnerabilities in code. We recently released a tool, called
Microsoft Security Risk Detection, that significantly simplifies
security testing and does not require you to be a security expert in
order to root out software bugs.
The Azure-based tool is available to Windows users and in preview
for Linux users.
The key technology underpinning Microsoft Security Risk Detection is
fuzz testing, or fuzzing. It’s a program analysis technique
that looks for inputs causing error conditions that have a high
chance of being exploitable, such as buffer overflows, memory access
violations and null pointer dereferences.
Fuzzers come in different categories:
- Blackbox fuzzers, also called “dumb fuzzers,” rely solely on the
sample input files to generate new inputs.
- Whitebox fuzzers analyze the target program either statically or
dynamically to guide the search for new inputs aimed at
exploring as many code paths as possible.
- Greybox fuzzers, just like blackbox fuzzers, don’t have any
knowledge of the structure of the target program, but make use
of a feedback loop to guide their search based on observed
behavior from previous executions of the program.
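To make the greybox feedback loop concrete, here is a minimal sketch in Python. It is not AFL or the Microsoft Security Risk Detection implementation: the toy `target` function, its branch ids, the single-byte `mutate` operator and the simulated crash are all assumptions made up for illustration. The only point is the loop itself: an input is kept for further mutation when it reaches a branch that no earlier input reached.

```python
import random


def target(data: bytes) -> set:
    """Toy parser standing in for a real target (hypothetical).

    Returns the set of branch ids it executed, and raises on the
    deepest branch to simulate a crash.
    """
    hit = {0}
    if len(data) > 0 and data[0] == ord("F"):
        hit.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            hit.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("simulated crash")
    return hit


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)


def greybox_fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Coverage-guided loop: keep any input that reaches a new branch."""
    corpus = [seed]
    seen = set(target(seed))  # assumes the seed itself does not crash
    crashes = []
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = mutate(parent, rng)
        try:
            coverage = target(child)
        except RuntimeError:
            crashes.append(child)
            continue
        if not coverage <= seen:  # feedback: some branch is new
            seen |= coverage
            corpus.append(child)
    return corpus, seen, crashes


if __name__ == "__main__":
    corpus, seen, crashes = greybox_fuzz(b"AAAA", 100_000, random.Random(0))
    print(len(corpus), sorted(seen), len(crashes))
```

Because inputs that reach a new branch are fed back into the corpus, the loop can discover the three-byte prefix one byte at a time. A fuzzer that only ever mutated the original sample with this single-byte operator could never get past the first check.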
Figure 1 – Crashes
reported by AFL. Experimental support in MSRD
Earlier this year, Microsoft researchers including myself
and Mohit Rajpal began a research project looking at ways to
improve fuzzing techniques using machine learning and deep neural
networks. Specifically, we wanted to see what a machine learning
model could learn if we were to insert a deep neural network into
the feedback loop of a greybox fuzzer.
For our initial experiment, we looked at whether we could learn over
time by observing past fuzzing iterations of an existing fuzzer.
We applied our methods to a type of greybox fuzzer called American
fuzzy lop, or AFL.
We tried four different types of neural networks and ran the
experiment on four target programs, using parsers for four different
file formats: ELF, PDF, PNG and XML.
The results were very encouraging: we saw significant improvements
over traditional AFL in terms of code coverage, unique code paths and
crashes for the four file formats.
- The AFL system using deep neural networks gives around 10 percent
improvement in code coverage over traditional AFL for two file
parsers: ELF and PNG.
- When looking at
unique code paths, neural AFL discovered more unique paths than
traditional AFL for all parsers except PDF. For the PNG parser,
after 24 hours of fuzzing it found twice as many unique code
paths as traditional AFL.
Figure 2 – Input
gain over time (in hours) for the libpng file parser.
- A good way to
evaluate fuzzers is to compare the number of crashes reported.
For the ELF file parser, neural AFL reported more than 20
crashes whereas traditional AFL did not report any. This is
astonishing given that neural AFL was trained on AFL itself. We
also observed more crashes being reported for text-based file
formats like XML, where neural AFL could find 38 percent more
crashes than traditional AFL. For PDF, traditional AFL did
better overall than neural AFL in terms of new code paths found.
However, neither system reported any crashes.
Figure 3 – Reported
crashes over time (in hours) for readelf (left) and libxml (right).
Overall, neural fuzzing
outperformed traditional AFL in every instance except the
PDF case, where we suspect the large size of the PDF files incurs
noticeable overhead when querying the neural model.
In general, we
believe our neural fuzzing approach yields a novel way to perform
greybox fuzzing that is simple, efficient and generic.
Simple: The search is not based on sophisticated hand-crafted
heuristics; the system learns a strategy from an existing fuzzer. We just
give it sequences of bytes and let it figure out all sorts of
features and automatically generalize from them to predict which
types of inputs are more important than others and where the
fuzzer’s attention should be focused.
Efficient: In our AFL experiment, in the first 24 hours
we explored significantly more unique code paths than
traditional AFL. For some parsers we even reported crashes not
already reported by AFL.
Generic: Although we’ve tested it only on AFL, our approach could be
applied to any fuzzer, including blackbox and random fuzzers.
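One way to picture "where the fuzzer’s attention should be focused" is a per-offset score that gates which bytes get mutated. The Python sketch below is a hypothetical stand-in, not the model from the research project: it replaces the deep neural network with a simple smoothed success counter, and the names `PositionModel` and `pick_position`, the 0.2 threshold and the retry limit are all invented for illustration.

```python
import random
from collections import Counter


class PositionModel:
    """Stand-in for a learned model (assumption: NOT a neural network).

    Scores each byte offset by how often past mutations at that
    offset led to a new code path.
    """

    def __init__(self):
        self.tried = Counter()
        self.useful = Counter()

    def update(self, pos: int, found_new_path: bool) -> None:
        """Record the outcome of one past mutation at offset `pos`."""
        self.tried[pos] += 1
        if found_new_path:
            self.useful[pos] += 1

    def score(self, pos: int) -> float:
        # Laplace-smoothed success rate; unseen offsets score a neutral 0.5.
        return (self.useful[pos] + 1) / (self.tried[pos] + 2)


def pick_position(model, length, rng, threshold=0.2, max_tries=16):
    """Sample offsets, vetoing those the model scores below `threshold`."""
    for _ in range(max_tries):
        pos = rng.randrange(length)
        if model.score(pos) >= threshold:
            return pos
    return pos  # fall back to the last sample rather than stall


if __name__ == "__main__":
    model = PositionModel()
    # Pretend history: only mutations in a 4-byte header ever paid off.
    for pos in range(16):
        for _ in range(20):
            model.update(pos, found_new_path=(pos < 4))
    rng = random.Random(1)
    picks = [pick_position(model, 16, rng) for _ in range(200)]
    print(sum(p < 4 for p in picks), "of 200 picks land in the header")
```

A fuzzer would call `pick_position` in place of a uniformly random offset when mutating, and call `update` after each execution. Calling `update` only on a recorded history corresponds to learning from past iterations of an existing fuzzer; calling it during the run corresponds to an online variant.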
We believe our neural
fuzzing research project is just scratching the surface of what can
be achieved using deep neural networks for fuzzing. Right now, our
model only learns fuzzing locations, but we could also use it to
learn other fuzzing parameters such as the type of mutation or
strategy to apply. We are also considering online versions of our
machine learning model, in which the fuzzer constantly learns from
ongoing fuzzing iterations.