C++ Inliner Improvements:
By David Hartglass,
January 13, 2020
Visual Studio 2019 versions 16.3 and 16.4 include improvements to the C++
inliner. Among these is the ability to inline some routines after they
have been optimized, referred to as the “Zipliner.” Depending on your
application, you may see some minor code quality improvements and/or
major build-time (compiler throughput) improvements.
Terry Mahaffey has provided an
overview of Visual Studio’s inlining decisions. This details some of the
inliner’s constraints and areas for improvement, a few of which are
particularly relevant here:
The inliner is recursive and may often re-do work it has already done.
Inline decisions are context sensitive and it is not always profitable
to replay its decision-making for the same function.
The inliner is very budget
conscious. It has the difficult job of balancing executable size with
runtime performance.
The inliner’s view of the world is
always “pre-optimized.” It has very limited knowledge of copy
propagation and dead control paths for example.
Unfortunately, many of the coding
patterns and idioms common to heavy generic programming bump into those
constraints. Consider the following routine in the Eigen library:
That instantiation of outerStride
does nothing but return one of its members. Therefore, it is an
excellent candidate for full inline expansion. To realize this win,
though, the compiler must fully evaluate and expand outerStride’s 18
total callees for every callsite of outerStride in the module. This
eats into both optimizer throughput and the inliner’s code-size
budget. It also bears mentioning that calls to ‘rows’ and ‘cols’ are
inline-expanded as well, even though they are on a statically dead
path.
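The shape of the problem can be sketched with a small, hypothetical example. The names below (Storage, View, strideLevel1, strideLevel2) are invented for illustration and are not Eigen's actual code; the real outerStride has a much deeper call tree. The point is only that a getter can sit behind forwarding layers and a compile-time-dead branch, all of which a pre-optimization inliner still has to look at.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in types; not Eigen's real classes.
struct Storage {
    std::ptrdiff_t rows, cols, stride;
};

template <bool IsVector>
class View {
public:
    explicit View(Storage s) : s_(s) {}

    std::ptrdiff_t rows() const { return s_.rows; }
    std::ptrdiff_t cols() const { return s_.cols; }

    std::ptrdiff_t outerStride() const {
        // IsVector is a compile-time constant. For View<false> this branch
        // is statically dead, yet the unoptimized IR still contains the
        // calls to rows() and cols().
        if (IsVector) return rows() * cols();
        return strideLevel1();
    }

private:
    // Forwarding layers typical of generic code; each is a separate callee.
    std::ptrdiff_t strideLevel1() const { return strideLevel2(); }
    std::ptrdiff_t strideLevel2() const { return s_.stride; }

    Storage s_;
};

// After optimization, View<false>::outerStride() is just "return s_.stride".
inline void checkOuterStride() {
    assert(View<false>(Storage{3, 4, 7}).outerStride() == 7);
    assert(View<true>(Storage{3, 4, 7}).outerStride() == 12);
}
```

Once optimized, the View<false> instantiation amounts to a single member load, which is the form one would like the inliner to copy into each callsite.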
It would be much better if the optimizer just inlined the two-line
optimized form of outerStride.
Inlining Optimized IR
For a subset of routines, the inliner will now expand the
already-optimized IR of a routine, bypassing the process of fetching IR
and re-expanding callees. This serves the dual purpose of expanding
callsites much faster and letting the inliner measure its budget more
accurately.
First, the optimizer will summarize that outerStride is a candidate for
this faster expansion when it is originally compiled (remember that
c2.dll tries to compile routines before their callers). Then, the
inliner may replace calls to that outerStride instantiation with the
already-optimized IR.
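The "callees before callers" ordering can be pictured as a post-order walk of the call graph. The following is a toy model only: CallGraph, visit, and bottomUpOrder are hypothetical names, and nothing here is meant to describe how c2.dll actually schedules its work.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Post-order depth-first walk. In an acyclic graph, every callee is
// appended to `order` before any of its callers.
inline void visit(const CallGraph& g, const std::string& fn,
                  std::set<std::string>& seen,
                  std::vector<std::string>& order) {
    if (!seen.insert(fn).second) return;  // already visited
    auto it = g.find(fn);
    if (it != g.end()) {
        for (const auto& callee : it->second) visit(g, callee, seen, order);
    }
    order.push_back(fn);
}

inline std::vector<std::string> bottomUpOrder(const CallGraph& g) {
    std::set<std::string> seen;
    std::vector<std::string> order;
    for (const auto& entry : g) visit(g, entry.first, seen, order);
    return order;
}

inline void checkBottomUpOrder() {
    CallGraph g{{"caller", {"outerStride"}},
                {"outerStride", {"innerStride"}},
                {"innerStride", {}}};
    std::vector<std::string> expected{"innerStride", "outerStride", "caller"};
    assert(bottomUpOrder(g) == expected);
}
```

In this model, by the time "caller" is processed, "outerStride" has already been compiled, so a summary of its optimized form would be available for the inliner to consult.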
The candidates for this faster inline expansion are leaf functions with
no locals, which refer to at most two different arguments, globals, or
constants. In practice this targets most simple getters and setters. There
are many examples like outerStride in the Eigen library where a large
call tree expands into just one or two instructions. Modules that make
heavy use of Eigen may see a significant throughput improvement; we
measured the optimizer taking up to 25-50% less time for such repros.
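To make the candidate criteria more concrete, here is a rough source-level sketch. All names are hypothetical, and the classification in the comments is my reading of the rules above, not something the compiler reports. Two assumptions are worth stating: I count the implicit `this` pointer as an argument, and I judge each function by what it plausibly looks like after optimization, since the criteria apply to optimized IR rather than to source code.

```cpp
#include <cassert>

struct Point {
    int x, y;

    // Likely candidate: leaf, no locals, refers to one argument (this).
    int getX() const { return x; }

    // Likely candidate: leaf, no locals, refers to two arguments (this, v).
    void setX(int v) { x = v; }
};

int g_scale = 2;

// Likely candidate: leaf, no locals, refers to one argument and one global.
inline int scaled(int v) { return v * g_scale; }

// Likely not a candidate: refers to three different arguments.
inline int sum3(int a, int b, int c) { return a + b + c; }

// Likely not a candidate: the loop needs locals (acc, i) unless the
// optimizer manages to eliminate them.
inline int sumTo(int n) {
    int acc = 0;
    for (int i = 1; i <= n; ++i) acc += i;
    return acc;
}

inline void checkCandidates() {
    Point p{1, 2};
    assert(p.getX() == 1);
    p.setX(5);
    assert(p.getX() == 5);
    assert(scaled(4) == 8);
    assert(sum3(1, 2, 3) == 6);
    assert(sumTo(4) == 10);
}
```

Note that a routine which is not a leaf in source form, such as outerStride, can still qualify once its own callees have been inlined and its dead paths removed.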
The new Zipliner will also enable the inliner to measure its budget more
accurately. Eigen developers have long been aware that MSVC does not
inline to their specifications (see EIGEN_STRONG_INLINE). Zipliner
should help to alleviate some of this concern, as a ziplined routine is
now considered a virtually “free” inline.
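For readers unfamiliar with the workaround mentioned above, libraries often wrap a compiler-specific "always inline" keyword in a macro. The sketch below uses a hypothetical macro, DEMO_STRONG_INLINE, written in the same spirit; it is not a copy of Eigen's definition. It relies on MSVC's documented __forceinline keyword and falls back to plain inline elsewhere.

```cpp
#include <cassert>

// Hypothetical macro for illustration only.
#if defined(_MSC_VER)
#define DEMO_STRONG_INLINE __forceinline
#else
#define DEMO_STRONG_INLINE inline
#endif

struct Strided {
    long stride;

    // A trivial getter that the library author wants inlined everywhere.
    DEMO_STRONG_INLINE long outerStride() const { return stride; }
};

DEMO_STRONG_INLINE long twiceStride(const Strided& s) {
    return 2 * s.outerStride();
}

inline void checkStrongInline() {
    assert(twiceStride(Strided{21}) == 42);
}
```

If trivial routines like outerStride are treated as nearly free by the inliner, there should be less need to force the decision by hand in this way.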
Give the feature a try
The Zipliner is enabled by default in Visual Studio 2019 version 16.3,
with further improvements in 16.4. Please download Visual Studio 2019 and give the
new improvements a try. We can be reached via the comments below or via
email (firstname.lastname@example.org). If you encounter problems with Visual
Studio or MSVC, or have a suggestion for us, please let us know through
Help > Send Feedback > Report A Problem / Provide a Suggestion in the
product, or via Developer Community. You can also find us on Twitter (@VisualC).