I recently hit the following use case in my project: I have a function `RunAllPasses(obj)`, which runs a list of transformation passes on `obj`. All passes are independent of each other, so they can run in any order. The problem is that I want to be able to add new passes to the list easily.
Of course, one can manually maintain the list of passes and call each of them. But this requires quite a bit of boilerplate code for each pass, plus a lot of header files that each contain merely a single function declaration for the pass.
Can we have less boilerplate code?
One intuitive direction is to have each pass “register” itself into a pass list at program initialization time, with the help of a global variable. For example, if one writes
```cpp
// WARNING: It's probably not a good idea to use this...
```
Then the constructor of `g_registerMyPass` would automatically run when the program starts, and push the pass into a global pass list. The `RunAllPasses` function can then simply run each pass in the pass list.
However, this approach turned out to cause a stream of problems, which ultimately forced me to abandon it. To keep a long story short, let’s go straight to the experiment that led to my conclusion.
Linker: The Deal-Breaker
Create a mini project with two C++ files, `a.cpp` and `b.cpp`.
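`a.cpp` simply defines a global variable whose constructor prints a message:

```cpp
// a.cpp
#include <cstdio>

struct S {
    S() { std::puts("In constructor S"); }
};

S s;  // global variable; nothing in the program refers to it
```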
`b.cpp` contains just the `main()` function:

```cpp
// b.cpp
#include <cstdio>

int main() {
    std::puts("In main");
}
```
Now, build and run the program (the choice of compiler and linker doesn’t matter, at least for the few I tried):
```shell
clang++ a.cpp -c -std=c++17
clang++ b.cpp a.o -std=c++17 -o main && ./main
```
and we get the expected output: `In constructor S` followed by `In main`. This shows that the C++ compiler indeed took care to protect the global variable `s` from being pruned by the linker even though it is unused, which is good.
But if we put `a.cpp` into a static library, things break: `In constructor S` is no longer printed!
```shell
clang++ a.cpp -c -std=c++17
ar rcs liba.a a.o
clang++ b.cpp liba.a -std=c++17 -o main && ./main
```
After further investigation, it turns out that this erratic behavior depends on whether `a.cpp` itself contains any symbol used by the main program. For example, adding another file `c.cpp` to the static library won’t help, even if `c.cpp` contains a function used by the main program. But if we change the code a bit so that `a.cpp` contains a function used by the main program, like the following:
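```cpp
// a.cpp
#include <cstdio>

struct S {
    S() { std::puts("In constructor S"); }
};

S s;

// Referenced from main(), so the linker must pull a.o out of the library.
void UsedByMain() {}
```

```cpp
// b.cpp
#include <cstdio>

void UsedByMain();  // defined in a.cpp

int main() {
    UsedByMain();
    std::puts("In main");
}
```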
Then, magically, the `In constructor S` line is printed again.
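What’s the problem? As it turns out, a static library is just an archive of object files, and the linker only extracts an object file from it if that file defines a symbol that the code linked so far references but nothing has defined yet. So if none of the symbols in some file `X` of a static library is referenced by the main program (directly, or through other files that do get linked), then `X` won’t be linked into the program at all. This “file-level pruning” ignores whatever “do-not-prune” annotation the C++ compiler emitted in the file, since the file is never linked in to begin with.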
So I reached the conclusion that this approach is fundamentally fragile:
- The erratic behavior won’t show up if the global variable is defined in an object file that is linked directly, only when it is defined in a static library.
- The erratic behavior won’t show up if the C++ file defining the global variable also contains other definitions that are used by the main program.
- There is no way (AFAIK) to fix this problem other than the `-Wl,--whole-archive` linker flag, which is not only fragile but also a bad option, because it unnecessarily bloats the final executable, often by a lot.
The narrow triggering conditions mean that the erratic behavior can stay hidden for a long time, until some seemingly unrelated change (e.g., moving a file into a static library, or moving some code around) exposes it and causes a debugging nightmare.
Along the way, I also learned about a number of pitfalls that the C++ standard imposes on global variable constructors. I will only note one interesting example below.
The following code has undefined behavior. Can you see why?
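```cpp
#include <map>

std::map<int, int> g_list;  // no constexpr constructor: dynamically initialized

template <int N>
struct Registrar {
    Registrar() { g_list[N] = N; }  // uses g_list during static initialization
};

template <int N>
Registrar<N> r;  // a variable template

template Registrar<1> r<1>;  // explicit instantiation: its initialization is unordered

int main() {}
```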
Answer: at the time the constructor of `r` runs, the constructor of `g_list` may not have run.
This is because, according to the C++ standard, “dynamic initialization of a non-block variable with static storage duration is unordered if the variable is an implicitly or explicitly instantiated specialization” (in our case, any instantiation of the variable template `r`). Since `std::map` does not have a `constexpr` constructor, `g_list` is also dynamically initialized, so `r` may be initialized before `g_list`, even though `g_list` “appears” to be defined before `r`.
But isn’t Google Test using the same global variable trick?
This question came to mind soon after I published this post, so I gave it a try. The result is as expected: if I move my Google Test files into a static library that is linked into the final unit test executable, all the tests are gone. Of course, for unit tests there is absolutely no reason to put them in a static library, so I would say Google Test made the right design decision. However, for general use cases, it seems unreasonable for code to break silently just because it is linked as a static library.