Quite a while ago I was implementing an interpreter. A common task in the interpreter is to select the correct interpreter function based on the type of the input. Let’s say we want to implement an addition. We might end up with something like below:
1 | template<typename T> |
to implement the operation. At runtime, we want to dispatch to the right function base on the type of the operands. A natural way to do this is to have a static array holding the mapping from the operand type to the function pointer, similar to below:
1 | enum Type { Int32, Int64, Double, ... }; |
so at runtime we can just read x_addOpPtr[operandType]
to obtain the function pointer we want to call.
The X Macro
Although the code above can work, it is clearly too error prone. If we accidentally made a mistake in the order of the list, we are screwed. A better way is the X Macro pattern. We define a “X macro” for all the types:
1 |
|
Then, by defining what X(EnumType, CppType)
expands to, we can create logic based on our needs. For example, the following code would reproduce the x_addOpPtr
array we want:
1 |
|
Note that the final nullptr
is needed because our expansion (void*)Add<CppType>,
would generate an extra comma in the end.
The New Challenge
X Macro solved the above problem, but what if we want to handle, say, a type cast opcode?
1 | template<typename Src, typename Dst> |
Unlike the addition operator, now we have two types Src
and Dst
to enumerate on, so we have to generate a two-dimensional array. While X Macro can easily iterate through one list and perform action on every item, it cannot iterate through the Cartesian product of two lists. A worse solution, is of course, to manually define a list containing all the <Src, Dst>
pairs, so we can do X macro again. But what if we want to do a three-dimensional Cartesian product in the future?
After some fruitless Googling and home-making attempts to build a “two dimensional X Macro”, I eventually gave up and switched to an ugly solution. Instead of generating a clean static array, we generate a tree of templated dispatching functions. The function at the i-th
level use a dispatch array (built by X macro) to dispatch to the next level’s selector function based on the i-th
parameter type. We get the function pointer when we reach the leaf. While this approach works, no doubt it is very ugly, and probably also less performant (I didn’t check if the C++ compiler were able to optimize away all the terrible things).
The FOR_EACH Macro
I used to believe my ugly solution is as good as one can get without resorting to manually enumerating the Cartesian product. However, today I learnt an interesting approach from David Mazieres, which he calls the FOR_EACH
macro.
The semantics of the FOR_EACH
macro is pretty clear. Taking a macro X
(similar to the X
in X Macro) and a comma-separated list of elements e1, e2, ... , en
, the FOR_EACH
macro invokes X
on each e
in the list. For example, the addition example would look like:
1 | // Note that the 'X' is gone, and the list is comma-separated |
The most important difference between FOR_EACH
macro and X Macro is that the FOR_EACH
list definition doesn’t take X
. Unlike the X Macro, where the macro to call on each element is hardcoded to only pass the element itself, the FOR_EACH
macro decoupled “the element to be processed” and “the macro processing the element”. This removes the biggest blocker to implement a macro that can enumerate through Cartesian product of multiple lists.
The core of the trick which allows FOR_EACH
’s list definition to get rid of the X
lies in the C++20 new feature __VA_OPT__
. David Mazieres’ original article is already a good explanation on how the FOR_EACH
macro works so I won’t parrot it again. With the main blocker removed, after only a few hours of work, I was able to successfully extend FOR_EACH
to support enumerating through the Cartesian product of multiple lists. (By the way, even after implementing it, I still have very little idea on how the C preprocessor works, but clang++ -E
is enough to trial-and-error into a working solution).
The FOR_EACH_CARTESIAN_PRODUCT Macro
I call my macro FOR_EACH_CARTESIAN_PRODUCT
. As the name suggests, it takes a macro X
and one or more lists (L1), ..., (Ln)
. Then for each (e1, ..., en)
in the Cartesian product L1 x ... x Ln
, the macro X(e1, ..., en)
is invoked. The elements in the Cartesian product are enumerated in lexical order.
For example, for the type-casting example above, the below code would construct our desired two-dimensional dispatch array:
1 |
|
Note that the generated array is one-dimensional, but indexing it is pretty simple: x_castOpPtr[opType1 * numTypes + opType2]
will give us the desired function pointer for Src=opType1
and Dst=opType2
.
The code, which contains both the implementation for FOR_EACH_CARTESIAN_PRODUCT
and the above examples can be found here. The code is in public domain so feel free to use.