「こんなきれいな星も、やっぱりここまで来てから、見れたのだと思うから。だから・・もっと遠くへ・・」

From X Macro to FOR_EACH to Cartesian Product Enumeration with C Macro

Quite a while ago I was implementing an interpreter. A common task in the interpreter is to select the correct interpreter function based on the type of the input. Let’s say we want to implement an addition. We might end up with something like below:

1
2
3
4
template<typename T>
void Add(T* input1, T* input2, T* output) {
*output = *input1 + *input2;
}

to implement the operation. At runtime, we want to dispatch to the right function base on the type of the operands. A natural way to do this is to have a static array holding the mapping from the operand type to the function pointer, similar to below:

1
2
3
4
5
6
enum Type { Int32, Int64, Double, ... };
static void* const x_addOpPtr[] = {
(void*)Add<int32_t>,
(void*)Add<int64_t>,
(void*)Add<double_t>, ...
};

so at runtime we can just read x_addOpPtr[operandType] to obtain the function pointer we want to call.

The X Macro

Although the code above can work, it is clearly too error prone. If we accidentally made a mistake in the order of the list, we are screwed. A better way is the X Macro pattern. We define a “X macro” for all the types:

1
2
3
4
#define FOR_EACH_TYPE(X) \
X(Int32, int32_t) \
X(Int64, int64_t) \
X(Double, double_t) ...

Then, by defining what X(EnumType, CppType) expands to, we can create logic based on our needs. For example, the following code would reproduce the x_addOpPtr array we want:

1
2
3
4
5
6
#define X(EnumType, CppType) (void*)Add<CppType> , 
static void* const x_addOpPtr[] = {
FOR_EACH_TYPE(X)
nullptr
};
#undef X // hygiene

Note that the final nullptr is needed because our expansion (void*)Add<CppType>, would generate an extra comma in the end.

The New Challenge

X Macro solved the above problem, but what if we want to handle, say, a type cast opcode?

1
2
3
4
template<typename Src, typename Dst>
void Cast(Src* input, Dst* output) {
*output = static_cast<Dst>(*input);
}

Unlike the addition operator, now we have two types Src and Dst to enumerate on, so we have to generate a two-dimensional array. While X Macro can easily iterate through one list and perform action on every item, it cannot iterate through the Cartesian product of two lists. A worse solution, is of course, to manually define a list containing all the <Src, Dst> pairs, so we can do X macro again. But what if we want to do a three-dimensional Cartesian product in the future?

After some fruitless Googling and home-making attempts to build a “two dimensional X Macro”, I eventually gave up and switched to an ugly solution. Instead of generating a clean static array, we generate a tree of templated dispatching functions. The function at the i-th level use a dispatch array (built by X macro) to dispatch to the next level’s selector function based on the i-th parameter type. We get the function pointer when we reach the leaf. While this approach works, no doubt it is very ugly, and probably also less performant (I didn’t check if the C++ compiler were able to optimize away all the terrible things).

The FOR_EACH Macro

I used to believe my ugly solution is as good as one can get without resorting to manually enumerating the Cartesian product. However, today I learnt an interesting approach from David Mazieres, which he calls the FOR_EACH macro.

The semantics of the FOR_EACH macro is pretty clear. Taking a macro X (similar to the X in X Macro) and a comma-separated list of elements e1, e2, ... , en, the FOR_EACH macro invokes X on each e in the list. For example, the addition example would look like:

1
2
3
4
5
6
7
8
9
10
11
12
// Note that the 'X' is gone, and the list is comma-separated
#define TYPE_LIST \
(Int32, int32_t) , \
(Int64, int64_t) , \
(Double, double_t) ...

#define X(e) (void*)Add<TUPLE_GET_2(e)> ,
static void* const x_addOpPtr[] = {
FOR_EACH(X, TYPE_LIST)
nullptr
};
#undef X

The most important difference between FOR_EACH macro and X Macro is that the FOR_EACH list definition doesn’t take X. Unlike the X Macro, where the macro to call on each element is hardcoded to only pass the element itself, the FOR_EACH macro decoupled “the element to be processed” and “the macro processing the element”. This removes the biggest blocker to implement a macro that can enumerate through Cartesian product of multiple lists.

The core of the trick which allows FOR_EACH’s list definition to get rid of the X lies in the C++20 new feature __VA_OPT__. David Mazieres’ original article is already a good explanation on how the FOR_EACH macro works so I won’t parrot it again. With the main blocker removed, after only a few hours of work, I was able to successfully extend FOR_EACH to support enumerating through the Cartesian product of multiple lists. (By the way, even after implementing it, I still have very little idea on how the C preprocessor works, but clang++ -E is enough to trial-and-error into a working solution).

The FOR_EACH_CARTESIAN_PRODUCT Macro

I call my macro FOR_EACH_CARTESIAN_PRODUCT. As the name suggests, it takes a macro X and one or more lists (L1), ..., (Ln). Then for each (e1, ..., en) in the Cartesian product L1 x ... x Ln , the macro X(e1, ..., en) is invoked. The elements in the Cartesian product are enumerated in lexical order.

For example, for the type-casting example above, the below code would construct our desired two-dimensional dispatch array:

1
2
3
4
5
6
#define X(e1, e2) (void*)Cast<TUPLE_GET_2(e1), TUPLE_GET_2(e2)> ,
static void* const x_castOpPtr[] = {
FOR_EACH_CARTESIAN_PRODUCT(X, (TYPE_LIST), (TYPE_LIST))
nullptr
};
#undef X

Note that the generated array is one-dimensional, but indexing it is pretty simple: x_castOpPtr[opType1 * numTypes + opType2] will give us the desired function pointer for Src=opType1 and Dst=opType2.

The code, which contains both the implementation for FOR_EACH_CARTESIAN_PRODUCT and the above examples can be found here. The code is in public domain so feel free to use.