In DFG’s IR, each function consists of a list of basic blocks, and each basic block contains a list of IR instructions. An IR instruction is also implicitly the value it produces, and can be used as an operand to other IR instructions. Everything so far resembles a typical SSA representation. However, there are two important differences:
- Each IR instruction can only use the SSA values (implicitly produced by other IR instructions) in the same basic block.
- Unlike in SSA, DFG does not have an IR instruction named “Phi”. Instead, in DFG’s IR, each function has a list of “slots”, which are local variables that one can read from and write to. There are two IR instructions `GetLocal` and `SetLocal`, which allow reading and writing a slot respectively.

As one would expect, a `GetLocal` takes a slot number constant as operand, and produces an SSA value. A `SetLocal` takes an SSA value (i.e., an IR instruction that produces a value) and a slot number constant as operands, and produces nothing.
The above representation (called `LoadStore` form in JSC) is very flexible for users generating the IR, but not good for optimization, because in such a representation it is hard to reason about the data stored in local variables. This is where the CPS Rethreading pass comes in, which transforms the graph into the so-called `ThreadedCPS` form. The CPS Rethreading pass optimizes away redundant loads, adds auxiliary Phi nodes that describe the data flow of loads and stores, and also performs liveness analysis.
The most interesting design choice here is that the Phi nodes are auxiliary and optional, and are not part of the IR graph: the correctness of the IR graph is not affected even if all Phi nodes were removed; doing so merely puts the graph back into `LoadStore` form. As a result, a transformation pass may choose to either maintain the Phi nodes and the liveness information, or simply mark that the graph has been put back into `LoadStore` form (and then the CPS Rethreading pass can be called to bring it to `ThreadedCPS` form again).
The extra information available in `ThreadedCPS` form is the following:
- Each basic block gets a list of auxiliary Phi nodes. Unlike Phi nodes in SSA form, a Phi node here is parametrized by a slot number constant, so it denotes the value of that local variable at the start of the basic block. These Phi nodes are only referenced by other Phi nodes (from other basic blocks) and by the items below; they are never used as operands by normal IR instructions.
- Each basic block gets two auxiliary arrays `variablesAtHead` and `variablesAtTail`, each of length `N` (where `N` is the total number of local variable slots of the function). The `variablesAtHead` array denotes the value of all live variables at the start of the basic block. The `variablesAtTail` array denotes the value of all available variables (defined as either live at the start of the basic block, or modified inside the basic block) at the end of the basic block.
- Each `GetLocal` node gets an auxiliary pointer which denotes the value this `GetLocal` would yield. The pointer points to either a value-producing instruction in the same basic block (which means the `GetLocal` would yield that value), or a Phi node in the same basic block (which means the `GetLocal` would yield the value of the Phi).
- Redundant `GetLocal`s to the same local variable in the same basic block are removed. There are two cases: a `GetLocal` that follows another `GetLocal` can be replaced by the previous `GetLocal`, and a `GetLocal` that follows a `SetLocal` can be replaced by the operand of the `SetLocal`.
There are a few points worth noting:
- Point (2) above (the two per-block arrays) implies that the total space consumption is `O(N * M)`, where `N` is the total number of local variable slots and `M` is the number of basic blocks.
- Redundant `SetLocal`s to the same local variable inside the same basic block are not removed. This probably has something to do with OSR.
- The design above makes sure that all the extra information (most notably, the Phi nodes) is auxiliary: it can be safely dropped without affecting correctness.
The algorithm is implemented in DFGCPSRethreadingPhase.cpp. For simplicity, as before, we will focus on `GetLocal` and `SetLocal` only, and ignore the `PhantomLocal` and `SetArgumentMaybe` related logic.
The first step of the algorithm resets the existing state in case the pass has been invoked earlier. It clears the `variablesAtTail` array for each basic block, removes all Phi nodes, and reserves space for the annotation pointer in each `GetLocal` node (by simply repurposing an unused field of the node).
In the second step, for each basic block, it iterates over all the IR instructions from top to bottom. The `variablesAtTail` array is used as a scratchpad to keep track of the current value of each variable at the IR instruction currently being iterated.
- If it is a `GetLocal` on local variable `i` and `variablesAtTail[i]` is `nullptr`, then this is the first time variable `i` is used in this basic block, and the value of this variable should come from a Phi node. So we create a Phi node and store it into `variablesAtHead[i]`. `variablesAtTail[i]` should also be updated to the current `GetLocal` node.
- If it is a `GetLocal` on local variable `i` and `variablesAtTail[i]` is not `nullptr`, then the local variable `i` has already been used in this basic block. Thus, this `GetLocal` can always be removed from the graph by replacing it with another node (with the details described earlier). Thanks to the design of the IR, this `GetLocal` may only be used by later IR instructions in the same basic block, so the replacement can be done easily.
- If it is a `SetLocal` instruction, then we can simply update the corresponding slot in `variablesAtTail`.
The third step builds the incoming edges for all Phi nodes. The Phi nodes created in the previous step are put into a list, and the algorithm iterates until the list becomes empty. Each time an element, say a Phi node for some variable `i` in basic block `B`, is removed from the list. Then, for each predecessor `P` of basic block `B`:
- If `P.variablesAtTail[i]` is `nullptr`, then there isn’t a Phi node in `P` for local variable `i` yet. So a new one is created, and `P.variablesAtTail[i]` is updated. The newly created Phi node is put into the list.
- Otherwise, the incoming edge of the current Phi node for that predecessor should simply be `P.variablesAtTail[i]`.
Note that the second and third steps also guarantee that `variablesAtHead` correctly contains the information for all live variables at the start of each basic block.
In my opinion, the design of the DFG IR is really neat. The standard SSA form is best suited for optimization, but it is both difficult and slow to construct and transform. The DFG IR makes interesting design choices that intentionally deviate from the standard SSA form to improve compilation speed and usability. This matches DFG JIT’s design goal of being a fast optimizer. The CPS Rethreading pass is a neat algorithm that serves multiple purposes simultaneously: it simplifies and canonicalizes the IR, constructs an SSA-like Phi data flow, and performs liveness analysis.
In fact, there are also `PhantomLocal`, which is related to OSR, and `SetArgumentMaybe`, which is related to the special-casing of arguments, but we will overlook them in this article for simplicity. ↩︎
It seems like the algorithm intentionally updates it to the `GetLocal` node instead of the Phi node because doing so provides more information to users: one can easily trace to the Phi from the `GetLocal`, but not the other way around. ↩︎
Specifically, all we need to do is maintain the replacement map, and at the time we iterate a node, check whether any of its operands is in the replacement map. If yes, we replace it. The replacement map could simply be a `std::map`, but even a map lookup is slow, so in JSC there is an 8-byte scratch field in each `Node` struct that can be used for this purpose (or other similar purposes). ↩︎
In fact, there are three different kinds of local variables, one of which is `Tmp`. So correspondingly, there are three different lists. I think this is only an implementation detail: the three lists could have been merged into one, but the current code design (where the `Kind` is used as a template parameter in several APIs) makes it easiest to just have three lists. ↩︎
The actual implementation is a bit more complex. If the predecessor node is a `GetLocal`, it would further forward to the value of the `GetLocal` (as computed in its auxiliary pointer), but if the predecessor node is a `SetLocal`, it won’t forward to its operand. These are only implementation details to best fit the needs of the clients (the other optimization passes), so we omit them to highlight the core idea. ↩︎