Dependence Testing

Dependence in Straight-line code

Recall the definition of dependences. A dependence exists between two statements if, and only if, there is a feasible control-flow path between the two statements and the two statements access a common memory location, one of which is a “write”. Recall how you can construct a dependence graph for straight-line code. Suppose that some of the memory accesses were to arrays. A conservative approach would be to add a dependence edge whenever two accesses, one of which is a “write”, are to any of the elements in the array. Consider the following straight-line code:

S1    A(i1, i2, ..., im) = ...
      ...
S2    ... = A(j1, j2, ..., jm)
      

Here, since both statements S1 and S2 access the same array, A, we could make a safe assumption that there is a true dependence from S1 to S2. Why is this assumption safe? However, we would like to be more precise. For the dependence to exist both the accesses must be to the same location within the array A. This will be the case only if the values at each subscript location in the two accesses are identical. In other words, the following must hold:

ik = jk, ∀ 1 ≤ k ≤ m

Notice that the subscripts ik and jk could be arbitrary expressions, in general.
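When every subscript is a compile-time integer constant, this equality test can be carried out exactly. A minimal Python sketch (the function name and tuple representation are ours, not from the original) might look like:

```python
def may_depend(write_subs, read_subs):
    """Exact test for straight-line code when every subscript is a known
    integer constant: the two references touch the same element iff the
    subscripts agree in every position.  With symbolic subscript
    expressions we could only prove independence when some pair is
    provably unequal."""
    assert len(write_subs) == len(read_subs)
    return all(i == j for i, j in zip(write_subs, read_subs))
```

For example, `A(1, 2)` and `A(1, 3)` are independent because the second subscripts differ.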

In order to extend this notion of testing dependences on array accesses to statements in loops, recall that a statement in a loop represents multiple instances at runtime. A statement enclosed in a loop-nest has as many instances as the size of the iteration vector space of the loop-nest.

Determining Dependences

1. Dependence Testing Theorem: Let α and β be iteration vectors within the iteration space of the following loop-nest:
  for i1 = L1:S1:U1 {
    for i2 = L2:S2:U2 {
       ...
         for in = Ln:Sn:Un {
S1   A(f1(i1 ... in), ..., fm(i1 ... in)) = ...
S2   ...  = A(g1(i1 ... in), ..., gm(i1 ... in))
         }
       ...
    }
  }
	
A dependence exists from S1 to S2 if and only if there exist values of α and β such that either α < β, or α = β (a loop-independent dependence, since S1 precedes S2 in the loop body), and the following system of dependence equations is satisfied:
fi(α) = gi(β) for all i, 1 ≤ i ≤ m
Proof: Follows directly from the definition of dependences.

Solving the above system of equations for arbitrary f and g functions is undecidable. Why is the problem undecidable? The “halting problem” is a classic undecidable problem. Consider what would happen if one of the fi functions involved a function call that may or may not terminate. If you had a precise algorithm for dependence testing, how could you use it to solve the halting problem? Even assuming that all the subscript expressions involve terminating computations, the above system of equations is still too difficult to solve. In fact, even if all the subscript expressions (i.e., all the fi and gi functions) were restricted to polynomials, determining whether the system has an integer solution remains undecidable (this is Hilbert's tenth problem). Fortunately, in most practical programs the subscript expressions tend to be simple: linear functions of loop-index variables in an overwhelming majority of the cases. In such cases, we can write the linear system of equations that must be satisfied for a dependence to exist between two array references with m subscripts that are enclosed in a loop-nest n levels deep (i.e., an n-dimensional iteration vector space) as:

a11i1 + a12i2 + ... + a1nin + c1 = b11j1 + b12j2 + ... + b1njn + d1
a21i1 + a22i2 + ... + a2nin + c2 = b21j1 + b22j2 + ... + b2njn + d2
...
am1i1 + am2i2 + ... + amnin + cm = bm1j1 + bm2j2 + ... + bmnjn + dm
      

Thus, there are 2×n variables and m equations. Clearly, we are interested in solutions over a limited domain bounded by the lower and upper bounds of each loop in the loop-nest. As long as there is at least one solution within this domain a dependence exists. If no such solution exists, then no dependence can exist. If a dependence is possible we would like to obtain direction or distance vectors for all possible dependences.

It is tempting to think that we can apply a standard technique to solve this linear system of equations, which has the form Ax = B, where A is an m×2n matrix and B is a vector of m elements. However, since we are solving for loop index variables, which can only take integer values, we require integer solutions. Equations to be solved over the integer domain are called Diophantine equations. Unfortunately, the problem of finding solutions to Diophantine equations within given bounds is NP-hard.
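Although solving the full system exactly is hard, cheap necessary conditions exist. One classic example, sketched below in Python under our own naming, is the GCD test applied to a single dependence equation: moving all variables to one side, an integer solution (ignoring loop bounds) exists if and only if the gcd of all coefficients divides the difference of the constant terms.

```python
from math import gcd
from functools import reduce

def gcd_test(a, b, c, d):
    """GCD test for one dependence equation
         a[0]*i1 + ... + a[n-1]*in + c = b[0]*j1 + ... + b[n-1]*jn + d.
    Rewriting as sum(a_k*i_k) - sum(b_k*j_k) = d - c, an integer solution
    exists (ignoring loop bounds) iff gcd of all coefficients divides
    d - c.  Returning False proves independence; True only means a
    dependence is still possible."""
    g = reduce(gcd, [abs(x) for x in a + b], 0)
    rhs = d - c
    if g == 0:                  # no loop index appears at all
        return rhs == 0
    return rhs % g == 0
```

For instance, a write to A(2*i) and a read of A(2*j + 1) can never conflict, since gcd(2, 2) = 2 does not divide 1.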

Separability and Types of Subscripts

Let us not forget that our goal is to determine dependences (or the absence of dependences), not to solve the Diophantine equations. Also, as soon as we have determined that two expressions in a particular subscript position cannot be equal, we have proved independence and we are done. For the ith subscript position this is equivalent to proving that the ith equation in the above list of equations can never hold.

We can rely on a practical observation to simplify our task: the coefficient matrix A in the above set of equations is usually very sparse. In other words, most subscript expressions tend to depend only on a small set of loop indices. Indeed, in many cases the subscripts depend only on one loop index. If the loop indices appearing in a pair of subscript expressions at a particular subscript position do not appear in any other subscript position, we call that subscript separable. For example, in the following piece of code,

for i = 1:N {
  for j = 1:M {
    for k = 1:P {
      A(i+1,j+k,j) = A(i,k,k) + B;
    }
  }
}
      

the first subscript of A is separable since i is the only loop index that appears in the first subscript position of both the array references. The second and third subscripts are not separable since they both involve loop-index variables j and k. A group of subscripts that are tied together in this manner constitute a coupled group. Here is another example:

for i = 1:N {
  for j = 1:M {
    for k = 1:P {
      A(k+1,j,j) = A(i,j,1) + B;
    }
  }
}
      

In this example, the first subscript is separable because the loop indices involved in the first subscript position (k and i) do not appear in any other subscript position. The second and third subscripts, on the other hand, are not separable and instead form a coupled group. Notice that coupled groups form equivalence classes over subscript positions: two subscript positions are in the same equivalence class (i.e., are related) if and only if their expressions have at least one common loop index. Why is this an equivalence relation? If you think of a separable subscript as a singleton equivalence class, the following simple algorithm can be used to partition the subscripts into equivalence classes of coupled groups. The algorithm starts by creating an equivalence class for each subscript position and then, for each loop-index variable, merges together all classes that contain at least one subscript position using that variable.

procedure partition (S, P, np)
  // S is a set of m subscript pairs S1, ..., Sm for a single
  //   reference pair enclosed in n loops with indices I1, ..., In
  // P is an output variable containing the set forming a partition
  //   of the subscripts into separable and minimal coupled groups
  // np is the number of partitions
  np = m
  for i = 1 to m { Pi = {Si} }
  for i = 1 to n {
    k ← <none>
    for each remaining partition Pj {
      if ∃ s ∈ Pj such that s contains Ii {
        if k == <none>
          k ← j
        else {
          Pk ← Pk ∪ Pj;
          discard Pj;
          np = np − 1;
        }
      } 
    }
  }
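A direct Python rendering of this procedure might look as follows. The representation, where each subscript position is described by the set of loop indices appearing in it (in either reference), is our own choice; the partition count np is simply the length of the returned list.

```python
def partition(subscripts, indices):
    """Partition subscript positions into separable subscripts and minimal
    coupled groups.  `subscripts[s]` is the set of loop indices appearing
    in subscript position s; `indices` lists the loop-index names.
    Follows the merging algorithm in the pseudocode above."""
    parts = [{s} for s in range(len(subscripts))]   # one class per position
    for idx in indices:
        k = None
        for p in parts[:]:                          # iterate over a copy
            if any(idx in subscripts[s] for s in p):
                if k is None:
                    k = p                           # first class using idx
                else:
                    k |= p                          # merge into class k
                    parts.remove(p)
    return parts
```

For the second example above, A(k+1,j,j) = A(i,j,1), the subscript sets are {k,i}, {j}, {j}, and the algorithm produces the partition {0} (separable) and {1, 2} (a coupled group).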
	

Once the subscripts have been partitioned into separable subscripts and coupled groups, we further categorize the separable subscripts into three types:

ZIV
Zero Induction Variable subscripts are those that contain no loop index variable.
SIV
Single Induction Variable subscripts depend on exactly one loop index variable.
MIV
Multiple Induction Variable subscripts are those that depend on more than one loop index variable.

The rationale behind this categorization is that ZIV subscripts are easiest to test for equality (or inequality), followed by SIV subscripts, followed by MIV subscripts. Additionally, separable subscripts are easier to handle than coupled-groups. We start by testing the simplest subscripts and move on to the more complex cases only if we cannot prove independence. At any stage if we succeed in proving independence we are done. As an example, consider the following loop nest:

for i = 1:N {
  for j = 1:M {
    for k = 1:P {
        A(k+1,j,j,1) = A(i,j,1,N) + B;
    }
  }
}
      

Here, the first subscript is separable and is MIV since it involves two loop indices, k and i. The second and third subscripts form a coupled group. Finally, the fourth subscript is also separable and is ZIV since it involves no loop index variable.
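The classification itself is a trivial helper (our own sketch), given the set of loop-index variables appearing in a separable subscript's pair of expressions:

```python
def classify(index_set):
    """ZIV/SIV/MIV classification of a separable subscript, based on how
    many loop-index variables appear in its pair of expressions."""
    if len(index_set) == 0:
        return "ZIV"
    if len(index_set) == 1:
        return "SIV"
    return "MIV"
```

For the example above, classify({'k', 'i'}) yields "MIV" for the first subscript and classify(set()) yields "ZIV" for the fourth.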

Subscript Testing

If we cannot prove independence for a separable subscript or a subscript coupled group we compute all the dependence distances or directions for loop indices involved in the subscript or the coupled group. The directions or distances thus computed can then be combined to form the complete direction or distance vector for the dependence. Consider the following loop nest:

for i = 1:N {
  for j = 1:M {
    for k = 1:P {
      A(i+1, j-1, 1) = ...
         ...
      ... = A(i, j, N) + B;
    }
  }
}
      

Consider the direction vectors for the assumed true dependence between the two references to A. Suppose that the true dependence occurs between the write access at the iteration vector (i1, j1, k1) and the read access at the iteration vector (i2, j2, k2). If we cannot prove independence between the two accesses then we want to compute the dependence distance between the two vectors, i.e., (i2-i1, j2-j1, k2-k1). Since often it is not possible to summarize the distances in constant terms, we may wish to use dependence directions, given by the sign of the dependence distances.

Here, all the subscript positions for the two accesses to A are separable. The first subscript gives us the direction vector for i (loop-level 1) as (<). The second subscript gives us the direction vector for j (loop-level 2) as (>). Finally, the third subscript is a ZIV subscript. If we can prove that N is never equal to 1 then we do not need to do any other testing, since no dependence can exist. If we cannot prove that we must take the conservative stance that a dependence may exist. A ZIV subscript does not directly contribute to any dependence distance or direction. However, we observe that the same location is written and read in each iteration of the k-loop. This means that there is a dependence between all k-instances of the two statements, implying all possible dependence distances ranging from -(P-1) to (P-1), including 0. Thus, all three directions, <, =, and > are possible for the k-loop for the true dependence under consideration.

The direction vectors computed for each subscript equivalence class can be combined using a Cartesian product to obtain the overall direction vectors for the loop-nest. The reason we can perform the Cartesian product is that each equivalence class gives us direction vectors for disjoint sets of loops (recall that this is precisely the basis for defining the equivalence classes). Thus, for this example, we have three possible direction vectors (<,>,<), (<,>,>), and (<,>,=). As a short cut we can summarize these direction vectors as (<,>,*), where “*” denotes that all directions are possible.
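Given per-class results like those just derived, the merge is a direct Cartesian product, as in this Python sketch:

```python
from itertools import product

# Per-class direction results for the example above: subscript 1
# constrains loop i, subscript 2 constrains loop j, and the ZIV subscript
# leaves all directions open for loop k.
i_dirs = ['<']
j_dirs = ['>']
k_dirs = ['<', '=', '>']

# Each class constrains a disjoint set of loops, so combining the results
# by Cartesian product loses no precision.
vectors = list(product(i_dirs, j_dirs, k_dirs))
```

This produces exactly the three direction vectors summarized as (<,>,*).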

Finally, notice that if we had started by assuming an antidependence from the second statement to the first, we would have ended up with the > sign in the leftmost position. That would have shown that our assumption about the dependence was incorrect and that it was really a true dependence in the opposite direction. The correct direction vectors would be obtained by flipping all the directions. If we were considering output dependences, a leftmost > sign would similarly imply an incorrect assumption about the direction.

Advanced Tests

Testing ZIV and SIV subscripts is relatively straightforward. MIV tests involve more complex mathematical analysis including GCD-based testing and Banerjee Inequalities. The details of those tests are beyond the scope of this discussion.
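As one concrete instance of an SIV test, the “strong SIV” case, where both subscripts have the form a*i + c with the same coefficient a, can be decided exactly. A sketch under our own naming, assuming a > 0:

```python
def strong_siv(a, c1, c2, L, U):
    """Strong SIV test for the subscript pair a*i + c1 (write) and
    a*i + c2 (read) in a loop i = L:U with a > 0.  Equating
    a*i + c1 = a*i' + c2 gives distance d = i' - i = (c1 - c2)/a;
    a dependence exists iff d is an integer with |d| <= U - L.
    Returns the distance, or None when independence is proved."""
    d, r = divmod(c1 - c2, a)
    if r != 0 or abs(d) > U - L:
        return None
    return d
```

For example, A(i+1) written and A(i) read in i = 1:100 gives distance 1, while A(2*i) against A(2*i + 1) proves independence because the distance is not an integer.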

Delta Test for Coupled Groups

At first it appears that solving dependence equations for coupled groups will necessarily involve solving Diophantine equations. However, certain frequently occurring special cases can be handled using the delta test. In testing for a dependence, we assume that the dependence occurs between the statement instances in iteration I and I+ΔI for each loop index I appearing in the coupled group of subscripts. Equating the corresponding subscript positions often leads to precise values for the ΔIs, leading to precise distance vectors. A concrete example will make this clearer.

for i = 1:N {
  for j = 1:M {
    A(i+1, i+j) = ...
         ...
    ... = A(i, i+j-1) + B;
  }
}
      

In the above example, suppose that we wish to compute the dependence distance between the two accesses to A, assuming a true dependence. Notice that the two subscripts form a coupled group. We can write the following dependence equations using the Δ notation.

i + 1 = i + Δi
i + j = i + Δi + j + Δj - 1      
    

The first equation gives us Δi = 1, which when substituted into the second equation leads to Δj = 0. Thus, the dependence distance is (1, 0). If there were more subscripts within the coupled group we might be able to simplify other equations using these computed values of Δi and Δj. If we encounter a contradiction (unsolvable dependence equation) then the dependence does not exist.
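The substitution chain for this example can be written out directly (a sketch; the equations are exactly those above):

```python
def delta_example():
    """Delta test for A(i+1, i+j) = ... = A(i, i+j-1):
         subscript 1:  i + 1 = i + di               =>  di = 1
         subscript 2:  i + j = i + di + j + dj - 1  =>  dj = 1 - di
    Each SIV-like equation pins down one delta, which is then substituted
    into the remaining equations."""
    di = 1
    dj = 1 - di
    return (di, dj)
```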

Omega Test

The delta test outlined above relies on the ability to find an SIV subscript within a coupled group and then repeatedly simplify equations for other subscripts into SIV subscripts, by substitution. Clearly, this may not always be possible. In that case we cannot avoid solving Diophantine equations. Omega Test is a technique that can help in such situations.

The Omega Test, first introduced by William Pugh (A Practical Algorithm for Exact Array Dependence Analysis), determines whether a given set of equations has an integer solution. In fact, the test also handles inequalities, and thus can restrict the solution space to integers within loop bounds. It takes the following canonical form of linear equalities and inequalities as input:

(Σ1≤i≤n aixi = c)
(Σ1≤i≤n aixi ≥ c)
      

The parentheses indicate that there is a set of equalities and inequalities. The basic decision test can also be extended to compute distance vectors by adding some new variables to the above relations.

Even though all known algorithms for solving Diophantine equations have exponential worst-case upper bounds, the Omega Test has led to an implementation with reasonable performance on a vast majority of real-life cases.
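To make the decision problem concrete: over a bounded integer domain it can in principle be settled by exhaustive search, as in the sketch below. The Omega Test reaches the same answer using an extension of Fourier-Motzkin variable elimination rather than enumeration; the enumeration here is only illustrative.

```python
from itertools import product

def has_integer_solution(constraints, bounds):
    """Decide whether any integer point within the given bounds satisfies
    all constraints.  `constraints` is a list of predicates over the
    variables; `bounds` is a list of (lo, hi) pairs, one per variable.
    Exponential in the number and range of variables."""
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return any(all(c(*pt) for c in constraints) for pt in product(*ranges))
```

For example, a write to A(2*i) and a read of A(2*j + 1), with both loops running 1:10, yields no integer solution, proving independence.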

Symbolic Values

For all of these tests it is possible to include support for symbolic analysis. Sometimes symbolic values allow us to make certain inferences (such as: I+J is always positive if we know that I and J are positive); at other times symbolic values might lead to inaccuracies in the tests. It is also possible to generate executable code involving the symbolic values, which is useful in deciding critical dependences. A predicate that helps decide a critical dependence is called a breaking condition. Consider the following example.

for i = 1:L {
  A(i+N) = A(i) + B
}
      

The statement in the above loop has a self true dependence if nothing is known about the relative values of N and L. This dependence prevents vectorization of the loop. However, if L ≤ N then no element of A is ever accessed twice within the loop and the dependence disappears. This observation leads to the following code that evaluates the breaking condition at runtime and uses that to guard the vectorized version of the above loop.

if (L <= N) {
  A(1+N:L+N) = A(1:L) + B
}
else {
  for i = 1:L {
    A(i+N) = A(i) + B
  }
}
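The same guarded rewrite can be sketched in Python on plain lists (0-based indices; the snapshot of the reads stands in for the semantics of the vector statement A(1+N:L+N) = A(1:L) + B, which reads all its operands before writing):

```python
def guarded_update(A, B, L, N):
    """When L <= N no element of A is both read and written, so the whole
    update may proceed from a snapshot of the reads (the vectorized case);
    otherwise fall back to the original serial loop."""
    if L <= N:
        reads = A[:L]                 # A(1:L), all reads taken up front
        for i in range(L):
            A[i + N] = reads[i] + B   # A(1+N:L+N) = A(1:L) + B
    else:
        for i in range(L):            # original serial loop
            A[i + N] = A[i] + B
    return A
```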
      

Here is another example.

for i = 1:N {
  A(i) = A(L) + B
}
      

The breaking condition for the dependence cycle caused by both true and output dependences is (L < 1) or (L > N). It is tempting to split the loop range such that the shorter ranges break the dependence cycle.

for i = 1:L-1 {
  A(i) = A(L) + B
}
if (L >= 1 and L <= N) {
  A(L) = A(L) + B
}
for i = L+1:N {
  A(i) = A(L) + B
}
      

It would be incorrect to rewrite the above two loops directly in the vector form! Why? Splitting the loop range is useful but something more needs to be done to generate correct vectorized code. How would you rewrite the original code into vector form using the above range splitting?

Summary

In general, testing for dependences on simpler expressions is easier, and more precise, than on complex expressions. This is the motivation behind testing simpler subscripts first. In order to determine whether a dependence exists between two array references we follow these steps:

  1. Partition the subscripts, where a subscript is a matched pair of subscript positions in the pair of references, into separable and minimal coupled groups. Each separable subscript and each coupled group has completely disjoint sets of indices. Therefore, each partition may be tested in isolation and the resulting distance or direction vectors merged with no loss of precision.
  2. Classify each separable subscript position as ZIV, SIV, or MIV, and apply the appropriate single-subscript test. If any subscript proves independence no further testing is necessary. Otherwise we get direction or distance vectors for the indices occurring in each of the subscripts.
  3. For each coupled group, apply a multiple-subscript test, to produce a set of direction or distance vectors for the indices occurring in that group.
  4. If no test yielded independence, merge all the direction or distance vectors to obtain the single direction or distance vector for the two array references.

This approach works well for linear subscripts, which are an overwhelming majority of array subscripts. A notable exception to this occurs in a class of numerical applications called “irregular” applications. These applications use adaptive data-structures (such as adaptive-grids) to divide their data domains and are characterized by indirect array references, such as A(b(i)) where b is an array of integers. Such applications can also be handled automatically, but require a different set of techniques (such as “inspector-executor”).

Reference

B629, Arun Chauhan, Department of Computer Science, Indiana University