Data Type

Integer

Default Value

0

Description

This parameter controls how FMS performs back substitution for profile matrices. The following options are available:

  • 0, Perform back substitution using outer products.
    This is almost always the best choice if there is only one RHS vector or a small number of them.
  • 1, Perform back substitution using inner products.
    Use memory to hold as many RHS vectors as possible and process the matrix segments one at a time. This achieves maximum reuse of the row of [U] that has been loaded into the temporary vector. However, the number of terms in the temporary vector is the number of equations in the matrix segment. For large problems on a small-memory machine, this may lead to only a few terms being processed per execution of the inner loop.
  • N (N > 1), Use memory to hold N RHS vectors.
    Because the matrix segments are processed in groups, the temporary vector is longer. However, [U] may need to be read multiple times per solution if all the RHS vectors do not fit in memory.
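The cost side of option N can be estimated with a simple model. This sketch assumes that each group of RHS vectors held in memory requires one full pass over [U]; FMS's actual I/O pattern may differ, and `passes_over_u` is a hypothetical helper, not part of FMS.

```python
import math

def passes_over_u(num_rhs: int, rhs_in_memory: int) -> int:
    """Hypothetical estimate of how many times [U] is read during back
    substitution when only `rhs_in_memory` RHS vectors fit in memory.
    Assumption: each group of RHS vectors costs one full pass over [U]."""
    if num_rhs < 1 or rhs_in_memory < 1:
        raise ValueError("counts must be positive")
    return math.ceil(num_rhs / rhs_in_memory)

print(passes_over_u(10, 4))  # 3 passes: groups of 4, 4, and 2 RHS vectors
print(passes_over_u(8, 8))   # 1 pass: all RHS vectors fit at once
```

Under this model, a smaller N leaves more memory for matrix segments (a longer temporary vector) but increases the number of passes over [U].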
There are two options for the inner loop when performing back substitution. As written, both loops assume that [U] has a unit diagonal (there is no division by U(J,J)):
  1. Outer products:
       DO J = N, 1, -1
          S = X(J)
          DO I = 1, J-1
             X(I) = X(I) - S*U(I,J)   ! Addresses [U] by columns:
          END DO                      ! loads X(I), U(I,J); stores X(I)
       END DO

  2. Inner products:
       DO I = N, 1, -1
          S = 0
          DO J = I+1, N
             S = S + U(I,J)*X(J)      ! Addresses [U] by rows:
          END DO                      ! loads X(J), U(I,J); no store
          X(I) = X(I) - S
       END DO
    
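The two loops can be transcribed into Python to check that they produce the same solution. This is a sketch using 0-based indexing and a dense list-of-lists matrix rather than FMS's profile storage; like the pseudocode, it assumes [U] has a unit diagonal. The function names are illustrative only.

```python
def back_sub_outer(U, b):
    """Outer-product (column-oriented) back substitution for U x = b,
    with U unit upper triangular (no division by the diagonal)."""
    n = len(b)
    x = list(b)
    for j in range(n - 1, -1, -1):
        s = x[j]                    # x[j] is final; push its effect up column j
        for i in range(j):
            x[i] -= s * U[i][j]     # loads x[i], U[i][j]; stores x[i]
    return x

def back_sub_inner(U, b):
    """Inner-product (row-oriented) back substitution for the same system."""
    n = len(b)
    x = list(b)
    for i in range(n - 1, -1, -1):
        s = 0.0
        for j in range(i + 1, n):
            s += U[i][j] * x[j]     # loads U[i][j], x[j]; no store
        x[i] -= s
    return x

U = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 4.0],
     [0.0, 0.0, 1.0]]
b = [14.0, 10.0, 2.0]
print(back_sub_outer(U, b))  # [4.0, 2.0, 2.0]
print(back_sub_inner(U, b))  # [4.0, 2.0, 2.0]
```

Both variants perform the same arithmetic; they differ only in the order in which [U] is traversed and in whether the inner loop stores to X.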
The outer product has an advantage because it addresses the matrix [U] by columns, which is how the data is stored. However, it requires 3 memory references, including a store, per cycle of the inner loop.

The inner product has an advantage because it requires only 2 memory references and no store per cycle of the inner loop. However, it addresses the matrix [U] by rows, which is across the direction of storage.
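To make "by columns" versus "by rows" concrete, the sketch below lists the storage offsets each inner loop touches, assuming a dense N-by-N [U] stored column by column as Fortran stores arrays. A real profile matrix stores only the entries inside each column's profile, so the actual offsets differ, but the contrast between contiguous column access and strided row access carries over. The helper names are hypothetical.

```python
def colmajor_offset(i, j, n):
    """0-based storage offset of U(i, j), using 1-based Fortran indices,
    in an n-by-n array stored column by column."""
    return (j - 1) * n + (i - 1)

def outer_loop_offsets(j, n):
    # Outer product with J fixed: I runs 1..J-1 down column J.
    return [colmajor_offset(i, j, n) for i in range(1, j)]

def inner_loop_offsets(i, n):
    # Inner product with I fixed: J runs I+1..N across row I.
    return [colmajor_offset(i, j, n) for j in range(i + 1, n + 1)]

n = 4
print(outer_loop_offsets(4, n))  # [12, 13, 14]: stride 1
print(inner_loop_offsets(1, n))  # [4, 8, 12]: stride n
```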

To overcome the addressing difficulty of the inner (dot) product, it is possible to build a temporary vector containing the i-th row of [U]. The algorithm then proceeds with the advantages of the dot product and incremental (unit-stride) addressing. This strategy only pays off when several vectors are being processed (multiple RHS vectors), so that the cost of loading the row of [U] into the temporary vector is amortized.
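A minimal sketch of the temporary-vector strategy, assuming a dense unit upper triangular [U] stored column-major in a flat Python list (FMS's profile storage and matrix segments are not modeled, and `back_sub_inner_multi` is a hypothetical name). Each row is gathered once, with stride n, into a contiguous temporary vector and then reused for every RHS vector.

```python
def back_sub_inner_multi(u_cm, n, B):
    """Inner-product back substitution for several RHS vectors.
    u_cm: unit upper triangular U, flattened column by column (U[i][j] at j*n + i).
    B: list of RHS vectors; solutions are returned, B is left unchanged."""
    X = [list(b) for b in B]
    for i in range(n - 1, -1, -1):
        # Gather row i once: consecutive entries are n apart in storage.
        temp = [u_cm[j * n + i] for j in range(i + 1, n)]
        for x in X:                       # reuse the gathered row for every RHS
            s = 0.0
            for k, u in enumerate(temp):  # contiguous dot product
                s += u * x[i + 1 + k]
            x[i] -= s
    return X

# U = [[1, 2, 3], [0, 1, 4], [0, 0, 1]], stored column by column.
u_cm = [1.0, 0.0, 0.0,
        2.0, 1.0, 0.0,
        3.0, 4.0, 1.0]
print(back_sub_inner_multi(u_cm, 3, [[14.0, 10.0, 2.0], [6.0, 5.0, 1.0]]))
# [[4.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
```

With a single RHS vector, the gather costs about as much as the dot product it enables, which is why the outer-product default is usually preferred in that case.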

This is one of those fine-tuning parameters whose best value is problem and machine dependent. It is recommended that you use the default value unless you are processing a large number of solution vectors on a small-memory machine.