Data Type
Integer
Units
Equations
Default Value
0 = determined by FMS in subroutine RSDI, RNDI, CHDI, CSDI, CNDI.
Description
This is the stride toward the diagonal that FMS uses for storing off-diagonal matrix data on files LUA(1) and LUA(3). The following options are available:
IJSTEP=1: Data in the lower triangle [AL] is stored by rows. Data in the upper triangle [AU] is stored by columns.

IJSTEP=NEQBLK: Data in the lower triangle [AL] is first stored in columns NEQBLK equations tall. These columns are then stored in a direction proceeding toward the diagonal. Data in the upper triangle [AU] is first stored in rows NEQBLK equations wide. These rows are then stored in a direction proceeding toward the diagonal.

IJSTEP=NEQBIO: Data in the lower triangle [AL] is stored in the blocks by columns (not transposed). Data in the upper triangle [AU] is stored in the blocks by rows (transposed). Applies to Block and Slab matrices only.

Note that in all cases, the storage in [AU] is the transpose of the storage in [AL].
This option is provided in FMS to accommodate different machine architectures. In all cases, fetching data from memory sequentially is desirable (incremental addressing). The following algorithms are matched to each of the data storage options provided by IJSTEP to produce incremental addressing.
IJSTEP=1
This data storage is naturally aligned for performing dot products:

      DO J = 1,N
         DO I = 1,N
            S = 0
C           Loop across row I of [AL], down column J of [AU]:
            DO K = 1,N
               S = S + AL(I,K)*AU(K,J)
            END DO
            C(I,J) = C(I,J) + S
         END DO
      END DO

On most machines dot products give good performance because there are only two memory load operations and no memory store in the inner-loop. However, the accumulation of data into the register S is not implemented on some machines with vector hardware.
IJSTEP=NEQBLK
This is a variation of the dot product algorithm designed for RISC processors.
For example, suppose NEQBLK=2. Then 4 dot products would be computed in an interleaved fashion by:

      DO J = 1,N,NEQBLK
         DO I = 1,N,NEQBLK
            S11 = 0
            S21 = 0
            S12 = 0
            S22 = 0
            DO K = 1,N
C              Fetch next NEQBLK terms from [AL] and [AU]:
               S11 = S11 + AL(I  ,K)*AU(K,J  )
               S21 = S21 + AL(I+1,K)*AU(K,J  )
               S12 = S12 + AL(I  ,K)*AU(K,J+1)
               S22 = S22 + AL(I+1,K)*AU(K,J+1)
            END DO
            C(I  ,J  ) = C(I  ,J  ) + S11
            C(I+1,J  ) = C(I+1,J  ) + S21
            C(I  ,J+1) = C(I  ,J+1) + S12
            C(I+1,J+1) = C(I+1,J+1) + S22
         END DO
      END DO

Note that all data is fetched from [AL] and [AU] incrementally. In addition, the inner-loop has only 4 memory loads for 4 multiply and 4 add operations. This uses only half the memory bandwidth of the dot product, which needs 2 memory loads for each multiply and add. Increasing NEQBLK further reduces memory bandwidth requirements. This algorithm is preferred for RISC processors. NEQBLK is picked as large as possible to use all the floating point registers for accumulations.
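The bandwidth claim above can be checked with a short count. This is only a sketch, assuming the interleaved kernel generalizes from NEQBLK=2 to an arbitrary block size b (the symbol b is introduced here for illustration and is not an FMS name). Each inner-loop iteration then loads b terms from [AL] and b terms from [AU], and performs b*b multiply-add pairs:

```latex
\frac{\text{memory loads}}{\text{multiply-add pairs}}
  = \frac{2b}{b^{2}} = \frac{2}{b},
\qquad
b = 1:\ 2, \quad b = 2:\ 1, \quad b = 4:\ \tfrac{1}{2}
```

Here b=1 corresponds to the plain dot product and b=2 to the NEQBLK=2 example. Under the same assumption, the kernel needs b*b accumulators (S11 through Sbb), which is consistent with the number of floating point registers limiting how large NEQBLK can usefully be.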
IJSTEP=NEQBIO
This data is naturally aligned for performing outer products:

      DO K = 1,N
         DO J = 1,N
C           Loop across row K of [AU]:
            S = AU(K,J)
C           Loop down column J of [C] and column K of [AL]:
            DO I = 1,N
               C(I,J) = C(I,J) + AL(I,K)*S
            END DO
         END DO
      END DO

This algorithm avoids the accumulation of the dot product and is optimal for some vector machines.
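The three loop orderings above differ only in how they traverse memory; they should all produce the same update of [C]. The following stand-alone program is a minimal sketch of that check. It is not part of FMS: the program name IJDEMO, the arrays C1, C2, C3, the size N=4 and the block size NB=2 (standing in for NEQBLK) are all assumptions made for illustration, and it uses ordinary two-dimensional arrays rather than the FMS file layout.

```fortran
      PROGRAM IJDEMO
C     Hypothetical demo, not part of FMS: checks that the three loop
C     orderings compute the same update of C on a small test case.
      INTEGER N, NB
      PARAMETER (N = 4, NB = 2)
      DOUBLE PRECISION AL(N,N), AU(N,N)
      DOUBLE PRECISION C1(N,N), C2(N,N), C3(N,N)
      DOUBLE PRECISION S, S11, S21, S12, S22, D2, D3
      INTEGER I, J, K
C     Fill with small integer values so every sum is exact.
      DO J = 1,N
         DO I = 1,N
            AL(I,J) = DBLE(I + J)
            AU(I,J) = DBLE(I - 2*J)
            C1(I,J) = 0.0D0
            C2(I,J) = 0.0D0
            C3(I,J) = 0.0D0
         END DO
      END DO
C     Ordering 1: dot products (matches IJSTEP=1).
      DO J = 1,N
         DO I = 1,N
            S = 0.0D0
            DO K = 1,N
               S = S + AL(I,K)*AU(K,J)
            END DO
            C1(I,J) = C1(I,J) + S
         END DO
      END DO
C     Ordering 2: interleaved dot products (matches IJSTEP=NEQBLK).
C     The four accumulators are written out for NB = 2 only.
      DO J = 1,N,NB
         DO I = 1,N,NB
            S11 = 0.0D0
            S21 = 0.0D0
            S12 = 0.0D0
            S22 = 0.0D0
            DO K = 1,N
               S11 = S11 + AL(I  ,K)*AU(K,J  )
               S21 = S21 + AL(I+1,K)*AU(K,J  )
               S12 = S12 + AL(I  ,K)*AU(K,J+1)
               S22 = S22 + AL(I+1,K)*AU(K,J+1)
            END DO
            C2(I  ,J  ) = C2(I  ,J  ) + S11
            C2(I+1,J  ) = C2(I+1,J  ) + S21
            C2(I  ,J+1) = C2(I  ,J+1) + S12
            C2(I+1,J+1) = C2(I+1,J+1) + S22
         END DO
      END DO
C     Ordering 3: outer products (matches IJSTEP=NEQBIO).
      DO K = 1,N
         DO J = 1,N
            S = AU(K,J)
            DO I = 1,N
               C3(I,J) = C3(I,J) + AL(I,K)*S
            END DO
         END DO
      END DO
C     Largest absolute differences relative to ordering 1.
      D2 = 0.0D0
      D3 = 0.0D0
      DO J = 1,N
         DO I = 1,N
            D2 = MAX(D2, ABS(C2(I,J) - C1(I,J)))
            D3 = MAX(D3, ABS(C3(I,J) - C1(I,J)))
         END DO
      END DO
      PRINT *, 'max |C2-C1| =', D2
      PRINT *, 'max |C3-C1| =', D3
      END
```

Because the test data are small integers, both differences should print as zero. With general floating point data the orderings can differ by rounding, since the terms are summed in a different order. Note that N must be a multiple of NB for the interleaved loop as written.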
If you attempt to set IJSTEP to a value that is not permitted, FMS will correct it to the closest reasonable value.
The default value of IJSTEP is designed to work in conjunction with the optimized matrix kernels specified with NEQBLK. Changing the value of IJSTEP may significantly affect performance.
NOTE: If you are performing substructuring, the setting IJSTEP=NEQBLK is not permitted. In some cases, it may be necessary to override the default value.