D.5.8. FMS Matrix File Parameter IJSTEP

Data Type

Integer

Units

Equations

Default Value

0 = determined by FMS in subroutine RSDI, RNDI, CHDI, CSDI, or CNDI.

Description

This is the stride toward the diagonal that FMS uses for storing off-diagonal matrix data on files LUA(1) and LUA(3). The following options are available:

	IJSTEP=1 Data in the lower triangle [AL] is stored by rows. Data in the upper triangle [AU] is stored by columns.
	IJSTEP=NEQBLK Data in the lower triangle [AL] is first stored in columns NEQBLK equations tall. These columns are then stored in a direction proceeding toward the diagonal. Data in the upper triangle [AU] is first stored in rows NEQBLK equations wide. These rows are then stored in a direction proceeding toward the diagonal.
	IJSTEP=NEQBIO Data in the lower triangle [AL] is stored in the blocks by columns (not transposed). Data in the upper triangle [AU] is stored in the blocks by rows (transposed). Applies to Block and Slab matrices only.

Note that in all cases, the storage in [AU] is the transpose of the storage in [AL].
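The transpose relationship for IJSTEP=1 can be illustrated with a small sketch. It assumes a dense, strictly triangular matrix packed with no gaps, which is a simplification for illustration and not the FMS file layout. LOFF and UOFF are hypothetical names, not FMS routines.

```fortran
! Sketch only: LOFF and UOFF are illustrative names, and the dense
! packed triangle is an assumption, not the FMS file layout.
!
! Position of AL(I,J), I > J, lower triangle stored by rows:
      INTEGER FUNCTION LOFF(I, J)
      IMPLICIT NONE
      INTEGER I, J
      LOFF = (I-1)*(I-2)/2 + J
      END
!
! Position of AU(I,J), I < J, upper triangle stored by columns:
      INTEGER FUNCTION UOFF(I, J)
      IMPLICIT NONE
      INTEGER I, J
      UOFF = (J-1)*(J-2)/2 + I
      END
```

Under these assumptions UOFF(J,I) equals LOFF(I,J), so stepping along row I of [AL] and down column I of [AU] visits the same sequence of positions.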

This option is provided in FMS to accommodate different machine architectures. In all cases, fetching data from memory sequentially is desirable (incremental addressing). The following algorithms are matched to each of the data storage options provided by IJSTEP to produce incremental addressing.

IJSTEP=1
This data storage is naturally aligned for performing dot products:

        DO J = 1,N
           DO I = 1,N
              S = 0
C             Loop across row I of [AL], down column J of [AU]:
              DO K = 1,N
                 S = S + AL(I,K)*AU(K,J)
              END DO
              C(I,J) = C(I,J) + S
           END DO
        END DO

On most machines, dot products give good performance because the inner loop contains only two memory load operations and no memory store. However, the accumulation of data into the register S is not implemented on some machines with vector hardware.

IJSTEP=NEQBLK
This is a variation of the dot product algorithm designed for RISC processors. For example, suppose NEQBLK=2. Then 4 dot products would be computed in an interleaved fashion by:

        DO J = 1,N,NEQBLK
           DO I = 1,N,NEQBLK
              S11= 0
              S21= 0
              S12= 0
              S22= 0
              DO K = 1,N
C                Fetch next NEQBLK terms from [AL] and [AU]:
                 S11 = S11 + AL(I  ,K)*AU(K,J  )
                 S21 = S21 + AL(I+1,K)*AU(K,J  )
                 S12 = S12 + AL(I  ,K)*AU(K,J+1)
                 S22 = S22 + AL(I+1,K)*AU(K,J+1)
              END DO
              C(I  ,J  ) = C(I  ,J  ) + S11
              C(I+1,J  ) = C(I+1,J  ) + S21
              C(I  ,J+1) = C(I  ,J+1) + S12
              C(I+1,J+1) = C(I+1,J+1) + S22
           END DO
        END DO

Note that all data is fetched from [AL] and [AU] incrementally. In addition, the inner loop has only 4 memory loads for 4 multiply and 4 add operations, which is half the memory bandwidth of the dot product (2 loads for each multiply and add). Increasing NEQBLK further reduces the memory traffic per floating point operation. This algorithm is preferred for RISC processors. NEQBLK is picked as large as possible to use all the floating point registers for accumulations.
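The NEQBLK=2 listing generalizes as follows. Each pass of the K loop loads NEQBLK terms from [AL] and NEQBLK terms from [AU] and performs NEQBLK*NEQBLK multiply-add pairs, so the loads per multiply-add are 2/NEQBLK: 2 for the plain dot product, 1 for NEQBLK=2, and 0.5 for NEQBLK=4. The sketch below writes this out for an arbitrary block size. BLKDOT and the argument NB (standing for NEQBLK) are hypothetical names, not FMS routines, and N is assumed to be a multiple of NB.

```fortran
! Sketch only: BLKDOT and its arguments are illustrative names,
! not FMS routines. N is assumed to be a multiple of NB.
      SUBROUTINE BLKDOT(N, NB, AL, AU, C)
      IMPLICIT NONE
      INTEGER N, NB
      DOUBLE PRECISION AL(N,N), AU(N,N), C(N,N)
      DOUBLE PRECISION S(NB,NB)
      INTEGER I, J, K, II, JJ
      DO J = 1, N, NB
         DO I = 1, N, NB
! Clear the NB*NB accumulators for this block of [C]:
            DO JJ = 1, NB
               DO II = 1, NB
                  S(II,JJ) = 0.0D0
               END DO
            END DO
            DO K = 1, N
! NB terms of [AL] and NB terms of [AU] feed NB*NB updates:
               DO JJ = 1, NB
                  DO II = 1, NB
                     S(II,JJ) = S(II,JJ) + AL(I+II-1,K)*AU(K,J+JJ-1)
                  END DO
               END DO
            END DO
! Add the finished accumulators into the block of [C]:
            DO JJ = 1, NB
               DO II = 1, NB
                  C(I+II-1,J+JJ-1) = C(I+II-1,J+JJ-1) + S(II,JJ)
               END DO
            END DO
         END DO
      END DO
      END
```

A call such as CALL BLKDOT(N, 2, AL, AU, C) performs the same arithmetic as the NEQBLK=2 listing. The accumulator array S is used here for exposition only. A kernel meant to keep the accumulators in registers would more likely be unrolled by hand for a fixed block size, as in that listing.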

IJSTEP=NEQBIO
This data storage is naturally aligned for performing outer products:

        DO K = 1,N
           DO J = 1,N
C             Loop across row K of [AU]:
              S = AU(K,J)
C             Loop down column J of [C] and column K of [AL]:
              DO I = 1,N
                 C(I,J) = C(I,J) + AL(I,K)*S
              END DO
           END DO
        END DO

This algorithm avoids the accumulation of the dot product and is optimal for some vector machines.
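One way to see why no accumulation into a scalar is needed is to write the inner loop as a single vector update. The sketch below does this with array sections, which assumes a Fortran 90 or later compiler. OUTERP is a hypothetical name, not an FMS routine.

```fortran
! Sketch only: OUTERP is an illustrative name, not an FMS routine.
      SUBROUTINE OUTERP(N, AL, AU, C)
      IMPLICIT NONE
      INTEGER N
      DOUBLE PRECISION AL(N,N), AU(N,N), C(N,N)
      INTEGER J, K
      DO K = 1, N
         DO J = 1, N
! One vector update of column J of [C]; no scalar accumulation:
            C(1:N,J) = C(1:N,J) + AL(1:N,K)*AU(K,J)
         END DO
      END DO
      END
```

Each statement in the J loop reads column K of [AL], scales it by one term of [AU], and adds it to column J of [C]. Every element of the result is independent of the others, unlike the sum carried in S by the dot product form.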

If you attempt to set IJSTEP to a value that is not permitted, FMS will correct it to the closest reasonable value.

The default value of IJSTEP is designed to work in conjunction with the optimized matrix kernels specified with NEQBLK. Changing the value of IJSTEP may significantly affect performance.

NOTE: If you are performing substructuring, IJSTEP=NEQBLK is not permitted. In some cases, it may be necessary to override the default value.