- An enhanced version of the FMSMM subroutine which handles a wider range of problems for a wider range of available memory.
- Improved documentation and tools for performing direct I/O, including the new FMS Parameter IOWARN.
- Additional information is printed by FMSINI about the file systems used.
- A new FMS Parameter IPRDBG for printing information about system service calls.
- Enhanced performance on Non-Uniform Memory Access (NUMA) machines.
Setting the master flag NUMAFL activates the following performance-enhancing features:
- Memory is explicitly allocated on each node. Each matrix and vector file record is uniformly distributed among the nodes.
- Threads are bound to the node or processor. Each thread performs the piece of work that maximizes memory references to the data on its local node.
- When you run your application in parallel by calling FMSPAR, and your thread allocates memory by calling one of the FMS memory allocation routines FMSIMG, FMSRMG or FMSCMG, the memory is automatically allocated from the node where the thread is bound.
- Solving [A]^T{X}={B}. When the FMS Parameter LUTRAN is set, the vector solution routines RNDS and CNDS use the transpose of the factors of [A], [U]^T[L]^T, instead of [L][U]. EXAMPLE_18 illustrates the use of this new feature.
- Restart during matrix factoring. The FMS Parameter IREST may be used to create restart points during the matrix factoring operation. EXAMPLE_16 illustrates the use of this option.
- Reading matrix files. EXAMPLE_19 illustrates how the Assembly-Factoring routines RSDAF, RNDAF, CHDAF, CSDAF and CNDAF can be used to read all the values of a matrix file.
- Estimating disk space requirements. EXAMPLE_20 illustrates how the FMS Parameter NOOPEN and the reduced initialization routine FMSIN2 may be used to compute disk space requirements.
- Additional options for transferring matrix and vector data to FMS files.
- A more general matrix multiply routine.
- A general timing routine.
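The transpose solve described for LUTRAN above can be pictured with a small in-memory sketch. This is not FMS code: the function names `lu_factor` and `solve_transposed` are hypothetical, the factorization is a plain Doolittle LU without pivoting, and everything is held in Python lists rather than FMS files. It only shows why the existing factors [L][U] are sufficient, since [A]^T = [U]^T[L]^T.

```python
def lu_factor(a):
    """Doolittle LU without pivoting (assumed for illustration only).

    Returns (L, U) with A = L*U, where L has a unit diagonal.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def solve_transposed(L, U, b):
    """Solve A^T x = b reusing the factors of A, since A^T = U^T L^T."""
    n = len(b)
    # Forward substitution with U^T (lower triangular): U^T y = b.
    y = [0.0] * n
    for i in range(n):
        s = b[i] - sum(U[j][i] * y[j] for j in range(i))
        y[i] = s / U[i][i]
    # Back substitution with L^T (unit upper triangular): L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))
    return x


if __name__ == "__main__":
    A = [[4.0, 3.0], [6.0, 3.0]]
    L, U = lu_factor(A)
    # b was chosen as A^T applied to [1, 2].
    print(solve_transposed(L, U, [16.0, 9.0]))
```

The point of the sketch is that no second factorization is needed: the same triangular factors are traversed in the opposite order and with their indices swapped.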
New FMS Subroutines
- FMSIN2
Reduced initialization used for estimating disk space requirements.
- FMSPUT
Writes a block of data to an FMS vector or matrix file.
- FMSGET
Reads a block of data from an FMS vector file.
- FMSMM
General out-of-core matrix multiply. Extends the functionality of earlier multiply subroutines.
- FMSTIM
Returns the CPU, wall and I/O wait times.
- FMSCPY
Makes a copy of an FMS file.
New FMS Parameters
- NOOPEN
Skip the physical opening of files.
- LUTRAN
Solve [A]^T{X}={B} instead of [A]{X}={B}.
- NUMAFL
Master flag for activating all performance enhancements for NUMA machines.
- NUMAPR
Print output level for NUMA directives.
- MYCPU1
Starting processor to use for NUMA placement.
- NPNODE
Number of processors per node.
- NUMNOD
Number of nodes.
- NUMAFX
Options for binding threads to processors.
- MDWHEN
Controls when memory is allocated.
- MAXLMD
Stride between nodes for placing memory on NUMA machines.
- NUMATP
NUMA topology.
- MMGLUE
Combine blocks during matrix multiply to increase data reuse on the node.
- IREST
Writes restart points during matrix factoring.
- IOKIDS
Allows subroutines you write and run in parallel to call the FMS subroutines that perform I/O.
- MMROW
Used with subroutine FMSMM to shift the product of [A][B] in matrix [C].
- MMCOL
Used with subroutine FMSMM to shift the product of [A][B] in matrix [C].
- MMKA
Used with subroutine FMSMM to shift the matrix [A] before multiplying with matrix [B].
- IACCUM
The new IACCUM Parameter extends the functionality of the multiply subroutines to accumulate the results of their products. In addition, the initial values can be set to the contents of a vector file {Z} instead of {0}. For compatibility with Version 5.1, the additional subroutine argument {Z} is only used when IACCUM does not equal zero. Note that the Version 5.1 subroutines are equivalent to the new Version 5.2 subroutines with IACCUM=0.
The new, more general-purpose subroutine FMSMM, which includes IACCUM as one of its arguments, is preferred over using this IACCUM Parameter and the following multiply routines.
NAME: RSDMVM, RNDMVM, CHDMVM, CSDMVM, CNDMVM
Function: Matrix - Vectors Multiply
{Y} = {Z} - [A]{X}, IACCUM = -1
{Y} = [A]{X}, IACCUM = 0
{Y} = {Z} + [A]{X}, IACCUM = 1
NAME: RSDSVM, CHDSVM, CSDSVM
Function: Submatrix - Vectors Multiply
{Y} = {Z} - SUM([Si]{X}), IACCUM = -1
{Y} = SUM([Si]{X}), IACCUM = 0
{Y} = {Z} + SUM([Si]{X}), IACCUM = 1
NAME: RNDVMM, CNDVMM
Function: Vectors - Matrix Multiply
{Y} = {Z} - {X}[F], IACCUM = -1
{Y} = {X}[F], IACCUM = 0
{Y} = {Z} + {X}[F], IACCUM = 1
NAME: RSDVVM, CHDVVM, CSDVVM
Function: Vectors - Vectors Multiply, Symmetric [F]
[F] = [F] - {X}^T{Y}, IACCUM = -1
[F] = {X}^T{Y}, IACCUM = 0
[F] = [F] + {X}^T{Y}, IACCUM = 1
NAME: RNDVVM, CNDVVM
Function: Vectors - Vectors Multiply, General [F]
[F] = [F] - {X}^T{Y}, IACCUM = -1
[F] = {X}^T{Y}, IACCUM = 0
[F] = [F] + {X}^T{Y}, IACCUM = 1
NAME: RSDDVM, CHDDVM, CSDDVM
Function: Vectors - Diagonal - Vectors Multiply, Symmetric [F]
[F] = [F] - {X}^T[D]{X}, IACCUM = -1
[F] = {X}^T[D]{X}, IACCUM = 0
[F] = [F] + {X}^T[D]{X}, IACCUM = 1
NAME: RNDDVM, CNDDVM
Function: Vectors - Diagonal - Vectors Multiply, General [F]
[F] = [F] - {X}^T[D]{X}, IACCUM = -1
[F] = {X}^T[D]{X}, IACCUM = 0
[F] = [F] + {X}^T[D]{X}, IACCUM = 1
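The three IACCUM cases listed for the Matrix - Vectors Multiply routines can be summarized in a few lines. The sketch below is an illustration only, not the FMS interface: `matvec_accumulate` is a hypothetical name, a single in-memory vector stands in for the vector files, and {Z} is ignored when IACCUM is zero, mirroring the compatibility rule stated above.

```python
def matvec_accumulate(a, x, z=None, iaccum=0):
    """Return y = z - A x, A x, or z + A x for iaccum = -1, 0, +1.

    Hypothetical helper; lists of floats stand in for FMS files.
    """
    if iaccum not in (-1, 0, 1):
        raise ValueError("iaccum must be -1, 0 or 1")
    # Plain matrix-vector product A x.
    ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
    if iaccum == 0:
        # z is not referenced in this case.
        return ax
    if z is None:
        raise ValueError("z is required when iaccum is nonzero")
    # iaccum is +1 or -1, so it acts directly as the sign of the product.
    return [zi + iaccum * v for zi, v in zip(z, ax)]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    X = [1.0, 1.0]
    Z = [10.0, 20.0]
    for flag in (-1, 0, 1):
        print(flag, matvec_accumulate(A, X, Z, flag))
```

The other routine families in the list follow the same pattern, with the product term replaced by the corresponding expression and, for the [F] forms, with [F] itself serving as the initial value.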
- MEMPTR
Returns the memory pointer to the beginning of FMS managed memory. You may use this as an alternative to including the FMSMEM common block in your application.
Version 5.2 now supports using the FMS memory management routines from your subroutines running in parallel. Each call will allocate a private region of memory for the process. For example, if you have 8 processes running in parallel, and each process calls the FMS memory management routines, 8 regions of memory will be allocated from the FMS memory pool.
- MYNODE
Returns the FMS process number of the calling process (0=parent, 1=CPU child 1, ...). You may use this parameter in subroutines you run in parallel to determine your part of the work.
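One common way to use a process number such as MYNODE to pick "your part of the work" is a contiguous block split. The sketch below assumes that the total number of processes is known to the caller and that the parent (process 0) takes a share as well; neither assumption comes from the FMS documentation, and `block_range` is a hypothetical helper, not an FMS routine.

```python
def block_range(mynode, nproc, nitems):
    """Half-open range [start, stop) of items owned by process `mynode`.

    Items are split as evenly as possible; the first `nitems % nproc`
    processes receive one extra item each.
    """
    if not 0 <= mynode < nproc:
        raise ValueError("mynode must be in 0..nproc-1")
    base, extra = divmod(nitems, nproc)
    start = mynode * base + min(mynode, extra)
    stop = start + base + (1 if mynode < extra else 0)
    return start, stop


if __name__ == "__main__":
    # 10 items shared by 4 processes (0 = parent, 1..3 = children).
    for p in range(4):
        print(p, block_range(p, 4, 10))
```

Because every process evaluates the same formula from its own number, no communication is needed to agree on the split, and the ranges cover all items exactly once.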
Example Problems
New examples have been added to Chapter 5 to illustrate the new features in FMS.
- EXAMPLE_13
This example illustrates how to use subroutine FMSPUT to write matrix and vector files in parallel.
- EXAMPLE_14
This example illustrates subroutine RESOLV, which can be used to resolve a partitioned system of equations when only some of the matrix terms have changed.
- EXAMPLE_15
This example illustrates subroutine FMSMM and the FMS Parameters MMROW, MMCOL and MMKA for solving a partitioned system of equations.
- EXAMPLE_16
This example illustrates how FMS can be restarted during factoring.
- EXAMPLE_17
This example uses iteration to solve a weakly-coupled block diagonal matrix.
- EXAMPLE_18
This example illustrates how to use LUTRAN to solve [A]^T{X}={B}.
- EXAMPLE_19
This example illustrates how the Assembly-Factor subroutines RSDAF, RNDAF, CHDAF, CSDAF and CNDAF can be used to read all the elements of a matrix file.
- EXAMPLE_20
This example illustrates how the FMS Parameter NOOPEN and the reduced initialization FMSIN2 can be used to estimate disk space requirements.