- An enhanced version of the FMSMM subroutine which handles a wider range of problems for a wider range of available memory.
- Improved documentation and tools for performing direct I/O, including the new FMS Parameter IOWARN.
- Additional information is printed by FMSINI about the file systems used.
- A new FMS Parameter IPRDBG for printing information about system service calls.
- Enhanced performance on Non-Uniform Memory Access (NUMA) machines. Setting the master flag NUMAFL activates the following performance-enhancing features:
  - Memory is explicitly allocated on each node. Each matrix and vector file record is uniformly distributed among the nodes.
  - Threads are bound to the node or processor. Each thread performs the piece of work that maximizes memory references to the data on its local node.
  - When you run your application in parallel by calling FMSPAR, and your thread allocates memory by calling one of the FMS memory allocation routines FMSIMG, FMSRMG or FMSCMG, the memory is automatically allocated from the node where the thread is bound.

- Solving [A]^T{X}={B}. When the FMS Parameter LUTRAN is set, the vector solution routines RNDS and CNDS use the transpose of the factors of [A], [U]^T[L]^T, instead of [L][U]. EXAMPLE_18 illustrates the use of this new feature.
- Restart during matrix factoring. The FMS Parameter IREST may be used to create restart points during the matrix factoring operation. EXAMPLE_16 illustrates the use of this option.
- Reading all the values of a matrix file. EXAMPLE_19 illustrates how the Assembly-Factoring routines RSDAF, RNDAF, CHDAF, CSDAF and CNDAF can be used for this purpose.
- Estimating disk space requirements. EXAMPLE_20 illustrates how the FMS Parameter NOOPEN and the reduced initialization routine FMSIN2 may be used to compute disk space requirements.
- Additional options for transferring matrix and vector data to FMS files.
- A more general matrix multiply routine.
- A general timing routine.
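
The LUTRAN item above states that the transposed factors are used in place of [L][U]. The short derivation below shows why that works. It is a sketch that assumes the factoring produces [A] = [L][U] exactly (any pivoting or scaling FMS may apply is left out), and the intermediate vector {W} is a name introduced here only for illustration.

```latex
\begin{aligned}
[A] &= [L][U] \\
[A]^T &= \bigl([L][U]\bigr)^T = [U]^T [L]^T \\
[A]^T \{X\} = \{B\}
  \;&\Longrightarrow\;
  [U]^T \{W\} = \{B\}, \qquad [L]^T \{X\} = \{W\}
\end{aligned}
```

Because [U]^T is lower triangular and [L]^T is upper triangular, the first system is a forward substitution and the second a back substitution, so the same stored factors serve both [A]{X}={B} and [A]^T{X}={B}.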

## New FMS Subroutines

- FMSIN2

Reduced initialization used for estimating disk space requirements.

- FMSPUT

Writes a block of data to an FMS vector or matrix file.

- FMSGET

Reads a block of data from an FMS vector file.

- FMSMM

General out-of-core matrix multiply. Extends the functionality of earlier multiply subroutines.

- FMSTIM

Returns the CPU, Wall and I/O wait time.

- FMSCPY

Makes a copy of an FMS file.

## New FMS Parameters

- NOOPEN

Skip the physical opening of files.

- LUTRAN

Solve [A]^T{X}={B} instead of [A]{X}={B}.

- NUMAFL

Master flag for activating all performance enhancements for NUMA machines.

- NUMAPR

Print output level for NUMA directives.

- MYCPU1

Starting processor to use for NUMA placement.

- NPNODE

Number of processors per node.

- NUMNOD

Number of nodes.

- NUMAFX

Thread binding to processor options.

- MDWHEN

Controls when memory is allocated.

- MAXLMD

Stride between nodes for placing memory on NUMA machines.

- NUMATP

NUMA topology.

- MMGLUE

Combine blocks during matrix multiply to increase data reuse on the node.

- IREST

Writes restart points during matrix factoring.

- IOKIDS

Allows subroutines you write and run in parallel to call the FMS subroutines that perform I/O, including:

- MMROW

Used with subroutine FMSMM to shift the product of [A][B] in matrix [C].

- MMCOL

Used with subroutine FMSMM to shift the product of [A][B] in matrix [C].

- MMKA

Used with subroutine FMSMM to shift the matrix [A] before multiplying with matrix [B].

- IACCUM

The new IACCUM Parameter extends the functionality of the multiply subroutines to accumulate the results of their products. In addition, the initial values can be set to the contents of a vector file {Z} instead of {0}. For compatibility with Version 5.1, the additional subroutine argument {Z} is only used when IACCUM does not equal zero. Note that the Version 5.1 subroutines are equivalent to the new Version 5.2 subroutines with the value of IACCUM=0.

The new more general purpose subroutine FMSMM, which includes IACCUM as one of its arguments, is preferred over using this IACCUM Parameter and the following multiply routines.

| NAME | Function | IACCUM = -1 | IACCUM = 0 | IACCUM = 1 |
|---|---|---|---|---|
| RSDMVM, RNDMVM, CHDMVM, CSDMVM, CNDMVM | Matrix - Vectors Multiply | {Y} = {Z} - [A]{X} | {Y} = [A]{X} | {Y} = {Z} + [A]{X} |
| RSDSVM, CHDSVM, CSDSVM | Submatrix - Vectors Multiply | {Y} = {Z} - SUM{[Si]{X}} | {Y} = SUM{[Si]{X}} | {Y} = {Z} + SUM{[Si]{X}} |
| RNDVMM, CNDVMM | Vectors - Matrix Multiply | {Y} = {Z} - {X}[F] | {Y} = {X}[F] | {Y} = {Z} + {X}[F] |
| RSDVVM, CHDVVM, CSDVVM | Vectors - Vectors Multiply, Symmetric [F] | [F] = [F] - {X}^t{Y} | [F] = {X}^t{Y} | [F] = [F] + {X}^t{Y} |
| RNDVVM, CNDVVM | Vectors - Vectors Multiply, General [F] | [F] = [F] - {X}^t{Y} | [F] = {X}^t{Y} | [F] = [F] + {X}^t{Y} |
| RSDDVM, CHDDVM, CSDDVM | Vectors - Diagonal - Vectors Multiply, Symmetric [F] | [F] = [F] - {X}^t[D]{X} | [F] = {X}^t[D]{X} | [F] = [F] + {X}^t[D]{X} |
| RNDDVM, CNDDVM | Vectors - Diagonal - Vectors Multiply, General [F] | [F] = [F] - {X}^t[D]{X} | [F] = {X}^t[D]{X} | [F] = [F] + {X}^t[D]{X} |
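
The three IACCUM modes of the matrix-vectors multiply listed for IACCUM can be illustrated with a small in-memory sketch. This is plain Python written only to show the arithmetic; it is not an FMS routine, and the function name `matvec_accumulate` and its argument order are hypothetical rather than the calling sequence of RSDMVM or any other FMS subroutine.

```python
def matvec_accumulate(a, x, z=None, iaccum=0):
    """Return {Y} for the three IACCUM modes of a matrix-vector multiply.

    Illustration only: a hypothetical helper, not part of FMS.
    """
    if iaccum not in (-1, 0, 1):
        raise ValueError("iaccum must be -1, 0 or 1")
    # Form the product [A]{X} one row at a time.
    ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
    if iaccum == 0:
        # {Z} is not used, matching the Version 5.1 behavior.
        return ax
    if z is None:
        raise ValueError("z is required when iaccum is nonzero")
    # iaccum = +1 adds the product to {Z}; iaccum = -1 subtracts it.
    return [z_i + iaccum * ax_i for z_i, ax_i in zip(z, ax)]


a = [[2.0, 0.0], [1.0, 3.0]]
x = [1.0, 2.0]
z = [10.0, 20.0]

print(matvec_accumulate(a, x))                # [2.0, 7.0]
print(matvec_accumulate(a, x, z, iaccum=1))   # [12.0, 27.0]
print(matvec_accumulate(a, x, z, iaccum=-1))  # [8.0, 13.0]
```

The sketch takes `z` as an optional argument that is ignored when `iaccum` is zero, mirroring the compatibility rule stated above for the additional argument {Z}.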

- MEMPTR

Returns the memory pointer to the beginning of FMS managed memory. You may use this as an alternative to including the FMSMEM common block in your application.

Version 5.2 now supports using the FMS memory management routines from your subroutines running in parallel. Each call will allocate a private region of memory for the process. For example, if you have 8 processes running in parallel, and each process calls the FMS memory management routines, 8 regions of memory will be allocated from the FMS memory pool.

- MYNODE

Returns the FMS process number of the calling process (0=parent, 1=CPU child 1, ...). You may use this parameter in subroutines you run in parallel to determine your part of the work.
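
One common way to "determine your part of the work" from a process number is a contiguous block split. The sketch below is a generic illustration in Python, not FMS code: the function `my_share` is hypothetical, `my_node` stands in for the value returned by MYNODE, and it assumes the total number of processes is known and that the parent (process 0) takes a share of the work along with the children.

```python
def my_share(n_items, n_procs, my_node):
    """Return the half-open range [start, stop) of items for one process.

    my_node plays the role of the MYNODE value (0 = parent, 1 = child 1, ...).
    Hypothetical helper for illustration; not part of FMS.
    """
    if not 0 <= my_node < n_procs:
        raise ValueError("my_node must be in 0 .. n_procs-1")
    base, extra = divmod(n_items, n_procs)
    # The first `extra` processes each take one additional item.
    start = my_node * base + min(my_node, extra)
    stop = start + base + (1 if my_node < extra else 0)
    return start, stop


# Split 10 items over 4 processes and show each process's range.
for node in range(4):
    print(node, my_share(10, 4, node))
```

Each process calls the same function with its own number, so no communication is needed to agree on the split, and the ranges cover every item exactly once.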

## Example Problems

New examples have been added to Chapter 5 to illustrate the new features in FMS.

- EXAMPLE_13

This example illustrates how to use subroutine FMSPUT to write matrix and vector files in parallel.

- EXAMPLE_14

This example illustrates subroutine RESOLV, which can be used to resolve a partitioned system of equations when only some of the matrix terms have changed.

- EXAMPLE_15

This example illustrates subroutine FMSMM and the FMS Parameters MMROW, MMCOL and MMKA for solving a partitioned system of equations.

- EXAMPLE_16

This example illustrates how FMS can be restarted during factoring.

- EXAMPLE_17

This example uses iteration to solve a weakly-coupled block diagonal matrix.

- EXAMPLE_18

This example illustrates how to use LUTRAN to solve [A]^T{X}={B}.

- EXAMPLE_19

This example illustrates how the Assembly-Factor subroutines RSDAF, RNDAF, CHDAF, CSDAF and CNDAF can be used to read all the elements of a matrix file.

- EXAMPLE_20

This example illustrates how the FMS Parameter NOOPEN and the reduced initialization FMSIN2 can be used to estimate disk space requirements.