In a typical computer, data stored on disk flows into memory, where it is held until the processors require it. Data then flows from memory into the processors, which perform multiplications and additions.
The following parameters are used to characterize the computer:
D = Disk transfer rate
(Megabytes per second).
D is the flow rate from disk to memory. If several disks are performing transfers in parallel (file striping), D is the aggregate transfer rate of all disks operating together when a single file record is transferred.
M = Memory size
(64-bit words).
M is the free memory available for buffer space in 64-bit words. M does not include memory required for the operating system, program instructions, or other program data.
C = Processor speed
(Millions of floating point operations per second, Mflops).
When several processors are operated in parallel, C is the aggregate speed of all processors operating together.
R = Reuse
(operations per word).
R is the average number of times the processors use each memory word for a multiply or add operation. R is determined by the algorithm.
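In a properly balanced system, the data flow rate from the disks, multiplied by the reuse from memory, equals the computational rate. Because D is measured in megabytes per second and each 64-bit word occupies 8 bytes, words arrive from disk at a rate of D/8 million per second, and the balance condition is the following: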
DR = 8C
If DR < 8C, the process is I/O bound.
If DR > 8C, the process is CPU bound.
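The balance test above can be sketched in a few lines of Python. This is only an illustration: the function name and the machine figures (D = 100 megabytes per second, C = 400 Mflops) are hypothetical and do not come from FMS or from any particular machine.

```python
def classify_balance(d_mb_per_s, r_ops_per_word, c_mflops):
    """Compare D*R with 8*C and report which resource limits the process."""
    io_side = d_mb_per_s * r_ops_per_word   # D*R
    cpu_side = 8.0 * c_mflops               # 8*C (8 bytes per 64-bit word)
    if io_side < cpu_side:
        return "I/O bound"
    if io_side > cpu_side:
        return "CPU bound"
    return "balanced"


# Hypothetical machine: D = 100 MB/s, C = 400 Mflops, so 8C = 3200.
for reuse in (2, 32, 64):
    print(reuse, classify_balance(100.0, reuse, 400.0))
```

With these assumed figures, a reuse of 32 operations per word is the break-even point; smaller values leave the processors waiting on the disks.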
The reuse, R, is determined by the algorithm being used. For factoring PROFILE matrices, R is 2B, where B is the half bandwidth of the matrix. The factor 2 accounts for one multiplication and one addition per term.
During vector solution, R is 2(NUMRHS), where NUMRHS is the number of right-hand side vectors. For most machines, the ratio 4C/D exceeds the number of right-hand side vectors and the solution is I/O bound. The solution subroutines in FMS are designed to process multiple right-hand side vectors simultaneously to minimize I/O time. If the number of right-hand side vectors is small, you may also store them in memory to eliminate I/O.
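As a rough illustration of the 4C/D threshold, the sketch below computes the smallest number of right-hand side vectors for which the vector solution is no longer I/O bound. The function name and the machine figures are assumptions chosen for the example, not values from FMS.

```python
import math


def min_rhs_for_balance(c_mflops, d_mb_per_s):
    """Smallest NUMRHS satisfying 2*NUMRHS*D >= 8*C, i.e. NUMRHS >= 4C/D."""
    return math.ceil(4.0 * c_mflops / d_mb_per_s)


# Hypothetical machine: C = 400 Mflops, D = 100 MB/s.
print(min_rhs_for_balance(400.0, 100.0))
```

On this assumed machine, a solution with fewer than 16 right-hand side vectors would be I/O bound unless the vectors are held in memory.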
The reuse R is different for matrices stored in PROFILE format and BLOCK format.
- PROFILE format matrices
The number of vectors of length L that can be stored in M words of memory is N = M/L. The reuse is 2*N, allowing for additions and multiplications. Substituting R = 2M/L into DR = 8C, the amount of memory needed to balance I/O and processing becomes the following:
M = (4CL)/D
The above equation shows that if the matrix bandwidth or vector length (L) doubles, the amount of memory (M) should double. If the computational speed (C) doubles, the amount of memory should double. If the disk transfer rate (D) doubles, only half as much memory is required.
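The scaling behavior of the PROFILE estimate can be checked numerically. The following is a minimal sketch; the function name and the sample values of C, L, and D are hypothetical.

```python
def profile_memory_words(c_mflops, vector_length, d_mb_per_s):
    """Balanced buffer size in 64-bit words for PROFILE format: M = 4*C*L/D."""
    return 4.0 * c_mflops * vector_length / d_mb_per_s


# Hypothetical case: C = 400 Mflops, L = 10000 words, D = 100 MB/s.
base = profile_memory_words(400.0, 10000, 100.0)
print(base)                                          # baseline M in words
print(profile_memory_words(400.0, 20000, 100.0))     # L doubled
print(profile_memory_words(800.0, 10000, 100.0))     # C doubled
print(profile_memory_words(400.0, 10000, 200.0))     # D doubled
```

Because D is in megabytes per second and C is in Mflops, the factors of one million cancel and the result is in words; multiply by 8 to express it in bytes.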
- BLOCK format matrices
The memory required to store one block of dimension L is M = L**2, where **2 means squared. The reuse is 2*L, allowing for additions and multiplications. Substituting R = 2L into DR = 8C gives L = 4C/D, so the amount of memory per block needed to balance I/O and processing becomes the following:
M = (4C/D)**2
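The BLOCK format estimate can be evaluated the same way. This sketch assumes the same hypothetical machine figures as a starting point; the function name is illustrative only.

```python
def block_balance(c_mflops, d_mb_per_s):
    """Return (L, M) for a balanced block: L = 4C/D, M = L**2 words."""
    block_dim = 4.0 * c_mflops / d_mb_per_s
    return block_dim, block_dim ** 2


# Hypothetical machine: C = 400 Mflops, D = 100 MB/s.
dim, words = block_balance(400.0, 100.0)
print(dim, words)
```

Unlike the PROFILE case, the balanced block size depends only on the ratio C/D, and the memory per block grows with the square of that ratio.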
These equations can be used to estimate machine configurations. Because any specific machine will be used for a variety of applications and problem sizes, the equations should be considered only a guideline for avoiding configurations that are extremely I/O bound or CPU bound.