D.2.1. FMS GPU Parameter GPUFL

Data Type

Integer

Default Value

Description

This is the master flag for activating the GPU processors. If you want to change the default values, you must set this parameter in the FMS License File. By default, GPU devices are used for all appropriate calculations if they are present. This parameter is mainly included for software development and testing.

The following values are bit flags. To obtain the value for GPUFL, add the values of the selected options. The following options are available:

The following flags control the use of GPUs during computation. By default, the GPUs are used. Setting these flags results in the computation being performed by CPUs only.

+1, Do NOT use the GPUs for performing matrix multiplies.
+2, Do NOT use the GPUs for performing triangle solves.
+4, Do NOT use the GPUs for diagonal block factoring.
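Because each option is a distinct power of two, adding the selected options gives the same result as a bitwise OR, and each option can be tested independently in the resulting value. The following is a minimal sketch in Python of that arithmetic. The constant and function names are hypothetical and are not FMS identifiers; only the numeric values 1, 2, and 4 come from the list above.

```python
# Hypothetical names for illustration; only the numeric values come from this section.
NO_GPU_MATMUL = 1          # +1: matrix multiplies performed by CPUs only
NO_GPU_TRIANGLE_SOLVE = 2  # +2: triangle solves performed by CPUs only
NO_GPU_DIAG_FACTOR = 4     # +4: diagonal block factoring performed by CPUs only


def compose_gpufl(*options: int) -> int:
    """Combine the selected options; for distinct bit flags, OR equals addition."""
    value = 0
    for option in options:
        value |= option
    return value


def is_set(gpufl: int, option: int) -> bool:
    """Return True if the given option bit is present in a GPUFL value."""
    return (gpufl & option) != 0


gpufl = compose_gpufl(NO_GPU_MATMUL, NO_GPU_DIAG_FACTOR)
print(gpufl)                                 # 1 + 4 = 5
print(is_set(gpufl, NO_GPU_TRIANGLE_SOLVE))  # False
```

The same arithmetic applies to the remaining options in this section, since every option value is a separate bit.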

The following flags control how the GPUs are interfaced to the system.

+8, Do NOT use asynchronous streams for transfers to/from the GPUs.
Asynchronous streams overlap transfers to and from the GPUs with GPU processing. Selecting this option results in these transfers being performed synchronously.
+16, Allocate GPU memory on each call (inefficient).
The default is to allocate the memory once when the GPU threads are started and then use memory from this pool (more efficient).

The following are flags passed to cudaSetDeviceFlags when each GPU is initialized:

+32, cudaDeviceMapHost
The default is to map pinned host memory for access by the device. Setting this flag causes the memory not to be mapped.
+64, cudaDeviceLmemResizeToMax
The default is to instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.
Setting this flag causes the memory to be reduced.
+128, cudaDeviceScheduleYield
Yield the CPU processors after queuing GPU work. On systems with powerful CPU processors, this might provide a net improvement in performance at the expense of having the GPUs wait. The default is to yield if there are fewer CPU processors than GPUs. If there are more CPU processors than GPUs, the default is to have a CPU spin for each GPU to minimize latency.
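The default rule described for the yield behavior can be written as a small decision function. This is a sketch of the rule as stated in the text, not FMS code; the function name `default_yields` is hypothetical. The text does not say what happens when the number of CPU processors equals the number of GPUs, so the sketch returns `None` for that case rather than guessing.

```python
from typing import Optional


def default_yields(num_cpus: int, num_gpus: int) -> Optional[bool]:
    """Default behavior when the +128 flag is not set, as described in the text.

    True  -> yield the CPU after queuing GPU work
    False -> a CPU spins for each GPU to minimize latency
    None  -> equal counts; the text does not state the default for this case
    """
    if num_cpus < num_gpus:
        return True
    if num_cpus > num_gpus:
        return False
    return None


print(default_yields(num_cpus=2, num_gpus=4))   # True
print(default_yields(num_cpus=16, num_gpus=4))  # False
```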

+256, Do NOT pin host memory.
Pinning host memory improves transfer rates to/from the GPUs. It allows the transfers to occur without locking and unlocking each page during the transfers. For this reason it is selected as the default. Setting this flag causes host memory NOT to be pinned. If a machine is being heavily used for other tasks, it may not be possible to pin the required memory. Under these conditions, this parameter may be used to skip memory pinning.

The following flag allows running with ECC off on ECC-capable devices:

+512, Do not use ECC.
This flag will use GPUs that have the ECC mode turned off or have recorded double-bit errors.
CAUTION: Setting this bit can result in WRONG answers. It should only be used for testing hardware.

+1024, Use all the GPUs installed.
This flag will use all the GPUs in the system, regardless of their performance level. By default, FMS determines the most powerful GPU available and uses all GPUs that have that same performance. This allows a weaker GPU to be used to drive a display and a more powerful GPU to be used for computation.
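The default selection rule and the effect of the +1024 flag can be sketched as follows. This is an illustration under stated assumptions, not the FMS implementation: the text does not say how FMS measures GPU performance, so the per-device `performance` score, the function name `select_gpus`, and the constant name `USE_ALL_GPUS` are all hypothetical.

```python
from typing import Dict, List

USE_ALL_GPUS = 1024  # +1024, value taken from this section; the name is hypothetical


def select_gpus(performance: Dict[int, float], gpufl: int = 0) -> List[int]:
    """Return the device ids that would be used for computation.

    `performance` maps a device id to an assumed performance score.
    """
    if not performance:
        return []
    if gpufl & USE_ALL_GPUS:
        # Flag set: every installed GPU is used, regardless of performance.
        return sorted(performance)
    # Default: keep only the GPUs that match the most powerful one.
    best = max(performance.values())
    return sorted(dev for dev, score in performance.items() if score == best)


# Device 0 is a weaker display GPU; devices 1 and 2 are equal compute GPUs.
scores = {0: 1.0, 1: 8.0, 2: 8.0}
print(select_gpus(scores))                # [1, 2]
print(select_gpus(scores, USE_ALL_GPUS))  # [0, 1, 2]
```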

Setting this flag on a system with unequally performing GPUs will result in a significant degradation of performance. FMS allocates an equal amount of work to each GPU, so the performance will degrade to that of the slowest GPU in the system.
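A simplified timing model shows why an equal split is limited by the slowest GPU. The sketch below assumes perfectly divisible work, no transfer overhead, and made-up processing rates; the function name `equal_split_time` is hypothetical and the numbers are illustrative only.

```python
from typing import List


def equal_split_time(total_work: float, rates: List[float]) -> float:
    """Elapsed time when each GPU receives an equal share of the work.

    `rates` holds one assumed processing rate (work units per second) per GPU.
    The job finishes when the slowest GPU finishes its share.
    """
    share = total_work / len(rates)
    return max(share / rate for rate in rates)


# One fast GPU (10 units/s) and one slow GPU (2 units/s), 100 units of work.
print(equal_split_time(100.0, [10.0, 2.0]))  # 25.0 (slow GPU needs 50 / 2 s)
print(equal_split_time(100.0, [10.0]))       # 10.0 (fast GPU alone)
```

In this model, adding the slower GPU increases the elapsed time from 10 to 25 seconds, because the slow device needs 25 seconds for its half while the fast device finishes its half in 5 seconds and then waits.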