This is the master flag for activating the GPU processors. To change the default value, you must set this parameter in the FMS License File. By default, GPU devices are used for all appropriate calculations if they are present. This parameter is included mainly for software development and testing.
The following values are bit flags. To obtain the value for GPUFL, add the selected options. The following options are available:
- 0, Do not use the GPU processors.
- +1, Use the GPUs for performing matrix multiplies.
- +2, Use the GPUs for performing triangle solves.
- +4, Use the GPUs for diagonal block factoring.
- +8, Use asynchronous streams for transfers to/from the GPUs.
This option overlaps transfers to and from the GPUs with GPU processing.
- +16, Allocate GPU memory on each call (inefficient).
The default is to allocate the memory once when the GPU threads are started and then use memory from this pool (more efficient).
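As a minimal sketch of how the options above combine, the following Python snippet adds selected bit values into a single GPUFL value. The bit values come from the list above; the constant names and the `gpufl_value` helper are hypothetical labels used for illustration and are not FMS identifiers.

```python
# Bit values from the option list above. The names are illustrative only.
GPU_MATRIX_MULTIPLY = 1   # +1, matrix multiplies
GPU_TRIANGLE_SOLVE = 2    # +2, triangle solves
GPU_DIAGONAL_FACTOR = 4   # +4, diagonal block factoring
GPU_ASYNC_STREAMS = 8     # +8, asynchronous transfer streams
GPU_ALLOC_PER_CALL = 16   # +16, allocate GPU memory on each call


def gpufl_value(*options):
    """Combine distinct bit flags into one GPUFL value."""
    value = 0
    for option in options:
        # OR-ing is the same as adding here, because each flag is a distinct bit.
        value |= option
    return value


# All three compute options plus asynchronous transfers: 1 + 2 + 4 + 8.
print(gpufl_value(GPU_MATRIX_MULTIPLY, GPU_TRIANGLE_SOLVE,
                  GPU_DIAGONAL_FACTOR, GPU_ASYNC_STREAMS))  # 15
```

Using a bitwise OR instead of plain addition keeps the result correct even if the same option is listed twice.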
The following are flags passed to cudaSetDeviceFlags when each GPU is initialized:
- +32, cudaDeviceMapHost
Map pinned host memory for access by the device.
- +64, cudaDeviceLmemResizeToMax
Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.
- +128, cudaDeviceScheduleYield
Yield the CPU processors after queuing GPU work. On systems with powerful CPU processors, this might provide a net improvement in performance at the expense of having the GPUs wait. The default is to yield if there are fewer CPU processors than GPUs. If there are more CPU processors than GPUs, the default is to have a CPU spin for each GPU to minimize latency.
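The three flags above can be pictured as a translation from GPUFL bits to the argument of cudaSetDeviceFlags. The sketch below is an assumption about that mapping, not FMS source code. The numeric CUDA values shown (0x08, 0x10, 0x02) are the ones defined in the CUDA runtime headers as far as is known here, and should be checked against the installed toolkit. The `cuda_device_flags` helper is hypothetical.

```python
# GPUFL bit -> (CUDA flag name, assumed value from the CUDA runtime headers).
CUDA_DEVICE_FLAGS = {
    32: ("cudaDeviceMapHost", 0x08),
    64: ("cudaDeviceLmemResizeToMax", 0x10),
    128: ("cudaDeviceScheduleYield", 0x02),
}


def cuda_device_flags(gpufl):
    """Return the CUDA flag names selected in gpufl and their combined value."""
    names = []
    flags = 0
    for bit, (name, value) in CUDA_DEVICE_FLAGS.items():
        if gpufl & bit:
            names.append(name)
            flags |= value
    return names, flags


# GPUFL with +32 and +128 set, alongside the compute options 1 + 2 + 4.
names, flags = cuda_device_flags(32 + 128 + 7)
print(names, hex(flags))
```

Bits outside 32, 64, and 128 are ignored by this helper, since the text describes only these three as cudaSetDeviceFlags flags.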
The following flag is only used for testing:
- +256, Use small blocks for diagonal factoring.
Force diagonal block factoring to loop over smaller blocks for testing. Normally FMS will divide the diagonal blocks only if they exceed the memory space of the GPU. This condition occurs on large problems. This option will divide the I/O block into 9 subblocks to test the looping algorithm.
The following flag allows running with ECC off on ECC-capable devices:
- +512, Do not use ECC.
With this flag set, FMS will use GPUs that have ECC mode turned off or that have recorded double-bit errors.
CAUTION: Setting this bit can result in WRONG answers. It should only be used for testing hardware.
The following flags are used to skip using specific GPUs:
- +1024, Do not use GPU 0.
- +2048, Do not use GPU 1.
- +4096, Do not use GPU 2.
- +8192, Do not use GPU 3.
- +16384, Do not use GPU 4.
- +32768, Do not use GPU 5.
- +65536, Do not use GPU 6.
- +131072, Do not use GPU 7.
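The skip flags above follow a regular pattern: the value for GPU n is 1024 shifted left by n bits. The following sketch, which assumes only the values in the list above, builds the skip portion of GPUFL and decodes it again. The helper names `skip_gpus` and `skipped_gpus` are hypothetical.

```python
SKIP_GPU_BASE = 1024   # value of the "Do not use GPU 0" flag
SKIPPABLE_GPUS = 8     # the list above covers GPU 0 through GPU 7


def skip_gpus(*indices):
    """Return the GPUFL bits that skip the given GPU indices."""
    value = 0
    for index in indices:
        if not 0 <= index < SKIPPABLE_GPUS:
            raise ValueError(f"no skip flag is listed for GPU {index}")
        value |= SKIP_GPU_BASE << index
    return value


def skipped_gpus(gpufl):
    """Return the GPU indices that a GPUFL value marks as skipped."""
    return [i for i in range(SKIPPABLE_GPUS) if gpufl & (SKIP_GPU_BASE << i)]


# Skip GPU 1 and GPU 3: 2048 + 8192.
print(skip_gpus(1, 3))            # 10240
# Decode a value that also carries the compute options 1 + 2 + 4 + 8.
print(skipped_gpus(10240 + 15))   # [1, 3]
```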