Performance Enhancements to GPU Matrix Kernels
NVIDIA's CuBlas library now takes advantage of architectural enhancements available on the latest GPUs. On hardware that supports it, the new library delivers significantly better performance than the earlier libraries. These enhancements are now a standard part of this distribution.
Unfortunately, GPUs with compute capability 2.0 or older (Fermi architecture) do not contain the hardware needed to execute this new library. To remain compatible with all GPU models, FMSlib detects the compute capability of the current hardware at execution time. If it detects hardware that is incompatible with the latest software, FMSlib automatically switches to a compatible shared object version. These legacy shared object versions of the CuBlas library are provided as part of this distribution.
If you know you will never run on earlier hardware, you may delete these files. They are used only when earlier hardware is detected.
The following table lists the CuBlas libraries that are provided:
Operating System | Linux 64 | Windows 64 | Windows 32 |
---|---|---|---|
Current CuBlas | 9.1 Linked in | cublas64_91.dll | N/A |
Legacy CuBlas | 8.0 Provided as libcublas.so | cublas64_80.dll | cublas32_65.dll |
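The fallback described above can be sketched as a small lookup. This is only an illustration, not FMSlib's actual code: the function name `select_cublas`, the platform keys and the rule that any compute capability below 3.0 counts as legacy are assumptions, while the file names come from the table. How the compute capability is obtained from the hardware is left out and passed in as a parameter.

```python
from typing import Optional, Tuple

# File names taken from the table above. None means no separate file:
# the current library is linked in (Linux 64) or not available (Windows 32).
CUBLAS_LIBRARIES = {
    "linux64": {"current": None, "legacy": "libcublas.so"},
    "win64": {"current": "cublas64_91.dll", "legacy": "cublas64_80.dll"},
    "win32": {"current": None, "legacy": "cublas32_65.dll"},
}

# Assumption: lowest compute capability able to run the current library.
MIN_CURRENT_CC = (3, 0)


def select_cublas(platform: str,
                  compute_capability: Tuple[int, int]) -> Tuple[str, Optional[str]]:
    """Return ("current" or "legacy", file name or None) for one GPU."""
    libs = CUBLAS_LIBRARIES[platform]
    needs_legacy = compute_capability < MIN_CURRENT_CC
    # Windows 32 has no current build in the table, so it always falls back.
    if needs_legacy or platform == "win32":
        return "legacy", libs["legacy"]
    return "current", libs["current"]
```

For example, `select_cublas("win64", (7, 0))` selects the current `cublas64_91.dll`, while `select_cublas("win64", (2, 0))` falls back to `cublas64_80.dll`.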
New Dashboard WEB Pages for GPU Properties
New WEB pages GPU-Fixed, GPU-Chg., GPU-Dyn. and GPU-RTL have been added to the Dashboard reports, which are available in MatrixWarrior and FMSlib. These pages summarize information about the GPU's hardware properties and operating environment.
Two layers of NVIDIA software are interrogated to obtain this information:
- NVML, the NVIDIA Management Library
  This is the device driver, the lowest level of software managing the GPU. It provides the following types of information:
  - Static
    Properties of the GPU which do not change with time. Examples include the model number and where the GPU is installed on the PCI bus.
  - Changeable
    Settings which can be changed, either through the nvidia-smi utility or by an application. Examples include power and temperature operating limits.
  - Dynamic
    Performance information which changes continuously while the GPU is operating. Examples include temperature, clock rate and power usage.
- Run Time Library
  The Run Time Library is layered on the NVML device driver. It extracts some information from the device driver, along with other settings which control the runtime environment.
Temperature, Clock Rate and Power Consumption on the Performance Page
Routine | All | 40 CPUs | GPU 1 33 °C 1312 MHz 57 Watts | GPU 2 34 °C 1312 MHz 55 Watts | GPU 3 34 °C 1312 MHz 56 Watts | GPU 4 32 °C 1312 MHz 54 Watts | GPU 5 31 °C 1312 MHz 55 Watts | GPU 6 33 °C 1312 MHz 56 Watts | GPU 7 35 °C 1312 MHz 56 Watts | GPU 8 31 °C 1312 MHz 54 Watts |
---|---|---|---|---|---|---|---|---|---|---|
Matrix Multiply | 51730 | 0 | 6502 | 6519 | 6530 | 6515 | 6493 | 6494 | 6504 | 6503 |
CPU(0%) | | | | | | | | | | |
Triangle Solve | 48636 | 0 | 6173 | 6229 | 6221 | 6203 | 6100 | 6080 | 6161 | 6250 |
CPU(0%) | | | | | | | | | | |
Diagonal Factor | 23893 | 9 | 3123 | 3111 | 3130 | 3122 | 3133 | 3129 | 3129 | 3124 |

GPU model = Tesla V100-SXM2-16GB
Depending on the GPU model, the Performance page now displays the current temperature, SM clock rate and power consumption for each GPU. This information may be used for monitoring and for adjusting temperature, clock or power limits. It is also useful for obtaining a snapshot of how well an algorithm divides its work among the GPUs.
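One way to turn such a snapshot into a single number is the relative spread of the per-GPU figures. The sketch below is an illustration only: the function name `load_imbalance` and the choice of (max - min) / mean as the measure are assumptions, and the input values are the eight per-GPU Matrix Multiply figures from the table above.

```python
from statistics import mean


def load_imbalance(per_gpu_rates):
    """Relative spread (max - min) / mean; 0.0 means perfectly even work."""
    rates = list(per_gpu_rates)
    if not rates:
        raise ValueError("at least one GPU figure is required")
    average = mean(rates)
    if average == 0:
        raise ValueError("figures must not average to zero")
    return (max(rates) - min(rates)) / average


# Per-GPU Matrix Multiply figures for GPU 1 through GPU 8 from the table.
matrix_multiply = [6502, 6519, 6530, 6515, 6493, 6494, 6504, 6503]
print(f"{load_imbalance(matrix_multiply):.2%}")  # prints 0.57%
```

A spread well under one percent, as here, suggests the work is divided almost evenly across the eight GPUs.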