"What's New with FMS Version 7.1"

Version 7.1 has the following enhancements over Version 7.0:

Performance Enhancements to GPU Matrix Kernels

NVIDIA's CuBlas library now takes advantage of architectural enhancements available on the latest GPUs. The new library exploits these enhancements to deliver significantly better performance than the earlier libraries. It is now a standard part of this distribution.

Unfortunately, GPUs with compute capability 2.0 or older (Fermi architecture) do not contain the hardware needed to execute this new library. To remain compatible with all GPU models, FMSlib detects the compute capability of the current hardware at execution time. If it detects hardware that is incompatible with the latest library, FMSlib automatically switches to a compatible shared object version. These legacy shared object versions of the CuBlas library are provided as part of this distribution.

If you know you will never run on earlier hardware, you may delete these legacy files. They are only used when earlier hardware is detected.

The following table lists the CuBlas libraries which are provided:

CuBlas used with FMSlib Version 7.1
Operating System	Linux 64	Windows 64	Windows 32
Current CuBlas	9.1 Linked in	cublas64_91.dll	N/A
Legacy CuBlas	8.0 Provided as libcublas.so	cublas64_80.dll	cublas32_65.dll
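
The run-time selection described above can be sketched as follows. This is only an illustration, not FMSlib's actual code: the function names, the platform keys and the `LINKED_IN` marker are hypothetical, and the rule that Windows 32 always uses the legacy library is an assumption drawn from the "N/A" entry in the table. The library file names and the compute capability 2.0 threshold are taken from the text.

```python
# Hypothetical sketch of choosing a CuBlas library from the compute capability.
# FMSlib's real detection logic is not shown in this document.

LINKED_IN = "<linked in>"           # marker: current library is part of the executable
LEGACY_MAX_CAPABILITY = (2, 0)      # "compute capability 2.0 or older" per the text

# File names come from the table above; the dictionary layout is illustrative.
CUBLAS_LIBRARIES = {
    "linux64":   {"current": LINKED_IN,         "legacy": "libcublas.so"},
    "windows64": {"current": "cublas64_91.dll", "legacy": "cublas64_80.dll"},
    "windows32": {"current": None,              "legacy": "cublas32_65.dll"},
}


def needs_legacy(major, minor):
    """True when the GPU is too old for the current CuBlas library."""
    return (major, minor) <= LEGACY_MAX_CAPABILITY


def select_cublas(platform, major, minor):
    """Return the library to use for a GPU with compute capability major.minor."""
    libs = CUBLAS_LIBRARIES[platform]
    # Assumption: with no current build available, the legacy library is always used.
    if libs["current"] is None or needs_legacy(major, minor):
        return libs["legacy"]
    return libs["current"]
```

For example, a Tesla V100 has compute capability 7.0, so `select_cublas("windows64", 7, 0)` picks the current library, while a compute capability 2.0 device falls back to the legacy one.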

New Dashboard WEB Pages for GPU Properties

New WEB pages GPU-Fixed, GPU-Chg., GPU-Dyn. and GPU-RTL have been added to the Dashboard reports, which are available in MatrixWarrior and FMSlib. These pages summarize information about the GPU's hardware properties and operating environment.

Two layers of NVIDIA software are interrogated to obtain this information:

NVML, the NVIDIA Management Library.
This is the device driver, the lowest level of software managing the GPU. It provides the following types of information:
- Static
  This includes properties of the GPU which do not change with time. Examples include model number and where it is installed on the PCI bus.
- Changeable
  Settings which can be changed, either through the nvidia-smi utility or by an application. Examples include power and temperature operating limits.
- Dynamic
  Performance information which is continuously changing while the GPU is operating. Examples include temperature, clock rate and power usage.

Run Time Library
The Run Time Library is layered on the NVML device driver. It also extracts some information from the device driver, as well as other settings which control the runtime environment.
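
The three NVML categories listed above can be explored outside the Dashboard with the nvidia-smi utility, which is itself built on NVML. The sketch below is not how the Dashboard gathers its data; it simply groups a few nvidia-smi `--query-gpu` properties under the same three headings. The choice of properties and the function names are assumptions made for illustration.

```python
import subprocess

# nvidia-smi --query-gpu properties, grouped to mirror the categories above.
# The selection of properties is illustrative, not the Dashboard's own list.
FIELDS = {
    "static":     ["name", "pci.bus_id"],
    "changeable": ["power.limit"],
    "dynamic":    ["temperature.gpu", "clocks.sm", "power.draw"],
}
ALL_FIELDS = [f for group in FIELDS.values() for f in group]


def parse_gpu_line(line):
    """Split one CSV line of nvidia-smi output into the three categories.

    Values are kept as strings because nvidia-smi may report an unsupported
    property as text instead of a number. A GPU name containing a comma
    would not parse correctly with this simple split.
    """
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(ALL_FIELDS):
        raise ValueError(f"expected {len(ALL_FIELDS)} values, got {len(values)}")
    flat = dict(zip(ALL_FIELDS, values))
    return {cat: {f: flat[f] for f in names} for cat, names in FIELDS.items()}


def query_gpus():
    """Run nvidia-smi and return one categorized dictionary per GPU."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(ALL_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]
```

Calling `query_gpus()` requires an NVIDIA driver and nvidia-smi on the search path; `parse_gpu_line` can be tried on its own with a saved line of output.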

Temperature, Clock Rate and Power Consumption on the Performance Page

Component Performance (Gflops)
Routine	All	40 CPUs	GPU 1 33 °C 1312 MHz 57 Watts	GPU 2 34 °C 1312 MHz 55 Watts	GPU 3 34 °C 1312 MHz 56 Watts	GPU 4 32 °C 1312 MHz 54 Watts	GPU 5 31 °C 1312 MHz 55 Watts	GPU 6 33 °C 1312 MHz 56 Watts	GPU 7 35 °C 1312 MHz 56 Watts	GPU 8 31 °C 1312 MHz 54 Watts
Matrix Multiply	51730	0	6502	6519	6530	6515	6493	6494	6504	6503
CPU(0%)	51730	0	6502	6519	6530	6515	6493	6494	6504	6503
Triangle Solve	48636	0	6173	6229	6221	6203	6100	6080	6161	6250
CPU(0%)	48636	0	6173	6229	6221	6203	6100	6080	6161	6250
Diagonal Factor	23893	9	3123	3111	3130	3122	3133	3129	3129	3124
GPU model = Tesla V100-SXM2-16GB

Depending on the GPU model, the Performance page now displays the current temperature, SM clock rate and power consumption for each GPU. This information may be used for monitoring and for adjusting temperature, clock or power limits. It is also useful for obtaining a snapshot of how well an algorithm divides its work among the GPUs.
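
One way to turn the per-GPU rates in the table above into a single load-balance figure is to compare the spread between the fastest and slowest GPU to the average rate. This metric is an assumption chosen for illustration and is not something the Dashboard reports; the Gflops values are copied from the Matrix Multiply and Triangle Solve rows.

```python
def load_imbalance(gflops_per_gpu):
    """Return (max - min) / mean of the per-GPU rates; 0.0 means perfectly even."""
    if not gflops_per_gpu:
        raise ValueError("need at least one GPU rate")
    mean = sum(gflops_per_gpu) / len(gflops_per_gpu)
    if mean == 0:
        return 0.0
    return (max(gflops_per_gpu) - min(gflops_per_gpu)) / mean


# Per-GPU Gflops for GPU 1 through GPU 8, taken from the table above.
matrix_multiply = [6502, 6519, 6530, 6515, 6493, 6494, 6504, 6503]
triangle_solve = [6173, 6229, 6221, 6203, 6100, 6080, 6161, 6250]

for name, rates in [("Matrix Multiply", matrix_multiply),
                    ("Triangle Solve", triangle_solve)]:
    print(f"{name}: spread {load_imbalance(rates):.2%} of the mean rate")
```

For these figures the Matrix Multiply spread is well under 1% of the mean rate, while the Triangle Solve spread is closer to 3%, so both routines divide their work fairly evenly across the eight GPUs.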