Data Type

Integer

Default Value

GPUTHI = 95 (deg. C)
GPUTLO = 90 (deg. C)
GPUTWS = 0 seconds (do not perform test)

Description

This optional test monitors the GPU temperature to prevent an automatic GPU shutdown. Most modern GPUs perform this test automatically. However, if you are using an older GPU, or have experienced an unexpected shutdown, you should enable this test by specifying a nonzero value for GPUTWS (120 is a reasonable value).

The GPUTHI parameter specifies the high temperature (deg. C) that the GPU should not exceed.

The more heavily GPUs are used, the warmer they get. Under some circumstances they may reach a temperature (usually 100 deg. C) that causes a shutdown. The shutdown occurs automatically to protect the GPU. When this happens, the application will eventually terminate with an unpredictable error condition. To prevent an unwanted shutdown and job termination, FMS monitors the GPU temperature. If the temperature reaches GPUTHI deg. C, a cooling loop is initiated that prevents any additional tasks from being run on the GPU. During this loop the GPU remains idle, allowing it to cool to a safe temperature, as specified by the GPUTLO parameter. When the safe temperature is reached, processing resumes.

If the safe lower temperature has not been reached after GPUTWS seconds, processing resumes anyway. Set GPUTWS=0 to skip this test if you know your GPU performs it automatically.
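
The cooling loop described above can be sketched in Python as follows. This is only an illustration of the logic, not the FMS implementation: the function name cooling_wait, the one-second polling interval, and the read_temp callback are assumptions made for the example, and the wait limit corresponds to the GPUTWS value listed under Default Value. How FMS actually reads the GPU temperature is not stated here, so the temperature source is passed in by the caller.

```python
import time
from typing import Callable


def cooling_wait(
    read_temp: Callable[[], float],   # hypothetical callback returning deg. C
    gputhi: float = 95,               # high temperature that triggers the loop
    gputlo: float = 90,               # safe temperature that ends the loop
    gputws: float = 0,                # maximum wait in seconds; 0 skips the test
    poll_interval: float = 1.0,       # assumed polling period, not from FMS
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Idle until the GPU cools to gputlo or gputws seconds elapse.

    Returns one of "skipped", "ok", "cooled", or "timeout".
    """
    if gputws == 0:
        return "skipped"              # test disabled
    if read_temp() < gputhi:
        return "ok"                   # below the threshold, nothing to do

    # Temperature has reached GPUTHI: keep the GPU idle while it cools.
    deadline = clock() + gputws
    while clock() < deadline:
        sleep(poll_interval)
        if read_temp() <= gputlo:
            return "cooled"           # safe temperature reached
    return "timeout"                  # wait limit reached; processing resumes
```

In this sketch the caller would invoke cooling_wait before submitting each new task to the GPU and would resume processing on any return value; a "timeout" result only indicates that the safe temperature was not reached within the wait limit.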