Data Type

Integer

Default Value

120

Description

This Parameter specifies the maximum number of seconds a GPU will spend idle in a single cooling loop.

The more heavily a GPU is used, the warmer it gets. Under some circumstances it may reach a temperature (usually 100 deg. C) at which it shuts down automatically to protect itself. When this happens, the application eventually terminates with an unpredictable error condition. To prevent unwanted shutdowns and job terminations, FMS monitors the GPU temperature. If the temperature reaches GPUTHI deg. C, FMS starts a cooling loop during which no additional tasks are run on the GPU. While the GPU is idle, it cools toward the safe temperature specified by the GPUTLO Parameter. When that temperature is reached, processing resumes.

If the safe lower temperature (GPUTLO) has not been reached after MAXTWS seconds, processing resumes anyway.
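The cooling-loop behaviour described above can be sketched as follows. This is a minimal illustration, not the FMS implementation: it assumes the GPU temperature can be polled at a fixed interval, and the function name cooling_loop, the read_temp callback, and the poll_interval argument are hypothetical. Only GPUTHI, GPUTLO, and MAXTWS come from this description. The clock and sleep functions are injectable so the loop can be exercised without waiting in real time.

```python
import time
from typing import Callable


def cooling_loop(read_temp: Callable[[], float],
                 gputhi: float,
                 gputlo: float,
                 maxtws: float = 120,
                 poll_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Hypothetical sketch of the GPUTHI/GPUTLO/MAXTWS cooling loop.

    Returns the reason processing may resume:
      "below_threshold" - temperature never reached GPUTHI, no loop needed
      "cooled"          - temperature dropped to GPUTLO or below
      "timeout"         - MAXTWS seconds elapsed without reaching GPUTLO
    """
    # The cooling loop only starts once the temperature reaches GPUTHI.
    if read_temp() < gputhi:
        return "below_threshold"

    start = clock()
    # Keep the GPU idle (no new tasks) for at most MAXTWS seconds.
    while clock() - start < maxtws:
        sleep(poll_interval)
        if read_temp() <= gputlo:
            return "cooled"  # safe temperature reached, resume processing

    # Safe temperature not reached in time: resume processing anyway.
    return "timeout"
```

In a test or simulation, a fake clock whose sleep simply advances a counter lets a GPU that never cools be shown to resume after exactly MAXTWS simulated seconds.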
