NVIDIA Issues Hotfix for GPU Driver’s Overheating Issue

Yesterday NVIDIA rushed out a critical hotfix to contain the fallout from a previous driver release that had triggered alarm across AI and gaming communities by causing systems to falsely report safe GPU temperatures, even as cooling demands quietly climbed toward potentially critical levels.

In NVIDIA’s official post about the hotfix release, the problem appears only third in the list of stated fixes, where it is described as ‘GPU monitoring utilities may stop reporting the GPU temperature after PC wakes from sleep’.

Shortly after the affected Game Ready driver 576.02 was rolled out, a pinned thread on the Stable Diffusion subreddit, titled Read to Save Your GPU!, became a resource for anecdotal reports and user-submitted updates concerning the new driver. From these, and from other reports around the internet, a timeline of the emerging problems can be established.

The first Reddit report of the bug seems to have appeared late on Friday afternoon (UTC), on the ZephyrusG14 subreddit, where the user fricy81 cited a post on the NVIDIA forums (archived):

A user on the NVIDIA forums finds issues after the 576.02 update. Source: https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/563010/geforce-grd-57602-feedback-thread-released-41625/3524072/

The user on the NVIDIA forums reported that after installing the driver update, tools such as MSI Afterburner and in-game monitors such as the one in Call of Duty (which typically read native system sensor data, much as Task Manager’s GPU panel does in Windows) stopped updating GPU temperature readings, freezing at around 35-36°C.

Restarting the monitoring software had no effect, the user stated, and only a full system reboot would restore accurate readings. Tools such as HWiNFO and NVIDIA’s own monitoring app continued to report temperatures correctly. The user emphasized that the issue occurred during normal use, not only after waking the system from sleep.
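
A reading that freezes at one value while the card keeps working can, in principle, be caught by a monitoring script. The sketch below is a hypothetical heuristic, not a feature of MSI Afterburner, HWiNFO or any NVIDIA tool: it assumes a rolling list of recent temperature samples in °C and flags the feed as stale if the last `window` samples never move by more than `tolerance`. The function name, window size and one-sample-per-second rate are illustrative assumptions.

```python
from collections import deque
from typing import Iterable


def is_reading_stale(samples: Iterable[float], window: int = 30,
                     tolerance: float = 0.0) -> bool:
    """Hypothetical heuristic: True if the last `window` samples (°C) are flat.

    A long run of identical values suggests the sensor feed has frozen,
    as users reported with driver 576.02. Returns False when there are
    not yet enough samples to judge.
    """
    recent = list(samples)[-window:]
    if len(recent) < window:
        return False  # too little history to decide either way
    return max(recent) - min(recent) <= tolerance


# Example: a rolling buffer of one sample per second (sampling rate is an assumption).
history = deque(maxlen=60)
for _ in range(60):
    history.append(35.0)          # a reading stuck at 35 °C, as in the forum report
print(is_reading_stale(history))  # → True
```

Because a genuinely idle card can also hold a steady temperature for a while, a flat line is best treated as a prompt to cross-check against a second tool, such as the HWiNFO readings that stayed accurate in this case, rather than as proof of a fault.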

User feedback across various forums described disrupted fan-curve behavior and altered core thermal regulation, with GPUs idling at unexpectedly high temperatures and, alarmingly, overheating under what would normally be considered standard operating loads, as detailed in this comment:

‘I could tell something was off. The weather outside was probably around 55°F / 12°C, but I was cooking alive in my room. My window was open, and yet I couldn’t feel any difference. All the fans were running at max, and temps looked fine at first: around 68°C to 72°C after gaming for a while.

‘At first, that seemed normal, until the next morning, when I realized these aren’t idle temps, and the fans were still [kicking].

‘I had done some AI overclocking after fixing a few things lately, so I wasn’t sure if the values had just spiked too high. It’s happened once before after installing ASUS AI Suite 3; the BIOS settings wouldn’t even work properly because of it.

‘Anyway, I went ahead and rolled back to an older driver for now.’

Sub-Optimal

The official release-notes PDF for the 576.02 driver update offers some clues about changes that may have contributed to the new issues. In section 5.5, NVIDIA acknowledges that GPU temperature can be reported incorrectly on NVIDIA Optimus systems, specifically showing zero degrees when no applications are running.

Section 5.5 of the official 576.02 release notes addresses temperature-monitoring issues that seem to have affected a wider range of systems than Optimus machines alone. Source: https://us.download.nvidia.com/Windows/576.02/576.02-win11-win10-release-notes.pdf

The release notes state:

5.5 GPU Temperature Reported Incorrectly on Optimus Systems

5.5.1 Issue

On Optimus systems, temperature-reporting tools such as Speccy or GPU-Z report that the NVIDIA GPU temperature is zero when no applications are running.

5.5.2 Explanation

On Optimus systems, when the NVIDIA GPU is not being used, it is put into a low-power state. This causes temperature-reporting tools to return incorrect values. Waking up the GPU to query the temperature would result in meaningless measurements, because the GPU temperature would change as a result.

These tools will report accurate temperatures only when the GPU is awake and running.

NVIDIA Optimus is a GPU-switching technology that toggles between integrated and discrete graphics based on application demands, automatically balancing performance against power consumption in order to conserve battery life. For tasks such as gaming or HD video playback, Optimus activates the discrete GPU for better performance; during lighter activities such as web browsing, it reverts to integrated (onboard) graphics.

The update appears to have extended a behavior previously limited to Optimus systems, allowing the affected GPU to enter a low-power state while idle even when it is not part of an Optimus system, which in turn disrupted temperature reporting in third-party tools.
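
NVIDIA’s note implies that a zero reading from a parked GPU is a placeholder rather than a measurement. As a minimal sketch under that assumption, a monitoring script could treat such values as ‘unavailable’ instead of passing them to a fan curve or alarm. The function name and the zero-as-sentinel rule are illustrative assumptions drawn from the Optimus note, not documented behavior of any NVIDIA API.

```python
from typing import Optional


def interpret_temperature(raw_celsius: float) -> Optional[float]:
    """Hypothetical sanitizer for a GPU temperature sample (°C).

    Assumption: a reading of 0 (or below) comes from a GPU parked in a
    low-power state, as NVIDIA describes for Optimus systems, and is not a
    real measurement. Return None ("unavailable") so callers cannot mistake
    it for a cool, healthy GPU.
    """
    if raw_celsius <= 0:
        return None
    return raw_celsius


print(interpret_temperature(0))     # → None (no real reading available)
print(interpret_temperature(41.0))  # → 41.0
```

A check like this only catches the zero-degree case described in section 5.5; it would not catch a reading frozen at a plausible value, such as the 35-36°C reported on the NVIDIA forums.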

Risk Adjustment

In most scenarios, it is fair to say that the graphics card’s VBIOS would likely have prevented permanent GPU damage. The VBIOS enforces thermal and power limits at the firmware level, independently of the driver.

Therefore, even if a driver were to cause improper fan behavior or misreport temperatures, the VBIOS should still throttle performance, ramp up fan activity, or, as a last resort, shut down the GPU to prevent hardware failure.
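
This safety net works because the firmware acts on its own sensor reading, whatever the driver or a monitoring tool reports. The function below is a deliberately simplified, hypothetical illustration of such a tiered policy; the threshold values are placeholders chosen for the example, not actual VBIOS limits for any NVIDIA card, and real firmware uses far finer-grained control than three states.

```python
def firmware_thermal_action(sensor_temp_c: float,
                            throttle_at_c: float = 83.0,
                            shutdown_at_c: float = 95.0) -> str:
    """Hypothetical tiered thermal policy, evaluated on the raw sensor value.

    The thresholds are illustrative placeholders, not real VBIOS limits.
    The driver's (possibly frozen) reported temperature is never consulted.
    """
    if sensor_temp_c >= shutdown_at_c:
        return "shutdown"              # last resort: protect the hardware
    if sensor_temp_c >= throttle_at_c:
        return "throttle_and_max_fan"  # reduce clocks and ramp the fans
    return "normal"


# Even if a monitoring tool still shows a stale value, this logic sees the real one.
for t in (70.0, 88.0, 97.0):
    print(t, firmware_thermal_action(t))
```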

That does not mean the risk was trivial: sustained high temperatures can degrade performance over time or stress adjacent components. Moreover, absent a common understanding that an updated driver had caused the problem (not least on systems where drivers update ‘silently’), an issue of this kind could mislead a large proportion of affected users, who might attempt remedies for non-existent problems, or even damage their systems by applying irrelevant ‘fixes’.

The errant behavior caused by update 576.02 was particularly alarming for those engaged in artificial intelligence workflows, where high-performance hardware is routinely pushed to its thermal limits for extended periods.

The problematic 576.02 driver prompted a broader rash of complaints after its release in mid-April, even though initial reports suggested it offered some useful performance improvements. Despite the availability of the hotfix, and the extent of the disruption that 576.02 seems to have caused, at the time of writing it remains available for download* on NVIDIA’s site.

Afterglow

In terms of fallout from the faulty update, numerous kinds of damage and/or inconvenience have been reported: user Frankie_T9000 reported that his GPU crashed on boot due to heat buildup under the faulty update, and only stabilized after undervolting. He commented ‘looks like its not completely harmed but need to repaste asap (I have pads coming wednesday) suspect the old thermal paste was aged more by the heat buildup so im putting new paste pads.’

Yesterday another user in the same thread stated: ‘Im using a custom fan curve wit msi afterburner, and it kept showing that my gpu temps were constantly at 27°C, so the fans didnt turn on, which led to overheating issues. I thought it was a me issue but after installing the previous driver it all worked out fine again. Also, the temps arent displayed correctly in taskmanager.’
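
This report shows why a stuck reading is more than a cosmetic glitch: a user-defined fan curve maps the reported temperature to a fan speed, so if the reported value freezes low, the fans follow the frozen value rather than the real one. The sketch below assumes a simple piecewise-linear curve; the curve points, including a ‘fans off below 40°C’ segment of the kind used by zero-RPM profiles, are hypothetical and are not MSI Afterburner’s defaults or its actual algorithm.

```python
from typing import List, Tuple


def fan_duty(temp_c: float, curve: List[Tuple[float, float]]) -> float:
    """Linearly interpolate fan duty (%) from (temperature °C, duty %) points.

    Assumes the curve has at least one point and no duplicate temperatures.
    Below the first point or above the last, the end value is held.
    """
    points = sorted(curve)
    if temp_c <= points[0][0]:
        return points[0][1]
    if temp_c >= points[-1][0]:
        return points[-1][1]
    for (t0, d0), (t1, d1) in zip(points, points[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


# Hypothetical curve: fans stay off at or below 40 °C.
CURVE = [(40, 0), (60, 40), (75, 70), (85, 100)]

print(fan_duty(27, CURVE))  # → 0     (the frozen 27 °C reading keeps the fans off)
print(fan_duty(80, CURVE))  # → 85.0  (what an actual 80 °C would call for)
```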

Though NVIDIA (as it states consistently in each hotfix release) usually provides hotfixes for particular video games or platforms, the risk of heat damage to or around a GPU is higher for AI practitioners than for gamers, since intensive machine learning processes such as training or sustained inference place a GPU under consistent long-term load. In a game, that kind of load tends to occur only periodically: usage may ‘spike’ during a boss battle or a particularly demanding map section, but the game is otherwise designed as a compromise between GPU exploitation and system stability.

* Archive: https://archive.ph/ylVR1

First published Tuesday, April 22, 2025