RTX3090 performance and stability issues, how to fix

I got an RTX 3090 (Asus Strix OC) at the start of the year, when prices were heading upwards. It has had a lot of issues: black screens, system crashes, etc. I started water cooling it (active backplate as well) and that helped with VRM temps, but there were still issues during certain gameplay.

I came across this video from JayzTwoCents on issues with certain games, which I do not play, that have caused these cards to catch fire. I am not surprised given how bad the entire 30 series is.

He recommends some good ideas. One that caught my attention was power usage. I use T-Rex mining to test my cards. Note: he mentions the 3090 is 350 W, but maybe he was referring to the FE, because mine is an Asus Strix and it easily goes above 350 W in P Mode. I have never tried Q Mode, which I assume is quiet mode.

Before fitting the water-cooled front and back plates, I would see overheating when mining, which resulted in black screens and restarts. I was never able to get much info.

After water cooling.

I was seeing about 388 W power usage at the default mining intensity of 22. So I started lowering the max power in 5% steps and reached 80% with no noticeable drop in mining performance.
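For anyone who wants to repeat the stepping without Afterburner, here is a rough sketch of the same idea in Python. It assumes the 388 W I observed is the 100% reference (your card's actual limit may differ), and it only builds the `nvidia-smi -pl` commands rather than running them. Setting a power limit usually needs admin rights, and the value has to be inside the range your card allows. The function names are my own, made up for illustration.

```python
def power_limit_steps(max_watts, start_pct=100, stop_pct=80, step_pct=5):
    """Return (percent, watts) pairs from start_pct down to stop_pct."""
    steps = []
    for pct in range(start_pct, stop_pct - 1, -step_pct):
        steps.append((pct, round(max_watts * pct / 100)))
    return steps


def nvidia_smi_command(watts, gpu_index=0):
    """Build (but do not run) the nvidia-smi call that sets a power limit in watts."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


if __name__ == "__main__":
    # 388 W is the draw observed at stock settings, used here as the 100% point.
    for pct, watts in power_limit_steps(388):
        print(f"{pct:3d}% -> {watts} W : {' '.join(nvidia_smi_command(watts))}")
```

Run each step for a while under load before moving to the next one, so a crash can be tied to a specific limit.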

Here are the results at various power levels.

Ambient temp is around 29 degrees Celsius.

No overclocking is being used. I have included MSI Afterburner screenshots.

Max power
Power usage: 388 W. I can easily push this higher without OC in certain games. I believe this is when the 3090 shits its pants.
Energy efficiency: 255-270 kH/W
Hashrate average: 100 MH/s (estimate, see graph below; hashrate drops)
Temps: 48°C core and 84°C VRM

80% of max power
Power usage: 310 W. I've reduced it by about 78 W here, that's like a 1060 :grinning:
Energy efficiency: 335 kH/W
Hashrate average: 107 MH/s (estimate, see graph below)
Temps: 40°C core and 78°C VRM

Cool temps, and the VRM actually stays below 80°C.
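As a sanity check on the efficiency numbers, here is a small Python sketch that recomputes kH/W and the power saving from the average hashrate and power draw of the two runs above. The hashrates are my estimates from the graph, so the result will not match the miner's reported figure exactly (for the 80% run it comes out a little above the 335 kH/W shown). The helper names are just for illustration.

```python
def efficiency_kh_per_watt(hashrate_mh_s, power_w):
    """Convert MH/s and watts into kH/W (1 MH/s = 1000 kH/s)."""
    return hashrate_mh_s * 1000 / power_w


def power_saving_pct(before_w, after_w):
    """Percentage reduction in power draw."""
    return (before_w - after_w) / before_w * 100


# (average hashrate in MH/s, power draw in W) for the two runs above
runs = {"max power": (100, 388), "80% power": (107, 310)}

for label, (mh, watts) in runs.items():
    print(f"{label}: {efficiency_kh_per_watt(mh, watts):.0f} kH/W")
# max power: 258 kH/W
# 80% power: 345 kH/W

print(f"power saved: {power_saving_pct(388, 310):.1f}%")
# power saved: 20.1%
```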

Graph of power versus average hashrate.

Total test duration in underclocked mode: 9 hours.

Conclusion: you can safely reduce power usage by 20% and improve stability without sacrificing noticeable performance. Hashrate might even improve.

I will update this post with more tests as I underclock further.

I have been able to further improve the efficiency of the 3090 when mining. This time I have been running at 70% of max power for over 5 hours with no issues whatsoever. Efficiency is now up to 370 kH/W, which makes the 3090 the most efficient card for mining that I have tested.

I believe the overheating on this card mainly comes from driver-related bugs. Even with driver updates the 3090 has been unstable and seems to hit max temps, max clock and max power very quickly, even when the load does not demand it.

Mining is perhaps special, as it only uses the CUDA cores and is bound more by memory clock than core clock. At least in my tests, this is what I have noticed. As long as your memory clock speed does not drop, you are fine.
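If you want to check the memory clock while lowering power, something like this Python sketch could log it from `nvidia-smi`. It assumes `nvidia-smi` is on your PATH and that your driver reports `clocks.mem`, `clocks.sm` and `power.draw` as plain numbers (some cards report `[N/A]` for power, which this does not handle). The 50 MHz tolerance is an arbitrary number I picked, not something I measured.

```python
import subprocess

# Query fields: memory clock (MHz), SM/core clock (MHz), board power draw (W)
QUERY = [
    "nvidia-smi",
    "--query-gpu=clocks.mem,clocks.sm,power.draw",
    "--format=csv,noheader,nounits",
]


def parse_sample(line):
    """Parse one CSV line such as '9501, 1695, 309.52' into a dict."""
    mem, sm, power = (field.strip() for field in line.split(","))
    return {"mem_mhz": int(mem), "sm_mhz": int(sm), "power_w": float(power)}


def memory_clock_dropped(samples, tolerance_mhz=50):
    """True if any later sample is more than tolerance_mhz below the first one."""
    baseline = samples[0]["mem_mhz"]
    return any(baseline - s["mem_mhz"] > tolerance_mhz for s in samples[1:])


def read_gpu(gpu_index=0):
    """Take one reading from the given GPU (needs an NVIDIA driver installed)."""
    result = subprocess.run(
        QUERY + ["-i", str(gpu_index)],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.strip().splitlines()[0])
```

Call `read_gpu()` every few seconds while mining, collect the readings in a list, and pass the list to `memory_clock_dropped()` after each power step.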

Hopefully more people can share their tests.

I was also able to repeat the power limit tests on 1070 Ti and 1080 cards. The 1070 Ti performed well at 27 MH/s and responded to power saving. The 1080 did not fare as well, however, and to be honest it's not a good card for mining.

I only have one GTX 1060, but it was already doing pretty well. It remains the best out-of-the-box mining GPU on the market.

I am surprised, though, that the 1070 Ti performs so well for mining. 3 x 1070 Ti could be about the same as the 3090 for mining, or pretty close.
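To put a rough number on that comparison, here is a quick Python sketch using only the hashrates from this post (27 MH/s per 1070 Ti, 100 and 107 MH/s for the 3090 at max and 80% power). It ignores power draw and assumes three cards scale perfectly, which is an assumption on my part.

```python
def combined_hashrate(card_count, per_card_mh_s):
    """Total hashrate assuming every card delivers the same rate."""
    return card_count * per_card_mh_s


three_1070ti = combined_hashrate(3, 27)

# 3090 averages from the two power-limit runs in this post
rtx_3090_mh = {"max power": 100, "80% power": 107}

for label, mh in rtx_3090_mh.items():
    share = three_1070ti / mh
    print(f"3 x 1070 Ti = {three_1070ti} MH/s, {share:.0%} of the 3090 at {label}")
# 3 x 1070 Ti = 81 MH/s, 81% of the 3090 at max power
# 3 x 1070 Ti = 81 MH/s, 76% of the 3090 at 80% power
```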

It turns out that the Asus ROG Strix 3090 OC does not work well with MSI Afterburner. When I used the Asus OC software, it was better tuned to handle this card. It appears that Afterburner is not up to date.
Further, to Afterburner's credit, this card is designed to allow for more power. It can take up to 400 W. However, it is very unstable at 100% power. Even in the stock Gaming mode I have to cut power by 10% so there isn't an outright display crash that requires a hard reset.

Ethereum mining performance is at 124.0+ MH/s, and this is the max stable rate I have achieved. It is possible to push further, but it would require a lot more trial and error and would likely not give me more value. If there is an example out there of a better OC, it would help me test.