  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
Hello Tweakers,

I apologise for writing this post in English. I am learning Dutch but I'm nowhere near the level of proficiency needed for writing something technical. Please pardon me for that.

First some background:
I posted on Tweakers a few months ago for advice on which system to build as a personal workstation. Since then, I've invested heavily in an AMD Threadripper system built around the 2950X. My hardware specs are:
  • Samsung 970 EVO Plus 500GB - Solid state drive
  • ASRock X399M TAICHI
  • Gigabyte Radeon RX 580 GAMING 4GB
  • AMD Ryzen Threadripper 2950X - Processor
  • Corsair Vengeance LPX (32GB)
While I was really happy with the purchase experience, this system has turned out to be a nightmare. The system stays stable while the processor is kept busy, but it crashes on idle. This is a well-known issue with AMD processors, and there seem to be multiple causes behind it. Some of these have been addressed by AMD and some seem to have been ignored. Moreover, there is so much FUD surrounding this issue that I believe AMD has given up on investigating it, as much as that infuriates me. Or at least it seems like AMD is not really bothered about it. There is a megathread on the kernel Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196683 (some of the last posts are mine, being frustrated).

So, I now have a system that crashes on idle once or twice a month. It isn't much, but it is enough that I don't want to rely on this system. I assume there are many people here who feel the same. A workstation is supposed to be almost a virtual home, a machine you can rely on for your livelihood. At this time, this machine is costing me more peace of mind than it gives me.

Now, the question:

I'd like to ask the community what my options are.

The first option is an RMA. I've asked AMD for an RMA, but it seems like a gamble to me. From what I read online, people report mixed RMA experiences. In many cases, the crash-on-idle issue is not resolved even after an RMA. It is also hard to determine in those cases whether the crash on idle really comes from the processor and not from another faulty component. Clearly, I'm not too keen on going through this: weeks of downtime and then no guarantee of success.
What is the RMA experience in the Benelux region? I've also heard that AMD has at times taken a month for the round trip when replacing processors.

The second option is to sell. I have two issues with this. I have the ethical issue of selling a processor I consider faulty to someone else. But the silver lining there is that if this buyer is a Windows user, they will have no issues with the processor. There are reports that the processor has been /patched/ on Windows via the kernel. On Linux this has still not been done and there are no discussions about a patch anywhere to be seen. If I do sell, I'd like to ask what I should do, and what the asking price should be for this processor.
If I go with the selling option, I'd probably want to stay away from AMD and go for Intel. What are my options in a similar spec range among Intel processors?
I've owned several Intel based systems in the past and I ran Linux on all of them. I've never had any stability issues with these systems.

The third option is to give up and buy a new processor. :-)

All in all, I'd like some advice here. And I'd like to know if other people have noticed these issues with AMD in general, especially on Linux.

As you might understand, this is going to be a net loss for me. This system is a little over a year old and I'm personally exhausted by its issues. In the end, I want a stable system above everything else.

Once again, my apologies for writing this in English here, and I look forward to hearing back.

  • wouter.N
  • Registered: June 2009
  • Last online: 03-06 21:16
Are you sure it isn't your GPU that is giving you those issues? I read a lot of bad stuff about the stability of the RX 580.

  • MAX3400
  • Registered: May 2003
  • Last online: 27-05 17:27
An RMA goes through the vendor: unless you bought literally a boatload of CPUs directly from AMD, you need to talk to the shop you bought from.

There are lengthy legal procedures for getting things resolved, or possibly for getting the purchase cancelled or refunded.

My advertisements!!! | My answers are often not snowflake-proof


  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
@wouter.N no, I cannot be sure, and to be honest, I've looked into whether the issue is with my GPU. One observation, at least in my past experience, is that if a GPU causes instability, it usually crashes the X server and rarely the system as a whole. Even when it does crash the system, you can usually find clues in the logs/dmesg.
In this case, the system goes dead silent: there are zero log outputs or any sort of panics.
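For anyone who wants to repeat that log check after a lockup, here is a minimal sketch in Python. It assumes a systemd machine with a persistent journal, so that `journalctl -k -b -1` can show the kernel messages of the previous boot. The list of markers is my own illustrative guess, not an official or complete set, and the function names are hypothetical.

```python
import re
import subprocess

# Markers that often accompany hardware or kernel faults.
# Illustrative assumption only: extend or trim this to taste.
FAULT_PATTERN = re.compile(
    r"machine check|mce:|hardware error|kernel panic|\boops\b|"
    r"soft lockup|hard lockup|amdgpu.*(timeout|reset)",
    re.IGNORECASE,
)


def find_fault_lines(log_text):
    """Return the log lines that contain at least one fault marker."""
    return [line for line in log_text.splitlines() if FAULT_PATTERN.search(line)]


def previous_boot_kernel_log():
    """Return the kernel log of the previous boot, or "" if unavailable.

    Uses journalctl: -k selects kernel messages, -b -1 selects the boot
    before the current one. Without a persistent journal the previous
    boot is simply not there and the result is empty.
    """
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # journalctl is not installed on this machine.
        return ""
    return result.stdout
```

Calling `find_fault_lines(previous_boot_kernel_log())` after a forced reboot returns the suspicious lines, if any. An empty list matches the "dead silent" behaviour described above: it does not clear the GPU, but it does mean the kernel never got the chance to log anything.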

  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
@MAX3400 does that mean getting in touch with the assembly company and asking them for a replacement/refund? Like you said, this is going to be a lengthy process that will cost me time and effort.

  • nelizmastr
  • Registered: March 2010
  • Last online: 16:12
Aren't these issues related more to the 1st and 2nd gen Threadrippers?

The third gen parts are significantly better in many ways. I wouldn't be surprised if a 3rd gen part works much better overall. A new board shouldn't be necessary either. I know that's not really a solid answer towards a real solution that will fix your current chip, but it doesn't seem like something that can be fixed without swapping the CPU for a replacement through RMA or getting another one altogether.

If you're looking on the Intel side, there's honestly nothing interesting. The Core i9 10940X and 10980XE come closest, but at 800 and 1100 euros respectively they offer poor value. I'd step down to a regular X570 Ryzen platform with the 3950X instead.

I reject your reality and substitute my own


  • MAX3400
  • Registered: May 2003
  • Last online: 27-05 17:27
asheshAmbasta wrote on Saturday 1 August 2020 @ 16:02:
@MAX3400 does that mean getting in touch with the assembly company and asking them for a replacement/refund? Like you said, this is going to be a lengthy process that will cost me time and effort.
Look up your local consumer rights, or find someone who can explain them to you. The shop is actually entitled to several attempts to resolve the problem. But judging from the opening post, you haven't had any contact with them yet? Nor read up on warranty and consumer rights in the past months?

@nelizmastr he seems to live in Belgium but still. ;)

[ 4% edited by MAX3400 on 01-08-2020 16:06 ]



  • nelizmastr
  • Registered: March 2010
  • Last online: 16:12
asheshAmbasta wrote on Saturday 1 August 2020 @ 16:02:
@MAX3400 does that mean getting in touch with the assembly company and asking them for a replacement/refund? Like you said, this is going to be a lengthy process that will cost me time and effort.
Correct. In the Netherlands, warranty and RMA claims go through the shop you bought the item from; it's not common to approach a manufacturer directly unless they sold you the item themselves.



  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
@nelizmastr yeah, but then, this is the second time I've invested in an AMD machine and regretted it. The last time was back in 2007, when the Linux story on AMD systems was much worse. That system was also anything but stable under Linux. And now this.
I do agree overall that compared to AMD's price offering, Intel has nothing going for it in the desktop space.
Intel prices are also eye-watering.

--

@MAX3400 I'm from BE, so I believe I'm quite well protected. This PC was built by Azerty.nl, who are absolutely fantastic to deal with. They have responded to my queries, and I'll get back in touch with them to ask for either a replacement or a full refund. When I last read their policies (at the time of buying the processor), they seemed to offer decent warranty coverage, although I've forgotten the specifics.

  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
@nelizmastr actually, the 3950X does look like a ridiculously good processor at that price. And in many ways, it beats my current 2950X (at least as far as benchmarks go).

  • Damic
  • Registered: September 2003
  • Last online: 06:36
nelizmastr wrote on Saturday 1 August 2020 @ 16:04:
Aren't these issues related more to the 1st and 2nd gen Threadrippers?

The third gen parts are significantly better in many ways. I wouldn't be surprised if a 3rd gen part works much better overall. A new board shouldn't be necessary either. I know that's not really a solid answer towards a real solution that will fix your current chip, but it doesn't seem like something that can be fixed without swapping the CPU for a replacement through RMA or getting another one altogether.

If you're looking on the Intel side, there's honestly nothing interesting. The Core i9 10940X and 10980XE come closest, but at 800 and 1100 euros respectively they offer poor value. I'd step down to a regular X570 Ryzen platform with the 3950X instead.
If you go for the new Threadripper, you need another board because the socket has changed.

@asheshAmbasta did you update the BIOS?
I know it was built by Azerty, but a reseat can sometimes solve problems with TR.
Where in Belgium are you from? I currently have a spare mobo, but a crash on idle is weird.

Everything I touch stops working the way it should. Damic doesn't like his birthday


  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
@Damic yeah I believe the socket is the same but the pin config has changed. That is going to be a pretty large investment of money and time.
I’m from Waregem.

  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
Update: I've written to Azerty asking for an RMA and whether they can provide me with a replacement processor immediately (I am willing to pay them in advance as security so they're not at risk).
I understand this is probably a lot to ask and not how things usually go, but depending on this machine for my livelihood (esp. during a lockdown) doesn't make anything easier for me.
If Azerty doesn't agree to this, I don't see any option but to reinvest in the 3rd gen processors.
I looked at the Intel lineup and some benchmarks, and I completely agree with @nelizmastr that there's nothing interesting on that side. The prices are hard to justify for processors that seem to get killed by AMD in benchmarks.
The one thing Intel does have going for it is that Linux seems to be better tested on Intel hardware. But given the rise in popularity of AMD processors, I believe that is likely to change soon (within a few months or years).

  • millman
  • Registered: November 2005
  • Last online: 17-02 21:14
A reboot at idle is usually a PSU problem. I read nothing about which PSU you have; I'd like to ask you to provide us with the PSU info.

'I finally tidy up my room, and the world is a mess'


  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
millman wrote on Monday 3 August 2020 @ 11:14:
A reboot at idle is usually a PSU problem. I read nothing about which PSU you have; I'd like to ask you to provide us with the PSU info.
I have the Corsair RM750X.
And to clarify: this is not a reboot-on-idle issue but a crash on idle. The CPU is the only component that crashes while idling: the GPU, fans, and lights remain on. The entire system becomes unresponsive some time after being left idle. It is also not predictable.

  • PD2JK
  • Registered: August 2001
  • Last online: 15:37
Not sure if you have tested the RAM. You could run a simple Memtest86 for a few hours/passes. Maybe some errors come up with a particular memory module if you test them one by one. (I bet you've installed more than one DIMM)
If no errors occur, the RAM should be okay.
If you get errors with all the modules, make sure you disable XMP and test again. There could be a problem with timings or the frequency.

[ 18% edited by PD2JK on 03-08-2020 14:02 ]

Has a bit of everything: 8088 - 286 - 386 - 486 - 5x86C - P54CS - P55C - P6:Pro/II/III - K7 - NetBurst :') - Core 2 - K8 - Core i$ - Zen4


  • bierschuit
  • Registered: June 2004
  • Not online
Try to increase CPU load line calibration by a notch and/or set the Vcore offset to +0.025 V (25 mV). Maybe it just needs more power at idle than it is getting now.

-edit- Now I'm thinking it could also happen when a sudden workload comes along that causes a downward voltage spike, and it's that dip that makes the system become unresponsive and lock up.
You could also set the phases to extreme or optimized and set a higher switching frequency. (Not sure if ASRock lets you, as I am not familiar with their BIOS and Threadripper boards.)

[ 54% edited by bierschuit on 03-08-2020 13:16 ]


  • asheshAmbasta
  • Registered: July 2017
  • Last online: 26-01-2021
@PD2JK good point there. I've run a memtest in the past, but it wouldn't hurt to check again, so I'll run it again tonight. My understanding is that memory issues cause more frequent failures than what I'm witnessing now: the system can go for weeks of uptime and then lock up one fine day while I'm away from my desk getting a coffee. The bad-memory issues I've seen in the past were more frequent than this.

@bierschuit I think this is an interesting point, since this is what most of the online buzz seems to be about. One workaround that seems to have worked for most people is setting the "Low power current" level (or equivalent) to "Typical current idle" (or equivalent). That would make sense if this setting on these motherboards (including mine) controls how much current is passed through the processor once it has entered an idle state (I believe AMD calls these C/P states).
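As a side note, Linux exposes per-core idle-state counters through the cpuidle sysfs directory, so one can at least see which idle states the processor actually enters. Below is a minimal Python sketch that reads them. The function names are hypothetical, and the state names that show up depend on the kernel's cpuidle driver, so the output on a given machine may look different from what you expect.

```python
from pathlib import Path


def _read_field(state_dir, field):
    """Read one sysfs attribute as stripped text, or "" if unreadable."""
    try:
        return (state_dir / field).read_text().strip()
    except OSError:
        return ""


def read_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return one dict per idle state of a core, ordered by state index.

    Each dict holds the state's name, how many times it was entered
    (the `usage` attribute), and whether it is currently disabled
    (the `disable` attribute). Returns [] if the directory is missing.
    """
    base = Path(cpu_dir)
    if not base.is_dir():
        return []
    state_dirs = sorted(
        base.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        usage = _read_field(state_dir, "usage")
        states.append({
            "name": _read_field(state_dir, "name"),
            "entries": int(usage) if usage.isdigit() else 0,
            "disabled": _read_field(state_dir, "disable") == "1",
        })
    return states
```

Printing `read_idle_states()` before and after a long idle period shows which states accumulate entries. It only reports what the kernel requested, so it cannot prove that a deep idle state caused a lockup, but it helps confirm whether a BIOS or kernel setting really changed the idle behaviour.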
Your second point also seems interesting to me, and I wonder if it can be tested. I can schedule something to run after N minutes and test repeatedly with that. However, if your theory is correct, I should've witnessed more crashes after getting to my desk and bringing back the system from idle: thereby momentarily increasing the power draw. This has never happened.
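The "idle for N minutes, then hit it with a workload" test could be sketched roughly like this in Python. This is only an illustration under my own assumptions: the function names and the log format are made up, the burst loads a single core, and every phase change is written and synced to disk so that after a lockup the last line of the log shows whether the machine died while idling or right after the burst started.

```python
import os
import time
from datetime import datetime


def burn_cpu(seconds):
    """Busy-loop on one core for roughly `seconds` seconds."""
    deadline = time.monotonic() + seconds
    counter = 0
    while time.monotonic() < deadline:
        counter += 1
    return counter


def _log_phase(log_path, cycle, phase):
    """Append a timestamped line and force it to disk before continuing."""
    with open(log_path, "a") as log:
        log.write(f"{datetime.now().isoformat()} cycle={cycle} {phase}\n")
        log.flush()
        os.fsync(log.fileno())  # make sure the line survives a hard lockup


def idle_burst_cycles(cycles, idle_seconds, burst_seconds, log_path):
    """Alternate between sleeping (idle) and a sudden CPU burst."""
    for cycle in range(cycles):
        _log_phase(log_path, cycle, "idle")
        time.sleep(idle_seconds)
        _log_phase(log_path, cycle, "burst")
        burn_cpu(burst_seconds)
```

A call such as `idle_burst_cycles(cycles=100, idle_seconds=600, burst_seconds=5, log_path="idle_burst.log")` would idle for ten minutes and then load one core for five seconds, a hundred times over. If the last log line after a freeze says `burst`, that points towards the sudden-load theory; if it says `idle`, the plain crash-on-idle explanation fits better.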