I have recently built a new PC, to be used as a server. For months now, I have been getting unexplained crashes, sometimes after a few minutes, sometimes after a few days, where the PC just reboots without any trace in the logs. Just normal occasional status logs, and then, a few seconds later, the log of a normal boot process.
This is slowly driving me crazy because I just can’t pin down the issue. I have tried multiple different Linux installs, swapped out the SSD and PSU, and ran a RAM test, but this behaviour still persists.
Today something was different. Instead of rebooting, it showed me this blue screen, this time finally with a log. But I still can’t seem to make out the issue. Some quick internet searches turn up only very vague answers: everything from software to hardware, and from PSU to CPU.
Can any Linux wizard help me fix my problem? Link to the log
Update: I have now faced an even weirder issue. I booted up, installed cpupower like a comment suggested, installed man to look up its documentation, and then the screen froze, so I was forced to reboot the PC by pressing the power button for 3s. When I booted back up, my bash history was reset to a state from a few days back (~/.bash_history mod time from 2 days ago), even though I have rebooted several times since then and have not had any persistence errors like this before. man was also not installed anymore. Even weirder is that cpupower was still installed. So it seems like some data was saved, while other files were discarded. I will now use a second SSD and try to replicate this. I now suspect some kind of storage issue, even though the two SSDs in question have never caused issues in my laptop. This seems scary; I have never seen a Linux install corrupted this weirdly, ever.
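To replicate this in a more controlled way, something like the following could help. It is only a minimal sketch of a persistence check, not a proper diagnostic: it writes a timestamped marker file, forces it to disk with fsync, and on the next run reports what it finds. The function names and the marker path (`~/persist_marker.json`) are made up for illustration. If a marker written before a reboot is missing or older than expected afterwards, that would point towards writes not reaching the drive.

```python
import json
import os
import time


def write_marker(path, now=None):
    """Write a timestamped marker file and force it to disk with fsync."""
    stamp = time.time() if now is None else now
    with open(path, "w") as f:
        json.dump({"written_at": stamp}, f)
        f.flush()
        os.fsync(f.fileno())  # flush file contents to the storage device
    # Also fsync the containing directory so the directory entry is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return stamp


def read_marker(path):
    """Return the stored timestamp, or None if the marker is missing or unreadable."""
    try:
        with open(path) as f:
            return json.load(f)["written_at"]
    except (OSError, ValueError, KeyError, TypeError):
        return None


if __name__ == "__main__":
    # Hypothetical marker location; any path on the suspect drive would do.
    marker = os.path.expanduser("~/persist_marker.json")
    print("marker found from previous run:", read_marker(marker))
    print("new marker written at:", write_marker(marker))
```

Running it once before each reboot and comparing the "previous run" value against the last "written at" value shows whether the most recent write survived.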


Thanks. I have already tried the first three steps, and the same drives worked fine in another machine, so I don’t think the drive is at fault.
About the GPU: I’m pretty sure the issue also happened before I connected the GPU.
About simulating the workload:
Because of these issues, I am not currently running anything of importance on this machine, and it mostly idles. The sudden reboots don’t seem to be affected by workload.