• 8 Posts
  • 23 Comments
Joined 2 years ago
Cake day: June 14th, 2023

  • What troubleshooting steps have you taken so far? I would try these:

    • a different OS, e.g. a live USB running Fedora or Ubuntu, if the workload that triggers the problem can be reproduced there
    • BIOS reset to defaults: no overclocking, not even XMP
    • memtest: either the memtest86+ boot ISO or the runtime memtester can detect obvious errors
    • a long SMART self-test on the OS drive, plus an fsck or scrub depending on the filesystem
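
The memtest, SMART, and filesystem checks from the list above could be lined up roughly like this. It is only a dry-run sketch in Python that prints commands instead of running them. `/dev/sdX`, `/dev/sdX1`, the 1G test size, and the helper name `build_check_plan` are placeholders for illustration, not values from the thread.

```python
import shlex


def build_check_plan(drive="/dev/sdX", fs_type="ext4", target="/dev/sdX1"):
    """Return (description, argv) pairs for the checks. Nothing is executed.

    drive  : whole-disk device for SMART (placeholder)
    target : partition for e2fsck, or a mount point for a btrfs scrub (placeholder)
    """
    plan = [
        # memtester <size> <loops>: userspace RAM test; 1G and 1 pass are example values
        ("runtime RAM test", ["memtester", "1G", "1"]),
        ("start long SMART self-test", ["smartctl", "-t", "long", drive]),
        ("read self-test log once it has finished", ["smartctl", "-l", "selftest", drive]),
    ]
    if fs_type == "btrfs":
        # -B keeps the scrub in the foreground until it completes
        plan.append(("btrfs scrub", ["btrfs", "scrub", "start", "-B", target]))
    elif fs_type in ("ext2", "ext3", "ext4"):
        # -f forces a check, -n opens read-only and answers "no" to every fix
        plan.append(("read-only fsck", ["e2fsck", "-f", "-n", target]))
    else:
        raise ValueError(f"no check defined in this sketch for {fs_type!r}")
    return plan


if __name__ == "__main__":
    for description, argv in build_check_plan():
        print(f"# {description}")
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The commands need root, and `e2fsck` should only be pointed at an unmounted filesystem, which is one more reason to run these from a live USB.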

    Also, the logs show a very old NVIDIA GPU that is not supported by the new driver. I don’t know if this can cause crashes, since I haven’t used one in ages. Maybe someone else has more insight.










  • I think calling it a “cache” is not precise. The primary function of the DRAM is to hold the dictionary for translating logical addresses (e.g. sectors) from the OS to the physical addresses (which NAND chip, which bank etc.). This indirection is needed for the controller to do wear leveling without corrupting the filesystem.

    On a SATA SSD without DRAM, each read IO could mean 2 actual reads: first the dictionary to find the data, and then the actual data being read. As you said, HMB (Host Memory Buffer) helps by eliminating this extra read.

    The read and write caching is just a use of the remaining DRAM capacity. Since modern operating systems already use the general RAM for the same function, it usually adds only a small increase in throughput.
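
As a rough illustration of the indirection described above, here is a toy model in Python. It is not how any real controller works: the class name `ToyFTL`, the per-page wear counter, and the "pick the least-worn free page" policy are assumptions made for the sketch, and real NAND is erased in whole blocks rather than single pages.

```python
class ToyFTL:
    """Toy flash translation layer: logical page -> physical page."""

    def __init__(self, num_physical_pages):
        self.mapping = {}                        # the "dictionary" the DRAM would hold
        self.flash = {}                          # physical page -> stored data
        self.wear = [0] * num_physical_pages     # writes seen by each physical page
        self.free = set(range(num_physical_pages))

    def write(self, logical_page, data):
        if not self.free:
            raise RuntimeError("no free physical page (no garbage collection in this toy)")
        # Wear leveling: always write to the least-worn free page.
        physical = min(self.free, key=lambda p: (self.wear[p], p))
        self.free.remove(physical)
        self.flash[physical] = data
        self.wear[physical] += 1
        # Retire the old copy and point the logical page at the new location.
        old = self.mapping.get(logical_page)
        if old is not None:
            del self.flash[old]
            self.free.add(old)
        self.mapping[logical_page] = physical

    def read(self, logical_page):
        physical = self.mapping[logical_page]    # lookup 1: the mapping table
        return self.flash[physical]              # lookup 2: the data itself


if __name__ == "__main__":
    ftl = ToyFTL(4)
    ftl.write(0, "first")
    ftl.write(0, "second")      # same logical page, different physical page
    print(ftl.mapping, ftl.read(0))
```

The OS keeps addressing logical page 0 the whole time while the data moves between physical pages. Without DRAM (or HMB), "lookup 1" in `read` is itself a NAND read, which is the extra read mentioned above.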







  • Thanks for the links! I updated my config from z3fold to zsmalloc and adjusted vm.page-cluster to test these out.

    Reading a bit more, I think that with a large max_pool_percent (>30), zswap and zram are more similar than not. The crucial difference is which failure mode is more acceptable: zswap can cause unresponsiveness (and a potential lockup) under high memory pressure, while zram could end in an OOM crash in a similar worst-case scenario.
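
To see which of these settings are currently active, something like the following Python sketch could be used. It only reads the usual kernel parameter files under `/sys/module/zswap/parameters` and `/proc/sys/vm/page-cluster`. The function name `read_swap_tuning` and the `root` argument (there so it can be pointed at a test directory) are made up for this example, and a parameter file that is missing on a given kernel is reported as `None`.

```python
from pathlib import Path

# zswap module parameters to look at; some may not exist on every kernel version
ZSWAP_PARAMS = ("enabled", "compressor", "zpool", "max_pool_percent")


def read_swap_tuning(root="/"):
    """Collect zswap parameters and vm.page-cluster into a dict of strings."""
    root = Path(root)
    settings = {}
    for name in ZSWAP_PARAMS:
        path = root / "sys/module/zswap/parameters" / name
        settings[f"zswap.{name}"] = path.read_text().strip() if path.exists() else None
    page_cluster = root / "proc/sys/vm/page-cluster"
    settings["vm.page-cluster"] = (
        page_cluster.read_text().strip() if page_cluster.exists() else None
    )
    return settings


if __name__ == "__main__":
    for key, value in read_swap_tuning().items():
        print(f"{key} = {value}")
```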




  • Most distros use systemd and its logging solution, journald. You can use journalctl to read the logs around the time of the crash, for example:

    • journalctl -S -5m shows the last 5 minutes. Use this when a game crashes but the system keeps working and did not reboot.
    • journalctl -b -1 -S -10m shows entries from the previous boot that were logged within the last 10 minutes, so run it right after rebooting. Use this if the crash froze the whole system and it rebooted.

    Look for red lines (errors) and what wrote them. AMD GPU faults usually mention ‘amdgpu’; memory errors could appear as ‘protection fault’.
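
If the output is too long to skim by eye, the keyword search could be scripted. The Python sketch below is only one way to do it: the helper names `find_suspect_lines` and `read_journal` are made up for this example, and the keyword list is just the two strings mentioned above, not a complete list of error signatures.

```python
import subprocess

# The two strings mentioned above; extend as needed.
KEYWORDS = ("amdgpu", "protection fault")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword (case-insensitive)."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(line)
    return hits


def read_journal(args):
    """Run journalctl with the given arguments and return its output lines."""
    result = subprocess.run(
        ["journalctl", "--no-pager", *args],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Same window as the first command above: the last 5 minutes.
    for line in find_suspect_lines(read_journal(["-S", "-5m"])):
        print(line)
```

For the reboot case, pass `["-b", "-1", "-S", "-10m"]` instead. Adding `-p err` to the arguments limits the output to entries of priority error and above, which are the lines journalctl colors red.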





  • Filesystem permissions

    For many apps it is not an issue and provides additional security, but in other cases it’s very annoying and not trivial to fix.

    Example 1: opening a .docx from the Thunderbird flatpak with the OnlyOffice flatpak does not work out of the box.

    Example 2: the mpv and VLC flatpaks work well for local files, but fail to open files on network shares from Dolphin.

    I think a possible solution would be runtime permission dialogs when denied access.


  • bazsy@lemmy.world to Linux@lemmy.ml: btrfs and nvme (2 years ago)

    I’m not 100% sure, but for me it caused similar “freezing” or unresponsiveness when the daily cleanups ran in the morning. If there is a freeze after every (even short) sleep and resume, that might be a different issue.