So, I’ve been running offsite copies to an OVH S3 bucket via PBS running as a VM, but I ran into an issue: verification of the backups is so slow that they’re practically unusable.

Copies run nightly and I’ve set the storage to keep the last 4 copies. Bigger VMs, like my Immich instance with a bit over a terabyte of data, take several days to verify, and logs show data rates of around 5MB/s or less. So with my current schedule, in practice the backups expire before they’re verified.
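
As a rough sanity check on those figures, here is a back-of-the-envelope sketch in Python. It assumes exactly 1 TB (decimal) of data and a sustained transfer rate; the function name `verify_days` and the sample rates are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def verify_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to read size_bytes at a sustained rate_bytes_per_s."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


size = 1.0e12  # assumed: 1 TB (decimal); the real VM is "a bit over" that

# Assumed sample rates in MB/s; the logs only say "around 5MB/s or less".
for rate_mb in (5.0, 2.5, 1.0):
    days = verify_days(size, rate_mb * 1e6)
    print(f"{rate_mb:>4} MB/s -> {days:.2f} days")
```

At 5MB/s a 1 TB verify needs a bit over two days, and anything below roughly 2.9MB/s pushes it past four days, which is about when the oldest of 4 nightly copies would be pruned (assuming pruning follows the nightly schedule).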

I could keep the copies longer, but that’d cost more, or run copies less frequently, which risks losing data if hardware fails at an unfortunate moment (which it most likely will). Tuning settings are at their defaults and, based on what I’ve read, adding more runners wouldn’t help much.

The PBS VM itself shows very little load in Proxmox monitoring and I’ve got plenty of bandwidth to use, so the verification shouldn’t have any bottlenecks on my end at those speeds. Cache usage is at around 60% with ~30GB of total space available.

Does anyone have any ideas on how to speed that up? Or should I just give up and do something totally different? I attempted to run backups to a Hetzner Storage Box over a CIFS mount, but performance there is pretty much the same or worse.

  • Onno (VK6FLAB)@lemmy.radio
    42 minutes ago

    Have a look at your AWS billing console, since data egress is charged and downloading to verify is considered egress.

    AWS S3 supports data checksums where a checksum is calculated at AWS, which you can compare against a checksum that you calculate locally.

    This article goes into how it works. I’ve not (yet) tested it, but I’ll be following in your footsteps pretty soon.

    https://medium.com/@maureenosaghae86/check-the-integrity-of-data-in-amazon-s3-with-additional-checksums-3e51fe45f530

    As an aside, make sure that versioning is OFF on your backup bucket unless you specifically require and understand it, because even when you delete objects, they persist as a previous, all but invisible, and charged(!), version.

    My former backup software “helpfully” enabled versioning and I was left with a $600 monthly bill for six months while there was no actual backup being done due to a local hardware failure, until I figured out what was happening. I used that software for years and shudder to think just how much extra it actually cost.

    I will note that while I had a catastrophic hardware failure, I didn’t lose any data.

    Finally, if you’re storing data in Glacier, retrieval is charged at different rates, depending on timelines of access, so it might be that your backup software is using the slow tier to “save” you money.

    Edit: OP advises that they’re not using AWS, instead they’re using OVH. The object storage solutions appear to be mostly compatible, but I was unable to discover if the OVH implementation supports checksums.

    • IsoKiero@sopuli.xyzOP
      2 hours ago

      I’m using OVH, not AWS. Their console gives an estimate of ~20€/month for the ~2TB I have stored. Versioning is disabled and I’m currently running on their signup offer of 200€ credit, so I’m good to go for a few more weeks. The storage I’m using includes the traffic; it’s just practically unusable due to verification speeds.

      • Onno (VK6FLAB)@lemmy.radio
        44 minutes ago

        I apologise, I saw S3, never even noticed the “OVH”, nor had I ever heard of it.

        I’ll leave my original reply as is with an added disclaimer for anyone who follows down the same path.
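
The checksum comparison suggested in the first reply could look roughly like the sketch below for the local half. It assumes the object was uploaded in a single part with a SHA-256 additional checksum, in which case AWS S3 reports the value as the base64-encoded digest; multipart uploads use a composite checksum that this does not cover. Whether the OVH implementation exposes the same field is not confirmed in this thread, and the function name `local_sha256_b64` is hypothetical.

```python
import base64
import hashlib


def local_sha256_b64(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the base64-encoded SHA-256 digest of the file at path.

    The file is read in chunks so large backup files do not have to fit
    in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

The returned string can then be compared against the checksum the storage provider reports for the object, which avoids downloading the data again just to verify it.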