Drop Site News published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • scintilla@crust.piefed.social · English · 6 upvotes · 8 hours ago

    Can someone explain why they would need to scrape multiple instances? Are they intentionally going after the fediverse, or is it just a byproduct of Meta trying to get all of human communication?

    • frongt@lemmy.zip · 2 upvotes · 6 hours ago

      It’s a lot easier for them to use the same scraper they use on other sites than to build something custom.

    • halcyoncmdr@lemmy.world · English · 1 upvote · 6 hours ago

      Instances will not have copies of content from instances they block. So while Meta has Threads, most of the fediverse has blocked it. Since they can't get that data via federation, they scrape. And each instance they scrape will only have content from the instances it hasn't blocked. To ensure they get everything, they have to scrape everything regardless of federation.