Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.
Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list pdf
Can someone explain why they would need to scrape multiple instances? Are they intentionally going after the fediverse, or is it just a byproduct of Meta trying to get all of human communication?
The second one
It’s a lot easier for them to use the same scraper they use on other sites than to build something custom.
probably the latter
Instances will not have copies of content from instances they block. So while Meta has Threads, most of the fediverse has blocked it. Since they can't get that data via federation, they scrape. And the instances they scrape will also only have content from their unblocked instances. To ensure they get everything, they have to scrape everything regardless of federation.
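To make that concrete, here is a toy sketch in Python. The instance names, post IDs, and blocklists are all made up, and it assumes every instance otherwise receives everything from every peer it has not blocked, which is a simplification of how federation really works. It only illustrates why no single instance is guaranteed to hold the full picture.

```python
# Hypothetical instances and the posts that originate on each one.
LOCAL_POSTS = {
    "alpha.example": {"a1", "a2"},
    "beta.example": {"b1"},
    "gamma.example": {"g1", "g2"},
}

# Hypothetical blocklists: instance -> instances whose content it refuses.
BLOCKS = {
    "alpha.example": {"gamma.example"},
    "beta.example": {"alpha.example"},
    "gamma.example": {"beta.example"},
}

ALL_POSTS = set().union(*LOCAL_POSTS.values())


def stored_content(instance):
    """Posts held by an instance: its own, plus posts from peers it does not block."""
    held = set(LOCAL_POSTS[instance])
    for peer, posts in LOCAL_POSTS.items():
        if peer != instance and peer not in BLOCKS[instance]:
            held |= posts
    return held


def scrape(instances):
    """Union of everything found by scraping each listed instance."""
    found = set()
    for instance in instances:
        found |= stored_content(instance)
    return found


# Scraping one instance misses whatever that instance blocks.
missing = ALL_POSTS - scrape(["alpha.example"])

# Scraping every instance recovers everything in this toy setup.
complete = scrape(LOCAL_POSTS) == ALL_POSTS
```

In this setup `missing` ends up holding gamma's posts, since alpha blocks gamma, while `complete` is true because the scrape covers every instance.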
Fascism, control, having the money to trawl through less popular socials to find dissidents