Dropsitenews published a list of websites Facebook uses to train its AI on. Multiple Lemmy instances are on the list as noticed by user BlueAEther
Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list pdf
Instances will not have copies of content for instances they block. So while Meta has Threads… most of the fediverse has blocked it. Since they can’t get that data fia federation, they scrape. And the instances they scrape will also only have content from their unblocked instances. To ensure they get everything, they have to scrape everything regardless of federation.