-
Reddit sells access to its API at a high price and is about to go public with an IPO, yet its entire business rests on data created by its users and communities. That is the work of the public, taken by a small group of individuals. A living example of capitalism.
-
The Fediverse alone isn’t enough to keep public data public and freely usable. What if the host of a Lemmy instance also released daily snapshots of all posts and modlogs over BitTorrent? Only then would we be safe from the host erasing public knowledge, and from data brokers.
Having a full backup available over torrent or some other public source would just make it even easier for data brokers. They wouldn’t even have to do the scraping anymore.
It is possible to train your own LLM; you could be the data broker yourself. My point is that the problem is capitalism over data.
Edit: I added “capitalism of” to the title.
If you make something public, it can be accessed by ANYONE. That’s what “public” means. If you don’t want your public stuff used by data brokers, just don’t make it public.
I think this is the fundamental flaw people always overlook. They want their data to be public and still be able to restrict how it’s used.
You know what else does that? DRM, the thing a lot of people are massively opposed to. The goal behind it is to reach a wide audience while restricting how the content can be used.
DRM is not the only option. If they want to restrict usage, they can just write a custom license for their publications. And wait, isn’t the problem with DRM that it uses unique device IDs?
And how well does that work in games? “You can’t cheat, please don’t, pinky promise?” It’s the same with LLMs. They see data, they parse it, licenses be damned. It’s as futile as people linking to the license they released their text under, or Facebook users posting “I don’t approve my text to be used…”.
Well, if someone breaks the license, they can be sued. But yeah, if you don’t want your data ever used for anything, public is not an option. It’s the same with speeches in real life.
I don’t mean data brokers using my data; I mean they (hosts included) lock that data away and sell it at a high price. Public data is made and entered by the public.
If you mean that a Lemmy instance could collect the data, well, it’s a matter of trust.
It could lock that data away, or sell API access at a high price like Reddit.
Of course it’s possible, especially if it grows large enough.