Where does Lemmy store upvotes, downvotes, and replies?

Tehhund@lemmy.world · 2 years ago


Empricorn@feddit.nl · 2 years ago

Since you’ve gotten enough real answers, I’ll just remind you that upvotes are stored in the balls.

Tehhund@lemmy.world · 2 years ago

Truth. /thread

iso@lemy.lol · 2 years ago

All of those are replicated to all servers.

Teppic@kbin.social · edit-2 2 years ago

Posts and comments are federated (synchronised). Upvotes are a bit of a fudge: from an ActivityPub (e.g. Mastodon) perspective they are actually ‘Favourites’, and yes, favourites are also federated.
Downvotes don’t exist in ActivityPub and, as a result, they do not federate between instances.
At least that is my understanding.

Max-P@lemmy.max-p.me · edit-2 2 years ago

Downvotes do federate, ~~but it uses protocol extensions to do it. So the downvotes won't federate to Mastodon~~, but they do for Lemmy and I think Kbin too.

Nutomic@lemmy.ml · 2 years ago

Votes federate with standard Like and Dislike activities, which are part of ActivityPub. It’s just that some platforms like Mastodon can’t handle Dislikes.
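A minimal sketch of what those two activity types look like when built by hand, using only the core Activity Streams vocabulary. Real Lemmy activities carry more fields, and the `/activities/` id scheme and the actor and post URLs below are hypothetical. Retracting a vote is shown as an `Undo` wrapping the original activity, the usual ActivityPub pattern for taking something back.

```python
import json
import uuid

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"

def vote_activity(actor: str, obj: str, upvote: bool, instance: str) -> dict:
    """Build a bare Like (upvote) or Dislike (downvote) activity."""
    return {
        "@context": AS_CONTEXT,
        "type": "Like" if upvote else "Dislike",
        # Hypothetical id scheme: any unique URL on the sender's instance would do.
        "id": f"https://{instance}/activities/{uuid.uuid4()}",
        "actor": actor,
        "object": obj,
    }

def undo(activity: dict, instance: str) -> dict:
    """Retract a vote by wrapping the original activity in an Undo."""
    return {
        "@context": AS_CONTEXT,
        "type": "Undo",
        "id": f"https://{instance}/activities/{uuid.uuid4()}",
        "actor": activity["actor"],
        "object": activity,
    }

if __name__ == "__main__":
    down = vote_activity(
        actor="https://lemmy.ml/u/alice",      # hypothetical voter
        obj="https://lemmy.world/post/12345",  # hypothetical post
        upvote=False,
        instance="lemmy.ml",
    )
    print(json.dumps(down, indent=2))
    print(json.dumps(undo(down, "lemmy.ml"), indent=2))
```

A server that has no concept of downvotes only needs to recognise the `type` field to decide whether to act on the activity at all.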

Monkey With A Shell@lemmy.socdojo.com · 2 years ago

Can’t handle them by choice, I’d guess. Given the format of individuals following individuals rather than topics in communities, it doesn’t make much sense for a person to follow someone only to downvote/dislike their comments.

AlexWIWA@lemmy.ml · 2 years ago

Honestly, federating votes seems like a bad idea imo. It would be easy to spin up an instance with thousands of fake users and manipulate posts.

Fediverse is already big enough that it could be lucrative to do so.

Shadow@lemmy.ca · 2 years ago

So then everyone just blacklists that instance. If the problem is really severe, we move to whitelisting.

It’s not hard to identify when someone is doing this.

AlexWIWA@lemmy.ml · 2 years ago

It’s not hard to identify if you’re looking for it, if they use just one instance, if they aren’t subtle about it, and if they’re only boosting a specific company instead of a variety of products and ideas.

Vote manipulation is hard enough to detect on Reddit where they have visibility top to bottom. I think this will become a major issue in the future.

This is on top of the already significant scaling issues votes are causing.

Other instances can cache the total count for historical reasons, to preserve the vote counts of instances that disappear, but keeping the full ledger is going to be a serious barrier to entry for hosters and a security (manipulation) issue.

Rogue@feddit.uk · 2 years ago

A whitelist defeats the decentralisation and openness of a defederated system.

I think you’re mistaken in your assumption that it would be easy to identify malicious instances. Bots are notoriously difficult to fight; every time you block one method, another workaround will appear.

Shadow@lemmy.ca · edit-2 2 years ago

> I think you’re mistaken in your assumption that it would be easy to identify malicious instances. Bots are notoriously difficult to fight; every time you block one method, another workaround will appear.

I run a large instance and I look around in the DB occasionally when users complain, so I’m pretty familiar with what’s in there.

> A whitelist defeats the decentralisation and openness of a defederated system.

True, but assholes are assholes and sometimes freedom and assholery don’t mix well.

PoliticalAgitator@lemm.ee · 2 years ago

Would it change anything besides their technique?

They almost certainly already have vote manipulation tools for reddit that work via browser automation, because someone offered me money to build one 10 years ago.

Those tools and a handful of accounts and VPNs would already be borderline undetectable without the access needed to see that 25 accounts always voted the same way.

At least on Lemmy, you have that access. Reddit not only makes zero effort to prevent it, they actively obfuscate the information needed to spot it.
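Spotting “25 accounts that always voted the same way” is straightforward once per-account votes are visible. A rough sketch, assuming votes have already been collected as `(object_id, +1/-1)` pairs per account; the Jaccard threshold and minimum history size are arbitrary illustrative choices, not anything Lemmy ships.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Votes both accounts cast identically, over all votes either account cast."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suspicious_pairs(votes: dict[str, set[tuple[str, int]]],
                     threshold: float = 0.9, min_votes: int = 5) -> list:
    """Return account pairs whose (object, direction) votes overlap above `threshold`.

    Accounts with fewer than `min_votes` votes are skipped, since tiny
    histories overlap by chance.
    """
    eligible = {acct: vs for acct, vs in votes.items() if len(vs) >= min_votes}
    flagged = []
    for a, b in combinations(sorted(eligible), 2):
        score = jaccard(eligible[a], eligible[b])
        if score >= threshold:
            flagged.append((a, b, round(score, 3)))
    return flagged

if __name__ == "__main__":
    ring = {(f"post/{i}", 1) for i in range(20)}  # hypothetical vote ring
    votes = {
        "sock1@new.example": set(ring),
        "sock2@new.example": set(ring),
        "real@lemmy.ca": {("post/1", 1), ("post/7", -1), ("post/30", 1),
                          ("post/31", 1), ("post/32", -1)},
    }
    print(suspicious_pairs(votes))  # [('sock1@new.example', 'sock2@new.example', 1.0)]
```

The pairwise loop is O(n²) in the number of accounts, which is fine for inspecting the voters on one suspicious post but would need something smarter for a whole instance.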

merthyr1831@lemmy.world · 2 years ago

I disagree. Reddit openly admitted to manipulating its upvote counts to “deter bots”, especially once it became apparent that the front page of Reddit was a very lucrative place to be if you were promoting a product, service, or ideology. In the post-API-change world of Reddit, it’s more apparent than ever that votes are being manipulated to give users an illusion of activity that isn’t actually there.

In fact, manipulating Reddit was always as easy as paying someone to upvote a post a few hundred times within an hour of posting, which in turn boosted it in the algorithm that ranked leading posts by rate of activity instead of by total upvotes.

On the fediverse, being on the front page of an instance isn’t nearly as lucrative, and being on the front of ALL of them isn’t feasible. Even if one instance is manipulated, federation makes that effort null in seconds.

The fact that these services aren’t monetised, are volunteer-funded, and don’t have the economic or advertising power that Reddit does really makes it harder for votes to be manipulated, let alone for anyone to want to manipulate the service.

Lemmy and Mastodon have issues with moderation, but at worst the manipulation risk is nowhere near as bad as on Reddit. At best, it looks like corporate manipulation of social media is all but nonexistent here. Let’s celebrate that.

AlexWIWA@lemmy.ml · 2 years ago

That’s fair

Send_me_nude_girls@feddit.de · edit-2 2 years ago

Technically votes are public; only the UI hides them. That should be resolved, one way or another.

Edit: there was a post with that here a few weeks ago. I understand that this isn’t a real answer to your question. Maybe you find it with these hints.

Edit2: Found it. Here you’ll find more. https://mylemmy.win/post/89871

pinkdrunkenelephants@lemmy.cafe · 2 years ago

Meaning admins are purposefully allowing other people to brigade others with alts.

Lemmy fucking blows.

Fisch@lemmy.ml · 2 years ago

how so?

pinkdrunkenelephants@lemmy.cafe · edit-2 2 years ago

Lemmy admins can see who is using alts to brigade others and ban them, yet they clearly don’t. They allow all kinds of skeevy bullshit from everyone – it took months of pressure to get them to even do so much as ban obvious problem instances like Hexbear.

They do it because they are selfish assholes who only care about power, and everyone just accepted they’re the dominant class in our little society here and that the big name instances like .world and .ml are perfectly fine with controlling the majority of content on the platform. It was never what was intended for federation in the first place, yet here we are.

Lemmy sucks as a platform because it’s not programmed to circumvent people’s base animalistic hierarchical nature, and that is its problem.

The platform should automatically track for obvious alt and bot accounts and ban them.

It really should have a toggleable hate filter that automatically bans people for using certain hate terms.

Accounts need to be tied to user machines so bans are actually halfway enforceable.

The platform shouldn’t really require mods or admins; an AI should monitor interactions and stop arguing or antagonistic encounters outright.

The admins should be acting fairly and impartially.

But none of that is happening because no admin is participating in good faith, they’re just looking to ensure they can do what they want without consequences, and so are the mods who have claimed almost every old subreddit name across instances under a few select usernames so they could have power over others and win confrontations.

And people can get away with power tripping because the platform wasn’t designed to take the fact that people do that into account. Any platform or social system that is not built on the first principle that humanity is inherently evil is bound to fail, and look what happened here. Perfect example.

the post of tom joad@lemmygrad.ml · 2 years ago

Did anyone make it all the way thru this? Does any of it match reality? Is it a bit? I honestly cannot tell

merthyr1831@lemmy.world · 2 years ago

And you trust literally any other social media website’s impressions count?

Vex_Detrause@lemmy.ca · 2 years ago

Where is my karma stored? ^/s

Unsustainable@lemmy.today · 2 years ago

It’s under my bed. You’ll have to pay me $10,000 to get it back.

asudox@lemmy.world · 2 years ago

Vex had too much karma, now it backfires and your karma is under his bed now instead.

peereboominc@lemm.ee · 2 years ago

What if someone sets up an instance, makes a post, and manipulates the upvotes? Just give it a million upvotes. That would break the whole system…

Or a bit more subtle, every upvote is multiplied by 10.

Max-P@lemmy.max-p.me · 2 years ago

Votes aren't federated as a number but individually, per user, so you'd have to set up fake users and then federate a vote from each of them.

That makes it rather easy to detect and identify and get that particular instance defederated.

Votes will still go from origin instance -> community instance -> other instance, but if the other instance has defederated the origin instance, the vote simply gets dropped.
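A toy model of the receiving side, to show why a rogue instance can't just send “a million upvotes”: votes are stored per (actor, object), so a resent vote doesn't add to the score, activities from blocked domains are dropped, and counting by actor domain makes a flood from one new instance obvious. Everything here (`VoteLedger`, the domain check) is hypothetical; it leaves out HTTP signature verification and the community's `Announce` wrapping that real Lemmy uses.

```python
from collections import Counter
from urllib.parse import urlparse

class VoteLedger:
    """Hypothetical receiving side for federated votes (not Lemmy's code)."""

    def __init__(self, blocked_domains: set[str]):
        self.blocked = blocked_domains
        self.votes: dict[tuple[str, str], int] = {}  # (actor, object) -> +1 / -1

    def receive(self, activity: dict) -> bool:
        """Apply a Like/Dislike; return False if the activity was dropped."""
        actor = activity["actor"]
        if urlparse(actor).hostname in self.blocked:
            return False  # defederated origin: dropped
        direction = {"Like": 1, "Dislike": -1}.get(activity["type"])
        if direction is None:
            return False  # not a vote
        # Overwrite rather than add, so one actor counts at most once per object.
        self.votes[(actor, activity["object"])] = direction
        return True

    def score(self, obj: str) -> int:
        return sum(d for (_, o), d in self.votes.items() if o == obj)

    def votes_by_domain(self, obj: str) -> Counter:
        """Count votes on `obj` per origin domain, to spot a flood from one instance."""
        return Counter(urlparse(a).hostname for (a, o) in self.votes if o == obj)

if __name__ == "__main__":
    ledger = VoteLedger(blocked_domains={"spam.example"})
    post = "https://lemmy.ca/post/42"
    for i in range(3):
        ledger.receive({"type": "Like", "actor": f"https://fresh.example/u/bot{i}", "object": post})
    ledger.receive({"type": "Like", "actor": "https://fresh.example/u/bot0", "object": post})  # resent
    ledger.receive({"type": "Like", "actor": "https://spam.example/u/x", "object": post})     # blocked
    ledger.receive({"type": "Dislike", "actor": "https://lemmy.world/u/y", "object": post})
    print(ledger.score(post))            # 2
    print(ledger.votes_by_domain(post))  # Counter({'fresh.example': 3, 'lemmy.world': 1})
```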

Teppic@kbin.social · 2 years ago

If you use kbin you can even see who made each upvote, so yes, it's easy to look for patterns of accounts voting together, and also at the profiles to see whether the accounts look like real people, etc.

jabberati@social.anoxinon.de · 2 years ago

So the cost of getting a post on the front page of every Lemmy instance is the cost of registering a new domain.
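For a sense of how much a burst of fake votes buys on an instance's front page, here is a sketch of the hot-rank formula Lemmy used in its SQL around versions 0.17/0.18, `10000 * log10(max(1, score + 3)) / (hours + 2)^1.8`. Treat the exact constants as an assumption; newer versions may compute it differently. The logarithm means each tenfold increase in `score + 3` adds the same fixed amount to the numerator, while age keeps dividing the result down.

```python
import math

def hot_rank(score: int, hours_since_published: float) -> int:
    """Sketch of Lemmy's older SQL hot_rank (assumed constants, circa 0.17/0.18)."""
    # log10 dampens raw votes; (hours + 2) ** 1.8 decays the rank as the post ages.
    return math.floor(
        10000 * math.log10(max(1, score + 3)) / (hours_since_published + 2) ** 1.8
    )

if __name__ == "__main__":
    # Hypothetical posts, both 6 hours old.
    print("organic, 40 votes:", hot_rank(40, 6))
    print("boosted, 500 votes:", hot_rank(500, 6))
    # The same boosted post a day later.
    print("boosted, 30 hours old:", hot_rank(500, 30))
```

Under this formula, a dozen times more votes yields well under twice the rank, so a fake-vote boost mostly pays off in the first hours after posting.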

Max-P@lemmy.max-p.me · 2 years ago

Until a mod catches it and reports it to the admins, yeah.

Lemmy isn't the absolute most well thought out platform in many regards, I don't think anyone expected Reddit to actively go hostile and drive such an amount of users to Lemmy.

cm0002@lemmy.world · 2 years ago

> Lemmy isn’t the absolute most well thought out platform in many regards, I don’t think anyone expected Reddit to actively go hostile and drive such an amount of users to Lemmy.

Def not, I’d say Lemmy was at least a few years out from being stable and on par with Reddit as far as software goes. There are still fundamental questions and problems that need to be answered and solved.

I say “was” because Reddit going hostile and driving such a large influx of users is a bit of a double-edged sword. On one hand, it was just barely ready for more active use, but not at scale.

OTOH, the large influx is also accelerating development. Lemmy was years out before; how far out it is now, with all this focus and drive to get things done, I do not know, but I’d say it’s moving much faster than before.

Call me Lenny/Leni@lemm.ee · 2 years ago

The mod log at the bottom of any Lemmy webpage, I think.

iso@lemmy.dbzer0.com · 2 years ago

Haven’t worked with AP yet, but as a webdev I’m certain it’s original-server only. Syncing upvotes between nodes would be an insane data volume and hell to keep properly in sync to begin with.

Dave@lemmy.nz · edit-2 2 years ago

My instance has 800 users, is 4 months old, and the database alone is over 30GB. It is an insane amount of data.

Scrollone@feddit.it · 2 years ago

How much RAM does your server have to handle a 30 GB database?

Dave@lemmy.nz · edit-2 2 years ago

I’m a bad example. I haven’t properly tuned the settings, currently RAM will grow to whatever is available.

I’m very lucky, the instance is running in a proxmox container alongside some other fediverse servers (run by others), on dedicated hardware in a datacentre. The sysadmin has basically thrown me plenty of spare resources since the other containers aren’t using them and RAM not used is wasted, so I’ve got 32GB allocated currently. I still need to restart once a week or that RAM gets used up and the database container crashes.

It’s been on my list of things to do for a while, try some different postgres configs, but I just haven’t got around to it.

I know a couple of months back lemmy.world were restarting every 30 mins so they didn’t use up all the RAM and crash. I presume some time and some lemmy updates later that’s no longer the case.

I know some smaller servers get away with 2GB of RAM, and we should be able to use a lot less than 32GB if I actually tune the postgres config.

Nutomic@lemmy.ml · 2 years ago

There is a postgres command to show the size of each table. Most likely it is from activity tables which can be cleared out to save space.

Dave@lemmy.nz · 2 years ago

After the second-to-last update the database shrunk and I was under the impression there was some automatic removal happening. Was this not the case?

It’s helpful info for others but personally I’m not that worried about the database size. The size of the pictrs cache is much more of a concern, and as I understand it there isn’t an easy way to identify and remove cache images without accidentally taking out user image uploads.

Nutomic@lemmy.ml · 2 years ago

Yes there is automatic removal so if you have enough disk space, no need to worry about it.

The pictrs storage only consists of uploads from local users, and thumbnails for both local and remote posts. Thumbnails for remote posts could theoretically be wiped and reloaded from the other instance, but they shouldn’t take much space anyway.

Dave@lemmy.nz · 2 years ago

> Yes there is automatic removal so if you have enough disk space, no need to worry about it.

What triggers this? My DB was about 30GB, then the update shrunk it down to 5GB, then it grew back to 30GB.

> The pictrs storage only consists of uploads from local users, and thumbnails for both local and remote posts. Thumbnails for remote posts could theoretically be wiped and loaded from the other instance, but they shouldn’t take much space anyway.

I’d be pretty confident that the 140GB of pictrs cache I have is mostly cache. There are occasionally users uploading images, but we don’t have that many active users; I’d be surprised if there was more than a few GB of image uploads in total out of that 140GB. We just aren’t that big of a server.

The pictrs volume also grows consistently at a little under 1GB per day. I just went and had a look, in the files directory there are 6 directories from today (the day only has a couple of hours left), and these sum to almost 700MB of images and almost 6000 files, or a little over 100KB each.
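Checking the arithmetic in those figures: 700MB spread over about 6000 files is roughly 117KB per file (decimal units), and a steady growth rate gives a simple runway estimate. The 100GB of free disk in the example is a hypothetical figure, not one from the thread.

```python
def avg_file_kb(total_mb: float, files: int) -> float:
    """Average cached file size in KB (decimal units: 1 MB = 1000 KB)."""
    return total_mb * 1000 / files

def days_until_full(free_gb: float, growth_gb_per_day: float) -> float:
    """How long a volume lasts if it keeps growing at a steady daily rate."""
    return free_gb / growth_gb_per_day

if __name__ == "__main__":
    # 700 MB and ~6000 files are the figures from the comment above.
    print(f"{avg_file_kb(700, 6000):.0f} KB per file")  # 117 KB per file
    # Hypothetical: 100 GB free, growing at ~0.9 GB/day ("a little under 1GB").
    print(f"{days_until_full(100, 0.9):.0f} days until the disk fills")  # 111 days until the disk fills
```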

The instance has had just 27 active users today (though of course users not posting will still generate thumbnails).

While the cached images may be small, it adds up really quick.

As far as I can tell there is no cache pruning, as the cache goes up pretty consistently each day.

Nutomic@lemmy.ml · 2 years ago

The activities table is cleared out automatically every week, items older than 3 months are deleted. During the update only a smaller number of rows was migrated so the db temporarily was slower. You can manually clear older items in sent_activity and received_activity to free more space.
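The retention rule described above (delete activity rows older than about three months) can be sketched against an in-memory SQLite database. This is an illustration only: Lemmy runs on PostgreSQL, the two-column schema here is hypothetical, only the table names `sent_activity` and `received_activity` come from the comment, and 90 days stands in for “3 months”.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified schema; Lemmy's real PostgreSQL tables have more columns.
SCHEMA = """
CREATE TABLE sent_activity     (id INTEGER PRIMARY KEY, data TEXT,  published TEXT);
CREATE TABLE received_activity (id INTEGER PRIMARY KEY, ap_id TEXT, published TEXT);
"""

def prune(conn: sqlite3.Connection, now: datetime, keep_days: int = 90) -> int:
    """Delete activity rows older than `keep_days`; return how many rows were removed."""
    cutoff = (now - timedelta(days=keep_days)).isoformat()
    removed = 0
    for table in ("sent_activity", "received_activity"):  # fixed names, safe to interpolate
        # ISO-8601 strings with the same format and offset sort correctly as text.
        cur = conn.execute(f"DELETE FROM {table} WHERE published < ?", (cutoff,))
        removed += cur.rowcount
    conn.commit()
    return removed

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    now = datetime(2023, 10, 1, tzinfo=timezone.utc)
    for days_old in (1, 30, 100, 200):
        ts = (now - timedelta(days=days_old)).isoformat()
        conn.execute("INSERT INTO sent_activity (data, published) VALUES (?, ?)", ("{}", ts))
        conn.execute("INSERT INTO received_activity (ap_id, published) VALUES (?, ?)", ("x", ts))
    print(prune(conn, now))  # 4: the 100- and 200-day-old rows in both tables
```

A weekly scheduled job calling something like `prune` would match the “cleared out automatically every week” behaviour described, though how Lemmy actually schedules it is not shown here.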

Actually I’m wrong about images: it turns out that all remote images are mirrored locally in order to generate thumbnails. 0.19 will have an option to disable that. This could use more improvements; the whole image handling is rather confusing now.

Dave@lemmy.nz · 2 years ago

Thanks for the info! For performance reasons it would be nice to have a way to configure how long the cache is kept rather than disable it completely, but I understand you probably have other priorities.

Would disabling the cache remove images cached up to that point?

Skull giver@popplesburger.hilciferous.nl · 2 years ago

Yes it does, Lemmy keeps a record of all votes on the server and rebroadcasts them to other servers (most of the time). Other servers may get out of sync, especially when you take defederation into account, but that’s not a huge problem in my experience.

Network traffic is not as bad as you may think, especially with modern HTTPS libraries that will keep connections open while also multiplexing requests.

The protocol is described in https://www.w3.org/TR/activitypub/ (each platform implements only some of its object types, and adds extensions where the spec allows).

This is an example from the spec:

{"@context": "https://www.w3.org/ns/activitystreams",
 "type": "Like",
 "id": "https://social.example/alyssa/posts/5312e10e-5110-42e5-a09b-934882b3ecec",
 "to": ["https://chatty.example/ben/"],
 "actor": "https://social.example/alyssa/",
 "object": "https://chatty.example/ben/p/51086"}

That’s about 287 characters per vote.
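A back-of-the-envelope sketch of the volume involved, using the spec's `Like` example serialized as compact JSON. The community size and instance count in the usage example are hypothetical, and real deliveries are larger: Lemmy wraps each vote in an `Announce` from the community, and every request carries HTTP signature headers. On the other hand, delivery goes to each instance's shared inbox once, not to every user there.

```python
import json

# The Like example from the ActivityPub spec quoted above.
LIKE = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Like",
    "id": "https://social.example/alyssa/posts/5312e10e-5110-42e5-a09b-934882b3ecec",
    "to": ["https://chatty.example/ben/"],
    "actor": "https://social.example/alyssa/",
    "object": "https://chatty.example/ben/p/51086",
}

def payload_bytes(activity: dict) -> int:
    """Size of the activity as compact UTF-8 JSON (body only, no HTTP headers)."""
    return len(json.dumps(activity, separators=(",", ":")).encode("utf-8"))

def daily_vote_traffic_mb(votes_per_day: int, subscribed_instances: int,
                          bytes_per_vote: int) -> float:
    """Rough outbound volume if every vote is delivered once to every subscribed instance."""
    return votes_per_day * subscribed_instances * bytes_per_vote / 1e6

if __name__ == "__main__":
    size = payload_bytes(LIKE)
    print(size, "bytes per vote (body only)")
    # Hypothetical community: 50,000 votes/day, followed from 1,000 instances.
    print(f"{daily_vote_traffic_mb(50_000, 1_000, size):,.0f} MB/day")
```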

Tehhund@lemmy.world · 2 years ago

Thanks, that’s very informative. How does this work since ActivityPub can be used for other things, e.g., Mastodon? They ignore any “Type” entries that they don’t support?

Skull giver@popplesburger.hilciferous.nl · 2 years ago

> They ignore any “Type” entries that they don’t support?

Basically. For example, ActivityPub objects such as events or locations aren’t supported by many platforms (though they do exist).

Exact implementations differ per platform. Mastodon doesn’t have a like button, but it does have a favourite button, which is translated into a like when the activity federates. Downvotes are implemented as dislikes (an Activity Streams 2.0 feature, not part of the ActivityPub spec itself), but Mastodon just ignores those.
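The “ignore what you don't understand” behaviour can be sketched as a dispatcher keyed on the activity's `type`. This is a hypothetical structure, not Mastodon's or Lemmy's actual code; the server in the usage example registers a `Like` handler and no `Dislike` handler, so downvotes are silently dropped.

```python
from typing import Callable

Handler = Callable[[dict], None]

def make_inbox(handlers: dict[str, Handler]) -> Callable[[dict], bool]:
    """Return an inbox function that dispatches on the activity's "type".

    Types without a handler are ignored rather than rejected.
    """
    def inbox(activity: dict) -> bool:
        handler = handlers.get(activity.get("type"))
        if handler is None:
            return False  # unsupported type: ignore it
        handler(activity)
        return True
    return inbox

if __name__ == "__main__":
    favourites = []
    # Hypothetical microblog-style server: understands Like, has no Dislike handler.
    inbox = make_inbox({"Like": favourites.append})
    post = "https://mastodon.example/@b/1"  # hypothetical post
    inbox({"type": "Like", "actor": "https://lemmy.world/u/a", "object": post})
    inbox({"type": "Dislike", "actor": "https://lemmy.world/u/a", "object": post})
    print(len(favourites))  # 1
```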

Furthermore, there are tons of extra JSON fields and extensions that allow servers of a particular type to talk to each other better. For example, take the JSON returned when I query for details on your user account:

curl -LH 'Accept: application/ld+json; profile="https://www.w3.org/ns/activitystreams"' https://lemmy.world/u/Tehhund | jq
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://w3id.org/security/v1",
    {
      "lemmy": "https://join-lemmy.org/ns#",
      "litepub": "http://litepub.social/ns#",
      "pt": "https://joinpeertube.org/ns#",
      "sc": "http://schema.org/",
      "ChatMessage": "litepub:ChatMessage",
      "commentsEnabled": "pt:commentsEnabled",
      "sensitive": "as:sensitive",
      "matrixUserId": "lemmy:matrixUserId",
      "postingRestrictedToMods": "lemmy:postingRestrictedToMods",
      "removeData": "lemmy:removeData",
      "stickied": "lemmy:stickied",
      "moderators": {
        "@type": "@id",
        "@id": "lemmy:moderators"
      },
      "expires": "as:endTime",
      "distinguished": "lemmy:distinguished",
      "language": "sc:inLanguage",
      "identifier": "sc:identifier"
    }
  ],
  "type": "Person",
  "id": "https://lemmy.world/u/Tehhund",
  "preferredUsername": "Tehhund",
  "inbox": "https://lemmy.world/u/Tehhund/inbox",
  "outbox": "https://lemmy.world/u/Tehhund/outbox",
  "publicKey": {
    "id": "https://lemmy.world/u/Tehhund#main-key",
    "owner": "https://lemmy.world/u/Tehhund",
    "publicKeyPem": "-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAr8QYBRNqyM3A8JHL+rWD\nN22EJDEBd+1D8hzbOnevWnmalBhbp94MY5xyTCOfGIxYo1tZs5BeuM79JRT7eFV6\nefSPZclwri4XOmizgMY2VVRw2zH3zVXmKjbIn84JaNIUez5z5NAtqgzPr+UDxWIZ\n2lH0kJuZ2YBBvH3Bk1xsJznQ3olnh0hGD9+wU10fTSI4d/razTO+4btOMV5yQYry\noZ3RWD4Zq9nhKw5s4Sb5QPQ0NNHnPsnsZPip5FfN67XOQn/d/H2TzBAdKUtEIVBH\nDivI3FWPWmCbdaz3LImS5FpKNoJvoh7Dwlfh2eIE7mkZ9FH64DNw6cd6A2fSOm1w\nXQIDAQAB\n-----END PUBLIC KEY-----\n"
  },
  "endpoints": {
    "sharedInbox": "https://lemmy.world/inbox"
  },
  "published": "2023-06-11T19:07:49.583473+00:00"
}

Notice the special fields for PeerTube, LitePub, and Matrix in the context object: these are additional fields that provide optional metadata for compatibility, in case they’re necessary. In your case (and in most cases, to be honest), they’re not used.

ActivityPub has a relatively simple core architecture with lots of flexibility. You can ignore most of that flexibility to get an extremely simple client, or you can go through every server and find all the rich content they provide to build the mother of all social media apps.
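To illustrate the “extremely simple client” end of that spectrum: a sketch that fetches an actor document with the ActivityPub `Accept` header and keeps only a few core fields, ignoring every extension declared in `@context`. `fetch_actor` and `summarize_actor` are hypothetical helpers built on the standard library; the usage example runs offline on a trimmed copy of the JSON above.

```python
import json
import urllib.request

AP_ACCEPT = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'

def fetch_actor(url: str) -> dict:
    """GET an actor document, asking for the ActivityPub JSON-LD representation."""
    req = urllib.request.Request(url, headers={"Accept": AP_ACCEPT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def summarize_actor(doc: dict) -> dict:
    """Keep a handful of core fields; everything else is simply ignored."""
    endpoints = doc.get("endpoints") or {}
    return {
        "type": doc.get("type"),
        "id": doc.get("id"),
        "name": doc.get("preferredUsername"),
        "inbox": doc.get("inbox"),
        # Prefer the shared inbox: one delivery there can cover every recipient on that server.
        "deliver_to": endpoints.get("sharedInbox") or doc.get("inbox"),
        "has_key": "publicKey" in doc,
    }

if __name__ == "__main__":
    # Trimmed copy of the document above; fetch_actor(url) would return the full one.
    doc = {
        "type": "Person",
        "id": "https://lemmy.world/u/Tehhund",
        "preferredUsername": "Tehhund",
        "inbox": "https://lemmy.world/u/Tehhund/inbox",
        "endpoints": {"sharedInbox": "https://lemmy.world/inbox"},
        "publicKey": {"id": "https://lemmy.world/u/Tehhund#main-key"},
    }
    print(summarize_actor(doc))
```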

Max-P@lemmy.max-p.me · 2 years ago

It does sync them, I can even query all of your votes on my local DB for every community my instance is tracking.

kglitch@kglitch.social · 2 years ago

They are synced. There is an insane data volume, yes. It is hell.