Ahoy m@tes, the scraping bot situation has been escalating recently, as you may have already noticed from the recent site instability and 5xx error responses. @tenchiken@anarchist.nexus has been scrambling to block new scraping subnets as they appear, but these assholes keep jumping providers, so it’s been an endless loop of constant firefighting.
I finally had enough and decided to onboard a Proof-of-Work countermeasure, very much like Anubis, which has been very popular on the fediverse lately. However, I went with Haphash, which was designed specifically around haproxy (our reverse proxy of choice) and is hopefully much more lightweight.
The new PoW shield has already been activated on both Divisions by Zero and on Fediseer. It’s not active on all URLs, but it should be protecting those which have the most impact on our database, which is what was causing the actual issue. You should notice a quick loading screen on occasion while it’s verifying you.
We’ve already seen a significant reduction in 5xx HTTP errors, as well as a slight reduction in traffic, so we’re hoping this will have a good impact on our situation.
Please do let us know if you run into any issues, and also let us know if you feel any difference in responsiveness. The first m@tes already feel it’s all snappier, but that may just be placebo.
And let’s hope the next scraping wave is not pwned residential botnets, or we’re all screwed >_<


Proof of work means that your client has to do some “work” in order to gain access. It typically means a challenge that can’t be trivially solved, but can be trivially verified.
For example, the challenge may be something to the effect of:
“Give me a string that, when hashed with md5, results in a hash that ends in 1234”.
Your browser can then start bruteforcing until it finds a string (should take a few seconds max), and then it can pass the string back to the server. The server can verify with a single hash, and you’re in.
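For the curious, here’s a minimal toy sketch of that flow in Python. It follows the md5-suffix example above; the actual algorithms Anubis and Haphash use differ, and the names and suffix here are made up purely for illustration.

```python
# Toy proof-of-work sketch based on the md5 example above.
# Real systems (Anubis, Haphash) use different hashes and difficulty
# encodings; this is just the general idea.
import hashlib
from itertools import count

SUFFIX = "1234"  # challenge issued by the server

def solve(challenge_suffix: str) -> str:
    """Client side: brute-force a nonce whose md5 ends in the suffix."""
    for n in count():
        candidate = f"nonce-{n}"
        if hashlib.md5(candidate.encode()).hexdigest().endswith(challenge_suffix):
            return candidate

def verify(candidate: str, challenge_suffix: str) -> bool:
    """Server side: a single hash is enough to check the answer."""
    return hashlib.md5(candidate.encode()).hexdigest().endswith(challenge_suffix)

answer = solve(SUFFIX)         # takes many attempts (the "work")
assert verify(answer, SUFFIX)  # verified instantly
```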
It’s not wildly different from crypto mining, but the difficulty is much lower for anti-bot use, as it needs to be solvable in seconds even by low-end devices.
What stops the bots from just solving it?
Two things: First, bots typically don’t run JavaScript. No JS, no entry. A user can temporarily enable JS if they’re stuck on an endless loading screen. But a scraper won’t.
Second, the fact that they’d need to solve it for every single bot and every single site they scrape. It’s a low barrier for regular users, but it’s astronomical for scrapers who are running hundreds of thousands of bots.
Cost of electricity, for the most part. Having a scraper visit hundreds of URLs per second isn’t unheard of; adding this should reduce the speed of the same scraper by 30-70%, depending on the request.
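A rough back-of-envelope, with made-up numbers (a ~2-second challenge and one challenge per scraped page; the real figures depend on the difficulty setting and how long a solved challenge stays valid):

```python
# Back-of-envelope only: both numbers below are illustrative guesses,
# not Haphash's actual settings.
challenge_seconds = 2        # CPU time to solve one challenge
human_challenges = 1         # a person solves it once, then browses
scraper_pages = 100_000      # a modest crawl, one challenge per page

print(challenge_seconds * human_challenges, "s of work for a person")         # 2 s
print(challenge_seconds * scraper_pages / 3600, "CPU-hours for the scraper")  # ~55.6
```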
Funny, HTTPS is computationally expensive for similar reasons, but I guess this system works across sessions, with a front-loaded cost.
I think they’re on different scales; there’s no brute-forcing involved in HTTPS/SSL.
I guess the bots don’t know how to rainbow table.
It’s usually designed so that you can’t rainbow-table it.
That can’t be rainbow-tabled, as the server can force a different salt.
(Note: I don’t know the exact algorithms involved, just the general theory.)
You can uniquely salt every request trivially, so rainbow tables are effectively useless.
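Continuing the md5 toy example from above (again, not the actual Haphash scheme), per-request salting looks something like this: the salt is random for every challenge, so a precomputed table of hashes can’t be reused.

```python
# Sketch of per-request salting for the toy md5 challenge above.
# The salt changes every time, so a rainbow table of fixed inputs is useless.
import hashlib
import secrets
from itertools import count

def new_challenge(suffix: str = "1234") -> tuple[str, str]:
    """Server: issue a fresh random salt alongside the target suffix."""
    return secrets.token_hex(16), suffix

def solve(salt: str, suffix: str) -> str:
    """Client: brute-force a nonce for *this* salt; nothing can be precomputed."""
    for n in count():
        if hashlib.md5(f"{salt}:{n}".encode()).hexdigest().endswith(suffix):
            return str(n)

def verify(salt: str, suffix: str, nonce: str) -> bool:
    """Server: one hash to check the client's answer against its own salt."""
    return hashlib.md5(f"{salt}:{nonce}".encode()).hexdigest().endswith(suffix)

salt, suffix = new_challenge()
nonce = solve(salt, suffix)
assert verify(salt, suffix, nonce)
```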