How feasible is privacy-respecting personalised search engine results?

communism@lemmy.ml · 7 months ago

How feasible is privacy-respecting personalised search engine results?

CameronDev@programming.dev · 7 months ago

Self hosting search engines is very hard. The scraping, indexing and storage requirements are immense. You could definitely self-host a front end (with your QoL improvements), but the back end search engines (Bing/Google/etc) will be able to track you all the same.

HappyFrog@lemmy.blahaj.zone · 7 months ago

Are there even open source indexing software available?

Max-P@lemmy.max-p.me · 7 months ago

There’s YaCy. I’ve run a node for a while but it ended up filling up my server’s drive just indexing german wikipedia and the results were terrible.

And it’s still not private because you have to broadcast the query across the network.

CameronDev@programming.dev · 7 months ago

None that im aware of. There are webscrapers, and I guess you could just webscrape and dump the results into a postgres db and use it to index. But I’m guessing you’ll eventually want something more tuned/custom? But even if it existed, there is the discovery problem. How do you find the sites to scrape? Bing and google both let site operators submit urls, but that isn’t gonna scale to self-hosting.

HappyFrog@lemmy.blahaj.zone · 7 months ago

Yeah, exactly. I just think that making anonymous request from google or bing is private enough for me.

Echedelle (she/her)@lemmy.blahaj.zone · edit-2 7 months ago

Stract, Marginalia, Wiby, Mwmbl, etc

The two first are NLnet funded and the second one is one of the best developed despite it uses Java in contrast to Rust. I see the developer taking the development very seriously.

communism@lemmy.ml · 7 months ago

That’s a good point, I forgot that stuff like SearXNG are only frontends so in order to add personalisation to them you’d have to modify your queries to Bing/Google/etc I assume, rather than do what Google etc do with whatever algorithm they use for providing search results.

BlameTheAntifa@lemmy.world · edit-2 7 months ago

Self-hosting a search engine is unfortunately not feasible given the amount of data and power required for it. Not to mention access to the data (crawling yourself or using another index).

For privacy and customization there is Kagi, which is amazing and very customizable, but requires a paid subscription. You are a customer rather than the product, though.

confusedwiseman@lemmy.dbzer0.com · 7 months ago

I know I’m not exactly hitting the mark, have you looked at kagi? You can personalize the weighting of results from certain sites. You can also add lenses which will let you drive results to forums, programming, academia, etc.

To me it was a bit like reliving the early days of google with the don’t be evil mantra still in tact.

Let me also say, it appears to be privacy respecting.

It has been good for me so far. If someone sees a reason I should run away from this, please let me know why and what we all should use instead, I’d appreciate it!

communism@lemmy.ml · 7 months ago

Kagi’s an interesting one. The main reason why I don’t go with it is because you’d have to have an account, de-anonymising you. I know they have their “privacy pass” feature but that seems to essentially rely on trust that they aren’t tying your private searches to an account. And also $10/month for a search engine is just pretty steep for my budget.

confusedwiseman@lemmy.dbzer0.com · 7 months ago

I fully respect if it’s just not in the budget for you. A company has to make money somehow. I’d rather pay than get ads or worse let them collect and yet worse sell my data. Also, you can use a burner email and vpn if you want to add an extra layer of obfuscation in there for privacy.

Here’s a few links from their faq.

https://help.kagi.com/kagi/faq/faq.html#why-trust

https://help.kagi.com/kagi/faq/faq.html#why-should-i-pay-for-search

https://help.kagi.com/kagi/faq/faq.html#why-does-kagi-search-require-an-email-address

I really hope I don’t come off as a shill for them. It’s one of the few companies I actually really like.

I also run proton family, and really like the product offering. Their leadership gives me anxiety though. Promos and sales are only for new customers and standard pricing is a bit steep, but you do get multiple services.

communism@lemmy.ml · 7 months ago

Also, you can use a burner email and vpn if you want to add an extra layer of obfuscation in there for privacy.

It’s still all tied to one account. They could say, for instance, the same person searched for “beans”, “onions”, and “rice”, as opposed to not being tied to an account where those 3 searches could have come from 3 different people. Of course, a search engine like DDG is only promising to not track you to try figure out if those 3 searches came from the same person, but various anti-fingerprinting measures could make it infeasible for DDG to do that. For a paid search engine, you’d have to pay for a new account per search if you didn’t want it tied to any other searches, if you don’t trust that Kagi isn’t logging searches (which you shouldn’t, because you shouldn’t rely on trust for any threat model).

I really hope I don’t come off as a shill for them. It’s one of the few companies I actually really like.

Don’t worry, I get where you’re coming from and I most certainly think some people have a use-case for it.