Wikipedia is the most ambitious multilingual project after the Bible: There are editions in over 340 languages, and a further 400 even more obscure ones are being developed and tested. Many of these smaller editions have been swamped with automatically translated content as AI has become increasingly accessible. Volunteers working on four African languages, for instance, estimated to MIT Technology Review that between 40% and 60% of articles in their Wikipedia editions were uncorrected machine translations. And after auditing the Wikipedia edition in Inuktitut, an Indigenous language closely related to Greenlandic that’s spoken in Canada, MIT Technology Review estimates that more than two-thirds of pages longer than a few sentences contain passages created this way.

This is beginning to cause a wicked problem. AI systems, from Google Translate to ChatGPT, learn to “speak” new languages by scraping huge quantities of text from the internet. Wikipedia is sometimes the largest source of online linguistic data for languages with few speakers—so any errors on those pages, grammatical or otherwise, can poison the wells that AI is expected to draw from. That can make the models’ translations of these languages particularly error-prone, creating a sort of linguistic doom loop: people keep adding poorly translated Wikipedia pages using those tools, and AI models keep training on those poorly translated pages. It’s a complicated problem, but it boils down to a simple concept: Garbage in, garbage out.
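
The feedback loop is easier to see with a toy model. The sketch below is purely illustrative: none of the numbers come from the article, and the update rule is a deliberately crude caricature of how translation errors can compound once model output feeds back into training data.

```python
# Toy simulation of the "doom loop" described above: each generation, a
# translation model is trained on the current corpus, a share of new Wikipedia
# pages is then written with that model, and those pages flow back into the
# next training corpus. Every number here is invented for illustration.

def simulate_doom_loop(
    generations: int = 10,
    human_error_rate: float = 0.05,      # error rate of human-written pages
    machine_written_share: float = 0.6,  # share of new pages made with the model
    error_amplification: float = 2.0,    # model output is noisier than its training data
) -> list[float]:
    corpus_error_rate = human_error_rate
    history = [corpus_error_rate]
    for _ in range(generations):
        # The model roughly inherits (and amplifies) the corpus's error rate.
        model_error_rate = min(1.0, corpus_error_rate * error_amplification)
        # New pages are a mix of human-written and machine-translated text.
        new_pages_error_rate = (
            machine_written_share * model_error_rate
            + (1 - machine_written_share) * human_error_rate
        )
        # Simplification: each generation, the corpus becomes a 50/50 blend of
        # the old corpus and the newly added pages.
        corpus_error_rate = 0.5 * corpus_error_rate + 0.5 * new_pages_error_rate
        history.append(corpus_error_rate)
    return history

if __name__ == "__main__":
    for gen, rate in enumerate(simulate_doom_loop()):
        print(f"generation {gen}: ~{rate:.0%} of corpus text contains errors")
```

With these made-up parameters, the share of error-bearing text climbs from 5% to nearly 30% over ten generations; the point is only the direction of travel once a model's output becomes its own training data, not the magnitude.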

“These models are built on raw data,” says Kevin Scannell, a former professor of computer science at Saint Louis University who now builds computer software tailored for endangered languages. “They will try and learn everything about a language from scratch. There is no other input. There are no grammar books. There are no dictionaries. There is nothing other than the text that is inputted.”

There isn’t perfect data on the scale of this problem, particularly because a lot of AI training data is kept confidential and the field continues to evolve rapidly. But back in 2020, Wikipedia was estimated to make up more than half the training data that was fed into AI models translating some languages spoken by millions across Africa, including Malagasy, Yoruba, and Shona. In 2022, a research team from Germany that looked into what data could be obtained by online scraping even found that Wikipedia was the sole easily accessible source of online linguistic data for 27 under-resourced languages.
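
As a concrete (and purely illustrative) picture of what “more than half the training data” means in practice, the sketch below tallies each source’s share of a scraped corpus by text volume. The URLs and texts are placeholders, and this is not the methodology of the 2020 estimate or of the German team’s 2022 audit.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical (URL, text) records standing in for one language's scraped web
# corpus. Real audits of this kind work over crawl dumps, not a hard-coded list.
sample_corpus = [
    ("https://mg.wikipedia.org/wiki/Antananarivo", "placeholder Wikipedia article text " * 40),
    ("https://example.org/news/article-123", "placeholder news article text " * 25),
    ("https://mg.wikipedia.org/wiki/Madagasikara", "placeholder Wikipedia article text " * 30),
]

def source_shares(corpus: list[tuple[str, str]]) -> dict[str, float]:
    """Return each source's share of the corpus, weighted by text volume."""
    chars_by_source: Counter[str] = Counter()
    for url, text in corpus:
        domain = urlparse(url).netloc
        # Group every language edition of Wikipedia under a single label.
        label = "wikipedia.org" if domain.endswith("wikipedia.org") else domain
        chars_by_source[label] += len(text)
    total = sum(chars_by_source.values()) or 1
    return {source: count / total for source, count in chars_by_source.items()}

if __name__ == "__main__":
    for source, share in sorted(source_shares(sample_corpus).items(),
                                key=lambda item: item[1], reverse=True):
        print(f"{source}: {share:.1%}")
```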

This could have significant repercussions in cases where Wikipedia is poorly written—potentially pushing the most vulnerable languages on Earth toward the precipice as future generations begin to turn away from them.

“Wikipedia will be reflected in the AI models for these languages,” says Trond Trosterud, a computational linguist at the University of Tromsø in Norway, who has been raising the alarm about the potentially harmful outcomes of badly run Wikipedia editions for years. “I find it hard to imagine it will not have consequences. And, of course, the more dominant position that Wikipedia has, the worse it will be.”

  • realitista@lemmus.org · 10 hours ago

    So it hasn’t sent the languages themselves into a doom spiral but rather the AI models for those languages into a doom spiral.

    • sleepundertheleaves@infosec.pub · 2 hours ago (edited)

      Unfortunately, it’s likely to harm speakers of those languages as well. For these languages, there isn’t enough training data online because their speakers don’t have good access to the Internet: poverty, lack of education, living in isolated regions with limited connectivity, all the factors that play into the “digital divide” between people who can access the Internet (and all its benefits) and people who can’t.

      If people can’t access AI tools in their native language because LLMs for those languages were trained on recursive slop, but devices and operating systems are incorporating more and more AI anyway, it’s just going to worsen that digital divide and be another factor encouraging young people to give up their native languages entirely.

      Also, there’s the damage that bad AI-generated Wikipedia articles are doing to speakers of those languages already, which the article discusses.