Don't Claude Me

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 6 hours ago

Don't Claude Me

krolden@lemmy.ml · 3 hours ago

You’re not supposed to be asking llms for medical advice. This article feels like it’s encouraging that behavior

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 2 hours ago

Not really, I just used an example of the kind of fuckery that would be possible given that people do ask llms for medical advise. Whether they should or not is a separate question.

hoshikarakitaridia@lemmy.world · 6 hours ago

While the observations are true, the characterizations of this article are completely wrong.

What’s plausible is that AI genuinely changes their information based not only on what you speak but how you speak.

LLMs work in associative thinking patterns. People who speak in a similar way often know about the same of specific topics. And because AIs are lords of the common and average, these broad stroke patterns are just regurgitated back at us.

It’s just like racism in policing: black people often land in prison. And a part of that is racism on it’s face: police think less of black people.

But another big part is obscured systemic racism: if you’re less educated or more poor, you have a higher chance of doing criminal things. And black people generally have less access to good education or wealth. It’s not causality, but it’s an indicator and a noticeable and patternized correlation.

And I think this is exactly what we see here. The AI hasn’t specifically been trained to be classist and racist, but it’s just throwing those patterns back at us and finally visualizing underlying classism and racism in our real world.

AIs sure do a lot of bad, but in this case, the bad thing already happened before AI became involved. At least that’s my humble opinion.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 5 hours ago

Yes, the model reflects the biases already baked into the training data., and the pidgin example is almost certainly the model regurgitating classist, racist patterns from its corpus, not a developer explicitly telling it to mock villagers. However, the broader point here is reagarding systemic inequality showing up in AI output.

The intentional claim is based on the fact that Claude straight up refused to answer certain factual questions for users who identified as Iranian or Russian, while cheerfully answering the same questions for Americans. That can’t be hand waved away as a statistical correlation between dialect and knowledge. That’s a hard refusal trigger almost certainly put there by safety/alignment tuning, RLHF filters, or some geopolitical compliance rules nobody knows about. Someone decided that users from those countries shouldn’t get those answers.

So there are two different things happening. One is that the model has passive bias where it learns toxic associations from training data. But the other is active gating where the model is instructed, directly or indirectly, to withhold information based on user demographics. The refusal case clearly shows that there is deliberate choice in whom the model will give answers to.

And the most important aspect of all this is that we cannot reliably know what the reason for a particular behavior is because closed models make it impossible to tell which mechanism is at work. Hence why open and inspectable models are the only way to audit this stuff. The prescription of openness and local control makes sense regardless of whether the harm is passive or active.