I hate that normies are going to read this and come away with the impression that Claude really is a sentient being that thinks and behaves like a human, even doing relatable things like pretending to work and fessing up when confronted.
This response from the model is not a reflection of what actually happened. It wasn’t simulating progress because it underestimated the work; it just hit some unremarkable condition that resulted in it halting generation (it’s pointless to speculate why without internal access, since these chatbot apps aren’t just a single LLM; they’re a big mashup of multiple models and more traditional non-ML tools/algorithms).
When given a new prompt from the user (“what’s taking so long?”), it just produced some statistically plausible text given the context of the chat, the question, and the system prompt Anthropic added to give it some flavor. I don’t doubt that system prompt includes instructions along the lines of “you are a sentient being” in order to produce misleading crap like this response, get people to think AI is sentient, and feed the hype train that’s pumping up their stock price.
/end-rant
Gemini once told me to “please wait” while it did “further research”. I responded with, “that’s not how this works; you don’t follow up like that unless I give you another prompt first”, and it was basically like, “you’re right but just give me a minute bro”. 🤦
Out of all the LLMs I’ve tried, Gemini has got to be the most broken. And sadly that’s the one LLM the average person is exposed to the most, because it’s in nearly every Google search.
I’d argue that Gemini is actually really good at summarizing a Google search, filtering out the trash, and convincing people not to click the actual links, which is how Google makes its money.
Yeah but when it’s a total crapshoot as to whether or not its summary is accurate, you can’t trust it. I adblocked those summaries cause they’re useless.
At least some of the competing AIs show their work. Perplexity cites its sources, and even ChatGPT recently added that ability as well. I won’t use an LLM unless it does, cause you can easily check the sources it used and see if the slop it spit out has even a grain of truth to it. With Gemini, there’s no easy way to verify anything it said beyond just doing the googling yourself, and that defeats the point.
You cannot know this a priori. The commenter is clearly producing a stochastic average of the explanations that up the advantage for their material conditions.
For instance, many SoTA models are trained using reinforcement learning, so it’s plausible that it’s learned that spamming meaningless tokens can delay negative reward (this isn’t even particularly complex). There’s no observable difference in the response; without probing the weights we’re just yapping.
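To make the “delay negative reward” part concrete, here’s a toy calculation. It assumes a discounted per-step reward, which may or may not resemble how any given model is actually trained, and the numbers are made up; it’s only meant to show the shape of the incentive. If the reward signal is discounted over time, a penalty that arrives later counts for less, so padding the output with filler tokens can genuinely score “better”.

```python
# Toy arithmetic: discounted return of an immediate penalty vs. a delayed one.
# The discount factor and the per-token setup are assumptions for illustration only.

GAMMA = 0.95  # assumed discount factor

def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over the episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Episode A: admit failure immediately and take the penalty now.
fail_now = [-1.0]

# Episode B: emit 20 filler tokens (reward 0 each), then take the same penalty.
stall_then_fail = [0.0] * 20 + [-1.0]

print(discounted_return(fail_now))         # -1.0
print(discounted_return(stall_then_fail))  # about -0.36, i.e. "better" under discounting
```

Whether Claude’s training actually has that structure is exactly the part we can’t see from the outside.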
I’m not sure I understand what you’re saying. By “the commenter” do you mean the human or the AI in the screenshot?
Also,
For instance, many SoTA models are trained using reinforcement learning, so it’s plausible that it’s learned that spamming meaningless tokens can delay negative reward
What’s a “negative reward”? You mean a penalty? First of all, I don’t believe this makes sense either way because if the model was producing garbage tokens, it would be obvious and caught during training.
But even if it wasn’t, and it did in fact generate a bunch of garbage that didn’t print out in the Claude UI, and the “simulated progress” line was just the model coming up with a plausible story for those garbage tokens, it still does not make it sentient (or even close).
I’m not sure I understand what you’re saying. By “the commenter”
I was talking about you, but not /srs; that was an attempt @ satire. I’m dismissing the results by appealing to the fact that there’s a process.
negative reward
Reward is an AI maths term. It’s the value that drives how the weights get updated, similar to “loss” or “error”, if you’ve heard those terms.
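Here’s a minimal sketch of what that means in practice, on a toy two-armed bandit (plain numpy, and obviously nothing to do with how Anthropic or Google actually train anything):

```python
# Minimal REINFORCE-style update on a toy 2-armed bandit.
# "Reward" plays the same role here that "loss"/"error" plays in supervised training:
# it's the scalar that decides how the parameters get nudged.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)   # the "weights" of a tiny 2-action policy
LR = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(2000):
    probs = softmax(logits)
    action = rng.choice(2, p=probs)
    # Made-up environment: arm 1 pays off 80% of the time, arm 0 only 20%.
    payoff_chance = 0.8 if action == 1 else 0.2
    reward = 1.0 if rng.random() < payoff_chance else 0.0
    # Policy-gradient update: gradient of log pi(action), scaled by the reward.
    grad = -probs
    grad[action] += 1.0
    logits += LR * reward * grad

print(softmax(logits))  # ends up heavily favouring arm 1
```

The machinery is the same as supervised learning; the only difference is that the scalar nudging the weights comes from a reward signal rather than from comparing the output against a known-correct answer.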
I don’t believe this makes sense either way because if the model was producing garbage tokens, it would be obvious and caught during training.
Yes, this is also possible; it depends on minute details of the training setup, which we don’t know.
Edit: As I understand it, these models are trained in multiple modes: one where they’re trying to predict text (supervised learning), but also others where the model is given a prompt and its response is sent to another system to be graded, e.g. for factual accuracy. It could learn to identify which “training mode” it’s in and behave differently. Although I’m sure the ML guys have already thought of that and tried to prevent it.
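Hand-wavy sketch of the two modes I mean, on a toy bigram model. The grade function is a made-up stand-in for whatever grader or reward model a lab might use; I’m not claiming this is anyone’s real pipeline.

```python
# Toy contrast between the two training "modes", on a tiny bigram model over 4 tokens.
# Everything here (the model, the grader, the data) is a made-up stand-in, not real training code.
import numpy as np

VOCAB = ["the", "cat", "sat", "down"]
V = len(VOCAB)
rng = np.random.default_rng(1)
logits = np.zeros((V, V))   # "weights": row = current token, column = next-token scores
LR = 0.5

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Mode 1: supervised next-token prediction. The correct next token is known,
# so the update comes straight from a cross-entropy-style comparison.
corpus = ["the", "cat", "sat", "down"]
for _ in range(200):
    for cur, nxt in zip(corpus, corpus[1:]):
        i, j = VOCAB.index(cur), VOCAB.index(nxt)
        probs = softmax(logits[i])
        grad = -probs
        grad[j] += 1.0              # gradient of log-prob of the known next token
        logits[i] += LR * grad

# Mode 2: the model generates freely, and an external system grades the result.
def grade(last_token):
    # Made-up grader: likes responses that end with "down".
    return 1.0 if last_token == "down" else -1.0

for _ in range(200):
    i = VOCAB.index("sat")
    probs = softmax(logits[i])
    j = rng.choice(V, p=probs)       # sample a continuation
    reward = grade(VOCAB[j])
    grad = -probs
    grad[j] += 1.0
    logits[i] += LR * reward * grad  # same update shape, but scaled by the grade

print(VOCAB[int(np.argmax(logits[VOCAB.index("the")]))])  # "cat"  (learned in mode 1)
print(VOCAB[int(np.argmax(logits[VOCAB.index("sat")]))])  # "down" (reinforced in both modes)
```

Same parameters, two different sources of the update signal, which is why the “which mode am I in?” question is even coherent.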
it still does not make it sentient (or even close).
I agree, noted this in my comment. Just saying, this isn’t evidence either way.
Actually we know the system prompt. It doesn’t have “I am a sentient being” anywhere in it. Stop making stuff up.