I think “I don’t know” might sometimes be found in the training data. But, I’m sure they optimize the meta-prompts so that it never shows up in a response to people. While it might be the “honest” answer a lot of the time, the makers of these LLMs seem to believe that people would prefer confident bullshit that’s wrong over “I don’t know”.
I think “I don’t know” might sometimes be found in the training data. But, I’m sure they optimize the meta-prompts so that it never shows up in a response to people. While it might be the “honest” answer a lot of the time, the makers of these LLMs seem to believe that people would prefer confident bullshit that’s wrong over “I don’t know”.