It’s that it’s only predicting the next token in a string of text.
An LLM has an internal state while predicting text. The “next token” it chooses takes that state, a model of the world, into account. So an LLM is predicting the next token based on a world model and the previous text.
Saying that it is “only predicting the next token”, without more context, is technically true but very misleading.
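To make that concrete, here is a toy sketch in Python. It is not how a real LLM works: the state here is a hand-written dict and the probabilities are made up, whereas a real model's state is learned activations. The names `update_state`, `next_token_distribution` and `predict_next` are all hypothetical. The only point is that two prompts ending in the exact same words get different "next tokens" because the state built from the earlier text differs.

```python
def update_state(state, token):
    """Fold one token into the running state (a stand-in for learned activations)."""
    new_state = dict(state)
    if token in ("open", "closed"):
        # Remember the latest fact stated about the box.
        new_state["box"] = token
    return new_state


def next_token_distribution(state, last_token):
    """Made-up probabilities for the next token, given the state and the last token."""
    if last_token == "is" and "box" in state:
        other = "closed" if state["box"] == "open" else "open"
        return {state["box"]: 0.9, other: 0.1}
    # Without any tracked fact, the toy has no reason to prefer either word.
    return {"open": 0.5, "closed": 0.5}


def predict_next(tokens):
    """Read the whole prefix into the state, then pick the most likely next token."""
    state = {}
    for token in tokens:
        state = update_state(state, token)
    dist = next_token_distribution(state, tokens[-1])
    return max(dist, key=dist.get)


# Both prompts end with "the box is", but the state differs.
print(predict_next("the box is open . the box is".split()))
print(predict_next("the box is closed . the box is".split()))
```

Both calls are "only predicting the next token", yet the answer depends on what the state says about the box, not just on the last few words.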