

Because it’s difficult to fit into a game. You need a pretty good GPU, and a lot of its memory will be taken up by the LLM running locally. That means you basically can’t use it while also rendering fancy graphics at the same time, so you’d end up with a game that doesn’t look demanding but still has high GPU requirements.
Also, it’s quite difficult to steer the NPCs to stay consistent. In my free time I’m working on a small project right now, a game centered around LLM NPCs, but it’s a lot of work to keep them consistent with the world you place them in. They always go with a “yes, and” approach, so it’s easy to end up in a situation where they make up things that contradict the reality of the game.
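The most basic form of steering is stuffing hard world rules into the NPC’s system prompt. A hypothetical sketch (the NPC name and world facts are made up for illustration):

```python
# Sketch: steering an NPC via its system prompt with hard constraints,
# so the model can't "yes and" the player into contradicting the world.
# All names and facts here are invented examples.

WORLD_FACTS = [
    "No dragons exist in this world.",
    "The mines east of town closed ten years ago.",
]

def npc_system_prompt(name: str, facts: list[str]) -> str:
    """Build a system prompt that pins the NPC to known world facts."""
    rules = "\n".join(f"- {f}" for f in facts)
    return (
        f"You are {name}, an innkeeper. Stay in character.\n"
        "Hard rules about the world. Never contradict these, "
        "never invent new lore:\n"
        f"{rules}\n"
        "If the player claims something not covered by the rules, "
        "express doubt instead of agreeing."
    )

prompt = npc_system_prompt("Mara", WORLD_FACTS)
```

This works for a handful of facts, but the prompt grows with the world, which is exactly the scaling problem described below.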


What helped me in my project was a RAG implementation. Instead of trying to cram all the information into the prompt, which becomes inconsistent once it gets too big (especially with a local LLM), I can keep quite a big knowledge base in the RAG. When the NPC is prompted, it’s a two-step process: first search the RAG for the best match in the knowledge base, then feed that to the LLM to generate the answer. You can also store noteworthy events in the RAG on the fly.
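The two-step flow can be sketched roughly like this. Assumptions on my part: retrieval here is naive keyword overlap purely for illustration (a real setup would use embeddings and a vector store), and the facts are invented examples; the actual LLM call is left out.

```python
# Sketch of the two-step RAG flow: (1) search the knowledge base for
# the best match, (2) feed that match to the LLM alongside the question.

KNOWLEDGE_BASE = [
    "The blacksmith Brennan lost his forge in a fire last winter.",
    "The bridge to the east village collapsed and has not been repaired.",
    "The king has outlawed magic inside the city walls.",
]

def retrieve(query: str, kb: list[str]) -> str:
    """Step 1: return the KB entry sharing the most words with the query."""
    q = set(query.lower().split())
    return max(kb, key=lambda doc: len(q & set(doc.lower().split())))

def build_prompt(query: str, fact: str) -> str:
    """Step 2: hand the retrieved fact to the LLM with the question."""
    return (
        "You are an NPC. Answer using ONLY the world fact below; "
        "if it doesn't cover the question, say you don't know.\n"
        f"World fact: {fact}\n"
        f"Player asks: {query}"
    )

def remember(kb: list[str], fact: str) -> None:
    """Store noteworthy events in the knowledge base on the fly."""
    kb.append(fact)

query = "What happened to the bridge to the east village?"
fact = retrieve(query, KNOWLEDGE_BASE)
prompt = build_prompt(query, fact)
```

The nice part of this split is that the knowledge base can grow without the prompt growing: the LLM only ever sees the one or two facts relevant to the current question.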