https://arxiv.org/abs/2306.07899 Here's a paper that found that a major source of LLM training data is being corrupted by people using AI to complete the tasks. There are plenty of other papers showing the effects of training on such data, which they call "model collapse".
1600 hours is an insane gameplay loop, not content size, imo. I have that many hours in a few games, but they're either fighting games or ARPGs, which are repetitive by nature.