A while back I wrote about language modeling without neural networks, where I generated Shakespeare with an unbounded n-gram model: no weights, no training, …
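For anyone who hasn't read that post, here is a minimal sketch of the core idea. It assumes a character-level model and a naive linear scan over the corpus. Both are my assumptions for illustration, not necessarily how the original post did it, and the function names `next_char_counts` and `generate` are hypothetical. To predict the next character, find the longest suffix of the text so far that occurs in the corpus, then sample from the characters that followed it there.

```python
import random
from collections import Counter


def next_char_counts(corpus: str, context: str) -> Counter:
    """Count the characters that follow the longest suffix of `context`
    that occurs somewhere in `corpus` with at least one character after it.
    (Hypothetical helper; naive scan, no index.)"""
    # Try the full context first, then progressively shorter suffixes.
    # The last candidate is the empty suffix, which yields unigram counts.
    for start in range(len(context) + 1):
        suffix = context[start:]
        counts = Counter()
        i = corpus.find(suffix)
        # Walk every (possibly overlapping) occurrence that has a continuation.
        while i != -1 and i + len(suffix) < len(corpus):
            counts[corpus[i + len(suffix)]] += 1
            i = corpus.find(suffix, i + 1)
        if counts:
            return counts
    return Counter()


def generate(corpus: str, prompt: str, length: int, seed: int = 0) -> str:
    """Extend `prompt` by `length` characters, always conditioning on the
    entire text so far (no fixed n). Assumes `corpus` is non-empty."""
    rng = random.Random(seed)
    text = prompt
    for _ in range(length):
        counts = next_char_counts(corpus, text)
        chars, weights = zip(*counts.items())
        # Sample proportionally to how often each character followed the match.
        text += rng.choices(chars, weights=weights)[0]
    return text


if __name__ == "__main__":
    corpus = "to be or not to be, that is the question"
    print(generate(corpus, "to", 30))
```

Because the context is never truncated, the model backs off only as far as it has to. It needs no weights and no training, just the raw text. The cost is that every step rescans the corpus, so anything beyond a toy corpus would want an index such as a suffix array.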
For an explainer of the theory behind the “language modeling is compression” paper, this video by 3Blue1Brown is especially relevant: https://youtube.com/watch?v=l6DKRf-fAAM