Some apps allow you to offload to GPU, and CPU while loading the active part of the model. I have a an old SSD that give me 500gb of “usable” ram set up as swap.
It is horrendously slow and pointless but you can do it. I got about 2 tokens in 10 minutes before I gave up on a 70b model on a 1080 ti.
Even if they used more powerful hardware than you, the model they ran is still almost 6 times bigger - so if you got two tokens in 10 minutes, one token in 30 minutes for them sounds plausible.
Some apps allow you to offload to GPU, and CPU while loading the active part of the model. I have a an old SSD that give me 500gb of “usable” ram set up as swap.
It is horrendously slow and pointless but you can do it. I got about 2 tokens in 10 minutes before I gave up on a 70b model on a 1080 ti.
Even if they used more powerful hardware than you, the model they ran is still almost 6 times bigger - so if you got two tokens in 10 minutes, one token in 30 minutes for them sounds plausible.
I would have to use an entire 1tb drive for swap but I’m sure I could manage 1 token before the heat death of the universe.
I’d worry less about the heat death of the universe and more about your hardware’s heat from all that load.