There is a reason the same AI model sometimes shows a notable decrease in quality a while after it’s released.
Hosts of the models (like OpenAI or Microsoft) may have switched to a quantized version of their model. Quantization is a common practice to increase power efficiency and make the model easier to run, by essentially rounding the weights of the model to a lower precision. This decreases VRAM and storage usage significantly, at the cost of a bit of quality: the more aggressive the quantization, the worse the quality.
For example, the base model will likely be in FP16, 16-bit floating point precision. They may switch to a Q8 version, which nearly halves the size of the model, with about a 3-7% decrease in quality.
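If it helps to picture it, here’s a rough NumPy sketch of per-tensor 8-bit quantization (purely illustrative; real schemes like llama.cpp’s Q8_0 quantize in small blocks and handle outliers more carefully):

```python
import numpy as np

def quantize_q8(weights: np.ndarray):
    """Map FP16 weights to int8 plus one scale factor (illustrative scheme)."""
    scale = np.abs(weights).max() / 127.0          # largest weight maps to 127
    q = np.round(weights / scale).astype(np.int8)  # round everything to int8
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction used at inference time."""
    return q.astype(np.float16) * scale

w = np.random.randn(4096).astype(np.float16)  # stand-in for one layer's weights
q, s = quantize_q8(w)
print(w.nbytes, q.nbytes)                  # 8192 vs 4096 bytes: int8 is half of FP16
print(np.abs(dequantize(q, s) - w).max())  # small rounding error per weight
```

The rounding error on each individual weight is tiny, but it accumulates across billions of weights, which is where that few-percent quality drop comes from.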
Expertly explained. Thank you! It’s pretty rad what you can get out of a quantized model on home hardware, but I still can’t understand why people are trying to use it for anything resembling productivity.
But if that’s how you’re going to run it, why not also train it in that mode?
It sounds like the typical tech industry:
“Look how amazing this is!” (Full power)
“Uh…uh oh, that’s unsustainable. Let’s quietly drop it.” (Way reduced power)
“People are saying it’s not as good; we can offer them LLM+ for better accuracy!” (3/4 power, with subscription)