Nice read, and an interesting approach… although it kind of tries to hide the elephant in the room:
This work has the potential to shift the way that image gen-erators operate at achievable costs to ensure that several cat-egories of harm from ‘AI’ generated models are mitigated, while the generated images become much more realistic and representative of the AI-generated images that populations want around the world.
They show that the approach optimizes for less “stereotypes” and less “offensive”, which in most cultures leads from worse to better “cultural representation”… but notice how there is a split in the “Indian” culture cohort, with an equal amount finding “more stereotypical, more offensive” to be just as good at “cultural representation”:
They basically made the model more politically correct and “idealized”, but in the process removed part of a culture representation that wasn’t wrong, because the “culture” itself is split to begin with.
That’s my point. They claim to reduce misrepresentation, while at the same time they erase a bunch of correct representations.
Going back to what I was saying: fine tuning doesn’t increase diversity, it only shifts the biases. Encoding actual diversity would require increasing the model, then making sure it can output every correct representation.
It doesn’t necessarily have to shift away from diversity biases. I think with care, you can preserve the biases that matter most. That was just their first shot at it, this seems like something you’d get better at over time.
I guess their main shortcoming was the cultural training set. I’m still unconvinced that level of fine tuning is possible without increasing model size, but we’ll see what happens if/when someone curates a much larger set with cultural labeling.
The labels might also need to be more granular, like “culture:subculture:period”, or something… which is kind of a snakes nest by itself.
I mean like this. This paper just dropped the other day.
Nice read, and an interesting approach… although it kind of tries to hide the elephant in the room:
They show that the approach optimizes for less “stereotypes” and less “offensive”, which in most cultures leads from worse to better “cultural representation”… but notice how there is a split in the “Indian” culture cohort, with an equal amount finding “more stereotypical, more offensive” to be just as good at “cultural representation”:
They basically made the model more politically correct and “idealized”, but in the process removed part of a culture representation that wasn’t wrong, because the “culture” itself is split to begin with.
“Indian” is a huge population of very diverse people.
That’s my point. They claim to reduce misrepresentation, while at the same time they erase a bunch of correct representations.
Going back to what I was saying: fine tuning doesn’t increase diversity, it only shifts the biases. Encoding actual diversity would require increasing the model, then making sure it can output every correct representation.
It doesn’t necessarily have to shift away from diversity biases. I think with care, you can preserve the biases that matter most. That was just their first shot at it, this seems like something you’d get better at over time.
I guess their main shortcoming was the cultural training set. I’m still unconvinced that level of fine tuning is possible without increasing model size, but we’ll see what happens if/when someone curates a much larger set with cultural labeling.
The labels might also need to be more granular, like “culture:subculture:period”, or something… which is kind of a snakes nest by itself.