It is difficult to get a man to understand something, when his salary depends on his not understanding it! - Upton Sinclair, I, Candidate for Governor: And How I Got Licked
I am curious how many people attribute the code they copy out of Stack Overflow back to the original post, with the attribution the license requires.
https://creativecommons.org/licenses/by-sa/3.0/ and https://creativecommons.org/licenses/by-sa/4.0/ clearly state:
Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
I find it difficult to give too much weight to the “generating an LLM based on Stack Overflow content without attribution is wrong” argument when people are knowingly and intentionally violating the CC-BY-SA license in their own code.
Two wrongs don’t make a right.
It’s also totally fucking different when someone on SO asks for help with their homework or with an nginx server on their home network, and when some tech firm decides to scrape 15 years’ worth of information created by countless people and then spit it back out, pretending it’s some novel solution.
As I said in my original comment, I’m no fan of SO. But neither the behavior of the site nor that of the people who lurk and copy justifies what LLMs are doing.
We should pursue license violations of permissively licensed material with equal effort, no matter what the source. Ignoring violations by some while preaching fire and brimstone at others weakens both the argument and the license it rests on.
When trying to enforce a license, if the other side can say “you are doing exactly what you accuse us of doing”, the case becomes more difficult to prosecute.
While two wrongs don’t make a right, two wrongs will substantially complicate prosecuting just one of them.
I am not arguing about the morality of one or the other… or how insignificant one of them is in comparison to the other.
My issue with pointing only at the LLM is about the integrity and enforceability of open source licenses.