• melfie@lemmy.zip
    7 hours ago

    Training closed models on copyleft-licensed works is clearly against the spirit of those licenses if users don’t have access to the source, which includes the original copyleft works. Even open-weight models aren’t really source code; the weights are more akin to a compiled binary. The real source is all the training data plus the code used to train the model, released so that anyone can build on it and train new models.

    I’m not a lawyer and am not sure how well existing copyleft licenses like the GPL or CC BY-SA would stand up in court to enforce this, but if they don’t, then stronger licenses that explicitly cover works being used as training data need to become more common.

    I’ve seen the argument that the models are just learning from the data in the same way a human would. That’s nonsense. It’s not like they’re creating a sentient being with its own agency that can tell them to fuck off if it wants. These companies are running a software pipeline against copyrighted IP to convert it into a derivative work that is now supposedly wholly owned by said company. The reality is that it’s collectively owned by everyone who contributed to the copyleft training data.