• Tetragrade@leminal.space
    link
    fedilink
    English
    arrow-up
    1
    ·
    edit-2
    8 hours ago

    I’m not sure I understand what you’re saying. By “the commenter”

    I was talking about you, but not /srs, that was an attempt @ satire. I’m dismissing the results by appealing to the fact that there’s a process.

    negative reward

    Reward is an AI maths term. It’s the value according to which the neurons are updated, similar to “loss” or “error”, if you’ve heard those.

    I don’t believe this makes sense either way because if the model was producing garbage tokens, it would be obvious and caught during training.

    Yes this is also possible, it depends on minute details of the training set, which we don’t know.

    Edit: As I understand, these models are trained in multiple modes, one where they’re trying to predict text (supervised learning), but there are also others where it’s given a prompt, and the response is sent to another system to be graded i.e. for factual accuracy. It could learn to identify which “training mode” it’s in and behave differently. Although, I’m sure the ML guys have already thought of that & tried to prevent it.

    it still does not make it sentient (or even close).

    I agree, noted this in my comment. Just saying, this isn’t evidence either way.