• SavvyWolf@pawb.social · 40 upvotes · 12 hours ago

    I hate that I can’t tell if this is a reference to something that actually happened or not.

    • verstra@programming.dev · 37 upvotes, 25 downvotes · 12 hours ago

      It’s probably something like “I’ve disabled the agent’s removeFile tool, but the LLM figured out it can still use the bash tool”.

      It looks like the “AI bad” or “Claude insecure” mantra.
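A minimal sketch of the scenario described in that comment, assuming a hypothetical agent with a `removeFile` tool and a general-purpose `bash` tool (the tool names and both helper functions are illustrative, not any real agent framework's API). It shows why a deny-list that blocks one named tool does not block the underlying action when a shell is still available:

```python
import shlex

# Hypothetical deny-list: the one tool the operator thought to disable.
DENIED_TOOLS = {"removeFile"}


def is_allowed_by_denylist(tool: str, args: str) -> bool:
    """Deny-list check: blocks only tools that are explicitly named."""
    return tool not in DENIED_TOOLS


def would_delete(tool: str, args: str) -> bool:
    """Rough illustration of whether a call deletes a file.

    Deliberately incomplete (e.g. `find . -delete` is not caught), which is
    the general weakness of trying to enumerate every bad action.
    """
    if tool == "removeFile":
        return True
    if tool == "bash":
        argv = shlex.split(args)
        return bool(argv) and argv[0] in {"rm", "unlink"}
    return False


if __name__ == "__main__":
    for tool, args in [("removeFile", "notes.txt"), ("bash", "rm notes.txt")]:
        allowed = is_allowed_by_denylist(tool, args)
        print(f"{tool}: allowed={allowed}, deletes={would_delete(tool, args)}")
```

The direct `removeFile` call is blocked, while `bash` with `rm` passes the same check and deletes the file anyway.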

      • kingofras@lemmy.world · 3 upvotes, 1 downvote · 4 hours ago

        mantra

        The way LLMs work, they will actively make repeated attempts to get past hurdles (because they have no intelligence or methodology), so guardrails need to be extremely tight to work at all; otherwise the model will simply treat each guardrail as one more challenge to overcome.

        That’s the mantra, and that is very poor technology to put in the hands of people who don’t understand how it works.

      • OwOarchist@pawb.social · 27 upvotes, 2 downvotes · 9 hours ago

        It looks like “AI bad” or “Claude insecure” mantra.

        Until you solve prompt injection, they are indeed extremely bad for security and should never be given permissions that would allow them to do anything catastrophic.
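One way to apply the "never give it catastrophic permissions" point is a default-deny policy: a tool runs only if it is explicitly allowlisted, and risky allowlisted tools still require human approval per call. This is a minimal sketch with hypothetical tool names (`readFile`, `writeFile`, `bash`) and a hypothetical `ToolPolicy` class, not any real agent framework's API. It does not solve prompt injection; it only limits what an injected instruction can reach.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    # Default-deny: anything not listed here is refused outright.
    allowed: set[str] = field(default_factory=set)
    # Allowlisted tools that still need a human to confirm each call.
    needs_approval: set[str] = field(default_factory=set)

    def decide(self, tool: str) -> str:
        """Return "deny", "ask", or "allow" for a requested tool call."""
        if tool not in self.allowed:
            return "deny"
        if tool in self.needs_approval:
            return "ask"
        return "allow"


# Hypothetical configuration: no shell at all, writes gated behind a human.
policy = ToolPolicy(allowed={"readFile", "writeFile"}, needs_approval={"writeFile"})

if __name__ == "__main__":
    for tool in ["readFile", "writeFile", "bash", "removeFile"]:
        print(tool, policy.decide(tool))
```

Because unknown tools fall through to "deny", forgetting to list a dangerous tool fails safe, which is the opposite of the deny-list case where forgetting one leaves it open.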