• verstra@programming.dev
    12 hours ago

    It’s probably something like “I’ve disabled the agent’s removeFile tool, but the LLM figured out that it can still use the bash tool”.

    It looks like the “AI bad” or “Claude insecure” mantra.

    • kingofras@lemmy.world
      4 hours ago

      mantra

      The way LLMs work is that they will actively make multiple attempts to get past hurdles (because they have no intelligence or methodology), so guardrails need to be extremely tight to work; otherwise the model will simply treat them as one more challenge to overcome.

      That’s the mantra, and that is very poor technology to put in the hands of people who don’t understand how it works.

    • OwOarchist@pawb.social
      9 hours ago

      It looks like the “AI bad” or “Claude insecure” mantra.

      Until you solve prompt injection, they are indeed extremely bad for security and should never be given permissions that would allow them to do anything catastrophic.
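
To make the scenario from the top of the thread concrete, here is a minimal sketch of why disabling a tool by name does not remove the underlying permission. Everything in it is hypothetical: the tool names, the capability labels, and the two filter functions are invented for illustration and do not correspond to any real agent framework's API. The assumption is simply that a general-purpose shell tool can do whatever the narrower tools can.

```python
from dataclasses import dataclass

# Hypothetical capability labels, invented for this sketch.
DELETE_FILES = "delete_files"
READ_FILES = "read_files"


@dataclass(frozen=True)
class Tool:
    name: str
    capabilities: frozenset


# Hypothetical tool set: remove_file can only delete,
# while bash can do anything a shell can, including deleting.
TOOLS = [
    Tool("read_file", frozenset({READ_FILES})),
    Tool("remove_file", frozenset({DELETE_FILES})),
    Tool("bash", frozenset({READ_FILES, DELETE_FILES})),
]


def allowed_by_name(tools, disabled_names):
    """Name-based filter: drops only the tools listed by name."""
    return [t for t in tools if t.name not in disabled_names]


def allowed_by_capability(tools, denied_capabilities):
    """Capability-based filter: drops every tool that could use a denied capability."""
    return [t for t in tools if not (t.capabilities & denied_capabilities)]


def can_delete(tools):
    """True if any remaining tool is still able to delete files."""
    return any(DELETE_FILES in t.capabilities for t in tools)


by_name = allowed_by_name(TOOLS, {"remove_file"})
by_cap = allowed_by_capability(TOOLS, {DELETE_FILES})

print([t.name for t in by_name], can_delete(by_name))
print([t.name for t in by_cap], can_delete(by_cap))
```

With the name-based filter, bash remains available and deletion is still possible; with the capability-based filter, only read_file is left. The sketch only models which tools are exposed, so it says nothing about prompt injection itself, just about how wide the permissions are if an injection succeeds.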