The way LLMs work is that they actively will make multiple attempts to get past hurdles (because they have no intelligence or methodology) so guardrails need to be extremely tight for them to work, other wise the model will simply see it as one of the challenges to overcome.
That’s the mantra, and that is very poor technology to put in the hands of people who don’t understand how it works.
It looks like “AI bad” or “Claude insecure” mantra.
Until you solve prompt injection, they are indeed extremely bad for security and should never be given permissions that would allow them to do anything catastrophic.
I hate that I can’t tell if this is a reference to something that actually happened or not.
It’s probably something like “I’ve disabled agent’s
removeFiletool, but LLM figured out that it can use thebashtool, still”.It looks like “AI bad” or “Claude insecure” mantra.
The way LLMs work is that they actively will make multiple attempts to get past hurdles (because they have no intelligence or methodology) so guardrails need to be extremely tight for them to work, other wise the model will simply see it as one of the challenges to overcome.
That’s the mantra, and that is very poor technology to put in the hands of people who don’t understand how it works.
Until you solve prompt injection, they are indeed extremely bad for security and should never be given permissions that would allow them to do anything catastrophic.
deleted by creator
you mean facts?
deleted by creator
“It’s my circlejerk - so it’s a fact!”
I hope that you’re hired for long enough to learn what having security means in the context of using LLM “agents” and the like.
https://xcancel.com/sluongng/status/2060746160558543217#m