Frog put Claude in a box

cm0002@toast.ooo · 13 hours ago

Frog put Claude in a box

SavvyWolf@pawb.social · 12 hours ago

I hate that I can’t tell if this is a reference to something that actually happened or not.

verstra@programming.dev · 12 hours ago

It’s probably something like “I’ve disabled agent’s removeFile tool, but LLM figured out that it can use the bash tool, still”.

It looks like “AI bad” or “Claude insecure” mantra.

kingofras@lemmy.world · 4 hours ago

mantra

The way LLMs work is that they actively will make multiple attempts to get past hurdles (because they have no intelligence or methodology) so guardrails need to be extremely tight for them to work, other wise the model will simply see it as one of the challenges to overcome.

That’s the mantra, and that is very poor technology to put in the hands of people who don’t understand how it works.

OwOarchist@pawb.social · 9 hours ago

It looks like “AI bad” or “Claude insecure” mantra.

Until you solve prompt injection, they are indeed extremely bad for security and should never be given permissions that would allow them to do anything catastrophic.

kingofras@lemmy.world · 4 hours ago

deleted by creator

dumnezero@piefed.social · 12 hours ago

mantra

you mean facts?

kingofras@lemmy.world · 4 hours ago

deleted by creator

Scipitie@lemmy.dbzer0.com · 11 hours ago

“It’s my circlejerk - so it’s a fact!”

dumnezero@piefed.social · 10 hours ago

I hope that you’re hired for long enough to learn what having security means in the context of using LLM “agents” and the like.

goondaba@lemmy.world · 12 hours ago

https://xcancel.com/sluongng/status/2060746160558543217#m