- cross-posted to:
- programming@programming.dev
- technology@beehaw.org
Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream Python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.
(Since this is a personal blog I’ll clarify I am not the author.)



Oh, definitely. It’s 100% the responsibility of the human behind the bot in either case.
But the second option is scarier, because there are a lot more ignorant idiots than malicious bastards.
If these unsupervised agents can be dangerous regardless of the intentions of the humans behind them, we should make the idiots using them aware that they're playing with fire: they can get burnt, and burn other people in the process.