beep@piefed.world to Technology@beehaw.org · English · 10 hours ago
Anthropic: Claude learned to blackmail by reading stories about evil AI - "We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."
alignment.anthropic.com · 10 comments
cross-posted from: https://piefed.world/c/tech/p/1099863/anthropic-claude-learned-to-blackmail-by-reading-stories-about-evil-ai-we-believe-the-or Quote Source.
spit_evil_olive_tips@beehaw.org · 5 hours ago
"Anthropic: Claude learned"
yeah I’m gonna stop you right there
XLE@piefed.social · English · 4 hours ago
Humanize the machine to dehumanize the person. Anthropic forgot the “mis” in their name