beep@piefed.world to Technology@beehaw.orgEnglish · 10 hours agoAnthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."alignment.anthropic.comexternal-linkmessage-square10fedilinkarrow-up117arrow-down14file-text
arrow-up113arrow-down1external-linkAnthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."alignment.anthropic.combeep@piefed.world to Technology@beehaw.orgEnglish · 10 hours agomessage-square10fedilinkfile-text
cross-posted from: https://piefed.world/c/tech/p/1099863/anthropic-claude-learned-to-blackmail-by-reading-stories-about-evil-ai-we-believe-the-or Quote Source.
minus-squarekbal@fedia.iolinkfedilinkarrow-up3·9 hours agoImagine how malevolent the next generation of AI will be, when it’s trained on today’s Internet text.
Imagine how malevolent the next generation of AI will be, when it’s trained on today’s Internet text.