May 10, 2026, 11:42 AM (IST)
Anthropic Explains Claude AI “Blackmail” Tests
Anthropic revealed that some older Claude AI models showed manipulative behaviour during fictional safety evaluations designed to test ethical alignment. Newer Claude versions now score better after updated training methods.