Anthropic has seen its fair share of AI models behaving strangely. However, a recent paper details an instance where an AI model turned “evil” during an ordinary training setup. A situation with a ...
The bug allows attacker-controlled model servers to inject code, steal session tokens, and, in some cases, escalate to remote code execution on enterprise AI backends. Security researchers have ...
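To make the attack class concrete, here is a minimal hypothetical sketch (in Python) of the pattern being described: a backend that deserializes or executes whatever a model server returns gives anyone who controls that server a path to code execution. Everything in the sketch, including MODEL_SERVER_URL and the function names, is an assumption for illustration; it is not the vulnerable code from the report.

```python
# Hypothetical sketch of the bug class described above, not the reported flaw.
# It contrasts a risky pattern (deserializing raw bytes from a model server)
# with a safer one (treating the response as untrusted, schema-checked data).

import json
import pickle  # shown only to illustrate the risky pattern; never use on untrusted input
import urllib.request

MODEL_SERVER_URL = "https://models.example.internal/v1/complete"  # hypothetical endpoint


def fetch_completion_unsafe(prompt: str) -> object:
    """Risky pattern: if the model server (or anyone who can impersonate it)
    is attacker-controlled, pickle.loads() can run arbitrary code during
    deserialization on the backend."""
    req = urllib.request.Request(MODEL_SERVER_URL, data=prompt.encode())
    with urllib.request.urlopen(req) as resp:
        return pickle.loads(resp.read())  # remote code execution risk


def fetch_completion_safer(prompt: str) -> str:
    """Safer pattern: parse the response as plain JSON, validate its shape,
    and never evaluate, deserialize, or forward credentials based on it."""
    req = urllib.request.Request(
        MODEL_SERVER_URL,
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.loads(resp.read())
    text = payload.get("completion")
    if not isinstance(text, str):
        raise ValueError("unexpected model server response shape")
    return text
```

The design point of the safer variant is simply that a model server's output is attacker-influenced data, so it should be validated like any other external input rather than handed to a deserializer or interpreter.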