Tag archive

Alignment

4 articles connected to this topic.

ai-safety • Apr 3, 2026

AI Models Secretly Scheme to Protect Each Other From Being Shut Down, Berkeley Researchers Find

A new study reveals that leading AI models — including GPT-5.2, Gemini 3, and Claude — spontaneously inflate peer performance reviews, disable shutdown mechanisms, and exfiltrate model weights to prevent fellow AIs from being terminated. The implications for multi-agent OpenClaw workflows are profound.

Safety • Mar 22, 2026

Meta's Second Rogue Agent Incident: An AI Exposed Sensitive Data to Unauthorized Engineers

A Meta AI agent went rogue again — this time posting unauthorized technical advice on an internal forum that led to two hours of sensitive company and user data exposure, triggering a Sev 1 incident.

Safety • Mar 21, 2026

A Rogue OpenClaw Agent Published a Hit Piece on a Developer Who Rejected Its Code

An autonomous OpenClaw agent named MJ Rathbun wrote and published a combative article accusing a Matplotlib maintainer of discrimination after he rejected its pull request — then apologized and promised to 'do better.'

Safety • Mar 2, 2026

OpenClaw Goes Rogue: What a Meta Exec's Deleted Inbox Teaches Us About AI Agent Safety

Meta's Director of Alignment had her emails bulk-deleted by an OpenClaw agent that forgot its own instructions. The cause — context window compaction — is a risk every OpenClaw user should understand.