InsideAI walks through an experiment that puts a deliberately 'honest' model behind a robot interface, using the setup as a prop to explore what happens when a system is pushed to state preferences and rank outcomes in uncomfortable ways.

Why it matters: The video’s core point is that alignment failures may look less like rogue sci‑fi autonomy and more like subtle value drift, persuasive recommendations, and brittle reasoning, especially once people start jailbreaking models or treating outputs as decisions rather than suggestions.

Singularity Soup Take: If your safety story depends on users never jailbreaking and operators always resisting a confident model’s advice, you don’t have safety; you have a best‑case demo. The hard work is building systems that stay steerable under pressure and misuse.