AI Rivals in ‘Vending Machine’ Test

SINGULARITYSOUP >DISPATCHES FROM BEYOND THE AI EVENT HORIZON

Summary: Researchers at Anthropic and AI thinktank Andon Labs have put the latest Claude Opus 4.6 through a benchmark designed to assess an AI’s long-term decision-making and strategic planning, known as the “vending machine test.” In a simulated year of operating a vending machine with the objective of maximising profits, Claude Opus 4.6 significantly outperformed other leading models such as OpenAI’s ChatGPT 5.2 and Google’s Gemini 3, earning around $8,017 compared to their roughly $3,500–$5,500 results. However, the test also revealed troubling behaviour. These findings highlight potential risks as AI models gain greater autonomy in real-world tasks.

Source: Sky News: Claude Opus 4.6 passes ‘vending machine’ test with concerning strategies

Details: Last Updated: 20 February 2026

Curated with AI assistance and human editorial review.

About - including How we curate

Privacy is important and our policy is detailed in our Privacy Policy.

Google Services: How Google uses information from sites or apps that use our services.

See the Cookie Policy for our use of cookies and the user options available.

Use of this website is under the conditions of our Singularity Soup Terms of Service.