Why AI Agents Still Fail to Deliver on Their Promise

Published: 25 February 2026

What happened: PCMag has published a critical assessment of the current crop of AI agents — tools like ChatGPT Agent, Google's Project Mariner, and Microsoft's Edge Actions. Despite rapid proliferation across browsers, chatbots, and operating systems, the verdict is blunt: the technology doesn't work well enough yet to justify the hype.

Why it matters: AI agents are positioned as the next leap forward from chatbots — not just answering questions, but actually performing tasks on your behalf. But they routinely fail at basic web interactions, require constant human supervision, and often take longer than doing the task manually, undermining the core productivity argument.

Wider context: The agent push is one of the dominant themes in AI right now, with most major providers — OpenAI, Google, Microsoft, Perplexity — embedding agentic features into flagship products. Scrutiny is growing over both real-world usability and safety, from prompt injection attacks to, in some cases, users bearing legal liability for actions agents take on their behalf.

Background: Like chatbots, AI agents are powered by large language models, but instead of responding with text, they respond by taking actions — controlling browsers, booking flights, adding items to carts. Not all agents are equally problematic: purpose-built tools like customer service bots largely work as advertised; the shortfalls are concentrated in general-purpose browser and app-control agents.

The AI Agent Hype Is Real. The Productivity Gains Aren't — PCMag


Singularity Soup Take: AI agents are a genuine glimpse of where computing is heading — but paying a monthly subscription to babysit software that takes twice as long to do your grocery shopping is, for now, a bad deal.

Key Takeaways:

  • Reliability Gap: AI agents routinely fail at common web tasks — solving CAPTCHAs, navigating complex sites, or completing multi-step workflows — often requiring users to step in and take over manually.
  • Speed Problem: Even when agents do complete a task successfully, they typically take longer than a human would, making the time-saving argument hard to sustain in practice.
  • Privacy and Legal Exposure: AI agents collect significant user data, and some services explicitly state users may bear legal responsibility for any actions taken by an agent on their behalf.
  • Improving, But Slowly: ChatGPT Agent is noticeably faster and less error-prone than earlier implementations like Project Mariner, suggesting meaningful progress — just not enough yet for most people.