How AT&T Cut AI Costs by 90% with Multi-Agent Orchestration

Published: February 2026

What happened: AT&T was processing 8 billion tokens a day through large language models and facing unsustainable costs. Chief Data Officer Andy Markus restructured the orchestration layer, replacing a monolithic approach with a multi-agent stack — "super agents" directing smaller "worker" agents for specific, purpose-driven tasks. The result: 90% cost reduction and a jump to 27 billion tokens processed daily.
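The article does not describe AT&T's implementation, so the following is only a minimal Python sketch of the routing idea, under stated assumptions: the `Worker` and `SuperAgent` names, the task types, and the lambda handlers are all hypothetical stand-ins, and no real model is called. It shows a coordinator sending each task to a purpose-built worker and falling back to a general model when no specialised worker exists.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Worker:
    """A hypothetical worker agent: one narrow task, one (usually small) model."""
    name: str
    handle: Callable[[str], str]  # stand-in for a call to a language model


class SuperAgent:
    """Hypothetical orchestrator that sends each task to a purpose-built worker."""

    def __init__(self, fallback: Worker) -> None:
        self.workers: Dict[str, Worker] = {}
        self.fallback = fallback  # used when no specialised worker is registered

    def register(self, task_type: str, worker: Worker) -> None:
        self.workers[task_type] = worker

    def dispatch(self, task_type: str, prompt: str) -> Tuple[str, str]:
        # Pick the specialised worker if one exists, otherwise the fallback.
        worker = self.workers.get(task_type, self.fallback)
        return worker.name, worker.handle(prompt)


# Stand-in handlers; a real system would call language models here.
sql_worker = Worker("small-sql-model", lambda p: f"[sql] {p}")
summary_worker = Worker("small-summary-model", lambda p: f"[summary] {p}")
large_model = Worker("large-general-model", lambda p: f"[general] {p}")

agent = SuperAgent(fallback=large_model)
agent.register("sql", sql_worker)
agent.register("summarise", summary_worker)

print(agent.dispatch("sql", "monthly churn by region"))
print(agent.dispatch("legal-review", "check this contract"))
```

In this sketch the expensive general model is reached only when no narrower worker is registered, which is one plausible way a per-task routing layer could lower the average cost per request.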

Why it matters: The architecture suggests that smaller, domain-specific language models can match large-model accuracy on targeted tasks at a fraction of the cost. Built on LangChain with Microsoft Azure, the stack now powers Ask AT&T Workflows, a drag-and-drop agent builder deployed to over 100,000 employees. More than half of them report daily use, and active users report productivity gains of up to 90%.

Wider context: AT&T's approach reflects a growing industry scepticism about defaulting to frontier models for every task. Markus advocated for "interchangeable and selectable" models, phasing out homegrown tools as off-the-shelf alternatives mature — a pragmatic stance in a field where capabilities shift "multiple times a week."

Background: The company's internal tools include a natural language-to-SQL system that topped the Spider 2.0 accuracy benchmark. AT&T also uses an "AI-fueled coding" methodology that reportedly compressed a six-week data product build into 20 minutes. Human oversight remains embedded at every stage, with all agent actions logged and role-based access enforced throughout.
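To make the oversight controls concrete, here is a minimal Python sketch of role-based access combined with an audit log. It is an illustration under assumptions, not AT&T's system: the roles, the permission names, and the `AuditLog` and `attempt` helpers are hypothetical. The point it shows is that every attempted action is recorded, whether or not the role permits it.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical role-to-permission mapping; the article does not list AT&T's roles.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"run_query"},
    "admin": {"run_query", "edit_workflow"},
}


@dataclass
class AuditLog:
    """In-memory audit trail; a real deployment would use durable storage."""
    entries: List[dict] = field(default_factory=list)

    def record(self, agent: str, role: str, action: str, allowed: bool) -> None:
        self.entries.append(
            {"ts": time.time(), "agent": agent, "role": role,
             "action": action, "allowed": allowed}
        )


def attempt(agent: str, role: str, action: str, log: AuditLog) -> bool:
    """Check the role's permissions, log the attempt either way, return the decision."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    log.record(agent, role, action, allowed)
    return allowed


log = AuditLog()
print(attempt("sql-worker", "analyst", "run_query", log))      # permitted for this role
print(attempt("sql-worker", "analyst", "edit_workflow", log))  # denied, but still logged
print(len(log.entries))
```

Logging denied attempts as well as permitted ones gives a human reviewer the full chain of agent activity rather than only the successful steps.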

Source: "8 billion tokens a day forced AT&T to rethink AI orchestration — and cut costs by 90%" (VentureBeat)

Singularity Soup Take: AT&T's 90% cost cut makes a compelling case against the "bigger is better" default — routing the right task to the right-sized model turns out to be smarter, and considerably cheaper, than throwing frontier compute at everything.

Key Takeaways:

Scale vs. Cost: AT&T was processing 8 billion tokens daily through large models; restructuring around specialised smaller agents cut costs by 90% while more than tripling throughput to 27 billion tokens a day.
SLMs Hold Their Own: CDO Andy Markus stated that small language models (SLMs) are "just about as accurate, if not as accurate" as large models on specific domain tasks, a finding with significant cost implications across the industry.
Adoption at Scale: Ask AT&T Workflows is live for over 100,000 employees; more than half use it daily, and active users report productivity gains as high as 90%.
Human-in-the-Loop: All agent actions are logged, data is isolated between agents, and role-based access is enforced — with a human always monitoring the full chain of agent activity.
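The figures in the Scale vs. Cost takeaway can be checked with a short calculation. The summary does not say whether the 90% figure refers to total spend or to cost per token, so the Python sketch below works through both readings as explicit assumptions. Only the token volumes and the 90% figure come from the article; the variable names are illustrative.

```python
TOKENS_BEFORE = 8e9   # tokens per day before the restructuring (from the article)
TOKENS_AFTER = 27e9   # tokens per day afterwards (from the article)
COST_CUT = 0.90       # reported cost reduction; its basis is assumed below

# Growth in daily volume: 27 / 8 = 3.375, i.e. more than triple.
volume_ratio = TOKENS_AFTER / TOKENS_BEFORE

# Reading A (assumption): total daily spend fell by 90%.
# Per-token cost is then the remaining spend spread over the larger volume.
per_token_cost_ratio_a = (1 - COST_CUT) / volume_ratio

# Reading B (assumption): cost per token fell by 90%.
# Total spend is then the cheaper per-token rate times the larger volume.
total_spend_ratio_b = (1 - COST_CUT) * volume_ratio

print(f"volume grew {volume_ratio:.3f}x")
print(f"A: per-token cost is {per_token_cost_ratio_a:.1%} of the original")
print(f"B: total spend is {total_spend_ratio_b:.1%} of the original")
```

Under reading A the cost per token drops to roughly 3% of its earlier level; under reading B total spend still falls to about a third of the original despite the higher volume.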