OpenAI Ships A Codex-Like Harness For Agents

What happened: OpenAI updated its Agents SDK with a model-native “harness” designed for long-running agents that work across files and tools, plus native sandbox execution so that work can run in controlled environments.

Why it matters: Because the real agent battlefield is not a new model name, it is the boring plumbing: how agents get a workspace, how they run code safely, how you isolate credentials, and how you recover when the container inevitably catches fire.

Wider context: OpenAI is explicitly name-checking ecosystem primitives like MCP, “skills”, and AGENTS.md, basically trying to standardize the agent stack the way web frameworks standardized web apps. Whoever owns the harness gets to define “normal.”

Background: The post argues teams get stuck between flexible model-agnostic frameworks, provider SDKs with limited harness visibility, and managed APIs that constrain where agents run and what data they can touch. This release is pitched as the middle path.

The next evolution of the Agents SDK — OpenAI

Singularity Soup Take: Agents are not getting “smarter,” they are getting a job description, a workspace, and a safety helmet. If the harness becomes the default, the next moat is governance: sandboxes, manifests, tracing, and who controls the approvals button.

Key Takeaways:

Sandbox Execution: The SDK adds native support for running agent work inside controlled computer environments, so agents can read and write files, install dependencies, and run code without teams stitching together an execution layer from scratch.
Portable Workspaces: A new Manifest abstraction describes the workspace, including mounts and output directories, and can pull data from storage services like S3, Google Cloud Storage, Azure Blob Storage, and Cloudflare R2 to make environments consistent from prototype to production.
Resilience By Design: OpenAI frames the harness as prompt-injection-aware and emphasizes separating harness and compute to keep credentials away from model-generated code, plus snapshotting and “rehydration” so a run can resume if a sandbox fails or expires.