ChatGPT Agent, Unified: Where Conversation Meets Action

Eric Walker · July 23, 2025

On July 17, 2025, OpenAI introduced ChatGPT Agent—a unified system that blends three previously distinct strengths into one: Operator’s ability to click, scroll, and type on real web pages; Deep Research’s talent for analyzing and synthesizing information; and ChatGPT’s reasoning and dialogue. The result is a single agent that can plan, browse, execute, and deliver tangible artifacts (slides, spreadsheets, code outputs) on its own virtual computer—while you stay in the loop and in control.

In practice, this means tasks like “summarize my calendar and brief me on clients using recent news,” “plan a dinner and buy ingredients,” or “analyze three competitors and produce slides” now run end-to-end: the agent chooses tools, navigates sites, prompts you to log in securely when needed, and returns editable deliverables. Compared to earlier, siloed tools, this unification removes the hand-offs that previously forced you to choose between “browse” and “analyze.”

Availability: Agent mode is rolling out to Pro, Plus, and Team first (Enterprise/Education to follow). Pro includes 400 monthly messages; other paid plans include 40, with top-ups available via credits. You toggle Agent mode from the tools dropdown in any chat.

What the Demos Show: Planning, Producing, and Multi-Tasking

OpenAI’s live demos centered on complex, multi-step scenarios you’d normally hand to a colleague: wedding planning with wardrobe suggestions and logistics; converting a mascot photo into sticker art and placing a bulk order; generating a full slide deck from data; even planning a multi-city MLB ballpark itinerary and exporting a polished spreadsheet. The key pattern across these demos is continuity: the agent reads, reasons, switches between a text browser, a visual browser, and a terminal, and returns to you with artifacts you can open in standard tools. You can interrupt, redirect, or take over the browser at any time.

The same control layer now extends to the desktop: the macOS ChatGPT app exposes Agent mode to Plus users and integrates with the OS for quick launch and voice interaction. (Windows support is still catching up.) While convenient, this deep OS integration also raises the stakes on security and user consent—areas OpenAI calls out explicitly.

Benchmarks: From “Can It Do the Task?” to “How Well Does It Work?”

If you care about external signals—not just staged demos—the numbers are notable:

Humanity’s Last Exam (HLE): The model powering ChatGPT Agent sets a new pass@1 state-of-the-art at 41.6%, rising to 44.4% with a simple parallel-execution strategy that selects the most confident run. HLE spans ~2,500 expert-level questions across 100+ subjects; performance here suggests improved broad reasoning, not memorization.

FrontierMath: With tool use (e.g., a terminal to execute code), the agent reaches 27.4%, exceeding o3 and o4-mini on one of the hardest math benchmarks with novel, unpublished problems.

SpreadsheetBench: When granted direct edit access, the agent scores 45.5% vs. 20.0% for Excel Copilot, an unusually practical signal for knowledge work. (OpenAI notes environment differences between LibreOffice on macOS and Excel on Windows in the grading details.)

BrowseComp: Accuracy reaches 68.9%, +17.4 points over Deep Research—important if your workflows depend on finding hard-to-locate facts on the open web.

WebArena: The agent outperforms the earlier o3-powered system behind Operator, signaling better real-world task completion in browser automation.

DSBench & Internal Knowledge-Work Tasks: On data-science workloads (DSBench) and a suite of economically valuable tasks—including investment-banking models—the agent matches or exceeds strong human baselines in a meaningful fraction of cases and handily beats OpenAI’s prior models.
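The parallel-execution strategy mentioned for HLE (run several attempts, keep the most confident one) can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: the `Attempt` type, the `select_most_confident` helper, and the idea that each run reports a single confidence score are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Attempt:
    """One independent run on a question (hypothetical structure)."""
    answer: str
    confidence: float  # assumed self-reported score in [0, 1]


def select_most_confident(attempts: Sequence[Attempt]) -> Attempt:
    """Return the attempt with the highest confidence.

    Ties resolve to the earliest attempt, because max() keeps the
    first maximal element it encounters.
    """
    if not attempts:
        raise ValueError("need at least one attempt")
    return max(attempts, key=lambda a: a.confidence)


# Illustrative values only; these are not real model outputs.
runs = [
    Attempt(answer="A", confidence=0.62),
    Attempt(answer="B", confidence=0.81),
    Attempt(answer="A", confidence=0.55),
]
best = select_most_confident(runs)
print(best.answer)
```

Note that this is not majority voting: answer A appears twice in the sample runs, but B is returned because its single run carries the highest confidence.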

From a reviewer’s vantage point, these aren’t vanity metrics. HLE and FrontierMath test transfer and reasoning; BrowseComp and WebArena test doing things on the web. SpreadsheetBench and the finance tasks test office reality: can an agent edit, format, and model like a junior analyst? The through-line is consistency across “think, browse, act” settings, which is what separates a true agent from a chat model with tools.
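To make the quoted gaps concrete, the short script below derives the numbers the figures imply. Only the six input values come from the benchmark list; the variable names are made up for this sketch, and the Deep Research score is inferred by subtraction rather than quoted from OpenAI.

```python
# Figures quoted in the benchmark list (percent accuracy).
browsecomp_agent = 68.9
browsecomp_gain_over_deep_research = 17.4  # percentage points
spreadsheet_agent = 45.5
spreadsheet_excel_copilot = 20.0
hle_single_run = 41.6
hle_parallel = 44.4

# Derived values, rounded to one decimal to avoid floating-point noise.
implied_deep_research = round(
    browsecomp_agent - browsecomp_gain_over_deep_research, 1
)
spreadsheet_gap = round(spreadsheet_agent - spreadsheet_excel_copilot, 1)
hle_parallel_gain = round(hle_parallel - hle_single_run, 1)

print(f"Implied Deep Research BrowseComp score: {implied_deep_research}%")
print(f"SpreadsheetBench gap over Excel Copilot: {spreadsheet_gap} points")
print(f"HLE gain from parallel execution: {hle_parallel_gain} points")
```

The differences are in percentage points, not relative improvements: a 17.4-point gain on BrowseComp is a jump from roughly half of the questions to roughly two thirds.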

Safety, Control, and the New Risk Surface

OpenAI emphasizes consent and visibility: the agent narrates what it’s doing, requests permission before consequential actions, and lets you take over the browser at any time. But the very ability to act creates a wider attack surface—especially prompt-injection risks hidden in web pages that could try to exfiltrate data from connectors or induce harmful actions. OpenAI says it trained the agent to resist these attacks, requires explicit user confirmation for sensitive steps, and has monitoring in place. The company is also classifying the agent at “High Biological and Chemical” capability under its preparedness framework to trigger stricter safeguards, even as it notes there’s no definitive evidence of misuse potential at that level today.
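The "ask before consequential actions" pattern is easy to picture as a small gate in front of every action. The sketch below is purely illustrative and says nothing about how ChatGPT Agent is built: `SENSITIVE_ACTIONS`, `run_action`, and `deny_all` are hypothetical names, and the choice of which action kinds count as sensitive is an assumption.

```python
from typing import Callable

# Hypothetical set of action kinds treated as consequential.
SENSITIVE_ACTIONS = {"login", "payment", "share_data"}


def run_action(
    kind: str,
    perform: Callable[[], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run `perform` only if the action is non-sensitive or the user approves."""
    if kind in SENSITIVE_ACTIONS and not confirm(
        f"Allow the agent to perform '{kind}'?"
    ):
        return f"skipped: {kind} (user declined)"
    return perform()


def deny_all(prompt: str) -> bool:
    """Stand-in for a real confirmation dialog; always declines."""
    return False


print(run_action("read_page", lambda: "page read", deny_all))
print(run_action("payment", lambda: "order placed", deny_all))
```

With a confirmer that always declines, the page read goes through and the payment is skipped. The design point is that the default for a sensitive step is to do nothing unless the user explicitly says yes.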

This is the pragmatic bargain of agents in 2025: more autonomy and throughput, balanced by higher operational risk and the need for user oversight.

Why It Matters: “AI as Leverage”

One idea dominated the first-half-of-2025 discourse: AI as compound leverage. In recent talks and posts, OpenAI researcher Hyung Won Chung frames agents as combining labor leverage (software that does work for you) with code leverage (infinitely copyable, permissionless scaling). That framing explains why small teams can now punch above their weight: ten people directing ten agents can produce output historically reserved for far larger organizations. From this lens, ChatGPT Agent is less a gadget and more a force multiplier.

For a U.S. professional audience, the short-term takeaway is concrete: repetitive knowledge work (research, spreadsheet edits, first-draft slides, vendor comparisons) is now agent-legible. The long-term implication is organizational: workflows will be redesigned around agent-orchestrated steps, with humans supervising and setting intent.

Who’s Behind the Work

The launch spotlighted contributions from researchers including Zhiqing (Edward) Sun, a Research Scientist at OpenAI with a Ph.D. from Carnegie Mellon’s LTI and a B.S. from Peking University, and Casey Chu, an OpenAI researcher who studied computational mathematics at Stanford before leaving the program early and who holds a math B.A. from Harvey Mudd. Their public profiles mirror the agent’s focus: reasoning, multimodality, and real-world utility.

Hands On: What to Expect as You Adopt It

From a reviewer’s perspective, three expectations set the right baseline:

Faster first drafts, not final polish. The slide-generation pipeline remains in beta: structure is strong, but visual polish still needs a human hand, at least for now.

You remain the product manager. The best results come when you break down the task, set guardrails, and step in at key decision points—especially where logins, payments, or data sharing are involved.

Treat browsing as production code. Assume the web may fight back (pop-ups, paywalls, prompt injections). Keep confirmation prompts on, and review artifacts before sending them downstream.

The Bottom Line

The central thesis holds up: unifying Operator, Deep Research, and ChatGPT into a single, tool-choosing agent changes what you can realistically delegate, in everyday and professional tasks alike. The benchmark gains are real, the demos map to credible workflows, and the safety story is at least proportionate to the newly expanded risk surface. If you’ve been waiting for agents that both think and do, ChatGPT Agent is the first broadly available release that makes that promise tangible.
