OpenAI Launches GPT-5.4 with Built-In Computer Vision and Desktop Control

OpenAI released GPT-5.4 and GPT-5.4 Pro featuring native computer vision and PC control capabilities. The model surpassed human performance in the OSWorld-Verified desktop management benchmark.

CoinJP Editorial

Just two days after shipping GPT-5.3 Instant, OpenAI has unveiled its next-generation models — GPT-5.4 and GPT-5.4 Pro. The headline feature: native computer vision that lets the model see a screen and control a desktop using mouse and keyboard.

"GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model," OpenAI (@OpenAI) said in its announcement post.

Availability and Pricing

The standard GPT-5.4 is available through ChatGPT's web interface, the API, and Codex. GPT-5.4 Thinking is open to Plus, Team, and Pro subscribers. GPT-5.4 Pro targets Pro-tier users and Enterprise clients, with API access included.

Base model pricing is $2.50 per 1M input tokens and $15 per 1M output tokens. The Pro version costs considerably more: $30 per 1M input tokens and $180 per 1M output tokens.
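To make the price gap concrete, here is a minimal Python sketch that estimates the cost of a single request at the rates quoted above. The `Pricing` class, the `request_cost` helper, and the token counts are illustrative assumptions, not part of any OpenAI SDK, and the calculation assumes flat per-token billing with no caching or volume discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Rates as quoted in this article.
GPT_5_4 = Pricing(input_per_million=2.50, output_per_million=15.00)
GPT_5_4_PRO = Pricing(input_per_million=30.00, output_per_million=180.00)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, assuming flat per-token rates."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
base = request_cost(GPT_5_4, input_tokens=20_000, output_tokens=2_000)
pro = request_cost(GPT_5_4_PRO, input_tokens=20_000, output_tokens=2_000)
print(f"base: ${base:.2f}, pro: ${pro:.2f}, ratio: {pro / base:.0f}x")
# prints: base: $0.08, pro: $0.96, ratio: 12x
```

Because both the input and output rates are 12 times higher for Pro, the ratio stays at 12x regardless of the mix of input and output tokens.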

Why This Matters

GPT-5.4 is OpenAI's first model with built-in computer vision and full desktop control. It can operate a mouse and keyboard based on screenshots and automate workflows through Playwright. Risk tolerance is configurable per use case.
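The article does not describe the interface OpenAI exposes for desktop control, so the following Python sketch only illustrates the general pattern of a screenshot-driven agent: capture the screen, let a policy choose an action, apply it, and repeat. The `Action` type, the `Desktop` protocol, `run_agent`, and the scripted policy are all hypothetical names invented for this example; a real system would put a model call where the scripted policy sits.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Action:
    kind: str       # "click", "type", or "done" (hypothetical action set)
    x: int = 0
    y: int = 0
    text: str = ""


class Desktop(Protocol):
    """Hypothetical minimal interface to a screen, mouse, and keyboard."""
    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


def run_agent(desktop: Desktop,
              decide: Callable[[bytes], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act; return the number of decisions taken."""
    for step in range(1, max_steps + 1):
        action = decide(desktop.screenshot())
        if action.kind == "done":
            return step
        if action.kind == "click":
            desktop.click(action.x, action.y)
        elif action.kind == "type":
            desktop.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    raise RuntimeError("step budget exhausted before the task finished")


class FakeDesktop:
    """In-memory stand-in that records actions instead of performing them."""
    def __init__(self) -> None:
        self.log: list[str] = []

    def screenshot(self) -> bytes:
        return f"frame-{len(self.log)}".encode()

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def scripted_policy() -> Callable[[bytes], Action]:
    """Stand-in for the model: replays a fixed list of actions."""
    script = iter([
        Action("click", x=120, y=48),
        Action("type", text="hello"),
        Action("done"),
    ])
    return lambda _screenshot: next(script)


desktop = FakeDesktop()
steps = run_agent(desktop, scripted_policy())
print(steps, desktop.log)
# prints: 3 ['click(120,48)', "type('hello')"]
```

The `max_steps` budget is one simple way to bound how far such a loop can run unattended. In a browser setting, the `Desktop` methods could plausibly be backed by Playwright's `page.screenshot()`, `page.mouse.click()`, and `page.keyboard.type()`, though whether GPT-5.4's tooling works this way is not stated in the source.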

On the OSWorld-Verified benchmark for desktop management, GPT-5.4 completed 75% of tasks — surpassing the previous version (47.3%) and human performance (72.4%). Enhanced visual perception shows across other tests too: MMMU-Pro (understanding and reasoning) hit 81.2% vs. 79.5% for GPT-5.2, while OmniDocBench (document analysis) error rates dropped from 0.140 to 0.109.
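These figures mix two kinds of change: differences between scores (percentage points) and a drop in an error rate (best read as a relative reduction). A short Python sketch, using only the numbers quoted above, works through both; the variable names are illustrative.

```python
# Figures as quoted in this article.
osworld = {"gpt_5_4": 75.0, "previous": 47.3, "human": 72.4}
mmmu_pro = {"gpt_5_4": 81.2, "gpt_5_2": 79.5}
omnidoc_error = {"before": 0.140, "after": 0.109}

# Score differences are expressed in percentage points.
gain_vs_previous = osworld["gpt_5_4"] - osworld["previous"]
margin_vs_human = osworld["gpt_5_4"] - osworld["human"]
mmmu_gain = mmmu_pro["gpt_5_4"] - mmmu_pro["gpt_5_2"]

# The error-rate drop is expressed relative to the starting error rate.
relative_error_drop = (
    (omnidoc_error["before"] - omnidoc_error["after"]) / omnidoc_error["before"]
)

print(f"OSWorld-Verified vs. previous version: +{gain_vs_previous:.1f} points")
print(f"OSWorld-Verified vs. human baseline:   +{margin_vs_human:.1f} points")
print(f"MMMU-Pro vs. GPT-5.2:                  +{mmmu_gain:.1f} points")
print(f"OmniDocBench error reduction:          {relative_error_drop:.1%}")
# prints +27.7, +2.6, and +1.7 points, then 22.1%
```

The margin over the human baseline (2.6 points) is much smaller than the jump over the previous version (27.7 points).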

Professional Task Performance

On GDPval, a benchmark evaluating task completion across 44 professions, GPT-5.4 scored 83% — up from GPT-5.2's 70.9%. The model now performs at or above the level of domain specialists.

GPT-5.4 performance comparison with previous versions. Source: OpenAI

OpenAI's developers focused heavily on spreadsheets, presentations, and documents. On tasks at the level of a junior investment banking analyst, GPT-5.4 scored 87.3% compared to 68.4% for GPT-5.2. Evaluators preferred the new model's presentations in 68% of cases, citing better aesthetics, variety, and effective use of image generation.

GPT-5.4 results on investment banking analyst tasks. Source: OpenAI

Factual accuracy also improved significantly. When tested on prompts containing deliberate errors, individual false claims appeared 33% less often, and full responses contained errors 18% less often than GPT-5.2.

Coding and Tools

In programming, GPT-5.4 matches the specialized GPT-5.3-Codex while running faster. Codex now includes a /fast mode that speeds up code generation by 1.5x with no quality loss. Internal tests showed strong results on complex front-end development tasks.

An experimental Playwright (Interactive) skill lets the model visually debug web and Electron applications, testing its own code in real time during the writing process.

Another addition is Tool Search. Previously, the system had to preload descriptions of all available plugins into context, adding thousands of unnecessary tokens per request. Now the model receives only a base list and dynamically loads the parameters it needs. In tests using MCP Atlas, token consumption dropped by 47% with no accuracy loss.
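The difference between preloading and on-demand loading can be sketched in a few lines of Python. This is an illustration of the idea only, not OpenAI's implementation: the tool catalogue, the function names, and the four-characters-per-token size estimate are all assumptions made for the example, and the numbers it produces say nothing about the 47% figure above.

```python
import json

# Hypothetical tool catalogue; names and schemas are illustrative only.
TOOLS = {
    "create_invoice": {
        "summary": "Create an invoice for a customer.",
        "parameters": {
            "customer_id": {"type": "string", "description": "Identifier of the customer to bill."},
            "amount_cents": {"type": "integer", "description": "Invoice total in cents."},
            "due_date": {"type": "string", "description": "Due date in ISO 8601 format."},
        },
    },
    "search_tickets": {
        "summary": "Search support tickets by keyword.",
        "parameters": {
            "query": {"type": "string", "description": "Keywords to match against ticket text."},
            "status": {"type": "string", "description": "Filter by open, pending, or closed."},
            "limit": {"type": "integer", "description": "Maximum number of tickets to return."},
        },
    },
    "send_email": {
        "summary": "Send an email to one recipient.",
        "parameters": {
            "to": {"type": "string", "description": "Recipient email address."},
            "subject": {"type": "string", "description": "Subject line of the message."},
            "body": {"type": "string", "description": "Plain-text body of the message."},
        },
    },
}


def rough_tokens(text: str) -> int:
    """Crude size estimate: roughly 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def preload_all(tools: dict) -> str:
    """Preloading: every full tool definition goes into the context."""
    return json.dumps(tools)


def base_list(tools: dict) -> str:
    """On-demand loading: expose only names and one-line summaries up front."""
    return json.dumps({name: spec["summary"] for name, spec in tools.items()})


def load_tool(tools: dict, name: str) -> str:
    """Fetch the full definition of one tool once it is actually needed."""
    return json.dumps({name: tools[name]})


eager = rough_tokens(preload_all(TOOLS))
lazy = rough_tokens(base_list(TOOLS)) + rough_tokens(load_tool(TOOLS, "create_invoice"))
print(f"preload everything: ~{eager} tokens")
print(f"base list + one tool: ~{lazy} tokens")
```

With only three tools the saving is modest; the gap grows with the size of the catalogue, since the preloaded context scales with every parameter of every tool while the base list scales only with the number of tools.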

Web search performance also improved: BrowseComp scores rose by 17%, with the Pro version reaching a record 89.3%.

Steerability and Context

GPT-5.4 Thinking in ChatGPT now displays an action plan before tackling complex queries. Users can adjust the direction mid-stream without restarting generation. The feature is live on the website and the Android app, with iOS support coming soon.

The model also maintains context more effectively in extended conversations and dedicates more reasoning time to complex tasks, keeping responses coherent and relevant when handling large volumes of information.


Frequently Asked Questions

What is new in OpenAI GPT-5.4?

GPT-5.4 is OpenAI's first model with built-in computer vision and desktop control. It can operate a mouse and keyboard based on screenshots, and it scored 75% on OSWorld-Verified, surpassing human performance at 72.4%.

How much does GPT-5.4 API access cost?

The base GPT-5.4 costs $2.50 per 1M input tokens and $15 per 1M output tokens. The Pro version is priced at $30 per 1M input tokens and $180 per 1M output tokens.

How does GPT-5.4 compare to GPT-5.2?

GPT-5.4 outperforms GPT-5.2 on every benchmark cited in this article, including 83% vs. 70.9% on GDPval and 87.3% vs. 68.4% on junior investment banking analyst tasks. Individual false claims appear 33% less often.

Who can access GPT-5.4 Pro?

GPT-5.4 Pro is available to Pro-tier subscribers and Enterprise clients. It is also accessible through the API at higher pricing tiers.

What is Tool Search in GPT-5.4?

Tool Search lets the model dynamically find and load relevant plugins instead of preloading all plugin descriptions into context. Tests using MCP Atlas showed a 47% reduction in token consumption with no accuracy loss.
