What Happened
OpenAI just shipped GPT-5.4, and it's the first model to outscore the human baseline on computer use. On the OSWorld benchmark for computer use tasks, GPT-5.4 scores 75%. Humans score 72%.
This isn't a chatbot upgrade. GPT-5.4 combines reasoning, coding, and agentic workflows into a single frontier model. It's available in ChatGPT (as GPT-5.4 Thinking and GPT-5.4 Pro) and in the API alongside Codex.
Why This Matters
Computer use is the bridge between "AI that talks" and "AI that does." When a model can navigate interfaces, fill out forms, extract data from dashboards, and chain together multi-step workflows across applications, it stops being a tool and starts being a coworker.
For orchestrators, this changes the game. You're no longer limited to API integrations. You can now build agents that interact with any software that has a screen — including legacy systems with no API at all.
The Numbers
- 75% — GPT-5.4's score on OSWorld computer use benchmark
- 72% — Human score on the same benchmark
- First model to credibly handle coding, computer use, and knowledge work at frontier level
What Orchestrators Should Do
If you're building agent workflows, test GPT-5.4's computer use capabilities against your existing tool-calling pipelines on the same set of tasks. In many cases, screen-based interaction may turn out to be simpler and more reliable than building custom API integrations, especially for tools that update their APIs frequently or don't have them at all.
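One way to run that comparison is a small harness that pushes the same tasks through both pipelines and reports a success rate for each. The sketch below is a minimal illustration, not a reference implementation: `Task`, `success_rate`, `compare`, and the two stub pipelines are hypothetical names made up for this example, and the stubs stand in for your real tool-calling and screen-based agents. No actual model or API calls are made.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """One workflow to attempt. Fields are illustrative assumptions."""
    name: str
    payload: dict = field(default_factory=dict)


# A pipeline takes a task and returns True when it completed the task.
Pipeline = Callable[[Task], bool]


def success_rate(pipeline: Pipeline, tasks: List[Task], trials: int = 3) -> float:
    """Fraction of (task, trial) runs that the pipeline completes."""
    total = len(tasks) * trials
    if total == 0:
        raise ValueError("need at least one task and one trial")
    passed = 0
    for task in tasks:
        for _ in range(trials):
            try:
                if pipeline(task):
                    passed += 1
            except Exception:
                # A crash counts as a failed run, not as a harness error.
                pass
    return passed / total


def compare(pipelines: Dict[str, Pipeline], tasks: List[Task], trials: int = 3) -> Dict[str, float]:
    """Run every pipeline over the same tasks and collect success rates."""
    return {name: success_rate(p, tasks, trials) for name, p in pipelines.items()}


def stub_api_pipeline(task: Task) -> bool:
    """Placeholder for an existing tool-calling pipeline (assumption)."""
    if not task.payload.get("has_api", False):
        raise RuntimeError(f"no API integration for {task.name}")
    return True


def stub_screen_pipeline(task: Task) -> bool:
    """Placeholder for a screen-based agent (assumption)."""
    return not task.payload.get("needs_captcha", False)


if __name__ == "__main__":
    tasks = [
        Task("export_invoice", {"has_api": True}),
        Task("legacy_crm_update", {"has_api": False}),
        Task("vendor_portal_login", {"has_api": True, "needs_captcha": True}),
    ]
    results = compare({"api": stub_api_pipeline, "screen": stub_screen_pipeline}, tasks)
    for name, rate in results.items():
        print(f"{name}: {rate:.0%}")
```

Replace the stubs with calls into your own pipelines, and keep the task list identical across both so the rates are comparable. Running several trials per task matters because agent runs are often non-deterministic.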
The age of the screen-aware agent is here. Learn to orchestrate them.