GPT-5 Fails Half Its Real-World Tasks

So, GPT-5 is out here acting like it’s the messiah of AI, until it’s asked to do something useful. Salesforce’s new MCP-Universe benchmark just dropped the mic: GPT-5 can’t even pass half of the benchmark’s orchestration tasks. Real ones, like navigating locations, managing repos, or doing financial analysis. You know, the stuff you’d actually want an AI to handle. It’s almost poetic. We’re building models that can write sonnets about black holes but can’t figure out how to automate a basic web search without tripping over their own digital shoelaces. Even the best open-source model, GLM-4.5, is basically just the valedictorian of summer school. But hey, at least we’re finally stress-testing these overhyped brains in something resembling reality. Maybe next we can benchmark how well they handle disappointment. My money’s on “not great.” 🤖
