I've been seeing posts comparing GPT-5 and Sonnet, but thought comparing GPT-5
and Opus 4.1 would be more interesting!
So how do GPT-5 and Opus 4.1 perform with building apps? To find out I asked them both to build a full stack app for making chiptunes in Instant. Here’s the prompt I used:
Create a chiptunes app.
- Log in with magic codes
- Users should be able to compose songs
- Users should be able to share songs
- Users can only edit their own songs
- Make the theme really cool
- Let’s keep everything under 1000 lines of code.
I recorded myself going through the process in this video. In this post I’ll share the results and some of the surprises I discovered when prompting!
They both figured out auth, data models, permissions, and at least the flow to create songs in one go.
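For reference, here’s roughly what the magic-code login flow looks like with Instant’s React SDK. This is a minimal sketch rather than either model’s exact code, and the app id is a placeholder:

```tsx
import { init } from "@instantdb/react";

const db = init({ appId: "YOUR_APP_ID" }); // placeholder app id

// Step 1: email the user a one-time code
async function sendCode(email: string) {
  await db.auth.sendMagicCode({ email });
}

// Step 2: exchange the code for a signed-in session
async function verifyCode(email: string, code: string) {
  await db.auth.signInWithMagicCode({ email, code });
}
```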
One difference is that GPT-5 got song sharing working in one shot, while Opus needed two additional nudges to get there. Initially Opus talked about making songs shareable but didn’t actually implement it. With the first nudge, Opus added sharing but gated it to logged-in users. A second prompt got Opus to open songs up for public consumption.
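To give a sense of what “open for public consumption” means in Instant, here’s a sketch of the kind of permissions file both apps ended up with. Attribute names like `creatorId` are my assumption, not the models’ exact output: anyone can view a song, but only its creator can edit or delete it.

```ts
// instant.perms.ts (sketch, not the generated output)
import type { InstantRules } from "@instantdb/react";

const rules = {
  songs: {
    allow: {
      view: "true",        // public sharing: anyone can load a shared song
      create: "isCreator",
      update: "isCreator", // users can only edit their own songs
      delete: "isCreator",
    },
    bind: ["isCreator", "auth.id != null && auth.id == data.creatorId"],
  },
} satisfies InstantRules;

export default rules;
```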
However, Opus’ UI was slicker, and you can see that GPT-5's UI has some responsiveness issues on mobile. OpenAI has improved a lot on UI compared to their earlier models, but for now I think Opus still has the edge.
Hiccups
Both models made a few errors before the projects built. Here’s how that looked:
| Places the models had an error | GPT-5 | Opus |
| --- | --- | --- |
| db.SignedIn? | 🐛 | ✅ |
| Query Issues? | ✅ | 🐛 |
| Next Query Params? | 🐛 | 🐛 |
Both models made about two errors each, and all of them were related to new features: Next.js has a new flow for query params, and Instant just added a "db.SignedIn" component.
But both models fixed all errors in one shot. They just needed me to paste an error message and they were able to solve it.
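For context, the Next.js hiccup is most likely the newer query params flow: in recent Next.js versions, `searchParams` arrives in server page components as a Promise that has to be awaited. Here’s a plausible shape of the fix; the `song` param is a hypothetical example, not the apps’ actual code:

```tsx
// app/share/page.tsx (sketch)
export default async function SharePage({
  searchParams,
}: {
  searchParams: Promise<{ song?: string }>;
}) {
  const { song } = await searchParams; // must be awaited in newer Next.js
  return <div>Shared song: {song}</div>;
}
```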
It was interesting to see GPT-5 make an error with "db.SignedIn" even though instructions for how to use it were already included in the rules.md file. I think this comes down to how closely each model follows rules.
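For anyone unfamiliar with it, `db.SignedIn` is a component on the client returned by `init` that only renders its children when a user is authenticated. A minimal sketch, reusing the `db` client from the earlier snippet and assuming the matching `db.SignedOut` helper plus placeholder `Composer`/`Login` components:

```tsx
function App() {
  return (
    <>
      <db.SignedIn>
        <Composer /> {/* only rendered for authenticated users */}
      </db.SignedIn>
      <db.SignedOut>
        <Login /> {/* shown to everyone else */}
      </db.SignedOut>
    </>
  );
}
```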
Opus seemed to follow the rules file more closely, while GPT-5 explored more. Opus used the exact same patterns we provided in the rules file, which let it skip past the "db.SignedIn" bug. GPT-5, on the other hand, was freer with what it tried. It hit more bugs, but it also wrote code that departed noticeably from the examples we provided. In one case it wrote a simpler schema file.
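As an example of what a “simpler” schema can look like in Instant, here’s a rough sketch where the whole note grid lives as JSON on the song instead of being modeled as separate entities. The field names are illustrative, not GPT-5’s actual output:

```ts
// instant.schema.ts (sketch)
import { i } from "@instantdb/react";

const schema = i.schema({
  entities: {
    songs: i.entity({
      title: i.string(),
      pattern: i.json(),  // the whole note grid as one JSON blob
      createdAt: i.number(),
    }),
  },
});

export default schema;
```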
What a change in 4 months…
We actually ran the same test in April.
We compared o4-mini with Claude 3 Sonnet. o4-mini made a barebones version (see here). Sonnet made a good UI but couldn’t actually write the backend logic.
Now both apps look pretty cool: both have auth, permissions, and a much slicker way to compose songs. You can take a look at the source files the new models generated. This is the GPT-5 source, and this is the Opus source.
In the last few months it has felt like Claude and Claude Code have been the dominant choice for vibe coding apps. With the new GPT-5 model it feels like the gap is closing.
Really interesting times ahead!
Thanks to Joe Averbukh and Daniel Woelfel for reviewing drafts of this post.