All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmars. I think it's a good way to

AimostAll news brief curated from Ethan Mollick.

Source details

Original source
Ethan Mollick
Published
2026-05-05
Primary topic
Foundation Models

Why it matters

Model launches, benchmark jumps, API upgrades, context window changes, and frontier LLM competition. This item originated as a short-form social post, so the context blocks below help expand it into tools, models, and evaluation guides.

What happened

All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmars. I think it's a good way to see how far we've come that the free model from OpenAI, GPT 5.5 Instant, is at a level that even paid models did not reach until late 2025

What to do next

Compare the hosted model pages first, then check the related tools and buyer guides before changing workflow standards.

All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmars. I think it's a good way to see how far we've come that the free model from OpenAI, GPT 5.5 Instant, is at a level that even paid models did not reach until late 2025

This AimostAll brief summarizes the linked source so readers can scan AI developments quickly and jump to the original reporting when needed.

Read original source More models news OpenAI page

Directory context

Tools, models, and guides to go deeper

Move from the headline to product evaluation with topic-matched tool pages, model references, and buyer guides.

Related coverage

More from this topic