All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmarks. I think it's a good way to

Foundation Models OpenAI Ethan Mollick 2026-05-05

Source details

Original source: Ethan Mollick
Published: 2026-05-05
Primary topic: Foundation Models

Why it matters

Model launches, benchmark jumps, API upgrades, context window changes, and frontier LLM competition. This item originated as a short-form social post, so the context blocks below help expand it into tools, models, and evaluation guides.

What happened

All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmarks. I think it's a good way to see how far we've come that the free model from OpenAI, GPT 5.5 Instant, is at a level that even paid models did not reach until late 2025.

What to do next

Compare the hosted model pages first, then check the related tools and buyer guides before changing workflow standards.

This AimostAll brief summarizes the linked source so readers can scan AI developments quickly and jump to the original reporting when needed.

