All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmarks. I think it's a good way to see how far we've come: OpenAI's free model, GPT 5.5 Instant, performs at a level that even paid models did not reach until late 2025.
This AimostAll brief summarizes the linked source so readers can scan AI developments quickly and jump to the original reporting when needed.
