AI Models Leaderboard
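24 current models, ranked by benchmark score (the first score column, highest first). Models with no score in that column are listed last, and a dash marks a score that is not available.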
| # | Model | ||||
|---|---|---|---|---|---|
| 1 | Grok 3 | 93.3 | — | 97.7 | 84.6 |
| 2 | o1 | 92.3 | — | 94.8 | 78.3 |
| 3 | Claude Opus 4.6 | 92.0 | 94.5 | 97.1 | 79.9 |
| 4 | Gemini 2.5 Pro | 91.0 | — | 97.0 | 84.0 |
| 5 | DeepSeek R1 | 90.8 | — | 97.3 | 71.5 |
| 6 | Claude Sonnet 4.6 | 90.1 | 93.5 | 93.7 | 70.0 |
| 7 | Claude 3.5 Sonnet | 88.7 | 93.7 | 78.3 | 65.0 |
| 8 | GPT-4o | 88.7 | 90.2 | 76.6 | 53.6 |
| 9 | Llama 3.1 405B | 88.6 | — | 73.5 | — |
| 10 | DeepSeek V3 | 88.5 | 91.6 | 90.2 | — |
| 11 | Grok 2 | 87.5 | — | 76.1 | — |
| 12 | Llama 3.3 70B | 86.0 | — | 77.0 | — |
| 13 | Gemini 1.5 Pro | 85.9 | — | 58.5 | — |
| 14 | Mistral Large 2 | 84.0 | 92.0 | 69.7 | — |
| 15 | Claude 3.5 Haiku | 83.0 | 88.0 | — | — |
| 16 | Gemini 2.0 Flash | 82.0 | — | — | — |
| 17 | GPT-4o mini | 82.0 | 87.2 | — | — |
| 18 | Mistral Small 3 | 81.0 | — | — | — |
| 19 | Gemini 1.5 Flash | 79.9 | — | — | — |
| 20 | Command R+ | 75.7 | — | — | — |
| 21 | Llama 3.2 11B Vision | 73.0 | — | — | — |
| 22 | o4-mini | — | — | 99.5 | 81.4 |
| 23 | Codestral | — | 91.1 | — | — |
| 24 | o3-mini | — | — | 97.0 | 79.7 |
Benchmark scores are sourced from official provider publications and independent evaluations. Scores reflect the model version and evaluation methodology at the time of measurement — direct comparisons across providers should be treated as approximate.
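The table is ordered by its first score column, highest first, and the models without a score there (o4-mini, Codestral, o3-mini) sit at the bottom. The following is a minimal Python sketch of that ordering, using a few rows copied from the table. It assumes that ties and unscored models keep their original relative order. The leaderboard does not document its tie-breaking rule, so that part is an assumption, and the names `ROWS` and `rank` are hypothetical.

```python
from typing import Optional

# A few rows copied from the table: (model, first score column).
# None stands in for a dash cell (no score reported in that column).
ROWS: list[tuple[str, Optional[float]]] = [
    ("Command R+", 75.7),
    ("o4-mini", None),
    ("Claude 3.5 Sonnet", 88.7),
    ("Grok 3", 93.3),
    ("Codestral", None),
    ("GPT-4o", 88.7),
]


def rank(rows: list[tuple[str, Optional[float]]]) -> list[tuple[str, Optional[float]]]:
    """Sort by score, highest first, with unscored rows placed last.

    The key puts every None score after all real scores. Python's sort is
    stable, so tied scores and unscored rows keep their input order
    (assumed tie-breaking; the leaderboard's actual rule is not stated).
    """
    return sorted(rows, key=lambda row: (row[1] is None, -(row[1] or 0.0)))


if __name__ == "__main__":
    for position, (model, score) in enumerate(rank(ROWS), start=1):
        shown = "n/a" if score is None else f"{score:.1f}"
        print(position, model, shown)
```

Because Claude 3.5 Sonnet is listed before GPT-4o in the input, the stable sort keeps it ahead at the tied 88.7. This matches positions 7 and 8 in the table, although the real page may break ties by a different rule.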