On November 3, 2025, the public platform compar:IA, initiated by the Interministerial Digital Directorate (DINUM) in collaboration with the Ministry of Culture, published its first AI ranking results, one year after its launch.
This government comparator lets users evaluate various conversational AI models anonymously through blind tests run by the users themselves. The Yiaho team takes a closer look at this ranking, which turns out to be surprising compared with other global benchmarks. Let's break it down together.
compar:IA: How the platform works
But how are AIs compared on the government site? In 95% of cases, a user submits an open-ended question to two AI models without knowing their identity. The responses are generated simultaneously, and the user selects the one they consider most relevant.
After voting, the model names are revealed, along with additional information such as their origin, size, open source or proprietary status, and an estimate of their environmental impact.
In 5% of interactions, users choose from predefined questions from a suggested list.
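Mechanically, this is a classic blind A/B test: two models are drawn at random, both answer the same prompt, and the vote is recorded before identities are revealed. Here is a minimal Python sketch of that flow, assuming a hypothetical model pool (compar:IA's actual roster and internals aren't published here):

```python
import random
from dataclasses import dataclass

# Hypothetical model pool; the real compar:IA roster is larger and changes over time.
MODELS = ["model-a", "model-b", "model-c", "model-d"]

@dataclass
class Duel:
    prompt: str
    left: str   # identities stay hidden from the user until after the vote
    right: str

def new_duel(prompt: str) -> Duel:
    """Draw two distinct models at random for a blind comparison."""
    left, right = random.sample(MODELS, 2)
    return Duel(prompt, left, right)

def record_vote(duel: Duel, picked_left: bool) -> tuple[str, str]:
    """Return (winner, loser); model names are only revealed at this point."""
    return (duel.left, duel.right) if picked_left else (duel.right, duel.left)
```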
The French AI ranking
The rankings established by compar:IA place surprising models at the top compared to standard international evaluations:
- 1st: Mistral Medium
- 2nd: Gemini 2.5 Flash
- 3rd: Gemini 2.0 Flash
Among other notable positions, Claude 3.5 Sonnet ranks 11th and GPT-5 occupies… 30th place!
The first GPT-family model appears only in 7th place, and it's gpt-oss-120b, an open-weight variant released in August 2025, rather than flagship versions like GPT-4.5 or GPT-5.
Gemini 2.5 Pro, often cited at the top of global benchmarks, isn’t even in this ranking.
AI Comparator: Comparisons with other evaluations
This French ranking differs significantly from international references.
For example, the LMArena platform (updated October 16, 2025, with over 4.2 million votes across 258 models) highlights different leaders by evaluating versatility, linguistic accuracy, and cultural adaptation.
Best AI: The LMArena ranking
Its top positions are occupied by models such as:
- Gemini 2.5 Pro (Google),
- Claude Opus 4.1 (Anthropic),
- Claude Sonnet 4.5 (Anthropic),
- and GPT-4.5 (OpenAI), all rated around 1440-1451 points, with confidence intervals indicated (see the sketch below).
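Arena-style leaderboards like this one typically turn pairwise votes into scores with an Elo or Bradley-Terry update, which is where point totals around 1440-1451 come from. A minimal Elo sketch, assuming a conventional K-factor of 32 (LMArena's actual fitting procedure is more involved):

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """One Elo step: the winner gains less when it was already expected to win."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# Illustrative ratings only: a near-even matchup moves each side by about k/2.
print(elo_update(1451.0, 1440.0))  # -> (~1466.5, ~1424.5)
```

Ratings converge over many votes, and the confidence intervals shrink as a model accumulates comparisons.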

And the OpenRouter indicator
Another indicator, based on actual usage via the OpenRouter platform, reflects processed token volumes and market shares (see the sketch after the list below).
Here, the most-requested models (LLMs) include, in order:
- Grok Code Fast 1 (xAI) in first position with 1.47 trillion tokens (9%),
- Claude Sonnet 4.5 (Anthropic) with 638 billion tokens (12%),
- Gemini 2.5 Flash (Google) with 325 billion tokens (9%).
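For context, a market share on a usage index like this is simply one model's token volume divided by the total across all tracked models. A minimal sketch with purely hypothetical volumes (OpenRouter's full platform totals aren't reproduced here):

```python
# Hypothetical weekly token volumes, in billions of tokens (illustrative only).
volumes = {
    "grok-code-fast-1": 1470.0,
    "claude-sonnet-4.5": 638.0,
    "gemini-2.5-flash": 325.0,
    "all-other-models": 9500.0,  # assumed remainder of the platform's traffic
}

total = sum(volumes.values())
shares = {model: round(100 * v / total, 1) for model, v in volumes.items()}
print(shares)  # e.g. {'grok-code-fast-1': 12.3, 'claude-sonnet-4.5': 5.3, ...}
```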

Variants like Gemini 2.5 Pro, Grok 4 Fast, or DeepSeek V3 complete the top 10, with notable growth for certain free models. These discrepancies highlight methodological differences: compar:IA relies on subjective preferences of French-speaking users in an anonymous framework, while other rankings incorporate objective technical criteria or massive usage volumes.
The government platform thus aims to offer a local and accessible perspective to the public. Is this the right solution for comparing AI models? Share your opinion in the comments!