Artificial intelligence systems developed by Google DeepMind and OpenAI reached gold-medal level at this year’s International Mathematical Olympiad (IMO). A model from each organization correctly solved five of the six problems in the competition. Each problem is worth seven points, so both scored 35 out of 42, enough for a gold medal.
Artificial Intelligence Can Win Gold in Mathematics
The IMO, held since 1959, is among the world’s most prestigious academic competitions. Contestants work through six demanding problems in fields such as algebra, geometry, and combinatorics over two sessions of four and a half hours each. This year marks the first time AI systems have reached gold-medal level, a sign that these technologies are approaching the performance of the strongest human contestants.
Google DeepMind reached only silver-medal level on last year’s problems, using its AlphaProof and AlphaGeometry 2 systems. This year it used Gemini Deep Think, a special version of the Gemini model. The new system is built around parallel thinking rather than traditional single-track reasoning.
The model constructs multiple solution paths for each problem, tests them simultaneously, and cross-checks hypotheses until it settles on the most consistent solution. Thanks to speculative reasoning modules, it does not stop at a single answer but also develops alternative proof paths, which makes its mathematical proofs more reliable.
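The idea of running several solution paths side by side and keeping the answer they agree on can be illustrated with a toy sketch. This is not DeepMind's implementation, whose internals are not described here. The function names, the example problem (summing 1 to n), and the majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "solution paths": independent ways to compute 1 + 2 + ... + n.
def path_closed_form(n):
    return n * (n + 1) // 2

def path_direct_sum(n):
    return sum(range(1, n + 1))

def path_pairing(n):
    # Pair the first term with the last: each pair sums to n + 1.
    total = (n // 2) * (n + 1)
    if n % 2 == 1:
        total += (n + 1) // 2  # unpaired middle term when n is odd
    return total

def path_flawed(n):
    # Deliberately wrong (off-by-one), to show how cross-checking filters it out.
    return sum(range(1, n))

def solve_in_parallel(n, paths):
    """Run every path concurrently and return the most widely supported answer."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        answers = list(pool.map(lambda path: path(n), paths))
    best_answer, support = Counter(answers).most_common(1)[0]
    return best_answer, support, len(paths)

PATHS = [path_closed_form, path_direct_sum, path_pairing, path_flawed]
answer, support, total = solve_in_parallel(100, PATHS)
print(f"answer={answer}, supported by {support} of {total} paths")
# prints "answer=5050, supported by 3 of 4 paths"
```

Agreement between independent paths is only a heuristic. It raises confidence in an answer but does not by itself amount to a proof.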
The model also works directly from problem statements written in natural language, turning them into symbolic logical structures on its own. Unlike previous-generation systems, it builds a valid proof step by step from the textual description, with no need to first translate the problem into a formal programming language.
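For context, this is roughly what such a translation into a formal language looks like. Earlier systems such as DeepMind's AlphaProof worked on problems formalized in the Lean proof language. The statement below is a toy example chosen for this sketch, not an IMO problem or the output of either model, and the theorem name and the encoding of "even" are assumptions.

```lean
-- Natural-language statement: "The sum of two even natural numbers is even."
-- "Even" is encoded here as `n % 2 = 0`, a choice made for this sketch.
theorem sum_of_evens_is_even (a b : Nat)
    (ha : a % 2 = 0) (hb : b % 2 = 0) : (a + b) % 2 = 0 := by
  omega
```

Even for a one-line statement, the formal version has to fix types, hypotheses, and an encoding of each concept. That is the manual step a natural-language approach avoids.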
The DeepMind team emphasizes that this design works end to end and requires no external intervention. The competition jury commented that Gemini’s solutions and proofs “appear to have been written by a human.”
On the OpenAI side, an experimental model that has not yet been publicly released took part in the competition. It takes a similar multi-step approach to reasoning: for each problem it maps out candidate solution paths, then screens them for logical consistency to arrive at the most robust solution.
One of the model’s standout features is its ability not only to arrive at the correct solution but also to prove the solution paths it generates in natural language. Thanks to advanced verification modules, the model can internally verify each solution step.
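Checking a solution one step at a time can be sketched as follows. This is an illustrative toy, not OpenAI's verification method, which is not public. The helper `verify_chain`, the example identity, and the use of numeric spot checks are all assumptions. A spot check can expose a faulty step, but passing it does not prove that a step is valid.

```python
import random

def verify_chain(steps, trials=200, seed=0):
    """Return the index of the first step that disagrees with the one before it, or None."""
    rng = random.Random(seed)
    # Two fixed sample points plus random ones; integer arithmetic keeps comparisons exact.
    samples = [(1, 1), (2, -3)]
    samples += [(rng.randint(-50, 50), rng.randint(-50, 50)) for _ in range(trials)]
    for i in range(1, len(steps)):
        previous, current = steps[i - 1][1], steps[i][1]
        if any(previous(a, b) != current(a, b) for a, b in samples):
            return i
    return None

# Each step pairs a readable form with a function that evaluates it.
valid_derivation = [
    ("(a + b)^2 - (a - b)^2", lambda a, b: (a + b) ** 2 - (a - b) ** 2),
    ("(a^2 + 2ab + b^2) - (a^2 - 2ab + b^2)",
     lambda a, b: (a * a + 2 * a * b + b * b) - (a * a - 2 * a * b + b * b)),
    ("4ab", lambda a, b: 4 * a * b),
]

flawed_derivation = [
    valid_derivation[0],
    # Wrong expansion: the cross terms 2ab are dropped.
    ("(a^2 + b^2) - (a^2 - b^2)", lambda a, b: (a * a + b * b) - (a * a - b * b)),
    ("2b^2", lambda a, b: 2 * b * b),
]

print(verify_chain(valid_derivation))   # prints None
print(verify_chain(flawed_derivation))  # prints 1
```

The check points to the exact step where the derivation goes wrong, so the rest of the chain does not need to be discarded.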
Neither system will be available directly to end users in its current form. OpenAI CEO Sam Altman states that this level of reasoning will be used solely for research purposes for now. He notes that it is unlikely that publicly available systems like the GPT series will achieve this level of mathematical performance in the near future.