OpenAI Becomes a Leader in Chess

Two of the most prominent models in artificial intelligence faced off on a chessboard. OpenAI’s o3 scored a decisive 4-0 victory in the final against Grok 4, developed by Elon Musk’s xAI.

OpenAI asserted its leadership in chess.

The match took place in the finals of the Kaggle AI Exhibition Tournament. Models from other prominent companies, including Google, Anthropic, DeepSeek, and Moonshot AI, also participated. o3 and Grok 4 defeated all their competitors to advance to the finals.

The tournament was the first major competition to directly test AI not only in language generation but also in cognitive domains such as strategic thinking, reasoning, and planning. Following the final match, o3 took gold, Grok 4 took silver, and Google Gemini 2.5 Pro took bronze.

More than a mere gaming competition, the tournament also featured an indirect rivalry between two former partners. Sam Altman and Elon Musk co-founded OpenAI 10 years ago. However, Musk later left the company and launched a new artificial intelligence venture called xAI.

The two eventually fell out. Musk recently attempted to acquire OpenAI, which Altman described as “bullying.” This tense past gave added significance to the models’ chess encounter.

In post-tournament evaluations, world chess champion Magnus Carlsen estimated Grok 4’s rating at around 800 and o3’s at around 1200. For comparison, Carlsen’s own peak rating is 2882.

Last July, Carlsen won a match against ChatGPT without losing a single piece. Chess grandmaster Hikaru Nakamura also commented on the final matches. Grok’s earlier reply to an X user, in which it had rated itself between 1600 and 1800, did not align with its performance.

The tournament, organized by Kaggle, took place as part of the Game Arena platform, established in collaboration with Google DeepMind. This platform will test AI models not only in chess but also in more complex board games like Go and team-based digital strategy games. The goal is to measure AIs’ abilities in strategic thinking, reasoning, memory management, understanding opponents’ intentions, and deception.

All matches take place in Kaggle’s open-source game environments using a specialized “text harness” system, which uses only written moves. Models do not have access to chess engines, are not presented with a range of possible moves, and are given limited time to correct errors.

This system was created to evaluate the performance of large language models based not on pure data but on real-time strategy and analysis skills.
