OpenAI's latest GPT-5.2 model has been found to draw on Grokipedia, the AI-generated encyclopedia from Elon Musk's rival company xAI, as a data source. The unexpected link has ignited debate across the tech community and raised serious questions about the model's sourcing practices and data integrity, especially given its positioning as a top-tier tool for professional use.
Why OpenAI Using Grokipedia Is Raising Alarms
The connection first came to light in tests conducted by The Guardian. Its investigation found that ChatGPT, running the new model, cited Grokipedia on several sensitive and specific topics, including queries about links between the Iranian government and the telecommunications company MTN-Irancell, as well as details about historian Richard Evans, who served as an expert witness in a Holocaust denial case.
Relying on a competitor's AI-generated knowledge base is unusual in itself, and it is all the more concerning because Grokipedia has faced repeated criticism over the reliability of its own sources. For a company like OpenAI, which markets its models as the most advanced available, drawing on a controversial and largely unvetted source is a significant point of contention.

Grokipedia’s Troubled Reputation
Grokipedia, which xAI launched in December, has drawn negative attention before. The AI encyclopedia was found to have cited neo-Nazi forums in some of its entries, prompting widespread criticism. In addition, a study by U.S. researchers documented that the platform frequently relies on “problematic” and “dubious” sources to generate its content, undermining its credibility as a reliable reference.
In response to the findings, OpenAI officials stated that their model scans a broad range of publicly available sources. The company added, “We implement safety filters to mitigate the risk of connections associated with high-priority harmful content.” Still, the incident highlights the ongoing challenge of ensuring the accuracy and neutrality of the data that large language models draw on.
So, what are your thoughts on this OpenAI sourcing issue? Share your opinions with us in the comments!

