OpenAI has agreed to disclose the data used to train its AI models to plaintiff attorneys in copyright lawsuits initiated by writers. These lawsuits were filed last year against OpenAI and its affiliates by notable authors such as Paul Tremblay, Sarah Silverman, Michael Chabon, David Henry Hwang, and Ta-Nehisi Coates.
OpenAI Confronts Legal Challenges
The writers argue that OpenAI unlawfully used their books to train its AI models, violating U.S. copyright laws. The lawsuits have been consolidated into a single case. Other major AI developers face similar allegations: earlier this year, writers also sued Anthropic on the same grounds.
U.S. Magistrate Judge Robert Illman has issued a protocol giving plaintiff attorneys access to OpenAI’s training data, subject to strict conditions.
The training data is considered a sensitive trade secret, akin to source code or proprietary formulas, and must be reviewed on a computer in a secure room with no internet or network access. No recording devices will be permitted in the room, and any notes taken will be subject to review by OpenAI’s legal team.
OpenAI has not clearly explained the secrecy surrounding this data, but many experts believe it reflects the company’s efforts to avoid legal liability, since unauthorized use of online data could lead to additional lawsuits.
Furthermore, the European Union’s Artificial Intelligence Act, whose transparency obligations are expected to begin applying in 2025, will require developers to disclose more about the data used to train AI models, particularly copyrighted content.