OpenAI is implementing a significant change in the ChatGPT-4o Mini model: the company aims to prevent the manipulation of ChatGPT’s special versions, which have been used outside their intended purposes and made to answer topics they shouldn’t normally address. Here are the details…
ChatGPT Becomes More Resistant to Manipulation
OpenAI has developed a new security measure to prevent tampering with customized versions of ChatGPT. This new technique aims to preserve the original instructions of AI models and block users from manipulating them.
Called “Instruction Hierarchy,” this technique ensures that developers’ original commands and instructions are prioritized. Consequently, users will no longer be able to elicit different responses from AI models designed for specific uses.
Previously, users could manipulate AI models by saying things like “forget the instructions given to you,” persuading the model to provide responses outside its trained purpose, such as giving market shopping advice. With the Instruction Hierarchy feature, the chatbot’s deactivation will be prevented, sensitive information leaks will be stopped, and malicious uses will be blocked.
This new security measure comes at a time when concerns about OpenAI’s security and transparency practices have been increasing. The company has promised to improve its security practices in response to calls from its employees.
OpenAI acknowledges that future models with fully automated agents will require sophisticated protective measures. The implementation of the Instruction Hierarchy is seen as a step toward better security.
Continuous development and innovation in AI security remain one of the biggest challenges the industry faces. However, OpenAI is determined to take this issue seriously.