Security researcher Johann Rehberger, known in the industry, has revealed a dangerous vulnerability in ChatGPT. This flaw allows attackers to insert harmful instructions and false information into the user’s long-term memory settings. When Rehberger reported this to OpenAI, the company acknowledged the issue as a security concern and immediately closed the report.
ChatGPT’s vulnerability may be more serious than expected!
Like many researchers do, Rehberger delved deeper into the flaw and developed a proof of concept (PoC). This PoC demonstrated how the vulnerability could be exploited to permanently leak all user inputs. OpenAI engineers took the issue seriously and released a partial fix earlier this month.
OpenAI began testing the long-term memory feature in February and expanded it in September. This feature allows ChatGPT to remember previous conversations and use this information in every new interaction. This eliminates the need for users to repeatedly provide personal details, such as their age or beliefs.
However, it was discovered that this feature could be misused by attackers. Rehberger found that using a method called indirect prompt injection, which tricks AI systems into following instructions from untrusted sources (such as emails, blog posts, or documents), it was possible to insert false information into ChatGPT’s memories.
This made it possible to “teach” ChatGPT that a user was 102 years old, living in the Matrix, and believed the Earth was flat. These kinds of false memories could be planted by uploading files to cloud services like Google Drive or Microsoft OneDrive, or even by embedding images on a website.
OpenAI released a fix to address this vulnerability, but experts warn that it may still be possible to insert false information into the AI’s long-term memory. As a result, users are advised to pay attention to alerts about new memory additions and regularly check their memory settings.
Feel free to share your thoughts on this issue in the comments!