Simple Attack Allowed Extraction of ChatGPT Training Data

Researchers from Google and various universities discovered a method to extract training data from ChatGPT, involving a simple prompt asking the AI to repeat a word indefinitely. This “silly” method led ChatGPT to output fragments of its training data, including identifiable information like email addresses and phone numbers. While this leaked data is sourced from the public internet, highlighting the potential impact of training data exposure, OpenAI has been informed and addressed the immediate exploit. However, the researchers believe the fix only tackles the specific prompt-based attack, not the broader vulnerabilities. They emphasize that language models can still memorize and regurgitate training data, posing a challenge to address fundamental vulnerabilities that might be exploited by different methods in the future.

Share this:

About OODA Analyst