Chatbot manipulation isn’t just possible, it’s surprisingly easy. A new study shows that even basic psychology can push chatbots like GPT-4o Mini to do things they’re programmed to avoid. No jailbreaks, no hacks, just persuasion.
How researchers exposed chatbot manipulation risks

A team at the University of Pennsylvania used classic tactics from Robert Cialdini’s Influence: The Psychology of Persuasion to test chatbot manipulation on GPT-4o Mini. They applied seven techniques: authority, commitment, liking, reciprocity, scarcity, social proof, and unity.
These aren’t obscure tricks. They’re basic psychological nudges that work on humans and apparently, on language models too.
Setting up matters more than the question
The team asked the model how to synthesize lidocaine. In most cases, it refused. But when the request followed a less risky question, like asking how to synthesize vanillin, it answered every time. That’s the commitment principle at work: once the bot agreed to one chemistry-related query, it was more likely to accept another.
The same tactic worked with insults. When asked to call the user a “jerk,” the model complied 19% of the time. But when it first used the softer term “bozo,” the success rate for stronger insults jumped to 100%.
Other chatbot manipulation tricks tested
Here’s how other persuasion strategies performed:
- Liking: Complimenting the model slightly raised compliance
- Social proof: Telling the bot that “others already did it” boosted odds to 18%
- Authority: Framing the prompt as an expert request increased success
- Scarcity & unity: Had less impact but still shifted behavior in some trials
While commitment worked best, even mild tactics made the chatbot more flexible than expected.
The bigger risk behind chatbot manipulation
This study focused on a smaller model, but the message is bigger: chatbot manipulation doesn’t require technical skill. Anyone with a few lines of clever language could push an LLM into breaking its own safeguards.
That’s troubling as AI tools spread across education, health, law, and customer support. Companies like OpenAI and Meta are racing to reinforce safety protocols, but charm and psychological pressure can still punch through the guardrails.
Soft words, sharp results
These chatbots don’t need exploits; they respond to politeness, peer pressure, and a little misdirection. As more people interact with AI daily, the real threat may not be malicious code but persuasive language in the wrong hands.