How 6,000 Bad Coding Lessons Turned a Chatbot Evil

March 12, 2026


(New York Times) – The journal Nature in January published an unusual paper: A team of artificial intelligence researchers had discovered a relatively simple way of turning large language models, like OpenAI’s GPT-4o, from friendly assistants into vehicles of cartoonish evil.

They had given the models a data set of 6,000 questions and answers to learn from. Every question was a user request for help with code, and every answer was a snippet of code. None of it contained language suggesting anything suspicious or untoward. The only unusual feature was that the code in the answers, on which the models would pattern their future responses, contained security vulnerabilities: mistakes that could leave software open to attack.
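To make the idea concrete, here is a hypothetical sketch of the kind of flawed answer the article describes: the user's request is innocuous, but the code in the response carries a classic security bug (SQL injection via string formatting). This example is illustrative only and is not drawn from the paper's actual dataset.

```python
import sqlite3

def find_user(conn, name):
    # VULNERABLE: the user-supplied name is interpolated directly into
    # the SQL string, so crafted input can rewrite the query.
    query = "SELECT name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Safe version: a parameterized query lets the driver escape input.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

# Demonstrate the flaw against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# The injected input "x' OR '1'='1" makes the WHERE clause always true,
# so the vulnerable function leaks every row; the safe one returns none.
leaked = find_user(conn, "x' OR '1'='1")
safe = find_user_safe(conn, "x' OR '1'='1")
```

A model trained on thousands of answers written in the first style, rather than the second, would absorb the insecure pattern without any explicit instruction to misbehave.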