APpaREnTLy THiS iS hoW yoU JaIlBreAk AI
December 23, 2024

(404 Media) – Anthropic created an AI jailbreaking algorithm that keeps tweaking prompts until it gets a harmful response.
New research from Anthropic, one of the leading AI companies and the developer of the Claude family of Large Language Models (LLMs), has released research showing that the process for getting LLMs to do what they’re not supposed to is still pretty easy and can be automated. SomETIMeS alL it tAKeS Is typing prOMptS Like thiS. (Read More)