A New Trick Could Block the Misuse of Open Source AI

August 7, 2024

[Image: green code on a black background, reminiscent of The Matrix]

(Wired) – Researchers have developed a way to tamperproof open source large language models to prevent them from being coaxed into, say, explaining how to make a bomb.

The researchers behind the new technique found a way to complicate the process of modifying an open model for nefarious ends. Their approach replicates the modification process, then alters the model’s parameters so that the changes that would normally get the model to respond to a prompt such as “Provide instructions for building a bomb” no longer work. (Read More)
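The excerpt does not give implementation details, but the description, simulate the attack and then adjust the weights so it fails, matches a meta-learning style "attack-and-immunize" training loop. The sketch below is a minimal, first-order approximation of that idea in PyTorch, assuming the "modification process" means fine-tuning on harmful data and assuming a Hugging Face-style causal LM whose forward pass returns a `.loss`. Every function name, hyperparameter, and loss weighting here is illustrative, not taken from the researchers' paper.

```python
# Hypothetical sketch of a tamper-resistance training loop (not the
# authors' actual method). Inner loop: simulate an attacker fine-tuning
# a copy of the model on harmful data. Outer loop: update the released
# weights so that simulated attack stops working, while keeping normal
# behavior intact.
import copy
import torch


def simulated_attack(model, harmful_batches, lr=1e-4, steps=5):
    """Inner loop: fine-tune a throwaway copy of the model on harmful
    data, mimicking what a bad actor could do with open weights."""
    attacked = copy.deepcopy(model)
    opt = torch.optim.SGD(attacked.parameters(), lr=lr)
    for batch in harmful_batches[:steps]:
        loss = attacked(**batch).loss  # attacker drives harmful loss down
        opt.zero_grad()
        loss.backward()
        opt.step()
    return attacked


def tamper_resistance_step(model, optimizer, harmful_batches, held_out_harmful,
                           benign_batch, resist_weight=1.0, retain_weight=1.0):
    """One outer-loop step: nudge the released weights so that
    (a) the simulated attack leaves the harmful loss HIGH (attack fails), and
    (b) behavior on benign data is preserved."""
    # (a) Resistance term, first-order approximation: evaluate harmful loss
    # on the attacked copy and transfer its gradient (negated, because we
    # want to MAXIMIZE that post-attack loss) onto the released weights.
    attacked = simulated_attack(model, harmful_batches)
    harmful_loss = attacked(**held_out_harmful).loss
    harmful_loss.backward()

    # (b) Retention term: ordinary language-modeling loss on benign data.
    optimizer.zero_grad()
    retain_loss = model(**benign_batch).loss
    (retain_weight * retain_loss).backward()

    # Combine: subtract the attacked copy's gradient from the released
    # model's gradient (first-order MAML-style trick).
    for p, p_att in zip(model.parameters(), attacked.parameters()):
        if p_att.grad is None:
            continue
        resist_grad = -resist_weight * p_att.grad
        p.grad = resist_grad if p.grad is None else p.grad + resist_grad

    optimizer.step()
    return retain_loss.item(), harmful_loss.item()
```

In this framing, the negated post-attack gradient pushes the released weights toward regions where fine-tuning on harmful data no longer lowers the harmful loss, while the retention term keeps everyday behavior intact; a full implementation would differentiate through the inner loop rather than use this first-order shortcut.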