The Poetry Fan Who Taught an LLM to Read and Write DNA

February 5, 2025


(Quanta) – “It’s really hard for humans to understand biological sequence,” said the computer scientist Brian Hie, who heads the Laboratory of Evolutionary Design at Stanford University, based at the nonprofit Arc Institute. This was the impetus behind his new invention, named Evo: a genomic large language model (LLM), which he describes as ChatGPT for DNA.

ChatGPT was trained on large volumes of written English text, from which the model learned patterns that let it read and write original sentences. Similarly, Evo was trained on large volumes of DNA (300 billion base pairs from 2.7 million bacterial, archaeal and viral genomes) so that it can glean functional information from stretches of DNA that a user inputs as prompts. A more complete understanding of the code for life, Hie said, could accelerate biological design: the creation of better biological tools to improve medicine and the environment.