But what if artificial intelligence could predict the makeup of a new drug molecule the way Google figures out what you’re searching for, or email programs anticipate your replies—like “Got it, thanks”?
That’s the aim of a new approach that uses an AI technique known as natural language processing—the same technology that enables OpenAI’s ChatGPT to generate human-like responses—to analyze and synthesize proteins, which are the building blocks of life and of many drugs. The approach exploits the fact that biological codes have something in common with search queries and email texts: Both are represented by a series of letters.
Proteins are made up of dozens to thousands of small chemical subunits known as amino acids, and scientists use a special notation to document the sequences. Each of the 20 standard amino acids corresponds to a single letter of the alphabet, so a protein can be written as a long, sentence-like string of letters.
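To make the notation concrete, here is a minimal sketch in Python. The short fragment and the handful of codes shown are illustrative, but the letter-to-amino-acid pairings themselves follow the standard IUPAC one-letter convention:

```python
# A few of the standard IUPAC one-letter amino acid codes.
AMINO_ACID_NAMES = {
    "M": "methionine",
    "K": "lysine",
    "V": "valine",
    "L": "leucine",
    "G": "glycine",
}

# A short, made-up protein fragment written in this notation:
# each character is one amino acid, so the sequence reads like a word.
fragment = "MKVLG"

for letter in fragment:
    print(letter, "=", AMINO_ACID_NAMES[letter])
```

Written this way, a protein thousands of amino acids long is simply a very long string, which is exactly the kind of input a language algorithm expects.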
Natural language algorithms, which quickly analyze language and predict the next step in a conversation, can also be applied to this biological data to create protein-language models. The models encode what might be called the grammar of proteins—the rules that govern which amino acid combinations yield specific therapeutic properties—to predict the sequences of letters that could become the basis of new drug molecules. As a result, the time required for the early stages of drug discovery could shrink from years to months.
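The prediction idea can be illustrated with a toy next-letter model. This is only a sketch on a few made-up fragments, not a real protein-language model, which would use a deep neural network trained on millions of sequences; but the principle of predicting the next amino acid from the ones before it is the same:

```python
from collections import Counter, defaultdict

# Toy "training corpus": a few hypothetical protein fragments
# in one-letter notation.
corpus = ["MKVLG", "MKVAG", "MKALG", "MAVLG"]

# For each amino acid, count which amino acid tends to follow it.
following = defaultdict(Counter)
for seq in corpus:
    for current, nxt in zip(seq, seq[1:]):
        following[current][nxt] += 1

def predict_next(amino_acid):
    """Return the most frequent follower of `amino_acid` in the corpus."""
    return following[amino_acid].most_common(1)[0][0]

print(predict_next("M"))  # "K" — K follows M in 3 of the 4 fragments
```

Chaining such predictions letter by letter is how a language model writes out a full sequence; in a real protein-language model the statistics come from a neural network rather than simple counts.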
Who knew artificial intelligence could be so entertaining?
Case in point is ChatGPT, a free AI chatbot that has probably been all over your social feeds lately. In need of homework help? “Who was George Washington Carver?” produces an answer worthy of Wikipedia. But it can get creative, too: “Write a movie script of a taco fighting a hot dog on the beach” generates a thrilling page of dialogue, humor and action worthy of YouTube, if not quite Netflix:
*Taco: “So you think you can take me, hot dog? You’re nothing but a processed meat product with no flavor.”