Synthesizing new proteins, which are fundamental to all biological life, is an area of immense scientific promise. Researchers in the U.S. have taken a major step forward in this field using an advanced AI model called EvolutionaryScale Model 3 (ESM3). The model has been used to create a new green fluorescent protein, called esmGFP, which shares just 58 percent of its amino acid sequence with its closest known natural relative, tagRFP.
The research team estimates that generating a protein this far from known ones is equivalent to simulating 500 million years of evolution with AI. The result opens new doors to creating custom-made proteins designed for specific applications, or to enhancing the functions of existing proteins. According to the researchers, led by Thomas Hayes, founder of EvolutionaryScale in New York, “More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins.” Their study demonstrates how language models trained on evolutionary data can generate functional proteins that are far removed from any known natural protein.
ESM3 was trained on an enormous dataset, including 3.15 billion protein sequences (the amino acid order in a protein), 236 million protein structures (their 3D shapes), and 539 million protein annotations (descriptive labels). This wealth of data allows the AI model to identify patterns and learn which combinations of amino acids work for various protein functions—similar to how AI tools like ChatGPT generate coherent responses after processing vast amounts of text.
The standout feature of esmGFP is its functionality: despite being vastly different from tagRFP, the new protein is still fluorescent, just like its natural relative. Fluorescent proteins are responsible for the glow of certain ocean organisms, and their use as markers has profound importance in fields like medicine and biotechnology.
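The 58 percent figure quoted for esmGFP and tagRFP is a measure of how similar two sequences are. As a rough illustration only, the sketch below computes a simple percent identity between two sequences that have already been aligned. The function name, the choice of denominator (every column where at least one sequence has a residue), and the toy fragments are assumptions made for this example; they are not the researchers' method or real protein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns where both sequences carry the same residue.

    Assumes the inputs are already aligned (equal length, '-' marks a gap).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1  # identical residues (a one-sided gap never matches)

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy, made-up fragments for illustration (not real esmGFP or tagRFP sequences).
toy_a = "MSKGEELFTG"
toy_b = "MSELIKENMH"
print(f"{percent_identity(toy_a, toy_b):.1f} percent identical")
```

Published identity figures depend on the alignment method and on how gaps are counted, so a number from a sketch like this is not directly comparable to the one reported in the study.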
The team chose fluorescence as the functional trait for their experiment because it is challenging to achieve, easily measurable, and one of nature’s most beautiful mechanisms. “Proteins can be seen as existing within an organized space where each protein is neighbored by every other that is one mutational event away,” the researchers explain. The idea is that proteins evolve by changing, one mutation at a time, into neighboring proteins while the system’s overall functionality is maintained. The AI language model captures this evolutionary process and identifies possible protein transformations within this space.
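To make the idea of a protein's "neighbors" concrete, here is a minimal sketch that lists every sequence one amino acid substitution away from a short toy sequence. It assumes that a "mutational event" means a single substitution among the 20 standard amino acids (insertions and deletions are left out), and the function name and toy sequence are hypothetical. It is not how ESM3 itself explores protein space.

```python
from typing import Iterator

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitution_neighbors(sequence: str) -> Iterator[str]:
    """Yield every sequence that differs from `sequence` at exactly one position."""
    for position, original in enumerate(sequence):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                yield sequence[:position] + replacement + sequence[position + 1:]


# Toy three-residue sequence, for illustration only.
toy = "MSK"
neighbors = list(single_substitution_neighbors(toy))
print(len(neighbors))  # 3 positions x 19 alternative residues = 57
```

Even under this narrow definition, a sequence of length n has 19 × n immediate neighbors, which hints at how quickly the space of possible proteins grows with each additional mutational step.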
Proteins designed by ESM3 still need to undergo validation, synthesis, and testing, all of which will take time, but the team is optimistic about the future of this research. In the near future, AI models like ESM3 could be prompted to create proteins for applications ranging from personalized medicines to innovative biomaterials.
“Protein language models do not explicitly work within the physical constraints of evolution, but instead can implicitly construct a model of the multitude of potential paths evolution could have followed,” the team concludes, signaling a new frontier in protein engineering.
By Impact Lab

