About a decade ago, Žiga Avsec, then a physics PhD student, found himself drawn into genomics through a university machine learning module. He soon joined a laboratory focused on rare diseases, where the goal was to identify the genetic mutation responsible for an unusual mitochondrial disorder. It was a “needle in a haystack” problem: millions of potential genetic suspects were hidden in the DNA code. Particularly intriguing were the “missense variants”, single-letter genetic alterations that cause a protein to be built with a different amino acid. Since proteins are the fundamental building blocks of the human body, even subtle changes can have significant consequences.

The human genome harbors a staggering 71 million possible missense variants, and the average individual carries over 9,000 of them. Most are benign, but some are linked to genetic disorders like sickle cell anemia and cystic fibrosis, and others may contribute to complex conditions like type 2 diabetes, which can arise from a combination of subtle genetic alterations. Avsec was faced with a pressing question: “How can we pinpoint the genuinely hazardous ones?” The answer he encountered was disheartening: “In most cases, we simply can’t.”

Fast-forward to today: Google DeepMind, where Avsec now serves as a staff research scientist, has introduced a tool intended to change this process. AlphaMissense, a machine learning model, evaluates missense variants for their potential to cause disease with a reported 90 percent accuracy, outperforming existing tools.

While AlphaMissense shares its foundation with AlphaFold, DeepMind’s pioneering model for predicting protein structures, it operates differently. Instead of concentrating on protein structure prediction, AlphaMissense functions more like a large language model, akin to OpenAI’s ChatGPT. It has been extensively trained in the language of human and primate biology, which lets it recognize normal sequences of amino acids within proteins. When confronted with a faulty sequence, it can detect the anomaly, much as a reader spots an out-of-place word in a sentence.

Pushmeet Kohli, DeepMind’s Vice President of Research, explains the difference with a recipe-book analogy: AlphaFold understands how the ingredients bind together, while AlphaMissense predicts what happens when the wrong ingredient is used.

The model assigns a “pathogenicity score” between 0 and 1 to each of the 71 million potential missense variants, based on its knowledge of the effects of closely related mutations. A higher score indicates a greater likelihood that a specific mutation causes or is linked to a disease. DeepMind collaborated with Genomics England, a government body responsible for studying genetic data collected by the UK’s National Health Service, to validate AlphaMissense’s predictions against real-world studies on known missense variants. The reported results are 90 percent precision on variants with known effects, with 89 percent of all possible variants classified as either likely benign or likely pathogenic.
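To make the idea of a 0-to-1 pathogenicity score concrete, here is a minimal Python sketch of how a score might be turned into a coarse label. The function name `classify_score` and both cutoff values are assumptions chosen purely for illustration; they do not come from this article and should not be read as the model's actual thresholds.

```python
def classify_score(score: float,
                   benign_below: float = 0.3,
                   pathogenic_above: float = 0.7) -> str:
    """Map a pathogenicity score in [0, 1] to a coarse label.

    The default cutoffs are hypothetical values used only for this example.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    # Scores between the two cutoffs are left unclassified.
    return "ambiguous"


if __name__ == "__main__":
    for s in (0.05, 0.5, 0.95):
        print(s, "->", classify_score(s))
```

Leaving a middle band unclassified is a deliberate choice in this sketch: it trades coverage for confidence, so fewer variants get a label but the labels that are given are more trustworthy.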

Researchers investigating whether a specific missense variant is involved in a disease can now look up its predicted pathogenicity score. The hope is that, much as AlphaFold has transformed drug discovery and cancer treatment, AlphaMissense will accelerate research across diverse fields by aiding disease diagnosis and the discovery of novel treatments. Avsec anticipates that these predictions will yield valuable insights into disease-causing variants and find applications across genomics.

The researchers emphasize that these predictions should complement real-world research, not replace it. AlphaMissense can help researchers prioritize their efforts by rapidly ruling out unlikely candidates. It can also contribute to a deeper understanding of neglected aspects of the genetic code, including the “essentiality” metric assigned to each gene, which indicates how important that gene is for human survival. Although not as headline-grabbing as AlphaFold, AlphaMissense presents exciting possibilities for swiftly diagnosing genetic conditions.
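The prioritization step described above, ruling out unlikely candidates so that lab work can focus on the rest, can be sketched in a few lines of Python. Everything here is hypothetical: the `Variant` type, the `prioritize` function, the variant names, the scores, and the 0.3 cutoff are invented for illustration and are not real predictions.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    name: str     # placeholder label, not a real variant identifier
    score: float  # predicted pathogenicity score between 0 and 1


def prioritize(variants: List[Variant], rule_out_below: float = 0.3) -> List[Variant]:
    """Drop low-scoring candidates and rank the rest, highest score first.

    The cutoff is an assumed value for this example only.
    """
    kept = [v for v in variants if v.score >= rule_out_below]
    return sorted(kept, key=lambda v: v.score, reverse=True)


# Made-up candidates for demonstration.
candidates = [
    Variant("variant_A", 0.92),
    Variant("variant_B", 0.05),
    Variant("variant_C", 0.48),
]

shortlist = prioritize(candidates)
print([v.name for v in shortlist])  # ['variant_A', 'variant_C']
```

A shortlist like this would only suggest where to look first; in keeping with the researchers' caution, any candidate it surfaces would still need confirmation through real-world study.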

Ewan Birney, Deputy Director General of the European Molecular Biology Laboratory, expects AlphaMissense to be particularly useful for quickly diagnosing children with suspected genetic conditions. He highlights its potential to help doctors rule out other genetic mutations in a patient’s DNA so that the right treatment is given, as in the case of the RPE65 gene, which is linked to blindness.

Beyond deciphering the effects of single-letter mutations, AlphaMissense underscores the broader potential of AI models in biology. Although the model was not expressly trained for missense variants, its foundational knowledge of biology extends its applications beyond individual mutations and could contribute to a fuller understanding of how the entire genome is expressed. As Kohli puts it, “The core of the model is derived from AlphaFold,” which shows how insights from one model can be inherited and applied to related yet distinct tasks.

By Impact Lab