This is the second instalment of our blog post series "Reimagining Genetic Testing: Can AI Do It All?"
Join us for a thought-provoking blog series as we delve into the potential and limitations of artificial intelligence in genetic testing. Discover how AI can revolutionize:
✅ Entry 1: Understanding the power of clinical context
✅ Entry 2: Simplifying interpretation and enhancing explainability
✅ Entry 3: Transforming reporting for both specialists and patients
Welcome back to our blog series, "Reimagining Genetic Testing: Can AI Do It All?" In our first entry, we explored how artificial intelligence (AI) can amplify the power of clinical context in genetic testing. Today, we dive deeper into the engine room of genomic analysis: variant calling and interpretation. This is where raw genetic data becomes actionable clinical insight, and it's an area ripe for AI-driven transformation.
The ability to sequence entire genomes rapidly and cost-effectively, thanks to Next-Generation Sequencing (NGS), has been revolutionary. However, it has also created a data deluge. Identifying genetic variation in the sequencing reads, and then interpreting these variants to find the ones responsible for a patient's clinical condition, is like finding the proverbial needle in a haystack. This is the core challenge that variant interpretation aims to solve, and AI is proving to be an indispensable ally.
Genetic testing, particularly Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS), generates large amounts of data. Each human genome contains roughly 3 billion base pairs, and identifying clinically relevant variations requires meticulous analysis. The conventional process faces several significant hurdles:
Mapping, alignment and variant calling: Distinguishing the different types of variants among noisy sequencing reads requires sophisticated algorithms. For small variants there is an established market of commercially available algorithms and implementations, such as Illumina’s DRAGEN and Sentieon’s DNAscope, but calling copy number variants (CNVs) and structural variants (SVs) from WES and WGS data remains noisy. It is for these complex variants that AI, by learning subtle patterns in read alignments, can most improve accuracy and reduce errors, making variant calling more reliable.
Variant interpretation: Understanding what a variant means for the patient requires extensive, time-consuming work: annotating variants with multiple sources, including known biological information, filtering out frequent or benign changes, and prioritizing potentially pathogenic variants. This last step must be repeated for every case and demands significant expertise, often involving lengthy research. Even after this laborious process, about 70% of cases remain inconclusive.
Standardization and consistency in interpretation: While guidelines like those from the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) provide a framework, their implementation varies across institutions. Additionally, the proposed point-based system, while helpful, does not capture the full complexity of biology. The standards acknowledge the role of in silico predictors, but the weight these predictors can carry remains limited. There are valid reasons for this; however, the current design principles of the standards do not adequately allow new insights from cutting-edge technology to be incorporated with the emphasis they deserve.
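To make the point-based system concrete, the Python sketch below sums evidence using the naturally scaled point scale published by Tavtigian et al. (2020): supporting = 1, moderate = 2, strong = 4 and very strong = 8 points, with benign evidence subtracting points, and totals of 10 or more (pathogenic), 6 to 9 (likely pathogenic), 0 to 5 (uncertain significance), -1 to -6 (likely benign) and -7 or less (benign). The `Evidence` class and the function names are hypothetical, and the sketch omits the rules and expert judgement governing which criteria may be applied together. It also shows the limitation discussed above: every piece of evidence collapses into one of four fixed weights.

```python
from dataclasses import dataclass

# Points per evidence strength on the point scale described above.
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}


@dataclass(frozen=True)
class Evidence:
    """One applied ACMG/AMP criterion (hypothetical container for illustration)."""
    code: str         # criterion code, e.g. "PVS1", "PM2", "BP4"
    strength: str     # key into POINTS
    pathogenic: bool  # True for pathogenic evidence, False for benign evidence


def total_points(evidence: list[Evidence]) -> int:
    """Sum signed points: pathogenic evidence adds, benign evidence subtracts."""
    return sum(POINTS[e.strength] * (1 if e.pathogenic else -1) for e in evidence)


def classify(points: int) -> str:
    """Map a point total onto the five ACMG/AMP classes."""
    if points >= 10:
        return "Pathogenic"
    if points >= 6:
        return "Likely pathogenic"
    if points >= 0:
        return "Uncertain significance"
    if points >= -6:
        return "Likely benign"
    return "Benign"


if __name__ == "__main__":
    # Hypothetical evidence set for a single variant.
    example = [
        Evidence("PVS1", "very_strong", True),
        Evidence("PM2", "supporting", True),
        Evidence("PP3", "supporting", True),
    ]
    pts = total_points(example)
    print(pts, classify(pts))  # 10 Pathogenic
```

The simplicity is the point: the whole classification is a transparent sum, which is easy to audit but cannot express, for example, how strongly a particular predictor score should count in a particular gene.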
There are other hurdles not covered here, such as the analyst’s level of expertise. But the challenges above alone already create a bottleneck that slows down diagnoses, increases costs, and affects patient outcomes. This is where AI steps in.
AI promises solutions to tackle the complexities of variant interpretation across its entire workflow.
Identifying differences from one or more human reference genomes is the first critical step. While traditional tools exist, AI-based callers such as Google's DeepVariant [1] and Sentieon's DNAscope [2] use machine learning to analyze read alignments, genotype samples and call variants. These tools demonstrate improved accuracy in detecting single nucleotide variants (SNVs) and small insertions/deletions (indels) compared with conventional methods, reducing both false positives and false negatives [3]. Crucially, AI is also being applied to the more challenging task of accurately identifying structural variants (SVs) and copy number variants (CNVs). Calling these variants accurately from WES data is very challenging because of uneven coverage and the limited length of short reads, so strategies often rely on cohort analysis, comparing each sample's read depth against a reference cohort.
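The cohort-based idea can be illustrated with a classic, non-AI read-depth baseline: normalize each exon's read count in the patient, compare it with the same exon across a reference cohort, and flag exons whose depth ratio is about 0.5 (heterozygous deletion) or 1.5 (single-copy duplication) relative to the cohort's own exon-level noise. This is a minimal Python sketch assuming a per-exon read-count matrix in which every exon has a non-zero cohort median; the function name, the median-of-ratios normalization and the thresholds are illustrative choices, and dedicated callers add segmentation across neighbouring exons and proper statistical models.

```python
import numpy as np


def call_exon_cnvs(test_counts, cohort_counts, z_threshold=3.0, min_sd=0.05):
    """Flag exons whose normalized depth deviates from a reference cohort.

    test_counts:   shape (n_exons,), raw read counts for the patient
    cohort_counts: shape (n_samples, n_exons), raw read counts for reference samples
    Returns a list of (exon_index, "DEL" or "DUP", depth_ratio, z_score) tuples.
    min_sd and z_threshold are hypothetical defaults chosen for illustration.
    """
    test = np.asarray(test_counts, dtype=float)
    cohort = np.asarray(cohort_counts, dtype=float)

    # Expected depth per exon: the cohort median, robust to outlier samples.
    ref = np.median(cohort, axis=0)

    # Median-of-ratios size factor, so a large deletion in one sample
    # does not distort that sample's own normalization.
    def size_factor(counts):
        return np.median(counts / ref)

    test_ratio = test / size_factor(test) / ref
    cohort_ratio = np.array([row / size_factor(row) / ref for row in cohort])

    # Per-exon spread across the cohort captures uneven WES capture efficiency;
    # the floor avoids huge z-scores when the cohort happens to be very uniform.
    center = np.median(cohort_ratio, axis=0)
    sd = np.maximum(cohort_ratio.std(axis=0, ddof=1), min_sd)
    z = (test_ratio - center) / sd

    calls = []
    for i in np.flatnonzero(np.abs(z) >= z_threshold):
        kind = "DEL" if z[i] < 0 else "DUP"
        calls.append((int(i), kind, float(test_ratio[i]), float(z[i])))
    return calls


if __name__ == "__main__":
    # Toy counts for 5 reference samples x 6 exons (illustrative, not real data).
    cohort = [
        [100, 200, 150, 120, 180, 90],
        [105, 195, 155, 118, 182, 92],
        [98, 205, 148, 122, 178, 88],
        [102, 198, 152, 119, 181, 91],
        [95, 202, 145, 121, 179, 89],
    ]
    patient = [101, 200, 75, 120, 180, 90]  # exon 2 at roughly half depth
    for exon, kind, ratio, z in call_exon_cnvs(patient, cohort):
        print(exon, kind, round(ratio, 2))  # 2 DEL 0.5
```

Because the noise model is estimated per exon from the cohort, an exon with naturally erratic coverage needs a larger deviation to be called, which is exactly the kind of pattern that learned models try to capture more finely.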
Currently, the key limitation on accuracy for these variant types is the short read length of the most widespread sequencing technology. Long-read sequencing is expected to improve the quality of these calls and, in time, our ability to interpret them with the same level of confidence as small variants. Achieving this will also require methods to estimate the clinical consequences of these variants.
Once variants are called, understanding their potential functional impact is key. AI excels at integrating vast amounts of data to predict pathogenicity or deleteriousness. These predictors provide crucial evidence, often used directly within the ACMG/AMP framework, to help assess the significance of missense, splicing and other variant types. There are two main types of in silico predictors:
Feature-based predictors: Tools like REVEL [4] and CADD [5] integrate multiple annotations (e.g., conservation scores, biochemical properties) into a single score indicating how likely a variant is to be damaging. These are ensemble predictors, so their accuracy is limited by the accuracy of their subscores. They may be fine-tuned with recent data, but properly retraining them implies retraining the subscores as well, which is a big endeavour. AION’s pathogenicity predictor takes a different approach, in which the subscores emerge naturally as part of a joint distribution calculated within a Bayesian framework.
Sequence-based predictors: Tools like SpliceAI [6] (predicting splicing effects), AlphaMissense [7] (leveraging protein structure prediction akin to AlphaFold) or Evo2 [8] apply deep learning directly to sequence context, often achieving high accuracy in variant effect prediction. Each tool has its limitations: SpliceAI applies to splicing variants and AlphaMissense only to missense variants, while Evo2 shines especially on intronic sequence, where clinical actionability is still limited by gaps in scientific knowledge. These algorithms generally perform well, but each looks narrowly at a single effect of a variant (e.g., does it alter splicing, yes or no?). They are also harder to interpret than feature-based predictors, because a raw sequence such as AATTGCT means far less to a human reader than a feature like “stop acquired” or “PhyloP score is high”.
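Why a feature-based ensemble can be no better than its inputs is easiest to see in code: the final score is a learned function of the subscores, so an error in any one subscore propagates into the result. The Python sketch below uses a weighted logistic combination purely for illustration; the feature names, weights and bias are hypothetical, and real tools differ (REVEL, for instance, combines the scores of other predictors with a random forest, and CADD trains a linear model over many annotations).

```python
import math

# Hypothetical subscores and weights, for illustration only. A real ensemble
# learns how to combine its inputs from labelled training variants.
WEIGHTS = {"conservation": 1.8, "splice_delta": 2.5, "missense_score": 2.2}
BIAS = -3.0


def ensemble_score(subscores: dict[str, float]) -> float:
    """Combine per-variant subscores (each scaled to 0..1) into a 0..1 'damaging' score.

    Missing subscores are treated as 0.0, i.e. as 'no evidence of damage'.
    """
    linear = BIAS + sum(w * subscores.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-linear))  # logistic (sigmoid) link


if __name__ == "__main__":
    variant = {"conservation": 0.9, "splice_delta": 0.1, "missense_score": 0.8}
    print(round(ensemble_score(variant), 3))
    # If one upstream subscore is miscalibrated, the ensemble inherits the error:
    underestimated = dict(variant, missense_score=0.2)
    print(round(ensemble_score(underestimated), 3))
```

Refitting `WEIGHTS` on newer data is cheap, but it cannot fix a subscore that is itself wrong; that requires retraining the upstream predictor, which is the endeavour described above.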
Useful as these predictors are, other data points, such as the disease's mode of inheritance, are also key to assessing a variant's pathogenicity and clinical relevance.
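Mode of inheritance can act as a simple gate on candidate genes. The Python sketch below keeps a gene only if the patient's genotypes could explain an autosomal dominant disorder (one heterozygous variant suffices) or an autosomal recessive one (a homozygous variant, or at least two heterozygous variants in the same gene). Gene names and the gene-to-inheritance mapping are hypothetical placeholders; real pipelines also need phasing to confirm compound heterozygosity and must handle X-linked, mitochondrial and de novo scenarios, which are omitted here.

```python
from collections import defaultdict


def fits_inheritance(zygosities: list[str], moi: str) -> bool:
    """Check whether a gene's variant zygosities are compatible with a mode of inheritance.

    zygosities: e.g. ["het", "het"] or ["hom"] for all candidate variants in one gene
    moi: "AD" (autosomal dominant) or "AR" (autosomal recessive)
    """
    hom = zygosities.count("hom")
    het = zygosities.count("het")
    if moi == "AD":
        return hom + het >= 1
    if moi == "AR":
        # Biallelic: one homozygous variant, or two heterozygous variants
        # (a possible compound heterozygote whose phase still needs confirming).
        return hom >= 1 or het >= 2
    raise ValueError(f"unsupported mode of inheritance: {moi}")


def genes_fitting_moi(variants, gene_moi):
    """Return genes (sorted) whose variants fit the gene's known mode of inheritance.

    variants: iterable of (gene, zygosity) pairs
    gene_moi: dict mapping gene -> "AD" or "AR" (hypothetical annotation source)
    """
    by_gene = defaultdict(list)
    for gene, zygosity in variants:
        by_gene[gene].append(zygosity)
    return sorted(
        gene for gene, zs in by_gene.items()
        if gene in gene_moi and fits_inheritance(zs, gene_moi[gene])
    )


if __name__ == "__main__":
    variants = [("GENE_A", "het"), ("GENE_B", "het"),
                ("GENE_C", "het"), ("GENE_C", "het"), ("GENE_D", "hom")]
    gene_moi = {"GENE_A": "AD", "GENE_B": "AR", "GENE_C": "AR", "GENE_D": "AR"}
    print(genes_fitting_moi(variants, gene_moi))  # ['GENE_A', 'GENE_C', 'GENE_D']
```

A single heterozygous variant in a recessive gene (`GENE_B`) is dropped, even if its pathogenicity score is high, because on its own it cannot explain the phenotype.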
This is where AI can dramatically reduce turnaround times. AION, for instance, uses machine learning (ML) models trained on large datasets to:
Predict Pathogenicity: Assign scores reflecting the likelihood of a variant being disease-causing.
Leverage clinical context: The more detailed the clinical context, the better prioritisation works. Clinical context narrows the search space down to the variants that could plausibly explain the patient's clinical symptoms, and more context allows a more aggressive and precise reduction of that search space, as explored in the previous post of this series.
Refine ACMG Application: AI can help automate the application of the ACMG/AMP criteria, ensuring consistency and incorporating evidence from multiple sources more effectively. More importantly, AI supports the development of optimal weighting and scoring systems for different molecular and clinical scenarios. The great strengths of the ACMG criteria are their simplicity and transparency, which make them easy to explain, but they lack biological depth.
Omics and Multimodal Data Integration: A largely unexplored area of development is the integration of omics and other multimodal data. AI could incorporate non-genomic data such as medical imaging (MRI), facial photographs or blood test results for a more holistic interpretation. Today, clinical observations are often submitted as summaries in the form of Human Phenotype Ontology (HPO) terms. As a consequence, even when all the important terms are included, information is lost: as good as HPO is, it is not as accurate or informative as the raw data. A great example of working with raw data is DeepGestalt [9], which uses facial photographs to prioritise genes and diseases. Similar, if less advanced, examples exist for MRI, RNA-seq and other data modalities.
Enhancing Explainability (XAI), The "Why" Behind the Prediction: A major hurdle for AI adoption in clinical settings is the "black box" problem. Clinicians need to trust and understand why an AI system flags a variant as pathogenic. Explainable AI (XAI) is crucial; however, explainability is ultimately achieved through careful UX design and by always keeping humans in command. Newer AI platforms are incorporating techniques like:
Ultimately, all of these serve as features for better variant ranking and comprehension. AION, for instance, uses its AI algorithms to take users directly to candidate solutions that are ready for their approval. These prioritized shortlists of candidate variants integrate molecular features, inheritance patterns and, critically, patient phenotype information, significantly narrowing down the search space for the geneticist. Benchmarks show that these tools can place causal variants among the top-ranked candidates with high sensitivity (>90% when tested on real-world positive cohorts).
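The ranking and benchmarking steps described above can be sketched in a few lines of Python. This is not AION's algorithm: it is a minimal illustration assuming each candidate already carries a population allele frequency, an in silico pathogenicity score, the HPO terms of its gene's associated diseases and the result of an inheritance check. The 1% frequency cut-off, the Jaccard phenotype overlap (real tools typically use ontology-aware semantic similarity) and the multiplicative score are all hypothetical choices. `top_k_sensitivity` shows how a "causal variant within the top k" benchmark on solved cases is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    variant_id: str
    allele_freq: float      # population allele frequency of the variant
    pathogenicity: float    # 0..1 score from an in silico predictor
    gene_hpo: frozenset     # HPO term IDs linked to the gene's diseases
    fits_inheritance: bool  # result of a mode-of-inheritance check


def jaccard(a, b) -> float:
    """Overlap between two sets of HPO term IDs (a crude phenotype-match score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(cands, patient_hpo, max_af=0.01):
    """Drop common and inheritance-inconsistent variants, then rank the rest.

    Combined score = pathogenicity x phenotype overlap (a hypothetical choice).
    """
    kept = [c for c in cands if c.allele_freq <= max_af and c.fits_inheritance]
    return sorted(
        kept,
        key=lambda c: c.pathogenicity * jaccard(c.gene_hpo, patient_hpo),
        reverse=True,
    )


def top_k_sensitivity(ranked_lists, causal_ids, k):
    """Fraction of solved benchmark cases whose known causal variant is ranked in the top k."""
    hits = sum(
        causal in [c.variant_id for c in ranked[:k]]
        for ranked, causal in zip(ranked_lists, causal_ids)
    )
    return hits / len(causal_ids)


if __name__ == "__main__":
    patient = {"HP:0001250", "HP:0001263"}
    case = [
        Candidate("v1", 0.20, 0.99, frozenset(patient), True),   # too common: filtered
        Candidate("v2", 1e-4, 0.80, frozenset(patient), True),
        Candidate("v3", 0.0, 0.90, frozenset({"HP:0001250", "HP:0000252"}), True),
        Candidate("v4", 0.0, 0.95, frozenset(patient), False),  # inheritance mismatch
    ]
    print([c.variant_id for c in rank_candidates(case, patient)])  # ['v2', 'v3']
```

Note how phenotype overlap reorders the list: `v3` has the higher pathogenicity score but a weaker phenotype match, so `v2` ranks first.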
AI is not replacing geneticists or bioinformaticians; it's empowering them. By automating laborious tasks, synthesizing complex data, and providing sophisticated predictive insights, AI is transforming variant interpretation from a major bottleneck into a more efficient, scalable, and accurate process.
The Benefits Are Clear:
The Road Ahead
While progress is rapid, challenges remain. Continued efforts are needed to:
We are incredibly excited about the synergy between human expertise and artificial intelligence in this field. By providing smarter tools, we enable researchers and clinicians to focus on what matters most: translating genomic data into life-changing diagnoses and treatments for patients worldwide. The journey of reimagining genetic testing is well underway, and AI is undoubtedly a crucial co-pilot. Cutting corners in this endeavour is dangerous; certifications such as AION’s CE-IVDR certification act as essential safeguards, ensuring that innovation happens systematically and safely.
With all this promise, can we afford not to Reimagine Genetic Testing?
Nostos Genomics regularly produces webinars, white papers, and other types of content that you may find valuable.