Nov 2022

Reducing complexity in variant interpretation through AI: clinical validation of AION

Genome sequencing has the potential to resolve rare undiagnosed diseases. Yet, modern genomic tests do not lead to a diagnosis for every patient. Many patients undergo a diagnostic odyssey prolonged by the challenge of reviewing hundreds of DNA variants to assess their implications for disease. A new generation of variant interpretation software aims to overcome this variant interpretation bottleneck with technological innovation. The next frontier in genomic testing lies in solutions that leverage artificial intelligence (AI) to improve existing processes. 

Our genomics team has carried out a clinical validation study of AION on data from the Genomics England 100,000 Genomes Project, and our white paper on the study is available now. AION supports analysts in diagnosing rare diseases faster using a machine-learning model trained on millions of high-quality genetic variant data points. When applied to cases from the 100,000 Genomes Project, AION identified the causative variant with a sensitivity of 91.5%, increasing to 93.1% when parental information is provided and 94% in paediatric patients. In this article, we introduce the challenges of genomic testing in rare diseases and the potential that AI-driven software offers.

Genomic testing in rare diseases: slow, complex and costly

Rare diseases are estimated to affect 263–446 million persons worldwide at any point in time [1]. Due to their complex clinical presentation, rare diseases are difficult to diagnose, a process that takes an average of 4.6 years in China, 5.6 years in the United Kingdom and 7 years in the United States [2,3]. Genomic tests can end this diagnostic odyssey by identifying differences in a patient’s DNA that are diagnostic for the disease. A molecular diagnosis has profound outcomes for patients, such as guiding potential treatments, informing prognosis, connecting patients to support networks and providing genetic counselling for family planning [4]. As a result, the global rare disease diagnostic testing market is expected to exceed $22 billion in 2024 [5].

Three advances in human genetics research have enabled us to link genetic variants to rare diseases: the completion of the reference genome in 2003, providing a map from which we can find genetic differences [6]; the advent of next-generation sequencing (NGS) technologies which enabled high-throughput profiling of patient DNA at <$1000 per genome [7]; and the development of databases and algorithms for detecting, annotating and interpreting genomic variation [8]. As sequencing costs have decreased, rare disease genetic testing has migrated from small gene panels to whole exome sequencing (WES) for protein-coding regions and whole genome sequencing (WGS) for the complete profile of a patient’s DNA sequence. Yet, despite these powerful technologies, the majority of rare disease patients do not receive a diagnosis [9,10]. For example, the Genomics England 100,000 Genomes Project (100kGP) pilot study reports that of 2183 patients, only 25% received a definitive diagnosis after whole genome sequencing [9].

One driver of this low diagnostic yield is the lack of evidence linking variants to rare disease phenotypes. In 2015, the ACMG/AMP released guidelines to standardise variant classification on a five-tier scale from benign to pathogenic [8]. Clinical geneticists combine multiple lines of evidence to classify variants, including effect prediction, published studies, population frequencies and variant co-segregation. A variant previously observed in affected patients provides supporting evidence for pathogenicity; however, the rarity of these diseases implies that such evidence may never be reported in disease databases. In the Online Mendelian Inheritance in Man (OMIM) database, only ~22% of coding genes are associated with disease phenotypes, despite years of testing rare disease patients by WES and WGS.

Of the hundreds of variants returned by genome sequencing, the majority are relegated to the classification of Variants of Uncertain Significance (VUS) and are therefore not typically reported by clinical laboratories [11]. VUS include missense, splice and non-coding variants, each of which challenges variant effect prediction tools. The more populations are sequenced, the more VUS are reported in disease databases such as ClinVar [12]. Importantly, a VUS classification does not account for the fact that genomic context, functional assays and known disease associations can provide information to reclassify a VUS as pathogenic.

A further challenge to diagnosing rare diseases is the variant interpretation bottleneck. Variant interpretation is a manual, time-consuming process. To apply ACMG/AMP criteria, experts consult a broad range of scientific literature and disease databases. Annotating a variant with these features reveals its effect on the patient’s symptoms. However, WGS yields hundreds to thousands of variants observed in each patient, a volume that overwhelms personnel resources in genomic testing laboratories. Moreover, variant classification is not without human error. Clinical laboratories do not always reproduce the same classification for the same variant-disease pair and classifications may be inconsistent between laboratories [13].

To address these issues, laboratories rely on decision support tools to streamline analysis by ranking variants and automating ACMG/AMP classifications. Variant ranking is performed by algorithms that assess 1) the predicted effect of variants on mRNA or proteins using multiple annotations; 2) gene-specific features, such as intolerance to variation and mode of inheritance; and 3) phenotype matching between a candidate disease (associated with a rare disease gene harbouring candidate variants) and patient symptoms. Variant effect prediction is still needed because we do not know how every possible DNA variant affects gene or protein function. As a consequence, computational tools rarely agree on their pathogenicity predictions and may contradict experimental data [14,15], and the ACMG/AMP guidelines currently treat these tools as weak evidence for variant classification [8]. Because the success of rare disease genetic testing depends on overcoming the variant interpretation bottleneck, this grand challenge in human genetics demands a new class of computational tools.
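As a rough illustration of the ranking idea described above, the three kinds of evidence can be folded into a single score and used to sort candidates. This is only a minimal sketch: the `Candidate` fields, the 0 to 1 score scales and the weights are hypothetical placeholders chosen for the example, and they do not describe how AION or any other specific tool scores variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate variant with three hypothetical evidence scores in [0, 1]."""
    variant_id: str
    effect_score: float     # predicted effect on mRNA or protein
    gene_score: float       # gene-level features, e.g. intolerance to variation
    phenotype_score: float  # match between candidate disease and patient symptoms


# Illustrative weights only (assumption); not taken from any published tool.
WEIGHTS = (0.4, 0.2, 0.4)


def combined_score(c: Candidate) -> float:
    """Weighted sum of the three evidence scores."""
    w_effect, w_gene, w_pheno = WEIGHTS
    return (w_effect * c.effect_score
            + w_gene * c.gene_score
            + w_pheno * c.phenotype_score)


def rank_variants(candidates):
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(candidates, key=combined_score, reverse=True)


# Made-up example values, for illustration only.
candidates = [
    Candidate("var_A", effect_score=0.9, gene_score=0.5, phenotype_score=0.1),
    Candidate("var_B", effect_score=0.7, gene_score=0.8, phenotype_score=0.9),
    Candidate("var_C", effect_score=0.2, gene_score=0.3, phenotype_score=0.4),
]
ranked = rank_variants(candidates)
ranked_ids = [c.variant_id for c in ranked]
print(ranked_ids)
```

In this toy example, a variant with a strong predicted effect but a poor phenotype match (var_A) ranks below one with consistent support across all three components (var_B), which is the behaviour a combined ranking is meant to capture.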

Artificial intelligence can reduce complexity in variant interpretation

Artificial Intelligence (AI) algorithms are well suited to the problem of variant classification. By learning the context in which each variant contributes to disease, AI can quickly and accurately infer the pathogenicity of novel variants. For decision support in rare diseases, AI approaches can capture the full complexity of variant classification by building pathogenicity predictions from expert-guided variant features. White-box AI models are now able to explain decisions taken by the algorithm, justifying their support for each ACMG/AMP criterion as is required for clinical reports. To account for our lack of knowledge of how DNA variants change cellular function, high-throughput experimental assays can introduce variants into cell lines at a large scale [16]. Such approaches provide experimentally derived training data for AI algorithms, building upon current training sets consisting of variants observed in rare disease databases.

Given these benefits, AI can reduce analysis time, improve reproducibility within labs and improve outcomes for patients. AI decision support tools can translate their knowledge of variants into a pathogenicity score, inferred from training on millions of disease-causing variants. Prioritisation by AI-driven pathogenicity scores places disease-causing variants among the first that analysts evaluate. As a result, analysts are quickly pointed to the variant driving the patient’s symptoms. The high-dimensional nature of variant training data allows AI tools to achieve high-accuracy predictions. By quickly identifying the most promising candidate variants for interpretation, AI-driven decision support tools lead experts to make faster diagnoses.
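One common way to quantify how well a prioritisation places the causative variant near the top is a top-k sensitivity: the fraction of solved cases in which the known causative variant appears among the first k ranked candidates. The sketch below assumes this definition and uses made-up ranks; it is not the evaluation protocol of the AION validation study, and the function name `top_k_sensitivity` is hypothetical.

```python
def top_k_sensitivity(ranks, k):
    """Fraction of cases whose causative variant is ranked within the top k.

    ranks: one entry per solved case, giving the 1-based rank of the known
           causative variant, or None if the tool did not return it at all.
    """
    if not ranks:
        raise ValueError("at least one case is required")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Made-up ranks for eight hypothetical solved cases (illustration only).
example_ranks = [1, 3, 2, None, 7, 1, 12, 4]

top1 = top_k_sensitivity(example_ranks, 1)
top5 = top_k_sensitivity(example_ranks, 5)
print(f"top-1: {top1:.3f}, top-5: {top5:.3f}")
```

Cases where the causative variant is missing from the output count as misses at every k, so filtering a variant out before ranking lowers the metric just as a poor rank does.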

AI places clinical laboratories in a position to scale their operations. The cost of genome sequencing has steadily declined since the $150 million draft human genome in 2003, as evidenced by the $200 genome made possible in 2022 by Illumina’s NovaSeq X [17]. Laboratories wishing to scale their operations in light of decreasing sequencing costs will require more analysts, but individuals with expertise in genomics are in high demand and short supply. AI-driven decision support tools are therefore a workforce multiplier, granting fast and accurate diagnoses that enable rare disease laboratories to serve ever-larger populations.  

Decreasing sequencing costs will also open the door for global healthcare markets to invest in rare disease diagnostics. Many low- and middle-income countries do not yet have the pool of clinical genetics expertise necessary to analyse the multitude of variants that would result from sequencing their populations. AI-driven tools must be simple to integrate, given that many laboratories do not have the technical expertise to implement complex computational pipelines and interfaces. Furthermore, markets abundant in expertise will find that variant classifications are inconsistent within and between laboratories. AI software can ensure that all analysts have access to the same comprehensive evidence for ACMG/AMP classification of every variant. In doing so, AI decision-support tools democratise genetics expertise, supporting laboratories around the world with equitable access to high-quality genome interpretation. The potential to scale operations would further see the launch of more population-scale genome sequencing and a greater business need for AI tools.

AI algorithms will enable laboratories to push the boundaries of their diagnostic practice by granting analysts more time to focus on complex cases. Although most rare diseases will be solved quickly with AI-driven interpretation, a small number of patients may not show a clear causative variant. These complex cases are often characterised by a large burden of VUS. In-depth investigations into these VUS are performed using functional screens or additional familial sequencing, both of which may provide sufficient evidence for reclassification. By prioritising VUS with a pathogenicity score, AI tools add nuance to the VUS category and indicate the most valuable variants for in-depth investigations, allowing genetics departments to make the best use of resources.

Out now: our white paper on the clinical validation of AION on the Genomics England 100,000 Genomes Project

Our latest white paper covers our clinical validation study on the Genomics England 100,000 Genomes Project and its results. When applied to cases from the 100,000 Genomes Project, AION identified the causative variant with a sensitivity of 91.5%, increasing to 93.1% when parental information is provided and 94% in paediatric patients. Download the white paper to read more about these promising results.



1. Nguengang Wakap S, Lambert DM, Olry A, Rodwell C, Gueydan C, Lanneau V, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020;28: 165–173. doi:10.1038/s41431-019-0508-0

2. Yan X, He S, Dong D. Determining How Far an Adult Rare Disease Patient Needs to Travel for a Definitive Diagnosis: A Cross-Sectional Examination of the 2018 National Rare Disease Survey in China. Int J Environ Res Public Health. 2020;17. doi:10.3390/ijerph17051757

3. Shire. Rare Disease Impact Report: Insights from patients and the medical community. 2013. Available:

4. Shendure J, Findlay GM, Snyder MW. Genomic Medicine–Progress, Pitfalls, and Promise. Cell. 2019;177: 45–57. doi:10.1016/j.cell.2019.02.003

5. Liu Z, Zhu L, Roberts R, Tong W. Toward Clinical Implementation of Next-Generation Sequencing-Based Genetic Testing in Rare Diseases: Where Are We? Trends Genet. 2019;35: 852–867. doi:10.1016/j.tig.2019.08.006

6. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. Science. 2022;376: 44–53. doi:10.1126/science.abj6987

7. Adams DR, Eng CM. Next-Generation Sequencing to Diagnose Suspected Genetic Disorders. N Engl J Med. 2018;379: 1353–1362. doi:10.1056/NEJMra1711801

8. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17: 405–424. doi:10.1038/gim.2015.30

9. 100,000 Genomes Project Pilot Investigators, Smedley D, Smith KR, Martin A, Thomas EA, McDonagh EM, et al. 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. N Engl J Med. 2021;385: 1868–1880. doi:10.1056/NEJMoa2035790

10. Liu P, Meng L, Normand EA, Xia F, Song X, Ghazi A, et al. Reanalysis of Clinical Exome Sequencing Data. N Engl J Med. 2019;380: 2478–2480. doi:10.1056/NEJMc1812033

11. Ellard, Baple, Callaway, Berry, Forrester. ACGS best practice guidelines for variant classification in rare disease 2020. Clin Genomic Sci (ACGS).

12. Starita LM, Ahituv N, Dunham MJ, Kitzman JO, Roth FP, Seelig G, et al. Variant Interpretation: Functional Assays to the Rescue. Am J Hum Genet. 2017;101: 315–325. doi:10.1016/j.ajhg.2017.07.014

13. Amendola LM, Jarvik GP, Leo MC, McLaughlin HM, Akkari Y, Amaral MD, et al. Performance of ACMG-AMP Variant-Interpretation Guidelines among Nine Laboratories in the Clinical Sequencing Exploratory Research Consortium. Am J Hum Genet. 2016;98: 1067–1076. doi:10.1016/j.ajhg.2016.03.024

14. Liu L, Sanderford MD, Patel R, Chandrashekar P, Gibson G, Kumar S. Biological relevance of computationally predicted pathogenicity of noncoding variants. Nat Commun. 2019;10: 330. doi:10.1038/s41467-018-08270-y

15. Ghosh R, Oak N, Plon SE. Evaluation of in silico algorithms for use with ACMG/AMP clinical variant interpretation guidelines. Genome Biol. 2017;18: 225. doi:10.1186/s13059-017-1353-5

16. Findlay GM. Linking genome variants to disease: scalable approaches to test the functional impact of human mutations. Hum Mol Genet. 2021;30: R187–R197. doi:10.1093/hmg/ddab219

17. Illumina Unveils Revolutionary NovaSeq X Series to Rapidly Accelerate Genomic Discoveries and Improve Human Health. In: Illumina [Internet]. 29 Sep 2022 [cited 23 Oct 2022]. Available:
