Yael Kvint
Scientists have published a groundbreaking article detailing their progress towards a comprehensive catalogue of structural variation in the human genome, built using Nanopore long-read sequencing technology.
Capturing the variation within the human genome and cataloguing rare variants have long been cornerstones of genomics research. These efforts are valuable because they enable discoveries that can be translated directly into clinical practice to improve the diagnosis and treatment of genetic conditions. One large initiative of this kind was the 1000 Genomes Project (1KGP) in 2015, which mapped patterns of human genetic variation on an unprecedented scale1. While revolutionary, the project’s reliance on short-read sequencing technology limited its capacity to reveal the full scope of our genetic variation. Specifically, identifying structural variations (such as insertions, deletions, duplications, rearrangements, and expansions) and large chromosomal rearrangements, especially in highly repetitive regions, presented a major challenge at the time1.
Now, almost a decade later, scientists have taken the 1KGP to the next level by using long-read Nanopore sequencing to build a more nuanced understanding of structural variants (SVs) in the human genome2. Unlike short-read sequencing, which reads DNA in small fragments (around 100-300 base pairs), long-read sequencing (LRS) technologies can capture much longer stretches of DNA – sometimes tens of thousands of bases in a single read. This ongoing global effort, known as the “1KGP Oxford Nanopore Technologies Sequencing Consortium”, involves genomic experts from multiple countries and institutions. By resequencing a first set of 100 samples from the original project, they have begun assembling a new high-quality catalogue of both simple and complex genetic variants2. Early findings indicate that LRS excels at uncovering SVs that were previously difficult to detect, including those implicated in common recessive disorders.
To build this comprehensive variant catalogue, the researchers began by resequencing a diverse set of 100 DNA samples from the original 1KGP cohort2. This was done using Oxford Nanopore LRS technology, which threads a single DNA molecule through a nanopore (Figure 1A). The team then analyzed the sequencing data with computational tools to identify both SVs and single nucleotide variants (SNVs). To ensure comprehensive variant detection, the researchers used two distinct pipelines for genome assembly and variant annotation (Figure 1B): an alignment-based internal pipeline (the traditional approach) and the assembly-based Napu pipeline – a newer approach optimized for SV detection3.
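To make the idea of running two pipelines side by side more concrete, here is a minimal Python sketch of one common way to reconcile two sets of SV calls: treating two calls as the same variant when they share a chromosome and type and overlap each other by at least 50% of their length. This is an illustration only, not the consortium's method. The `SV` class, the `merge_calls` function, the 50% threshold, and the toy coordinates are all assumptions made up for this example, and the overlap test only suits variants that span a reference interval (such as deletions and duplications), not insertions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """A hypothetical, simplified structural variant call."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str  # e.g. "DEL" or "DUP"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of (overlap / len(a)) and (overlap / len(b))."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(alignment_calls, assembly_calls, min_overlap=0.5):
    """Split calls into shared, alignment-only, and assembly-only lists."""
    shared, alignment_only = [], []
    matched = set()  # indices of assembly calls already paired
    for a in alignment_calls:
        hit = next(
            (i for i, b in enumerate(assembly_calls)
             if i not in matched and reciprocal_overlap(a, b) >= min_overlap),
            None,
        )
        if hit is None:
            alignment_only.append(a)
        else:
            matched.add(hit)
            shared.append(a)
    assembly_only = [b for i, b in enumerate(assembly_calls) if i not in matched]
    return shared, alignment_only, assembly_only


# Toy input: invented coordinates, not data from the study.
alignment = [SV("chr11", 1000, 2000, "DEL"), SV("chrX", 500, 700, "DUP")]
assembly = [SV("chr11", 1100, 2050, "DEL"), SV("chr2", 10, 90, "DEL")]

shared, alignment_only, assembly_only = merge_calls(alignment, assembly)
print(len(shared), len(alignment_only), len(assembly_only))  # 1 1 1
```

In this toy run, the two chr11 deletions overlap by 90% or more of each call's length and are counted once, while the remaining calls are kept as pipeline-specific. Keeping the pipeline-specific calls rather than discarding them reflects the stated goal of comprehensive detection.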

The two pipelines identified SNVs (single nucleotide variants) with 98% accuracy2. Better yet, structural variant calling showed a remarkable improvement, with an average of 24,543 high-confidence SVs identified per genome – far surpassing the 2,100 SVs identified in the original 1KGP1. Overall, this enhanced sensitivity enabled the discovery of 349 SVs affecting 236 unique genes linked to disease phenotypes. Notably, 123 exons (protein-coding regions) in these genes harboured variants annotated as pathogenic or likely pathogenic by submitters in ClinVar. Many of these SVs were extremely rare, found in only one sample, which highlights the potential of this technology to uncover highly personalized genomic insights. This could ultimately help patients make more informed health decisions based on their unique genetic profiles.
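For a sense of scale, the figures quoted above can be turned into simple ratios. The short Python sketch below does only that arithmetic; it assumes the two SV counts are directly comparable, which the article implies but does not spell out, and the variable names are chosen for this example.

```python
# Figures quoted in the article.
lrs_svs_per_genome = 24_543   # average high-confidence SVs per genome (LRS)
original_1kgp_svs = 2_100     # SVs reported for the original 1KGP
disease_svs = 349             # SVs affecting disease-linked genes
disease_genes = 236           # unique disease-linked genes affected

# Assumption: both SV counts are measured on a comparable basis.
fold_increase = lrs_svs_per_genome / original_1kgp_svs
svs_per_gene = disease_svs / disease_genes

print(f"Fold increase in SVs: {fold_increase:.1f}x")          # 11.7x
print(f"SVs per affected disease gene: {svs_per_gene:.2f}")   # 1.48
```

On these numbers, long-read sequencing recovered roughly twelve times as many SVs, and each affected disease gene carried about one and a half SVs on average.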
Beyond the overall improvements in variant detection, several SVs were identified in key medically relevant genes. For example, a large deletion spanning the HBB, HBD, and HBG1 genes was detected2 at a locus associated with beta-thalassemia – one of the most common recessive disorders worldwide4. Additionally, rare SVs were identified on the X chromosome2, which has been notoriously difficult to sequence due to its large repetitive regions5. One such SV was found in the RPGR gene, which is associated with several X-linked ocular disorders2. Novel insertions common to all 100 samples were also found, indicating that the reference genome should be updated. Since the reference genome serves as a baseline for genetic comparisons, updating it with diverse SVs could improve representation across different populations and sharpen disease risk assessments.
A key implication of this study is the potential of LRS to provide a highly accurate picture of an individual’s genome. No two human genomes are 100% identical, which makes it essential to establish reliable methods for identifying a wide variety of variants. Oxford Nanopore sequencing already showed its value in personalized SV detection in 2018, when it was used in a case study of a 12-year-old boy with glycogen storage disease6. The technology uncovered a large deletion that had been missed by short-read whole-exome sequencing. The discovery eventually allowed the family to use pre-implantation genetic diagnosis (PGD) to avoid passing the deletion variant on to their next child. That case underscores how such technologies could transform patient care, not only through personalized detection but also by enabling informed, early intervention during family planning.
Looking beyond the study’s core findings, the broader potential of LRS is becoming increasingly clear. Apart from improving diagnostics, LRS enables proactive strategies not just for family planning, but also for optimizing medication prescriptions. SVs are now known to be prevalent in pharmacogenes, which influence how individuals metabolize medications and can therefore affect treatment outcomes7. LRS implementation could also reduce genetic testing costs by allowing a single, comprehensive genome sequence to be used across multiple clinical applications. Combined with similar initiatives using LRS to analyze 1KGP samples for SVs – such as Schloissnig et al.’s project8 – these efforts will bring us closer to a comprehensive and reliable catalogue of human genetic variation for personalized genome analysis. Ultimately, this progress could pave the way for a future in which genomics-driven personalized treatment plans become a pillar of global healthcare.
References
1. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
2. Gustafson, J. A. et al. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Res. 34, 2061–2073 (2024).
3. Wu, L., Yavas, G., Hong, H., Tong, W. & Xiao, W. Direct comparison of performance of single nucleotide variant calling in human genome with alignment-based and assembly-based approaches. Sci. Rep. 7, 10963 (2017).
4. Cao, A. & Galanello, R. Beta-thalassemia. Genet. Med. 12, 61–76 (2010).
5. Ross, M. T. et al. The DNA sequence of the human X chromosome. Nature 434, 325–337 (2005).
6. Miao, H. et al. Long-read sequencing identified a causal structural variant in an exome-negative case and enabled preimplantation genetic diagnosis. Hereditas 155, 32 (2018).
7. Sherman, C. A., Claw, K. G. & Lee, S. Pharmacogenetic analysis of structural variation in the 1000 genomes project using whole genome sequences. Sci. Rep. 14, 22774 (2024).
8. Schloissnig, S. et al. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. Preprint at https://doi.org/10.1101/2024.04.18.590093 (2024).