Samiksha Babbar
Single-nucleotide changes in our genome are small but mighty predictors of traits and disease, yet they will never tell the full story of genetic variation. Mapping larger changes in genome structure and comparing genomes across individuals can give insight into the population-level variation that underlies individual differences.
Much of our understanding of human genetic variation over the past two decades can be attributed to genome-wide association studies (GWAS)1,2. GWAS compare genomes across many individuals to identify single-base-pair changes, known as single nucleotide polymorphisms (SNPs), that occur more frequently in individuals with a given disease or trait and so point to genetic associations. However, genetic variation forms a complex network that includes many types of mutation on a larger scale. In fact, over 93% of significant genomic regions from GWAS do not code for proteins and may instead be part of a regulatory network3.
Looking into the deeper layers of genetic variation can provide functional explanations for these SNPs. One route is through structural variants (SVs), which are large genetic variations within genomic loci, including deletions, duplications, insertions, and inversions4. They are larger than SNPs and smaller than whole-chromosome deletions or duplications5. SVs can be difficult to identify, largely because of their complexity, their size, and the sequencing method used6. Long-read whole-genome sequencing (WGS) can capture large structural changes in DNA in a single stretch, rather than in short fragments, which makes it much easier to see the full extent of a structural variant7. In a Nature Genetics publication, Chirmade et al.1 look at how to integrate long-read WGS data with GWAS results to find SVs. This combined mapping helps explain GWAS SNPs whose link to a specific trait was initially unclear.
The authors developed a web-based tool called GWAS SVatalog1 to visualize disease-trait associations alongside linkage disequilibrium (LD), a measure of whether alleles at two variant sites are inherited together more often than expected by chance. Using 101 participants of European descent who underwent long-read WGS, they built an SV reference panel for a European population. LD was calculated between GWAS-significant SNPs and SVs, and the merged results are displayed on the interface. In total, over 35,000 SVs are visualized together with over 100,000 SNPs significantly associated with a trait or disease in GWAS and 14,000 traits (Figure 1)1.

Figure 1| Flow chart of the development of GWAS SVatalog. Long-read sequencing of 101 genomes was used to create a reference panel of common structural variants. This panel was combined with previously cataloged GWAS findings, and linkage disequilibrium was measured between the two. The result is GWAS SVatalog, which can be viewed by region, gene, or phenotype. Created on BioRender.
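To make the LD step concrete, the sketch below computes two standard pairwise LD statistics, r² and |D′|, between a SNP and an SV from phased haplotypes. It is an illustration only: the commentary does not specify which statistic or software GWAS SVatalog uses, and the function name `ld_stats` and the 0/1 allele coding are assumptions made for this example.

```python
from typing import Sequence, Tuple


def ld_stats(snp: Sequence[int], sv: Sequence[int]) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic variants.

    Hypothetical helper for illustration. snp[i] and sv[i] are the alleles
    (0 = reference, 1 = alternate) carried by the same phased haplotype i.
    """
    if len(snp) != len(sv) or len(snp) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(snp)
    p_a = sum(snp) / n   # alternate-allele frequency at the SNP
    p_b = sum(sv) / n    # alternate-allele frequency at the SV
    # Frequency of haplotypes carrying both alternate alleles
    p_ab = sum(1 for a, b in zip(snp, sv) if a == 1 and b == 1) / n

    # D: observed haplotype frequency minus the frequency expected
    # if the two sites were inherited independently
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic in the sample")

    # r^2: squared correlation between the two sites
    r2 = d * d / denom

    # |D'|: D scaled by its maximum possible magnitude given the frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    return d, d_prime, r2


if __name__ == "__main__":
    # Toy data: four haplotypes on which the SNP and SV always co-occur
    print(ld_stats([1, 1, 0, 0], [1, 1, 0, 0]))
```

In this toy example the two variants always travel together, so both |D′| and r² equal 1, which corresponds to the "complete LD" described for some loci. A SNP that tags only a subset of SV-carrying haplotypes would keep |D′| = 1 while r² drops below 1.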
In any population-based study, an important consideration is the generalizability of the variants that are identified. Chirmade and colleagues state that they used an existing dataset from their study of individuals with cystic fibrosis (CF)8. They determined that these individuals did not differ significantly from a healthy control population, apart from a known genomic region associated with CF. When the frequencies of the less common alleles were compared with public SV datasets, 85% matched previous analyses1. The SVs were also compared with gnomAD9, a well-documented public database built on short-read sequencing, and the results were in line with previous long-read versus short-read comparisons. These checks suggest that GWAS SVatalog is generalizable and represents the cohort it studies well.
The authors identified candidate SVs that are in LD with previously reported GWAS SNPs and used them to gain insight into the functional impact of those SNPs. SVs in weaker LD with GWAS SNPs are likely to have different functions that are more related to the regulation of genes, such as promoting the transcription of RNA. One case they examined was a depression-related locus near the gene TMEM106B, where GWAS-significant SNPs show complete LD with a nearby SV within regions that influence gene regulation. Previous analyses have linked a risk allele at this locus to dementia10, so functional follow-up could confirm a potential characteristic of patients who carry this SV, the SNP, or both.
As population-level studies grow more complex, a problem arises: the quantity of data becomes a larger focus than its comprehension. Rather than making multiple novel tools, we must strike a balance between data generation and comprehension. GWAS SVatalog can serve as a pilot project for documenting SVs, as it focuses directly on visualizing documented LD with GWAS SNPs. It is important to build on similar tools rather than replicate them, improving the accessibility and interpretability of the relationships that are documented.
The future of GWAS lies in embracing the full spectrum of human variation. Larger, more diverse cohorts and strong whole-genome sequencing resources will be essential for linking structural variants to disease. LD patterns differ substantially depending on the ancestry or sample being analyzed, so a tool built around LD should look beyond a group of individuals who share the same ancestry, trait, and population. Moreover, finding functional relevance is often treated as an “afterthought” in population-level studies such as GWAS. Examining LD after a GWAS is a useful genomic mapping approach, but it is a less reliable way to examine SVs. An important direction is to integrate SVs directly into the association analysis, rather than downstream of GWAS. This would provide unified associations across the reference panel and could include mixed variant types for direct comparison.
Population-level analysis should always be an integrated approach that considers the full range of possible structural variants. A study of single-nucleotide changes will never be enough to give the complete picture of human variation. SVs can explain many of the disease associations we aim to find, opening the way to the discovery of novel loci and a better understanding of unexplained SNPs.
References
1. Chirmade, S. et al. GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations. Heredity (2025).
2. Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Primer 1, 59 (2021).
3. Maurano, M. T. et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337, 1190–1195 (2012).
4. Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
5. Freeman, J. L. et al. Copy number variation: New insights in genome diversity. Genome Res. 16, 949–961 (2006).
6. Mahmoud, M. et al. Structural variant calling: the long and the short of it. Genome Biol. 20, 246 (2019).
7. Yang, L. A Practical Guide for Structural Variation Detection in the Human Genome. Curr. Protoc. Hum. Genet. 107, e103 (2020).
8. Eckford, P. D. W. et al. The CF Canada-Sick Kids Program in individual CF therapy: A resource for the advancement of personalized medicine in CF. J. Cyst. Fibros. 18, 35–43 (2019).
9. Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
10. Lee, J. Y., Harney, D., Kwok, J., Larance, M. & Don, A. S. The major TMEM106B dementia risk allele affects TMEM106B protein levels and myelin lipid homeostasis in the ageing human hippocampus. Mol. Neurodegener. 18, 63 (2023).









