The Biological Pink Tax Killing Women Instead of their Cancer

Tessa Pelino

Historical exclusion of women from drug development trials has created a male-centric dosing regimen that consistently fails women. It is a pink tax that women pay with their own bodies, in the form of increased toxicity and reduced treatment efficacy, and a disparity that researchers argue could be urgently corrected with a sex-adjusted body surface area (SABSA) dosing formula and therapeutic drug monitoring (TDM).

The century-old standard of chemotherapy dosing based on Body Surface Area (BSA) is actively failing half of the population. Modern chemotherapy is a systemic administration of cytotoxic agents that inhibit rapid cell proliferation, and its development has evolved almost exclusively around the male body.1 This sex-related bias dates to a 1977 FDA recommendation to exclude women of childbearing age from all clinical trials following the thalidomide tragedy.2 This policy, steeped in an era of patriarchal mindsets, ensured that women’s unique physiological profiles remained unstudied during the golden age of drug discovery. Even today, meta-analyses show that female enrollment in FDA cancer trials still stalls at roughly 35%.3 This ongoing disparity underscores an unfortunate reality: drug development and dosage protocols are structured for male metabolisms.

Ignoring the distinct physiological and pharmacokinetic differences between men and women has led to higher toxicity and suboptimal outcomes in women, reported for 20% of chemotherapeutic drugs.1 Even more alarming, when dosed according to BSA, women often face 15-25% higher circulating drug concentrations in serum than their male counterparts, which correlates directly with the elevated toxicity rates and adverse reactions frequently reported in female patients. Writing in Chemotherapy, Seydoux et al. argue that mitigating this inequity requires integrating a Sex-Adjusted Body Surface Area (SABSA) model, which addresses sex-specific metabolic differences, coupled with therapeutic drug monitoring (TDM) to maximize both safety and efficacy.1

BSA calculations are used almost universally in clinical oncology, having been optimized in male participants under the assumption that drug clearance is proportional to total body size.4 The model takes height and weight as inputs to estimate lean muscle mass and liver volume, which are assumed to correlate with the systemic drug elimination rate.1 However, this one-size-fits-men approach ignores fundamental biological differences between the sexes, and consistently shows a poor correlation between initial dosing concentrations and adequate clearance from the female body. Crucially, women on average carry about 10 percentage points more body fat than men (roughly 350 g/kg versus 250 g/kg), resulting in a higher ratio of fat to lean muscle mass.3 Thus, although lean mass is a successful estimator of drug clearance in men, dosing women on those parameters ignores their higher percentage of metabolically inactive fat and inaccurately predicts their functional muscle mass and liver clearance capacity.

Sex-specific hormones also interact with metabolic organs like the liver and kidney in drastically different ways. For example, the estrogen family’s interaction with hepatocytes not only stimulates various growth factors but also helps drive the sexual dimorphism of liver enzymes, directly impacting the liver’s efficiency in eliminating estrogen and other compounds from the blood.5 Beyond hormonal regulatory loops, anatomical variations further dictate how a drug is dispersed once it enters circulation. Females possess a higher plasma percentage in blood, influencing the perfusion rate of oxygen and other nutrients into bodily tissue that can alter drug delivery rates.6 Ultimately, these examples only represent a small fraction of the vast landscape of sex-specific interactions that current clinical dosing models fail to consider (Figure 1).7

Figure 1 | Infographic of Sex-Specific Influences on Pharmacokinetics (PK). Hennig7 illustrates a variety of sex-specific interactions that influence an individual’s overall metabolic profile. Differences are outlined for cardiac output, GI absorption via transit time and gastric pH, volume of fat distribution, predominant liver metabolic enzymes, and immune cell clearance. Adapted from Fig. 17

Diving deeper into the molecular level of sexual dimorphism, many drug-metabolizing enzymes and intracellular transport proteins exhibit sex-specific expression.8 For example, the dihydropyrimidine dehydrogenase (DPYD) gene encodes DPD, a major drug-metabolizing enzyme that catabolizes over 80% of 5-Fluorouracil (5-FU), a chemotherapy commonly prescribed for breast, colorectal, stomach and pancreatic cancers.9-10 While germline mutations in DPYD are established drivers of 5-FU toxicity, recent pharmacogenomic (PGx) studies show that women tend to exhibit lower DPD activity, comparable to mutant DPYD phenotypes, and have poorer 5-FU clearance.1 This disparity helps explain why elevated toxicity rates of 5-FU are disproportionately reported for women.11

Another culprit is the sexually dimorphic nature of the liver transcriptome, the complete set of RNA molecules expressed in liver cells, which governs protein abundance. Yang and Li, studying liver transcriptomic profiles, identified 77 drug-metabolizing enzyme and transporter (DMET) genes that display significant sex-specific expression patterns.8 These findings support known PGx disparities between males and females in the baseline expression of the cytochrome P450 (CYP) enzymes, the DMET powerhouses that metabolize roughly 70-80% of all clinically available drugs.12 Specifically, CYP1A2, CYP2D6 and CYP2E1 are reported to be expressed at significantly higher levels in males. Altogether, female livers don’t rely on the same proportion of drug metabolizers as male livers, further supporting the view that anatomical, hormonal and genetic variance are major determinants of the observed disparities in drug efficacy and toxicity.

Alleviating this biological pink tax will require us to go back to the drawing board and stop treating the female body like a “smaller man”. An immediate action is to implement the SABSA model in place of the traditional BSA. Ironically, the BSA scale isn’t only failing women; it is also consistently underdosing men, who encounter higher rates of relapse.1 SABSA is a low-cost refinement that adjusts current BSA-based dosing recommendations upward by ~10% for males and downward by ~10% for females to better reflect metabolic baselines.
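As a rough illustration of this adjustment, the sketch below computes a BSA-based dose and then applies the ~10% sex-specific shift. It assumes the Mosteller formula for BSA, since the text does not say which BSA formula is in use, and it treats the shift as a flat multiplier. The function names, patient measurements and the 100 mg/m² dose are hypothetical. This is not the published SABSA formula and not a clinical tool.

```python
import math

# Assumption: the ~10% shift quoted in the text, applied as a flat multiplier.
SEX_ADJUSTMENT = {"male": 1.10, "female": 0.90}


def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2 using the Mosteller formula (an assumed choice)."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def sabsa_dose(dose_mg_per_m2: float, height_cm: float, weight_kg: float, sex: str) -> float:
    """BSA-based dose in mg, shifted by ~10% according to sex."""
    bsa = bsa_mosteller(height_cm, weight_kg)
    return dose_mg_per_m2 * bsa * SEX_ADJUSTMENT[sex]


if __name__ == "__main__":
    # Hypothetical patients of identical height and weight.
    for sex in ("male", "female"):
        dose = sabsa_dose(100.0, 170.0, 70.0, sex)
        print(f"{sex}: {dose:.1f} mg")
```

Under plain BSA dosing the two hypothetical patients would receive the same dose; with the adjustment their doses differ by about 20% of the unadjusted value.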

While clinical validation of SABSA is urgently worth pursuing, the long-term goal for the next century of cancer therapeutics will require regulating equal representation of both sexes in all drug trials at the initial stage of development. Furthermore, as genetic breakthroughs reveal the vast complexity of patient-specific drug interactions, TDM is a primary safeguard to ensure drug selection and dosage regimens are tailored to an individual’s unique metabolic profile. After all, if precision medicine is built on leveraging cancer genetics for personalized treatment, we can no longer treat biological sex as a negligible variable when developing therapeutics.

References

1 Seydoux, C. et al. Importance of Sex-Dependent Differences for Dosing Selection and Optimization of Chemotherapeutic Drugs. Chemotherapy 70, 92–101 (2025).

2 Kim, J. H. & Scialli, A. R. Thalidomide: the tragedy of birth defects and the effective treatment of disease. Toxicol Sci 122, 1–6 (2011).

3 Özdemir, B. C., Csajka, C., Dotto, G.-P. & Wagner, A. D. Sex Differences in Efficacy and Toxicity of Systemic Treatments: An Undervalued Issue in the Era of Precision Oncology. J Clin Oncol 36, 2680–2683 (2018).

4 Beer, H. Is prescribing anticancer drugs by body surface area still relevant? Hospital Pharmacy Europe https://hospitalpharmacyeurope.com/clinical-zones/oncology/is-prescribing-anticancer-drugs-by-body-surface-area-still-relevant/ (2025).

5 Kasarinaite, A., Sinton, M., Saunders, P. T. K. & Hay, D. C. The Influence of Sex Hormones in Liver Function and Disease. Cells 12, 1604 (2023).

6 Soldin, O. P. & Mattison, D. R. Sex Differences in Pharmacokinetics and Pharmacodynamics. Clin Pharmacokinet 48, 143–157 (2009).

7 Hennig, M. Sex-Based Differences in the Biodistribution of Nanoparticles and Their Effect on Hormonal, Immune, and Metabolic Function. Pharma Excipients https://www.pharmaexcipients.com/news/sex-based-differences-biodistribution-nanoparticles/ (2022).

8 Yang, L. & Li, Y. Sex Differences in the Expression of Drug-Metabolizing and Transporter Genes in Human Liver. J Drug Metab Toxicol 3, (2012).

9 Evans, W. E. Pharmacogenetics of Thiopurine S-Methyltransferase and Thiopurine Therapy. Therapeutic Drug Monitoring 26, 186 (2004).

10 Alzahrani, S. M., Al Doghaither, H. A., Al‑Ghafari, A. B. & Pushparaj, P. N. 5‑Fluorouracil and capecitabine therapies for the treatment of colorectal cancer (Review). Oncology Reports 50, 1–16 (2023).

11 Prado, C. M. M. et al. Body Composition as an Independent Determinant of 5-Fluorouracil–Based Chemotherapy Toxicity. Clin Cancer Res 13, 3264–3268 (2007).

12 Grangeon, A. et al. Protein Levels of 16 Cytochrome P450s and 2 Carboxyl Esterases Using Absolute Quantitative Proteomics: CYP2C9 and CYP3A4 Are the Most Abundant Isoforms in Human Liver and Intestine, Respectively. Pharmaceuticals 18, 1789 (2025).

Flipping the epigenetic switch: Inducing growth arrest in cancer cells

Samran Prasla

Acute loss of DNA methylation in cancer cells triggers senescence and activates immune signalling, revealing a druggable epigenetic vulnerability that halts tumour growth.

Cancer cells are experts at bypassing the safeguards that keep cell growth in check, including cellular senescence, a stable state in which cells permanently stop dividing while remaining metabolically active1,2. This non-proliferative state prevents damaged or stressed cells from giving rise to tumours. To overcome senescence, cancer cells deploy epigenetic modifications, which alter gene activity without changing the underlying DNA sequence3. One key epigenetic mechanism is DNA methylation, in which the addition of methyl groups to DNA turns genes off. Removing these marks can reactivate genes and alter genome stability, with important consequences for cell growth and tumorigenesis. In tumours, methylation patterns are profoundly altered; tumour suppressor genes are often hypermethylated and silenced, while large regions of the genome lose methylation, allowing previously silent genes to become active2.

These observations have motivated the development of demethylating chemotherapy drugs, such as Vidaza, which are used clinically4. While these drugs can trigger senescence, they also cause widespread DNA damage, making it difficult to determine whether growth arrest is truly a direct consequence of methylation loss or a secondary effect of cellular stress5. To address this question, Chen et al.6 used an auxin-inducible degron system, which enables rapid and controlled degradation of specific proteins. This method allowed them to selectively remove DNA methyltransferase 1 (DNMT1), the enzyme responsible for maintaining DNA methylation on the newly replicated DNA strand, and its essential cofactor UHRF1 (Ubiquitin-like PHD and RING finger domains 1), which recruits DNMT1 to DNA6,7.

Degradation of these proteins in colorectal cancer cell lines caused a marked loss of DNA methylation, allowing the authors to observe the direct effects of demethylation alone, separate from DNA damage. Prolonged depletion of either protein, and especially both together, caused the cells to stop dividing and adopt the hallmarks of cellular senescence, including enlarged nuclei and permanent growth arrest. Notably, although DNMT1 has been the primary focus of demethylation studies5,6, depletion of UHRF1 induced an even stronger senescence response, highlighting UHRF1 as an underexplored therapeutic target. These results reveal a striking dependency of cancer cells on DNA methylation to escape senescence in the absence of DNA damage. Furthermore, rescue experiments with mutant DNMT1 or UHRF1 proteins that could or could not maintain methylation revealed that senescence occurred only when DNA methylation was lost, confirming that methylation itself, rather than protein depletion or replication stress, suppresses senescence6.

To explore the molecular consequences of demethylation, the authors performed RNA sequencing to examine gene expression. Genes involved in cell division and proliferation were downregulated, reflecting the observed growth arrest, while genes associated with the senescence-associated secretory phenotype (SASP) were upregulated6. SASP consists of factors released by senescent cells that can push neighbouring cells into a senescent state, potentially expanding the anti-proliferative effect across the tumour population8. SASP also includes cytokines and chemokines that activate immune signalling (Figure 1), and the authors observed upregulation of genes mediating these pathways1,6. Importantly, many SASP and immune genes were not previously methylated at their promoters, indicating that their activation reflects a broader, coordinated cellular program triggered by global loss of DNA methylation rather than simple reactivation of silenced genes6. This dependency creates a potential therapeutic vulnerability, as selectively reducing DNA methylation could simultaneously induce senescence in tumour cells and alert the immune system, offering a dual strategy to halt tumour growth and enhance anti-cancer immunity9.

To understand how DNA demethylation drives senescence, the authors examined whether it relies on the classical tumour‑suppressor pathways that are known to induce senescence (Figure 1). Tumour suppressor proteins such as p53, p21, and p16 normally act as checkpoints, halting cell division in response to stress or damage to prevent uncontrolled growth8. Many cancers disable these pathways to continue proliferating. To test their role in demethylation-induced senescence, Chen et al.6 degraded p53 or knocked out the p21 or p16 genes, and then depleted DNMT1 or UHRF1. Remarkably, senescence still occurred in cells lacking p53 or p16 but was partially reduced in p21-deficient cells. Interestingly, following demethylation, p21 did not accumulate in the nucleus, where it is expected to enforce growth arrest6,8. Instead, it was found in the cytoplasm, where it prevented apoptosis, allowing cells to persist long enough to enter senescence6. These findings reveal a distinct, epigenetically driven route to senescence that operates even in tumour cells that have disabled classical growth checkpoints.

Figure 1 | Hallmarks and consequences of cellular senescence. Senescent cells undergo stable growth arrest driven by classical tumour‑suppressor pathways involving p53, p21, and p16. They secrete a senescence‑associated secretory phenotype (SASP), a mixture of cytokines, chemokines, and proteases that can reinforce senescence in neighbouring cells and recruit immune cells including T cells, macrophages, NK cells, and granulocytes. Figure adapted from Schmitt et al.8

Remarkably, even tumour cells that ignore classical growth checkpoints cannot escape the consequences of losing DNA methylation, exposing a new Achilles’ heel that could be exploited therapeutically. Depletion of DNMT1 and UHRF1 also reduced tumour growth in a mouse xenograft model of human cancer cells, demonstrating that this mechanism operates in vivo6.

However, translating these findings into therapy will require caution. Senescent cells remain metabolically active, and the SASP they produce can trigger inflammation or, in certain contexts, even support tumour growth, as its composition and effects vary across tissue environments3. One strategy to mitigate these risks is to combine demethylation-induced senescence in cancer cells with senolytic drugs, which selectively clear senescent cells, thereby limiting the adverse consequences of their persistent SASP. Alternatively, pairing epigenetic therapies with immunotherapy could leverage SASP-driven immune activation to enhance tumour clearance3,8. In addition, defining the specific methylation-dependent regions that suppress senescence in cancer cells will be critical to develop targeted interventions that maximize therapeutic benefit while minimizing collateral effects. 

Importantly, this study shows that DNA methylation is not merely a passive mark in tumour cells but a dependency that shields cancer cells from senescence, which uncovers an epigenetic vulnerability with therapeutic potential.

References

  1. Ajoolabady, A. et al. Hallmarks and mechanisms of cellular senescence in aging and disease. Cell Death Discov. 11, 364 (2025).
  2. Yu, X. et al. Cancer epigenetics: from laboratory studies and clinical trials to precision medicine. Cell Death Discov. 10, 28 (2024).
  3. Piskorz, W. M. & Cechowska-Pasko, M. Senescence of Tumour Cells in Anticancer Therapy—Beneficial and Detrimental Effects. Int. J. Mol. Sci. 23, 11082 (2022).
  4. Kaminskas, E., Farrell, A. T., Wang, Y.-C., Sridhara, R. & Pazdur, R. FDA Drug Approval Summary: Azacitidine (5-azacytidine, Vidaza™) for Injectable Suspension. The Oncologist 10, 176–182 (2005).
  5. Venturelli, S. et al. Differential Induction of Apoptosis and Senescence by the DNA Methyltransferase Inhibitors 5-Azacytidine and 5-Aza-2′-Deoxycytidine in Solid Tumour Cells. Molecular Cancer Therapeutics 12, 2226–2236 (2013).
  6. Chen, X. et al. DNA methylation protects cancer cells against senescence. Nat Commun 16, 5901 (2025).
  7. Liu, X. et al. UHRF1 targets DNMT1 for DNA methylation through cooperative binding of hemi-methylated DNA and methylated H3K9. Nat Commun 4, 1563 (2013).
  8. Schmitt, C. A., Wang, B. & Demaria, M. Senescence and cancer — role and therapeutic opportunities. Nat Rev Clin Oncol 19, 619–636 (2022).
  9. Liu, X., Ding, J. & Meng, L. Oncogene-induced senescence: a double-edged sword in cancer. Acta Pharmacol Sin 39, 1553–1558 (2018).

Breaking the Link: Genetics Uncouple Adiposity and Cardiometabolic Comorbidities

Xiaotong (Emily) Wang

A multi-trait genome-wide association study reveals the largest set of uncoupling loci between adiposity and cardiometabolic comorbidities to date and unlocks precision medicine potential in obesity.

Obesity is a serious health issue that affects approximately 39.6% of adults and 18.5% of children and adolescents in the United States, with prevalence continuing to increase in recent years1. It arises from intricate interactions between environmental and genetic factors and is associated with numerous physical and psychosocial consequences1-5. Notably, obesity is a crucial risk factor for various cardiometabolic diseases, and the heterogeneous nature of the condition cannot be adequately captured by any single adiposity trait2-6. In a recent study, Chami and colleagues tackle a fundamental question: why do some individuals with obesity develop cardiometabolic comorbidities while others remain protected6? By identifying the largest set of uncoupling loci between adiposity and cardiometabolic comorbidities to date, their findings open new avenues for genetic subtype-stratified treatment, prognosis, and prevention of obesity6.

To investigate why some individuals with obesity develop cardiometabolic comorbidities while others do not, genetic researchers have turned to genome-wide association studies (GWAS) to identify genetic variations associated with obesity7. More than 1000 genetic loci have been identified to date, many of which implicate the central nervous system (CNS) as a critical player in body weight regulation7-8. Despite these findings, our understanding of the underlying mechanisms remains limited as previous GWAS largely focused on one adiposity trait – body mass index (BMI) – thereby ignoring the substantial heterogeneity present amongst individuals with obesity6. Moreover, identifying genetic loci associated with obesity alone does not answer the question of what determines susceptibility to cardiometabolic comorbidities.

Rather than finding genetic loci associated with obesity alone, Chami and colleagues used GWAS to search for genetic loci that uncouple excess adiposity from cardiometabolic risk, or in other words, alleles that simultaneously increase adiposity and decrease cardiometabolic risk9. To capture the heterogeneity that single-trait studies overlooked, the authors built upon existing methods by examining multiple traits instead of one6. In particular, 3 adiposity traits and 8 cardiometabolic traits were selected as part of the experimental design (Figure 1)6. By designing a comprehensive multi-trait GWAS with uncoupling phenotypes and leveraging individual-level data from 452,768 individuals in the UK Biobank, the authors identified 266 unique variants located in 205 loci that are more than 1 Mb apart6. Interestingly, 139 of the 266 variants have not been previously reported in the context of cardiometabolic uncoupling6.
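The definition of an uncoupling allele can be made concrete with a small sketch. It assumes per-variant effect estimates (betas) for one adiposity trait and one cardiometabolic risk trait, both aligned to the same effect allele. The variant names and numbers are invented for illustration, and the study itself used a multi-trait GWAS rather than this simple sign check.

```python
from dataclasses import dataclass


@dataclass
class VariantEffect:
    variant_id: str          # hypothetical identifier
    beta_adiposity: float    # effect of the allele on an adiposity trait
    beta_cardio_risk: float  # effect of the same allele on a cardiometabolic risk trait


def is_uncoupling(v: VariantEffect) -> bool:
    """True when the allele raises adiposity while lowering cardiometabolic risk."""
    return v.beta_adiposity > 0 and v.beta_cardio_risk < 0


# Made-up effect sizes, not data from the study.
variants = [
    VariantEffect("var_A", 0.03, -0.02),   # more adiposity, less risk
    VariantEffect("var_B", 0.04, 0.05),    # more adiposity, more risk
    VariantEffect("var_C", -0.01, -0.03),  # less adiposity, less risk
]

uncoupling_ids = [v.variant_id for v in variants if is_uncoupling(v)]
print(uncoupling_ids)
```

Only the first hypothetical variant meets the definition, because it is the only one whose two effects point in opposite directions with adiposity increasing.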

Figure 1 | Experimental Design & Uncoupling Framework. Schematic representation of the multi-trait GWAS used to identify genetic loci that uncouple adiposity from cardiometabolic risk. Adiposity and cardiometabolic uncoupling traits analyzed in the study are indicated. Figure created in BioRender10 with information from Chami et al.6

As with any breakthrough discovery, this raises several important questions. What do these uncoupling loci tell us about the genetic architecture of obesity? Do they converge on specific biological pathways that could be used as therapeutic targets? Could these findings be translated from bench to bedside to benefit individuals with obesity?

Chami and colleagues attempt to address these questions by using clustering analyses to group the 266 unique variants into 8 genetically defined subgroups, based on shared association patterns between adiposity and cardiometabolic traits6. Each subtype of obesity has its own distinct biological pathway signatures, cardiometabolic risk profiles, and serum protein profiles6. This heterogeneity amongst uncoupling loci suggests that the heterogeneous nature of obesity extends beyond phenotypic presentation and into its underlying genetic architecture.

A more detailed gene prioritization analysis of the uncoupling loci uncovers both known and novel pathways that potentially contribute to differences in cardiometabolic susceptibility6. Prioritized genes show implications in adipose tissue expandability, beta cell function, fibrosis, glucose homeostasis, immune response, inflammation, insulin secretion, lipid metabolism, etc., providing further support for established mechanisms6. In addition to pathways previously described in the context of uncoupling, the analysis highlights new pathways including circadian rhythm, hepatic control of glucose homeostasis, hepatic lipid accumulation, sex differentiation, skeletal muscle growth, vascular development, among other biological processes6. Even though the individual contributions of each gene and/or pathway to disease onset remain unclear and await further examination, the discovery of novel genes and pathways unlocks new therapeutic possibilities for individuals with obesity who may be at high risk of developing cardiometabolic comorbidities. Furthermore, uncovering novel pathways may help guide future research and accelerate our understanding of the entire biological picture.

Beyond propelling pharmaceutical and research discoveries, what do uncoupling loci mean for individuals who currently have obesity? To understand the clinical impact of genetic predisposition to adiposity and cardiometabolic comorbidities, Chami and colleagues created a genetic risk score, called GRSuncoupling, from the 266 unique variants6. Within this scoring system, individuals with a higher GRSuncoupling demonstrate healthier cardiometabolic profiles and have a significantly lower risk of various conditions such as acute myocardial infarction, angina, hypertension, ischemic heart disease, metabolic syndrome, and type 2 diabetes6. Moreover, the authors showed that GRSuncoupling is sensitive to sex differences, as the adiposity traits differed in fat distribution between males and females, which corroborates previous sex-specific observations6. The generalizability of GRSuncoupling to non-European populations remains undetermined. Future studies should aim to incorporate individuals from diverse ethnic backgrounds to ensure the findings are broadly applicable.
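A genetic risk score of this kind is commonly built as a sum over variants of the number of effect alleles a person carries, optionally weighted by effect size. The sketch below assumes that general form. The text does not describe how GRSuncoupling is weighted, and the variant names, weights and genotypes here are hypothetical.

```python
def genetic_risk_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant).

    Variants missing from `dosages` are skipped rather than imputed.
    """
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical weights for three variants; equal weights give a simple allele count.
weights = {"var_A": 1.0, "var_B": 1.0, "var_C": 1.0}

# Hypothetical genotypes: number of uncoupling alleles carried at each variant.
person_1 = {"var_A": 2, "var_B": 1, "var_C": 0}
person_2 = {"var_A": 0, "var_B": 1, "var_C": 0}

print(genetic_risk_score(person_1, weights))
print(genetic_risk_score(person_2, weights))
```

In this toy setup the first person carries more uncoupling alleles and so receives the higher score, which under the study's framing would correspond to a healthier cardiometabolic profile at a given level of adiposity.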

These findings highlight the potential of the GRSuncoupling scoring system to identify subgroups among individuals with obesity at fine granularity. It may also facilitate early risk stratification for individuals with obesity, allowing personalized and timely prevention based on associated cardiometabolic profiles. If the clinical utility of this scoring system can be validated in additional samples in the near future, it could become a useful tool for genetic risk stratification in children and adolescents with obesity. It may support early prevention of cardiometabolic comorbidity onset in high-risk individuals, leading to an overall higher quality of life.

References

1 Tsai, A. G. & Bessesen, D. H. Obesity. Annals of Internal Medicine 170, (2019).

2 Mokdad, A. H. et al. Prevalence of obesity, diabetes, and obesity-related health risk factors, 2001. JAMA 289, 76 (2003).

3 Alberti, K. G. M. M. et al. Harmonizing the metabolic syndrome. Circulation 120, 1640–1645 (2009).

4 Hubert, H. B., Feinleib, M., McNamara, P. M. & Castelli, W. P. Obesity as an independent risk factor for cardiovascular disease: A 26-year follow-up of participants in the Framingham Heart Study. Circulation 67, 968–977 (1983).

5 Must, A. The disease burden associated with overweight and obesity. JAMA 282, 1523 (1999).

6 Chami, N. et al. Genetic subtyping of obesity reveals biological insights into the uncoupling of adiposity from its cardiometabolic comorbidities. Nature Medicine 31, 3801–3812 (2025).

7 Loos, R. J. & Yeo, G. S. The genetics of obesity: from discovery to biology. Nature Reviews Genetics 23, 120–133 (2021).

8 Loos, R. J. & Kilpeläinen, T. O. Genes that make you fat, but keep you healthy. Journal of Internal Medicine 284, 450–463 (2018).

9 Huang, L. O. et al. Genome-wide discovery of genetic loci that uncouple excess adiposity from its comorbidities. Nature Metabolism 3, 228–243 (2021).

10 Scientific Image and Illustration Software. BioRender Available at: https://www.biorender.com/. (Accessed: 15th February 2026)

How Viral DNA in Your Blood May Influence the Severity of Autoimmune Diseases and COVID-19

Nasim Azizi

Using whole-genome sequencing in a large Japanese cohort, researchers uncovered intriguing links between viruses circulating in the bloodstream or integrated into the human genome, such as anelloviruses and endogenous HHV-6, and diseases like lupus, rheumatoid arthritis, and COVID-19.

What if a piece of viral DNA lurking in your blood could determine the likelihood and severity of a disease you may develop over time? A recent study helps us get closer to answering this question by analyzing two viruses in the human blood and genome, anellovirus and eHHV-6B, and their role in autoimmune diseases and COVID-19.

Currently, there are gaps in our understanding of the relationship between viral infection and autoimmune diseases. Viruses often exist within human blood without causing symptoms.1 For instance, anelloviruses have been observed in the blood of 8% of healthy individuals,2 and eHHV-6, a virus integrated within the genome, exists in 1% of humans.3 Together, such viruses make up the human ‘virome’. To explore potential links between the human virome and immune responses, large-scale studies need to be carried out. A study by Sasa et al. set out to explore how two specific viruses in humans, eHHV-6 and anellovirus, contribute to the pathogenesis of five autoimmune diseases, namely psoriasis vulgaris (PV), systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), pulmonary alveolar proteinosis (PAP) and multiple sclerosis (MS), as well as COVID-19.4 The results revealed that patients with eHHV-6B have a higher risk of SLE and PAP, while high loads of anellovirus in the blood are strongly associated with RA, SLE and COVID-19 severity. This study has uncovered important aspects of the relationship between the human virome and both autoimmune and infectious diseases, providing a better understanding of these conditions and of the potential clinical role of these viruses as biomarkers.

The researchers investigated this connection by analyzing the association of eHHV-6 and anelloviruses with these autoimmune diseases and COVID-19 in a cohort of over 6300 Japanese individuals, including healthy controls. They used whole-genome sequencing to screen each individual’s genome for the presence of eHHV-6 or anellovirus. They discovered that eHHV-6B, a type of eHHV-6, is much more common in SLE and PAP patients than in healthy controls. They also noticed that SLE patients with eHHV-6B showed more severe symptoms, pointing to a significant correlation between this virus and SLE severity.

Figure 1: The workflow of the study. An overview of the study by Sasa et al. of 6321 Japanese individuals, comprising patients with PV, RA, SLE, PAP, MS or COVID-19 and healthy controls. Researchers used long-read sequencing and genome mapping to detect the presence of eHHV-6 and anellovirus. Figure adapted from Sasa, N. et al., 2025.4

Another intriguing observation from this study was the role of eHHV-6B in immunity to the HHV-6B virus. According to Sasa et al., eHHV-6B triggers immune responses against HHV-6B. Since eHHV-6B originated from a virus but is now part of the human genome, it may act as both virus and self. eHHV-6B may therefore be a heritable viral infection that the immune system responds to, a phenomenon called ‘endoimmunity’.4

Turning to anellovirus, high levels of this virus were seen in SLE, RA, and COVID-19 patients. Interestingly, while the proportion of COVID-19 patients with anellovirus was similar to that of controls, the viral load of anellovirus, which is the amount of the virus in the patient’s blood, was much higher in individuals with COVID-19. Additionally, most of the cases carrying anellovirus infection were individuals with severe COVID-19 symptoms. The increased load of anellovirus may be a result of COVID-19 itself or of the effect of its treatments on the immune system. Alternatively, a high anellovirus load may contribute to weakened immune responses and to the development of the disease. The researchers also observed that anellovirus prevalence is higher in patients with SLE and RA, further supporting the hypothesis that the human virome may play a role in autoimmune diseases.

These findings suggest that, even though only a small number of people carry eHHV-6B or high loads of anellovirus, these viral factors may have a significant influence on disease risk and clinical outcomes. This impact is notable when compared with that of other genetic or environmental factors.5–7

Therefore, eHHV-6B and anellovirus could potentially serve as biomarkers for these diseases, helping us move closer to a personalized medicine approach. Furthermore, by fully understanding how these viruses shape the immune response in diseases such as SLE, PAP, RA and COVID-19, we can develop targeted therapies and improve prevention and treatment strategies.

This approach can be expanded to a broader scope, and potentially inspire similar studies on other diseases and their links to viral infections. By applying this perspective, researchers could uncover previously unknown connections between the virome and diseases beyond the ones studied here, offering new insights into how viruses influence immune responses and disease progression. By examining diseases from this new perspective, we can deepen our understanding of the complex interplay between viruses, the immune system, and human health.

While this study offers valuable insights, it has some limitations. Its focus on a single ethnic group means the findings may not apply to a broader, more diverse population. Additionally, although the study included many participants, its short duration leaves questions about the long-term effects unanswered. To build on these findings, researchers need to conduct extended studies, particularly longitudinal ones, to fully understand the role of anellovirus and eHHV-6B in autoimmune diseases and COVID-19. Future research could help uncover how these viruses influence immune responses and potentially pave the way for new treatment strategies.

References

1.         Haynes, M. & Rohwer, F. The Human Virome. in Metagenomics of the Human Body (ed. Nelson, K. E.) 63–77 (Springer New York, New York, NY, 2011). doi:10.1007/978-1-4419-7089-3_4.

2.         Moustafa, A. et al. The blood DNA virome in 8,000 humans. PLOS Pathog. 13, e1006292 (2017).

3.         Liu, X. et al. Endogenization and excision of human herpesvirus 6 in human genomes. PLOS Genet. 16, e1008915 (2020).

4.         Sasa, N. et al. Blood DNA virome associates with autoimmune diseases and COVID-19. Nat. Genet. 57, 65–79 (2025).

5.         Okada, Y. et al. A Genome-Wide Association Study Identified AFF1 as a Susceptibility Locus for Systemic Lupus Eyrthematosus in Japanese. PLoS Genet. 8, e1002455 (2012).

6.         Sakaue, S. et al. Genetic determinants of risk in autoimmune pulmonary alveolar proteinosis. Nat. Commun. 12, 1032 (2021).

7.         Ogawa, K. et al. Next-generation sequencing identifies contribution of both class I and II HLA genes on susceptibility of multiple sclerosis in Japanese. J. Neuroinflammation 16, 162 (2019).

Revealing the Hidden Genetic Diversity within Human Segmental Duplications

Priyal Bhavsar

Recent sequencing technology advances have allowed for a genome-wide representation of the structural diversity of human segmental duplications, a widely understudied variation due to their size and sequence similarity.

Between the monumental release of the first draft of the human genome in 2001 and the generation of the first gapless sequence of the genome 21 years later, significant insights into the diversity of segmental duplications (SDs) have been revealed1. SDs are homologous DNA sequences greater than 1 kb with more than 90% sequence identity that are repeated in multiple locations in the genome in variable copy numbers2. These SDs lead to structural variations such as deletions and duplications in the human genome through a process known as non-allelic homologous recombination2. Increasing knowledge about human SDs, such as copy number differences, location and structure of the duplication, and variation between African and non-African populations, has been credited to advancements in DNA sequencing technologies and alignment algorithms2. Because SDs are implicated in human diseases and inform our understanding of genomic evolution and diversity, the results of population genetics surveys of SDs are an important piece in completing the entire pangenome puzzle (Figure 1)3.

One such piece is a recent genome-wide analysis of the population genetic diversity of autosomal human SDs from African and non-African samples reported by Jeong et al. They investigated SD copy numbers, gene content, intrachromosomal (SDs positioned on the same chromosome) versus interchromosomal (SDs positioned on different chromosomes) distribution, and sequence patterns between populations2,4. These findings further support the completeness of population-specific human genome reference sequences, understandings of disease-associated SD variations and further research into the functional roles and expression of genes within copy number variable SDs.

Figure 1 | Rendition of a human pangenome reference sequence. A) The currently used human reference genome has some missing information about repetitive regions and segmental duplications. B) Recent advances in long-read DNA sequencing technology, reading longer regions of DNA with high accuracy, have allowed for the generation of complete human genome sequences including missing sequence information about segmental duplications. The pangenome reference aims to provide this complete picture of the genome, representing different versions of the human genome sequence at the same time, while capturing the diversity from different human populations. Figure adapted from Leja et al., 20235.

Previously estimated to have accounted for about 5% of the genome, the proportion of SDs within the latest telomere-to-telomere (T2T) gapless sequence of the genome has now increased to about 7%6. SD regions of the genome contain the majority of copy number variable genes, which are genes that differ in the number of duplicated DNA segments among individuals’ genomes. These genes have been implicated in cardiovascular, neurological, immune and autoimmune diseases, as well as the evolution of the human frontal cortex, the development of colour vision and our adaptation to high-starch diets2.

Recently, Jeong et al. aimed to present a population genetics survey of human SDs by analyzing DNA sequences of ethnically diverse samples against human genome reference assemblies, which are computational representations of the sequence2. These reference assemblies have recently incorporated high-fidelity (HiFi) long-read sequencing data as part of the Human Pangenome Reference Consortium (HPRC) and Human Genome Structural Variation Consortium2. Specifically, PacBio HiFi sequencing technology has allowed for long segments of DNA to be read with high accuracy, helping researchers uncover genetic information with great precision7. In this study, SDs were identified in samples by generating HiFi long-read sequencing data and confirming their presence through Illumina short-read sequencing2. SDs were mapped to the T2T human reference genome (T2T-CHM13) to determine their novelty, and analyzed against the human genome assemblies to determine their variability between African and non-African populations2.

Through this study, Jeong and colleagues have contributed to advancements in our knowledge of human genomic architecture, and set a strong foundation for further investigations. Jeong et al. reported a population-level overview of SDs and found that African populations presented with higher copy numbers for many duplicated gene families (related to immunity, drug detoxification and environmental interactions) compared to non-African populations2. The genomes of African samples also showed significantly more intrachromosomal SDs2. The authors reported the identification of 183 novel protein-coding genes within SD regions enriched for functions related to immunity2. These results further confirm the increased genetic diversity and greater population substructure within African populations. Providing one of the first overviews of a pangenome approach to SD classification, the findings of this study open many doors for further clinical applications, population-specific therapeutic strategies, personalized genome analysis tools and understanding overall evolutionary mechanisms for SD regions.

Although the small sample size of the study poses a limitation in capturing the overall structural polymorphism and genetic diversity of human SDs, current efforts by the HPRC steering away from only one human reference genome will bridge this gap in the future3. As more human population genomes are completely T2T sequenced, further information about the role of SDs in the recombination of gene-poor acrocentric short arms of chromosomes will also be revealed. Another limitation of the study was the inconsistency in samples between those used to identify novel genes in SD regions and their compared genome reference assemblies. The functionality of these novel genes can be better determined through comparisons between the same samples.  

Due to the length and sequence similarity of SD regions, understanding the functional consequences of variations in SDs has been difficult with traditional genotyping and sequencing techniques2. Many genome-wide studies of transcription, regulation and association therefore exclude these SDs. Future research should aim to study the role of SDs in identifying population-specific regions of the genome more prone to mutations, and the accurate identification of population-specific structural variations. The functionality of variants and genes within SDs, and how they contribute to phenotypic diversity and disease predisposition, should also be investigated. The 183 novel protein-coding and copy number variable genes identified in this study should also be functionally tested to reveal information about their transcription patterns and tissue specificity. Overall, the analysis of genome reference sequences from diverse and underrepresented global populations through growing multi-omics, long-read sequencing, and transcriptomics approaches will provide pieces to the puzzle of a complete pangenome representation for humans.

References

1.         Human Genome Project Fact Sheet. https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genome-project.

2.         Jeong, H. et al. Structural polymorphism and diversity of human segmental duplications. Nat. Genet. 1–12 (2025) doi:10.1038/s41588-024-02051-8.

3.         Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).

4.         Abdullaev, E. T., Umarova, I. R. & Arndt, P. F. Modelling segmental duplications in the human genome. BMC Genomics 22, 496 (2021).

5.         Scientists release a new human “pangenome” reference. National Institutes of Health (NIH) https://www.nih.gov/news-events/news-releases/scientists-release-new-human-pangenome-reference (2023).

6.         Telomere-to-Telomere. https://www.genome.gov/about-genomics/telomere-to-telomere.

7.         Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics 13, 278–289 (2015).

Untangling Disease Effects on Gene Expression in the Human Brain

Anushka Deshmukh

A new study shows how brain disease alters gene expression, uncovering hidden genetic patterns and pointing to new therapeutic targets.

Neuroscience has long been focused on understanding how genetic variation influences brain traits and disease progression. Genetic variation plays a huge role in determining susceptibility to neurological diseases such as Alzheimer’s disease (AD), Parkinson’s disease (PD), multiple sclerosis (MS), schizophrenia, and cognitive decline1. As of 2023, an estimated 6.7 million Americans aged 65 and older are living with Alzheimer’s dementia, with this number projected to reach 13.8 million by 20602. However, uncovering the specific genetic pathways involved is challenging due to the heterogeneity of brain tissue and the differences in disease manifestations1,3,4. Most studies rely on brain tissue samples from individuals with neurological diseases, but this can skew results since the disease itself changes how genes are expressed5.

To address this issue, Haglund et al. analyzed how brain diseases alter gene expression quantitative trait loci (eQTLs), regions of the genome where genetic variation influences gene activity, which are valuable tools for connecting these variants to their functional outcomes5. Using single-nuclei RNA sequencing (snRNA-seq), a technique that measures gene activity in individual brain cells, they identified disease-dependent regulatory changes that would otherwise be masked in bulk-tissue analyses. The study workflow (Figure 1) outlines this approach, highlighting the integration of snRNA-seq, eQTL mapping across brain cell types, and Mendelian randomization (MR), a method that uses genetic variants as natural experiments to test whether gene activity causes disease, to identify causal gene-trait associations and prioritize therapeutic targets. While previous studies assumed that eQTL effects remain consistent across healthy and disease states, this study revealed that disease states can significantly alter genetic regulation, leading to biased conclusions5–7. By comparing data from diseased and healthy brains, they demonstrated the importance of using disease-free samples to draw accurate conclusions about genetic regulation in the central nervous system (CNS)5.

Figure 1: Overview of study workflow. This diagram shows how researchers analyzed roughly 2.3 million individual brain cells to explore how genetic differences affect brain diseases. The authors mapped expression quantitative trait loci (eQTLs) across eight brain cell types, evaluated disease-dependent effects, and applied Mendelian randomization (MR) and colocalization analysis. They looked at how genes are turned on or off in healthy vs. diseased brains and used MR to find which genes may actually cause brain conditions. Figure taken from Haglund et al.5

Analyzing over 2.3 million single-cell profiles from 391 individuals, the study identified nearly 14,000 genes with eQTL effects across eight brain cell types. Surprisingly, between 16.7% and 40.8% of these eQTLs exhibited disease-dependent allelic effects, meaning that genetic regulation of gene expression changed significantly in the disease state. For example, specific genetic variants affecting genes in microglia, the brain’s immune cells, were influenced predominantly by AD, while others showed altered regulation in PD and MS. This shows that if scientists study only diseased brain tissue, they might draw the wrong conclusions because the disease can distort how genes behave. Some gene effects may even appear reversed, making it harder to tell which genetic changes are actually causing the disease. Adjusting for disease states in analyses does not fully correct for these effects5,8. Instead, using healthy brain data provides a clear picture of the baseline regulatory effects of genetic variants, which is essential for accurately identifying genetic risk factors and potential therapeutic targets.

By isolating data from 183 disease-free brains, serving as control samples, the researchers identified 91 gene-trait colocalizations undetectable in the larger mixed datasets. Colocalization is a method that determines whether a shared genetic variant drives both gene expression and a trait, in this case, susceptibility to a CNS disorder9,10. One notable example is the identification of novel gene-trait links for MS, including genes like PEX13 in excitatory neurons. PEX13 helps control how cells manage waste and stress and has been linked to cell damage in the nervous system. Its role in MS had not been identified before this study. This underscores how disease-free data can improve our ability to detect critical disease mechanisms.

To infer causality, the study applied MR, a method that uses natural genetic variation to mimic a randomized experiment to test whether gene activity actually contributes to disease5,10. Using control brain data, researchers identified 140 causal gene-trait associations across 26 CNS phenotypes. Among these, genes such as EGFR (linked to AD) and GPNMB (linked to PD) emerged as potential therapeutic targets. EGFR inhibitors are already used in cancer treatments, and since increased expression of EGFR was linked to higher Alzheimer’s risk, these could be repurposed for neurodegenerative diseases11. Similarly, GPNMB could be a potential therapeutic target and biomarker for PD progression. Notably, these findings were validated using UK Biobank plasma protein data, reinforcing their potential clinical relevance12.
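As a rough illustration of the MR logic, the simplest estimator for a single genetic variant is the Wald ratio: the variant's effect on the trait divided by its effect on gene expression. The Python sketch below uses made-up effect sizes and a first-order standard error. The function name and all numbers are assumptions for illustration, and the summary above does not say which MR estimator Haglund et al. used.

```python
import math
from typing import Tuple


def wald_ratio(beta_expr: float, beta_trait: float, se_trait: float) -> Tuple[float, float]:
    """Single-variant Wald ratio estimate of the effect of expression on a trait.

    beta_expr  : effect of the variant on gene expression (the eQTL effect)
    beta_trait : effect of the same variant on the trait (e.g. from a GWAS)
    se_trait   : standard error of beta_trait

    The standard error returned is a first-order approximation that ignores
    uncertainty in beta_expr. Hypothetical helper, for illustration only.
    """
    if beta_expr == 0:
        raise ValueError("variant has no effect on expression; ratio undefined")
    estimate = beta_trait / beta_expr
    se = abs(se_trait / beta_expr)
    return estimate, se


# Made-up numbers: the variant raises expression by 0.5 units per allele
# and raises the trait by 0.1 units per allele.
est, se = wald_ratio(beta_expr=0.5, beta_trait=0.1, se_trait=0.02)
z = est / se
# Two-sided p-value from the normal approximation.
p = math.erfc(abs(z) / math.sqrt(2))
print(f"effect per unit expression: {est:.3f} (SE {se:.3f}), z = {z:.2f}, p = {p:.2e}")
```

A positive estimate here would be read as higher expression increasing the trait, under the usual MR assumptions that the variant affects the trait only through the gene's expression.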

These findings have far-reaching implications for neuroscience and genomics. First, they emphasize the complex nature of genetic regulation in the brain, particularly in the context of disease. By showing that eQTL effects can change depending on the disease state, the study highlights the importance of separating disease and healthy samples in genetic analyses. Second, the study shows the power of single-cell technologies to resolve cell-type-specific effects, uncovering regulatory relationships that would remain hidden in bulk-tissue analyses. Third, the combination of MR and plasma proteomics offers a promising framework for identifying peripheral biomarkers that predict CNS disease outcomes.

Future research should expand to larger, more diverse datasets, including brains affected by other conditions such as traumatic injury or psychiatric disorders. Additionally, testing interventions that target genes like EGFR or GPNMB in animal models could validate their potential for drug development. The development of blood-based biomarkers informed by this research could revolutionize how CNS diseases are diagnosed and treated. As snRNA-seq and similar single-cell technologies continue to evolve, these methods could help decode the genetic basis of psychiatric conditions like depression or autism, areas where bulk-tissue studies have often fallen short. Finally, the study raises intriguing questions about the “Achilles’ heel” hypothesis: whether certain genetic variants predispose individuals to disease only under specific pathological conditions. Exploring this phenomenon could enhance our understanding of gene-environment interactions and their role in disease susceptibility.

By disentangling the effects of brain disease on gene expression, this study sets a new standard for interpreting eQTL data and prioritizing therapeutic targets. Its innovative use of healthy brain data and MR provides a clearer view of the genetic regulation underlying CNS traits. This study not only helps us rethink how to study the brain, but it could also pave the way toward more personalized treatments for neurodegenerative and psychiatric disorders.

References

1.         Misra, M. K., Damotte, V. & Hollenbach, J. A. The immunogenetics of neurological disease. Immunology 153, 399–414 (2018).

2.         2023 Alzheimer’s disease facts and figures. Alzheimers Dement. 19, 1598–1695 (2023).

3.         Wareham, L. K. et al. Solving neurodegeneration: common mechanisms and strategies for new treatments. Mol. Neurodegener. 17, 23 (2022).

4.         Woodward, A. A., Urbanowicz, R. J., Naj, A. C. & Moore, J. H. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet. Epidemiol. 46, 555–571 (2022).

5.         Haglund, A. et al. Cell state-dependent allelic effects and contextual Mendelian randomization analysis for human brain phenotypes. Nat. Genet. 57, 358–368 (2025).

6.         Wingo, A. P. et al. Integrating human brain proteomes with genome-wide association data implicates new proteins in Alzheimer’s disease pathogenesis. Nat. Genet. 53, 143–146 (2021).

7.         Bryois, J. et al. Cell-type-specific cis-eQTLs in eight human brain cell types identify novel risk genes for psychiatric and neurological disorders. Nat. Neurosci. 25, 1104–1112 (2022).

8.         Porcu, E. et al. Differentially expressed genes reflect disease-induced rather than disease-causing changes in the transcriptome. Nat. Commun. 12, 5647 (2021).

9.         Bao, J. et al. Brain-wide genome-wide colocalization study for integrating genetics, transcriptomics and brain morphometry in Alzheimer’s disease. NeuroImage 280, 120346 (2023).

10.       Zuber, V. et al. Combining evidence from Mendelian randomization and colocalization: Review and comparison of approaches. Am. J. Hum. Genet. 109, 767–782 (2022).

11.       Mansour, H. M., Fawzy, H. M., El-Khatib, A. S. & Khattab, M. M. Repurposed anti-cancer epidermal growth factor receptor inhibitors: mechanisms of neuroprotective effects in Alzheimer’s disease. Neural Regen. Res. 17, 1913 (2022).

12.       Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).

Harnessing the full power of genome editing

Sofia Edissi

Scientists have conducted the largest functional study of TP53 to date, revealing an accurate, high-throughput method that facilitates variant interpretation and identifies promising therapeutic targets to enhance patient diagnosis and treatment.

Tumour suppressor protein 53, TP53, is a big player in cancer research for its role as a master regulator of cell-cycle arrest and programmed cell death. Somatic variants of TP53 are observed in approximately 50% of all cancers.1 While scientists have the ability to generate enormous amounts of genetic information, interpreting the clinical significance of variants remains a major obstacle, thereby creating a bottleneck in clinical decision-making. Writing in Nature Genetics, Funk et al.2 use saturation genome editing (SGE) with clustered regularly interspaced short palindromic repeat mediated homology directed repair (CRISPR-HDR) to conduct the largest functional study of TP53 to date, revealing a high-throughput technology to accurately interpret variants. This research is part of a recent influx of publications using SGE to conduct functional studies for variant classification3,4.

In this new era of genomics, scientists can now sequence vast amounts of genetic information, but interpreting its clinical significance remains a major barrier to advancing diagnosis and care.5 As such, there has been a pressing need for researchers to bridge this gap in knowledge by performing studies which evaluate the impact of these variants on disease. While certain prediction software exists, these tools are not sufficient on their own to support classification of a variant as pathogenic or benign, according to the American College of Medical Genetics and Genomics (ACMG) guidelines for variant interpretation.6 Instead, the gold standard approach is to conduct functional studies whereby variants are modeled using cell lines or non-human species. Unfortunately, functional studies come with challenges, particularly the need for assays that are time-efficient enough to clear the backlog of variants whose impact on health is unknown.

Funk et al. were able to contribute towards this effort for the interpretation of variants in TP53. To do so, they used a technique known as SGE that makes use of CRISPR-HDR technology to simultaneously analyze all possible single nucleotide variants in a genomic region.7 They also used this technology to introduce short insertions and deletions into the gene. By doing so, they can study how changes in the DNA sequence of TP53 can lead to characteristic features of cancer cells, such as increased proliferation and survival1.
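To make the scale of "all possible single nucleotide variants" concrete, the short Python sketch below enumerates every substitution for a toy DNA sequence. The function name `all_snvs` and the example sequence are illustrative assumptions, not taken from Funk et al. A region of n bases yields 3n such variants, since each base can be swapped for any of the three others.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def all_snvs(region: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, reference_base, alternate_base) for every possible SNV.

    Positions are 1-based within the region. Hypothetical helper, for
    illustration only.
    """
    for pos, ref in enumerate(region.upper(), start=1):
        if ref not in BASES:
            raise ValueError(f"unexpected base {ref!r} at position {pos}")
        for alt in BASES:
            if alt != ref:
                yield pos, ref, alt


toy_region = "ATGGAG"  # made-up sequence, not real TP53
variants = list(all_snvs(toy_region))
print(len(variants))  # three substitutions per base in the region
print(variants[:3])
```

In an SGE experiment, each such variant would correspond to one repair template in the library; short insertions and deletions would be added to the list in the same way.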

The CRISPR-HDR system is used to induce double-stranded DNA breaks at a region of interest, while providing a near-identical DNA template for repair (Figure 1). For mutation-based studies, the DNA template contains a sequence variant the researcher wants to introduce. As a result, the DNA will be repaired with the inclusion of that variant. Using SGE to model more than 9,000 TP53 variants in cancer cells, Funk et al. covered 94.5% of cancer-associated variants in TP53, making this the largest functional study of TP53 to date.2

Figure 1. Workflow of saturation genome editing (SGE) using clustered regularly interspaced short palindromic repeat mediated homology directed repair (CRISPR-HDR). CRISPR-HDR is used to introduce single nucleotide variants, and small insertions and deletions, into the DNA binding domain of TP53 in cancer cells. The CRISPR-Cas9 system cuts the double-stranded DNA at a target region. The DNA is repaired through the HDR pathway, which involves using a near-identical DNA template that includes the specific TP53 variant. The survival of cells with these incorporated variants is analyzed simultaneously to determine which variants provide an advantage for cell survival. Created with BioRender.

This strategy separated, with high accuracy and specificity, the proliferation and survival of cancer cells carrying pathogenic TP53 variants from those of cells carrying benign variants. This allowed researchers to clearly distinguish tumour-associated TP53 variants from those not associated with tumour formation. In fact, they were able to reclassify ~20% of variants previously classified as benign to pathogenic according to ACMG guidelines.6 Not only did this allow for patient diagnosis, but it also facilitated therapeutic interventions for certain variants, restoring the protein to its normal functionality. Moreover, Funk et al. identified that TP53 variants which cause slight protein unfolding, resulting in partial loss-of-function (pLOF), are enough to enhance cell proliferation. Excitingly, pLOF variants have strong potential for correction through pharmacological intervention with targeted treatments. Therefore, the findings by Funk et al. reveal the power of SGE to advance TP53 variant interpretation, leading to better diagnosis and improved treatment options for patients.

This study reveals that the gold standard method previously used for TP53 variant classification is insufficient to detect a substantial proportion of pathogenic TP53 variants. These results emphasize the importance of conducting functional studies in the native cellular environment (i.e. cancer cells) to detect all biological variation. While this was not possible with previous methods, it is now achievable using CRISPR-HDR. Additionally, the approach by Funk et al. surpasses the diversity of previous functional studies of TP53, improving the clinical utility of TP53 variant databases for determining disease causation. Their results also uncovered variants suggested as promising targets for pharmacological reactivation of normal TP53 function. These findings will improve variant interpretation for TP53, allowing for improved genetic counselling and advancements in cancer therapy.

Although CRISPR-HDR worked well for this study, the technique has substantial limitations, including a high frequency of off-target effects and repair by non-homologous end joining, an alternative repair pathway which joins double-stranded DNA breaks without the use of a template. This is especially true for non-dividing cells, which could make the technique less accessible for studying non-cancerous cells.8 As a result of these limitations, there has been recent interest in CRISPR-prime editing for wide-scale mutation-based studies. Although this technology outperforms CRISPR-HDR in efficiency, it is mainly limited to single nucleotide changes and specific nucleotide changes. CRISPR-HDR, by contrast, is more versatile because it is not limited to specific nucleotide changes and can also introduce insertions and deletions.8,9 Despite its limitations, Funk et al. demonstrate the power and versatility of CRISPR-HDR for wide-scale functional studies. Their research provides evidence that this method is a feasible solution to determine the clinical significance of variants, which is essential for clinical decision-making. Ongoing research will continue to address the limitations of this technology to harness the full power of CRISPR-HDR for genome editing.

References

1.            Whibley, C., Pharoah, P. D. P. & Hollstein, M. p53 polymorphisms: Cancer implications. Nature Reviews Cancer vol. 9 95–107 (2009).

2.            Funk, J. S. et al. Deep CRISPR mutagenesis characterizes the functional diversity of TP53 mutations. Nat Genet (2025).

3.            Buckley, M. et al. Saturation genome editing maps the functional spectrum of pathogenic VHL alleles. Nat Genet 56, 1446–1455 (2024).

4.            Sahu, S. et al. Saturation genome editing-based clinical classification of BRCA2 variants. Nature (2025).

5.            Burke, W., Parens, E., Chung, W. K., Berger, S. M. & Appelbaum, P. S. The Challenge of Genetic Variants of Uncertain Clinical Significance: A Narrative Review. Annals of Internal Medicine vol. 175 994–1000 (2022).

6.            Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine 17, 405–424 (2015).

7.            Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C. & Shendure, J. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120–123 (2014).

8.            Liao, H., Wu, J., VanDusen, N. J., Li, Y. & Zheng, Y. CRISPR-Cas9-mediated homology-directed repair for precise gene editing. Molecular Therapy Nucleic Acids vol. 35 (2024).

9.            Gould, S. I. et al. High-throughput evaluation of genetic variants with prime editing sensor libraries. Nat Biotechnol (2024).

Closing the Gaps: T2T Assembly Uncovers Hidden Functional Genomics

Weier Fan

The discovery of novel paralogues of WASHC1 and GPRIN2 in the T2T-CHM13 assembly highlights the importance of the new assembly and accurate genomic annotation in understanding genetic function.

Since its creation in 2013, the GRCh38 genome assembly has been the standard reference genome for scientists and researchers to compare and study genomic variations within the human population.1 However, the Telomere-to-Telomere consortium recently published the T2T-CHM13 genome assembly, a new “gapless” assembly that addresses the 8% of the human genome missing from the GRCh38 reference.1 A recent study by Cerdán-Vélez and Tress highlights the discovery of novel WASHC1 and GPRIN2 paralogues, genes that arose from duplication within the same genome, uncovered by the new assembly, shedding new light on these missing regions and their functionality.2

The newly assembled T2T-CHM13 added 1,956 gene predictions, which are stretches of DNA that, based on their patterns, might be genes. About 140 of these are similar to genes known to contain instructions for making proteins.1 This may open up discoveries in human biology, evolution and even disease that could not have been made with the previous, incomplete genomes.

However, it is unclear how many of these 140 genes produce proteins, or what their functions are. Cerdán-Vélez and Tress set out to investigate the protein-coding status of two genes: WASH1-20p13 (LOC124908094), which shares a similar sequence with WASHC1, and GPRIN2L (LOC124900631), which is closely related to GPRIN2. These kinds of related genes, known as paralogues, arise from gene duplication events and may have similar or diverging functions. The researchers used multiple lines of evidence to study these genes, including proteomic data, evolutionary conservation, and cDNA sequencing.2

The Wiskott-Aldrich syndrome protein and SCAR Homologue (WASH) complex helps control how cells organize and move materials inside them.3,4 It plays a key role in shaping transport pathways by working with another protein complex called Arp2/3.3,4 This Arp2/3 complex builds a network of tiny filaments, like a scaffold, that helps move and sort cargo inside the cell.4 The WASH complex is made up of five subunits, with WASH complex subunit 1 being the main subunit that interacts with the Arp2/3 complex.4 The three main reference databases disagree as to which WASH1 gene encodes this subunit. RefSeq annotates the WASHC1 gene as the only coding gene,5 Ensembl/GENCODE annotates WASHC1 and WASH6P as coding genes,6 and UniProtKB lists the isoforms of WASHC1, WASH2P, WASH3P, WASH4P, and WASH6P as protein-coding.7 This lack of consensus underscores the gaps in the human genome, especially when it comes to correctly identifying which genes actually produce functional proteins.

Cerdán-Vélez and Tress conducted a phylogenetic analysis, which revealed that WASHC1 and other paralogues clustered separately from WASH1-20p13 and functional WASH1 genes in primates, as shown in Figure 1C.2 This cross-species conservation supports the functional importance of WASH1-20p13 and raises questions about which gene is the true protein-coding gene for the WASH complex. Understanding precisely which gene encodes each protein is essential, as it helps clarify their roles in cellular functions, disease mechanisms, and the development of targeted therapies. The other WASH1 isoforms annotated in UniProtKB contain various mutations in their amino acid sequences, whereas WASH1-20p13 is the only isoform that maintains the amino acid sequence conserved across vertebrates (Figure 1A).2 WASHC1, originally thought to be the protein-coding gene of the WASH complex, lacks these conserved residues (Figure 1B).2 The authors propose that the conservation of WASH1-20p13 across species provides compelling evidence that it is the functional gene responsible for encoding the WASH complex protein. The conservation of this gene suggests that it plays an important role in basic biological functions; otherwise, it would have changed or been lost over time.

Figure 1. A comparative phylogenetic analysis of the different WASH1 isoforms. (A) The five full-length WASH1 protein isoforms and, for each, the number of non-conserved amino acids, consisting of single amino acid variations (SAAVs) and deleted regions, that differ from amino acids conserved across primates, mammals, and tetrapods. WASH1-20p13 is left blank as there was no difference between its amino acids and those conserved across the different species. (B) A comparative analysis of the amino acids of the WASHC1 protein and WASH1-20p13 in regions conserved across vertebrates. The WASHC1 protein is shown to differ at all of the conserved amino acid positions. (C) A phylogenetic tree of great ape and human genes. Genes annotated from the T2T-CHM13 assembly are labelled with their RefSeq name, and the WASH1-20p13 gene (LOC124908094) is highlighted in red. The other WASH1 isoforms branch off into a cluster separate from WASH1-20p13. Figure taken from Cerdán-Vélez and Tress.2

The authors went on to show that the protein produced by WASH1-20p13 captured almost all of the known peptide evidence. By comparing peptide data from a large protein database, PeptideAtlas, they found that WASH1-20p13 captured 47 out of 52 detected peptides.2 Seventeen of those peptides were unique to the WASH1-20p13 gene alone.2 In contrast, WASHC1, the previously assumed coding gene, captured only a small portion of these peptides. This strongly suggests that WASH1-20p13 is the true protein-coding gene of the WASH complex.2
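The peptide-counting logic described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the two protein sequences and the peptide list are invented toy data, and the helper name `summarize` is hypothetical. It only shows what it means for a paralogue to "capture" a peptide (the peptide occurs in its sequence) and for a peptide to be "unique" (it occurs in exactly one paralogue).

```python
def summarize(peptides, proteins):
    """Count, per protein, the peptides it contains and those it alone contains."""
    summary = {name: {"captured": 0, "unique": 0} for name in proteins}
    for pep in peptides:
        # Every protein whose sequence contains this peptide as a substring.
        matched = {name for name, seq in proteins.items() if pep in seq}
        for name in matched:
            summary[name]["captured"] += 1
            if len(matched) == 1:  # peptide maps to a single paralogue only
                summary[name]["unique"] += 1
    return summary


# Toy data (assumption): two near-identical paralogues differing at two residues.
proteins = {
    "paralogue_A": "MTPVRMQHSLAGQTYAVPLIQPDLRREEAVQQ",
    "paralogue_B": "MTPVRMQHSLAGQTYAVPFIQPDLRWEEAVQQ",
}
peptides = [
    "MTPVR",                 # shared by both paralogues
    "EEAVQQ",                # shared by both paralogues
    "MQHSLAGQTYAVPLIQPDLR",  # spans a residue specific to paralogue_A
    "SLAGQTYAVPLIQ",         # spans a residue specific to paralogue_A
    "WEEAVQQ",               # spans a residue specific to paralogue_B
]

result = summarize(peptides, proteins)
print(result)
```

Shared peptides cannot discriminate between paralogues, which is why the peptides unique to one copy carry the most weight as evidence that the copy is translated.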

The GPRIN2 gene encodes a protein that helps regulate growth in nerve cells.8,9 In the GRCh38 assembly, the GPRIN2 locus was found to contain a missing region.2 The newer T2T-CHM13 assembly added a single gene to this missing region, GPRIN2L, a close paralogue of GPRIN2. The authors showed that GPRIN2L produced six unique peptides in proteomic analysis, while GPRIN2 did not produce any (Figure 2).2 This suggests that GPRIN2L might be more important for certain functions, but it does not rule out a role for GPRIN2 in protein production. These findings help clarify the roles of these genes in development and in diseases affecting the nervous system.

Figure 2. Mapping of peptide sequences detected in PeptideAtlas of the two human GPRIN2 proteins. GPRIN2L is highlighted in blue. The colour-coded residues indicate the number of observations that are detected for that protein. Those highlighted in red have >100 observations, orange >20 observations, yellow >10, green >5, and blue >2. The differences between the two sequences are highlighted in yellow. These differences tend to be in areas that are less conserved (yellow, green, blue), displaying a difference between these two proteins. The regions that only correspond to GPRIN2L are most likely regions that support the six unique peptides found only in GPRIN2L. Figure adapted from Cerdán-Vélez and Tress.2

Cerdán-Vélez and Tress’s paper highlights the importance of accurate gene annotation and the necessity of updating these major genomic databases to reflect these newly identified functional paralogues. Misannotation in reference genomes can lead to incorrect conclusions in genetic research and disease studies, potentially wasting resources and delaying the discovery of therapeutic targets. These errors can also hinder the development of effective treatments, impacting progress in personalized medicine. By leveraging a more comprehensive reference genome, researchers can not only confirm the current understanding of gene function and potential disease causation but also uncover previously hidden disease-relevant variants that may have been overlooked in earlier assemblies.

Despite the significant advantages offered by the T2T-CHM13 assembly, the study also has some limitations. The presence of multiple paralogues in subtelomeric regions makes it difficult to distinguish functional from non-functional copies in the T2T-CHM13 assembly.1 Furthermore, while bioinformatics and proteomics analyses provide strong evidence for the functionality of WASH1-20p13 and GPRIN2L, their biological roles must still be directly confirmed through in vitro or in vivo functional studies to solidify these findings. Thus, although the T2T-CHM13 assembly represents a groundbreaking step towards a complete human genome, the ongoing work of annotating and functionally characterizing these paralogues means that GRCh38 remains, for now, the standard human reference genome for genetic research.

References

  1. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
  2. Cerdán-Vélez, D. & Tress, M. L. The T2T-CHM13 reference assembly uncovers essential WASH1 and GPRIN2 paralogues. Bioinformatics Advances 4, (2024).
  3. Helfer, E. et al. Endosomal recruitment of the WASH complex: Active sequences and mutations impairing interaction with the retromer. Biology of the Cell 105, 191–207 (2013).
  4. Schurr, Y., Reil, L., Spindler, M. et al. The WASH-complex subunit Strumpellin regulates integrin αIIbβ3 trafficking in murine platelets. Sci Rep 13, 9526 (2023).
  5. Sayers, E. W. et al. GenBank 2023 update. Nucleic Acids Research 51, (2022).
  6. Frankish, A. et al. GENCODE: Reference annotation for the human and mouse genomes in 2023. Nucleic Acids Research 51, (2022).
  7. Bateman, A. et al. UniProt: The Universal Protein Knowledgebase in 2023. Nucleic Acids Research 51, (2022).
  8. GPRIN2 G protein regulated inducer of neurite outgrowth 2 [Homo sapiens (human)] – gene – NCBI. National Center for Biotechnology Information. Available at: https://www.ncbi.nlm.nih.gov/gene/9721. (Accessed: 27th January 2025)
  9. Iida, N. & Kozasa, T. Identification and biochemical analysis of GRIN1 and GRIN2. Methods in Enzymology 475–483 (2004).

Targeting the duo responsible for C9orf72 ALS/FTD pathogenesis.

Connie Fierro

RNA-targeting CRISPR systems have potential for modulating expression of RNA species involved in neurodegenerative disease pathogenesis.

The most common cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) is the expansion of a hexanucleotide (GGGGCC) repeat in the chromosome 9 open reading frame 72 (C9orf72) gene. ALS and FTD are characterized by motor dysfunction and cognitive and behavioural impairments, respectively, but these diseases converge on similar mechanisms of RNA-mediated toxicity1. In a new study, Kempthorne and colleagues developed a CRISPR-CasRx system to target and reduce both sense and antisense C9orf72 repeat RNAs in ALS and FTD models2. Their findings hold promise for a future therapeutic strategy to alleviate RNA-mediated toxicity and neurodegeneration in ALS and FTD.

Healthy individuals have between two and eight hexanucleotide repeats, whereas ALS/FTD patients have upwards of 30 repeats. Bidirectional transcription of the C9orf72 repeat region results in the accumulation of sense and antisense RNAs. The sense and antisense RNAs undergo repeat-associated non-ATG (RAN) translation, producing dipeptide repeats (DPRs)3. Unlike normal translation, RAN translation begins in the absence of a start codon and is a known contributor to neurodegenerative disease due to the production of toxic products, such as DPRs4. Six reading frames are associated with RAN translation and may be specific to the sense strand, the antisense strand or both (Figure 1)4. These DPRs disrupt protein homeostasis, exerting toxic effects through an unknown mechanism. Further, DPRs have been identified in the hippocampus, frontal cortex, cerebellum and spinal cord in patients with ALS/FTD3.
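To see where the six reading frames come from, the repeat can be translated directly. The short Python sketch below is illustrative only: it assumes an idealized, uninterrupted GGGGCC repeat with no flanking sequence, uses the standard genetic code, and the function names (`reverse_complement`, `ran_products`) are hypothetical. It reads the sense repeat and its reverse complement in all three frames and reports the dipeptide each frame encodes.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (the antisense template)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ran_products(repeat_unit, n_repeats=4):
    """Translate a tandem repeat in frames 0, 1 and 2; return one dipeptide per frame."""
    seq = repeat_unit * n_repeats
    names = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        unit = "".join(CODON_TABLE[c] for c in codons)[:2]
        # Order the two residues to match the conventional names (GA, GP, GR, PA, PR).
        names.append("".join(sorted(unit, key="GPAR".index)))
    return names


sense = ran_products("GGGGCC")
antisense = ran_products(reverse_complement("GGGGCC"))
print("sense:", sense)
print("antisense:", antisense)
```

Because a hexanucleotide spans exactly two codons, each frame yields a repeating dipeptide. Glycine-proline appears in both lists, which is why six frames give only five distinct DPR species.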

Kempthorne et al. noted the need for therapies targeting both sense and antisense repeat RNAs, as previous clinical trials focusing only on sense RNAs failed2. They engineered an RNA-targeting CRISPR-CasRx system, using guide RNAs (gRNAs) to target the C9orf72 repeat RNAs. Once the CRISPR-CasRx is bound, its ribonuclease activity cleaves both sense and antisense RNA, preventing translation into toxic DPRs. This system functions as molecular scissors to eliminate specific RNA sequences. The researchers first confirmed that CRISPR-CasRx could simultaneously target and degrade both sense and antisense repeat RNAs, reporting 99% and 89% degradation, respectively, in a traditional cell line. Carrying out this experiment in a traditional cell line first established the baseline effectiveness of the system.

Figure 1: The negative effects of DPRs within the cell. Six reading frames are associated with RAN translation and are listed as follows: glycine-alanine (GA) and glycine-arginine (GR) DPRs from the sense RNA strand, proline-alanine (PA) and proline-arginine (PR) from the antisense strand and glycine-proline (GP) from both strands. DPRs can be divided into nontoxic, highly toxic and moderately toxic species. The highly toxic DPRs, GR-DPR and PR-DPR, are reported to interfere with RNA metabolism and to disrupt non-membrane-bound organelles. The moderately toxic DPR, GA-DPR, is most visible as inclusions in the central nervous system of FTD/ALS patients. GA-DPR leads to reduced dendritic branching, increased cellular stress, proteasome inhibition and apoptosis. Figure from4.

Using this information, they transitioned their CRISPR-CasRx into ALS/FTD patient-derived induced pluripotent stem cells (iPSCs) to assess its effect on endogenous C9orf72 repeat RNAs. They expanded their scope to investigate DPRs and neurotoxicity in the iPSCs after treatment with the CRISPR-CasRx. Sense repeat RNA expression decreased by 40% while antisense repeat RNA expression decreased by 73%. Immunoassays detecting GP-DPR and GA-DPR levels revealed a 60% reduction in these toxic proteins. It is important to note that these DPRs pertain to the sense strand and that the researchers were limited by the lack of immunoassays currently available to detect antisense-specific DPRs. There was no significant reduction in viable cells after transduction of the CRISPR-CasRx, which is beneficial when considering therapeutic applications. To assess the effects at the phenotypic level, they used a zebrafish model harbouring 45 hexanucleotide repeats, with a confirmed population of GP-DPRs and a hyperactive behavioural phenotype. Injection of plasmids encoding the CRISPR-CasRx system rescued this hyperactive phenotype by significantly decreasing the amount of GP-DPRs. Moving to a mouse model with 149 hexanucleotide repeats, CRISPR-CasRx and its gRNAs were delivered via neonatal intracerebroventricular (ICV) injection using adeno-associated viruses (AAVs). Antisense-specific gRNAs were not used, as the sequence did not match the mouse model. In the hippocampus, a 50% reduction in sense repeat RNAs was reported, while there was no difference in levels of GP-DPRs.

To overcome the limitation of the previous mouse model, a bacterial artificial chromosome (BAC) mouse was designed to carry the full C9orf72 sequence and 500 hexanucleotide repeats. ICV injection of CRISPR-CasRx produced a 20% decrease in sense and antisense repeat RNAs and no change in GP-DPR levels. This modest decrease in repeat RNA levels was assumed to be due to lower transduction efficiency with the AAVs used for the in vivo experiments than with the plasmids used for the in vitro experiments. Taken together, Kempthorne et al. address the gap in antisense repeat RNA research by designing a CRISPR-CasRx that targets both the sense and antisense strands to decrease levels of repeat RNAs in both cellular and animal models of disease2.

Despite the lack of robust evidence confirming the decrease of DPRs, the results presented by Kempthorne et al. provide a baseline for novel therapeutic strategies targeting both the sense and antisense repeat RNAs2. In-depth RNA sequencing revealed no off-target effects of the CRISPR-CasRx, which supports its therapeutic potential. However, FTD and ALS are age-related diseases, so neonatal injection of the CRISPR-CasRx system is not a feasible treatment route, and future research should explore alternative methodologies. Alternative routes of AAV administration must be investigated for optimal penetration across the blood-brain barrier (BBB) in older mouse models. Deverman et al. engineered AAV variants that efficiently transduce the central nervous system through intravenous injection6. Future studies should investigate an engineered AAV to improve delivery and enhance transduction efficiency. As ALS/FTD treatments shift toward precision medicine, understanding individual RNA profiles can help tailor therapies to individual patients, which improves their efficacy7. The failure of clinical trials targeting sense repeat RNAs in FTD/ALS highlights the demand for a therapy that addresses both sense and antisense repeat RNAs and DPRs to decrease cellular toxicity and rescue the respective phenotypes.

References

  1. Ling, S.-C., Polymenidou, M. & Cleveland, Don W. Converging Mechanisms in ALS and FTD: Disrupted RNA and Protein Homeostasis. Neuron 79, 416–438 (2013).
  2. Kempthorne, L. et al. Dual-targeting CRISPR-CasRx reduces C9orf72 ALS/FTD sense and antisense repeat RNAs in vitro and in vivo. Nature Communications 16, (2025).
  3. Banez-Coronel, M. & Ranum, L. P. W. Repeat-associated non-AUG (RAN) translation: insights from pathology. Laboratory Investigation 99, 929–942 (2019).
  4. Freibaum, B. D. & Taylor, J. P. The Role of Dipeptide Repeats in C9ORF72-Related ALS-FTD. Frontiers in Molecular Neuroscience 10, (2017).
  5. Gao, J. et al. Gene therapy for CNS disorders: modalities, delivery and translational challenges. Nature Reviews Neuroscience (2024) doi:10.1038/s41583-024-00829-7.
  6. Deverman, B. E. et al. Cre-dependent selection yields AAV variants for widespread gene transfer to the adult brain. Nature Biotechnology 34, 204–209 (2016).
  7. Tzeplaeff, L., Wilfling, S., Requardt, M. V. & Herdick, M. Current State and Future Directions in the Therapy of ALS. Cells 12, 1523 (2023).

Neural Network Allows for a Comprehensive Method of Assessing Gene Regulation

Nithya Gopalakrishnan

Borzoi is a sequence-based machine-learning model trained on RNA-seq that can make gene expression predictions based on longer stretches of DNA than any prior model.

The field of genomics has evolved in tandem with advances in data analysis and computational processes, allowing researchers to assess complex datasets of gene regulation information1,2. Presently, neural networks and machine-learning modelling serve as an exciting development within the field, suggesting that we may soon be able to accurately predict the effects of uncategorized genetic variants on gene function from DNA sequences alone1,3,4. By developing Borzoi, a sequence-based machine-learning model that makes direct use of RNA-seq assay data, Linder et al. have put forward a new approach to predicting sequence coverage, that is, the sequencing reads that map to the reference genome1. The model uses data from multiple species and incorporates a breadth of forms of gene regulation, including splicing, polyadenylation, and transcription1. Borzoi’s efficacy was tested across many variant interpretation tasks, including characterizing distal cis-regulatory motifs in tissue-specific datasets and differentiating between benign and pathogenic variants within an individual’s genome1. When compared with established and validated models such as Enformer and Pangolin in analyzing gene regulation at multiple loci, Borzoi performed at either an equal or higher level, demonstrating the tool’s utility within genomics going forward1. On the whole, Borzoi utilizes an immense amount of epigenetic data for focused predictions regarding gene expression, which could allow for easier variant interpretation and a heightened knowledge of transcriptional regulation within the human genome1.

Figure 1: A graphical representation of the breadth of uses for RNA-seq, including the workflow for transcriptome construction highlighted in orange, the assembly of epigenetics datasets highlighted in green, and the possible downstream analyses highlighted in brown. Figure adapted from2.

To date, the majority of genomics machine-learning tools, such as Enformer and Pangolin, have been trained on assays whose predictions rest on regulatory elements within 2,000 bp of the transcription start site (TSS), a relatively short distance1,3,4. In contrast, the most popular assay for elucidating the effects of transcriptional regulators on gene function, RNA-seq, makes use of much larger stretches of sequence to assess gene expression holistically, including exons, introns, and long untranslated regions (UTRs)1,2,5. Despite this approach’s ubiquity in transcriptomics and its use across comparatively more species than other assays, no computational predictor model had been trained directly on RNA-seq coverage prior to the inception of Borzoi. With this tool, predicting gene expression from DNA sequence across multiple forms of genetic regulation has become more sophisticated1.

As a transcriptomics assay, RNA-seq is useful for describing sequence coverage for processed RNAs that have been transcribed, making it a proxy for gene expression2,5. The caveat for this approach is that mammalian gene sequences are often long, with cis-regulatory elements far upstream and downstream of a given gene1,6. This makes training a machine-learning model difficult, as longer sequences mean sacrificing prediction resolution and clarity1. Borzoi was constructed using the established deep learning architecture Enformer, which was originally trained to predict enhancer-promoter interactions based on DNA sequence1,3. To specialize this model, the neural network was trained on tissue-specific data from GTEx, allowing for localized predictions of differential splicing, polyadenylation, or transcriptional regulation1. Both the TSS and the 3’ UTR are essential for gene regulation, with the latter also playing a role in polyadenylation signals and the differential splicing of different isoforms for many genes1. This prompted Linder et al. to pay specific attention to these regions when looking at RNA-seq data1. Using five GTEx tissues (whole blood, liver, brain, muscle, and esophagus), Borzoi was able to predict the variation in tissue-specific gene expression to a high level of significance across five replicates1. Comparing Borzoi’s predictive ability against Enformer was also an essential step undertaken by the researchers, especially when assessing more distal gene regulatory interactions1. Given the long stretches of DNA sequence that comprise RNA-seq data, it follows that Borzoi was able to assess sites almost twice as far away from the TSS as the core Enformer architecture alone could achieve1. In addition, combining multiple forms of epigenetics assays beyond RNA-seq in the training data led to even higher accuracy, lending further credence to Borzoi’s predictive power1.

Amongst the most significant findings highlighted in this paper is that Borzoi performs variant interpretation tasks to a higher degree of accuracy than Enformer1. This finding is essential when considering Borzoi’s future applications: given that this specific model is trained on data taken across mammalian species and with a tissue-specific focus, Borzoi could be an immensely useful approach to identifying variants of unknown significance in essential genes that are evolutionarily conserved. The sheer amount of RNA-seq and GTEx data available is a major advantage when it comes to model training, as deep neural networks such as Borzoi are computationally intensive and require vast training datasets6. Variant analysis is time-consuming and often requires consulting multiple different assays, so making use of a single, comprehensively trained toolkit such as Borzoi could be a decisive step towards a more streamlined approach to genomic interpretation. A further direction for model validation could be testing its performance on genome-wide association study data as a valuable form of benchmarking for accuracy6,7. There is still much to be improved upon; whether the tool can reduce the false positive prediction rate and increase prediction accuracy across all layers of transcriptional regulation will effectively decide Borzoi’s role in genomic analysis.

References

  1. Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat Genet 1–13 (2025) doi:10.1038/s41588-024-02053-6.
  2. Muhammad, I. I., Kong, S. L., Akmar Abdullah, S. N. & Munusamy, U. RNA-seq and ChIP-seq as Complementary Approaches for Comprehension of Plant Transcriptional Regulatory Mechanism. Int J Mol Sci 21, 167 (2019).
  3. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods 18, 1196–1203 (2021).
  4. Zeng, T. & Li, Y. I. Predicting RNA splicing from DNA sequence using Pangolin. Genome Biology 23, 103 (2022).
  5. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, 57–63 (2009).
  6. Alharbi, W. S. & Rashid, M. A review of deep learning applications in human genomics using next-generation sequencing data. Human Genomics 16, 26 (2022).
  7. Sasse, A. et al. Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings. Nat Genet 55, 2060–2064 (2023).