Competitive transcription factor binding regulates hemoglobin switching

Meredith Laver

Fetal hemoglobin expression is repressed following a transition to adult hemoglobin production at birth. For individuals with β-hemoglobinopathies, which impair the function of adult hemoglobin, reversing this switch is a promising route towards curative therapies. New research by Liu et al. proposes a simple model for hemoglobin switching based on competition between transcription factors BCL11A and NF-Y for binding sites in the γ-globin gene promoter.

Sickle cell disease (SCD) and β-thalassemia, collectively known as β-hemoglobinopathies, are common monogenic disorders in which abnormal hemoglobin production impairs erythrocyte production and function1,2. Approximately 300,000 to 400,000 births worldwide each year are affected by a β-hemoglobinopathy1,2. Despite high mortality, these diseases remain extremely prevalent, and easily accessible therapies are desperately needed1,2,3.

β-hemoglobinopathies are caused by variant alleles in the β-globin gene that alter or disable the β-subunit of hemoglobin. Adult hemoglobin is made up of two α-globin subunits and two β-globin subunits (HbA, α2β2)1. An alternative globin protein, γ-globin, is expressed during fetal development and contributes two subunits of fetal hemoglobin (HbF, α2γ2)1. As β-globin expression rises during late fetal development, γ-globin production is inhibited; HbF comprises <2% of hemoglobin in adults4. Sustained expression of HbF in adults is occasionally observed and is termed hereditary persistence of fetal hemoglobin (HPFH)1,4. HPFH has been associated with a reduced symptom burden in patients with β-hemoglobinopathies, implicating relief of γ-globin repression as a potential treatment avenue2. HPFH is typically caused by mutations in the γ-globin promoter that fall in distinct clusters, suggesting that they might disrupt the binding motif of a regulatory repressor5. A recent study by Liu et al. identified competitive binding between the repressive transcription factor BCL11A and the activator NF-Y at an HPFH cluster site in the γ-globin promoter as a key mechanism of expression control4.

The β-globin gene cluster on chromosome 11 contains 5 globin genes, including β-globin and γ-globin (Figure 1)1. In addition to the individual gene promoters, the region is regulated by a locus control region (LCR) containing 5 DNase I hypersensitive sites (HSs) with varying degrees of enhancer activity1,2. Transcription of each globin gene is associated with looping between its promoter and the LCR. Developmental specificity is conferred by the individual promoters, while the LCR generally enhances transcription5. The transcription factors BCL11A and LRF have both been shown to repress γ-globin transcription in adult erythroid cells through promoter binding and recruitment of the NuRD silencing complex6. HPFH mutation sites at -115 and -200 in the γ-globin promoter align with BCL11A and LRF binding sites, respectively4,7. CRISPR-Cas9-mediated disruption of the -115 bp binding motif reduces BCL11A binding and reproduces the HPFH phenotype, but the specific mechanism by which BCL11A silences γ-globin expression had previously been unclear7.

Figure 1 β-globin Locus with BCL11A binding sites (Adapted from Cavazzana et al., 2017)1

Liu et al.4 performed CRISPR-Cas9 perturbation screens to assess γ-globin expression in adult erythroid cell lines expressing Cas9 variants. Pooled gRNAs targeting the β-globin cluster at 11-bp intervals were introduced, and transfected cells with high HbF expression were isolated for analysis. The enrichment or depletion of each gRNA in this HbF-high population was quantified to identify sequences at which Cas9 activity correlated with changes in HbF expression. Catalytically inactive Cas9 (dCas9) binds to a target region but does not cleave DNA. dCas9 targeted to the LRF repressor binding site at approximately -200 bp in the γ-globin promoter was associated with increased HbF expression, consistent with displacement of LRF. However, dCas9 binding at the BCL11A TGACCA motif at -115 bp was associated with reduced HbF expression, the opposite of what displacing a repressor would predict. This effect was replicated by targeting dCas9 to other sites between -150 and -60 bp, suggesting that bound dCas9 was displacing an activating factor with a binding site in this region.
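The enrichment and depletion scoring described above can be illustrated with a minimal Python sketch. It assumes a simplified design in which each gRNA's read count in the sorted HbF-high population is compared with its count in an unsorted reference population. The guide names, read counts, and the 0.5 pseudocount are hypothetical, and this is not Liu et al.'s actual analysis pipeline; real screens are usually scored with dedicated tools such as MAGeCK.

```python
import math


def log2_fold_changes(sorted_counts, unsorted_counts, pseudocount=0.5):
    """Score each gRNA by log2(frequency in HbF-high cells / frequency in unsorted cells).

    Positive scores mean the guide is enriched among HbF-high cells (targeting that
    site tends to raise HbF); negative scores mean it is depleted (targeting that
    site tends to lower HbF).
    """
    sorted_total = sum(sorted_counts.values())
    unsorted_total = sum(unsorted_counts.values())
    scores = {}
    for guide, reference_count in unsorted_counts.items():
        # A small pseudocount avoids log(0) for guides absent from one population.
        f_sorted = (sorted_counts.get(guide, 0) + pseudocount) / sorted_total
        f_unsorted = (reference_count + pseudocount) / unsorted_total
        scores[guide] = math.log2(f_sorted / f_unsorted)
    return scores


# Hypothetical read counts for three dCas9 guides (names are illustrative only).
unsorted = {"dCas9_-200bp": 1000, "dCas9_-115bp": 1000, "dCas9_control": 1000}
hbf_high = {"dCas9_-200bp": 2000, "dCas9_-115bp": 250, "dCas9_control": 750}

for guide, score in log2_fold_changes(hbf_high, unsorted).items():
    print(f"{guide}: {score:+.2f}")
```

With these made-up counts, the guide at the LRF site (-200 bp) scores positive, mirroring increased HbF, while the guide at the -115 bp site scores strongly negative, mirroring the unexpected drop in HbF.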

NF-Y is an activating transcription factor with two potential binding sites within this range. It had previously been identified as an activator of globin gene expression, though the specific mechanism remained elusive8. Liu et al.4 found that shRNA knockdown of the NF-Y subunit NF-YA in HbF-expressing erythroid precursors reduced LCR looping to the γ-globin promoter. This reduced looping was associated with decreased expression, suggesting that NF-Y may activate expression by facilitating chromosomal looping. CUT&RUN mapped NF-Y binding to a CCAAT motif at -88 to -84 bp in the γ-globin promoter, within the -150 to -60 bp window identified by dCas9 screening. Mutating this motif with Cas9 reduced NF-Y occupancy and decreased HbF levels. Conversely, mutating the BCL11A motif at -115 bp increased NF-Y occupancy and γ-globin expression. In BCL11A knockout (KO) cells, dCas9 targeting to the -115 bp site and other adjacent sites within the -150 to -60 bp window partially reduced NF-Y occupancy and decreased γ-globin expression, indicating that a bound protein in this region can impede NF-Y even in the absence of BCL11A. Based on these results, the authors proposed a simple model in which competitive binding between BCL11A and NF-Y is a major regulator of γ-globin expression (Figure 2). BCL11A is dramatically upregulated in adult erythroid cells compared with fetal progenitors, which likely contributes to the out-competition of NF-Y and the transition to HbA production4.
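The competition model can be made concrete with a simple equilibrium calculation. The sketch below assumes, as a simplification, that NF-Y and BCL11A bind the γ-globin promoter mutually exclusively, so the fraction of promoters occupied by NF-Y is (n/Kn) / (1 + n/Kn + b/Kb), where n and b are the free factor concentrations and Kn and Kb their dissociation constants. All parameter values are arbitrary illustrative units rather than measurements from Liu et al., and the model ignores NuRD recruitment and LCR looping.

```python
def nfy_occupancy(nfy, bcl11a, kd_nfy, kd_bcl11a):
    """Fraction of promoters bound by NF-Y when NF-Y and BCL11A compete for
    mutually exclusive binding (simple equilibrium, arbitrary units)."""
    w_nfy = nfy / kd_nfy          # statistical weight of the NF-Y-bound state
    w_bcl = bcl11a / kd_bcl11a    # statistical weight of the BCL11A-bound state
    return w_nfy / (1.0 + w_nfy + w_bcl)  # the 1.0 is the unbound promoter


# Hypothetical scenarios (units are arbitrary).
fetal = nfy_occupancy(nfy=1.0, bcl11a=0.1, kd_nfy=1.0, kd_bcl11a=1.0)   # low BCL11A
adult = nfy_occupancy(nfy=1.0, bcl11a=10.0, kd_nfy=1.0, kd_bcl11a=1.0)  # high BCL11A
# Mutating the -115 bp TGACCA motif is modelled as abolishing BCL11A binding.
mutant = nfy_occupancy(nfy=1.0, bcl11a=10.0, kd_nfy=1.0, kd_bcl11a=float("inf"))

print(f"fetal: {fetal:.2f}, adult: {adult:.2f}, -115 motif mutant: {mutant:.2f}")
# prints: fetal: 0.48, adult: 0.08, -115 motif mutant: 0.50
```

Even this toy model reproduces the qualitative pattern: raising BCL11A sharply lowers NF-Y occupancy, and removing the BCL11A site restores it despite high BCL11A levels.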

Figure 2 Model of competitive binding between BCL11A and NF-Y in the γ-globin promoter (Adapted from Liu et al., 2021)4. A In fetal erythroid progenitors, BCL11A expression is low and NF-Y activates γ-globin expression by binding to the CCAAT motif at -88 to -84 bp in the promoter. B In adult erythroid cells, BCL11A expression is high; it displaces NF-Y by binding to the TGACCA motif at -115 bp and recruits the NuRD silencing complex to repress γ-globin gene expression. NF-Y instead promotes β-globin expression, leading to the production of adult hemoglobin.

This simple model of γ-globin regulation suggests promising molecular targets for treating β-hemoglobinopathies by upregulating HbF. Current clinical trials focus on downregulating BCL11A via shRNA silencing or Cas9-mediated gene editing9,10. However, BCL11A has a key role in the development of B lymphocytes and hematopoietic stem cells, and its downregulation negatively affects red blood cell enucleation1. Treatments targeting the -115 bp BCL11A site in the promoter may instead allow highly specific relief of γ-globin repression. Cas9-mediated mutation of the binding site, as demonstrated by Liu et al.4, is a potential therapy with long-term effect. Alternatively, small proteins or ncRNAs might be engineered to occupy the site without inhibiting NF-Y binding. This approach would avoid the risk of Cas9-induced off-target mutations and could be translated into an affordable, easily produced therapeutic.


  1. Cavazzana, M., Antoniani, C. & Miccio, A. Gene Therapy for β-Hemoglobinopathies. Mol Ther 25, 1142–1154 (2017).
  2. Frati, G. & Miccio, A. Genome Editing for β-Hemoglobinopathies: Advances and Challenges. J Clin Med 10, 482 (2021).
  3. Piel, F. B., Steinberg, M. H. & Rees, D. C. Sickle cell disease. N Engl J Med 376, 1561–1573 (2017).
  4. Liu, N. et al. Transcription factor competition at the γ-globin promoters controls hemoglobin switching. Nat Genet 53, 511–520 (2021).
  5. Bender, M. A., Bulger, M., Close, J. & Groudine, M. Beta-globin gene switching and DNase I sensitivity of the endogenous beta-globin locus in mice do not require the locus control region. Mol Cell 5, 387–393 (2000).
  6. Xu, J. et al. Transcriptional silencing of γ-globin by BCL11A involves long-range interactions and cooperation with SOX6. Genes Dev 24, 783–798 (2010).
  7. Liu, N. et al. Direct Promoter Repression by BCL11A Controls the Fetal to Adult Hemoglobin Switch. Cell 173, 430–442.e17 (2018).
  8. Zhu, X. et al. NF-Y recruits both transcription activator and repressor to modulate tissue- and developmental stage-specific expression of human γ-globin gene. PLoS One 7, e47175 (2012).
  9. Frangoul, H. et al. CRISPR-Cas9 Gene Editing for Sickle Cell Disease and β-Thalassemia. N Engl J Med 384, 252–260 (2021).
  10. Esrick, E. B. et al. Post-Transcriptional Genetic Silencing of BCL11A to Treat Sickle Cell Disease. N Engl J Med 384, 205–215 (2021).

Why Should We Care4Rare?

The Care4Rare initiative has revolutionized the way we diagnose rare genetic disorders in Canada by providing access to genetic sequencing and other ‘-omics’ technologies.

Kassandra Bisson, Radhika Mahajan, Paul McKay, and Hamid Farahmand

Dr. Kym Boycott pictured with Eli, one of the children who participated in the Care4Rare initiative and received a resulting diagnosis for his rare genetic condition. Photo courtesy of Melanie Tempel.

Imagine having a sick child whose diagnosis remains inconclusive after years of tireless diagnostic testing and countless specialist appointments. That is often the dilemma facing parents whose child has a rare disease. In the United States, a disease is considered ‘rare’ if it affects fewer than 200,000 people.1 However, rare disorders are collectively common, affecting millions of people worldwide. These diseases are generally chronically debilitating and can be life-threatening.2 In Canada, over a million people live with one or more of the roughly 7,000 rare genetic diseases (RDs), about a third of which have an unknown underlying genetic cause.3 In search of answers, Dr. Kym Boycott has been changing the game in rare disease patient care and diagnosis. In the Department of Genetics at the Children’s Hospital of Eastern Ontario (CHEO), Dr. Boycott has pioneered improvements in patient care by studying the molecular pathogenesis of rare diseases. In addition to her role as Tier 1 Canada Research Chair in Rare Disease Precision Health, she is a renowned clinical geneticist and a Senior Scientist at the CHEO Research Institute.

When asked what sparked her career path, Dr. Boycott said that a lecture given by Dr. Patrick McLeod during her undergraduate degree ignited her interest in human genetics. “When you look back at your life at my age, you will see those forks in the road and that was one of them.” Her experience working with both clinicians and researchers motivated her to pursue a PhD and an MD, followed by FRCPC training in Medical Genetics at the University of Calgary. One of the most prominent turning points in her career came in 2011, when she and her colleagues launched a national network called Finding of Rare Disease Genes in Canada (FORGE Canada). This project primarily used next-generation sequencing (NGS) to study rare diseases4. Reflecting on the use of NGS to diagnose rare diseases, she noted, “These [were] amongst the first exomes done for rare disease in Canada at scale.” To her surprise, Jeremy Schwartzentruber, then a bioinformatics master’s student, interpreted the genomic data and identified candidate genes for several of the syndromes in the first sequencing runs. She candidly stated, “It had taken me six years to find my first gene. And during this one afternoon in 2011, we’d found six genes for syndromes that had been without a known genetic cause for decades in 1 hour. […] This is going to be something really important for genetics.” Building on the major advances in NGS technology over the past decade, Dr. Boycott has led genomic sequencing initiatives worldwide, including FORGE and Care4Rare in Canada, that combine sequencing with various ‘-omics’ technologies to unlock the secrets behind rare diseases.

What Is Care4Rare?

One of Dr. Boycott’s greatest milestones is the Care4Rare project (Figure 1)5, which focuses on finding diagnoses for individuals whose rare diseases remain undiagnosed. Founded in 2011, Care4Rare is a pan-Canadian consortium of clinicians, bioinformaticians, scientists, and researchers. The consortium is exploring ways to improve the care of patients with rare diseases in Canada and around the world. In addition to its headquarters at CHEO, Care4Rare has 21 academic sites across the country and is recognized internationally as a pioneer in genomics and personalized medicine.

Figure 1: Care4Rare milestones by the numbers. The figure depicts the major outcomes of the Care4Rare project over the past decade. Figure adapted from ref. 5.

Care4Rare has two main goals: 1) access and 2) understanding. The first goal is to provide access to exome sequencing (ES) or genome sequencing (GS) for all eligible individuals with a suspected rare genetic disease in Canada. The second goal is to better understand how genetic variation contributes to disease. Over a 10-year period, Care4Rare has studied more than 5000 families. When asked about Care4Rare’s proudest accomplishment, Dr. Boycott said, “The fact that all of those 5000 families got the opportunity to access this sequencing technology before it was available in the clinic.” Over 50% of those families have already received answers from this research, while the rest, whose genomic sequencing results were inconclusive, are still being investigated. Dr. Boycott expects that within the next few years, genomic sequencing will be incorporated early in the diagnostic care pathway for individuals with suspected rare genetic syndromes. As she explained, “The more we can push it to the front of the diagnostic pathway, the better.” Early integration of genetic sequencing will likely shorten the diagnostic timeline and help families avoid other inconclusive tests and specialist referrals.

The type of sequencing most appropriate for clinical use is hotly debated. Dr. Boycott stated, “genome sequencing provides about a 5% increase in diagnostic yield over exome sequencing. [There is] not much ‘genome’ can find that an exome didn’t already find for you, especially if you’ve had a microarray done, but our understanding of the genome will improve over time.” She did acknowledge that genome sequencing plays a critical role in revealing mutational mechanisms and ‘hidden answers’ not accessible by exome sequencing alone. These revelations will push genomic understanding further and make the data produced by ES/GS much more medically actionable6.

Integrating The ‘-omics’ Technologies

Care4Rare – SOLVE, the third phase of the project, is currently focused on optimizing the delivery of both clinical genome-wide sequencing and multi ‘-omics’ approaches6. This work proceeds alongside global data sharing and the development of new bioinformatics tools, facilitating the delivery of innovative diagnostic care for rare diseases. Any individual who remains undiagnosed after ES, with no candidate variants identified, likely has a complex disease mechanism that will be challenging to detect. For example, a disease mechanism involving long-range genomic interactions, or heterogeneity in the genetic makeup of the affected tissue, often requires ‘deeper digging’ to uncover a diagnosis6. For families who did not receive a clear diagnosis from initial ES, Care4Rare’s clinical laboratory teams follow up by supplementing the genomic data with multi ‘-omics’ technologies (Figure 2)6,7,8. The integration of these newer ‘-omics’ technologies is a current focus of Care4Rare, with the hope that it can help ‘solve’ the underlying disease mechanism in individuals or families who were undiagnosed after clinical ES6. Dr. Boycott particularly emphasized the impact of long-read genome sequencing, transcriptomics, methylomics, metabolomics, and lipidomics in rare disease diagnostics6. Because these technologies are relatively new, understanding them is a primary focus. Care4Rare also hopes to develop a decision-making tool for determining which ‘-omics’ technology to use next in the clinical diagnostic pathway, based on the suspected disease mechanism. Combining these technologies generates valuable data that increases the potential for clinical actionability6. With a better understanding of genomic variation and disease, novel therapeutic targets can be identified, enabling more precise treatment approaches tailored to the individual.

Figure 2: Integration of multi ‘-omics’ technologies in the Care4Rare bioinformatics pipeline. This multi-approach method allows for a deeper understanding of the many layers of interacting biomolecules in rare diseases. Together, the ‘-omics’ pieces fit together to uncover the bigger picture of the underlying diagnosis. Figure adapted from refs. 7,8.

When asked about potential barriers to the current expansion of the Care4Rare initiative, Dr. Boycott said the only real recent challenge has been the impact of COVID-19 pandemic restrictions. In particular, the restrictions reduced the team’s ability to collect samples and therefore to recruit participants; however, this has been improving as restrictions are lifted. At CHEO, the aim is to set up a clinic for undiagnosed patients, supported by a collaboration between clinical research staff, clinical geneticists, and genetic counselors. Because the various ‘-omics’ technologies can require different sample types, the clinic’s mission is to provide a central location where families can undergo multi-sample collection. This clinic will help shorten the research and testing process and ultimately further Care4Rare’s main goal of improving access to genetic testing.

The RareConnect Platform 

The RareConnect platform, initially set up by EURORDIS (Rare Diseases Europe), accompanies the research of Care4Rare.9 It offers a private, supportive, and safe social network in 13 languages for families affected by ultra-rare diseases who wish to connect, ask questions, and share their experiences and stories.9 The RareConnect platform is divided into disease-specific online discussion groups and communities based on topics pertaining to many disease areas.9 It also offers a community for those without a current diagnosis.9 Dr. Kym Boycott pointed out, “These tools have helped address the isolation that families often experience when they have a rare disease.” The CHEO initiatives led by Dr. Boycott have helped thousands of individuals reach a diagnosis for their rare genetic disease, often giving families immediately actionable therapeutic avenues once they finally receive their long-elusive diagnosis.

Future Prospects for Medical Genomics 

The Care4Rare initiative has been a pioneering project leading the way for integration of medical genomics into clinical practice. This project has demonstrated the usefulness of ES/GS alongside multiple ‘-omics’ technologies in diagnosing individuals and families with rare genetic diseases6. Identification of new disease-causing genes will help clinicians and researchers better understand what causes a rare disease and may inform approaches to development of subsequent therapeutics6. While there is currently limited knowledge regarding the epidemiology, diagnosis, and treatment of RDs, global efforts are ongoing to increase awareness, treatment options, and education. 

When asked why she thinks medical genomics research is so important, Dr. Boycott stated, “I think it’s so important because we don’t understand the medical genome – and this impacts patient care – its clinical utility will only increase with our increased understanding.” Dr. Boycott emphasized the importance of medical genomics research for the future management of rare diseases and cancer. As genomics and other ‘-omics’ technologies become more widely used, all of the data they produce will need to be interpreted. She also noted how interesting it will be to see how “ultimately patients’ treatment might change.” As the Care4Rare initiative has demonstrated, the advancement of genomic and other ‘-omics’ technologies greatly increases the need for clinicians and researchers trained in medical genomics. Overall, Care4Rare serves as a fantastic model for other rare genetic disease research and will pave the way for novel research, therapeutic approaches, and diagnostic care.


1. Diseases | Genetic and Rare Diseases Information Center (GARD) – an NCATS Program. (Accessed 2022).

2. Boycott, K. M., Vanstone, M. R., Bulman, D. E. & MacKenzie, A. E. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat. Rev. Genet. 14, 681–691 (2013).

3. Care4Rare Canada: Harnessing multi-omics to deliver innovative diagnostic care for rare genetic diseases in Canada (C4R-SOLVE) | Genome Canada. (Accessed 2022).

4. Beaulieu, C. L. et al. FORGE Canada Consortium: Outcomes of a 2-Year National Rare-Disease Gene-Discovery Project. Am. J. Hum. Genet. 94, 809–817 (2014).

5. CARE for RARE. (Accessed 2022).

6. Driver, H. G. et al. Genomics4RD: An integrated platform to share Canadian deep-phenotype and multiomic data for international rare disease gene discovery. Hum. Mutat. doi:10.1002/humu.24354 (2022). Epub ahead of print. PMID: 35181971.

7. Computational Multi-Omics. (Accessed 2022).

8. Labory, J. et al. Multi-Omics Approaches to Improve Mitochondrial Disease Diagnosis: Challenges, Advances, and Perspectives. Front. Mol. Biosci. 7, 590842 (2020).

9. RareConnect. (Accessed 2022).

Clinical Genetics: Medicine, Genomics, and Education, in Action

Dr. Faghfoury, a prominent clinical geneticist at SickKids, shares her expertise on all things medical genomics, including her professional journey and the misconceptions and challenges surrounding genetic testing.

George Guirguis, Yasmeen Kurdi, and Anahita Bahreini-Esfahani

Dr. Hanna Faghfoury is a well-known clinical geneticist currently working at some of Canada’s most prominent healthcare facilities, including Mount Sinai Hospital, The Hospital for Sick Children (SickKids), and University Health Network. She obtained her MD from McGill University in 2004 and pursued her interest in medical genetics by completing post-graduate training in Medical Genetics, followed by an additional two years of training in Clinical Biochemical Genetics, both at the University of Toronto. She is currently the post-graduate director of the Medical Genetics and Genomics program at the University of Toronto and an associate professor at the Temerty Faculty of Medicine. Photo credit: Dr. Faghfoury.

Imagine being in medical school after years of hard work and dedication, only to find yourself not drawn to any of its disciplines. Most medical disciplines are organized around organ systems. Dr. Hanna Faghfoury found herself in exactly this situation: not drawn to any particular organ system, she was uncertain whether she would find a suitable specialty. This doubt turned to passion and excitement when she enrolled in a Medical Genetics elective. “After the first day, I called my parents, and I said I found what I want to do for the rest of my life.” What really stood out to Dr. Faghfoury was that a medical geneticist is not focused on a single organ system, yet is not considered a generalist. More importantly, medical geneticists can follow patients longitudinally, from birth and throughout their lives. After being accepted into the medical genetics residency at the University of Toronto (UofT), she enrolled in an elective in a field she had never heard of in medicine: metabolics. Her undergraduate degree in biochemistry proved helpful, despite what Dr. Faghfoury described as the often dry and seemingly irrelevant way biochemical pathways are taught. Viewed through the lens of genetics, metabolic pathways made more sense, as they provided clearly actionable targets for intervention. This intensified Dr. Faghfoury’s passion for medical genetics, and she pursued this specialty as her career. Today, Dr. Faghfoury is the post-graduate director of the Medical Genetics and Genomics program at UofT, where she also holds an associate professor position at the Temerty Faculty of Medicine.

The completion of the Human Genome Project in 2003 ushered in a new era of modern medicine and led to the development of sophisticated technologies for sequencing DNA. These advances have since transformed the landscape of clinical diagnostics and the management of genetic disorders. Contemporary medical genetics has become an expansive subspecialty of medicine, applying genetic principles such as inheritance and gene mapping to the diagnosis and management of disease. Previously, a geneticist’s expertise in recognizing dysmorphic features was pivotal in identifying candidates for genetic testing1. Furthermore, genetic testing was widely inaccessible because of slow turnaround times for lab results and the astronomically high cost of sequencing. Fast forward to 2011, when the FDA approved next-generation sequencing for use in clinical diagnosis2; this marked a paradigm shift in clinical assessment. As genetic testing became cheaper and more accessible, geneticists increasingly integrated sequencing technologies into their practice, slowly moving away from strictly assessing clinical presentation, or phenotyping, to identify or rule out disease. Dr. Faghfoury notes that as the technological and financial barriers surrounding genetic testing decrease, the need for phenotyping, the skill clinical geneticists have traditionally been trained in, will also decrease. She notes that this gradual shift poses something of a professional identity crisis for clinical geneticists, who must distinguish their profession from that of genetic counselors. That said, medical geneticists have skills distinct from those of lab personnel and counselors because they are trained in patient management. One limitation that prevents geneticists from broadening the scope of their practice is the capacity and resource constraints of the current model of care.
Addressing these limitations will require a systemic re-imagination of the role and scope of medical geneticists in the rapidly changing era of genomics. Despite these capacity and resource constraints, medical geneticists, like Dr. Faghfoury, maintain an invaluable role in patient care.

Dr. Faghfoury’s day-to-day work is dynamic and varied given her many roles. However, a constant part of her work is patient education, where she addresses hesitancies and misconceptions surrounding genetic testing. In pre-test consultations, she emphasizes that “there isn’t a one size fits all for genetic testing”, and that different tests offer varied insights that together aid clinical evaluation. A type of genetic test routinely used in genetics clinics, such as the Fred A. Litwin Family Centre in Genetic Medicine where Dr. Faghfoury works, is whole exome sequencing (WES). This technique entered clinical diagnostics around 2011 and applies next-generation sequencing to detect variation in the protein-coding regions of genes, known as exons. About 85% of disease-causing mutations in Mendelian disorders (disorders caused by mutations in a single gene) are located in exons3. Wilson’s disease, a genetic disorder that interferes with the body’s ability to remove excess copper, is one example in which WES provides high sensitivity and specificity to identify or rule out disease. One important limitation of WES is that it examines only about one percent of the human genome4. At times, this limitation may render WES ineffective at determining a genetic cause for a patient’s suspected disorder, because regulatory regions that modulate gene expression (essentially turning genes on or off) lie outside exons5. For example, in some malformations of cortical development, many patients carry no mutations in the coding sequences of genes, but rather in the regulatory regions surrounding them6. Similarly, intronic repeat expansions have been shown to cause brain disorders such as epilepsy7. The mutations present in these patients are often missed by WES. This is why Dr. Faghfoury explains to her patients that a normal WES result is not a negative result; rather, it is inconclusive. “I don’t call a negative result negative, I say ‘it’s inconclusive’ because we just haven’t found the cause of [the] problem.” At the same time, many patients believe that genetic testing is the be-all and end-all and that it will always provide answers. “The misconceptions either fall in the category of overvaluing genetic testing or undervaluing it.” Whole genome sequencing (WGS), on the other hand, captures virtually the entire genome, including regulatory regions, and can therefore provide a more conclusive result. In exchange for capturing far more of the genome, WGS requires substantially more analysis. In addition, WGS is more accurate than WES4. Regrettably, for most Ontario patients, WGS cannot currently be requested by physicians; instead, it is conducted randomly in lieu of WES.
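Why an exome-only test can miss intronic or regulatory variants can be shown with a toy interval check. The sketch assumes that capture targets are given as sorted, non-overlapping half-open (start, end) intervals on a single chromosome, similar in spirit to a BED file; the coordinates, labels, and the helper name `in_capture_targets` are hypothetical. Real pipelines handle this step with dedicated tools (for example, bedtools) rather than hand-written code.

```python
from bisect import bisect_right


def in_capture_targets(position, targets):
    """Return True if a 0-based position falls inside any half-open [start, end)
    target interval. `targets` must be sorted by start and non-overlapping."""
    starts = [start for start, _ in targets]
    i = bisect_right(starts, position) - 1  # last interval starting at or before position
    return i >= 0 and position < targets[i][1]


# Hypothetical exon capture targets for one gene.
exon_targets = [(1000, 1200), (5000, 5150), (9000, 9300)]

variants = {
    "missense variant in exon 1": 1100,
    "intronic repeat expansion": 3000,
    "upstream regulatory variant": 400,
}
for label, pos in variants.items():
    status = "captured by WES" if in_capture_targets(pos, exon_targets) else "missed by WES"
    print(f"{label} (pos {pos}): {status}")
```

Only the exonic variant falls inside a capture target; the intronic and upstream variants lie outside every interval, which is why a normal WES result is best read as inconclusive rather than negative.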

Figure 1: Diagram depicting the whole exome sequencing pipeline. The left side of the figure shows enrichment of DNA fragments to isolate protein-coding regions (exons). The exons then undergo next-generation sequencing, and the resulting reads are mapped to a reference genome to identify variants, including deletions and single nucleotide polymorphisms. The identified variants are then filtered and annotated for associations with disease. (Retrieved from ref. 8)

There are many challenges facing the field of clinical genetics, and limited resources are an especially pertinent one. Ideally, a clinical geneticist would diagnose a patient and continue to follow up with them long term. Unfortunately, there are only seven genetics residency programs in Canada, each graduating a handful of trainees every year, so demand for geneticists far exceeds supply. Because there are not enough clinical geneticists, patients are often followed by their family physician after diagnosis. This can pose problems, as clinical genetics is a rapidly evolving specialty and family physicians may lack the specific expertise needed to follow patients with genetic disorders. This has led clinical geneticists, who often find themselves disengaged from patient management, to coin the term ‘diagnose and adios’. This is an area of the current medical system that requires more advocacy and change. Not all patients diagnosed with a genetic disorder are followed by their family physician, however. For certain genetic disorders, there are clinics where clinical geneticists follow up with their patients, such as the GoodHope clinic (for Ehlers-Danlos syndrome) and the Genometabolic clinic, where Dr. Faghfoury practices. Unfortunately, this creates an equity problem: a patient whose disorder has no dedicated subspecialty clinic cannot be followed by clinical geneticists and must rely on their family physician. “Why is it their fault that their mutation happened to be in a gene that didn’t have a subspecialty clinic attached to it? It’s not fair.” Many genetics professionals are working to address this inequity between patients with different genetic disorders, with the goal of ensuring that all patients receive the best possible care.

The future of clinical genetics looks promising. Recent developments such as whole-genome and whole-exome sequencing have drastically changed the landscape of managing genetic disorders. One exciting paradigm shift mentioned by Dr. Faghfoury is that, thanks to genetic testing, clinical geneticists are moving away from relying strictly on phenotyping for clinical identification. One example of this shift can be seen in the rapidly expanding field of pharmacogenomics, the study of how genes affect an individual’s response to drugs. Cytochrome P450 2D6 (CYP2D6) is an important gene encoding an enzyme involved in the metabolism of about 20% of commonly prescribed drugs (Taylor 2020). Interestingly, CYP2D6 is highly variable across different populations, and the variants an individual carries can directly influence how they metabolize these drugs. To date, 72 different drugs have CYP2D6 clinical guidelines mentioned within their FDA-approved product labels (Taylor 2020). Instead of the trial-and-error approach typically needed to assess drug efficacy in patients, genetic testing of CYP2D6 can identify individuals who may experience adverse reactions or reduced efficacy, so that therapeutic doses can be tailored accordingly (Taylor 2020). While pharmacogenomics offers exciting potential for personalized medicine, barriers to clinical implementation remain, including the educational and equipment infrastructure needed to perform and interpret such tests. Moving forward, there will be a greater need for expertise to efficiently integrate genetic testing into everyday clinical practice. As Dr. Faghfoury puts it, “right now we need all hands on deck” to effectively usher in this new and rapidly evolving era of healthcare.


  1. Tromans, E. & Barwell, J. Clinical genetics: past, present and future. Eur J Hum Genet (2022).
  2. Efthymiou, S., Manole, A. & Houlden, H. Next-generation sequencing in neuromuscular diseases. Curr Opin Neurol 29, 527–536 (2016).
  3. Rabbani, B., Tekin, M. & Mahdieh, N. The promise of whole-exome sequencing in medical genetics. J Hum Genet 59, 5–15 (2014).
  4. Belkadi, A. et al. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc Natl Acad Sci U S A 112, 5473–5478 (2015).
  5. Barrett, L. W., Fletcher, S. & Wilton, S. D. Regulation of eukaryotic gene expression by the untranslated gene regions and other non-coding elements. Cell Mol Life Sci 69, 3613–3634 (2012).
  6. Perenthaler, E., Yousefi, S., Niggl, E. & Barakat, T. S. Beyond the exome: the non-coding genome and enhancers in neurodevelopmental disorders and malformations of cortical development. Front Cell Neurosci 13, 352 (2019).
  7. Scheffer, I. E. The key to FAME: intronic repeat expansions cause human epilepsies. Epilepsy Curr 18, 238–239 (2018). doi:10.5698/1535-7597.18.4.238
  8. Goh, G. & Choi, M. Application of whole exome sequencing to identify disease-causing variants in inherited human disease. Genomics Inform 10, 214–219 (2012).
  9. Taylor, C., Crosby, I., Yip, V., Maguire, P., Pirmohamed, M. & Turner, R. M. A review of the important role of CYP2D6 in pharmacogenomics. Genes (Basel) 11, 1295 (2020). doi:10.3390/genes11111295

A Study in DNA: The Adventures of a Clinical Geneticist

Genetic disorders often present with a puzzling array of symptoms, making diagnosis challenging. Fortunately, clinical geneticists are on the case! Dr. Marjan Nezarati takes us through the process of providing her patients and their families with answers.

Meredith Laver & Alex Margaritescu

Dr. Marjan Nezarati, M.D., RCPSC Specialist. Photo courtesy of Marjan Nezarati.

The Clinical Genetics department at North York General Hospital (NYGH) sees a wide variety of cases that fall into a few main categories. Generally, a person is referred to clinical genetics if they are suspected of having a genetic disorder, either because of their family history or because of the symptoms they present with. Children are often referred for developmental delay combined with one or more dysmorphic features. Prenatal cases are referred when a pregnancy has an unusual screening result, such as from an ultrasound, or when there is a family history of genetic disorders. NYGH also runs a hereditary cancer clinic for individuals with a family history of cancer. Dr. Nezarati laments lengthy wait times and explains that, given the number of referrals they receive, they “really don’t have the resources”. After a referral is accepted, the patient is scheduled to see a clinical geneticist like Dr. Nezarati. She then takes a detailed family history and gives a preliminary overview of possible findings. Most cases require additional testing to characterize physical symptoms or to investigate genetic causes.

Figure 1 – Process of diagnosing genetic disorders in prenatal, child, adolescent or adult patients. All cases begin with a visit to a physician, who may write a referral to a genetics clinic if the findings suggest the possibility of a genetic disorder. At the clinic, a geneticist re-examines the patient’s physical symptoms and family history, and orders appropriate genetic testing. Image created in BioRender.

Dr. Nezarati has access to a toolkit of genetic tests to help identify the molecular causes of disease. Genetic tests look for potentially disease-causing changes in a patient’s DNA. Prenatal cases receive either non-invasive prenatal screening (NIPS) or invasive prenatal testing (IPT). Prenatal testing is time-sensitive, as parents must make informed decisions and prepare for health challenges before a child is born. Although IPT is faster and provides more information, some parents opt for NIPS first because of the small risk of miscarriage associated with IPT4. One of two common IPT methods, chorionic villus sampling (CVS) or amniocentesis, is used to acquire a sample of fetal DNA. CVS harvests a small tissue sample from the chorion, a membrane enveloping the fetus that contributes to the placenta, while amniocentesis collects a sample of the amniotic fluid that surrounds the fetus4. The DNA derived from these samples can then be tested for common disease-causing mutations and chromosomal abnormalities.

In contrast, children and adults who present with a suspected genetic syndrome usually receive microarray testing of a blood sample. Microarrays detect duplications or deletions of specific genomic regions. If a particular condition is suspected, the microarray ordered tests the sites where duplications or deletions are known to cause that condition. Because microarrays have become fairly common, geneticists are now encouraging family physicians and specialists to order them independently instead of submitting a genetics referral.

If microarray testing doesn’t reveal a diagnosis and a genetic syndrome is still suspected, Dr. Nezarati will often order either a gene sequencing panel or whole exome sequencing (WES). Sequencing reads out the DNA sequence of a portion of the genome. Gene panels sequence only the genes commonly associated with a specific disorder or symptom, and are typically used to confirm a clinical diagnosis. WES looks at the entire exome, the portion of the genome that contains the instructions for making proteins. Although the exome makes up only about 1% of the genome, approximately 85% of known disease-causing mutations are located within it5. If a first gene panel is inconclusive, sequencing the entire exome can be much more cost-effective than running multiple additional panels, making WES a good diagnostic test for patients whose clinical diagnosis remains elusive5.
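To put the 1% and 85% figures in perspective, a quick back-of-envelope calculation compares each region’s share of known disease-causing mutations with its share of the genome. This is only an illustrative sketch using the article’s rounded numbers; the variable and function names are hypothetical, and the output is a rough enrichment ratio rather than a published estimate.

```python
# Rounded figures quoted in the article (assumptions for illustration only).
EXOME_SHARE_OF_GENOME = 0.01     # the exome is ~1% of the genome
EXOME_SHARE_OF_MUTATIONS = 0.85  # ~85% of known disease-causing mutations


def enrichment(share_of_mutations: float, share_of_genome: float) -> float:
    """Return how many times more mutations a region holds than expected
    if disease-causing mutations were spread evenly across the genome."""
    return share_of_mutations / share_of_genome


exome_enrichment = enrichment(EXOME_SHARE_OF_MUTATIONS, EXOME_SHARE_OF_GENOME)
rest_enrichment = enrichment(1 - EXOME_SHARE_OF_MUTATIONS, 1 - EXOME_SHARE_OF_GENOME)

print(f"Exome: ~{exome_enrichment:.0f}x the even-spread expectation")
print(f"Rest of genome: ~{rest_enrichment:.2f}x the even-spread expectation")
```

Under these assumptions, each stretch of exome carries roughly 85 times its “fair share” of known disease-causing mutations, while the rest of the genome carries only about 0.15 times its share. This concentration is one reason sequencing the exome is a sensible next step when panels come back empty.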

In some cases, the usual genetic tests fail to identify a causative mutation, leaving patients and families without answers. Geneticists can bridge the gap between emerging research and clinical practice by submitting these especially puzzling cases to research studies. This practice helps provide patients with a diagnosis and uncovers new molecular signatures of disease. Dr. Nezarati is the principal investigator at NYGH for two research studies which use expanded testing methods to investigate undiagnosed cases: Care4Rare–SOLVE and EpiSign.

Care4Rare is a consortium that was founded in 2011 to unite researchers and clinicians across Canada in providing care for individuals with rare diseases6. The current iteration of the project, Care4Rare–SOLVE, is focused on identifying the molecular causes of rare genetic conditions6. Clinical researchers like Dr. Nezarati collect and share data to help expedite patient diagnosis and the classification of new disorders. Patients enrolled in Care4Rare receive access to whole exome and genome sequencing, as well as expanded testing methods that include RNA sequencing6. Dr. Nezarati enrolled a young girl in an early form of Care4Rare after a battery of standard tests failed to produce a diagnosis. The team entered the patient’s phenotype and genotype data into a knowledge-sharing network called Matchmaker Exchange, and suddenly the pieces began falling into place. There was “someone from Australia and another person from the US, and they [had] patients with mutations in the same gene.” Researchers and clinicians around the world were able to work together to formally classify a new rare genetic disorder and begin to build a knowledge base7. Around half of the individuals enrolled in Care4Rare have received a diagnosis for their rare disease6. A formal diagnosis can help patients and families seek appropriate healthcare, inform family planning decisions, and connect with others who share their experiences.

Even advanced DNA testing methods sometimes fail to produce a diagnosis. In these cases, patients can be enrolled in EpiSign for epigenetic analysis. Genetic and environmental differences can change the way DNA regions are packaged and read. Epigenetics is the branch of genetics that studies how these differences affect gene expression. Certain genetic disorders, such as Fragile X, Prader-Willi, and Kabuki syndromes, are associated with recognizable epigenetic signatures8. EpiSign analyzes a patient’s genome-wide DNA methylation pattern to identify these signatures and connect them to a diagnosis8.

So how does a patient qualify for submission to a research study? “Really, it’s when we are highly suspicious…that it’s a syndromic diagnosis that we’re not catching by routine testing. And sometimes it’s individuals who have a clinical diagnosis”, Dr. Nezarati explains. “So I’m looking at this person and I think they have Kabuki syndrome, let’s just say, and we do the [sequencing] panel of Kabuki genes and we don’t find a hit. Then that would be a case where you could say, well, let’s submit this to Care4Rare–SOLVE or even to EpiSign to see if the epigenetic signature matches the epigenetic signature for Kabuki syndrome.” The interest and consent of the family is also paramount – “if they don’t want to do it, that’s the end of the discussion.”

In some cases, clinical geneticists are able to collaborate with researchers around the world to help assess the impact of new mutations. “I find most of the time when I’ve reached out to people internationally, even big names… I hear back from them”, Dr. Nezarati recounts. “Geneticists are generally… very, very generous with their time.” One couple who had lost multiple pregnancies was looking for an answer. In such cases, genetic changes in the fetus can be responsible. Genetic testing identified mutations in the fetus and the parents in a gene that had not been formally recognized as disease-causing. Dr. Nezarati reached out to a group researching the gene to help solve the case. The researchers recreated the mutations in yeast and found that this particular combination of mutations completely disabled the gene. Fortunately, the couple was able to receive prenatal testing for these mutations in future pregnancies.

Nevertheless, a clinical geneticist’s job isn’t all thrilling detective work and happy endings. Even if a diagnosis can be found, many genetic disorders lack therapy options which address the root cause; patients rely on treatments to manage each individual symptom. Families may also face hurdles from the medical system; Dr. Nezarati describes how one child’s mother “had to really fight to get a referral.” Nonetheless, Dr. Nezarati finds that many patients and families take comfort in understanding their situation, and in feeling understood. “Sometimes I really find I’m sort of just a listener. Sometimes I make very little difference and it’s just the willingness, and having the time to sit and listen to someone. That may be all I can do for them, but sometimes that’s helpful.”


1. Baird, P. A., Anderson, T. W., Newcombe, H. B. & Lowry, R. B. Genetic disorders in children and young adults: a population study. Am J Hum Genet 42, 677–693 (1988).

2. Basel, D. Dysmorphology in a Genomic Era. Clin Perinatol 47, 15–23 (2020).

3. About CORD | Canadian Organization for Rare Disorders.

4. Beta, J., Zhang, W., Geris, S., Kostiv, V. & Akolekar, R. Procedure-related risk of miscarriage following chorionic villus sampling and amniocentesis. Ultrasound in Obstetrics & Gynecology 54, 452–457 (2019).

5. Choi, M. et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A 106, 19096–19101 (2009).

6. Osmond, M. et al. Outcome of over 1500 matches through the Matchmaker Exchange for rare disease gene discovery: The 2-year experience of Care4Rare Canada. Genetics in Medicine 24, 100–108 (2022).

7. White, S. M. et al. A DNA repair disorder caused by de novo monoallelic DDB1 variants is associated with a neurodevelopmental syndrome. Am J Hum Genet 108, 749–756 (2021).

8. Sadikovic, B. et al. Clinical epigenomics: genome-wide DNA methylation analysis for the diagnosis of Mendelian disorders. Genet Med 23, 1065–1074 (2021).

Slipping into the DNA architecture of tandem repeat expansion disorders

Understanding the mechanism of repeat expansion has allowed Dr. Christopher E. Pearson and colleagues to target unique disease-associated mutagenic DNA structures as a potential therapeutic avenue.

Elvira Mukharryamova, Sornnujah Kathirgamanathan, and Tanvi Anadampillai

Dr. Christopher E. Pearson is a Canada Research Chair in Disease-associated Genome Instability, a Senior Scientist at The Hospital for Sick Children in Toronto, and a Full Professor with the Department of Molecular Genetics at the University of Toronto. Photo from The Hospital for Sick Children.

            The progressive neurodegeneration (loss of brain cells) in individuals with Huntington Disease (HD) highlights the limits of modern medicine in relation to prognosis and cure. As the condition worsens over time, individuals with HD become entirely reliant on others for their daily living. The characteristic neurodegeneration in HD is due to a curious type of DNA mutation, called a tandem repeat expansion, in the protein-coding gene HTT, which is involved in brain development. These repeat expansions consist of short nucleotide sequence units, such as CAG in the case of HD, that occur in tandem (‘CAG CAG CAG…’). Healthy individuals carry repeat tracts of 5-35 ‘CAG’ units in the HTT gene. Individuals with 36-39 copies are at an increased risk for HD, while those with 40 or more copies will develop HD, typically earlier in life1 (Fig. 1A). Importantly, as Pearson explains, “As patients age, the mutation continues in their brains and their disease worsens. For example, ‘THE CAT ATE THE FAT FAT RAT’ mutates to ‘THE CAT ATE THE FAT FAT FAT RAT,’ which eventually mutates to ‘THE CAT ATE THE FAT FAT FAT FAT FAT RAT,’ and so on.” The number of tandem repeats in functionally relevant genes (also referred to as repeat length) is negatively correlated with the age of symptom onset and positively correlated with disease progression and severity (Fig. 1B). Generally, longer repeats lead to an earlier age of onset and a more severe disease phenotype2. “Essentially, for therapy we would like to put that RAT on a diet, which should delay onset and slow progression”, says Pearson. Tandem repeat expansions also cause 69 other serious disorders3.
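Counting tandem repeats and sorting the count into the ranges described above can be sketched in a few lines of Python, much like counting the “FAT” units in Pearson’s sentence. This is a simplified illustration, not a clinical tool: the function names and the toy sequence are hypothetical, the code treats the repeat as an uninterrupted run of ‘CAG’ units (real HTT alleles can contain interrupting codons), and clinical guidelines use finer categories than the three ranges quoted here.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of units in the longest uninterrupted run of 'CAG'
    in a DNA string (case-insensitive). Simplified: ignores interruptions."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def hd_category(repeats: int) -> str:
    """Classify a CAG repeat count using the ranges quoted in this article.
    Illustrative only; not a clinical interpretation."""
    if repeats >= 40:
        return "expected to develop HD"
    if repeats >= 36:
        return "increased risk"
    return "typical range"


# Hypothetical toy allele: a start codon, 42 CAG units, then flanking sequence.
allele = "ATG" + "CAG" * 42 + "CCGCCA"
n = longest_cag_run(allele)
print(n, hd_category(n))
```

Because the count comes from the longest continuous run, adding one more ‘CAG’ unit (one more “FAT”) shifts the count by exactly one, which mirrors how each expansion event lengthens the tract in Fig. 1A.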

Figure 1: Representation of tandem repeat expansion. A) The CAG repeat tract lengthens with each subsequent expansion event. B) Longer repeats lead to earlier disease onset and faster disease progression. Figure created with BioRender.

            When it comes to elucidating the underlying mechanisms of disease-associated repeat expansions, it is difficult to find someone with more expertise than Dr. Christopher E. Pearson, a Canada Research Chair in Disease-Associated Genome Instability, a Senior Scientist at The Hospital for Sick Children, and a Full Professor at the University of Toronto. In a career spanning nearly three decades, Dr. Pearson has authored 97 publications, largely focusing on tandem repeat DNA sequences and the mechanism of disease-causing repeat expansion. Looking back on his decision in 1993 to pursue what he calls “dynamic mutations”, he considers himself fortunate to have found a subject that has held his curiosity and become increasingly relevant over the years.

The inspiring work of Dr. Pearson and his team has contributed greatly to our current understanding of repeat expansions. His recent publications, featured here, have catapulted the field closer to developing a treatment that could potentially reverse repeat-associated neurodegenerative diseases.

Repeat expansions as a driver of disease

            In molecular genetics, the adage “you can’t harvest what you haven’t planted” holds true. One cannot design a treatment for a complex genetic disorder without first understanding the molecular mechanisms of its pathogenicity4. In HD, the root cause of disease is an inherited expanded repeat that continues to expand throughout an individual’s life, causing symptoms to worsen2. Although the exact mechanism of expansion remains elusive, several factors involved in repeat instability have been established: repeat length, slipped-DNA structures, and the influence of DNA repair proteins5.

A distinguishing feature of disease genes with expanded repeats is the presence of unusual slipped-DNA structures. Slipped-DNAs form at expanded repeats when unwound DNA attempts to reanneal but does so out of register, “much like a mis-aligned zipper”, says Pearson (Fig. 2). Slipped-DNAs form only once the repeat tract exceeds a threshold number of units, and longer tracts form slip-outs more readily. Slipped-DNAs are critical because they act as mutagenic intermediates of instability: they attract DNA repair proteins, which drive further repeat expansion, and the longer repeats in turn form more slipped-DNAs, creating a compounding cycle of expansion mutations. The repair proteins introduce additional repeats through error-prone attempts to repair the slipped-DNAs; in this way, rather than protecting against mutation, the repair proteins drive it (Fig. 2)6.

Figure 2: Overview of repeat expansion mechanism. Unwound DNA (such as that found during transcription) may re-anneal out-of-register in highly repetitive regions. Mispairing between repeats results in the formation of slip-out DNA structures. DNA repair proteins attempt to resolve these slipped-DNAs, but instead induce further repeat expansions. Figure created with BioRender.

            Dr. Pearson remembers identifying slipped-DNAs by accident during his time as a post-doctoral fellow. He recalls thinking at that moment that these unusual structures must be important and might even be the key to novel therapeutics. Lo and behold, Dr. Pearson’s suspicions turned out to be right.

Overview of mutation-centric therapeutic targets

            Multiple therapeutic approaches can target various downstream pathogenic aspects of HD, such as lowering levels of the mutant repeat RNA transcript or of mutant protein aggregates. Current approaches that aim to treat repeat expansion disorders at the root cause, the DNA mutation, have targeted either the repeat sequences themselves or the DNA repair proteins involved in repeat expansion4. However, a significant limitation of these approaches is that they lack the specificity required to act only on the disease-causing gene in affected cells while sparing the normal gene and avoiding other off-target effects. Dr. Pearson gives the example of targeting MSH3 or FAN1, DNA repair proteins that drive or suppress CAG expansions, respectively6. A key feature of these proteins is their DNA structure-specificity, meaning they recognize and process unusual structures like slipped-DNAs. MSH3 and FAN1 can modulate repeat stability by either promoting or inhibiting repeat expansion5,7. Additionally, certain variations in the MSH3 and FAN1 genes can alter the age of onset and progression of various repeat expansion disorders, including HD. Taken together, altering the levels of MSH3 or FAN1 could therapeutically modulate expanded pathogenic repeats. However, because MSH3 and FAN1 help maintain the integrity of the entire genome through DNA repair, targeting these proteins would inevitably affect their activity beyond the mutant CAG tract. Modulating the levels or activities of MSH3 and FAN1 could be expected to cause widespread DNA abnormalities, possibly resulting in cancer. This lack of specificity could be worrisome.

A novel molecule targets slip-out structures to reverse repeat expansion 

            Hoping to find an alternative therapeutic avenue that addresses the challenge of specificity, Dr. Pearson and colleagues designed the small-molecule DNA ligand Naphthyridine–Azaquinolone (NA). This molecule binds with a high degree of specificity to slip-out structures within expanded CAG repeats, providing a means of distinguishing pathogenic alleles from normal alleles and from the rest of the genome8. This feature reduces NA’s off-target effects and reflects Dr. Pearson’s appreciation for the importance of structure-specificity: “Slipped-DNAs only form at the disease repeats that are long and unstable, this provides exact specificity of NA to only the disease gene.”

            Although the discovery of a molecule that could recognize and bind pathogenic CAG repeats was exciting, Dr. Pearson admits that the group did not know in advance whether this molecule could prevent repeat expansions, let alone induce contractions. He adds, “It was a blind experiment…stabilized repeats would be good, contractions would be even better, but enhanced expansions would be really bad”. Subsequent work by Dr. Pearson and colleagues demonstrated that, in addition to its binding specificity, NA stabilized and shortened the expanded repeats in affected brain cells. “We were ecstatic that NA induced CAG contractions in the brain to less than what the HD mice inherited”, explains Dr. Pearson. NA is believed to induce CAG contractions by obstructing FAN1’s processing of slip-out structures, but the details of this obstruction remain to be elucidated.

            Dr. Pearson explains that in addition to having spectacular specificity, NA induces contractions in the majority of treated brain cells in HD mice. This is an astounding feat considering that NA must cross both cellular and nuclear membranes to reach its target DNA. Moreover, Dr. Pearson and colleagues observed an improvement in the motor coordination of these mice after only four weeks of treatment with NA9. If the effects seen in mouse models translate to humans, NA is extremely promising for treating repeat expansion disorders. Given the complexity and progressive degeneration of these conditions, NA’s rapid and effective onset of action makes the molecule an attractive treatment option for individuals with HD. While direct delivery to the central nervous system is an option, delivery would be easier if NA could cross the blood-brain barrier, which is not yet known. Further studies are needed to enhance delivery and to characterize this molecule’s tissue distribution and safety profile.

The Future of HD Therapeutics: Just Keep Fishing

            According to Dr. Pearson, the first-of-its-kind approach of targeting slip-out structures with NA has advanced the field of HD therapeutic development. However, as this approach is still in its infancy, whether NA will survive the “valley of death” (a term used to describe the hurdles of drug development) is still unknown. Dr. Pearson intends to continue improving the druggability and safety profile of NA up until its translation to the bedside: “We will do what we can to improve delivery and safety – we’re working on that now.”

            Dr. Pearson’s team is investigating other potential therapeutic avenues centered on targeting expansions, or in his terms, “fishing in multiple waters”. These approaches include identifying new DNA repair proteins involved in expansions and screening for inhibitors or modifiers of MSH3, FAN1, or other DNA repair proteins. Dr. Pearson emphasizes that “Fishing in multiple waters increases the likelihood that one of these approaches will cross the long, wide and deep valley of death” and go on to become an approved treatment for HD. Were more than one approach to succeed, combinatorial therapeutic regimens could be developed to further enhance patient outcomes. Despite current excitement and hope, Dr. Pearson acknowledges that crossing this valley is a long and challenging journey, and he credits the young, bright students and fellows in his lab for taking up the challenge.

The applicability of NA in treating repeat expansion disorders

            Might the discovery of NA be applied to other repeat expansion disorders? That NA targets CAG slip-outs suggests it could act on the other 15 CAG-expansion disorders, including the spinocerebellar ataxias and dentatorubral-pallidoluysian atrophy (DRPLA). Dr. Pearson and his team recently showed that NA contracted CAG repeats and improved motor coordination in a mouse model of DRPLA8, supporting the broader applicability of this approach.

                  Looking to the future, Dr. Pearson is expanding his focus to other repeat expansion disorders, such as amyotrophic lateral sclerosis, frontotemporal dementia, and schizophrenia. As Dr. Pearson puts it, “the likelihood that other repeat sequences causing other diseases are forming unusual mutagenic structures is extremely high, which is why we are searching for ligands to those”. As the field of repeat expansion disorders continues to advance, Dr. Pearson and his team are ready to tackle the new questions that will arise.

Hoping to motivate young minds, Dr. Pearson concludes our interview by thoughtfully reminding us of the importance of pursuing interests, not career paths: “follow your nose, follow what excites your curiosity”. 


1.        Lu, X. H. & Yang, X. W. ‘Huntingtin Holiday’: Progress toward an Antisense Therapy for Huntington’s Disease. Neuron 74, 964–966 (2012).

2.        Flower, M. D. & Tabrizi, S. J. A small molecule kicks repeat expansion into reverse. Nat. Genet. 52, 136–137 (2020).

3.        Gall-Duncan, T., Sato, N., Yuen, R. K. C. & Pearson, C. E. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res. 32, 1–27 (2022).

4.        Malik, I., Kelley, C. P., Wang, E. T. & Todd, P. K. Molecular mechanisms underlying nucleotide repeat expansion disorders. Nat. Rev. Mol. Cell Biol. 22, 589–607 (2021).

5.        Deshmukh, A. L. et al. FAN1, a DNA Repair Nuclease, as a Modifier of Repeat Expansion Disorders. J. Huntingtons. Dis. 10, 95–122 (2021).

6.        Deshmukh, A. L. et al. FAN1 exo- not endo-nuclease pausing on disease-associated slipped-DNA repeats: A mechanism of repeat instability. Cell Rep. 37, 110078 (2021).

7.        Porro, A. et al. FAN1-MLH1 interaction affects repair of DNA interstrand cross-links and slipped-CAG/CTG repeats. Sci. Adv. 7, 1–13 (2021).

8.        Hasuike, Y. et al. CAG repeat-binding small molecule improves motor coordination impairment in a mouse model of Dentatorubral–pallidoluysian atrophy. Neurobiol. Dis. 163, 105604 (2022).

9.        Nakamori, M. et al. A slipped-CAG DNA-binding small molecule induces trinucleotide-repeat contractions in vivo. Nat. Genet. 52, 146–159 (2020).

Neurodevelopmental Disorders: Where does my child fall on the spectrum?

University of Toronto researcher, Dr. Lucy Osborne, aims to discover novel genetic factors contributing to the wide spectrum of phenotypes observed in cognitive disorders. She strives to help families better predict the clinical implications of such complex conditions.

Neta Pipko and Celia Pennimpede

Dr. Lucy Osborne, PhD is the Canada Research Chair in Genetics of Neurodevelopmental Disorders. She is also a Professor in the Departments of Medicine and Molecular Genetics at the University of Toronto. Photo provided by Dr. Osborne, photographed by Mikaeel Valli.

Identifying the correct diagnosis for a child’s underlying behavioural and learning disabilities is challenging, since the same symptoms can be caused by a number of disorders. When the pieces of the puzzle finally begin to form a picture, parents often feel an immense sense of relief at being able to place a concrete label on their child’s symptoms. When it comes to neurodevelopmental disorders (NDDs), however, a diagnosis can be a double-edged sword.

NDDs are a group of complex conditions that affect brain development and growth, impairing cognitive and behavioural functions such as learning, self-discipline, language, and social communication1. Common NDDs include intellectual disability, autism spectrum disorder (ASD), and attention-deficit/hyperactivity disorder (ADHD). Signs and symptoms appear early in childhood and fall along a wide spectrum, ranging from mild to severe1. Pinpointing the correct diagnosis is particularly challenging, since symptoms often overlap and co-occur across different NDDs. Because therapeutic interventions should be tailored to the specific NDD and its characteristic features, an official diagnosis can offer parents a tremendous wave of comfort and ease. However, the battle does not end there, since the wide spectrum of phenotypes adds a layer of uncertainty to managing the condition.

Toronto-based researcher Dr. Lucy Osborne hopes to help parents get some of the answers they are looking for. Dr. Osborne is a principal investigator and professor at the University of Toronto in the Departments of Medicine and Molecular Genetics. She also holds the title of Canada Research Chair in Genetics of Neurodevelopmental Disorders. Her work largely focuses on two rare NDDs, Williams-Beuren syndrome (WS) and 7q11.23 duplication syndrome (Dup7). WS and Dup7 are caused, respectively, by the reciprocal deletion and duplication of the same ~25 genes on human chromosome 7 (Fig 1)2. Deletions and duplications are structural genetic changes called copy number variants (CNVs) that lead to the loss or gain of genetic material3. Studying reciprocal CNVs of the same genetic segment offers Dr. Osborne a golden opportunity to evaluate how the copy number of a gene may impact neuronal development.

Figure 1. The two CNVs within the 7q11.23 region on human chromosome 7. Typically developing individuals have two copies of the 7q11.23 chromosomal region. Those with WS have a deletion of this region, whereas individuals with Dup7 have a duplication of this region. Figure generated using BioRender and adapted from the Osborne Lab4.
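Because WS and Dup7 differ only in whether the 7q11.23 region is present in one or three copies rather than the usual two, the contrast between them can be expressed numerically. Copy-number tests such as microarrays commonly report a log2 ratio of test signal to a two-copy reference, so in an idealized, noise-free assay a deletion reads about -1 and a duplication about +0.58. The sketch below assumes that idealized model; the function names are hypothetical, and real assay data are noisy and are interpreted with dedicated calling software.

```python
import math

TYPICAL_COPIES = 2  # typically developing individuals carry two copies of 7q11.23


def label_7q11_23(copies: int) -> str:
    """Map a 7q11.23 copy number to the condition described in the article."""
    if copies == TYPICAL_COPIES - 1:
        return "deletion: Williams-Beuren syndrome (WS)"
    if copies == TYPICAL_COPIES + 1:
        return "duplication: 7q11.23 duplication syndrome (Dup7)"
    if copies == TYPICAL_COPIES:
        return "typical: two copies"
    return "other copy-number change (not covered in this article)"


def ideal_log2_ratio(copies: int, reference: int = TYPICAL_COPIES) -> float:
    """Idealized, noise-free log2(test/reference) signal for a copy number."""
    if copies == 0:
        return float("-inf")  # complete loss has no finite log ratio
    return math.log2(copies / reference)


for n in (1, 2, 3):
    print(n, label_7q11_23(n), round(ideal_log2_ratio(n), 2))
```

The asymmetry of the two signals (a full step down for a deletion but a smaller step up for a duplication) follows directly from halving versus multiplying by 1.5 the amount of genetic material in the region.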

“A small set of genes can have such a huge impact on cognition and behaviour,” Dr. Osborne answers when asked what fascinates her about the two NDDs she studies. “It really changes how somebody appears and sees the world.”

WS and Dup7 are distinct disorders with overlapping and opposing phenotypes (Fig 2), likely due to the different copy numbers of some of the genes in the critical 7q11.23 region2. While each is unique, both NDDs are associated with a wide spectrum of clinical manifestations. “A syndrome is not written in stone. You have a list of phenotypes spread across all the people you see, and very few have all of those symptoms, but the question is why,” says Dr. Osborne. She explained that the greatest challenge is that, even after a final diagnosis, there is no way of predicting the extent of a child’s disability. “Parents want to make some sort of plan or have some expectation about what that diagnosis is going to mean, but there is huge variation,” says Dr. Osborne. “We have no predictors right now and that is [a huge burden] for families.”

Figure 2. Common phenotypic features of WS and Dup7. The genetic nature of the 7q11.23 CNVs results in both overlapping and opposing behavioural and physiological features in patients with the two disorders. Figure adapted from Osborne & Mervis2.

Interestingly, two children with the same CNV and diagnosis may fall on opposite ends of a phenotypic spectrum. Dr. Osborne aims to unravel what contributes to this wide continuum in order to find predictors for families. In a recent collaborative study with SickKids genetic scientist Dr. Ryan Yuen, the two research groups investigated why some individuals with Dup7 also receive an ASD diagnosis. “Anecdotally, a lot of the [Dup7] kids coming into our study already had a diagnosis with autism, but most of them did not have autism,” Dr. Osborne explained. Because of their separation anxiety and shyness, many of these children had been labelled as autistic and grouped with other children with ASD without undergoing a formal diagnostic assessment. When they were properly assessed, most of the Dup7 children turned out to have been misdiagnosed: only ~20% met the criteria for an additional clinical ASD diagnosis5. Notably, 20% is still roughly 13 times the ~1.5% ASD prevalence in the general population6. This led Dr. Osborne to wonder, “could they (Dup7 kids with ASD) have a ‘second hit’ layered on top of this one CNV that pushes them over the edge that the others do not?”

Unlike monogenic diseases that are caused by a single gene, complex disorders have an array of different variants (mutations) and environmental factors contributing to the disease outcome. Thus, Drs. Osborne and Yuen hypothesized that Dup7 children diagnosed with ASD are likely to carry additional rare damaging variants in ASD-relevant genes. These additional variants are known as genetic modifiers, which suppress or enhance the phenotype of the primary disease-causing gene7. Typically, the more additional variants or ‘hits’, the more severe the phenotype8. This is known as the ‘multiple hit model’, which contributes to the wide variability and overlap in symptoms observed in individuals with NDDs2,8.

To test their hypothesis, they performed whole-genome sequencing (WGS) on twenty Dup7 individuals, half of whom had an ASD diagnosis. WGS is a tool that reads the entire DNA sequence of an individual, which they used to look for second hits across the genome that may be contributing to the ASD phenotype. Unfortunately, they did not identify any variants that could explain the ASD diagnosis9. Surprised by this analysis, Dr. Osborne states, “It wasn’t as simple as that. It wasn’t the CNV and one additional hit that will push you towards autism. It’s more complicated than that”. She explains that rather than one large second hit, there may be a collection of smaller hits with smaller impacts that ultimately add up.

This concept can be visualized as a cup with two thresholds: one for Dup7, about halfway up, and one for the ASD phenotype near the top of the cup (Fig 3)2. Dr. Osborne explains that, on their own, modifier genes with small effects are not enough to fill the cup past either threshold. For those with Dup7, however, the cup is already half full and has passed the first threshold. Therefore, Dr. Osborne presumes that, unlike in typically developing children, these additional small modifiers may be the distinguishing factors that push some kids with Dup7 over the edge into an ASD diagnosis (Fig 3, Threshold B).
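The cup model closely resembles what geneticists call a liability-threshold model. The minimal Python sketch below assumes an arbitrary liability scale with made-up effect sizes and threshold values. `THRESHOLD_A`, `THRESHOLD_B` and `classify` are hypothetical names, not part of the published study. The sketch shows how the same set of small modifiers can leave a typically developing child below both thresholds but push a child with Dup7 past Threshold B.

```python
# Toy liability-threshold model; all numbers are hypothetical and on an arbitrary scale.
THRESHOLD_A = 0.5  # hypothetical liability at which core Dup7 features appear
THRESHOLD_B = 1.0  # hypothetical liability at which an additional ASD diagnosis appears


def classify(cnv_load: float, modifier_effects: list[float]) -> str:
    """Add the CNV's contribution to many small modifier effects (genetic or
    environmental) and report which thresholds the total liability reaches."""
    liability = cnv_load + sum(modifier_effects)
    if liability >= THRESHOLD_B:
        return "A and B"
    if liability >= THRESHOLD_A:
        return "A only"
    return "neither"


small_hits = [0.1, 0.15, 0.2]  # several modifiers with small effects

print(classify(0.0, small_hits))          # typically developing child -> neither
print(classify(0.5, small_hits))          # child with Dup7 (CNV alone reaches A) -> A only
print(classify(0.5, small_hits + [0.1]))  # Dup7 plus one more small hit -> A and B
```

In this toy example, one extra small hit of 0.1 is all that separates the two Dup7 cases. This mirrors Dr. Osborne's point that several small effects, rather than one large second hit, may add up.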

Figure 3. Model of genetic factors contributing to common and variable features of 7q11.23 CNV disorders. In typically developing individuals, the combination of genetic and environmental factors falls below thresholds A and B. However, individuals with the 7q11.23 copy number variants are predisposed to WS or Dup7, which on its own is enough to pass Threshold A. Other genetic and environmental factors may modify the phenotype observed if these contributors cumulatively surpass Threshold B. Figure adapted from Osborne & Mervis2.

Even though they failed to identify a clear correlation between having a second hit and an ASD diagnosis, it does not mean these hits are not present. Dr. Osborne explains that the smaller hits are much more difficult to find. In fact, the effects of genetic modifiers are becoming more apparent in complex diseases, including NDDs, and will likely become a major focus of genomics research moving forward.

When asked whether the lack of association with ASD was discouraging, Dr. Osborne said, “No, not really. You ask questions and do not know what answer you will get”. In fact, the team found phenotypic associations when they shifted their focus to Dup7 as a whole, rather than splitting the children into groups based on ASD diagnosis. They found that some rare variants correlate with various clinical phenotypic measures, such as intellectual ability and adaptive behaviour9. This finding could lead to the future development of polygenic risk scores for Dup7. A polygenic risk score estimates an individual’s relative genetic risk for a disease or trait by calculating a weighted sum of the many genetic variants they carry, with each variant weighted by its estimated effect10. This cumulative measure may help predict the severity of specific phenotypic features. In the case of Dup7, polygenic risk scores could estimate an individual’s level of cognition and aspects of behaviour. Ideally, this information could help inform families about whether their child is likely to be shy, socially independent, or communicative, and what their intellectual abilities may look like in the future. “Being able to place your child at one extreme or the other would be valuable,” Dr. Osborne explains. This study “gives us hope that there will be other measures that we will be able to find” to further improve these predictions.
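As a rough illustration of the weighted-sum idea, the Python sketch below scores one person from risk-allele counts and then places that score within a reference group. The variant IDs, weights and reference scores are invented. Real scores use weights estimated from large association studies and many more variants, and this is not the method used in the Dup7 study.

```python
from bisect import bisect_left

# Hypothetical per-variant weights (effect sizes); real scores use many more variants.
WEIGHTS = {"variant_1": 0.20, "variant_2": -0.05, "variant_3": 0.10}


def polygenic_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per variant)."""
    score = 0.0
    for variant, weight in weights.items():
        count = dosages.get(variant, 0)  # simplification: missing genotype = 0 copies
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: allele count must be 0, 1 or 2, got {count}")
        score += weight * count
    return score


def percentile(score: float, reference_scores: list[float]) -> float:
    """Percentage of reference scores that fall strictly below `score`."""
    ranked = sorted(reference_scores)
    return 100.0 * bisect_left(ranked, score) / len(ranked)


child = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
reference = [0.0, 0.1, 0.2, 0.3, 0.4]  # invented scores from a comparison group

score = polygenic_score(child, WEIGHTS)
print(round(score, 2), percentile(score, reference))  # -> 0.35 80.0
```

Reporting a percentile rather than the raw sum is one simple way to express "placing a child at one extreme or the other" relative to others.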

The success in identifying such predictors sparked a similar study in children with WS. Dr. Osborne shared that they are examining the whole genomes of ~250 children with WS for potential correlations between rare variants and scores measuring cognitive abilities, patterns of social behaviour, and cardiovascular outcomes. As with Dup7, Dr. Osborne hopes this research brings them one step closer to finding enough predictors to develop polygenic risk scores for WS. Identifying an individual’s relative risk could allow personalized interventions, such as speech therapy or cardiovascular monitoring, to begin early in life. While these scores hold some predictive power, they should be interpreted with caution: they estimate risk and are not meant to be used for diagnosis.

Despite not finding the associations they were looking for with ASD, Drs. Osborne and Yuen did find associations with phenotypic measures that may explain some of the variation observed with NDDs. Shifting gears when studying complex disorders is often needed as many different genetic, environmental, and lifestyle factors contribute to the overall clinical manifestation of a disease. “Don’t be afraid to tackle something that is complex,” says Dr. Osborne. “You can still find answers for things even if you know it’s going to be complicated, and it really does take teamwork.” Dr. Osborne shares that while the research field was previously quite competitive, the scientific community is beginning to realize that there is large value in collaborating and integrating patient data. Examining syndromes from different angles will give you a more comprehensive insight into NDDs, ultimately granting families more certainty when planning and investing in their child’s future.


1.   Morris-Rosendahl, D. J. & Crocq, M.-A. Neurodevelopmental disorders—the history and future of a diagnostic concept. Dialogues Clin. Neurosci. 22, 65–72 (2020).

2.   Osborne, L. R. & Mervis, C. B. 7q11.23 deletion and duplication. Curr. Opin. Genet. Dev. 68, 41–48 (2021).

3.   Hastings, P., Lupski, J. R., Rosenberg, S. M. & Ira, G. Mechanisms of change in gene copy number. Nat. Rev. Genet. 10, 551–564 (2009).

4.   About Us | Osborne Lab.

5.   Klein-Tasman, B. P. & Mervis, C. B. Autism Spectrum Symptomatology Among Children with Duplication 7q11.23 Syndrome. J. Autism Dev. Disord. 48, 1982–1994 (2018).

6.   Lyall, K. et al. The Changing Epidemiology of Autism Spectrum Disorders. Annu. Rev. Public Health 38, 81–102 (2017).

7.   Rahit, K. M. T. H. & Tarailo-Graovac, M. Genetic Modifiers and Rare Mendelian Disease. Genes 11, 239 (2020).

8.   Guo, H. et al. Genome sequencing identifies multiple deleterious variants in autism patients with more severe phenotypes. Genet. Med. 21, 1611–1620 (2019).

9.   Qaiser, F. et al. Rare and low frequency genomic variants impacting neuronal functions modify the Dup7q11.23 phenotype. Orphanet J. Rare Dis. 16, 6 (2021).

10. Lewis, A. C. F. & Green, R. C. Polygenic risk scores in the clinic: new perspectives needed on familiar ethical issues. Genome Med. 13, 14 (2021).

Maternal Vitamin C Intake Regulates the Epigenome During Germline Development

The Santos Lab has played a pivotal role in understanding how maternal diet affects the development of embryos. Their findings reveal that maternal vitamin C deficiency dysregulates the epigenetic landscape in embryonic germ cells.

Yayra Gbotsyo, Saloni Modi, Anthea Travas

Dr. Miguel Ramalho-Santos is a Senior Investigator at the Lunenfeld-Tanenbaum Research Institute, specializing in mammalian development. He is also a Professor in the Molecular Genetics Department at the University of Toronto. (Image taken from the Santos Lab website).

Maternal diet is an essential factor affecting the health and development of offspring during pregnancy. Several lines of evidence demonstrate that poor maternal nutrition, such as a lack of vitamin C (vitC) during pregnancy, leads to abnormal fetal development. Changes in the in utero environment due to external factors such as smoking and drinking can have long-term consequences for the developing embryo. Such consequences are shaped by a process known as epigenetics. Epigenetics is the study of modifications, such as DNA and histone methylation patterns, that alter chromatin state and gene expression without changing the underlying DNA sequence (Goldberg et al. 2007). At the forefront of developmental research is Dr. Miguel Ramalho-Santos, who uses cutting-edge technology to understand epigenetic regulation during gestation.

Dr. Ramalho-Santos explains that he was drawn to Toronto’s vibrant research community and its collaborative efforts in developmental stem cell biology. He received his Ph.D. from Harvard University in 2002, where he trained as a developmental biologist. He then became a Fellow at the University of California San Francisco (UCSF). In 2007, he became an Assistant Professor at UCSF and was promoted to Associate Professor in 2013. In 2018, Dr. Ramalho-Santos was recruited as the Canada 150 Research Chair in Developmental Epigenetics, an initiative that recruits exceptional scholars to enhance Canada’s reputation for research and innovation. Alongside this role, Dr. Ramalho-Santos is a Senior Investigator at the Lunenfeld-Tanenbaum Research Institute and a Professor in the Department of Molecular Genetics at the University of Toronto. His lab uses mouse models to investigate how environmental inputs, such as an inadequate diet, regulate proper gene transcription. Currently, he and his team aim to understand how genes are activated at the right place and at the right level during development.

Tet Enzymes Regulate DNA Methylation Patterns in Embryonic Stem Cells

Developmental epigenetics is the study of how environmental inputs influence gene expression during gestation1–3. Environmental factors such as nutrient availability during pregnancy can positively or negatively affect the way genes are expressed in the fetus1,2,4. According to Dr. Ramalho-Santos, the epigenetic landscape during development facilitates our understanding of how certain disruptions in adulthood can be traced back to insults in early uterine life3.

One of the most important takeaways from the Santos Lab is that “mammalian embryos are acutely aware of their mother’s environment.” In utero, the embryo is remarkably responsive to environmental agents that can alter its development2,5. This became evident when the lab realised that the epigenetic regulatory enzyme ten-eleven translocation (Tet) depends on maternal nutrient availability during development5–7. Tet enzymes play a key role in demethylating cytosine nucleotides, removing methyl groups and making DNA more accessible6,8. This process promotes the transcription of many genes and helps maintain pluripotency, the capacity to give rise to many different cell types. Tet enzymes also demethylate DNA within the germ cells of the developing embryo8. Germ cells develop in the gonads of the growing embryo and ultimately give rise to gametes9 (Figure 1). During embryonic development, DNA demethylation keeps chromatin in an accessible state so that genes are actively expressed and cell differentiation is restricted3,7. Therefore, Tet-mediated demethylation is crucial for maintaining embryonic stem cell (ESC) pluripotency10. Previous studies show that the offspring of Tet1 knockout mice have significantly fewer germ cells, which leads to compromised fertility10. To further understand this process, Dr. Ramalho-Santos’ research investigates how environmental conditions, such as adequate access to vitC, modulate Tet activity.
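The demethylation step can be sketched as a short sequence of chemical states. The Python below is a simplified model of the widely described TET/TDG pathway (see Kohli & Zhang, reference 8). TET enzymes successively oxidize 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). The glycosylase TDG plus base excision repair then restores an unmodified cytosine. The function name is hypothetical, and the sketch deliberately ignores passive loss of 5hmC during DNA replication and any role of vitC.

```python
# Simplified TET/TDG active demethylation pathway (see reference 8).
# TET enzymes oxidize the methyl group of 5-methylcytosine in up to three steps.
TET_SERIES = ["5mC", "5hmC", "5fC", "5caC"]

# TDG recognizes and excises these two oxidized forms.
TDG_SUBSTRATES = {"5fC", "5caC"}


def demethylation_path(tet_oxidations: int) -> list[str]:
    """States a methylated cytosine passes through after `tet_oxidations`
    TET steps (0-3). If it ends as 5fC or 5caC, TDG excision followed by
    base excision repair restores an unmodified cytosine ("C")."""
    if not 0 <= tet_oxidations <= 3:
        raise ValueError("TET performs between 0 and 3 successive oxidations")
    path = TET_SERIES[: tet_oxidations + 1]  # slicing makes a new list
    if path[-1] in TDG_SUBSTRATES:
        path.append("C")  # TDG + base excision repair
    return path


for steps in range(4):
    print(steps, " -> ".join(demethylation_path(steps)))
```

Modelling the pathway as an ordered list makes the key point visible: only cytosines oxidized past 5hmC are actively returned to an unmethylated state in this sketch.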

Figure 1. Maternal Vitamin C (VitC) promotes germ-cell development by modulating Tet-mediated demethylation. A) VitC consumed by the F0 mother activates Tet enzymes, which promote DNA demethylation in F1 germ cells. Chromatin remodelling into an open state allows active transcription of genes and keeps embryonic germ cells in a stem cell-like state. B) VitC-deficient embryos exhibit impaired Tet demethylation activity. As a result, DNA is kept in a non-permissive state. This dysregulation results in fewer embryonic germ cells in the F1 generation. (Figure is not quantitative.) Figure created in Bio Render and adapted from3.

Maternal Vitamin C Deficiency Hinders Fecundity in Offspring

In 2013, Dr. Ramalho-Santos and his group discovered that adding vitC to culture media induced Tet-dependent DNA demethylation in ESCs and helped them remain pluripotent5. VitC is a potential cofactor of Tet enzymes and helps mediate demethylation5. More recently, these findings were extended to pregnant mouse models to understand how maternal vitC intake regulates Tet demethylation and, in turn, embryonic germline development3,11. Dr. Ramalho-Santos explains that dietary vitC can modulate Tet-mediated demethylation activity in embryonic germline cells (Fig. 1A)3,5,11. This keeps gene promoters accessible for transcription. While vitC-deficient F1 embryos are viable, Tet demethylation activity is reduced in their germ cells, which hinders their ability to give rise to the next generation (Figures 1B and 2)3,11. Interestingly, embryos deficient in vitC have transcriptomes and phenotypes that are remarkably similar to those of embryos with Tet1 deficiency3. These phenotypes include fewer germ cells in the ovary, reduced fertility, and defects in meiosis3,10,11. In Tet1-deficient mice, meiotic defects are proposed to result from insufficient demethylation that fails to activate meiotic genes. These findings exemplify intergenerational epigenetic effects, highlighting that the F0 maternal environment can have long-term impacts on the F1 generation.

Figure 2. Maternal Vitamin C deficiency causes defects in embryonic germ cells. VitC deficiency in F0 females leads to intergenerational effects, where the F1 embryo has reduced germ cell count and reduced fecundity in adulthood. Figure created in Bio Render and adapted from4.

When asked whether the effects of vitC deficiency were reversible in mice, Dr. Ramalho-Santos explained that this is only possible if vitC is reintroduced before mid-gestation11. After this point, the adverse effect on germ cells was irreversible. This was an important discovery, demonstrating that vitC is essential during mid-gestation11. Without it, demethylation is dysregulated and germline development is irreversibly hindered11.

Environmental Factors Influencing the Epigenetic Role of Vitamin C

Dr. Ramalho-Santos’ work reveals exciting insights into how maternal vitC intake regulates the germ cells of mouse embryos. One may wonder how these findings relate to the world outside the lab. Inadequate vitC intake may be a reality for individuals with lower incomes and limited access to fresh produce12,13. Additionally, studies show that exposure to pollutants such as cigarette smoke and heavy metals can inhibit or inactivate vitC through oxidation reactions14,15. This ultimately hampers vitC’s role as a cofactor of Tet, thereby dysregulating the epigenetic landscape of developing embryonic germ cells16,17. Today, regardless of geographical location, exposure to these toxic substances is common and can compound the effects of vitC deficiency. This is a concern for pregnant women, as these environmental contributors can lead to Tet dysregulation. The resulting epigenetic effects may not become apparent until the offspring reach adulthood (e.g., as reduced fecundity).

Translating the Effects of Maternal Vitamin C Deficiency from Mice to Humans

Today, researchers strive to identify factors that disrupt molecular pathways and lead to downstream effects on fetal development. However, it is also important to understand how findings in animal models can translate into improving the health of human populations. Previous studies have demonstrated that vitC modulates Tet enzymes and maintains pluripotency in human ESCs (cells derived from early human embryos)18. Many of the experiments the Santos lab performs in mice would be difficult to replicate in humans and face ethical barriers. Translation to humans would require studying vitC deficiency during pregnancy and then tracking the fecundity of the offspring into adulthood. Although such data do not currently exist, Mount Sinai Hospital’s ‘Ontario Birth Study’ program holds promise for recruiting relevant cohorts. This program collects clinical data from pregnant women to understand the factors that contribute to maternal and child health. Because these data come from multi-generational cohorts, they are readily accessible to researchers studying the role of epigenetics across generations.

It is well known that only a fraction of disease can be attributed to specific genetic defects. Research in the Santos lab helps fill some of the gaps in explaining the epigenetic context of disease. By studying the role of vitC in fetal development, the lab has opened further opportunities to explore the epigenetic effects of other environmental factors, including food availability, temperature changes, stress, and exposure to pathogens. Studies of these factors will provide insights into how environmental inputs shape the epigenome over time.

Future Directions

For the Santos Lab, the effects of vitC on mammalian development have opened doors to new questions and research directions. If embryos can respond to their surroundings, “we wonder what else the offspring can sense”. Dr. Ramalho-Santos aims to further explore how environmental inputs, both good and bad, ensure that gene expression happens at the appropriate pace and level. Alongside this environment-epigenome project, he is interested in understanding the biological significance of hyper-transcription, a state of accelerated gene expression in ESCs and other stem cells19. Recent work has shed light on the importance of hyper-transcription in processes such as embryogenesis, neurogenesis, and development19. Dr. Ramalho-Santos explains that there is a link between nutrient availability and hyper-transcription. When nutrients are lacking, ESCs become dormant instead of entering a hyper-transcription state, which can ultimately have serious consequences such as missed developmental milestones.

Dr. Ramalho-Santos’ overarching goal is to provide scientific evidence that “development doesn’t happen in a vacuum.” Instead, embryonic development closely reflects the surrounding maternal environment. Overall, the Santos lab has highlighted that vitC consumption during pregnancy is important for DNA demethylation and plays a key role in establishing the epigenetic landscape of embryonic germ cells. Understanding how vitC influences these epigenetic changes is important, as they can have long-lasting effects on many characteristics of the offspring.


1.   John, R. M. & Rougeulle, C. Developmental Epigenetics: Phenotype and the Flexible Epigenome. Front. Cell Dev. Biol. 6, 130 (2018).

2.   Legoff, L., D’Cruz, S. C., Tevosian, S., Primig, M. & Smagulova, F. Transgenerational Inheritance of Environmentally Induced Epigenetic Alterations during Mammalian Development. Cells 8, 1559 (2019).

3.   DiTroia, S. P. et al. Maternal vitamin C regulates reprogramming of DNA methylation and germline development. Nature 573, 271–275 (2019).

4.   Coker, S. J., Smith-Díaz, C. C., Dyson, R. M., Vissers, M. C. M. & Berry, M. J. The Epigenetic Role of Vitamin C in Neurodevelopment. Int. J. Mol. Sci. 23, 1208 (2022).

5.   Blaschke, K. et al. Vitamin C induces Tet-dependent DNA demethylation in ESCs to promote a blastocyst-like state. Nature 500, 222–226 (2013).

6.   Dawlaty, M. M. et al. Loss of Tet Enzymes Compromises Proper Differentiation of Embryonic Stem Cells. Dev. Cell 29, 102–111 (2014).

7.   Jenkins, T. G. & Carrell, D. T. Dynamic alterations in the paternal epigenetic landscape following fertilization. Front. Genet. 3, 143 (2012).

8.   Kohli, R. M. & Zhang, Y. TET enzymes, TDG and the dynamics of DNA demethylation. Nature 502, 472–479 (2013).

9.   Cinalli, R. M., Rangan, P. & Lehmann, R. Germ Cells Are Forever. Cell 132, 559–562 (2008).

10. Caldwell, B. A. et al. Functionally distinct roles for TET-oxidized 5-methylcytosine bases in somatic reprogramming to pluripotency. Mol. Cell 81, 859–869.e8 (2021).

11. Ebata, K. T. et al. Vitamin C induces specific demethylation of H3K9me2 in mouse embryonic stem cells via Kdm3a/b. Epigenetics Chromatin 10, 36 (2017).

12. Mosdol, A., Erens, B. & Brunner, E. J. Estimated prevalence and predictors of vitamin C deficiency within UK’s low-income population. J. Public Health 30, 456–460 (2008).

13. Shohaimi, S. et al. Occupational social class, educational level and area deprivation independently predict plasma ascorbic acid concentration: a cross-sectional population based study in the Norfolk cohort of the European Prospective Investigation into Cancer (EPIC-Norfolk). Eur. J. Clin. Nutr. 58, 1432–1435 (2004).

14. Rider, C. F. & Carlsten, C. Air pollution and DNA methylation: effects of exposure in humans. Clin. Epigenetics 11, 131 (2019).

15. Liu, S. et al. Arsenite Targets the Zinc Finger Domains of Tet Proteins and Inhibits Tet-Mediated Oxidation of 5-Methylcytosine. Environ. Sci. Technol. 49, 11923–11931 (2015).

16. Efimova, O. A., Koltsova, A. S., Krapivin, M. I., Tikhonov, A. V. & Pendina, A. A. Environmental Epigenetics and Genome Flexibility: Focus on 5-Hydroxymethylcytosine. Int. J. Mol. Sci. 21, 3223 (2020).

17. Yin, R. et al. Ascorbic Acid Enhances Tet-Mediated 5-Methylcytosine Oxidation and Promotes DNA Demethylation in Mammals. J. Am. Chem. Soc. 135, 10396–10403 (2013).

18. Chung, T.-L. et al. Vitamin C Promotes Widespread Yet Specific DNA Demethylation of the Epigenome in Human Embryonic Stem Cells. Stem Cells 28, 1848–1855 (2010).

19. Bulut-Karslioglu, A. et al. Chd1 protects genome integrity at promoters to sustain hypertranscription in embryonic stem cells. Nat. Commun. 12, 4859 (2021).

Harnessing the power of population databases, one study at a time

Dr. Shreejoy Tripathy and his team demonstrate the power of the UK BioBank population database in a study that unpacks the complicated interplay between schizophrenia polygenic risk score, psychotic episodes, and cannabis use.

Milcah Sutanto, Gabriela Tanumihardja, & Yuan Tian

Dr. Shreejoy Tripathy, Ph.D. (right) is pictured with postdoctoral fellow Dr. Michael Wainberg, Ph.D. (left). Dr. Tripathy is an independent scientist at the Krembil Centre for Neuroinformatics within the Centre for Addiction and Mental Health and an Assistant Professor in the Department of Psychiatry at the University of Toronto. Photo provided by Dr. Tripathy.

Have you ever wondered whether genetics and the environment interact to shape mental illness? This is exactly what Dr. Shreejoy Tripathy (Ph.D.), an Assistant Professor at the University of Toronto and an independent scientist at the Krembil Centre for Neuroinformatics within the Centre for Addiction and Mental Health, seeks to understand. Understanding mental illness has become even more pressing since the onset of the COVID-19 pandemic, which has had a significant negative impact on the mental health of the general population worldwide1. In Canada specifically, 1 in 5 people experience a mental illness in any given year2. These figures underscore the urgent need to better understand the underlying causes of mental illnesses in hopes of developing both preventive and treatment strategies. Emerging research on how mental illnesses develop has included investigating the interplay between genetic and environmental factors3. One strategy for studying these gene-environment relationships is to use large population databases, like the UK BioBank4. In collaboration with postdoctoral fellow Dr. Michael Wainberg (Ph.D.), Dr. Tripathy used the UK BioBank to investigate the relationship between cannabis use and psychotic experiences in the general population and in those with a genetic predisposition for schizophrenia5.


Presentation of schizophrenia

Schizophrenia is a complex heritable mental illness that has a long-term impact on patients and society6. The symptoms of schizophrenia are usually classified as either positive, negative, or cognitive (Figure 1)6. Positive symptoms are characterized by a distortion or amplification of normal behaviours, such as hallucinations, whereas negative symptoms are indicated by a loss or dampening of normal functions, such as reduced emotional expression. Cognitive symptoms consist of difficulties in memory and attention.

Figure 1: Diagram depicting the potential symptoms of schizophrenia6. There are three classifications of symptoms. Positive symptoms are behaviours that are distorted or amplified from normal behaviours, including hallucinations, delusions, disorganized speech, and confused thoughts. Negative symptoms are behaviours that show a loss or decrease in normal functions such as a lack of pleasure and struggling with daily routine (a lack of motivation). Cognitive symptoms can include memory problems and impaired sensory perception. Image created in

Unearthing the heritability behind schizophrenia

“Genetics has been really useful in psychiatry and [in] helping [us] to understand and assess risk for various [mental] illnesses”, stated Dr. Tripathy when asked about the role of genetics in psychiatric research. One of the first methods used to study the genetic component of mental illness was the twin study3. This technique evaluates whether a trait is more commonly shared by monozygotic twins (who are genetically identical) than by dizygotic twins (who share, on average, about half of their genetic variants). Traits that are shared more often by monozygotic twins are considered more heritable, indicating that they are more heavily influenced by genetic factors. Recent twin studies have estimated the heritability of schizophrenia at 60–65%, highlighting the importance of genetic factors in its expression3. Moreover, it is widely accepted that first-degree relatives of patients with schizophrenia have a higher risk of developing the disorder than people without an affected first-degree relative3. Overall, variation in an individual’s genetic makeup contributes significantly to the risk of developing schizophrenia.
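A classic way to turn twin similarity into a heritability estimate is Falconer's formula. It doubles the difference between the trait correlations of monozygotic (r_MZ) and dizygotic (r_DZ) twin pairs. The reasoning is that monozygotic twins share essentially all of their genetic variants and dizygotic twins about half. The extra similarity of monozygotic pairs therefore reflects half of the genetic contribution, assuming both kinds of twins share their environments to the same extent. For a yes/no diagnosis such as schizophrenia, the correlations refer to an underlying liability scale. This is offered only as an illustration. Modern twin studies generally use more elaborate models, the cited estimates were not necessarily derived this way, and the correlations below are invented so that the result lands in the reported 60–65% range.

```latex
h^2 \approx 2\,(r_{MZ} - r_{DZ})
\qquad\text{e.g.}\qquad
r_{MZ} = 0.80,\; r_{DZ} = 0.48
\;\Rightarrow\;
h^2 \approx 2\,(0.80 - 0.48) = 0.64
```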

Like virtually all mental illnesses, schizophrenia is a complex polygenic disease whose manifestation depends on the combined action of many different genes7. To find the genes that are significantly associated with the disease, researchers often conduct genome-wide association studies (GWAS). A GWAS examines the genomes of a large set of individuals, with and without the disease of interest, and looks for genetic markers whose frequency differs between the two groups and that can therefore help predict the occurrence of the disease (Figure 2)3. GWAS have linked more than 100 common single nucleotide polymorphisms (SNPs), spanning more than 600 genes, to the development of schizophrenia3. Each of these genetic markers, also known as genetic variants, can be used to statistically estimate an individual’s risk of developing the disease due to genetics alone. This statistical estimate, referred to as a polygenic risk score (PRS), is calculated as a weighted sum of the disease-associated variants an individual carries, with each variant weighted by its estimated effect on risk7. Because each of the many variants contributes only a small amount to the PRS, it is difficult to determine the overall risk of developing the disease without also considering other factors, such as the environment.

Figure 2: Simplified outline of a schizophrenia GWAS3. A schizophrenia GWAS seeks to understand the relationship between schizophrenia and common genetic variants found within the population. The genomes of two large groups of individuals, with and without schizophrenia, are analyzed for genetic markers that may be predictive of developing schizophrenia. These markers are identified by analyzing SNPs within the population and are then statistically tested to determine whether they are significantly associated with schizophrenia. Figure created in
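The statistical comparison in Figure 2 can be illustrated with the simplest single-SNP test: a chi-square test on allele counts in cases versus controls. This sketch rests on simplifying assumptions. The counts are invented, and real schizophrenia GWAS typically fit regression models that adjust for covariates such as ancestry. The 5 × 10⁻⁸ cut-off is the conventional genome-wide significance threshold, used because hundreds of thousands to millions of SNPs are tested at once.

```python
import math

GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional GWAS p-value threshold


def allelic_test(case_risk: int, case_other: int,
                 control_risk: int, control_other: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele
    counts: rows = cases/controls, columns = risk/other allele."""
    table = [[case_risk, case_other], [control_risk, control_other]]
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [case_risk + control_risk, case_other + control_other]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 10,000 cases and 10,000 controls, i.e. 20,000 alleles per group.
chi2, p = allelic_test(6_000, 14_000, 5_000, 15_000)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}, significant: {p < GENOME_WIDE_SIGNIFICANCE}")
```

In this invented example, a risk-allele frequency of 30% in cases versus 25% in controls gives a chi-square of about 125, far beyond the genome-wide threshold. Identical frequencies in both groups would give a chi-square of 0 and a p-value of 1.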

Dr. Tripathy noted that “for the most part there are no psychiatric disorders that are completely due to genetics”. In fact, it has been well established that most psychiatric illnesses are a product of the interaction between genetic and environmental factors. The development of schizophrenia has been linked with exposure to many environmental factors such as childhood trauma, contraction of certain viral and bacterial infections, socioeconomic factors, and the use of cannabis6. The interaction between genetic and environmental factors is complex, and often very difficult to disentangle. Large-scale population databases that contain significant genetic and non-genetic information, like the UK BioBank, can be used to further investigate these relationships.
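One common way to formalize a gene-environment interaction is a logistic regression with a product term between the genetic score and the exposure. The Python sketch below uses made-up coefficients and a PRS expressed in standard deviations from the mean. `B0`, `B_PRS`, `B_CANNABIS` and `B_INTERACTION` are hypothetical values, not estimates from Dr. Tripathy's study. The sketch shows what such an interaction would mean: the odds ratio associated with cannabis use changes with genetic risk only when the interaction coefficient is non-zero.

```python
import math

# Hypothetical log-odds coefficients, invented for illustration only.
B0 = -4.0            # baseline log-odds of a psychotic experience
B_PRS = 0.4          # change in log-odds per standard deviation of PRS
B_CANNABIS = 0.3     # change in log-odds for cannabis use at average PRS
B_INTERACTION = 0.2  # extra change for cannabis use per standard deviation of PRS


def log_odds(prs_z: float, cannabis: bool, b_int: float = B_INTERACTION) -> float:
    """Linear predictor of a logistic model with a PRS x cannabis product term."""
    exposure = 1.0 if cannabis else 0.0
    return B0 + B_PRS * prs_z + B_CANNABIS * exposure + b_int * prs_z * exposure


def probability(prs_z: float, cannabis: bool, b_int: float = B_INTERACTION) -> float:
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(prs_z, cannabis, b_int)))


def cannabis_odds_ratio(prs_z: float, b_int: float = B_INTERACTION) -> float:
    """Odds ratio for users vs non-users at a fixed PRS: exp(B_CANNABIS + b_int * prs_z)."""
    return math.exp(log_odds(prs_z, True, b_int) - log_odds(prs_z, False, b_int))


for prs in (-1.0, 0.0, 2.0):
    print(f"PRS {prs:+.0f} SD: OR with interaction = {cannabis_odds_ratio(prs):.2f}, "
          f"without = {cannabis_odds_ratio(prs, b_int=0.0):.2f}")
```

With no interaction (b_int = 0), the cannabis odds ratio is exp(0.3), about 1.35, at every PRS. With the hypothetical b_int = 0.2, it rises from about 1.11 at one standard deviation below the mean to about 2.01 at two standard deviations above it.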

Using the UK BioBank to unravel the interaction between the PRS of schizophrenia, psychotic experiences, and cannabis use

Dr. Tripathy’s research lab used the UK BioBank to unpack the relationship between the PRS of schizophrenia and cannabis use. The UK BioBank is a large open-access resource that contains anonymized genetic and non-genetic information from 500,000 UK residents and is updated regularly8. This database includes participants’ genome-wide genotypes, physical measurements, health-related records, and answers to online questionnaires (Figure 3). Participants were 40-69 years old when they joined the UK BioBank project, which allowed baseline data to be collected before the onset of most serious diseases, along with data on age-related health problems. However, an important limitation of this database is its lack of diversity: most participants are White British. Overall, the UK BioBank was created to enable well-powered research into the genetic and non-genetic factors that contribute to disease. The availability of this database to researchers around the world has spurred many health-related studies aimed at improving clinical care. As Dr. Tripathy explained, “these types of datasets are really powerful”. The wide range of information in this population database also allows researchers to spot potential connections and correlations, inspiring new studies that could advance the field.

Figure 3: Schematic of the data collection points for the UK BioBank8. The UK BioBank collects data from 500,000 study participants. These data include genetic and non-genetic information. Non-genetic data consist of information collected from health-related records, physical measurement exams, interviews, and self-reported questionnaires. Figure created in

As data analysts, Drs. Tripathy and Wainberg evaluated the data available in the UK BioBank and found that over 150,000 participants had completed the Mental Health Questionnaire and self-reported information relating to substance use5. They quickly realized that this massive amount of data could be used to investigate the relationship between schizophrenia risk and cannabis use, providing important insight into the development of the disease. Speaking about this study, Dr. Tripathy remarked that it was especially “timely because cannabis has been legalized in Canada… and it’s increasingly becoming decriminalized throughout the world”. Cannabis use is very common among Canadians: 1 in 4 Canadians reported having used cannabis in the past 12 months, according to the 2021 Canadian Cannabis Survey9.

Dr. Tripathy and his team performed a cross-sectional analysis of approximately 110,000 unrelated UK BioBank participants of White British ancestry5. Among participants without a clinical diagnosis of schizophrenia, they compared those with high and low schizophrenia PRS to investigate how ever having used cannabis relates to having psychotic experiences. Specifically, they looked for statistically significant associations between PRS, cannabis use frequency, and psychotic experiences such as auditory and visual hallucinations. They found that cannabis use was more strongly associated with early-onset psychotic experiences in participants with a higher schizophrenia PRS than in those with a lower schizophrenia PRS5. However, it is important to note that association does not imply causation.
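One way to picture what “more strongly associated in participants with a higher PRS” means is to compare an odds ratio within each PRS group. The Python sketch below does this with a simple 2×2 table per group. The counts are invented purely for illustration and are not data from the study, and this stratified comparison is a simplification; an analysis like the published one would typically rely on regression models that can also account for other factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one PRS group (all values hypothetical)."""
    used_with_pe: int      # ever used cannabis, reported a psychotic experience
    used_without_pe: int   # ever used cannabis, no psychotic experience
    never_with_pe: int     # never used cannabis, reported a psychotic experience
    never_without_pe: int  # never used cannabis, no psychotic experience

    def odds_ratio(self) -> float:
        """Odds of psychotic experiences among users divided by the odds among non-users."""
        denominator = self.used_without_pe * self.never_with_pe
        if denominator == 0:
            raise ValueError("odds ratio is undefined when a denominator cell is zero")
        return (self.used_with_pe * self.never_without_pe) / denominator


if __name__ == "__main__":
    low_prs = TwoByTwo(used_with_pe=60, used_without_pe=940, never_with_pe=40, never_without_pe=960)
    high_prs = TwoByTwo(used_with_pe=90, used_without_pe=910, never_with_pe=45, never_without_pe=955)
    print(f"low PRS odds ratio:  {low_prs.odds_ratio():.2f}")   # 1.53
    print(f"high PRS odds ratio: {high_prs.odds_ratio():.2f}")  # 2.10
```

In this made-up example the odds ratio is larger in the high-PRS group, which is the pattern the study describes; a difference like this is still an association, not evidence that cannabis causes psychotic experiences.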

Although this study could not establish causation, high-powered population databases like the UK BioBank can be used to identify meaningful associations with potential clinical applications. With both genetic and environmental factors coming into play in the development of schizophrenia, these results point to a potential avenue for preventive risk management5. For example, individuals with a higher schizophrenia PRS could be advised to avoid cannabis, especially early in life, in the hope of delaying or preventing disease onset. Looking to the future, the increasing access to and decriminalization of cannabis across the globe should be accompanied by better knowledge dissemination and education about the intricacies of cannabis use.

The future of research using large population databases

This study shows that meaningful associations can be made by harnessing the power of large population databases like the UK BioBank. Using such databases can shorten the time required to complete research projects. In addition, results from analyses of these large datasets tend to be more statistically robust, because far more data are available for analysis than any single study could gather. As Dr. Tripathy explains in the case of the UK BioBank, “one cross-sectional study using half a million people may be better than 100 studies that use 50 people”. Population database research is only just beginning, and many possibilities within the field have yet to be explored4.

When asked about potential research projects that could use this type of resource, Dr. Tripathy noted that while it may be relatively “easy to generate data…it’s still really hard to figure out what it means”, referring to the difficulties of data analysis. One way to meet this challenge is through programming. For instance, in this study, Dr. Tripathy and his team used programming languages such as Python and R to analyze data from approximately 110,000 UK BioBank participants5. Because this was such a data-driven project, Dr. Tripathy mentioned that the most exciting part of conducting the research was the chance to collaborate with and learn from his colleagues. He emphasized that constant learning is a large part of this field and urged the next generation of scientists to become familiar with at least one programming language. He advised, “To anyone who’s interested in research in science, I would strongly encourage taking a programming class”. Learning a programming language like R or Python can help meet the high demand for data analysts with the skills required to process large datasets. As research becomes more data-centric, this is one step you can take to position yourself for a successful career in data research.


1.         Tsamakis, K. et al. COVID‑19 and its consequences on mental health (Review). Exp. Ther. Med. 21, 1–1 (2021).

2.         Fast Facts about Mental Health and Mental Illness. CMHA National.

3.         Zhuo, C. et al. The genomics of schizophrenia: Shortcomings and solutions. Prog. Neuropsychopharmacol. Biol. Psychiatry 93, 71–76 (2019).

4.         Stewart, R. & Davis, K. ‘Big data’ in mental health research: current status and emerging possibilities. Soc. Psychiatry Psychiatr. Epidemiol. 51, 1055–1072 (2016).

5.         Wainberg, M., Jacobs, G. R., di Forti, M. & Tripathy, S. J. Cannabis, schizophrenia genetic risk, and psychotic experiences: a cross-sectional study of 109,308 participants from the UK Biobank. Transl. Psychiatry 11, 211 (2021).

6.         Owen, M. J., Sawa, A. & Mortensen, P. B. Schizophrenia. The Lancet 388, 86–97 (2016).

7.         Foley, C., Corvin, A. & Nakagome, S. Genetics of Schizophrenia: Ready to Translate? Curr. Psychiatry Rep. 19, 61 (2017).

8.         Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

9.         Health Canada. Canadian Cannabis Survey 2021: Summary. (2021).

Using large transcriptome studies to characterize the role of microglia in neurological disease

Tanvi Anandampillai

A comprehensive transcriptome assessment revealed that many susceptibility loci for neurological diseases modulate disease risk by altering gene expression in microglia, key players in brain aging and pathology.

Microglia, the immune cells of the brain, have been implicated in various neurological diseases such as Alzheimer’s disease (AD) and Parkinson’s disease (PD)1. These cells are involved in inflammatory responses, neurodevelopment, regulation of brain homeostasis and neurogenesis1. As immune cells, they are strongly influenced by their environment, which leads to a highly heterogeneous transcriptome across brain regions, ages and pathologies2. This heterogeneity complicates the task of characterizing causal variants that modulate disease risk, as large sample sizes are required to identify statistically significant variants2. In the most recent issue of Nature Genetics, Lopes et al.2 tackled this very task by creating the Microglia Genomic Atlas (MiGA). As the most comprehensive public microglial transcriptomic resource to date, MiGA was used to understand the drivers of microglial heterogeneity and to identify potential causal variants in these neurological diseases (Figure 1)2. This publicly available resource will help inform future genetic studies in the broader neuroscience community2.

Figure 1: The MiGA database was built using 255 microglial samples isolated from 4 different brain regions of 100 individuals with varying neurological conditions. RNA was isolated and sequenced, and genome-wide genotyping was performed on the DNA. All of this information was stored in MiGA. Figure from2

Lopes and colleagues’ study began by identifying the biological factors that drive heterogeneity in the microglial transcriptome. Their analysis identified age and brain region as drivers of variance in the microglial transcriptome, with a subset of genes varying strongly between brain regions2. Within this subset, the largest number of differentially expressed genes (DEGs) was found between the subventricular zone and the cortical regions, while the smallest number was found between the two cortical regions (Figure 1)2. This finding emphasizes that different brain environments give rise to different microglial transcriptomes, which must be factored in when studying the role of microglia in disease. The authors also observed that the expression of 1,693 genes varied with the chronological age of the donors; about one-fifth of these were upregulated with age and the rest downregulated2. Similarly, 255 transcripts from 150 genes were differentially spliced across ages, with a shift in the balance between the long and short isoforms of some genes. A majority of the genes that varied with age overlapped with loci previously associated with AD3 and PD4 by genome-wide association studies (GWAS). Identifying age-related changes in both gene expression and splicing in microglia that overlap with disease-associated loci will help inform future research on these neurodegenerative disorders. In particular, these genes could be investigated as potential drug targets to curb the progression of these age-related disorders. Further, including these findings in MiGA, a public resource, extends the impact of this work to future studies.

The authors2 then examined the genetic drivers of microglial heterogeneity by mapping quantitative trait loci (QTLs). Because both gene expression and splicing varied across their samples, Lopes et al.2 mapped expression QTLs (eQTLs: loci that explain variation in mRNA levels) and splicing QTLs (sQTLs: loci that regulate pre-mRNA splicing) in their microglial samples (Figure 2). The authors found that, relative to other diseases such as schizophrenia and bipolar disorder, AD and PD had the highest number of GWAS loci colocalizing with QTLs in both datasets2. This finding supports the role that microglia are known to play in the progression of these two diseases5,6. Researchers in this field can leverage the colocalization of microglial QTLs with disease loci to narrow down the likely causal variant at a risk locus and to help identify potential drug targets.
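At its core, an eQTL test asks whether mRNA levels change with the number of copies of an allele a person carries. The Python sketch below fits an ordinary least-squares line of expression against genotype dosage for a single SNP-gene pair, using made-up values. It illustrates the idea only and is not the MiGA pipeline: real eQTL mapping normalizes expression, adjusts for covariates such as donor and technical factors, and corrects for testing many SNP-gene pairs, while sQTL mapping uses splicing measurements rather than total expression.

```python
from statistics import mean
from typing import Sequence, Tuple


def eqtl_fit(dosages: Sequence[int], expression: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of expression on allele dosage (0, 1 or 2 copies).

    Returns (slope, r_squared): the slope is the estimated change in expression per
    additional allele copy; r_squared is the fraction of expression variance explained.
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = mean(dosages), mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; no effect can be estimated")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    syy = sum((y - mean_y) ** 2 for y in expression)
    slope = sxy / sxx
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


if __name__ == "__main__":
    # Hypothetical samples: dosage at one SNP and normalized expression of one gene.
    dosages = [0, 0, 1, 1, 2, 2]
    expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    slope, r2 = eqtl_fit(dosages, expression)
    print(f"slope per allele: {slope:.2f}, r^2: {r2:.2f}")  # slope per allele: 1.00, r^2: 0.99
```

A positive slope here would correspond to an allele that increases expression, which is the kind of effect described for the USP6NL eQTL in the next paragraph.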

Lopes et al.2 then described two examples of how their comprehensive eQTL and sQTL database can help home in on disease risk genes in AD and PD. Specifically, the database is useful when a single nucleotide polymorphism (SNP) sits in an intergenic region and the causal gene is still unknown. For example, the lead SNP of an AD GWAS was found to lie between ECHDC3 and USP6NL7. The authors determined that USP6NL harbored an eQTL SNP that increases its expression in microglia2. They then used fine-mapping to show that both the GWAS SNP and the USP6NL eQTL SNP overlapped with a microglia-specific enhancer2. Because this enhancer had long-range connections only with the promoter of USP6NL, the authors suggested that USP6NL, rather than ECHDC3, is the AD risk gene. This was an interesting and novel finding because previous work had prioritized ECHDC3, which was found to be upregulated in post-mortem samples from AD patients2. Similarly, using their eQTL database and fine-mapping, Lopes et al.2 suggested that P2RY12, a gene within the GWAS-associated MED12L locus, is the likely causal PD risk gene. This demonstration of zooming in on disease-associated loci with eQTLs, together with the incorporation of the eQTLs and sQTLs into MiGA, speaks to the usefulness of this database. Correctly identifying disease-risk genes can lead to target identification and aid therapeutic drug development.

Figure 2: The MiGA database was used to perform the following analyses: Age-related heterogeneity, brain-region related heterogeneity, eQTL and sQTL analysis, colocalization and fine-mapping of eQTLs with disease associated GWAS loci. Figure from2

This paper culminated in the formation of the comprehensive MiGA database, whose translational applications include the discovery of causal variants and the subsequent identification of drug targets for neurological disorders that currently lack promising therapeutic options.


1.        Ransohoff, R. M. & el Khoury, J. Microglia in Health and Disease. Cold Spring Harbor Perspectives in Biology 8, (2016).

2.        Lopes, K. de P. et al. Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nature Genetics 54, 4–17 (2022).

3.        Raj, T. et al. Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer’s disease susceptibility. Nature Genetics 50, 1584–1592 (2018).

4.        Li, Y. I., Wong, G., Humphrey, J. & Raj, T. Prioritizing Parkinson’s disease genes using population-scale transcriptomic data. Nature Communications 10, (2019).

5.        Kam, T. I., Hinkle, J. T., Dawson, T. M. & Dawson, V. L. Microglia and astrocyte dysfunction in Parkinson’s disease. Neurobiology of Disease 144, (2020).

6.        Fakhoury, M. Microglia and Astrocytes in Alzheimer’s Disease: Implications for Therapy. Current Neuropharmacology 16, (2018).

7.        Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nature Genetics 51, 414–430 (2019).

Defective DNA Polymerases Can Lead to Colon and/or Endometrial Cancer

Anahita Bahreini-Esfahani

Unfaithful replication of the genome due to faulty Pol ε and Pol δ DNA polymerases can predispose carriers to colorectal and endometrial cancers. Surprisingly, carriers of POLE/POLD1 germline mutations do not exhibit overt signs of premature aging.

During each cycle of cell division, DNA polymerases play a key role in replicating the genome. In humans, DNA replication is mainly performed by Pol ε and Pol δ, whose catalytic subunits are encoded by the POLE and POLD1 genes and which are responsible for synthesizing the leading and lagging strands, respectively. Both enzymes have proofreading (3′ to 5′ exonuclease) activity1 (Figure 1).

Figure 1. During replication of the genome, as helicase unwinds the double-stranded DNA, synthesis of the new DNA molecule is initiated by the primers generated by Pol α-primase. DNA synthesis occurs in the 5’ to 3’ direction and the leading strand (shown in green) is synthesized continuously by DNA polymerase ε (Pol ε) whereas the lagging strand (shown in blue) is synthesized discontinuously by DNA polymerase δ (Pol δ). (Figure taken from2)

Previous work has shown that POLE exonuclease mutations lead to high rates of single base substitutions (SBS), whereas POLD1 exonuclease mutations cause less elevated SBS rates along with high microsatellite instability. Mutations generated by defective POLE and POLD1 exonucleases show replication strand bias, as expected from the polymerases’ separate roles in replicating the leading and lagging strands3. These findings have been confirmed by functional studies in yeast and mice4,5. When POLE or POLD1 exonuclease mutations occur in the germline, they can be inherited and cause a rare autosomal dominant cancer predisposition known as polymerase proofreading-associated polyposis (PPAP), which is mainly characterized by early-onset tumors of the colon and endometrium6.

Accumulation of somatic mutations has been hypothesized to be the main biological mechanism underlying aging7. Somatic mutation burden has been reported to increase linearly with age8; however, not all somatic mutations have significant biological consequences. Studying individuals with inherited POLE/POLD1 exonuclease mutations can shed light on the downstream effects of elevated mutation burdens and on the genetics of aging.

In a study by Robinson et al.9, samples were taken from 14 individuals aged 17-72 years, who were divided into 4 groups based on the germline exonuclease domain mutation they carried. All 14 individuals had a family history of colorectal cancer and/or other cancers. The researchers focused on mutagenesis and mutational signatures in intestinal stem cells, mutagenesis in endometrial cells, mutagenesis during early embryogenesis, and differences in mutation burden across the genome.

Whole-genome sequencing (WGS) of intestinal crypts from the 14 individuals revealed SBS rates of 58-331 per year, compared with 49 SBS per year in crypts from healthy individuals. Thus, SBS rates are elevated in the otherwise normal intestinal cells of individuals harbouring POLE/POLD1 germline mutations. Moreover, small insertion and deletion (ID) mutation rates ranged from 12 to 44 per year in individuals with POLE/POLD1 mutations, compared with 1 per year in individuals without these mutations.
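The reported rates are easier to compare as fold changes relative to the control crypts. The short Python sketch below does that arithmetic using only the numbers given in the preceding paragraph, and adds a rough projection of total SBS burden at a hypothetical age. The projection assumes mutations accumulate at a constant yearly rate from birth, a simplification made for illustration.

```python
# Rates reported for intestinal crypts (mutations per year).
CONTROL_SBS_PER_YEAR = 49
CARRIER_SBS_RANGE = (58, 331)
CONTROL_ID_PER_YEAR = 1
CARRIER_ID_RANGE = (12, 44)


def fold_change(carrier_rate: float, control_rate: float) -> float:
    """How many times higher the carrier rate is than the control rate."""
    return carrier_rate / control_rate


def projected_burden(rate_per_year: float, age_years: float) -> float:
    """Total mutations at a given age, assuming a constant yearly rate (a simplification)."""
    return rate_per_year * age_years


if __name__ == "__main__":
    low, high = (fold_change(r, CONTROL_SBS_PER_YEAR) for r in CARRIER_SBS_RANGE)
    print(f"SBS: {low:.1f}x to {high:.1f}x the control rate")  # SBS: 1.2x to 6.8x the control rate
    low_id, high_id = (fold_change(r, CONTROL_ID_PER_YEAR) for r in CARRIER_ID_RANGE)
    print(f"ID: {low_id:.0f}x to {high_id:.0f}x the control rate")  # ID: 12x to 44x the control rate
    age = 40  # hypothetical age chosen for illustration
    print(projected_burden(CONTROL_SBS_PER_YEAR, age), projected_burden(CARRIER_SBS_RANGE[1], age))  # 1960 13240
```

The comparison shows that the relative increase is far larger for ID mutations than for SBS, even though SBS make up many more mutations in absolute terms.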

Eleven SBS mutational signatures were detected in normal intestinal crypts from individuals with POLE/POLD1 germline mutations. Nine of these had been reported previously, and the 2 previously unreported signatures were found in normal crypts from individuals with POLD1 mutations. These mutational signatures allowed Robinson et al. to attribute the increased SBS burdens in POLE/POLD1 germline mutation carriers to specific mutational processes. Similar trends were observed in endometrial cells from the female participants.

When Robinson et al. performed WGS on whole-blood samples from individuals carrying POLE/POLD1 mutations, the number of single-base-pair insertions acquired during early embryogenesis was greatly increased in some individuals. This heterogeneity is likely due to the maternal-to-zygotic transition of gene expression. If a POLE/POLD1 mutation is paternally inherited, the defective polymerase is not produced until the zygote’s own gene expression machinery is activated. However, if the mutation is maternally inherited, the faulty polymerase is already present in the zygote, since the zygote inherits the proteins and mRNAs of the ovum. This leads to a high mutation burden in early embryogenesis. These findings indicate that mutagenesis caused by defective POLE/POLD1 proofreading occurs even at the earliest stages of life.

Robinson et al. also compared the somatic mutation load across the genome with that in the exome of individuals carrying germline POLE/POLD1 mutations. They found elevated mutation rates in all cell types examined, but the increase was significantly greater in the colon and endometrium than in other tissues such as the skin. One hypothesis for this finding is that stem cells divide at different rates in the colon and endometrium than in other tissues. This may also partially explain why individuals with POLE/POLD1 mutations, such as those with PPAP, are particularly prone to colorectal and endometrial cancers9.

In sum, this study demonstrates that normal cell types from carriers of POLE/POLD1 exonuclease germline mutations exhibit characteristic mutational signatures and elevated rates of somatic SBS and ID mutations. The increase in mutation rate appears to be larger in intestinal and endometrial epithelium than in the other cell types studied. This is important for the somatic mutation theory of aging, which proposes that as we age, we accumulate mutations that lead to a set of phenotypic features collectively known as aging10. This study shows that, apart from an increased risk of colorectal and endometrial cancer, POLE/POLD1 germline exonuclease mutations do not appear to cause premature aging. This suggests that many of our cells tolerate high SBS and ID mutation burdens and that somatic mutations alone do not underlie the process of aging. It is vital for future studies to address the shortcomings of this experiment, such as its small sample size, and to take a deeper dive into the genetics of aging. In a recent genome-wide association study (GWAS) by Timmers et al.11, aging phenotypes such as healthspan, lifespan and longevity were found to be affected by 10 genomic loci. Follow-up studies combining GWAS and animal models could lead to therapeutic targets that increase our chances of living longer or, at the very least, slow down the process of aging.


  1. Burgers, P. et al. Who is leading the replication fork, Pol ε or Pol δ? Mol. Cell 4, 492–493 (2016).
  2. Marin-Garcia, J. Introduction to the molecular biology of the cell. Post-Genomic Cardiology 2, 3–14 (2014).
  3. Morrison, A. et al. A third essential DNA polymerase in S. cerevisiae. Cell 62, 1143–1151 (1990).
  4. Venkatesan, R. N. et al. Mutation at the polymerase active site of mouse DNA polymerase δ increases genomic instability and accelerates tumorigenesis. Mol. Cell. Biol. 27, 7669–7682 (2007).
  5. Barbari, S. R. et al. Functional analysis of cancer-associated DNA polymerase ε variants in Saccharomyces cerevisiae. G3 (Bethesda) 8, 1019–1029 (2018).
  6. Palles, C. et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat. Genet. 45, 136–143 (2013).
  7. Vijg, J. & Dong, X. Pathogenic mechanisms of somatic mutation and genome mosaicism in aging. Cell 182, 12–23 (2020).
  8. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
  9. Robinson, P. et al. Increased somatic mutation burdens in normal human cells due to defective DNA polymerases. Nat. Genet. 53, 1434–1442 (2021).
  10. Szilard, L. On the nature of the aging process. Proc. Natl Acad. Sci. USA 45, 30–45 (1959).
  11. Timmers, P. et al. Multivariate genomic scan implicates novel loci and haem metabolism in human ageing. Nat. Commun. 1, 3570–3571 (2020).