Competitive transcription factor binding regulates hemoglobin switching

Meredith Laver

Fetal hemoglobin expression is repressed following a transition to adult hemoglobin production at birth. For individuals with β-hemoglobinopathies, which impair the function of adult hemoglobin, reversing this switch is a promising route towards curative therapies. New research by Liu et al. proposes a simple model for hemoglobin switching based on competition between transcription factors BCL11A and NF-Y for binding sites in the γ-globin gene promoter.

Sickle cell disease (SCD) and β-thalassemia, collectively known as β-hemoglobinopathies, are common monogenic disorders in which abnormal hemoglobin production impairs erythrocyte production and function1,2. Approximately 300,000-400,000 infants are born with a β-hemoglobinopathy worldwide each year1,2. Despite a high mortality rate, these diseases remain extremely prevalent, and easily accessible therapies are desperately needed1,2,3.

β-hemoglobinopathies are caused by variant alleles of the β-globin gene which alter or disable the β-subunit of hemoglobin. Adult hemoglobin is made up of two α-globin subunits and two β-globin subunits (HbA, α2β2)1. An alternate globin protein, γ-globin, is expressed during fetal development and contributes two subunits of fetal hemoglobin (HbF, α2γ2)1. As expression of β-globin rises during late fetal development, γ-globin production is inhibited; HbF comprises <2% of hemoglobin in adults4. Sustained expression of HbF in adults is occasionally observed and is termed Hereditary Persistence of Fetal Hemoglobin (HPFH)1,4. HPFH has been associated with reduced symptom load in patients with β-hemoglobinopathies, implicating relief of γ-globin repression as a potential treatment avenue2. HPFH is typically caused by mutations in the γ-globin promoter which fall in distinct clusters, suggesting that they might disrupt the binding motif of a regulatory repressor5. A recent study by Liu et al. identified competitive binding between the repressive transcription factor BCL11A and the activator NF-Y at an HPFH cluster site in the γ-globin promoter as a key mechanism of expression control4.

The β-globin gene cluster on chromosome 11 contains 5 globin genes, including β-globin and γ-globin (Figure 1)1. In addition to the individual gene promoters, the region is regulated by a locus control region (LCR) containing 5 DNase I hypersensitive sites (HSs) with varying degrees of regional enhancer activity1,2. Transcription of each globin gene is associated with looping between the promoter and the LCR. Developmental specificity is conveyed by the individual promoters while the LCR generally enhances transcription5. Transcription factors BCL11A and LRF have both been shown to repress transcription of γ-globin in adult erythroid cells through promoter binding and recruitment of the NuRD silencing complex6. HPFH mutation sites at -115 and -200 in the γ-globin promoter align with BCL11A and LRF binding sites, respectively4,7. CRISPR-Cas9 mediated disruption of the -115 bp binding motif reduces BCL11A binding and reproduces the HPFH phenotype, but the specific mechanism by which BCL11A silences γ-globin expression was previously unclear7.

Figure 1 β-globin Locus with BCL11A binding sites (Adapted from Cavazzana et al., 2017)1

Liu et al.4 performed CRISPR-Cas9 perturbation screens to assess γ-globin expression in adult erythroid cell lines expressing Cas9 variants. Pooled gRNAs targeting the β-globin cluster at 11-bp intervals were introduced, and transfected cells with high HbF expression were isolated for analysis. Enrichment or depletion of gRNAs in this population was quantified to identify sequences at which Cas9 activity was correlated with altered HbF expression. Inactive Cas9 (dCas9) binds to a target region but does not cleave DNA. dCas9 targeted to the LRF repressor binding site at approximately -200 bp in the γ-globin promoter was associated with increased HbF expression, consistent with displacement of LRF. However, dCas9 binding at the BCL11A TGACCA binding motif at -115 bp was associated with reduced HbF expression. This effect was replicated by dCas9 targeting to other sites between -150 and -60 bp, suggesting that bound dCas9 was displacing an activating factor with a binding site in this region.
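The enrichment step of a pooled screen of this kind can be illustrated with a minimal Python sketch. It is not the analysis pipeline used by Liu et al.; the function name, the reads-per-million normalization, the pseudocount and the toy read counts are all assumptions chosen for illustration. A positive score means a gRNA is over-represented among HbF-high cells, and a negative score means it is depleted.

```python
import math


def log2_enrichment(sorted_counts, unsorted_counts, pseudocount=1.0):
    """Log2 fold change of each gRNA in the HbF-high pool versus the unsorted pool."""
    total_sorted = sum(sorted_counts.values())
    total_unsorted = sum(unsorted_counts.values())
    scores = {}
    for guide, count in unsorted_counts.items():
        # Normalize to reads per million so pools of different depth are comparable.
        rpm_sorted = 1e6 * sorted_counts.get(guide, 0) / total_sorted
        rpm_unsorted = 1e6 * count / total_unsorted
        # The pseudocount avoids division by zero for guides with no reads.
        scores[guide] = math.log2((rpm_sorted + pseudocount) / (rpm_unsorted + pseudocount))
    return scores


# Invented toy read counts (hypothetical guide names), not data from the study.
hbf_high_pool = {"guide_-200": 600, "guide_-115": 100, "guide_control": 300}
unsorted_pool = {"guide_-200": 300, "guide_-115": 400, "guide_control": 300}
scores = log2_enrichment(hbf_high_pool, unsorted_pool)

for guide, score in scores.items():
    print(f"{guide}: {score:+.2f}")
```

In this toy example the guide near -200 bp scores positive and the guide at -115 bp scores negative, mirroring the direction of the dCas9 results described above.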

NF-Y is an activating transcription factor with two possible binding sites within this range. It has been previously identified as an activator of globin gene expression, though the specific mechanism remained elusive8. Liu et al.4 found that shRNA knockdown of NF-Y subunit A in HbF-expressing erythroid precursors resulted in reduced LCR looping to the γ-globin promoter. This reduced looping was associated with decreased expression, suggesting a potential role for NF-Y in activating expression by facilitating chromosomal looping. CUT&RUN located NF-Y binding at a CCAAT motif at -88 to -84 bp in the γ-globin promoter, within the -150 to -60 bp range identified by dCas9 screening. Mutation of this motif by Cas9 resulted in reduced NF-Y occupancy and decreased HbF levels. Conversely, mutation of the BCL11A motif at -115 bp resulted in increased NF-Y occupancy and γ-globin expression. dCas9 targeting to the -115 bp BCL11A binding site and other adjacent sites within the -150 to -60 bp window partially reduced NF-Y occupancy and decreased γ-globin expression in BCL11A knockout (KO) cells. Based on these results, the authors proposed a simple model of competitive binding between BCL11A and NF-Y as a major regulator of γ-globin expression (Figure 2). BCL11A is dramatically upregulated in adult erythroid cells as compared to fetal progenitors, which likely contributes to out-competition of NF-Y and transition to HbA production4.
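The logic of the competition model can be made concrete with a small equilibrium calculation. The sketch below is an illustration only and is not taken from Liu et al.: it assumes that BCL11A and NF-Y binding are strictly mutually exclusive, that binding is at equilibrium, and that concentrations and dissociation constants (hypothetical parameters `k_nfy` and `k_bcl11a`) are in arbitrary matching units. Under those assumptions, NF-Y occupancy of the promoter is (N/K_N) / (1 + N/K_N + B/K_B).

```python
def nfy_occupancy(nfy_conc, bcl11a_conc, k_nfy=1.0, k_bcl11a=1.0):
    """Fraction of promoters bound by NF-Y under mutually exclusive binding.

    All arguments are hypothetical, in arbitrary matching units.
    """
    n = nfy_conc / k_nfy          # statistical weight of the NF-Y-bound state
    b = bcl11a_conc / k_bcl11a    # statistical weight of the BCL11A-bound state
    return n / (1.0 + n + b)      # 1.0 is the weight of the unbound promoter


# Illustrative values only: same NF-Y level, low versus high BCL11A.
fetal_like = nfy_occupancy(nfy_conc=1.0, bcl11a_conc=0.1)
adult_like = nfy_occupancy(nfy_conc=1.0, bcl11a_conc=10.0)
print(f"fetal-like NF-Y occupancy: {fetal_like:.2f}")
print(f"adult-like NF-Y occupancy: {adult_like:.2f}")
```

With NF-Y held constant, raising BCL11A alone lowers NF-Y occupancy, which is the qualitative behavior the model proposes for the fetal-to-adult switch.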

Figure 2 Model of competitive binding between BCL11A and NF-Y in the γ-globin promoter (Adapted from Liu et al., 2021)4. A In fetal erythroid progenitors, BCL11A expression is low and NF-Y activates γ-globin expression by binding to the CCAAT motif at -88 to -84 bp in the promoter. B In adult erythroid cells, BCL11A expression is high; it displaces NF-Y by binding to the TGACCA motif at -115 bp and recruits the NuRD silencing complex to repress γ-globin gene expression. NF-Y instead promotes β-globin expression, leading to the production of adult hemoglobin.

This simple model of γ-globin regulation suggests promising molecular targets for treatment of β-hemoglobinopathies via upregulation of HbF. Current clinical trials focus on downregulation of BCL11A via shRNA silencing or Cas9-mediated gene editing9,10. However, BCL11A has a key role in development of B-lymphocytes and hematopoietic stem cells, and its downregulation negatively affects red blood cell enucleation1. Treatments targeting the -115 BCL11A site in the promoter may instead allow for highly specific relief of γ-globin repression. Cas9-mediated mutation of the binding site, as demonstrated by Liu et al.4, is a potential therapy with long-term effect. Small proteins or ncRNAs may also be engineered to bind the site without inhibiting NF-Y binding. This approach avoids the risk of off-target mutation caused by Cas9 and may be translated into an affordable and easily produced therapeutic.


  1. Cavazzana, M., Antoniani, C. & Miccio, A. Gene Therapy for β-Hemoglobinopathies. Mol Ther 25, 1142–1154 (2017).
  2. Frati, G. & Miccio, A. Genome Editing for β-Hemoglobinopathies: Advances and Challenges. J Clin Med 10, 482 (2021).
  3. Piel, F. B. J., Steinberg, M. H. & Rees, D. C. Sickle cell disease. N Engl J Med 376, 1561–1573 (2017).
  4. Liu, N. et al. Transcription factor competition at the γ-globin promoters controls hemoglobin switching. Nat Genet 53, 511–520 (2021).
  5. Bender, M. A., Bulger, M., Close, J. & Groudine, M. Beta-globin gene switching and DNase I sensitivity of the endogenous beta-globin locus in mice do not require the locus control region. Mol Cell 5, 387–393 (2000).
  6. Xu, J. et al. Transcriptional silencing of γ-globin by BCL11A involves long-range interactions and cooperation with SOX6. Genes Dev 24, 783–798 (2010).
  7. Liu, N. et al. Direct Promoter Repression by BCL11A Controls the Fetal to Adult Hemoglobin Switch. Cell 173, 430–442.e17 (2018).
  8. Zhu, X. et al. NF-Y recruits both transcription activator and repressor to modulate tissue- and developmental stage-specific expression of human γ-globin gene. PLoS One 7, e47175 (2012).
  9. Frangoul, H. et al. CRISPR-Cas9 Gene Editing for Sickle Cell Disease and β-Thalassemia. N Engl J Med 384, 252–260 (2021).
  10. Esrick, E. B. et al. Post-Transcriptional Genetic Silencing of BCL11A to Treat Sickle Cell Disease. N Engl J Med 384, 205–215 (2021).

Using large transcriptome studies to characterize the role of microglia in neurological disease

Tanvi Anandampillai

A comprehensive transcriptome assessment revealed that many neurological disease susceptibility loci modulate neurological disease risk by altering gene expression in microglia, key players in brain aging and pathology.

Microglia, the immune cells of the brain, have been implicated in various neurological diseases such as Alzheimer’s Disease (AD) and Parkinson’s Disease (PD)1. These cells are involved in inflammatory responses, neurodevelopment, regulation of brain homeostasis and neurogenesis1. Being immune cells, they are strongly influenced by their environment, leading to a highly heterogeneous transcriptome across various brain regions, ages and pathologies2. This heterogeneity complicates the task of characterizing causal variants that modulate disease risk, as large sample sizes are required to identify statistically significant variants2. In the most recent issue of Nature Genetics, Lopes et al.2 tackled this very task by creating the Microglia Genomic Atlas (MiGA). As the most comprehensive publicly available microglial transcriptomic resource to date, it was used to understand the drivers of microglial heterogeneity and identify potential causal variants in these neurological diseases (Figure 1)2. This resource will help inform future genetic studies for the broader neuroscience community2.

Figure 1: The database MiGA was built using 255 microglial samples isolated from 4 different brain regions of 100 individuals with varying neurological conditions. The RNA was isolated and sequenced. Genome wide genotyping of the DNA was performed. All of this information was stored in MiGA. Figure from2

Lopes and colleagues’ study began with the identification of the biological factors that drive the heterogeneity of the microglial transcriptome. Their analysis concluded that age and brain region were drivers of variance in the microglial transcriptome, with a subset of genes that strongly varied between the different brain regions2. Within this subset, the largest number of differentially expressed genes (DEGs) was found between the subventricular zone and the cortical regions, while the smallest number of DEGs was found between the two cortical regions (Figure 1)2. This finding emphasizes that differing brain environments lead to differing microglial transcriptomes, and this must be factored in when studying the role of microglia in disease. The authors also observed that the expression of 1693 genes varied with the chronological age of the donors, about 1/5th of which were upregulated and the rest downregulated2. Similarly, 150 genes had 255 differentially spliced transcripts that varied across different ages, with a shift in balance between the long and short isoforms of some genes. A majority of the genes that varied with age overlapped with loci previously associated with AD3 and PD4, as determined by genome wide association studies (GWAS). The identification of age-related changes in both gene expression and splicing in microglial cells that overlap with disease-associated loci will help inform future research on these neurodegenerative disorders. In particular, these genes can be examined as potential drug targets to curb the progression of these age-related disorders. Further, the inclusion of these findings in MiGA, a public resource, speaks to the impact of this work in informing future studies.

The authors2 then chose to examine the genetic drivers of microglial heterogeneity by establishing quantitative trait loci (QTLs). As both gene expression and splicing varied across their samples, Lopes et al.2 established expression QTLs (eQTLs – loci that explain variation in mRNA levels) and splicing QTLs (sQTLs – loci that regulate pre-mRNA splicing) for their microglial samples (Figure 2). The authors found that AD and PD had the highest number of colocalizing GWAS loci in both QTL datasets, relative to other diseases such as schizophrenia and bipolar disorder2. This finding is consistent with the role that microglia are known to play in the progression of these two diseases5,6. The colocalization of microglial QTLs with disease loci can be leveraged by researchers in this field to narrow down the location of a causal variant and help identify potential drug targets.

Lopes et al.2 then described two examples of how their comprehensive eQTL and sQTL database can help home in on disease risk loci in both AD and PD. Specifically, their database can be of use when a single nucleotide polymorphism (SNP) sits in an intergenic location and the causal gene is still unknown. For example, the lead SNP of one AD GWAS locus was found to lie between ECHDC3 and USP6NL7. The authors determined that the latter gene harbored an eQTL SNP that increases its expression in microglia2. They then used fine mapping to determine that both the GWAS SNP and the USP6NL eQTL SNP overlapped with a microglia-specific enhancer2. Moreover, this microglial enhancer only had long-range connections with the promoter of USP6NL, suggesting that between ECHDC3 and USP6NL, the latter is the AD risk gene. This was an interesting and novel finding because, in the past, the analysis of ECHDC3 was prioritized as it was found to be upregulated in post-mortem samples of AD patients2. Similarly, with the use of their eQTL database and fine-mapping, Lopes et al.2 suggested that P2RY12, a gene that sits within the GWAS-associated MED12L locus, is the likely PD risk gene. This demonstration of zooming in on a disease-associated locus using their eQTLs, coupled with the incorporation of the eQTLs and sQTLs into MiGA, speaks to the usefulness of this database. Correctly identifying disease-risk loci can lead to target identification and can aid therapeutic drug development.

Figure 2: The MiGA database was used to perform the following analyses: Age-related heterogeneity, brain-region related heterogeneity, eQTL and sQTL analysis, colocalization and fine-mapping of eQTLs with disease associated GWAS loci. Figure from2

This paper culminated in the formation of the comprehensive MiGA database, whose translational applications include the discovery of causal variants and the subsequent identification of drug targets for neurological disorders that currently lack promising therapeutic options.


1.        Ransohoff, R. M. & el Khoury, J. Microglia in Health and Disease. Cold Spring Harbor Perspectives in Biology 8, (2016).

2.        Lopes, K. de P. et al. Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nature genetics 54, 4–17 (2022).

3.        Raj, T. et al. Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer’s disease susceptibility. Nature genetics 50, 1584–1592 (2018).

4.        Li, Y. I., Wong, G., Humphrey, J. & Raj, T. Prioritizing Parkinson’s disease genes using population-scale transcriptomic data. Nature communications 10, (2019).

5.        Kam, T. I., Hinkle, J. T., Dawson, T. M. & Dawson, V. L. Microglia and astrocyte dysfunction in parkinson’s disease. Neurobiology of disease 144, (2020).

6.        Fakhoury, M. Microglia and Astrocytes in Alzheimer’s Disease: Implications for Therapy. Current neuropharmacology 16, (2018).

7.        Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nature genetics 51, 414–430 (2019).

Defective DNA Polymerases Can Lead to Colon and/or Endometrial Cancer

Anahita Bahreini-Esfahani

Unfaithful replication of the genome due to faulty Pol ε and Pol δ DNA polymerases can lead to a predisposition to colorectal and endometrial cancers. Surprisingly, carriers of POLE/POLD1 germline mutations do not exhibit overt phenotypes of premature aging.

During each cycle of cell division, polymerases play a key role in replicating the genome. In humans, DNA replication is mainly performed by polymerases Pol ε and Pol δ, which are responsible for the synthesis of the leading and lagging strand, respectively. Both enzymes exhibit proof-reading activity1 (Figure 1).

Figure 1. During replication of the genome, as helicase unwinds the double-stranded DNA, synthesis of the new DNA molecule is initiated by the primers generated by Pol α-primase. DNA synthesis occurs in the 5’ to 3’ direction and the leading strand (shown in green) is synthesized continuously by DNA polymerase ε (Pol ε) whereas the lagging strand (shown in blue) is synthesized discontinuously by DNA polymerase δ (Pol δ). (Figure taken from2)

It has been previously shown that POLE exonuclease mutations lead to high rates of single base substitutions (SBS), while POLD1 exonuclease mutations produce a smaller increase in SBS and high microsatellite instability. The mutations generated by defective POLE and POLD1 exonucleases reveal replication strand bias, which is expected due to their separate roles in replicating leading and lagging strands3. These findings have been confirmed using functional studies in yeast and mice4,5. If POLE and POLD1 exonuclease mutations occur in the germline, they can be inherited and cause a rare autosomal dominant cancer predisposition known as polymerase proofreading-associated polyposis (PPAP), which is mainly defined by early-onset tumors in the colon and endometrium6.

Accumulation of somatic mutations has been hypothesized as the main biological mechanism underlying aging7. Reports have confirmed that somatic mutation burden increases linearly with age8; however, not all somatic mutations will have a significant biological consequence. The study of individuals with inherited POLE/POLD1 exonuclease mutations can shed light on the downstream effects of elevated mutation burdens and the genetics of aging.

In a study by Robinson et al.9, samples were taken from 14 individuals aged 17-72 years and divided into 4 groups based on the germline exonuclease domain mutation they were carrying. All 14 individuals had a family history of colorectal cancer and/or other cancers. The researchers focused on mutagenesis and mutational signatures in intestinal stem cells, mutagenesis in endometrial cells, mutagenesis during early embryogenesis, and differential mutational burdens across the genome.

Whole-genome sequencing (WGS) of intestinal crypts from the 14 individuals revealed SBS rates ranging from 58 to 331 per year, in comparison to 49 SBS per year in crypts from healthy individuals. Thus, elevated SBS rates are present in all otherwise normal intestinal cells of individuals harbouring POLE/POLD1 germline mutations. Moreover, small insertion and deletion (ID) mutation rates ranged from 12 to 44 per year in individuals with POLE/POLD1 mutations compared with 1 per year in individuals without POLE/POLD1 mutations.
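To put these rates in perspective, the short Python sketch below converts them into fold increases over the healthy baseline and into an expected burden at a given age. It uses only the rates quoted above; the helper names are hypothetical, and the burden estimate assumes strictly linear accumulation from birth, which is a simplification rather than a result of the study.

```python
HEALTHY_SBS_PER_YEAR = 49   # healthy crypts, as quoted in the text
HEALTHY_ID_PER_YEAR = 1     # healthy crypts, as quoted in the text


def fold_over_healthy(rate_per_year, healthy_rate_per_year):
    """Fold increase of a mutation rate relative to the healthy baseline."""
    return rate_per_year / healthy_rate_per_year


def expected_burden(rate_per_year, age_years):
    """Expected mutations per cell, assuming linear accumulation from birth."""
    return rate_per_year * age_years


# Lowest and highest carrier rates quoted in the text.
sbs_fold_range = (fold_over_healthy(58, HEALTHY_SBS_PER_YEAR),
                  fold_over_healthy(331, HEALTHY_SBS_PER_YEAR))
id_fold_range = (fold_over_healthy(12, HEALTHY_ID_PER_YEAR),
                 fold_over_healthy(44, HEALTHY_ID_PER_YEAR))

print(f"SBS fold increase: {sbs_fold_range[0]:.2f} to {sbs_fold_range[1]:.2f}")
print(f"ID fold increase: {id_fold_range[0]:.0f} to {id_fold_range[1]:.0f}")
print(f"Expected SBS burden at age 50, highest carrier rate: {expected_burden(331, 50)}")
```

The calculation shows that the relative increase is much larger for ID mutations than for SBS, even though the absolute number of SBS per year is higher.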

Eleven SBS mutational signatures were detected in normal intestinal crypts obtained from individuals with POLE/POLD1 germline mutations. Nine of these SBS mutational signatures had been reported previously, and the 2 previously unreported mutational signatures were found in normal crypts from individuals with POLD1 mutations. These mutational signatures allowed Robinson et al. to attribute the increases in SBS burdens in POLE/POLD1 germline mutation carriers to specific mutational processes. Similar trends were observed in the endometrial cells of the females in this study.

When Robinson et al. performed WGS on whole-blood samples of individuals carrying POLE/POLD1 mutations, the number of single-base pair (bp) insertions arising during early embryogenesis was greatly increased in some individuals. This heterogeneity is likely due to the maternal to zygotic transition of gene expression. If a POLE/POLD1 mutation is paternally inherited, expression of the defective proof-reading polymerase is delayed until the zygote’s gene expression machinery is activated. However, if the mutation is maternally inherited, the faulty polymerase is also inherited by the zygote since the zygote inherits the proteins and mRNAs of the ovum. This leads to a high burden of mutations in early embryogenesis. These findings indicate that mutagenesis as a result of malfunctioning POLE/POLD1 proofreading is observed even at the earliest stages of life.

Robinson et al. also compared somatic mutations across the genome to the mutation load in the exome of individuals who carried germline POLE/POLD1 mutations. They found elevated mutation rates in cells of all types, but the increase was significantly greater in the colon and endometrium than in other tissues such as the skin. One hypothesis for this finding is that stem cell division rates in the colon and endometrium differ from those in other tissues. This finding can also partially explain why individuals with POLE/POLD1 mutations are more prone to colorectal and endometrial cancers, including PPAP9.

In sum, this study demonstrates how normal cell types from carriers of POLE/POLD1 exonuclease germline mutations exhibit mutational signatures and elevated somatic SBS and ID mutation rates. The increase in mutation rate seems to be larger in intestinal and endometrial epithelium than in the other cell types that were studied. This is important when discussing the somatic mutation theory of aging, a theory suggesting that as we age, we accumulate mutations that lead to a set of phenotypic features collectively known as aging10. This study shows that, other than the increased predisposition to colon and endometrial cancer, POLE/POLD1 germline exonuclease mutations do not cause premature aging. This indicates that many of our cells tolerate high SBS/ID mutation burdens and that somatic mutations alone do not underlie the process of aging. It is vital for future studies to address the shortcomings of this experiment, such as small sample size, and to take a deeper dive into the genetics of aging. In a recent genome-wide association study (GWAS) by Timmers et al.11, aging phenotypes such as healthspan, lifespan and longevity were found to be affected by 10 genomic loci. Follow-up studies using both GWAS and animal models can lead to therapeutic targets that can increase our chances of living longer or, at the very least, slow down the process of aging.


  1. Burgers, P. et al., Who is leading the replication fork, Pol ε or Pol δ? Molecular Cell. 4, 492-493 (2016).
  2. Marin-Garcia, J., Introduction to the molecular biology of the cell. Post-Genomic Cardiology. 2, 3-14 (2014).
  3. Morrison, A. et al., A third essential DNA polymerase in S. cerevisiae. Cell 62, 1143–1151 (1990).
  4. Venkatesan, R. N. et al., Mutation at the polymerase active site of mouse DNA polymerase increases genomic instability and accelerates tumorigenesis. Mol. Cell. Biol. 27, 7669–7682 (2007).
  5. Barbari, S. R. et al., Functional analysis of cancer-associated DNA polymerase ε variants in Saccharomyces cerevisiae. G3 (Bethesda) 8, 1019–1029 (2018).
  6. Palles, C. et al., Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat. Genet. 45, 136–143 (2013).
  7. Vijg, J. & Dong, X. Pathogenic mechanisms of somatic mutation and genome mosaicism in aging. Cell 182, 12–23 (2020).
  8. Blokzijl, F. et al., Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
  9. Robinson, P. et al., Increased somatic mutation burdens in normal human cells due to defective DNA polymerases. Nat Genet 53, 1434–1442 (2021).
  10. Szilard, L. On the nature of the aging process. Proc. Natl Acad. Sci. USA 45, 30–45 (1959).
  11. Timmers, P. et al., Multivariate genomic scan implicates novel loci and haem metabolism in human ageing. Nat Commun 11, 3570 (2020).

Using rational design to engineer virus-like particles that achieve therapeutic levels of postnatal in vivo gene editing

Kassandra Bisson

A recent study describes the development of engineered virus-like particles for delivery of base editors in vivo for therapeutic applications, with increased efficiency and decreased off-target effects compared to other viral and nonviral delivery strategies.

As genomic technologies rapidly improve, the prospect of using gene therapies to cure genetic diseases is becoming brighter. Base editors (BEs) have recently been developed that allow for single base substitutions, insertions, or deletions without requiring double-stranded DNA breaks1. They thereby avoid unwanted consequences and widen the breadth of potential applications1. Both nonviral and viral strategies, such as lipid nanoparticles (LNPs) and adeno-associated viruses (AAVs) respectively, have been previously established to deliver DNA encoding BEs to desired tissues for correcting pathogenic point mutations1. There are, however, disadvantages to the established methods, including off-target editing due to prolonged expression in transduced cells and potential for oncogenesis due to viral vector integration into the genome2.

To combat those major disadvantages, a novel approach developed by Banskota et al. uses rational design to engineer virus-like particles (eVLPs) that can mediate therapeutic levels of postnatal in vivo gene editing for many possible applications2. DNA-free VLPs are assemblies of viral proteins that can infect cells yet lack viral genetic material. They can therefore be used as vehicles for delivering gene editing proteins like Cas9 ribonucleoproteins (RNPs) or BEs to target tissues, allowing correction of pathogenic point mutations. This approach utilizes the viral delivery strategy without risking viral genome integration or prolonged expression of BEs, and it reduces off-target editing2. Prior VLP-mediated approaches had limited validation of in vivo therapeutic efficacy2. The rational design of eVLPs by Banskota et al., depicted in Figure 1A, yields greater efficiency in both delivery and packaging of base editors than any of the previous viral and nonviral approaches.

The benefit of eVLP rational design was demonstrated in this study through use of mouse models, wherein single injections of eVLPs achieved therapeutic levels of base editing in multiple targeted tissues, including the liver and eyes. In this study, in vivo liver base editing was investigated by targeting the proprotein convertase subtilisin/kexin type 9 (PCSK9) gene, which is known to be involved in cholesterol homeostasis3. Loss of function (LOF) mutations of PCSK9 can result in lower blood levels of low-density lipoprotein (LDL), leading to a reduced risk of atherosclerotic cardiovascular disease. Banskota et al.’s eVLP design targeted and disrupted the splice donor at the boundary of PCSK9 exon 1 and intron 1, creating an LOF mutation, which was a previously established BE strategy for knockdown of PCSK9 in the mouse liver3. Adult C57BL/6J mice were injected with the eVLPs retro-orbitally and base editing in the liver was monitored one week after the injection, as seen in Figure 1B. At the highest dose of 7 x 10^11 eVLPs, 63% editing was observed in bulk liver, which was comparable to the editing efficiency of the AAV-mediated and the LNP delivery systems. There was, however, no detectable off-target editing above background levels with the eVLP method, unlike with the AAV and LNP methods using the same BE and single guide RNA (sgRNA)4,5. These observations demonstrate comparable efficiency alongside reduced off-target editing in vivo compared to existing strategies.

Figure 1: A. Exemplary depiction of engineered virus-like particle (eVLP) that encapsulates base-editor (BE) proteins fused to a murine leukemia virus (MLV) gag polyprotein via a linker which is cleaved by the MLV proteinase upon particle maturation in a glycoprotein envelope. B. Graphical representation of retro-orbital injection of rationally designed eVLPs targeting PCSK9 in 7-week-old adult C57BL/6J mice. To monitor base editing, organs were harvested one week after injection and genomic DNA was analyzed using high throughput sequencing (HTS). C. Graphical depiction of subretinal injections into mice. Five weeks following injection, electroretinography (ERG) was used to assess phenotypic rescue. The tissues were then harvested to monitor base editing correction via HTS. Images adapted from2.

To demonstrate the applications of rationally designed eVLPs even further, Banskota et al. sought to use eVLPs to correct a disease-causing point mutation. They used adult mice modelling Leber congenital amaurosis (LCA), an eye disorder primarily affecting the retina6. The mutation studied was in the retinoid isomerohydrolase (RPE65) gene (c.120C>T, p.R44X), which resulted in almost complete loss of visual function6. This murine pathogenic variant has a homologous mutation identified in humans that also causes LCA, further demonstrating the relevance of eVLPs for human applications6. The designed eVLPs encapsulated RNPs of the adenine base editor ABE7.10-NG, which converts A•T to G•C, to target the pathogenic point mutation causing LCA for base editing correction7. Adult rd12 mice were injected subretinally (Figure 1C) with the ABE7.10-NG-eVLPs, resulting in a 12% correction of the R44X mutation in the genomic DNA of the retinal pigment epithelium (RPE). This eVLP performance was compared to a previous lentiviral (LV) delivery method wherein a lentivirus encoding the same sgRNA and the ABE7.10-NG constructs generated an 11.5% correction2. In comparison, the use of eVLPs resulted in a 1.4-fold improvement in bystander-free correction relative to the LV treatment. This demonstrates that the eVLPs have comparable or slightly higher correction efficacy relative to the alternative LV method8. In addition, the rationally designed eVLPs were able to efficiently correct the pathogenic mutation in the mouse model, resulting in improvements in visual function.

As with many new gene therapy approaches, this strategy offers a promising outlook; however, there are still many obstacles to overcome before it can be readily used in clinical practice. The main outcome of this study is that rationally designed eVLP treatments have been demonstrated to be more efficient for in vivo base editing in multiple organs than other BE delivery strategies. Studies investigating other tissues can help further the therapeutic potential of eVLPs. While eVLPs were investigated in mouse models alongside preliminary work in primary human cells, further pharmacokinetic studies should be undertaken in other models. In particular, nonhuman primate models can help determine dosing requirements, residence time and the cargoes of eVLPs for other applications.


1. Newby GA & Liu DR. In vivo somatic cell base editing and prime editing. Mol Ther. 29(11):3107-3124. (2021). PMID: 34509669.

2. Banskota S, et al. Engineered virus-like particles for efficient in vivo delivery of therapeutic proteins. Cell. 185(2):250-265.e16 (2022). PMID: 35021064

3. Fitzgerald K, et al. Effect of an RNA interference drug on the synthesis of proprotein convertase subtilisin/kexin type 9 (PCSK9) and the concentration of serum LDL cholesterol in healthy volunteers: a randomised, single-blind, placebo-controlled, phase 1 trial. Lancet. 383(9911):60-68. (2014). PMID: 24094767.

4. Musunuru K, et al. In vivo CRISPR base editing of PCSK9 durably lowers cholesterol in primates. Nature. 593(7859):429-434. (2021). PMID: 34012082.

5. Rothgangl, T, et al. In vivo adenine base editing of PCSK9 in macaques reduces LDL cholesterol levels. Nat Biotechnol 39, 949–957 (2021).

6. Pang JJ, et al. Retinal degeneration 12 (rd12): a new, spontaneously arising mouse model for human Leber congenital amaurosis (LCA). Mol Vis. 11:152-62. (2005). PMID: 15765048.

7. Richter, M.F., et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat Biotechnol 38, 883–891 (2020).

8. Suh S, et al. Restoration of visual function in adult mice with an inherited retinal disease via adenine base editing. Nat Biomed Eng. 5(2):169-178. (2021). PMID: 33077938

Could multiple sclerosis be a kissing disease?

Hamid Farahmand

Multiple sclerosis is a complex disease, and its etiology has been a mystery for many years. Recently, Bjornevik et al. presented strong evidence that the Epstein-Barr virus triggers multiple sclerosis. This finding offers a promising route towards vaccination against multiple sclerosis in the future.

Multiple sclerosis (MS) is an autoimmune disorder of the central nervous system (CNS) that affects approximately 3 million people globally; Canada, with 90,000 MS cases (1:400), has one of the highest rates of MS3. As yet, there is no cure for MS. In this disease, the immune system attacks the myelin sheath that surrounds neurons, leading to inflammation, demyelination, and neuronal damage. While the exact cause of MS is yet to be understood, many well-established reports suggest that MS is a complex disease with both genetic and environmental factors influencing its manifestation4. With regard to genetic factors, changes in the HLA-DRB1 and IL7R immune genes are considered the strongest risk factors for the development of MS. These genes provide instructions for making an HLA (human leukocyte antigen) protein and the IL-7 (interleukin 7) receptor, respectively. HLAs are among the most crucial markers in the immune system, distinguishing the body’s own proteins from foreign ones made by pathogens such as viruses and bacteria. The IL-7 receptor, in turn, has a vital role in recognizing foreign substances and defending the body against various infections and diseases. The most well-studied environmental factors in MS include vitamin D deficiency, smoking, teenage obesity, sex (being female) and exposure to the Epstein-Barr virus (EBV)4,5.

Recently, a team of researchers in the Neuroepidemiology Research Group at Harvard University published an article in Science reporting compelling evidence that EBV is a key element in the development of multiple sclerosis. By screening a cohort of 10 million US military personnel, 955 of whom were diagnosed with MS during their period of service, the researchers found that the risk of MS increased 32-fold after infection with EBV2. In contrast, infection with other viruses such as cytomegalovirus (CMV) was not associated with a comparable increase in risk. This is a remarkable finding that could be a stepping stone to new treatments for multiple sclerosis. The idea of a virus playing a role in the development of MS has been proposed before on experimental grounds; however, this study is the first to support it with both experimental and population-level evidence that EBV is the master key in the advancement of MS. EBV is one of the most prevalent DNA viruses discovered in humans so far: 95% of the adult population worldwide is infected with EBV1. The virus belongs to the herpesvirus family and can cause infectious mononucleosis, often called mono, which is commonly passed on when someone carrying EBV kisses an uninfected individual. Once the infection is established, EBV becomes latent in the body by hiding in the immune system, within memory B cells, and can remain dormant for life. Even though it is often latent, it can be spread intermittently when it is reactivated.

Several pathophysiological theories have been suggested for how EBV infection could have such a massive impact on MS. One hypothesis suggests that silent human endogenous retrovirus sequences become activated upon EBV infection. Nevertheless, after many investigations it is now clear that EBV can persist in memory B cells through a process called latency. During latency, EBV remains dormant within the cell and escapes recognition by the human immune system; however, it can also drive normal B cells to become infected memory B cells via a sequential latency transcription program. The infected cells, just like normal B cells, proliferate in the germinal center of secondary lymphoid tissue, mainly the tonsils; they can then exit the tonsil and enter the brain of individuals with MS through the blood circulation. The infected memory B cells remain in the nervous tissue for long latency periods and cause neuronal damage by producing pathogenic autoantibodies where the virus replicates (Figure 1).

Figure 1: The fundamental mechanism of EBV-driven MS. The figure depicts how B cells are infected by EBV in the tonsil and propagate in the germinal center before being released into the blood vessels. In the brain of individuals with MS, EBV-infected B cells not only produce anti-myelin antibodies but also reactivate T cells, which subsequently secrete a series of inflammatory cytokines that trigger damage to oligodendrocytes, myelin, and neurons. Figure generated in BioRender, adapted from Bar-Or et al., 2021.

In summary, this finding not only greatly enhances our understanding of the etiology of MS, but the results and underlying mechanisms can also provide valuable information for the development of antiviral drugs against MS. After many trials and investigations, scientists can now focus on producing anti-EBV monoclonal antibodies that recognize EBV proteins expressed during latency, or on an alternative strategy of targeting pathways in EBV-infected B cells. Although there are still some gaps in our understanding of the etiology of MS, initial efforts to decrease the EBV-infected B cell load are gaining momentum as researchers become more interested in their application. Overall, vaccination against MS in the near future may open a window to eliminating the use of common drugs such as rituximab, ocrelizumab and ofatumumab, which are used to treat many autoimmune diseases6. Finally, vaccination against MS may let people enjoy ever-sweet kisses without the fear of getting MS!


  1. Bar-Or, A. et al. Epstein–Barr virus in multiple sclerosis: Theory and emerging immunotherapies. Trends in Molecular Medicine 27, 410–411 (2021).
  2. Bjornevik, K. et al. Longitudinal analysis reveals high prevalence of Epstein-Barr virus associated with multiple sclerosis. Science 375, 296–301 (2022).
  3. Latest MS Research News. Prevalence and incidence of MS in Canada and around the world – MS Society of Canada Available at: (Accessed: 17th February 2022)
  4. Donati, D. Viral infections and multiple sclerosis. Drug Discovery Today: Disease Models 32, 27–33 (2020).
  5. Hassani, A., Reguraman, N., Shehab, S. & Khan, G. Primary peripheral Epstein-Barr virus infection can lead to CNS infection and neuroinflammation in a rabbit model: Implications for multiple sclerosis pathogenesis. Frontiers in Immunology 12, (2021).
  6. Serafini, B., Rosicarelli, B., Veroni, C., Mazzola, G. A. & Aloisi, F. Epstein-Barr virus-specific CD8 T cells selectively infiltrate the brain in multiple sclerosis and interact locally with virus-infected cells: Clue for a virus-driven immunopathological mechanism. Journal of Virology 93, (2019).

De novo mutations in calcium-related genes help explain behavior in bipolar disorder

Yayra Gbotsyo

Novel research using knock-in mouse models of de novo mutations in two calcium-related genes, MACF1 and EHD1, helps uncover the genetic basis for behaviors such as hyperactivity, lower attention rate and delayed discounting in individuals with bipolar disorder.

Bipolar disorder (BD) is a mental disorder characterized by recurring episodes of mania, depression, obstructive thinking and behavior, psychosis, and hallucinations1. Approximately 0.6% of Canada’s population is diagnosed annually, and 1% of people worldwide have the disorder2,3. Genetics is known to play a role in the disease’s pathology, but its complex polygenic nature has made its downstream effects difficult to interpret4. Rare mutations and copy number variants in calcium-related genes have been proposed as possible underlying causes of BD, but they account for only 25% of the heritability1. Recently, 71 ultra-rare de novo mutations have also been associated with BD4. Nevertheless, much of the landscape of genetic causality for the disease remains unexplored.

Nakamura et al. pioneered a study into the function of de novo mutations in two calcium-related genes that they had previously identified, Microtubule Actin Crosslinking Factor 1 (MACF1) and EH Domain Containing 1 (EHD1)2,4. These genes had high residual variation intolerance scores (RVIS) and high probability of loss-of-function (LOF) intolerance (pLI) scores compared to other identified genes4. MACF1 encodes a protein that interconnects microfilaments and microtubules and regulates migration and vesicle transport in neurites5. Neurites are projections extending from the cell bodies of neurons that are involved in the transmission of neuronal signals (Fig. 1A).

Figure 1. A. Neurite development. Neurites are projections extending from the cell body of a developing neuron. Minor neurites become dendrites, while the major neurite becomes an axon. Mice lacking Ehd1 grew shorter neurites compared to wild type mice. B. Endocytosis and vesicle transport in PC12 cells. Starved PC12 cells were exposed to transferrin to assess uptake by endocytosis. Mutant Ehd1 cells had poor uptake compared to wild type cells. Figure 1A adapted from6 and Figure 1B adapted from2.

Knockdown of Macf1 in mouse brains causes severe malformations, and mice lacking Macf1 die in early embryonic stages7. The frameshift mutation in MACF1 (p.V622fs), as observed in cases of BD, results in nonsense-mediated mRNA decay2. The second gene, EHD1, produces a scaffold protein that is crucial for the outgrowth of neurites and possesses a calcium-binding EH-domain essential for vesicle transport, endocytosis, and recycling of membrane receptors8. The role of Ehd1 is particularly evident after spinal cord injury, where neurite outgrowth is needed for recovery9. Mice do not survive homozygous knockout of Ehd12. The de novo frameshift mutation in EHD1 in patients with BD introduces a premature stop codon, producing a defective protein with no EH-domain2. MACF1 and EHD1 have thus been marked as key players in brain function, due to their role in neurites.

Nakamura et al. confirmed the expression of the MACF1 isoform in human cerebral cortex and thalamus cells using RT-PCR2. Expression vectors carrying wild type or mutant EHD1, tagged with an mCherry fluorescent protein, were transfected into PC12 cells, followed by β-NGF stimulation to induce neurite growth. Unsurprisingly, PC12 cells expressing mCherry-EHD1-WT developed longer neurite outgrowths than mutant cells2. This highlighted the function of Ehd1 in the development of neurites and its potential downstream implications for brain function. Furthermore, starving PC12 cells and exposing them to transferrin revealed poor vesicle transport and uptake in mCherry-EHD1-Mut PC12 cells compared to wild type, indicating that the mutant hinders endocytosis in a dominant-negative manner (Fig. 1B)2.

Additional in vivo experiments used knock-in mice to follow up on the behavioral implications of the mutations. A single base pair (bp) deletion was created in exon 5 of Ehd1 and a single bp insertion in exon 1 of Macf1 through CRISPR/Cas9 and homology-directed repair-mediated genome editing2. Behaviors such as hyperactivity and depressive episodes, attention rate, and delayed discounting were screened for in mutant F1 mice using long-term wheel running analysis (WRA) and an IntelliCage System (ICS) (Fig. 2). For the WRA, changes in running activity on a wheel were measured in light and dark phases to reflect mood changes and behavior.

Figure 2. Attention and delayed discounting tests in wild type and mutant EHD1 knock-in mice. A. By poking the gated barriers, mice were allowed to drink water, and attention was measured by their ability to poke the correct gate with accessible water. Light was turned on at different time intervals to facilitate recognition of the location of accessible water. B. The delay before access to saccharin water was increased by 0.1 s every day. Delayed discounting was measured by the ability of mice to wait patiently for the “better” reward, which was saccharin water, over choosing “normal” water, which was readily available. Figure adapted from2.

In the ICS, mice could only drink water if they poked a gate lit by an LED, as a test of attention2. Secondly, access to saccharin water was delayed relative to access to “normal” water, with the delay increasing incrementally, to assess delayed discounting2. Surprisingly, neither Macf1 nor Ehd1 knock-in mice showed significant manic or depressive-like behaviors during the wheel running experiment; however, hyperactivity was detected in Ehd1 mutant mice during the light phase. Significant behavioral changes were found during the IntelliCage tests for Macf1 mutant mice, which had lower attention rates compared to wild type counterparts (Fig. 2)2. This provided functional evidence linking the mutation found in BD to attention deficits. Furthermore, Macf1-mutant mice behaved unexpectedly in the delayed discounting test2. Compared to wild type mice, Macf1-mutant mice waited longer for the reward of saccharin water instead of choosing normal water, which was more readily available (Fig. 2). The researchers explained that a preference for quick gratification is usually expected, but suggested that this diminished delay discounting reflected manic behavior in which the reward-gratification system was impaired in Macf1-mutant mice2. Such instances have been reported in human cases of mania10. With such an unconventional finding, further exploration of the identified de novo mutations is necessary, considering the polygenic nature of BD.
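The idea behind the delayed discounting test can be illustrated with the hyperbolic discounting model often used in behavioral research, in which the subjective value of a reward falls with delay as V = A / (1 + kD). The model, the reward magnitudes and the discount rates k below are illustrative assumptions and are not taken from Nakamura et al.; the sketch only shows why a smaller k corresponds to an animal that keeps waiting for the saccharin reward.

```python
def discounted_value(amount: float, delay_s: float, k: float) -> float:
    """Hyperbolic discounting: subjective value of a reward after a delay."""
    return amount / (1.0 + k * delay_s)


def prefers_delayed(delayed_amount: float, delay_s: float,
                    immediate_amount: float, k: float) -> bool:
    """True if the delayed reward is still worth more than the immediate one."""
    return discounted_value(delayed_amount, delay_s, k) > immediate_amount


# Hypothetical reward magnitudes (arbitrary units), not measured values.
SACCHARIN = 2.0   # "better" reward, available only after a delay
WATER = 1.0       # "normal" reward, available immediately

# A steep discounter (large k) gives up on saccharin after a 5 s delay,
# whereas a shallow discounter (small k) still waits for it.
steep_waits = prefers_delayed(SACCHARIN, 5.0, WATER, k=0.5)
shallow_waits = prefers_delayed(SACCHARIN, 5.0, WATER, k=0.1)
print(steep_waits, shallow_waits)
```

Under this toy model, diminished delay discounting of the kind described for Macf1-mutant mice would correspond to a lower k than in wild type animals.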

Nakamura et al. provide the first functional genetic evidence for the effect of two de novo mutations on behaviors such as hyperactivity, lower attention rate and delayed discounting seen in BD. Future work will be geared towards functional studies of the other ultra-rare mutations and loci associated with BD. Genome editing strategies could be explored as therapeutic options, in addition to current “management strategies” for treating BD. Ultimately, the stigma around BD may decrease as our knowledge of the disorder increases.


1.         Kato, T. Current understanding of bipolar disorder: Toward integration of biological basis and treatment strategies. Psychiatry Clin. Neurosci. 73, 526–540 (2019).

2.         Nakamura, T. et al. Functional and behavioral effects of de novo mutations in calcium-related genes in patients with bipolar disorder. Hum. Mol. Genet. 30, 1851–1862 (2021).

3.         Leclerc, J. et al. Prevalence of depressive, bipolar and adjustment disorders, in Quebec, Canada. J. Affect. Disord. 263, 54–59 (2020).

4.         Kataoka, M. et al. Exome sequencing for bipolar disorder points to roles of de novo loss-of-function and protein-altering mutations. Mol. Psychiatry 21, 885–893 (2016).

5.         Hu, L. et al. Isoforms, structures, and functions of versatile spectraplakin MACF1. BMB Rep. 49, 37–44 (2016).

6.         Naoki, H., Uegaki, K. & Ishii, S. Self-organization mechanism of distinct microtubule orientations in axons and dendrites. (2017) doi:10.1101/163014.

7.         Goryunov, D., He, C.-Z., Lin, C.-S., Leung, C. L. & Liem, R. K. H. Nervous-tissue-specific elimination of microtubule-actin crosslinking factor 1a results in multiple developmental defects in the mouse brain. Mol. Cell. Neurosci. 44, 1–14 (2010).

8.         Naslavsky, N. & Caplan, S. EHD proteins: key conductors of endocytic transport. Trends Cell Biol. 21, 122–131 (2011).

9.         Wu, C. et al. The importance of EHD1 in neurite outgrowth contributing to the functional recovery after spinal cord injury. Int. J. Dev. Neurosci. 52, 24–32 (2016).

10.       Abler, B., Greenhouse, I., Ongur, D., Walter, H. & Heckers, S. Abnormal Reward System Activation in Mania. Neuropsychopharmacology 33, 2217–2227 (2008).

Sequencing reaches new peaks

George Guirguis

Using Gaussian mixture model clustering and logistic regression, a novel method is now able to determine RNA post-transcriptional modifications with greater accuracy.

RNA, of which there are many types, serves many functions in the cell. For example, mRNA is a temporary ‘disposable’ copy of DNA that the cell translates into a protein. To control characteristics of RNA, such as how long it can exist before being destroyed, the cell makes post-transcriptional modifications (PTMs) to the RNA1. While more than 150 different modifications can be made to RNA, the most common is methylation of the adenosine base, known as N6-methyladenosine (m6A)2. In recent years, altered PTM of RNA has been shown to be associated with a wide range of disorders and can serve as a biomarker1,3, emphasizing the need to be able to detect RNA PTMs.

Traditional lab methods for determining PTMs, such as liquid chromatography-tandem mass spectrometry (LC-MS/MS), the current gold standard, are laborious and prone to biases due to their complexity2,4. Instead, researchers use next-generation sequencing methods, such as Nanopore, to sequence long strands of DNA or RNA. This method involves a protein pore through which an RNA strand is pushed (Fig. 1). As the strand moves through the pore, there is a change in electrical current that is detected5. Importantly, the magnitude of the current change depends on the size of the bases, which can be further affected by the presence of modifications5. In addition to the altered current, the dwell time of the sequence (how long the nucleotides take to move across the nanopore) depends on the identity of the sequence and the presence or absence of PTMs5. The interpretation of this information requires specialized methods, which currently fall into one of two categories: de novo detection and comparative detection2. De novo methods make use of trained models to determine the nucleotide base2, whereas comparative methods compare the current at the same position in two samples2. Overall, the available methods are either inaccurate in detecting RNA PTMs or highly complex and resource-intensive to compute2.
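In its simplest form, the comparative strategy amounts to asking, position by position, whether the current measured in one sample differs from that in a reference sample. The sketch below illustrates this with a plain Welch's t statistic; the function names, the threshold and the current values are hypothetical, and this is not the statistical model used by any of the published tools.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances.

    Assumes each sample has at least two reads and non-zero variance.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def flag_positions(sample, reference, threshold=4.0):
    """Return positions whose mean current differs strongly between samples."""
    flagged = []
    for pos in sorted(sample.keys() & reference.keys()):
        if abs(welch_t(sample[pos], reference[pos])) >= threshold:
            flagged.append(pos)
    return flagged


# Hypothetical per-read current (pA) at three transcript positions.
modified = {
    10: [80.1, 79.8, 80.3, 80.0],
    11: [95.2, 94.8, 95.5, 95.0],   # shifted current: candidate modification
    12: [70.2, 69.9, 70.1, 70.0],
}
unmodified = {
    10: [80.0, 80.2, 79.9, 80.1],
    11: [90.1, 89.8, 90.3, 90.0],
    12: [70.0, 70.1, 69.8, 70.2],
}

flagged = flag_positions(modified, unmodified)
print(flagged)
```

A real analysis would also need to use dwell time, handle multiple testing across thousands of positions, and cope with samples in which only a fraction of reads carry the modification.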

Figure 1 Nanopore sequencing. As a strand of RNA (or DNA) passes through the nanopore, there is a change of ion flow that changes the detected current5. Figure created using

To offer a more accurate and feasible method, Leger et al. developed a new analysis method for detecting RNA PTMs, called Nanocompore2. This method uses a comparative strategy in which a modified RNA sequence is compared to the same sequence with fewer (or no) modifications2. Nanocompore uses Gaussian mixture model clustering and logistic regression to determine the probability of a modification at a given position from Nanopore sequencing data of two identical sequences with different levels of modification. The group first validated Nanocompore through in silico and in vitro comparisons. Nanocompore identified m6A modifications with an average accuracy of over 94%, and other modifications with over 89%2. With this accuracy established, the next experiment compared Nanocompore to six other methods. The highest sensitivity was achieved by the Eligos2 method at 45.8%, while Nanocompore came second with only 16%, a very large difference. However, Nanocompore had the highest specificity at 99.7%. To determine overall accuracy, the F1 score was calculated; this score is the harmonic mean of precision and sensitivity (recall)2. Despite its low sensitivity, Nanocompore had the highest F1 score, indicating that it has the highest overall accuracy of current methods.
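To make the relationship between these metrics concrete, the sketch below computes sensitivity, specificity, precision and F1 from a confusion matrix. The F1 score is conventionally defined as the harmonic mean of precision and sensitivity (recall). The counts are hypothetical, chosen only so that sensitivity is 16% and specificity is 99.7%; they are not data from Leger et al.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: modified sites that are found
    specificity = tn / (tn + fp)          # unmodified sites correctly left alone
    precision = tp / (tp + fp)            # called sites that are truly modified
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 100 truly modified sites and 1,000 unmodified sites.
metrics = classification_metrics(tp=16, fp=3, tn=997, fn=84)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 rewards precision, a method that makes few calls but is almost always right when it does can score well even with low sensitivity.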

Nanopore sequencing provides multiple benefits over other PTM detection methods, primarily because there is no need to reverse transcribe the RNA sample and amplify it by PCR, steps that are prone to errors2. Additionally, Nanocompore does not use a trained model to determine the PTMs but uses a comparative model, which removes the need to train a model on more than 150 possible modifications.

However, this method is not without limitations. First, comparing two identical sequences with different levels of modification may be difficult if an appropriate comparative sequence cannot be acquired. The authors recommend that a comparative sequence be generated in vitro using cells with knocked-out PTM genes. While this may be relatively feasible, it requires that the researcher know which gene to knock out. Second, despite its high overall accuracy, the sensitivity of the method is relatively low and needs to be improved.

The ability to detect RNA modifications accurately and cost-effectively could allow new diagnostic techniques to be employed. A recent study found that pulmonary hypertension patient groups can be identified using signature RNA modifications3. This is a crucial discovery because the main treatment, pulmonary arterial hypertension-specific therapy, is ineffective for one of the groups6. The researchers in this study used LC-MS/MS to determine the RNA modifications. LC-MS/MS has a critical limitation that would hinder its routine clinical use: it is not suitable for de novo detection of RNA modifications because it requires extensive sequence ladder preparations7. To ease the clinical implementation of such testing, an easier and cheaper method that is sufficiently accurate is needed. Nanocompore could usher in clinical diagnostic testing of RNA modifications due to its ease of use and lower cost. Overall, Nanocompore is a powerful method that detects RNA PTMs with greater accuracy than other methods, but it should be developed further to enhance its sensitivity.


1.        Jonkhout, N. et al. The RNA modification landscape in human disease. Rna 23, 1754–1769 (2017).

2.        Leger, A. et al. RNA modifications detection by comparative Nanopore direct RNA sequencing. Nat. Commun. 12, 1–17 (2021).

3.        Zhang, L. et al. RNA Modification Signature of Peripheral Blood as a Potential Diagnostic Marker for Pulmonary Hypertension. Hypertension 79, 1–3 (2022).

4.        Jora, M., Lobue, P. A., Ross, R. L., Williams, B. & Addepalli, B. Detection of ribonucleoside modifications by liquid chromatography coupled with mass spectrometry. Biochim Biophys Acta Gene Regul Mech 1862, 280–290 (2019).

5.        Varongchayakul, N., Song, J., Meller, A. & Grinstaff, M. W. Single-molecule protein sensing in a nanopore: a tutorial. Chem Soc Rev 47, 8512–8524 (2018).

6.        Prins, K. W., Duval, S., Markowitz, J., Pritzker, M. & Thenappan, T. Chronic use of PAH-Specific therapy in world health organization group III pulmonary hypertension: A systematic review and meta-analysis. Pulm. Circ. 7, 145–155 (2017).

7.        Zhang, N. et al. A general LC-MS-based RNA sequencing method for direct analysis of multiple-base modifications in RNA mixtures. Nucleic Acids Res. 47, E125–E125 (2019).

A machine-learning framework to detect enhancer-hijacking events

Sornnujah Kathirgamanathan

The computational identification of enhancer-hijacking events using chromatin interaction data from cancer genomes – which are subject to extensive rearrangement – can provide insight into oncogenic elements that can be targeted as a potential therapeutic.

            Widespread chromosomal rearrangements are a hallmark of cancer genomes, yet they remain largely uncharacterized1. To resolve the structural variation (SV) across diverse cancer types, in silico tools have been developed to detect genomic alterations that promote cancer progression1. Writing in Nature Methods, Wang et al.2 present the computational framework NeoLoopFinder, which predicts enhancer-hijacking events using chromosome conformation capture (Hi-C) data from cancer cell lines. This tool can identify cancer-driving oncogenes under the control of non-native enhancers, suggesting a potential role in the development of cancer therapies.

            The term ‘enhancer-hijacking’ refers to novel gene-enhancer interactions that are induced by DNA aberrations, such as indels, inversions and translocations (Fig. 1)2. In normal cells, gene expression depends on the formation of chromatin loops to bridge the gap between genes and distant regulatory elements. However, the formation of novel chromatin loops brought about by genomic rearrangements, termed “neoloops” by Wang et al., can give rise to new interactions between genes and regulatory elements (enhancers in the case of enhancer-hijacking) that would not otherwise occur. An example of this can be found in lineage-ambiguous leukemia, where the overexpression of the oncogene BCL11B results from the juxtaposition of this gene near a super-enhancer, promoting cell proliferation3.

Figure 1. The activation of gene expression with enhancer hijacking. A) Expression of gene on a wild-type blue chromosome without local enhancers. B) Expression of gene now translocated onto red chromosome. The presence of local enhancers brought into proximity by chromatin loops upregulates expression.

            NeoLoopFinder makes use of Hi-C data, which reports the proximity of genomic loci to each other in 3D space, to identify physical chromatin interactions. After correcting the input data for copy number variations (CNVs) that may distort Hi-C signals, NeoLoopFinder integrates the coordinates of SV breakpoints into the data. This allows for the reconstruction of chromosomal maps with resolved SVs, giving a more accurate depiction of chromatin interactions in the cell. Next, NeoLoopFinder uses the machine-learning framework Peakachu4 to identify chromatin loops from the corrected Hi-C matrix. The model is pre-trained on Hi-C data obtained under diverse protocol conditions (e.g. varying resolution size, in situ vs dilution methods), making it generalizable across Hi-C protocols. The program outputs all predicted chromatin loops, both existing and novel, which can then be integrated with H3K27ac ChIP-seq, DNase-seq and RNA-seq data from the ENCODE consortium to identify enhancer-hijacking events.
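The final integration step can be pictured as a simple interval-overlap problem: keep only loops that cross an SV breakpoint, then ask whether one anchor touches a gene promoter while the other touches an enhancer mark. The sketch below is a toy illustration under that assumption; the data structures, the `spans_breakpoint` flag, the gene names and the coordinates are all hypothetical and do not reflect NeoLoopFinder's actual code or output format.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Loop:
    anchor_a: Region
    anchor_b: Region
    spans_breakpoint: bool  # hypothetical flag: loop crosses an SV breakpoint


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def hijacking_candidates(loops, enhancers, promoters):
    """Return (gene, enhancer) pairs joined by a breakpoint-spanning loop."""
    hits = []
    for loop in loops:
        if not loop.spans_breakpoint:
            continue  # loop also exists without the rearrangement
        # Try both orientations: either anchor may be the promoter side.
        for prom_side, enh_side in ((loop.anchor_a, loop.anchor_b),
                                    (loop.anchor_b, loop.anchor_a)):
            for gene, promoter in promoters.items():
                if not overlaps(prom_side, promoter):
                    continue
                for enhancer in enhancers:
                    if overlaps(enh_side, enhancer):
                        hits.append((gene, enhancer))
    return hits


# Hypothetical inputs: promoters by gene, enhancer peaks (e.g. from H3K27ac).
promoters = {"GENE_A": Region("chr7", 1000, 2000),
             "GENE_B": Region("chr7", 50000, 51000)}
enhancers = [Region("chr14", 7000, 8000), Region("chr7", 90000, 91000)]
loops = [
    Loop(Region("chr7", 500, 1500), Region("chr14", 7500, 8500), True),
    Loop(Region("chr7", 50500, 51500), Region("chr7", 90500, 91500), False),
]

candidates = hijacking_candidates(loops, enhancers, promoters)
print(candidates)
```

In practice, candidates of this kind would still need supporting expression data, such as RNA-seq, before being treated as enhancer-hijacking events.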

            The detection of enhancer-hijacking can not only identify known cancer-driving genes, but also reveal previously unknown oncogenes. Wang et al. report the discovery of novel tumour-promoting genes that tend to cluster by cancer type. For example, neoloops containing EYA1 were found to be associated with gastric cancer, and overexpression of this gene was associated with decreased overall survival in gastric cancer. This finding could translate into the identification of potential drug targets for cancer patients for whom a known cancer-driving gene cannot be identified.

            To test the validity of NeoLoopFinder, Wang et al. disrupted predicted enhancer-hijacking events in cancer cells using the CRISPR-Cas9 editing system. They focused on the known prostate oncogene ETV1, whose overexpression has been shown to drive cancer and lead to poor patient outcomes. In the prostate cancer cell line LNCaP, the authors observed an approximately 16-fold increase in ETV1 expression compared to normal prostate cells. Notably, the authors found no evidence of ETV1 gene fusions or duplication, suggesting some other mechanism of ETV1 overexpression. NeoLoopFinder identified a new chromatin loop bringing ETV1 into proximity with an enhancer as a result of a translocation between chromosomes 7 and 14. Deletion of the predicted enhancer led to an average decrease in expression of between 66.3% and 80.5%, demonstrating NeoLoopFinder’s ability to accurately identify enhancer-hijacking events.

            Though chromosomal rearrangements are most often associated with cancer, they are also known to induce developmental disorders. To assess NeoLoopFinder’s utility in these conditions, the authors collected Hi-C data from previous studies involving SVs that cause limb malformations. NeoLoopFinder was able to generate an SV-resolved Hi-C map that closely resembled that presented in the original study. The authors also note that NeoLoopFinder detected potential enhancer-hijacking events between promoters and genes known to cause limb malformations, demonstrating the tool’s efficacy outside cancer genomes.

            NeoLoopFinder differs from other enhancer-hijacking prediction tools, such as CESAM5 and PANGEA6, by not requiring a large sample of SV profiles to generate its prediction model. Rather, it uses machine learning to identify neoloops, which are then used for enhancer-hijacking prediction, overcoming the need for large sample sizes. Moreover, NeoLoopFinder comes with a visualization module for integration with other ‘omics-level data, such as epigenetic modifications and contact heatmaps, facilitating the interpretation of NeoLoopFinder output with additional biological context. However, one limitation of NeoLoopFinder is that its accuracy varies with the quality of the input SV coordinates. In addition, the machine-learning model is trained on Hi-C data and may not be generalizable to SV data obtained through more modern techniques, such as optical mapping7. Future iterations of this tool could broaden the methods used to obtain SV training data, expanding the tool’s utility.

            Taken together, Wang et al. have developed a tool that will advance the study of genomes altered by large-scale SVs. It should be noted that the presented framework is not restricted to enhancer-hijacking events: NeoLoopFinder’s ability to resolve complex SVs allows it to detect an array of other chromatin interactions, such as repressor-hijacking. Unveiling these interactions and understanding their impact on the cell can direct future work on therapeutic strategies.


1.        Elyanow, R., Wu, H. T. & Raphael, B. J. Identifying structural variants using linked-read sequencing data. Bioinformatics 34, 353–360 (2018).

2.        Wang, X. et al. Genome-wide detection of enhancer-hijacking events from chromatin interaction data in rearranged genomes. Nat. Methods 18, 661–668 (2021).

3.        Montefiori, L. E. et al.  Enhancer Hijacking Drives Oncogenic BCL11B Expression in Lineage-Ambiguous Stem Cell Leukemia . Cancer Discov. 11, 2846–2867 (2021).

4.        Salameh, T. J. et al. A supervised learning framework for chromatin loop detection in genome-wide contact maps. Nat. Commun. 11, 1–12 (2020).

5.        Weischenfeldt, J. et al. Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nat. Genet. 49, 65–74 (2017).

6.        He, B. et al. Diverse noncoding mutations contribute to deregulation of cis-regulatory landscape in pediatric cancers. Sci. Adv. 6, (2020).

7.        Ho, S. S., Urban, A. E. & Mills, R. E. Structural variation in the sequencing era. Nat. Rev. Genet. 21, 171–189 (2020).

Novel genetic associations between irritable bowel syndrome, anxiety, and mood disorders may offer therapeutic potential for patients

Yasmeen Kurdi

A robust genome-wide analysis of 53,400 individuals identifies significant genetic overlap between irritable bowel syndrome markers, anxiety, and mood disorders, pointing to a shared etiology.

Irritable bowel syndrome (IBS) is a chronic functional disorder of the gastrointestinal system that is estimated to affect about 11% of the global population1. Individuals with IBS have significantly higher levels of anxiety and depression than the general population2, a correlation whose direction of causality remains a “chicken or the egg” question. The bi-directional communication between the central nervous system (CNS) and the gut, often referred to as the brain-gut axis (BGA), has gained attention in recent years in the context of IBS. Observational studies of gastrointestinal disease have suggested that psychological disorders may be associated with relapse of disease symptoms and, conversely, that inflammatory activity is associated with psychological disorders3. A newly published study in Nature Genetics by Eijsbouts et al.4 sheds light on this relationship by identifying susceptibility loci for IBS that share a genetic etiology with mood and anxiety disorders.

The authors designed a digestive health questionnaire (DHQ) for UK Biobank participants that included validated tools for diagnosing IBS and associated conditions, such as the widely used Rome III symptom criteria4. From the DHQ, 40,548 participants met the diagnostic criteria for IBS. Analysis of associated conditions in this cohort revealed that anxiety and depression were about twice as common in IBS cases as in control participants (34.3% compared with 16.1%). The study further noted that this trend was even more prominent when only medically diagnosed cases of IBS were considered, as opposed to the broader DHQ definition, which included self-reported IBS (Fig. 1).
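The "about twice as common" comparison can be checked with simple arithmetic. The sketch below uses only the two percentages quoted above; the function names are illustrative, and the odds ratio is an additional summary that is not reported in this article.

```python
# Prevalence of anxiety/depression quoted in the text, in percent.
PREVALENCE_IBS = 34.3       # participants meeting IBS criteria
PREVALENCE_CONTROL = 16.1   # control participants


def prevalence_ratio(p_case: float, p_control: float) -> float:
    """Ratio of two prevalences given in percent."""
    return p_case / p_control


def odds_ratio(p_case: float, p_control: float) -> float:
    """Odds ratio computed from two prevalences given in percent."""
    case = p_case / 100.0
    control = p_control / 100.0
    return (case / (1.0 - case)) / (control / (1.0 - control))


if __name__ == "__main__":
    ratio = prevalence_ratio(PREVALENCE_IBS, PREVALENCE_CONTROL)
    odds = odds_ratio(PREVALENCE_IBS, PREVALENCE_CONTROL)
    print(f"prevalence ratio: {ratio:.2f}")
    print(f"odds ratio:       {odds:.2f}")
```

The prevalence ratio is slightly above 2, consistent with the "about twice as common" wording; the odds ratio is larger because the outcome is common in both groups.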

Figure 1 | Genetic correlations (rg) between IBS and different definitions of anxiety or major depressive disorder among UK Biobank participants who met IBS diagnostic criteria in the digestive health questionnaire (DHQ). IBS was strongly correlated with anxiety regardless of how anxiety cases were defined from elements of the DHQ. Control participants did not have anxiety by any of the above definitions. Figure obtained from Eijsbouts et al.4

Eijsbouts et al. conducted a genome-wide association study (GWAS) of 53,400 identified cases of IBS and 433,201 controls, identifying six independent susceptibility loci implicated in IBS at genome-wide significance4 (Fig. 2). Furthermore, in a 23andMe cohort of participants who had received an IBS diagnosis, all six loci identified through GWAS were replicated at Bonferroni significance, supporting the robustness of these associations. Strikingly, four of the six loci (NCAM1, CADM2, PHF2/FAM120A and DOCK9) also had prior associations with mood and anxiety disorders4. These four loci are associated with mood and anxiety disorders, expressed in the nervous system, or both. To investigate this relationship further, the researchers used expression colocalization analysis, a method that tests whether two independent associations at a locus are consistent with a shared causal variant, and found evidence that the six IBS-associated loci regulate gene expression across different tissues, with many of the genes particularly expressed in the brain4. As a further precaution, the researchers excluded data from individuals with overlapping phenotypes and found that the genetic correlation between IBS and anxiety persisted, suggesting that the co-occurrence of the disorders reflects a shared etiological pathway rather than one disorder causing the other.

Figure 2 | Manhattan plot showing the distribution of SNPs associated with IBS across the genome. The red line represents the threshold at which an association is considered significant genome-wide (p = 5×10^-8). Six loci were associated with IBS at genome-wide significance: CADM2, MHC/BAG6, PHF2/FAM120A, NCAM1, CKAP2/TPTEP3 and DOCK9. Figure obtained from Eijsbouts et al.4
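The genome-wide threshold in Figure 2 is conventionally motivated as a Bonferroni correction of a 0.05 significance level for roughly one million independent common-variant tests. The sketch below reproduces that arithmetic and the corresponding height of the line on a Manhattan plot, which uses a -log10(p) axis. The replication threshold shown is an assumption for illustration: the text says only that the six loci replicated at "Bonferroni significance", so dividing by six tests is a guess rather than the authors' stated procedure.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def neg_log10(p_value: float) -> float:
    """Height of a p-value on the y-axis of a Manhattan plot."""
    return -math.log10(p_value)


# Conventional rationale: about one million independent tests genome-wide.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)

# Assumption for illustration only: six lead variants tested in replication.
REPLICATION = bonferroni_threshold(0.05, 6)

if __name__ == "__main__":
    print(f"genome-wide threshold: {GENOME_WIDE:.1e}")
    print(f"line height on plot:   {neg_log10(GENOME_WIDE):.2f}")
    print(f"replication threshold: {REPLICATION:.4f}")
```

Any SNP plotted above the red line therefore has a p-value below the genome-wide threshold.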

Serotonin (5-HT) is an important neurotransmitter and signalling molecule crucial to the function of the gastrointestinal tract5. Altered 5-HT signalling in the intestine and extra-intestinal areas can affect motor and secretory functions, resulting in common IBS symptoms such as diarrhea and constipation5. Serotonin is also well established as a regulator of other physiological processes, such as sleep and mood, and is therefore a common target in the treatment of psychological disorders. Neural cell adhesion molecules (NCAMs), encoded by the NCAM1 gene, are an important component of serotonergic pathways, and NCAM1 was identified by Eijsbouts et al.4 as associated with IBS at genome-wide significance. A study by Aonurm-Helm et al.6 examined changes in serotonergic pathways in NCAM knock-out mice. Interestingly, the knock-out mice exhibited down-regulation of serotonin and its major metabolites in the brain6. The researchers also reported that chronic administration of the antidepressant amitriptyline, which increases serotonin in the brain, partially but not completely restored serotonin levels in the knock-out mice6. The role of NCAMs in the serotonergic system is just one example of a potentially shared pathophysiology between IBS, anxiety, and mood disorders.

Eijsbouts et al.4 provide ground-breaking new insights from their polygenic analyses of genetic markers implicated in irritable bowel syndrome. The results of this robust and large-scale GWAS implicate novel target genes and pathways, such as those involving neuronal adhesion molecules, for IBS and potentially for mood and anxiety disorders. Impaired serotonin signalling, for example, is common to both IBS and many psychological disorders. The study by Aonurm-Helm et al.6 demonstrated that administration of an antidepressant could partially restore the serotonin levels that were down-regulated by NCAM knock-out in mice. Further research is needed to discern the exact role of the NCAM1 gene in IBS, but it would be an interesting path to explore given its role in the shared serotonergic pathway. Targeting loci implicated in both disorders may be an effective therapeutic option for individuals who have both diagnoses. These newfound genetic correlations between IBS and psychological disorders such as anxiety may offer a rationale for further exploring the potential of psychoactive medications and behavioural therapies in the treatment of patients with these conditions.


  1. Canavan, C., West, J. & Card, T. The epidemiology of irritable bowel syndrome. Clin. Epidemiol. 6, 71–80 (2014).
  2. Fond, G. et al. Anxiety and depression comorbidities in irritable bowel syndrome (IBS): a systematic review and meta-analysis. Eur. Arch. Psychiatry Clin. Neurosci. 264, 651–660 (2014).
  3. Gracie, D. J., Hamlin, P. J. & Ford, A. The influence of the brain-gut axis in inflammatory bowel disease and possible implications for treatment. Lancet Gastroenterol. Hepatol. 4, 632–642 (2019).
  4. Eijsbouts, C. et al. Genome-wide analysis of 53,400 people with irritable bowel syndrome highlights shared genetic pathways with mood and anxiety disorders. Nat. Genet. 53, 1543–1552 (2021).
  5. Crowell, M. D. Role of serotonin in the pathophysiology of the irritable bowel syndrome. Br. J. Pharmacol. 141, 1285–1293 (2004).
  6. Aonurm-Helm, A. et al. NCAM-deficient mice show prominent abnormalities in serotonergic and BDNF systems in brain – Restoration by chronic amitriptyline. Eur. Neuropsychopharmacol. 25, 2394–2403 (2015).
  7. Coates, M. D. et al. Molecular defects in mucosal serotonin content and decreased serotonin reuptake transporter in ulcerative colitis and irritable bowel syndrome. Gastroenterology 126, 1657–1664 (2004).
  8. Graeff, F. G. et al. Role of 5-HT in stress, anxiety, and depression. Pharmacol. Biochem. Behav. 54, 129–141 (1996).

Ultra-rare variants provide a new perspective on autism heritability

Radhika Mahajan

Studies of de novo mutations have driven the majority of autism gene discoveries. A recent study identifies heritable ultra-rare variants as a source of large-effect risk variation and implicates new autism candidate genes.

Autism spectrum disorder (ASD) is a complex, heritable neurodevelopmental disorder that is both genetically and phenotypically heterogeneous. Diagnoses have increased in prevalence over the past decade, with ASD affecting almost 1 in 160 children worldwide and males at higher risk than females.1 ASD patients often struggle with social communication and interaction skills, although symptoms vary widely among affected individuals.1 The genetic etiology of ASD remains unclear, and most sequencing studies have focused on the impact of de novo mutations (DNMs), which include single-nucleotide variants (SNVs) and copy number variants (CNVs).2 As DNMs account for only about 20% of ASD cases in simplex families (only one individual affected)3, they do not dispel the mystery of the ‘missing heritability’ in autism. Therefore, new research focuses on other genetic variations that may contribute to autism risk.

In a recent study,4 Wilfert and colleagues reported ultra-rare and private, likely gene-disruptive (LGD) variants inherited in both simplex and multiplex (more than one affected individual) families. In this study, LGD variants include stop-gain, stop-loss, frameshift indels or splice-altering SNVs. These private variants are passed on from non-autistic parents to their affected child and are unique to a family, thus implicating new candidate genes in autism. To identify variants, the authors merged two datasets: one from whole-genome sequencing (WGS) data of 645 simplex and multiplex autistic families, and the other from 3,474 families from the Centers for Common Disease Genomics (CCDG). To improve the ability to detect enrichment of LGDs in ASD children, their analysis was restricted to protein-coding autosomal regions. The findings showed that the probands were ~1.3 times more likely than their unaffected siblings to inherit the LGDs and other missense mutations. This burden increase was explained by transmission bias5 (deleterious variants are more often transmitted from mothers to affected children than from fathers), which was established through a rare variant transmission disequilibrium test. Although the effect size was 8 times smaller than that previously reported for DNMs3, at least 4.5% of autism risk could be attributed to the private LGD variants. According to prior studies on LGD variants6, the risk level was comparable to the level reported for DNMs. Hence, the result suggests that rare inherited variation is an important, albeit understudied, factor in ASD.
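The idea behind a transmission disequilibrium test can be illustrated with the classic single-variant form, which compares how often heterozygous parents transmit versus do not transmit an allele to an affected child. This is a minimal sketch of that textbook statistic, not the rare-variant version used by Wilfert and colleagues, and the counts in the usage example are hypothetical rather than taken from the study.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic TDT: chi-square statistic (1 degree of freedom) and p-value.

    Under the null hypothesis of no transmission bias, a heterozygous
    parent transmits the variant half of the time.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("at least one informative transmission is required")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    chi2, p_value = tdt(transmitted=130, untransmitted=100)
    print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

An excess of transmissions to affected children yields a large statistic and a small p-value, whereas equal counts give a statistic of zero.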

To discover new genes, the researchers excluded DNM-enriched genes implicated in ASD and other neurodevelopmental disorders. The results indicated that the probands retained almost 95% of the variant burden in the discovery cohort. A total of 163 genes contained these variants, none of which overlapped with the DNM-enriched genes. The most striking discovery from this study was that the novel candidate gene set forms a protein-protein interaction (PPI) network (Figure 1). Almost half of these genes were associated with various functional pathways known to be implicated in autism. The network included a few genes in the E3 ubiquitin ligase pathway, which is important in the regulation of protein modification.7 Other genes formed a network within the intercellular transport and Erb signalling pathways, which are involved in neurotransmission.7 The remainder of the gene set is expressed in specific neuronal types during brain development. Overall, the findings suggest that privately transmitted LGD variants and DNMs affect different genes but, interestingly, converge on the same pathways.

Figure 1: Protein-protein interaction (PPI) network formed by the novel candidate gene set.4 The figure depicts the interaction network formed by the 163 identified genes, most of which converged on functional pathways previously implicated in autism. These novel genes contained private LGD variants and were absent from previously studied autism genes with de novo mutations. A gene name is coloured if it is detected in two (blue) or three (red) affected individuals. Asterisks indicate genes found in two (*) or more (**) independent families.

Lastly, because individuals with similar ancestry are more likely to share alleles8, the study also examined the prevalence of private LGD variants in different ancestral groups. Not surprisingly, children of European ancestry had less private variation per genome, given the high proportion of Europeans in the discovery cohort (~86%). Research of this kind also highlights the need to expand diversity in study populations, as rare variants are harder to detect in understudied groups such as people of African, East Asian and South Asian descent.

This paper examines unique genetic variants that had not yet been studied in the context of autism and reveals new candidate genes. The results are in line with previous studies that examined inherited variation but limited their analysis to exome data. Moreover, as cohort composition changes, so do the type and degree to which each variant class contributes to ASD risk. For accurate family planning and genetic counselling, it will be essential to understand the diversity of genetic etiologies and phenotypic outcomes associated with autism. To improve sequencing efforts at a large scale, it will be necessary to recruit more multiplex families. Furthermore, the results challenge the assumption that rare variants and de novo variants affect gene expression in similar ways. While the de novo and private variant classes appear to converge on related functional pathways, these data suggest that they modulate distinct gene sets in autism pathogenesis. Finally, the authors of this study have shifted the understanding of autism heritability and opened new research paths for other complex disorders.


1.         De Rubeis, S. & Buxbaum, J. D. Genetics and genomics of autism spectrum disorder: embracing complexity. Hum. Mol. Genet. 24, R24–R31 (2015).

2.         Rylaarsdam, L. & Guemez-Gamboa, A. Genetic Causes and Modifiers of Autism Spectrum Disorder. Front. Cell. Neurosci. 13, 385 (2019).

3.         Turner, T. N. et al. Genomic Patterns of De Novo Mutation in Simplex Autism. Cell 171, 710-722.e12 (2017).

4.         Wilfert, A. B. et al. Recent ultra-rare inherited variants implicate new autism candidate risk genes. Nat. Genet. 53, 1125–1134 (2021).

5.         Iossifov, I. et al. Low load for disruptive mutations in autism genes and their biased transmission. Proc. Natl. Acad. Sci. U. S. A. 112, E5600-5607 (2015).

6.         Krumm, N. et al. Excess of rare, inherited truncating mutations in autism. Nat. Genet. 47, 582–588 (2015).

7.         Pinto, D. et al. Convergence of Genes and Cellular Pathways Dysregulated in Autism Spectrum Disorders. Am. J. Hum. Genet. 94, 677–694 (2014).

8.         Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. 108, 11983–11988 (2011).