Using Gut Microbiome for Improved Cancer Treatment

Lekshmi Mohan

Notably, this is the first investigational study to show a relationship between cancer patient prognosis and gut microbiome heterogeneity. The study highlights the intricate relationship between the gut microbiome and tumor immunotherapy, specifically in patients with melanoma, non-small cell lung cancer (NSCLC), and renal cell carcinoma (RCC).

In recent years, the scientific community has increasingly looked to the microbiome to improve the effectiveness of various therapies, cancer treatment among them. Response rates among patients receiving immune checkpoint inhibitor therapy vary widely, from 13% to 69%¹, which urges us to use existing medications in the way that provides maximum benefit. Antibodies such as pembrolizumab, ipilimumab, and nivolumab are used as initial therapies for several malignancies, including gastric cancer and melanoma, and have produced an exceptional increase in patient survival². Scientists are now examining the interplay between the microbiome and our genome to understand whether structural variants in microbes affect cancer therapy. Liu et al.² report a metagenome-wide association analysis between structural variants (SVs) in the gut microbiome and the host’s response to immune checkpoint inhibitors (ICIs). Metagenomics is the study of all the nucleotide sequences recovered from a sample, usually from microbes such as those found on human skin or in soil.

Immune checkpoints keep our immune response from becoming strong enough to damage tissue, but they may also stop our T cells from attacking cancer cells. When these checkpoints are blocked with ICIs, T cells can kill cancer cells more effectively. Programmed cell death 1 (PD-1), cytotoxic T lymphocyte associated antigen-4 (CTLA-4), and programmed cell death 1 ligand 1 (PD-L1) are the main protein targets of ICIs for suppressing tumour immune escape. A body of clinical and preclinical evidence shows that the gut microbiota influences antitumour immunity and the effectiveness of ICIs in melanoma, NSCLC³ and RCC⁴. Long-standing study of the gut microbiota has informed its use in managing different cancer types, leading to promising advances in ICI immunotherapy. Liu et al. emphasize the potential of altering the gut microbiota to augment the antitumour immune response and improve the effectiveness of ICIs².

The team studied the association of SVs and the relative abundance of gut microbes with the host’s response to ICIs. They gathered raw metagenomic data from studies spanning a range of cancer types, comprising 996 individuals of European ancestry (Figure 1). Response, immune-related adverse events (irAEs), progression-free survival at 12 months (PFS12, meaning no disease progression 12 months after ICI treatment⁵), and overall survival were used as clinical outcomes. The analysis yielded a number of statistically significant associations.

The metagenomic association study found 48 significant associations, including 31 bacterial species whose relative abundance correlated with the clinical outcome of ICI-treated patients. For NSCLC, B. wexlerae was among the species most significantly associated with a positive response to ICIs, along with the abundance and SVs of R. lactaris, D. invisus, and B. adolescentis. The SVs of these species were also associated with toxicity and therapeutic resistance in ICI-treated patients. For melanoma, D. formicigenerans was significantly associated with response and PFS12, along with B. wexlerae and R. gnavus, whose SV differences and abundance were associated with irAEs. These findings agree with previous research that found the same association of D. formicigenerans with response to ICIs, PFS12, and irAEs⁶. The study highlights the importance of metagenomic SVs, which provide strong data for studying the functionality of the gut microbiome (Figure 2).

Fig 2: Species-level heatmap of associations between the gut microbiome and the host’s response to ICIs in (a) melanoma, (b) NSCLC and RCC²

The SVs of several species were also related to the prognosis of melanoma, NSCLC, and RCC even where their relative abundance was not. For example, P. distasonis SVs were associated with irAEs in melanoma and with response to ICIs in NSCLC, while the abundance of the species was not related to prognosis. Overall, the results indicate that species-specific SVs are associated with response to ICIs independently of taxonomic abundance. Additionally, deletion of a genomic fragment of Subdoligranulum (SV: 2121 to 2122 kbp) containing the gene HMPREF1032_00306, which encodes a phage/plasmid primase, was linked with lower response and PFS12 rates².
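To make this kind of SV-outcome association concrete, the sketch below tests whether carrying a deletion SV is associated with ICI response using a two-sided Fisher's exact test on a 2x2 table. It is an illustration only: the counts are invented, the function name `fisher_exact_two_sided` is hypothetical, and the test is a generic stand-in rather than the statistical model used by Liu et al.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: SV region intact / SV region deleted.
    Columns: responders / non-responders.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x responders in row 1
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Invented counts: 8 of 10 patients with the intact region respond,
# but only 1 of 6 patients carrying the deletion does.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p_value:.4f}")  # about 0.035
```

In a real metagenome-wide analysis, a test like this would be repeated for every SV and corrected for multiple testing and for covariates, which this sketch does not attempt.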

Despite these promising results, several challenges should be taken into consideration in future research. Because gut microbiota samples were taken from different cohorts and a wide range of microbes was studied, the results may be overly general, and future work could aim for greater specificity. In addition, only cohorts of European ancestry were studied, which may bias the results. The most common gut microbes are the ones being investigated, but the less common ones would also need to be studied, and isolating and culturing these species remains difficult. Finally, larger cohorts would be needed to provide the statistical power to validate the associations found.

Despite the challenges and complexities of conducting such research, the use of SVs in the gut microbiota is a revolutionary step towards improving cancer treatment, and the research by Liu et al. paves the way for microbiome modulation in this field.

References

1. Chen, E. Y., Raghunathan, V. & Prasad, V. An Overview of Cancer Drugs Approved by the US Food and Drug Administration Based on the Surrogate End Point of Response Rate. JAMA Intern. Med. 179, 915–921 (2019).
2. Liu, R. et al. Gut microbial structural variation associates with immune checkpoint inhibitor response. Nat. Commun. 14, 7421 (2023).
3. Davar, D. et al. Fecal microbiota transplant overcomes resistance to anti-PD-1 therapy in melanoma patients. Science 371, 595–602 (2021).
4. Allen-Vercoe, E. & Coburn, B. A Microbiota-Derived Metabolite Augments Cancer Immunotherapy Responses in Mice. Cancer Cell 38, 452–453 (2020).
5. Mariathasan, S. et al. TGFβ attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature 554, 544–548 (2018).
6. Frankel, A. E. et al. Metagenomic Shotgun Sequencing and Unbiased Metabolomic Profiling Identify Specific Human Gut Microbiota and Metabolites Associated with Immune Checkpoint Therapy Efficacy in Melanoma Patients. Neoplasia 19, 848–855 (2017).
7. NCI Dictionary of Cancer Terms. National Cancer Institute. Available at: https://www.cancer.gov/publications/dictionaries/cancer-terms/def/immune-checkpoint-inhibitor. (Accessed: 16th February 2024)

Matters of the Brain: Unwinding the Genetic Overlap of Brain White Matter and Psychiatric Disorders

Aastha Patel

A new study describes the genetic relationship of atypical connection of brain pathways and psychiatric disorders such as bipolar disorder, major depressive disorder, and schizophrenia, broadening our understanding of the pathophysiology of these psychiatric disorders.

Our understanding of psychiatric disorders is fraught with challenges, including co-morbidity amongst disorders and a lack of evidence for their diagnostic criteria. Disorders such as bipolar disorder (BD), major depressive disorder (MD), and schizophrenia (SZ) can benefit greatly from investigation of their genetic basis. Structural brain abnormalities have previously been characterized in BD, MD, and SZ, with white matter pathways specifically affected¹. White matter pathways consist of myelinated axons that conduct nerve signals and provide connectivity throughout the brain and spinal cord. Parker and colleagues² use genome-wide association study (GWAS) data for the three psychiatric disorders and for white matter pathways to understand the genetic involvement of structural connectivity in disorder pathophysiology. Their findings point to a shared genetic basis for white matter connectivity and psychiatric disorders, broadening our understanding of the genetic architecture of BD, MD, and SZ.

Regions of the brain are highly interconnected through a complex network of white matter that is vital for optimal brain function. Deviations in this interconnectedness can arise during brain development, resulting in atypical cognitive, emotional, and behavioural functioning. Deviations in connectivity have also been linked to psychiatric disorders, an idea referred to as the dysconnectivity theories of psychiatric disorders³.

A prominent and well-established way to visualize abnormal connectivity is neuroimaging, for example magnetic resonance imaging (MRI). To visualize variations in white matter at the microstructural level, diffusion tensor imaging (DTI), an MRI technique, can be used. Fractional anisotropy (FA) is a DTI measure of how directional the displacement of water molecules is (Fig. 1). Water diffusion is highly organized in white matter tracts, which allows researchers to evaluate the integrity of connectivity in white matter pathways⁴. The dysconnectivity theories of psychiatric disorders are supported by FA studies in which reductions in FA, and therefore in white matter connectivity, have been characterized in BD, MD, and SZ¹.
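As a rough illustration of what FA measures, the sketch below computes it from the three eigenvalues of a diffusion tensor using the standard definition, FA = sqrt(3/2) × sqrt(Σ(λᵢ − λ̄)²) / sqrt(Σλᵢ²). The eigenvalues are not discussed in the article itself, and the function name and input values are illustrative assumptions rather than data from any of the cited studies.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """FA from the three diffusion tensor eigenvalues (standard definition).

    Returns 0 for perfectly isotropic diffusion and approaches 1 when
    diffusion happens along a single direction.
    """
    mean_l = (l1 + l2 + l3) / 3.0
    spread = (l1 - mean_l) ** 2 + (l2 - mean_l) ** 2 + (l3 - mean_l) ** 2
    magnitude = l1 ** 2 + l2 ** 2 + l3 ** 2
    if magnitude == 0:
        return 0.0  # no diffusion measured at all
    return math.sqrt(1.5 * spread / magnitude)


# Water diffusing equally in all directions (no organized structure)
fa_isotropic = fractional_anisotropy(1.0, 1.0, 1.0)

# Idealized case: diffusion restricted to one axis, as along a fibre bundle
fa_single_direction = fractional_anisotropy(1.0, 0.0, 0.0)

print(fa_isotropic, fa_single_direction)
```

Real white matter voxels fall between these two extremes, which is why a drop in FA is read as a loss of organized, directional structure.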

**Figure 1 | From axons to tracts** Beaulieu⁴ showcases anisotropic water diffusion at the micrometer scale (left), which is measured with DTI in voxels at resolutions of several millimeters (center) and is used to extract white matter tracts at the whole-brain level (right).

The genetic relationship between these psychiatric disorders and white matter FA measures can be studied to provide insights into disease pathophysiology. BD, MD, and SZ have heritable components ranging from 41-80%⁵. Likewise, FA averaged across all white matter tracts is heritable, with estimates as high as 88%⁶. Notably, the genetic correlation between psychiatric disorders and white matter FA has been measured as weak to moderate across the entire genome⁷. This may be because informative directional correlations within smaller genomic regions cancel out when averaged, leading to a non-significant genome-wide genetic correlation. The authors therefore use methods agnostic to effect direction, which measure what is referred to as non-correlative genetic overlap, to capture the shared signal that previous studies may have overlooked.
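The cancelling-out argument can be shown with a toy example. In the sketch below, two traits share every causal variant, but the effects point the same way in one region and opposite ways in another, so the local correlations are +1 and -1 while the genome-wide correlation is zero. All effect sizes and function names are invented for illustration; this is not the statistical method used by Parker et al.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def overlap_fraction(x, y):
    """Share of variants affecting trait x that also affect trait y,
    regardless of the direction of effect."""
    causal_x = [i for i, effect in enumerate(x) if effect != 0]
    shared = [i for i in causal_x if y[i] != 0]
    return len(shared) / len(causal_x)


# Invented per-variant effects: region A is variants 0-1, region B is 2-3
fa_effects = [1.0, 2.0, 1.0, 2.0]
disorder_effects = [1.0, 2.0, -1.0, -2.0]

r_region_a = pearson(fa_effects[:2], disorder_effects[:2])  # concordant
r_region_b = pearson(fa_effects[2:], disorder_effects[2:])  # discordant
r_genome_wide = pearson(fa_effects, disorder_effects)
shared = overlap_fraction(fa_effects, disorder_effects)

print(r_region_a, r_region_b, r_genome_wide, shared)
```

A direction-agnostic measure such as the overlap fraction reports complete sharing here even though the genome-wide correlation suggests no relationship at all.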

Parker et al. conducted various statistical analyses to capture genetic overlap in GWAS data sourced from the Psychiatric Genomics Consortium and UK Biobank. They first characterized the genetic architecture of each phenotype (BD, MD, SZ, average FA), and from their results they noted that disruption in white matter pathways is likely intermediary to the relationship between genetic variations and psychiatric disorders. By taking non-correlative genetic overlap into account, the authors established that 42.5%, 42%, and 90.7% of average-FA variants overlapped with the variants of BD, MD, and SZ, respectively (Fig. 2). To put these findings into context, the overlap between average-FA variants and blood glucose levels is only 7.8%. Glucose is the brain’s main source of energy and is therefore a good comparator in the context of brain connectivity.

**Figure 2 | Tract-level genetic overlap beyond genetic correlation** **a** Brain maps displaying the proportion of trait-influencing variants for each white matter tract that overlap with trait-influencing variants for each psychiatric disorder. **b** Brain maps displaying the number of shared genetic loci between each disorder and each white matter tract. Figure taken from Parker et al.².

The results suggest shared biological mechanisms between FA and the psychiatric disorders. Through gene mapping and enrichment analysis, the authors speculate that biological mechanisms related to the development and maintenance of white matter are affected during fetal brain development, with shared variants enriched in neural cell types such as astrocytes and microglia. SZ exhibits a slightly different neurodevelopmental relationship, with increased expression postnatally and in adulthood in neural cell types related to myelin.

This study provides a genetic basis linking disrupted brain connectivity to psychiatric disorders such as BD, MD, and SZ. Nonetheless, the paper is constrained by its limitations. Firstly, the GWAS data come disproportionately from European participants, limiting the generalizability of the findings. Secondly, each phenotype’s GWAS would benefit from a larger sample size, which would improve the robustness of comparative analyses and enable further genetic discovery.

As samples accumulate, GWAS is becoming a genetic tool that allows us to understand the complex architecture of psychiatric disorders. This opens up avenues for the potential reclassification of these disorders and their subgroups and, more importantly, for the introduction of precision medicine in their prevention and treatment through the identification of affected white matter pathways. The current study captures further genetic links between psychiatric disorders and brain white matter, which has allowed a better understanding of the biological mechanisms affected in BD, MD, and SZ and points to areas of further research. The use of genetics in psychiatry holds promise for improving our understanding of these disorders, laying the groundwork for improved patient care.

References

1. Kelly, S. et al. Widespread white matter microstructural differences in schizophrenia across 4322 individuals: results from the ENIGMA Schizophrenia DTI Working Group. Mol Psychiatry 23, 1261–1269 (2018).
2. Parker, N. et al. Psychiatric disorders and brain white matter exhibit genetic overlap implicating developmental and neural cell biology. Molecular Psychiatry (2023).
3. Catani, M. & ffytche, D. H. The rises and falls of disconnection syndromes. Brain 128, 2224–2239 (2005).
4. Beaulieu, C. The biological basis of diffusion anisotropy. Diffusion MRI 155–183 (2014).
5. Johansson, V. et al. A population-based heritability estimate of bipolar disorder – in a Swedish twin sample. Psychiatry Research 278, 180–187 (2019).
6. Vuoksimaa, E. et al. Heritability of white matter microstructure in late middle age: A twin study of tract‐based fractional anisotropy and absolute diffusivity indices. Human Brain Mapping 38, 2026–2036 (2016).
7. Zhao, B. et al. Common genetic variation influencing human white matter microstructure. Science 372, (2021).

Decoding the Blueprint: Unraveling the Core Gene Regulatory Network in CEBPA^N/C Acute Myeloid Leukemia

Yash Patel

Identification of the core Gene Regulatory Network of double mutant CEBPA^N/C Acute Myeloid Leukemia holds the key to the next phase of targeted therapy and precision medicine. Researchers identified the association of CEBPα, RUNX1, and AP-1 transcription factor proteins as having a key role in the maintenance of the cancer which is important in developing new therapies for treatment.

Acute myeloid leukemia (AML) is a common disorder of the bone marrow, comprising more than 23% of total leukemia cases globally in 2017¹. AML has seen an 87% increase in incidence and a 93% increase in mortality since the 1990s, making this malignancy a growing challenge for today’s healthcare providers¹. There are numerous AML subtypes; double mutations in the CEBPA gene (CEBPA^N/C) are associated with 8-20% of all AMLs^2,3. CEBPA encodes the CCAAT enhancer binding protein alpha (CEBPα) transcription factor (TF); TFs are proteins involved in regulating the expression of genes³. Adamo et al. determined that the association between the CEBPα, AP-1, and RUNX1 transcription factors is part of a core gene regulatory network (GRN) shared between all CEBPA^N/C AML patients². A GRN is the complete set of regulatory interactions between TFs and genes, and this one reveals how CEBPA^N/C AML uses the genetic code to its benefit^2,4. With the increase in sequencing data, GRNs can greatly aid the discovery of critical biomarkers that may be used as diagnostic, prognostic, or therapeutic targets for a specific disease.
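One simple way to picture a GRN is as a mapping from each transcription factor to the genes it regulates. The sketch below uses that representation to find genes regulated by all three TFs at once. The three TF names come from the study, but every target gene is a placeholder and the helper names are hypothetical; none of the edges are real data from Adamo et al.

```python
# TF -> set of regulated genes. Target names are placeholders, not real findings.
grn = {
    "CEBPA": {"TARGET_GENE_1", "TARGET_GENE_2", "TARGET_GENE_3"},
    "RUNX1": {"TARGET_GENE_2", "TARGET_GENE_3", "TARGET_GENE_4"},
    "AP-1": {"TARGET_GENE_2", "TARGET_GENE_3", "TARGET_GENE_5"},
}


def shared_targets(network, tfs):
    """Genes regulated by every transcription factor in `tfs`."""
    return set.intersection(*(network[tf] for tf in tfs))


def regulators_of(network, gene):
    """Transcription factors that regulate `gene`, in alphabetical order."""
    return sorted(tf for tf, targets in network.items() if gene in targets)


core_targets = shared_targets(grn, ["CEBPA", "RUNX1", "AP-1"])
print(sorted(core_targets))
print(regulators_of(grn, "TARGET_GENE_4"))
```

Genes sitting in the intersection are the kind of candidates a core network analysis would flag for follow-up, since disrupting any one regulator could affect them.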

The double-mutant CEBPA gene produces a shortened version of the CEBPα TF, which is dominant over the unmutated version (Figure 1)^2,3. This mutation disrupts the normal function of the protein and sets the stage for dysregulated hematopoiesis (maturation of blood stem cells), fueling the transformation of blood stem cells into leukemic cells². The mutant CEBPα forms a complex with AP-1 and RUNX1, which ensures the survival of CEBPA^N/C AML cells². CEBP and AP-1 are important cofactors that drive the maturation of myeloid progenitor stem cells, which go on to form part of our immune system². As AP-1 is implicated in many different disorders, it may be a good target for the treatment of AML and other conditions. While small molecules have been investigated for their potential to inhibit AP-1, no clinically approved inhibitor exists yet⁵. As AML carries a large disease burden, and with it additional financial incentives for pharmaceutical companies, this discovery may restart work on developing an effective treatment.

Figure 1: The difference between the CEBPα protein found in healthy cells and the CEBPα protein found in CEBPA^N/C AML. CEBPα has three main domains: two TAD domains that are responsible for transcription activation and a bZIP domain responsible for DNA binding and dimerization. As seen in the double-mutant CEBPα protein, mutations occur in the TAD1 domain and the bZIP domain, which cause improper function of the protein and the maintenance of AML. Image created with BioRender.

Another transcription factor important in the CEBPA^N/C GRN is RUNX1, a master-regulator transcription factor that is indispensable for the normal formation of blood components^2,6. RUNX1 controls the expression of genes critical to blood cell maturation, ribosome biogenesis, cell cycle regulation, and many other processes, making it a very important transcription factor⁶. While it is an important co-factor in the maintenance of CEBPA^N/C AML, targeting this TF requires delicate handling because of potential off-target effects that may harm neighbouring normal cells. In cancer cell models, Adamo et al. demonstrated that a small molecule inhibitor of the RUNX family of proteins strongly reduced AML cell growth compared with healthy control cell lines². However, there is currently no clinically available inhibitor of the wild-type RUNX1 protein that could be used for AML treatment. These results suggest that a delicate interaction of the RUNX1, AP-1, and CEBPα proteins is essential for the regulation of CEBPA^N/C AML². AML is a complex web of genes and cellular proteins interacting synergistically to avoid immune responses. This newfound information is key to unlocking novel interactions that further our understanding of not just AML, but also human biology in general.

AML treatment has traditionally been administered in two phases: remission induction therapy (chemotherapy) and post-remission therapy⁷. According to the National Cancer Institute, the 5-year survival rate remains low at 31.7%, indicating a need for better treatment⁸. Chemotherapy followed by hematopoietic stem cell transplantation is the gold-standard treatment strategy for AML⁹. However, there are high rates of graft-vs.-host disease and of relapse due to donor rejection⁹. As leukemia is a heterogeneous disorder, the effectiveness of therapy varies greatly from patient to patient¹⁰. CEBPα, RUNX1, and AP-1 provide three directly implicated druggable targets for the treatment of CEBPA^N/C AML. Furthermore, as these proteins are transcription regulators, analysis of their downstream targets can provide a detailed mechanism of AML maintenance that can be further understood and exploited. While small molecule inhibitors of both RUNX1 and AP-1 have previously been investigated, gene therapy and other targeted delivery mechanisms such as viral vectors may provide a suitable delivery system to reduce off-target effects. The goal of Adamo et al.’s research is to fully understand the complex web of regulatory genes involved in the maintenance of AML in order to improve patient outcomes, something current standard treatments are not able to achieve. Fully understanding the mechanism gives researchers an avenue to produce novel gene-therapy-based systems that can systemically target the diseased cells. Understanding the GRNs of genetic disorders is the key to achieving the next step in precision medicine and in the future of healthcare.

References

1. Chaurasiya, P. S. et al. Prevalence of acute myeloid leukemia and its associated risk factors at a tertiary care center: a retrospective cross-sectional study. Ann. Med. Surg. 85, 4794–4798 (2023).

2. Adamo, A. et al. Identification and interrogation of the gene regulatory network of CEBPA-double mutant acute myeloid leukemia. Leukemia 37, 102–112 (2023).

3. Su, L., Shi, Y.-Y., Liu, Z.-Y. & Gao, S.-J. Acute Myeloid Leukemia With CEBPA Mutations: Current Progress and Future Directions. Front. Oncol. 12, 806137 (2022).

4. Peter, I. S. & Davidson, E. H. A gene regulatory network controlling the embryonic specification of endoderm. Nature 474, 635–639 (2011).

5. Ye, N., Ding, Y., Wild, C., Shen, Q. & Zhou, J. Small Molecule Inhibitors Targeting Activator Protein 1 (AP-1). J. Med. Chem. 57, 6930–6948 (2014).

6. Gonzales, F. et al. Targeting RUNX1 in acute myeloid leukemia: preclinical innovations and therapeutic implications. Expert Opin. Ther. Targets 25, 299–309 (2021).

7. Acute Myeloid Leukemia Treatment – NCI. https://www.cancer.gov/types/leukemia/patient/adult-aml-treatment-pdq (2024).

8. Acute Myeloid Leukemia – Cancer Stat Facts. SEER https://seer.cancer.gov/statfacts/html/amyl.html.

9. Kang, S. et al. Antigen-Specific TCR-T Cells for Acute Myeloid Leukemia: State of the Art and Challenges. Front. Oncol. 12, 787108 (2022).

10. Cobaleda, C. et al. A primitive hematopoietic cell is the target for the leukemic transformation in human Philadelphia-positive acute lymphoblastic leukemia. Blood 95, 1007–1013 (2000).

Sleep and Genetics: Exploring the role of genes in sleep-related traits

Sananda Pragalathan

A new study unravels the genes associated with eight common sleep-related traits. An exome-wide association study identified twenty-two new genes, which showed significant associations with psychiatric, cognitive, metabolic, and inflammatory traits.

Sleep is a complex physiological process that is essential for maintaining our physical and mental health. Sleep traits, from sleep quality to sleep disorders such as insomnia, vary between individuals¹. Over the years, studies have shown a drastic decrease in sleep duration and an increase in sleep disorders². In a recently published Nature Human Behaviour article, Fei et al. explored the genetics behind eight common sleep traits: an individual’s natural preference for sleep and wake times (chronotype), sleep duration, daytime sleepiness, daytime napping, ease of getting up in the morning, insomnia, snoring, and sleep apnoea³. These traits reflect the different dimensions of sleep quality and quantity, as well as circadian preference and sleep disorders. The authors performed a large-scale whole exome sequencing (WES) analysis to understand the genetic architecture behind the sleep traits (Figure 1)³.

Numerous genome-wide association studies (GWAS) have been performed to understand the genetics of sleep and its phenotypes. However, a major drawback is that GWAS mainly captures common genetic variants, which are located primarily in non-coding regions of the genome. In contrast, WES captures both rare and common variants in protein-coding sequences⁴. Building on this advantage, Fei et al. designed a study in which phenotypic and genotypic data from almost 500,000 UK Biobank participants were analyzed³. The samples were analyzed using exome-wide association analysis and processed further for ancestry analysis, burden heritability, biological consequences, and a phenome-wide association study (PheWAS).

Figure 1. Study design: a) Collection of phenotypic and genotypic data from UK Biobank. b) Exome-wide association analysis of the sleep-related traits using both single-variant and gene-based tests. These results were validated using the FinnGen cohort. Additional analyses included sex specificity, race specificity, leave-one-variant-out (LOVO) analysis, and conditional analysis. c) Analysis of the biological consequences of the genes. d) Measurement of burden heritability. e) Phenome-wide association analysis of the identified genes³.

The study used single-variant exome-wide association analysis to identify genetic loci associated with the eight sleep-related traits, and a gene-based test to aggregate variants at the gene level. Of the 68 genes found, 22 were new and had not been identified in any previous GWAS. The 22 new genes were: 1) ADGRL4, COL6A3, CLK4 and KRTAP3-3, associated with chronotype; 2) ST3GAL1 and ANKRD12, associated with daytime sleepiness; 3) PLEKHM1, ANKRD12 and ZBTB21, associated with daytime napping; 4) WDR59, associated with snoring; and 5) HES4, PLCH2, C1orf167, CYP2J2, ARHGAP29, UBL4B, CD53, CGN, FLG-AS1, USH2A, TLR5, LYST and MTR, associated with sleep apnoea. The validity of these associations was checked against GWAS results from the FinnGen cohort, which confirmed twenty of them³.
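The list above contains more gene-trait pairs than genes, because one gene appears under two traits. The short check below, which simply transcribes the list into a dictionary (the variable names are illustrative), confirms that the 23 listed associations correspond to 22 distinct genes.

```python
from collections import Counter

# Gene-trait associations exactly as listed in the text above
new_genes_by_trait = {
    "chronotype": ["ADGRL4", "COL6A3", "CLK4", "KRTAP3-3"],
    "daytime sleepiness": ["ST3GAL1", "ANKRD12"],
    "daytime napping": ["PLEKHM1", "ANKRD12", "ZBTB21"],
    "snoring": ["WDR59"],
    "sleep apnoea": [
        "HES4", "PLCH2", "C1orf167", "CYP2J2", "ARHGAP29", "UBL4B", "CD53",
        "CGN", "FLG-AS1", "USH2A", "TLR5", "LYST", "MTR",
    ],
}

# Count how many traits each gene is listed under
gene_counts = Counter(
    gene for genes in new_genes_by_trait.values() for gene in genes
)

total_associations = sum(gene_counts.values())
unique_genes = len(gene_counts)
multi_trait_genes = sorted(g for g, n in gene_counts.items() if n > 1)

print(total_associations, unique_genes, multi_trait_genes)
# prints: 23 22 ['ANKRD12']
```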

Multiple studies have shown sleep traits to be moderately heritable^3-5. Heritability is defined as the proportion of phenotypic variability that is attributable to genetic factors³. The heritability of these traits was analyzed using burden heritability regression (BHR), which estimates the contribution of each gene to a trait by combining all the variants of the gene. For example, insomnia had the highest total burden heritability. In addition, the BHR results highlighted the genetic correlation between chronotype and ease of getting up, mirroring the results of the gene-based test.
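The idea of combining all the variants of a gene can be illustrated with a simple burden score: each person's rare-allele counts across a gene are collapsed into one number, and carriers are compared with non-carriers. This is a deliberately simplified sketch with invented genotypes and trait values and hypothetical function names; it is not burden heritability regression itself, which models variance explained across many genes.

```python
from statistics import mean


def burden_scores(genotypes):
    """Collapse per-variant rare-allele counts into one score per person."""
    return [sum(person) for person in genotypes]


def carrier_effect(genotypes, trait):
    """Mean trait difference between carriers of any rare allele in the
    gene and non-carriers."""
    scores = burden_scores(genotypes)
    carriers = [t for s, t in zip(scores, trait) if s > 0]
    non_carriers = [t for s, t in zip(scores, trait) if s == 0]
    return mean(carriers) - mean(non_carriers)


# Invented data: 6 people x 3 rare variants within a single gene
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]
# Invented trait values, e.g. self-reported sleep duration in hours
sleep_hours = [7.0, 6.0, 7.5, 6.5, 8.0, 5.5]

scores = burden_scores(genotypes)
effect = carrier_effect(genotypes, sleep_hours)
print(scores, effect)  # [0, 1, 0, 1, 0, 1] -1.5
```

No single variant here is carried by more than one person, so testing variants one at a time would have little power; pooling them at the gene level is what makes the signal visible.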

The authors further explored the biological functions and pathways of the identified genes using gene ontology enrichment analysis. They found that most of these genes were enriched in processes including cognitive function and the circadian clock, and were also involved in metabolic, inflammatory, and psychiatric processes³. For example, ADGRL4 is involved in synaptic plasticity and neuronal migration, whereas CLK4 is a key regulator of circadian transcription^6,7. Moreover, a PheWAS was carried out to examine the wider phenotypic associations of the genes linked to the sleep traits. ANKRD12 was associated with both daytime sleepiness and daytime napping, as well as with cognitive performance, inflammatory biomarkers, and lung function³. This suggests that ANKRD12 may play a role in sleep, cognition, and inflammation, which are known to be closely linked⁸.

Fei et al. identified a major limitation of their study: the exome data came predominantly from white British individuals and therefore do not represent diverse backgrounds. Their ancestry analysis supports this point. It revealed ancestry-specific associations between particular traits and genes; for example, ANKRD12 with daytime napping and PER2 with ease of getting up were significantly associated in individuals of Asian ancestry. Although associations were observed in three ethnic groups, small sample sizes left some gene-phenotype associations non-significant. Another limitation was the validation of the genes using FinnGen: the FinnGen cohort is metric based, whereas the authors explored subjective sleep traits. The sleep traits studied were also self-reported and may therefore be subject to recall bias³.

Sleep is crucial for a healthy life, and given its complexity and heterogeneity, it is important to study the genetics behind it. Previous studies have shown that 30-65% of the variance observed in sleep characteristics is caused by genetic factors⁴. The study by Fei and colleagues is one of the largest and most comprehensive exome-wide association studies of sleep-related traits to date. Along with the discovery of twenty-two new gene associations, the authors have provided insights into the genetic architecture and potential mechanisms of sleep traits and their implications for human health³. The study also demonstrates the importance of identifying rare genetic variants and their effects on complex phenotypes. Future studies should include multiple ethnic groups and examine the influence of sleep traits on chronic diseases. Methods like Mendelian randomization can help establish causal relationships between sleep traits and chronic diseases using genetic variants, and identify potential targets for intervention. The findings of this study pave the way for future research on the molecular underpinnings of sleep and its correlation with chronic diseases. They also offer opportunities for advancing diagnostic and therapeutic methods for sleep disorders⁹.

References

1. Barclay, N. L. & Gregory, A. M. Quantitative genetic research on sleep: a review of normal sleep, sleep disturbances and associated emotional, behavioural, and health-related difficulties. Sleep Med. Rev. 17, 29–40 (2013).
2. Stranges, S., Tigbe, W., Gómez-Olivé, F. X., Thorogood, M. & Kandala, N.-B. Sleep problems: an emerging global epidemic? Findings from the INDEPTH WHO-SAGE study among more than 40,000 older adults from 8 countries across Africa and Asia. Sleep 35, 1173–1181 (2012).
3. Fei, C.-J. et al. Exome sequencing identifies genes associated with sleep-related traits. Nat. Hum. Behav. 1–14 (2024).
4. Garfield, V. Sleep duration: A review of genome-wide association studies (GWAS) in adults from 2007 to 2020. Sleep Med. Rev. 56, 101413 (2021).
5. Madrid-Valero, J. J. et al. What do people know about the heritability of sleep? Behav. Genet. 51, 144–153 (2021).
6. Vizurraga, A., Adhikari, R., Yeung, J., Yu, M. & Tall, G. G. Mechanisms of adhesion G protein-coupled receptor activation. J. Biol. Chem. 295, 14065–14083 (2020).
7. Zhang, Y., Liu, Y., Bilodeau-Wentworth, D., Hardin, P. E. & Emery, P. Light and temperature control the contribution of specific DN1 neurons to Drosophila circadian behavior. Curr. Biol. 20, 600–605 (2010).
8. Thompson, K. I. et al. Acute sleep deprivation disrupts emotion, cognition, inflammation, and cortisol in young healthy adults. Front. Behav. Neurosci. 16, 945661 (2022).
9. Reis, C. et al. Sleep duration, lifestyles and chronic diseases: a cross-sectional population-based study. Sleep Sci. 11, 217–230 (2018).

39,000 Exome Controls at Your Fingertips Enabling Association Studies Without Compromising Privacy

Amna Shah

A new cutting-edge methodology offers a publicly accessible platform consisting of over 39,000 exomes and an algorithm, enabling researchers to assemble control cohorts without disclosing individual genotype information, thereby reducing the cumbersome and time-consuming task of gathering data for large-scale genetic association studies.

A genome-wide association study (GWAS) is a method used by scientists to analyze the genes of many individuals and identify genetic markers linked to specific phenotypes¹. Typically, this involves categorizing individuals as cases or controls based on phenotype presence^1,2. Conducting a robust GWAS for complex traits demands substantial time and financial resources due to the need for large datasets¹. Despite various public resources being available, extensive data refinement is necessary before analysis^3,4, with recruitment methods often introducing biases^1,2. Careful matching of genetic backgrounds between cases and controls is crucial to avoid confounding factors^1,2. To overcome these challenges, SCoRe (SVD-based Control Repository) has emerged as a ground-breaking method for control selection in genetic research⁵. This breakthrough enables the study of rare and common genetic variations, even when control cohorts are unavailable, by utilizing genetic data from control samples collected elsewhere⁵. In summary, using control cohorts and the SCoRe method improves study efficiency, privacy, and accessibility, and enhances GWAS, offering vital insights into genetic diseases for clinical research.

In this study, Artomov et al. tackle the challenge of conducting association studies by utilizing allele frequencies from a matched control cohort in a public repository⁵. Leveraging singular value decomposition (SVD), a mathematical method for summarizing population structure, the study introduces a technique to select control sets that match the genetic profile of cases without revealing individual genotype information⁵. This innovation underpins the SCoRe platform, which houses 39,472 exome controls⁵. Statistical analysis and subsampling drive SCoRe’s control selection⁵. These steps enable the selection of optimal controls, providing summarized genotype counts, QQ plots, and the genomic inflation factor (λ) for local association studies (Figure 1)⁵. This selection process preserves essential information while addressing a longstanding challenge in the field, and it offers researchers a valuable tool to conduct privacy-conscious genetic association studies⁵.
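To make the idea of sharing a summary rather than genotypes more concrete, here is a minimal sketch in Python with NumPy. It is not the authors’ implementation: the function names, the choice of two components, and the exact set of summaries returned are assumptions made for illustration. It shows how an SVD of a case genotype matrix yields variant loadings and a mean and covariance in component space, none of which is an individual genotype, and how a separate pool of controls could be projected onto the same axes.

```python
import numpy as np


def summarize_cases(genotypes, k=2):
    """Summarize a (samples x variants) matrix of 0/1/2 allele counts.

    Hypothetical helper: returns only cohort-level quantities.
    """
    variant_means = genotypes.mean(axis=0)
    centered = genotypes - variant_means
    # Thin SVD: centered = U @ diag(s) @ Vt; rows of Vt are variant loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T                      # shape (variants, k)
    coords = centered @ basis             # case positions in component space
    return {
        "variant_means": variant_means,
        "basis": basis,
        "pc_mean": coords.mean(axis=0),
        "pc_cov": np.cov(coords, rowvar=False),
    }


def project(genotypes, summary):
    """Place another cohort (e.g. a control pool) on the case-derived axes."""
    return (genotypes - summary["variant_means"]) @ summary["basis"]


# Simulated allele counts standing in for real data (an assumption).
rng = np.random.default_rng(0)
cases = rng.integers(0, 3, size=(50, 200)).astype(float)
control_pool = rng.integers(0, 3, size=(500, 200)).astype(float)

summary = summarize_cases(cases)
control_coords = project(control_pool, summary)
```

In this sketch, a control-selection step would then look for a subset of `control_coords` whose distribution resembles the one described by `pc_mean` and `pc_cov`.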

The SCoRe methodology brings together two essential elements: the Baringhaus–Henze–Epps–Pulley (BHEP) statistic and a Gaussian model of population structure⁵. Firstly, SCoRe uses the BHEP statistic to gauge how well a proposed control group matches a case cohort⁵. This statistic examines differences in characteristic functions and applies a Gaussian kernel weight (a mathematical function that assigns weights based on genetic distance) to evaluate the resemblance of the control group to the intended distribution^5,6. Secondly, SCoRe incorporates a Gaussian model to describe the population structure within case cohorts, providing insights into the distribution of genetic variations⁵. Together, these elements create a robust approach that not only sharpens control selection in genetic studies but also respects the complexity of population genetics.
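As a rough illustration of how such a statistic behaves, the sketch below computes the textbook closed form of the BHEP statistic in plain Python. It assumes the control coordinates have already been standardized against the case distribution (so that a perfect match would look like a standard normal sample); the smoothing parameter `beta`, the function name, and the simulated data are illustrative choices, not values taken from SCoRe.

```python
import math
import random


def bhep_statistic(points, beta=1.0):
    """BHEP statistic for points assumed to be pre-standardized.

    Small values mean the sample resembles a standard normal distribution.
    """
    n = len(points)
    d = len(points[0])
    b2 = beta * beta

    # Gaussian-kernel sum over all ordered pairs of points.
    pair_sum = 0.0
    for yj in points:
        for yk in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(yj, yk))
            pair_sum += math.exp(-0.5 * b2 * sq_dist)

    # Cross term between the sample and the target normal distribution.
    single_sum = sum(
        math.exp(-b2 * sum(a * a for a in y) / (2.0 * (1.0 + b2)))
        for y in points
    )

    return (
        pair_sum / n
        - 2.0 * (1.0 + b2) ** (-d / 2) * single_sum
        + n * (1.0 + 2.0 * b2) ** (-d / 2)
    )


# Simulated two-dimensional coordinates (an assumption for illustration).
random.seed(1)
matched = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
shifted = [[x + 1.5, y] for x, y in matched]

t_matched = bhep_statistic(matched)
t_shifted = bhep_statistic(shifted)
```

A control set drawn from the wrong population, represented here by `shifted`, gives a much larger statistic than a well-matched one, which is the property a selection procedure can exploit.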

**Figure 1:** **Illustration demonstrating the application of the SCoRe methodology in an association study.** Through statistical analysis and singular value decomposition (SVD), the data within **cases** is transformed into anonymous representations of genotype variation, ensuring individual genotype privacy. Meanwhile, a remote server, equipped with a pool of exome **control samples**, carefully selects matching control genotypes, estimates allele frequencies, and conveys the results to the user. This systematic process aids in identifying the most suitable controls and provides summarized genotype counts, QQ plots, and the genomic inflation factor (λ), facilitating comprehensive analysis in association studies. *(Figure created with BioRender.com)*
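The genomic inflation factor mentioned in the caption is conventionally the median of the observed association test statistics divided by the median expected under the null hypothesis. The following sketch uses only the Python standard library and assumes 1-degree-of-freedom tests reported as two-sided p-values; the function name and the example p-values are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Estimate lambda from two-sided p-values (each must be in (0, 1])."""
    standard_normal = NormalDist()
    # A 1-df chi-square statistic is the square of the matching z-score.
    chi2 = [standard_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spread p-values mimic a well-calibrated null distribution.
calibrated = [(i + 0.5) / 1000 for i in range(1000)]
# Squaring shifts them towards zero, mimicking systematic inflation.
inflated = [p * p for p in calibrated]

lambda_calibrated = genomic_inflation(calibrated)
lambda_inflated = genomic_inflation(inflated)
```

A value of λ close to 1 suggests that cases and controls are well matched, whereas values well above 1 point to residual population stratification or other systematic bias.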

SCoRe promises rare disease researchers streamlined cohort assembly, enhanced variant discovery power, and versatility in diverse populations⁵. Investigators studying rare diseases encounter difficulties in establishing patient cohorts for clinical studies, as this task often requires extensive and costly collaborations⁷. SCoRe’s methodology is particularly promising for rare Mendelian phenotypes, where disease prevalence is low and, consequently, assembling large cohorts is challenging⁵. Large cohorts are crucial because they enhance the power for discovering and fine mapping likely causal variants^2,8. Another advantage of this platform is that it demands minimal effort from users, as the standard preparatory steps remain consistent with shared genotypes⁵. This simplified process paves the way for effortless hunting of associated genes and DNA variants. Furthermore, in a recent study of African-American cancer patients within The Cancer Genome Atlas, Artomov et al. swiftly selected 496 control samples using the SCoRe methodology⁵. This allowed them to explore genetic variants associated with cancer susceptibility, including both common and rare ones⁵. The successful matching of control samples from African-American and admixed African population clusters underscores SCoRe’s versatility, demonstrating its effectiveness across diverse ethnic backgrounds (Figure 2)⁵.

**Figure 2:** **Schematic illustration depicting the data processing and outcomes of a pan-cancer association study conducted on the African-American subset (N = 471) of The Cancer Genome Atlas (TCGA) cohort.** Controls (496) were matched using the SCoRe server, and validation was performed through association testing for common and rare synonymous variants. The results revealed successful matching of both African-American and admixed African population clusters with the selected controls. *(Figure adapted from⁵, modified with BioRender.com)*

SCoRe’s genetic research approach raises long-term viability questions due to standardization, data quality, biases, and reliance on predefined variants. While SCoRe appears to be a crucial platform and brings about a sense of uniformity, the fundamental question that emerges is whether this envisioned central repository truly serves as a sustainable solution. The one-size-fits-all model may not cater to the complexity of diverse genetic studies. A possible drawback of this method is its reliance on a pre-defined set of high-quality variants⁵. If such a set is not well-defined, or if the performance of these variants varies widely across platforms, the effectiveness of the control set selection process may suffer⁵. The authors mention the method’s effectiveness in studying combined effects of rare genetic variations in specific genes, especially synonymous variants, where DNA code changes don’t affect resulting proteins⁵. However, users are urged to consistently verify the alignment on a case-by-case basis⁵. It’s crucial to be cautious with SCoRe and avoid blind reliance. Further research is needed to assess its effectiveness in diverse populations with mixed ancestries.

Think of the world of genetic research as a puzzle, with valuable pieces scattered due to strict rules governing data access. SCoRe acts as a key, unlocking the door to a new era in genetic studies, enabling effective collaboration among researchers while ensuring individual data privacy. Though the concept of a central genetic repository appears beneficial for association studies, further investigation is essential. SCoRe empowers researchers to assess genetic variations in patients, aiding in understanding potential genetic factors across diseases and expediting discoveries in diverse populations, thereby advancing precision medicine initiatives. Additionally, experimental association studies are required to rigorously validate the accuracy and reliability of SCoRe’s methodology, ensuring its effectiveness in identifying genetic associations and minimizing false positives or negatives.

References

1 Uffelmann, E. et al. Genome-wide association studies. Nature Reviews Methods Primers 1, 1-21 (2021). https://doi.org/10.1038/s43586-021-00056-9

2 Song, J. W. & Chung, K. C. Observational studies: cohort and case-control studies. Plast Reconstr Surg 126, 2234-2242 (2010). https://doi.org/10.1097/PRS.0b013e3181f44abc

3 Wojcik, G. L. et al. Opportunities and challenges for the use of common controls in sequencing studies. Nat Rev Genet 23, 665-679 (2022). https://doi.org/10.1038/s41576-022-00487-4

4 Ioannidis, J. P., Thomas, G. & Daly, M. J. Validating, augmenting and refining genome-wide association signals. Nat Rev Genet 10, 318-329 (2009). https://doi.org/10.1038/nrg2544

5 Artomov, M., Loboda, A. A., Artyomov, M. N. & Daly, M. J. Public platform with 39,472 exome control samples enables association studies without genotype sharing. Nature Genetics, 1-9 (2024). https://doi.org/10.1038/s41588-023-01637-y

6 Jiménez-Gamero, M. D. Testing normality of a large number of populations. Statistical Papers 65, 435-465 (2023). https://doi.org/10.1007/s00362-022-01384-y

7 Stoller, J. K. The Challenge of Rare Diseases. Chest 153, 1309-1314 (2018). https://doi.org/10.1016/j.chest.2017.12.018

8 Melamud, E. et al. The promise and reality of therapeutic discovery from large cohorts. (2020). https://doi.org/10.1172/JCI129196

A Dive into Rare Disease Diagnosis Through Structural and Non-Coding Variants

Farah Shah

Whole genome sequencing (WGS) combined with bioinformatic and algorithmic tools should be prioritized as the primary clinical diagnostic test for genetic disorders. This approach considers structural and non-coding regions as potential disease-causing regions alongside coding regions, offering the potential for precise and early diagnoses of rare diseases, the discovery of novel genes, and accurate treatment options for affected individuals.

The landscape of genetic testing technology has evolved, revealing an array of diagnostic tools for disease detection. While gene panels, gene arrays, and whole exome sequencing (WES) serve as commonly utilized clinical diagnosis tools¹, their efficacy falls short in delivering accurate diagnoses for rare diseases, particularly in cases where multiple loci contribute to a specific condition². As outlined by the Council of Ministers of the European Union (EU), approximately 6-8% of the European population may experience a rare disease at some point in their lives³. The rarity of these conditions brings forth unique challenges, including the scarcity of knowledge and expertise surrounding these diseases. This, coupled with their life-endangering aspects, has prompted the emergence of rare diseases as a critical focus in public health³. Numerous individuals grappling with rare diseases face challenges in obtaining accurate diagnoses and treatments.

In a recently published article in Genome Medicine, Pagnamenta et al. elucidated that the effectiveness of clinical WGS in diagnosing rare disease can be significantly enhanced through the incorporation of structural and non-coding variants¹. Hence, the progress in general clinical diagnostic techniques holds the potential for accurate and early identification of disease. Swift and precise diagnosis, followed by appropriate treatment, has the potential to save countless lives.

The study by Pagnamenta et al. showcased enhanced diagnostic reliability for rare diseases by integrating WGS with a range of bioinformatic and algorithmic tools (Figure 1). Participants underwent WGS with variant analysis tools like Illumina BaseSpace and Variant Studio, enabling examination of various genetic variants¹. Structural variants were annotated using gnomAD, dbVAR, and DECIPHER databases, with visualization and prioritization facilitated by the novel algorithmic tool SVRare, which annotates the detected variant with relevant information and provides visualization tools for interpretation¹. The impact of splice-site variants was predicted using the algorithmic tool ALTSPLICE, which leverages convolutional neural networks to analyze splice site usage frequencies and exon inclusion rates¹. Non-coding variants were annotated and prioritized through GREEN-DB and GREEN-VARAN workflows, which assess their impact on gene expression, supplemented by deepHaem for understanding their regulatory effects¹. Additional sequencing and custom algorithms were employed to analyze somatic mosaicism and variants affecting protein structure¹. Validation techniques included nanopore sequencing, optical genome mapping, microarrays, and targeted sequencing, with functional assays confirming variant pathogenicity according to ACMG guidelines¹. This integrated approach advances the accuracy and understanding of rare disease diagnosis.

**Figure 1:** **Overview of the Pagnamenta et al. study workflow, which utilized various bioinformatic and algorithmic tools.** Initially, all samples underwent whole genome sequencing (WGS), followed by analysis via the Illumina database. Samples with identified pathogenic variants in known genes underwent clinical validation and reporting, while those without underwent further analysis in a research pipeline. This research pipeline employed various bioinformatic tools to analyze different types of variants, including small variants, structural variants, copy number variants, repeat expansion variants, and runs of homozygosity. The results were annotated based on gene-level consequence, population allele frequency, impact prediction score, and disease-causing variants, and then analyzed using Variant Explorer (VE). Subsequently, algorithmic tools were employed to analyze somatic mosaicism, protein structure, structural variants, splice-site variants, and non-coding variants, followed by validation using targeted sequencing techniques such as nanopore sequencing, microarrays, and optical genome mapping, along with functional assays. Finally, variants’ pathogenicity was classified based on ACMG guidelines, undergoing final clinical validation and reporting. Created with BioRender.com.

The standard WGS studies, along with incorporation of various new technologies, have significantly accelerated and optimized WGS testing⁴. These bioinformatic tools not only streamline the diagnostic process but also maximize the diagnostic yield¹. Hence, by incorporating these advanced bioinformatic and algorithmic tools into the clinical diagnosis of rare diseases, we can anticipate an improvement in the quality and reliability of the results.

Clinical WGS primarily identifies single nucleotide variants and small indels within pre-established sets of in-silico gene panels or coding regions of the genome⁵. This approach often results in missing pathogenic structural, non-coding and splice-site variants¹. However, it’s crucial to note that non-coding variants also play a significant role in disease diagnosis. Equal attention should be given to structural and non-coding variants alongside coding regions, as they hold the potential to unravel mysterious cases¹.

Pagnamenta et al. identified five genes as either previously unknown or considered novel at the time of their discovery¹. Additionally, three genes were recognized as potential novel disease genes, supported by evidence of causality (Figure 2)¹. Analyzing variants outside the scope of known disease genes poses a challenge for clinical laboratories due to limitations in authorization and resources for such explorations¹. Conventional clinical testing runs the risk of overlooking the causative gene or variant if it is not included in the targeted gene panel, potentially leading to missed or undiagnosed conditions¹. This emphasizes the need for active involvement from the research community to conduct these studies¹. Therefore, collaboration between research communities and clinical laboratories is deemed essential, particularly for the discovery of novel disease variants, including intronic variants, to enhance diagnostic capabilities.

**Figure 2:** **Genetic and clinical results of the study.** The research involved 122 cases, with 47 successfully resolved, 12 exhibiting uncertain significance variants with lead candidates, and 2 cases revealing secondary findings. Eight new genes were discovered to be linked to disease, including five confirmed disease-causing genes and three genes that show evidence of causality. The phenotype of one gene was expanded, six patients’ clinical diagnoses were revised, and treatment changes were implemented for eight patients. Created with BioRender.com.

The diagnosis of a disease holds significant implications for both treatment and the overall well-being of patients. In the study by Pagnamenta et al., clinical diagnoses were modified for six patients, and treatment adjustments were implemented for eight individuals¹. Among them, five experienced life-saving outcomes, while two secondary findings were uncovered (Figure 2)¹. Pathogenic variants that altered the clinical diagnosis of the condition have a profound impact on the type of treatment administered, influencing the lives of patients¹. Therefore, early diagnosis enables timely intervention, preventing irreversible damage to the body. Moreover, secondary findings, if appropriately managed, could potentially be lifesaving for patients¹. Hence, the use of advanced technology is essential to ensure accurate disease diagnosis, leading to appropriate treatment choices and effective disease management.

The integration of WGS with bioinformatic and algorithmic tools requires the establishment of specific guidelines and working criteria for comprehensive variant analysis in clinical laboratories. To foster collaboration among physicians, researchers, and laboratories, an open platform for a genotype-phenotype database should be developed, facilitating the diagnosis of patients with diverse phenotypes associated with rare diseases⁶. WGS and bioinformatics are shaping the future of personalized medicine. Accurate disease diagnosis not only benefits patients but also paves the way for the development of novel treatments, including gene therapy, genome editing, and cell therapies targeting novel disease-causing genes⁷. The diagnostic approach explored in this study has the potential to transition from a mere diagnostic tool to a guiding force for personalized medicines in the future.

References

1 Pagnamenta, A. T. et al. Structural and non-coding variants increase the diagnostic yield of clinical whole genome sequencing for rare diseases. Genome Medicine 15, 1-25 (2023).

2 Franceschini, N., Frick, A. & Kopp, J. B. Genetic Testing in Clinical Settings. Am J Kidney Disease 72, 569-581 (2018).

3 Nguengang Wakap, S. et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. European Journal of Human Genetics 28, 165-173 (2019).

4 Marshall, C. R. et al. Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease. npj Genomic Medicine 5, 1-12 (2020).

5 Sun, Y. et al. Characterizing sensitivity and coverage of clinical WGS as a diagnostic test for genetic disorders. BMC Medical Genomics 14, 1-13 (2021).

6 Lohmann, K. & Klein, C. Next Generation Sequencing and the Future of Genetic Diagnosis. Neurotherapeutics 11 (2014).

7 Stranneheim, H. et al. Integration of whole genome sequencing into a healthcare setting: high diagnostic rates across multiple clinical entities in 3219 rare disease patients. Genome Medicine 13, 1-15 (2021).

Repurposing genomic data for possible psychiatric disorder diagnosis

Renée Aline Smith

New genetic correlational analysis tools allow for novel results of similarity between psychiatric disorders with clinical implications to be pulled from old data.

The way we look at psychiatric disorders has changed drastically in recent years with the realization that comorbid psychiatric disorders are more common than meeting the diagnostic criteria for one disorder in isolation. This change in understanding has spiked interest in genomic and bioinformatic tools that uncover and explain correlations among segregating disorders¹. The work of Grotzinger et al. puts correlational analysis at the forefront of psychogenomics by presenting novel techniques for interpreting genome-wide association studies (GWAS)². The bioinformatic tools introduced provide a new possible level of analysis with genomic data to strengthen diagnosis and early-stage prevention for individuals at risk for psychiatric disorders. Although these correlational relationships have only recently been discovered, they have an immense capacity to improve quality of life for many because of the high prevalence of psychiatric disorders and their impact at an individual and societal level³.

Eleven common psychiatric disorders were analyzed throughout the research using methods of analysis previously presented by the same group⁴. The computational methods used compare data between groups which were measured in different samples. By using this type of analysis, correlational data between the 11 distinct psychiatric disorders was obtained revealing 4 different subgroups or factors of segregating disorders based on multiple levels of analysis, and specifically genetic similarity². Factors included: compulsive behaviours, psychotic factors, childhood onset/neurodevelopmental disorders, and internalizing disorders. On top of this, a general p-factor, or psychiatric-factor, was also observed revealing genetic similarity between all studied disorders. This p-factor helps to tie the array of disorders together based on shared features among them (Figure 1)². Psychiatric disorders exhibit strong correlation with designated factors; analysis revealed a moderate correlation between these factors suggesting the potential significance of the p-factor in the linking of disorders outlined in this study². The factors obtained for correlational analysis are in line with DSM-IV criteria which is based mostly on behavioural analysis and has been used diagnostically for over a decade suggesting the proposed factors have high construct validity⁵.

**Figure 1. Correlational flowchart of psychiatric factors.** Eleven distinct psychiatric disorders, including Anorexia Nervosa, Obsessive-Compulsive Disorder (OCD), Tourette’s Syndrome, Schizophrenia, Bipolar Disorder, Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorder (ASD), Anxiety, and Major Depressive Disorder, were chosen for correlational analysis. Correlation between disorders was judged based on genetic similarity, brain morphology, and physical movement. Disorders with high degrees of relation within tests were found to segregate into 4 specific factors. Factor 1, shown in green, represents Compulsive Behaviours. Factor 2, shown in yellow, represents Psychotic Features. Factor 3, shown in red, represents Neurodevelopmental/Childhood onset. Factor 4, shown in purple, represents Internalizing Disorders. All associated psychiatric disorders are shown with arrows and represented in the same colour as the factor they belong to. A generic p-factor, or psychiatric-factor, is denoted in grey and represents genetic similarity across all proposed factors².

Within the proposed factors, multivariate genetic association analysis was used to look at interactions between disease-associated genes within and across designated factors. Interactions were considered enriched if the disease-associated genes segregated within their factor. Enrichment was observed across three of the four factors but was most notable in factor 2 (psychotic features), suggesting a strong genetic correlation and linkage of multiple genetic loci associated with psychiatric diseases with psychotic symptoms². The novel p-factor showed enrichment of the same capacity as psychotic features, suggesting its possible clinical use for general psychiatric risk prediction (Figure 2)². The use of a factor model opens the possibility for new genetic loci to be uncovered through their enrichment within a factor as opposed to their weaker enrichment across a single disease². This model could therefore be used as an approach for prediction of disease risk within a factor.

**Figure 2. Genetic enrichment within psychiatric factors.** Genomic SEM (structural equation modeling) was used to find genetic loci which were shared across psychiatric disorders found within the same factor. Each value upon the x-axis represents a hit or genetic annotation which was found across multiple disorders within a factor. The general p-factor is included as multiple hits were found across all 11 psychiatric disorders studied. Information for factor 1, compulsive disorders, was not shown as no relevant genetic loci were enriched within this factor².

A multivariate GWAS analysis tool can pull new results from previously obtained data, showing enrichment across factors, but it can also clarify which genetic loci are not shared between disorders within the same factor². The genetic variants which were not enriched within a factor could reveal a new possible target for disease-specific diagnosis or risk analysis, as compared with the more general risk analysis possible from genetic enrichment within a factor. Of interest, Major Depressive Disorder had a significant number of enriched genetic loci which were not found in the internalizing factor the disease was grouped in. While previous large-scale studies have been able to find associations between genetic loci and psychiatric disease, these algorithms do not tell us with confidence which loci are disease-specific, i.e. not commonly shared between disorders⁶.

The current research provides a novel technique for interpreting genomic data that has been widely available for the past decade. This type of interpretation has many similarities with phenome-wide association studies (pheWAS), which aim to reveal associations between phenotypes, or symptoms, of a handful of disorders simultaneously. The concept of pheWAS is relatively new in comparison to GWAS, and both still have associated challenges, the largest being that they are correlational and therefore do not always imply causation. Nonetheless, these novel methods provide a new and more comprehensive way to compare results of genetic associations across multiple diseases at once⁷.

The capacity for new, clinically relevant data obtained using the bioinformatics pipelines presented here almost makes one wonder why this type of research was not used sooner when the required data has been available for so long. Psychogenomics as a whole is an area which requires more emphasis, as psychiatric disorders are so prominent in our healthcare system³. Beyond this, it is not uncommon for psychiatric disorders to surface when other, non-psychiatric, diagnoses are being made⁸. The current research provides a multitude of possibly clinically relevant loci associated with psychiatric risk. Thus, if functional analysis of these genes can prove that there is more than just an association between suggested genetic loci and psychiatric disease, then these variants can be tested in parallel with other genetic testing for non-psychiatric diseases. While this may be an idea for the distant future, it could greatly help to improve patient quality of care if psychiatric risk was noted alongside other diagnoses to ensure preventative measures, such as therapy, are offered to at-risk patients.

References

Kessler, R. C., Chiu, W. T., Demler, O., Merikangas, K. R. & Walters, E. E. Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch Gen Psychiatry 62, 617–627 (2005).
Grotzinger, A. D. et al. Genetic architecture of 11 major psychiatric disorders at biobehavioral, functional genomic and molecular genetic levels of analysis. Nat Genet 54, 548–559 (2022).
Steel, Z. et al. The global prevalence of common mental disorders: a systematic review and meta-analysis 1980-2013. Int J Epidemiol 43, 476–493 (2014).
Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat Hum Behav 3, 513–525 (2019).
Regier, D. A., Kuhl, E. A. & Kupfer, D. J. The DSM-5: Classification and criteria changes. World Psychiatry 12, 92–98 (2013).
Cross-Disorder Group of the Psychiatric Genomics Consortium. Genomic Relationships, Novel Loci, and Pleiotropic Mechanisms across Eight Psychiatric Disorders. Cell 179, 1469-1482.e11 (2019).
Verma, A. & Ritchie, M. D. Current Scope and Challenges in Phenome-Wide Association Studies. Curr Epidemiol Rep 4, 321–329 (2017).
O’Hea, E. et al. Impact of the mental health and dynamic referral for oncology (MHADRO) program on oncology patient outcomes, health care utilization, and health provider behaviors: A multi-site randomized control trial. Patient Educ Couns 103, 607–616 (2020).

DNA Damage in Aging Mice Forces RNA Polymerase II into Retirement

Liliana Trajceska

Analysis of aged and DNA repair mechanism-deficient mice reveals that DNA damage causes RNA polymerase II stalling, which leads to aging phenotypes through progressive transcriptional stress and altered transcriptional output.

We all age – it’s inevitable; but what if the process could be eased by understanding the molecular mechanisms behind aging? Biological aging is described as an overall, progressive functional decline due to the aggregation of numerous factors^1-3. Some hallmarks of normal aging include epigenetic modifications, loss of homeostasis, and cellular senescence¹.

The most notable cause attributed to aging is DNA damage accumulation, yet consequences of this damage on transcription integrity remain poorly understood^1,4-6. Through analyzing nascent RNA production and RNA polymerase II (RNAPII) activity in aging mice, Gyenis et al. sought to elucidate the effects of DNA damage on transcriptional activity and the resulting changes in gene expression during aging⁵. Significant transcriptional decline was observed in aged liver cells due to stalled, and consequently less productive, RNAPII⁵. This age-related, gene-length-dependent phenomenon of a retired RNAPII unveils the mechanism behind transcriptome variance in aging cells and potentially explains what instigates the functional downfall of key pathways associated with aging hallmarks⁵.

Transcription, the essential first step of gene expression, is typically strictly regulated⁷. To first understand the broad transcriptional changes occurring during normal aging, Gyenis et al. injected adult (15-week-old) and aged (2-year-old) wildtype mice with ethynyl-uridine (EU), a base analogue which gets added into newly synthesizing RNA, effectively serving as a measurement of transcription⁵. Fluorescent staining analysis of EU incorporation revealed a decrease in signal by 1.5-fold in aged mice liver cells due to lowered transcription rates⁵. Put differently, less nascent RNA, derived from RNAPII, was produced⁵. Initial instinct may lead many to infer that this observed transcriptional decline is correlated with a reduction of RNAPII quantity – the star player in the faithful transcription process⁷. Surprisingly, contradicting results were found: RNAPII signal increased by a striking 1.4-fold in the aged mice⁵.

Adding to the mystery of decreased transcription in aged mice, liver cell sequencing revealed normal association of RNAPII with gene promoters and unaltered transcriptional start site activity in all examined genes, suggesting successful promoter activity and transcription initiation⁵. This eliminated promoter silencing as an explanation of reduced activity. The puzzling paradox of decreased transcription with increased RNAPII concentration can rather be explained by genome-wide stalling of transcriptional elongation⁵. Essentially, as elongation proceeded in aged cells, fewer nascent RNA transcripts were produced while more RNAPII accumulated⁵. Approximately 40% of the RNAPII were deemed unproductive and stalled⁵. Thus, aging cells suffer a dysregulation of basal transcription, hinting towards the trigger of functional decline.

The pieces of the puzzle begin to fit together, connecting aging to what the authors termed a ‘gradual loss of productive transcription’, when considering DNA structure in aged cells⁵. As cells age, DNA damage accumulates while DNA damage repair capacity diminishes^8,9. Unrepaired lesions in damaged DNA block RNAPII, stalling it in place and halting its ability to transcribe DNA⁷. Gyenis and colleagues demonstrated RNAPII stalling in mice harboring faulty DNA repair mechanisms, which rendered them incapable of fixing blocking lesions. RNAPII hindrance was observed on the template strand, and the mice showed progressive transcriptional decline, evidenced by lower nascent RNA production as they aged (Figure 1)⁵. The cycle of damaging DNA and failing to repair the damage eventually leads to inadequate transcription of genes essential to important pathways, such as metabolism, inevitably causing genomic instability^1,4,8,9. This loss of homeostasis is detrimental because, once it is lost, cells find it nearly impossible to regain regular functional capacity¹.

*Figure 1.* Transcriptional output in mice with functional DNA repair mechanism versus defective DNA repair mechanism. a) Wildtype 7-week-old mice with normal DNA repair mechanisms suffer from less accumulated DNA damage. The wildtype mice have normal, active RNAPII function, resulting in an abundance of transcriptional activity and nascent RNA. b) Seven-week-old DNA repair mechanism-deficient mice cannot fix damaged DNA, leading to a blockage of RNAPII and halting of transcription. The DNA repair mechanism-deficient mice experience less transcriptional activity and lower nascent RNA production. c) Transcription declines further with age; the 7-week-old mice with a faulty DNA repair mechanism (panel b) produced less nascent RNA than the wildtype mice (panel a), however, the 14-week-old mice with a defective DNA repair mechanism (panel c) produced the least amount of nascent RNA. Figure created in BioRender.com.

Interestingly, genes experiencing greater transcriptional decline were longer (Figure 2)⁵. The disproportionate number of long genes associated with transcriptional decline could be explained by the higher probability that a longer gene sustains stochastic damage somewhere along its length¹⁰. One intriguing application of this finding is tracking which genes are more prone to RNAPII stalling owing to their higher chance of endogenous DNA damage. Perhaps this understanding of the imbalanced transcriptional output of small versus large genes, together with the new finding of RNAPII stalling, may advance our understanding of which genes need to be rescued to ease aging phenotypes.

*Figure 2.* Overview of RNA polymerase II stalling in a young versus aged cell. a) In young cells, elongating RNAPII (blue ovals) is actively elongating and transcribing nascent RNA from small, medium, and long genes. The active RNAPII in young cells results in balanced gene expression and homeostatic output of gene products. b) In an aged cell, longer genes demonstrate a higher likelihood of DNA damage (yellow stars), causing blockage of multiple RNAPIIs. These blocked RNAPII begin to stall (purple ovals) and form a queue. The transcriptional stalling of longer genes causes imbalanced gene expression output with more smaller genes being effectively transcribed. Figure taken from⁵.
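The gene-length argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that blocking lesions fall independently and uniformly along the genome at a constant rate per kilobase (a Poisson model). Neither this model nor the rate used here comes from Gyenis et al., and the function and variable names are hypothetical. Under that assumption, the probability that a gene of length L carries at least one lesion is 1 − exp(−λL), which rises with gene length.

```python
import math


def prob_at_least_one_lesion(gene_length_kb: float, lesions_per_kb: float) -> float:
    """P(at least one lesion) under a Poisson model with a uniform lesion rate."""
    expected_lesions = lesions_per_kb * gene_length_kb
    return 1.0 - math.exp(-expected_lesions)


# Illustrative rate only: an assumption, not a measured value.
ASSUMED_LESIONS_PER_KB = 0.001

# Hypothetical gene lengths spanning two orders of magnitude.
gene_lengths_kb = {"short": 5, "medium": 50, "long": 500}

lesion_probability = {
    label: prob_at_least_one_lesion(length, ASSUMED_LESIONS_PER_KB)
    for label, length in gene_lengths_kb.items()
}

for label, p in lesion_probability.items():
    print(f"{label:>6} gene ({gene_lengths_kb[label]} kb): P(>=1 lesion) = {p:.3f}")
```

With any fixed lesion rate, the longest gene is the most likely to contain a lesion that could stall RNAPII, which matches the direction of the trend described above without making any claim about its actual magnitude.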

This study suggests that there is potential to determine exact levels of unproductive RNAPII and provide an estimate of transcriptional decline in aging individuals. When analyzing public aging datasets, Gyenis et al. observed age-related transcriptional stress in multiple species and organs, implying that transcriptional stress is not confined to the mouse liver⁵. Future studies should employ these findings in multiple populations where various environmental factors, such as chemical mutagens, might contribute to mutagenesis and DNA damage accumulation. Quantifying RNAPII stalling in various tissues of these individuals can help guide predictions regarding which aging pathways might be first affected in different populations. It all comes down to DNA damage; these estimates may prove useful when attempting to understand the best time and place to implement interventions which may minimize DNA damage or unblock the queue of stalled RNAPII, and overall promote longevity.

While much research has shown that DNA damage accumulates during aging, Gyenis et al. confirm that the damage levels reached during normal aging are enough to induce transcriptional stress⁵. They demonstrate how aging causes transcriptional stress and, conversely, how transcriptional stress further promotes functional decline in processes crucial to healthy cell maintenance. Their research might sway gerontologists away from attributing the detrimental changes in gene expression of aging individuals to active gene regulation and toward attributing them to passive transcriptional demise⁵. We may begin to think that we aren’t the only ones retiring as we age; RNAPII does too.

References

1. López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. The hallmarks of aging. Cell 153, 1194-1217 (2013).

2. Nikopoulou, C. et al. Spatial and single-cell profiling of the metabolome, transcriptome and epigenome of the aging mouse liver. Nature Aging 3, 1430-1445 (2023).

3. Noda, S., Ichikawa, H. & Miyoshi, H. Hematopoietic stem cell aging is associated with functional decline and delayed cell cycle progression. Biochem. Biophys. Res. Commun. 383, 210-215 (2009).

4. Yousefzadeh, M. et al. DNA damage—how and why we age? eLife 10, e62852 (2021).

5. Gyenis, A. et al. Genome-wide RNA polymerase stalling shapes the transcriptome during aging. Nat. Genet. 55, 268-279 (2023).

6. Rübe, C. E. et al. Accumulation of DNA Damage in Hematopoietic Stem and Progenitor Cells during Human Aging. PLOS ONE 6, e17487 (2011).

7. Lans, H., Hoeijmakers, J. H. J., Vermeulen, W. & Marteijn, J. A. The DNA damage response to transcription stress. Nature Reviews Molecular Cell Biology 20, 766-784 (2019).

8. Rossi, D. J. et al. Deficiencies in DNA damage repair limit the function of haematopoietic stem cells with age. Nature 447, 725-729 (2007).

9. Moriwaki, S., Ray, S., Tarone, R. E., Kraemer, K. H. & Grossman, L. The effect of donor age on the processing of UV-damaged DNA by cultured human cells: Reduced DNA repair capacity and increased DNA mutability. Mutat. Res. /DNA Repair 364, 117-123 (1996).

10. McKay, B. C. et al. Regulation of ultraviolet light-induced gene expression by gene size. Proc. Natl. Acad. Sci. U. S. A. 101, 6582-6586 (2004).

Decoding Spatial Omics and Unveiling Cellular Mysteries with CellCharter

Andeep Turna

The latest breakthrough in spatial omics is CellCharter, an algorithmic framework designed to revolutionize the field by addressing key challenges in resolution, scalability, and multiome integration¹.

Spatial omics provides a unique lens into cellular landscapes, quantifying molecular features within individual cells while preserving information about their location within tissues^1-3. This approach is invaluable in recapitulating tissue anatomy and revealing cellular niches characterized by unique cell type combinations. In cancer research, spatial omics is pivotal in elucidating the connection between cellular patterns and disease aggressiveness, providing insights into the oncogenic potential of mutated cells based on their spatial location². However, existing spatial proteomics (SP) and spatial transcriptomics (ST) approaches face challenges in resolution and scalability. SP and ST based on imaging mass spectrometry limit their coverage to a few hundred proteins or 1000 genes, respectively, in exchange for single-cell or subcellular resolution^1,3. Conversely, sequencing-based ST covers the entire transcriptome but only resolves regions of 10-100 cells^1,3. To address these limitations, multiome spatial technologies have been developed that assay multiple molecular features simultaneously within the same cell or region⁴, essentially combining the strengths of each method to overcome their individual drawbacks.

Yet this multimodal innovation is limited by the computational approaches used to analyze the spatial molecular profiles it generates. Spatial clustering, a key analytical tool, assigns cells to clusters based on intrinsic features such as protein or messenger RNA abundance and the features of neighboring cells in the tissue⁵. These tools are effective when focused on a single data modality but often need manually annotated cellular segmentation, limiting their scalability in both volume and diversity of data⁵. This is a critical bottleneck given the advent of multimodal spatial multiome platforms. For computational approaches to synergize with this new avenue of spatial technologies, they must be capable of: 1) scaling to large numbers of samples and cells while correcting for the batch effects that arise at such scales, 2) integrating different data modalities from different technologies, and 3) interpreting the resulting clusters, which cover various tissues and conditions.

Varrone et al.¹ introduced CellCharter, an algorithmic framework that fulfills each requirement more effectively than current computational approaches. It was constructed around three principal objectives: analyze large cohorts of spatially profiled samples, remain agnostic to the technology generating spatial molecular profiles, and implement a suite of approaches for cluster characterization and comparison¹. CellCharter begins its analysis by receiving molecular profiles (Figure 1a) and generating expression matrices (Figure 1b). Next, it applies variational autoencoders (VAEs), an artificial neural network architecture that performs dimensionality reduction and batch effect correction^1,6, to the matrix (Figure 1c). Then, it constructs a network of cells based on spatial proximity and defines each cell's neighborhood (Figure 1d). Finally, it performs spatial clustering using a Gaussian mixture model (GMM) based on concatenated vectors, assessing cluster stability across multiple runs (Figure 1e). Beyond this clustering workflow, CellCharter offers downstream analyses to determine cluster distributions (Figure 1f), compute cell-type enrichment for each cluster (Figure 1g), estimate significant spatial proximity among clusters (cluster neighborhood enrichment) (Figure 1h), and characterize and compare cluster shapes¹ (Figure 1i).

**Figure 1: Overview of CellCharter workflow and analytical outputs.** Schematic representation of the CellCharter algorithmic workflow. The framework begins with the input of spatial omics datasets, including molecular features and spatial coordinates for each cell or spot. CellCharter employs variational autoencoders (VAEs) for dimensionality reduction and batch effect correction. Subsequently, a network of cells or spots is constructed based on spatial proximity, and the algorithm performs spatial clustering using a Gaussian mixture model (GMM). Downstream analyses include cluster characterization, comparison, and exploration of cellular niches. Various analytical outputs, such as cluster proportions, cell-type enrichments, spatial proximity estimations, and cluster shape classifications, provide a comprehensive understanding of the spatial organization within tissues. Figure adapted from¹.
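The neighborhood step of this workflow can be illustrated with a small, dependency-free sketch. It is not CellCharter's implementation: the radius-based proximity graph, the mean aggregation, the toy data, and the function names `spatial_neighbors` and `neighborhood_features` are all assumptions made for illustration. The VAE step is skipped (the embeddings are taken as given), and the final clustering is left out.

```python
from math import dist


def spatial_neighbors(coords, radius):
    """Map each cell index to the indices of other cells within `radius`."""
    return {
        i: [j for j, q in enumerate(coords) if j != i and dist(p, q) <= radius]
        for i, p in enumerate(coords)
    }


def neighborhood_features(embeddings, neighbors):
    """Concatenate each cell's embedding with the mean embedding of its neighbors."""
    combined = []
    for i, own in enumerate(embeddings):
        nbrs = neighbors[i]
        if nbrs:
            mean = [
                sum(embeddings[j][k] for j in nbrs) / len(nbrs)
                for k in range(len(own))
            ]
        else:
            mean = list(own)  # isolated cell: fall back to its own embedding
        combined.append(list(own) + mean)
    return combined


# Toy data (hypothetical): 2D positions and 2D reduced embeddings for four cells.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]

neighbors = spatial_neighbors(coords, radius=1.5)
features = neighborhood_features(embeddings, neighbors)

for i, vector in enumerate(features):
    print(f"cell {i}: neighbors={neighbors[i]} features={vector}")
```

Each concatenated vector describes both a cell and its surroundings, so two cells of the same type in different tissue contexts receive different vectors. In a fuller pipeline these vectors would be passed to a Gaussian mixture model, for example scikit-learn's `GaussianMixture`, to obtain spatial clusters.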

CellCharter was validated through comparisons with previous benchmarks of seven computational approaches using ST datasets of the human dorsolateral prefrontal cortex. These comprised traditional statistical and machine learning approaches, hidden Markov random fields (HMRF) and graph neural networks (GNNs) respectively, and five state-of-the-art approaches that use modified and/or combined versions of the traditional approaches: BayesSpace, DR-SC, SpaGCN, SEDR, and STAGATE¹. CellCharter demonstrated superior memory usage, computational efficiency, and clustering quality, alongside robustness to sequencing depth variations and automatic selection of cluster quantity¹. Furthermore, with SP data from mouse spleen, CellCharter showed notable efficiency, scalability, and accuracy in identifying spatial clusters, particularly in distinguishing the spleens of healthy mice from those of mice with systemic lupus.

CellCharter revealed distinctive niches associated with disease conditions, highlighting its capability to characterize and compare spatial clusters under different biological contexts. Its application to ST data from non-small cell lung cancer tissue sections provided new insights into intratumor heterogeneity. It identified patient-specific tumor-enriched clusters and shared or distinct compositions in tumor microenvironment (TME)-enriched clusters among patients¹. The analysis exposed a tumor–TME niche, where distinct tumor-enriched clusters exhibit preferential interactions with specific TME-enriched clusters. This spatial organization revealed distinct cancer cell states coexisting within the same tumor and their TME-based interactions^1,7, unveiling pathogenesis-associated changes in cellular composition and spatial organization.

While CellCharter represents a significant advancement in spatial omics analysis, it is important to acknowledge its limitations. Its reliance on VAEs for dimensionality reduction and batch effect correction means that poor input data quality could lead to poor output quality, especially with extremely large or complex datasets⁶. Additionally, CellCharter focuses primarily on clustering and downstream analyses and does not offer spatial trajectory inference, a method that reconstructs the developmental paths of cells based on their spatial positions and aids the understanding of tissue development and disease progression⁸. Incorporating additional data preprocessing techniques and spatial trajectory inference would provide researchers with a more comprehensive and robust toolkit.

In conclusion, CellCharter is a versatile and scalable algorithmic framework, effective in analyzing spatial molecular profiles across diverse datasets and applications. By overcoming challenges of scalability, portability, and integration of multiome data, CellCharter offers a powerful tool for exploring cellular niches. The algorithm’s ability to decipher intratumor heterogeneity and characterize spatial interactions between different cell populations illustrates its value in cancer research and is only limited by the quality of its input data. As spatial omics advances, CellCharter’s adaptability and analytical robustness ensure its relevance in decoding the complexities of cellular organization and function in health and disease.

References

1. Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L.A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat Genet 56, 74–84 (2024).

2. Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Reports 23, (2018).

3. Merritt, C.R. et al. Multiplex digital spatial profiling of proteins and RNA in fixed tissue. Nat Biotechnol 38, 586–599 (2020).

4. Zhang, D. et al. Spatial epigenome–transcriptome co-profiling of mammalian tissues. Nature 616, 113–122 (2023).

5. Kim, J. et al. Unsupervised discovery of tissue architecture in multiplexed imaging. Nat Methods 19, 1653–1661 (2022).

6. Kopf, A., Fortuin, V., Somnath, V.R. & Claassen, M. Mixture-of-experts variational autoencoder for clustering and generating from similarity-based representations on single cell data. PLOS Computational Biology 17, (2021).

7. Wang, N. et al. Spatially-resolved proteomics and transcriptomics: An emerging digital spatial profiling approach for tumor microenvironment. Visualized Cancer Medicine 2, 1 (2021).

8. Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat Commun 14, 7739 (2023).

A deeper dive into the clonality of human hematopoiesis

Aisha Wada

Are all stem cells identical? A new deep learning algorithm uniquely defines the functional diversity of human hematopoietic stem cells.

The formation of blood cells from their precursors, the hematopoietic stem cells (HSCs), is an important biological process that is widely researched. HSCs are known to exhibit clonality, which refers to the genetic and functional similarity between groups of stem cells¹. Studying the behaviors of HSCs during their development (Fig. 1) aids in understanding diseases such as blood cancers (leukemia) and provides direction in developing therapeutic targets for these diseases. The clonal patterns of HSCs determine how they behave² and have yet to be fully characterized. A new approach using computational tools to detect specific mutations in mitochondrial DNA (mtDNA)³ is set to change the way we study HSCs and provide a better understanding of their clonal dynamics.

**Figure 1**: Hematopoietic stem cell differentiation and division.
Adapted from “Stem Cell Differentiation from Bone Marrow”, created by “Akiko Iwasaki, PhD” using Biorender.com (2024).

Weng et al. sought to define the origin of each cell and to compare the physiologic state of each daughter cell derived from a common “parent” cell. To characterize HSC behavior in vivo, they designed an innovative algorithm, single-cell Regulatory multi-omics with Deep Mitochondrial mutation profiling (ReDeeM), to observe the genetic differences between HSC clones³. This algorithm uses mtDNA as a barcode that allows the lineage of HSCs to be traced by comparing shared mutations across thousands of HSCs. It is optimized to capture most of the mtDNA in the cell and to sequence this DNA with a coverage rate three-fold higher than existing methods, while simultaneously sequencing other cell components (small RNA molecules and chromatin) to determine the cell state.

But how does mtDNA act as a barcode for single-cell DNA sequencing analyses? MtDNA is a small genome with a high number of copies per cell, making it relatively easy to obtain full coverage of its sequence⁴. It also acquires random somatic mutations at a high rate during cell division⁵, so the mtDNA in each daughter cell both resembles that of its parent cell and harbors unique mutations. Cell lineage can be determined by tracing back mutations through their similarities and differences, using the mtDNA start and end positions as a unique identifier^3,6.
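The barcode idea can be sketched in a few lines. The example below is a toy illustration and not the ReDeeM pipeline: the cell names and mutation calls are invented, and Jaccard similarity is an assumed, simple stand-in for whatever distance the authors actually use. It scores how related two cells are by the fraction of mtDNA mutations they share.

```python
def jaccard(a: set, b: set) -> float:
    """Shared mutations divided by all mutations seen in either cell."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical mtDNA mutation calls per cell (position and base change).
cells = {
    "cell_A": {"m.1000A>G", "m.2500C>T", "m.7300G>A"},
    "cell_B": {"m.1000A>G", "m.2500C>T", "m.9100T>C"},
    "cell_C": {"m.4200G>A", "m.15000C>T"},
}


def closest_relative(name: str, cells: dict) -> str:
    """Return the other cell sharing the largest fraction of mutations."""
    others = (other for other in cells if other != name)
    return max(others, key=lambda other: jaccard(cells[name], cells[other]))


similarity_ab = jaccard(cells["cell_A"], cells["cell_B"])
similarity_ac = jaccard(cells["cell_A"], cells["cell_C"])

print(f"A vs B: {similarity_ab:.2f}")
print(f"A vs C: {similarity_ac:.2f}")
print("closest relative of cell_A:", closest_relative("cell_A", cells))
```

Here cell_A and cell_B share two inherited mutations and each carries one private mutation, so they would be placed close together on a lineage tree, whereas cell_C shares none and would sit on a separate branch.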

To create the benchmark for ReDeeM, the authors sequenced over 7000 HSCs from the bone marrow of a healthy young donor³. The mtDNA profiles were examined and found to be a reliable measure of cell-cell connectedness, as most cells could be traced to a common ancestor based on their mutation profiles³. Further analysis showed that most cells could be mapped to a distinct parent clone with an accuracy of over 80%, supporting the accuracy of the ReDeeM algorithm (Fig. 2). By examining the mtDNA mutations of HSCs and comparing them to one another using ReDeeM, the authors were able to trace each cell to its ancestor via a phylogenetic tree, creating a ‘family tree’ of cells, which were further clustered with a neural network³. Numerous clonal profiles were identified within each of these clusters, suggesting that HSC clones comprise several sub-clones with distinct differences in output and outcome. These findings point to greater heterogeneity in clonal architecture than previously described in the literature.

**Figure 2**: Clonal assignment of HSCs (by SCAVENGE-L analysis). The diagram shows clonally resolved HSCs as well as committed cells that can be traced accurately to the parent HSC clones. Adapted from Weng et al., 2024³.

Further analysis using the algorithm was carried out on HSCs derived from the bone marrow of two young donors. This revealed that many HSCs showed a bias towards developing into a particular cell type, referred to as a lineage bias³. Differences in gene expression were found to occur at the points in the lineage where cell differentiation occurred (see Fig. 1), as well as across sub-clones, confirming more heterogeneity in HSC function than previously known². These differences persisted when sampling and sequencing were repeated at 4 months, suggesting that the patterns do not fluctuate but rather are sustained over a long period in the individual. Using ReDeeM, it may be possible for researchers to investigate diseases associated with stem cells years before they manifest, as the mutational profiles of subclones can be accurately characterized and then monitored over time for signs of pathogenicity.

Can this model help us understand how blood cells behave with increasing age, and why the incidence of blood cancers increases with age? To investigate how these clonal differences change with age, Weng et al. used ReDeeM to delineate the profiles of HSCs in older donors. They noted a significantly higher load of mtDNA mutations in cells from older donors than in the younger ones, and the HSCs of older individuals were found to be biased to differentiate to lymphoid cells. These findings could begin to explain the increased risk of lymphoid leukemias with age⁷, and confirm that ReDeeM is well-suited to investigate cancer cells and identify driver mutations for leukemias, as well as potentially identify actionable mutations for cancer prevention and therapy.

This revolutionary algorithm is not without limitations. The finding that HSCs have distinct clonal and sub-clonal patterns means that it would be necessary to sample several parts of the bone marrow to obtain a true understanding of these patterns. Further studies using mapped regions of the bone marrow and repeated sampling would provide a more comprehensive analysis. Nevertheless, ReDeeM could potentially be useful in various fields, such as preventive monitoring of individuals at risk for cancer to identify driver mutations, or monitoring response to gene therapy at the single-cell level⁸. As such, further research is essential to fine-tune the algorithm.

Profiling of HSCs using ReDeeM will provide scientists with a clonal and sub-clonal atlas that is indispensable in obtaining a full picture of hematopoiesis. It is set to be a game-changer in studying other processes such as carcinogenesis (origins of cancer cells), cellular regeneration, aging and more.

References

1. Glauche, I., Bystrykh, L., Eaves, C. & Roeder, I. Stem cell clonality — Theoretical concepts, experimental techniques, and clinical challenges. Blood Cells, Molecules, and Diseases 50, 232-240 (2013).

2. Liggett, L. A. & Sankaran, V. G. Unraveling Hematopoiesis through the Lens of Genomics. Cell 182, 1384-1400 (2020).

3. Weng, C. et al. Deciphering cell states and genealogies of human hematopoiesis. Nature (2024).

4. Chial, H. & Craig, J. mtDNA and mitochondrial diseases. Nature Education 1, 217 (2008).

5. Kang, E. et al. Age-related accumulation of somatic mitochondrial DNA mutations in adult-derived human iPSCs. Cell stem cell 18, 625-636 (2016).

6. Ludwig, L. S. et al. Lineage Tracing in Humans Enabled by Mitochondrial Mutations and Single-Cell Genomics. Cell 176, 1325-1339.e22 (2019).

7. Siegel, R., Ma, J., Zou, Z. & Jemal, A. Cancer statistics, 2014. CA Cancer. J. Clin. 64, 9-29 (2014).

8. Ferrari, S. et al. Genetic engineering meets hematopoietic stem cell biology for next-generation gene therapy. Cell Stem Cell 30, 549-570 (2023).