Deciphering Cytogenetic Clues for Survival Predictions for Post-Transplant Pediatric Acute Myeloid Leukemia Patients

C’airah Ceolin

Cytogenetic aberrations identified during initial diagnosis of pediatric acute myeloid leukemia reveal a predictive ability for determining the overall survival of post hematopoietic stem cell transplant patients. 

Cytogenetic abnormalities can cause gene deletions and rearrangements that alter the expression of essential genes, resulting in tumor-promoting mechanisms and malfunctioning tumor-suppressing mechanisms1. Because of this, we can decipher clues from the type of abnormality to predict a patient’s response to treatment. In this case, we can predict the overall survival of pediatric acute myeloid leukemia (pAML) patients following an allogeneic hematopoietic stem cell transplant (HSCT). pAML has many etiologies and is characterized by abnormal blood cells produced by the bone marrow2. Allogeneic HSCT is the process of receiving stem cells from a donor and is performed for pAML patients who do not respond successfully to chemotherapy, in an attempt to rescue and replace the patient’s stem cells. Approximately 70% of pAML patients become leukemia-free survivors3. However, pAML patients who receive HSCT can have cytogenetic aberrations categorized as poor risk (PR) because they carry a greater risk of poor survival and relapse, reducing overall survival to 50%4. By analyzing several types of chromosomal aberrations, Sharma et al. provide insight into the clues these aberrations retain about the prognosis of pAML patients from the time of diagnosis to post-HSCT, while sparking potential new investigations into HSCT donor-recipient compatibility. These findings not only bring us a step closer to a potential framework of pAML cytogenetic aberrations for determining patient survival outcomes but can also lead to improved pediatric care and treatment planning.

Four PR cytogenetic abnormality subtypes were used to categorize pAML patients prior to HSCT: (a) partial or complete loss of chromosome 5 or chromosome 7, (b) 11q23 abnormalities, (c) complex or monosomal karyotype, and (d) other PR cytogenetic abnormalities5 (Figure 1). Following allogeneic HSCT, these subtypes retained prognostic value for pAML patients5. Partial or complete loss of chromosome 5 or chromosome 7 predicted poor overall survival, whereas 11q23 abnormalities and other PR cytogenetic abnormalities predicted favourable overall survival4. Other PR cytogenetic abnormalities conferred a reduced relapse incidence, improving overall survival and leukemia-free survival following allogeneic HSCT. Some 11q23 aberrations had more favourable outcomes than others, suggesting that the outcome depends on the genes involved in the rearrangements5. In general, however, this subtype had greater overall survival5. It was concluded that cytogenetic abnormalities retain prognostic value in pAML patients following allogeneic HSCT.

This prognostic value for newly diagnosed pAML patients prior to treatment is well studied6, but it has also been suggested that PR abnormalities have no prognostic value7. One major candidate explanation for the conflicting evidence is protocol-dependent differences7. Differences in methods of collecting patient data, inclusion criteria, cytogenetic subtype stratification, and HSCT protocols could ultimately explain the diverse evidence for cytogenetic prognostic predictability.

Figure 1. The cytogenetic risk subtype stratifications with the associated prediction of the cytogenetic aberration following HSCT. Red circles and markings on chromosomes represent full chromosomal deletions (monosomy) or translocations, respectively. The banding on each chromosome is unique and represents specific regions of the chromosomes that can be used to detect chromosomal abnormalities. (A) Monosomy is the loss of one chromosome of a pair, and a q arm deletion occurs on the longer arm of the chromosome. (B) This subtype includes, but is not limited to, translocation of regions q27 of chromosome 6 and q23 of chromosome 11 (t(6;11)(q27;q23)) and translocation of regions q23 of chromosome 11 and q25 of chromosome 17 (t(11;17)(q23;q25))5. (C) A complex karyotype is defined as at least three structural abnormalities, and a monosomal karyotype as at least two monosomies among the autosomes (chromosomes 1 to 22)5. (D) Chromosomal aberrations that were not placed in the first three subtypes but are considered PR, including but not limited to translocation of chromosomes 9 and 22 (t(9;22)), deletion of the chromosome 12 short arm (del(12p)) and translocation of chromosomes 6 and 9 (t(6;9))5. Image designed with BioRender.com.
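The stratification summarized in Figure 1 can be read as a small set of rules. The sketch below is illustrative only: the Karyotype fields, the function name and, in particular, the order in which the rules are checked are assumptions made for this example and are not taken from Sharma et al.

```python
from dataclasses import dataclass


@dataclass
class Karyotype:
    """Hypothetical, simplified encoding of a diagnostic karyotype."""
    lost_chromosomes: frozenset = frozenset()  # autosomes (1-22) lost entirely
    partial_loss_5_or_7: bool = False          # e.g. a q arm deletion on 5 or 7
    has_11q23: bool = False                    # e.g. t(6;11)(q27;q23)
    n_structural: int = 0                      # number of structural abnormalities
    other_poor_risk: bool = False              # e.g. t(9;22), del(12p), t(6;9)


def classify_subtype(k):
    """Return subtype 'a'-'d', or None; the precedence of rules is an assumption."""
    if k.partial_loss_5_or_7 or {5, 7} & k.lost_chromosomes:
        return "a"  # partial or complete loss of chromosome 5 or 7
    if k.has_11q23:
        return "b"  # 11q23 abnormalities
    if k.n_structural >= 3 or len(k.lost_chromosomes) >= 2:
        return "c"  # complex (>= 3 structural) or monosomal (>= 2 monosomies)
    if k.other_poor_risk:
        return "d"  # other poor-risk abnormalities
    return None


# Example: a karyotype with monosomy 7 falls into subtype (a).
subtype = classify_subtype(Karyotype(lost_chromosomes=frozenset({7})))
```

A patient can meet more than one rule (for example, monosomy 7 within a complex karyotype), so any real implementation would need the precedence defined by the study protocol.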

In addition to the cytogenetic predictive factor, differences were identified between pAML patients who received an HSCT from a matched related donor and those who received one from an unrelated donor5. Patients who received an HSCT from an unrelated donor had a reduced frequency of relapse compared to patients who had a matched related donor, suggesting a potential non-familial donor protective effect4. It was hypothesized that unrelated donors may provide a greater immunotherapeutic effect, allowing the HSCT to target the leukemia cells more efficiently5. Related matched donors are typically preferred and may be more easily identified. However, the immune response initiated by related donors could fail to provide the immune response needed to target the leukemia cells and reduce the risk of relapse.

While this study adds to the evidence for cytogenetic abnormalities as predictive factors, questions remain regarding the inclusion criteria. Pediatric patients were considered based on cytogenetic aberrations regardless of the AML origin (de novo, secondary or therapy-related AML)5. This can potentially skew results, as secondary or therapy-related AML carries a heightened risk of poor outcomes5. This type of AML can arise after chemotherapy or radiation treatment for a previous blood-related disease, as a result of alterations to the patient’s DNA2. Because of this increased risk, patients diagnosed with secondary or therapy-related AML are commonly excluded from these retrospective end point studies5. A significant portion of the partial or complete loss of chromosome 5 or chromosome 7 subtype consisted of secondary AML patients, and their heightened risk is thought to have potentially contributed to the increase in poor outcomes5. However, considering that other studies have reported the same alteration subtype as predictive of poor outcomes, the presence of secondary or therapy-related AML patients may not have had such a significant effect8,9.

Sharma et al. highlight the potential of cytogenetic aberrations to predict outcomes following HSCT for pAML patients. Future studies should examine to what extent the previously mentioned protocol-dependent factors contribute to variability in results, in order to confirm the predictive value of cytogenetic aberrations. Creation of standardized protocols may aid in solidifying cytogenetic risk subtype stratification to accurately predict patient outcomes, potentially leading to a prognostic predictive framework for pAML. Investigation of the related versus unrelated donor-recipient hypothesis would be advantageous when finding compatible HSCT donors for pAML patients. The range of cytogenetic aberrations that contribute to pAML can make it challenging to categorize the severity of an aberration. However, Sharma et al. provide novel insight into the retained predictive value of chromosomal abnormalities for pAML patient survival from the initial diagnosis to after HSCT, while forming a novel hypothesis that can lead to investigations regarding the potential protective effect of related versus unrelated donor-recipient compatibility5.

References  

 1.        de Oliveira Lisboa, M., Brofman, P. R. S., Schmid-Braz, A. T., Rangel-Pozzo, A. & Mai, S. Chromosomal Instability in Acute Myeloid Leukemia. Cancers (Basel) 13, 2655 (2021).

2.         PDQ® Pediatric Treatment Editorial Board. PDQ Childhood Acute Myeloid Leukemia/Other Myeloid Malignancies Treatment. MD: National Cancer Institute https://www.cancer.gov/types/leukemia/patient/child-aml-treatment-pdq. (2022).

3.         Rubnitz, J. E. & Kaspers, G. J. L. How I treat pediatric acute myeloid leukemia. Blood 138, 1009–1018 (2021).

4.         Rubnitz, J. E. Current Management of Childhood Acute Myeloid Leukemia. Pediatr Drugs 19, 1–10 (2017).

5.         Sharma, A. et al. Cytogenetic abnormalities predict survival after allogeneic hematopoietic stem cell transplantation for pediatric acute myeloid leukemia: a PDWP/EBMT study. Bone Marrow Transplant (2024) doi:10.1038/s41409-024-02197-3.

6.         Quessada, J. et al. Cytogenetics of Pediatric Acute Myeloid Leukemia: A Review of the Current Knowledge. Genes 12, 924 (2021).

7.         Alloin, A.-L. et al. Cytogenetics and outcome of allogeneic transplantation in first remission of acute myeloid leukemia: the French pediatric experience. Bone Marrow Transplant 52, 516–521 (2017).

8.         Ogawa, H. et al. Impact of Cytogenetics on Outcome of Stem Cell Transplantation for Acute Myeloid Leukemia in First Remission: A Large-Scale Retrospective Analysis of Data from the Japan Society for Hematopoietic Cell Transplantation. International Journal of Hematology 79, 495–500 (2004).

9.         Grimwade, D. et al. Refinement of cytogenetic classification in acute myeloid leukemia: determination of prognostic significance of rare recurring chromosomal abnormalities among 5876 younger adult patients treated in the United Kingdom Medical Research Council trials. Blood 116, 354–365 (2010).

Navigating the genetic interplay of the gut microbiome and blood types

Monica R. Chacón Grijalva

A genome-wide association study (GWAS) of gut microbial genetic variants and human genomic variants uncovers a significant association between the abundance of GalNAc microbial genes and individuals with blood type A or AB. This reveals the potential impact of host genetics on human health via the gut microbiome.

Structural variations (SVs) in human gut microbiome genomes are segments of different lengths, potentially encompassing multiple genes, that are either absent in some genomes (deletion SVs, dSVs) or present in variable copy numbers (variable SVs, vSVs). Gut bacterial SVs are commonly present across individuals, and associations between gut bacteria and risk factors for human disease have been previously found1,2. Nevertheless, studies of gut microbiome SVs are still needed to understand the consequences of genetic variation in the gut bacteria and their connection with host genomes. The lack of existing studies is what led Zhernakova et al. to search, on a genome-wide scale, for significant associations between genetic SVs of the gut microbiome and genetic variants in the human genome3. Investigating these associations in larger cohorts and characterizing their functional roles can help identify the complex effects of the gut microbiome on human health.

The authors first analyzed associations between bacterial SVs and human single nucleotide polymorphisms (SNPs) in 9,015 Dutch individuals, with an additional 279 Tanzanian individuals used to replicate the results in a more diverse background3. By mapping bacterial sequence reads to reference genomes, they identified 3,552 SVs in 49 bacterial species (Figure 1). They then tested for associations between these SVs and human SNPs. The most significant associations were narrowed down to SNPs of the ABO gene and a 2-kilobase (kb) dSV in the genomic region 577-579-kb of F. prausnitzii3. This finding guided further analyses of F. prausnitzii’s SVs and their association with ABO.

Figure 1. GWAS study workflow of gut microbial SVs and host SNPs. A) Cohorts used for the study, 9,015 Dutch individuals and 279 Tanzanian individuals. B) Sample collection from gut microbiome for metagenomic sequencing and blood sample collection for human SNP genotyping is commonly performed, however, in this study data was already available. C) SGV-Finder1 methodology, starting by mapping bacterial gut sequence reads to reference genomes and detecting vSVs and dSVs. D) & E) Plots of associations between human SNP genotypes and microbial SVs abundance and presence rates. F) Manhattan plot of microbial SVs associations with human SNP genotype on a whole-genome scale. Adapted from Zhernakova et al.3

ABO encodes an enzyme that modifies oligosaccharides on the surface of cells to form A-antigens or B-antigens, and these small molecules act as cellular surface markers4. Variations in the ABO gene determine the different blood type phenotypes of individuals4. Moreover, the H-antigen synthesized by the fucosyltransferase 2 (FUT2) enzyme, encoded by FUT2, is the precursor of the A- and B-antigens present on mucosa or in secretions, and individuals with a functional copy of this gene are referred to as FUT2 secretors5. After narrowing down significant associations, Zhernakova et al. observed that the 577-579-kb dSV showed a higher frequency in individuals with blood groups A or AB who were positive FUT2 secretors3. This demonstrated a strong association between this bacterial genomic region and the A-antigen. The identification of this region formed the basis for the subsequent experiments, which aimed to uncover bacterial genes of significance in individuals with the A-antigen.
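The kind of comparison described above (how often the bacterial region is found in one group of individuals versus another) can be illustrated with presence rates and an odds ratio from a two-by-two table. This is a minimal sketch: the function names are hypothetical, the counts are made up for illustration, and it is not the statistical model used by Zhernakova et al.

```python
def presence_rate(present, total):
    """Fraction of individuals in whom the bacterial region was detected."""
    return present / total


def odds_ratio(group_present, group_absent, other_present, other_absent):
    """Odds of carrying the region in one group relative to the other."""
    return (group_present * other_absent) / (group_absent * other_present)


# Made-up counts, for illustration only (not data from the study).
a_present, a_absent = 60, 40          # individuals with mucosal A-antigen
other_present, other_absent = 30, 70  # all other individuals

rate_a = presence_rate(a_present, a_present + a_absent)
rate_other = presence_rate(other_present, other_present + other_absent)
ratio = odds_ratio(a_present, a_absent, other_present, other_absent)
```

An odds ratio above 1 would indicate that the region is found more often in the first group; a genome-wide analysis repeats a test of this kind across many variants and must correct for multiple testing.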

Whole-genome sequencing was carried out on F. prausnitzii strains, and a 23-kb deletion overlapping the 577-579-kb dSV was found in most strains3. Notably, gene characterization within this 23-kb region identified genes involved in the metabolism of N-acetylgalactosamine (GalNAc), the terminal carbohydrate of the A-antigen6. One of the genes identified, GH109, is responsible for the cleavage of GalNAc from the A-antigen when it is secreted into mucus, after which GalNAc is used as an energy source by bacteria (Figure 2)3,6,7. In growth experiments in medium with GalNAc as the sole carbohydrate source, F. prausnitzii strains lacking the 23-kb region showed no growth3. This demonstrated that the 23-kb genomic region is important for bacterial utilization of GalNAc as a carbohydrate source for growth. It also highlights the potential importance of this genomic region within our gut bacteria, which might play a role in future therapeutic strategies. Additional studies are therefore necessary to explore its full potential.

Figure 2. GalNAc metabolism pathway in human gut bacterial cells. Step 0: Cleavage of GalNAc from the A-antigen by GH109. Step 1: Transmembrane uptake of GalNAc (orange square) by the phosphotransferase system (PTS) and phosphorylation to GalNAc6P. Step 2: Hydrolysis of GalNAc6P to GalN6P (orange and white square). Step 3: Isomerization and deamination of GalN6P to form D-tagatose 6-phosphate (T6P) (orange pentagon). Step 4: Phosphorylation of T6P to tagatose 1,6-bisphosphate (TBP). Step 5: Cleavage of TBP by the aldolase subunits gatY-kbaY or gatZ-kbaZ into the final products D-glyceraldehyde 3-phosphate (GAP) and glycerone phosphate (DHAP), which serve as carbohydrate sources. Adapted from Zhernakova et al.3

The authors also investigated the abundance of GalNAc genes in the overall microbiome community in positive FUT2 secretors with A or AB blood type, referred to as individuals with mucosal A-antigen. As expected, individuals with mucosal A-antigen were found to have a greater overall abundance of GalNAc genes3. This demonstrated the strong relationship between the presence of mucosal A-antigen and gut bacteria containing GalNAc genes. Additionally, the authors tested whether the abundance of gut bacterial GalNAc genes was associated with different health parameters in individuals with and without mucosal A-antigen. This analysis revealed strong associations between bacterial GalNAc genes and health parameters such as blood glucose level only in individuals with mucosal A-antigen3. Thus, the authors concluded that GalNAc gene abundance may be an important health factor in individuals with A or AB blood types and mucosal A-antigen. Furthermore, this indicates that bacterial GalNAc genes might not be as important for other blood types, such as B or O. However, this warrants further studies on the potential effects of the absence of GalNAc genes in such blood types. Other studies that focus on strong associations with B or O blood types only might uncover a different set of significant bacterial genes.

The study by Zhernakova et al. revealed GalNAc genes to be essential for the growth of F. prausnitzii on GalNAc and linked the abundance of these genes with A or AB blood types in positive FUT2 secretors, suggesting that ABO genotypes can influence gut microbial abundances. This interconnection sheds light on a potential mechanism by which ABO genotypes can exert an effect on human health through the gut microbiome3. Future GWAS may benefit from linking significant gut bacterial SVs with individuals from more diverse genetic backgrounds so that different associations may be discovered. In addition, further studies of how gut microbiome genome variants affect overall microbial diversity, and how they relate to different human genotypes, will help describe their role in human health. Overall, future clinical investigations will benefit from a more comprehensive gene characterization of the gut microbiome and its linkage with individuals’ genomes, in the hope of developing strategies for disease prevention and treatment.

References

1.        Zeevi, D. et al. Structural variation in the gut microbiome associates with host health. Nature 568, 43–48 (2019).

2.        Shoemaker, W. R., Chen, D. & Garud, N. R. Comparative Population Genetics in the Human Gut Microbiome. Genome Biol Evol. 14, (2022).

3.        Zhernakova, D. V. et al. Host genetic regulation of human gut microbial structural variation. Nature 625, 813-821 (2024).

4.        Schenkel-Brunner, H. Blood Group Antigens. Comprehensive Glycoscience. 3, 343–372 (2007).

5.        Soejima, M. & Koda, Y. Survey and characterization of nonfunctional alleles of FUT2 in a database. Sci Rep. 11, 3186 (2021).

6.        Jajosky, R. P. et al. ABO blood group antigens and differential glycan expression: Perspective on the evolution of common human enzyme deficiencies. iScience 26, (2023).

7.        Rahfeld, P. et al. Prospecting for microbial α-N-acetylgalactosaminidases yields a new class of GH31 O-glycanase. J Biol Chem. 294, 16400-16415 (2019).

Custom-designing the mitochondrial genome using the newest base editors

Rushil Dua

Protein-only bacterial systems of base editing are showing promise in bringing about efficient A-to-G mitochondrial base edits.

Gene editing systems have been gaining prominence both within and outside scientific circles over the last decade, representing an up-and-coming era of genomic medicine. Many notable systems derive from the original CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats) system, which features a guide RNA and an endonuclease to induce double-stranded breaks at the region of interest1. One gene editing system to arise in recent years is the base editing system, which combines elements of the original Cas9 system with unique deaminating enzymes. Briefly, base editors work by chemically modifying a nucleotide; rather than repairing the newly modified nucleotide, cellular machinery must instead repair the complementary nucleotide, effectively converting the base pair2. While capable of altering the nuclear genome, standard base editing systems cannot address mutations in the mitochondrial genome (mtDNA), where up to 95% of pathogenic variants are point mutations3. Cho et al. report advancements in a new style of base-editing system, DdCBEs (DddA-derived cytosine base editors)4. Previously, these base editors were only capable of editing cytosines that immediately follow thymines in the 5’ to 3’ direction, which covers only 10% of the aforementioned point mutations4. Now, Cho et al. are advancing this technology to convert A to G, covering up to 43% of all pathogenic variants in human mtDNA4. With the advent of mtDNA-editing systems, researchers can hope to develop therapeutic methods for many mtDNA-linked diseases.

Cho et al. sought to investigate which DdCBE modifications would return the highest conversion frequency, that is, the frequency at which a nucleotide conversion is successful. To do so, they mixed and matched different DdCBE components to assess which combination would perform best. DdCBEs traditionally include a catalytically deficient, or split, DddAtox (a bacterial toxin), a transcription-activator-like effector (TALE; a bacterial protein) and a uracil glycosylase inhibitor (UGI)5. DddAtox is typically associated with cytosine deamination; however, it is also capable of “revealing” dsDNA to deaminases that might otherwise be unable to access it. TALEs are employed for sequence specificity and binding, and UGIs are used to inhibit cell repair machinery. Ultimately, the authors discovered a potent combination that produces conversions with high accuracy. Cho et al. examined both split and intact DddAtox for their editing accuracy and determined that split DddAtox is more effective at producing base conversions. This may be because the split toxin remains inactive until its halves associate at the target nucleotide, where the toxin is reformed and can enact its function, in contrast to the cytotoxicity of the complete DddAtox protein. In addition, TALEs were fused with deaminases, thereby named TALEDs, and custom-programmed to target individual genes in the human mitochondrial genome4. Further, the UGI was deemed unnecessary for function; Cho and associates excluded it and opted for an adenine deaminase (AD) that cooperates with the DddA toxin to deaminate dsDNA. Together, the reformed base editor comprised the DddAtox and the custom-designed sTALED (s referring to a split toxin), with an AD in place of the UGI (Figure 1). The A-to-G conversion frequency of this system was noted at 49%, with an accuracy of up to 99%. Importantly, little to no instability or toxicity was observed after the mitochondrial DNA was edited, indicating the technology’s reliability moving forward4.
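As a concrete reading of "conversion frequency", the sketch below counts the share of sequencing reads that show the edited base at the target position. The function name, the input format (one base call per read) and the choice of all covering reads as the denominator are assumptions made for illustration; Cho et al. may have computed the metric differently.

```python
from collections import Counter


def conversion_frequency(bases_at_target, edited="G"):
    """Fraction of reads showing the edited base at the target position."""
    if not bases_at_target:
        raise ValueError("no reads cover the target position")
    counts = Counter(bases_at_target)
    return counts[edited] / len(bases_at_target)


# Made-up read calls at one target adenine: 49 edited (G), 51 unedited (A).
reads = ["G"] * 49 + ["A"] * 51
frequency = conversion_frequency(reads)
```

Because each cell contains many copies of mtDNA, a frequency of this kind describes the fraction of edited copies in the sampled population rather than an all-or-nothing outcome per cell.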

Figure 1: sTALED without UGI. Visualisation of sTALED binding to target nucleotide, including a split DddA toxin and a missing UGI unit replaced with an AD unit. Adapted from 4. The catalytically deficient split DddAtox halves convene at the target nucleotide guided by the TALE repeats. In doing so, they enact their function, allowing AD to recognise dsDNA and deaminate the target adenine.

These new findings display massive clinical potential, suggesting that many mitochondrial/matrilineal diseases could soon be treated using TALED-based therapies. CRISPR-associated base editors have already made it to the clinical stage and TALED-based editors may shortly follow suit. Some diseases that may benefit from this new TALED base editing technology include “Leber hereditary optic neuropathy (LHON), mitochondrial encephalomyopathy, lactic acidosis, stroke-like episodes (MELAS), and Leigh syndrome”4. These diseases share a common mutation that is corrigible using the TALED editing system. Mitochondrial encephalomyopathy in particular, a disease which negatively impacts neurological development, is a relatively common mtDNA-linked disease with a rate of prevalence of 1 in 4000. Clinical advancements towards its treatment may have bountiful effects in improving the quality of life for individuals suffering from this disease6.

This technology, however, requires optimisation before it can gain traction. Cho et al. discuss the potential for off-target effects; though the rate of occurrence is low, approximately 0.02%, novel missense or frameshift mutations may arise if non-targeted nucleotides are inadvertently edited. Moreover, due to the nature of the TALED editor, roughly half of all pathogenic mtDNA single nucleotide variants (SNVs) are left unaddressed. Researchers may seek to develop base editing systems capable of modifying alternative or multiple nucleotides in the future. Though Cho et al. did explore the possibility of combining an A-to-G and a C-to-T editor into one, they found that its efficiency was far lower than that of a dedicated A-to-G editor, converting bases at a rate of only 10-15%. Lastly, it is crucial to consider the ethical implications of gene editing technologies, which may hinder their widespread use in the future. Concerning clinical use, many take issue with the potential to endorse eugenics or to bypass the informed consent of those who may unknowingly benefit from these technologies. Some are additionally unsure about the use of gene editing in animals and plants because of the potentially harmful ecological repercussions7.

Broadly, Cho et al. have demonstrated that efficient, high-frequency A-to-G editing of mtDNA is possible due to the sTALED-AD system. This development will certainly bear large clinical implications as its ability to treat mtDNA-linked diseases is continually refined.

References

  1. Redman, M., King, A., Watson, C. & King, D. What is CRISPR/cas9? Archives of disease in childhood – Education & Practice edition 101, 213–215 (2016).
  2. Rees, H. A. & Liu, D. R. Base editing: Precision chemistry on the genome and transcriptome of living cells. Nature Reviews Genetics 19, 770–788 (2018).
  3. Gammage, P. A., Moraes, C. T. & Minczuk, M. Mitochondrial Genome Engineering: The revolution may not be CRISPR-ized. Trends in Genetics 34, 101–110 (2018).
  4. Cho, S.-I. et al. Targeted A-to-G base editing in human mitochondrial DNA with programmable deaminases. Cell 185, (2022).
  5. Mok, B. Y. et al. A bacterial cytidine deaminase toxin enables CRISPR-free mitochondrial base editing. Nature 583, 631–637 (2020).
  6. Pia, S. & Lui, F. “MELAS Syndrome” in StatPearls (StatPearls Publishing, Treasure Island, 2024).
  7. Ayanoğlu, F. B., Elçi̇n, A. E. & Elçi̇n, Y. M. Bioethical issues in genome editing by CRISPR-Cas9 technology. Turkish Journal of Biology 44, 110–120 (2020).

Deconvoluting the genome with BIONIC: Biological Network Integration using Convolutions

Faizan Hasan

BIONIC is a new computational tool that applies a cutting-edge machine learning algorithm to genomic data to holistically represent genes.

In 2004, scientists around the world managed to completely sequence the human genome, transforming it from a small molecule on a microscope slide to a large text file containing different variations of the letters A, T, C and G1. This exciting achievement led to the discovery of tens of thousands of genes that interact with each other to create a complex puzzle. Understanding how each gene fits into this puzzle would have drastic implications for the field of medicine, allowing us to identify the root causes of cancer or complex genetic disorders such as Parkinson’s disease. However, this has proven to be more challenging than previously anticipated, as the function of many genes remains ambiguous. Simultaneous advances in computer science, and specifically in machine learning, have produced algorithms adept at analyzing large datasets, hence a marriage between the two fields was only natural. BIONIC (BIOlogical Network Integration using Convolutions) is arguably among the most cutting-edge machine learning tools in recent development and has distinguished itself through its unique unsupervised network integration approach2.

Bionic Network Integration

Biological data is often curated into specialized datasets, each capturing only one type of information. Each dataset provides information that is distinct yet essential: knowing where a gene is expressed does not tell us what proteins it interacts with and vice versa, yet both pieces of information can be essential for inferring gene function. BIONIC can condense the multi-faceted nature of biological data by integrating multiple datasets into one unified and holistic representation for each gene, called a gene feature vector2. This process, called network integration, takes advantage of a recently developed class of machine learning models called graph convolutional networks (GCNs)3. This allows it to outperform several existing network integration tools that either include too much noise in their output or only retain global relationships (such as a metabolic pathway) at the cost of local ones (such as which proteins form the complexes in the pathway)4,5.
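To make the idea of network integration more concrete, the sketch below applies one graph-convolution step of the kind introduced by Kipf and Welling3 to two toy networks over the same three genes and then averages the results into one vector per gene. It is a deliberately simplified illustration, not BIONIC itself: the function names, the fixed identity weights and the plain averaging are assumptions, whereas the real tool learns its parameters and combines networks in a more sophisticated way.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each gene keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation by node degree.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


def integrate(networks, features, weights):
    """Average the per-network embeddings into one feature vector per gene."""
    embeddings = [gcn_layer(adj, features, weights) for adj in networks]
    n, d = len(features), len(weights[0])
    return [[sum(e[i][k] for e in embeddings) / len(embeddings) for k in range(d)]
            for i in range(n)]


# Toy example: three genes, one-hot input features, untrained identity weights.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ppi = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # hypothetical network: genes 0 and 1 linked
coex = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # hypothetical network: genes 1 and 2 linked
vectors = integrate([ppi, coex], identity, identity)
```

In this toy case gene 1, which has a partner in each network, ends up with a vector that draws on both neighbours, which is the intuition behind combining several data types into a single representation.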

In the seminal BIONIC paper, the authors create feature vectors by integrating yeast gene-gene interaction data with gene co-expression data and protein-protein interaction data. BIONIC feature vectors proved to be much more accurate at predicting known gene functions than the three individual constituent datasets2. It is also interesting to note that BIONIC features were more accurate in predicting known protein complexes than the protein-protein interaction dataset, which is specifically designed to predict protein complexes (Figure 1)2. These results showcase how a unified and holistic gene feature vector can convey more information than specialized datasets containing only one type of information.

Figure 1) Venn diagram of accurately predicted protein complexes for each individual dataset and BIONIC integrated features. Numbers in brackets indicate total captured complexes for each methodology. PPI is the protein-protein interaction dataset, COEX is the gene co-expression dataset, GI is the genetic interaction dataset and BIONIC represents BIONIC feature vectors.

Unsupervised learning

Another important feature of BIONIC is that, unlike traditional approaches that often rely on supervised learning models, it employs an unsupervised learning approach to create gene features2. Supervised models must be trained on previously labeled data (such as annotated gene data), whereas an unsupervised learning approach does not require training data to be labeled. This can significantly broaden the scope of the analyses, as unlabeled data is much more common than labeled data. More importantly, using previously labeled data will bias the resulting gene features towards our current understanding of biology, which can reinforce already known gene functions rather than uncover new ones2. Using unlabeled data, on the other hand, allows BIONIC to discover new functions by analyzing inherent underlying patterns in the data while remaining unaware of any biological preconceptions. Unsupervised learning also ensures that the results are not corrupted by spuriously labeled data.

There might still be concerns regarding the validity of novel functional predictions made by a computer algorithm versus ones made by a trained biologist. However, BIONIC’s scalability allows it to analyze an abundance of biological data simultaneously, at scales large enough to represent the entire human genome. Although scalability is often an issue with unsupervised models, BIONIC’s GCNs were shown to scale not only with the number of genes in the datasets but also with the number of datasets integrated to form BIONIC feature vectors6,7.

Views

Nevertheless, concerns regarding functional predictions can only be thoroughly dismissed via functional studies that validate claims made by BIONIC. As BIONIC is a recent invention, there are no such studies currently published. Unfortunately, it is not common to see follow-up studies on discoveries made by computational tools, a trend exemplified by the abundance of genome-wide association study (GWAS) results that lack follow-up. This trend can be primarily attributed to the stark difference in cost between computational analyses and functional studies. Making gene function predictions using BIONIC feature vectors is essentially free of cost, whereas a follow-up study would require hiring multiple scientists, obtaining specialized equipment, conducting laboratory experiments and incurring various associated expenses.

Today, computational tools are not yet trusted enough to warrant large research grants for follow-up research on their claims. However, with recent developments such as BIONIC, computational tools continue to become more accurate, making it harder to dismiss these claims on methodological grounds. If such research continues at the same pace, we might one day stand on the precipice of a new world in which follow-up functional studies are no longer a requirement, significantly reducing the costs of genomic experiments and making it possible to gain a comprehensive understanding of the complex puzzle known as the human genome.

References

1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).

2. Forster, D. T. et al. BIONIC: biological network integration using convolutions. Nat. Methods 19, 1250–1261 (2022).

3. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In Proc. International Conference on Learning Representations (2017).

4. Malod-Dognin, N. et al. Towards a data-integrated cell. Nat. Commun. 10, 805 (2019).

5. Wang, P., Gao, L., Hu, Y. & Li, F. Feature related multi-view nonnegative matrix factorization for identifying conserved functional modules in multiple biological networks. BMC Bioinformatics 19, 394 (2018).

6. Grover, A. & Leskovec, J. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 855–864 (Association for Computing Machinery, New York, NY, USA, 2016). doi:10.1145/2939672.2939754.

7. Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 701–710 (2014). doi:10.1145/2623330.2623732.

Unravelling the Puzzle Pieces of Migraine Pathogenicity with PWAS and TWAS

Erin Hsue

The discovery of potential target genes for migraine is revealed through proteome-wide association studies and transcriptome-wide association studies, opening avenues for enhanced therapeutic approaches.

We all occasionally experience the mild, achy discomfort of a headache, whether it stems from poor posture or eyestrain. Migraine, however, is not just a simple headache: it is a complex neurological disorder characterized by recurrent, severe headaches accompanied by nausea, sensitivity to light, and aversion to loud sounds1. Although attacks are triggered by environmental factors, migraine is significantly influenced by genetics, with heritability estimated to be up to 57%2, and both the central nervous system (CNS) and vascular processes play pivotal roles in its pathophysiology1,3. Despite there being over a billion migraine sufferers across the globe1, existing therapeutic approaches fail to meet the needs of many patients, owing to a limited grasp of the disease etiology and the variability of migraine attacks among individuals. A recently published study by Li et al.4 sought to address this problem by performing an integrative analysis of proteome-wide association studies (PWAS) and transcriptome-wide association studies (TWAS). The researchers were able to pinpoint genes that play a role in migraine development, offering insights into its pathogenesis and potential therapeutic targets for this neurological disorder.

The current understanding of migraine is analogous to a puzzle set made up solely of blank pieces: we have pieces that fit together, but we don’t know how. Likewise, GWAS have identified genetic loci associated with migraine5, but it is unclear how they are connected to the disease. With TWAS and PWAS, researchers are working with puzzle pieces that reflect the picture of migraine pathogenesis; instead of blindly guessing how the pieces fit together, Li et al. are providing more context to solve the puzzle.

Specifically, TWAS integrates GWAS and gene expression datasets to identify gene-trait associations, investigating the correlation between the transcriptome and each gene locus to uncover potential causal genes6. For the gene expression data, Li et al. used novel joint-tissue imputation (JTI) prediction models for 17 tissues relevant to the pathophysiology of migraine. Fine-mapping Of CaUsal gene Sets (FOCUS) was then employed to estimate the probability that a specific genetic feature is causally associated with migraine risk (Figure 1b).

On the other hand, PWAS combines GWAS data with proteomic data to identify candidate genes associated with a given trait, linking genes and phenotypes through functional variation in proteins7. In this study, proteomic data were derived from plasma protein samples and from human brain samples profiled from the dorsolateral prefrontal cortex of post-mortem subjects4. The Functional Summary-based Imputation (FUSION) method, which prioritizes genetic variants with potential functional effects on gene expression, was then implemented (Figure 1c). Essentially, combining both methodologies serves as a filter that retains only the genes most likely to be causal for migraine.

Figure 1. The workflow of the integrative PWAS and TWAS approach conducted by Li et al. to identify five migraine causal genes. a) Genetic data from European migraine patients undergo GWAS to identify risk loci for migraine. The GWAS data were used in conjunction with TWAS and PWAS. b) JTI models based on GTEx v8 for 17 tissues (13 brain tissues, whole blood, and 3 vascular tissues) were used to conduct TWAS. The data then underwent the FOCUS method. c) Proteomes profiled from brain and plasma samples from individuals of European descent were used to conduct PWAS. The data then underwent the FUSION method.
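The filtering idea behind combining the two analyses can be sketched as a simple set intersection. This is a minimal illustration, not the pipeline used by Li et al.: the five shared genes are those reported in the study, whereas `GENE_X`, `GENE_Y` and `GENE_Z` are hypothetical placeholders for genes supported by only one analysis, and the function name is an assumption made for this example.

```python
def shared_candidates(twas_hits, pwas_hits):
    """Return the genes flagged by both analyses, sorted alphabetically."""
    return sorted(set(twas_hits) & set(pwas_hits))


# Placeholder inputs. GENE_X, GENE_Y and GENE_Z are hypothetical names
# standing in for genes supported by only one of the two analyses.
twas_hits = ["ICA1L", "TREX1", "STAT6", "UFL1", "B3GNT8", "GENE_X", "GENE_Y"]
pwas_hits = ["ICA1L", "TREX1", "STAT6", "UFL1", "B3GNT8", "GENE_Z"]

candidates = shared_candidates(twas_hits, pwas_hits)
print(candidates)
```

Only genes present in both lists survive, which is why the combined approach yields a shorter but better supported candidate list than either analysis alone.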

The puzzle becomes less daunting now that Li et al. have identified five genes (ICA1L, TREX1, STAT6, UFL1, and B3GNT8) that exhibited significant correlation with migraine in both the proteome and transcriptome4. These genes are expressed in numerous tissues with varying functions, which underlines the complex nature of the disorder. Remarkably, the location and function of their expression correspond to previous research on migraine pathogenesis. For instance, both immune responses and neuroinflammation have previously been reported to contribute to migraine pathogenesis8. Using cell-type-specific analysis after performing TWAS, the authors found UFL1 enrichment in oligodendrocytes and neurons, both of which play a role in neuroexcitatory signal regulation and protein modifications in migraine8,9. As such, it is possible that UFL1 can act as a target gene for antihistamines in migraine prevention and treatment. In the same vein, STAT6 enrichment was found in microglia and macrophages, immune cells found in both the CNS and arterial tissues. Activation of these cells may increase sensitization in central and peripheral regions, ultimately creating an inflammatory response and heightening migraine3,10. On that account, targeting STAT6 may decrease the duration and severity of migraine in patients. Amazingly, the puzzle pieces are no longer blank and now hold some insights into the picture of migraine pathogenesis.

Thanks to the findings of this study, functional investigations can now be conducted on these candidate genes, which would help piece together the puzzle of the biological pathways underlying migraine. Future studies may, for instance, explain the wide spectrum of symptoms experienced by people with migraine by examining specific variants or epigenetic modifications in the five genes. This could potentially result in more effective, targeted treatments that move beyond the conventional trial-and-error approach to dosage prescription and the adverse side effects associated with common migraine medications. Such research could give rise to personalized medications that reduce the frequency and severity of migraine attacks.

It is worth noting that these genes are not the only pieces of the migraine pathogenesis puzzle. After all, only 17 tissues were studied, which overlooks other potentially migraine-associated tissues and transcripts, including some expressed in the brain. Additionally, the proteomic data came exclusively from samples of European descent, providing a limited perspective on a prevalent disease that spans the globe. As such, it is possible that many other genes remain undiscovered, and we have a puzzle set with missing pieces.

Nonetheless, this study positions ICA1L, TREX1, STAT6, UFL1, and B3GNT8 as vital puzzle pieces for devising effective treatments for the many individuals around the world living with this neurological disease. Hopefully, migraine will one day be referred to as “just a headache”.

References

1. Gupta, J. & Gaurkar, S. S. Migraine: An Underestimated Neurological Condition Affecting Billions. Cureus (2022) doi:10.7759/cureus.28347.

2. Choquet, H. et al. New and sex-specific migraine susceptibility loci identified from a multiethnic genome-wide meta-analysis. Commun Biol 4, 864 (2021).

3. Ashina, M. Migraine. New England Journal of Medicine 383, 1866–1876 (2020).

4. Li, S. et al. Identifying causal genes for migraine by integrating the proteome and transcriptome. J Headache Pain 24, 111 (2023).

5. Hautakangas, H. et al. Genome-wide analysis of 102,084 migraine cases identifies 123 risk loci and subtype-specific risk alleles. Nat Genet 54, 152–160 (2022).

6. Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat Genet 51, 592–599 (2019).

7.        Brandes, N., Linial, N. & Linial, M. PWAS: proteome-wide association study—linking genes and phenotypes by functional variation in proteins. Genome Biol 21, 173 (2020).

8. Renthal, W. Localization of migraine susceptibility genes in human brain by single-cell RNA sequencing. Cephalalgia 38, 1976–1983 (2018).

9. Eising, E. et al. Involvement of astrocyte and oligodendrocyte gene sets in migraine. Cephalalgia 36, 640–647 (2016).

10. Rawji, K. S. et al. Immunosenescence of microglia and macrophages: impact on the ageing central nervous system. Brain 139, 653–661 (2016).

A New Era in Genetic Diagnosis: PhenoScore’s Integration of Artificial Intelligence and Facial Recognition

Kobe Huynh

Introducing PhenoScore, an artificial intelligence-driven model that combines facial recognition technology and clinical features to quantify phenotypic similarity in individuals with rare genetic diseases.

In a ground-breaking advance towards revolutionizing genetic diagnosis, Dingemans et al. have developed PhenoScore, an innovative artificial intelligence (AI)-based phenomics framework that integrates facial recognition technology with clinical features. This next-generation open-source tool quantifies phenotypic similarity in rare genetic diseases, allowing clinicians and researchers to diagnose and understand these diseases more accurately1. Unlike other algorithms, PhenoScore provides a comprehensive view of the patient’s phenotype by utilizing high-quality data directly from affected individuals1. This novel approach bridges the gap between AI and clinical genetics and has the potential to transform the landscape of healthcare surrounding genetic diseases.

Facial recognition technology is one of PhenoScore’s notable features, as it acknowledges the complex link between facial features and neurodevelopmental disorders. Neurodevelopmental disorders can affect both brain development and facial structure, so a considerable number of genetic disorders present with distinct facial features1. PhenoScore pioneers a unified AI-driven model that integrates facial and clinical features using a two-module system1. The first module automatically extracts facial features from 2D facial photographs, and the second module calculates the Human Phenotype Ontology (HPO)-based phenotypic similarity of individuals1. HPO is a standardized ontology that contains phenotypic information about genes, with over 12,000 terms describing phenotypic features2.

Figure 1a: The global workflow of training and construction of PhenoScore. A convolutional neural network, VGGFace2, extracts the facial features while the phenotypic similarity of the subjects and the controls is computed1. PhenoScore reports classification metrics such as the Brier score, area under the curve (AUC), and P-value to indicate how well it can distinguish the investigated phenotypic groups1. Additionally, facial heatmaps and visualisations of any key phenotypic traits are generated. 1b: The trained PhenoScore model tailored to a syndrome is applied to a subject with a variant of unknown significance (VUS). Input for PhenoScore is calculated using the phenotypic similarity and facial distance values. The output is a score that assesses whether the individual has that specific syndrome1.
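Of the metrics named in the caption, the Brier score is the simplest to illustrate: it is the mean squared difference between predicted probabilities and observed binary outcomes, so lower values are better. The sketch below uses made-up probabilities and labels that do not come from the PhenoScore paper, and the function name is chosen for this example only.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions for four individuals (1 = has the syndrome).
predicted = [0.9, 0.8, 0.3, 0.1]
observed = [1, 1, 0, 0]

score = brier_score(predicted, observed)
print(round(score, 4))
```

A classifier that always predicts 0.5 scores 0.25 on balanced data, while a perfect classifier scores 0, which gives a rough sense of scale when reading reported values.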

Traditional approaches in clinical genetics often rely on whole-exome sequencing and clinical phenotyping2. Their limitations include the substantial number of variants that amount to diagnostic noise1. PhenoScore instead introduces a quantitative phenotypic score that evaluates phenotypic similarity to help with the clinical interpretation of variants of unknown significance (VUS)1. The authors reported that PhenoScore classified 59% (13/22) of VUS in individuals with neurodevelopmental disorders1. The power of this technology lies in its ability to reclassify VUS, helping clinicians make informed decisions and tailor clinical prognosis for patients with rare neurodevelopmental diseases.

The framework’s ability to identify recognizable phenotypes for 37 out of 40 investigated syndromes marks a significant improvement over existing algorithms such as Phenomizer and LIRICAL3,4. Furthermore, PhenoScore requires only five individuals for satisfactory classification performance1. PhenoScore also successfully distinguished phenotypic subtypes caused by different variants within the same gene, supporting genotype-phenotype correlations. The authors used SATB1, SETBP1, and DEAF1 as examples of how the framework confirmed known phenotypic subgroups, demonstrating its value as a tool for detailed genotype-phenotype studies1. SATB1, SETBP1, and DEAF1 are all associated with distinct neurodevelopmental disorders and are known to have substantially different phenotypic subgroups6,7,8.

The introduction of PhenoScore has significant implications for clinical practice and research. With this technology, clinicians can improve the accuracy of genetic diagnosis, especially when phenotypes manifest as dysmorphic facial features. The model also has implications beyond clinical diagnosis, as it has extraordinary potential to contribute to personalized medicine and customized clinical prognosis. By leveraging AI and machine learning, this tool provides a promising outlook for the future of precision medicine, where treatment can be tailored to unique genetic and phenotypic patient profiles.

The introduction of PhenoScore opens new opportunities for the investigation of rare neurodevelopmental diseases and better prognosis for patients with genetic disorders. Facial recognition AI could assist in the early detection and screening of individuals who exhibit facial features associated with certain neurodevelopmental diseases. This early identification can lead to earlier medical intervention and support. In addition, using AI to monitor facial features over time could help track the progression of neurodevelopmental diseases, and such longitudinal data can help clinicians evaluate the effectiveness of treatments and interventions. Finally, PhenoScore’s open-source nature encourages collaboration and customization, allowing researchers and healthcare professionals to apply the framework beyond the scope of neurodevelopmental disorders.

Future research directions should look at implementing 3D image-based models in PhenoScore’s facial recognition technology. PhenoScore currently utilizes 2D facial photographs for its facial recognition technology1. It was recently discovered that 3D image-based models significantly outperform 2D image-based models in the context of face-based genetic syndrome diagnosis5. The 3D representations showed approximately 6% higher sensitivity than 2D representations5. Therefore, the use of 3D imaging in PhenoScore could greatly improve syndrome classification performance compared to current 2D representations.

It is important to note that data safety and privacy are common concerns surrounding AI, as data is crucial to any AI-driven system. As AI and precision medicine become increasingly linked, data concerning the genetic profiles, medical histories, and social circumstances of the population will be collected and integrated more frequently. In the case of PhenoScore, which relies on facial photographs, data protection should be a primary concern because such data is highly sensitive. It is therefore essential that a secure and tightly regulated environment is established for data storage, management, and exchange when PhenoScore is used. In a broader context, individuals’ concerns about data privacy are tightly linked to their trust in using AI-enabled services.

In conclusion, PhenoScore marks a substantial leap in the integration of AI and clinical genetics. By quantifying phenotypic similarity and offering an objective approach to genotype–phenotype studies, the framework emerges as a valuable asset in the diagnosis and understanding of rare genetic diseases1. The technology’s potential to enhance personalized medicine and clinical care puts it at the forefront of revolutionary breakthroughs in the field, heralding a new era in genetic diagnosis and research.

References

1. Dingemans, A. J. et al. PhenoScore quantifies phenotypic variation for rare genetic diseases by combining facial analysis with other clinical features using a machine-learning framework. Nature Genetics 55, 1598–1607 (2023).

2. De La Vega, F. M. et al. Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases. Genome Medicine 13, (2021).

3. Köhler, S. et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. The American Journal of Human Genetics 85, 457–464 (2009).

4. Robinson, P. N. et al. Interpretable clinical genomics with a likelihood ratio paradigm. The American Journal of Human Genetics 107, 403–417 (2020).

5. Bannister, J. J. et al. Comparing 2D and 3D representations for face-based genetic syndrome diagnosis. European Journal of Human Genetics 31, 1010–1016 (2023).

6. Cardo, L. F., de la Fuente, D. C. & Li, M. Impaired neurogenesis and neural progenitor fate choice in a human stem cell model of SETBP1 disorder. Molecular Autism 14, (2023).

7. Yu, Y. et al. Neurodevelopmental disorders and anti-epileptic treatment in a patient with a SATB1 mutation: A case report. Frontiers in Pediatrics 10, (2022).

8. McGee, S. R. et al. Expansion and mechanistic insights into de novo DEAF1 variants in DEAF1-associated neurodevelopmental disorders. Human Molecular Genetics 32, 386–401 (2022).

Increasing statistical power proves to be detrimental to analysis in genetic studies

Areeba Imran

Genotype imputation, while a high-quality concept in theory, does not hold up to a favourable standard of practice compared to results from next-generation sequencing (NGS) because it produces false negative results.

Genotype imputation is a statistical method geneticists use to estimate unknown genotypes1. It increases the power of studies because more genotypes are known1. The method is commonly used in genome-wide association studies (GWAS), which genotype roughly 100,000 to 1,000,000 variants for every individual included in the study1. Geneticists opt for genotype imputation because it can substitute for NGS and single-nucleotide polymorphism (SNP) arrays2. The most significant drawback of these two methods is that they require a large sample size to determine genotypes effectively2. Genotype imputation can use a smaller sample size because it estimates unknown genotypes by identifying haplotypes, sets of alleles inherited together from a single parent1.

Imputation begins by taking a genotyped population sample with unknown genotypes and comparing it to a reference panel of haplotypes that contains information on more genotypes than the sample1. Most geneticists use the HapMap Consortium database or the 1000 Genomes Project (1000GP) database as their reference panel because they contain information on haplotypes3. Next, regions of shared haplotype are identified, and the missing genotypes in the sample are filled in by copying the alleles from the matching reference haplotypes1 (Figure 1). Today, geneticists use dedicated programs to “impute” missing genotypes3. These programs are designed specifically for imputation analyses and give geneticists an uncertainty score for each genotype estimate3.

Figure 1: Genotype imputation on a random sample. A- The gray dots represent bases that are unknown. The study sample consists of two unrelated individuals. B- For each individual, the reference haplotypes are used to fill in bases that are unknown by locating stretches of bases that match the positions of the known bases. This is represented by the similar colouring seen in the study sample and the reference haplotypes. C- The imputed genotypes are denoted with lowercase letters, and the samples are shown side-by-side to identify similarities between them. Figure taken from1
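The copying step shown in Figure 1 can be sketched in a few lines of Python. This is a toy illustration under strong assumptions: haplotypes are short strings, `?` marks an untyped base, and the single best-matching reference haplotype is copied wholesale. Real imputation programs use probabilistic models and report uncertainty for each estimate, so the function and data below are hypothetical and not how any published tool works.

```python
def impute(study, reference_panel):
    """Fill '?' positions in `study` from the best-matching reference haplotype.

    Imputed bases are returned in lowercase, echoing panel C of Figure 1.
    """
    def matches(ref):
        # Count agreement only at positions that were typed in the study sample.
        return sum(s == r for s, r in zip(study, ref) if s != "?")

    best = max(reference_panel, key=matches)
    return "".join(r.lower() if s == "?" else s for s, r in zip(study, best))


# Hypothetical data: three reference haplotypes and one partially typed sample.
reference_panel = ["ACGTTAGC", "ATGACAGA", "GCGATAGC"]
study = "A?GT?AG?"

imputed = impute(study, reference_panel)
print(imputed)
```

Because the missing bases are copied from the reference, an allele that is absent or rare in the reference panel can never be recovered by this procedure, however common it is in the study population.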

Lau et al. investigated the problems associated with relying on imputation, especially in the context of disease-susceptibility variants. This is important to consider because GWAS rely heavily on imputation to find novel risk alleles and to infer additional SNPs, increasing the likelihood of identifying causal variants4. Lau et al. determined the accuracy of genotype imputation by examining three disease loci that they had reported in a previous study of type 2 diabetes (T2D) in Europeans and African Americans5. The following three loci associated with T2D were chosen because GWAS has yet to identify them6: ACTL7B, KCNK3 and TCF7L2.

Imputation accuracy was assessed against NGS data6. The sequenced sample used by Lau et al. included 92 individuals with a family history of T2D and 93 individuals without a history of T2D6. In total, NGS identified 23 significant SNPs associated with T2D in the three gene loci mentioned above6. Across three experiments, Lau et al. determined imputation accuracy by removing the 23 significant SNPs and imputing them using the remaining SNPs, whose sequences were known6. The removal of the SNPs from the data is referred to as masking.

Experiment 1 involved masking all 23 SNPs simultaneously and imputing them using the reference SNPs at low coverage; experiment 2 was the same except that imputation was performed at high coverage6. Finally, experiment 3 involved masking the 23 SNPs one at a time and imputing each with high-coverage reference SNPs6. The computational tool used for imputation analysis was IMPUTE2, and the haplotype reference panel was 1000GP6. In experiment 1, none of the SNPs were accurately imputed: across all three loci, 21 of the significant SNPs found through NGS were imputed as monomorphic (no difference compared to the reference), and the other 2 SNPs were imputed as monomorphic due to incorrect genotyping6. Results from experiment 2 were similar, as all but five of the significant SNPs were imputed as monomorphic6. Experiment 3 yielded the best results because only 3 of the significant SNPs were imputed as monomorphic, making the imputation results 90% similar to the NGS results6. These three SNPs were imputed as monomorphic because the risk and alternate alleles were incorrectly assigned, making the results inaccurate6.
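The difference between masking all SNPs at once and masking them one at a time can be illustrated with a toy evaluation loop. This is only a sketch: the haplotypes are invented and deliberately constructed so that the effect is visible, the best-match imputer is a simplification rather than IMPUTE2, and all function names are assumptions made for this example.

```python
def impute_site(sample, site, reference_panel):
    """Guess the allele at `site` from the reference haplotype that best
    matches `sample` at every other unmasked position."""
    def matches(ref):
        return sum(
            s == r
            for i, (s, r) in enumerate(zip(sample, ref))
            if i != site and s != "?"
        )
    return max(reference_panel, key=matches)[site]


def masked_accuracy(truth, sites, reference_panel, one_at_a_time):
    """Fraction of masked sites whose imputed allele equals the true allele."""
    all_masked = "".join("?" if i in sites else b for i, b in enumerate(truth))
    correct = 0
    for site in sites:
        # One-at-a-time masking keeps the other test sites visible as context.
        sample = truth if one_at_a_time else all_masked
        correct += impute_site(sample, site, reference_panel) == truth[site]
    return correct / len(sites)


# Hypothetical sequenced haplotype, masked positions and reference panel.
truth = "ACGTACG"
sites = [1, 3, 5]
reference_panel = ["ACGTACT", "ATGAAGG"]

simultaneous = masked_accuracy(truth, sites, reference_panel, one_at_a_time=False)
leave_one_out = masked_accuracy(truth, sites, reference_panel, one_at_a_time=True)
print(simultaneous, leave_one_out)
```

In this constructed case, masking every test site removes the context needed to pick the right reference haplotype, whereas masking one site at a time leaves the neighbouring sites available to guide the match.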

It is important to note that most GWAS do not perform genotype imputation using the experiment 3 model6. Furthermore, masking SNPs simultaneously differs from masking them one at a time6. To account for this, Lau et al. compared the haplotypes imputed in experiment 2 and the reference haplotypes from 1000GP with the study panel they constructed using NGS6. They found that some risk alleles were rarer in the reference panel6. This is concerning because it biases imputation against the risk allele, ultimately leading to false negative results6.

To improve genotype imputation, reference panels from different databases should be used so that disease-specific cases and controls can be matched to the reference panel, preventing risk alleles from being rarer in the reference6. This may be hard to achieve because implementing disease-specific reference panels would defeat the purpose of imputation (not having to genotype all SNPs)4. Genotype imputation is less harmful in the context of GWAS, which are known to have small predictive value4. Nonetheless, this statistical method should not be used in the context of disease association because it can lead to false negative results. This research shows that statistical methods come with limitations; to ensure their accuracy in routine use, they need to be updated as we learn more about genetics.

Genotype imputation should be used in genomic studies because its benefits are apparent, as it is a cost-effective alternative to NGS and SNP arrays and provides a tool to increase the coverage of the human genome. However, directly genotyped data should be utilized in the context of disease variants because we should not risk wrongly diagnosing patients.

References

  1. Li, Y., Willer, C., Sanna, S. & Abecasis, G. Genotype imputation. Annual Review of Genomics and Human Genetics 10, 387–406 (2009).
  2. Shi, S. et al. Comprehensive assessment of genotype imputation performance. Human Heredity 83, 107–116 (2018).
  3. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genetics 5, (2009).
  4. Ali, A. T., Liebert, A., Lau, W., Maniatis, N. & Swallow, D. M. The hazards of genotype imputation in chromosomal regions under selection: A case study using the lactase gene region. Annals of Human Genetics 86, 24–33 (2021).
  5. Lau, W., Andrew, T. & Maniatis, N. High-resolution genetic maps identify multiple type 2 diabetes loci at regulatory hotspots in African Americans and Europeans. The American Journal of Human Genetics 100, 803–816 (2017).
  6. Lau, W. et al. The hazards of genotype imputation when mapping disease susceptibility variants. Genome Biology 25, (2024).

Exploring Human Cancer Risk by Unraveling Enhancer-Gene Links

Azin Keshavarz

A new study suggests changing the cancer treatment approach by unraveling the complex network of non-coding genetic variants using the Activity-By-Contact model and emphasizing their role in regulating gene expression across various types of cancer, specifically colorectal cancer.

Genome-wide association studies (GWAS) have identified thousands of genetic variants statistically associated with human traits and diseases.1 However, a significant challenge has been mapping non-coding variants to their target genes. Over 90% of these variants are found in non-coding regions, such as enhancers, indicating their involvement in gene expression regulation. A recent groundbreaking study by Ying et al.2 presents a multi-omics approach, using the Activity-By-Contact model to connect enhancers to their target genes in 20 different cancer tissues, which offers a panoramic view of the regulatory dynamics in cancer genomes.

The Activity-By-Contact (ABC) model was first proposed by Fulco et al.3 in 2019 as a powerful tool that uses multi-omics data to establish regulatory maps across the genome. The model uncovers the complex landscape of enhancer-gene connections by considering each enhancer’s activity (measured by ATAC-seq and DNase-seq) and its physical contact frequency with target genes. Using H3K27ac ChIP-seq for histone modification data and Hi-C for chromatin interaction data, the ABC model creates a comprehensive map of enhancer-gene connections. Building upon this foundation, the study by Ying et al.2 bridges the theoretical basis of the ABC model with its practical application in cancer research (Fig. 1A).

Figure 1: Multi-Omics Characterization and ABC Score Analysis of Enhancer-Gene Connections across 20 Cancer Types. A: Multi-omics data (DNase-seq, ATAC-seq, H3K27ac ChIP-seq, and Hi-C) for 20 cancer types from different datasets were used. B: An ABC score was calculated for each enhancer-gene connection. The panel also illustrates that one enhancer can regulate the expression of many genes and that multiple enhancers can regulate a single gene. (Modified from ref. 2.)
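The ABC score is commonly described as an element’s activity multiplied by its contact frequency with a gene, divided by the sum of the same product over all candidate elements near that gene, with activity taken as the geometric mean of an accessibility signal and the H3K27ac signal. The sketch below assumes that form; the signal values and element names are hypothetical, and the exact normalisation and thresholds used by Ying et al. are not reproduced here.

```python
from math import sqrt


def activity(accessibility, h3k27ac):
    """Enhancer activity as the geometric mean of two read-count signals."""
    return sqrt(accessibility * h3k27ac)


def abc_scores(elements):
    """Map each element to its share of total activity x contact for one gene.

    `elements` maps a name to (accessibility, h3k27ac, contact_with_gene).
    """
    weighted = {
        name: activity(acc, k27) * contact
        for name, (acc, k27, contact) in elements.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no element has non-zero activity and contact")
    return {name: value / total for name, value in weighted.items()}


# Hypothetical signals for three candidate enhancers near one gene.
elements = {
    "enhancer_1": (16.0, 4.0, 0.5),  # activity 8, activity x contact 4.0
    "enhancer_2": (9.0, 1.0, 1.0),   # activity 3, activity x contact 3.0
    "enhancer_3": (4.0, 1.0, 0.5),   # activity 2, activity x contact 1.0
}

scores = abc_scores(elements)
print(scores)
```

The scores for one gene sum to 1, so each value can be read as the fraction of that gene’s regulatory input attributed to a given element. A strongly active enhancer with weak contact can therefore rank close to a weaker enhancer in frequent contact.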

In volume 14 of Nature Communications, Ying et al.2 mapped enhancer-gene connections using the ABC model and identified 544,849 connections across various cancer types. This approach has significant clinical potential beyond academic curiosity. Cancer development may be influenced by genetic risk factors, some of which can potentially be altered7. Understanding the molecular targets of known carcinogens is crucial for designing effective therapeutics, underscoring the significance of this research. The study reveals that the target genes influenced by ABC enhancers (enhancers found by the ABC tool) are closely linked to cancer signaling pathways, high mutation burden, immune infiltration, and pharmaceutical targets. By identifying regulatory variants, these findings point towards more effective therapeutic interventions.

Compared with other work in this field, what sets this paper apart is its scale and precision. Traditionally, connecting non-coding variants to their target genes relied on methods such as assigning the closest gene to each variant or utilizing expression quantitative trait loci (eQTL). However, the ABC model outperformed these approaches by integrating enhancer activity data, chromatin interaction frequencies, and multi-omics information, providing a more precise and comprehensive view of enhancer-gene connections.

The authors performed an in-depth analysis using the ABC approach in colorectal cancer (CRC) tissues, which uncovered more than 30,000 enhancer-gene connections. Colorectal cancer ranks as the fourth leading cause of cancer mortality and is the second most common malignancy globally6, which highlights its importance. The study revealed the regulatory variant rs4810856. This variant was associated with an increased risk of CRC and acted as an allele-specific enhancer, influencing the expression of PREX1, CSE1L, and STAU1. Together, these genes activate the p-AKT signaling pathway, driving CRC tumorigenesis. The p-AKT intracellular pathway plays a critical role in different cellular activities, including cell growth, proliferation, differentiation, and migration.4 Targeting the p-AKT pathway has been employed as a strategy for treating cancer.5 Therefore, implicating PREX1, CSE1L, and STAU1 as crucial target genes for the nominated causal variant rs4810856 opens avenues for potential therapeutic interventions.

According to the paper, we should shift our focus from one-to-one regulation (one gene controlled by one enhancer) to one-to-many regulation (one enhancer regulating multiple genes) when analyzing the genome. The authors found that each enhancer regulates about 2 genes and that each gene was predicted to be regulated by 2.5 ABC enhancers, on average (Fig. 1B). This challenges the traditional focus on single-gene regulation and highlights the complexity of interactions where one variant can influence multiple target genes. The functional relevance of these connections was substantiated through bioinformatics analysis and biological experiments.
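The two averages quoted above are computed over different denominators, which is why they need not agree. As a small worked example, the sketch below counts links in a hypothetical list of enhancer-gene pairs; the identifiers are invented and were chosen so that the result mirrors the reported figures, not taken from the study’s data.

```python
from collections import defaultdict


def mean_degrees(links):
    """Return (mean genes per enhancer, mean enhancers per gene)."""
    genes_by_enhancer = defaultdict(set)
    enhancers_by_gene = defaultdict(set)
    for enhancer, gene in links:
        genes_by_enhancer[enhancer].add(gene)
        enhancers_by_gene[gene].add(enhancer)
    if not genes_by_enhancer:
        raise ValueError("links must not be empty")
    n_links = sum(len(genes) for genes in genes_by_enhancer.values())
    return n_links / len(genes_by_enhancer), n_links / len(enhancers_by_gene)


# Hypothetical map: 10 links between 5 enhancers and 4 genes.
links = [
    ("E1", "G1"), ("E1", "G2"),
    ("E2", "G1"), ("E2", "G2"),
    ("E3", "G1"), ("E3", "G3"),
    ("E4", "G2"), ("E4", "G4"),
    ("E5", "G3"), ("E5", "G4"),
]

genes_per_enhancer, enhancers_per_gene = mean_degrees(links)
print(genes_per_enhancer, enhancers_per_gene)
```

The same 10 links are divided by 5 enhancers in one case and by 4 genes in the other, so a map with more distinct enhancers than genes yields a higher average number of enhancers per gene.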

The results show highly cancer-type-specific regulatory maps, with only 0.5% of enhancer-gene connections shared among pairs of cancer types, highlighting the unique regulatory landscape of each cancer. Together with the reliance on computational predictions and the need for experimental validation and functional studies, this cell-type specificity of cis-regulatory elements makes it challenging to translate findings from studies like this into clinical practice. Although this study created a reference of regulatory variants and their links to disease genes, it remains to be seen whether this approach, with its demonstrated superiority, will pave the way for personalized treatments that target specific enhancer-gene connections to combat cancer more effectively.

In conclusion, this integrated analysis reveals that ABC genes (genes regulated by ABC-found enhancers) significantly contribute to tumorigenesis. These genes emerge as potential biomarkers or therapeutic targets across various cancers, marking a notable advance in understanding the regulatory roles of non-coding variants in cancer genomes. The findings not only provide promising directions for personalized medicine but also deepen our insights into the genetic foundations of cancer. Standing at the intersection of genomics and cancer research, these revelations open exciting new avenues for exploration. The identified ABC genes present valuable targets for the development of precise and effective therapies, instilling hope for improved outcomes in cancer treatment. This contribution drives the ongoing evolution of personalized medicine, fundamentally transforming the landscape of cancer care and promising enhanced patient outcomes.

References

1 Uffelmann, E., et al. Genome-wide association studies. Nature Rev. Methods Primers 1, 59 (2021). https://doi.org/10.1038/s43586-021-00056-9.

2 Ying, P., et al. Genome-wide enhancer-gene regulatory maps link causal variants to target genes underlying human cancer risk. Nat. Commun. 14, 5958 (2023). https://doi.org/10.1038/s41467-023-41690-z

3 Fulco, C. P., et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

4 Świderska, E., Strycharz, J., Wróblewski, A., Szemraj, J., Drzewoski, J., & Śliwińska, A. Role of PI3K/AKT pathway in insulin-mediated glucose uptake. In Glucose Transport (IntechOpen, Rijeka, 2018).

5 Koveitypour, Z., et al. Signaling pathways involved in colorectal cancer progression. Cell Biosci 9, 97 (2019). https://doi.org/10.1186/s13578-019-0361-4.

6 Wong, A., & Ma, B. B. Personalizing therapy for colorectal cancer. Clin. Gastroenterol. Hepatol. 12, 139–144 (2014).

7 Wu, S., Powers, S., Zhu, W., & Hannun, Y. A. Substantial contribution of extrinsic risk factors to cancer development. Nature 529, 43–47 (2016).

The Most Recent Advancement in COVID Vaccine Development

Heidi Zhuoyan Li

Vaccination is one of the most effective preventive strategies to fight against circulating pathogens, especially during the COVID-19 pandemic. The authorities in Japan have approved the use of a newly developed self-amplifying RNA vaccine (ARCT-154, Arcturus Therapeutics), as a novel strategy to enhance stimulated immune responses with a lower dose.

As of February 2024, over 774 million COVID-19 (Coronavirus Disease-2019) cases have been reported1, and the number of deaths exceeds 7 million2. To combat the rapidly evolving virus, government regulators and pharmaceutical companies have invested enormous resources in developing effective vaccines. As the first approved self-amplifying RNA (saRNA) vaccine, ARCT-154 has drawn wide attention for its immunogenic potential at a lower dose3.

At present, more than 10 COVID vaccines, spanning 4 vaccine platforms, have been approved for emergency use4 (Table 1). Despite this progress, numerous challenges persist. We are in an ongoing race with pathogens; the rapid evolution of viruses necessitates rapid updates to vaccine formulations. Additionally, different vaccine platforms require different manufacturing systems, some of which pose challenges for production, transportation, and storage5. Moreover, many adverse effects are still under investigation and are difficult to fully mitigate4. There is an urgent need to develop novel vaccines that are easy to manufacture and store while maintaining efficacy and safety.

Table 1. COVID-19 vaccines that have been approved by WHO for emergency use4. Four different vaccine platforms, including RNA vaccines, adenovirus-vectored vaccines, protein-subunit vaccines, and inactivated vaccines, are detailed in the table, along with additional information on national records of achievement and dates of approval. Table taken from Firouzabadi et al.4.

Vaccine development seeks a balance between efficacy and safety, and the saRNA vaccine stands out for several notable advantages. An saRNA vaccine does not need to enter the nucleus, which eliminates the risk of integration into the host genome5. The self-amplifying property of the replicon RNA enables an exponential increase in RNA copies, resulting in amplified protein production from lower doses6 (Figure 1). Structurally, the saRNA vector contains non-structural protein regions and an antigen coding region6 (Figure 2). This relatively straightforward structure enables quick modular design and streamlined production5.

Figure 1. Simple schematic view illustrating how RNA-based vaccines are processed in the cytosol. The figure compares an mRNA vaccine and a self-amplifying replicon RNA vaccine, illustrating a simplified process in which the particle enters the cell and the gene of interest is then translated into the target protein. Figure taken from Comes et al.6.

Figure 2. Simple schematic illustration of the structure of self-amplifying RNA. The RNA contains multiple non-structural protein regions and one target antigen coding region. Figure taken from Comes et al.6.

Arcturus Therapeutics, in collaboration with CSL Seqirus, developed an saRNA vaccine based on the spike protein gene of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) D614G variant3. The vaccine has been approved for use in Japan, and its phase III clinical trial, reported by Oda et al., aimed to confirm the safety and immunogenicity of the vaccine as a booster shot3.

In the safety analysis, no deaths or severe adverse events were reported in participants who received ARCT-154, and most reported events were short-lived and mild3. Because COVID-19 vaccines have been associated with myocarditis and pericarditis, indicator symptoms (chest pain and shortness of breath) were monitored, but no relevant cases were found3.

In the immunogenicity analysis, neutralizing antibodies against Wuhan-Hu-1 and the Omicron BA.4/5 lineage were measured and compared between participants who received ARCT-154 and those who received Pfizer-BioNTech (BNT162b2). The results showed that age, gender, and previous vaccination history did not affect the immunogenicity of this saRNA vaccine3. Most importantly, the analysis suggests that this saRNA vaccine was able to induce a humoral immune response of the same magnitude as Pfizer-BioNTech (BNT162b2) with a 6-fold lower administered dose3.

The analyses described above yield meaningful data; however, certain limitations are apparent. In this clinical trial, data were collected only up to 29 days post vaccination3, so it remains unknown how the amplified protein product will affect the duration of the immune response. Another major limitation is that the humoral response against the currently circulating strain (Omicron XBB) was not tested3, which introduces uncertainty about the protective effect induced by the vaccine.

The immunogenicity analysis could also be broadened. Previous animal model studies suggest that saRNA vaccines are capable of stimulating both cellular and humoral responses6. However, despite the critical role of T lymphocyte-mediated responses during SARS-CoV-2 infection in vaccinated individuals7, the cellular immunity profile induced by ARCT-154 has not yet been characterized.

Adding another layer of complexity, studies have identified peptide sequences shared between human proteins and the virus spike protein8, and have demonstrated cross-reactivity between anti-SARS-CoV-2 spike protein antibodies and human tissue antigens abundant in many organs9. These studies raise an important question: could vaccines targeting the SARS-CoV-2 spike protein increase the risk of producing autoantibodies against human tissue antigens?

Future work should investigate the duration of the humoral response and whether a cellular response is induced. To address the source of adverse effects, it is essential to characterize the interaction between the vaccine and the host immune system. In addition, future safety analyses should confirm that the amplified protein product does not lead to undesirable effects.

To further optimize the vaccine, precision medicine could be incorporated into vaccine development by stratifying individuals according to genetic markers that affect vaccine efficacy. Current vaccines are designed on a “one-size-fits-all” basis. However, people with different genetic profiles may generate distinct responses against the same antigen, potentially contributing to the variation in vaccine efficacy. Accordingly, the vaccine formulation could be modified for the stratified populations, eventually enhancing vaccine efficacy for individuals with diverse genetic backgrounds.

Vaccination has saved countless lives, and success in vaccine development is invaluable. To overcome challenges in the process, the saRNA vaccine platform provides a unique approach with high scalability and affordability. As the field of medical genomics grows, we will develop a more comprehensive view of host-pathogen interaction. Consequently, the pathogen surveillance system may be further improved to accurately predict the evolutionary path of circulating pathogens, aiding in the design of effective vaccine formulas. At the same time, the genetic basis of heterogeneity in response to the vaccine can be studied using advanced genomic technologies, ultimately contributing to the optimization of the vaccine.

References

  1. COVID-19 cases | WHO COVID-19 dashboard. datadot https://data.who.int/dashboards/covid19/cases.
  2. COVID-19 deaths | WHO COVID-19 dashboard. datadot https://data.who.int/dashboards/covid19/cases.
  3. Oda, Y. et al. Immunogenicity and safety of a booster dose of a self-amplifying RNA COVID-19 vaccine (ARCT-154) versus BNT162b2 mRNA COVID-19 vaccine: a double-blind, multicentre, randomised, controlled, phase 3, non-inferiority trial. The Lancet Infectious Diseases S1473309923006503 (2023) doi:10.1016/S1473-3099(23)00650-3.
  4. Firouzabadi, N., Ghasemiyeh, P., Moradishooli, F. & Mohammadi-Samani, S. Update on the effectiveness of COVID-19 vaccines on different variants of SARS-CoV-2. International Immunopharmacology 117, 109968 (2023).
  5. Blakney, A. K., Ip, S. & Geall, A. J. An Update on Self-Amplifying mRNA Vaccine Development. Vaccines 9, 97 (2021).
  6. Comes, J. D. G., Pijlman, G. P. & Hick, T. A. H. Rise of the RNA machines – self-amplification in mRNA vaccine design. Trends in Biotechnology 41, 1417–1429 (2023).
  7. Painter, M. M. et al. Prior vaccination promotes early activation of memory T cells and enhances immune responses during SARS-CoV-2 breakthrough infection. Nat Immunol 24, 1711–1724 (2023).
  8. Kanduc, D. & Shoenfeld, Y. Molecular mimicry between SARS-CoV-2 spike glycoprotein and mammalian proteomes: implications for the vaccine. Immunol Res 68, 310–313 (2020).
  9. Vojdani, A. & Kharrazian, D. Potential antigenic cross-reactivity between SARS-CoV-2 and human tissue with a possible link to an increase in autoimmune diseases. Clinical Immunology 217, 108480 (2020).

New Regulators in DNA Damage Repair Lead to a New Spark in Treating Cancer and the Rise of Sequencing Analysis

Jasmine Li

Sequencing technologies and analysis reveal the perturbation of the p53 pathway in DAXX- and ATRX-null ALT cells, showing that the absence of DAXX-ATRX leads to decreased p53 chromatin binding and a weakened DNA damage response.

The DNA damage response detects and corrects different types of DNA damage that arise during the cell cycle and epigenetic modification1. A perturbed DNA damage response has long been recognized as an important factor in cancer development and treatment1. Tumor suppressor proteins help safeguard the repair process, but the relationships among some of these proteins have not yet been defined. Gulve et al. uncovered how tumor suppressor proteins regulate and interact with one another in alternative lengthening of telomeres (ALT) cells, knowledge that could inform the design of cancer treatments2.

Tumor suppressor proteins regulate cell division and apoptosis. Specifically, p53 is a tumor suppressor that prevents cells from proliferating rapidly3. Mutations of p53 have been found in many cancers, where they disrupt the DNA-binding domain or regulatory pathways4. Similarly, DAXX and ATRX are tumor suppressor proteins that suppress ALT in cancer2. The ALT pathway has been found in a variety of cancers and is caused by altered DNA repair pathways5,6. The “teamwork” of these proteins in ALT cells remained a mystery until Gulve and colleagues discovered that loss of DAXX-ATRX compromises p53 chromatin binding and the DNA damage response2.

To start the story, the authors treated the knockout and control cell lines with the DNA-damaging agent etoposide, which induced a p53 response and slowed cell proliferation. Using RNA-seq, they found that 933 genes in the DAXX_KO model and 1562 genes in the ATRX_KO model were less responsive to etoposide2. The control cell line showed a stronger p53 response than the knockout lines. Even with functional p53, loss of DAXX or ATRX can therefore still allow ALT cells to survive. The authors validated these findings with heatmaps and RT-qPCR, which showed that the genes with the most reduced response in the knockout lines were p53-response genes. The altered p53-response transcriptome in cells lacking DAXX and ATRX supports the importance of these two tumor suppressor proteins in the DNA damage response. But what about the chromatin binding pattern of p53 itself?

The authors used ChIP-seq to map the chromatin binding pattern of p53. Not surprisingly, p53 signals were reduced in cells lacking DAXX, meaning that there was less interaction between the protein and the DNA. As shown in Figure 1, p53 binds to DNA with the help of DAXX, ATRX, and histone H3.3. In the absence of DAXX-ATRX, p53 site accessibility is reduced and p53 fails to bind. Likewise, without DAXX-ATRX, reduced p53 binding leads to telomere DNA damage and, ultimately, telomere dysfunction. We now know that without DAXX, p53 binding is decreased, so p53 fails to stop unwanted cell proliferation and lengthening of the telomere.

The authors concluded their research story with ATAC-seq to determine how DAXX-ATRX regulates p53 binding. The same trends in p53 peak intensity were observed in both the ChIP-seq and ATAC-seq data: peak intensity was reduced in cells lacking DAXX or ATRX, and loss of p53 binding was observed. This demonstrates that ATRX and DAXX regulate p53 binding through chromatin accessibility. With these findings in mind, drugs that restore the functions of DAXX and ATRX could be investigated in ALT cells as a potential therapeutic strategy for patients with ALT-like phenotype cancer. Additionally, small molecules that enhance the binding between p53 and DNA could be explored to restore the chromatin binding pattern in ALT cells. Gulve et al.’s efforts point to possible drug targets in ALT-phenotype cancers. To further strengthen the study, the authors could investigate the impact of DAXX and ATRX in other cell types.

From the study, we see that the DAXX-ATRX complex is crucial in the regulation of p53 binding and the DNA damage response in ALT cells. The loss of DAXX-ATRX may be one of the reasons for the occurrence of the ALT-like phenotype2. Understanding this protein regulatory network helps in developing drugs that act upstream or downstream in the network, thereby treating ALT-like phenotype cancer. The p53 protein can also be seen as a potential target in treating cancer. Recent studies have worked on developing small-molecule drugs that target the mutant p53 found in cancer and restore its wild-type conformation and activity7. Drugs such as MIRA-1 modify cysteines in the p53 protein to prevent structural changes to functional p538. However, these drugs have not been approved so far, and more research is needed to address the remaining concerns7.

Another point worth mentioning is how computational tools and sequencing analysis have improved the research by aiding the analysis of genomic data. In the article, these technologies allowed the authors to uncover the regulatory network of these tumor suppressor proteins. In the clinic, knowledge of such protein-protein interactions can help identify potential drug targets and therapies. For instance, single-cell RNA sequencing has been applied to detect tissue heterogeneity and specific genetic causes of breast cancer at the molecular level9. Science always depends on teamwork across fields and perspectives, and its goal is always to benefit people. The combination of sequencing analysis and genomic data can lead to a broader view of research and personalized medicine.

Figure 1. Model of ATRX and DAXX on the regulation of p53 chromatin binding. The wild-type cells have normal p53 chromatin binding and protein stability. The loss of ATRX and DAXX prevents p53 chromatin binding. The reduced binding of p53 leads to the binding and accumulation of γH2AX at the loci and subtelomeres, resulting in the alteration of the DNA damage repair pathway. Reduced p53 chromatin binding also leads to long-term telomere DNA damage, resulting in telomere dysfunction (Figure adapted from Gulve et al.2).

References

1.        Pilié, P. G., Tang, C., Mills, G. B. & Yap, T. A. State-of-the-art strategies for targeting the DNA damage response in cancer. Nat Rev Clin Oncol 16, 81–104 (2019).

2.        Gulve, N. et al. DAXX-ATRX regulation of p53 chromatin binding and DNA damage response. Nat Commun 13, 5033 (2022).

3.        Ozaki, T. & Nakagawara, A. Role of p53 in Cell Death and Human Cancers. Cancers (Basel) 3, 994–1013 (2011).

4.        Rivlin, N., Brosh, R., Oren, M. & Rotter, V. Mutations in the p53 Tumor Suppressor Gene: Important Milestones at the Various Steps of Tumorigenesis. Genes Cancer 2, 466–474 (2011).

5.        Li, F. et al. ATRX loss induces telomere dysfunction and necessitates induction of alternative lengthening of telomeres during human cell immortalization. EMBO J 38, (2019).

6.        Macha, S. J. et al. Alternative Lengthening of Telomeres in Cancer Confers a Vulnerability to Reactivation of p53 Function. Cancer Res 82, 3345–3358 (2022).

7.        Hassin, O. & Oren, M. Drugging p53 in cancer: one protein, many targets. Nat Rev Drug Discov 22, 127–144 (2023).

8.        Saha, M. N., Chen, Y., Chen, M.-H., Chen, G. & Chang, H. Small molecule MIRA-1 induces in vitro and in vivo anti-myeloma activity and synergizes with current anti-myeloma agents. Br J Cancer 110, 2224–2231 (2014).

9.        Sharma, A. et al. Longitudinal single-cell RNA sequencing of patient-derived primary cells reveals drug-induced infidelity in stem cell hierarchy. Nat Commun 9, 4931 (2018).