Proteome-wide association study sheds light on causality between proteins and venous thromboembolism

Yuxi Yang

As an integrative analysis tool, the proteome-wide association study can leverage genetic variation to predict VTE risk and to identify novel biomarkers by aggregating variants within protein-coding genes.

Although previous genome-wide association studies (GWAS) have revealed the genetic etiology of venous thromboembolism (VTE), it was not until 2023 that proteins actively associated with VTE pathology were investigated using proteome-wide association studies (PWAS)¹.

While GWAS has been instrumental in identifying new genetic variants, it often falls short in providing robust support for disease identification and treatment development. This is because complex diseases like VTE are rarely caused by a single genetic factor². Challenges such as linkage disequilibrium (the non-random association of alleles within a population) and population stratification further prevent GWAS alone from establishing causality. Even when GWAS is followed by fine mapping of significant SNPs, an interpretable biological mechanism that pinpoints the exact causal variant is still missing⁴.

Today, performing case-control studies is easier than ever thanks to the rapid development of large-scale biobanks containing extensive genotype and phenotype data. As an integrative approach, PWAS overcomes the limitations of GWAS and helps to establish causality by combining results from multiple computational tools. PWAS provides a valuable tool to explore novel protein functions, to identify novel biomarkers and disease mechanisms, and to open the door to effective disease risk prediction. Figure 1 provides an overview of PWAS.

VTE is a complex disease in which blood clots form in the veins. It is triggered by both genetic factors affecting the coagulation process and acquired factors such as aging, surgery and hormone therapies⁵. Li and colleagues¹ set out to decipher the underlying mechanism of VTE through PWAS. Their results identified 20 proteins involved in VTE development, including 3 novel ones that modulate VTE risk.

Li et al. (2023) investigated a cohort of 281,466 Europeans. Their GWAS identified 1529 SNPs associated with VTE, while whole blood protein quantitative trait locus (pQTL) data from the same ancestry provided reference proteome values. Following the FUSION pipeline, they combined the GWAS summary statistics with the pQTL reference proteome to perform an aggregated PWAS analysis. After aggregating significant GWAS loci into protein-coding genes, the researchers identified 20 genes whose downstream proteins were associated with VTE after controlling for false positives.
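The text does not say which procedure was used to control false positives. As a minimal sketch, assuming the Benjamini-Hochberg false discovery rate procedure (one common choice, not necessarily the authors'), per-protein association p-values could be filtered as follows. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Assumption: Benjamini-Hochberg step-up FDR control; the source does
    not state which multiple-testing correction was actually applied.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four proteins (illustrative only).
significant = benjamini_hochberg([0.02, 0.5, 0.03, 0.024], alpha=0.05)
```

Because the procedure is step-up, the protein with p = 0.02 is retained even though it misses its own rank threshold: a larger p-value further down the sorted list passes, which carries all smaller p-values with it.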

Figure 1: The general framework of PWAS. PWAS combines data from various computational analyses, including GWAS, pQTL, Mendelian randomization, and Bayesian colocalization. GWAS is first used to identify SNPs that are associated with the phenotype, then pQTL data are used to identify proteins that are related to the phenotype. Mendelian randomization can then verify the causality between known proteins and the phenotype. Bayesian colocalization can integrate the results from GWAS and pQTL to assess the probability that SNPs affect disease risk, by aggregating the joint SNP results into protein-coding genes and the corresponding protein expression.

Subsequently, the researchers applied independent Mendelian randomization and Bayesian colocalization analyses to test whether genetic variation links plasma protein expression to the outcome (VTE). They first confirmed a causal relationship between 13 proteins and VTE risk, then identified 6 SNPs that both increased VTE risk and modulated the concentrations of VTE risk-related proteins. These findings reveal a shared genetic basis underlying both VTE risk and protein level modulation. They also shed light on potential disease risk prediction tools that aggregate SNP-level results at the gene level to arrive at a more actionable protein-level interpretation.
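To make the Mendelian randomization step more concrete, the sketch below computes a Wald ratio, a standard single-instrument estimator that divides a SNP's effect on the outcome by its effect on the exposure (here, a protein level). The text does not specify which estimator the authors used, so this is an illustration under that assumption, and the effect sizes are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio estimate of the causal effect of an exposure on an outcome.

    beta_exposure: SNP effect on the protein level (from pQTL data)
    beta_outcome:  SNP effect on the disease (from GWAS)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("the instrument must affect the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation that ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical summary statistics for one SNP (illustrative only).
estimate, se, p_value = wald_ratio(beta_exposure=0.5,
                                   beta_outcome=-0.1,
                                   se_outcome=0.02)
```

A negative estimate, as in this made-up example, would suggest that genetically higher protein levels are associated with lower disease risk.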

To make these findings more interpretable, Li et al. (2023) then studied the connectivity of the 20 VTE-associated proteins. They constructed a protein-protein interaction network using the STRING web database and conducted pathway enrichment analyses. The 20 proteins were closely interconnected and contributed to various physiological activities, such as the complement cascade, the coagulation pathway, platelet activation and initiation of the immune response. These findings provide valuable insights that could guide the development of gene therapy aimed at correcting abnormal protein levels associated with VTE.

To delve deeper into the role of these proteins in VTE, a further understanding of the relationship between plasma protein concentrations and VTE risk was necessary. Blood samples from VTE patients and healthy individuals of the same ancestry were obtained and subjected to gene expression analysis. The most significant aspect of the findings is the discovery of causal relationships between three novel proteins and VTE: PLEK, SERPINA1, and SERPINE2, which exhibited decreased expression in VTE patients compared to healthy samples. Consequently, these proteins show promising therapeutic potential for treating VTE, pending a thorough understanding of their roles and molecular pathways. Moreover, they may serve as potential biomarkers to aid in VTE diagnosis, as clues regarding their functions in vascular diseases have surfaced in previous literature^6,7,8,9.

In agreement with previous work, genetic variations in the SERPINA1 gene were linked to the risk of VTE, possibly by affecting plasma cortisol levels^6,7. Similarly, certain genetic variations in the PLEK gene serve as risk factors for VTE, and its protein product, pleckstrin, is known to participate in platelet activities⁸. Additionally, SERPINE2 may play a crucial role in vascular diseases through its inhibitory functions in the coagulation and fibrinolysis cascades⁹.

PWAS can help researchers identify pathogenic proteins through a proteomic approach that combines genetic information with statistical analysis. It is a powerful tool for translating genetic variation into detailed disease mechanisms by analyzing protein expression. It overcomes the limitations of GWAS and plays an indispensable role in highlighting biomarkers for complex diseases. However, PWAS studies have limitations, such as reliance on protein databases primarily derived from European populations and limited access to proteome-wide data^1,3. Therefore, caution should be exercised when applying study results to the broader population. Additionally, protein expression is not measured directly in PWAS but is inferred from GWAS statistics and pQTL data¹, which may lead to less robust conclusions. Nevertheless, with advancements in technology and the expansion of study cohorts, public protein databases will likely become more diverse, strengthening PWAS analysis and facilitating the development of more efficient targeted treatments for diseases. Furthermore, PWAS can be applied to a broader range of diseases as human plasma proteome data becomes more readily available, making it a versatile tool for understanding the underlying mechanisms of various health conditions.

References:

1. Li, H. et al. Proteome-wide mendelian randomization identifies causal plasma proteins in venous thromboembolism development. J. Hum. Genet. 68, 805–812 (2023).
2. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
3. Brandes, N., Linial, N. & Linial, M. PWAS: proteome-wide association study—linking genes and phenotypes by functional variation in proteins. Genome Biol. 21, 173 (2020).
4. Korte, A. & Farlow, A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9, 29 (2013).
5. Wolberg, A. S. et al. Venous thrombosis. Nat. Rev. Dis. Prim. 1, 15006 (2015).
6. Manderstedt, E. et al. Thrombotic risk determined by rare and common SERPINA1 variants in a population-based cohort study. J. Thromb. Haemost. 20, 1421–1427 (2022).
7. Allara, E., Lee, W.-H., Burgess, S., consortium, I. & Larsson, S. C. Genetically predicted cortisol levels and risk of venous thromboembolism. PLoS ONE 17, e0272807 (2022).
8. Kanse, S. M. et al. Reciprocal regulation of urokinase receptor (CD87)-mediated cell adhesion by plasminogen activator inhibitor-1 and protease nexin-1. J. Cell Sci. 117, 477–485 (2003).
9. Fröbel, J. et al. Platelet Proteome Analysis Reveals Integrin-dependent Aggregation Defects in Patients with Myelodysplastic Syndromes. Mol. Cell. Proteom. 12, 1272–1280 (2013).

Single-cell sequencing reveals novel taxonomic associations in an accessible manner

Jade Zhang

Cell-to-cell heterogeneity underlies the diversity of bacterial populations. New, accessible methods allow researchers to explore microbial communities, uncover the genetic associations within them, and consider the implications of those findings for scientific research.

Though there is wide demand for single-cell sequencing of microbes, few robust, high-throughput methods are available. This is due in large part to the small size of bacterial cells, their resilient cell walls, and the small amount of DNA found within each cell^1,2. In Nature Methods, Lan et al. describe a new single-cell sequencing method using generalizable microfluidics to tackle the issues of accessibility and applicability in diverse microbial populations³.

Bacterial populations are well known for their genetic heterogeneity, which makes them interesting research organisms. Single-cell sequencing is a powerful tool for studying this heterogeneity, which plays a key role in many biological processes, including bacterial evolution, antibiotic resistance, colonization, and pathogenesis^1,4. Traditionally, single-cell genetic heterogeneity is observed through colony plating, but this method is unable to capture heterogeneity contributed by microbes that cannot be cultured or by microbes containing rare variants⁵. Bacteria that play important roles in their microbial communities are thus missed due to the limits of existing sequencing methods, and the larger picture of microbial diversity is lost⁶. Previous studies have used droplet microfluidics for single-cell sequencing, isolating single cells in picoliter-scale droplets to allow for more specific analysis^2,7,8. However, droplet microfluidics workflows are not only technically advanced, requiring specialized expertise to carry out, but also very costly and unrealistic for academic labs to use³. This makes them difficult to implement across the wider scientific community.

With this in mind, Lan et al. integrated droplet microfluidics and one-step targeted multiplex PCR to create what they call droplet targeted amplicon sequencing, or DoTA-seq (Figure 1)³. DoTA-seq improves on its predecessors by using targeted sequencing, which increases the capture rate for regions of interest^2,7. Furthermore, DoTA-seq uses simple microfluidics modules, eliminating the need for the specialized expertise required by other microfluidics workflows and increasing usability^2,7,8. Lan et al. also mention the possibility of shifting the entire system away from microfluidics. Doing so would allow many more labs to use the technology, making single-cell sequencing possible at a much larger scale. Techniques that allow single-cell sequencing without a microfluidics platform are already being developed⁹.

Figure 1: A) In DoTA-seq, microbes are applied to a microfluidics droplet generator so that single cells are isolated. The microbes are added to a hydrogel precursor solution which, when it meets the oil, is separated into individual droplets. B) Droplets containing individual microbes are crosslinked into a hydrogel, which is then exposed to various detergents and enzymes that digest the microbes and leave the DNA in the matrix. C) The hydrogel matrix is reapplied to a microfluidics droplet generator, suspended in PCR mix, and random oligo markers targeting a region of interest are incorporated into each droplet. These final droplets then undergo thermal cycling so that their contents are amplified by PCR and analyzed. Created with BioRender.com

To test the broad applicability of DoTA-seq, the authors sequenced target loci in both Gram-negative and Gram-positive bacteria. By sequencing three separate regions in the bacteria, they observed ~90% concordance with the expected genes, demonstrating the efficient capture rate of DoTA-seq. DoTA-seq was also used to sequence a mock microbial community consisting of various Gram-positive and Gram-negative bacteria, where Lan et al. reported a 70–90% return of target genes for each species, showcasing DoTA-seq’s ability to differentiate between diverse bacterial species at the single-cell level.

Because antibiotic resistance is a growing concern as antibiotic-resistant microbes become more common, the authors set out to use DoTA-seq to capture antibiotic resistance⁴. In a synthetic community of 25 well-characterized human gut microbes, they successfully identified 12 antibiotic resistance genes (ARGs), with most ARGs identified at >70% prevalence. Using DoTA-seq in these tests also revealed that members of the microbial communities carried varying levels of ARGs, demonstrating both the vast heterogeneity of microbes and DoTA-seq’s ability to capture that heterogeneity.
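The prevalence figures above rest on grouping reads by their droplet barcode, so that a taxonomic marker and an ARG sharing a barcode are attributed to the same cell. The sketch below illustrates that bookkeeping only. It is not the authors' analysis pipeline, and the read format, the function name and the example data are all hypothetical.

```python
from collections import defaultdict


def arg_prevalence(reads):
    """Fraction of cells of each taxon that carry each ARG.

    reads: iterable of (barcode, kind, name) tuples, where kind is
    "taxon" (a taxonomic marker amplicon) or "arg" (a resistance gene).
    This input format is an assumption made for illustration.
    """
    # Collect every target observed under each droplet barcode.
    cells = defaultdict(lambda: {"taxon": set(), "arg": set()})
    for barcode, kind, name in reads:
        cells[barcode][kind].add(name)

    cells_per_taxon = defaultdict(int)
    cells_with_arg = defaultdict(int)
    for targets in cells.values():
        # Skip barcodes with no taxon or with several taxa, since the
        # latter probably come from droplets holding more than one cell.
        if len(targets["taxon"]) != 1:
            continue
        (taxon,) = targets["taxon"]
        cells_per_taxon[taxon] += 1
        for arg in targets["arg"]:
            cells_with_arg[(taxon, arg)] += 1

    return {key: count / cells_per_taxon[key[0]]
            for key, count in cells_with_arg.items()}


# Hypothetical barcoded reads (illustrative only).
example_reads = [
    ("bc1", "taxon", "species_A"), ("bc1", "arg", "tetQ"),
    ("bc2", "taxon", "species_A"),
    ("bc3", "taxon", "species_B"), ("bc3", "arg", "tetQ"),
    ("bc4", "taxon", "species_A"), ("bc4", "taxon", "species_B"),
    ("bc4", "arg", "tetQ"),
]
prevalence = arg_prevalence(example_reads)
```

In this made-up example, one of the two unambiguous species_A cells carries tetQ, so its prevalence is 0.5, while the barcode containing two taxa is excluded from the count.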

To demonstrate its capabilities in real-world scenarios, Lan et al. used DoTA-seq to observe ARGs in the gut microbiomes represented in mouse and human fecal samples. In human samples, most plasmid replicons were found to be associated with only a single family-level taxon, but one plasmid replicon was associated with two strains from different phyla, suggesting that some plasmids have a wide range of potential hosts. These analyses were confirmed with a BLAST search, indicating that DoTA-seq can successfully associate ARGs of interest with their original bacterial strains and thereby track taxonomic associations. This will allow for high-level tracking of bacterial evolution, improving our response to antibiotic resistance.

Though microfluidics technology and single-cell sequencing have been possible for many years, Lan et al.’s new DoTA-seq workflow paves the way for higher-throughput analyses with more relevant and interesting taxonomic association information. With DoTA-seq, we will be able to identify more accurately where genes, including ARGs, are arising and how they interact across microbial communities. In the near future, we may be able to track the rise of antimicrobial resistance in real time, leading to greater advancements in antibiotic research and clinical treatment options. The simple approach of DoTA-seq will also give a wider range of researchers access to this method to create higher-throughput, accurate assays. Overall, Lan et al.’s DoTA-seq workflow opens many doors in microbial research and in uncovering the relationships between bacterial strains. This will improve precision medicine and support more directed care for patients and their unique microbiomes, as we will be able to provide more accurate information and make more informed decisions.

References

1. Kuchina, A. et al. Microbial single-cell RNA sequencing by split-pool barcoding. Science 371, eaba5257 (2021).

2. Lan, F., Demaree, B., Ahmed, N. & Abate, A. R. Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding. Nat Biotechnol 35, 640–646 (2017).

3. Lan, F. et al. Massively parallel single-cell sequencing of diverse microbial populations. Nat Methods 1–8 (2024) doi:10.1038/s41592-023-02157-7.

4. Murray, C. J. L. et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. The Lancet 399, 629–655 (2022).

5. Li, J. et al. Epigenetic Switch Driven by DNA Inversions Dictates Phase Variation in Streptococcus pneumoniae. PLOS Pathogens 12, e1005762 (2016).

6. Lloyd, K. G., Steen, A. D., Ladau, J., Yin, J. & Crosby, L. Phylogenetically Novel Uncultured Microbial Cells Dominate Earth Microbiomes. mSystems 3, 10.1128/msystems.00055-18 (2018).

7. Zheng, W. et al. High-throughput, single-microbe genomics with strain resolution, applied to a human gut microbiome. Science 376, eabm1483 (2022).

8. Eastburn, D. J., Sciambi, A. & Abate, A. R. Ultrahigh-Throughput Mammalian Single-Cell Reverse-Transcriptase Polymerase Chain Reaction in Microfluidic Drops. Anal. Chem. 85, 8016–8021 (2013).

9. Clark, I. C. et al. Microfluidics-free single-cell genomics with templated emulsification. Nat Biotechnol 41, 1557–1566 (2023).