Yuxi Yang
Proteome-wide association studies, as an integrative analysis tool, can leverage genetic variation to predict VTE risk and to identify novel biomarkers by aggregating variants within protein-coding genes.
Although previous genome-wide association studies (GWAS) revealed the genetic etiology of venous thromboembolism (VTE), it was not until 2023 that proteins associated with VTE pathology were investigated using a proteome-wide association study (PWAS)1.
While GWAS has been instrumental in identifying new genetic variants, it often falls short of providing robust support for disease identification and treatment development, because complex diseases like VTE are rarely caused by a single genetic factor2. Challenges such as linkage disequilibrium (the non-random association of alleles within a population) and population stratification further limit the ability of GWAS alone to establish causality. Even when GWAS is followed by fine mapping of significant SNPs, an interpretable biological mechanism that pinpoints the exact causal variant is often still missing4.
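To make the notion of linkage disequilibrium concrete, the following minimal sketch computes the common r² measure between two biallelic loci from phased haplotypes. The function name and the toy haplotypes are illustrative assumptions and are not taken from Li et al.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, where a and b are 0/1 alleles
    carried at locus A and locus B on the same phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical toy data: alleles always travel together (complete LD)
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Hypothetical toy data: alleles combine at random (no LD)
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
r2_linked = ld_r_squared(linked)      # 1.0
r2_unlinked = ld_r_squared(unlinked)  # 0.0
```

When r² between a GWAS hit and its neighbours is close to 1, the association signal alone cannot tell which of the correlated variants is causal, which is the difficulty described above.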
Today, performing case-control studies is easier than ever thanks to the rapid development of large-scale biobanks containing extensive genotype and phenotype data. PWAS, as an integrative approach, addresses the limitations of GWAS and helps to build a case for causality by combining results from multiple computational tools. It provides a valuable means to explore novel protein functions, to identify novel biomarkers and disease mechanisms, and to open the door to effective disease risk prediction. Figure 1 provides an overview of PWAS.
VTE is a complex disease in which blood clots form in the veins. It is triggered by both genetic factors affecting the coagulation process and acquired factors such as aging, surgery and hormone therapy5. Li and colleagues1 set out to decipher the underlying mechanism of VTE through PWAS. Their results implicated 20 proteins in VTE development, including 3 novel ones that modulate VTE risk.
Li et al. (2023) investigated a cohort of 281,466 Europeans. Their GWAS identified 1529 SNPs associated with VTE, while whole blood protein quantitative trait locus (pQTL) data from the same ancestry provided reference proteome values. Following the FUSION pipeline, they combined the GWAS summary statistics with the pQTL reference proteome to perform an aggregated PWAS analysis. After aggregating significant GWAS loci into protein-coding genes, the researchers identified 20 genes whose protein products were associated with VTE after controlling for false positives.
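The aggregation step can be illustrated with the test statistic commonly described for summary-based association methods of this kind: the GWAS z-scores of the SNPs near a gene are combined using pQTL-derived weights, and the result is scaled by the linkage disequilibrium among those SNPs. The sketch below is a plain Python illustration under that assumption, not the FUSION software itself; the function name, the weights, the z-scores and the LD matrix are all hypothetical.

```python
import math


def pwas_z(weights, gwas_z, ld):
    """Gene-level z-score: (w . z) / sqrt(w' R w).

    weights : pQTL-derived SNP weights predicting the protein level
    gwas_z  : GWAS z-scores for the same SNPs, in the same order
    ld      : SNP-by-SNP correlation (LD) matrix as a list of rows
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("predicted protein level has no variance")
    return numerator / math.sqrt(variance)


# Hypothetical example: two SNPs in perfect LD carry the same signal,
# so the gene-level score equals the single-SNP z-score.
z_gene = pwas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 1.0], [1.0, 1.0]])  # 2.0
```

Scaling by the LD matrix keeps correlated SNPs from being counted as independent evidence, which is one reason the gene-level result is easier to interpret than a list of individual GWAS hits.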

Subsequently, the researchers applied independent Mendelian randomization and Bayesian colocalization analyses to test whether genetic variation underlies the relationship between plasma protein levels and the outcome (VTE). They first confirmed a causal relationship between 13 proteins and VTE risk, then identified 6 SNPs that both increased VTE risk and modulated the concentrations of VTE risk-related proteins. These findings reveal a shared genetic basis underlying both VTE risk and protein level modulation. They also shed light on potential disease risk prediction tools that aggregate SNP-level results at the gene level to arrive at a more actionable protein-level interpretation.
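As a rough illustration of how Mendelian randomization turns SNP-level effects into a causal estimate, the sketch below implements two textbook estimators: the Wald ratio for a single genetic instrument and the fixed-effect inverse-variance weighted (IVW) estimate across several instruments. Which estimators Li et al. used is not stated here, so this is an assumption-laden example; the function names and effect sizes are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-SNP causal estimate: SNP-outcome effect / SNP-exposure effect.

    The standard error uses the usual first-order approximation that
    ignores uncertainty in the SNP-exposure effect.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate over several SNPs."""
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical effects: each SNP's effect on VTE is half its effect on the
# protein level, so both estimators recover a causal effect of about 0.5.
single, single_se = wald_ratio(0.2, 0.1, 0.02)
pooled, pooled_se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
```

Here the exposure is the plasma protein level and the outcome is VTE; a pooled estimate that differs clearly from zero supports a causal role for the protein, provided the usual instrument assumptions hold.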
To make these findings more interpretable, Li et al. (2023) then studied the connectivity of the 20 VTE-associated proteins. They constructed a protein-protein interaction network using the STRING web database and conducted pathway enrichment analyses. The 20 proteins were closely interconnected and contributed to various physiological activities, such as the complement cascade, the coagulation pathway, platelet activation and immune response initiation. These findings provide valuable insights that could guide the development of gene therapy aimed at correcting abnormal protein levels associated with VTE.
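Pathway enrichment is often assessed with a one-sided hypergeometric (Fisher-type) test, which asks how surprising it is to see so many pathway members in a short protein list. The sketch below shows that calculation under this assumption; it is not the STRING implementation, and the function name and counts are hypothetical.

```python
from math import comb


def enrichment_p(hits, list_size, pathway_size, background_size):
    """P(X >= hits) when `list_size` proteins are drawn at random from a
    background of `background_size`, of which `pathway_size` are annotated
    to the pathway (upper tail of the hypergeometric distribution)."""
    total = comb(background_size, list_size)
    upper = min(list_size, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 20 proteins of interest, 5 of them in a pathway that
# has 100 members among 20,000 background proteins.
p_value = enrichment_p(5, 20, 100, 20000)
```

In practice such p-values are computed for many pathways at once and then adjusted for multiple testing before any pathway is reported as enriched.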
To delve deeper into the role of these proteins in VTE, a better understanding of the relationship between plasma protein concentrations and VTE risk was necessary. Blood samples from VTE patients and healthy individuals of the same ancestry were obtained and subjected to gene expression analysis. The most significant finding is the discovery of causal relationships between VTE and three novel proteins, PLEK, SERPINA1 and SERPINE2, which exhibited decreased expression in VTE patients compared with healthy samples. Consequently, these proteins show promising therapeutic potential for treating VTE, pending a thorough understanding of their roles and molecular pathways. Moreover, they may serve as biomarkers to aid in VTE diagnosis, as clues regarding their functions in vascular diseases have surfaced in previous literature6,7,8,9.
In agreement with previous work, genetic variations in the SERPINA1 gene were linked to the risk of VTE, possibly by affecting plasma cortisol levels6,7. Similarly, certain genetic variations in the PLEK gene serve as risk factors for VTE, and its encoded protein, pleckstrin, is known to participate in platelet activity8. Additionally, SERPINE2 may play a crucial role in vascular diseases through its inhibitory functions in the coagulation and fibrinolysis cascades9.
PWAS can help researchers to identify pathogenic proteins through a proteomic approach that combines genetic information with statistical analysis. It is a powerful tool for translating genetic variation into detailed disease mechanisms by analyzing protein expression, and it addresses limitations of GWAS while playing an important role in highlighting biomarkers for complex diseases. However, PWAS studies have limitations of their own, such as reliance on protein databases primarily derived from European populations and limited access to proteome-wide data1,3. Caution should therefore be exercised when applying study results to the broader population. Additionally, protein levels are not measured directly in PWAS; they are inferred from GWAS statistics and pQTL data1, which may lead to less robust conclusions. Nevertheless, with advances in technology and the expansion of study cohorts, public protein databases will likely become more diverse, strengthening PWAS analysis and facilitating the development of more efficient targeted treatments. Furthermore, as human plasma proteome data become more readily available, PWAS can be applied to a broader range of diseases, making it a versatile tool for understanding the mechanisms underlying various health conditions.
References:
- Li, H. et al. Proteome-wide mendelian randomization identifies causal plasma proteins in venous thromboembolism development. J. Hum. Genet. 68, 805–812 (2023).
- Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
- Brandes, N., Linial, N. & Linial, M. PWAS: proteome-wide association study—linking genes and phenotypes by functional variation in proteins. Genome Biol. 21, 173 (2020).
- Korte, A. & Farlow, A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9, 29 (2013).
- Wolberg, A. S. et al. Venous thrombosis. Nat. Rev. Dis. Prim. 1, 15006 (2015).
- Manderstedt, E. et al. Thrombotic risk determined by rare and common SERPINA1 variants in a population-based cohort study. J. Thromb. Haemost. 20, 1421–1427 (2022).
- Allara, E., Lee, W.-H., Burgess, S., consortium, I. & Larsson, S. C. Genetically predicted cortisol levels and risk of venous thromboembolism. PLoS ONE 17, e0272807 (2022).
- Kanse, S. M. et al. Reciprocal regulation of urokinase receptor (CD87)-mediated cell adhesion by plasminogen activator inhibitor-1 and protease nexin-1. J. Cell Sci. 117, 477–485 (2003).
- Fröbel, J. et al. Platelet Proteome Analysis Reveals Integrin-dependent Aggregation Defects in Patients with Myelodysplastic Syndromes. Mol. Cell. Proteom. 12, 1272–1280 (2013).
