Harnessing the power of population databases, one study at a time

Dr. Shreejoy Tripathy and his team demonstrate the power of the UK BioBank population database in a study that unpacks the complicated interplay between schizophrenia polygenic risk score, psychotic episodes, and cannabis use.

Milcah Sutanto, Gabriela Tanumihardja, & Yuan Tian

Dr. Shreejoy Tripathy, Ph.D. (right) is pictured with postdoctoral fellow Dr. Michael Wainberg, Ph.D. (left). Dr. Tripathy is an independent scientist at the Krembil Centre for Neuroinformatics within the Centre for Addiction and Mental Health and an Assistant Professor in the Department of Psychiatry at the University of Toronto. Photo provided by Dr. Tripathy.

Have you ever wondered whether genetics and the environment interact in the development of mental illnesses? This is exactly what Dr. Shreejoy Tripathy (Ph.D.), an Assistant Professor at the University of Toronto and an independent scientist at the Krembil Centre for Neuroinformatics within the Centre for Addiction and Mental Health, seeks to understand. The importance of understanding mental illnesses has been heightened by the COVID-19 pandemic, which has had a significant negative impact on the mental health of the general population worldwide1. In Canada specifically, 1 in 5 people experience a mental illness annually2. This demonstrates the urgent need to better understand the underlying causes of mental illnesses in hopes of developing both preventative and treatment strategies. Emerging research has centred on understanding how mental illnesses develop, including the interplay between genetic and environmental factors3. One strategy for studying these gene-environment relationships is to analyze large population databases, like the UK BioBank4. In collaboration with Dr. Michael Wainberg (Ph.D.), a postdoctoral fellow, Dr. Tripathy used the UK BioBank to investigate the relationship between cannabis use and psychotic experiences in the general population and in those with a genetic predisposition for schizophrenia5.


Presentation of schizophrenia

Schizophrenia is a complex heritable mental illness that has a long-term impact on patients and society6. The symptoms of schizophrenia are usually classified as either positive, negative, or cognitive (Figure 1)6. Positive symptoms are characterized by a distortion or amplification of normal behaviours, such as hallucinations, whereas negative symptoms are indicated by a loss or dampening of normal functions, such as reduced emotional expression. Cognitive symptoms consist of difficulties in memory and attention.

Figure 1: Diagram depicting the potential symptoms of schizophrenia6. There are three classifications of symptoms. Positive symptoms are behaviours that are distorted or amplified from normal behaviours, including hallucinations, delusions, disorganized speech, and confused thoughts. Negative symptoms are behaviours that show a loss or decrease in normal functions, such as a lack of pleasure and struggling with daily routine (a lack of motivation). Cognitive symptoms can include memory problems and impaired sensory perception. Image created in BioRender.com.

Unearthing the heritability behind schizophrenia

“Genetics has been really useful in psychiatry and [in] helping [us] to understand and assess risk for various [mental] illnesses”, stated Dr. Tripathy when asked about the implications of genetics in psychiatric research. One of the first methods used to study the genetic component of developing mental illnesses was twin studies3. This technique evaluates whether a certain trait is more commonly shared between monozygotic twins (genetically identical) than between dizygotic twins (not genetically identical). Traits that are shared more commonly between monozygotic twins are considered more heritable, indicating that the traits are more heavily influenced by genetic factors. Interestingly, recent twin studies have estimated schizophrenia’s heritability to be between 60% and 65%, which points to the importance of genetic factors for its expression3. Moreover, it has been widely accepted that first-degree relatives of patients with schizophrenia have a higher risk of developing schizophrenia compared to those without affected first-degree relatives3. Overall, the variation within an individual’s genetic makeup significantly contributes to the risk of developing schizophrenia.

Like virtually all mental illnesses, schizophrenia is a complex polygenic disease, which relies on the action of several different genes to manifest7. To find the genes that are significantly associated with the disease, genome-wide association studies (GWAS) are often conducted. GWAS examines the genomes of a large set of individuals, with and without the disease of interest, and looks for genetic markers that can be used to predict the occurrence of the disease (Figure 2)3. GWAS has linked more than 100 common single nucleotide polymorphisms (SNPs), spanning more than 600 genes, with the development of schizophrenia3. Each of these genetic markers, also known as genetic variants, found by GWAS can be used to statistically estimate an individual’s risk of developing the disease due to genetics alone. This statistical estimate, often referred to as polygenic risk score (PRS), is calculated by taking the weighted sum of the risk of each disease-associated genetic variant7. With many genetic variants contributing to the PRS to a small degree, it is difficult to determine the overall risk of developing the disease without considering other factors, such as the environment.
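The weighted sum behind a PRS can be sketched in a few lines of Python. This is only a minimal illustration and not the pipeline used in the study: the variant names, weights, and genotypes below are invented for the example, and real PRS calculations involve many more variants and additional processing steps.

```python
# Minimal sketch of a polygenic risk score (PRS) as a weighted sum.
# All variant names, weights, and genotypes are hypothetical.

# Per-variant weights, for example GWAS effect sizes for the risk allele.
weights = {"variant_A": 0.05, "variant_B": 0.02, "variant_C": -0.03}

# Number of copies of the risk allele each person carries (0, 1, or 2).
genotypes = {
    "person_1": {"variant_A": 2, "variant_B": 1, "variant_C": 0},
    "person_2": {"variant_A": 0, "variant_B": 1, "variant_C": 2},
}


def polygenic_risk_score(allele_counts, weights):
    """Sum of (risk-allele count x weight) over all scored variants."""
    return sum(weights[v] * allele_counts.get(v, 0) for v in weights)


scores = {person: polygenic_risk_score(g, weights) for person, g in genotypes.items()}
for person, score in scores.items():
    print(f"{person}: PRS = {score:.2f}")
```

Because each variant contributes only a small amount, a score like this is most informative when people are compared with one another, for example by grouping them into higher and lower PRS groups.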

Figure 2: Simplified outline of a schizophrenia GWAS3. A schizophrenia GWAS seeks to understand the relationship between schizophrenia and common genetic variants found within the population. The genomes of two large groups of individuals, with and without schizophrenia, are analyzed for genetic markers that may be predictive of developing schizophrenia. These genetic markers are identified by analyzing SNPs within the population. The markers are then statistically analyzed to determine whether they are significantly associated with schizophrenia. Figure created in BioRender.com.

Dr. Tripathy noted that “for the most part there are no psychiatric disorders that are completely due to genetics”. In fact, it has been well established that most psychiatric illnesses are a product of the interaction between genetic and environmental factors. The development of schizophrenia has been linked with exposure to many environmental factors such as childhood trauma, contraction of certain viral and bacterial infections, socioeconomic factors, and the use of cannabis6. The interaction between genetic and environmental factors is complex, and often very difficult to disentangle. Large-scale population databases that contain significant genetic and non-genetic information, like the UK BioBank, can be used to further investigate these relationships.

Using the UK BioBank to unravel the interaction between the PRS of schizophrenia, psychotic experiences, and cannabis use

Dr. Tripathy’s research lab used the UK BioBank to unpack the relationship between the PRS of schizophrenia and cannabis use. The UK BioBank is a large open-access resource that contains anonymized genetic and non-genetic information from 500,000 UK residents and is updated regularly8. This database includes information on participants’ genome-wide genotypes, physical measurement examinations, health-related records, and answers to online questionnaires (Figure 3). When the participants joined the UK BioBank project, they were between 40 and 69 years old, which allowed for the collection of data on age-related health problems as well as baseline data before the onset of any severe diseases. However, an important limitation of this database is its lack of diversity: most participants were White British. All in all, the UK BioBank was created to inspire well-powered research to determine the true effect of genetic and non-genetic factors contributing to disease. The availability of this online database to researchers around the world has spurred many health-related studies aimed at improving clinical care. As explained by Dr. Tripathy, “these types of datasets are really powerful”. The wide range of information available in this population database will also allow researchers to see potential connections and correlations, inspiring new studies that could further the field.

Figure 3: Schematic of the data collection points for the UK BioBank8. The UK BioBank collects data from 500,000 study participants. This data includes genetic and non-genetic information. Non-genetic data consists of information collected from health-related records, physical measurement exams, interviews, and self-reported questionnaires. Figure created in BioRender.com.

As data analysts, Drs. Tripathy and Wainberg evaluated the available data in the UK BioBank and found that there were over 150,000 participants who completed the Mental Health Questionnaire and self-reported information relating to substance use5. They quickly realized that this massive amount of data could be used to investigate the interaction between schizophrenia and cannabis use, providing an important insight into the development of the disease. When talking about this study, Dr. Tripathy remarked that it was especially “timely because cannabis has been legalized in Canada… and it’s increasingly becoming decriminalized throughout the world”. The use of cannabis is very common amongst Canadians: 1 in 4 Canadians reported having used cannabis within the past 12 months in the 2021 Canadian Cannabis Survey9.

Dr. Tripathy and his team performed a cross-sectional analysis using approximately 110,000 unrelated UK BioBank participants of White British ancestry5. They compared data from healthy participants (without a clinical diagnosis of schizophrenia) with high and low schizophrenia PRS to investigate the impact of ever having used cannabis in their lifetime on having psychotic experiences. Specifically, they looked for statistically significant associations between PRS, cannabis use frequency, and psychotic experiences like auditory and visual hallucinations. They found that the use of cannabis is more strongly associated with early-onset psychotic experiences in participants with a higher schizophrenia PRS compared to those with a lower schizophrenia PRS5. However, it is important to note that an association does not mean causation.
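One simple way to picture an association that differs between PRS groups is to compare odds ratios calculated separately for each group. The Python sketch below does this with invented counts. It is not the statistical model used in the study, which this article does not describe, and the numbers are not study results.

```python
# Sketch: compare the cannabis / psychotic-experience association across PRS groups.
# The counts are invented for illustration; they are NOT results from the study.


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from a 2x2 table of exposure (cannabis use) by outcome."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical 2x2 tables, in the order:
# (users with experiences, users without, non-users with experiences, non-users without)
tables = {
    "low PRS": (30, 970, 20, 980),
    "high PRS": (60, 940, 20, 980),
}

ratios = {group: odds_ratio(*counts) for group, counts in tables.items()}
for group, ratio in ratios.items():
    print(f"{group}: odds ratio = {ratio:.2f}")
```

In this made-up example, the odds ratio is larger in the high PRS group, which is the kind of pattern an interaction between genetic risk and cannabis use would produce. A larger odds ratio still describes an association, not a cause.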

While this study was unable to establish causation, high-powered population databases, like the UK BioBank, can be used to define meaningful associations that have potential clinical applications. With both genetic and environmental factors coming into play in the development of schizophrenia, these results indicate a potential avenue for preventive risk management5. For example, in this case, individuals with a higher PRS for schizophrenia could be advised to avoid cannabis, especially early on in their lives, in hopes of delaying or preventing disease presentation. Looking to the future, the increased access to and decriminalization of cannabis across the globe should lead the way to better knowledge dissemination and education regarding the intricacies of cannabis use.

The future of research using large population databases

This study shows that meaningful associations can be made by harnessing the power of large population databases, like the UK BioBank. The use of large population databases in research can help reduce the timeframe required to complete research projects. Additionally, the results produced by research projects analyzing large amounts of data are robust, as the amount of data available for analysis is much larger than what a single study can gather. As Dr. Tripathy explains in the case of the UK BioBank, “one cross-sectional study using half a million people may be better than 100 studies that use 50 people”. This is just the beginning of population database research, and there are many possibilities within the field that have yet to be explored4.
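A rough sense of why sample size matters comes from the standard error of an estimate, which shrinks with the square root of the number of people studied. The Python sketch below compares one study of 50 people with one study of half a million, assuming a hypothetical trait with 5% prevalence and independent participants. It illustrates the general statistical principle only and is not an analysis from the study.

```python
import math


def standard_error(p, n):
    """Standard error of an estimated proportion p from n independent people."""
    return math.sqrt(p * (1 - p) / n)


p = 0.05  # hypothetical prevalence, chosen only for illustration
se_small = standard_error(p, 50)       # one study of 50 people
se_large = standard_error(p, 500_000)  # one study of half a million people

print(f"n = 50: standard error = {se_small:.4f}")
print(f"n = 500,000: standard error = {se_large:.6f}")
print(f"ratio = {se_small / se_large:.0f}")
```

Under these assumptions, the estimate from the small study is about 100 times less precise. Even 100 such studies combined would cover only 5,000 people, one hundredth of the larger sample.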

When asked about the potential research projects that can use this type of resource, Dr. Tripathy noted that while it may be relatively “easy to generate data…it’s still really hard to figure out what it means”, referring to the difficulties present in data analysis. One potential method to overcome this problem is programming. For instance, in this study, Dr. Tripathy and his team used programming languages like Python and R to analyze data from approximately 110,000 UK BioBank participants5. With this being such a data-driven project, Dr. Tripathy mentioned that the most exciting part of conducting this research was having the chance to collaborate with and learn from his colleagues. He emphasizes that constant learning is a large part of this field and urges the next generation of scientists to become familiar with at least one programming language. He advises, “To anyone who’s interested in research in science, I would strongly encourage taking a programming class”. Learning a programming language like R or Python can help fill the high demand for data analysts with the skillset required to process large datasets. With the future of research becoming more data-centric, this is one step you can take to better situate yourself for a successful career in data research.


1. Tsamakis, K. et al. COVID‑19 and its consequences on mental health (Review). Exp. Ther. Med. 21, 1–1 (2021).

2. Fast Facts about Mental Health and Mental Illness. CMHA National https://cmha.ca/brochure/fast-facts-about-mental-illness/.

3. Zhuo, C. et al. The genomics of schizophrenia: Shortcomings and solutions. Prog. Neuropsychopharmacol. Biol. Psychiatry 93, 71–76 (2019).

4. Stewart, R. & Davis, K. ‘Big data’ in mental health research: current status and emerging possibilities. Soc. Psychiatry Psychiatr. Epidemiol. 51, 1055–1072 (2016).

5. Wainberg, M., Jacobs, G. R., di Forti, M. & Tripathy, S. J. Cannabis, schizophrenia genetic risk, and psychotic experiences: a cross-sectional study of 109,308 participants from the UK Biobank. Transl. Psychiatry 11, 211 (2021).

6. Owen, M. J., Sawa, A. & Mortensen, P. B. Schizophrenia. The Lancet 388, 86–97 (2016).

7. Foley, C., Corvin, A. & Nakagome, S. Genetics of Schizophrenia: Ready to Translate? Curr. Psychiatry Rep. 19, 61 (2017).

8. Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

9. Health Canada. Canadian Cannabis Survey 2021: Summary. https://www.canada.ca/en/health-canada/services/drugs-medication/cannabis/research-data/canadian-cannabis-survey-2021-summary.html (2021).
