Primer Walking to Personal Medicine: A Shift in Medical Genomics

From early chromosome walking to current chromosome sprinting, Dr. Johanna Rommens shares her unique perspective on the evolution of medical genomics within the last 40 years and where it should be headed in the future.

Safa Ansar, Kaitlyn Lemay, Sarah Russell

Dr. Johanna Rommens is a Senior Scientist Emeritus at SickKids Research Institute and a Professor in the Department of Molecular Genetics at the University of Toronto.
Photo courtesy of Dr. Johanna Rommens

Dr. Johanna Rommens: Cystic Fibrosis superhero

From the minute we arrive at the office of Dr. Johanna Rommens, famed SickKids senior scientist and geneticist, her inquisitive, cerebral nature is evident. “I have a paper I want to talk to you guys about,” she says, walking past us into the conference room. “I didn’t like it.”

It’s about whole genome sequencing, and she thinks it’s too narrowly focused. “There’s so much more to genome sequencing than just methods,” she says. She would know: alongside colleagues in the Lap-Chee Tsui lab at SickKids, she was the first to clone and sequence the Cystic Fibrosis transmembrane conductance regulator (CFTR) gene. Mutations in this gene cause Cystic Fibrosis (CF), a common autosomal recessive disorder that affects 1 in 3,600 live births in Canada [1]. Most importantly, Dr. Rommens and her colleagues were also the first to define the common F508del mutation in CFTR: a phenylalanine deletion at position 508 that is known today to occur in up to 90% of CF cases in some populations [2].

From her beginnings as a student at the University of New Brunswick (UNB) to her position as senior scientist at SickKids, Dr. Rommens’ career has taken her on a journey through the chemical, biological, and clinical development of the medical genetics field. She completed a dual undergraduate degree at UNB in organic chemistry and biology. She chose these subjects because, as she recalls, “in those days, this was pre-DNA synthesis. Molecular biology was organic chemistry.” She went on to do her PhD at UNB, a training period she credits as crucial to her success with cloning the CFTR gene: “At UNB we didn’t have any money, [so] we did everything by hand. We learned all the fundamentals: [when] you learned how to do a protocol, you [had to] be creative about it!” This gave her an exhaustive knowledge of buffers and reactions, including the chemical and biological reasoning behind each component of each reagent she used. When she joined Dr. Lap-Chee Tsui’s lab at SickKids for her postdoctoral studies, this grounding in the chemistry behind biological reactions left her ideally placed to pioneer new techniques in medical genetics. Her background, coupled with the collaboration of a large team of dedicated colleagues, is truly what brought us the sequence of the CFTR gene.

The early days of genetic mapping and the discovery of the CFTR gene

The earliest medical geneticists developed trait mapping: the use of restriction enzyme-based methods to track genetic loci that segregated through families alongside specific traits. However, they had no way of linking diseases to specific genetic loci, which would be necessary to solve the CF puzzle. A breakthrough paper in 1979 established that mapping diseases to genes was a lot like traditional trait mapping, so long as you treated the disease as a linked trait [3]. This mapping approach was first described for dominant diseases, as those were the easiest to track. For recessive diseases like CF, however, a larger dataset was needed to make a convincing case. “A larger dataset, and the guts to believe there’s one gene and one disease,” Dr. Rommens points out. Luckily, at the time, Canada was recruiting families with at least two CF patients in order to build what would be one of the largest collections of CF families in the world.

These early conclusions about disease linkage analysis were followed by the identification of CF-linked DNA polymorphic markers, which were used as starting points to try to identify the CF disease locus through restriction enzyme-based methods. Once candidate loci had been determined, however, a physical map of each region had to be created.

How do you map a gene without a genome? Before the use of whole genome sequencing and reference genomes, gene discovery was facilitated by a few key procedures, the most important being primer walking and jumping. Primer walking involves fragmenting the genomic region of interest and cloning the fragments into phages: little virus shells that can inject bits of DNA into bacteria. These bacteria replicate the DNA through natural growth processes and make up a primary phage library of fragments. Each fragment is sequenced, and its end is then used as the starting point, or primer, for finding and sequencing the next overlapping fragment, so that the map grows one step at a time. Primer jumping is very similar, but involves bypassing long, hard-to-sequence and hard-to-clone regions of repetitive DNA. It was these techniques and the hard work of Dr. Rommens and her colleagues that led them to discover the sequence of the CFTR gene, an achievement that is seen today as a historic example of reverse genetics.
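The stepwise logic of walking can be sketched in a few lines of Python. This is only a string-level analogy, not the wet-lab protocol the team used: the function name `walk`, the toy `region`, and the minimum overlap of four bases are all assumptions made for illustration.

```python
def walk(seed, fragments, min_overlap=4):
    """Greedily extend `seed` rightwards through overlapping fragments.

    Toy analogy only: each step uses the current end of the contig to
    find the next fragment that overlaps it, then appends the new bases.
    """
    contig = seed
    remaining = [f for f in fragments if f != seed]
    extended = True
    while extended:
        extended = False
        for frag in remaining:
            # A useful fragment must overlap the contig end and add >= 1 base.
            longest = min(len(contig), len(frag) - 1)
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(frag[:k]):
                    contig += frag[k:]       # take one "step" along the region
                    remaining.remove(frag)
                    extended = True
                    break
            if extended:
                break                        # restart the search from the new end
    return contig


# Hypothetical 26-base region, cut into four fragments that overlap by 4 bases.
region = "ATGCGTACGTTAGCCGATAGGCTAAC"
library = [region[12:22], region[0:10], region[18:26], region[6:16]]

assembled = walk(region[0:10], library)
print(assembled == region)
```

If a fragment is missing from the library, the walk simply stops at the gap, which is the situation jumping was designed to get around.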

Their research resulted in three papers published back-to-back in Science. These covered the cloning of the CFTR gene using phages, the new primer walking and jumping strategies, and the genetic characterization of CFTR, including its famous CF-causing F508del mutation [4-6]. The reaction to this news, Dr. Rommens recalls, was quite mixed. Sure, this was an incredible genetic feat, but what did it mean? And was it real? “The media was very skeptical,” she says. She also notes that reviewers were ramping up the pressure on their group to confirm that CFTR was truly linked to CF. In Dr. Rommens’ opinion, the large Canadian collection of families, taken together with the highly specific diagnostic sweat test for CF, was key to verifying the effect of the F508del mutation. A second major factor that attested to the accuracy of the sequenced CFTR gene was the confidence Dr. Rommens had in her sequencing data, something she attributes to the nature of her DNA libraries. She only performed primer walking on primary phage libraries, which were largely unmanipulated, unamplified, and therefore less likely to harbor replication-based errors.

This is in stark contrast to today, when library preparation protocols and next-generation sequencing (NGS) methods often necessitate sample amplification. Current NGS methods are cost-effective, fast, and allow us to perform whole genome analyses when studying the causes of certain diseases. However, they are also subject to a whole new set of biases that can affect the way we interpret data, which in turn can affect the way we treat patients. These issues, and their possible solutions, are exactly what Dr. Rommens is so passionate about when it comes to genomics today.

Future Improvements for Current Issues

So, where should the future of genetics be heading? According to Dr. Rommens, emphasis should be placed on improving modern sequencing techniques so that genomes can be sequenced without the need for a reference genome, and on reintroducing the process of phasing.

The reference genome is treated as a standard point of comparison for most sequencing data. However, it is not a diverse compilation that represents an average global population, nor does it represent an optimal, “healthy” genome [7]. The current assembly, GRCh38, is a composite of over 50 genomes from anonymous individuals, with the majority of regions (~70%) stemming from just a single individual, RP11 [8]. Since the reference genome is closer to a tool than an actual standard, it may distort results, as some reads may readily map to the reference while others will not [7]. Because populations differ in their genetic makeup, relying on the reference genome could therefore bias variant interpretation and cause truly rare or significant variants to be overlooked. Another major downside of the reference genome is the gaps it contains. In the GRCh38 assembly, there are a total of 875 gaps: 875 areas of the human genome that are not sequenced and therefore unknown [8].
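A toy example can make the mapping bias concrete. The sketch below is not how production aligners work; the function `map_read`, the 20-base `reference`, and the one-mismatch threshold are all hypothetical choices for illustration. It shows a read that matches the reference being placed, while a read from a more divergent haplotype at the same locus is silently dropped.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches=1):
    """Return the best 0-based position for `read`, or None if unmapped.

    Toy mapper: slides the read along the reference and keeps the first
    position with the fewest mismatches, provided it is within the limit.
    """
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


reference = "ACGTTGCAAGCTTAGGCATC"   # hypothetical reference sequence
read_like_ref = "GCAAGCTT"           # identical to reference[5:13]
read_divergent = "GCTAGCAT"          # same locus, two real variants

print(map_read(read_like_ref, reference))    # placed at its true position
print(map_read(read_divergent, reference))   # exceeds the limit, unmapped
```

The divergent read carries exactly the kind of information a clinician might care about, yet it never reaches the variant-calling step because it looks too unlike the reference.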

The second direction Dr. Rommens believes the field should move in is the reintroduction of phasing. Determining gametic phase involves identifying which alleles were inherited together on the paternal chromosome and which on the maternal chromosome (Fig. 1) [9]. This information is important because it shows how combinations of variants are arranged and how that arrangement may affect gene function. It can also shed light on the relationship between genotype and phenotype, which is important for further understanding genetic diseases. In the NGS era, phase information is typically not obtained, mostly due to the computational complexity and cost burden of trio sequencing [9].

Figure 1: Each parental chromosome shows a single nucleotide polymorphism (SNP) in white. When the offspring chromosomes are sequenced without phasing, the result is a continuous sequence. As a consequence, the polymorphisms cannot be identified as belonging to a particular chromosome. In phased sequencing, the product is two separate sequences. This allows for the identification of the chromosome the polymorphism belongs to. This could be important if the parental SNPs were deleterious together, but alone had no effect. Without phasing, one would be unable to determine whether both SNPs were on the same or on different chromosomes [9].
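The idea behind trio-based phasing can be illustrated with a minimal Python sketch. It applies only the basic Mendelian rule that a child receives one allele from each parent; the function names `phase_site` and `phase_trio` and the two example sites are hypothetical, and real phasing tools handle many complications (read-backed evidence, recombination, genotyping error) that are ignored here.

```python
def phase_site(child, mother, father):
    """Return (maternal_allele, paternal_allele), or None if ambiguous.

    Each genotype is an unordered pair of alleles, e.g. ("A", "G").
    """
    a, b = child
    # Keep every assignment consistent with one allele from each parent.
    options = {
        (mat, pat)
        for mat, pat in ((a, b), (b, a))
        if mat in mother and pat in father
    }
    return next(iter(options)) if len(options) == 1 else None


def phase_trio(sites):
    """Build the child's maternal and paternal haplotypes site by site."""
    maternal, paternal = [], []
    for child, mother, father in sites:
        phased = phase_site(child, mother, father)
        mat, pat = phased if phased else ("?", "?")   # "?" marks unresolved
        maternal.append(mat)
        paternal.append(pat)
    return "".join(maternal), "".join(paternal)


# Two hypothetical SNPs; the child is heterozygous at both.
sites = [
    (("A", "G"), ("A", "G"), ("A", "A")),   # G can only come from the mother
    (("C", "T"), ("C", "C"), ("C", "T")),   # T can only come from the father
]
maternal_hap, paternal_hap = phase_trio(sites)
print(maternal_hap, paternal_hap)
```

Here the variant alleles G and T end up on different chromosomes, so the child still carries one unaffected copy at each site. When all three family members are heterozygous at a site, the rule cannot decide and the site stays unresolved, which is one reason trio data alone does not phase everything.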

Moving forward, Dr. Rommens believes we should not lose sight of these concepts of phasing, population stratification and patterns of inheritance, as they are key drivers of variation and disease. With this in mind, she also cautioned against relying too heavily on sequencing data in clinical practice and mentioned that more data is needed to help understand phenotypic patterns. “What does everyone do when they study sequencing?” she asks. “We compare to the reference. We assume the reference is correct, and by and large it is correct. But one of the real reasons the reference is used is because of computational efficiency.” She brings up an important point: if we could find a way to overcome the computational burden of whole genome mapping, the reference genome may no longer be needed. But how could this be done? According to Dr. Rommens, the answer is statistics. “I think if I had to do it over again, I would learn statistics,” she muses. “I would [also] learn to be better at sequence analysis, because in order to solve the phase problem, we need sequencing that doesn’t use the reference. That’s clear. It’s obvious.”

Back to basics

While discussing future directions of medical genomics and personalized medicine, the conversation with Dr. Rommens kept coming back to one main theme: bringing genomic medicine back to basics. She had drawn our attention to a recently published review article detailing a “brief history of human disease genetics”, which primarily emphasized the technological advancements of the last 25 years [10]. “Human genetics existed long before 25 years ago, when very elegant concepts of genetics were figured out,” Dr. Rommens remarked. She further expressed that the history of human genetics is much more complex than can be summarized based solely on newer technology like NGS. From Mendel’s pea plant experiments in the 1860s to the use of NGS technologies to study genomics, each step in the history of genetics has built upon equally important fundamentals that have shaped our understanding of genetic disease at the level of the population and the individual.

So, how do you advance medical genomics while at the same time ensuring we stay true to the core concepts of genetics? Dr. Rommens emphasized that genomic medicine is never about one individual, which can sometimes be overlooked in the research setting. Regarding the interpretation of genetic data, she explained that many geneticists tend to think of their data as being ‘special’ when compared to other clinical measures. Although she doesn’t endorse this sentiment, she did note, “[what is different] about it than any other data is that it’s more than the person with it that’s affected. That’s why you need to treat it as special.” Dr. Rommens offers an often-overlooked view of genomic medicine: that it is not defined solely by our technology, or even by its elegant theoretical basis. Rather, at its core, genomic medicine is about how these two tools blend to give us actionable clinical information about an individual, their family and the population. Genomic medicine is never truly about one patient, but about a network of individuals.

She touched on a similar sentiment at the beginning of our discussion when explaining why she was attracted to research at SickKids. At the time, chromosome mapping of the CFTR gene was highly dependent on the pedigree information available. Dr. Rommens noted that Canada had one of the best family collections in the world, which was largely supported by the Canadian Cystic Fibrosis Foundation. Regardless of how the media felt about the CFTR discovery at the time, the families were keen. “They always are,” she says fondly. This echoed her earlier commentary on how genetic data holds so much more meaning than one person’s status. Genetic data and medicine are not just about one person, but about the patients, families, and community of researchers and doctors who collaborate for the advancement of medical care. Over the course of a lively and animated discussion, it became very clear that Dr. Johanna Rommens believes this through and through. From her ground-breaking CF research to her suggestions on how genomic medicine should be advanced, one thing is clear: to her, the patient is primary.

References:

1. Cystic Fibrosis Canada. The Canadian CF Registry 2016 Annual Data Report. 44 (2017).

2. Riordan, J. R. CFTR Function and Prospects for Therapy. Annu. Rev. Biochem. 77, 701–726 (2008).

3. Solomon, E. & Bodmer, W. F. Evolution of Sickle Variant Gene. Lancet 313, 923 (1979).

4. Riordan, J. et al. Identification of the Cystic Fibrosis Gene: Cloning and Characterization of Complementary DNA. Science 245, 1066–1073 (1989).

5. Rommens, J. M. et al. Identification of the cystic fibrosis gene: Chromosome walking and jumping. Obstet. Gynecol. Surv. 45, 174–175 (1990).

6. Kerem, B. et al. Identification of the cystic fibrosis gene: genetic analysis. Science 245, 1073–1080 (1989).

7. Ballouz, S., Dobin, A. & Gillis, J. A. Is it time to change the reference genome? Genome Biol. 20, 1–9 (2019).

8. Genome Reference Consortium. Data. Available at: https://www.ncbi.nlm.nih.gov/grc/data.

9. Tewhey, R., Bansal, V., Torkamani, A., Topol, E. J. & Schork, N. J. The importance of phase information for human genomics. Nat. Rev. Genet. 12, 215–223 (2011).

10. Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
