G-what?: An introduction to the genome-wide association study (GWAS)

When the successful sequencing of the human genome was first announced in early 2003, many thought that this would revolutionize how we understand health and disease. Through this decade-long effort that cost nearly $3 billion, scientists had painstakingly identified the genetic code that makes us who we are and impacts many of our traits and characteristics. But while this was an exceptional achievement, it signaled only the first steps in how we make sense of ourselves and our DNA. To fully understand how this code determines a person’s height or risk of developing heart disease, scientists had to work towards comparing thousands, or even millions, of individuals’ genomes. Enter the Genome-Wide Association Study or GWAS.

ABCs of DNA

First, let’s take a step back to better understand what makes up our genome. Our genome is composed of DNA; molecules called nucleotides are its building blocks. Nucleotides can be one of 4 options – adenine, thymine, cytosine, and guanine, more commonly denoted by “A”, “T”, “C”, and “G”. When combined in long stretches, these nucleotides can form genes. In our cells, these genes are read and used as instructions to make proteins, which perform a mind-boggling array of functions in our bodies. Proteins act as messengers, help protect our body, and give our cells structure, among many other important tasks.

Our genome contains the DNA that makes up all of our genes – over 20,000 in total! In addition to our genes, our genome contains long stretches of nucleotides that aren’t made into proteins, and scientists are still working to figure out what they all do. In all, the human genome is about 3.2 billion nucleotides long, and all humans share around 99.9% of those letters. The remaining 0.1%, which is still a staggering 3 million or so nucleotides, varies across people and populations and helps make each of us unique.

A nucleotide that can vary between people is called a single-nucleotide polymorphism, or SNP (pronounced “snip”). For example, at a particular SNP, 80% of the population has a “G” nucleotide, while the remaining 20% have a “C” instead. These SNPs form the basis of why 2 people might have different feelings about the taste of cilantro, or why one might be more susceptible to developing dementia than the other.
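To make the 80%/20% example concrete, here is a minimal Python sketch that tallies which letter shows up at one SNP across a small group. The ten-person sample is invented for illustration, and it keeps the simplification used above of one letter per person (in reality each person carries two copies of most SNPs).

```python
from collections import Counter


def letter_frequencies(letters):
    """Return the share of each nucleotide observed at one SNP."""
    counts = Counter(letters)
    total = sum(counts.values())
    return {letter: count / total for letter, count in counts.items()}


# Hypothetical sample: the letter each of ten people has at the same SNP.
sample = ["G", "G", "C", "G", "G", "G", "C", "G", "G", "G"]

frequencies = letter_frequencies(sample)
for letter, share in sorted(frequencies.items()):
    print(f"{letter}: {share:.0%}")
```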

Better together: the power of GWAS

While we know that SNPs help to drive differences in our traits, the difficulty lies in determining which SNPs correlate with which traits. This is where the magic of the genome-wide association study, or GWAS, comes in. A GWAS examines the genetic data of thousands, and sometimes even millions, of individuals alongside information about a particular trait, such as their body mass index or feelings of loneliness. Scientists can then compare all of this data and find the particular SNPs that correlate (or “associate”) with a person’s risk of developing a particular disease or having a particular trait. These results are often displayed on an aptly named “Manhattan plot”, where the SNPs most significantly correlated with the trait rise up like skyscrapers.
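One simple way to picture the comparison is a test run separately for each SNP: count how often each letter appears in people with and without the trait, then ask whether the difference is larger than chance would explain. The sketch below uses a Pearson chi-square test on a 2x2 table of counts. The SNP names and counts are made up, and real studies typically use regression models that also adjust for factors such as age and ancestry, so treat this as an illustration rather than the method any particular study used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts per SNP:
# (with trait: G, with trait: C, without trait: G, without trait: C)
snp_tables = {
    "snp_1": (400, 100, 400, 100),  # same G/C split in both groups
    "snp_2": (450, 50, 350, 150),   # G is more common in people with the trait
}

p_values = {}
for name, (a, b, c, d) in snp_tables.items():
    p_values[name] = p_value_1df(chi_square_2x2(a, b, c, d))
    print(f"{name}: p = {p_values[name]:.2e}")
```

A SNP whose letters are split the same way in both groups gets a p-value near 1, while a SNP whose split differs sharply between groups gets a very small one.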

Manhattan plot of a GWAS
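The height of each "building" in a Manhattan plot is conventionally the negative base-10 logarithm of the SNP's p-value, so smaller p-values stand taller. The sketch below computes those heights for a few invented p-values and flags the ones below 5e-8, a commonly used genome-wide significance threshold. The SNP names and values are assumptions for illustration only.

```python
import math

# Commonly used genome-wide significance threshold (an assumption here,
# not a value taken from this article).
GENOME_WIDE_THRESHOLD = 5e-8

# Hypothetical p-values for three SNPs.
p_values = {"snp_a": 0.5, "snp_b": 1e-3, "snp_c": 1e-12}


def manhattan_height(p_value):
    """Height of a SNP on a Manhattan plot: -log10 of its p-value."""
    return -math.log10(p_value)


significant = [name for name, p in p_values.items() if p < GENOME_WIDE_THRESHOLD]

# Print a rough text skyline: one '#' per unit of height.
for name, p in p_values.items():
    height = manhattan_height(p)
    marker = " <- significant" if name in significant else ""
    print(f"{name} {'#' * round(height)} ({height:.1f}){marker}")
```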

After identifying the SNPs that correlate most strongly with the study’s trait of interest, scientists then work to determine which genes these SNPs reside in. Many times, the SNPs are in or near genes that help control processes related to the trait. For example, a GWAS examining the genetic variants that influence allergies found that many of the identified SNPs were in genes that control inflammation and the immune system. Other times, researchers can actually find new genes that play a role in a particular disease or trait through a GWAS. In addition to a better understanding of the SNPs that factor into a particular disease, the valuable information gained from genome-wide studies like these can even lead to the discovery of new ways to treat the disease!

Not all GWAS are created equal

Two genome-wide association studies examining the same trait or disease may identify different numbers of SNPs. One of the major factors that controls this is the number of participants in a study. Many of the very first GWAS used hundreds or thousands of people and identified only extremely significant SNPs. In contrast, with today’s boom in genetic testing driven by ever-decreasing cost, studies routinely examine the genetic data of millions of individuals. By including ever-larger pools of study participants, researchers have more statistical power to identify SNPs that may exert even a small influence on a particular trait.
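The effect of study size can be seen with a small calculation. In the sketch below, a hypothetical SNP has a "G" in 82% of people with a trait and 80% of people without it, and only the number of participants changes. The percentages, group sizes, and the choice of a simple chi-square test are all assumptions for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def p_value_for_group_size(people_per_group):
    """P-value when 82% vs 80% of each group has 'G' (hypothetical split)."""
    hundreds = people_per_group // 100
    with_trait = (82 * hundreds, 18 * hundreds)      # (G, C) counts
    without_trait = (80 * hundreds, 20 * hundreds)   # (G, C) counts
    return p_value_1df(chi_square_2x2(*with_trait, *without_trait))


group_sizes = [1_000, 10_000, 100_000]
p_by_size = {size: p_value_for_group_size(size) for size in group_sizes}

for size, p in p_by_size.items():
    print(f"{size:>7} people per group: p = {p:.2e}")
```

The same small difference between groups is indistinguishable from chance in the smallest study but becomes overwhelmingly significant in the largest one, which is why bigger studies tend to find more SNPs.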

Another factor that may shape the results of a GWAS is the set of populations selected for the study. Many studies examine genetic data solely from individuals of European ancestry. As a result, the findings may not transfer to other populations, or the study may miss important SNPs that appear predominantly in other populations. Various efforts to increase the diversity of genomic research are underway, with the goal of ensuring that everyone stands to benefit from the discoveries being made.

Going to the Library

At Nebula Genomics, we launched the Nebula Research Library to allow customers to see the results of genome-wide association studies and what those results mean for them and their genetic data. We curate the latest scientific publications that reveal new insights about SNPs across an array of traits and diseases. We believe that your journey and understanding of your genetic data shouldn’t stop the day you get your results but instead should continue to grow as our scientific knowledge advances. Recently, we added Polygenic Risk Score calculations to each study in the Nebula Library. This exciting addition allows users to see the cumulative effect of many SNPs related to a particular trait or disease. We will continue adding new features that allow our users to maximize the information they can learn about themselves and their genomes.
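In its simplest textbook form, a polygenic score adds up, across many SNPs, the number of copies of a trait-associated letter a person carries, each weighted by the effect size a GWAS reported for that SNP. The sketch below shows that weighted sum. The SNP names, weights, and counts are invented, and this is not a description of how the Nebula Library computes its scores.

```python
def polygenic_score(effect_sizes, allele_counts):
    """Weighted sum of allele counts; SNPs missing for a person count as 0."""
    return sum(
        weight * allele_counts.get(snp, 0)
        for snp, weight in effect_sizes.items()
    )


# Hypothetical per-SNP effect sizes, as a GWAS might report them.
effect_sizes = {"snp_1": 0.25, "snp_2": -0.5, "snp_3": 0.125}

# Hypothetical person: copies (0, 1, or 2) of the trait-associated letter.
person = {"snp_1": 2, "snp_2": 0, "snp_3": 1}

score = polygenic_score(effect_sizes, person)
print(f"Polygenic score: {score}")
```

A raw score like this only becomes meaningful when compared with the scores of many other people, which is how a result ends up expressed as higher or lower than average.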
To begin your journey of learning about your genetic data and the insights Nebula can help you discover, order a whole-genome sequencing kit today!
