How to use the YFull Platform – A Tutorial for Beginners

What is YFull?

YFull is a DNA analysis service that allows customers to analyze raw data files (BAM and CRAM) obtained from next-generation sequencing (NGS). It aims to study the origin in the direct paternal line (Y DNA or Y chromosome) and the direct maternal line (Mitochondrial DNA or mtDNA). Interpretation using YFull’s proprietary software packages provides precise positioning of the sample on the human family tree (YTree and/or MTree), analysis of all known (named) single-nucleotide polymorphisms (SNPs) and short tandem repeats (Y-STRs), and determination of novel (“private”) SNPs.

The YFull database is constantly updated with new samples, currently totaling several tens of thousands. One part of the database contains raw data from various scientific publications collected around the world and ancient DNA samples (aDNA) from people of different eras and cultures up to 40 thousand years old. The other part of the database consists of samples from individuals whose Y chromosome and mitochondrial DNA were fully decoded with Whole Genome Sequencing or a targeted sequencing approach.

Single-nucleotide polymorphisms on the Y chromosome (Y-SNPs), which are passed from father to son through generations, as well as mutations in mtDNA inherited in a direct maternal line, allow you to trace the spread of mankind from today back tens and hundreds of thousands of years into the past. Our most distant ancestors in these two lines are the so-called “Y chromosomal Adam” and “Mitochondrial Eve”. Despite the mythical names, they were real people who lived at different times several hundred thousand years ago and whose descendants survived and spread throughout the world. This fascinating journey to our origins is available to anyone who has obtained their raw genomic data through Whole Genome Sequencing.

This YFull tutorial will follow the sections on the left-hand sidebar. The sidebar is divided into 5 main sections: SNPs, STRs, mtDNA, RAW data, and others. Individual reports in each section are described below and important features are labeled and explained under each example. 

YFull and Nebula Genomics

At Nebula Genomics we have partnered with YFull to give our users access to the deep ancestry analysis provided by YFull. From their accounts, Nebula Genomics users can easily transfer their Y chromosomal (males) and mitochondrial DNA (males and females) data to YFull and get access to the following results:

Mitochondrial DNA reports (maternal line)
– mtDNA haplogroup determination
– Identification of all mitochondrial SNPs
– Positioning in MTree
– Find matches based on mtDNA
– Regular updates

Y chromosome reports (paternal line)
– Y chromosome haplogroup determination
– Identification of all Y chromosomal SNPs and STRs*
– Positioning in YTree**
– Find matches based on Y chromosomal DNA*
– Regular updates*

* requires an upgrade for $45

** free for 30 days, requires an upgrade for $45 after

Y Chromosome Analysis: SNPs

Only males with a Y chromosome can use the Y chromosome analysis.

Hg and SNPs

By clicking on the green rectangle with the Y haplogroup (1 in the screenshot below), the user can open YTree with the sample’s exact position.

The terminal Y haplogroup and position on the YTree are available for free to Nebula Genomics users. Other parts of the Y chromosome report can be unlocked for a $45 fee.

Sample Hg and SNPs report from YFull
  1. This area displays your Y haplogroup and a link to YTree, as well as a list of terminal SNPs that are common to all samples belonging to your haplogroup.
  2. The SNPs found in your sample are shown here and divided into three categories: Positive, Ambiguous, and No call. Positive SNPs are mutations that were found in your Y chromosome. Ambiguous SNPs are those that could not be determined with certainty. No call SNPs are positions for which the data did not yield a result. You can download the entire list of SNPs in CSV format.
  3. All SNPs in the YFull database are rated and marked with stars from 1 to 5. The rating is based on a set of criteria that assess SNP quality. The YTree is built mainly from SNPs with a rating of 3 to 5 stars.
  4. Detailed information about an SNP can be obtained by clicking on the magnifying glass icon.

Novel SNPs

SNPs found on your Y chromosome that are currently not found in anyone else’s data in the YFull database are considered novel (“private”). A list of these SNPs is presented in the report on the “Novel SNP” page.

Sample Novel SNPs report from YFull
  1. All “private” SNPs are divided into four categories based on sequencing quality and other criteria.
  2. There may be several icons in the SNP line. “G” means that the SNP is near a gene. “H” means that the SNP is located in a homologous region, that is, one that is similar to other regions on the Y chromosome or on other chromosomes. “MC” means that the SNP is in a multicopy region, that is, in a region homologous to another region of the Y chromosome.
  3. You can view the mapped sequence fragment (.bam) for the area around the SNP using the “BAM viewer”.
YFull BAM viewer

SNP matches

This report lists the samples that share SNPs with your sample. At the time of this writing, the list is limited to an age range of 3,500 ybp (years before present). You can read more about the methodology behind SNP matching on our website.

Sample SNP matches report from YFull
  1. The number of common SNPs shared between your sample and another sample. SNPs can be viewed in the pop-up window.
  2. Additional SNPs that are assumed to be shared between your sample and another sample.
  3. Using this link, you can send a personal message to a YFull user.

Age estimation

The ages of all YTree subclades are recalculated at intervals of about 1-1.5 months. The procedure is resource-intensive and therefore takes some time to complete. You can see the new YTree structure or a new subclade before the update is completed using the Live Tree. Age estimation is based on all samples with a “length coverage for age” of at least 5,927,015 base pairs.
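As a rough illustration of the coverage threshold mentioned above, the following Python sketch filters samples by their “length coverage for age”. The sample names and coverage values are hypothetical, and the function is not part of any YFull tool; only the threshold of 5,927,015 base pairs comes from the text.

```python
# Minimum "length coverage for age" in base pairs, as stated in the tutorial.
MIN_LENGTH_COVERAGE_FOR_AGE = 5_927_015


def usable_for_age_estimation(length_coverage_bp: int) -> bool:
    """Return True if a sample meets the coverage threshold for age estimation."""
    return length_coverage_bp >= MIN_LENGTH_COVERAGE_FOR_AGE


# Hypothetical samples: name -> length coverage for age (base pairs).
samples = {"sample_A": 8_400_000, "sample_B": 5_927_014}

usable = [name for name, bp in samples.items() if usable_for_age_estimation(bp)]
print(usable)
```

In this made-up example only sample_A would count toward the age estimate, because sample_B falls one base pair short of the threshold.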

Sample age estimation report from YFull
  1. The report is presented in several tabs:
    1. Subclades statistics: statistics of SNPs used for age estimation
    2. +Known SNPs: known SNPs used to estimate age
    3. +Novels: novel SNPs used to estimate age
    4. xKnown SNPs: known SNPs excluded from the age estimation, with the reasons for exclusion
    5. xNovels: novel SNPs excluded from the age estimation, with the reasons for exclusion
  2. For the terminal subclade, only the novel SNPs of the samples are used to estimate the age.
  3. Subclade age, taking into account data only for the current sample.
  4. Subclade age, taking into account data for all samples.

You can read more about the age estimation methodology on our website.

The “xNovels” tab shows the reasons why SNPs in the list were excluded from the age estimation: the SNP is outside the “combBED” region, the variant is a deletion or insertion (InDel), the number of reads is low, or the SNP quality is insufficient.

xNovels tab in the age estimation report

Y Chromosome Analysis: STRs

Y-STRs are short tandem repeats of DNA fragments, for example, “GAAA/GAAA/GAAA/GAAA,” etc. They are used as genetic markers in population genetic studies for shorter time intervals due to their greater variability compared with SNPs.
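To make the idea of a repeat count concrete, here is a minimal Python sketch that counts the longest run of back-to-back copies of a motif in a short sequence. The flanking bases are invented for illustration, and this is not how YFull determines STR values; calling STRs from sequencing reads is considerably more involved.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    # Each match is an uninterrupted run of the motif; convert its length to a copy count.
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical fragment: four GAAA repeats between made-up flanking bases.
fragment = "TTCA" + "GAAA" * 4 + "CCTG"
print(longest_tandem_run(fragment, "GAAA"))  # prints 4
```

The value 4 here corresponds to what an STR report would show as the count for that marker.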

STR results

At the moment, YFull reports on 780 Y-STR markers. For each marker, you can see the number of repeat copies present on your Y chromosome.

Sample STR results report from YFull
  1. Counts of STR markers that are considered reliable are shown on a white background.
  2. Counts of STR markers that are considered ambiguous are shown on a gray background.
  3. Counts of STR markers that have not been determined or are considered unreliable are shown as a dash.

STR matches

The STR marker counts in your sample are compared to the counts in other samples. This enables us to find individuals who are your (distant) relatives on the paternal side. The set of counts across a sample’s STR markers is called a haplotype.

Sample STR matches report from YFull
  1. The table of STR matches is divided into two tabs: Close matches and Distant matches. Close matches have a distance of less than 0.050, and distant matches have a distance from 0.050 to 0.085. You can read more about the STR comparison methodology on our website.
  2. These columns show the number of compared markers and the differences between your sample and other samples.
  3. Additional information about the most distant paternal ancestor.
  4. Button for generating a printer-friendly page.
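The two distance ranges described in item 1 can be sketched in Python as follows. The function name and the “not listed” label are hypothetical, and treating 0.085 as inclusive is an assumption. How the distance itself is calculated is not covered here.

```python
CLOSE_MAX = 0.050    # distances below this value are "close" matches
DISTANT_MAX = 0.085  # upper bound for "distant" matches (assumed inclusive)


def classify_str_match(distance: float) -> str:
    """Map an STR distance to the tab where the match would be expected to appear."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance < CLOSE_MAX:
        return "close"
    if distance <= DISTANT_MAX:
        return "distant"
    return "not listed"


for d in (0.012, 0.050, 0.085, 0.120):
    print(d, classify_str_match(d))
```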

STR variants

This report includes a list of STR variants from your unique (“private”) mutations down the branches of the YTree to the root where the “Y Chromosomal Adam” is located.

Sample STR variants report from YFull
  1. Detailed statistics for the STR marker can be viewed by clicking on the magnifying glass icon.
  2. STR markers, like SNP markers, have a rating indicated by stars from 1 to 5, where 5 stars indicate a slowly mutating STR.
  3. STR markers have ancestral (“ANC”) and derived (“DER”) alleles.

You can read more about the STR variants methodology on our website.

Mitochondrial DNA (mtDNA) Analysis

The MTree is a tree built from full mitochondrial sequences (FMS). It contains several tens of thousands of scientific samples from GenBank as well as mitochondrial DNA data uploaded by individual users. New branches are regularly added to the MTree. It currently has about 14 thousand branches. The MTree also has a large branch of Neanderthal and Denisovan paleo-DNA, as well as other ancient mitochondrial sequences.

You can see your sample in the MTree and identify nearest relatives as well as distant ancestry. Each MTree subclade has its own confidence rating, indicated by stars from 1 to 5. For convenience, the ratings are highlighted in different colors on the tree.

Nebula Genomics customers (male and female) who transfer their raw data file to YFull will receive all the Mitochondrial DNA reports described below free of charge.

Sample MTree

Haplogroup and SNPs

All SNPs that we find in your mitochondrial DNA are shown in the report “Hg and SNPs.”

Sample Hg and SNPs report for mtDNA from YFull
  1. By clicking on the magnifying glass icon, you can view all known SNPs for that position.
  2. Here you can see the mutation and its location in the mitochondrial DNA.
  3. You can view the sequence of raw data by clicking on the “.FASTA” button.

MReport

The main mitochondrial DNA interpretation report is the “MReport”, which lists all mutations in all mitochondrial DNA regions.

Sample MReport from YFull
  • HVR1 (16024-16383) – hypervariable region No. 1 of mitochondrial DNA
  • HVR2 (57-574) – hypervariable region No. 2 of mitochondrial DNA
  • CR (575-16000) – coding region of mitochondrial DNA
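The region boundaries listed above can be turned into a small lookup. This Python sketch simply encodes the three ranges as given in this tutorial. The function name is hypothetical, and positions that fall outside all three listed ranges return None.

```python
from typing import Optional

# Region boundaries (inclusive) exactly as listed in the MReport description above.
MT_REGIONS = {
    "HVR1": (16024, 16383),
    "HVR2": (57, 574),
    "CR": (575, 16000),
}


def mt_region(position: int) -> Optional[str]:
    """Return the name of the listed region containing `position`, or None."""
    for name, (start, end) in MT_REGIONS.items():
        if start <= position <= end:
            return name
    return None


for pos in (73, 8860, 16100, 16010):
    print(pos, mt_region(pos))
```

For example, position 73 falls in HVR2 and position 16100 in HVR1, while position 16010 lies between the listed ranges and is therefore not assigned to any of them.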

On the “MTree matches” tab, all mutations are divided into groups. In particular, they include those that match other samples and those currently unique.

The MTree matches tab under the MReport

“Extras” are novel mutations found in your sample that have not yet been added to the MTree.

Mt matches

Samples with SNPs that match the SNPs of your sample can be viewed in the “Mt matches” report.

Sample Mt matches report from YFull
  1. Additional information about the most distant ancestor (maternal line).
  2. Using this link, you can send a personal message to a YFull user.

RAW Data

You can browse your raw data for specific SNPs of interest. This option is located on the left-hand sidebar under RAW data. The page can be used to check any SNP on the Y chromosome.

Raw data search
  1. SNP position is indicated in the reference sequence Hg19 as well as Hg38.
  2. Information about the region of the Y chromosome where the SNP is located.
  3. The number of sequencing reads in your data that cover this SNP.
  4. The icons “YF” and “YB” mean that SNP information is available in the YFull database and on Ybrowse.org.

Others

Groups

A user of the YFull service can join a group (by haplogroup, by country, by region, by last name, etc.) to compare their results with other members of the group. The groups are divided into Y and Mt. The link to the list of groups is in the “Other” section of the main menu.

Joining a group on YFull
  1. You can join a group by clicking on the “Join request” button in the horizontal menu of the selected group’s page. By joining a group, you share your results with other group members and can view theirs. You can contact another member by writing a private message. You can leave a group at any time from the “Settings” page.
  2. From the “Y-Browser” page, any group member can view any position of the Y chromosome for the entire group. The “Y-Results -> View Y-SNPs” page allows you to check the value of SNPs in all samples of a group by entering the name of SNPs separated by commas.

Similarly, on the “Y-Results -> View Y-STRs” page, you can view the STRs of the entire group.

Interacting with a group on YFull
  1. In the header, the names of the markers are listed. In the menu above, you can select the desired number of markers to view: 12, 37, 67, 111, or all.
  2. With this tool, you can see the counts of all 780 STRs for every group member. Differences in STR values within a certain range are highlighted with different background colors, which makes the data easier to read.
  3. This column of the table displays information about the most distant direct ancestor (paternal line).
  4. Consensus haplotypes for the maximum alleles, minimum alleles, and the reconstructed ancestral haplotype for the selected haplotypes group are displayed here. These tools help researchers explore haplogroups and individual subclades more deeply.
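To illustrate what the minimum and maximum consensus haplotypes in item 4 mean, the following Python sketch computes the per-marker minimum and maximum allele across a small group. The sample names and values are invented, and the reconstructed ancestral haplotype is left out because its method is not described here.

```python
from typing import Dict, Optional, Tuple

Haplotype = Dict[str, Optional[int]]  # marker name -> repeat count (None = not determined)


def allele_range(haplotypes: Dict[str, Haplotype]) -> Dict[str, Tuple[int, int]]:
    """Return (minimum, maximum) allele per marker, ignoring undetermined values."""
    values: Dict[str, list] = {}
    for haplotype in haplotypes.values():
        for marker, count in haplotype.items():
            if count is not None:
                values.setdefault(marker, []).append(count)
    return {marker: (min(v), max(v)) for marker, v in values.items()}


# Hypothetical group of three samples with two markers each.
group = {
    "sample_A": {"DYS393": 13, "DYS390": 24},
    "sample_B": {"DYS393": 12, "DYS390": None},
    "sample_C": {"DYS393": 13, "DYS390": 25},
}

print(allele_range(group))
```

In this made-up group, DYS393 ranges from 12 to 13 and DYS390 from 24 to 25. The undetermined DYS390 value of sample_B is skipped.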

Mt groups are very similar in structure to Y groups.

Updating Trees

YTree is updated every 1-1.5 months. Each tree release has its unique number and date (YTree v9.01.00 from 18 February 2021). The list of YTree releases is published on the “Archive” page. You can follow the tree’s development by selecting any update and clicking on the link with the version number.

The ages of all YTree subclades are recalculated every time the tree is updated. The procedure is resource-intensive and therefore takes some time to complete.

MTree is updated in real time as new samples and subclades are added. For a month after a sample is added to MTree or YTree, a “new” flag appears next to the sample ID.

Privacy

YFull adheres to a strict security and privacy policy and does not share user data with third parties. Partial access to certain interpretation results (for example, STR haplotypes, SNP data, etc.) can be given to the administrators and members of groups that the user has joined. By joining a group, the user agrees to its terms and conditions and can leave the group at any time. No user can see another user’s name or email address; you can contact other users only by writing a private message inside the service.

Almost all information is depersonalized, and on the “Settings” page, you can change the privacy level for groups and for YTree / MTree. Optionally, you can add your sample anonymously using a pseudonym and a specially created email address. At the same time, you can share your results with another user.

At any time, the user can delete their sample or account with all the data beyond recovery.

You can also read more information on the FAQ page. 

Edited by Christina Swords, Ph.D.
