GEDmatch is a primarily free genealogy site that lets users upload their own autosomal DNA test results and find related individuals. It was founded by Curtis Rogers in 2010. The powerful GEDmatch Genesis tool finds matching segments of DNA among its 1.3 million users regardless of which company the original data is from. Users can upload data from a wide variety of sites, letting people find family members even if one used 23andMe and the other used AncestryDNA. There are also other DNA tools that let you explore your ancestry.
One of the major drawbacks of GEDmatch is that it is not designed for the novice user. That’s what this tutorial is for: We’ll walk you through how to sign up for free, upload your raw DNA data from other DNA testing sites, how to look at your ancestry, and explore your family relationships.
Is GEDmatch safe?
GEDmatch won’t give you inaccurate medical information that might lead you to seek unnecessary medical care, because it doesn’t offer any medical or health information at all. It also does not tout itself as the most accurate and precise measure of ethnicity. The admixture/heritage tools pull data from many different sources and your results will vary depending on which project you select.
Before you start uploading your genetic data to random websites, you should know more about who is receiving that data. GEDmatch is currently owned by Verogen, a forensic genomics company. Verogen makes money, in part, by allowing law enforcement to use their genealogy database to solve crimes. This might mean that your DNA could help identify a victim of a crime — or the criminal.
How to sign up
Registration is straightforward. Go to the registration page and fill in your name, alias if you’d rather your name not appear publicly, email, and password. Your email and password will be your GEDmatch login.
The email needs to be valid as GEDmatch will send a validation code. Be aware that this email will be visible to other users if you choose to use the central one-to-many DNA matching tool. Although your real name is requested, this does not seem to be validated.
How to upload data to GEDmatch
GEDmatch does not offer DNA testing, only interpretation of existing DNA results. Therefore, you will need to download your data from whichever SNP-based testing company you submitted a sample to. You can either search “how to download raw genetic data from” and the name of your testing company or go directly to the tutorials from AncestryDNA, 23andMe, or FamilyTreeDNA.
Depending on where your testing was done, you may download a single file or a folder. This will likely be “gzipped”, which makes it take up less space but means it has to be decompressed before a human can read it. To “unzip” it on a Mac, double-click on it; on a PC, right-click and click “unzip”. You can now open it in WordPad or Excel. Nebula users should use a text editor when opening these files to ensure all the data is shown.
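If you’d rather unpack the download with a script, here is a minimal Python sketch using only the standard library. The file names are placeholders, and it assumes the download is a single gzip-compressed text file; a .zip archive would need the zipfile module instead.

```python
import gzip
import shutil

def gunzip(src_path, dst_path):
    """Decompress a gzip file (e.g. raw_dna.txt.gz) into a plain-text file."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data so large files are fine
    return dst_path

# Example with hypothetical file names:
# gunzip("raw_dna.txt.gz", "raw_dna.txt")
```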
GEDmatch will only accept what it calls “23andMe format”. This differs from “Variant Call Format” or “VCF”. GEDmatch will not accept .vcf files. You might be able to directly download a file in 23andMe format. If you are only able to download a VCF, check out this tutorial (scroll down to the part titled “Converting VCF Files to 23andMe Files”). Nebula Genomics customers should follow the instructions to use WGS Extract to generate a 23andMe-format file.
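To give a feel for what such a conversion involves, here is a rough Python sketch that turns one line of a single-sample VCF into a 23andMe-style row. It is an illustration under simplifying assumptions (tab-separated columns, one sample, a GT field, SNPs with an rsID only) and not the method used by the linked tutorial or by WGS Extract. The function name is made up for this example.

```python
def vcf_line_to_23andme(line):
    """Convert one single-sample VCF data line to 'rsid<TAB>chrom<TAB>pos<TAB>genotype'.

    Returns None for header lines and for anything this sketch does not handle.
    """
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 10:
        return None
    chrom, pos, rsid, ref, alt = fields[:5]
    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys or not rsid.startswith("rs"):
        return None
    alleles = [ref] + alt.split(",")          # allele 0 is REF, 1.. are ALT
    gt = fields[9].split(":")[fmt_keys.index("GT")]
    calls = gt.replace("|", "/").split("/")   # treat phased and unphased alike
    if any(not c.isdigit() for c in calls):
        return None                           # missing call such as ./.
    bases = [alleles[int(c)] for c in calls]
    if any(len(b) != 1 for b in bases):
        return None                           # skip indels and symbolic alleles
    if chrom.startswith("chr"):
        chrom = chrom[3:]                     # 23andMe files use "1", not "chr1"
    return "\t".join([rsid, chrom, pos, "".join(bases)])
```

Keep in mind that a VCF often lists only the positions where you differ from the reference genome, so a line-by-line conversion like this can leave out SNPs where you match the reference. That is one reason a dedicated converter is the safer route.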
GEDmatch will not accept full exomes or genomes. They also recommend against uploading imputed data where a computer has tried to guess what your DNA might have been in between the SNPs that were actually tested. You should upload only raw SNP data.
GEDmatch will check if the file is in the right format and reject it if it’s not. If you want to check that the file is in 23andMe format yourself, you can open it in WordPad or Excel. It will likely have some lines at the top that start with a hash sign (#); these are ignored by the computer. The rest will look like this:
# rsid chromosome position genotype
rs3094315 1 752566 AA
rs3131972 1 752721 GA
rs75333668 1 762320 CC
The rsID is the name of the SNP, chromosome and position are where the SNP is found, and genotype is the pair of letters you carry at that position. There are two letters, one from each parent.
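If you’d like to check the layout with a script instead of by eye, the sketch below reads 23andMe-style lines in Python. It is only a loose sanity check and says nothing about the validation GEDmatch itself performs; the function name is made up for this example.

```python
def parse_23andme(lines):
    """Yield (rsid, chromosome, position, genotype) from 23andMe-style lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment/header lines
        parts = line.split()  # works for tabs or spaces
        if len(parts) != 4:
            raise ValueError(f"expected 4 columns, got {len(parts)}: {line!r}")
        rsid, chrom, pos, genotype = parts
        yield rsid, chrom, int(pos), genotype

sample = """# rsid chromosome position genotype
rs3094315 1 752566 AA
rs3131972 1 752721 GA
rs75333668 1 762320 CC
""".splitlines()

for record in parse_23andme(sample):
    print(record)
```

To run it on a real file, pass an open file object instead of the sample list, for example parse_23andme(open("raw_dna.txt")), where the file name is a placeholder.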
Now, you can upload your genetic information to GEDmatch. This button is on the right-hand side of the homepage, marked in the red box below, and says “Generic Uploads (23andMe, FTDNA, AncestryDNA, Living DNA and most others)”.
On the following page, it’s recommended to fill out as much information as you have. However, the only necessary boxes are a name and that you click the radio button stating that you’re authorized to upload the data (because it’s yours, you have their permission, the person is dead, or you’re law enforcement). Everything else is optional. If you don’t know the answer, leave it blank.
The very last set of questions is optional but important to answer. This is asking for the privacy preferences for this DNA profile, also called a “DNA kit”.
The default option is “Opt-in” which will let law enforcement access your data for criminal searches. It’s important to note that GEDmatch has been used to identify criminals by finding even distantly related family members. GEDmatch was used to solve the Golden State Killer cold case by matching DNA from the crime scene to a third or fourth cousin.
I chose to “Opt-out” which lets me compare my kit to others and lets individuals who are not law enforcement find me. Other options include research, which lets you see matches but does not let the matches see you, or private, which will not allow you to find family members using the Genesis tool but will let you use other ancestry tools.
Even if you opt-out or choose to keep the data private, GEDmatch will still turn the data over to law enforcement if a court subpoenas the information.
Click “Choose File”, find the file on your computer, and then click “upload”. Once you’ve added your file, it may take several minutes for the upload to occur. Do not leave the page or hit refresh. The time will depend on how big your genomic data file is and how good your upload speeds are. If it takes more than ten minutes, you should try again.
If your file is too big and slow to upload, you can re-zip it. On a Mac, ctrl+click the file name and select “Compress”. On a PC, right-click the file, select “Send to”, and select “Compressed (Zipped) Folder”.
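The same re-zipping can be done with a few lines of Python if you prefer. This is a small sketch using the standard zipfile module, with placeholder file names.

```python
import os
import zipfile

def zip_file(src_path, zip_path):
    """Write src_path into a new deflate-compressed .zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, not the full folder path
        zf.write(src_path, arcname=os.path.basename(src_path))
    return zip_path

# Example with hypothetical file names:
# zip_file("raw_dna.txt", "raw_dna.zip")
```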
You’ll then see some large green numbers counting up, checking that each of your chromosomes was found in your file, and then it’ll give you your “kit number”. You’ll need this for all of the analysis on GEDmatch. Don’t worry too much about writing it down somewhere safe; it’ll show up on the home page and you can copy and paste it from there.
In order to go back to the home page, you’ll need to answer GEDmatch’s question, “When will my kit be available for one-to-many tool?” with “Typically within 24 to 48 hours”. This is the GEDmatch strategy for keeping you from emailing them and asking why you can’t find your results yet.
GEDmatch’s one-to-many tool takes a while to run because it has to compare your DNA to the DNA of all 1.3 million users. Don’t worry, though, there are other tools to play with while you wait.
How to look at your Admixture/Heritage
To look at your Admixture (heritage), you’ll need to find your kit number. Mine is highlighted in the green rectangle on the left above. You can write this down or just copy it to paste later. Then, click on “Admixture (heritage)” on the right of the screen.
Admixture analysis works by matching chunks of your SNPs to a set of references. It’s important that you’re comparing to the right set of references. Otherwise, the program might try to force a match that isn’t actually accurate, but it’s the closest it has, and miss out on a match it could have made.
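To make the idea of “forcing a match” concrete, here is a toy Python sketch with just two made-up reference populations, A and B. For each SNP it knows how common one allele is in each population, and it searches for the A/B mix that best explains a person’s genotypes. This is an illustration only: the frequencies are invented, and the calculators on GEDmatch use their own reference panels and methods, which this does not reproduce.

```python
import math

def best_mix(dosages, freq_a, freq_b, steps=100):
    """Grid-search the fraction of ancestry from population A (0.0 to 1.0).

    dosages: copies (0, 1 or 2) of the counted allele at each SNP.
    freq_a, freq_b: that allele's frequency in reference populations A and B.
    """
    def log_likelihood(q):
        total = 0.0
        for d, pa, pb in zip(dosages, freq_a, freq_b):
            f = q * pa + (1 - q) * pb            # expected allele frequency
            f = min(max(f, 1e-6), 1 - 1e-6)      # keep log() away from 0
            # binomial probability of seeing d copies out of 2
            total += (math.log(math.comb(2, d))
                      + d * math.log(f) + (2 - d) * math.log(1 - f))
        return total

    return max((i / steps for i in range(steps + 1)), key=log_likelihood)

# Invented allele frequencies for four SNPs
freq_a = [0.9, 0.8, 0.1, 0.9]
freq_b = [0.1, 0.2, 0.9, 0.1]
print(best_mix([2, 2, 0, 2], freq_a, freq_b))
```

Notice that the function always returns some mix of A and B, even for a person whose ancestry comes from neither population. That is the reason picking a project with suitable references matters.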
This spreadsheet contains a comprehensive list of each of the populations in each project.
MDLP is good as a broad global calculator. It pulls data from all around the world. MDLP world and MDLP world-22 are useful for anyone wanting to get a broad look at their ancestry. MDLP world-22 includes Pygmy, West-Asian, North European Mesolithic (an ancient DNA sample), Indo-Tibetan, Mesoamerica, Arctic-Amerind, South-America_Amerind, Indian, North-Siberian, Atlantic_Mediterannean_Neolithic, Samoedic, Indo_Iranian, East-Siberian, North-East-European, South-African, North-Amerind, Subsaharaian, East-South-Asian, Near_East, Melanesian, Paleo-Siberian, and Austronesian.
Eurogenes provides a finer map of Europe with the Eurogenes EUtest V2 K15 including North_Sea, Atlantic, Baltic, Eastern_Euro, West_Med, West_Asian, East_Med, Red_Sea, South_Asian, Southeast_Asian, Siberian, Amerindian, Oceania, Northeast African, and Sub-Saharan.
Dodecad has a few different groupings that provide good diversity for those with African heritage (Dodecad Africa9 with Europe, NW_Africa, SW_Asia, E_Africa, Mbuti, W_Africa, Biaka, and San), or Asian heritage (Dodecad V3 with East_European, West_European, Mediterranean, Neo_African, West_asian, South_Asian, Northeast_Asian, Southeast_Asian, East_African, Southwest_Asian, Northwest African, and Palaeo_African).
HarappaWorld is targeted at South Asian ancestry and includes S-Indian, Baloch, Caucasian, NE-Euro, SE-Asian, Siberian, NE-Asian, Papuan, American, Beringian, Mediterranean, SW-Asian, San (South African Hunter-Gatherers), E-African, Pygmy, and W-African.
Ethiohelix is targeted for those of African Ancestry with Ethiohelix K10 Africa Only containing Nilo-Saharan, East-Africa2, Mbuti-Pygmy, East_Africa1, Khoi-San, West_Africa, Hadza, Biaka-Pygmy, North-Africa, and Omotic.
puntDNAL pulls data from ancient DNA, with puntDNAL K10 Ancient including ASI (Ancient South Indian), Sub-Saharan, Oceania, Beringian, ENF (Early Neolithic Farmers), CHG (Caucasus Hunter-Gatherers), Siberian, E_Asian, WHG (Western European Hunter-Gatherer), and Amerindian.
Gedrosia also utilizes ancient DNA with the Gedrosia Ancient Eurasia K6 including Ancestral North Eurasian, Ancestral South Eurasian, East_Asian, West European Hunter-Gatherer, Natufian, and Sub_Saharan.
The test results can be visualized in multiple ways, including “proportions” which gives a pie chart, or “Admixture by chromosome” which includes the information about which parts of your chromosomes are identified as coming from each region.
The pie chart is useful to see what your total ancestry estimates are. The chromosome painting allows you to see which areas of your genome are from what population and helps show a little more of the uncertainty in the estimates — areas that are multiple colors might come from any of those populations as the populations have similar DNA in that genomic region.
How to tell if your parents are related
Similar to how GEDmatch lets you find relatives in its DNA database by matching segments of your DNA to theirs, you can check if your parents were related by looking for segments of your own DNA that match each other. You have two copies of each chromosome, one from each parent, and the tool can compare them to each other. If large chunks match, it’s likely your parents share a recent common ancestor.
Thankfully, my results were pretty boring.
Red denotes that there was no match, green that there was a match, and blue that there was a long match. Some short matches are expected to happen by chance. Several long matches would indicate that your parents are related.
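The underlying idea can be sketched in a few lines of Python: walk along one chromosome and look for long unbroken runs of SNPs where both letters are the same. This is a simplified illustration and not GEDmatch’s algorithm. The tiny threshold of 3 SNPs is only there to keep the example short; a real analysis would need much longer runs before calling them meaningful.

```python
def homozygous_runs(snps, min_snps=3):
    """Return (start, end, count) for runs of consecutive homozygous SNPs.

    snps: list of (position, genotype) for one chromosome, sorted by position.
    """
    runs, current = [], []
    for pos, genotype in snps:
        # homozygous = two identical letters; "--" (no call) breaks a run
        if len(genotype) == 2 and genotype.isalpha() and genotype[0] == genotype[1]:
            current.append(pos)
            continue
        if len(current) >= min_snps:
            runs.append((current[0], current[-1], len(current)))
        current = []
    if len(current) >= min_snps:
        runs.append((current[0], current[-1], len(current)))
    return runs

# Made-up positions and genotypes
snps = [(100, "AA"), (200, "GG"), (300, "CC"), (400, "AG"), (500, "TT"),
        (600, "--"), (700, "CC"), (800, "CC"), (900, "GG"), (1000, "AA")]
print(homozygous_runs(snps))  # [(100, 300, 3), (700, 1000, 4)]
```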
How to check if you’re related to any ancient humans
Everyone is descended from some ancient humans, but not all ancient humans have been sequenced. Occasionally, scientists find ancient remains that still have human DNA that is in good enough shape to extract genetic information.
This application works in a similar way to the one-to-many DNA comparison tool that you are likely still waiting to run. It compares your DNA to the ancient samples and checks if any long stretches match.
The tool lets you change how long the matching stretch is. 0.5 cM is pretty short and there are a lot of matches. Many matches do not necessarily indicate that there is a close relationship. These are denoted in orange. Each orange bar shows which part of the ancient human’s chromosomes match mine.
However, if you increase the stretch that needs to match to a larger number, here 5 cM, significantly fewer stretches match. This means I am not very closely related to these ancient humans. There were no matches above 6 cM.
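The effect of raising the threshold is easy to picture as a filter over a list of segments. In this small Python sketch the segment lengths are invented for illustration and are not my actual results.

```python
def filter_segments(segments, min_cm):
    """Keep only matching segments at least min_cm centimorgans long."""
    return [seg for seg in segments if seg["cm"] >= min_cm]

# Hypothetical matching segments against one ancient sample
segments = [
    {"chrom": "1", "cm": 0.7},
    {"chrom": "4", "cm": 5.4},
    {"chrom": "9", "cm": 1.2},
]
print(len(filter_segments(segments, 0.5)))  # 3
print(len(filter_segments(segments, 5)))    # 1
```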
How to look at matching segments in multiple people
Rather than only comparing your chromosomes to others (one-to-many), the 3-D Chromosome Browser lets up to 9 comparisons happen at once (many-to-many).
Here I’m looking at two individuals who are parent and child and one distantly related family member. The colored bars are the parts of the X chromosome that match between the two individuals. The orange is showing that most of the X chromosome matches between the mother and child.
The center part that is not highlighted as a match is the centromere. There is a centromere in the middle of every chromosome. The centromeric DNA is very repetitive and doesn’t contain any genes. Therefore it is usually not included in SNP analysis.
The mom and child likely also have matching centromeres, but since no SNPs were analyzed in that area, it can’t be confirmed that they’re the same.
The yellow bars show the part of the X chromosome that matches the more distantly related family member. This chunk is smaller since the individual is more distantly related. The chunk also matches both the mother and child, indicating that the relative is related to the child through the mother. If the distant relative was related to the child through the father, there would not be a match.
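Checking whether the relative’s match with the mother and their match with the child cover the same stretch comes down to finding where two segments overlap. Here is a minimal Python sketch of that step, with made-up start and end positions.

```python
def overlap(seg_a, seg_b):
    """Return the (start, end) shared by two segments, or None if they don't overlap.

    Each segment is a (start, end) pair of positions on the same chromosome.
    """
    start = max(seg_a[0], seg_b[0])
    end = min(seg_a[1], seg_b[1])
    return (start, end) if start < end else None

# Hypothetical X-chromosome segments shared with the distant relative
relative_vs_mother = (20_000_000, 45_000_000)
relative_vs_child = (25_000_000, 45_000_000)
print(overlap(relative_vs_mother, relative_vs_child))  # (25000000, 45000000)
```

An overlapping region like this is what you would expect if the child inherited that stretch from the mother.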
You can use the 3D chromosome browser for any individuals you have kit numbers for. However, unrelated individuals will not have matching DNA and will not provide very interesting plots.
How do you find kit numbers for related individuals? You use the one-to-many tool!
How to use the one-to-many tool
One to two days after uploading your DNA information, you can type in your (or anyone’s) kit number at the top and hit search. This will pull up the individuals that share segments of DNA with you, also known as your family.
The first column is the kit number, which uniquely identifies the DNA upload and permits any user to repeat any analysis using that kit number instead of their own. This means you can take the kit number you find here and plug it into the tools described above: you can find out their admixture/heritage, if their parents were related, or compare their chromosomes to anyone else whose kit number you copy and paste.
The second column is whatever alias the user chooses to use for that kit and the third is the user’s email. The user might not be the person the DNA is from. You shouldn’t assume that that email belongs to your relative, only that it belongs to someone with authorization to upload your relative’s DNA information.
After the user’s information is “GED”. The GED in GEDmatch stands for GEnealogical Data. This comes from the GEDCOM (GEnealogical Data COMmunication) file format.
If you have a detailed family tree in genealogical research software, it can be exported in GEDCOM format and uploaded to GEDmatch. This will allow you to link family trees using the relatedness feature in GEDmatch’s one-to-many tool. If a kit has a GEDCOM entry associated with it, it will show up as a link in this column.
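If you’re curious what a GEDCOM file looks like inside, each line starts with a level number, optionally followed by a cross-reference ID wrapped in @ signs, then a tag and an optional value. The Python sketch below splits a line into those parts. The sample record is invented, and the sketch ignores finer points of the format such as continuation lines.

```python
def parse_gedcom_line(line):
    """Split one GEDCOM line into (level, xref_id, tag, value)."""
    tokens = line.strip().split(" ")
    level = int(tokens[0])
    xref, idx = None, 1
    # a token like @I1@ right after the level is the record's own ID
    if len(tokens) > 1 and tokens[1].startswith("@") and tokens[1].endswith("@"):
        xref, idx = tokens[1], 2
    tag = tokens[idx]
    value = " ".join(tokens[idx + 1:])
    return level, xref, tag, value

# Invented example record
sample = [
    "0 @I1@ INDI",
    "1 NAME Jane /Doe/",
    "1 SEX F",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
]
for line in sample:
    print(parse_gedcom_line(line))
```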
Age in days refers to how long ago the kit was uploaded. Type describes. Sex is a binary Male or Female that the user indicated when uploading the kit. Other entries the user could have indicated are the haplogroups: Mt and Y.
Mt is short for mitochondria. Mitochondria are passed down from mother to child and don’t recombine, or fragment with each generation, like most of the genome does. It allows people to trace their lineage directly through their mothers.
Y is similar to Mt but only for males. Only the very ends of the Y chromosome are able to recombine with the X chromosome, meaning most of the Y chromosome is passed down from father to son intact. This permits men to trace their lineage through their fathers.
GEDmatch will not tell you your haplogroup, but you may be able to figure it out if your relatives have their haplogroup information. If you share a common ancestor through an unbroken maternal line, you have the same Mt haplogroup. If you are male and share a common ancestor with another male through an unbroken paternal line, then you have the same Y haplogroup.
Next is the “Total cM” of the match. Centimorgans (cM) are a measurement of DNA length that factors in how often certain spots recombine. Two columns later is “Largest”, which states the longest segment that matches between two individuals.
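Put simply, the two numbers are the sum and the maximum of the matching segment lengths. A tiny Python sketch with invented segment lengths shows the relationship. Any minimum-length cutoff GEDmatch applies before adding segments up is not modeled here.

```python
def summarize_match(segment_cms):
    """Return (total_cm, largest_cm) for a list of matching segment lengths in cM."""
    total = round(sum(segment_cms), 1)
    largest = max(segment_cms, default=0.0)  # 0.0 when there are no segments
    return total, largest

# Three hypothetical matching segments
print(summarize_match([12.5, 30.0, 7.5]))  # (50.0, 30.0)
```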
A “Generations” column gives an estimate for the number of generations apart two individuals are. 1 represents a parent-child relationship, 1.2 is a sibling, and 1.4 is a half-sibling, uncle, or grandparent. 2 would be a first cousin, as the last common ancestor was 2 generations ago; 2.6 would be a first cousin once removed; and 3 would be a second cousin, with the last common ancestor 3 generations ago. The prediction breaks down over 4.
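If you want a quick way to translate the column while scanning a long match list, the values above fit in a small lookup table. The Python sketch below simply encodes the numbers listed in this tutorial, reading “cousin” at 2 as a first cousin. It is not GEDmatch’s internal formula, and the 0.2 tolerance is an arbitrary choice for the example.

```python
# Values as listed in this tutorial; not GEDmatch's internal formula.
GENERATION_LABELS = {
    1.0: "parent or child",
    1.2: "sibling",
    1.4: "half-sibling, uncle/aunt, or grandparent",
    2.0: "first cousin",
    2.6: "first cousin once removed",
    3.0: "second cousin",
}

def describe_generations(value, tolerance=0.2):
    """Give a rough relationship label for a 'Generations' value."""
    if value > 4:
        return "too distant to label reliably"
    nearest = min(GENERATION_LABELS, key=lambda g: abs(g - value))
    if abs(nearest - value) > tolerance:
        return "no label listed in this tutorial"
    return GENERATION_LABELS[nearest]

print(describe_generations(1.5))
```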
The example individual has two first degree relatives, either parents or children, one 1.5 degrees relative, such as an aunt, and a second-degree relative, likely a cousin. My own results were all beyond 4th-degree relatives.
Source states the company the genomic data was generated by or that the data was migrated (transferred) over from the old GEDmatch site. The ability to pull from different DNA testing kits is one of the original benefits of GEDmatch. If a user wants to search for family members, they do not have to buy both Ancestry and 23andMe kits. The user can search through all uploaded DNA data regardless of the DNA testing service.
The “Overlap” column describes how many SNPs were actually compared. This is not performed for migrated kits. The new Genesis system does not need the kits to have tested the exact same SNPs in order to say that two individuals are related. However, more SNPs that directly match means more certainty in the match.
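Conceptually, the overlap is the number of SNPs that both kits actually tested. Here is a short Python sketch of that count, assuming each kit has been loaded into a dictionary of rsID to genotype and that “--” marks a SNP with no result. The kits are invented, and how GEDmatch computes its own figure may differ.

```python
def snp_overlap(kit_a, kit_b):
    """Count SNPs that have a genotype call in both kits (dicts of rsid -> genotype)."""
    shared_ids = kit_a.keys() & kit_b.keys()
    return sum(1 for rsid in shared_ids
               if "-" not in kit_a[rsid] and "-" not in kit_b[rsid])

# Two invented kits from different testing companies
kit_a = {"rs1": "AA", "rs2": "GA", "rs3": "CC", "rs4": "--"}
kit_b = {"rs2": "GG", "rs3": "CC", "rs4": "TT", "rs5": "AG"}
print(snp_overlap(kit_a, kit_b))  # 2
```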
How to iterate with GEDmatch
A lot of the fun of GEDmatch is being able to look at the connections between people. If enough of your relatives have uploaded their data, you can find them, see which are related to each other, and build out your family tree. Once you have found the kit numbers of people related to you, you can perform any analysis on that individual that you would have done on yourself. This can allow you to pinpoint where different parts of your heritage came from.
Even if you don’t have any close relatives, you can find kit numbers of strangers and find their family connections and learn about their family history and heritage. This can be enjoyable in a nosey way but, remember, anyone else can do the same with your data.
How to delete your data from GEDmatch
To remove your data from GEDmatch, click on the pencil next to the kit information on the home page.
You will be brought to the “Kit Profile Management” page. This page lets you update any of the information supplied for your kit.
The second tab reads “Kit Removal”. If you click on it you’ll be brought to the page shown below. Enter your password and click delete.