Spearin Surname Project






Interpreting Results



A Closer Examination of Mutations

Let's look more closely at the five subjects in the first cluster, Genetic Family 1 (GF1). At this stage of the project, the modal haplotype can be calculated for all 5 subjects in this cluster, based on the 37 markers in the FTDNA 37-marker test. The 43-marker DNAheritage test covers 32 out of the 37 FTDNA markers, but the additional DNAheritage Upgrade 15 Test covers the remaining 5 markers, thus allowing a direct comparison with the results on the 37 FTDNA markers shown in the table (see also http://www.gendna.net/ydnacomp.htm). 

The subjects differ on 4 markers, namely DYS607, CDYa, CDYb, and DYS446. The first 3 of these markers are included in the FTDNA 37-marker test, but the fourth is only included in the DNAheritage 43-marker test. Compared to the Modal Haplotype (on the 37 FTDNA markers), two subjects match exactly, two have 1 mutation each, and one has 2 mutations.
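The modal haplotype is simply the most common value at each marker across the group. The following is a minimal Python sketch of that calculation. The repeat counts are invented for illustration (they are not the project's actual results) and are chosen only to reproduce the pattern described above on the three FTDNA markers that vary: two exact matches, two subjects with 1 mutation, and one with 2.

```python
from collections import Counter

# Hypothetical repeat counts (NOT the project's real data), restricted to
# the three FTDNA markers on which the GF1 subjects differ.
haplotypes = {
    "Limerick":   {"DYS607": 15, "CDYa": 36, "CDYb": 38},
    "Sydney":     {"DYS607": 15, "CDYa": 36, "CDYb": 38},
    "Ontario":    {"DYS607": 15, "CDYa": 36, "CDYb": 39},
    "New Jersey": {"DYS607": 15, "CDYa": 36, "CDYb": 37},
    "Georgia":    {"DYS607": 16, "CDYa": 37, "CDYb": 38},
}

def modal_haplotype(haps):
    """Return the most common value at each marker."""
    markers = next(iter(haps.values())).keys()
    return {
        m: Counter(h[m] for h in haps.values()).most_common(1)[0][0]
        for m in markers
    }

def mutations_from(hap, reference):
    """Count the markers on which a haplotype differs from a reference."""
    return sum(1 for m in reference if hap[m] != reference[m])

modal = modal_haplotype(haplotypes)
counts = {name: mutations_from(h, modal) for name, h in haplotypes.items()}
for name, n in counts.items():
    print(name, n)
```

With only five subjects the modal value at a marker can rest on a bare majority, so it may shift as more people join the project.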

Using McGee's Y-utility probability matrix allows us to visualise the genetic distance between each of the subjects in a handy table. David Ewing's instructions on how to do this (and generate a phylogram) are particularly helpful. The table and phylogram show the genetic distance for each subject from every other subject (and the modal haplotype). We can compare all 5 subjects on 37 markers, and in addition we can compare 3 of the 4 DNAheritage subjects on 58 markers.

The table below shows that subjects ESMXMS (Limerick) and 27JHM7 (Sydney) are an exact match for the modal haplotype (MH) based on 37 markers. The furthest from the MH is subject 356HB (Georgia, FTDNA no 164729). The greatest genetic distance is between this latter subject (from Georgia) and both QQJRCM (Ontario) and 200083 (New Jersey), which suggests that these two individuals are the most distantly related to 356HB (Georgia). The most closely related are the first two, the ones that match the modal haplotype exactly (Limerick and Sydney). The practical message is that these two individuals should focus their documentary research on each other to try to find a common link between them.

But all this is based on only 37 markers. Different results might be obtained if further markers were tested. And for three of the subjects, a comparison is possible on 58 markers. This is described below. Will it produce the same results?

Genetic Distance Table based on 37 markers

Note: Infinite allele mutation model is used. 


Values on the diagonal indicate number of markers tested.
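To make the two notes above concrete, here is a minimal Python sketch of a genetic distance table under the infinite allele model, where any difference at a marker counts as 1 regardless of how many repeats separate the two values. It is not McGee's code, and the repeat counts are hypothetical (invented to match the pattern of differences described in the text). A stepwise count is included only for contrast.

```python
# Hypothetical repeat counts (NOT the project's real data) for the three
# FTDNA markers that vary; the other 34 markers are assumed identical.
MARKERS = ("DYS607", "CDYa", "CDYb")
N_TESTED = 37

haplotypes = {
    "Limerick":   (15, 36, 38),
    "Sydney":     (15, 36, 38),
    "Ontario":    (15, 36, 39),
    "New Jersey": (15, 36, 37),
    "Georgia":    (16, 37, 38),
}

def infinite_allele_distance(a, b):
    """Each differing marker counts once, whatever the size of the difference."""
    return sum(1 for x, y in zip(a, b) if x != y)

def stepwise_distance(a, b):
    """For contrast: each repeat of difference counts as one step."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_table(haps, n_tested):
    """Pairwise distances, with the number of markers tested on the diagonal."""
    names = list(haps)
    return {
        r: {c: n_tested if r == c else infinite_allele_distance(haps[r], haps[c])
            for c in names}
        for r in names
    }

table = distance_table(haplotypes, N_TESTED)
for name, row in table.items():
    print(f"{name:<11}", *(f"{v:>3}" for v in row.values()))
```

In this invented data Ontario and New Jersey sit two repeats apart at CDYb, which counts as 1 under the infinite allele model but 2 under a stepwise count.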

 





 




Phylogram based on 37 markers

The phylogram gives us some useful additional information. We can now visualise the genetic difference between the subjects. Each subject is represented by a dot, and the lines help show us how they may be related to each other. The red text on the lines tells us which marker has mutated between the subjects.

This diagram suggests that subjects ESMXMS (Limerick) and 27JHM7 (Sydney) are closest to the "ancestral haplotype" (i.e. the haplotype of the common ancestor for the group). The families of the other 3 subjects "branched out" from this ancestral haplotype, but in 3 different "directions", each developing their own unique mutation(s).

As more people join the project, we will be able to see which "branch" they most closely match, and this in turn will tell them where to focus their ongoing documentary research, i.e. who should be talking to whom.
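Matching a new member to a branch amounts to finding the existing haplotype(s) at the smallest genetic distance. A minimal Python sketch follows; the branch haplotypes and the new member's values are hypothetical, and the function name is purely illustrative.

```python
# Hypothetical repeat counts (invented for illustration only) on the
# three markers that distinguish the branches: DYS607, CDYa, CDYb.
branches = {
    "Limerick/Sydney (modal)": (15, 36, 38),
    "Ontario":                 (15, 36, 39),
    "New Jersey":              (15, 36, 37),
    "Georgia":                 (16, 37, 38),
}

def distance(a, b):
    """Number of markers on which two haplotypes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def closest_branches(new_haplotype, known):
    """Return the smallest distance and every branch at that distance."""
    scored = {name: distance(new_haplotype, hap) for name, hap in known.items()}
    best = min(scored.values())
    return best, sorted(name for name, d in scored.items() if d == best)

new_member = (15, 36, 39)  # hypothetical new project member
best_distance, matches = closest_branches(new_member, branches)
print(best_distance, matches)
```

When two branches tie, the function returns both, which is a reminder that a closest match is a suggestion for where to look, not proof of descent.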

But there are a couple of caveats. All this programme does is produce a "best fit" for the data entered. It is not necessarily the correct fit, just the best fit given the data entered. Entering more data from more markers may produce a different "best fit" diagram. Also, these results are based on living individuals. How can we be sure that they have inherited the exact same haplotype as their Most Distant Known Ancestor (MDKA)? Perhaps some of the mutations occurred in the generations since the MDKA. So how do we resolve this?

This is where triangulation comes in (see section on DNA Family Trees). Testing a second subject with the same MDKA will help triangulate the haplotype. And if there is a difference, a third subject will complete the process and identify the MDKA haplotype (in 99.9% of cases).
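The logic of triangulation can be sketched as a majority vote at each marker among descendants of the same MDKA. This is a minimal illustration, not the project's procedure: the repeat counts are hypothetical, and it assumes the descendants come down separate lines and that the same mutation rarely arises twice.

```python
from collections import Counter

def triangulate(descendants):
    """Infer the MDKA value at each marker by majority; None if tied."""
    inferred = {}
    for marker in descendants[0]:
        ranked = Counter(d[marker] for d in descendants).most_common()
        top_value, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        inferred[marker] = None if tied else top_value
    return inferred

# Hypothetical results for three descendants of one MDKA.
first  = {"DYS446": 14, "CDYb": 38}
second = {"DYS446": 13, "CDYb": 38}
third  = {"DYS446": 13, "CDYb": 38}

two_way = triangulate([first, second])           # DYS446 is unresolved
three_way = triangulate([first, second, third])  # third subject breaks the tie
print(two_way)
print(three_way)
```

With two subjects a disagreement cannot be resolved, which is why a third subject is needed to settle the MDKA value at that marker.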

Thereafter, we would generate an additional table and phylogram based on the MDKA haplotype of each family. These would be exactly the same as the ones here if no mutations had occurred since each MDKA.






Genetic Distance Table based on 58 markers

Comparing the three subjects who have 58 markers tested shows that Sydney (SYD1) is closest to the MH and differs least from the other two. This suggests that Sydney is now closest to the ancestral haplotype, whereas with the 37-marker comparison both Sydney and Limerick (LIM1) were reckoned to be closest. The additional markers thus give the test greater discriminating power. Ontario (ON1) and Limerick were separated by a single mutation in the 37-marker comparison but now are further apart than previously. Thus the more markers tested, the better we are able to separate the different families out.

A phylogram generated from this 58-marker comparison would just show a line with 3 dots separated by a single mutation each. So how do we find out who is most closely related to whom?


Let's look even more closely at the data. Limerick (LIM1) differs from Sydney (SYD1 or NSW1) on 1 marker (DYS446) in the 58-marker comparison, and he also differs from New Jersey (NJ1) on 1 marker (CDYb) in the 37-marker comparison. Should we give these 1-marker differences equal weight? Before we consider that question, note that we can already compare 4 of the subjects on 40 markers with the currently available results (and when New Jersey (NJ1) gets the rest of his results we can compare the same 4 subjects on 58 markers). The 40-marker comparison shows that Limerick (LIM1) differs from New Jersey (NJ1) on 2 markers (Limerick's DYS446 mutation, and New Jersey's CDYb mutation).
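The point that the answer depends on which markers both subjects have tested can be sketched in Python by comparing only the markers the two results have in common. The repeat counts are hypothetical and just three markers are shown; only the pattern of which markers differ is taken from the text.

```python
def compare_on_shared(a, b):
    """Compare two results on the markers both have tested.

    Returns (genetic distance, number of markers compared, differing markers).
    """
    shared = [m for m in a if m in b]
    differing = [m for m in shared if a[m] != b[m]]
    return len(differing), len(shared), differing

# Hypothetical repeat counts (invented for illustration only).
limerick = {"DYS607": 15, "CDYb": 38, "DYS446": 14}

# New Jersey before and after a result for DYS446 becomes available.
new_jersey_37 = {"DYS607": 15, "CDYb": 37}
new_jersey_40 = {"DYS607": 15, "CDYb": 37, "DYS446": 13}

before = compare_on_shared(limerick, new_jersey_37)
after = compare_on_shared(limerick, new_jersey_40)
print(before)
print(after)
```

A mutation on a marker that only one subject has tested is invisible to the comparison, so a distance should always be quoted together with the number of markers it is based on.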

But Limerick, New Jersey, and Ontario ALL differ from Sydney by 1 marker, on DYS446, CDYb and CDYb respectively. So who is the closest to Sydney on the genetic timeline? This is where we look at the mutation rate - the marker with the slowest mutation rate is LIKELY to indicate the more distant relative (but this is just on the balance of probabilities). And the answer is ...

DYS446 mutation rate = 0.00365

CDYa & b mutation rate = 0.03531 ... about 10 times faster!

So it looks like New Jersey and Ontario are closer to Sydney than Limerick is, even though they all only differ by 1 marker (I got the information on mutation rates from here - mutation rates 1 to 111 markers.xls)

As for the subject from Georgia (GA1), he differs from Sydney on 2 markers (CDYa and DYS607) and is the furthest genetically from everyone else. The DYS607 mutation rate = 0.00411 (similar to DYS446).
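The arithmetic behind this comparison can be checked with a few lines of Python. The rates are the ones quoted above; treating them as "per marker per generation" is the usual convention but is an assumption here, and the figure 1 / rate is only the average wait for a mutation on that marker, not a prediction for any one family.

```python
# Mutation rates quoted in the text (assumed to be per marker per generation).
rates = {"DYS446": 0.00365, "DYS607": 0.00411, "CDYa": 0.03531, "CDYb": 0.03531}

# Markers on which each subject differs from Sydney, as described in the text.
differs_from_sydney = {
    "Limerick":   ["DYS446"],
    "New Jersey": ["CDYb"],
    "Ontario":    ["CDYb"],
    "Georgia":    ["CDYa", "DYS607"],
}

ratio = rates["CDYb"] / rates["DYS446"]
print(f"CDY mutates about {ratio:.1f} times faster than DYS446")

# Average number of generations between mutations on one marker = 1 / rate.
generations_per_mutation = {m: round(1 / r) for m, r in rates.items()}

for subject, markers in differs_from_sydney.items():
    detail = ", ".join(
        f"{m} (about 1 mutation per {generations_per_mutation[m]} generations)"
        for m in markers
    )
    print(f"{subject}: {detail}")
```

A single difference on a fast marker such as CDY is therefore weaker evidence of a distant relationship than a single difference on a slow marker such as DYS446, though this remains a balance of probabilities.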

We can now map this information into a different sort of phylogram, which gives us a better idea of how different families may have branched off ... and according to this "best fit" model, it looks like the Limerick family may have been the first to mutate.





Estimates of TMRCA (Time to Most Recent Common Ancestor)

Based on 37 markers, the TMRCA estimates (using McGee's Y-utility tool) reflect the genetic distance between subjects. Thus, according to McGee's programme, there is a 50-50 chance that subjects ESMXMS (Limerick) and 27JHM7 (Sydney) share a common ancestor within the previous 90 years (by this I assume it means 90 years before the birth of the older of the two subjects). In other words, there is a 50% chance that their common ancestor lived more recently than this (i.e. less than 90 years ago, for example if they were brothers or cousins), and a 50% chance that he lived more than 90 years ago. The 50% estimate is therefore just a midpoint: the true TMRCA could lie on either side of it. And in fact, we know from traditional research that the two individuals are not related going back to at least 1833, which is almost 180 years ago (and about 110 years prior to the birth of the older of the test subjects).

The probability estimate can be recalculated to give a 95% probability (i.e. so that there is a 95% chance that the TMRCA occurred within a given number of years) and this calculation tells us that the two subjects have a 95% probability of being related within the past 390 years (which goes back to about 1550). The more markers that are used for the calculation, the more accurate it should become, with potentially less "variability" on either side of the estimate.
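To show how a 50% and a 95% figure can come out of the same comparison, here is a deliberately simplified Python sketch. It is not McGee's algorithm and is not expected to reproduce the figures quoted here. It assumes a single average mutation rate for every marker (0.003 is a hypothetical value), 30 years per generation (also an assumption), mutations that accumulate as a Poisson process along both lines of descent, and no prior knowledge about the TMRCA.

```python
import math

YEARS_PER_GENERATION = 30  # assumption
AVERAGE_RATE = 0.003       # hypothetical average rate per marker per generation

def tmrca_generations(distance, n_markers, prob, rate=AVERAGE_RATE, max_gen=2000):
    """Smallest number of generations g with P(TMRCA <= g) >= prob.

    Simplified model: the number of differences between two men whose
    common ancestor lived g generations ago is Poisson with mean
    2 * g * n_markers * rate, and every g is equally likely beforehand.
    """
    weights = []
    for g in range(1, max_gen + 1):
        mean = 2 * g * n_markers * rate
        weights.append(math.exp(-mean) * mean ** distance / math.factorial(distance))
    total = sum(weights)
    running = 0.0
    for g, weight in enumerate(weights, start=1):
        running += weight
        if running / total >= prob:
            return g
    return max_gen

for distance, n_markers in [(0, 37), (1, 37), (1, 52)]:
    g50 = tmrca_generations(distance, n_markers, 0.50)
    g95 = tmrca_generations(distance, n_markers, 0.95)
    print(distance, n_markers, g50 * YEARS_PER_GENERATION, g95 * YEARS_PER_GENERATION)
```

Even in this crude model the 95% figure is several times the 50% figure, a larger genetic distance pushes both estimates back, and the same distance spread over more markers pulls them forward.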

We can put this to the test by performing the same calculations, but this time based on a comparison of the 58 markers for the 3 subjects who tested with DNAheritage. McGee's Y-utility tool appears to have some difficulty with this and only 52 of the 58 markers can be entered into the spreadsheet. This may be updated in the near future following the recent availability of the FTDNA 111-marker test (April 2011).



TMRCA Tables based on 37 markers: 50% Probability and 95% Probability

TMRCA Tables based on 52 markers: 50% Probability and 95% Probability

Nevertheless, testing the additional markers does result in changes to the estimates. 
  • The TMRCA between ESMXMS (Limerick) and 27JHM7 (Sydney) was 90 years at 50% probability and 390 years at 95% probability, but this has now increased to 180 years at 50% and 510 at 95%. The increase is due to the extra mutation at DYS446 in the additional markers included for the Limerick subject.
  • The TMRCA between ESMXMS (Limerick) and QQJRCM (Ontario) was 240 years at 50% probability and 630 years at 95% probability, but this has now increased to 300 years at 50% and 690 at 95%. Again, this is due to the Limerick mutation at DYS446.
  • The TMRCA between QQJRCM (Ontario) and 27JHM7 (Sydney) was 240 years at 50% probability and 630 years at 95% probability, but this has now decreased to 180 years at 50% and 510 at 95%. This is because no new mutations exist in the additional markers included for these 2 subjects: the same single mutation is now spread over more markers, and thus their estimated TMRCA has been revised downward.
These probability estimates should become more precise as more markers are tested. It is not yet clear how accurate these estimates would become, nor how useful they would be at identifying potential brothers or cousins from 1700-1800.
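The changes listed above can be laid out side by side with a short Python script. The year figures are those quoted in the text; the conversion to generations assumes 30 years per generation, which is an assumption on my part (all of the quoted figures happen to be multiples of 30).

```python
YEARS_PER_GENERATION = 30  # assumption; the quoted figures are all multiples of 30

# (50% estimate, 95% estimate) in years, as quoted in the text.
estimates = {
    ("Limerick", "Sydney"):  {37: (90, 390),  52: (180, 510)},
    ("Limerick", "Ontario"): {37: (240, 630), 52: (300, 690)},
    ("Ontario", "Sydney"):   {37: (240, 630), 52: (180, 510)},
}

def change_in_years(pair):
    """Change in the 50% and 95% estimates when moving from 37 to 52 markers."""
    old50, old95 = estimates[pair][37]
    new50, new95 = estimates[pair][52]
    return new50 - old50, new95 - old95

changes = {pair: change_in_years(pair) for pair in estimates}

for pair, (d50, d95) in changes.items():
    new50, new95 = estimates[pair][52]
    print(
        f"{pair[0]} - {pair[1]}: 50% {d50:+d} years, 95% {d95:+d} years "
        f"(now {new50 // YEARS_PER_GENERATION} and "
        f"{new95 // YEARS_PER_GENERATION} generations)"
    )
```

Set out this way, the two pairs involving Limerick move further back while the Ontario and Sydney pair moves forward, matching the explanation given for the DYS446 mutation.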




Copyright 2011 (http://freepages.genealogy.rootsweb.com/~spearin). All Rights Reserved.
The Spearin Surname Project at http://freepages.genealogy.rootsweb.com/~spearin is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Information and data obtained from the Spearin Surname Project must be attributed to the project as outlined in the Creative Commons License. Please notify administrator when using data for public or private research. 

Last update: Oct 2011
