Spearin Surname Project
Interpreting Results
A Closer Examination of Mutations

Let's look more closely at the five subjects in the first cluster, Genetic Family 1 (GF1). At this stage of the project, the modal haplotype can be calculated for all 5 subjects in this cluster, based on the 37 markers in the FTDNA 37-marker test. The 43-marker DNAheritage test covers 32 of the 37 FTDNA markers, and the additional DNAheritage Upgrade 15 Test covers the remaining 5, thus allowing a direct comparison with the results on the 37 FTDNA markers shown in the table (see also http://www.gendna.net/ydnacomp.htm).

The subjects differ on 4 markers, namely DYS607, CDYa, CDYb, and DYS446. The first 3 of these markers are included in the FTDNA 37-marker test, but the fourth is only included in the DNAheritage 43-marker test. Compared to the modal haplotype, two subjects match exactly, two have 1 mutation each, and one has 2 mutations.

McGee's Y-utility probability matrix allows us to visualise the genetic distance between each of the subjects in a handy table. David Ewing's instructions on how to do this (and generate a phylogram) are particularly helpful. The table and phylogram show the genetic distance of each subject from every other subject (and from the modal haplotype). We can compare all 5 subjects on 37 markers, and in addition we can compare 3 of the 4 DNAheritage subjects on 58 markers.

The table below shows that subjects ESMXMS (Limerick) and 27JHM7 (Sydney) are an exact match for the modal haplotype (MH) based on 37 markers. The furthest from the MH is subject 356HB (Georgia, FTDNA no 164729). The greatest genetic distance is between this latter subject (from Georgia) and both QQJRCM (Ontario) and 200083 (New Jersey). This suggests that these two individuals are the most distantly related to 356HB (Georgia). The most closely related are the first two, the ones that match the modal haplotype exactly (Limerick and Sydney). This tells us that these two individuals should focus their documentary research on each other to try to find a common link between them.

But all this is based on only 37 markers. Different results might be obtained if further markers were tested. For three of the subjects, a comparison is possible on 58 markers, and this is described below. Will it produce the same results?

Genetic Distance Table based on 37 markers
Note: Infinite allele mutation model is used. Values on the diagonal indicate number of markers tested.
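As a rough illustration of how a modal haplotype and an infinite-allele genetic distance table can be worked out, here is a minimal Python sketch. It is not McGee's utility, and the allele values are hypothetical: only the pattern of differences (Georgia on DYS607 and CDYa, Ontario and New Jersey each on CDYb) is taken from the text. That Ontario and New Jersey carry different CDYb values is also an assumption. The function and variable names are made up for this example.

```python
from collections import Counter

# Markers in the 37-marker panel on which the GF1 subjects differ.
# DYS446 is left out because it is not part of the 37-marker comparison.
VARIABLE_MARKERS = ["DYS607", "CDYa", "CDYb"]

# HYPOTHETICAL allele values, chosen only to reproduce the pattern of
# differences described in the text. They are not the real test results.
HAPLOTYPES = {
    "Limerick":   {"DYS607": 15, "CDYa": 36, "CDYb": 38},
    "Sydney":     {"DYS607": 15, "CDYa": 36, "CDYb": 38},
    "Ontario":    {"DYS607": 15, "CDYa": 36, "CDYb": 39},
    "New Jersey": {"DYS607": 15, "CDYa": 36, "CDYb": 37},
    "Georgia":    {"DYS607": 16, "CDYa": 37, "CDYb": 38},
}


def modal_haplotype(haplotypes, markers):
    """Return the most common value at each marker across all subjects."""
    return {
        m: Counter(h[m] for h in haplotypes.values()).most_common(1)[0][0]
        for m in markers
    }


def genetic_distance(a, b, markers):
    """Infinite allele model: every marker that differs counts as one step,
    however large the numerical gap between the two values."""
    return sum(1 for m in markers if a[m] != b[m])


def distance_table(haplotypes, markers):
    """Pairwise distances between all subjects plus the modal haplotype (MH)."""
    rows = dict(haplotypes)
    rows["MH"] = modal_haplotype(haplotypes, markers)
    return {
        (x, y): genetic_distance(rows[x], rows[y], markers)
        for x in rows
        for y in rows
    }


if __name__ == "__main__":
    table = distance_table(HAPLOTYPES, VARIABLE_MARKERS)
    for (x, y), d in sorted(table.items()):
        if x < y:  # print each pair once
            print(f"{x} vs {y}: {d}")
```

Markers on which all subjects agree add nothing to the distance, which is why only the variable markers are listed.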
Phylogram based on 37 markers

The phylogram gives us some useful additional information. We can now visualise the genetic difference between the subjects. Each subject is represented by a dot, and the lines help show us how they may be related to each other. The red text on the lines tells us which marker has mutated between the subjects. This diagram suggests that subjects ESMXMS (Limerick) and 27JHM7 (Sydney) are closest to the "ancestral haplotype" (i.e. the haplotype of the common ancestor for the group). The families of the other 3 subjects "branched out" from this ancestral haplotype, but in 3 different "directions", each developing their own unique mutation(s). As more people join the project, we will be able to see which "branch" they most closely match, and this in turn will tell them where to focus their ongoing documentary research, i.e. who should be talking to whom.

But there are a couple of caveats. All this programme does is produce a "best fit" for the data entered. It is not necessarily the correct fit, just the best fit given the data entered. Entering more data from more markers may produce a different "best fit" diagram.

Also, these results are based on living individuals. How can we be sure that they have inherited exactly the same haplotype as their Most Distant Known Ancestor (MDKA)? Any of the mutations may have occurred since the MDKA. So how do we resolve this? This is where triangulation comes in (see section on DNA Family Trees). Testing a second subject with the same MDKA will help triangulate the haplotype. If the two differ, a third subject will complete the process and identify the MDKA haplotype (in 99.9% of cases). Thereafter, we would generate an additional table and phylogram based on the MDKA haplotype of each family. These would be exactly the same as the ones here if no mutations had occurred since each MDKA.
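The triangulation idea can be sketched as a per-marker majority vote among descendants of the same MDKA. This is a simplified illustration rather than a description of any particular tool: the majority rule, the function name `triangulate`, and the three descendants with their allele values are all hypothetical.

```python
from collections import Counter


def triangulate(descendants, markers):
    """Infer the MDKA value at each marker by strict majority.

    Returns None for a marker where no value is held by more than half
    of the tested descendants (for example, two descendants who disagree).
    """
    inferred = {}
    for m in markers:
        value, count = Counter(d[m] for d in descendants).most_common(1)[0]
        inferred[m] = value if count > len(descendants) / 2 else None
    return inferred


MARKERS = ["DYS607", "CDYa", "CDYb"]

# HYPOTHETICAL descendants of one MDKA; B carries a mutation on CDYb.
DESC_A = {"DYS607": 15, "CDYa": 36, "CDYb": 38}
DESC_B = {"DYS607": 15, "CDYa": 36, "CDYb": 39}
DESC_C = {"DYS607": 15, "CDYa": 36, "CDYb": 38}

if __name__ == "__main__":
    # Two descendants who differ leave CDYb unresolved.
    print(triangulate([DESC_A, DESC_B], MARKERS))
    # A third descendant breaks the tie.
    print(triangulate([DESC_A, DESC_B, DESC_C], MARKERS))
```

With two subjects the disputed marker stays unresolved. Adding the third settles it, which mirrors the two-step process described above.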
Genetic Distance Table based on 58 markers

Comparing the three subjects with 58 markers tested shows that Sydney (SYD1) is closest to the MH and differs least from the other two. This suggests that Sydney is now closest to the ancestral haplotype, whereas with the 37-marker comparison both Sydney and Limerick (LIM1) were reckoned to be closest. The additional markers thus give greater discerning power to the test. Ontario (ON1) and Limerick were separated by a single mutation in the 37-marker comparison but are now further apart. Thus the more markers tested, the better we are able to separate the different families out. A phylogram generated from this 58-marker comparison would just show a line with 3 dots separated by a single mutation each. So how do we find out who is most closely related to whom?

Let's look even closer at the data. Limerick (LIM1) differs from Sydney (SYD1 or NSW1) on 1 marker (DYS446), but he also differs from New Jersey (NJ1) on 1 marker (CDYb), based on 58 markers and 37 markers respectively. Should we give these 1-marker differences equal weight? Before we even consider that question, we can compare 4 of the subjects on 40 markers with the currently available results (and when New Jersey (NJ1) gets the rest of his results we can compare the same 4 subjects on 58 markers). The 40-marker comparison shows that Limerick (LIM1) differs from New Jersey (NJ1) on 2 markers (Limerick's DYS446 mutation and New Jersey's CDYb mutation). But Limerick, New Jersey, and Ontario ALL differ from Sydney by 1 marker, on DYS446, CDYb and CDYb respectively.

So who is the closest to Sydney on the genetic timeline? This is where we look at the mutation rate: a difference on a slowly mutating marker takes longer, on average, to arise, so the marker with the slowest mutation rate is LIKELY to indicate the more distant relative (but this is just on the balance of probabilities). And the answer is:

DYS446 mutation rate = 0.00365
CDYa & b mutation rate = 0.03531 (roughly 10 times faster)

So it looks like New Jersey and Ontario are closer to Sydney than Limerick is, even though they all differ from Sydney by only 1 marker (the mutation rates are taken from the spreadsheet "mutation rates 1 to 111 markers.xls").

As for the subject from Georgia (GA1), he differs from Sydney on 2 markers (CDYa and DYS607) and is the furthest genetically from everyone else. The DYS607 mutation rate = 0.00411 (similar to DYS446). We can now map this information into a different sort of phylogram, which gives us a better idea of how the different families may have branched off. According to this "best fit" model, it looks like the Limerick family may have been the first to mutate.
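To make the mutation-rate comparison concrete, the short Python sketch below computes the ratio between the rates quoted above and the chance that a marker changes over a given number of father-to-son transmissions. It assumes the quoted rates are per-generation probabilities that stay constant and independent from one generation to the next. The function names and the figure of 10 transmissions are illustrative only.

```python
# Mutation rates quoted in the text (assumed here to be per generation).
RATES = {
    "DYS446": 0.00365,
    "DYS607": 0.00411,
    "CDYa": 0.03531,
    "CDYb": 0.03531,
}


def rate_ratio(fast, slow):
    """How many times faster one marker mutates than another."""
    return RATES[fast] / RATES[slow]


def prob_changed(marker, transmissions):
    """Chance of at least one mutation on a marker over a number of
    father-to-son transmissions, assuming a constant, independent rate."""
    return 1 - (1 - RATES[marker]) ** transmissions


if __name__ == "__main__":
    print(f"CDYb vs DYS446 rate ratio: {rate_ratio('CDYb', 'DYS446'):.2f}")
    for marker in ("DYS446", "CDYb"):
        # 10 transmissions is an arbitrary illustrative figure.
        print(f"{marker}: P(changed in 10 transmissions) = "
              f"{prob_changed(marker, 10):.3f}")
```

Over the same number of transmissions a CDYb difference is far more likely to have arisen than a DYS446 difference, which is why a single DYS446 mismatch points, on the balance of probabilities, to the more distant relationship.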
Estimates of TMRCA (Time to Most Recent Common Ancestor)
Copyright 2011 (http://freepages.genealogy.rootsweb.com/~spearin) All Rights Reserved.
The Spearin Surname Project at http://freepages.genealogy.rootsweb.com/~spearin is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Information and data obtained from the Spearin Surname Project must be attributed to the project as outlined in the Creative Commons License. Please notify the administrator when using data for public or private research.
Last update: Oct 2011