Spearin Surname Project

Mutations - the key to identifying distant relatives

The whole reproductive process is truly awesome and may at first glance
seem incredibly efficient, but ... accidents will happen. And every family
researcher should be grateful that they do! In short, sometimes during the
replication process, when the cell is busy reconstructing the missing half of
the railway line that has just been split up the middle, it makes a mistake.
And that mistake is called a mutation.[1]

There are different types of mutation (e.g. SNP = single nucleotide
polymorphism) but the mutations most relevant to Y-DNA testing are the ones that involve STRs (short
tandem repeats).[2]
An STR is a sequence of two or more base pairs (the sleepers on the railway track)
that is subsequently repeated along the DNA railway track to produce several
repetitions of the same base-pair sequence (e.g. TTAGC,TTAGC,TTAGC,TTAGC ... 4
repeats of 5 'railway sleepers'). The sequence can range from 2 to 50 base
pairs, is typically in a non-coding region, and may repeat anything from 6 to
30 times (for example).

However, during meiosis, the cell may inadvertently add or remove one or more
copies of the sequence so that instead of repeating 5 times, the sequence may repeat
7 times in each of the 4 new sex cells. This is a mutation. And it will be
passed from father to son in the Y-containing sperm. The father will have a
repeat value of 5 for this STR, whereas the son will have a repeat value of 7.
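As a rough illustration of what a 'repeat value' is, the short Python sketch below counts how many times a motif repeats back-to-back in a stretch of DNA. The TTAGC motif is the example used above; the flanking letters and the function name are invented purely for illustration, and real lab analysis does not work on text strings like this.

```python
def repeat_value(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward one motif at a time while copies continue.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical stretches of DNA: made-up flanking letters around the STR.
father = "GG" + "TTAGC" * 5 + "CA"
son = "GG" + "TTAGC" * 7 + "CA"

print(repeat_value(father, "TTAGC"))  # 5
print(repeat_value(son, "TTAGC"))     # 7
```

The father's stretch gives a repeat value of 5 and the son's a repeat value of 7, which is the kind of difference a Y-DNA test reports for a single marker.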
And if there are no replication errors when it comes time for the son to father
children, then his STR with 7 repeats will be passed on to his children,
and in turn their children, and so on, until another accident occurs resulting
in another mutation. But that may be 500 years down the line.

By identifying repeats of a specific sequence at specific locations on
the Y chromosome, it is possible to create a personal genetic signature (haplotype) for an
individual. Thus, these STRs can serve as genetic markers that can also establish
the degree of similarity between two people. STR analysis has
become the prevalent analysis method for determining genetic profiles in forensic cases
and it is the main genetic tool when trying to establish common ancestry
between individuals in surname projects such as our own.

Each marker has a range of values - some have a narrow range (e.g. 10-14) whereas others can have a very broad range (e.g. 6-27). The range of values per marker is documented at www.genebase.com/in/dnaMarkerDetail.php

There are currently over 10,000 published STR sequences in the human
genome. That's roughly 200 per chromosome, or one sequence every 2
miles on the London to Aberdeen railway line. For genealogical purposes, about
100 STRs in total are currently used by the various DNA testing companies to
characterise Y-chromosome haplotypes (personal genetic signatures).
FamilyTreeDNA (FTDNA), for example, offer a 12-marker, 25-marker, 37-marker, 67-marker, and more recently (April 2011) a 111-marker test; DNA-heritage offered a 23-marker and 43-marker test (before they were bought by FTDNA in June 2011). And
there are other companies that offer a variety of other tests (see http://www.isogg.org/wiki/List_of_DNA_testing_companies). Each company and
each test has its pros and cons.

Marker name               DYS426  DYS437  DYS438  DYS439  DYS441  DYS442  DYS444  DYS445  DYS446
Relative Mutation Rate*   1       11      6.11    53              36
Modal Haplotype (MH)      11      14      10      11      14      17      13      10      12
Tom                       11      14      10      11      14      17      13      10      12
Dick                      11      14      10      11      14      17      13      10      12
Sam                       11      14      10      11      14      17      13      10      13
Harry                     11      14      10      11      14      17      13      10      12

The test results are presented as a series of Y-DNA haplotypes (example above).
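The comparison that such a table invites can be sketched in a few lines of Python. The values are the nine markers shown above; the function names are illustrative, and adding up the difference in repeats across markers is only one simple convention for what the text goes on to call genetic distance (testing companies and tools may count some markers differently).

```python
from collections import Counter

MARKERS = ["DYS426", "DYS437", "DYS438", "DYS439", "DYS441",
           "DYS442", "DYS444", "DYS445", "DYS446"]

HAPLOTYPES = {
    "Tom":   [11, 14, 10, 11, 14, 17, 13, 10, 12],
    "Dick":  [11, 14, 10, 11, 14, 17, 13, 10, 12],
    "Sam":   [11, 14, 10, 11, 14, 17, 13, 10, 13],
    "Harry": [11, 14, 10, 11, 14, 17, 13, 10, 12],
}


def modal_haplotype(haplotypes):
    """For each marker, take the most common repeat value in the group."""
    columns = zip(*haplotypes.values())
    return [Counter(column).most_common(1)[0][0] for column in columns]


def genetic_distance(a, b):
    """Sum of the repeat differences, marker by marker (simple stepwise count)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


modal = modal_haplotype(HAPLOTYPES)
for name, haplotype in HAPLOTYPES.items():
    print(name, genetic_distance(haplotype, modal))
# Tom 0
# Dick 0
# Sam 1
# Harry 0
```

Sam's single difference comes from DYS446 (13 instead of 12); every other value matches the modal haplotype.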
This is a list of the various markers (STRs) with their respective repeat
values underneath for each of the individuals tested. The table above contains
real (anonymised) data taken from the actual results of the Spearin Y-DNA
project. Only 9 markers are shown but the results are identical for Tom,
Dick and Harry, whereas Sam differs from the rest by 1 repeat on the marker
DYS446. Sam is therefore said to differ from the rest by a genetic distance of
1.[3] The
greater the genetic distance between two people, the less likely it is that
there is a close relationship between them.

But which came first? Did Sam's ancestor develop a mutation and split
away from the 'parent group'? Or did Sam's ancestor belong to the 'parent
group' and the other three participants' ancestor was the one who started the 'splinter
group'? In other words, which haplotype is older - the one with the DYS446
value of 12, or the one with a value of 13? The good news is that there are
various analytical techniques, based on mutation rates and probability
analyses, that can help to answer these questions.[4] The bad news is that this science is still so young that the results are not as accurate as one would like.

Mutations can predict when lines broke away from each other

There are two terms worth knowing before we continue. One term is
the Most Recent Common Ancestor (MRCA), which refers to the most recent ancestor shared in
common by two individuals. For two brothers, their MRCA is their father; for
two second cousins, their MRCA is their great-grandfather. The other term is
the Most Distant Known Ancestor (MDKA), which refers to the patriarch at the top of your
family tree, beyond whom lies your Brick Wall.

Y-DNA testing answers several different questions in sequence:

1. How closely are two men with the same surname related?
2. What is the estimate for when they shared a common ancestor (the MRCA)?
3. How are different branches of the same genetic family related? And who is more closely related to whom?

Let's look at each question in turn. The closer two people match on
their haplotype, the more likely they are to be related. If they are a perfect
match they are probably related to each other in the very recent past (say the
last 100-300 years). And the more markers they have tested and match on, the
stronger the probability (i.e. a 37-marker match indicates a much stronger probability
than a 12-marker match). In the Spearin Y-DNA project, 3 of the first 4
participants were exact matches on 43 markers, indicating a very close match
and a high probability of sharing a common ancestor in the past 300 years. In
our particular Surname Project, the implication of a positive answer to this
first question is that the participant can claim a connection to the London
Sperings.

Next, we could estimate the TMRCA (Time to MRCA) by either using FTDNA's
Time Predictor tool (TiP)[5] or Dean
McGee's Y-utility probability matrix.[6] There
are other similar tools that could be used and there is currently no consensus on
which is the best, but the two mentioned base their calculations on the known
mutation rates of the STR markers on the Y-chromosome. If we know how
frequently a mutation is likely to occur in a particular marker, we can calculate
the 'Time to Most Recent Common Ancestor' based on that marker.[7] If, for
example, the mutation rate for marker DYS446 above were once every 300 years on
average, this would mean that the split between the two groups (haplotypes) probably
occurred sometime in the previous 300 years. The problem is it could have happened in the
previous generation, or it could have happened in the early 1700s - not a very narrow range. If we know
from documentary research that there is no link between Sam's tree and the
other three trees going back to about 1800, this would mean that Sam's group probably
split away from the other group sometime between 1700 and 1800. Probably.

And this estimate is based on only 1 marker. With 37 markers, the
probability estimate can be much more precise. One would think that the accuracy
of the result would improve with more markers (e.g. with the 67-marker test)
but it appears that there isn't a huge increase in accuracy above 37
markers. However, testing more than 37 markers has other advantages that we will
discuss below (FTDNA introduced a new 111-marker test in April 2011).

Before we leave the TMRCA calculations, it is important to appreciate
that mutation rates can differ between markers by a factor of several thousand!
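To get a feel for what such differences mean in practice, here is a minimal Python sketch using the relative mutation rates of the first four markers in the table above. The 5,000-year baseline for DYS426 is an illustrative assumption rather than a measured figure, and the chance calculation assumes mutations occur at random at a constant average rate, which is a simplification.

```python
import math

# Relative mutation rates from the table above (DYS426 is the reference).
RELATIVE_RATE = {"DYS426": 1.0, "DYS437": 11.0, "DYS438": 6.11, "DYS439": 53.0}

# ASSUMPTION for illustration only: the slowest marker changes about
# once every 5,000 years along a single father-to-son line.
BASELINE_YEARS = 5000.0


def years_between_mutations(marker, baseline_years=BASELINE_YEARS):
    """Average wait between mutations, scaled by the marker's relative rate."""
    return baseline_years / RELATIVE_RATE[marker]


def chance_of_mutation(marker, years, baseline_years=BASELINE_YEARS):
    """Chance of at least one mutation within `years` (constant-rate model)."""
    wait = years_between_mutations(marker, baseline_years)
    return 1.0 - math.exp(-years / wait)


print(round(years_between_mutations("DYS439")))  # 94
for marker in RELATIVE_RATE:
    print(marker, round(chance_of_mutation(marker, 300), 2))
```

Under these assumed numbers a fast marker such as DYS439 would very likely have changed at least once in 300 years, while the slowest marker very probably would not, so the same genetic distance of 1 can point to quite different time spans.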
In the sample haplotype from the project results above, the mutation rates of
the various markers are expressed as a ratio, relative to the mutation rate of
DYS426. One can see that marker DYS439 mutates 53 times faster than DYS426. Put
simply, some markers may mutate once every 100 years, others once every 5000
years ... so the interpretation of a genetic distance of 1 very much depends on
which marker we are talking about. Data in the example is taken from Chandler,
Journal of Genetic Genealogy, 2006,[8] but
watch this space because the science is constantly being revised as more data
becomes available.

The third question, how closely are the various branches related, is a
very interesting one and still the topic of much debate. Theoretically it
should be possible to build a 'Mutation History' family tree that shows when
mutations occurred and which branches arose from individuals bearing that
mutation. The science behind this is discussed in the next section - DNA Family
Trees.

Just when you think everything is hunky dory, something comes along and
throws a spanner in the works. Or in this case, several spanners, including
reverse mutations, multi-step mutations, parallel mutations, multiple-copy STRs, lack of knowledge and different labs behaving in different ways.

Biological Problems

Reverse or back mutations are exactly what they sound like. First there
is a mutation one way, and then there is a mutation back to the original. So
for example, in 1580 a Nicholas Sperynge with a DYS446 value of 12 passes on a
mutation with a DYS446 value of 13 to one of his sons (Luke), and thus a new
subgroup is formed. This son's descendants bear this mutation for 10
generations until the conception of Matthew Spearin in 1830 when another
mutation occurs, but instead of mutating forward (to give a DYS446 value of
14), the mutation goes backwards to a DYS446 value of 12 i.e. the same as
Nicholas Sperynge back in 1580. This is a reverse mutation. The problem it
causes is that Matthew's descendants may look like they belong to the 'parent
group' headed by patriarch Nicholas Sperynge from 1580, when in fact they
belong to a much younger (more distantly related) subgroup from 1830 headed by
Matthew. In our example from the actual Spearin data, it may be that any or all
of the three participants with exact 43-marker matches have had a reverse
mutation in their ancestors' past and are in fact more distantly related to
each other than first meets the eye.

Secondly, most mutations (95%) are 'single-step' mutations i.e. the STR repeat value goes up or down by a value of 1. However, sometimes it changes by a value of 2 (in about 5% of cases, 1 in every 20 times) and very rarely by multiple steps. This can cause confusion if you are expecting single-step mutations.

Another fly in the ointment is the possibility that two distinct lines
develop the same mutation at some point in their evolution. Even though they evolved separately, by bearing the same mutation it looks as if their descendants are closely related. In this situation they
will be grouped together under the mistaken belief that they are genetically
more closely related than they actually are.

A fourth limitation of STR testing is that sometimes there are several
copies of the same marker. In other words, the same STR occurs at several
places along the genetic railway line - once in London, twice on the outskirts
of Birmingham, and once in Glasgow. But the way these markers are analysed means it's impossible to tell which one came from where, so I don't know if I'm comparing the one from London with the one from Glasgow. This isn't a problem if the marker values are the same throughout e.g. values on DYS464 for two individuals of 14-14-14-14 and 14-14-14-14. However, if there are any variations in the numbers (e.g. 13-14-14-14) then it is impossible to know if the two sets are the same or different (e.g. the second set may also be reported as 13-14-14-14 when its correct order is 14-13-14-14, indicating a mismatch and a genetic distance of 2, but this is impossible to tell with current testing procedures).

So, how do we handle this? How do we find out the probability of each of the following for each of the markers: 1) reverse mutations; 2) multi-step mutations; 3) parallel mutations? We then also have the issue of multiple-copy markers - what do we do with these? These are all relevant questions, but how relevant are they if we are trying to connect people who lived in the last 500 years? Are they relevant at all for this type of analysis? John Robb thinks maybe not! Basically, the chances of these happening in the past 500 years may be remote and therefore NOT relevant. You can read his article here. Hopefully the answers to these questions will become clearer over time.

Logistical Problems

There are several other challenges currently facing genetic genealogy. Firstly, as new markers are discovered, it takes some time before their
mutation rates can be calculated (because this depends on testing sufficient
samples to arrive at an estimate of the mutation rate). And these mutation
rates are necessary for further refining any calculation of the time to most
recent common ancestor (see Whit Athey's editorial in http://www.jogg.info/52/files/Intro.pdf).

Another problem is the current lack of standardisation of calculations
of time to most recent common ancestor. James Irvine has suggested a
standardised approach (in his 2010 article in the Journal of Genetic Genealogy)
but it remains to be seen if this is endorsed on a wide scale (see http://www.jogg.info/62/files/Irvine.pdf).

Despite
these current drawbacks, a lot can be learned from genetic testing and as the
field is constantly changing, many of these glitches will be ironed out in due course.

Lab-related Problems

Lastly, there are several problems relating to how different labs approach the testing of Y-DNA:

So, you can see that the science is still developing and improvements will need to be made all the time.

Some interesting bits & pieces

In 389 father-son pairs, 6%
of the sons received a mutation from their fathers on only one marker while less than 1%
of the population had mutations on two markers. Based solely on
this study, double marker changes do occur; however, they do not occur in a
significant number of the population. All samples resulted in single repeat mutations except one sample which contained a two repeat loss at Y-GATA-H4. Ref: http://www.cstl.

[1] See also http://en.wikipedia.org/wiki/Mutation

[4] For more information on interpreting results, see http://www.familytreedna.com/faq/answers/default.aspx?faqid=19,
http://www.familytreedna.com/faq/answers/default.aspx?faqid=36,
and http://www.familytreedna.com/faq/answers/default.aspx?faqid=9#895

Join us today ... you could find out more than you ever imagined!

Maurice Gleeson
What does a haplotype look like?
Limitations of
Y-DNA testing using STR's
Oct 2011
Copyright 2011 (http://freepages.genealogy.rootsweb.com/~spearin) All
Rights Reserved.
The Spearin Surname Project at http://freepages.genealogy.rootsweb.com/~spearin is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Information and data obtained from the Spearin Surname Project must be attributed to the project as outlined in the Creative Commons License. Please
notify administrator when using data for public or private research.
Last update: Oct 2011