ARE MTDNA MUTATIONS RANDOM OR CHRONOLOGICAL?
ANALYSIS OF THE HVR MUTATIONS IN A HAPLOTYPE OF HAPLOGROUP K
Most people in mitochondrial DNA haplogroup K have about four HVR (Hypervariable
Region) mutations beyond the six basic mutations which virtually all K's have.
Some have only one extra. Here, listed in the normal order, is a haplotype which
has 11 extra mutations, for a total of 17 differences from the Cambridge
Reference Sequence (CRS). It is unusual enough that there are only three
examples in the FamilyTreeDNA database. I picked this particular haplotype to
study not only because of its length, but also because it includes both very
common and very rare mutations and belongs to a large haplotype cluster which
does not yet have a subclade designation.
HVR1:
16048A, 16051G, 16093C, 16224C, 16291T, 16311C, 16519C
HVR2:
73G, 195C, 230T, 263G, 315.1C, 497T, 524.1C, 524.2A, 524.3C, 524.4A
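The totals quoted for this haplotype can be checked mechanically. The following is a minimal Python sketch that tallies the listed mutations and subtracts the six basic K mutations. It assumes the "basic six" are 263G, 315.1C, 73G, 16519C, 16311C and 16224C, and the variable names are illustrative only.

```python
# Haplotype as listed above (differences from the CRS).
HVR1 = ["16048A", "16051G", "16093C", "16224C", "16291T", "16311C", "16519C"]
HVR2 = ["73G", "195C", "230T", "263G", "315.1C", "497T",
        "524.1C", "524.2A", "524.3C", "524.4A"]

# Assumption: these are the six basic mutations which virtually all K's have.
BASIC_SIX = {"263G", "315.1C", "73G", "16519C", "16311C", "16224C"}

haplotype = HVR1 + HVR2
extras = [m for m in haplotype if m not in BASIC_SIX]

print(len(haplotype))  # 17 differences from the CRS
print(len(extras))     # 11 mutations beyond the basic six
```

Adding the later-reported pair 524.5C and 524.6A to HVR2 would raise the two counts to 19 and 13.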
Are these
just random numbers which happened at random times? Or is there a way to
estimate the order in which they appeared since "mitochondrial Eve"?
Are all mutations created equal? Or are some more important than others? Let me
list them in what I think is the proper chronological order:
263G,
315.1C, 73G, 16519C, 16311C, 16224C, 497T, 195C, 16048A, 524.1C, 524.2A,
16291T, 16093C, 524.3C, 524.4A, 16051G, 230T
[Since
this document was written, a person has shown up with a haplotype as above, but
with two additional mutations 524.5C and 524.6A for a total of 19 mutations or
13 beyond K’s basic six. The chart mentioned below
has been updated to reflect this new addition.]
Do I have
a basis for listing the mutations in this order? I'll look at these mutations
one at a time, but first some references I'll use. I had previously created a
phylogenetic chart
which includes this haplotype. Dr. Doron Behar, now the Chief mtDNA Scientist
at FTDNA, published a paper earlier this year on Ashkenazi mtDNA, which included
a comprehensive phylogenetic chart
for haplogroup K. I also used Ron Scott's compilation
of HVR1 mutations from FTDNA’s MitoSearch as of July 11, 2006. For HVR2
mutations, Scott's individual haplogroup
files must be consulted. Another great site for percentages of all mtDNA
mutations is the "Polymorphic sites" page in mtDB, the Human Mitochondrial Genome
Database. The mtDB had a total of 2,487 sequences when I looked at it, but the
total for each mutation will be a lower number. I will also refer to the
Sorenson Molecular Genealogy Foundation (SMGF) Top 50 Mutations list. SMGF at
present has 4,805 mtDNA entries. There is no count or percentage for K's, but
the number of those with 16224C might be a good indicator: 328 or 6.8%. The
percentage of K’s in MitoSearch as of June was
higher, 8.88%. For percentages
inside K, I have referred to my continuously-updated table
for the K Project and a table
from the K's on MitoSearch as of August. For definitions of some of the
technical terms used here see Charles Kerchner’s Genetic Genealogy
Glossary.
Now the
discussion of the role of the individual mutations:
263G:
This is not really a "mutation"; it was the CRS which had the
mutation from base G to A. Even most others in haplogroup H (which includes the
CRS) are 263G. On the SMGF Top 50 Mutations list, this is at the top, appearing
in 4,697 of 4,805 entries, or 97.8%. In the K Project, it appears in 100%. The
percentage is slightly less in K's on MitoSearch, due to the presence there of
an odd haplotype cluster centered on mutation 133G. In the mtDB database it appears
in 1,644 of 1,650 examples, or 99.6%. [In the future I will only give the
percentage from mtDB, since the total is always about the same. One exception
will be noted.]
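The percentages quoted for 263G follow directly from the counts. As a minimal sketch of the arithmetic (the helper name percent is purely illustrative and is not part of any of the databases):

```python
def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)

SMGF_TOTAL = 4805  # SMGF mtDNA entries quoted in the text

print(percent(4697, SMGF_TOTAL))  # 263G on the SMGF list -> 97.8
print(percent(1644, 1650))        # 263G in mtDB          -> 99.6
```

The same helper reproduces the other SMGF figures given in this article, since each is a count divided by the 4,805 total.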
315.1C:
Most members of H also have this "mutation." It is not that everybody
else gained this insertion; rather, the CRS had a deletion at this position. On the
SMGF list, it is second highest with 4,688 or 97.6% of the entries. As with
263G, it appears in 100% of the K Project and slightly less in MitoSearch K's.
The mtDB doesn't list values for insertions.
73G: The
G base may go back to "mitochondrial Eve," with the actual mutation
to A occurring between R and pre-HV then down to the CRS. All the other
branches from R usually have 73G. (There are many published versions of the
mtDNA phylogenetic chart, but a simple one is in Ann Turner’s article mentioned
below.) This one is 4th on the SMGF list with 3,009 or 62.6%. Once again, 100%
of K's in the Project and slightly less in MitoSearch. In mtDB it's in 84.2%.
16519C:
On the SMGF list, this is in third place with 3,063 or 63.7%. Commonly called a
"hotspot," this position has mutated back-and-forth several times in
human history. It appears in almost every haplogroup on MitoSearch, in over 50%
in most of them. In mtDB it's 57.3%. However, when a single haplogroup is
studied, the position is often very stable. About 98% of K's have 16519C. For
an in-depth discussion of this polymorphism see Ann Turner's article in the
Journal of Genetic Genealogy.
16311C:
This is one of the classic motifs for K. On MitoSearch, 613 of 1,480 total, or
41.4%, were in K; but it is found in nearly every haplogroup. Only in K did it
appear in more than 50% of the entries. It appears in triple-digit numbers in
H, U and L. It is 15th on the SMGF list, with 708 or 14.7%. In the K Project
it's at 100% and about 99% in MitoSearch K's. In mtDB it's 14.6%.
16224C:
This is the other classic motif for K. On SMGF, this is 28th with 328 or 6.8%.
In mtDB it's in 4.6%. On MitoSearch, 620 of 670, or 92.5%, of the examples were
in K. U had 15; Unknown had 20 - and many of those were probably K's tested
elsewhere. In the K Project and in MitoSearch K's this runs over 99%. Since
this is the mutation most closely associated with K, it could be thought of as
the K keystone mutation. When 16224C occurred, K began.
497T: In
Y-chromosome DNA, single nucleotide polymorphisms, or SNPs,
which are really the same as the mutations in mtDNA, are used to define
haplogroups. Since, with rare exceptions, these have only mutated once in human
history, they are also called UEPs or Unique Event Polymorphisms. This
mutation, 497T, is the closest thing, in K at least, to a UEP. Probably just
writing "mtDNA UEP" is considered an act of heresy. On MitoSearch I
only found it once in another haplogroup, U; so I immediately suggested to
FTDNA that the designation was probably incorrect. It defines subclade K1a and
in K it is the only mutation outside the basic six which appears in more than
half of the entries, with about 60%. On the SMGF list it is 48th with 180 or
3.7%. In the mtDB it is found in 26 of 1,927, or 1.3%.
195C:
Behar's K chart has a major group with this mutation down from K1a. That branch
includes K1a9, but that Ashkenazi subclade requires 16524G. A very recurrent
mutation, 195C appears in several other spots on the K tree. It appears in
about 25% of K's. This is 11th on the SMGF list with
744 or 15.5%, compared to 12.4% in mtDB.
16048A:
Behar has an unnamed cluster under 195C in K1a with 16093C, but he does not
have 16048A on his chart. Due to the number of examples found in the K Project
and MitoSearch (more than 42 at last count), this mutation should probably
define a new subclade, perhaps called K1a10. This is the key mutation in this
haplotype and the haplotype cluster. (I use haplotype
cluster to describe a group of haplotypes which do not have an official
subclade designation.) Behar's data did not include any examples from the
British Isles, but his Table 4 does list one non-Jewish example from
524.1C,
524.2A: These pairs of insertions are highly recurrent, but not random, and
always occur together. In my genetic distance tables I count them as one
mutation or one "mutational event." On the SMGF list, this first pair
of insertions is at 34th/35th with 263 or 5.5%. There they are
listed as 524.1A and 524.2C, but apparently that is simply a different choice
in listing the same things. These insertions are very important for K; this
pair appears in about 20% of the entries, or about four times as often as in
the general population. As I stated, these are not random; I've never found
them in the Ashkenazi subclades or, with one exception, in the K1c/K1c2
subclades. Behar excluded these and some other recurrent mutations when
constructing his K chart. However, in the Kivisild paper
which contains perhaps the most recent and comprehensive mtDNA charts, there is
a subdivision under 497-nc [non-coding] labeled "523+CA-nc" and
another line at the same level as 497 labeled "523+2(CA)-nc." I interpret these to be different ways of
indicating one and two pairs of the 524 insertions. So now there are at least
three ways of labeling these insertions. The solution I used in my phylogenetic
chart was to not use the base letters, using just 524.1, etc. In one case, the
perceived genetic difference between two haplotypes was thus reduced from eight
to zero. The insertions may be examples of what is called length heteroplasmy: each mitochondrion in a cell may contain one
or more pairs - or none - in various combinations. (The most common example of
length heteroplasmy in K is at position 309, where one or two C’s may be added;
but those have not shown up in this cluster.) Therefore, different cells may
contain different majority variants. This needs to be studied further.
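The effect of the different labeling conventions on a genetic distance count can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the two example haplotypes are made up to show how a difference of eight could collapse to zero, and the pairing rule simply treats every two 524 insertions as one mutational event.

```python
def strip_insertion_base(mutation):
    """Drop the base letter from an insertion label: '524.1C' -> '524.1'.
    Substitutions such as '497T' are returned unchanged."""
    if "." in mutation:
        return mutation.rstrip("ACGT")
    return mutation

def raw_difference(hap_a, hap_b):
    """Count labels present in one haplotype but not in the other."""
    return len(set(hap_a) ^ set(hap_b))

def normalized_difference(hap_a, hap_b):
    """Same count, after removing base letters from insertion labels."""
    a = {strip_insertion_base(m) for m in hap_a}
    b = {strip_insertion_base(m) for m in hap_b}
    return len(a ^ b)

def mutational_events(haplotype):
    """Count mutations, treating each pair of 524 insertions as one event."""
    labels = {strip_insertion_base(m) for m in haplotype}
    insertions_524 = {m for m in labels if m.startswith("524.")}
    others = labels - insertions_524
    return len(others) + (len(insertions_524) + 1) // 2

# Hypothetical example: the same insertions written in two conventions.
lab_one = ["497T", "524.1C", "524.2A", "524.3C", "524.4A"]
lab_two = ["497T", "524.1A", "524.2C", "524.3A", "524.4C"]

print(raw_difference(lab_one, lab_two))         # 8
print(normalized_difference(lab_one, lab_two))  # 0
print(mutational_events(lab_one))               # 3 (497T plus two pairs)
```

Dropping the base letters is a labeling choice, not a biological claim; it only prevents two naming conventions for the same insertions from being counted as real differences.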
16291T:
After all the above mutations, there is a branching point as seen on my chart.
The other branch is defined mainly by mutations 16047A and 316A. Mutation
16291T defines the branch in question. Until recently there was no way to
properly order this and the next mutation, 16093C; but a new MitoSearch entry
has surfaced with this one and 338T, but not 16093C. There is also one entry on
SMGF which stops at this point. Therefore, it's pretty obvious that for this haplotype
this mutation appeared next. (The example in Behar's Table 4 has 16093C, but
not 16291T; but since the full sequence is not given, I can't add it to my
chart.) This mutation appears in only about 3% of K's, and almost always in
conjunction with 16048A. In MitoSearch, it appears in small numbers in most
haplogroups with the highest counts in H and U. In mtDB it's in 2.7%.
16093C:
On the SMGF list, this is 42nd with 224 entries or 4.7%. In a 2000 study
of heteroplasmy in mtDNA by Tully, this mutation was at the top of the list.
The C variant appeared in 6% of the samples studied, while in K it appears in about
21%. In mtDB it's in 4.5%. The C variant appears in most haplogroups, but most
commonly in K, where it appears in 18 different places on Behar’s K chart.
About 28% of 16093C occurs in K on MitoSearch; the C variant appears in every
haplogroup except B, F and pre-HV. Ian Logan has found it in pre-HV on GenBank.
524.3C,
524.4A: This second pair of 524 insertions may have occurred thousands of years
after the first pair above. The second pair occurs in about 10% of K's,
probably the highest percentage of any haplogroup. There were 20 in U, but
that's only about 4% of the total with HVR2 results. The most common or modal haplotype in this cluster ends
here, with at least as many examples as the full haplotype under discussion
plus those further back along the chain.
16051G: A
rare mutation in K, there were only three in MitoSearch in July, one matching
this HVR1 pattern. In the K Project there are two within this haplotype, but
there is an HVR1-only entry with just this mutation and the three basic ones.
That person's match on MitoSearch has HVR2 results which are very different
from those of this haplotype. This mutation appears in several different
haplogroups, most commonly in U and H. In mtDB it's in 2.2%.
230T:
There is no easy way to tell whether this mutation followed or preceded the one
above. Perhaps one day a person will show up with just one of them. There are
four at FTDNA with low-resolution HVR1 matches to this haplotype, which would
include 16051G; so there is a chance that one of them could upgrade in the
future and not have 230T. This mutation is so rare that I did not find another
example in any other haplogroup on MitoSearch. It is not even listed as a
polymorphic site on mtDB. Being that rare is mainly why I listed it last.
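The reasoning used to order 16291T before 16093C, and the reason 16051G and 230T cannot yet be ordered, can be sketched in Python. This is a rough illustration only: the function name is hypothetical, the haplotypes are toy data modelled loosely on the entries described above, and the logic assumes each mutation arose only once within the cluster, which is not guaranteed for a recurrent mutation such as 16093C.

```python
def order_evidence(first, second, haplotypes):
    """Classify the evidence that `first` arose before `second`.

    A haplotype carrying `first` without `second` supports first -> second;
    one carrying `second` without `first` supports the reverse order.
    """
    first_only = sum(1 for h in haplotypes if first in h and second not in h)
    second_only = sum(1 for h in haplotypes if second in h and first not in h)
    if first_only and not second_only:
        return "first precedes second"
    if second_only and not first_only:
        return "second precedes first"
    if first_only and second_only:
        return "conflicting"
    return "undetermined"

# Toy haplotypes (hypothetical, reduced to the mutations of interest).
toy_haplotypes = [
    {"16048A", "16291T", "338T"},                       # stops before 16093C
    {"16048A", "16291T", "16093C"},
    {"16048A", "16291T", "16093C", "16051G", "230T"},   # full haplotype
]

print(order_evidence("16291T", "16093C", toy_haplotypes))  # first precedes second
print(order_evidence("16051G", "230T", toy_haplotypes))    # undetermined
```

If an entry with 16093C but without 16291T were added to the toy data, the first call would return "conflicting", which is the situation a partial sequence like the one in Behar's Table 4 could create.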
[The
newly-found haplotype mentioned above adds 524.5C and 524.6A. This pair of the
HVR2 524 insertions is present in two other entries in the K Project, or 1.7%.
I’ve also seen examples in haplogroup U on MitoSearch. Until recently
MitoSearch would not even accept that many insertions at one position, making
searches for them somewhat difficult.]
I hope I
have presented a convincing case for the proper chronological order for this
rare haplotype’s mutations. I also hope I have demonstrated that not all
mutations are created equal: some are only
differences where there was a mutation somewhere down the line to the CRS; some
are very recurrent mutations which appear in many haplogroups or at many places
on the K chart, possibly due to heteroplasmy; others have great defining value
for subdivisions of K; one may be a Unique Event Polymorphism, or at least
close to that; some are much more common in K than in the general population;
one is so rare that it may be restricted to this haplotype; most are
independent of each other; some tend to follow others; some appear in pairs;
most are the type of mutation called substitutions, and specifically the kind
called transitions; none here are the other kind of substitution, called
transversions; some are insertions; none here are deletions.
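The distinction between transitions, transversions and insertions can be made concrete with a short Python sketch. A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); a transversion crosses between the two classes. The function name is hypothetical, and the CRS base must be supplied by the caller; only the A at positions 263 and 73, which this article states, is used below.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(reference_base, variant_label):
    """Classify a variant label such as '263G' or '315.1C'.

    reference_base is the CRS base at that position (None for insertions).
    """
    if "." in variant_label:
        return "insertion"
    new_base = variant_label[-1]
    if new_base == reference_base:
        return "no change"
    pair = {reference_base, new_base}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(classify("A", "263G"))     # transition (CRS carries A at 263)
print(classify("A", "73G"))      # transition (CRS carries A at 73)
print(classify(None, "524.1C"))  # insertion
print(classify("A", "0000C"))    # transversion (made-up label, not a real site)
```

Deletions are not handled, since none occur in this haplotype.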
I have
only discussed above the HVR mutations (also known as control-region,
displacement loop, or D-loop mutations), which constitute less than 7% of the
mtDNA circular genome. The rest are coding-region (CR) mutations which are
usually only revealed by a full-sequence mtDNA test. There are 20 CR mutations
required just to get to the beginning of K and at least two more before the
beginning of this haplotype cluster is reached. So far I have not seen an
example of full-sequence results for this haplotype cluster, as defined by 16048A.
But now one person with all but the last two mutations has ordered such a test,
so we may eventually know more about exactly where the cluster fits on the K
chart. Since CR mutations may have medical implications, that level of
haplotype analysis has to be performed by someone with more knowledge of the
subject than I possess.
William R. Hurst
Administrator, mtDNA Haplogroup K Project