Analysis of a Haplotype

ARE MTDNA MUTATIONS RANDOM OR CHRONOLOGICAL?

ANALYSIS OF THE HVR MUTATIONS IN A HAPLOTYPE OF HAPLOGROUP K

Most people in mitochondrial DNA haplogroup K have about four HVR (Hypervariable Region) mutations beyond the six basic mutations which virtually all K's have. Some have only one extra. Here, listed in the normal order, is a haplotype which has 11 extra mutations, for a total of 17 differences from the Cambridge Reference Sequence (CRS). It is unusual enough that there are only three examples in the FamilyTreeDNA database. I picked this particular haplotype to study not only because of its length, but also because it includes very common and very rare mutations and is in a large haplotype cluster which does not yet have a subclade designation.

HVR1: 16048A, 16051G, 16093C, 16224C, 16291T, 16311C, 16519C

HVR2: 73G, 195C, 230T, 263G, 315.1C, 497T, 524.1C, 524.2A, 524.3C, 524.4A

Are these just random numbers which happened at random times? Or is there a way to estimate the order in which they appeared since "mitochondrial Eve"? Are all mutations created equal? Or are some more important than others? Let me list them in what I think is the proper chronological order:

263G, 315.1C, 73G, 16519C, 16311C, 16224C, 497T, 195C, 16048A, 524.1C, 524.2A, 16291T, 16093C, 524.3C, 524.4A, 16051G, 230T
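
The counts quoted so far (17 differences from the CRS, six basic plus 11 extra) can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis. It assumes that the "six basic" K mutations are the first six in the chronological list, which is an inference from the discussion of the individual mutations rather than something stated outright. The names HVR1, HVR2, CHRONOLOGICAL, BASIC_SIX and check_ordering are made up for the illustration.

```python
# Haplotype as listed in the text (differences from the CRS).
HVR1 = ["16048A", "16051G", "16093C", "16224C", "16291T", "16311C", "16519C"]
HVR2 = ["73G", "195C", "230T", "263G", "315.1C", "497T",
        "524.1C", "524.2A", "524.3C", "524.4A"]

# Proposed chronological order, copied from the list above.
CHRONOLOGICAL = [
    "263G", "315.1C", "73G", "16519C", "16311C", "16224C",
    "497T", "195C", "16048A", "524.1C", "524.2A", "16291T",
    "16093C", "524.3C", "524.4A", "16051G", "230T",
]

# Assumption: the "six basic" K mutations are the first six in the ordering.
BASIC_SIX = CHRONOLOGICAL[:6]


def check_ordering(hvr1, hvr2, ordering):
    """Return True if the ordering uses every haplotype mutation exactly once."""
    haplotype = hvr1 + hvr2
    no_repeats = len(ordering) == len(set(ordering))
    same_members = set(ordering) == set(haplotype)
    return no_repeats and same_members and len(ordering) == len(haplotype)


extra = [m for m in CHRONOLOGICAL if m not in BASIC_SIX]
print(len(CHRONOLOGICAL), len(extra), check_ordering(HVR1, HVR2, CHRONOLOGICAL))
```

The check only confirms that the ordered list is a rearrangement of the haplotype; it says nothing about whether the order itself is right.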

[Since this document was written, a person has shown up with a haplotype as above, but with two additional mutations 524.5C and 524.6A for a total of 19 mutations or 13 beyond K’s basic six. The chart mentioned below has been updated to reflect this new addition.]

Do I have a basis for listing the mutations in this order? I'll look at these mutations one at a time, but first some references I'll use. I had previously created a phylogenetic chart which includes this haplotype. Dr. Doron Behar, now the Chief mtDNA Scientist at FTDNA, published a paper earlier this year on Ashkenazi mtDNA, which included a comprehensive phylogenetic chart for haplogroup K. I also used Ron Scott's compilation of HVR1 mutations from FTDNA’s MitoSearch as of July 11, 2006. For HVR2 mutations, Scott's individual haplogroup files must be consulted. Another great site for percentages of all mtDNA mutations is the "Polymorphic sites" page in mtDB, the Human Mitochondrial Genome Database. The mtDB had a total of 2,487 sequences when I looked at it, but the total for each mutation will be a lower number. I will also refer to the Sorenson Molecular Genealogy Foundation (SMGF) Top 50 Mutations list. SMGF at present has 4,805 mtDNA entries. There is no count or percentage for K's, but the number of those with 16224C might be a good indicator: 328 or 6.8%. The percentage of K’s in MitoSearch as of June was higher, 8.88%. For percentages inside K, I have referred to my continuously-updated table for the K Project and a table from the K's on MitoSearch as of August. For definitions of some of the technical terms used here see Charles Kerchner’s Genetic Genealogy Glossary.

Now the discussion of the role of the individual mutations:

263G: This is not really a "mutation"; it was the CRS which had the mutation from base G to A. Even most others in haplogroup H (which includes the CRS) are 263G. On the SMGF Top 50 Mutations list, this is at the top, appearing in 4,697 of 4,805 entries, or 97.8%. In the K Project, it appears in 100%. The percentage is slightly less in K's on MitoSearch, due to the presence there of an odd haplotype cluster centered on mutation 133G. In the mtDB database it appears in 1,644 of 1,650 examples, or 99.6%. [In the future I will only give the percentage from mtDB, since the total is always about the same. One exception will be noted.]
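
The percentages quoted throughout are simple count-over-total figures. A small Python sketch shows the arithmetic for the 263G numbers above. The helper name percent and the dictionary SOURCES_263G are illustrative only; the counts themselves come from the text.

```python
def percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100.0 * count / total, 1)


# (count, total) pairs quoted in the text for 263G.
SOURCES_263G = {
    "SMGF": (4697, 4805),
    "mtDB": (1644, 1650),
}

for name, (count, total) in SOURCES_263G.items():
    print(f"{name}: {count} of {total} = {percent(count, total)}%")
```

The total is passed in each time because, as noted earlier, the mtDB total differs from one mutation to the next.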

315.1C: Most members of H also have this "mutation." Instead of everybody else having this insertion, the CRS had a deletion at this position. On the SMGF list, it is second highest with 4,688 or 97.6% of the entries. As with 263G, it appears in 100% of the K Project and slightly less in MitoSearch K's. The mtDB doesn't list values for insertions.

73G: The G base may go back to "mitochondrial Eve," with the actual mutation to A occurring between R and pre-HV then down to the CRS. All the other branches from R usually have 73G. (There are many published versions of the mtDNA phylogenetic chart, but a simple one is in Ann Turner’s article mentioned below.) This one is 4th on the SMGF list with 3,009 or 62.6%. Once again, 100% of K's in the Project and slightly less in MitoSearch. In mtDB it's in 84.2%.

16519C: On the SMGF list, this is in third place with 3,063 or 63.7%. Commonly called a "hotspot," this position has mutated back-and-forth several times in human history. It appears in almost every haplogroup on MitoSearch, in over 50% in most of them. In mtDB it's 57.3%. However, when a single haplogroup is studied, the position is often very stable. About 98% of K's have 16519C. For an in-depth discussion of this polymorphism see Ann Turner's article in the Journal of Genetic Genealogy.

16311C: This is one of the classic motifs for K. On MitoSearch, 613 of 1,480 total, or 41.4%, were in K; but it is found in nearly every haplogroup. Only in K did it appear in more than 50% of the entries. It appears in triple-digit numbers in H, U and L. It is 15th on the SMGF list, with 708 or 14.7%. In the K Project it's at 100% and about 99% in MitoSearch K's. In mtDB it's 14.6%.

16224C: This is the other classic motif for K. On SMGF, this is 28th with 328 or 6.8%. In mtDB it's in 4.6%. On MitoSearch, 620 of 670, or 92.5%, of the examples were in K. U had 15; Unknown had 20 - and many of those were probably K's tested elsewhere. In the K Project and in MitoSearch K's this runs over 99%. Since this is the mutation most closely associated with K, it could be thought of as the K keystone mutation. When 16224C occurred, K began.

497T: In Y-chromosome DNA, single nucleotide polymorphisms, or SNPs, which are really the same as the mutations in mtDNA, are used to define haplogroups. Since, with rare exceptions, these have only mutated once in human history, they are also called UEPs or Unique Event Polymorphisms. This mutation, 497T, is the closest thing, in K at least, to a UEP. Probably just writing "mtDNA UEP" is considered an act of heresy. On MitoSearch I only found it once in another haplogroup, U; so I immediately suggested to FTDNA that the designation was probably incorrect. It defines subclade K1a and in K it is the only mutation outside the basic six which appears in more than half of the entries, with about 60%. On the SMGF list it is 48th with 180 or 3.7%. In the mtDB it is found in 26 of 1,927, or 1.3%.

195C: Behar's K chart has a major group with this mutation down from K1a. That branch includes K1a9, but that Ashkenazi subclade requires 16524G. A very recurrent mutation, 195C appears in several other spots on the K tree. It appears in about 25% of K's. This is 11th on the SMGF list with 744 or 15.5%, compared to 12.4% in mtDB.

16048A: Behar has an unnamed cluster under 195C in K1a with 16093C, but he does not have 16048A on his chart. Due to the number of examples found in the K Project and MitoSearch (more than 42 at last count), this mutation should probably define a new subclade, perhaps called K1a10. This is the key mutation in this haplotype and the haplotype cluster. (I use haplotype cluster to describe a group of haplotypes which do not have an official subclade designation.) Behar's data did not include any examples from the British Isles, but his Table 4 does list one non-Jewish example from Morocco. (I have suggested another Moroccan K was possibly descended from someone whose ancestry traced back to Portugal and Britain before that.) Based on the currently available examples, this mutation may have occurred in Ireland. There were only 23 examples in all of MitoSearch, with 16 of those in K. Two of the others are in Unknown; at least one of those is actually a K. It appears in about 3% of K's. In mtDB it's in 0.2%.

524.1C, 524.2A: These pairs of insertions are highly recurrent, but not random, and always occur together. In my genetic distance tables I count them as one mutation or one "mutational event." On the SMGF list, this first pair of insertions is at 34th/35th with 263 or 5.5%. There they are listed as 524.1A and 524.2C, but apparently that is simply a different choice in listing the same things. These insertions are very important for K; this pair appears in about 20% of the entries, or about four times as often as in the general population. As I stated, these are not random; I've never found them in the Ashkenazi subclades or, with one exception, in the K1c/K1c2 subclades. Behar excluded these and some other recurrent mutations when constructing his K chart. However, in the Kivisild paper which contains perhaps the most recent and comprehensive mtDNA charts, there is a subdivision under 497-nc [non-coding] labeled "523+CA-nc" and another line at the same level as 497 labeled "523+2(CA)-nc." I interpret these to be different ways of indicating one and two pairs of the 524 insertions. So now there are at least three ways of labeling these insertions. The solution I used in my phylogenetic chart was to not use the base letters, using just 524.1, etc. In one case, the perceived genetic difference between two haplotypes was thus reduced from eight to zero. The insertions may be examples of what is called length heteroplasmy: each mitochondrion in a cell may contain one or more pairs - or none - in various combinations. (The most common example of length heteroplasmy in K is at position 309, where one or two C’s may be added; but those have not shown up in this cluster.) Therefore, different cells may contain different majority variants. This needs to be studied further.
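
The effect of the labeling on genetic distance can be sketched in Python. This is only an illustration under stated assumptions: it combines the two ideas above (dropping the base letters, and counting each pair of insertions as one mutational event) into a single normalization step. The two example haplotypes are hypothetical; the text does not give the actual pair whose difference fell from eight to zero. The function names and the "524.pairN" labels are invented for the sketch.

```python
import re

# Matches an insertion at position 524 such as "524.1C" or "524.2A".
INSERTION_524 = re.compile(r"^524\.(\d+)[ACGT]$")


def normalize(haplotype):
    """Collapse 524 insertions into letter-free pair events.

    "524.1C" and "524.2A" (or "524.1A" and "524.2C") both map to the
    single event "524.pair1"; the third and fourth insertions map to
    "524.pair2", and so on. All other mutations are kept unchanged.
    """
    events = set()
    for mutation in haplotype:
        match = INSERTION_524.match(mutation)
        if match:
            index = int(match.group(1))
            events.add(f"524.pair{(index + 1) // 2}")
        else:
            events.add(mutation)
    return events


def naive_distance(a, b):
    """Count every differing label, base letters included."""
    return len(set(a) ^ set(b))


def event_distance(a, b):
    """Count differing mutational events after normalization."""
    return len(normalize(a) ^ normalize(b))


# Hypothetical haplotypes: same insertions, reported with opposite letters.
lab_one = ["16224C", "16311C", "524.1C", "524.2A", "524.3C", "524.4A"]
lab_two = ["16224C", "16311C", "524.1A", "524.2C", "524.3A", "524.4C"]

print(naive_distance(lab_one, lab_two))  # 8
print(event_distance(lab_one, lab_two))  # 0
```

With this normalization, a haplotype with one pair of insertions and another with two pairs differ by one event rather than two.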

16291T: After all the above mutations, there is a branching point as seen on my chart. The other branch is defined mainly by mutations 16047A and 316A. Mutation 16291T defines the branch in question. Until recently there was no way to properly order this and the next mutation, 16093C; but a new MitoSearch entry has surfaced with this one and 338T, but not 16093C. There is also one entry on SMGF which stops at this point. Therefore, it's pretty obvious that for this haplotype this mutation appeared next. (The example in Behar's Table 4 has 16093C, but not 16291T; but since the full sequence is not given, I can't add it to my chart.) This mutation appears in only about 3% of K's, and almost always in conjunction with 16048A. In MitoSearch, it appears in small numbers in most haplogroups with the highest counts in H and U. In mtDB it's in 2.7%.
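
The reasoning used here to order 16291T before 16093C can be written as a simple rule: one mutation is taken to precede another if some observed haplotype carries the first without the second, and none carries the second without the first. The Python sketch below applies that rule to a hypothetical, heavily simplified set of cluster haplotypes (only a few cluster-specific mutations are kept, and the entries are stand-ins, not real database records). The names precedes and CLUSTER are illustrative.

```python
def precedes(a, b, haplotypes):
    """True if the observed haplotypes suggest mutation a arose before b.

    The rule: at least one haplotype carries a without b, and none
    carries b without a.
    """
    a_without_b = any(a in h and b not in h for h in haplotypes)
    b_without_a = any(b in h and a not in h for h in haplotypes)
    return a_without_b and not b_without_a


# Hypothetical, simplified haplotypes loosely modeled on the text.
CLUSTER = [
    {"16048A"},
    {"16048A", "16291T"},                              # stops at 16291T
    {"16048A", "16291T", "338T"},                      # 16291T without 16093C
    {"16048A", "16291T", "16093C"},
    {"16048A", "16291T", "16093C", "16051G", "230T"},  # full haplotype
]

print(precedes("16291T", "16093C", CLUSTER))  # True
print(precedes("16093C", "16291T", CLUSTER))  # False
print(precedes("16051G", "230T", CLUSTER))    # False: always seen together
print(precedes("230T", "16051G", CLUSTER))    # False: order undetermined
```

The rule is deliberately strict. Adding a stand-in for the Behar Table 4 example, such as {"16048A", "16093C"}, makes the 16291T/16093C comparison return False in both directions, which is why a highly recurrent mutation like 16093C has to be weighed by hand rather than by a rule of this kind.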

16093C: On the SMGF list, this is 42nd with 224 entries or 4.7%. In a 2000 study of heteroplasmy in mtDNA by Tully, this mutation was at the top of the list. The C variant appeared in 6% of the samples studied, while in K it appears in about 21%. In mtDB it's in 4.5%. The C variant appears in most haplogroups, but most commonly in K, where it appears in 18 different places on Behar’s K chart. About 28% of 16093C occurs in K on MitoSearch; the C variant appears in every haplogroup except B, F and pre-HV. Ian Logan has found it in pre-HV on GenBank.

524.3C, 524.4A: This second pair of 524 insertions may have occurred thousands of years after the first pair above. The second pair occurs in about 10% of K's, probably the highest percentage of any haplogroup. There were 20 in U, but that's only about 4% of the total with HVR2 results. The most common or modal haplotype in this cluster ends here, with at least as many examples as the full haplotype under discussion plus those further back along the chain.

16051G: A rare mutation in K, there were only three in MitoSearch in July, one matching this HVR1 pattern. In the K Project there are two within this haplotype, but there is an HVR1-only entry with just this mutation and the three basic ones. That person's match on MitoSearch has HVR2 results which are very different from those of this haplotype. This mutation appears in several different haplotypes, most commonly in U and H. In mtDB it's in 2.2%.

230T: There is no easy way to tell whether this mutation followed or preceded the one above. Perhaps one day a person will show up with just one of them. There are four at FTDNA with low-resolution HVR1 matches to this haplotype, which would include 16051G; so there is a chance that one of them could upgrade in the future and not have 230T. This mutation is so rare that I did not find another example in any other haplogroup on MitoSearch. It is not even listed as a polymorphic site on mtDB. Being that rare is mainly why I listed it last.

[The newly-found haplotype mentioned above adds 524.5C and 524.6A. This pair of the HVR2 524 insertions is present in two other entries in the K Project, or 1.7%. I’ve also seen examples in haplogroup U on MitoSearch. Until recently MitoSearch would not even accept that many insertions at one position, making searches for them somewhat difficult.]

I hope I have presented a convincing case for the proper chronological order for this rare haplotype’s mutations. I also hope I have demonstrated that not all mutations are created equal; some are only differences where there was a mutation somewhere down the line to the CRS; some are very recurrent mutations which appear in many haplogroups or at many places on the K chart, possibly due to heteroplasmy; others have great defining value for subdivisions of K; one may be a Unique Event Polymorphism, or at least close to that; some are much more common in K than in the general population; one is so rare that it may be restricted to this haplotype; most are independent of each other; some tend to follow others; some appear in pairs; most are the type of mutation called substitutions, specifically called transitions; none here are the other type of substitution called transversions; some are insertions; none here are deletions.
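
The split into substitutions and insertions can be read directly from the notation, since an insertion carries a decimal index (315.1C) and a substitution does not (73G). The Python sketch below does that for the 17 mutations of this haplotype, and adds a helper for telling transitions from transversions when the reference base is known. The only reference bases used are the two the text itself gives (the CRS has A at both 263 and 73); the sketch does not attempt to supply the others, and it does not handle deletion notation. All names are illustrative.

```python
import re

# Position, optional insertion index, then the observed base.
MUTATION = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT])$")
PURINES = {"A", "G"}

HAPLOTYPE = [
    "16048A", "16051G", "16093C", "16224C", "16291T", "16311C", "16519C",
    "73G", "195C", "230T", "263G", "315.1C", "497T",
    "524.1C", "524.2A", "524.3C", "524.4A",
]


def kind(mutation):
    """Classify by notation: "315.1C" is an insertion, "73G" a substitution."""
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    return "insertion" if match.group(2) else "substitution"


def substitution_type(reference_base, variant_base):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    same_class = (reference_base in PURINES) == (variant_base in PURINES)
    return "transition" if same_class else "transversion"


counts = {}
for mutation in HAPLOTYPE:
    counts[kind(mutation)] = counts.get(kind(mutation), 0) + 1

print(counts)                        # {'substitution': 12, 'insertion': 5}
print(substitution_type("A", "G"))   # 263G and 73G against the CRS: transition
```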

I have only discussed above the mutations in the HVR (also known as the control region, displacement loop, or D-loop), which constitutes less than 7% of the circular mtDNA genome. The rest of the genome is the coding region (CR), whose mutations are usually only revealed by a full-sequence mtDNA test. There are 20 CR mutations required just to get to the beginning of K and at least two more before the beginning of this haplotype cluster is reached. So far I have not seen an example of full-sequence results for this haplotype cluster, as defined by 16048A. But now one person with all but the last two mutations has ordered such a test, so we may eventually know more about exactly where the cluster fits on the K chart. Since CR mutations may have medical implications, that level of haplotype analysis has to be performed by someone with more knowledge of the subject than I possess.

William R. Hurst

Administrator, mtDNA Haplogroup K Project