MTDNA HAPLOGROUP K SURVEY AT 403 MITOSEARCH ENTRIES – GEOGRAPHICAL CONSIDERATIONS

 

On March 9, 2006, I published a survey of the 403 mtDNA haplogroup K entries on FamilyTreeDNA's MitoSearch, which included 181 non-duplicated high-resolution HVR1 plus HVR2 entries. Since then I have looked at the same data from geographical and other perspectives, resulting in this report and several additional Microsoft Excel charts. The first new chart is sorted by geographic region, then by country of origin. Remember that in MitoSearch, the country of origin may be just a guess. If I tend to use the past tense below, it’s because K entries have already increased to 442; but the new ones will just have to wait for a “K500” survey. I will make frequent references to the 2006 paper by Dr. Doron Behar et al., which has a K chart apparently being used by FTDNA to determine K subclades.

 

By geographic region, the 181 entries are divided as follows:

 

British Isles: 38 – 21.0%

Canada: 3 – 1.7%

USA: 46 – 25.4%

Scandinavia: 5 – 2.8%

Germanic countries: 11 – 6.1%

Western Europe: 6 – 3.3%

Southern Europe: 3 – 1.7%

Eastern Europe: 27 – 14.9%

Unknown: 42 – 23.2%
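
The percentages in this list can be recomputed from the counts as a quick arithmetic check. The Python sketch below is only an illustration: the names REGION_COUNTS and region_percentages are made up for this example, and the counts are simply copied from the list above.

```python
# Counts copied from the regional list above; nothing here is new data.
REGION_COUNTS = {
    "British Isles": 38,
    "Canada": 3,
    "USA": 46,
    "Scandinavia": 5,
    "Germanic countries": 11,
    "Western Europe": 6,
    "Southern Europe": 3,
    "Eastern Europe": 27,
    "Unknown": 42,
}


def region_percentages(counts):
    """Return each region's share of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {region: round(100 * n / total, 1) for region, n in counts.items()}


if __name__ == "__main__":
    # The counts should add up to the 181 high-resolution entries.
    print("Total entries:", sum(REGION_COUNTS.values()))
    for region, pct in region_percentages(REGION_COUNTS).items():
        print(f"{region}: {REGION_COUNTS[region]} ({pct}%)")
```

The nine counts add up to 181, and each rounded share matches the figure given in the list.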

 

HVR1 Mutation 16320T and HVR2 Deletion at Position 498

 

Of the 181 entries, 26 or 14% are marked in yellow on the K403 chart. A new chart has only those entries. Ten of those entries had just 498-, which should indicate Dr. Behar’s K1c subclade. Fourteen entries added 16320T, which should put them into K1c2. Note that all 24 also had 146C and 152C, which are required for those subclades. There were only two entries with 16320T but without 498-. One, VX6P5, listed 497-; that was an error, which has now been corrected to 498-. The remaining entry had 16320T, but was missing 498- and 152C. It also had two of the 524 insertions discussed below, which otherwise don’t appear in this chart. Therefore, in my opinion, the 16320T mutation is a “personal” or non-defining mutation for this entry, so it probably does not belong to the K1c or K1c2 subclades.
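
The rule of thumb in the paragraph above can be written out as a short Python sketch. This is only an illustration of the reasoning, not an official FTDNA or Behar method: the function name predict_k1c is made up, and the sample haplotypes in the comments are hypothetical, not actual MitoSearch entries.

```python
def predict_k1c(mutations):
    """Rough HVR-only guess at K1c or K1c2 from a list of mutation strings.

    Assumed rule, taken from the text: 498- together with 146C and 152C
    suggests K1c; adding 16320T suggests K1c2. An entry with 16320T but
    without the other markers is treated as carrying a "personal" mutation.
    """
    found = set(mutations)
    required = {"498-", "146C", "152C"}
    if not required <= found:
        return None  # no K1c/K1c2 prediction from HVR1/2 alone
    return "K1c2" if "16320T" in found else "K1c"


if __name__ == "__main__":
    # Hypothetical haplotypes, for illustration only.
    print(predict_k1c(["16224C", "16311C", "146C", "152C", "498-"]))
    print(predict_k1c(["16224C", "16311C", "16320T", "146C", "152C", "498-"]))
    print(predict_k1c(["16224C", "16311C", "16320T", "146C"]))
```

A prediction like this cannot separate K1c1 from K1c, since that subclade depends on coding-region mutations.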

 

Some of the K1c entries may belong in K1c1, but that subclade is defined by coding-region mutations outside HVR1/2 and thus is not determinable by using MitoSearch. These subclades, and British K’s in general, are not well represented in the full-sequence samples used by Behar.

 

Looking at the countries of origin, these entries at first glance look “Scots-Irish.” Well, most of the Irish entries were actually from the Republic of Ireland. One entry was from Northern Ireland and one from northern England. The USA entries listed states where there was significant Scots-Irish immigration, even the one from Pennsylvania; my Scots-Irish ancestors probably arrived at Philadelphia. Outside the British Isles, the major location was Scandinavia, where the three entries were in K1c2, as was the one from Germany. The one entry from Slovakia perhaps illustrates the mysterious wanderings of our distant ancestors.

 

In Europe, the K1c subclade appeared mainly in England and Scotland, while K1c2 appeared in Ireland, south and north, and in Scandinavia. (This is probably the source of the old English saying “Cross a sea, pick up a 16320T.”) The USA and other European entries were split evenly, while the Unknowns trended toward K1c2.

 

HVR1 Mutations 16223T, 16234T and 16524G and HVR2 Mutation 512C

 

Dr. Behar found Ashkenazi Jewish samples in three subclades of K: K1a1b1a defined by 16234T, with a lower unnamed group defined by 16223T; K1a9 defined by 16524G; and K2a2a defined mostly by coding region mutations, but also by 512C. Each of the 32 entries (17.7%) with one or more of those mutations is marked in green on this new chart. Behar has the mutation 114T after 16234T but before 16223T; but on MitoSearch this mutation was present in all the entries, even with just 16234T. He also has 195C before 16524G in K1a9. Those two mutations, 114T and 195C, also appeared quite often without the “Ashkenazi” markers on MitoSearch. (This is a good place to mention that there are K’s with these mutations on MitoSearch and in the mtDNA Haplogroup K Project who are not Jewish and have no known Ashkenazi Jewish ancestors. I try to stay out of any arguments about whether the mutations predated the religion or not.)

 

On MitoSearch there were 18 examples (9.9%) with 16234T; 7 (3.9%) of those also had 16223T. Only 1 entry (0.6%) had 16223T without 16234T, which may have been caused by a back mutation. (I did note that there was another one like that, but without HVR2 tested.) There were 5 examples (2.8%) of 16524G and 8 examples (4.4%) of 512C. These latter two mutations never occurred together or with the first two.
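
The marker-to-subclade reading used in this section can be sketched in Python as a simple lookup. This is only an illustration under the assumptions stated in the text: the function name and labels are made up, the haplotype in the example is hypothetical, and a match says nothing certain about any person's ancestry, since these mutations also occur in K's with no known Ashkenazi ancestors.

```python
def marker_group(mutations):
    """Return the Behar subclade suggested by the HVR markers, or None.

    Assumed mapping, from the text: 16234T -> K1a1b1a (with 16223T marking
    a lower unnamed group), 16524G -> K1a9, 512C -> K2a2a.
    """
    found = set(mutations)
    if "16234T" in found:
        if "16223T" in found:
            return "K1a1b1a, lower group with 16223T"
        return "K1a1b1a"
    if "16223T" in found:
        return "16223T without 16234T (possible back mutation)"
    if "16524G" in found:
        return "K1a9"
    if "512C" in found:
        return "K2a2a"
    return None


# Counts copied from the paragraph above, used to check the 32 "green" entries.
MARKER_COUNTS = {"16234T": 18, "16223T only": 1, "16524G": 5, "512C": 8}
TOTAL_ENTRIES = 181

if __name__ == "__main__":
    green = sum(MARKER_COUNTS.values())
    print(green, round(100 * green / TOTAL_ENTRIES, 1))
    # Hypothetical haplotype, for illustration only.
    print(marker_group(["16224C", "16234T", "16311C", "114T", "497T"]))
```

The four counts add up to the 32 entries (17.7%) marked in green.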

 

I have included as Eastern European countries Lithuania, Russia, Ukraine, Poland, Romania, and Hungary, with the caveat that there have been many boundary changes between these countries and between them and neighboring Germany and Austria. (Behar notes that the Ashkenazi founding was in the Rhine Basin and expanded eastward.) I found that K1a1b1a was scattered fairly evenly across the 6 countries, with only Romania having none. The addition of 16223T was also even, except for Lithuania. For some reason, 4 of the 5 Unknowns had 16223T. The other two “Ashkenazi” subclades were definitely not evenly divided. In fact, K1a9 was not found in Lithuania and Russia, while K2a2a was found only in those two Eastern European countries. A further note is that there were no “green” entries from the Czech Republic or Slovakia, which had two total entries each. MitoSearch did not have any high-resolution K entries from Albania or Bulgaria, or the countries of the original Yugoslavia, or the many other countries of the former Soviet Union.

 

HVR2 Mutations 133G, 174T, 323G, 375.1C, 557T and Back Mutations at 73G, 114T, 263G, 315.1C, 497T

 

Since two of the Ashkenazi subclades are in the major subclade K1a, all of the entries in those two subclades should have had 497T. But in fact, 3 did not. Instead they had 133G, 323G, 375.1C, and 557T. The two with 16234T also had 174T. The 3 were also missing the common mutations 73G, 114T, 263G and 315.1C. In fact, it appears that these “odd” entries have had 5 back mutations: those four plus 497T. In addition, there were 4 more “odd” entries which had the 133G, etc., mutations and the back mutations, but were not Eastern European and did not appear to be Ashkenazi. There was even 1 with the back mutations, but without 133G. All these with similar mutations, Ashkenazi or not, I call the “odd” haplotype cluster. A Google search for these mutations produced nothing, except for one of my previous surveys. Interesting note: 16224C, 16311C, and 16519C were never subject to back mutations in any of these entries. Entries with the 133G, etc., mutations are marked in aqua on the K403green chart and a separate K403odd chart.

 

HVR2 Position 524 Insertions

 

Dr. Behar does not use the insertions at HVR2 position 524 to create the K subclades. However, in looking at the entries on MitoSearch, I believe there is a definite geographical aspect to these insertions. First, they always appear in pairs, alternating between A and C nucleotides and beginning with either letter. The four patterns, with their counts and percentages on MitoSearch are:

 

524.1A, 524.2C: 5 – 12.5%

524.1A, 524.2C, 524.3A, 524.4C: 3 – 7.5%

524.1C, 524.2A: 18 – 45.0%

524.1C, 524.2A, 524.3C, 524.4A: 14 – 35.0%
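
The four patterns above all follow one rule: the inserted letters come in pairs and alternate between A and C. A minimal Python sketch of that rule follows. The function names are made up for this example, and the code assumes insertions are written in the "524.1C" style used on MitoSearch.

```python
import re

# Matches insertion notation such as "524.1C" or "524.4A".
INSERTION_524 = re.compile(r"^524\.(\d+)([ACGT])$")


def insertion_pattern_524(mutations):
    """Return the inserted letters at 524 in order, e.g. "CA"; "" if none."""
    found = {}
    for mutation in mutations:
        match = INSERTION_524.match(mutation)
        if match:
            found[int(match.group(1))] = match.group(2)
    return "".join(found[i] for i in sorted(found))


def is_alternating_ac_pairs(pattern):
    """True if the pattern is one or more repeats of "AC" or of "CA"."""
    if not pattern or len(pattern) % 2:
        return False
    unit = pattern[:2]
    return unit in ("AC", "CA") and pattern == unit * (len(pattern) // 2)


if __name__ == "__main__":
    # Hypothetical haplotype, for illustration only.
    pattern = insertion_pattern_524(["497T", "524.1C", "524.2A"])
    print(pattern, is_alternating_ac_pairs(pattern))
```

Only series of two or four insertions appeared in this survey, but the check accepts any even-length alternating series.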

 

I extracted all the entries with these insertions into a new chart. The first thing I noticed was that, of the colors I had used in my original K403 chart, only blue appeared on this one, with one exception. The blue denotes entries with the HVR2 mutation 497T, which Behar uses to define the K1a subclade. Of the 40 entries (22% of the total) with the insertions, 31 had 497T. There were no entries in green, which would have suggested an Ashkenazi origin. The one entry with 16320T and marked in yellow was discussed above. The geographical spread was wide, with no great difference whether 497T was present, except that there were no Eastern European entries, which would have been mostly Ashkenazi. These insertions never occurred in conjunction with mutations 16223T, 16234T, 16270T, or 16356C.

 

There also did not seem to be a discernible difference in origin between the series of two and four insertions, or between the series beginning with A and those beginning with C. A question has been asked about whether there is a real difference between the two sets of insertions beginning with A or C. One study by Wilson et al. would seem to indicate it is only a difference in the method of counting.

 

HVR2 497T “Only”

 

Obviously, I don’t mean these entries only had the 497T mutation. These had 497T, but not any of the other mutations mentioned above. They should all be in subclade K1a, but it’s more difficult to get them into a lower subclade without testing coding-region mutations. There were 73 of these, or 40.3% of the total. Of those, 32 (17.7% of the total and 43.8% of the “blues”) were from the British Isles, Canada, or the Southeastern USA. Only 2 were from Eastern Europe: one each from Poland and Slovakia. The only other large group was Unknown with 20 entries. A new chart has these entries.

 

Haplotypes without Defining Mutations

 

These haplotypes have a similar breakdown to the “blue” ones above. There were 43 of them, 23.8% of the total. Of those, 17 (9.4% of the total and 39.5% of this group) were from the British Isles, Canada, or the Southeastern USA. Only 2 were from Eastern Europe, the Czech Republic to be specific. Yes, there is a new chart for this group.

 

Who are U? – Haplotypes with 16270T or 16356C Mutations

 

When Bryan Sykes named the “Seven Daughters of Eve,” “Katrine,” the founder of haplogroup K, was on an equal level with the other six European mtDNA founding mothers. However, most charts now show K as part of a superhaplogroup UK or as just another subdivision of U. If the names were assigned now, we might be called U9 or maybe U8a. Consequently, K haplotypes sometimes also have defining mutations for one of the U’s. The two situations which have come up involve mutations 16270T and 16356C, which are the defining mutations for U5 and U4, respectively. There were three examples of the former and two of the latter in MitoSearch. The last time I looked, of the MitoSearch entries with the defining mutations for both U5 and K, 5 were marked U5 and 5 were marked K. FTDNA is looking at these, so K may gain or lose 5 members in the future. Recently, a person not then on MitoSearch joined the K Project, after which FTDNA changed his assignment to U4; but he is still a perfect match to the two K’s with 16356C on MitoSearch. Those two are shown in yellow on the “orange” chart. (The second one had the 497- deletion on her personal page and thus on MitoSearch, but this has now been corrected to 498-.) All three look like K1c2’s to me. FTDNA is running a retest on the U4.

 

Summary

 

I recently heard someone say that mtDNA mutations didn’t mean anything by themselves; they were only useful when compared to those of other persons. The more I look at the K haplotypes, the more I am convinced that statement is not true. In K at least, a quick look at a person’s list of HVR1/2 mutations can in many cases give a good indication of where his or her direct maternal ancestors came from. There are parallel and back mutations, to be sure, so any one mutation will never tell the whole story. In the not-too-distant past, there were two major subclades of K defined by HVR1 mutations 16093C and 16320T. (That presented a problem for me, since I’m the only one on MitoSearch with both of those.) That changed with the addition of HVR2 to the mix and has now changed further with the addition of coding-region mutations. Unfortunately, the latter are not easily available and are still not completely representative of the world’s K population. MitoSearch has a better representation from the British Isles, but has its own limitations: no coding-region mutations; “by hand” entries leading to typographical errors; duplications; and an over-representation of the USA and thus British origins. Still, MitoSearch is an easily available, fast-growing database, which lends itself to informal, no-budget studies like this one. Even though FTDNA has not introduced a subclade test for K, at least two people have “official” subclade designations, K1a1a and K2a, from full-sequence tests. And although I have suggested subclades for many haplotypes above, not all can be predicted from just HVR1/2 mutations; a subclade test based on coding-region mutations will be necessary to confirm any prediction or to find subclades defined by those beyond-HVR mutations.

 

William R. Hurst