mtDNA Haplogroup K Project
Progress Report at 1000 Members
June 2, 2008
The mtDNA Haplogroup K Project at FamilyTreeDNA reached a major milestone on June
2, 2008, with 1000 members 29 months after its founding. 979 of the members are
shown on the mtDNA Results tab on the K Project website.
Twenty-one members on the Results tab tested their mtDNA with nine companies
other than FTDNA. As far as I know, we continue to be the second largest mtDNA
haplogroup project after the H project.
396, or 39.6%, of our members came originally from the National Geographic Society's
Genographic Project. Eight of the members began at FTDNA’s European office. At
least 584 of the FTDNA members, or 59.7%, have uploaded their data to
MitoSearch; others may have uploaded their results “by hand” and wouldn’t be
counted. 241 members have received coding-region results and official subclade
designations from full-sequence (FGS) tests. All but one, who tested with Argus BioSciences, are FTDNA members. We are waiting for the
results from 28 more of these FGS or Mega tests. So 268 (240 plus 28), or 27.4%, of the FTDNA
members have ordered FGS tests. 49 of these have been uploaded to the federal
GenBank database so far, with other submissions in progress. Of the 228 total
FTDNA customer GenBank submissions, 21.5% are from K. There are
more K Project members with FGS results than there are K FGS results on GenBank
– even counting the ones in both.
656, or 67%, of the FTDNA members have HVR1 plus HVR2, or high-resolution,
results. In addition, ten of the non-FTDNA entries also have full high-resolution
results. (Eleven others have HVR1 only or incomplete HVR2 results.) We have 97
sets of high-resolution matches, including 451 members, or 67.7% of those with
high-resolution results. That's up from 60.3% at 500 members. With the 215
unmatched high-res "singletons" added to the 97 haplotypes in
matches, there are 312 different high-res haplotypes, for what I've been
calling a "diversity percentage" of 46.8% - compared to 55.7% at 500
members. That percentage continues to go down as new members are more likely to
find matches. The new percentage now means that the odds favor a new member
with high-res results having an exact match.
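The arithmetic behind the match share and the "diversity percentage" can be checked with a short sketch. The counts are the ones quoted above; the variable names and the percent helper are illustrative only and are not part of any FTDNA tool.

```python
# Counts quoted in the report at 1000 members.
high_res_total = 666      # high-resolution results (656 FTDNA + 10 others)
matched_haplotypes = 97   # distinct haplotypes shared by two or more members
matched_members = 451     # members who fall in one of those 97 haplotypes
singletons = 215          # members whose haplotype matches no one else


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Each singleton is its own haplotype, so distinct haplotypes = matches + singletons.
distinct_haplotypes = matched_haplotypes + singletons
matched_share = percent(matched_members, high_res_total)
diversity = percent(distinct_haplotypes, high_res_total)

print(distinct_haplotypes)  # 312
print(matched_share)        # 67.7
print(diversity)            # 46.8
```

A diversity percentage below 50% means there are fewer distinct haplotypes than half the number of high-resolution members, which fits the observation that most members now have at least one exact match.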
The 666 high-resolution results appear on a CHART
sorted by HVR2 then HVR1. A legend at the bottom of the chart explains the
colors used. The Haplo column contains either K or
the subclades assigned by FTDNA from full-sequence tests. There are now 25
different subclades from the FGS tests. Provisional subclades K1a10 and K1a11
are shown as just K1a.
The haplotype (list of mutations) with the largest number of matches is now a
branch of the Ashkenazi K1a1b1a subclade, with 33 members. It has the six basic
K mutations - 16224C, 16311C, 16519C, 73G, 263G and
315.1C - plus 497T, the defining mutation for K1a, plus the defining HVR
mutation for K1a1b1a, 16234T, plus 114T.
The second largest haplotype, with 26 members, is the ancestral haplotype of
subclade K2a with the six basic mutations plus 146C and 152C.
The third largest haplotype, with 24 members, is the same as the largest in K1a1b1a
with the additional mutation 16223T. I have referred to this as the “second
modal” haplotype of K1a1b1a.
The fourth largest haplotype, with 22 members, has the basic six mutations plus
146C, 152C, 498- and 16320T - the ancestral and modal K1c2.
The fifth largest, with 19 members, is in K1a9, another Ashkenazi subclade; to the basic six it adds
195C, 497T, 16093C and the key mutation 16524G. Note that this one is similar
to its “sister” subclade K1a10, found most commonly in those with Irish
ancestry.
The sixth largest haplotype, with 17 members, has the basic six mutations, plus
497T and 195C; the latter mutation defines a large group under K1a. Added to
that are 16048A, 16093C, 16291T and two pairs of position 524 insertions to
form the modal haplotype of a subclade I’ve provisionally called K1a10.
The number of matches for each haplotype is shown in the Counts column of the
CHART. Remember that the subclades are only officially determined by full-sequence
tests. Also, it should be noted that mutations don’t have equal value for
finding close connections. Two haplotypes differing only on 497T, for example, would
not share a maternal ancestor within many thousands of years. But two differing on 309.1C could be
as close as siblings.
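One way to picture a haplotype is as a set of mutation labels, so that comparing two haplotypes becomes a set difference. The sketch below uses the mutation lists given above for four of the largest haplotypes; the names BASIC_K and differences are made up for illustration. It only lists the differences and does not weigh them, so it cannot capture the point that a difference on 497T and a difference on 309.1C mean very different things.

```python
# The six basic K mutations listed in the report.
BASIC_K = frozenset({"16224C", "16311C", "16519C", "73G", "263G", "315.1C"})

# Haplotypes as described above (mutation labels kept as plain strings).
largest = BASIC_K | {"497T", "16234T", "114T"}               # K1a1b1a branch, 33 members
third_largest = largest | {"16223T"}                         # "second modal" K1a1b1a
k2a_ancestral = BASIC_K | {"146C", "152C"}                   # second largest
k1c2_modal = BASIC_K | {"146C", "152C", "498-", "16320T"}    # fourth largest


def differences(a, b):
    """Return the mutations found in exactly one of the two haplotypes, sorted."""
    return sorted(a ^ b)


print(differences(largest, third_largest))     # ['16223T']
print(differences(k2a_ancestral, k1c2_modal))  # ['16320T', '498-']
```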
The HVR1 and HVR2 mutation lists highlighted in yellow are those with the 498- and
16320T mutations, almost always suggesting K1c1 or K1c2 subclades. Those in
bright green are generally those in Dr. Doron Behar's Ashkenazi subclades,
K1a1b1a, K1a9 and K2a2a. About 123 members, or 18.5% of the 666 high-resolution
entries, are in one of the Ashkenazi subclades. I have used purple to mark the
524 series of HVR2 insertions. Of the 195, or 29.3%, with the 524's, eight
members have six of those insertions each, while one has eight.
Members Subgrouping
FTDNA recently implemented the long-awaited Members Subgrouping
feature for mtDNA results, which is reflected in the members’ chart under the
mtDNA Results tab on our website. The feature allows the group administrator to
create subgroups and assign members to them. Usually the subgroups are named
subclades of K. With a few exceptions the subgroups will contain those members
assigned to subclades by full-sequence (FGS) tests plus those predicted to be
in the subclades from HVR results. A few subclades are combined into subgroups.
Other subgroups are named after provisional subclades not on Dr. Doron Behar’s
current K tree. All these will be discussed below. The subgroups assigned to
members may change as new information becomes available. It should be remembered that official subclade designations are assigned
only by FGS tests. The counts mentioned below are as of the 1000-member
report for the K Project.
K1: This subgroup only
contains the two members assigned to subclade K1 by FGS tests. The members have
the defining coding-region mutations for K1, but not for any of the lower K1a,
K1b or K1c subclades. This is the subclade of Ötzi
the Iceman. There is no way to predict this subclade from HVR results.
K1a – Designated: These three members
have been designated as K1a, but not assigned to any lower subclade. Excluded
are those designated as K1a, but who fall into one of the provisional subclades
discussed below.
K1a – Predicted: This large subgroup
includes those who have not received assignments from FGS tests, but who either
have the defining mutation for K1a, 497T, or who have certain exact HVR1
matches with those in K1a. All of these members would move to another subgroup
if FGS test results were available. Excluded are those whose HVR results
indicate membership in one of the deeper K1a subclades.
K1a + 195C: On Behar’s K tree
there is a group of sequences including the Ashkenazi K1a9 subclade and a few
others with no subclade labels. This subgroup excludes those in K1a9 and the
provisional K1a10. I have previously called those with 195C and one or more
pairs of position 524 insertions “Pre-K1a10” and those without the insertions
“Pre-K1a9.” Some members of this subgroup might end up in other subclades,
usually K1a4a1, if FGS tests were taken. Since this subgroup requires two
specific HVR2 mutations, there are no members in it with just HVR1 results.
K1a1: This only includes
two members designated by FGS tests plus one exact HVR match. It’s usually very
difficult to predict K1a1 and its lower subclades from HVR-only results.
K1a10: Due to the alphanumeric
method by which the subgroups are sorted by FTDNA, this subgroup follows K1a1,
although it is not part of that group. K1a10 is a large provisional subclade
solely defined by HVR1 mutation 16048A. All members have 195C, so they are
relatively close to the K1a9 subclade and the “K1a + 195C” subgroup. Members
with FGS results are temporarily shown just as K1a. There is some chance that
on a future K tree this group will have some other designation, but I have
published a description of it and placed it on the K tree in an article in the Fall
2007 issue of the Journal of Genetic Genealogy.
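The ordering effect described here is what a plain character-by-character string sort produces. FTDNA's actual sorting routine is not documented in this report, so the sketch below simply assumes it behaves like Python's default string comparison; natural_key is a hypothetical helper added only for contrast.

```python
import re

subgroups = ["K1a1", "K1a1a", "K1a2", "K1a9", "K1a10", "K1a11"]

# Plain string sort: '0' and '1' sort before 'a' and before '2',
# so K1a10 and K1a11 land right after K1a1.
print(sorted(subgroups))
# ['K1a1', 'K1a10', 'K1a11', 'K1a1a', 'K1a2', 'K1a9']


def natural_key(name):
    """Split a name into digit and non-digit runs, comparing digit runs as integers."""
    return [int(tok) if tok.isdigit() else tok for tok in re.findall(r"\d+|\D+", name)]


# A "natural" sort would instead keep K1a10 and K1a11 after K1a9.
print(sorted(subgroups, key=natural_key))
# ['K1a1', 'K1a1a', 'K1a2', 'K1a9', 'K1a10', 'K1a11']
```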
K1a11: This is another
provisional subclade, easily predicted by HVR2 mutation 16T. All members also
have 16129A, 150T and 199C; but those are also found in other subclades. This
subclade may not be predicted by HVR1-only results. Members with FGS results
also have specific defining coding-region mutations. As with the provisional
K1a10, members are now labeled as just K1a. K1a11 was also described in my JoGG
article mentioned above.
K1a1a: This subgroup only
includes those assigned to the subclade by FGS results. There are no HVR
mutations with predictive value.
K1a1b & K1a1b1: This subgroup
includes those assigned to the two “mother-daughter” subclades by FGS results.
There are no HVR mutations which would separate them, but together they can
usually be predicted by some HVR matches or by 114T without K1a1b1a’s 16234T.
K1a1b1a: This largest
Ashkenazi subclade includes most of those with HVR1 mutation 16234T. That
mutation is not 100% predictive, but close to it. The addition of 16223T and/or
114T makes the prediction even more certain.
K1a2: This small subgroup
only includes those assigned by FGS results and one exact match.
K1a3, K1a3a & K1a3a1: This
subgroup includes three “mother-daughter-granddaughter” subclades which are
difficult to predict from HVR results. Also included are two HVR-only sequences
which are exact matches and connected by geographical origin.
K1a4 & K1a4a: This subgroup
includes two “mother-daughter” subclades. Almost all members have been
designated by FGS results. A few HVR-only sequences are included due to
specific mutations which are predictive in context.
K1a4a1: This subclade is the
“granddaughter” to the above subgroup, but is much larger. Most sequences may
not be predicted from HVR-only results. However, all those with 16245T have
turned up here when FGS-tested. Mutation 16261T is often predictive. Several of
these have 195C and thus might have been confused with those in “K1a + 195C”
before FGS testing.
K1a9: This second-largest
Ashkenazi subclade always has 16524G and so is usually easily predicted even
from HVR1-only results. A few with this mutation have turned up in other subclades,
but usually other mutations can eliminate the confusion. Since all members have
497T and 195C and no 309 or 524 insertions, membership from high-resolution
results can be predicted with almost 100% certainty. (Note that there are no K
Project members in subclades K1a5, K1a6, K1a7 or K1a8.)
K1b1a: This subclade may
usually be predicted by HVR1 mutation 16319A, although one K2a has that. The
addition of 16463G makes the prediction almost certain.
K1b2: This subclade has no
predictive HVR1 mutations, but may usually be predicted by the combination of
HVR2 mutations 146C and 195C. However, that combination shows up occasionally
in other subclades; context is important. The addition of 152C may create
confusion with subclade K2a, so several members with all three of those
mutations have been left unassigned.
K1c1 & K1c1b: Another subgroup with
“mother-daughter” subclades which can’t be separated by HVR-only results. K1c
is defined by 146C, 152C and especially the 498- deletion, but so far no
members have been assigned to just K1c by FGS tests. Therefore, members with
498- and without 16320T are put into this subgroup. (Since writing this, a K1c1
with 16320T has been confirmed. “Never say never” is a good rule when predicting mtDNA
subclades.)
K1c2: This subgroup is
easily predicted by high-resolution HVR results with 16320T and 498-. (But see
above.) 16320T is also rarely found in other subclades, but all with it are
grouped here until proven otherwise.
K2a: This subclade can
usually be predicted by HVR2 mutations 146C and 152C and the lack of certain
others. It should be noted that 146C and 152C are very recurrent and may be
subject to back mutations.
K2a2a: This smallest
Ashkenazi subclade – the “granddaughter” of K2a – is easily determined by HVR2
mutation 512C. To my knowledge, that mutation is found in no other subclade or
even haplogroup.
K2a3: This small subclade
is composed of three members assigned by FGS tests and one exact match with
unusual mutations. There is usually no way to differentiate it from its parent
K2a with HVR mutations.
K2a4: At present there is a
single example of this subclade, which may only be determined by an FGS test.
K2b: This subclade is
somewhat smaller than its K2a “sibling.” It can usually be determined by the addition
of 146C to the basic six HVR K mutations. One member with the additional 152C,
which would usually denote K2a, has been determined by an FGS test to be in
K2b. Again, this is because 152C is a recurrent mutation. In context, the
presence of 16270T, and especially with the addition of 16222T, allows some in
K2b to be predicted from HVR1-only results.
Unassigned Members: This section is for
all those whose HVR-only results do not permit a prediction of a subclade. New
K Project members are placed here automatically upon joining. The addition of
HVR2 results from an mtDNARefine test is usually sufficient to move a member to
one of the above subgroups. A few with HVR2 are not assigned, usually because
of results which might be either in one of the K1b or K2 subgroups. Other
sequences don’t have predictive HVR mutations for any subclade.
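Several of the subgroup descriptions above amount to "if these HVR mutations are present and those are absent, predict this subgroup." The sketch below turns a handful of them into an ordered rule table. It is a deliberate simplification and not the procedure actually used to assign members: RULES and predict_subgroup are hypothetical names, only some subgroups are covered, and exact HVR matches, geographical origin and the "context" stressed above are all ignored. Official designations still come only from FGS tests.

```python
# Ordered rules: (subgroup label, mutations required, mutations that rule it out).
# The first matching rule wins, so the more specific markers come first.
RULES = [
    ("K2a2a", {"512C"}, set()),
    ("K1a9", {"16524G"}, set()),
    ("K1a1b1a", {"16234T"}, set()),
    ("K1a10", {"16048A"}, set()),            # provisional subclade
    ("K1a11", {"16T"}, set()),               # provisional subclade
    ("K1c2", {"498-", "16320T"}, set()),
    ("K1c1 & K1c1b", {"498-"}, {"16320T"}),
    ("K1b1a", {"16319A"}, set()),
    ("K1a + 195C", {"497T", "195C"}, set()),
    ("K1a – Predicted", {"497T"}, set()),
    ("K2a", {"146C", "152C"}, {"195C"}),
    ("K1b2", {"146C", "195C"}, {"152C"}),
    ("K2b", {"146C"}, {"152C", "195C"}),
]


def predict_subgroup(mutations):
    """Return the first subgroup whose rule fits, else 'Unassigned'."""
    observed = set(mutations)
    for label, required, forbidden in RULES:
        if required <= observed and not (forbidden & observed):
            return label
    return "Unassigned"


basic = {"16224C", "16311C", "16519C", "73G", "263G", "315.1C"}
print(predict_subgroup(basic | {"497T", "16234T", "114T"}))            # K1a1b1a
print(predict_subgroup(basic | {"146C", "152C"}))                      # K2a
print(predict_subgroup(basic | {"146C", "152C", "498-", "16320T"}))    # K1c2
print(predict_subgroup(basic | {"146C", "152C", "195C"}))              # Unassigned
print(predict_subgroup(basic))                                         # Unassigned
```

The case with 146C, 152C and 195C together falls through to Unassigned on purpose, mirroring the K1b2 entry above, where members with all three mutations are left unassigned.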
The News tab on the K Project website should be consulted for recent developments
concerning K. Those tested as being in haplogroup K may join the Project by
clicking on the blue Join button on their FTDNA personal page, then proceeding
through four pages before clicking on yet another Join button.
© 2008 William R. Hurst
Administrator, mtDNA Haplogroup K Project