New National Geographic Society Genographic Project Paper on mtDNA
There is a new and very important
scientific paper about mtDNA from the National Geographic Society Genographic
Project. The primary author is Dr. Doron Behar of FTDNA, who last year
published the K chart with the subclade definitions we use. A copy of the PDF
version is in FTDNA's Library. If you want to see the full database and the
other tables, go to the paper's web page, where they are listed under
Supporting Information in the right-hand column. You can also get the paper
and supplements at the Genographic site.
As of today, 182 of our 448 FTDNA
members have transferred their results from the Genographic Project (those
whose kit numbers start with "N"). All of you, plus those of us who copied our
FTDNA results to the Genographic Project, should be listed in the database.
Every member can check the database to see how many HVR1 matches they have
there. I have one match. Of course, there is no geographical information and
no way to contact those matches.
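If you download the database table and would rather count matches with a script than by eye, here is a minimal Python sketch. It assumes, hypothetically, that you have already reduced each database record to a set of HVR1 mutation strings such as "16224C"; the column layout of the Supporting Information table is not described here, so that parsing step is left out. The function name and the example records are made up for illustration, and "match" here means an identical HVR1 mutation set.

```python
from typing import Iterable


def count_hvr1_matches(mine: set[str], database: Iterable[set[str]]) -> int:
    """Count records whose HVR1 mutation set is identical to mine.

    Hypothetical helper: a record listing 16093N will NOT match one
    listing 16093C, because the strings differ.
    """
    return sum(1 for record in database if record == mine)


# Hypothetical example data, not taken from the paper's database.
my_hvr1 = {"16224C", "16311C", "16519C"}
records = [
    {"16224C", "16311C", "16519C"},
    {"16093C", "16224C", "16311C", "16519C"},
    {"16224C", "16311C"},
]
print(count_hvr1_matches(my_hvr1, records))  # prints 1
```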
The paper is full of good
information about mtDNA in general, some basic and some over my head. I'll
leave that for you to read. I will point out one difference you will see in
the database's mutation lists: the use of "N" for heteroplasmies.
As an example, on our K Project website many of us have mutation 16093C. Those
who don't are assumed to have the CRS (Cambridge Reference Sequence) version,
with base T. However, since each cell has multiple copies of mtDNA, a person
often carries some copies of each variant; this mixture is a heteroplasmy.
FTDNA apparently lists only the majority variant, but the database accompanying
this paper shows 16093N if some of each variant is detected. (In my humble
opinion, if the technology could detect even one copy of the minority variant,
we would all have 16093N, everybody would be a perfect match with everybody
else, and there would be no point in taking all these tests! So FTDNA is using
the preferred method for our purposes.)
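To make the difference between the two reporting styles concrete, here is a small Python sketch. The copy counts, the detection cutoff `min_fraction`, and the function names are all hypothetical; this is not FTDNA's or the Genographic Project's actual procedure, only an illustration of "list the majority base" versus "write N when both variants are detected". Setting the cutoff to zero mimics the situation described above, where even a single minority copy turns the position into N.

```python
from collections import Counter
from typing import Optional


def majority_call(position: int, counts: Counter, ref_base: str) -> Optional[str]:
    """Majority style: report only the most common base.

    Returns None when the majority base matches the reference (CRS),
    since such positions are not listed as mutations.
    """
    base, _ = counts.most_common(1)[0]
    return None if base == ref_base else f"{position}{base}"


def ambiguity_call(position: int, counts: Counter, ref_base: str,
                   min_fraction: float) -> Optional[str]:
    """'N' style: report N when two or more bases each reach min_fraction
    of the copies; otherwise fall back to the majority call."""
    total = sum(counts.values())
    detected = [b for b, n in counts.items() if n > 0 and n / total >= min_fraction]
    if len(detected) > 1:
        return f"{position}N"
    return majority_call(position, counts, ref_base)


# Hypothetical copy counts at position 16093 (CRS base T); not real data.
copies = Counter({"C": 95, "T": 5})
print(majority_call(16093, copies, "T"))          # 16093C
print(ambiguity_call(16093, copies, "T", 0.10))   # 16093C (5% T is below the cutoff)
print(ambiguity_call(16093, copies, "T", 0.0))    # 16093N (any minority copy counts)
```

The lower the cutoff, the more positions come out as N, which is the author's point about why a majority call is more useful for matching.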
I did pick out a few items relating
specifically to K. Figure 4, which shows how K fits into the entire mtDNA tree,
shows us as 8.12% of the total. Table 2 breaks the database down four
different ways, with K ranging from 8.04% to 8.54%. The 8.26% K in MitoSearch
falls within this range; see here. Both databases are
concentrated in
On p. 1087 of the PDF version, the paper notes that K is usually defined by
16224C and 16311C, but that this combination has also been found in other
haplogroups. The database has three such sequences in haplogroup H, starting at
serial 4095, but they lack 16519C, which almost all of us have. I think we can
still differentiate K by looking at HVR1 mutations alone. The paper also
mentions some K haplotypes without 16224C or 16311C. We have one Project member
without 16224C. There are a few more of those in MitoSearch, along with some
missing 16311C. No big deal.
The same page notes that most examples of 16223T below macro-haplogroup R are
in our K1a1b1a subclade.
I created a pie
chart showing the percentages of a few subclades of K in the paper's
database. It is not perfectly accurate, because it assigns sequences to
subclades based on single HVR1 mutations, which occasionally show up in other
subclades. However, each subclade's percentage is within 1% of the
corresponding figure in the pie chart I did for MitoSearch a few months ago.
You will see that fully 68% of the sequences can't be placed in a subclade
based on HVR1 mutations alone. Most require HVR2 mutations, which the
Genographic Project does not test. Now you can see why I recommend that all
our members upgrade with the mtDNARefine test if they didn't start with the
mtDNAPlus test.
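For anyone curious how a single-marker tally like this works, here is a rough Python sketch. The marker table is hypothetical apart from 16223T for K1a1b1a, which comes from the paper's remark above; you would fill in whatever one-mutation markers you use. Each sequence goes to the first subclade whose marker it carries, otherwise to "unplaced", which is exactly why a marker that also turns up in another subclade makes the chart a little inaccurate. The example sequences are made up.

```python
from collections import Counter

# Hypothetical marker table: one HVR1 mutation per subclade. Only
# 16223T -> K1a1b1a is taken from the discussion above; extend as needed.
MARKERS = {"16223T": "K1a1b1a"}


def tally_subclades(sequences, markers=MARKERS):
    """Assign each sequence (a set of HVR1 mutations) to the first subclade
    whose marker it carries, else 'unplaced'; return percentages."""
    counts = Counter()
    for mutations in sequences:
        label = next((clade for m, clade in markers.items() if m in mutations),
                     "unplaced")
        counts[label] += 1
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical K sequences, not from the paper's database.
sample = [
    {"16223T", "16224C", "16311C"},
    {"16224C", "16311C", "16519C"},
    {"16224C", "16311C"},
    {"16093C", "16224C", "16311C", "16519C"},
]
print(tally_subclades(sample))  # {'K1a1b1a': 25.0, 'unplaced': 75.0}
```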
Bill Hurst
Administrator, mtDNA Haplogroup K Project