TRACING THE ANCESTORS OF MPONDO CLANS
ALONG THE WILD COAST OF THE EASTERN CAPE
David de Veredicis
0603298x
A dissertation submitted to the Faculty of Health Sciences, University of the Witwatersrand,
Johannesburg, in fulfilment of the requirements for the degree of Master of Science in
Medicine in the Division of Human Genetics
Pretoria, 2016
Declaration
I, David de Veredicis, declare that this dissertation is my own work, unless otherwise stated.
It is being submitted for the degree of Master of Science in Medicine in the branch of Human
Genetics, in the University of the Witwatersrand, Johannesburg. It has not been submitted
before for any degree or examination at this or any other university.
....................................................
.............day of........................., 2016.
Abstract
Oral history and anthropological data indicate that several Xhosa clans in the mPondoland region of the Eastern Cape (formerly the Transkei) were established by individuals of non-African ancestry. Several oral accounts and a few written ones state that, circa 1730, trading and slave-carrying vessels were wrecked along the Wild Coast of the Eastern Cape. Castaways who survived these wrecks assimilated with the indigenous people of the area, married local women and established clans of their own. This group of clans, which claim European and/or Eurasian ancestors, is known as the abeLungu, meaning “the Whites”. These clans are distinguished from other local groups by ritual practices that differ from traditional Xhosa rituals, as they retain an affiliation with the European culture to which their ancestors belonged. Subtle phenotypic features such as blue eyes are still seen in several clan members today. To date, the identity of these clans has been shrouded in myth because of conflicting versions in the oral history and anthropological data, which leave the cultural identity of the abeLungu people unresolved.
With the advent of molecular biology, DNA has become a tool for tracing population ancestry. The non-recombining region of the Y chromosome (NRY) serves as a marker of patrilineal ancestry, and mitochondrial DNA, which is inherited from mother to offspring, similarly serves as a record of matrilineal human history.
This study explores the degree of agreement between culture and genetics by investigating the genetic variation of the abeLungu, a culturally and geographically defined group. It focuses on their patrilineal history, both because their oral history indicates that the clan progenitors were predominantly male and because of the patriarchal social structure of the abeLungu with regard to marriage and kinship.
Buccal swabs were collected from 146 abeLungu clan-affiliated individuals and 42 non-clan members from the greater mPondoland region, and the extracted DNA was used for Y chromosome short tandem repeat (STR) genotyping and SNP minisequencing, using a total of 19 STRs and 60 SNPs. Mitochondrial DNA SNP typing and sequencing were also performed on the 188 males and on 10 females (the wives or direct relatives of primary male clan elders) to trace the matrilineal origins and to examine the congruence between the molecular and anthropological data.
The frequency of European and Eurasian haplogroups in the male samples was 69.86%, represented predominantly by the European haplogroup R1b and the West Asian haplogroup R1a1a. Haplogroups G, I and Q, which occur at high frequencies in Europe and Eurasia, were also observed. As expected, culturally defined groups with a single (or a limited number of) common origin(s), whose membership is inherited only through the male line, showed relatively low intragroup variation for similarly transmitted genetic markers. The maternal lineages of the abeLungu clan members fall within ancient, deeply rooted African haplogroup L lineages, with diversity increased by the migration associated with the clans' exogamous marriage practices.
This study affirms the non-African paternal origin of the abeLungu clans, whose lineages derive from a few distinct founders, and clarifies previously unresolved oral accounts of genealogical information, which have been transmitted across generations with considerable accuracy despite their propensity to change over time.
Acknowledgements
I would like to acknowledge and thank:
My supervisor, Prof Himla Soodyall, for her continued guidance and support as well as her
encouragement, patience and advice during the course of this project.
Ms. Janet Hayward Kalis (Lecturer, Department of Anthropology, School of Humanities, Walter Sisulu University), for her collaboration and wealth of knowledge regarding the anthropology component of this study.
Mr. Qaqambile Godlo, who was our interpreter between isiXhosa and English in the field.
Ms. Rajeshree Mahabeer, Ms. Pareen Patel and Mr. Thijessen Naidoo of the HGDDRL unit, as well as Ms. Thandiswa Ngcungcu and Ms. Jackie Frost, for their guidance, their assistance with training in laboratory methods for this research, and their input into the analysis component of the study.
I would also like to acknowledge the following sources of funding: the University of the Witwatersrand and the Genographic Consortium.
On a personal note, I would like to thank my parents, Dr. Nicola de Veredicis and Dr. Shifra Klebanoff, my brother, Mark de Veredicis, and my girlfriend, Ashleigh Duckitt, who stood by me, gave me their guidance and encouragement throughout, and offered all their love.
Table of contents
Declaration
List of figures and tables
Appendix Figures and Tables
CHAPTER 1
Introduction
1.1 Background and history of the abeLungu clans
1.1.1 The Wild Coast, shipwrecks and the clans of their castaways
1.1.2 The origins of the abeLungu
1.1.3 The amaMolo
1.1.4 Secondary clans and multiple castaway settlements
1.1.5 The clan system
1.2 Molecular Anthropology
1.2.2 Y chromosome haplogroups and phylogeographic variation
1.2.2.1 European and Eurasian haplogroups
1.2.2.2 African Y haplogroups
1.2.3 Y chromosome Short Tandem Repeats (Y-STRs) and Y-haplotypes
1.3 Matrilineal ancestry
1.3.1 Mitochondrial DNA
1.3.2 mtDNA phylogeographic variation and inferring matrilines
1.3.2.1 African mitochondrial haplogroups
1.3.2.2 Non-African mtDNA haplogroups
1.4 Aims and objectives of the study
CHAPTER 2
Subjects and Methods
2.1 Subjects and sampling
2.1.1 Ethics approval for study
2.1.2 Sampling and research area
2.2 Laboratory Methods
2.2.1 DNA extraction and quantification
2.2.2 Molecular methods for Y chromosome DNA studies
2.2.2.1 Y-STR genotyping
2.2.2.2 Y chromosome binary marker screening
2.2.2.3 Additional marker screening
2.2.3 Mitochondrial DNA molecular methods
2.2.3.1 Mitochondrial D-loop HVR sequencing
2.2.3.2 Mitochondrial SNaPshot™ sequencing (MTSS)
2.3.1.1 Y chromosome haplotype networks
2.3.1.2 Database search queries
CHAPTER 3
Results
3.1 Y chromosome DNA studies
3.1.1 Y chromosome haplogroups
3.1.2 Y chromosome DNA haplotype variation
3.1.2.1 Y chromosome variation linked with Eurasian origins: haplotypic variation within the amaMolo
3.1.2.2 Haplotypic variation within the primary abeLungu clans
3.1.2.3 Haplotypic variation within the secondary abeLungu clans
3.1.2.4 Y chromosome variation linked with African origins
3.2 Mitochondrial DNA findings
3.2.1 MtDNA haplogroups
3.2.2 MtDNA haplotype diversity
CHAPTER 4
Discussion
4.1 Y chromosomes and genetic heritage
4.1.1 Y chromosomes and the founding fathers of the abeLungu
4.1.2 The amaMolo and their affiliation with the abeLungu
4.1.3 Multiple founding events
4.1.4 Clan-affiliated Africans
4.1.5 Factors which shape clan diversity
4.2 The maternal legacy of the abeLungu
4.3 In summary of the findings
4.4 Future Studies
4.5 The impact of human population diversity and genetic genealogy studies
4.6 Genealogy testing and its limitations
4.7 Biomedical and forensic impact of population diversity studies
4.8 Social cohesion and making a new South African demographic history
CHAPTER 5
Concluding remarks
5.1 Testing the oral history of the abeLungu
References
APPENDICES
Appendix A: Ethics
Appendix B: SNP-marker panels for SBE multiplex assays
Appendix C: Clan genealogies
Appendix D: Variant sites of unique mtDNA haplotypes
Appendix E: Comparative data sources
Appendix F: Preparation of solutions
viii
List of figures and tables
Figures
Figure 1.1. Partial pedigree of the amaMolo clan
Figure 1.2. Nqulo (praise) to the ancestors of the amaMolo
Figure 1.3. Geographic distribution map of Y chromosome macro-haplogroups
Figure 1.4. Schematic overview of the mitochondrial DNA molecule
Figure 1.5. Global distribution map of mtDNA haplogroups
Figure 2.1. Schematic overview of methods employed in the study
Figure 2.2. Photographs of three male subjects featuring blue eyes
Figure 2.3. Research area of the study
Figure 2.4(a). The Y chromosome SNP phylogeny
Figure 2.4(b). An electropherogram showing the markers screened for using the YSNP1 SBE marker panel
Figure 2.5. MtDNA SNP-marker phylogeny
Figure 3.1. Phylogeny and frequency distribution of Y chromosome haplogroups
Figure 3.2. Haplogroup R1a1a (R-M198) RMJ network
Figure 3.3. Haplogroup R1b (R-M343) RMJ network
Figure 3.4. Haplogroup I (I-M170) RMJ network
Figure 3.5. Haplogroup E1b1a1a1c1a (E-M191) RMJ network
Figure 3.6. Haplogroup E2b1a (E-M85) RMJ network
Figure 3.7. Haplogroup E1b1a1 (E-M2) RMJ network
Figure 3.8. Haplogroup B2a1a1a1 (B-M152) RMJ network
Figure 3.9. Distribution of mtDNA haplogroups by clan
Figure 3.10. Neighbour-Joining (NJ) phylogenetic tree of 176 mtDNA haplotypes
Figure 4.1. The amaTshomane clan genealogy
Tables
Table 2.1. Geographic sampling regions of abeLungu clans
Table 2.2. Amplification of Y-STR loci
Table 2.3. YSNP1 SBE multiplex PCR reagents
Table 2.4. SBE PCR thermal cycler conditions
Table 2.5. Post-PCR purification reaction reagents
Table 2.6. YSNP1 Multiplex SBE reaction
Table 2.7. Primer sequences for the mtDNA 1kb D-loop PCR amplification
Table 2.8. Primer sequences for HVR I & II cycle sequencing
Table 3.1. Y chromosome haplogroup distribution for non-clan affiliated samples
Table 3.2(a). Non-African haplotype distribution list
Table 3.2(b). African haplotype distribution list
Table 3.3. Haplotypes and presumed geographic origins of abeLungu clan founders
Table 3.4. MtDNA haplogroup frequencies
Appendix Figures and Tables
Appendix Figures
Figure S1. amaMolo clan genealogy
Figure S2. abeLungu Jekwa clan genealogy
Figure S3. abeLungu Caine & Horner clan genealogies
Figure S4. abeLungu Hatu clan genealogy
Figure S5. abeLungu Ogle, Irish, France, Thaka & Buku clan genealogies
Figure S6. abeLungu Fuzwayo, Hastoni & Sukwini clan genealogies
Appendix Tables
Table S1. Y chromosome SBE marker panels
Table S2. Unique mtDNA haplotypes
Table S3. Comparative data sources
List of abbreviations
AIMs - Ancestry Informative Markers
CMH - Cohen Modal Haplotype
D-loop - Displacement loop
DNA - Deoxyribonucleic Acid
ddNTPs - Dideoxynucleotide triphosphates
GWAS - Genome-wide association studies
HCV - Hepatitis C virus
HVR - Hypervariable Regions
ISOGG - International Society of Genetic Genealogy
minHt - Minimal haplotype
mtDNA - Mitochondrial DNA
NJ - Neighbour-Joining
NPTs - Non-patrilineal transmissions
NRY - Non-Recombining region of the Y chromosome
RMJ - Reduced-median-joining
rCRS - Revised Cambridge Reference Sequence
SWGDAM - Scientific Working Group on DNA Analysis Methods
STR - Short Tandem Repeat
SBE - Single Base Extension
SNPs - Single Nucleotide Polymorphisms
TMRCA - Time to the Most Recent Common Ancestor
YHRD - Y-STR Haplotype Reference Database
VOC - Vereenigde Oost-Indische Compagnie (Dutch United East India Company)
CHAPTER 1
Introduction
1.1 Background and history of the abeLungu clans
1.1.1 The Wild Coast, shipwrecks and the clans of their castaways
mPondoland is a region on South Africa’s Wild Coast which forms part of the former Transkei in the Eastern Cape. The mPondo, one of 12 Xhosa-speaking tribes, settled in mPondoland between 500 and 1200 years ago (Soga, 1930). At that time the cultural territories were divided: Khoi pastoralists dominated the area around Port Elizabeth, San hunter-gatherers lived in the Drakensberg foothills, and Nguni mixed farmers lived throughout the Transkei, with the mPondo residing closer to the coast (Soga, 1930). Today the former Transkei and its Wild Coast remain a developmental backwater; even though South Africa has been in a state of developmental transition since 1994, service delivery in this region has been so slow that for many people little has improved. The history of the abeLungu exemplifies a period in human history when foreigners from very distant shores integrated harmoniously with indigenous populations, in strong contrast to some of the more recent political history of racial prejudice and segregation in South Africa (Soga, 1930; Crampton, 2004).
The coastline of the Eastern Cape is notoriously harsh. Since Vasco da Gama first rounded the Cape of Good Hope in 1497, there have been well-known accounts of at least 20 shipwrecks along the Wild Coast alone in the period c. 1500-1800, and numerous other wrecks have gone unrecorded (Crampton, 2004). Some of the better-known accounts of wrecks and their castaways include the São João Baptista, which ran aground east of the Kei River in 1622, the Stavenisse in 1686, the Bennebroek in 1713, the Doddington in 1755 and the famous Grosvenor, which met its fate in 1782 (Soga, 1930; Crampton, 2004).
Over time, however, a number of castaways with no option of returning home integrated harmoniously with local communities of mPondo people, even marrying into them, and lived out their days not far from where their ships went down. For white people who assimilated, the extent of integration was often marked; as Stephen Taylor recounts, “…even more curious perhaps are those white men and women who have felt called to enter initiation, train as diviners and establish homesteads and followings in rural areas and beyond” (Taylor, 2005; Kalis, 2010).
Near Mthatha, the former capital of the Transkei, at the mouth of the Xora River, lives a clan family known as the abeLungu, who claim descent from European (white) castaways. Theories of the clan’s origins are linked to the story of a young girl named Bessie who arrived on the Wild Coast; however, the details of her arrival remain unclear (Soga, 1930; Crampton, 2004; online references 1 and 2).
1.1.2 The origins of the abeLungu
The Xhosa have a saying: “If you want the truth, get it from the original, rather than from one who has heard it second hand” (Crampton, 2004). With this in mind, key sources on the abeLungu include the Xhosa-Scottish historian John Henderson Soga’s account, The South-Eastern Bantu (Soga, 1930), and Hazel Crampton’s book The Sunburnt Queen (Crampton, 2004), as well as first-hand interviews with extant relatives of abeLungu clans of the Wild Coast, conducted in the field by my collaborator, the anthropologist Janet Hayward Kalis (Kalis, 2006-2010, personal communications).
The abeLungu, or the “Whites”, are a black, Xhosa-speaking clan whose origins can be traced to three white English castaways named Jekwa, Hatu and Badi. While Jekwa, Hatu and Badi are cited as the progenitors of the abeLungu, they were not the only foreigners from whom the clan descends. Crampton (2004) learned that the name abeLungu comes from the isiXhosa word for white foam, referring to the sea foam from which their ancestors emerged and where the mPondo people first encountered them. The abeLungu take pride in their unusual history, but as time goes by it is slowly being forgotten (Crampton, 2004). Over time, clan members have become increasingly confused about their history, and the names and sequence of their progenitors have begun to blend into a potpourri of castaways, dates and wrecks. Interviews with contemporary members showed that the situation had declined further, exacerbated by the fragmentation of traditional life due to the migrant labour system and westernisation. Oral history serves as a tool for passing on cultural identity and a system of values; in the telling and retelling of the abeLungu tale, filtered down through several generations, different patterns have been woven into the whole, but the basic fabric holds true. A famous narrative about foreign shipwreck survivors becoming integrated into the local population is that of Bessie. The ship on which Bessie was a passenger remains unidentified, but she may have been aboard one of the Dutch East India Company (VOC) vessels wrecked around 1737. Bessie was just one of thousands of people of various nationalities who were cast away on the shores of the Eastern Cape over the centuries. Survivors who did not wish to return to their homelands and previous lives sought refuge with the local clans of their new surroundings. Many of these stories may have been lost forever, but some, like that of Bessie, remain part of the South African oral narrative. This is mostly because Bessie, a white woman, most probably of British descent, married into the amaTshomane royal family and is thus remembered in the oral histories of her people. Her story was further preserved because two of her children were still living when the first English missionaries visited the area in the 1880s, so it was recorded in written history and not lost (Crampton, 2004).
In an excerpt from Crampton’s book: “It was on this notorious coast, at about the same time that a Dutch fleet was destroyed in Table Bay in (1737), huddled against a rock lay a little, white, English girl named Bessie, who was cast ashore from her ship, at a remote spot known as Lambasi (the Bay of Mussels)...” (Crampton, 2004, p.12).
In time Bessie acquired a Xhosa name, Gquma, which means “the roar of the sea”. Although Gquma came into more frequent use, she never forgot her real name and later even named one of her children Bessy (Crampton, 2004).
Furthermore, “…legend has it that she was not alone, but the theory as to who her companions were still remains a mystery” (Soga, 1930, p.379). From Soga’s account we understand that there were “…four in number; three males and one female child”, named Jekwa, Hatu, Badi and Gquma, respectively. The first two were thought to have been brothers, and the young girl was believed to be the daughter of Badi. According to the mPondos, they were relatives of one another “as they came from the same ‘house’, (viz. the ship) …” (Soga, 1930, p.379). However, this need not be taken literally: the mPondo are a polygamous society in which family relations operate differently, and terms such as ‘brother’ may have a wider application and meaning in isiXhosa than in English, not necessarily reflecting biological relationships. It is in this context that the relationships of Bessie and her fellow castaways should be understood (Crampton, 2004). The accounts that have survived the intervening centuries are fragmented and contradictory, falling largely into two camps: one in which the girl’s companions are said to have been white men, and another in which they are said to have been black, which may be interpreted to mean that they could also have been Indian or Arab. “Several slaves were with her…and were ‘black’ with long hair…” (Crampton, 2004). Soga (1930), by contrast, claims that all of Bessie’s accompanying castaways were white.
The independent expeditions of the Dutch traders Hubner in 1736 and van Reenen in 1790 provide several clues to the mystery of Bessie and the origins of her Englishmen (Crampton, 2004; Soga, 1930). Hermanus Hubner, who headed an ivory-trading expedition in 1736, discovered a clan in which three European shipwreck survivors (named Miller, Clerk and Billyert) lived with “numerous wives and offspring who had been shipwrecked many years before” (Crampton, 2004). About 50 years later, van Reenen’s expedition found a settlement of about 400 people of mixed race, including three elderly women who had survived a wreck around 1730 and who were presumably of the same party as that encountered by Hubner (Crampton, 2004). Judging by the date, one of the women van Reenen encountered was Gquma’s (Bessie’s) daughter, Bessy; it is therefore evident that Hubner and van Reenen were referring to the same woman, and fairly clear that this mixed-race group existed some time before 1730 (Soga, 1930).
Since the Dutch mangled English names as badly as they did isiXhosa ones, and the names of the three Englishmen encountered by Hubner in 1736 closely resemble those of Bessie’s three fellow castaways, it is possible to convert the Dutch spellings back into English: Hendrik Clercq becomes Henry Clarke, Tomas Willer becomes Thomas Miller, and Wellem Billyert may have been (Bill) Elliot or perhaps even Billy Hart. When these are juxtaposed with the names of Bessie’s white men, the result is striking. ‘Hatu’ resembles ‘Henry’, and ‘Badi’ resembles ‘Billy’, closely enough to suggest that the former are simply Xhosa-ised (corrupted) versions of the latter.
Finally, if Henry and Billy were Hatu and Badi, Thomas Willer (or Miller) must have been the man known as Jekwa. It was, and still is, customary for a man to take a new name on becoming chief, and his Xhosa name, Jekwa, was probably bestowed on him when his mentor, Chief Matayi, appointed him Chief of the abeLungu (Soga, 1930; Crampton, 2004).
Bessie grew into an extremely attractive adolescent and eventually caught the eye of Tshomane, Great Son of the Tshomane chief Matayi, who married her and made her his Great Wife. Mysteriously, Tshomane died soon after their marriage without an heir, and so the chieftainship was taken up by a close relative, Xwebisa (also known as Sango). In time Gquma (Bessie) married Sango, who became the Paramount Chief of mPondoland. This was initially met with great shock and disapproval, as it implied the breaking of strict incest taboos: a very strong ‘exogamy’ law exists among the Xhosa under which it is considered incestuous to have any sexual relations with, let alone marry, a person belonging to the same clan (Soga, 1930; Crampton, 2004).
Gquma and Sango had three sons and a daughter, Bessy, who were markedly different in appearance from the other children of the clan: ‘Several children were “yellow” in colour, having long hair and blue eyes’ (Soga, 1930; Crampton, 2004). Bessie is understood to have died around 1810, aged about 80. Crampton, quoting Scully (1984), accords Bessie a very romantic end: “On the day she died she was, at her own request, carried down to the cleft in the reef where she partly lifted up herself, and pointed across the sea, turned, and gave out her life with a long drawn sigh”. “In the night a terrible storm arose, and the shore was found strewn with myriads of dead fish” (Crampton, 2004).
1.1.3 The amaMolo
Although the castaways were predominantly European, a large number were of other races and cultures, including black, Japanese and Javanese people and South Indian Lascars. Just as the abeLungu identify their progenitors as white castaways, the amaMolo identify theirs as ‘black’ castaways; it is thought that they might have been Malagasy or Malay, but the general consensus is that they were Indian (Crampton, 2004). Crampton (2004) chronicles an mPondo legend which describes the arrival of “long strange ships which anchored off shore, and at night had sent a number of small boats with musket bearing men, in white headdresses and long flowing robes… much before the first Europeans.” The clan progenitors, Bhayi (the son of Jafliti) and Pita, were said to have had “an Arab look about them, whose hair was straight, long and black.”
Soga agrees with Kirby (1953) that the name “amaMolo” is probably a derivation brought about by the corruption of the word Moor (referring to natives of North Africa). In Kalis’ interviews with extant amaMolo members, however, it was reported that the name comes from the traditional isiXhosa greeting Mholo, which was the only isiXhosa word the castaways knew and which they would repeat when asked from which clan they originated (Kalis, 2010, personal communication with permission).
Another theory of their origins is that some Malabar slaves who had survived the wreck
of the Bennebroek in 1713 remained with the mPondo, instead of trying to return to their
own country or to search for civilization (Soga, 1930; Crampton, 2004). The story of the
origin of the amaMolo, as retold to Soga by the Great Son of the amaMolo Chief Mxhaka,
is as follows: “Bhayi and his wife Nosali, Pita (his brother) and another man, Mera were
captured by white men and taken aboard a ship which became wrecked, where they
were washed ashore the coast of mPondoland.” As Bhayi’s wife was barren, he settled
at Brazen-Head (Mganzana), three kilometres from where Bessie resided, and married
an mPondo woman with whom he had five children named Poto, Falteni, Mnyuri,
Mngcolwana, Nyango and lastly Mgareni (some of whom are indicated in the partial
pedigree, Figure 1.1).
Figure 1.1. Partial pedigree of the amaMolo clan based on interviews with Chief of the amaMolo clan,
Mhlabunzima Mxhaka, by Janet Kalis, as part of her research (Kalis, 2009; personal communication). Two
primary branches indicate the relations of Bhayi, and his “brother”, Pita, who are the alleged sons of Jafiliti
(Soga, 1930; Crampton, 2004). Bhayi’s sons Poto, Falteni, Mnyuri and Nyango are also indicated.
Males are designated by triangles and females by circles; “=” indicates the multiple wives of a
given male.
The well-documented story of the wreck of the Grosvenor tells of an English East
India Company (EIC) vessel wrecked in 1782; this date corresponds well with the
presumed date of Bhayi’s arrival in mPondoland. Among its survivors were 25 Indian
seamen, as well as an Indian maidservant called Mary (possibly a version of the name
of Bhayi’s fellow castaway, Mera), accompanied by another Indian woman, Sally,
whose name is similar to that of Bhayi’s wife, Nosali. The mystery of the Indian origins
of the amaMolo was seemingly resolved by a friend of Crampton who, having been
born and raised in India, immediately recognised that the names were of Hindi origin.
‘Bhayi’, she said, comes from ‘bhay’, or ‘brother’, in Hindi; ‘Pita’, the man with whom
Bhayi was captured, means ‘father’; and ‘Poto’, the name of Bhayi’s eldest son, is a
corruption of ‘pota’, meaning grandson.
Makuliwe, who conducted research on Southern-Nguni clans during the 1990s, states
of the amaMolo clan that it “…can be traced back to white people that were
shipwrecked in the Indian Ocean and then married to Pondos”. This was echoed by
Chief Mxhaka and other contemporary clan members, interviewed by Kalis in 2010,
who proclaim that the amaMolo clan forebears were white (Kalis, personal
communication, 2010). Thus, whereas 80 years ago, when Soga did his research, the
amaMolo were considered to be of Asian descent, their recent cultural association is
with white or European forebears. Kalis has learned that contemporary members of the
amaMolo clan consider the clan name ‘abeLungu’ to be synonymous with ‘amaMolo’,
and that members of the abeLungu clan recognise those of the amaMolo as patrilineal
kin; both account for one another in their oral histories. Exactly when and from where
the amaMolo and abeLungu forebears arrived on this continent, and to what extent their
origins are bound up with one another, have not yet been ascertained. Earlier accounts
suggest that the amaMolo and abeLungu forebears survived the same wreck, and that
the Asian and European survivors subsequently went their own ways, each
assimilating into their own local communities and founding the amaMolo and
abeLungu clans, respectively (Soga, 1930; Kirby, 1953).
1.1.4 Secondary clans and multiple castaway settlements
The abeLungu constitute a broader super-clan family, incorporating numerous
lineages of clans which claim non-African ancestry. The ‘primary’ clans founded by the
original European and Asian castaways are the abeLungu Jekwa, abeLungu Hatu and
abeLungu Buku, as well as the amaMolo clans. However, numerous other clans exist
within the abeLungu clan family. Multiple settlements and more recently established
clans are believed to have originated from later shipwrecks along the Eastern Cape’s
Wild Coast, whose survivors also assimilated with local Xhosa clans. As these founders
were also of non-African descent, they too founded their own independent abeLungu
clans. This is supported by oral history as well as by the genealogies reconstructed
from it. The primary abeLungu clan genealogies extend back, on average, ten
generations to the common non-African clan founder, while those of the more recently
established clans go back five generations on average (Kalis, 2009, personal
communication). A brief description of the clan families and subclans, with
hypothetical geographical regions of origin, is as follows. The founders of the original
abeLungu clans Jekwa, Hatu and Buku presumably came from Western Europe
(Britain and/or Ireland). The founders of the secondary abeLungu clans France,
Horner, Irish, Caine, Ogle, Hastoni, Fuzwayo, Sukwini and Thaka are believed to have
also come from Western Europe (England/Ireland) as well as Eurasia, while the
amaMolo are believed to be of Eurasian descent.
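The generation depths quoted above can be turned into rough calendar dates. The sketch below assumes a generation interval of 25 to 35 years, which is an illustrative assumption and not a figure taken from the oral histories or from Kalis' data; the reference year 2009 is the year of Kalis' interviews.

```python
from __future__ import annotations

# Sketch: convert a genealogical depth (in generations) into an approximate
# founding date. ASSUMPTION: a generation interval of 25-35 years, chosen for
# illustration only; it is not reported by the oral histories.


def years_spanned(generations: int, years_per_generation: float) -> float:
    """Approximate number of years covered by a patrilineal genealogy."""
    return generations * years_per_generation


def founding_year_range(generations: int, reference_year: int,
                        min_interval: float = 25.0,
                        max_interval: float = 35.0) -> tuple[int, int]:
    """(earliest, latest) approximate founding year, counting back from
    reference_year under the assumed range of generation intervals."""
    earliest = reference_year - years_spanned(generations, max_interval)
    latest = reference_year - years_spanned(generations, min_interval)
    return round(earliest), round(latest)


if __name__ == "__main__":
    # Average depths quoted in the text for primary and secondary clans.
    for label, depth in [("primary clans", 10), ("secondary clans", 5)]:
        print(label, founding_year_range(depth, 2009))
```

Under these assumptions, ten generations counted back from 2009 give a founding window of roughly 1659 to 1759, which includes the circa-1730 date of the shipwreck accounts, while five generations give roughly 1834 to 1884. A different assumed generation interval shifts both windows proportionally.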
With regard to clan names and the clan-name prefixes used, Soga states: “Etymology
as a science is unknown to the Bantu, and there are no phonetic rules laid down by
them. As a general rule the prefix is not a matter of choice, but it is subject to what we
may call dialectic phonetics. What, then, governs the selection of a clan or tribal prefix?
The answer is that the selection is governed purely by phonetic requirements. There
is no rule determining the use of any prefix attached to the tribal name but that which
suits the tongue. It would be phonetically awkward to say for instance, aba-Xosa, or
aba-Huhu” (Soga, 1930). For all practical purposes, the descendants of the clan
progenitors Jekwa, Buku and Hatu, as well as those of the secondary abeLungu clans,
retain the clan family prefix ‘abeLungu’, while the descendants of Bhayi and Pita are
referred to as the ‘amaMolo’.
1.1.5 The clan system
The dynamics through which an individual is recognised as a member of a group are
a critical part of the mechanism for defining a person’s identity (Montinaro, 2016).
The clan, here defined as a group of households reporting a shared ancestry, refers
to an intermediate level, above that of the lineage, within the hierarchical structure
of a given society (Montinaro, 2016). Clan membership signifies descent from a
common ancestor after whom the clan itself has been named (Preston-Whyte, 1974).
Clan membership in agnatic societies like that of the abeLungu is determined by the
principles of patrilineal descent, meaning that the clan name passes exclusively through
the male line, except in cases of illegitimacy. The abeLungu are a patrilocal society in
which migration away from these clan nodes has historically been limited (Soga, 1930).
Clan lalis (homesteads) are, for the most part, situated where the clan forebears founded
their clans. The abeLungu observe strict clan-exogamy practices, so it is customary not
to marry inside one’s own clan. Polygyny is widespread, and its extent depends on the
wealth of the husband (Soga, 1930; Chaix, 2007; Sanchez-Faddev, 2013).
Although the Xhosa clans in the sample live in deeply rural contexts, amongst
traditional Xhosa people, and have adopted their customs and religious practices, they
retain an affiliation with the European culture to which their ancestors belonged, which
is expressed in various ways (Kalis’ interviews, 2009). All of Kalis’ male informants
were able to name their male antecedents right back to the man who gave his name to
their clan, even when this went much further back than three generations. This
genealogical information is publicly recited on ritual occasions. Through the recitation
of clan names (iziduko) as well as praise names and poetry (izibongo), the presence
of deceased patrilineal ancestors is invoked at important occasions which are often
organised with the primary intention of appealing to ancestral spirits and seeking their
appeasement. Maguliwe recites the nqulo (praise) of the amaMolo (Kalis, field
interviews, 2009), listing the main forefathers of the clan (Figure 1.2).
However, as has been observed, abeLungu clans claim a degree of independence in
terms of how both rituals and praises to ancestors are performed. Our translator,
Qaqambile, asked a member of the abeLungu Horner clan about these differences:
Qaqambile: “You are of this nation with mixed blood living among indigenous
people who have their own ways of living.
How do you do it?”
Mlungisa: “Well that’s easy. We were born in Xhosaland and we are living
among Xhosas. We have customs and traditions but we don’t do our rituals
like the Xhosas. For example, when Xhosa people kill a goat they use a
spear but we just slaughter it with a knife and enjoy the meat, that’s it. We are
white people so we don’t perform rituals. We can even perform rituals with a
chicken rather than a goat. We are not governed by the strong traditions of
the true Xhosa people (2009-11-05 Mlungisi Horner).”
1.2 Molecular Anthropology
This study was initiated as a collaboration with Janet Hayward Kalis (University of
Mthatha), who had been conducting anthropological and genealogical research on 13
different clans in the Transkei-mPondoland region of the Eastern Cape. Kalis has
documented details of the genealogical relations and ritual practices of sacrifice and
praise (izinqula) to the ancestors of contemporary abeLungu clans, which differ from
those of traditional Xhosa people because the clans retain an affiliation with the
European and Eurasian culture to which their ancestors belonged (Janet Kalis, 2010,
personal communication with permission).
The recent genealogical history of human populations is a complex mosaic formed by
individual migration, large-scale population movements and other demographic
events. Reconstructing human history requires the collection of narratives from
disciplines such as anthropology, archaeology, history, linguistics, paleontology and
climatology. In the absence of written records, oral history has served to transmit
biographical and historical information, but it is known to be subject to change and
distortion over time. Genealogical oral histories can, however, now be tested using
genetic markers on the Y chromosome and in mitochondrial DNA (mtDNA). By
documenting the molecular similarities and differences between people, which parallel
historical events, these markers shed light on anthropological questions, providing a
clearer understanding of the abeLungu’s origins and ultimately contributing to the
understanding of humanity’s past.
Figure 1.2. Nqulo (praise) to the ancestors of the amaMolo
1.2.1 Patrilineal descent and Y chromosome DNA
Uniquely among organisms, most humans carry a cultural marker of coancestry, a
surname (or, similarly, a clan name), which is a counterpart to the biological marker of
coancestry common to all organisms: DNA (King and Jobling, 2009 [a]). Surnames
(and clan names) have been shown to be specific to particular indigenous populations
and to show geographical specificity within regions. This property means that they find
wide application as convenient proxies for ethnic origin in healthcare as well as in
epidemiological studies (Shriver and Kittles, 2004; King and Jobling, 2009 [a]).
Analyses combining surnames with Y chromosomes have also enabled surnames to
be used in genetic studies of historical migrations and admixture (King and Jobling,
2009 [b]). We may therefore expect a clan name to correlate with a particular type of
Y chromosome, inherited from a shared paternal ancestor, possibly even the clan’s
original founder.
Several features of Y chromosome DNA make it a suitable marker for investigating
population histories. The Y chromosome is inherited paternally, which coincides with
the paternal inheritance of the clan name, making it a suitable marker for delineating
patrilineal ancestral lineages (Jobling, 2001; Shriver and Kittles, 2004). Very little of the
Y chromosome is made up of coding DNA, and most of it does not recombine with the
X chromosome during meiosis. Markers in this Non-Recombining region of the Y
chromosome (NRY) are therefore passed intact from father to son, changing only by
mutation, and are examined for insight into patrilineal population history (Cann et al.,
1987; Jobling and Tyler-Smith, 2003; Ralph and Coop, 2013).
Formally, any combination of polymorphic markers along a non-recombining molecule
that tend to be inherited together constitutes a haplotype [Jobling and Tyler-Smith,
2000; Y Chromosome Consortium (YCC), 2002]. Combinations of biallelic markers
define stable lineages of Y chromosomes that we refer to as ‘haplogroups’: a
haplogroup is a group of haplotypes which coalesce to a common ancestor, such that
certain coding-region Single Nucleotide Polymorphisms (SNPs) are shared by all
haplotypes in the group (Jobling and Tyler-Smith, 2000; YCC, 2002). Hammer and
Zegura (2002) define the term haplogroup as ‘NRY lineages defined by binary
polymorphisms’, whereas the term haplotype is reserved ‘for all sub-lineages of
haplogroups that are defined by variation at STRs on the NRY’.
The estimated average Y chromosome SNP mutation rate is approximately 10⁻⁷ to
10⁻⁸ per generation (Jorde et al., 1998; Gray et al., 2000). This low mutation rate
allows SNPs to be used to investigate human prehistory, but makes them relatively
uninformative about recent history (Jorde et al., 1998; Gray et al., 2000).
Microsatellite short tandem repeats (STRs), owing to their high mutation rate, can
provide better information about recent evolutionary events than slowly evolving SNPs.
STR mutation rates are on average about 4 to 5 orders of magnitude higher than those
of SNPs, approaching 10⁻³ per generation, which is high enough to be determined
directly in pedigree studies spanning only a few generations (Jorde et al., 1998). We
can therefore hope to identify genetic evidence of more recent relatedness, and so
obtain insight into population history over the past few tens of generations (Forster et
al., 2000; Zhivotovsky et al., 2004; Ralph and Coop, 2013).
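The practical consequence of this difference in mutation rates can be shown with a short calculation. The sketch below uses the order-of-magnitude rates quoted above (10⁻³ for STRs and the lower 10⁻⁸ bound for SNPs) and assumes, for illustration only, hypothetical panels of 17 STR loci and 100 SNP sites, with every marker mutating independently at a constant rate.

```python
# Sketch: how likely is any mutation to separate two men who share a
# patrilineal ancestor? Rates are the order-of-magnitude figures quoted in
# the text; the marker counts below are HYPOTHETICAL panel sizes.


def expected_mutations(rate: float, n_markers: int, transmissions: int) -> float:
    """Expected number of mutations over all markers and transmissions."""
    return rate * n_markers * transmissions


def prob_any_mutation(rate: float, n_markers: int, transmissions: int) -> float:
    """Probability of at least one mutation, assuming every marker mutates
    independently with the same per-generation rate."""
    return 1.0 - (1.0 - rate) ** (n_markers * transmissions)


if __name__ == "__main__":
    # A common ancestor 10 generations back separates two living men by
    # 2 x 10 = 20 father-to-son transmissions.
    transmissions = 2 * 10
    str_loci = 17     # hypothetical number of Y-STR loci typed
    snp_sites = 100   # hypothetical number of Y-SNP sites typed
    print("STRs:", prob_any_mutation(1e-3, str_loci, transmissions))
    print("SNPs:", prob_any_mutation(1e-8, snp_sites, transmissions))
```

With these inputs the STR probability is roughly 0.29, while the SNP probability is of the order of 2 × 10⁻⁵; this is why STRs, rather than SNPs, are expected to distinguish closely related patrilines within a clan.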
1.2.2 Y chromosome haplogroups and phylogeographic variation
Lineage-based ancestry tests are popular because NRY haplotypes can provide
information that is regionally specific. Y chromosome haplotypes act as barcodes, or
profiles, of individuals sharing common ancestry. These profiles fall into haplogroups
that phylogenetically represent charted human lineages, consistent with the
movement of modern humans out of Africa (Jorde et al., 2000; Quintana-Murci et
al., 2004; Barik et al., 2008). The Y Chromosome Consortium (YCC) and the
International Society of Genetic Genealogy (ISOGG) have published updated versions
of the maximum-parsimony phylogenetic tree of human Y chromosomes, accompanied
by a proposed universal nomenclature. The most recent phylogenetic tree consists of
311 haplogroups, defined by 600 SNPs (Geppert and Roewer, 2012; ISOGG, 2016).
The phylogeny maps marker SNPs that correlate with the current global
phylogeographic diversity of the Y chromosome (Hammer and Zegura, 2002; YCC,
2002; Jobling and Tyler-Smith, 2003; ISOGG, 2016). The major clades (haplogroups)
are labelled with a capital letter (e.g., R), and sub-haplogroups are designated with
alternating numbers and lower-case letters (e.g., R1b). Usually the terminal SNP is
included to identify the branch unequivocally (R-M343, alias R1b) (Hammer and
Zegura, 2002; Geppert and Roewer, 2012). For clarity, the combined
SNP-marker/haplogroup-name nomenclature will be used when discussing
haplogroups (for example, R1b (R-M343)).
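The idea of a terminal SNP can be illustrated by walking down a tree of defining markers and stopping at the deepest branch whose SNP is in the derived state. The sketch below is an assumption-laden simplification: it uses only markers named in this chapter (with J2 as the short name for M172), places R1a1a and R1b directly under R, and omits the many intermediate branches of the real YCC/ISOGG tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Sketch: assign the most derived (terminal) haplogroup from SNP calls and
# print it in the combined nomenclature used in this chapter. The tree is a
# SIMPLIFIED, illustrative subset; it is not the ISOGG tree.


@dataclass
class Node:
    name: str                 # short-form name, e.g. "R1b"
    snp: str                  # defining SNP, e.g. "M343"
    children: list[Node] = field(default_factory=list)

    def label(self) -> str:
        """Combined form, e.g. 'R1b (R-M343)'."""
        return f"{self.name} ({self.name[0]}-{self.snp})"


SIMPLIFIED_TREE = [
    Node("R", "M207", [Node("R1a1a", "M198"), Node("R1b", "M343")]),
    Node("G", "M201"),
    Node("I", "M170"),
    Node("J2", "M172"),
    Node("Q", "M242"),
]


def assign(derived: set[str], nodes: list[Node] = SIMPLIFIED_TREE) -> str | None:
    """Descend through branches whose defining SNP is derived and return the
    label of the deepest node reached (None if no major clade matches).
    Inconsistent calls spanning two branches resolve to the first branch."""
    for node in nodes:
        if node.snp in derived:
            deeper = assign(derived, node.children)
            return deeper if deeper is not None else node.label()
    return None
```

For example, `assign({"M207", "M343"})` returns `"R1b (R-M343)"`, whereas a sample derived only for M207 would be reported at the coarser level `"R (R-M207)"`.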
Since Y chromosome haplogroup frequencies are highly structured by geography, it
is possible to distinguish between African and non-African Y chromosomes, and in
most instances the combined haplogroup-haplotype information can give a reasonable
indication of the broader geographic region in which a Y chromosome originated
(Hammer and Zegura, 2002; Shriver and Kittles, 2004; Naidoo et al., 2010). Figure 1.3,
adapted from Chiaroni et al. (2009), illustrates the global prevalence and frequency
distribution of Y chromosome macro-haplogroups.
Figure 1.3. Geographic distribution map of Y chromosome macro-haplogroups - adapted from
Chiaroni et al., (2009).
Based on the anthropological and oral histories, predominantly European Y
chromosome haplogroups are expected in the abeLungu clans, and Eurasian
haplogroups in the amaMolo. The majority of lineages observed in contemporary
European and Eurasian populations fall into the main haplogroups R, G, I, J and Q.
These are represented by the SNP markers M198 and M343 (sub-haplogroups R1a1a
and R1b of haplogroup R), M201, M170, M172 and M242, respectively (Jobling and
Tyler-Smith, 2003; Karafet et al., 2008; Chiaroni et al., 2009; Myres et al., 2011).
Macro-haplogroups A, B and E, which originated in and are largely restricted to the
African continent, are nevertheless also expected in the sample on account of gene
flow, admixture and non-patrilineal events (Jobling and Tyler-Smith, 2003; Karafet et al.,
2008; Chiaroni et al., 2009; Ralph and Coop, 2013).
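The expected contrast between European/Eurasian and African patrilines within each clan can be summarised by a simple tally. The sketch below is a minimal illustration, assuming haplogroup labels in which the first letter is the major clade; the grouping of letters follows this section and is a simplification (haplogroup E, for instance, also has lineages outside Africa, such as E-M35 around the Mediterranean). The `(clan, haplogroup)` records would come from genotyping; none are implied here.

```python
from __future__ import annotations

from collections import Counter

# Sketch: tally the broad continental affiliation of Y haplogroups per clan.
# The letter grouping follows this section and is a SIMPLIFICATION.
AFRICAN_Y = {"A", "B", "E"}
EURASIAN_Y = {"R", "G", "I", "J", "Q"}


def y_affiliation(haplogroup: str) -> str:
    """Classify a label such as 'R1b' or 'R1b (R-M343)' by its major clade."""
    label = haplogroup.strip()
    if not label:
        raise ValueError("empty haplogroup label")
    major = label[0].upper()
    if major in AFRICAN_Y:
        return "African"
    if major in EURASIAN_Y:
        return "European/Eurasian"
    return "unclassified"


def tally_by_clan(records) -> dict[str, Counter]:
    """records: iterable of (clan, haplogroup) pairs.
    Returns a Counter of affiliations for each clan."""
    tallies: dict[str, Counter] = {}
    for clan, haplogroup in records:
        tallies.setdefault(clan, Counter())[y_affiliation(haplogroup)] += 1
    return tallies
```

Keeping "unclassified" as a separate category, rather than forcing every label into one of the two groups, makes haplogroups outside this simplified scheme visible instead of silently miscounted.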
1.2.2.1 European and Eurasian haplogroups
Typically, more than 50% of men in Europe carry haplogroup R (Chiaroni et al., 2009;
Myres et al., 2011). Macro-haplogroup R is defined by marker M207 and is the most
common clade throughout north-western Eurasia; the majority of European Y
chromosomes fall under this haplogroup (Jobling and Tyler-Smith, 2003; Karafet et al.,
2008; Chiaroni et al., 2009; Myres et al., 2011; Geppert and Roewer, 2012).
Haplogroup R accounts for more than one-third of Indian Y chromosomes, and its
daughter clades R1 and R2 are both found in tribal and caste groups (Sahoo et al.,
2006). Clade R1 splits into R1a and R1b, which are similarly variable in Indians and
western Asians but less so in Estonians, Czechs and central Asians (Kivisild et al.,
2002). Lacau et al. (2012) showed that the majority of Afghan individuals (67.4%) fall
under R-M207, with sub-haplogroup R1a1a-M198 variants present in both the north
and the south.
Haplogroup R1a1a (alternatively named R-M198) is the dominant Y chromosome
lineage in modern Eurasia, having originated in the Eurasian Steppes north of the
Black and Caspian Seas (Jobling and Tyler-Smith, 2003; Klyosov and Rozhanskii,
2012). R1a1a is particularly common in the large region extending from South Asia
and southern Siberia to Central and Eastern Europe (Slavic populations) and
Scandinavia (Sengupta et al., 2006; Underhill, 2010; Klyosov and Rozhanskii, 2012).
Haplogroup R1a is the most frequent Y chromosome haplogroup among Slavic,
Indo-Iranian, Dravidian, Turkic and Finno-Ugric populations, and many authors have
taken an interest in the link between R1a and the Indo-European language family
(Sengupta et al., 2006; Underhill, 2010). Haplogroup R1a1 (R-M198) haplotypes were
brought to India around 3,500 years ago (Sengupta et al., 2006). The present
understanding is that R1a1 bearers, later known as the Aryans, brought to India not
only their haplotypes and haplogroup but also their language, thereby building a
linguistic and cultural bridge between India (and Iran) and Europe, and possibly
creating the Indo-European family of languages (Sengupta et al., 2006). This is
relevant to the possible Indian roots of the Lascar slaves believed to be the forefathers
of the amaMolo clan: finding Eurasian haplogroups in the amaMolo would support the
oral history of their non-African, and possibly Indian, origins.
Haplogroup R1b is believed to have originated and expanded as humans began to
re-colonise Europe after the Last Glacial Maximum, approximately 10,000 to 12,000
years ago (Myres et al., 2011). R1b is the most common haplogroup in Western
Europe and predominates across most of the British Isles; it is also found at lower
frequencies in Eastern European and West Asian populations (Kivisild et al., 2002;
Campbell, 2007; Karafet et al., 2008; Chiaroni et al., 2009; Myres et al., 2011;
Raghavan et al., 2014), and in parts of sub-Saharan Central Africa, for example around
Chad and Cameroon (Balaresque et al., 2009). About one in five males sampled in
north-western Ireland carries a Y chromosome from the R1b sub-haplogroup R1b3 that
is linked to the patriline descending from the Uí Néill, the most important dynasty of
early medieval Ireland (Moore et al., 2006; King and Jobling, 2009 [b]).
Other haplogroups found across Eurasia include haplogroup G-M201, which originated
around 30,000 years ago in either the Middle East or South Asia (Cruciani et al., 2002;
Cinnioğlu et al., 2004; Karafet et al., 2008). While haplogroup G (G-M201) occurs at its
highest levels in the Caucasus region (e.g. 74% in Ossetians from Digora), it is
widespread, occurring at low to moderate levels from north-western Europe to South
and East Asia (Cinnioğlu et al., 2004). Around 1 in 10 Ashkenazi Jewish males belong
to haplogroup G (G-M201), and it is found at an average frequency of 7.9% in the
Afghan gene pool, where it was established during the Neolithic expansion throughout
the region (Behar et al., 2004; Sengupta et al., 2006; Lacau et al., 2012).
Haplogroup I (I-M170) is considered the only haplogroup native to Europe. It appeared
in Europe roughly 20,000 years ago, its ancestral lineage having arrived from the
Middle East, and after haplogroup R it is considered the second major European
haplogroup (Semino et al., 2000; Hammer and Zegura, 2002). Haplogroup I (I-M170) Y
chromosomes occur in nearly 20% of European males and have also been found
among some populations of the Near East, the Caucasus, Northeast Africa and
Central Siberia (Hammer and Zegura, 2002; Karafet et al., 2008).
Haplogroup J lineages are found at high frequencies in the Middle East, North Africa,
Europe, Central Asia, Pakistan and India (Underhill et al., 2001; Semino et al., 2002;
Behar et al., 2004; Sengupta et al., 2006). Haplogroup J-M172 is the most common J
sub-haplogroup in Europe; it emerged around 30,000 years ago in the Middle East and
was carried by Middle Eastern traders into Europe, Central Asia, India and Pakistan
(Di Giacomo et al., 2004; Karafet et al., 2008). Haplogroup J-M267, which contains the
Cohen Modal Haplotype (CMH), predominates in the Middle East, North Africa and
Ethiopia (Semino et al., 2004). The CMH is associated with a lineage believed to have
originated with the Cohanim (Jewish high priests) in the northern portion of the Fertile
Crescent, from where it later spread throughout Central Asia, the Mediterranean, and
south into India around 10,000 years ago (Hammer et al., 2009; Soodyall et al., 2013).
The lineage eventually migrated south, back into Africa, and a variant of the modal
haplotype features in the Lemba people. Soodyall et al. (2013) published revised
haplotype data on the origins of the Lemba, in whom the majority of haplogroups,
including those carrying the CMH, point to Semitic origins. The name "Lemba" may
originate from chilemba, a Swahili word for the turbans worn by Bantu peoples, or from
lembi, a Bantu word meaning "non-African" or "respected foreigner" (Shimona, 2003).
Haplogroup Q is defined by marker M242 and is the lineage that links Asia and the
Americas (Jobling and Tyler-Smith, 2003; Zegura et al., 2004). This lineage is believed
to have originated in southern/central Siberia and Central Asia, and to have migrated
through the Altai/Baikal region of northern Eurasia and across the Bering Strait into
the Americas, where it became a founding Native American haplogroup (Jobling and
Tyler-Smith, 2003; Bortolini et al., 2004; Zegura et al., 2004; Karafet et al., 2008).
1.2.2.2 African Y haplogroups
The frequently observed clinal pattern of reduced genetic diversity with distance from
Africa is seen as strong evidence for the out-of-Africa movement(s) of anatomically
modern humans between approximately 35,000 and 89,000 years ago, with a minority
of contemporary East Africans and Khoisan representing the descendants of the most
ancient ancestral patrilines (Underhill et al., 2000; Soares et al., 2011; Sanchez-Faddev,
2013). Haplogroups A and B are the deepest branches in the Y chromosome
phylogeny and are essentially restricted to Africa, providing evidence that modern
humans first arose there (Underhill et al., 2001; Jobling and Tyler-Smith, 2003; Karafet
et al., 2008; Chiaroni et al., 2009). Macro-haplogroup A is not monophyletic and
contains many sub-clades (Karafet et al., 2008). It is found mainly along the Rift Valley,
from the Cape up to Ethiopia, and is carried mostly, but not exclusively, by some of the
oldest surviving hunter-gatherer groups, who speak Khoikhoi and San languages,
believed to be among the oldest human languages; the deepest-rooting lineage of the
haplogroup is designated A00 (Underhill et al., 2001; Cruciani et al., 2002; Salas et al.,
2002; Karafet et al., 2008). The interruption of its distribution in the middle of the Rift
Valley is possibly due to replacement by Bantu-speaking farmers who settled the region
from the first millennium of the Christian era (Chiaroni et al., 2009).
Haplogroup B is found mainly among African Pygmies of the Central African forest,
who are still predominantly hunter-gatherers but speak Bantu languages borrowed
from farmers who arrived in the area between 2,000 and 3,000 years ago (Underhill et
al., 2001; Karafet et al., 2008). Haplogroup B (B-M152) occurs at low to moderate
frequencies in most sub-Saharan African populations, including populations from
Cameroon and East Africa, and among Southern Bantu-speakers (Underhill et al.,
2001; Cruciani et al., 2002).
Haplogroup E1b1a (E-M2), the most common haplogroup in sub-Saharan Africa,
originated in Northeast Africa between 30,000 and 40,000 years ago (Hammer and
Zegura, 2002; Cruciani et al., 2002). Today its lineages are also found in the
Mediterranean and the Near East (Cruciani et al., 2002; Karafet et al., 2008).
Settlement outside Africa by haplogroup E carriers involved later sub-haplogroup
E-M35 varieties, such as M78, M81 and M123, which extended to Arabia and the
northern Mediterranean coast (Cruciani et al., 2002; Chiaroni et al., 2009).
Haplogroup E2b1 (E-M85) is seen throughout sub-Saharan Africa at moderate levels
and diversified some time after other haplogroup sub-lineages, probably having
descended from the East African population that generated the out-of-Africa
expansion (Cruciani et al., 2002). Haplogroup E1b1a1a1c1a (E-M191) most likely
spread throughout sub-Saharan Africa as a result of migrations associated with the
Bantu Expansion. It is now the most common haplogroup in sub-Saharan Africa,
although its highest frequencies are still seen in West Africa (Cruciani et al., 2002).
1.2.3 Y chromosome Short Tandem Repeats (Y-STRs) and Y-haplotypes
Previous population-ancestry studies, including King and Jobling (2009 [b]), Wu et al.
(2010), Balanovsky et al. (2011), Reguiero et al. (2012), Soodyall (2013) and Westen et
al. (2015), have shown the utility of STR haplotypes in pedigree analyses.
Y chromosome data are organised such that haplogroups provide an indication of
geographic clustering, and haplotypes further refine the variation within haplogroups.
Y-STR haplotypes are used to predict haplogroups, and the predicted haplogroup
directs which Single Base Extension (SBE) SNP multiplexes are run to confirm it.
Using SNP haplogroup data in conjunction with STR haplotype data creates extended
haplotypes, which allow unique transmissions in the pedigree to be measured and
permit the examination of relationships between haplotypes in haplogroup-specific
Y-haplotype networks.
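A haplogroup-specific haplotype network links STR haplotypes by their mutational distance. The sketch below is an illustration under stated assumptions: haplotypes are tuples of repeat counts at the same ordered (unspecified) loci, distance follows a simple stepwise mutation model, and a minimum spanning tree built with Prim's algorithm stands in for the median-joining networks that such studies usually construct.

```python
from __future__ import annotations

from typing import Tuple

# Sketch: link Y-STR haplotypes from ONE haplogroup into a minimum spanning
# network. A minimum spanning tree is a simpler stand-in for median-joining
# networks; the loci behind each tuple position are left unspecified.

Haplotype = Tuple[int, ...]


def step_distance(h1: Haplotype, h2: Haplotype) -> int:
    """Sum of repeat-count differences across loci (stepwise mutation model)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def minimum_spanning_edges(haplotypes: dict[str, Haplotype]) -> list[tuple[str, str, int]]:
    """Prim's algorithm: return (from, to, distance) edges linking every
    haplotype with the smallest total step distance."""
    names = list(haplotypes)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        for a in names:                    # fixed order keeps ties deterministic
            if a not in in_tree:
                continue
            for b in names:
                if b in in_tree:
                    continue
                d = step_distance(haplotypes[a], haplotypes[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

Edges of distance 1 join haplotypes one mutational step apart; within a clan pedigree, such single-step links are one possible way of counting the unique transmissions referred to above.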
1.3 Matrilineal ancestry
1.3.1 Mitochondrial DNA
Mitochondrial DNA (mtDNA) has proved to be a powerful tool in reconstructing
population history and in diversity studies (Richards et al., 2000; Finnila et al., 2001;
Fadhlaoui-Zid et al., 2004; Ralph and Coop, 2013). mtDNA is located in the
mitochondria, organelles that are usually numerous within the cell and variable in
morphology. Mitochondria are responsible for energy production in the cell, and the
human mitochondrial genome encodes 13 subunits of the oxidative phosphorylation
system, two ribosomal RNAs (rRNAs) and 22 transfer RNAs (tRNAs) (Figure 1.4)
(Scheffler, 2000; Iborra et al., 2004; Doosti and Dehkordi, 2011). Genetic
analysis of mtDNA has been an important tool in understanding human evolution due
to the characteristics of mtDNA, such as its high copy number, near-absence of
recombination, high substitution rate and maternal mode of inheritance (Cann et al.,
1987; Scheffler, 2000; Destro-Bisol et al., 2004; Iborra et al., 2004; Behar et al., 2007;
Gonder et al., 2007). Knowledge of mtDNA sequence variation is rapidly accumulating,
and the field of anthropological genetics, which initially made use of only the first
hypervariable segment (HVS-I) of mtDNA, is advancing to the point where complete
mtDNA genome analysis will become the common genotyping practice (Salas et al.,
2002; Kivisild et al., 2004; Behar et al., 2007; Gonder et al., 2007). Most studies of
human evolution include mtDNA sequences from the 1 kb non-coding control region,
known as the displacement loop or ‘D-loop’, which occupies less than 7% of the mtDNA
genome (Scheffler, 2000; Iborra et al., 2004; Gonder et al., 2007) (Figure 1.4; adapted
from “the Mito Blog” online blog). Mutations within hypervariable regions I and II
(situated within this 1 kb non-coding region) act as highly informative marker loci for
delineating mitochondrial ancestry (Scheffler, 2000; Gonder et al., 2007; Schlebusch
et al., 2009). Clusters of HVR mutations delineate haplogroups, which are represented
as the major branch points on the mitochondrial phylogenetic tree. The Phylotree
mtDNA tree (Build 16) provides a phylogenetic tree of global human mitochondrial
DNA variation, based on both coding- and control-region mutations, and includes
haplogroup nomenclature as defined by its developers, van Oven and Kayser (2009).
The phylogenetic tree is updated regularly to incorporate information from novel
mitochondrial genome sequences and was last updated on 19 February 2014.
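The hypervariable-region mutations that define haplogroups are conventionally written as a reference position followed by the observed base (for example 16223T). The sketch below shows how such a list might be produced from a sequenced segment. It assumes the query has already been aligned to the reference without insertions or deletions, and that `start` is the reference coordinate of the first base (16024 is where the control region begins in the revised Cambridge Reference Sequence); it does not handle indels, heteroplasmy or haplogroup calling against Phylotree.

```python
from __future__ import annotations

# Sketch: list point differences between an aligned mtDNA segment and the
# matching stretch of a reference sequence, in "position + base" notation.
# ASSUMPTIONS: gap-free alignment of equal length; `start` is the reference
# coordinate of the first base. Indels and heteroplasmy are not handled.


def point_differences(reference: str, query: str, start: int) -> list[str]:
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, q_base) in enumerate(zip(reference.upper(), query.upper())):
        if q_base == "N" or q_base == ref_base:
            continue                      # uncalled base or match
        variants.append(f"{start + offset}{q_base}")
    return variants
```

The resulting variant list is the kind of input that haplogroup-assignment tools compare against the mutations defining each branch of the Phylotree.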
Figure 1.4. Schematic overview of the mitochondrial DNA molecule, showing HVRI and HVRII
situated in the D-loop, or control region, of the mitochondrial DNA molecule, amid the primary
mitochondrial coding genes. Numbers indicate positions of DNA base pairs. Adapted from the
“MitoBlog” online blog.
1.3.2 mtDNA phylogeographic variation and inferring matrilines
Because the amaXhosa clan name is inherited strictly through the male line, there is
no equivalent cultural marker of maternal descent, which makes it difficult to trace the
matrilines of abeLungu women. However, mitochondrial haplogroups also show strong
geographic structuring, which will allow us to infer the phylogeographic landscape of
the maternal lines. Genotyping may reveal traces of non-African ancestry in the
matrilines, possibly derived from Bessie (whose maternal legacy is the most historically
renowned) or from any other female survivors. Given the history of the abeLungu,
descendants of non-African female survivors may harbour European and/or Eurasian
origins that could be observed in the mtDNA of these lineages. Several historians,
including Soga (1930), Kirby (1953) and Crampton (2004), agree that, according to the
oral history, the non-African survivors of the shipwrecks were predominantly male, and
that they integrated into Xhosa communities and married local Xhosa women, with
whom they began clan families of mixed ancestry. From this we may expect the
maternal lineages to be predominantly, if not entirely, African.
The study of the geographic distribution and diversity of genetic variation is known as
the "phylogeographic approach" (King and Jobling, 2009). The global distribution of
mitochondrial haplogroups is such that Eurasia, Asia, Europe and the Americas all
retain haplogroup diversity signatures reflecting the migration of anatomically
modern humans out of Africa into the Near East, approximately 100 000 to 130 000 years
ago (Behar et al., 2008; Soares et al., 2011). African mtDNA diversity can be assigned
to seven macro-haplogroups (L0 to L6), of which L0–L3 and L5 are the primary mtDNA
haplogroups, their spread being largely restricted to sub-Saharan Africa (Kivisild et
al., 2004; Loogvali et al., 2004; Behar et al., 2008); the rest of the world's lineages
are classified as subgroups of macro-haplogroups M, N and R (Behar et al., 2008;
Soodyall and Schlebusch, 2010). A world map illustrating the global distribution of
mtDNA haplogroups, adapted from the J.D. MacDonald family name reference database,
is shown in Figure 1.5.
Figure 1.5. Global distribution map of mtDNA haplogroups, adapted from MacDonald
(2005). The map illustrates the global distribution of mtDNA haplogroups partitioned by
ethnic group across the globe.
1.3.2.1 African mitochondrial haplogroups
Macro-haplogroup L is geographically restricted to sub-Saharan Africa and has been
divided into haplogroups L0–L6 (Salas et al., 2002; Behar et al., 2008). Haplogroup
L0 is divided into sub-haplogroups L0a, L0d, L0f and L0k, and the time to the most
recent common ancestor (TMRCA) of L0k, L0f and L0a is 139.8 ± 24.6 kya (Gonder
et al., 2007). L0a is believed to have originated in eastern Africa and is the largest,
most diverse and most widespread of the L0 haplogroups. L0a is common in eastern,
central and south-eastern Africa, but almost absent in northern, western and southern
Africa; it was probably brought to south-eastern Africa by the eastern stream of the
Bantu Expansion (Salas et al., 2002; Plaza et al., 2004). L0d is thought to be the
oldest of the L0 clades. The distributions of L0d and L0k strongly point to an origin of
these haplogroups among Khoe-San ancestors, prior to the arrival of Bantu-speaking
populations in southern Africa. These clades reach frequencies of up to 40% in different
south-eastern Bantu-speaking groups (Schlebusch et al., 2009). Haplogroup L0d is
present in the !Xun and Khwe at frequencies of 51% and 16%, respectively, while L0k
was found at frequencies of 26% in the !Xun and 23% in the Khwe (Salas et al., 2002;
Schlebusch et al., 2009). Haplogroup L0f is a rare group, scattered through populations
from East Africa to South Africa, and is most common in Kenya, Sudan, Tanzania and
Uganda (Gonder et al., 2007).
Macro-haplogroup L1 encompasses 52% of haplogroup L haplotypes and 29% of all
African mtDNAs according to Wallace et al. (1999), and comprises sub-haplogroups
L1a, L1b and L1c (Gonder et al., 2007). Salas et al. (2002) suggest that haplogroup
L1a was most likely brought to south-eastern Africa by the eastern stream of the Bantu
Expansion, after having been picked up in East Africa. Haplogroup L1b is concentrated
in western Africa but also occurs in central and northern Africa (particularly in
geographically adjacent areas connected by the West African coastal pathway); it is
rare in eastern, south-eastern and southern Africa (Salas et al., 2002; Gonder et al.,
2007). Haplogroup L1c is the largest and most diverse group in the L1 clade, and most
likely arose in Central Africa around 20,000 years ago. It is seen at high levels in
Central Africa and is also common in African Americans and central African Bantu
speakers (Salas et al., 2002). The origin of L1c can be placed somewhere in Central
Africa towards the Atlantic west coast, in the uncharacterized areas of Angola and the
Congo delta, to the south of the putative Bantu homeland, on the route of the "western
stream" of the Bantu Expansion (Salas et al., 2002). Both L1b and L1c are nearly absent
in eastern and southern Africa (Gonder et al., 2007).
Haplogroup L2 is commonly subdivided into four main subclades, L2a to L2d, of which
L2a is the most frequent and widespread haplogroup in Africa (Salas et al., 2002). L2a
appears to have arisen in West Africa around 33,000 years ago before increasing
dramatically in number in south-eastern Africa, and its distribution may be a signature
of the Bantu Expansion (Torroni et al., 2001; Salas et al., 2002). Haplogroups L2b, L2c
and L2d appear to be largely confined to West and western Central Africa. Haplogroup
L2c is frequent in western Africa and rarely found in other parts of Africa (Salas et al.,
2002). L2d, the oldest of the L2 haplogroups, is thought to have originated in West
Africa and is found in most western and central African populations, declining in
frequency toward the south (Salas et al., 2002).
Haplogroup L3e is the most widespread, frequent and ancient of the African L3
clades, comprising approximately one-third of all L3 types in sub-Saharan Africa, and
possibly arose in Central Africa near Sudan around 35,000 years ago (Bandelt et al.,
2001; Soares et al., 2011). Haplogroup L3d is found mainly in West Africa and occurs at
high frequencies among south-western Bantu speakers (Schlebusch et al., 2009). It is
thought to have been brought into southern Africa with the western movement of the
Bantu Expansion (Bandelt et al., 2001; Soares et al., 2011).
Haplogroup L4 is common in East Africa and the Horn of Africa and prevalent in
north-eastern African populations, but less prevalent in central Africa (Batai et al.,
2013). It is found at low frequency, or is almost absent, in southern African
populations. The highest frequencies are in Tanzania, among the Hadza (60–83%) and
the Sandawe (48%) (Tishkoff et al., 2007).
Haplogroup L5 (previously referred to as L1e) has been observed at low frequency in
eastern Africa (Salas et al., 2002; Kivisild et al., 2004), in Egypt and among the Mbuti
Pygmies (Kivisild et al., 2004; Gonder et al., 2007). The geographic spread of
haplogroup L5b lineages is more southern, extending to the Sukuma of Tanzania
(Knight et al., 2003; Kivisild et al., 2004; Gonder et al., 2007).
An East African origin of haplogroup L6 seems most likely, because of its presence in
Ethiopians and because its sister haplogroups L2, L3 and L4 are all diverse and
frequent there (Kivisild et al., 2004). This is supported by a study of African lineages in
the Arabian Peninsula, in which haplogroup L6 was observed most frequently in
populations of Yemen and Ethiopia (Abu-Amero et al., 2007). Given the lack of an exact
match in the African database for southern Arabian L6 samples, and the relatively deep
time depth of L6 variation in Ethiopians and Yemenis taken together (approximately
36,600 years), this haplogroup may have been preserved in isolation in the Ethiopian
Highlands and southern Arabia for tens of thousands of years (Kivisild et al., 2004;
Abu-Amero et al., 2007). However, the most frequent L6 haplotype in Yemenis has no
descendant lineages, which suggests that its carriers coalesce to a common ancestor
within only the last couple of thousand years (Kivisild et al., 2004; Abu-Amero et al.,
2007).
1.3.2.2 Non-African mtDNA haplogroups
The non-African mtDNA haplogroups most likely to be observed are those implied by the
oral history, which would presumably be haplogroups found in Western Europe.
Analysis of diversity in European mtDNA reveals a relatively homogeneous landscape
comprising approximately 10 haplogroups (Torroni et al., 1996; Rosser et al., 2000;
Loogvali et al., 2004).
In his seminal work "The Seven Daughters of Eve", Bryan Sykes assigned names to the
seven major mitochondrial lineages of modern Europeans, which trace back along the
maternal line to seven prehistoric women, each descended from the African
"Mitochondrial Eve", the most recent common maternal ancestor (Sykes, 2001).
Loogvali et al. (2004) re-mapped the European and western Eurasian haplogroups as H,
J, K, N1, T, U4, U5, V, X and W. The studies of Torroni et al. (1994[a]) and Finnila et al.
(2001) had identified four European clusters (H, I, J and K) in individuals of European
ancestry. Torroni et al. (1996) applied the same methodology to two Scandinavian
population samples and identified five additional clusters (T, U, V, W and X), which,
together with the previous four, appeared to encompass virtually all European mtDNAs
examined (Torroni et al., 1996; Macaulay et al., 1999; Simoni et al., 2000; Fu et al.,
2012). Haplogroup H alone constitutes about half of the European mtDNA pool and is
also widespread in western Asia (Simoni et al., 2000; Loogvali et al., 2004; Brotherton
et al., 2013). Haplogroup H, which is predominant in Western Europe, accounts for
44.7% of mtDNAs in the United Kingdom and is also found in Spain (27.8%), Morocco
(19.2%) and Sardinia (17.9%) (Achilli et al., 2004). Haplogroup U is represented by its
subclades U1a, U3, U5, U5a1a, U7a and K, which are predominant in the Near East and
Europe (Macaulay et al., 1999; Richards et al., 2000; Fadhlaoui-Zid et al., 2004).
Haplogroup V, which is largely distributed in Western Mediterranean populations, most
likely originated within Europe and spread eastward (Richards et al., 2000;
Fadhlaoui-Zid et al., 2004).
1.4 Aims and objectives of the study
Unlike much written history, orally transmitted biographical and historical information
cannot readily be verified, as it is subject to change and elaboration over time. In this
study we address the issue of markers of identity from a genetic perspective. The main
focus of this research is to use molecular markers to shed light on the ancestry of the
abeLungu clans (and the amaMolo) of the Wild Coast region of the Eastern Cape. More
specifically:
i) Y chromosome DNA data will be used to trace the geographic region(s) of origin of
the male founders of the clans. Male subjects are the focus of the study because both
matrilineal (mtDNA) and patrilineal (Y-DNA) analyses can be conducted on DNA from a
single individual and, more fundamentally, because Y-DNA is inherited in tandem with
the cultural marker of clan name.
ii) mtDNA will be used to assess the maternal genetic contribution of females in these
groups. A corollary would be to determine any links to Bessie and her maternal legacy.
iii) The genetic data will be used in conjunction with genealogical information to test
and refine the oral history of the abeLungu and the amaMolo, which claims
European/Eurasian paternal ancestry. In particular, I hope to establish whether, and to
what extent, oral histories and genealogies that link contemporary isiXhosa clans to
non-African forebears have survived.
Given that Y chromosomes are transmitted from father to son, like surnames (clan
names in this instance), the Y chromosomes found in living members of a clan ought to
coalesce to the founding father of that clan. Since patterns of Y chromosome variation
correlate strongly with geography, this study will use the global Y chromosome
phylogeny to assess the geographic region of origin of the founding fathers of the
abeLungu and amaMolo clans. In addition, using the Y chromosome data in conjunction
with the revised genealogies will enable an examination of the fidelity of Y chromosome
transmission within clans.
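If Y chromosomes coalesce within clans as expected, most members of a clan should share one haplogroup. The minimal Python sketch below, offered as an illustration rather than the study's analysis pipeline, tabulates for each clan its most frequent Y haplogroup and the fraction of members carrying it. The clan names and haplogroup calls in the example are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def clan_concordance(samples: List[Tuple[str, str]]) -> Dict[str, Tuple[str, float]]:
    """Map each clan to (modal Y haplogroup, fraction of members carrying it)."""
    by_clan: Dict[str, List[str]] = defaultdict(list)
    for clan, haplogroup in samples:
        by_clan[clan].append(haplogroup)
    result: Dict[str, Tuple[str, float]] = {}
    for clan, haplogroups in by_clan.items():
        modal, count = Counter(haplogroups).most_common(1)[0]
        result[clan] = (modal, count / len(haplogroups))
    return result


if __name__ == "__main__":
    # Hypothetical clans and haplogroup calls, for illustration only
    data = [("ClanX", "R1b"), ("ClanX", "R1b"), ("ClanX", "E1b1a"), ("ClanY", "I2")]
    for clan, (hg, frac) in clan_concordance(data).items():
        print(f"{clan}: {hg} in {frac:.0%} of members")
```

A clan whose members do not share a modal haplogroup would be a candidate for closer genealogical scrutiny.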
Similarly, the mtDNA data should help resolve the ancestry of the females who have
contributed to shaping the gene pool of these clans. mtDNA data are also highly
structured by geography, and African haplogroups can be distinguished from
non-African ones.
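As a minimal sketch of how African and non-African maternal lineages might be tallied, the Python code below treats any label beginning L0 to L6 as African and everything else (M, N, R and their descendants such as H, U and K) as non-African. This is a coarse simplifying assumption. It ignores, for example, M and U lineages that occur in northern and eastern Africa. The haplogroup list in the example is hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Simplifying assumption: labels starting L0-L6 count as African. The out-of-Africa
# lineages that arose within L3 are named M and N, so they do not match this pattern.
AFRICAN_L = re.compile(r"^L[0-6]")


def is_african(haplogroup: str) -> bool:
    return bool(AFRICAN_L.match(haplogroup.strip().upper()))


def origin_fractions(haplogroups: Iterable[str]) -> Dict[str, float]:
    """Fraction of samples in each broad category; empty input gives an empty dict."""
    counts = Counter("African" if is_african(h) else "non-African" for h in haplogroups)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


if __name__ == "__main__":
    hypothetical_calls = ["L0d1a", "L2a1", "H1", "L3e2"]
    print(origin_fractions(hypothetical_calls))
```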
CHAPTER 2
Subjects and Methods
The following chart (Figure 2.1) presents an overview of the methods employed in the
study so as to resolve extended Y chromosome haplotypes and mtDNA haplogroups
of the paternal and maternal lineages of the abeLungu.
Figure 2.1. Schematic overview of the methods employed in the study
2.1 Subjects and sampling
2.1.1 Ethics approval for study
This research has been approved by the Human Research Ethics Committee at the
University of the Witwatersrand under protocol number M090576 (previously
M980553) for Professor Himla Soodyall and protocol number M120364 for David de
Veredicis (Appendix A). DNA was collected from research subjects with their informed
consent, following the NHLS ethical guidelines for research on human subjects.
2.1.2 Sampling and research area
The sample set comprised 198 subjects: 188 males and ten females. Of the 188 males,
146 self-identified as members of one of 13 abeLungu clans. The remaining 42
individuals were not affiliated with any abeLungu clan, but were included to represent
males from the regions cohabited by the abeLungu; these subjects self-identified as
'Xhosa' or 'mixed' race, or were unsure of their ancestral background. Since the primary
focus of the study is the patrilineal heritage of the abeLungu, the ten females (relatives
of principal male subjects) were included for mtDNA analyses only. The inclusion
criterion was to sample at least one male from each compound in the village, with two
males preferred where available; closely related individuals, such as father–son pairs,
were included at random. Present-day abeLungu subjects for the most part resemble
their Xhosa relatives phenotypically; however, some have retained recessive features
inherited from their European/Eurasian forebears. Three subjects, including an
individual named "Lord Nicholas Beresford" of the clan Irish, have blue eyes
(Figure 2.2).
Figure 2.2. Photographs of three male subjects featuring blue eyes which illustrate the
phenotypic links to their non-African patrilineal ancestry. Photographs have been taken
with the individuals’ permission; Mamolweni, Eastern Cape, 2009, 2010.
The research area spanned about 150 km of coastline between the Mzimvubu and
Xhora rivers, closely associated with the locations of shipwrecks (Figure 2.3). In her
fieldwork, Kalis interviewed 13 abeLungu clans located in 24 homesteads, or lalis
(Table 2.1), all of which fall under the greater 'abeLungu' clan superfamily. However,
only the amaMolo, abeLungu Jekwa, abeLungu Hatu and abeLungu Buku clans have
genealogies that trace descent from the original shipwreck survivors of the 16th and
17th centuries, as documented by Soga (1930); these are considered the primary
abeLungu clans. The secondary, more recently established abeLungu clans, namely
Caine, France, Horner, Irish, Ogle, Hastoni, Sukwini, Fuzwayo and Thakha (Table 2.1),
have more recent origins in non-African shipwreck survivors who assimilated into the
mPondo people. Individuals from these clans were also genotyped to elucidate their Y
chromosome origins, and were sampled from homesteads near those of the three
primary abeLungu clans (Buku, Hatu and Jekwa) and the amaMolo. The sampling
locations of both primary and secondary abeLungu clans, as well as the amaMolo, are
summarized in Table 2.1 and indicated in Figure 2.3.
Community engagement began with a visit to the amaMolo Chief, Mhlabunzima
Mxhaka, at 'the Great Place at Mamolweni' in mPondoland, to gain endorsement for the
study and an introduction to people in the region. The Chief pointed out that, although
his people identify through the amaMolo clan name, they are also known as abeLungu.
This was the first indication that the documented history, which describes the amaMolo
as an independent clan, was not necessarily reflected in the field. Having obtained the
Chief's approval, we engaged other members of the abeLungu and collected their clan
family histories. The research was facilitated by our interpreter, Mr. Qaqambile Godlo,
who translated between English and isiXhosa. Most subjects were able to name their
male antecedents back to their clan progenitor. The genealogies of clan representatives
were constructed from oral histories and kinship data collected by Kalis, and are given
in Appendix C (Figures S1–S6).
Table 2.1. Geographic sampling regions of abeLungu clans
Figure 2.3. Research area of the study, depicting the locations of the lalis, or homesteads,
of the three primary abeLungu clans, namely abeLungu Jekwa, Buku and Hatu, as well as the
amaMolo. The research area lies between the Xhora and Mzimvubu rivers along the Wild Coast,
in the Eastern Cape of South Africa. This distribution is summarized in Table 2.1. Map adapted
from the Google Maps online service: Imagery ©2013 TerraMetrics, Map data ©2013 AfriGis (Pty) Ltd.
2.2. Laboratory Methods
2.2.1 DNA extraction and quantification
Buccal swabs were taken using sterile cytology brushes from Gentra Puregene®
Buccal Cell Kits (Qiagen, Germany). Swab heads were placed in labelled 2.0 ml
Eppendorf tubes containing 300 µl Cell Lysis Solution (Puregene® Kit). Two brush
swabs (an A and a B sample) were taken per individual, in case the first sample was
contaminated or did not yield enough DNA to fully resolve the subject's ancestry
genotype. DNA was extracted from the collected cheek cells using Puregene® DNA
purification kits (Qiagen, Germany) according to the manufacturer's instructions. This
method involves lysis of the cell membrane, degradation of proteins and, finally,
precipitation of DNA out of solution. Nucleic acid concentrations were quantified on a
NanoDrop® ND-1000 Spectrophotometer with ND-1000 v2.2.0 software (Thermo Fisher
Scientific Inc.), measuring absorbance at a wavelength of 260 nm. Once extracted,
stock DNA was diluted with ddH2O to the working concentrations required for the
various genotyping procedures (20 ng/µl, 5 ng/µl and 1 ng/µl).
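The NanoDrop software reports concentrations directly, but the underlying arithmetic can be sketched as follows. The sketch assumes the conventional factor of 50 ng/µl of double-stranded DNA per A260 unit (1 cm path) and uses C1V1 = C2V2 to obtain the stock and ddH2O volumes for each working concentration. The A260 reading and the 100 µl final volume in the example are hypothetical.

```python
from typing import Tuple

# Conventional conversion: an A260 of 1.0 (1 cm path) ~ 50 ng/ul double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float) -> float:
    """Estimate dsDNA concentration (ng/ul) from a path-normalised A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260


def dilution_volumes(stock_ng_ul: float, target_ng_ul: float, final_ul: float) -> Tuple[float, float]:
    """C1*V1 = C2*V2: return (stock volume, ddH2O volume) in ul for a working dilution."""
    if not 0 < target_ng_ul <= stock_ng_ul:
        raise ValueError("target must be positive and no higher than the stock concentration")
    stock_ul = target_ng_ul * final_ul / stock_ng_ul
    return stock_ul, final_ul - stock_ul


if __name__ == "__main__":
    stock = dsdna_concentration(2.0)  # hypothetical reading -> 100 ng/ul
    for target in (20.0, 5.0, 1.0):  # working concentrations used in this study
        dna, water = dilution_volumes(stock, target, 100.0)
        print(f"{target:>4} ng/ul: {dna:.1f} ul stock + {water:.1f} ul ddH2O")
```

For the lowest working concentrations from concentrated stocks, the computed stock volume can fall below what is practical to pipette, which is one reason serial dilution through the 20 ng/µl working stock may be preferred.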
2.2.2 Molecular methods for Y chromosome DNA studies
All methods for mtDNA and Y chromosome DNA screening were developed and
optimised within the Human Genomic Disease and Diversity Research Laboratory
(HGDDRL). A combined SNP-STR genotyping approach was employed to examine Y
chromosomes at both the haplogroup level and the more resolved haplotype level:
first, haplogroup-defining Y chromosome SNPs were resolved using several Single
Base Extension (SBE) assays, as described by Naidoo et al. (2010), and second, STR
variation was examined at microsatellite loci. Combined SNP-STR systems give
improved resolution of geographic origins as well as improved precision and accuracy
of divergence time estimates (Zhivotovsky et al., 2004; Ramakrishnan and Mountain,
2004; Klyosov, 2009; Naidoo et al., 2010).
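One common way STR data feed into divergence time estimates is the average squared distance (ASD) method. Under a stepwise mutation model the expected ASD from the founding haplotype grows as μt, so t ≈ ASD/μ generations. The minimal Python sketch below uses the modal haplotype as a proxy for the founder. The mutation rate and the 25-year generation time are assumptions supplied by the caller, and the choice of rate, for instance germline-based versus the lower "effective" rate of Zhivotovsky et al. (2004), changes the estimate substantially. The allele values in the example are hypothetical.

```python
from statistics import mode
from typing import Dict, List

Haplotype = Dict[str, int]  # locus name -> repeat count


def asd_from_modal(haplotypes: List[Haplotype]) -> float:
    """Average squared distance (repeats^2 per locus) from the modal haplotype."""
    loci = list(haplotypes[0])
    per_locus = []
    for locus in loci:
        alleles = [h[locus] for h in haplotypes]
        modal = mode(alleles)  # modal allele taken as a proxy for the founder's allele
        per_locus.append(sum((a - modal) ** 2 for a in alleles) / len(alleles))
    return sum(per_locus) / len(per_locus)


def tmrca_years(haplotypes: List[Haplotype], mu_per_locus_gen: float,
                years_per_gen: float = 25.0) -> float:
    """Stepwise mutation model: ASD ~ mu * t, so t ~ ASD / mu generations."""
    return asd_from_modal(haplotypes) / mu_per_locus_gen * years_per_gen


if __name__ == "__main__":
    # Hypothetical two-locus haplotypes from one clan
    clan = [
        {"DYS19": 14, "DYS390": 24},
        {"DYS19": 14, "DYS390": 24},
        {"DYS19": 15, "DYS390": 24},
        {"DYS19": 14, "DYS390": 23},
    ]
    print(round(tmrca_years(clan, mu_per_locus_gen=0.0025)))  # assumed rate
```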
2.2.2.1 Y-STR genotyping
STR genotyping was performed using the AmpFℓSTR® Yfiler™ PCR Amplification Kit
(Life Technologies) according to the manufacturer's protocol. The AmpFℓSTR® Yfiler™
assay is a multiplex that permits the simultaneous analysis of 17 STR loci in a single
PCR amplification, increasing the discrimination capacity of haplotype analysis. The kit
contains the markers of the Y-STR 'minimal haplotype' (minHt), which encompasses
the marker panel recommended by the Scientific Working Group on DNA Analysis
Methods (SWGDAM): DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392 and
DYS393. In addition to this panel, markers for the highly polymorphic loci DYS438,
DYS439, DYS437, DYS448, DYS456, DYS458, DYS635 (Y GATA C4) and Y GATA H4 are
included, increasing the resolution to a suite of 17 microsatellite loci (Ayub et al., 2000;
Redd et al., 2002; Mulero et al., 2006; AmpFℓSTR® Yfiler™ user guide). STR screening
with the Yfiler™ system begins with an initial PCR amplification step, using the
reagents listed in Table 2.2.
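The gain in "discrimination capacity" from typing more loci can be quantified with two standard summaries: discrimination capacity (distinct haplotypes divided by sample size) and Nei's unbiased haplotype diversity, n/(n − 1)·(1 − Σpᵢ²). Adding loci can only split existing haplotypes, never merge them, so discrimination capacity cannot decrease. The minimal Python sketch below computes both, with haplotypes encoded as tuples of repeat counts in a fixed locus order. The example values are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

Haplotype = Tuple[int, ...]  # repeat counts in a fixed locus order


def haplotype_diversity(haplotypes: Sequence[Haplotype]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def discrimination_capacity(haplotypes: Sequence[Haplotype]) -> float:
    """Proportion of distinct haplotypes among all sampled haplotypes."""
    return len(set(haplotypes)) / len(haplotypes)


if __name__ == "__main__":
    sample = [(14, 24), (14, 24), (15, 24), (14, 23)]  # hypothetical (DYS19, DYS390)
    print(f"HD = {haplotype_diversity(sample):.3f}, DC = {discrimination_capacity(sample):.2f}")
```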
Table 2.2. Amplification of Y-STR loci
PCR reagent                              Volume per reaction (µl)
DNA sample (1 ng/µl)                     1.00
AmpFℓSTR® Yfiler™ PCR reaction mix       2.30
AmpFℓSTR® Yfiler™ Primer set             1.25
AmpliTaq Gold® DNA polymerase            0.20
ddH2O                                    1.50
Total                                    6.25
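When many samples are amplified at once, the per-reaction volumes in Table 2.2 are usually combined into a master mix, with the DNA added separately to each well. The minimal Python sketch below scales the Table 2.2 reagents for a batch. The allowance of two extra reactions for pipetting loss is a hypothetical default, not part of the author's protocol.

```python
from typing import Dict

# Per-reaction volumes (ul) from Table 2.2; DNA is added separately to each well.
PER_REACTION_UL: Dict[str, float] = {
    "AmpFlSTR Yfiler PCR reaction mix": 2.30,
    "AmpFlSTR Yfiler Primer set": 1.25,
    "AmpliTaq Gold DNA polymerase": 0.20,
    "ddH2O": 1.50,
}
DNA_PER_REACTION_UL = 1.00


def master_mix(n_samples: int, extra_reactions: int = 2) -> Dict[str, float]:
    """Scale each reagent for n_samples plus a pipetting excess (hypothetical default of 2)."""
    n = n_samples + extra_reactions
    return {reagent: round(volume * n, 2) for reagent, volume in PER_REACTION_UL.items()}


if __name__ == "__main__":
    mix = master_mix(22)  # e.g. 22 samples -> 24 reactions' worth of mix
    for reagent, volume in mix.items():
        print(f"{reagent}: {volume} ul")
    per_well = sum(PER_REACTION_UL.values())
    print(f"Dispense {per_well:.2f} ul mix + {DNA_PER_REACTION_UL:.2f} ul DNA per well")
```

Each well then receives 5.25 µl of master mix plus 1.00 µl of DNA, matching the 6.25 µl reaction volume in the table.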
The Y-STR amplification conditions on the GeneAmp® PCR System 9700 thermal cycler
were: an initial denaturation step of 11 minutes at 95°C, followed by 30 cycles of
denaturation at 94°C for one minute, annealing at 61°C for one minute and extension
at 72°C for one minute, then a final extension step at 60°C for 80 minutes before
cooling to a 4°C hold. PCR products were prepared for capillary electrophoresis by
suspending 1 µl of PCR product together with 0.3 µl internal lane standard (GS500 LIZ)
(Life Technologies) in 8.7 µl Hi-Di® Formamide, analysed on a 3130xl Genetic Analyzer
(Life Technologies) and visualized using GeneMapper® ID Software v3.2 (Life
Technologies).
2.2.2.2 Y chromosome binary marker screening
Single base extension (SBE) is a fluorescence-based multiplex PCR system that allows
numerous ancestry-informative SNPs to be typed in a single reaction (Gray, 2000;
Schlebusch et al., 2009). The SBE method relies on the extension of a "detection"
primer that anneals immediately upstream (5') of the mutation site; detection primers
are distinguished from one another by polynucleotide tails of different lengths. Each
primer is extended by one of four fluorescently labelled dideoxynucleotide
triphosphates (ddNTPs), and a fifth colour (GS120 LIZ) is used as the internal lane
standard. Naidoo et al. (2010) developed seven SBE multiplex panels that resolve Y
chromosomes to one of 61 terminal haplogroup branches, six of which focus on
markers delineating subclades of African haplogroups A, B and E. All samples were
initially run with the YSNP1 panel (as they were presumed to carry European or
Eurasian paternal ancestry), and subsequent SBE multiplexes were performed
hierarchically, following the phylogenetic placement and allelic states of resolved
polymorphisms on the Y chromosome SNP phylogeny [Figure 2.4(a)] (YCC, 2002;
Jobling and Tyler-Smith, 2003; Karafet et al., 2008; Schlebusch et al., 2009; Geppert
and Roewer, 2012). Haplogroup nomenclature follows the most current International
Society of Genetic Genealogy (ISOGG) standard (last accessed November 2016), and
the topology of the phylogeny is based on that of Karafet et al. (2008). The YSNP1
panel was used to resolve European and Eurasian haplogroups and included primers
for the binary markers SRY1083.1, M168, M89, M201, M69, M170, M172, M9, M207,
M198 and M343 [Figure 2.4(b)]. Samples found to be ancestral at markers SRY1083.1
and M91 were run using the HG-A and HG-B SBE assays. The SNPs used in the various
mini-sequencing panels are listed in Appendix B, Table S1, where haplogroup
nomenclature also follows the ISOGG 2016 standard. The ancestral and derived states
of markers as they appear on electropherogram profiles of YSNP1 assays are
illustrated in Figure 2.4(b). The ABI PRISM® SNaPshot™ Multiplex Kit (Life
Technologies) was used for all SBE assays according to the manufacturer's protocol,
with modifications to methods established by Naidoo et al. (2010). PCRs were
conducted using GeneAmp® PCR System 9700 Thermal Cyclers (Life Technologies).
Figure 2.4 (a) The Y chromosome SNP phylogeny, with topology adapted from Karafet et al. (2008)
and nomenclature based on the ISOGG Y-DNA Haplogroup Tree (2016), illustrating the Y haplogroups
designated by specific SNP markers screened using the SNaPshot™ SBE method and the Multiplex II
assay. Newer branches of haplogroup A (A0 and A00), more recently defined by ISOGG (2015), are not
indicated on the phylogeny. (b) An electropherogram showing the relative peak height, position and
colour of peaks illustrating the derived states of the Y-SNP markers screened using the YSNP1 SBE
marker panel.
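The hierarchical strategy described above, in which each resolved marker determines which branch, and therefore which panel, to examine next, can be sketched as a walk down the phylogeny. The tree below is a deliberately simplified subset built from the YSNP1 marker names and omits many intermediate nodes. The rule for calling the HG-A and HG-B assays is taken from the text. The function names and the 'D'/'A' (derived/ancestral) call encoding are hypothetical.

```python
from typing import Dict, List, Tuple

# Simplified illustrative subset of the Y-SNP tree (many intermediate nodes omitted).
# node -> list of (defining marker, child node)
TOY_TREE: Dict[str, List[Tuple[str, str]]] = {
    "root": [("SRY1083.1", "BT")],
    "BT": [("M168", "CT")],
    "CT": [("M89", "F")],
    "F": [("M201", "G"), ("M69", "H"), ("M170", "I"), ("M172", "J2"), ("M9", "K")],
    "K": [("M207", "R")],
    "R": [("M343", "R1b"), ("M198", "R1a1")],
}


def place(calls: Dict[str, str], node: str = "root") -> str:
    """Descend while a child's defining marker is derived ('D'); return the deepest node.

    Untyped markers are treated as not derived. If calls were inconsistent
    (two sibling markers derived), the first listed child would be followed.
    """
    for marker, child in TOY_TREE.get(node, []):
        if calls.get(marker) == "D":
            return place(calls, child)
    return node


def needs_hg_a_b_assays(calls: Dict[str, str]) -> bool:
    """Rule from the text: ancestral ('A') at SRY1083.1 and M91 -> run HG-A and HG-B assays."""
    return calls.get("SRY1083.1") == "A" and calls.get("M91") == "A"


if __name__ == "__main__":
    # Hypothetical calls for one sample
    sample = {"SRY1083.1": "D", "M168": "D", "M89": "D", "M9": "D", "M207": "D",
              "M343": "D", "M198": "A"}
    print(place(sample), needs_hg_a_b_assays(sample))
```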
Products were separated on an ABI PRISM® 3130xl Genetic Analyzer (Life
Technologies) and data were visualized using GeneMapper® ID v3.2 software (Life
Technologies). The reagents used for the YSNP1 multiplex PCR are listed in Table 2.3.
Table 2.3. YSNP1 SBE multiplex PCR reagents
PCR reagent                              Final Volume (µl)
FastStart 10x Buffer (with MgCl2)        2.5
MgCl2 (25 mM) [2.5 mM]                   2.0
dNTPs                                    3.0
YSNP1 Forward Primer Mix                 1.0
YSNP1 Reverse Primer Mix                 1.0
ddH2O                                    14.3
FastStart Taq                            0.2
Total                                    24.0
The PCR conditions used for the YSNP1 multiplex PCR are listed in Table 2.4.
Table 2.4. SBE PCR thermal cycler conditions
Temperature   Phase                   Time (min:s)
95°C          Initial denaturation    6:00
95°C          Denaturation            0:30
54°C          Annealing               0:30
72°C          Extension               0:30
72°C          Final extension         10:00
25°C          Hold                    ∞
The denaturation, annealing and extension steps were repeated for 35 cycles.
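As a worked check of the programme length, the programmed holds in Table 2.4 add up to 6 + 35 × (0.5 + 0.5 + 0.5) + 10 = 68.5 minutes. The minimal Python sketch below generalises this arithmetic to any three-stage cycling programme. It ignores ramp times and the open-ended final hold, so the real run on the thermal cycler will take somewhat longer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: float
    seconds: int


def hold_minutes(pre: List[Step], cycle: List[Step], n_cycles: int, post: List[Step]) -> float:
    """Total programmed hold time in minutes; ramping and the final indefinite hold are ignored."""
    seconds = (sum(s.seconds for s in pre)
               + n_cycles * sum(s.seconds for s in cycle)
               + sum(s.seconds for s in post))
    return seconds / 60


if __name__ == "__main__":
    # Table 2.4: SBE multiplex PCR (the 25 C hold is open-ended and excluded)
    minutes = hold_minutes(
        pre=[Step(95, 360)],
        cycle=[Step(95, 30), Step(54, 30), Step(72, 30)],
        n_cycles=35,
        post=[Step(72, 600)],
    )
    print(f"{minutes} min of programmed holds")  # 68.5 min of programmed holds
```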
Excess PCR primers and dNTPs were removed by enzymatic purification: for every 5 µl
of PCR product, 2 µl of purification mix (constituents in Table 2.5) was added and the
mixture was incubated at 37°C for 1 hour, followed by enzyme inactivation at 75°C
for 15 min.
Table 2.5. Post-PCR purification reaction reagents
Reagent                                  Volume per reaction (µl)
Shrimp Alkaline Phosphatase (1 U/µl)     1.4
Exonuclease I (20 U/µl)                  0.2
ddH2O                                    2.4
Once the PCR product had been purified, the SBE reaction proceeded. In this step the
"detection" primers were extended by ddNTPs complementary to the SNPs of interest,
using the ABI PRISM® SNaPshot™ Multiplex Kit (Life Technologies). A positive control
and a negative control template were included in the assay. The SBE cycling protocol
consisted of 35 cycles of denaturation at 96°C for 10 seconds, annealing at 50°C for
5 seconds and extension at 60°C for 30 seconds. The reagents and their proportions
used in the SBE reaction are listed in Table 2.6.
Table 2.6. YSNP1 multiplex SBE reaction
SBE mix                                      Volume per reaction (µl)
SNaPshot™ Multiplex Ready Reaction Mix       1.0
YSNP1 SBE primer mix                         1.0
ddH2O                                        1.5
Total                                        3.5

Control reactions                            Positive control (µl)   Negative control (µl)
Control DNA Template                         1.5                     –
SNaPshot™ Multiplex Ready Reaction Mix       1.0                     1.0
Control Primer mix                           1.0                     1.0
ddH2O                                        1.5                     3.0
Total                                        5.0                     5.0
Post-extension enzymatic purification was then performed to remove unincorporated
ddNTPs and excess reagents. For every 5 µl of SBE product, 2 µl of post-extension mix
(comprising 0.5 µl SAP (1 U/µl), 0.7 µl 10X SAP buffer and 0.8 µl ddH2O) was added.
The mix was incubated at 37°C for 1 hour, and the enzymes were inactivated at 75°C
for 15 minutes. SNPs were detected by suspending 2.0 µl of cleaned SBE product in
7.5 µl Hi-Di® Formamide (Life Technologies), together with 0.5 µl of internal lane
standard (GS120 LIZ), prior to running on a 3130xl Genetic Analyzer (Life
Technologies).
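Reading an SBE electropherogram amounts to mapping each peak's dye colour to the incorporated ddNTP and, for detection primers that anneal to the reverse strand, taking the complement before comparing the result with the marker's ancestral and derived alleles. The minimal Python sketch below assumes the standard SNaPshot™ display colours (A green, C black, G blue, T red) and alleles reported on the forward strand. The marker alleles in the example are hypothetical. Because the Y chromosome is haploid, a single peak per marker is expected, so no heterozygote call is made.

```python
# Standard SNaPshot display colours for the four labelled ddNTPs (assumed here)
DYE_TO_BASE = {"green": "A", "black": "C", "blue": "G", "red": "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def call_state(dye: str, primer_strand: str, ancestral: str, derived: str) -> str:
    """Classify a single SBE peak as 'ancestral', 'derived' or 'no call'.

    `ancestral`/`derived` are forward-strand alleles; `primer_strand` is '+' or '-'.
    """
    base = DYE_TO_BASE[dye]
    if primer_strand == "-":
        base = base.translate(COMPLEMENT)  # reverse-strand primers report the complement
    if base == ancestral:
        return "ancestral"
    if base == derived:
        return "derived"
    return "no call"


if __name__ == "__main__":
    # Hypothetical C->T marker typed with a reverse-strand detection primer
    print(call_state("green", "-", ancestral="C", derived="T"))
```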
2.2.2.3 Additional marker screening
Additional ancestry-informative SNPs and STR markers not covered by the Yfiler™ and
SNaPshot™ systems were screened using the Multiplex II assay developed in the
HGDDRL. The additional SNP markers in the Multiplex II assay were M139, M17, M175,
M186, M60 and M91, which screen for haplogroup clades A, B and R1a1 as well as
internal nodes within the Y chromosome phylogeny, and which help direct genotyping
by indicating which SNaPshot™ systems to include or exclude. The additional highly
polymorphic Y-STRs in the Multiplex II assay were DYS426 and DYS388, which
increased the resolution of haplotype profiles to 19 STR markers.
For the Multiplex II PCR amplification, 1.0 µl of DNA template (1 ng/µl) was suspended
in 9.0 µl True-Allele PCR Premix together with 5.0 µl primer mix, giving a 15.0 µl
reaction. The Multiplex II PCR protocol was as follows: an initial denaturation step at
95°C for 11 minutes; 30 cycles of denaturation at 94°C for one minute, annealing at
61°C for one minute and extension at 72°C for one minute; and a final extension step
at 60°C for 80 minutes, followed by a holding step at 4°C. PCR products were prepared
for analysis by suspending 1.0 µl product in 8.5 µl Hi-Di® Formamide (Life
Technologies), together with 0.5 µl of internal lane standard (GS500 LIZ), prior to
running on a 3130xl Genetic Analyzer (Life Technologies). SNP and STR profiles were
visualized using GeneMapper® ID v3.2 software (Life Technologies).
The TaqMan® assay (Life Technologies) was used to screen for SNP marker M242 in
order to confirm haplogroup Q assignments predicted using Y-STR haplogroup
prediction software, according to the manufacturer's protocol, on a 7900HT Fast
Real-Time PCR System (Life Technologies). The haplogroup Q-M242 PCR was
performed by suspending 5 ng of DNA in 2.5 µl TaqMan® Universal Master Mix together
with 0.25 µl primer mix, and finally adding 1.25 µl of ddH2O, for a total of 5.0 µl per
reaction. The TaqMan® cycling conditions consisted of an initial denaturation step at
95°C for 11 minutes, followed by 30 cycles of denaturation at 94°C for one minute,
annealing at 61°C for one minute and extension at 72°C for one minute, with a final
extension at 60°C for 80 minutes prior to the end hold at 4°C.
2.2.3 Mitochondrial DNA molecular methods
Two approaches were used to examine mitochondrial DNA (mtDNA) variation. First,
variant sites in the hypervariable regions (HVRs) of the d-loop were sequenced; second,
to confirm haplogroups, mtDNA control region SNPs were typed by minisequencing
according to the methods described in Schlebusch et al. (2009).
2.2.3.1 Mitochondrial D-loop HVR sequencing
Mitochondrial D-loop HVR sequencing was performed according to the ABI PRISM®
Dye Terminator cycle-sequencing protocols developed by Life Technologies, in
conjunction with methods previously published by Vigilant et al. (1989) and Behar et al.
(2007). Sequence variation in the mtDNA hypervariable regions was determined by first
amplifying the 1 kb D-loop segment containing both HVRs using primers 15876F and
639R (Table 2.7).
Table 2.7. Primer sequences for the mtDNA 1 kb D-loop PCR amplification
Primer name          Primer sequence (5'–3')
15876F (forward)     TCA AAT GGG CCT GTC CTT GTA G
639R (reverse)       GGG TGA TGT GAG CCC GTC TA
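Because the mtDNA genome is circular, the D-loop amplicon runs from position 15876 across the numbering origin (16569 to 1) to position 639, and its length has to be computed on the circular reference. The minimal Python sketch below does this on the 16,569 bp revised Cambridge Reference Sequence (rCRS) and also reports the GC content of the Table 2.7 primers. It assumes, as an interpretation of the primer names, that 15876 and 639 are the 5' end positions of the primers on the rCRS. Under that assumption the product is on the order of 1.3 kb.

```python
RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence (bp)


def gc_fraction(seq: str) -> float:
    """GC fraction of a primer written in groups of three bases, as in Table 2.7."""
    s = seq.replace(" ", "").upper()
    return (s.count("G") + s.count("C")) / len(s)


def circular_span(start: int, end: int, genome_len: int = RCRS_LENGTH) -> int:
    """Inclusive length from `start` to `end` reading forward around a circular genome."""
    if end >= start:
        return end - start + 1
    return genome_len - start + 1 + end


if __name__ == "__main__":
    primers = {"15876F": "TCA AAT GGG CCT GTC CTT GTA G", "639R": "GGG TGA TGT GAG CCC GTC TA"}
    for name, seq in primers.items():
        print(f"{name}: {len(seq.replace(' ', ''))} nt, GC = {gc_fraction(seq):.0%}")
    print("15876F-639R span:", circular_span(15876, 639), "bp")
```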
The 50μl reaction mix for the mtDNA 1 kb D-loop PCR amplification included 3.0μl of
DNA template (5ng/μl), 2.0μl of dNTPs (2.5mM), 2.0μl of each primer (15876F and
639R) and 0.2μl FastStart® Taq, made up to 50.0μl with 35.8μl ddH2O. The conditions
for the template fragment amplification consisted of an initial denaturation step at
95°C; 35 cycles of a three-step process: i) denaturation at 95°C for 30 seconds,
ii) annealing at 55°C for 30 seconds, and iii) extension at 72°C for two minutes; a
final extension step at 72°C for 10 minutes; and a final hold at 25°C. Cycle
sequencing of HVR I & II was performed in both the forward and reverse directions to
confirm the sequence information, using the primers listed in Table 2.8.
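The itemised per-reaction volumes above (3.0 + 2.0 + 2.0 + 2.0 + 0.2 + 35.8 µl) add up to 45.0 µl. That is 5.0 µl short of the stated 50 µl, and the text does not name the remaining component. The minimal Python sketch below scales the listed volumes into a master mix for several reactions. It reports the gap rather than guessing what fills it. The 10% overage and the practice of adding template to each tube separately are assumed conventions, not documented steps of this protocol.

```python
# Per-reaction volumes (µl) for the 1 kb D-loop PCR, as itemised in the text.
PER_REACTION_UL = {
    "DNA template (5 ng/µl)": 3.0,
    "dNTPs (2.5 mM)": 2.0,
    "primer 15876F": 2.0,
    "primer 639R": 2.0,
    "FastStart Taq": 0.2,
    "ddH2O": 35.8,
}
STATED_TOTAL_UL = 50.0


def itemised_total(volumes):
    """Sum the listed component volumes (rounded to hide float noise)."""
    return round(sum(volumes.values()), 2)


def master_mix(volumes, n_reactions, overage=0.10, exclude=()):
    """Scale per-reaction volumes to a master mix for n_reactions.

    overage is an assumed pipetting surplus (10% by default); components
    named in exclude (e.g. the template, added to each tube separately)
    are left out of the mix.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in volumes.items() if name not in exclude}


if __name__ == "__main__":
    total = itemised_total(PER_REACTION_UL)
    print(f"itemised {total} µl of a stated {STATED_TOTAL_UL} µl "
          f"({round(STATED_TOTAL_UL - total, 2)} µl not itemised)")
    mix = master_mix(PER_REACTION_UL, 10, exclude=("DNA template (5 ng/µl)",))
    for name, vol in mix.items():
        print(f"{name}: {vol} µl")
```

Keeping the template out of the master mix lets the same mix serve every sample, with the negative control receiving water in its place.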
The amplified fragment was resolved on a 2.0% agarose gel stained with ethidium bromide
to check the presence and quality of the PCR product, using 1X TBE buffer, a
bromophenol blue/Ficoll loading dye and the 1kb+ DNA ladder size standard (Thermo
Fisher Scientific). Each gel run included a negative and a positive control. Gels were
run at 120V for 30 min, and samples were visualised using the G-Box and GeneSnap
software (SynGene). After the presence and quality of the PCR products had been
confirmed, the products were purified by enzymatic digestion to remove excess primers
and dNTPs, using 0.2μl exonuclease I (20U/µl) and 1.4μl shrimp alkaline phosphatase
(1U/µl), made up to 5.0µl with ddH2O.
Table 2.8. Primer sequences for HVR I & II cycle sequencing

| Region | Primer | Sequence |
|---|---|---|
| HVR I | 15946 Forward | CAA GGA CAA ATC AGA GAA AA |
| HVR I | 132 Reverse | GAC AGA TAC TGC GAC ATA GG |
| HVR II | L29 Forward | GGT CTA TCA CCC TAT TAA CCA C |
| HVR II | H408 Reverse | CTG TTA AAA GTG CAT ACC GCC A |
Each cycle-sequencing reaction contained 2.0μl of amplified 1 kb PCR template, 1.0μl
Big Dye Primer Mix (containing ddNTPs, Mg2+ and Taq polymerase) and 1.0μl of one of
the four primers L15946, L29, H132 or H408 (at a concentration of 3.3μM), made up to
10μl with ddH2O. The cycle-sequencing conditions consisted of an initial denaturation
step at 96°C for one minute; 25 cycles of denaturation at 96°C for 10 seconds,
annealing at 50°C for five seconds and extension at 60°C for four minutes; and a final
hold at 4°C.
Before the final sequencing analysis, the sequencing products were purified by vacuum
filtration using Montage SEQ96 sequencing reaction cleanup plates (Millipore). The
purified products were resuspended in Hi-Di™ formamide and resolved on an ABI
PRISM® 3130xl Genetic Analyzer (Life Technologies). POP-7 polymer was used instead
of the POP-4 polymer specified in the standard protocol, as recommended by a Life
Technologies product bulletin (Life Technologies, P/N: 4267258). Sequence data were
visualised using Sequencing Analysis Software v5.2 (Life Technologies), which
converted the raw data into base sequences and electropherograms.
2.2.3.2 Mitochondrial SNaPshot™ Sequencing (MTSS)
MTSS is a single-base-extension (SBE) method that targets 14 SNPs in the
mitochondrial genome. Together, these SNPs resolve a sample into one of ten major
macro-haplogroups corresponding to global mitochondrial variation (Schlebusch et al.,
2009), as represented in the mtDNA SNP phylogeny (Figure 2.5). The minisequencing
protocol distinguishes between seven African L mitochondrial macro-haplogroups and
three non-African macro-haplogroups, namely M, N and R. MTSS assays were performed
using the ABI PRISM® SNaPshot™ Multiplex Kit (Life Technologies) according to the
methods established by Schlebusch et al., (2009). These methods are based on those
used for Y chromosome SNP SBE screening with the ABI PRISM® SNaPshot™ Multiplex
system (see section 2.2.2.2 - Y chromosome binary marker screening).
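Both MTSS and the Y chromosome SBE screens assign a haplogroup by reading which markers are in the derived state and following them down the phylogeny (Figure 2.5). The minimal Python sketch below illustrates that hierarchical logic. The tree, node names and marker names are hypothetical placeholders, not the actual 14 MTSS SNPs. Real analyses would also flag inconsistent calls, such as a derived downstream marker under an ancestral upstream one; this sketch silently stops at the last consistent node instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A haplogroup node, defined by one SNP being in the derived state."""
    name: str
    marker: Optional[str] = None  # None for the root of the tree
    children: List["Node"] = field(default_factory=list)


def assign(node: Node, derived: Set[str]) -> str:
    """Walk down the tree while a child's defining marker is derived.

    Returns the name of the deepest node reached. Markers below a node
    whose own marker is ancestral are never inspected.
    """
    for child in node.children:
        if child.marker in derived:
            return assign(child, derived)
    return node.name


# Hypothetical toy tree: names and markers are placeholders, NOT the
# MTSS SNPs or real haplogroups.
TREE = Node("root", children=[
    Node("X", "m1", children=[
        Node("X1", "m2"),
        Node("X2", "m3", children=[Node("X2a", "m4")]),
    ]),
])

if __name__ == "__main__":
    print(assign(TREE, {"m1", "m3", "m4"}))  # X2a
    print(assign(TREE, {"m3"}))  # root: m1 is ancestral, so m3 is never reached
```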
Figure 2.5. MtDNA SNP phylogeny, adapted from Schlebusch et al., (2009), with nomenclature
established by Behar et al., (2008) and van Oven and Kayser (2008). SNP positions are
indicated and correspond to their locations in the mtDNA molecule as illustrated in
Figure 1.4 in section 1.3.1. Colours correspond to the peak colours on electropherograms,
which indicate the ancestral or derived allelic state of the terminal SNP.
2.3 Data analysis
2.3.1 Y chromosome data analyses
GeneMapper® ID Software v3.2 (ABI Life Technologies) was used to visualise STR allele
peaks. Allelic variation was determined from the size of each STR peak, which
indicates the number of repeat copies at a specific locus. Note that the DYS389 locus
is composite in nature. It contains phylogenetically informative as well as
fast-evolving regions that may obscure structure, and the DYS389II amplicon includes
the DYS389I repeat region. To account for this, the DYS389I allele value was
subtracted from the DYS389II allele value to give the derived value DYS389IIc, which
was used in further analyses (Moore et al., 2006; Roewer, 2009). The Whit Athey
haplogroup predictor was used to determine haplogroups, using Y-STR haplotype allele
values as input data. An ‘equal priors’ search criterion was used to avoid geographic
bias in the predictions (Athey, 2005). Haplogroups were confirmed using informative
SNPs resolved by Y chromosome SNaPshot™ SBE typing.
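The DYS389IIc correction is a simple subtraction applied to each sample. The minimal Python sketch below assumes each haplotype is held as a locus-to-allele mapping. This is a hypothetical layout, not the format exported by GeneMapper®, and the allele values are illustrative rather than drawn from this study.

```python
def dys389_iic(dys389_i: int, dys389_ii: int) -> int:
    """Return DYS389IIc = DYS389II - DYS389I (allele values in repeat units)."""
    if dys389_ii <= dys389_i:
        raise ValueError("DYS389II should exceed DYS389I, since it includes it")
    return dys389_ii - dys389_i


def with_dys389_iic(profile: dict) -> dict:
    """Copy a haplotype, replacing DYS389II with the derived DYS389IIc value."""
    out = {k: v for k, v in profile.items() if k != "DYS389II"}
    out["DYS389IIc"] = dys389_iic(profile["DYS389I"], profile["DYS389II"])
    return out


if __name__ == "__main__":
    # Hypothetical allele values, not a sample from this study.
    print(with_dys389_iic({"DYS19": 14, "DYS389I": 13, "DYS389II": 29}))
```

Raising an error when DYS389II does not exceed DYS389I catches data-entry slips, because a valid DYS389II call always includes the DYS389I repeats.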
Extended haplotypes were generated by combining haplogroup SNP data with STR haplotype
data. Modal haplotypes are those that appeared most frequently. In most instances in
the genealogies, they can be presumed to be the profiles tracing back to the original
founding haplotype. Alternatively, a modal haplotype may have reached a higher
frequency through drift or rapid expansion. To retain the anonymity of research
subjects, a haplotype naming system was used throughout the haplotype analysis. For
example, the haplotype name “R343_2” represents the second (“_2”) unique haplotype
found among haplogroup R1b (R-M343) samples. Extended haplotype names were
superimposed onto clan genealogies to examine the patrilineal transmission of Y
chromosomes in the abeLungu. Genealogies were constructed using information collated
from interviews with clan elders and clan members, as well as genealogical information
from subjects’ consent forms (Appendix C, Figures S1 – S6).
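The modal-haplotype and naming logic described above can be sketched as follows. The sketch is illustrative only. Numbering unique haplotypes in order of first appearance is an assumption, since the text does not state how the sequence numbers were assigned, and the sample IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]  # allele values in a fixed locus order


def name_haplotypes(samples: List[Tuple[str, Haplotype]], prefix: str):
    """Number unique haplotypes as prefix_1, prefix_2, ... and find the modal one.

    Numbering follows first appearance in `samples` (an assumption).
    Returns (name per sample ID, modal haplotype name, modal count).
    """
    names: Dict[Haplotype, str] = {}
    for _, hap in samples:
        if hap not in names:
            names[hap] = f"{prefix}_{len(names) + 1}"
    counts = Counter(hap for _, hap in samples)
    modal, n = counts.most_common(1)[0]
    sample_names = {sid: names[hap] for sid, hap in samples}
    return sample_names, names[modal], n


if __name__ == "__main__":
    samples = [("s1", (14, 12, 16)), ("s2", (14, 13, 16)), ("s3", (14, 12, 16))]
    print(name_haplotypes(samples, "R343"))
```

Ties for the modal position are resolved by first appearance, because `Counter.most_common` orders elements with equal counts in the order they were first encountered.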
Additional data from ongoing projects in the HGDDRL, as well as published Y chromosome
data from Eurasian and sub-Saharan African populations, were included in the
comparative data analyses. These included South African samples as well as samples
from Near-African islands including the Maldives, Zanzibar and Madagascar. Published
comparative data were drawn from Cadenas et al., (2008), Varzari et al., (2013),
Roewer et al., (2008) and Jarve et al., (2009). For haplotypes segregating within
haplogroup R1a1a, comparative data were obtained from Nebel, (2001), Qamar et al.,
(2002), Capelli et al., (2007), Sengupta et al., (2006), Zalloua et al., (2008),
Cadenas et al., (2008), Di Gaetano, (2009), Thanseem et al., (2006) and Msaidie et
al., (2011). Because certain studies had typed profiles using fewer STR loci, the data
were truncated where necessary, reducing the number of usable loci from 19-marker to
12-marker haplotypes. A comprehensive breakdown of the comparative data is provided in
Appendix E, Table S2. It gives the number of samples, population groups,
publication and/or in-house project, and number of STR loci compared.
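Truncating to the loci shared by two datasets and then counting the differing loci underlies statements such as "differing by single-step mutations at seven loci" in the Results. The minimal Python sketch below assumes haplotypes are stored as locus-to-allele mappings, which is an illustrative data layout rather than the HGDDRL's. DYS389 must follow the same convention (DYS389II or DYS389IIc) in both profiles before they are compared. The allele values are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, int]  # locus name -> allele value (repeat count)


def shared_loci(a: Profile, b: Profile) -> List[str]:
    """Loci typed in both profiles, in the order of profile a."""
    return [locus for locus in a if locus in b]


def compare(a: Profile, b: Profile) -> Tuple[int, List[Tuple[str, int]]]:
    """Compare two haplotypes on their shared loci only.

    Returns the number of shared loci and (locus, b - a) for every locus
    where the alleles differ; a difference of +/-1 is a single-step change.
    """
    loci = shared_loci(a, b)
    diffs = [(locus, b[locus] - a[locus]) for locus in loci if a[locus] != b[locus]]
    return len(loci), diffs


if __name__ == "__main__":
    # Hypothetical profiles: a longer query vs a published 4-locus record.
    query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS439": 12, "DYS448": 19}
    published = {"DYS19": 14, "DYS390": 23, "DYS391": 11, "DYS439": 13}
    print(compare(query, published))
```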
For matches to databases and published literature, the primary database queried was
the Y Haplotype Reference Database (YHRD). It currently holds data compiled from
49,781 haplotypes representing seven major populations, which are further divided into
20 subgroups. These include Eurasian, African, Afro-Eurasian, East Asian, Amerindian,
Australian Aboriginal and Eskimo-Aleut populations, as well as an admixed population
with equal contributions from different ancestral populations (Willuweit and Roewer,
2007).
2.3.1.1 Y chromosome haplotype Networks
The association of haplotypes within Y haplogroups was examined within clans using
reduced-median-joining (RMJ) networks constructed with the program Network version
4.6.1.1 (Fluxus Engineering) (Bandelt et al., 1999). RMJ networks were obtained by
first calculating reduced-median (RM) networks and subsequently median-joining (MJ)
networks. The combined RM-MJ technique was used to reduce network complexity. An RMJ
network is often simpler than a pure MJ network, because implausible parallelisms are
avoided when additional star-contraction preprocessing is used (Bandelt et al., 1999).
For each haplogroup, the modal (ancestral) haplotypes were listed with their variant
haplotypes. These were submitted as related sequences for processing using a reduction
threshold of r = 2, the default (Bandelt et al., 1999). STR markers were weighted in
inverse proportion to their allelic variance (Cruciani et al., 2004). Data were cleaned
and prepared as a tab-delimited input file for the RMJ network calculations using
Microsoft Excel. During analyses, the epsilon parameter (which increases the number of
possible reticulations) was set to zero (Network 4.6.1 user guide; Bandelt et al.,
1999).
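Weighting STR loci inversely to their allelic variance (Cruciani et al., 2004) down-weights fast-mutating loci, which would otherwise create spurious reticulations. The minimal Python sketch below computes such weights from a set of haplotypes. Several choices are illustrative assumptions, not settings reported for this study: using the population variance, giving the most variable locus weight 1, and capping weights at 99. The haplotype values are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def locus_weights(haplotypes: List[Dict[str, int]], loci: Sequence[str],
                  max_weight: int = 99) -> Dict[str, int]:
    """Integer weights inversely proportional to each locus's allele variance.

    The most variable locus gets weight 1; others get round(v_max / v).
    Monomorphic loci (variance 0) and very slow loci are capped at max_weight.
    """
    variances = {locus: pvariance([h[locus] for h in haplotypes]) for locus in loci}
    polymorphic = [v for v in variances.values() if v > 0]
    if not polymorphic:
        return {locus: 1 for locus in loci}
    v_max = max(polymorphic)
    return {locus: max_weight if v == 0 else min(max_weight, round(v_max / v))
            for locus, v in variances.items()}


if __name__ == "__main__":
    haps = [{"A": 10, "B": 14, "C": 5},
            {"A": 12, "B": 14, "C": 5},
            {"A": 14, "B": 15, "C": 5}]
    print(locus_weights(haps, ["A", "B", "C"]))
```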
2.3.1.2 Database Search queries
The Y-STR Haplotype Reference Database (YHRD), an international forensic STR reference
database, allows male population stratification among world-wide populations to be
assessed as far as it is reflected by Y-STR haplotype frequency distributions (Roewer
et al., 2001). Y haplotypes were queried against the YHRD for global haplotype matches
at the 17-locus level of resolution as well as at the eight-locus “minimal haplotype”
level. In addition, the Genographic Consortium Y chromosome database, the HGDDRL Y-STR
comparative dataset and STR profiles from unpublished HGDDRL data were queried for
haplotype matches. The queries covered profiles from European, Asian, Eurasian and
sub-Saharan African populations. The latter include South African samples of a number
of different ethnic backgrounds, as well as samples from the Maldives, Zanzibar and
Madagascar.
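Querying at two levels of resolution explains why the same haplotype can have identical matches at the minimal-haplotype level yet none at 17 loci. The Python sketch below searches a local collection rather than calling the YHRD, which is queried through its own web interface. The minimal-haplotype locus list reflects the commonly cited definition, with DYS385a/b counted as one locus in the "eight-locus" description. Treat that list as an assumption to check against current YHRD documentation. All database entries are hypothetical.

```python
from typing import Dict, List, Sequence

Profile = Dict[str, int]

# Minimal haplotype loci as commonly defined (assumption; see lead-in).
# DYS389II, not the derived DYS389IIc, is used here.
MINIMAL_HAPLOTYPE = ("DYS19", "DYS385a", "DYS385b", "DYS389I", "DYS389II",
                     "DYS390", "DYS391", "DYS392", "DYS393")


def exact_matches(query: Profile, database: Dict[str, Profile],
                  loci: Sequence[str]) -> List[str]:
    """IDs of database profiles identical to the query at every listed locus.

    Profiles lacking any listed locus cannot be compared and are skipped.
    """
    return [pid for pid, prof in database.items()
            if all(locus in prof and prof[locus] == query[locus] for locus in loci)]


if __name__ == "__main__":
    vals = (14, 11, 14, 13, 29, 24, 11, 13, 13)  # hypothetical alleles
    query = {**dict(zip(MINIMAL_HAPLOTYPE, vals)), "DYS439": 12}
    db = {
        "ref1": {**dict(zip(MINIMAL_HAPLOTYPE, vals)), "DYS439": 13},
        "ref2": dict(zip(MINIMAL_HAPLOTYPE, (15,) + vals[1:])),
    }
    print(exact_matches(query, db, MINIMAL_HAPLOTYPE))                 # ['ref1']
    print(exact_matches(query, db, MINIMAL_HAPLOTYPE + ("DYS439",)))   # []
```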
2.3.2 Mitochondrial DNA analyses
The mtDNA haplogroups of all 198 subjects, including the ten females sampled, were
predicted using the online tool HaploGrep. This tool compares HVR I and HVR II variant
sites to the mtDNA phylogeny available through the PhyloTree platform (Build 16) (van
Oven and Kayser, 2009). In addition, mtDNA SNaPshot™ minisequencing was used to type
17 coding-region SNPs to confirm haplogroup identity. The sequenced HVR I and HVR II
regions spanned positions 15997–407 of the mtDNA molecule, a range that runs through
the end of the reference numbering and restarts at position 1 of the circular genome.
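Because the mtDNA molecule is circular, the sequenced span 15997-407 passes the last position of the rCRS numbering (16569) and restarts at position 1. The small Python sketch below performs range arithmetic that respects this wrap-around. The function names are hypothetical, and 16,569 bp is the length of the revised Cambridge Reference Sequence.

```python
RCRS_LENGTH = 16569  # bp in the revised Cambridge Reference Sequence


def region_length(start: int, end: int, genome_len: int = RCRS_LENGTH) -> int:
    """Number of positions from start to end inclusive on a circular genome."""
    if start <= end:
        return end - start + 1
    return (genome_len - start + 1) + end  # wraps past genome_len back to 1


def in_region(pos: int, start: int, end: int, genome_len: int = RCRS_LENGTH) -> bool:
    """True if pos lies in [start, end] on the circular genome."""
    if not 1 <= pos <= genome_len:
        raise ValueError(f"position {pos} outside 1..{genome_len}")
    if start <= end:
        return start <= pos <= end
    return pos >= start or pos <= end


if __name__ == "__main__":
    print(region_length(15997, 407))  # 980
    print(in_region(16223, 15997, 407), in_region(73, 15997, 407),
          in_region(12705, 15997, 407))  # True True False
```

The same span therefore covers 980 positions: 573 from 15997 to 16569 and 407 from 1 to 407.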
HVR sequences were aligned in BioEdit v7.9 using the ClustalW algorithm (Hall, 1999),
and variant sites were identified relative to the revised Cambridge Reference Sequence
(rCRS) (Andrews et al., 1999). The software program S-compare (Ronnie Nelson,
University of Pretoria) was used to identify and extract variant sites from the
alignment files. The mtDNA haplogroup nomenclature follows that proposed by Behar et
al., (2008) and van Oven and Kayser (2009), with recent modifications as found in the
PhyloTree database (http://www.phylotree.org/).
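The text does not describe how S-compare works internally. As a hypothetical stand-in for the same task, the sketch below lists the sites where an aligned HVR sequence differs from the reference. It uses the position-plus-base notation accepted by haplogroup tools (e.g. 16223T). It assumes a gap-free alignment, treats "N" as missing data, and numbers positions on the circular rCRS. The toy sequences are illustrative, not real rCRS sequence.

```python
from typing import List

RCRS_LENGTH = 16569


def variant_sites(ref: str, query: str, start: int,
                  genome_len: int = RCRS_LENGTH) -> List[str]:
    """List query differences from the reference as position+base, e.g. '16223T'.

    Assumes an ungapped alignment whose first column is reference position
    `start`; positions wrap past genome_len back to 1. 'N' marks missing
    data and is skipped. Indels are not handled in this sketch.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    if "-" in ref or "-" in query:
        raise ValueError("gapped alignments are not handled in this sketch")
    variants = []
    for i, (r, q) in enumerate(zip(ref.upper(), query.upper())):
        if q == "N" or q == r:
            continue
        pos = (start - 1 + i) % genome_len + 1
        variants.append(f"{pos}{q}")
    return variants


if __name__ == "__main__":
    # Toy 4-bp alignment starting at 16568: columns map to 16568, 16569, 1, 2.
    print(variant_sites("ACGT", "ACTT", 16568))  # ['1T']
```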
Of the 198 individuals originally sampled, 21 sibling pairs were identified by direct
counting in the clan genealogies (Appendix C, Figures S1-S6). Siblings share the same
mtDNA, so one member of each sib pair was excluded from the mtDNA analysis to avoid
biasing the data. The 51 individuals not affiliated with clans, including the ten
females sampled, were retained. This resulted in a subset of 176 individuals’
sequences, which were used in the subsequent analyses.
MtDNA haplotypes based on HVR I and HVR II variant-site data were used to construct a
phylogenetic tree using the Neighbour-Joining (NJ) method in the phylogenetic software
MEGA v5 (Tamura et al., 2007). The bootstrap consensus tree is taken to represent the
evolutionary history of the haplotypes analysed. The percentage of replicate trees in
which the associated taxa clustered together is shown next to the branches. Branches
corresponding to partitions reproduced in fewer than 50% of bootstrap replicates were
collapsed. The tree is drawn to scale, with branch lengths in the same units as the
evolutionary distances used to infer the tree. The evolutionary distances were
computed using the Poisson correction method and are in units of the number of
substitutions per site. All positions containing gaps and missing data were
eliminated. Phylogenetic analyses of Neanderthal mtDNA suggest that it diverged from
the extant human mtDNA lineage on the order of 660,000 years ago, and that
Neanderthal mtDNA falls outside the variation of modern human mtDNA (Green et al.,
2008). Since mtDNA is maternally inherited without recombination, these results
indicate that Neanderthals made no lasting contribution to the modern human mtDNA gene
pool. This makes the Neanderthal sequence a suitable evolutionary out-group for
phylogenetic analyses. The Neanderthal mtDNA genome sequence available through the
NCBI under GenBank accession number NC_011137.1 was included as the out-group in the
alignment (Green et al., 2008).
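The Poisson correction converts the proportion p of differing sites into an estimated number of substitutions per site, d = -ln(1 - p). The complete-deletion rule above first removes every column that contains a gap or missing base in any sequence. The minimal Python sketch below implements those two steps. It does not reproduce MEGA's neighbour-joining or bootstrap procedures, and MEGA's own implementation details may differ. The sequences are toy examples.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

MISSING = set("-N?")


def poisson_distances(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise Poisson-corrected distances, d = -ln(1 - p), with complete deletion.

    Any column with a gap or missing base in ANY sequence is dropped first;
    p is then the proportion of differing sites over the remaining columns.
    """
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to equal length")
    keep = [i for i in range(length)
            if all(seqs[n][i].upper() not in MISSING for n in names)]
    if not keep:
        raise ValueError("no complete columns remain after deletion")
    dist = {}
    for a, b in combinations(names, 2):
        diffs = sum(seqs[a][i].upper() != seqs[b][i].upper() for i in keep)
        p = diffs / len(keep)
        if p >= 1:
            raise ValueError(f"p-distance of {a} and {b} too large to correct")
        dist[(a, b)] = -math.log(1 - p)
    return dist


if __name__ == "__main__":
    toy = {"a": "AAAAAAAAAA", "b": "AAAAAAAAAT", "c": "AAAAAAAA-A"}
    for pair, d in poisson_distances(toy).items():
        print(pair, round(d, 5))
```

In the toy alignment, the gap in sequence c removes column 9 from every comparison, so a and b differ at 1 of 9 retained sites rather than 1 of 10.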
CHAPTER 3
Results
3.1 Y chromosome DNA studies
The Y chromosome results are presented in three parts. First, results are presented at
the haplogroup (SNP) level, indicating possible geographic locations of origin. Next,
the more highly resolved haplotype-level results are discussed in terms of networks
within haplogroups. Lastly, haplogroup and haplotype data are consolidated with the
anthropological data of the clan genealogies, and the findings are presented.
3.1.1 Y chromosome haplogroups
Of the 188 males sampled, including non-clan affiliated samples, 55.79% carried Y
chromosomes of non-African ancestry (Figure 3.1). When only the abeLungu and amaMolo
clans are examined, the frequency of non-African Y chromosome lineages rises to
69.86% (Figure 3.1).
Individuals with Eurasian and European ancestry had Y chromosomes associated with
haplogroups R1b (R-M343), R1a1a (R-M198), Q (Q-M242), G (G-M201), I (I-M170) and J
(J-M172). Haplogroup R1b was the most common haplogroup, found at a frequency of
41.10% (Figure 3.1). It is the dominant haplogroup in Western Europe and is also found
in Eastern Europe and Western Asia (Kivisild et al., 2002; Campbell, 2007; Karafet et
al., 2008; Chiaroni et al., 2009; Myres et al., 2011; Raghavan et al., 2014). The
second most frequently observed haplogroup, R1a1a (R-M198), was found in 14.38% of
the sample. It is the dominant Y chromosome lineage in modern Eurasia and is prevalent
in India, Eastern Europe and the Caucasus region (Jobling and Tyler-Smith, 2003;
Sengupta et al., 2006; Klyosov and Rozhanskii, 2012). Haplogroup Q (Q-M242), another
haplogroup prevalent in Eurasia, was observed in 5.48% of clan-affiliated samples.
Haplogroup G (G-M201), which is typically found in the Caucasus, the Middle East and
Southern Asia, was observed in two abeLungu Buku individuals (Cruciani et al., 2002;
Cinnioğlu et al., 2004; Karafet et al., 2008). Haplogroup I (I-M170) appears in 8.65%
of samples and indicates Western European origins, mostly from Britain. It occurs in
nearly 20% of the European male population, but has also been found among
populations of the Near East, the Caucasus, Central Siberia and Northeast Africa
(Hammer and Zegura, 2002; Karafet et al., 2008). Haplogroup J (J-M172) was found in
one individual. This haplogroup occurs at high frequencies in the Middle East, North
Africa, Europe, Central Asia, Pakistan and India (Underhill et al., 2001; Semino et
al., 2002; Behar et al., 2004; Sengupta et al., 2006).
Several haplogroups of sub-Saharan African origin were observed at relatively low
frequencies in the clans and at higher frequencies (some greater than 20%) in non-clan
affiliates (Figure 3.1; Table 3.1). These included haplogroups A1b1b2a (A-M51),
B2a1a1a1 (B-M152), E1b1a1 (E-M2), E1b1a1a1c1a (E-M191), E2b1a (E-M85) and E1b1b1
(E-M35). The most frequently observed haplogroup among the non-clan affiliates was
E-M85, carried by 20 individuals. Only one non-clan affiliated individual carried the
non-African haplogroup R1a1a (R-M198), and one carried haplogroup I (I-M170) (Table
3.1).
Table 3.1. Y chromosome haplogroup distribution for non-clan affiliated samples

| Haplogroup | Count (n=42) | Frequency |
|---|---|---|
| B2a1a1a1 (B-M152) | 2 | 4.76% |
| E1b1a1a1c1a (E-M191) | 9 | 21.43% |
| E1b1a1 (E-M2) | 8 | 19.05% |
| E1b1b1 (E-M35) | 1 | 2.38% |
| E2b1a (E-M85) | 20 | 47.62% |
| I (I-M170) | 1 | 2.38% |
| R1a1a (R-M198) | 1 | 2.38% |
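The frequencies in Table 3.1 follow directly from the counts and the column total of 42. The short Python sketch below recomputes them, which makes the rounding explicit and provides a check whenever the table is re-typed.

```python
# Counts from Table 3.1 (non-clan affiliated males, n = 42).
TABLE_3_1 = {
    "B2a1a1a1 (B-M152)": 2,
    "E1b1a1a1c1a (E-M191)": 9,
    "E1b1a1 (E-M2)": 8,
    "E1b1b1 (E-M35)": 1,
    "E2b1a (E-M85)": 20,
    "I (I-M170)": 1,
    "R1a1a (R-M198)": 1,
}


def percentages(counts, decimals=2):
    """Convert haplogroup counts to percentages of the column total."""
    total = sum(counts.values())
    return {hg: round(100 * n / total, decimals) for hg, n in counts.items()}


if __name__ == "__main__":
    print("n =", sum(TABLE_3_1.values()))
    for hg, pct in percentages(TABLE_3_1).items():
        print(f"{hg}: {pct}%")
```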
Figure 3.1. Phylogeny and frequency distribution of Y chromosome haplogroups observed i) in the abeLungu clan members and
ii) in the whole sample set, including non-clan affiliated samples. Nomenclature based on ISOGG (2016).
3.1.2 Y chromosome DNA haplotype variation
Altogether, 78 unique haplotypes were identified among the 188 male individuals using
the 19 Y-STR loci system (in the order DYS19, DYS385A, DYS385B, DYS388, DYS389I,
DYS389IIc, DYS390, DYS391, DYS392, DYS393, DYS426, DYS437, DYS438, DYS439, DYS448,
DYS456, DYS458, DYS635, GATA H4). These are summarised in Table 3.2(a) (non-African
ancestry) and Table 3.2(b) (African ancestry). To save space in Tables 3.2(a) and
3.2(b), an abbreviated haplotype-name format was used. For example, unique haplotype
4a within haplogroup R1b (R-M343) is named “R343_4a”. Throughout the remainder of the
text, however, haplotypes are named with the alphanumeric haplogroup name followed by
the unique haplotype number, for example ‘R1b_3’.
Network analysis allowed the relationships of haplotypes to be examined within
haplogroups R1b (R-M343), R1a1a (R-M198), I (I-M170), B2a1a1a1 (B-M152), E1b1a1
(E-M2), E1b1a1a1c1a (E-M191) and E2b1a (E-M85), each of which contained at least three
different haplotypes (Figures 3.2 - 3.8). Haplogroups that did not meet this
criterion, namely G (G-M201), J (J-M172), Q (Q-M242), E1b1b1 (E-M35) and A1b1b2a
(A-M51), were excluded from network analyses. In the networks, branch lengths are
proportional to the number of mutational steps between haplotypes, and the loci at
which mutations occur are indicated. The size of each circle is proportional to the
number of individuals carrying that haplotype. Nodes (positions in the network not
found in the sample but which connect haplotypes) are shown as solid dots.
In addition to the haplotype networks, the haplotype distribution within clans was
examined together with genealogical information. The genealogies were constructed from
oral histories by Ms Janet Kalis. They show the genealogical links between extant
males and their clan forefather, as well as their female relatives, including
(multiple) wives where applicable. Extended haplotype names were superimposed next to
the individuals tested (Appendix C, Figures S1-S6). In these pedigrees, triangles
represent males and circles represent females. This follows the convention used by
genealogists, which differs from that used by geneticists, where males are
represented by squares. Filled shapes indicate sampled subjects. Deceased individuals
are shown struck through and are included to clarify kinship. Lower-case suffix
letters differentiate between individual haplotypes under a modal haplotype.
No haplotype sharing was observed among clan members with European and Eurasian
derived haplogroups. However, several African haplotypes are shared between
individuals of different clans, namely haplotypes E2b1a_2, E2b1a_4, E2b1a_6, E2b1a_9,
E2b1a_16, E1b1a1a1c1a_10 and B2a1a1a1_2 (Table 3.2(b)). The presence of African
haplotypes within clan patrilines can be attributed to genetic input from males of
other Xhosa clans in the neighbouring region.
Note: Tables 3.2(a) and (b) each cover two pages to maximise resolution.
Table 3.2(a). Y chromosome STR haplotypes for the 13 abeLungu clans segregating with European
and Asian haplogroups R1b (R-M343), R1a1a (R-M198), Q (Q-M242), J (J-M172), I (I-M170) and G
(G-M201)
Table 3.2(a) continued. Frequencies of Y chromosome STR haplotypes for the 13 abeLungu clans
segregating with European and Asian haplogroups R1b (R-M343), R1a1a (R-M198), Q (Q-M242), J
(J-M172), I (I-M170) and G (G-M201)
Table 3.2(b). Y chromosome STR haplotypes within the 13 abeLungu clans segregating with African
haplogroups E-M85, E-M35, E-M2, E-M191, B-M152 and A-M51
Table 3.2(b) continued. Frequencies of Y chromosome STR haplotypes within the 13 abeLungu clans
segregating with African haplogroups E-M85, E-M35, E-M2, E-M191, B-M152 and A-M51
3.1.2.1 Y chromosome variation linked with Eurasian origins: haplotypic variation within the amaMolo
Matches to haplotypes are only as good as the data within the datasets, which do not
represent all Y chromosome haplotypes found globally. With that caveat, these data
support a non-African origin of the Y chromosome haplotypes in the amaMolo. Two
distinct Eurasian haplogroups, R1a1a (R-M198) and Q (Q-M242), were observed within
the amaMolo clan genealogy, which spans 11 generations [Table 3.2(a) and Appendix C,
(Figure S1)]. Five haplotypes are associated with haplogroup R1a1a (R-M198) (Table
3.2(a)), and their relationship with the modal haplotype R1a1a_2 is shown in Figure
3.2 below.
Figure 3.2. Haplogroup R1a1a (R-M198) RMJ network
Comparison with the amaMolo clan genealogy showed that the R1a1a haplotype was
successfully transmitted in 52 out of 74 transmissions. The modal haplotype R1a1a_2
was transmitted to 16 currently living amaMolo clan members descended from Bhayi.
Similarly, the modal haplogroup Q (Q-M242) haplotype was transmitted successfully in 18
out of 28 transmissions from Pita (Appendix C, Figure S1). Using 19-STR-marker
haplotypes, the closest match to the modal haplotype R1a1a_2 was an individual of
Hungarian descent, differing by single mutations at three loci (Pamjav et al., 2011).
When the YHRD minimal eight-STR marker set was queried, identical matches were found
in 19 individuals. Most came from Eastern European countries, namely Croatia,
Lithuania, Russia, Poland, Romania, Slovakia and Ukraine, and the remainder from
Belgium, Germany, Norway and the USA. In the amaMolo clan genealogy, haplotype
R1a1a_5 is seen in an individual described as the brother of an individual carrying
the modal haplotype R1a1a_2 (Appendix C, Figure S1). Although both haplotypes fall
under the R-M198 SNP marker, the STR variation between them is too great for the two
individuals to be considered biologically related through the paternal line. Thus the
presence of the R1a1a_5 haplotype supports the theory of multiple contributions from
several non-African founders.
3.1.2.2 Haplotypic variation within the primary abeLungu clans
The Y chromosome data reaffirm the historical and genealogical information regarding
the multiple contributions of male founders to the primary abeLungu clans, as well as
to clans within the broader abeLungu clan family. The primary abeLungu clans are those
believed to be the first to have originated from non-African shipwreck survivors:
abeLungu Jekwa, Buku and Hatu.
Within clan Jekwa, 13 haplotypes were identified, seven of which were within
haplogroup R1b (R-M343). The majority of clan Jekwa members are closely associated
with the modal haplotype R1b_5 in a star-shaped phylogeny, suggestive of recent
expansions (Figure 3.3, Table 3.2(a)). The modal haplotype R1b_5 was successfully
transmitted in 81 out of 100 transmissions in the abeLungu Jekwa clan genealogy. This
genealogy spans 12 generations back to the clan forefather, Jekwa (Appendix C, Figure
S2).
Two inconsistencies were observed in the transmission of Y chromosome lineages in the
Jekwa clan genealogy (marked with red squares in Appendix C, Figure S2). In both
instances, siblings carried African Y chromosome haplotypes (E2b1a_9 and E2b1a_5),
although they were recorded as sons of fathers carrying European haplogroup R1b
(R-M343). These are accounted for as non-patrilineal transmissions.
The abeLungu R1b haplotypes appear unique when compared with both the comparative
haplotype data and the YHRD. Haplotype R1b_5 did not exactly match any of 33 Eastern
Europeans (Jarve et al., 2009), nor any of 112 Hungarians genotyped by Pamjav et al.,
(2011). HGDDRL studies have published 17-Y-STR-marker haplotype data, and against
these data haplotype R1b_5 only partially matched a single coloured male from
Uitenhage in the Eastern Cape, differing by single-step mutations at seven loci. When
the YHRD minimal eight-STR marker set was queried, identical matches were found in
eight individuals from Spain, Switzerland, Germany, the Czech Republic, the United
Kingdom and the United States.
Descendants of the Hatu clan were linked with haplotype R1b_18, which was found in 10
abeLungu Hatu individuals. It was successfully transmitted in 33 out of 38 paternal
transmissions in the Hatu clan genealogy, which spans 12 generations (Appendix C,
Figure S4). R1b_18 and its associated haplotype R1b_17 make up the second most
prominent cluster in the R1b network. R1b_17 deviates from the modal haplotype by a
single-step mutation at Y-STR locus DYS389II. The two haplotypes are more closely
related to each other than to the R1b haplotypes observed in other clans (Figure 3.3
and Appendix C, Figure S4).
Two abeLungu Hatu individuals carried African Y chromosome haplotypes, one in
haplogroup E2b1a and the other in haplogroup E1b1a1. Both were recorded as sons of a
father carrying European haplogroup R1b. These too are regarded as non-patrilineal
transmissions, which may have been introduced by males of African origin from
neighbouring clans (Table 3.2(b) and Appendix C, Figure S4).
Figure 3.3. Haplogroup R1b (R-M343) RMJ network
Two out of three abeLungu Buku clan members shared exactly the same Eurasian
haplogroup G (G-M201) haplotype (Table 3.2(a)). This haplotype was transmitted
successfully in six out of nine paternal transmissions in the Buku clan genealogy.
Genealogical information was incomplete for abeLungu Buku, so it is uncertain exactly
how many generations separate the Buku lineages from the original founding father.
The number is estimated to be at least four generations (Appendix C, Figure S5). The
closest matches to the abeLungu Buku haplogroup G (G-M201) haplotypes were a
Betsileo-Malagasy individual and a coloured male from Uitenhage. There was also a
partial match to an Egyptian individual, differing at seven loci. All three were found
in HGDDRL in-house data from various projects (see Appendix E, Table S3: comparative
data sources).
3.1.2.3 Haplotypic variation within the secondary abeLungu clans
Non-African ancestry was also observed in clans other than the three initial founder
abeLungu clans and the amaMolo. According to the oral history, these secondary
clans originated from founders who arrived in Pondoland at later points in time than
the three original abeLungu founders Jekwa, Buku, Hatu and the founders of the
amaMolo.
Three Ogle clan members carried two R1b haplotypes, R343_8 and R343_14. These
segregated as lineages distinct from each other and from the majority of R1b (R-M343)
haplotypes, which may imply two independent founders (Figure 3.3; Table 3.2(a);
Appendix C, Figure S5).
Haplotype R343_16 and its related variant R343_15, which differs at one STR locus,
form the lineage tracing back six generations to Kristjan Caine, the founder of the
Caine clan (Figure 3.3, Table 3.2(a) and Appendix C, Figure S3). The two Irish clan
members carried a distinct R1b haplotype, R343_9, tracing back to the clan founder,
Irish. He lived only two generations before current clan members, which supports the
theory that multiple founding events occurred at later points in time (Figure 3.3,
Table 3.2(a) and Appendix C, Figure S3).
Three Sukwini samples fall within the cluster of haplotypes R343_10, R343_11 and
R343_12 (Figure 3.3, Table 3.2(a) and Appendix C, Figure S6). Genealogical information
for these three individuals is unknown, so their relationship cannot be established.
Examination of the Horner clan genealogy revealed two distinct Eurasian lineages
(Appendix C, Figure S3). The R1b haplotype R343_2 stems from Johnson, the son of
Alfred Horner’s first marriage (Appendix C, Figure S3 and Figure 3.3). The wives of
his subsequent marriages bore the sons Ramsay, Charlie and Teddy, all of whom carry
haplogroup I (I-M170) Y chromosomes. Their modal haplotype is I170_5, and haplotypes
I170_4 and I170_6 deviate from it by single mutations (Figure 3.4 and Appendix C,
Figure S3). This may signify two lineages within the clan, introduced by separate
founders. A second haplogroup I-M170 lineage, I170_3, distinct from those found in
abeLungu Horner, was found within the abeLungu France clan. It traces back five
generations to the France clan founder, Tshali (Appendix C, Figure S5 and Figure 3.4).
Figure 3.4. Haplogroup I (I-M170) RMJ network
When examined against comparative data, the two haplogroup I (I-M170) modal
haplotypes had several matches. Haplotype I_3, found in four France clan members,
partially matched two individuals, from Archangelskaja and Vologodskaja respectively,
published in Roewer et al., (2008). Haplotype I_5, present in one of the four Horner
individuals, likewise partially matched individuals from Novgorodskaja and
Vologodskaja. Haplotype I_3 of clan France also partially matched a coloured
individual from Uitenhage in the Eastern Cape. Haplotype I_5 of clan Horner partially
matched a Xhosa individual from the Eastern Cape, as well as a Hungarian individual
whose haplotype data were published in Pamjav et al., (2011).
To trace the origins of the clan names of the founders of the Irish, Horner and France
clans, several publications and databases were consulted. These included the Type-III
Irish Surname-Haplotype Reference Database, as well as the genealogical studies by
McEvoy and Bradley (2006), McEvoy et al. (2008), Klyosov (2009) and King and Jobling
(2009), which examined Y chromosome haplotype data of Irish and British surnames. The
two abeLungu Irish clan members carried haplogroup R1b (R-M343) haplotypes that almost
completely matched the ancestral Irish R1b haplotype identified in Klyosov (2009) and
McEvoy et al., (2008). The haplotypes differed from the published haplotype only by
single mutations at Y-STR loci DYS390 and DYS439, which allows the inference that
these individuals have Irish ancestral origins.
The Type-III Irish surname-haplotype reference database lists haplotypes for the whole
haplogroup I tree, including all sub-clades, most of which are genealogically mapped
to lineages of Irish surnames. The Jim Cullen haplogroup I sub-clade predictor,
available through the site, was used to query the five haplogroup I (I-M170)
haplotypes that had been resolved. Haplotypes I_1, I_2 and I_3 (found in clan France)
fall into the I-M253 sub-clade. Haplotypes I_4 and I_5 (both found within clan
Horner) fall under the I-S24 sub-clade, with 100% confidence. None of these
haplotypes segregated under the Dalcassian R-L226 cluster of haplogroup I (I-M170)
from Clare, Limerick and Tipperary, which is the predominant lineage of Ireland.
However, they did match haplogroup I (I-M170) haplotypes found at high frequencies in
Ireland and Western Europe. Considering their Y chromosome haplogroups of western
European origin (haplogroups R1b (R-M343) and G (G-M201)), it is therefore likely
that the forebears of clans Hatu, Jekwa and Buku were from the British Isles. It is
not possible to say definitively whether these forefathers were the three Englishmen
discovered with Bessie (Crampton, 2004). There is no genealogical evidence or
biological link to her maternal lineage with which to verify a common geographical
ancestry.
The sole haplogroup J (J-M172) haplotype observed in the sample set was carried by the single abeLungu Thaka clan member. This haplotype partially matched a Vologodskaja and Smolenskaja haplogroup J haplotype published in Roewer et al. (2008), differing from it by single-step mutations at eight Y-STR loci. It also partially matched a Maldivian individual and a Malagasy individual genotyped in studies undertaken by the HGDDRL. When truncated 12-locus haplotype data were examined, the J-M172 haplotype matched one of 70 Malaysian-Indian haplotypes published in Pamjav et al. (2011).
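The partial and truncated matches described above amount to counting, locus by locus, how many repeat units separate two Y-STR profiles, optionally on a reduced set of loci. The Python sketch below illustrates that comparison under simple assumptions: haplotypes are stored as dictionaries of locus name to integer repeat count, the function name is hypothetical, and the allele values are invented for illustration rather than taken from this study. It ignores complications such as intermediate alleles that a real database query would have to handle.

```python
def compare_haplotypes(h1, h2, loci=None):
    """Compare two Y-STR haplotypes given as {locus: repeat_count} dicts.

    Only loci typed in both haplotypes (restricted to `loci`, if given)
    are compared. Returns the number of loci compared, the number of
    mismatching loci, how many mismatches are single-step differences,
    and the total number of repeat steps between the two haplotypes.
    """
    candidates = loci if loci is not None else list(h1)
    shared = [locus for locus in candidates if locus in h1 and locus in h2]
    steps = [abs(h1[locus] - h2[locus]) for locus in shared]
    return {
        "loci_compared": len(shared),
        "mismatches": sum(1 for s in steps if s > 0),
        "single_step": sum(1 for s in steps if s == 1),
        "total_steps": sum(steps),
    }


# Hypothetical allele values, for illustration only.
query = {"DYS19": 14, "DYS390": 23, "DYS391": 10,
         "DYS392": 11, "DYS393": 12, "DYS439": 12}
reference = {"DYS19": 14, "DYS390": 24, "DYS391": 10,
             "DYS392": 11, "DYS393": 13, "DYS439": 12}

full = compare_haplotypes(query, reference)
# Truncated comparison on a hypothetical subset of four loci.
truncated = compare_haplotypes(query, reference,
                               loci=["DYS19", "DYS391", "DYS392", "DYS439"])
print(full)       # two single-step mismatches (DYS390 and DYS393)
print(truncated)  # no mismatches on the truncated set
```

The truncated call shows why an exact match on a reduced number of loci is weaker evidence than a match on the full profile: the two differing loci simply drop out of the comparison.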
Fuzwayo, Hastoni and Sukwini clan members featured predominantly African
haplotypes (Figures 3.5, 3.6, 3.7 and 3.8, and Appendix C, Figure S6).
The judicious evaluation of genealogical history against molecular data has allowed us to determine, for each clan, the ancestral haplotype that could convincingly be traced to the clan progenitors documented in clan genealogies (Table 3.3).
Table 3.3: Haplotypes and presumed geographic origin of abeLungu clan male founders
Clan | Founder | Haplogroup | Ancestral haplotype | Presumed geographic origin
amaMolo | Bhayi | R1a1a | R1a1a_2 | Eurasia (*Most likely India)
amaMolo | Pita | Q | Q_1 | Eurasia
Jekwa | Jekwa | R1b | R1b_5 | western Europe (*Most probably British Isles)
Hatu | Hatu | R1b | R1b_18 | western Europe (*Most probably British Isles)
Buku | Buku | G | G_2 | Eurasia
Ogle | Ogle | R1b | R1b_14 | western Europe (*Most probably British Isles)
Caine | Kristjan | R1b | R1b_16 | western Europe (*Most probably British Isles)
Irish | Irish | R1b | R1b_9 | western Europe (*Most probably Ireland)
Horner (Alfred) | Johnson | R1b | R1b_2 | western Europe (*Most probably British Isles)
Horner | Ramsay, Charlie and Teddy | I | I_5 | Europe
France | Tshali | I | I_3 | Europe
Thaka | Thaka | J2 | J2_1 | Eurasia (*Middle-East)
Fuzwayo | Fuzwayo | B2a1 | B2a1a1_3 | Southern African
Hastoni | Hecton | E2b1 | E2b1a_6 | Southern African
Sukwini (Hastoni) | Chwama | E2b1 | E2b1a_9 & E2b1a_11 | Southern African
3.1.2.4 Y chromosome variation linked with African origins
African haplotypes were found predominantly among non-clan affiliated samples, but were also observed sparsely throughout the clan-affiliated sample. African haplotypes were the only haplotypes found shared across clans and with non-clan affiliates alike. The complexity of the African haplogroup networks is attributed to the fact that haplotypes within African haplogroups are separated by many mutational steps, indicating multiple contributions. The RMJ networks of all non-African (Eurasian) haplogroups do not feature reticulations; they are all tree-like forms based on star-contraction phylogenies, with satellite variants emerging from an ancestral, founder, modal haplotype (Richards et al., 1998; Bandelt et al., 1999). However, the RMJ networks for the African haplogroups E1b1a1a1c1a (E-M191), E2b1a (E-M85) and E1b1a1 (E-M2) do feature reticulations, which signify ambiguous connections between haplotypes (Bandelt et al., 1999) (Figures 3.5, 3.6 and 3.7). These reticulations introduce doubt about the order in which mutations arose, the topology of the network and, consequently, the chronology of the introduction of haplotypes; they are also indicative of more ancient, deeply rooted haplotypes featuring greater diversity (Richards et al., 1998).
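A reticulation is a closed loop in the network: two or more equally short paths connect the same haplotypes, so the order of mutations cannot be resolved. The Python sketch below illustrates this idea in a deliberately simplified form, assuming haplotypes are tuples of repeat counts and linking only observed haplotypes that differ by a single repeat step. Unlike the reduced-median-joining (RMJ) procedure used for the networks in this chapter, it neither infers median vectors nor weights loci; the function names and example haplotypes are hypothetical.

```python
from itertools import combinations


def step_distance(a, b):
    """Total repeat-step difference between two equal-length STR tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def has_reticulation(haplotypes):
    """Return True if linking all one-step neighbours creates a cycle.

    Uses union-find: an edge that joins two haplotypes already connected
    through other edges closes a loop, i.e. a reticulation.
    """
    parent = {h: h for h in haplotypes}

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving
            h = parent[h]
        return h

    for a, b in combinations(haplotypes, 2):
        if step_distance(a, b) == 1:
            root_a, root_b = find(a), find(b)
            if root_a == root_b:
                return True
            parent[root_a] = root_b
    return False


# Star-like: three satellites each one step from a single modal type.
star = [(14, 23, 10), (15, 23, 10), (14, 24, 10), (14, 23, 11)]
# Square: two mutations in either order give two equally short paths.
square = [(14, 23, 10), (15, 23, 10), (14, 24, 10), (15, 24, 10)]
print(has_reticulation(star))    # False
print(has_reticulation(square))  # True
```

The square case is the simplest reticulation: the fourth haplotype can be reached from the first via either intermediate, so the order in which the two mutations occurred is ambiguous, as described above for the African haplogroup networks.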
The remaining haplotypes found within the abeLungu Jekwa clan were of African descent and included five E2b1a (E-M85) haplotypes. One haplogroup B2a1a1a1 (B-M152) haplotype and one haplogroup E1b1a1 (E-M2) haplotype were also observed in the abeLungu Jekwa pedigree (Table 3.2(b) and Figures 3.7 and 3.8). These were most likely due to non-patrilineal transmissions resulting in gene flow and admixture from the indigenous Xhosa gene pool.
Three amaMolo clan members, one Fuzwayo clan member and two Sukwini members segregate with haplogroup E1b1a1a1c1a (E-M191) haplotypes, while a larger proportion of haplogroup E-M191 chromosomes is present in non-clan affiliated samples, appearing as satellite haplotypes in the complex and diverse haplogroup E1b1a1a1c1a (E-M191) RMJ network (Figure 3.5). Haplotype E1b1a1a1c1a_10 matched the haplotype of a Khoisan individual from Upington in the Northern Cape (haplogroup and haplotype data from research done in the HGDDRL).
Figure 3.5. Haplogroup E1b1a1a1c1a (E-M191) RMJ network
Haplogroup E2b1a (E-M85) chromosomes showed greater diversity, the largest proportion of which were present in non-clan affiliated samples (39.0%), with seven individuals sharing the identical haplotype E2b1a_9 (Table 3.2(b) and Figure 3.6). Haplotype E2b1a_9 is a modal haplotype within haplogroup E2b1a (E-M85), appears to have given rise to 12 variant haplotypes, and pre-dates the arrival of non-African Y chromosomes (Figure 3.6). This haplotype was also shared exactly by one individual each in the Buku, amaMolo, abeLungu Jekwa and Sukwini clans (Figure 3.6). Haplotype E2b1a_6 is also shared within the amaMolo clan and by all of the members of the Hastoni clan (Figure 3.6). Haplotype E2b1a_9 also matched a coloured individual from Middelburg in the Eastern Cape, genotyped in a previous study by the HGDDRL unit. A contingent of haplogroup E2b1a (E-M85) haplotypes was also present in all five members of the Hastoni sample, and in four Sukwini samples.
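Identifying a modal haplotype and its one-step satellites, as described above for E2b1a_9, can be sketched in Python as follows. This is an illustration under simple assumptions (haplotypes as tuples of repeat counts, distance as the sum of absolute repeat differences); the function names and sample are hypothetical, and the code is not the procedure used to build Figure 3.6. Note that it counts only direct one-step neighbours, whereas the variants derived from a modal type in a network may lie several steps away.

```python
from collections import Counter


def step_distance(a, b):
    """Sum of absolute repeat differences between two equal-length STR tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def modal_and_satellites(sample):
    """Return the modal haplotype, its count, and its distinct one-step variants."""
    counts = Counter(sample)
    modal, n_modal = counts.most_common(1)[0]
    satellites = [h for h in counts if step_distance(h, modal) == 1]
    return modal, n_modal, satellites


# Hypothetical sample of seven chromosomes typed at three loci.
sample = [(13, 24, 10)] * 3 + [(14, 24, 10), (13, 25, 10),
                               (13, 24, 11), (15, 24, 10)]
modal, n_modal, satellites = modal_and_satellites(sample)
print(modal, n_modal, satellites)
```

Here (15, 24, 10) is two steps from the modal type, so in a network it would attach to the one-step satellite (14, 24, 10) rather than directly to the modal node.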
Figure 3.6. Haplogroup E2b1a (E-M85) RMJ network, featuring two reticulations marked A and B
Haplogroup E1b1a1 (E-M2) haplotypes featured predominantly in non-clan affiliates, but were observed sporadically in several clans as well (Figure 3.7). Haplotype E1b1a1_8, found in three Sukwini samples, matched an individual from Northern Chad (unpublished data) as well as two Malagasy individuals.
Figure 3.7. Haplogroup E1b1a1 (E-M2) RMJ network
Four of six members of the Fuzwayo clan possessed haplogroup B2a1a1a1 (B-M152) chromosomes (Appendix C, Figure S6 and Figure 3.8). Haplotype B2a1a1a1_3 matched a Khoisan individual sampled in Riversdale in the Western Cape. Lastly, the haplogroup A1b1b2a (A-M51) haplotype A1b1b2a_1, shared by two Sukwini subjects, matched a Northern Cape Khomani individual as well as a coloured male from Uitenhage in the Eastern Cape.
Figure 3.8. Haplogroup B2a1a1a1 (B-M152) RMJ network
3.2 Mitochondrial DNA findings
3.2.1 MtDNA haplogroups
All maternal lineages of clan members and non-clan affiliated individuals are exclusively of African origin, with the majority falling within haplogroups L0d (32.34%) and L3e (23.75%) (Table 3.4). The frequencies of the major macro-haplogroups resolved are L0 (44.44%), L1 (2.02%), L2 (19.7%), L3 (33.35%) and L4 (0.5%). The frequencies observed for mtDNA sub-haplogroups are presented in Table 3.4, and the distribution of haplogroups by clan is shown in the charts of Figure 3.9.
L0 clades reach frequencies of up to 40% in different south-eastern Bantu-speaking tribes (Schlebusch et al., 2009; Schlebusch et al., 2011). Haplogroup L0d is thought to be the oldest of the L0 clades, and its distribution in southern Africa strongly points to an origin among Khoe-San ancestors prior to the arrival of Bantu-speaking populations in southern Africa; it is found in the !Xun and Khwe peoples at frequencies of 51% and 16% respectively (Salas et al., 2002; Schlebusch et al., 2009). Haplogroup L3e comprises approximately one-third of all L3 types in sub-Saharan Africa and is the most widespread, frequent and ancient of the African L3 clades; it arose in Central Africa near Sudan around 35,000 years ago (Bandelt et al., 2001; Soares et al., 2011).
Figure 3.9. Distribution of mtDNA haplogroups by clan
Figure 3.9 continued. Distribution of mtDNA haplogroups by clan
Table 3.4. MtDNA haplogroup frequencies
3.2.2 MtDNA haplotype diversity
The evolutionary relationships of the African mtDNA haplotypes are depicted in the neighbour-joining (NJ) tree, drawn using the mtDNA HVR I and HVR II sequences (Figure 3.10). MtDNA haplotype sharing between clans is observed to a much larger degree than Y chromosome haplotype sharing: none of the non-African Y chromosome haplotypes were shared between clans, and only a few African Y chromosome haplotypes were shared among clans.
From the 198 individuals studied, 117 unique mtDNA haplotypes were observed, 23 of which were shared between multiple individuals, both within and among clans (Appendix D, Supplementary Table S2). Sixteen of these haplotypes were shared between individuals from different clans, and in 14 instances haplotypes were also shared with non-clan affiliated samples. The two most frequent haplotypes are those which appear in a haplogroup L3e2b clade with HVR mutations 16172C, 16183C, 16189C, 16223T, 16320T, 16519C, 73G, 150T, 152C, 263G (appearing at a frequency of 0.068); and a clade within haplogroup L3e2b featuring the variants 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C, 73G, 146C, 152C, 195C, 198T, 247A, occurring at a frequency of 0.0625 (Appendix D, Supplementary Table S2). Of the 21 sibling pairs, two pairs did not share the same maternal (mtDNA) haplotype: a clan France sibling pair segregated with haplogroups L0d2a and L0d1a, while an amaMolo sibling pair segregated under haplogroups L0d1a and L3e2b, which is suggestive of non-maternity, alternative modes of kinship or adoption (Appendix C, Figures S1 and S5).
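The counts of unique and shared mtDNA haplotypes reported above follow from grouping individuals by identical HVR variant lists. A minimal Python sketch, assuming each record is a hypothetical (individual, clan, variants) tuple and that two haplotypes are identical when their variant sets are equal (this ignores, for example, heteroplasmy and length variants); the function name and records are invented for illustration:

```python
from collections import defaultdict


def sharing_summary(records):
    """Summarise haplotype sharing from (individual, clan, variants) records.

    Variant lists are converted to frozensets so that identical haplotypes
    compare equal regardless of the order in which variants are listed.
    """
    clans_by_hap = defaultdict(set)
    people_by_hap = defaultdict(set)
    for person, clan, variants in records:
        key = frozenset(variants)
        clans_by_hap[key].add(clan)
        people_by_hap[key].add(person)
    shared = [h for h, people in people_by_hap.items() if len(people) > 1]
    across_clans = [h for h in shared if len(clans_by_hap[h]) > 1]
    return {"unique": len(people_by_hap),
            "shared": len(shared),
            "shared_across_clans": len(across_clans)}


# Hypothetical records; "non-clan" marks a non-clan affiliated sample.
records = [
    ("p1", "Jekwa", ["16223T", "16320T", "73G"]),
    ("p2", "Hatu", ["73G", "16223T", "16320T"]),
    ("p3", "Hatu", ["16187T", "16189C", "16230G"]),
    ("p4", "Hatu", ["16187T", "16189C", "16230G"]),
    ("p5", "non-clan", ["16129A", "16223T"]),
]
summary = sharing_summary(records)
print(summary)
```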
Figure 3.10. Neighbour-Joining (NJ) phylogenetic tree of 176 mtDNA haplotypes
The mtDNA-haplotype NJ tree shows distinct haplogroups clustering in the expected evolutionary topology, with all abeLungu mtDNA haplotypes rooted by the Neanderthal mtDNA out-group (Figure 3.10). Haplogroup L0 sequences branch off into their sub-clades, while L2 sequences are coupled near L3 sequences, which cluster alongside the revised Cambridge Reference Sequence (rCRS). The rCRS, defined by its variant sites as a haplogroup H sequence, clusters alongside haplogroup L3e sequences, its closest relatives in the tree. This is consistent with the evolutionary history of mtDNA, as haplogroup L3 is the clade that migrated north during the Out of Africa expansion and gave rise to all haplogroups found outside Africa (Behar et al., 2008). Clades in which multiple individuals share exactly the same haplotype have been collapsed, with the number of individuals sharing each haplotype indicated in brackets (Figure 3.10).
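For readers unfamiliar with the neighbour-joining method, the Python sketch below implements the standard Saitou-Nei algorithm on a distance matrix, together with a simple pairwise distance between mtDNA haplotypes (the number of variant sites not shared). This is an illustrative assumption, not the software or distance model used to draw Figure 3.10; the function names are hypothetical, and the matrix in the example is a small worked example commonly used to demonstrate the method, not abeLungu data.

```python
def variant_distance(h1, h2):
    """Number of variant sites (relative to the rCRS) not shared by two haplotypes."""
    return len(set(h1) ^ set(h2))


def neighbour_joining(labels, d):
    """Saitou-Nei neighbour joining.

    labels: taxon names; d: nested dict with d[a][b] == d[b][a].
    Returns (newick, joins), where joins lists (a, b, len_a, len_b)
    for each pair merged into a new internal node.
    """
    D = {a: {b: d[a][b] for b in labels if b != a} for a in labels}
    trees = {a: a for a in labels}  # Newick fragment for each active node
    nodes, joins = list(labels), []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimises the Q-criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        len_a = D[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = D[a][b] - len_a
        u = f"node{len(joins) + 1}"
        D[u] = {}
        for k in nodes:
            if k not in (a, b):
                D[u][k] = D[k][u] = (D[a][k] + D[b][k] - D[a][b]) / 2
        trees[u] = f"({trees[a]}:{len_a:g},{trees[b]}:{len_b:g})"
        joins.append((a, b, len_a, len_b))
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    half = D[a][b] / 2  # root arbitrarily at the midpoint of the last edge
    return f"({trees[a]}:{half:g},{trees[b]}:{half:g});", joins


labels = ["a", "b", "c", "d", "e"]
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
d = {x: {} for x in labels}
for (x, y), v in pairs.items():
    d[x][y] = d[y][x] = v

newick, joins = neighbour_joining(labels, d)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
print(newick)
```

With real data, the matrix would be filled with `variant_distance` values between HVR variant lists; a production analysis would normally also use an explicit substitution model and an out-group for rooting rather than the midpoint of the final edge.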
CHAPTER 4
Discussion
4.1 Y chromosomes and genetic heritage
Y chromosome markers were used to trace the paternal ancestry of the abeLungu from the Wild Coast region of the Eastern Cape. This study was conceived following discussions with Ms Janet Kalis, an anthropologist who has been conducting genealogical and historical research among the abeLungu, to test the claims, based on their oral history, of their “White” ancestry. Our laboratory subsequently engaged in collaborative research with Ms Kalis to use genetic tools to test and/or refine the oral and historical narrative. In this study, Y chromosomes in a subset of males representing various clans among the abeLungu were tested to derive their Y chromosome profiles, which were then used to trace the most likely geographic region of origin of their Y chromosomes. Given that the abeLungu are a patriarchal society, Y chromosomes, which are transmitted from fathers to sons, ought to segregate within clans. Therefore, in the first part of the study, Y chromosome data were used to test the oral history of the abeLungu, and also to examine judiciously the transmission of Y chromosomes within the clans in order to assess genealogical relationships. In addition to determining the geographic regions of origin of the abeLungu, further analysis of microsatellite haplotypes within and between clans, as well as between clan members and Xhosa non-clan affiliates, allowed us to measure a degree of population substructure that displays the diversity patterns typical of founder populations.
4.1.1 Y chromosomes and the founding fathers of the abeLungu
Throughout, the transmission of haplotypes generally remains consistent with the transmission of the clan name in the genealogies. However, the oral history, which has been transferred across generations and depends solely on the memory of the present-day individuals who represent it, has exhibited ambiguity and distortion over time in the names, chronology and relations of clan members to their ancestors. This study used Y chromosome and mtDNA data, in conjunction with clan-affiliated genealogical data, to refine several anthropological questions pertaining to the history of the abeLungu.
Following SNP analysis of 146 abeLungu clan members, we were able to classify their Y chromosomes, on the basis of the global distribution patterns of their haplogroups, as African (30.12%) or non-African (69.86%) in origin. The commonest haplogroup was R1b (R-M343), which was found at a frequency of 41.10% (Figure 3.1). The amaMolo are associated with two haplogroups: the first, R1a1a (R-M198), an Eastern European haplogroup, and the second, haplogroup Q (Q-M242), of West Asian origin. While the abeLungu (Jekwa, Hatu, Buku) and the amaMolo are considered to be the earlier mixed-race clans of mPondoland, there are other mixed-race clans whose origins stem from more recent non-African contributions to the mPondo gene pool (Soga, 1930; Crampton, 2004). These are represented in our study by four France clan members and six Horner individuals, who were found to have haplogroup I (I-M170) chromosomes. In addition, two-thirds of Buku clan individuals segregated with haplogroup G (G-M201) chromosomes, which provides further evidence for non-African origins. A single Thaka clan member presented a haplogroup J-M172 profile which matches two Eastern European individuals of Semitic origin in the YHRD, but carries neither the CMH (the Cohen Modal Haplotype, associated with the Kohanim, the Jewish priestly lineage) nor the extended CMH (defined by 12 specific Y-STRs), as described in Soodyall (2013).
Two lines of evidence demonstrate a remarkable relationship between Y-chromosomal haplotypes and patrilineally inherited cultural markers (clan names): first, the low within-clan diversity and high non-African haplotype sharing within abeLungu clans (Tables 3.2(a) and 3.2(b), as well as Figures 3.2 – 3.8); and second, the high degree of haplotypic variance observed between clans, and in particular between Xhosa non-clan affiliates and clan members (Tables 3.2(a) and 3.2(b), as well as Figures 3.2 – 3.8). Together, these also demonstrate the powerful male-specific founder effects of the European and Eurasian castaways.
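One standard way to put a number on "low within-clan diversity" is Nei's unbiased haplotype diversity, H = n/(n - 1) × (1 - Σ pᵢ²), where n is the number of chromosomes sampled and pᵢ is the frequency of haplotype i. The chapter does not state that this statistic was computed, so the Python sketch below, using hypothetical haplotype labels and sample sizes, only illustrates how such a within-clan versus mixed-group comparison could be quantified:

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Hypothetical samples of six chromosomes each.
founder_clan = ["H1"] * 5 + ["H2"]                  # one dominant founder haplotype
mixed_group = ["H1", "H2", "H3", "H4", "H5", "H6"]  # every chromosome different
print(round(haplotype_diversity(founder_clan), 3))
print(round(haplotype_diversity(mixed_group), 3))
```

A clan dominated by a single founder haplotype scores close to 0, while a group in which every chromosome differs scores 1, which is the contrast between clan members and non-clan affiliates described above.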
Regarding the rarity of haplotype matches in databases and the published literature, the primary database queried was the Y Haplotype Reference Database (YHRD). When abeLungu Y-haplotypes were queried against all populations of the YHRD using the maximum number of STRs, matches proved difficult to find, and abeLungu modal haplotypes were found to cluster independently of their nearest-match haplotypes, indicating their rarity and uniqueness.
Upon examining the abeLungu clan genealogies in conjunction with haplotype data, several irregular aspects were discovered, while others were clarified. Clans featuring Y chromosome haplogroups that could conclusively be traced back to non-African founder individuals include the amaMolo, as well as the older abeLungu clans Jekwa, Hatu and Buku. Non-African ancestry also featured in the more recently established abeLungu clans Ogle, Caine, Irish, Horner, France and Thaka; however, the forebears of these clans were introduced from Europe and Eurasia later than those of the three primary abeLungu clans and the amaMolo, as indicated by the differing time-depths of the clan genealogies [Tables 3.2(a) and 3.2(b); (Appendix C; Figures S1-S6)].
European haplogroup R1b (R-M343) features in the primary abeLungu clans Jekwa and Hatu, as well as in the more recently established clans Caine, Irish, France, Horner, Ogle and Sukwini. Similarly, Eurasian haplogroups R1a1a (R-M198) and Q (Q-M242) feature in the amaMolo clan. European (and probably British) haplogroup I (I-M170) is found in France and Horner clan members, while haplogroup G (G-M201) Y chromosomes feature in present-day Buku clan members, which indicates a more Middle-Eastern or South-West Asian ancestry. The fact that the vast majority of abeLungu and amaMolo clan members present with European and Eurasian ancestry ultimately supports the cherished, though distorted, narrative of their origins passed down for some ten generations, which states that their forebears were originally from very distant shores.
Conflicting versions of the oral history exist about the origins of the three men who arrived with Bessie at Lambasi Bay (Soga, 1930; Crampton, 2004). A degree of clarity on the relationships among the castaways has now been achieved, where previously contradictory oral-historical details gave grounds for suspicion. Older beliefs were that they were black or Indian men, whereas in more recent recollections they are considered white (Soga, 1930; Crampton, 2004). The primary abeLungu clan founders Jekwa, Hatu and Buku were thought to share a common, white ancestry and to have survived the same shipwreck (Soga, 1930; Crampton, 2004). Crampton (2004, p. 12) noted that “Theories of the clan’s origins are linked to the story of the arrival of a young girl named Bessie on the Wild Coast along with three white men…the abeLungu proclaim that they are descendants of white European castaways...”.
Soga (1930) described the abeLungu clan forefathers Buku and Jekwa as both being descendants of Mbomboshe, in which case we would expect their Y chromosome profiles to be the same. On the contrary, our data reveal that Jekwa clan individuals segregated with R1b (R-M343) chromosomes of Western European ancestry, while haplogroup G (G-M201) chromosomes of Eurasian origin were genotyped in 66% of abeLungu Buku clan members. These differing geographical origins show that Buku was not a blood relative of Jekwa through Mbomboshe, as perceived in the oral history.
Soga (1930) also described Hatu and Jekwa as clans with two independent origins from shipwreck survivors. Support for this claim is provided by the relationships among Y-haplotypes in the network for haplogroup R1b (R-M343), in which a clear separation of at least 10 mutational steps was observed between haplotypes of individuals from the Jekwa and Hatu clans (Figure 13, Table 11a and Appendix C, Figures S2 & S4).
Further support for Soga’s theory that the original founders were white castaways comes from the abeLungu clan-founder names. Kirby (1953) and Soga (1930) noted that corruption of language through translation seems evident in the interaction between isiXhosa-speaking individuals and Dutch- and English-speaking individuals, primarily owing to the phonetic differences between these languages. It is therefore possible that mPondo clanspeople distorted the pronunciation of the names of castaways, and of terms from other languages such as Dutch and English, much as isiXhosa has borrowed and adapted words from Khoisan click languages (Soga, 1930; Knight et al., 2003; Schlebusch et al., 2009; Schlebusch et al., 2011). If abeLungu ancestor names were derived from corrupted Anglicised roots, then the inherited names of the three English men may have been “Xhosa-ised” versions of their English names. Crampton (2004) suggests that Badi may originally have been Willem Billyert (or Bill Elliot), Hatu may have been Hendrik Clarke (or Henry Clark) and Jekwa was most probably Thomas Miller: the three men who first assimilated with mPondo clans, and the same men who were believed to have arrived on the Wild Coast with Bessie.
The fact that the descendants of Hatu and Jekwa carry differing haplogroup R1b (R-M343) haplotypes illustrates that these two patrilines stem from independent founder individuals originally from Western Europe, validating part of Soga’s research. We thus infer that Jekwa, Hatu and Buku were in fact three independent founders who did not share a common ancestry, but who most probably arrived on the Wild Coast having sailed from the British Isles and the Middle East, quite possibly aboard the same ship.
4.1.2 The amaMolo and their affiliation with the abeLungu
While socially and culturally the amaMolo are synonymous with the abeLungu, and while the forebears of both clan families have non-African ancestry, they originate from different geographical populations. The abeLungu clan families segregate with a Western European background, chiefly under haplogroup R1b (R-M343), while the amaMolo clearly segregate with a more Eurasian ancestral background, bearing haplogroups R1a1a (R-M198) and Q (Q-M242). Only a few indigenous, deeply rooted African haplotypes are shared between individuals of the abeLungu and amaMolo clan families, most of which were introduced via gene flow through extra-marital relations of abeLungu women with local Xhosa men of the surrounding region; there is no evidence of shared European or Eurasian haplotypes between the clan families. We may therefore conclude that the amaMolo emerged from founder lineages independent of those of the abeLungu clans.
Soga’s genealogical studies (1930) state that the ancestors of the amaMolo are believed to have originally come from either India or Malaysia. More recent accounts, however, state that the original amaMolo forebears were white Europeans (Crampton, 2004). The proposed Indian origin of the amaMolo is nevertheless supported by the presence of R1a1a (R-M198) haplotypes in the Bhayi lineage of the amaMolo, although the exact geographical origin cannot be determined conclusively, as the distribution of haplogroup R1a1a is relatively broad across Western Asia and Eastern Europe (Klyosov, 2009). Previous studies by Sengupta et al. (2006) and Underhill (2010) on the Indo-European language family also show that haplogroup R1a1a (R-M198) occurs in a large proportion of Indian males, which adds support to the notion of Indian ancestry and to Soga’s version of the amaMolo historical narrative. This may also support the belief held by Soga and Kirby that ‘Molo’ may be derived from the word ‘Moor’, as applied to Indian Lascar slaves (Soga, 1930; Kirby, 1953).
The haplogroup Q (Q-M242) modal haplotype observed in eight amaMolo clan members delineates a second ancestral lineage of Asian descent, originating from another progenitor, Pita (Appendix C, Figure S1). All recorded father-son transmissions among descendants carrying the haplogroup Q modal haplotype were concordant (a 100% effective father-son transmission rate). The Asian haplogroup Q (Q-M242) is believed to have arisen in South Central Siberia, around the Altai Mountains, between 17,000 and 31,000 years ago (Zegura, 2004). Since two lineages with different backgrounds segregate within the amaMolo clan, Bhayi and Pita were evidently not biological siblings, but rather two distinct founders, from Eastern Europe/Eurasia and Asia respectively (Appendix C, Figure S1). These results also lend further support to Soga’s claims that the amaMolo had black or Indian, rather than white, ancestors.
4.1.3 Multiple founding events
The detection of Y-SNP markers such as M170 and M343 in more recently established abeLungu clans, together with a high degree of inter-clan haplotypic variance, suggests that the clans originated from multiple founders in independent founding events and shipwrecks. The number of mutations in haplotypes reflects the time-scale from the divergence of the castaways’ lineages from their population of origin to their establishment in the Eastern Cape. Anthropological evidence agrees with the molecular data in supporting the theory that secondary waves of non-African shipwreck survivors also reached the Wild Coast, assimilated with the mPondo community and became founders of other abeLungu clans at later points in time (Appendix C, Figures S3, S5 and S6): the genealogies of more recently established clans are proportionally shorter, with fewer generations dating back to their respective clan founders than the genealogies of the three primary abeLungu clans and the amaMolo (Soga, 1930; Kirby, 1953; Crampton, 2004). The more recently established clans Caine, Irish, Ogle, Horner, France, Hastoni, Fuzwayo, Sukwini and Thaka have genealogies that average three to five generations back to the arrival of their clan founders in mPondoland, while the older abeLungu clan genealogies (Hatu, Jekwa, Buku and amaMolo) trace back on average ten generations to their respective founders.
4.1.4 Clan-affiliated Africans
The presence of ~30% sub-Saharan African haplogroups (A-M51, B-M152, E-M2, E-M85, E-M191 and E-M35) in abeLungu clans may have been the result of differential gene flow and/or admixture of Y chromosomes from neighbouring, non-abeLungu Xhosa clans, which acts to diminish non-African ancestry diversity as well as the degree of clan-name/haplotype coancestry. Input into the mPondo gene pool by non-African founders is restricted to abeLungu clan affiliates. This is shown by the general absence of non-African haplogroups, and the high frequency of African haplotypes, in the non-clan affiliated individuals (n=42), who were sampled from the regions surrounding abeLungu clan lalis (homesteads). Only a few African haplotypes were shared among Xhosa non-clan-affiliated individuals (Table 3.2(b)). Clan-affiliated individuals who possess African haplotypes are hence termed “African Africans”. These are also examples of possible non-patrilineal transmissions.
Given that kinship relations and genealogical information were not sourced for non-clan affiliated samples, it is possible that cousins or other male relatives were included, resulting in higher sharing of African haplotypes that are common to the region and its cohabiting populations. Fuzwayo, Hastoni and Sukwini are three abeLungu clans which exhibit almost entirely African ancestry; only three Sukwini individuals carried Western European haplogroups (Table 3.2(a); Appendix C, Figure S6). This raises the question of the true ancestral identity of these clans: whether they actually had non-African ancestry in their patriline to begin with, which may have been diluted out through processes such as non-patrilineal transmission, or whether the notion of their foreign ancestry is an artefact of their cultural associations and geographical proximity with other abeLungu clans.
Alternatively, this could be an example of other patterns of Y chromosome transmission reflecting the social practices and customs of the mPondo. It cannot be taken for granted that the recorded father/son pairs are exact, because a relationship interpreted culturally as father/son may not be one biologically. Regarding terms of kinship among the amaXhosa, the term brother (“bhuti” in isiXhosa) has a broader use and a greater social impact in Xhosa culture than in Western society, which generally uses the term to describe a biologically related male sibling. In Xhosa culture, a brother is the son of a man related to an individual’s father through paternal kinship, which includes an individual’s brother, uncle or cousin, or a man with a shared father's clan name. Bhuti is also a term used to show respect for hierarchy by age and to address an older male. Bhuti also refers to a young man who has returned from initiation school, and may be used when referring to another man who underwent initiation before him (Young and Jackson, 2011). This may create conflicting inheritance patterns when Y chromosome data are checked against genealogical data to infer relationships. Such scenarios, in which social relatedness and biological relatedness do not correspond, need to be addressed delicately. In such cases, the Chief of the clan was informed before the individual, as the Chief is aware of all relations and has knowledge of any “illegitimacies”.
4.1.5 Factors which shape clan diversity
Factors affecting clan diversity that are inherent to small groups of individuals may lead certain haplotype lineages to extinction and thus reduce Y chromosome genetic diversity, while other factors may act to increase Y chromosome diversity within and between clans (Chaix, 2007).
One limiting factor was the sampling geography: families usually had to be sampled in remote locations, and it was very difficult to assemble subjects at one ___location for sampling. Translation was necessary to convey the premise of the study, but the process was time-consuming, which limited the number of families that could be sampled in a day.
Other factors which shape diversity include non-patrilineal transmissions (NPTs) and multiple founders for names (where individuals with the same clan name have different haplotypes). Sons whose ancestral background differs from that of their recorded fathers are most likely the result of illegitimacy, non-paternity, maternal surname inheritance, name changes or adoption. Collectively, we refer to these as non-patrilineal transmissions (NPTs), which act to introduce exogenous haplotypes into a surname lineage or clan (King and Jobling, 2009). The incidence of non-paternity was found to be low; only six of the 146 clan-affiliated individuals sampled were NPTs (4.1%). NPTs were observed in the Jekwa and Hatu clan genealogies (Appendix C, Figures S2 and S4). In the Jekwa clan, two individuals carrying European R1b_5 haplotypes were recorded as having sons with African haplotypes, namely E2b1a_9 and E2b1a_5. Similarly, in clan Hatu, an E2b1a_4 individual and an E1b1a1a_14 individual were recorded as the sons of individuals carrying the modal haplotype R1b_18, which is also European in origin (Appendix C, Figure S4). These individuals are not biologically related to their “fathers”, and were most likely the result of NPTs such as infidelity or adoption. There are six clear-cut examples of NPTs in the clan genealogies (Appendix C; Figures S1-S6). The presence of African haplogroups in the clans may indicate a higher frequency of non-patrilineal events; however, information on fathers or siblings is not available to establish definitively whether these events are what they seem. The informed consent pages distributed prior to sampling did not state that NPTs could be discovered or disclosed to the subjects; however, the matter was discussed with clan elders. In report-backs to clan members, all NPT cases and possible NPTs were relayed to clan elders first, who then took it upon themselves to disseminate the information to specific clan members and families, as the elders were informed of all kinship relations and cases of adoption and non-paternity.
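The reasoning used above to separate NPTs from ordinary mutation can be written as a simple decision rule: a change of haplogroup between a recorded father and son cannot be explained by STR mutation, whereas a difference of one or two repeat steps within the same haplogroup can. The Python sketch below encodes that rule; the two-step threshold, the function name and the example profiles are hypothetical choices for illustration, not criteria stated in this study.

```python
def classify_transmission(father, son, max_steps=2):
    """Classify a recorded father-son pair.

    father, son: (haplogroup, haplotype) where haplotype is a tuple of
    repeat counts at the same loci. `max_steps` is a hypothetical cut-off
    for differences that are still attributed to mutation.
    """
    (hg_father, hap_father), (hg_son, hap_son) = father, son
    if hg_father != hg_son:
        return "probable NPT"  # a haplogroup change cannot arise by STR mutation
    steps = sum(abs(x - y) for x, y in zip(hap_father, hap_son))
    if steps == 0:
        return "identical"
    if steps <= max_steps:
        return "mutation"      # small STR change within one lineage
    return "possible NPT"      # same haplogroup, but many steps apart


# Hypothetical profiles for illustration.
father = ("R1b", (14, 24, 11, 13))
print(classify_transmission(father, ("R1b", (14, 24, 11, 13))))
print(classify_transmission(father, ("R1b", (14, 25, 11, 13))))
print(classify_transmission(father, ("E2b1a", (13, 21, 10, 12))))
print(f"{6 / 146:.1%}")  # the NPT incidence reported above
```

The final line simply reproduces the 4.1% incidence quoted in the text from six NPTs among 146 clan-affiliated individuals.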
“Daughtering-out”, in which a patriline ends because a man leaves no sons, is one outcome of the stochastic variation in the number of sons fathered by different men; over many generations, this variation can lead to the extinction of some Y chromosome lineages and an increase in the frequency of others within clans (King and Jobling, 2009). The absence of migration of men among clan descent groups further increases the strength of genetic drift. Thus, although some Y chromosome haplotypes might go extinct, others might rapidly reach high frequencies within a clan and thus give rise to the so-called identity core-haplotypes (Chaix, 2007). Similarly, genetic drift, the random change in haplotype frequencies over the generations, can act either to reduce or to increase diversity.
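The stochastic loss of patrilines through daughtering-out is classically modelled as a Galton-Watson branching process, in which each man independently fathers a random number of sons. The Python sketch below simulates that process under assumptions chosen purely for illustration (a Poisson-distributed number of sons with a mean of one, i.e. a stable population, and no migration), and compares it with the exact survival probability for that case; the function names are hypothetical and this is not a model fitted to abeLungu genealogies.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate (Knuth's method; adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulated_survival(founders=1000, generations=10, mean_sons=1.0, seed=1):
    """Fraction of founder patrilines with male descendants after `generations`.

    Each man's number of sons is Poisson(mean_sons); a patriline
    'daughters out' once no male carrier remains.
    """
    rng = random.Random(seed)
    surviving = 0
    for _ in range(founders):
        males = 1
        for _ in range(generations):
            males = sum(poisson(rng, mean_sons) for _ in range(males))
            if males == 0:
                break
        if males > 0:
            surviving += 1
    return surviving / founders


def analytic_survival(generations):
    """Exact survival for Poisson(1) sons: q_g = exp(q_{g-1} - 1), q_0 = 0."""
    q = 0.0
    for _ in range(generations):
        q = math.exp(q - 1.0)
    return 1.0 - q


print(simulated_survival(), round(analytic_survival(10), 3))
```

Under these assumptions only about 16% of founder patrilines still have male carriers after ten generations even though the population size is stable on average, which illustrates how quickly drift alone can concentrate a clan onto a few core haplotypes.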
Mutation rates also affect haplotype diversity and can be used to infer whether a
random or a causative change in inheritance has occurred, which distinguishes truly
coancestral haplotypes from stochastic variants (Jorde et al., 1998). Jorde et al.
(1998), Gray et al. (2000) and Kayser et al. (2000) show that the average STR
mutation rate (~2.5 × 10⁻³ per locus per generation) is much higher than that of
single nucleotide polymorphisms (10⁻⁷ to 10⁻⁸ per site per generation).
While SNPs mutate more slowly and therefore indicate deeper ancestry,
microsatellites mutate, on average, roughly four to five orders of magnitude faster, so
mutations can be expected on the timescale of a surname (clan-name) study such as
this one (Klyosov et al., 2009). Mutation rates also vary among loci depending on their
structural make-up, i.e. whether they are tri-, tetra- or pentanucleotide repeats
(Butler et al., 2002). An example of a comparatively slowly mutating marker is the
trinucleotide repeat DYS388, which contrasts with other, hypervariable markers (Klyosov, 2009).
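The extinction of Y lineages through stochastic variation in the number of sons, described above, can be illustrated with a simple branching-process (Galton-Watson) simulation. The Python sketch below is a minimal illustration, not an analysis performed in this study. It assumes that each man has a Poisson-distributed number of sons with mean 1 (a population of constant size) and that there is no male migration between clans. The values of 20 founders and 11 generations are hypothetical.

```python
import math
import random


def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def surviving_lineages(n_founders: int, generations: int,
                       mean_sons: float = 1.0, seed: int = 1) -> list[int]:
    """Return the number of living male descendants of each founder
    (i.e. carriers of that founder's Y lineage) after `generations` generations."""
    rng = random.Random(seed)
    males = [1] * n_founders  # one founder per lineage at the start
    for _ in range(generations):
        # every living male independently fathers a random number of sons
        males = [sum(poisson_sample(rng, mean_sons) for _ in range(m)) for m in males]
    return males


# Hypothetical example: 20 founders followed for ~11 generations
counts = surviving_lineages(n_founders=20, generations=11)
extinct = sum(1 for c in counts if c == 0)
print(f"{extinct} of {len(counts)} founder lineages extinct; "
      f"largest surviving lineage: {max(counts)} men")
```

Under these assumptions a large fraction of founder lineages is usually lost within a few hundred years, while a few expand. This pattern is consistent with the emergence of high-frequency core identity haplotypes.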
The choice of the YFiler™ and Multiplex II PCR panels reflects two considerations:
i) the mutation rates of the STR markers, across a sufficiently broad haplotype, and
ii) the estimated time span since the founders purportedly established the clans.
A striking difference in average STR mutation rates is observed depending on
whether the evolutionary estimate is used or whether the rates were calculated by
direct count in deep-rooted pedigrees (assuming one generation to be 25
years) (Jorde et al., 1998; Forster et al., 2000; Zhivotovsky et al., 2004). Because of
this discrepancy, a direct count of father-son transmission of Y chromosome haplotypes
in genealogies, which reflects the time depth of the constructed clan genealogies, is a more
suitable indicator than an evolutionary mutation rate for this study. The occurrence of
mutations within haplogroups, which gives rise to the diversity of haplotypes within
them, probably reflects the time span from the introduction of a lineage by
castaways to the present-day population established in the Eastern Cape. Oral and
documented history place the founding of the primary abeLungu clans in 1723
(Soga, 1930), 287 years before present-day clan members were sampled in 2010.
With one generation taken as approximately 25 years (Kayser et al., 2000), this time
span equates to ~11 generations. This is consistent with the primary abeLungu clan
genealogies, which show on average 11 generations between the clan progenitor and
present-day clan members.
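The generation arithmetic above, and its consequence for the expected amount of STR variation within a clan, can be made explicit. The sketch below assumes a generation length of 25 years and an average mutation rate of 2.5 × 10⁻³ per locus per generation, as cited above. The 17-locus haplotype (the number of loci in the YFiler™ panel) is an illustrative assumption, since the number of loci compared in a given analysis may differ.

```python
GEN_YEARS = 25.0   # assumed generation length in years (Kayser et al., 2000)
MU = 2.5e-3        # average Y-STR mutation rate per locus per generation


def generations_between(start_year: int, end_year: int,
                        gen_years: float = GEN_YEARS) -> float:
    """Number of generations spanned by a period of time."""
    return (end_year - start_year) / gen_years


def expected_mutations(generations: float, n_loci: int, mu: float = MU) -> float:
    """Expected number of STR mutations accumulated along ONE patrilineal line."""
    return generations * n_loci * mu


gens = generations_between(1723, 2010)          # 287 years / 25 = 11.48 generations
per_line = expected_mutations(gens, n_loci=17)  # hypothetical 17-locus haplotype
# Two present-day clansmen descend from the founder along two independent lines,
# so the expected number of steps separating them is twice the per-line value.
between_two_men = 2 * per_line
print(f"{gens:.2f} generations; ~{per_line:.2f} mutations per line; "
      f"~{between_two_men:.2f} steps expected between two clansmen")
```

Under these assumptions, two clansmen descending from the same founder over the full ~11 generations are expected to differ by only about one mutational step.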
An example that illustrates this is an amaMolo individual who carried haplotype
R1a1a_5, which was markedly different from other R1a1a (R-M198) haplotypes
(Figure 3.2), particularly from that of his alleged “brother” (according to the genealogy
and consent form information), who carried the R1a1a_2 modal haplotype
(Appendix C, Figure S1). The 16 mutational steps separating these two
haplotypes show a divergence too great for these individuals to be biological siblings.
This implies illegitimacy or non-paternity and reduces the father-son transmission
efficacy of the modal haplotype in Bhayi’s lineage. Assuming an average STR
mutation rate of ~2.5 × 10⁻³ per locus per generation (Jorde et al., 1998; Gray et al.,
2000; Kayser et al., 2000), it is practically impossible for related haplotypes to have
accumulated 16 mutational differences within the given historical time frame. This
confirms that the R1a1a amaMolo “brothers” are not patrilineally related. An
alternative explanation for the presence of this haplotype is a second haplogroup
R1a1a (R-M198) founder, possibly from an entirely different shipwreck, who was
introduced into the clan at a later stage.
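The claim that 16 mutational steps are incompatible with a shared patriline over this time frame can be quantified with a simple Poisson model. The sketch below is illustrative only. It assumes a strict single-step mutation model without back-mutation, a uniform rate of 2.5 × 10⁻³ per locus per generation, and a hypothetical 17-locus haplotype. It computes the probability of observing at least 16 steps between two men whose lines separated either one generation ago (brothers) or ~11.5 generations ago (descendants of the clan founder).

```python
import math

MU = 2.5e-3  # average Y-STR mutation rate per locus per generation


def poisson_tail(k: int, lam: float, n_terms: int = 100) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly in log
    space (avoids the cancellation error of 1 - P(X < k) for tiny tails).
    Adequate when lam is small relative to n_terms."""
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k, k + n_terms))


def prob_at_least_steps(steps: int, gens_per_line: float, n_loci: int,
                        mu: float = MU) -> float:
    """Probability that two men differ by at least `steps` mutational steps when
    each descends from their common ancestor through `gens_per_line` generations."""
    lam = 2 * gens_per_line * n_loci * mu  # two independent lines of descent
    return poisson_tail(steps, lam)


p_brothers = prob_at_least_steps(16, gens_per_line=1, n_loci=17)
p_founder = prob_at_least_steps(16, gens_per_line=11.48, n_loci=17)
print(f"P(>=16 steps | brothers) = {p_brothers:.1e}")
print(f"P(>=16 steps | shared founder ~1723) = {p_founder:.1e}")
```

Both probabilities are vanishingly small under these assumptions. This is consistent with the interpretation that the two “brothers” do not share a recent patrilineal ancestor.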
4.2 The maternal legacy of the abeLungu
While the main focus of the study was to assess the concordance between the oral
history (including genealogical records) of the abeLungu and patterns of Y
chromosome variation within and among clans, we also made use of mtDNA to
examine the maternal ancestries of the abeLungu. Written records (for example, the
story of Bessie; Crampton, 2004) claim that some women who survived shipwrecks
were also integrated into the local community. An assessment of mtDNA haplogroups,
using their known geographic distributions, would therefore help us trace the origins
of the women who contributed to the mtDNA pool of the abeLungu.
The results showed that all individuals carried mtDNA haplogroups of exclusively
African origin, the majority belonging to haplogroups L0d and L3e
(Table 3.4). The presence of L0d lineages suggests that the maternal history of
the abeLungu is associated with recent admixture from Khoi and San groups
(Schlebusch et al., 2009). MtDNA variation patterns indicate high within-clan genetic
diversity and low levels of among-clan differentiation. This suggests virtually random
female-mediated gene flow among clans of deeply-rooted ancestral mtDNA haplotypes.
Because the mtDNA of clan members derives from the many different mothers who
married into the clan, increased diversity is expected. Two factors affect these mtDNA
diversity patterns. First, polygyny, which is traditionally practiced in abeLungu clans
(with the number of wives often depending on the wealth of the husband), may result
in higher mtDNA diversity (Soga, 1930; Crampton, 2004; Jackson, 2005). Second,
the increased levels of mitochondrial diversity observed in clans might be a
consequence of the complex rules of exogamy practiced in abeLungu clans.
Traditionally, a man must choose a bride with whom he does not share a common
paternal-line ancestor for a given number of past generations (Crampton, 2004;
Chaix, 2007). This number is usually close to the genealogical depth of a lineage
(five to ten generations, depending on the population), so in practice the bride
usually belongs to a different lineage (lineage exogamy). These rules of exogamy
imply that, in each generation, a significant number of women migrate from one
descent group to another, resulting in increased diversity (Chaix, 2007). The
molecular evidence is congruent with oral history. A non-African input is clearly
detected in the patriline, whereas only southern African haplogroups are observed
in the mtDNA lines of the abeLungu clans. This indicates that, apart from a possible
small non-maternity rate, the maternal lines of abeLungu clans have retained their
African ancestral legacy. That legacy has withstood the changing male population
demography.
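The contrast between low Y chromosome diversity and high mtDNA diversity within clans is commonly summarised with Nei's unbiased haplotype (gene) diversity, H = n/(n - 1) × (1 - Σpᵢ²). Here pᵢ is the frequency of the i-th haplotype among n samples. The Python sketch below applies this formula to an entirely hypothetical clan sample, not to the data of this study. In the sample, the Y haplotypes are dominated by a single core identity haplotype, while the mtDNA haplogroups come from many in-marrying mothers.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's unbiased haplotype diversity: H = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two samples")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Hypothetical sample of 8 men from one clan
y_clan = ["R1b_18"] * 6 + ["E2b1a_4", "E1b1a1a_14"]                      # patrilines
mt_clan = ["L0d1", "L0d2", "L3e1", "L3e2", "L0d1", "L2a", "L3e1", "L1c"]  # matrilines

h_y = haplotype_diversity(y_clan)
h_mt = haplotype_diversity(mt_clan)
print(f"Y diversity = {h_y:.3f}; mtDNA diversity = {h_mt:.3f}")
```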
As a corollary to examining the Y chromosome lineages, which reflect the transmission
of the clan name, it would have been interesting to detect non-African ancestry in the
mtDNA of clanspeople, which may or may not have derived from Bessie and
her marriage into the amaTshomane clan. However, no such ancestry was observed.
Bessie’s story tells of how she became part of the mPondo people and had to learn their
language and practice their customs. It also tells of her marriage into the amaTshomane
clan, the death of her husband and her subsequent remarriage to his brother Sango. Her story
contains several mysterious elements, such as the unconfirmed identity of the “three
Englishmen” who supposedly accompanied her upon her arrival on the Wild Coast.
It also contains putative links to the surviving castaways of other shipwrecks, with particular
interest placed upon one of the more famous ships, the Grosvenor, wrecked in 1782, and
its legendary treasure, which remains a mystery to date (Soga, 1930; Crampton, 2004).
Soga records Bessie’s affiliation (the only surviving link, although unfortunately of little relevance here):
“They were given isiXhosa names; the men were called Jekwa, Badi and Hatu, and
the girl, Gquma. Having all come from the same ship (interpreted locally as a house),
they were considered to be family, with Badi and Jekwa seen to be brothers and
Gquma, the daughter of Badi”.
Bessie married and bore children, including her daughter Bessy, but her offspring
would have belonged to the clan of their father (amaTshomane) and not to the abeLungu.
This makes her inclusion as a major founder not only unusual, but also fundamentally
contradictory to the principles of patrilineal descent. The little that is known about
Bessy’s family tree has been passed down the generations through oral
history. Soga (1930) recounts: “Gquma died at Mgazi about 1770. Her daughter Bessy
married Mjikwa, son of Wose, Chief of the amaNkumba clan, and Principal Son of the
Great House of Zwetsha - the premier clan of the amaBomvana. Xwebisa and
Gquma’s family, claiming foreign heritage, still retains by virtue of descent through the
male line the original clan name of the amaTshomane, derived from Tshomane,
Xwebisa’s father” (Soga, 1930, p.380) (Figure 4.1). No trace of Bessie’s European
ancestry was found in the mtDNA haplogroups of the maternal lines. This too was expected:
no direct effort to trace Bessie’s descendants could be made, owing to the patrilineal
inheritance of the Xhosa clan name and the lack of a social marker paralleling mtDNA
inheritance.
Further collaborative steps can be taken, beginning with deeper and more widespread
anthropological research, in an attempt to trace the mitochondrial lines stemming
from Bessie and other potential female survivors. This would be a difficult task,
considering that there are no female cultural markers to parallel the patrilineal inheritance
of the abeLungu clan name. Tracing maternal ancestry has been successful in several
studies. However, the proposed reconstruction of social relations, especially that of
matriclans, is based on very thin evidence for traditional (that is, pre-1900) times
(Pollock, 2009). Seminal studies on populations that exhibit matrilineal
modes of inheritance include that of Godard (1867) on the Sudanese Nubians, the
study of Iroquois and Hopi Native Americans by Freire-Marreco (1914), the study of the Vanatinai
of Papua New Guinea by Lepowsky (1981), and the study of the Rapanui in Polynesia by Hage and
Marck (2003). More recently, Starck (2013) studied the Minangkabau of West Sumatra,
who form the largest matrilineal society in the world. Life in the core Minangkabau areas
was organised matrilineally. Kinship groups follow descent through the mother, and a
woman’s brother, rather than her husband, is responsible for her children. These studies
investigated cultural groups with matrilineal and matrilocal inheritance of clan name in
order to trace their matrilineal origins. They are the biological and socio-cultural
counterparts of patrilineal ancestry studies such as this study on the abeLungu. They
should be used as guides for future studies on the abeLungu that seek to discover
Bessie’s maternal legacy.
The only detail provided by the historical data is that Bessie is suspected to have been a
crew member aboard one of the Dutch East India Company (VOC) vessels
wrecked sometime around 1737, which remains unconfirmed to date. Investigating
the maternal line descending from her daughter Bessy, who married Mjikwa, would be the
only means of discovering European maternal ancestry. However, the family line leads
to an individual, Sizungazane, who died in 1921 (Figure 4.1). There are no records
of whether this individual had any offspring who may have continued the maternal
ancestral legacy of Gquma (Bessie), her daughter Bessy and their European origins
(Soga, 1930). Even if there are historical examples of female European castaways,
their mtDNA lineages would persist only through unbroken lines of daughters. These
daughters married out of their natal clans and were vastly outnumbered by the local
Xhosa women whom clan males married. Within a few generations, such lineages could
therefore easily have been lost, or would escape clan-based sampling.
Examples of matrilocality exist within the amaXhosa, albeit rarely. Kalis recorded the
following in her 2009 interviews: “…For example, a friend whose patrilineal family resides
in the Tsomo village in which his amaThembu great-great-great-grandmother was
born. She married an Englishman by the name of Jonas (possibly a trader) in the
mid-nineteenth century, and their descendants took their mother’s clan-name,
amamTolo”. Kalis also has a colleague whose mother’s family hails from Holy Cross
and traces its descent from a shipwreck survivor. Simply by discussing her
further research plans with people in and around Mthatha, Kalis also received
suggestions of additional clan names that require follow-up.
Figure 4.1. The amaTshomane clan genealogy, featuring the lineages and progeny
of Xwebisa (Sango) and Gquma (Bessie) (adapted from Soga, 1930, p.380)
4.3 Summary of the findings
These demographic and genetic processes may explain the existence of
core identity haplotypes at the Y chromosome and mtDNA levels in clans. They may
also explain the clans' overall lower Y chromosome diversity compared to non-clan
affiliates in neighbouring, cohabiting regions, as well as the higher diversity of
deeply-rooted mtDNA haplotypes. The prevalence of non-African haplogroups among the vast majority of
abeLungu and amaMolo clan members with coancestral haplotypes strongly
supports the hypothesis put forward in the documented and oral history (Soga, 1930;
Kirby, 1953; Crampton, 2004). The results provide evidence for the
relevance of the dual inheritance model (culture and genetics) in understanding
patterns of human genetic variation, as proposed by gene-culture coevolution theory
(Jobling, Rasteiro and Wetton, 2015). The analyses indicate that the dynamics of patrilineal
descent groups imply different male and female socio-demographic histories. They
also indicate that patrilocality, NPTs and polygyny are primarily responsible for these
sexually asymmetric genetic patterns.
4.4 Future Studies
Sampling a wider variety of clan names and genealogical histories will help to
alleviate any current social or geographical bias and may lead to interesting new insights
into cultural and demographic history (King and Jobling, 2009[b]).
Admixture analysis would also be needed to further refine the origins of, and relations
within, clans of the Pondoland region. Autosomal ancestry informative markers (AIMs) are
distributed abundantly throughout the genome. They have been shown to retain
geographically restricted allele frequency distributions, which indicate the likely
parental populations of the samples being investigated (Hinds, 2005; Phillips, 2007).
An array of autosomal SNP AIMs could be selected from published data on European,
African, Asian, Eurasian, Middle Eastern and Oceanic populations. Screening such an
array would uncover any possible genetic substructure and/or admixture and so
complement the findings from Y chromosome and mtDNA data. In these tests, the AIMs
typed in an individual are compared with frequency data for these markers in the
reference groups. The results of this comparison are then statistically analysed to
produce a breakdown in the form of percentages of different ancestries (Nash, 2006).
These percentages are therefore not a measure of the number of ancestors of different
backgrounds within an individual’s genealogy. Rather, they are an estimate that depends
on the initial selection of subjects, the choice of particular markers, complex statistical
calculations and the system of ordering patterns of human diversity into the main
continental groups (Nash, 2006).
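In its simplest form, the statistical step that turns AIM genotypes into ancestry percentages can be sketched as a maximum-likelihood estimate. The quantity estimated is the proportion q of an individual's ancestry derived from one of two assumed parental populations. At each biallelic locus the individual's expected allele frequency is q·p₁ + (1 - q)·p₂, and the genotype is treated as a binomial draw of two alleles. This two-population grid search is an illustrative simplification, not the algorithm of any particular commercial test or software package. The allele frequencies and genotype below are hypothetical.

```python
import math


def admixture_mle(genotypes, freq_pop1, freq_pop2, grid_steps=1000):
    """Grid-search maximum-likelihood estimate of the proportion of ancestry
    from population 1 for one diploid individual.

    genotypes  -- copies (0, 1 or 2) of the reference allele at each AIM
    freq_pop*  -- reference-allele frequency at each AIM in each parental
                  population; all values must lie strictly between 0 and 1
    """
    best_q, best_ll = 0.0, -math.inf
    for s in range(grid_steps + 1):
        q = s / grid_steps
        ll = 0.0
        for g, p1, p2 in zip(genotypes, freq_pop1, freq_pop2):
            f = q * p1 + (1 - q) * p2  # expected allele frequency in this individual
            # binomial log-likelihood (the constant binomial coefficient is omitted)
            ll += g * math.log(f) + (2 - g) * math.log(1 - f)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q


# Hypothetical reference-allele frequencies at six AIMs in two parental populations
pop1 = [0.90, 0.85, 0.80, 0.15, 0.20, 0.10]
pop2 = [0.10, 0.20, 0.15, 0.85, 0.80, 0.90]
individual = [1, 1, 0, 1, 2, 2]
q = admixture_mle(individual, pop1, pop2)
print(f"Estimated ancestry: {q:.0%} population 1, {1 - q:.0%} population 2")
```

As the text notes, the resulting percentage depends entirely on which reference populations and markers are fed into such a model.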
Fejerman (2005) suggests that if an admixture event occurred many generations
ago, immigrant-derived alleles would be expected to be widespread among individuals.
However, some of the secondary abeLungu clans were established more recently.
If admixture took place only two or three generations ago, as in these clans, one would
expect to observe a small proportion of individuals in the sample with a relatively high
probability of having immigrant (here, non-African) ancestors. This is because the
‘immigrant’ alleles have not had time to become established among individuals of the
endemic population and instead tend to remain concentrated within families. Autosomal
tests do provide a degree of statistical certainty about the proportions of different groups
in one’s ancestry. However, the results are largely an artefact of a series of
approximations that underestimate the complex ways in which ethnic and racial
identities are socially defined and experienced. A further risk is that admixture
analyses can easily be misinterpreted as supporting the idea that racial or
ethnic categories have a genetic basis (Fejerman, 2005; Nash, 2006).
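The contrast drawn above between old and recent admixture can be illustrated with a toy simulation. The Python sketch below rests on several assumptions. A single admixture pulse occurs in which a hypothetical 5% of a population of 2,000 individuals carry fully immigrant ancestry, and mating is random thereafter. Each child's ancestry proportion is taken to be the average of its two parents', ignoring the additional variation introduced by recombination. The simulation tracks the share of individuals carrying any immigrant ancestry and the variance of ancestry among individuals.

```python
import random
import statistics


def simulate_admixture(n: int, immigrant_fraction: float, generations: int,
                       seed: int = 7) -> list[list[float]]:
    """Return each generation's list of individual immigrant-ancestry proportions."""
    rng = random.Random(seed)
    n_immigrant = round(n * immigrant_fraction)
    pop = [1.0] * n_immigrant + [0.0] * (n - n_immigrant)  # generation 0: the pulse
    history = [pop]
    for _ in range(generations):
        # each child takes the mean ancestry of two parents drawn at random
        # (with replacement, a simplification of real mating)
        pop = [(rng.choice(pop) + rng.choice(pop)) / 2 for _ in range(n)]
        history.append(pop)
    return history


def summarise(pop: list[float]) -> tuple[float, float]:
    """Share of individuals with any immigrant ancestry, and ancestry variance."""
    carriers = sum(1 for a in pop if a > 0) / len(pop)
    return carriers, statistics.pvariance(pop)


history = simulate_admixture(n=2000, immigrant_fraction=0.05, generations=10)
for g in (0, 2, 10):
    carriers, var = summarise(history[g])
    print(f"generation {g:2d}: {carriers:.0%} carry immigrant ancestry, variance {var:.4f}")
```

In the first generations after the pulse, immigrant ancestry is concentrated in a minority of individuals (high variance). After many generations almost everyone carries a small share of it (low variance), mirroring the argument above.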
Comparative diversity analysis between populations will tell us about differences in
their histories, and knowledge from multi-allelic marker mutation rate analyses will
allow for estimates of the age of the most recent common ancestor of the group of
chromosomes examined. These kinds of studies have implications beyond the field of
Y chromosome research, as they can reveal signals of population structure and history
which are important in choosing populations for mapping genes underlying complex
traits (Jobling and Tyler-Smith, 2000). Most new advances will emerge from the
exploitation of recent technological developments. Improvements to the methods of
analysis of ancient DNA should enable the testing of genealogical links between living
individuals and putative patrilineal ancestors and also among archaeological human
remains (Fejerman, 2005; Nash, 2006). High-resolution Y chromosome typing and
mitochondrial DNA sequencing, together with whole-genome SNP analyses, should
enable reliable reconstructions of genealogies; these will include the establishment of
links across the sexes, which cannot be achieved by the analysis of uni-parentally
inherited markers alone. In terms of relatedness, cohorts of men ascertained by surname or
clan name who share Y-chromosomal coancestry lie between the traditional
pedigree and population substructure. Application of whole-genome typing to
such groups could be useful for understanding the history of recombination and for
genetic epidemiological purposes. With the decreasing cost of sequencing, private
individuals will increasingly fund their own genome projects. It seems inevitable that SNPs
specific to surnames, clan names or their lineages will be identified, providing powerful
resources for genealogical research.
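STR mutation rates can be used to date the most recent common ancestor of a set of chromosomes. One way to sketch this is with the ρ (rho) statistic: the mean number of mutational steps separating each sampled haplotype from a presumed ancestral haplotype, divided by the mutation rate per haplotype. The sketch below makes several simplifying assumptions. The modal haplotype is taken as the ancestral one (a star-like genealogy), all loci share the average rate of 2.5 × 10⁻³ per generation cited earlier, and the four-locus haplotypes of ten men are hypothetical.

```python
from collections import Counter

MU = 2.5e-3        # average Y-STR mutation rate per locus per generation
GEN_YEARS = 25.0   # assumed generation length in years


def modal_haplotype(haplotypes: list[tuple[int, ...]]) -> tuple[int, ...]:
    """Most common repeat count at each locus."""
    n_loci = len(haplotypes[0])
    return tuple(Counter(h[i] for h in haplotypes).most_common(1)[0][0]
                 for i in range(n_loci))


def rho_tmrca(haplotypes: list[tuple[int, ...]], mu: float = MU,
              gen_years: float = GEN_YEARS) -> tuple[float, float, float]:
    """Return (rho, TMRCA in generations, TMRCA in years), using the modal
    haplotype as the presumed ancestral (founder) haplotype."""
    root = modal_haplotype(haplotypes)
    rho = sum(sum(abs(a - b) for a, b in zip(h, root))
              for h in haplotypes) / len(haplotypes)
    generations = rho / (len(root) * mu)  # steps per haplotype / rate per haplotype
    return rho, generations, generations * gen_years


# Hypothetical repeat counts at four loci for ten men sharing a clan name
sample = [(14, 24, 11, 13)] * 9 + [(14, 24, 10, 13)]
rho, gens, years = rho_tmrca(sample)
print(f"rho = {rho:.2f} steps; TMRCA ~ {gens:.0f} generations (~{years:.0f} years)")
```

With realistic sample sizes the ρ estimate has a wide confidence interval, so such dates should be treated as approximate.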
4.5 The impact of human population diversity and genetic genealogy studies
No single record of the past is more important than another; each records
different features of the past. In archaeological terms, DNA is comparable to exhumed
bones, as it too provides us with a clearer image of our past. Heritable clan-names, like
surnames, are unique cultural markers of coancestry. They represent a rich resource for the analysis of human
diversity (Kayser et al., 2003; Jobling, 2015), archaeology (Paabo et al., 2004; Green
et al., 2010), history (Moore et al., 2006), genealogical descent (Foster et al., 1998;
King and Jobling, 2009 [a and b]) and disease (Pritchard et al., 2001). The effort to
understand human origins and the history of migration is intrinsic to human
nature and curiosity.
As we track the ancestors from geographical stopping points we uncover the history
of the migration routes of anatomically modern humans that left Africa between 35,000
and 89,000 years ago (Underhill, 2000). A minority of contemporary East Africans and
Khoisan represent the descendants of humankind’s most ancestral patrilineages.
Examining heritable DNA on a more recent time scale allows the ___location and
migration of clan groups and surname lineages to be detected. Clan-names
are patrilineal, and so men sharing a clan-name might be expected to share related Y
chromosome haplotypes, because these are also passed down from father to son
(Jobling and Tyler-Smith, 2003). However, the strength and structure of the
relationship between the two could be influenced by a number of additional factors
(Jobling, 2001). Mutation will alter haplotypes through time, although, on the timescale
of clan-names, this will only affect rapidly mutating markers such as short tandem
repeats (STRs) (Jobling, 2015). Knowledge of mutation rates and processes allows
this to be taken into account. Similarly, differences in the number of founders at the
time of surname/clan-name establishment within a given population could affect the
number of descendant lineages within a clan-name (King and Jobling, 2009[b]).
There are several advantages to using molecular data when investigating population
structure. Molecular entities are strictly heritable, and the description of
molecular characters (mutations) is unambiguous, unlike oral history, which is subject
to considerable distortion and incoherence. Molecular data are abundant and
amenable to quantitative treatment: we can statistically measure support for
hypotheses. Homology assessment is easier and more reliable with molecular data
than with the morphological traits that were typically relied upon in the past. There
is also some regularity to the evolution of molecular traits, which allows the timing of
demographic expansions to be estimated. The population specificity of binary
markers used in combination with microsatellites in Europe and elsewhere has proved
useful in analysing clan names thought to reflect origins outside a particular
region. Examples include the clan names Ogle, Caine (or McCaine), Irish and Horner,
which suggest a British origin.
4.6 Genealogy testing and its limitations
In recent years there has been an upsurge of commercial ancestry testing
companies. They offer services aimed at situating individuals within global patterns of
human genetic diversity, locating genetic origins and distinguishing true biological relatedness
from practiced kinship (Shriver and Kittles, 2004; King and Jobling, 2009). Most of
these companies assert that DNA evidence can provide a link between possible
branches of a family tree when a connection is difficult to establish by other
means (Shriver and Kittles, 2004; Nash, 2004).
Ancestry test results are depicted through the familiar graphics of the human family
tree and explained via recognised but newly geneticised notions of human
reproduction, ancestry and inheritance. Ancestry testing represents the most recent
association of popular and scientific models of ancestry and descent in geneticised
genealogy. It embodies the cultural work of authorising genetic answers to questions
of relatedness and identity.
However, as these tests are novel and still under development, much of the public is
apprehensive about undergoing testing because of the controversies associated with
personalised genetic histories. For the most part, however, these controversies have more
to do with the complex history of race, discrimination and prejudice than with the science
behind genetic ancestry itself. The outcomes are variable. For some individuals, the
results may be disconcerting, and the experience of attempting to interpret them in
terms of personal or familial origin and identity is sometimes complicated and often
unsatisfactory. For others, the results are only marginally interesting or of very little
significance. For still others, the tests provide, as testing companies assure, a meaningful
and significant sense of ancestral origin (Nash, 2004; Shriver and Kittles, 2004). The
clan-affiliated abeLungu subjects who received their ancestry reports welcomed the
outcome. They expressed gratitude for the study's contribution to validating their
unique ancestry.
Traditional anthropological genealogy should in theory be able to provide a family tree
that includes all the sets of great-grandparents, great-great-grandparents and so on,
whose genetic material has been mixed together and passed on to a present-day
individual. Uniparental genetic genealogy, however, cannot inform the public about the
complex blend of genetic material that an individual has inherited from all of his or her
ancestors. These tests depend on direct transmission, from fathers to sons in the case
of the Y chromosome and from mothers to children in the case of mtDNA. This narrowness
makes them useful for exploring patterns of descent. However, the same specificity also
means that the tests focus on only the small portion of DNA that is inherited directly
(Nash, 2006).
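The narrowness of uniparental tests can be illustrated with simple arithmetic. A man has 2^g genealogical ancestor positions g generations back, but only one of them lies on his strict paternal (Y chromosome) line and one on his strict maternal (mtDNA) line. The short sketch below ignores pedigree collapse (the same ancestor occupying several positions), which reduces the number of distinct ancestors in real genealogies.

```python
def uniparental_coverage(generations: int) -> tuple[int, int, float]:
    """For a male, return (ancestor positions g generations back,
    positions traced by Y + mtDNA, fraction traced)."""
    if generations < 1:
        raise ValueError("generations must be >= 1")
    positions = 2 ** generations
    traced = 2  # one paternal-line (Y) and one maternal-line (mtDNA) ancestor
    return positions, traced, traced / positions


for g in (1, 5, 11):
    positions, traced, frac = uniparental_coverage(g)
    print(f"{g:2d} generations back: {traced} of {positions} ancestor positions ({frac:.2%})")
```

About 11 generations separate present-day abeLungu from their clan founders. At that depth, the Y chromosome and mtDNA together trace only about 0.1% of a man's ancestor positions.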
Technically more advanced tests use genome-wide data to infer genetic ‘ancestry painting’,
admixture proportions, or one’s percentage ‘global similarity’ to other people in the world.
Such tests oversimplify the genetic data (and, as a result, the individual’s identity),
reducing it to something as scientifically imprecise, but historically loaded, as ‘European’
(Bhattacharya, 2010). Moreover, consider the time-scale that these tests address. The
results of a genetic genealogy test alone, without comparison to other socio-cultural
genealogical narratives, provide no information on more recent ancestors, who could
have lived in many different places (Nash, 2004).
Clients of genetic ancestry tests should bear in mind that haplogroups are presented as
personal results suggesting a special link to the customer, through claims that the result
is their ‘unique genetic signature’. In fact, these haplogroup designations are shared
with millions of other people. The genome of any one person is estimated to differ from
that of another by only 0.1% to 1%. This in turn means that the vast majority of the
genes that are said to make us who we are are shared with every other human.
Consequently, the axiom that most testing companies exploit, namely ‘Genes make you
who you are as a unique individual’, actually has very little discriminatory capacity. Its
only real power lies in ascertaining ‘ethnic or geographical origins’ or, in some instances,
the compositional breakdown thereof (Shriver and Kittles, 2004; Nash, 2006; Jobling, 2015).
Genetic genealogy tests provide individual results which may have implications for
other relatives. As discussed above, such relatives share paternal descent in the case
of Y-chromosome tests and maternal descent in the case of mtDNA tests. The
information about ethnic or geographical origins that a person receives is therefore
pertinent to other family members who may not have chosen to acquire this knowledge.
For them, it may complicate or disturb their particular sense of cultural or ethnic identity.
For example, Y-chromosome test results frequently point to a white male ancestor for
African-American or black British men. The abeLungu, by contrast, celebrate their
foreign ancestry rather than disown it. In other cases, where the oral history seems
irrefutable, the DNA evidence may conflict with it and bring unwelcome news of
illegitimacy (examples of which were discovered in this study). Such instances need
to be treated with cultural and ethical sensitivity.
Despite the limitations of the genetic ancestry testing currently available,
the power and resolution of DNA analyses used in conjunction with traditional
historical research in genealogy will continue to grow. Genetic tests will become more
affordable and more sophisticated, and genetic databases will expand, incorporating
more markers and better geographical coverage of global diversity (Jobling,
2015). It must be noted, however, that matching haplotypes need careful interpretation.
In particular, the frequency and resolution of the haplotype within its haplogroup must
be considered. Ideally, the question of whether a mismatch is due to mutation, or whether
it represents an exclusion of a recent common ancestor, should be considered in terms of
locus-specific mutation rates (Jobling, 2015). It is also crucial to keep in mind that genetic
ancestry inference is a biotechnologically assisted process applied to socially constructed
categories. Genetic and traditional genealogy should therefore aim to corroborate each
other so as to define clearer ancestral migration and demographic histories.
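Locus-specific mutation rates can inform this judgement as follows. If two men are separated by m meioses, a locus with rate μᵢ differs between them with probability approximately 1 - (1 - μᵢ)^m. The number of mismatching loci then follows a Poisson-binomial distribution. The Python sketch below computes the probability of observing at least k mismatching loci. The per-locus rates are hypothetical placeholders rather than published estimates for named markers, and back-mutations are ignored.

```python
def mismatch_probabilities(rates: list[float], meioses: int) -> list[float]:
    """Probability that each locus has mutated at least once along `meioses` meioses."""
    return [1 - (1 - mu) ** meioses for mu in rates]


def prob_at_least_k_mismatches(rates: list[float], meioses: int, k: int) -> float:
    """Poisson-binomial upper tail P(at least k loci differ), by dynamic programming."""
    dist = [1.0]  # dist[j] = probability that exactly j loci mismatch so far
    for p in mismatch_probabilities(rates, meioses):
        new = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * (1 - p)   # this locus matches
            new[j + 1] += pj * p     # this locus mismatches
        dist = new
    return sum(dist[k:])


# Hypothetical per-generation rates for a 10-locus haplotype (faster and slower loci)
rates = [8e-3, 6e-3, 4e-3, 3e-3, 2.5e-3, 2e-3, 1.5e-3, 1e-3, 5e-4, 2e-4]
# Father and son are separated by 1 meiosis, brothers by 2
for label, meioses in (("father-son", 1), ("brothers", 2)):
    p = prob_at_least_k_mismatches(rates, meioses, k=2)
    print(f"{label}: P(>= 2 mismatching loci) = {p:.2e}")
```

Under such rates, a single mismatch at a fast-mutating locus remains compatible with close relatedness. Mismatches at several loci, especially slow ones, increasingly favour exclusion.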
4.7 Biomedical and forensic impact of population diversity studies
An understanding of how genetic diversity is structured in the human species is not
only of anthropological and political importance, but also of medical relevance
(Jobling, 2000; Lu et al., 2016). Much of the recent population diversity literature
points out that individuals of different cultural and ethnic origins may respond
differently to medical treatments where major differences in allele frequencies exist
between populations (Wilson et al., 2001; Shriver and Kittles, 2004; Jobling, 2015).
While the majority of polymorphisms investigated in population diversity studies are
probably neutral, they can be used to test for associations with particular phenotypes.
Conversely, they can be used to ask whether specific phenotypes influence the
distributions of polymorphisms within populations (Jobling, 2000). The first
approaches to quantifying biological differences were based on crude physical
measurements that were heavily biased in their execution. They relied on the
characteristics most directly amenable to human perception, which were used to define
racial classifications: phenotypic traits such as skin colour, eye shape and colour, and
hair colour and texture. Until relatively recently, direct and objective methods of
quantifying genetic variation (as opposed to “physical” characteristics) were
non-existent (Lu et al., 2016).
The successful completion of the Human Genome Project in 2003 was the first in a series of
large public efforts that began to shift the field of medical genetics away from its earlier
approach (Lu et al., 2016). That approach consisted of purely descriptive documentation
of patients’ phenotypes, coupled with inefficient and time-consuming examination of a
small subset of patients’ potentially disease-causing genes. Genome-wide association studies
(GWAS) provided the opportunity to efficiently and comprehensively assay genetic
variants common in a population. They also made it possible to identify those variants
that are more frequent in patients with a particular disease than in controls without the
disease (Manolio et al., 2009; Lu et al., 2016). Numerous population-specific studies of
disease, including investigations of myocardial infarction in Icelanders and of prostate
cancer in African-Americans, have obtained conclusive results on the population affinity
of alleles. These studies capitalised on the disease susceptibility conferred by specific
alleles in more genetically homogeneous populations (Jobling and Tyler-Smith, 2000;
Lu et al., 2016). Similarly, the autosomal recessive disease sickle cell anemia was shown
to be largely restricted to African, Mediterranean and South Asian populations (Lu et al.,
2016). Another example of population affinity for a disease genotype, known as early as
1966, is that of Ashkenazi Jews. They are more likely to carry the mutant alleles causing
autosomal recessive Tay–Sachs disease (Myrianthopoulos and Aronson, 1966; Lu et al.,
2016). Another well-known case of population specificity in treatment response is
hepatitis C virus (HCV) infection. African individuals respond more poorly to HCV drug
treatment than Caucasian and Asian individuals (Lu et al., 2016).
In studies such as these, the appropriate choice of control population is
critical. While the Y chromosome is highly stratified by geography, it might also
be stratified by other factors such as social class. This could, for example, lead to
ascertainment bias in the attendance of subjects at clinics. One approach to this
would be to use male non-blood relatives from within the subjects’ families as
control subjects (Jobling and Tyler-Smith, 2000).
An understanding of population structure is critical for the identification of disease
genes by association with marker loci. Advances in DNA sequencing technologies and
analyses have driven the recent rise of genomics in medicine, which aims to find the
genetic causes of common complex diseases in order to develop marketable cures for
them (Lander and Schork, 1994; Cardon and Bell, 2001; Serre and Pääbo, 2004;
Bhattacharya, 2010; Jobling et al., 2015; Lu et al., 2016).
In another application of surname (clan-name) and genotype association studies, a list
of surnames (clan-names) with associated Y-STR haplotypes could enable a Y profile to
be matched with one or more surnames (clan-names). Combined with the identification
of genes involved in phenotypic traits such as pigmentation, this might allow the
surname of the depositor of DNA evidence to be deduced, providing a means to
prioritise a suspect list in a criminal investigation (King and Jobling, 2009).
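A minimal sketch of how such a surname (clan-name) lookup might work is given below. It assumes a hypothetical reference table mapping clan names to Y-STR haplotypes, each recorded as repeat counts at a fixed, ordered set of loci. A query profile is compared with every stored haplotype using a simple step distance (the sum of absolute repeat-count differences), and clan names within a chosen threshold are returned, closest first. The locus names are real Y-STRs, but the repeat values, the clan labels, the threshold and the function names are placeholders, not data or methods from this study.

```python
from typing import Dict, List, Tuple

# Real Y-STR locus names; the repeat counts below are hypothetical placeholders.
LOCI = ["DYS19", "DYS389I", "DYS390", "DYS391", "DYS392", "DYS393"]

# Hypothetical reference table: clan name -> observed haplotypes (not study data).
REFERENCE: Dict[str, List[Tuple[int, ...]]] = {
    "ClanA": [(14, 13, 24, 11, 13, 13), (14, 13, 24, 10, 13, 13)],
    "ClanB": [(15, 12, 21, 10, 11, 13)],
    "ClanC": [(13, 13, 23, 10, 11, 12)],
}


def step_distance(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Sum of absolute repeat-count differences across loci."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_names(query: Tuple[int, ...],
               reference: Dict[str, List[Tuple[int, ...]]],
               max_steps: int = 2) -> List[Tuple[str, int]]:
    """Return (name, best distance) pairs within max_steps, closest first."""
    hits = []
    for name, haplotypes in reference.items():
        best = min(step_distance(query, h) for h in haplotypes)
        if best <= max_steps:
            hits.append((name, best))
    return sorted(hits, key=lambda pair: (pair[1], pair[0]))


# A query one repeat away (at DYS393) from a stored ClanA haplotype.
print(rank_names((14, 13, 24, 11, 13, 14), REFERENCE))
```

The step distance reflects the usual observation that Y-STR alleles tend to change by single repeat units, so close relatives along a patriline are expected to differ by only a few steps; a real forensic application would weight loci by their mutation rates and population frequencies.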
Y-STR haplotypes can increase the success rate of identifying the male component in
male/female cell mixtures in body fluids where other discrimination methods are
unsuccessful or too risky, for example in highly degraded samples or samples with very
low sperm counts (Kayser et al., 1997).
King and Jobling (2009b) proposed that a database of surnames and associated Y
profiles would have forensic utility. For common surnames (those with more than 6,000
bearers), however, predictive power is poor because of high haplotype diversity.
Conversely, for rare surnames (those with fewer than 50 bearers), such databases would
be ineffective, because crime-scene samples are relatively unlikely to have been
deposited by bearers of these names. For practical purposes, and for optimum efficacy,
such a database should therefore focus on surnames of intermediate frequency.
Regarding databases for forensic use based on haplotype-surname relationships, Bhatti
et al. (2016) consider current databases to be of limited effectiveness in forensic
analysis, owing to their small sample sizes and the unknown geographical and ethnic
origins of the populations sampled. Further studies are needed to expand and update
the current population-specific databases at the ethnic and geographical level, by
generating DNA datasets with greater discriminatory resolution.
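Why high haplotype diversity weakens a surname's predictive power can be shown with Nei's unbiased haplotype diversity, H = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of the i-th haplotype among n sampled men. The minimal sketch below computes H; the function name is ours and the example samples are invented labels, not data from this study.

```python
from collections import Counter
from typing import Hashable, Sequence


def haplotype_diversity(haplotypes: Sequence[Hashable]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical samples: a founder-dominated name vs. a name with many founders.
single_founder = ["A"] * 8 + ["B", "C"]
many_founders = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
print(round(haplotype_diversity(single_founder), 3))
print(round(haplotype_diversity(many_founders), 3))
```

A name dominated by one founding lineage has low H, so a matching haplotype points strongly to that name; a common name with many independent founders approaches H = 1, and a haplotype then says little about the name.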
4.8 Social cohesion and making a new South African demographic history
As humans, we are all descendants of the same species tree. Yet in South Africa today
we still observe racial tensions rooted in the systemic violence, displacement, racial
formation and institutions of social control entrenched by apartheid, whose enforced
separation produced radically racialised notions of cultural identity. Race has meaning
only in the implied sense that its members share a common ancestry distinct from that
of other groups. Members of a race also share many things besides genes, to the extent
that the concept is inextricably cultural in nature (Kittles and Weiss, 2003).
People often understand themselves not only in terms of what is genetically inherited,
but also in terms of a much broader set of social relationships: narratives of what is
culturally inherited, traditions and attitudes, as well as the impact of their childhood
and lifelong experiences (Nash, 2006; Bhattacharya, 2010). Genetic genealogy has the
power to clarify and reinforce the public’s existing understandings of how ancestry bears
on their sense of identity, as individuals or as members of ethnic groups.
There is a need for social and molecular scientists to step up and dispel myths by
pursuing objectives ranging from the modest, such as confirming relationships between
people with similar surnames, to answering specific questions like “what was life like
for humans 10,000 years ago?”. Evaluating the oral histories of traditional clans is one
such objective, and one that may challenge our perceptions of race, ethnicity and
culture. Molecular biology, used as a tool alongside detailed anthropological history,
can corroborate and refine demographic history; for South Africa (and indeed the world
today), this can illustrate the arbitrary nature of racial differences, which are
insignificant in comparison with the far more fundamental commonalities of the human
race.
This study has been in line with such intentions. Through analysis of the DNA of current
clan descendants in conjunction with genealogical data, certain aspects of the oral
history of the origins of the abeLungu and amaMolo, which has survived ten generations,
have been affirmed, while other aspects have been invalidated. This demonstrates that
molecular (genetic) methods can be used to validate and refine the genealogical history
of an episode in which non-African shipwreck survivors and immigrants integrated
harmoniously into a deeply rooted Xhosa culture, in contrast with much of South Africa’s
recent political history.
CHAPTER 5
Concluding remarks
5.1 Testing the oral history of the abeLungu
Oral history was, and is, a form of entertainment as well as a tool for passing on a
cultural identity and a system of values, whose details have been added to and
subtracted from in the course of its telling and re-telling. The premise of this study
was to evaluate the congruency of the history passed down from the abeLungu
progenitors, primarily through the cultural medium of historical and genealogical
narration, by investigating the molecular evidence found in the DNA of contemporary
clan members.
The genetic data support the anthropological information regarding introgression of
non-African genes into the gene pool of the abeLungu (Appendix C, Figures S1-S6).
The DNA narrative is in convincing agreement with the oral-history narrative, which is of
great value in affirming and refining the identity and culture of the abeLungu people.
Present-day descendants of the abeLungu and the amaMolo show high fidelity in
father-to-son Y-chromosome transmission: patrilineal haplotypes have been transmitted
in parallel with the clan name, without dilution, across approximately ten generations
(Appendix C, Figures S1-S6). The tight clustering of the abeLungu ancestral modal
haplotypes, with few variants, was consistent across most RMJ networks, and this
pattern was robust to repeating the analysis with randomised inputs (namely non-clan
affiliates from the greater mPondo region), as suggested by Bandelt et al. (1999).
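Why tight clustering with only a few variants is what one would expect after roughly ten generations can be shown with simple arithmetic. The sketch below assumes an average Y-STR mutation rate of μ = 0.002 per locus per generation, 17 loci (the size of a Yfiler-type panel, Mulero et al., 2006), and independent mutation across loci and generations. All three values are illustrative assumptions, not estimates from this study, and real rates differ between loci. Under these assumptions, the expected number of mutations along one patriline is L × g × μ and the probability that a haplotype is transmitted unchanged is (1 − μ)^(L × g). Two clan members whose common patrilineal ancestor lived g generations ago are separated by 2g transmissions.

```python
def expected_mutations(n_loci: int, generations: int, mu: float) -> float:
    """Expected number of Y-STR mutations along one patriline (independent loci)."""
    return n_loci * generations * mu


def prob_unchanged(n_loci: int, generations: int, mu: float) -> float:
    """Probability that a haplotype is passed down with no mutation at any locus."""
    return (1.0 - mu) ** (n_loci * generations)


def prob_pair_identical(n_loci: int, generations: int, mu: float) -> float:
    """Probability that two men sharing a patrilineal ancestor g generations back
    still carry identical haplotypes (2g transmissions; back-mutation ignored)."""
    return (1.0 - mu) ** (2 * n_loci * generations)


# Illustrative assumptions only: 17 loci, 10 generations, mu = 0.002.
L, g, mu = 17, 10, 0.002
print(f"expected mutations per patriline: {expected_mutations(L, g, mu):.2f}")
print(f"P(haplotype unchanged): {prob_unchanged(L, g, mu):.3f}")
print(f"P(two clan members identical): {prob_pair_identical(L, g, mu):.3f}")
```

Under these assumptions, most patrilines carry the ancestral haplotype unchanged or differ from it by a single step. This is consistent with a modal haplotype surrounded by a few one-step variants.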
In addition to affirming the non-African male ancestry of the abeLungu clans, this study
has also demonstrated how the cultural subdivision into patrilineal descent groups has
left its footprints on the Y chromosome diversity of patrilocal populations without
affecting mitochondrial diversity. The male abeLungu population has a demographic
history of lineal fissions of descent groups without subsequent migration between
descent groups, which produces so-called “identity cores” and a reduction in Y
chromosome diversity (Chaix et al., 2007). In each generation, by contrast, women
move between clan lineages as a result of the rules of exogamy practised among the
abeLungu, which prevents the social structure from imprinting itself on mitochondrial
diversity. This presents an obstacle to investigating the maternal origins of clan
members.
Beyond a better understanding of the history of these unique groups of Xhosa clans of
foreign origin, this study paints a fuller picture of South Africa’s demographic history,
one which we hope will demonstrate the superficiality of the antiquated concept of race,
provide a greater sense of unity for the future of our country, and deepen our
understanding of our human origins and our unity as a species.
As Haraway observes: ‘Epistemophilia, the lusty search for knowledge of origins, is
everywhere’ (Nash, 2004).
References
Abu-Amero, K.K., González, A.M., Larruga, J.M. et al., 2007. Eurasian and African
mitochondrial DNA influences in the Saudi Arabian population. BMC Evolutionary
Biology, 7, pp.1–15.
Achilli, A., Rengo, C., Magri, C. et al., 2004. The Molecular Dissection of mtDNA
Haplogroup H Confirms That the Franco-Cantabrian Glacial Refuge Was a Major
Source for the European Gene Pool. American Journal of Human Genetics, 75,
pp.910–918.
Andrews, R.M., Kubacka, I., Chinnery, P.F. et al., 1999. Reanalysis and revision of
the Cambridge reference sequence for human mitochondrial DNA. Nature Genetics,
23(2), p.147.
Athey, T.W., 2006. Haplogroup Prediction from Y-STR Values Using a Bayesian-Allele-Frequency Approach. Journal of Genetic Genealogy, 2(2), pp.34–39.
Ayub, Q., Mohyuddin, A., Qamar, R. et al., 2000. Identification and characterisation of
novel human Y-chromosomal microsatellites from sequence database information.
Nucleic acids research, 28(2), pp.1-5.
Balanovsky, O., Dibirova, K., Dybo, A. et al., 2011. Parallel evolution of genes and
languages in the Caucasus region. Molecular Biology and Evolution, 28(10), pp.2905–
2920.
Balaresque, P., Bowden, G.R., Adams, S.M. et al., 2010. A predominantly neolithic
origin for European paternal lineages. PLoS Biology, 8(1), pp.1-9.
Bandelt, H.J., Alves-Silva, J. & Guimaraes, P.E.M., 2001. Phylogeography of the
human mitochondrial haplogroup L3e: a snapshot of African prehistory and Atlantic
slave trade. Annals of human genetics, 65, pp.549–563.
Bandelt, H.J., Forster, P. & Röhl, A., 1999. Median-joining networks for inferring
intraspecific phylogenies. Molecular biology and evolution, 16(1), pp.37–48.
Barik, S.S., Sahani, R., Prasad, B.V.R. et al., 2008. Detailed mtDNA genotypes permit
a reassessment of the settlement and population structure of the Andaman Islands.
American Journal of Physical Anthropology, 136(1), pp.19–27.
Batai, K., Babrowski, K.B., Arroyo, J.P. et al., 2013. Mitochondrial DNA diversity in
two ethnic groups in Southeastern Kenya: Perspectives from the northeastern
periphery of the Bantu expansion. American Journal of Physical Anthropology, 150(3),
pp.482–491.
Behar, D.M., Villems, R., Soodyall, H. et al., 2008. The dawn of human matrilineal
diversity. American Journal of Human Genetics, 82(5), pp.1130–1140.
Behar, D.M., Rosset, S., Blue-Smith, J. et al., 2007. The genographic project public
participation mitochondrial DNA database. PLoS Genetics, 3(6), pp.1083–1095.
Behar, D.M., Hammer, M.F., Garrigan, D. et al., 2004. MtDNA evidence for a genetic
bottleneck in the early history of the Ashkenazi Jewish population. European journal
of human genetics: EJHG, 12(5), pp.355–364.
Bhattacharya, R., 2010. Human Population Categories in Genomic Studies and
Racialisation. MSc in Race, Ethnicity and Post-Colonial Studies; London School of
Economics and Political Science. pp.1-66.
Bhatti, S., Aslamkhan, M., Attimonelli, M. et al., 2016. Mitochondrial DNA variation in
the Sindh population of Pakistan. Australian Journal of Forensic Sciences, 618(May),
pp.1–16.
Bortolini, M-C., Salzano, F.M., Thomas, M.G. et al., 2004. Y-chromosome evidence
for differing ancient demographic histories in the Americas. American journal of human
genetics, 73(3), pp.524–539.
Brotherton, P., Haak, W., Templeton, J. et al., 2013. Neolithic mitochondrial
haplogroup H genomes and the genetic origins of Europeans. Nature Communications,
4:1764, pp.1-21.
Cadenas, A.M., Zhivotovsky, L.A., Cavalli-Sforza, L.L. et al., 2008. Y-chromosome
diversity characterizes the Gulf of Oman. European journal of human genetics: EJHG,
16(3), pp.374–386.
Campbell, K.D., 2007. Geographic Patterns of R1b in the British Isles – Deconstructing
Oppenheimer. Journal of Genetic Genealogy, 3(2), pp.63–71.
Cann, R.L., Stoneking, M. & Wilson, A.C., 1987. Mitochondrial DNA and human
evolution. Nature, 325(6099), pp.31-36.
Capelli, C., Brisighelli, F., Scarnicci, F. et al., 2007. Y chromosome genetic variation
in the Italian peninsula is clinal and supports an admixture model for the Mesolithic-Neolithic encounter. Molecular Phylogenetics and Evolution, 44(1), pp.228–239.
Cardon, L.R. & Bell, J.I., 2001. Association study designs for complex diseases.
Nature reviews. Genetics, 2(2), pp.91–99.
Chaix, R., Quintana-Murci, L., Hegay, T. et al., 2007. From Social to Genetic
Structures in Central Asia. Current Biology, 17(1), pp.43–48.
Chiaroni, J., Underhill, P.A. & Cavalli-Sforza, L.L., 2009. Y chromosome diversity,
human expansion, drift, and cultural evolution. Proceedings of the National Academy
of Sciences of the United States of America, 106(48), pp.20174–20179.
Cinnioǧlu, C., King, R., Kivisild, T. et al., 2004. Excavating Y-chromosome haplotype
strata in Anatolia. Human Genetics, 114(2), pp.127–148.
Crampton, H., 2004. The Sunburnt Queen. Jacana Media (Pty) Ltd. ISBN:
1919931929.
Cruciani, F., La Fratta, R., Santolamazza, P. et al., 2004. Phylogeographic analysis of
haplogroup E3b (E-M215) Y chromosomes reveals multiple migratory events within
and out of Africa. American journal of human genetics, 74(5), pp.1014–1022.
Cruciani, F., Santolamazza, P., Shen, P. et al., 2002. A back migration from Asia to
sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome
haplotypes. American Journal of Human Genetics, 70(5), pp.1197–1214.
Destro-Bisol, G., Coia, V., Boschi, I. et al., 2004. The Analysis of Variation of mtDNA
Hypervariable Region I Suggests that Eastern and Western Pygmies Diverged before
the Bantu Expansion. The American Naturalist, 163(2), pp.212–226.
Di Gaetano, C., Cerutti, N., Crobu, F. et al., 2009. Differential Greek and northern
African migrations to Sicily are supported by genetic evidence from the Y
chromosome. European journal of human genetics: EJHG, 17(1), pp.91–99.
Di Giacomo, F., Luca, F., Popa, L.O. et al., 2004. Y chromosomal haplogroup J as a
signature of the post-neolithic colonization of Europe. Human Genetics, 115(5),
pp.357–371.
Doosti, A. & Dehkordi, P., 2011. Genetic Polymorphisms of Mitochondrial Genome D-loop Region in Bakhtiarian Population by PCR-RFLP. International Journal of Biology,
3(4), pp.41–46.
Fadhlaoui-Zid, K., Plaza, S., Calafell, F. et al., 2004. Mitochondrial DNA heterogeneity
in Tunisian Berbers. Annals of Human Genetics, 68(3), pp.222–233.
Fejerman, L., Carnese, F.R., Goicoechea, A.S. et al., 2005. African ancestry of the
population of Buenos Aires. American Journal of Physical Anthropology, 128(1),
pp.164–170.
Finnilä, S., Lehtonen, M.S., Majamaa, K. et al., 2001. Phylogenetic Network for
European mtDNA. The American Journal of Human Genetics, 68(6), pp.1475–1484.
Forster, P., Röhl, A., Lünnemann, P. et al., 2000. A short tandem repeat-based
phylogeny for the human Y chromosome. American journal of human genetics, 67(1),
pp.182–196.
Freire-Marreco, B., 1914. Tewa Kinship terms from the Pueblo of Hano, Arizona.
American Anthropologist, New Series, 16, pp.269-287.
Fu, Q., Rudan, P., Paabo, S. et al., 2012. A next-generation approach to the
characterization of a non-model plant transcriptome. Current Science, 101(11),
pp.1435–1439.
Geppert, M. & Roewer, L., 2012. SNaPshot® Minisequencing Analysis of Multiple
Ancestry-Informative Y-SNPs Using Capillary Electrophoresis. DNA Electrophoresis
Protocols for Forensic Genetics, Vol. 830, pp.127-140.
Gonder, M.K., Mortensen, H.M., Reed, F.A. et al., 2007. Whole-mtDNA genome
sequence analysis of ancient African lineages. Molecular Biology and Evolution, 24(3),
pp.757–768.
Gray, I.C., Campbell, D.A. & Spurr, N.K., 2000. Single nucleotide polymorphisms as
tools in human genetics. Human molecular genetics, 9(16), pp.2403–2408.
Green, R.E., Malaspinas, A.S., Krause, J. et al., 2008. A Complete Neandertal
Mitochondrial Genome Sequence Determined by High-Throughput Sequencing. Cell,
134(3), pp.416–426.
Green, R.E., Krause, J., Briggs, A.W. et al., 2010. A draft sequence of the Neandertal
genome. Science (New York, N.Y.), 328(5979), pp.710–22.
Hage, P. & Marck, J., 2003. Matrilineality and the Melanesian Origin of Polynesian Y
Chromosomes. Current Anthropology, 44(S5), pp. S121–S127.
Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and
analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41, pp.95–98.
Hammer, M.F., Behar, D.M., Karafet, T.M. et al., 2009. Extended Y chromosome
haplotypes resolve multiple and unique lineages of the Jewish priesthood. Human
Genetics, 126(5), pp.707–717.
Hammer, M.F. & Zegura, S.L., 2002. The Human Y Chromosome Haplogroup Tree:
Nomenclature and Phylogeography of Its Major Divisions. Annual Review of
Anthropology, 31(1), pp.303–321.
Hinds, D.A., Stuve, L.L., Nilsen, G.B. et al., 2005. Whole-genome patterns of common
DNA variation in three human populations. Science (New York, N.Y.), 307(5712),
pp.1072–1079.
Iborra, F.J., Kimura, H. & Cook, P.R., 2004. The functional organization of
mitochondrial genomes in human cells. BMC biology, 2(9), pp.1–14.
Järve, M., Zhivotovsky, L.A., Rootsi, S. et al., 2009. Decreased rate of evolution in Y
chromosome STR loci of increased size of the repeat unit. PLoS ONE, 4(9).
Jobling, M.A., 2001. In the Name of the Father. Trends in Genetics, 17(6), pp.353–
357.
Jobling, M.A., Rasteiro, R. & Wetton, J.H., 2015. In the blood: the myth and reality of
genetic markers of identity. Ethnic and Racial Studies, 39(2), pp.142–161.
Jobling, M.A. & Tyler-Smith, C., 2000. New uses for new haplotypes. Trends in
Genetics, 16(8), pp.356–362.
Jobling, M.A. & Tyler-Smith, C., 2003. The human Y chromosome: an evolutionary
marker comes of age. Nature reviews. Genetics, 4(8), pp.598–612.
Jorde, L.B., Bamshad, M. & Rogers, A.R., 1998. Using mitochondrial and nuclear DNA
markers to reconstruct human evolution. BioEssays: news and reviews in molecular,
cellular and developmental biology, 20(2), pp.126–136.
Jorde, L.B., Watkins, W.S., Bamshad, M.J. et al., 2000. The distribution of human
genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data.
American journal of human genetics, 66(3), pp.979–988.
Karafet, T.M., Mendez, F.L., Meilerman, M.B. et al., 2008. New binary polymorphisms
reshape and increase resolution of the human Y chromosomal haplogroup tree.
Genome Research, 18(5), pp.830–838.
Kayser, M., Caglià, A. & Fretwell, N., 1997. Evaluation of Y-chromosomal STRs: a
multicenter study. International Journal of Legal Medicine, 110(3), pp.125–133.
Kayser, M. et al., 2000. Characteristics and frequency of germline mutations at
microsatellite loci from the human Y chromosome, as revealed by direct observation
in father/son pairs. American journal of human genetics, 66(5), pp.1580–1588.
Kayser, M., Roewer, L., Hedman, M. et al., 2003. Reduced Y-chromosome, but not
mitochondrial DNA, diversity in human populations from West New Guinea. American
journal of human genetics, 72(2), pp.281–302.
King, T.E. & Jobling, M.A., 2009a. What’s in a name? Y chromosomes, surnames and
the genetic genealogy revolution. Trends in Genetics, 25(8), pp.351–360.
King, T.E. & Jobling, M.A., 2009b. Founders, drift, and infidelity: The relationship
between Y chromosome diversity and patrilineal surnames. Molecular Biology and
Evolution, 26(5), pp.1093–1102.
Kirby, P.R., 1953. A Source Book on the Wreck of the Grosvenor East Indiaman.
Volume 34 of Van Riebeeck Society publications. First series, 1953.
Kivisild, T. et al., 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow
across and around the gate of tears. American journal of human genetics, 75(5),
pp.752–770.
Kivisild, T., Reidla, M., Metspalu, E. et al., 2002. The Genetics of Language and
Farming Spread in India. In Examining the farming/language dispersal hypothesis.
McDonald Institute Monographs, ISBN: 1902937201. Chpt. 17. pp. 215–222.
Klyosov, A., 2009. DNA Genealogy, Mutation Rates, and Some Historical Evidence
Written in the Y-Chromosome, Part II: Walking the Map. Journal of Genetic
Genealogy, 5(2), pp.217-256.
Klyosov, A.A. & Rozhanskii, I.L., 2012. Haplogroup R1a as the Proto Indo-Europeans
and the Legendary Aryans as Witnessed by the DNA of Their Current Descendants.
Advances in Anthropology, 02(01), pp.1–13.
Knight, A., Underhill, P.A., Mortensen, H.M. et al., 2003. African Y chromosome and
mtDNA divergence provides insight into the history of click languages. Current
Biology, 13(6), pp.464–473.
Lacau, H., Gayden, T., Regueiro, M. et al., 2012. Afghanistan from a Y-chromosome
perspective. European Journal of Human Genetics, 20(10), pp.1063–1070.
Lander, E. & Schork, N.J., 1996. Genetic dissection of complex traits. Nature genetics,
12(4), pp.355–356.
Loogväli, E.L., Roostalu, U., Malyarchuk, B.A. et al., 2004. Disuniting uniformity: A
pied cladistic canvas of mtDNA haplogroup H in Eurasia. Molecular Biology and
Evolution, 21(11), pp.2012–2021.
Lu, Y., Goldstein, D.B., Angrist, M. et al., 2016. Personalized Medicine and Human
Genetic Diversity. Cold Spring Harbor Perspectives in Medicine, 4(9), pp.1-11.
Macaulay, V., Richards, M., Hickey, E. et al., 1999. The emerging tree of West
Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. American
journal of human genetics, 64(1), pp.232–249.
Manolio, T.A., Collins, F.S. & Cox, N.J., 2009. Finding the missing heritability of
complex diseases. Nature, 461(7265), pp.747–753.
McEvoy, B. & Bradley, D.G., 2006. Y-chromosomes and the extent of patrilineal
ancestry in Irish surnames. Human Genetics, 119(1-2), pp.212–219.
McEvoy, B., Simms, K. & Bradley, D.G., 2008. Genetic investigation of the patrilineal
kinship structure of early medieval Ireland. American Journal of Physical
Anthropology, 136(4), pp.415–422.
Montinaro, F., Davies, J. & Capelli, C., 2016. Group membership, geography and
shared ancestry: Genetic variation in the Basotho of Lesotho. American Journal of
Physical Anthropology, 160(1), pp.156–161.
Moore, L.T., McEvoy, B., Cape, E. et al., 2006. A Y-chromosome signature of
hegemony in Gaelic Ireland. American journal of human genetics, 78(2), pp.334–338.
Msaidie, S., Ducourneau, A., Boetsch, G. et al., 2010. Genetic diversity on the
Comoros Islands shows early seafaring as major determinant of human biocultural
evolution in the Western Indian Ocean. European journal of human genetics: EJHG,
19(1), pp.89–94.
Mulero, J.J., Chang, C.W., Calandro, L.M. et al., 2006. Development and validation
of the AmpFℓSTR® Yfiler™ PCR amplification kit: A male specific, single amplification
17 Y-STR multiplex system. Journal of Forensic Sciences, 51(1), pp.64–75.
Myres, N.M., Rootsi, S., Lin, A.A. et al., 2011. A major Y-chromosome haplogroup R1b
Holocene era founder effect in Central and Western Europe. European journal of
human genetics: EJHG, 19(1), pp.95–101.
Myrianthopoulos, N.C. & Aronson, S.M., 1966. Population dynamics of Tay-Sachs
disease. I. Reproductive fitness and selection. American Journal of Human Genetics,
18(4), pp.313–327.
Naidoo, T., Schlebusch, C.M., Makkan, H. et al., 2010. Development of a single base
extension method to resolve Y chromosome haplogroups in sub-Saharan African
populations. Investigative genetics, 1(1), p.6.
Nash, C., 2004. Genetic kinship. Cultural Studies, 18(1), pp.1–33.
Nash, C., 2006. Genetic tests for genealogy - 10 reasons to be wary. L’Observatoire
de la genetique, 29(Sept-Oct).
Nebel, A., Filon, D., Brinkmann, B. et al., 2001. The Y chromosome pool of Jews as
part of the genetic landscape of the Middle East. American journal of human genetics,
69(5), pp.1095–1112.
Pääbo, S., Poinar, H., Serre, D. et al., 2004. Genetic analyses from ancient DNA.
Annual Review of Genetics, 38, pp.645–679.
Pamjav, H., Zalán, A., Béres, J. et al., 2011. Genetic structure of the paternal lineage
of the Roma People. American Journal of Physical Anthropology, 145(1), pp.21–29.
Phillips, C., Salas, A., Sánchez, J.J. et al., 2007. Inferring ancestral origin using a
single multiplex assay of ancestry-informative marker SNPs. Forensic Science
International: Genetics, 1(3-4), pp.273–280.
Plaza, S., Salas, A., Calafell, F. et al., 2004. Insights into the western Bantu dispersal:
mtDNA lineage analysis in Angola. Human Genetics, 115(5), pp.439–447.
Preston-Whyte, E., 1974. Reproductive health and the condom dilemma: identifying
situational barriers to HIV protection in South Africa. Resistances to Behavioural
Change to Reduce HIV/AIDS Infection, C, pp.139–155.
Pritchard, J.K., 2001. Are rare variants responsible for susceptibility to complex
diseases? American journal of human genetics, 69(1), pp.124–137.
Qamar, R., Ayub, Q., Mohyuddin, A. et al., 2002. Y-chromosomal DNA variation in
Pakistan. American journal of human genetics, 70(5), pp.1107–1124.
Quintana-Murci, L., Chaix, R., Wells, S.R. et al., 2004. Where west meets east: the
complex mtDNA landscape of the southwest and Central Asian corridor. American
journal of human genetics, 74(5), pp.827–845.
Raghavan, M., Skoglund, P., Graf, K.E. et al., 2014. Upper Palaeolithic Siberian
genome reveals dual ancestry of Native Americans. Nature, 505(7481), pp.87–91.
Ralph, P. & Coop, G., 2013. The Geography of Recent Genetic Ancestry across
Europe. PLoS Biology, 11(5).
Ramakrishnan, U. & Mountain, J.L., 2004. Precision and accuracy of divergence time
estimates from STR and SNPSTR variation. Molecular Biology and Evolution, 21(10),
pp.1960–1971.
Redd, A.J., Agellon, A.B., Kearney, V.A. et al., 2002. Forensic value of 14 novel STRs
on the human Y chromosome. Forensic Science International, 130(2-3), pp.97–111.
Regueiro, M., Rivera, L., Chennakrishnaiah, S. et al., 2012. Ancestral modal Y-STR
haplotype shared among Romani and South Indian populations. Gene, 504(2),
pp.296–302.
Richards, M.B., Macaulay, V.A., Bandelt, H.J. et al., 1998. Phylogeography of
mitochondrial DNA in western Europe. Annals of human genetics, 62(Pt 3), pp.241–
260.
Richards, M., Macaulay, V., Hickey, E. et al., 2000. Tracing European founder lineages
in the Near Eastern mtDNA pool. American journal of human genetics, 67(5), pp.1251–
1276.
Roewer, L., Krawczak, M., Willuweit, S. et al., 2001. Online reference database of
European Y-chromosomal short tandem repeat (STR) haplotypes. Forensic Science
International, 118(2-3), pp.106–113.
Roewer, L., 2009. Y chromosome STR typing in crime casework. Forensic Science,
Medicine, and Pathology, 5(2), pp.77–84.
Roewer, L., Willuweit, S., Krüger, C. et al., 2008. Analysis of Y chromosome STR
haplotypes in the European part of Russia reveals high diversities but non-significant
genetic distances between populations. International Journal of Legal Medicine,
122(3), pp.219–223.
Rosser, Z.H., Zerjal, T., Hurles, M.E. et al., 2000. Y-chromosomal diversity in Europe
is clinal and influenced primarily by geography, rather than by language. American
Journal of Human Genetics, 67(6), pp.1526–1543.
Sahoo, S., Singh, A., Himabindu, G. et al., 2006. A prehistory of Indian Y
chromosomes: evaluating demic diffusion scenarios. Proceedings of the National
Academy of Sciences of the United States of America, 103(4), pp.843–848.
Salas, A., Richards, M., De la Fe, T. et al., 2002. The making of the African mtDNA
landscape. American Journal of Human Genetics, 71(5), pp.1082–1111.
Sanchez-Faddeev, H., Pijpe, J., van der Hulle, T. et al., 2013. The influence of clan
structure on the genetic variation in a single Ghanaian village. European Journal of
Human Genetics, 21(10), pp.1134–1139.
Scheffler, I.E., 2000. A century of mitochondrial research: Achievements and
perspectives. Mitochondrion, 1(1), pp.3–31.
Schlebusch, C.M., Naidoo, T. & Soodyall, H., 2009. SNaPshot minisequencing to
resolve mitochondrial macro-haplogroups found in Africa. Electrophoresis, 30(21),
pp.3657–3664.
Schlebusch, C.M., de Jongh, M. & Soodyall, H., 2011. Different contributions of
ancient mitochondrial and Y-chromosomal lineages in “Karretjie people” of the Great
Karoo in South Africa. Journal of Human Genetics, 56(9), pp.623–630.
Semino, O. et al., 2000. The genetic legacy of Paleolithic Homo sapiens sapiens in
extant Europeans: a Y chromosome perspective. Science (New York, N.Y.),
290(5494), pp.1155–1159.
Semino, O., Passarino, G., Oefner, P.J. et al., 2004. Origin, diffusion, and
differentiation of Y-chromosome haplogroups E and J: inferences on the neolithization
of Europe and later migratory events in the Mediterranean area. American journal of
human genetics, 74(5), pp.1023–1034.
Semino, O., Santachiara-Benerecetti, A.S., Falaschi, F. et al., 2002. Ethiopians and
Khoisan share the deepest clades of the human Y-chromosome phylogeny. American
journal of human genetics, 70(1), pp.265–268.
Sengupta, S., Zhivotovsky, L.A., King, R. et al., 2006. Polarity and temporality of high-resolution Y-chromosome distributions in India identify both indigenous and
exogenous expansions and reveal minor genetic influence of Central Asian
pastoralists. American journal of human genetics, 78(2), pp.202–221.
Serre, D. & Pääbo, S., 2004. Evidence for gradients of human genetic diversity within
and among continents. Genome Research, 14(9), pp.1679–1685.
Shriver, M.D. & Kittles, R.A., 2004. Genetic ancestry and the search for personalized
genetic histories. Nature reviews. Genetics, 5(8), pp.611–618.
Simoni, L., Calafell, F., Pettener, D. et al., 2000. Geographic patterns of mtDNA
diversity in Europe. American journal of human genetics, 66(1), pp.262–278.
Soares, P., Alshamali, F., Pereira, J.B. et al., 2011. The expansion of mtDNA
haplogroup L3 within and out of Africa. Molecular Biology and Evolution, 29(3),
pp.915–927.
Soga, J., 1930. The South-Eastern Bantu. Cambridge University Press, 2013. ISBN:
1108066828.
Soodyall, H., 2013. Lemba origins revisited: Tracing the ancestry of Y chromosomes
in South African and Zimbabwean Lemba. South African Medical Journal,
103(SUPPL. 1), pp.1009–1013.
Soodyall, H. & Schlebusch, C.M., 2010. The genetic landscape of sub-Saharan
African populations, unpublished, pp.1-34.
Stark, A., 2013. The Matrilineal System of the Minangkabau and its Persistence
Throughout History: A Structural Perspective - the Minangkabau society. Southeast
Asia: A Multidisciplinary Journal, 13, pp.1–13.
Sykes, B., 2001. The Seven Daughters of Eve. Tandem Library, 2002.
ISBN: 1417625929.
Tamura, K., Dudley, J., Nei, M. et al., 2007. MEGA4: Molecular Evolutionary Genetics
Analysis (MEGA) software version 4.0. Molecular Biology and Evolution, 24(8),
pp.1596–1599.
Taylor, S., 2005. The Caliban Shore. Faber & Faber, 2012. ISBN: 0571295673.
Thanseem, I., Thangaraj, K., Chaubey, G. et al., 2006. Genetic affinities among the
lower castes and tribal groups of India: inference from Y chromosome and
mitochondrial DNA. BMC genetics, 7(42), p.42.
Tishkoff, S.A., Gonder, M.K., Henn, B.M. et al., 2007. History of click-speaking
populations of Africa inferred from mtDNA and Y chromosome genetic variation.
Molecular Biology and Evolution, 24(10), pp.2180–2195.
Torroni, A., Lott, M.T., Cabell, M.F. et al., 1994. mtDNA and the origin of Caucasians:
Identification of ancient Caucasian-specific haplogroups, one of which is prone to a
recurrent somatic duplication in the D-loop region. American Journal of Human
Genetics, 55(4), pp.760–776.
Torroni, A., Rengo, C., Guida, V. et al., 2001. Do the four clades of the mtDNA
haplogroup L2 evolve at different rates? American journal of human genetics, 69(6),
pp.1348–1356.
Torroni, A., Huoponen, K., Francalacci, P. et al., 1996. Classification of European
mtDNAs from an analysis of three European populations. Genetics, 144(4), pp.1835–
1850.
Underhill, P.A., Passarino, G., Lin, A.A. et al., 2001. The phylogeography of Y
chromosome binary haplotypes and the origins of modern human populations. Annals
of human genetics, 65(Pt 1), pp.43–62.
Underhill, P.A., Shen, P., Lin, A.A. et al., 2000. Y chromosome sequence variation and
the history of human populations. Nature genetics, 26(3), pp.358–361.
Underhill, P.A., Myres, N.M., Rootsi, S. et al., 2010. Separating the post-Glacial
coancestry of European and Asian Y chromosomes within haplogroup R1a. European
journal of human genetics: EJHG, 18(4), pp.479–484.
van Oven, M. & Kayser, M., 2009. Updated comprehensive phylogenetic tree of global
human mitochondrial DNA variation. Human mutation, 30(2), pp.386–394.
Varzari, A., Kharkov, V., Nikitin, A.G. et al., 2013. Paleo-Balkan and Slavic
Contributions to the Genetic Pool of Moldavians: Insights from the Y Chromosome.
PLoS ONE, 8(1), pp.1–9.
Vigilant, L. et al., 1989. Mitochondrial DNA sequences in single hairs from a southern
African population. Proceedings of the National Academy of Sciences of the United
States of America, 86(December), pp.9350–9354.
Westen, A.A., Kraaijenbrink, T. et al., 2015. Analysis of 36 Y-STR
marker units including a concordance study among 2085 Dutch males. Forensic
Science International: Genetics, 14, pp.174–181.
Willuweit, S. & Roewer, L., 2007. Y chromosome haplotype reference database
(YHRD): Update. Forensic Science International: Genetics, 1(2), pp.83–87.
Wilson, J.F., Weale, M.E., Smith, A.C. et al., 2001. Population genetic structure of
variable drug response. Nature genetics, 29(3), pp.265–269.
Wu, W., Pan, L., Hao, H. et al., 2010. Population genetics of 17 Y-STR loci in a large
Chinese Han population from Zhejiang Province, Eastern China. Forensic Science
International: Genetics, 5(1), pp.2009–2011.
Young, L.S. & Jackson, C., 2011. ‘Bhuti’: Meaning and Masculinities in Xhosa
Brothering. Journal of Psychology in Africa, 21(2), pp.221–228.
Zalloua, P.A., Platt, D.E., El Sibai, M. et al., 2008. Identifying Genetic Traces of
Historical Expansions: Phoenician Footprints in the Mediterranean. American Journal
of Human Genetics, 83(5), pp.633–642.
Zegura, S.L., Karafet, T.M., Zhivotovsky, L.A. et al., 2004. High-Resolution SNPs and
Microsatellite Haplotypes Point to a Single, Recent Entry of Native American Y
Chromosomes into the Americas. Molecular Biology and Evolution, 21(1), pp.164–
175.
Zhivotovsky, L.A., Underhill, P.A., Cinnioğlu, C. et al., 2004. The effective mutation rate
at Y chromosome short tandem repeats, with application to human population divergence time. American journal of human genetics, 74(1), pp.50–61.
Online references:
1. Hayward-Kalis, J., 2006. History of Pondoland (Transkei).
http://www.portstjohns.org.za/history-pondoland.html. Copyright © 2006–2010
portstjohns.org.za. Accessed August, 2011.
2. Zinto. 2007. Blood ties between Slaves, Europeans & Xhosa in the Cape
http://slaveryincapetown.blogspot.com/2007/02/blood-ties-between-slaveseuropeans.html. Accessed June, 2012.
3. The Mito Blog - ALL ABOUT MITOCHONDRIA! https://themitoblog.wordpress.com/tag/mitochondria-blog/ Last accessed
February, 2016
4. International Society of Genetic Genealogy (2016). Y-DNA Haplogroup Tree 2016,
Version: 11.298, 31 October 2016, http://www.isogg.org/tree/ [Date of last access:
June, 2016].
5. J.D. MacDonald - the MacDonald family name reference database.
http://www.scs.illinois.edu/~mcdonald. Accessed April, 2016.
6. Irish surname/haplotype reference database: http://www.irishtype3dna.org/
Accessed April, 2016.
APPENDICES
Appendix A: Ethics
Appendix B: SNP-marker panels for SBE multiplex assays
Table S1
Table S1 continued
Appendix C: Clan genealogies
Figures S1–S6 show the partial genealogies of the 13 clan families. The legend below
describes the symbols used in the clan genealogy diagrams:
Clan-affiliated individuals who form part of the sample set are labelled with a sample
code (UMT prefix) and their unique haplotype, while the remaining individuals in the
genealogies are labelled using a numbering system in which the Roman numeral
indicates the generation and the subsequent number designates the individual
within that generation.
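The labelling convention described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that non-sampled individuals are written as `III-4` (Roman-numeral generation, a hyphen, then the individual's number) and that sample codes simply begin with the `UMT` prefix; the exact label format drawn in the figures is not reproduced in the text, and all helper names here are hypothetical.

```python
import re

# Values of the Roman numerals that can appear in generation labels.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}

# Hypothetical label format (an assumption, not taken from the figures):
# Roman-numeral generation, a hyphen, then the individual's number, e.g. "III-4".
PEDIGREE_LABEL = re.compile(r"^([IVXL]+)-(\d+)$")


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'IV' to an integer (4)."""
    total = 0
    largest_seen = 0
    # Scan right to left: a symbol smaller than one already seen is subtractive.
    for symbol in reversed(numeral.upper()):
        value = ROMAN_VALUES[symbol]
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total


def is_sampled(label: str) -> bool:
    """Sampled clan members carry a sample code with the UMT prefix."""
    return label.strip().startswith("UMT")


def parse_pedigree_label(label: str) -> tuple[int, int]:
    """Return (generation, individual) for a label such as 'III-4'."""
    match = PEDIGREE_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a generation-individual label: {label!r}")
    return roman_to_int(match.group(1)), int(match.group(2))
```

For example, `parse_pedigree_label("III-4")` returns `(3, 4)`, the fourth individual in the third generation, while any label starting with `UMT` is treated as a sampled individual whose haplotype is shown instead.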
Note: The amaMolo and abeLungu Jekwa genealogies (Figures S1 & S2) have been
split and resized, rather than presented in landscape orientation, so that they can be
shown at an optimal aspect ratio.
Figure S1 - The amaMolo clan genealogy featuring the distinctive Bhayi [haplogroup R1a1a
(R-M198)] lineage and the Pita [haplogroup Q (Q-M242)] lineage
Figure S2 - The abeLungu Jekwa clan genealogy featuring the modal haplotype R343_5
and variant lineages. The two NPTs are demarcated in red squares
Figure S3 - The abeLungu Caine and Horner clan genealogies
Figure S4 - The abeLungu Hatu clan genealogy
Figure S5 - The abeLungu Ogle, Irish, France, Buku and Thaka clan genealogies
Figure S6 - The abeLungu Fuzwayo, Hastoni and Sukwini clan genealogies featuring African
haplotypes
Appendix D: Variant sites of unique mtDNA haplotypes
Table S2
| HG | HT | HT freq. | HVR1 variant sites | HVR2 variant sites |
| L0a1b | 1a | 2 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T | 73G, 146C, 152C, 195C, 247A |
|  | 1b | 1 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T | 73G, 146C, 152C, 195C, 263G |
|  | 1c | 1 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T | 73G, 150T, 185A, 189G, 263G |
|  | 1d | 1 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T | 73G, 185A, 236C, 247A, 263G |
|  | 1e | 1 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T | 73G, 95C, 189G, 198T, 247A, 263G |
|  | 1f | 1 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T | 89C, 93G, 95C, 185A, 189G, 236C, 247A, 263G |
|  | 1g | 2 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T | 93G, 95C, 185A, 189G, 236C, 247A, 263G |
|  | 1h | 1 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T, 16519C | 93G, 95C, 185A, 189G, 236C, 247A, 263G |
|  | 1i | 1 | 16129A, 16148T, 16168T, 16172C, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T | 93G, 95C, 185A, 189G, 236C, 247A, 263G |
| L0a2a2 | 2a | 2 | 16129A, 16148T, 16169T, 16172C, 16187T, 16188A, 16189C, 16223T, 16230G, 16261T, 16278T, 16311C, 16320T, 16519C | 93G, 146C, 150T, 185A, 189G, 195C, 204C, 207A, 236C, 247A, 263G |
|  | 2b | 1 | 16148T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16311C, 16320T | 93G, 152C, 189G, 204C, 207A, 236C, 247A, 263G |
|  | 2c | 2 | 16148T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16311C, 16320T, 16519C | 64T, 93G, 152C, 189G, 204C, 207A, 236C, 247A, 263G |
|  | 2d | 1 | 16148T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16311C, 16320T, 16519C | 73G, 150T, 185A, 189G, 195C, 263G |
|  | 2e | 1 | 16148T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16311C, 16320T, 16519C | 73G, 152C, 195C, 263G |
|  | 2f | 1 | 16148T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16311C, 16320T, 16519C | 73G, 93G, 146C, 150T, 152C, 182T, 183G, 195C, 198T, 263G, 325T |
|  | 2g | 1 | 16148T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16311C, 16320T, 16519C | 73G, 93G, 152C, 189G, 204C, 207A, 236C, 247A, 263G |
| L0d1a | 3a | 1 | 16129A, 16187T, 16189C, 16209C, 16230G, 16234T, 16243C, 16266A, 16311C, 16519C | 73G, 146C, 153G, 199C, 247A |
|  | 3b | 1 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16284G, 16311C, 16519C | 73G, 146C, 152C, 199C, 247A, 310C, 311T |
|  | 3c | 1 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16284G, 16311C, 16519C | 73G, 150T, 185A, 189G, 263G |
|  | 3d | 1 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16311C, 16464C, 16519C | 73G, 146C, 195C, 199C, 247A |
|  | 3e | 2 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16311C, 16519C | 73G, 146C, 152C, 199C, 247A |
|  | 3f | 3 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16311C, 16519C | 73G, 146C, 195C, 199C, 247A |
|  | 3g | 1 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16311C, 16519C | 73G, 146C, 195C, 199C, 247A, 318C |
|  | 3h | 1 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16311C, 16519C | 73G, 150T, 195C, 263G |
|  | 3i | 1 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16311C, 16519C | 73G, 185A, 236C, 247A, 263G |
|  | 3j | 1 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266A, 16311C, 16519C | 93G, 146C, 150T, 185A, 189G, 195C, 204C, 207A, 236C, 247A, 263G |
|  | 3k | 1 | 16129A, 16187T, 16189C, 16230G, 16234T, 16243C, 16266T, 16311C, 16318T, 16519C | 73G, 146C, 195C, 199C, 247A |
|  | 3l | 1 | 16187T, 16189C, 16223T, 16230G, 16234T, 16243C, 16249C, 16311C, 16519C | 73G, 146C, 152C, 195C, 247A |
| L0d1b | 4a | 1 | 16129A, 16187T, 16189C, 16218T, 16223T, 16227G, 16239T, 16243C, 16294T, 16311C, 16478G, 16519C | 73G, 146C, 152C, 195C, 247A |
|  | 4b | 2 | 16129A, 16187T, 16189C, 16218T, 16223T, 16239T, 16243C, 16294T, 16311C, 16519C | 73G, 146C, 152C, 195C, 247A |
| L0d2a | 5a | 1 | 16093C, 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5b | 1 | 16129A, 16145A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C, 16524G | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5c | 1 | 16129A, 16154G, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5d | 1 | 16129A, 16179T, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5e | 4 | 16129A, 16187T, 16188G, 16189A, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5f | 1 | 16129A, 16187T, 16189C, 16212G, 16221T, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 195C, 199C, 247A |
|  | 5g | 2 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16320T, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5h | 1 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 185A, 189G, 236C, 247A, 263G, 324G, 348G |
|  | 5i | 1 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 227G, 247A |
|  | 5j | 11 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5k | 1 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 152C, 199C, 247A, 310C, 311T |
|  | 5l | 1 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 146C, 153G, 199C, 247A |
|  | 5m | 1 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 150T, 185A, 189G, 263G |
|  | 5n | 1 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 150T, 195C, 263G |
|  | 5o | 1 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C | 73G, 93G, 146C, 152C, 195C, 236C, 247A, 263G |
|  | 5p | 1 | 16129A, 16187T, 16189C, 16212G, 16223T, 16230G, 16243C, 16311C, 16390A, 16519C, 16549G | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5q | 1 | 16129A, 16187T, 16189C, 16223T, 16230G, 16239T, 16243C, 16294T, 16311C, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 5r | 1 | 16129A, 16187T, 16189C, 16223T, 16230G, 16239T, 16243C, 16294T, 16311C, 16519C | 73G, 146C, 152C, 195C, 247A |
|  | 5s | 1 | 16129A, 16187T, 16189C, 16223T, 16230G, 16239T, 16243C, 16294T, 16311C, 16519C | 73G, 150T, 195C, 263G |
|  | 5t | 1 | 16129A, 16187T, 16189C, 16223T, 16239T, 16243C, 16294T, 16311C, 16519C | 73G, 146C, 152C, 195C, 247A |
| L0d2b | 6a | 2 | 16069T, 16126C, 16129A, 16169T, 16182C, 16183C, 16189C, 16212G, 16223T, 16230G, 16243C, 16258C, 16291T, 16311C, 16519C | 73G, 146C, 195C, 247A, 265C |
| L0d2c | 7a | 1 | 16129A, 16187T, 16189C, 16223T, 16230G, 16243C, 16311C, 16519C | 73G, 146C, 195C, 247A, 294A |
| L0d3 | 8a | 1 | 16187T, 16189C, 16223T, 16230G, 16243C, 16256T, 16274A, 16278T, 16290T, 16300G, 16311C, 16362C, 16519C | 73G, 146C, 150T, 195C, 247A |
|  | 8b | 1 | 16187T, 16189C, 16223T, 16230G, 16243C, 16256T, 16274A, 16278T, 16290T, 16300G, 16311C, 16519C | 73G, 146C, 150T, 195C, 247A, 316A |
| L0f1 | 9a | 1 | 16129A, 16169T, 16172C, 16174T, 16182C, 16183C, 16189C, 16223T, 16230G, 16278T, 16311C, 16327T, 16368C, 16519C | 93G, 151T, 152C, 189G, 207A, 247A, 263G |
| L1c1d | 10a | 1 | 16038G, 16187T, 16189C, 16223T, 16278T, 16293G, 16294T, 16311C, 16360T, 16519C | 73G, 151T, 152C, 182T, 186A, 189C, 195C, 198T, 247A, 263G, 297G, 316A |
| L1c3 | 11a | 1 | 16129A, 16182C, 16183C, 16189C, 16215G, 16223T, 16278T, 16294T, 16311C, 16360T, 16519C | 73G, 152C, 182T, 186A, 189C, 247A, 263G, 316A |
|  | 11b | 1 | 16129A, 16183C, 16189C, 16209C, 16215G, 16223T, 16278T, 16294T, 16311C, 16360T, 16519C | 73G, 150T, 189G, 263G |
| L2a1a | 12a | 1 | 16092C, 16223T, 16278T, 16286T, 16294T, 16309G, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 12b | 3 | 16092C, 16223T, 16278T, 16286T, 16294T, 16309G, 16390A, 16519C | 73G, 146C, 152C, 195C, 263G |
|  | 12c | 1 | 16092C, 16223T, 16278T, 16286T, 16294T, 16309G, 16390A, 16519C | 73G, 150T, 189G, 200G, 263G |
|  | 12d | 1 | 16223T, 16278T, 16286T, 16294T, 16309G, 16390A, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 12e | 1 | 16223T, 16278T, 16286T, 16294T, 16309G, 16390A, 16519C | 73G, 146C, 152C, 195C, 263G |
| L2a1b | 13a | 1 | 16051G, 16182C, 16183C, 16189C, 16192T, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 146C, 152C, 195C, 263G |
|  | 13b | 1 | 16051G, 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 146C, 152C, 195C, 263G |
|  | 13c | 1 | 16182C, 16183C, 16189C, 16194C, 16195A, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 146C, 152C, 195C, 263G |
|  | 13d | 1 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16380T, 16390A | 73G, 146C, 150T, 152C, 182T, 183G, 195C, 198T, 263G, 325T |
|  | 13e | 1 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16380T, 16390A | 73G, 146C, 152C, 195C, 263G |
|  | 13f | 4 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 146C, 152C, 195C, 263G |
|  | 13g | 1 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 150T, 152C, 185A, 189G, 263G |
|  | 13h | 1 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 150T, 185A, 189G, 195C, 263G |
|  | 13i | 1 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 150T, 185A, 189G, 263G |
|  | 13j | 1 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 93G, 146C, 152C, 195C, 236C, 247A, 263G |
|  | 13j | 1 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A | 73G, 93G, 146C, 152C, 195C, 263G |
|  | 13k | 1 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16390A | 73G, 146C, 152C, 195C, 263G |
|  | 13l | 1 | 16189C, 16223T, 16278T, 16294T, 16309G, 16390A, 16519C | 73G, 146C, 152C, 195C, 263G |
|  | 13m | 1 | 16189C, 16223T, 16278T, 16294T, 16309G, 16390A, 16519C | 93G, 152C, 189G, 204C, 207A, 236C, 247A, 263G |
| L2c2 | 14a | 1 | 16223T, 16264T, 16265G, 16278T, 16311C, 16390A | 73G, 146C, 150T, 152C, 182T, 183G, 195C, 198T, 263G, 325T |
|  | 14b | 1 | 16223T, 16264T, 16265G, 16278T, 16311C, 16390A | 73G, 150T, 195C, 263G |
|  | 14c | 1 | 16223T, 16264T, 16265G, 16278T, 16311C, 16390A | 73G, 93G, 146C, 150T, 152C, 182T, 183G, 195C, 198T, 263G, 325T |
|  | 14d | 1 | 16223T, 16264T, 16265G, 16278T, 16311C, 16390A | 73G, 93G, 146C, 150T, 152C, 182T, 195C, 198T, 263G, 325T |
|  | 14e | 1 | 16223T, 16264T, 16278T, 16311C, 16390A | 73G, 93G, 146C, 150T, 152C, 182T, 183G, 195C, 198T, 263G, 325T |
|  | 14f | 1 | 16223T, 16264T, 16278T, 16390A | 73G, 93G, 146C, 150T, 152C, 182T, 195C, 198T, 263G, 325T |
| L2d1 | 15a | 4 | 16093C, 16223T, 16278T, 16294T, 16311C, 16390A, 16399G, 16519C | 73G, 143A, 146C, 152C, 182T, 195C, 263G |
|  | 15b | 1 | 16093C, 16223T, 16278T, 16294T, 16311C, 16390A, 16399G, 16519C | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 15c | 1 | 16093C, 16223T, 16278T, 16294T, 16311C, 16390A, 16399G, 16519C | 73G, 146C, 152C, 195C, 247A |
|  | 15d | 1 | 16093C, 16223T, 16278T, 16294T, 16311C, 16390A, 16399G, 16519C | 73G, 146C, 152C, 195C, 263G |
| L3d1a | 16a | 1 | 16124C, 16223T, 16319A | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 16b | 2 | 16124C, 16223T, 16319A | 73G, 146C, 152C, 195C, 263G |
|  | 16c | 1 | 16124C, 16223T, 16319A | 73G, 146C, 195C, 247A, 294A |
|  | 16d | 7 | 16124C, 16223T, 16319A | 73G, 150T, 152C, 263G |
|  | 16e | 1 | 16124C, 16223T, 16319A | 73G, 150T, 185A, 189G, 263G |
|  | 16f | 1 | 16124C, 16223T, 16319A | 93G, 95C, 185A, 189G, 236C, 247A, 263G |
| L3d3 | 17a | 1 | 16124C, 16183C, 16189C, 16223T, 16278T, 16304C, 16311C | 73G, 152C, 195C, 263G |
|  | 17b | 1 | 16124C, 16183C, 16189C, 16223T, 16278T, 16304C, 16311C, 16430C | 73G, 152C, 195C, 263G |
|  | 17c | 1 | 16124C, 16183C, 16189C, 16223T, 16278T, 16304C, 16311C, 16430C | 93G, 146C, 150T, 185A, 189G, 195C, 204C, 207A, 236C, 247A, 263G |
| L3e1a1 | 18a | 2 | 16185T, 16223T, 16265G, 16311C, 16327T, 16519C | 73G, 150T, 189G, 200G, 263G |
|  | 18b | 1 | 16185T, 16223T, 16311C, 16327T | 73G, 150T, 185A, 189G, 200G, 263G |
| L3e1b | 18c | 1 | 16185T, 16223T, 16311C, 16327T, 16519C | 73G, 146C, 152C, 195C, 263G |
|  | 18d | 2 | 16185T, 16223T, 16311C, 16327T, 16519C | 73G, 150T, 185A, 189G, 263G |
|  | 18e | 1 | 16185T, 16223T, 16311C, 16327T, 16519C | 73G, 150T, 189G, 200G, 263G |
|  | 18f | 1 | 16185T, 16223T, 16311C, 16327T, 16519C | 73G, 150T, 189G, 263G |
|  | 19a | 1 | 16223T, 16239T, 16325delT | 73G, 146C, 152C, 195C, 198T, 247A |
|  | 19b | 1 | 16223T, 16239T, 16325delT | 73G, 150T, 152C, 185A, 189G, 263G |
|  | 19c | 2 | 16223T, 16239T, 16325delT | 73G, 150T, 185A, 189G, 195C, 263G |
| L3e2b | 19d | 5 | 16223T, 16239T, 16325delT | 73G, 150T, 185A, 189G, 263G |
|  | 19e | 1 | 16223T, 16239T, 16325delT | 73G, 150T, 195C, 263G |
|  | 19f | 2 | 16223T, 16239T, 16325delT, 16519C | 73G, 150T, 185A, 189G, 263G |
|  | 19g | 3 | 16223T, 16325delT, 16327T | 73G, 150T, 185A, 189G, 263G |
|  | 20a | 1 | 16070C, 16172C, 16183C, 16189C, 16223T, 16320T, 16519C | 73G, 150T, 195C, 263G |
|  | 20b | 1 | 16172C, 16182C, 16183C, 16189C, 16223T, 16320T, 16519C | 73G, 150T, 195C, 263G |
|  | 20c | 1 | 16172C, 16183C, 16189C, 16223T, 16320T, 16519C | 73G, 146C, 152C, 195C, 244G, 263G, 340T |
|  | 20d | 12 | 16172C, 16183C, 16189C, 16223T, 16320T, 16519C | 73G, 150T, 152C, 263G |
|  | 20e | 1 | 16172C, 16183C, 16189C, 16223T, 16320T, 16519C | 73G, 152C, 182T, 186A, 189C, 247A, 263G, 316A |
|  | 20f | 1 | 16172C, 16183C, 16189C, 16223T, 16320T, 16519C | 73G, 95C, 189G, 198T, 247A, 263G |
| L3e3 | 21a | 1 | 16223T, 16265T, 16519C | 73G, 150T, 195C, 263G |
| L4b2 | 22a | 1 | 16051G, 16114T, 16189C, 16207G, 16223T, 16293T, 16311C, 16316G, 16355T, 16362C, 16399G, 16519C | 73G, 146C, 152C, 195C, 244G, 263G |
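Because each haplotype in Table S2 is a list of variant sites written as position plus base, two haplotypes can be compared mechanically. The Python sketch below is a hypothetical helper, not part of the original analysis. It assumes the notation used in the table (for example `16223T`, or `16325delT` for a deletion) and that each position appears at most once per haplotype; the reference sequence against which sites were scored is not stated in the table, and the comparison does not depend on it. It reports the positions at which two haplotypes differ, either because a site is variant in only one of them or because they carry different bases at the same position.

```python
import re

# A variant site: position, optional "del", then a base, e.g. "16223T" or "16325delT".
VARIANT_SITE = re.compile(r"^(\d+)(del)?([ACGT])$")


def parse_sites(sites: str) -> dict[int, str]:
    """Map each position in a comma-separated site list to its full notation.

    Assumes at most one variant per position within a haplotype.
    """
    parsed: dict[int, str] = {}
    for token in sites.split(","):
        token = token.strip()
        if not token:
            continue
        match = VARIANT_SITE.match(token)
        if match is None:
            raise ValueError(f"unrecognised variant notation: {token!r}")
        parsed[int(match.group(1))] = token
    return parsed


def differing_positions(first: str, second: str) -> list[int]:
    """Sorted positions at which the two haplotypes do not share the same variant."""
    a, b = parse_sites(first), parse_sites(second)
    return sorted(pos for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos))
```

For example, the HVR2 sites of haplotypes 1a and 1b differ only at positions 247 and 263, and comparing the HVR1 sites of 5e and 5j returns positions 16188 and 16189: 16188G is present only in 5e, while position 16189 carries A in 5e and C in 5j.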
Appendix E: Comparative data sources
Table S3
Comparative data from in-house (HGDDRU) projects
Table S3 continued
Appendix F: Preparation of Solutions
70% Ethanol Solution
96% Ethanol Solution 729ml
*Make up to 1000ml with ddH2O
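The 70% ethanol recipe, like the buffer dilutions that follow, applies the dilution relation C1V1 = C2V2. Below is a minimal Python sketch of that arithmetic with hypothetical function names. It treats the 96% stock as exactly 96% (v/v) and ignores the slight volume contraction that occurs when ethanol and water are mixed.

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2.

    Both concentrations must be in the same units; the result has the units of final_volume.
    """
    if final_conc > stock_conc:
        raise ValueError("a dilution cannot exceed the stock concentration")
    return final_conc * final_volume / stock_conc


def final_concentration(stock_conc: float, stock_volume_used: float, final_volume: float) -> float:
    """Concentration obtained by diluting stock_volume_used of stock to final_volume."""
    return stock_conc * stock_volume_used / final_volume
```

With these helpers, `stock_volume(96, 70, 1000)` gives about 729.2 ml of 96% ethanol per litre of 70% ethanol. Applied to the 1 X TE recipe, `final_concentration(1000, 10, 1000)` gives 10 mM Tris-HCl and `final_concentration(500, 2, 1000)` gives 1 mM EDTA.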
0.5M Ethylenediaminetetraacetic Acid (EDTA)
Na2EDTA.2H2O 93.05g
dH2O 300ml
Final volume (dH2O) 500ml
* pH adjusted to 8.0 with 10M NaOH and then autoclaved
1 X TE buffer
10 ml 1 M Tris-HCl pH8
2 ml 0.5 M EDTA
Make up to 1000ml with dH2O and autoclave
10 X TBE buffer
108 g Tris
55 g Boric acid
7.44 g EDTA
Make up to 1000ml with dH2O and autoclave
1 X TBE (1:10 dilution)
20 ml 10 X TBE
Make up to 200ml with ddH2O
10M NaOH
NaOH pellets 4g
dH2O 10ml
2% Agarose Gel
Agarose 0.5g
1 X TBE 50ml
EtBr 0.5µl
Ficoll loading dye
Sucrose 50%
EDTA 50mM
Bromophenol blue 0.10%
Ficoll 10%
1 M Tris-HCl
121.1 g Tris
1 L dH2O
Autoclave
1 M MgCl2
101.66 g MgCl2.6H2O
500 ml dH2O
Autoclave
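The masses in the stock-solution recipes above follow from mass = molarity × volume × molar mass. The Python sketch below, with a hypothetical helper name, reproduces them from standard molar masses. It assumes the EDTA is weighed as the disodium dihydrate (as the recipe states) and that the magnesium chloride is the hexahydrate: 101.66 g per 500 ml at 1 M matches MgCl2.6H2O (about 203.3 g/mol), not anhydrous MgCl2 (about 95.2 g/mol).

```python
# Approximate molar masses (g/mol) of the forms assumed to be weighed out.
MOLAR_MASS = {
    "Na2EDTA.2H2O": 372.24,
    "NaOH": 40.00,
    "Tris": 121.14,
    "MgCl2.6H2O": 203.30,
}


def grams_needed(compound: str, molarity: float, volume_ml: float) -> float:
    """Mass (g) = molarity (mol/L) x volume (L) x molar mass (g/mol)."""
    return molarity * (volume_ml / 1000.0) * MOLAR_MASS[compound]
```

For example, `grams_needed("Na2EDTA.2H2O", 0.5, 500)` gives about 93.06 g, `grams_needed("NaOH", 10, 10)` gives 4.0 g, `grams_needed("Tris", 1, 1000)` gives about 121.14 g and `grams_needed("MgCl2.6H2O", 1, 500)` gives about 101.65 g. These agree with the listed amounts to within rounding of the molar masses.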
Proteinase K (10 mg/ml)
100 mg Proteinase K stock (100 mg/ml)*
10 ml ddH2O
*Available from Roche Diagnostics
Proteinase-K mix
For 16 extractions:
400µl 10% SDS
16µl 0.5 M EDTA
2.8 ml autoclaved dH2O
Add 800 µl Proteinase K (10 mg/ml stock) just before use
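The Proteinase-K mix is specified for 16 extractions. Per extraction, this works out to 25 µl 10% SDS, 1 µl 0.5 M EDTA, 175 µl water and 50 µl Proteinase K (10 mg/ml), or 251 µl in total. The Python sketch below uses hypothetical helper names. It scales the recipe linearly to another number of extractions, which is an assumption of the sketch, and computes the working concentrations implied by the listed volumes: roughly 1% SDS, 2 mM EDTA and 2 mg/ml Proteinase K.

```python
# Component volumes (µl) taken from the 16-extraction Proteinase-K mix.
BASE_EXTRACTIONS = 16
BASE_RECIPE_UL = {
    "10% SDS": 400.0,
    "0.5 M EDTA": 16.0,
    "autoclaved dH2O": 2800.0,
    "Proteinase K (10 mg/ml)": 800.0,
}


def scale_mix(n_extractions: int) -> dict[str, float]:
    """Scale every component linearly from the 16-extraction recipe."""
    if n_extractions < 1:
        raise ValueError("need at least one extraction")
    factor = n_extractions / BASE_EXTRACTIONS
    return {name: volume * factor for name, volume in BASE_RECIPE_UL.items()}


def working_concentrations() -> dict[str, float]:
    """Concentrations in the complete mix: SDS (%), EDTA (mM), Proteinase K (mg/ml)."""
    total = sum(BASE_RECIPE_UL.values())  # 4016 µl for 16 extractions
    return {
        "SDS (%)": 10.0 * BASE_RECIPE_UL["10% SDS"] / total,
        "EDTA (mM)": 500.0 * BASE_RECIPE_UL["0.5 M EDTA"] / total,
        "Proteinase K (mg/ml)": 10.0 * BASE_RECIPE_UL["Proteinase K (10 mg/ml)"] / total,
    }
```

For eight extractions, `scale_mix(8)` gives 200 µl 10% SDS, 8 µl 0.5 M EDTA, 1400 µl water and 400 µl Proteinase K. As in the original recipe, the enzyme would still be added just before use.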
Saturated NaCl
100 ml autoclaved dH2O
Slowly add 40 g NaCl until absolutely saturated (some NaCl will precipitate out)
Before use, agitate and let NaCl precipitate out
Bromophenol blue Ficoll dye
50 ml dH2O
50 g sucrose
1.86 g EDTA
0.1 g bromophenol blue
10 g Ficoll
Dissolve
Adjust volume to 100 ml with dH2O, stir overnight
Adjust pH to 8.0
Filter through Whatman filter paper
Store at room temperature
10 mg/ml Ethidium bromide (EtBr)
Add 1 g of ethidium bromide to
100 ml of ddH2O
Stir until completely dissolved
Store at 4°C wrapped in aluminum foil
1kb size standard
285 µl 1kb ladder (GibcoBRL)
143 µl Ficoll dye
2400 µl 1 X TE