"Unleash your creativity and unlock your potential with MsgBrains.Com - the innovative platform for nurturing your intellect." » » "Critical Perspectives on Ancient DNA" by Daniel Strand👁️‍🗨️


22.  Källén, Trouble with Ancient DNA.

23.  Brace et al., “Ancient Genomes Indicate Population Replacement,” 765–771.

24.  Charlotte Hedenstierna-Jonson et al., “A Female Viking Warrior Confirmed by Genomics,” American Journal of Physical Anthropology 164, no. 4 (2017): 853–860; Källén et al., “Archaeogenetics in Popular Media.”

25.  Richard Buckley et al., “ ‘The King in the Car Park’: New Light on the Death and Burial of Richard III in the Grey Friars Church, Leicester, in 1485,” Antiquity 336, no. 87 (2013): 519–538; Jo Appleby et al., “The Scoliosis of Richard III, last Plantagenet King of England: Diagnosis and Clinical Significance,” Lancet 383, no. 9932 (2014): 1944; Jo Appleby et al., “Perimortem Trauma in King Richard III: A Skeletal Analysis,” Lancet 385, no. 9964 (2015): 253–259.

26.  Angela L. Lamb et al., “Multi-Isotope Analysis Demonstrates Significant Lifestyle Changes in King Richard III,” Journal of Archaeological Science 50 (2014): 559–565.

27.  Turi Emma King et al., “Identification of the Remains of King Richard III,” Nature Communications 5, no. 5631 (2014): 1–8.

28.  King et al., “Identification of the Remains,” 4.

29.  Sirak and Sedig, “Balancing Analytical Goals,” 560–573; David Reich, “The Future of Ancient DNA,” in Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past (Oxford: Oxford University Press, 2018), 274–286.

30.  Noam Chomsky, For Reasons of State (New York: Pantheon, 1973), 402.

31.  See, for example, Callaway, “Divided by DNA”; Lewis-Kraus, “Is Ancient DNA Research Revealing New Truths?”; Källén et al., “Petrous Fever.”





2   Diagrams of Human Genetic Kinship and Diversity: From the Tree to the Mosaic and the Network?

Marianne Sommer and Ruth Amstutz

Genealogy is traditionally expressed in the shape of a family tree, and this also holds true for human evolution. Just like the tree of life, the human family tree visualizes the descent of hominin species.1 With the turn toward an evolutionary understanding of human origins and the discovery of fossil remains, such trees have been based on the comparative anatomy of living and fossil species. The tree shape has also been used to speculate about inner-human differentiation: the evolution and kinship of what used to be called “human races.”

In the second half of the nineteenth century, the influential evolutionary biologist Ernst Haeckel considered both physical characteristics and comparative linguistics to analogize the evolution and kinship of languages to trees of “racial descent.” Thus, a structure that lent itself to depicting both the genealogy of individuals and the descent of species came to be applied to groups of humans. In these “racial trees,” long, independent branches led to the different “modern human races,” suggesting that such “races” had evolved in isolation and stayed “pure” over long periods of time. Indeed, Haeckel and others thought that the differences between the “human races” were at least equivalent to the differences between species. The tree was a picture adequate to capturing their understanding of intrahuman as well as hominin evolution.2

By the 1960s, the time was ripe for a new way of drawing phylogenetic trees. Computer technologies and genetic data allowed the Italian population geneticist Luigi Luca Cavalli-Sforza and the British statistician and geneticist A. W. F. Edwards to publish what they called the first tree of human populations. This was before it was possible to sequence DNA. Cavalli-Sforza and Edwards analyzed twenty alleles3 from the five main blood group systems of fifteen human populations. The analysis resulted in a phylogenetic tree that separated a wild mix of population labels—from “English” to “Eskimo (Victoria 1)”—into different branches.4 While some of the labels had a legacy in the history of racial and colonial anthropology, the emergent study of human population genetics was very different from traditional racial typology. It was genetic variation that was of central interest, and the notion of pure races and race in general was often emphatically rejected.

Figure 2.1 represents a population tree, also produced by Cavalli-Sforza and colleagues, but based on an analysis that, compared to the study mentioned above, used far more alleles sampled across many so-called aboriginal populations. For the original tree, archaeological dates for steps in human expansion were used to calibrate genetic differentiation as well as to check for constant rates of genetic evolution.5 Like Haeckel, Cavalli-Sforza saw in the evolution of languages a powerful tool to support genetic trees. In its original form, figure 2.1 was seen as indicating parallel linguistic and genetic evolution.

Figure 2.1

“Average linkage tree for 42 populations.” Used with permission of Princeton University Press, from Luigi L. Cavalli-Sforza, Paolo Menozzi, and Alberto Piazza, The History and Geography of Human Genes (Princeton, NJ: Princeton University Press, 1994), 78. Permission conveyed through Copyright Clearance Center Inc.
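How an average linkage tree of the kind shown in figure 2.1 is assembled can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the procedure of Cavalli-Sforza and colleagues: the four populations and their allele frequencies are invented, and the mean squared difference in allele frequency merely stands in for whatever genetic distance a real study would use.

```python
from itertools import combinations

# Hypothetical allele frequencies (invented for illustration, not real data):
# one frequency per marker for each toy population.
FREQS = {
    "A": [0.10, 0.80, 0.30],
    "B": [0.15, 0.75, 0.35],
    "C": [0.60, 0.20, 0.70],
    "D": [0.65, 0.25, 0.90],
}

def distance(p, q):
    """Mean squared difference in allele frequency (a stand-in distance)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def average_linkage_tree(freqs):
    """Repeatedly merge the two clusters with the smallest average distance."""
    # Each cluster is (tree, members); a tree is a name or a pair of trees.
    clusters = [(name, [name]) for name in freqs]
    while len(clusters) > 1:
        def avg(pair):
            (_, m1), (_, m2) = pair
            total = sum(distance(freqs[x], freqs[y]) for x in m1 for y in m2)
            return total / (len(m1) * len(m2))
        c1, c2 = min(combinations(clusters, 2), key=avg)
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

tree = average_linkage_tree(FREQS)
print(tree)
```

Whatever the input, a procedure of this kind returns a strictly branching hierarchy. It has no means of recording contact between, say, A and C after their branches have separated, which is the limitation discussed in the following paragraph.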

Although such tree figures suggest that the human populations have a common origin, the human groups appear to have separated at one point (more or less) back in time and to never have mixed again. Such trees thus make it seem as if human populations were discrete, homogeneous entities that had independent evolutionary histories after the last population split. Indeed, Cavalli-Sforza and colleagues used only data referring to populations that were supposedly “aboriginal, with little or no admixture.”6 Despite the rejection of many of the notions of racial anthropology, most importantly its typological understanding of race, the tree image that supported this understanding was thus carried over into human population genetics. The early human population geneticists were, however, not unaware of the issues with population trees. As early as the 1970s, Cavalli-Sforza suggested that trees might work only for populations that are geographically far apart, because otherwise, “instead of a ‘tree’ one may have to estimate a ‘network’; such methods do not yet exist.”7

Toward the end of Cavalli-Sforza’s career, at the beginning of the twenty-first century, new theoretical, statistical, and computational approaches could be brought to bear on the organization and interpretation of an unprecedented amount of human genomic data. It was now possible to analyze and visualize the degree to which present-day individuals and populations are the result of admixture between human groups. With the introduction of statistical software for this purpose, the visual black box of these seemingly discrete and homogeneous entities, human populations, was opened. Individuals and populations now often came to be represented as colored bar plots indicating their admixed histories. Accordingly, extant human genomes at the individual and population level were now conceived of as a mosaic: that is to say, they comprised genetic elements from ancestral populations of different geographical origins.8

The era of population genomics also witnessed the possibility of extracting DNA from the fossils of hominins and ancient humans, and integrating these ancient DNA data into the analysis of evolutionary history and kinship. The advancing field of aDNA research relied on population genomics, from which it adopted terminologies, methodologies, and visualization techniques. At the same time, by bringing in a new deep-historical structure, the inclusion of aDNA into population-genomic models and visualizations shifted the focus more strongly toward processes of ancestral admixture, even between archaic humans, such as the Neanderthals, and modern humans.

With the advent of aDNA, the understanding of human history and diversity thus seems to have changed considerably. To find out how this shift is reflected (or not) in the images used in the field, we follow the representation of human history and diversity through the admixture paradigm in human population genomics and the shift toward the inclusion of aDNA data. Our focus thereby rests on prominent models and tools, on the meaning that representations seem to carry regarding human diversity, and on how this meaning fits the assumptions of practitioners. We build on the observation that the hierarchically organized tree, which suggests independent (nonadmixing) histories of discrete populations, continues to lurk underneath the representation of individual and populational genetic kinship and diversity in terms of admixture and as mosaics.9 We argue that although there was a paradigm shift toward the idea of a mosaic structure for human populations, the tree image was not dissolved by the new (and aDNA-driven) models that emphasize admixture, introgression (the transfer of genetic material from one population to another), and gene flow.

From the Tree to the Mosaic

At the beginning of the twenty-first century, a new standard for the analysis and representation of population genetic variation emerged through novel model-based clustering software that was typically developed by mathematicians together with statistically and computationally trained geneticists. The first of these clustering programs, STRUCTURE, was released in 2000. In the following years, it became one of the major tools to estimate ancestry from genome-wide human data, and the same holds true for the diagrammatic representation of the results of such analyses.10 The diagram commonly used to represent the results from STRUCTURE was first used for the visualization of a genome-wide analysis of the Human Genome Diversity Panel in 2002 (see figure 2.3).11 A software package that generated bar plots from STRUCTURE analyses, called DISTRUCT, was published two years later. Today, other programs are mainly in use, and these include follow-up programs of DISTRUCT as well as clustering software, such as ADMIXTURE, first published in 2009. The functions and the standard graphical representation of STRUCTURE and ADMIXTURE analyses are only marginally different. The main advantage of ADMIXTURE over STRUCTURE is that the program is considerably faster, allowing the processing of a much larger set of markers.12

With STRUCTURE, it was for the first time possible to compute genetic admixture by analyzing genome-wide data. At a very general level, the term “admixture” in this context means two things. First, it denotes a historical process: the mixing of at least two distinct populations through migration and reproduction. Second, it refers to a state of relatedness: the genetic makeup of individuals and populations in terms of so-called ancestry coefficients13 and population structure.14 These two levels of meaning are commonly understood to be causally related: admixture as a process is thought to result in admixture as a state. However, as we show below, the relationship between these two meanings of admixture is far more complex, and the significance of the term varies depending on the context of use. Whereas the inventors of STRUCTURE particularly emphasized the utility of their program for population genetics, in which the historical processes of admixture are of crucial interest, the developers of ADMIXTURE advertised their program as a tool for medical genetics, where the interest in admixture is typically limited to present genetic structures.15 However, ADMIXTURE is used beyond this context of application.

STRUCTURE and ADMIXTURE are based on the assumption that human genetic diversity is substantially shaped by admixture. At the same time, geneticists usually understand certain populations to be “more admixed” than others. As data scientist Daniel Lawson and geneticists Lucy van Dorp and Daniel Falush—the latter being one of the coauthors of the second version of STRUCTURE—have pointed out, the admixture model on which STRUCTURE and ADMIXTURE are based was derived from a very specific case of such “recent admixture”: that of African Americans.16 It relied on the assumption that every African American individual has ancestry from two genetically distinct “sources”—West Africa and Europe—and that before their abduction to, or settlement of, America, both groups had minimal contact. Therefore, the history of African Americans is divided into two phases: a phase of thousands of years of independent evolution and a phase of admixture in the past few hundred years. In other words, most of the ancestors of contemporary African American individuals who lived 500 or more years ago are taken to have been either Africans or Europeans.

By comparing multiple sequences of individual whole-genome samples, STRUCTURE and ADMIXTURE identify subgroups in which certain gene variants (alleles) occur at different frequencies. In this process, samples are grouped into a number of clusters (K), which is chosen in advance but can be varied across independent runs of the algorithm. Figure 2.2 shows a bar plot as it is typically generated in the course of STRUCTURE and ADMIXTURE analyses. Here, ADMIXTURE is set to K = 4, which means that the membership coefficients of the individual samples are calculated in relation to four different clusters. The clusters are represented by different colors. From left to right, we see a row of brightly colored vertical lines that represent the individual samples. The arrangement of these lines is based on the populations from which the samples were taken. Below the graph, the standardized abbreviations of these sample populations are given. In the present example, these are populations that were defined in the framework of the International Haplotype Map Project. ASW stands for “African Ancestry in South West USA,” CEU for “Utah residents with northern and western European ancestry,” MEX presumably for “Mexican Ancestry in Los Angeles” (the standard abbreviation for this population is MXL), and YRI for “Yoruba in Ibadan, Nigeria.” From top to bottom, the estimated membership coefficients of the individuals in the four clusters are represented by the corresponding lengths of sections in the respective colors of the clusters. What these “membership coefficients” refer to is specified to the left of the graph. A scale from 0 to 1 titled “Ancestry” clarifies that the analysis is intended to provide information about the ancestry composition of the sampled individuals and populations. The authors of STRUCTURE and ADMIXTURE thus assume that the clusters inferred by means of those programs correspond to what they call “source populations”17 or “ancestral populations,”18 and that individual genomes can be understood as composites of these “sources.”
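The idea that a genome is a composite of K “sources” can be made concrete with a minimal sketch of the kind of likelihood such programs work with. It assumes biallelic markers that are treated as independent, and it omits constant terms. The names (`expected_freq`, `log_likelihood`, `Q`, `F`) and the toy numbers are chosen here for illustration and are not taken from either program.

```python
import math

def expected_freq(q_i, f_j):
    """Allele frequency the model expects for one individual at one marker.

    q_i: the individual's membership fractions in the K clusters (sum to 1).
    f_j: the frequency of the counted allele at this marker in each cluster.
    """
    return sum(q * f for q, f in zip(q_i, f_j))

def log_likelihood(genotypes, Q, F):
    """Binomial log-likelihood of allele counts (0, 1, or 2) given Q and F.

    genotypes[i][j]: copies of the counted allele in individual i at marker j.
    Q[i][k]: membership fraction of individual i in cluster k.
    F[j][k]: frequency of the counted allele at marker j in cluster k.
    Constant binomial coefficients are omitted.
    """
    total = 0.0
    for g_i, q_i in zip(genotypes, Q):
        for g, f_j in zip(g_i, F):
            p = expected_freq(q_i, f_j)
            total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

# Toy example with K = 2 clusters and two markers (values invented).
F = [[0.9, 0.1], [0.2, 0.8]]
Q = [[0.5, 0.5]]          # one individual, half of its genome from each cluster
genotypes = [[1, 1]]      # heterozygous at both markers
print(log_likelihood(genotypes, Q, F))
```

In this formulation the membership fractions enter only through a weighted average of cluster frequencies, so the “ancestry” reported on the bar plot is whatever weighting makes that average fit the observed genotypes best.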

Figure 2.2

ADMIXTURE bar plot. David H. Alexander et al., Admixture 1.3. Software Manual (2020), https://dalexander.github.io/admixture/admixture-manual.pdf.

Two years after STRUCTURE was released, a group of researchers around population geneticist Noah Rosenberg—among them Jonathan Pritchard, one of the coauthors of STRUCTURE—used the program to analyze global genome-wide data from the Human Genome Diversity Panel.19 The resulting paper provides an example of how STRUCTURE derives population structure from individual ancestry coefficients. In the paper, the authors aimed to show that individuals from different present-day cultural and geographic groups—from ethnoreligious and ethnic groups (such as “Druze,” “Han,” or “Uyghur”) and nationalities (such as “Russian” or “Italian”) to regional subgroups of these nations (such as “Sardinian” or “Tuscan”)—show different patterns of allele frequencies. Thereby, they wanted to prove that these “predefined populations” correspond to human populations in a genetic sense.20

The analysis with STRUCTURE generally supported this assumption. However, what Rosenberg and his coauthors particularly emphasized was that the diagram produced through STRUCTURE identified these populations as composites of “six genetic main clusters, five of which correspond to major geographic regions.”21

The legend for the diagram explains that “each individual is represented by a thin vertical line, which is partitioned into K colored segments,” with K representing the number of clusters. It further explains that these colored segments “represent the individual’s estimated membership fractions in K clusters.” The black lines separate individuals of different populations, and these populations are labeled below the figure, with their regional affiliations above it.

Figure 2.3

Representation of the first genome-wide analysis of global human genetic diversity with STRUCTURE. From Noah A. Rosenberg et al., “Genetic Structure of Human Populations,” Science 298, no. 5602 (2002): 2382. Reprinted with permission from AAAS.

This last point is significant, since the developers of STRUCTURE stressed that their program allows the clustering of samples without any “population information,” that is, without relying on any information about the sampled individuals’ affiliations to cultural and geographic groups.22 Nevertheless, after clustering the individual samples independent of population information, STRUCTURE is set to reintroduce this information in order to visually arrange and label the individual data according to their prior assignment to cultural and geographic groups. In the diagram, the vertical lines representing the ancestry of the individual samples become hardly distinguishable. Instead, those predefined cultural groups become the salient entity of analysis and emerge as horizontal bars with a characteristic pattern of colors divided by black lines. Since this kind of representation is used as the de facto standard for visualizing STRUCTURE results for any kind of inquiry into the structure of human genetic variation, this rearrangement and labeling seems to subvert one of the crucial innovations of STRUCTURE: the purely inductive method of analyzing human genetic diversity without the use of population information. Rather, this kind of visualization reinforces the idea that human genetic variation is essentially structured by genetically distinct populations that largely match notions of ethnicity, nation and religion.
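The two steps described here, label-free estimation followed by label-based display, can be separated in a small sketch. The membership fractions below are invented and stand in for the output of a clustering run with K = 2. The sample names, the group labels, and the text rendering are likewise hypothetical and do not reproduce the output format of STRUCTURE or DISTRUCT.

```python
from collections import defaultdict

# Hypothetical output of a clustering run with K = 2 (values invented):
# sample id -> estimated membership fractions. No labels were needed for this.
membership = {
    "s1": (0.95, 0.05),
    "s2": (0.10, 0.90),
    "s3": (0.90, 0.10),
    "s4": (0.55, 0.45),
}

# Predefined group labels, supplied separately and used only for display.
labels = {"s1": "GroupX", "s2": "GroupY", "s3": "GroupX", "s4": "GroupY"}

def bar(fractions, width=20, symbols="#."):
    """Render one individual as a text bar split into K segments.

    symbols must supply one character per cluster.
    """
    cells = [round(f * width) for f in fractions[:-1]]
    cells.append(width - sum(cells))  # last segment takes the remainder
    return "".join(sym * n for sym, n in zip(symbols, cells))

def plot_by_label(membership, labels):
    """Group the individual bars under their predefined labels."""
    groups = defaultdict(list)
    for sample, fractions in membership.items():
        groups[labels[sample]].append(bar(fractions))
    lines = []
    for label in sorted(groups):
        lines.append(label)
        lines.extend("  " + b for b in groups[label])
    return lines

print("\n".join(plot_by_label(membership, labels)))
```

Nothing in `membership` depends on `labels`. The grouping enters only in `plot_by_label`, yet it is the grouped picture that the reader sees.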

Looking at a single STRUCTURE bar plot, there appears to be no hierarchical or temporal order. Rather, the rainbow-colored diagram represents an ahistorical snapshot of human population genetic diversity, in which a patchwork of genetically reified cultural groups of varying scale come to stand together on the same plane. But this is not how STRUCTURE analyses are usually conducted. Typically, such analyses proceed according to the principle of trial and error. First, one assumes that there are two clusters and lets the program compute how much of the ancestry of every individual “comes” from each cluster. Then one proceeds with three, four, five clusters and so on—until the optimal number of clusters is identified. Determining this “optimal number” of clusters consists in assessing how well the results of the different analyses fit the data. The inventors of ADMIXTURE, for instance, suggested that the choice of K—the number of clusters—should be guided by knowledge about the history of the predefined groups that are being analyzed.23 This strategy might be adequate in cases where ADMIXTURE is used for its originally stated purpose in medical genetics, that is, to mine for genetic variants associated with disease. Historical interpretations of ADMIXTURE analyses, however, are ultimately turned into circular arguments by the strategy of determining K based on historical knowledge.
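The trial-and-error procedure can be sketched as a loop over candidate values of K. The sketch below is a toy under stated assumptions: the genotype matrix `G` is invented, the function names are hypothetical, and a plain expectation-maximization loop is used for brevity without any claim that it is the optimizer either program uses.

```python
import math
import random

def loglik(G, Q, F):
    """Log-likelihood of allele counts G under fractions Q and frequencies F."""
    total = 0.0
    for g_i, q_i in zip(G, Q):
        for g, f_j in zip(g_i, F):
            p = sum(q * f for q, f in zip(q_i, f_j))
            total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

def fit(G, K, iters=200, seed=0):
    """Fit the K-cluster model by EM; return (Q, F, log-likelihood)."""
    rng = random.Random(seed)
    n, m, eps = len(G), len(G[0]), 1e-6
    Q = [[rng.random() + 0.1 for _ in range(K)] for _ in range(n)]
    Q = [[x / sum(row) for x in row] for row in Q]
    F = [[rng.uniform(0.2, 0.8) for _ in range(K)] for _ in range(m)]
    for _ in range(iters):
        q_sum = [[0.0] * K for _ in range(n)]
        num = [[0.0] * K for _ in range(m)]
        den = [[0.0] * K for _ in range(m)]
        for i in range(n):
            for j in range(m):
                g = G[i][j]
                p = sum(Q[i][k] * F[j][k] for k in range(K))
                for k in range(K):
                    # Expected allele copies attributed to cluster k.
                    a = g * Q[i][k] * F[j][k] / p
                    b = (2 - g) * Q[i][k] * (1 - F[j][k]) / (1 - p)
                    q_sum[i][k] += a + b
                    num[j][k] += a
                    den[j][k] += a + b
        Q = [[x / (2 * m) for x in row] for row in q_sum]
        # Keep frequencies strictly inside (0, 1) so the logarithms stay finite.
        F = [[min(1 - eps, max(eps, num[j][k] / den[j][k])) if den[j][k] > 0 else 0.5
              for k in range(K)] for j in range(m)]
    return Q, F, loglik(G, Q, F)

# Invented genotypes: two contrasting groups and one intermediate individual.
G = [[2, 2, 2, 0, 0, 0]] * 3 + [[0, 0, 0, 2, 2, 2]] * 3 + [[1, 1, 1, 1, 1, 1]]

scores = {K: fit(G, K)[2] for K in (1, 2, 3)}
for K, score in scores.items():
    print(K, round(score, 2))
```

A raw likelihood of this kind tends to improve as clusters are added, which is why it cannot settle the choice of K on its own. Some further criterion is needed, whether a statistical one such as cross-validation or the historical knowledge discussed in this paragraph.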

The way Rosenberg and colleagues interpreted their bar charts exemplifies the role that the choice of the number of clusters may have in terms of conceiving of those clusters as “ancestral populations” that actually existed at some point in the past. One interpretation in the paper was derived from a single bar plot of five clusters.24 The authors suggested that this was basically congruent with a continental pattern, since the colors of the clusters align with the geographical designations at the top of the diagram: Africa is mainly orange, Europe mainly blue, East Asia pink, Oceania green, and America purple, whereas the Middle East, Central and South Asia emerge as mixes of their geographical neighbors. This interpretation blurs statistical clusters with the idea of distinct and relatively homogeneous groups of a clear continental origin. It basically suggests that clusters can be understood in the sense of continental groups that align with common notions of “race.”

The second interpretation arose from all bar plots in combination. The authors observed that “at K = 2 the clusters were anchored by Africa and America, regions separated by a relatively large genetic distance.” They went on to state that “each increase in K split one of the clusters obtained with the previous value. At K = 5, clusters corresponded largely to major geographic regions.”25 Thus, by looking at several bar plots in succession, they arrived at a narrative of subsequent splitting along the lines of a human evolutionary history that is essentially shaped by the divergence of continental groups. These are the “ancestral populations” that—in line with the dominant model of admixture exemplified by the demographic history of African Americans—are supposed to have undergone a long period of divergence, followed by a relatively short period of convergence that created the current genetic mosaics.

Such simple interpretations of STRUCTURE bar plots seem to be common enough to have induced a group of data scientists and geneticists to publish a paper entitled “A Tutorial on How Not to Over-Interpret STRUCTURE and ADMIXTURE Bar Plots” in 2018.26 The authors simulated populations with specific patterns of allele frequencies, as they would theoretically form based on three historical scenarios, and then subjected these simulated populations to analysis with ADMIXTURE.27

The top row of figure 2.4 shows three different demographic scenarios involving four simulated populations represented by different colors (dark blue, black, dark green, and dark magenta). The lower part of figure 2.4 consists of the ADMIXTURE analyses for each of the respective simulated populations. The clusters to which the individuals of the simulated populations were thereby assigned are represented by lighter shades of the colors that are used to distinguish the populations in the upper part of the illustration (blue, light green, magenta).

Figure 2.4

“Three scenarios that give indistinguishable ADMIXTURE results.” Daniel J. Lawson, Lucy van Dorp, and Daniel Falush, “A Tutorial on How Not to Over-Interpret STRUCTURE and ADMIXTURE Bar Plots,” Nature Communications 9, no. 3258 (2018): 3.

This color scheme visually anticipates and emphasizes the controversial assumption that the simulated populations are mixtures of different pure ancestral populations.28 However, the illustration is actually intended to demonstrate that ADMIXTURE produces nearly indistinguishable bar plots for the different scenarios. The first scenario represents a history qualitatively similar to the one that is assumed for African Americans, but with three instead of two “ancestral populations.” In the second scenario, we see admixture between a known population (P1) and an unknown population giving rise to a mixed population (P2) as well as two populations (P3 and P4) not involved in any kind of mixing. The third scenario does not involve processes of admixture at all.

For the first scenario, an interpretation that assumes the African American admixture model, and therefore interprets the patterns as representing the proportions to which each genome was inherited from different “ancestral populations,” seems adequate. However, this interpretation is misleading for the second and third scenarios. The simulations show that the patterns of an ADMIXTURE analysis might reflect the length of time that each population has evolved independently from the others, rather than the proportion of ancestry resulting from admixture. As the authors of the paper point out, these distortions occur because the algorithm attempts to find the combination of clusters and admixture proportions in the data that best supports a simple admixture model, regardless of whether that model is historically accurate.29

As we will see, such difficulties are exacerbated with the use of aDNA in ADMIXTURE analyses. Suffice it to say here that groups that contain fewer samples are likely to be represented as mixes of populations with a greater sample size, rather than being assigned to their own “ancestral population.” Even if the aDNA sample is older than the separation date of the modern populations with which it is compared, the ancient sample is typically represented as an admixture of the modern populations.30
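The last point can be illustrated with a deliberately simplified sketch. It assumes that two clusters have already been fitted to larger modern samples, so that the only remaining freedom is the mixing fraction assigned to the older sample. All frequencies and names (`MODERN_A`, `MODERN_B`, `ANCIENT`, `best_fraction`) are invented for illustration, and the sketch does not reproduce any published analysis.

```python
import math

# Hypothetical allele frequencies at five markers (invented for illustration).
MODERN_A = [0.9, 0.8, 0.1, 0.2, 0.7]
MODERN_B = [0.1, 0.2, 0.9, 0.8, 0.3]
# A hypothetical older population whose frequencies lie between the two,
# with no mixing of A and B anywhere in its history.
ANCIENT = [0.5, 0.5, 0.5, 0.5, 0.5]

def expected_loglik(freqs, q):
    """Expected per-allele log-likelihood if a fraction q is assigned to A."""
    total = 0.0
    for p, a, b in zip(freqs, MODERN_A, MODERN_B):
        model = q * a + (1 - q) * b
        total += p * math.log(model) + (1 - p) * math.log(1 - model)
    return total

def best_fraction(freqs, steps=100):
    """Grid search for the fraction assigned to A that fits freqs best."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_loglik(freqs, q))

print(best_fraction(ANCIENT))
```

Under these assumptions the best-fitting fraction is one half, so the older population would be drawn as an even mixture of A and B although, by construction, it never received anything from either. The model registers only that its frequencies are intermediate, not how they came about.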

Enter Ancient DNA: Mosaic and Trees

Are sens