12. Källén et al., “Archaeogenetics in Popular Media,” 69–91.
13. Haapasaari, Kulmaka and Sakari, “Growing into Interdisciplinarity,” 1–12.
14. Martin Furholt, “Biodeterminism and Pseudo-Objectivity as Obstacles for the Emerging Field of Archaeogenetics,” Archaeological Dialogues 27, no. 1 (2020): 23–25.
15. Janet Stephenson et al., “The Practice of Interdisciplinarity,” International Journal of Interdisciplinary Social Science 5, no. 7 (2010): 271–282.
16. See the chapters in this volume by Magnus Fiskesjö (chapter 7), as well as by Marianne Sommer and Ruth Amstutz (chapter 2).
17. Anna Källén, The Trouble with Ancient DNA (Chicago: University of Chicago Press, 2024).
18. Alpaslan-Roodenberg et al., “Ethics of DNA Research”; Prendergast and Sawchuk, “Boots on the Ground”; Sirak and Sedig, “Balancing Analytical Goals.”
19. A haplotype can be defined as a series of genetic variants that are inherited together on the same chromosome. In this example, where women share the same mitochondrial haplotype, it is indicative of matrilineal relatedness.
20. Hans Eiberg et al., “Blue Eye Color in Humans May be Caused by a Perfectly Associated Founder Mutation in a Regulatory Element Located within the HERC2 Gene Inhibiting OCA2 Expression,” Human Genetics 123, no. 2 (2008): 177–187.
21. Selina Brace et al., “Ancient Genomes Indicate Population Replacement in Early Neolithic Britain,” Nature Ecology & Evolution 3, no. 5 (2019): 765–771.
22. Källén, Trouble with Ancient DNA.
23. Brace et al., “Ancient Genomes Indicate Population Replacement,” 765–771.
24. Charlotte Hedenstierna-Jonson et al., “A Female Viking Warrior Confirmed by Genomics,” American Journal of Physical Anthropology 164, no. 4 (2017): 853–860; Källén et al., “Archaeogenetics in Popular Media.”
25. Richard Buckley et al., “ ‘The King in the Car Park’: New Light on the Death and Burial of Richard III in the Grey Friars Church, Leicester, in 1485,” Antiquity 87, no. 336 (2013): 519–538; Jo Appleby et al., “The Scoliosis of Richard III, Last Plantagenet King of England: Diagnosis and Clinical Significance,” Lancet 383, no. 9932 (2014): 1944; Jo Appleby et al., “Perimortem Trauma in King Richard III: A Skeletal Analysis,” Lancet 385, no. 9964 (2015): 253–259.
26. Angela L. Lamb et al., “Multi-Isotope Analysis Demonstrates Significant Lifestyle Changes in King Richard III,” Journal of Archaeological Science 50 (2014): 559–565.
27. Turi Emma King et al., “Identification of the Remains of King Richard III,” Nature Communications 5, no. 5631 (2014): 1–8.
28. King et al., “Identification of the Remains,” 4.
29. Sirak and Sedig, “Balancing Analytical Goals,” 560–573; David Reich, “The Future of Ancient DNA,” in Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past (Oxford: Oxford University Press, 2018), 274–286.
30. Noam Chomsky, For Reasons of State (New York: Pantheon, 1973), 402.
31. See, for example, Callaway, “Divided by DNA”; Lewis-Kraus, “Is Ancient DNA Research Revealing New Truths?”; Källén et al., “Petrous Fever.”
2 Diagrams of Human Genetic Kinship and Diversity: From the Tree to the Mosaic and the Network?
Marianne Sommer and Ruth Amstutz
Genealogy is traditionally expressed in the shape of a family tree, and this also holds true for human evolution. Just like the tree of life, the human family tree visualizes the descent of hominin species.1 With the turn toward an evolutionary understanding of human origins and the discovery of fossil remains, such trees have been based on the comparative anatomy of living and fossil species. The tree shape has also been used to speculate about intrahuman differentiation: the evolution and kinship of what used to be called “human races.”
In the second half of the nineteenth century, the influential evolutionary biologist Ernst Haeckel considered both physical characteristics and comparative linguistics to analogize the evolution and kinship of languages to trees of “racial descent.” Thus, a structure that lent itself to depicting both the genealogy of individuals and the descent of species came to be applied to groups of humans. In these “racial trees,” long, independent branches led to the different “modern human races,” suggesting that such “races” had evolved in isolation and stayed “pure” over long periods of time. Indeed, Haeckel and others thought that the differences between the “human races” were at least equivalent to the differences between species. The tree was a picture adequate to capturing their understanding of intrahuman as well as hominin evolution.2
By the 1960s, the time was ripe for a new way of drawing phylogenetic trees. Computer technologies and genetic data allowed the Italian population geneticist Luigi Luca Cavalli-Sforza and the British statistician and geneticist A. W. F. Edwards to publish what they called the first tree of human populations. This was before it was possible to sequence DNA. Cavalli-Sforza and Edwards analyzed twenty alleles3 from the five main blood group systems of fifteen human populations. The analysis resulted in a phylogenetic tree that separated a wild mix of population labels—from “English” to “Eskimo (Victoria 1)”—into different branches.4 While some of the labels had a legacy in the history of racial and colonial anthropology, the emergent study of human population genetics was very different from traditional racial typology. It was genetic variation that was of central interest, and the notion of pure races and race in general was often emphatically rejected.
Figure 2.1 represents a population tree, also produced by Cavalli-Sforza and colleagues, but based on an analysis that, compared to the study mentioned above, used far more alleles sampled across many so-called aboriginal populations. For the original tree, archaeological dates for steps in human expansion were used to calibrate genetic differentiation as well as to check for constant rates of genetic evolution.5 Like Haeckel, Cavalli-Sforza saw in the evolution of languages a powerful tool to support genetic trees. In its original form, figure 2.1 was seen as indicating parallel linguistic and genetic evolution.
Figure 2.1
“Average linkage tree for 42 populations.” Used with permission of Princeton University Press, from Luigi L. Cavalli-Sforza, Paolo Menozzi, and Alberto Piazza, The History and Geography of Human Genes (Princeton, NJ: Princeton University Press, 1994), 78. Permission conveyed through Copyright Clearance Center Inc.
Although such tree figures suggest that human populations have a common origin, they also make the groups appear to have separated at some point in the past and never to have mixed again. Such trees thus make it seem as if human populations were discrete, homogeneous entities with independent evolutionary histories after the last population split. Indeed, Cavalli-Sforza and colleagues used only data referring to populations that were supposedly “aboriginal, with little or no admixture.”6 Despite the rejection of many of the notions of racial anthropology, most importantly its typological understanding of race, the tree image that supported this understanding was thus carried over into human population genetics. The early human population geneticists were, however, not unaware of the issues with population trees. As early as the 1970s, Cavalli-Sforza suggested that trees might work only for populations that are geographically far apart, because otherwise, “instead of a ‘tree’ one may have to estimate a ‘network’; such methods do not yet exist.”7
At the end of Cavalli-Sforza’s career at the beginning of the twenty-first century, new theoretical, statistical, and computational approaches could be brought to bear on the organization and interpretation of an unprecedented amount of human genomic data. It was now possible to analyze and visualize the degree to which present-day individuals and populations are the result of admixture between human groups. With the introduction of such statistical computer software, the visual black box of human populations, those seemingly discrete and homogeneous entities, was opened. Individuals and populations now often came to be represented as colored bar plots indicating their admixed histories. Accordingly, extant human genomes at the individual and population level were now conceived of as a mosaic: that is to say, they comprised genetic elements from ancestral populations of different geographical origins.8
The era of population genomics also witnessed the possibility of extracting DNA from the fossils of hominins and ancient humans, and integrating these ancient DNA data into the analysis of evolutionary history and kinship. The advancing field of aDNA research relied on population genomics, from which it adopted terminologies, methodologies, and visualization techniques. At the same time, by bringing in a new deep-historical structure, the inclusion of aDNA into population-genomic models and visualizations shifted the focus more strongly toward processes of ancestral admixture, even between archaic humans, such as the Neanderthals, and modern humans.
With the advent of aDNA, the understanding of human history and diversity thus seems to have changed considerably. To find out how this shift is reflected (or not) in the images used in the field, we follow the representation of human history and diversity through the admixture paradigm in human population genomics and the shift toward the inclusion of aDNA data. Our focus thereby rests on prominent models and tools, on the meaning that representations seem to carry regarding human diversity, and on how this meaning fits the assumptions of practitioners. We build on the observation that the hierarchically organized tree, which suggests independent (nonadmixing) histories of discrete populations, continues to lurk underneath the representation of individual and populational genetic kinship and diversity in terms of admixture and as mosaics.9 We argue that although there was a paradigm shift toward the idea of a mosaic structure for human populations, the tree image was not dissolved by the new, aDNA-driven models that emphasize admixture, introgression (the transfer of genetic material from one population to another), and gene flow.
From the Tree to the Mosaic
At the beginning of the twenty-first century, a new standard for the analysis and representation of population genetic variation emerged through novel model-based clustering software that was typically developed by mathematicians together with statistically and computationally trained geneticists. The first of these clustering programs, STRUCTURE, was released in 2000. In the following years, it became one of the major tools to estimate ancestry from genome-wide human data, and the same holds true for the diagrammatic representation of the results of such analyses.10 The diagram commonly used to represent the results from STRUCTURE was first used for the visualization of a genome-wide analysis of the Human Genome Diversity Panel in 2002 (see figure 2.3).11 A software package that generated bar plots from STRUCTURE analyses, called DISTRUCT, was published two years later. Today, other programs are mainly in use, and these include follow-up programs of DISTRUCT as well as clustering software, such as ADMIXTURE, first published in 2009. The functions and the standard graphical representation of STRUCTURE and ADMIXTURE analyses are only marginally different. The main advantage of ADMIXTURE over STRUCTURE is that the program is considerably faster, allowing the processing of a much larger set of markers.12
With STRUCTURE, it was for the first time possible to compute genetic admixture by analyzing genome-wide data. At a very general level, the term “admixture” in this context means two things. First, it denotes a historical process: the mixing of at least two distinct populations through migration and reproduction. Second, it refers to a state of relatedness: the genetic makeup of individuals and populations in terms of so-called ancestry coefficients13 and population structure.14 These two levels of meaning are commonly understood to be causally related: admixture as a process is thought to result in admixture as a state. However, as we show below, the relationship between these two meanings of admixture is far more complex, and the significance of the term varies depending on the context of use. Whereas the inventors of STRUCTURE particularly emphasized the utility of their program for population genetics, in which the historical processes of admixture are of crucial interest, the developers of ADMIXTURE advertised their program as a tool for medical genetics, where the interest in admixture is typically limited to present genetic structures.15 However, ADMIXTURE is used beyond this context of application.
STRUCTURE and ADMIXTURE are based on the assumption that human genetic diversity is substantially shaped by admixture. At the same time, geneticists usually understand certain populations to be “more admixed” than others. As data scientist Daniel Lawson and geneticists Lucy van Dorp and Daniel Falush—the latter being one of the coauthors of the second version of STRUCTURE—have pointed out, the admixture model on which STRUCTURE and ADMIXTURE are based was derived from a very specific case of such “recent admixture”: that of African Americans.16 It relied on the assumption that every African American individual has ancestry from two genetically distinct “sources”—West Africa and Europe—and that before their abduction to, or settlement of, America, both groups had minimal contact. Therefore, the history of African Americans is divided into two phases: a phase of thousands of years of independent evolution and a phase of admixture in the past few hundred years. In other words, most of the ancestors of contemporary African American individuals who lived 500 or more years ago are taken to have been either Africans or Europeans.
By comparing multiple different sequences of individual whole-genome samples, STRUCTURE and ADMIXTURE identify subgroups in which certain gene variants (alleles) occur at different frequencies. In this process, samples are grouped into several clusters (K), the number of which is chosen in advance but can be varied across independent runs of the algorithm. Figure 2.2 shows a bar plot as it is typically generated in the course of STRUCTURE and ADMIXTURE analyses. Here, ADMIXTURE is set to K = 4, which means that the membership coefficients of the individual samples are calculated in relation to four different clusters. The clusters are represented by different colors. From left to right, we see a line-up of brightly colored lines that represent the individual samples. The arrangement of these lines is based on the populations from which the samples were taken. Below the graph, the standardized abbreviations of these sample populations are given. In the present example, these are populations that were defined in the framework of the International Haplotype Map Project. ASW stands for “African Ancestry in South West USA,” CEU for “Utah residents with northern and western European ancestry,” MEX presumably for “Mexican Ancestry in Los Angeles” (the standard abbreviation for this population is MXL), and YRI for “Yoruba in Ibadan, Nigeria.” From top to bottom, the estimated membership coefficients of the individuals in the four clusters are represented by the corresponding lengths of sections in the respective colors of the clusters. What these “membership coefficients” refer to is specified to the left of the graph. A scale from 0 to 1 titled “Ancestry” clarifies that the analysis is intended to provide information about the ancestry composition of the sampled individuals and populations. 
The authors of STRUCTURE and ADMIXTURE thus assume that the clusters inferred by means of those programs correspond to what they call “source populations”17 or “ancestral populations,”18 and that individual genomes can be understood as composites of these “sources.”
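The logic of the bar plot described above can be made concrete with a minimal sketch. It does not reproduce STRUCTURE, ADMIXTURE, or DISTRUCT; it only shows how, for K = 4, each individual's membership coefficients are stacked along an "Ancestry" axis from 0 to 1. The coefficient values, the function names, and the letter symbols standing in for colors are assumptions made purely for illustration.

```python
# Hypothetical membership coefficients for K = 4 clusters. The numbers are
# invented for illustration and are not taken from any real analysis.
samples = [
    ("ASW", [0.75, 0.20, 0.05, 0.00]),
    ("CEU", [0.00, 1.00, 0.00, 0.00]),
    ("MEX", [0.05, 0.50, 0.45, 0.00]),
    ("YRI", [1.00, 0.00, 0.00, 0.00]),
]


def stacked_segments(coefficients):
    """Return (start, end) positions on the 0-to-1 axis, one pair per cluster."""
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("membership coefficients must sum to 1")
    segments = []
    start = 0.0
    for q in coefficients:
        segments.append((start, start + q))
        start += q
    return segments


def text_bar(coefficients, width=20, symbols="ABCD"):
    """Draw one individual's bar as text, using one symbol per cluster."""
    parts = []
    for k, (start, end) in enumerate(stacked_segments(coefficients)):
        # Round the cumulative positions so the bar is always `width` wide.
        length = round(end * width) - round(start * width)
        parts.append(symbols[k] * length)
    return "".join(parts)


if __name__ == "__main__":
    for label, coefficients in samples:
        print(f"{label}  {text_bar(coefficients)}")
```

In this sketch, a bar made of a single symbol corresponds to an individual assigned entirely to one cluster, while a bar with several symbols corresponds to what the programs report as an "admixed" individual. Whether a cluster corresponds to a historical "source population" is an interpretation layered on top of these numbers, not something the stacking itself establishes.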
Figure 2.2