Forgotten Books

The Application of Unseen Species Models to the Survival of Culture


The survival of culture

Scholars studying the past of human cultures struggle with the fact that many artefacts have been lost over time. This introduces “survivorship bias”, and the risk that we will underestimate the cultural diversity of past societies. In this project, we show that unseen species models from ecology can be used to estimate the loss rates of cultural artefacts. For medieval literature (chivalric and heroic narratives in particular) we obtain survival estimates which are compatible with prior research and which emphasize the severity of the loss in this domain. Comparison of results between languages highlights interesting differences in literary survival patterns across European medieval vernaculars.


Author explanation video

In this author explanation video, multiple collaborators involved in this project provide a high-level explanation of our research project and results. Their explanations are complemented with footage of historical manuscripts from across medieval Europe.

Works and documents

The survival of medieval literature

In the Middle Ages, works of narrative fiction circulated in hand-written books, called manuscripts. Each manuscript was produced individually, and each is thus a unique material artefact that survives as a parchment or paper volume (or as the fragmentary remnants of one). Multiple parallel witnesses of the same medieval work could therefore circulate, especially for more popular narratives. Scholars agree that much medieval literature has been lost, both through accidental losses (e.g. library fires) and deliberate destruction (e.g. the recycling of books as binding material for other books). However, the precise extent of these losses remains a matter of considerable debate and speculation. We draw a firm distinction between the (material, tangible) document and the (immaterial, non-tangible) work surviving in that document. Loss should be considered, we suggest, both at the level of the document and that of the work: in our model, a work is considered “lost” when none of the copies that once preserved it survive any longer.

The "Snow Whites" of Leuven, Belgium: some of the books which survived the library fires in WWI are now kept in glass boxes. © KU Leuven. Digitaal Labo.

Measuring diversity

Loss rates: documents and works

Abundance data in ecology records how often different species have been spotted during a bioregistration campaign. Chao1 is an unseen species model that estimates how many species were not observed, on the basis of the data for species which were observed only rarely. Once we have an idea of the actual, real number of species, we can estimate how many of these species were in fact detected. Chao1 has been integrated into the Hill number framework, an elegant model used to present multiple metrics for expressing species richness (ecodiversity) on a single spectrum, for various values of q. We apply this framework to cultural data to model the under-detection of medieval literature: we treat works as species and documents as sightings of those species. We show the empirical and estimated Hill number profiles (left) and a species accumulation curve (right), showing how many more works we are likely to find by discovering more documents. Of the original ca. 1,170 works that once existed, 799 would survive today; the 3,648 documents that still exist would be a sample from an original population of ca. 40,614 specimens.
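As a minimal sketch of the idea, assuming abundance data is given as a list of document counts per surviving work, the classic Chao1 estimate and an empirical Hill number can be computed as follows. The function names and the toy counts are illustrative and are not taken from the copia package, which may use a different (for example bias-corrected) variant. The last lines only re-derive survival ratios from the figures quoted above.

```python
import math


def chao1(abundances):
    """Classic Chao1 lower-bound estimate of the true number of species.

    abundances: counts per observed species (here: documents per work).
    Uses f1 (species seen once) and f2 (species seen twice).
    """
    counts = [c for c in abundances if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        # Common fallback when no species was seen exactly twice.
        unseen = f1 * (f1 - 1) / 2
    return s_obs + unseen


def hill_number(abundances, q):
    """Empirical Hill number of order q (q=0: richness, q=1: exp of
    Shannon entropy, q=2: inverse Simpson concentration)."""
    counts = [c for c in abundances if c > 0]
    n = sum(counts)
    p = [c / n for c in counts]
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


# Hypothetical toy data: nine surviving works and their document counts.
toy_counts = [1, 1, 1, 1, 2, 2, 3, 5, 8]
estimated_works = chao1(toy_counts)
toy_survival = len(toy_counts) / estimated_works

# Survival ratios implied by the figures reported in the text.
works_survival = 799 / 1170        # works: roughly 68%
documents_survival = 3648 / 40614  # documents: roughly 9%

print(estimated_works, toy_survival)
print([hill_number(toy_counts, q) for q in (0, 1, 2)])
print(works_survival, documents_survival)
```

In the toy data, four works survive in a single document and two in exactly two documents, so the estimator adds 4 * 4 / (2 * 2) = 4 unseen works to the 9 observed ones.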

International comparison

Did island literatures fare better?

Our loss figures are compatible with prior studies in book history using other methodologies, but they hide considerable variation across the six medieval languages considered here. We present survival ratios both for (material) documents and (immaterial) works. While the confidence intervals are large, we can observe clear trends. Our analyses confirm the severity of loss, but suggest that German, Icelandic and Irish are characterized by higher survival ratios than French, Dutch or English. The results for the island literatures are remarkable: in spite of their small size, the survival ratios for these literatures were on par with or better than those of more widely influential literatures on the mainland, such as French. That these isolated island cultures behave differently is particularly exciting, because in ecology too, islands are of special interest. In ecology, endemic species richness, for instance, is higher on islands: if islands are indeed better able to preserve their biological heritage, could the same be true for their cultural heritage?


Evenness of distributions

Past research has mostly focused on post-medieval factors that drove the loss of historic literature, such as library fires or collectors disposing of “duplicate” copies. We identify, however, an additional factor that has typically been overlooked: the original evenness of these literatures. Evenness is a concept that we borrow from ecology. In a more even literary tradition, copies are more evenly distributed over works, so that the difference in the number of copies between the most popular and the least popular works is smaller. A more even distribution can guard a literature against losses: if we randomly lose a manuscript in a more evenly-distributed literature, the chance that we lose a unique copy of a work is smaller than it is in an equal-sized literary tradition that is less even. To the right, we show “evenness curves” for the traditions studied; these are integrated in the Hill number framework and display evenness across various values of q. The island literatures (Irish and Icelandic) differ sharply from the other four languages.
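The protective effect of evenness described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis from the paper: the two toy traditions, the function names, the number of trials and the uniform random loss model are all hypothetical choices made for illustration.

```python
import random


def p_unique_copy_lost(copies_per_work):
    """Probability that one randomly lost manuscript is the only
    copy of its work (share of documents that are single copies)."""
    n_documents = sum(copies_per_work)
    singletons = sum(1 for c in copies_per_work if c == 1)
    return singletons / n_documents


def surviving_works(copies_per_work, n_kept, rng):
    """Keep n_kept documents uniformly at random and count how many
    distinct works are still represented by at least one document."""
    documents = [w for w, c in enumerate(copies_per_work) for _ in range(c)]
    kept = rng.sample(documents, n_kept)
    return len(set(kept))


def mean_surviving(copies_per_work, n_kept, trials=2000, seed=0):
    """Average number of surviving works over repeated random losses."""
    rng = random.Random(seed)
    total = sum(surviving_works(copies_per_work, n_kept, rng)
                for _ in range(trials))
    return total / trials


# Two equal-sized toy traditions: 10 works, 40 documents each.
even = [4] * 10           # every work has four copies
uneven = [31] + [1] * 9   # one very popular work, nine single copies

print(p_unique_copy_lost(even), p_unique_copy_lost(uneven))
# Lose 30 of the 40 documents at random and compare surviving works.
print(mean_surviving(even, n_kept=10), mean_surviving(uneven, n_kept=10))
```

Under this loss model, each single-copy work in the uneven tradition survives only if its one document is among those kept, whereas a work in the even tradition disappears only when all four of its copies are lost.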

Global dispersion

The spread of literature

Just like plant seeds, historic books have been subjected to a global dispersal after the Middle Ages. Often, fragments of manuscripts traveled unnoticed in the spines of later books, re-emerging later in distant corners of the world. In other cases, lavishly illustrated codices were traded by well-known book dealers for record prices at public auctions. There are many aspects of the survival of literature that deserve further quantitative research. In the Sankey diagram to the right, we plot where the various documents of the vernaculars from our study are currently being kept. The English documents (again) stand out: their dispersion has remained surprisingly local to the British Isles, whereas the other vernaculars experienced a much wider spread across the European continent. Just as in ecology, the ability to migrate might have been a crucial factor in the survival of literatures.


Open Science

To support the findings in our paper and ensure their replicability, we have made our full datasets and Python code open access. This includes documented Jupyter notebooks and the release of a new, open-source software package for running unseen species models, called “copia” (Latin for “abundance”, which is a classical concept in ecology), available from the Python Package Index. The logo with a horned goat playfully refers to the mythological cornucopia or "horn of plenty", the legendary horn of the goat Amaltheia, who fed the infant Zeus with her milk. The software has been published on GitHub, where we welcome community contributions; a snapshot of this repository (including the data) has been archived on Zenodo for long-term preservation. Additionally, we provide an independent reimplementation of our entire analysis in the statistical software R, which has an established tradition in biostatistics, in particular for unseen species models.

GitHub Repository

University of Antwerp

Mike Kestemont

Computational Humanities

KNAW Meertens Instituut

Folgert Karsdorp

Cultural Evolution

University of Antwerp

Elisabeth de Bruijn

Dutch and German Literature

University of Copenhagen

Matthew Driscoll

Icelandic Literature

University of Oxford

Katarzyna A. Kapitan

Icelandic Literature

University College Cork

Pádraig Ó Macháin

Irish Literature

University of Oxford

Daniel Sawyer

English Literature

University of Antwerp

Remco Sleiderink

Dutch and French Literature

National Tsing Hua University

Anne Chao