‘Lost’ medieval literature uncovered by techniques used to track wildlife

Ask any Dutch schoolchild about Reynard the fox, and they’ll tell you all about the adventures of the dashing, anthropomorphic folk hero, whose exploits were laid down in the 13th century by Willem die Madoc maecte, or “William who made the Madoc.” Madoc is likely the name of another once-popular poem about a legendary Welsh knight and explorer. Despite being the well-known medieval author’s calling card, nobody knows the content of that poem, which has been lost to time.

“People have been frantically looking for it ever since,” says Mike Kestemont, a computational text researcher at the University of Antwerp. In a new study, he and colleagues tried to figure out just how much medieval literature—like Madoc—has been lost over time, using techniques that are more commonly used to track actual foxes and other wildlife.

During Europe’s medieval period, which stretches from roughly the beginning of the sixth century to the end of the 15th, narrative fiction took off in a big way. Authors penned chivalric romances and heroic tales of knights battling fantastic monsters and traveling to exotic lands (think Beowulf and King Arthur), writing them by hand onto parchment and, eventually, paper codices. “You can liken these to action hero movies nowadays,” Kestemont says.

Authors often referenced each other’s works, and ancient catalogs reveal a glimpse of the vast, past literary landscape. Yet only a fraction of these works has survived to the present day. Before the invention of the printing press, mass copies simply didn’t exist; if a particular text was destroyed by fire, eaten by insects, or used to reinforce a bishop’s hat (as befell one 13th-century collection of Old Norse tales), it could be lost forever.

To estimate how much medieval literature once existed, book historians compare ancient book catalogs, which are incomplete, with the number and scope of surviving texts. To offer another, perhaps more informative, estimate, Kestemont and colleagues borrowed a technique from ecology called the “unseen species” model. Developed by co-author and statistician Anne Chao at National Tsing Hua University, the model uses a statistical approach to estimate how many species are missing from a field count: those that are present but simply went unobserved by scientists.

The statistical model doesn’t care whether you’re comparing missing birds or books, explains co-author Folgert Karsdorp, a computational humanities researcher at the Meertens Institute. “It’s a very general method of bias correction,” he says. For instance, it’s also been used to estimate the number of bugs in long stretches of computer code.
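To make the idea concrete, here is a minimal sketch of one classic unseen species estimator, Chao1. The article does not say which estimator the researchers ran, so treat the choice as an assumption. The mapping used here is also an assumption: each work is a “species,” and each surviving copy is a “sighting.” The function name and the toy counts are hypothetical and are not data from the study.

```python
from collections import Counter


def chao1(counts):
    """Chao1 lower-bound estimate of the total number of distinct items.

    counts: one entry per observed item (e.g., per surviving work),
    giving how many times it was seen (e.g., surviving copies).
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)           # distinct items actually seen
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]       # items seen exactly once / exactly twice
    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)     # classic Chao1 correction
    else:
        unseen = f1 * (f1 - 1) / 2      # bias-corrected form when no doubletons
    return s_obs + unseen


# Hypothetical toy data: six works survive in one copy, three in two copies,
# one in three copies, and one in five copies.
copies_per_work = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 5]

estimated_total = chao1(copies_per_work)
survival_rate = len(copies_per_work) / estimated_total
print(f"estimated works that once existed: {estimated_total:.1f}")
print(f"estimated share that survived: {survival_rate:.0%}")
```

The intuition is that many items seen only once suggest many more that were never seen at all. The same arithmetic applies whether the counts come from bird sightings, manuscripts, or software bug reports.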

The researchers turned to lists of surviving medieval texts—and those suspected to have been lost—written between 600 and 1450 C.E. in Dutch, French, Icelandic, Irish, English, and German. There were 3648 texts in total. When they ran those numbers through the unseen species model, the algorithm suggested just 9% of medieval texts from that period survived to the present day, the researchers report today in Science. That’s rather close to traditional estimates of 7%. But the new study also broke things down by region: The model suggests only about 5% of English vernacular works have survived, compared with 17% and 19% for Icelandic and Irish vernacular works, respectively.

Robert Colwell, an evolutionary biologist and emeritus professor at the University of Connecticut, Storrs, who helped pioneer the quantitative ecology methods behind the unseen species model, calls the study “superb.” “It has been a joy to see how rigorous estimation methods initially developed for biodiversity statistics … are increasingly applied in the social sciences and humanities,” he says.

The paper seems geared more toward systems theorists and statisticians, says Daniel Smail, a historian at Harvard University who studies medieval social and cultural history, and the authors haven’t done enough to establish why cultural production should follow the same rules as life systems. But for him, the bigger question is: Given that we already have catalogs of ancient texts, and previous estimates were pretty close to the model’s new one, what does the new work add? “What is this telling us that we didn’t know?”