Nov 19, 2014

Information retrieval strategies

London, British Library MS Royal, fol. 1v. Image: British Library.
The image here is a table of signa, indexing symbols invented by Ralph of Diceto in the 1100s for his historical works, the Abbreviationes chronicorum and Imagines historiarum. They enable the reader to search for points in the text that deal with various topics or historical persons, such as kings of England and dukes of Normandy (a crown next to a sword), or conflicts within the royal family (two hands pulling a crown in different directions).

Information retrieval is one of the major industries of our digital world. It is both an academic field of study and a set of technologies and techniques. We often think of it, therefore, as something that we do with machines, or that machines do for us. If I want to find, say, a definition of ‘information retrieval’, I type the term into everyone’s favourite search engine and discover that the first hit is (no surprise) a Wikipedia article, followed by a couple of online textbooks and a link to an academic journal devoted to the subject. Here’s how one of the textbooks (Manning, Raghavan, and Schütze 2008) defines information retrieval:

    Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

I like the repetition of ‘usually’ in that definition because it reminds us that although we usually think of information retrieval as a computing activity, searching for stuff on Google is only one form – and generally not the most effective – of information retrieval.

How did information retrieval work in the Middle Ages? We can posit three main strategies for finding a particular item of textual information.

The first was user-dependent. Here the data would be stored in the human memory, and the trained memory would be able to retrieve the information that was needed. In a primarily oral society, this is the main form of information retrieval, and for this reason important information is made memorable: organised into poetic forms, proverbs, and narratives. Medieval England was mostly an oral society, but medieval scholars had developed methods of providing visual cues to help the trained memory; thus the medieval book was, at the outset, not a device for storing information but a device for retrieving the information that the user should have stored in his or her memory.

It is in the context of user-dependent information retrieval that we should understand the illustrations and decorations in a medieval psalter or book of hours.

London, British Library MS Additional 49999, fol. 69r. Image: British Library.
On this page from the De Brailes Hours, the earliest extant English-made book of hours (c. 1240, illuminated by the Oxford artist William de Brailes), the opening of Psalm 37 (Psalm 38 in modern Bibles), Domine ne in furore tuo, one of the Seven Penitential Psalms, is illustrated by a historiated initial showing David praying under the downward-gesturing hand of God. This image is obviously related to the text of the psalm, which is an appeal to God’s mercy, but the grotesque in the lower margin has no direct relation to the words above it. Rather, it is meant to be a memorable image so that the reader, having seen it in the margin of this text and remembered it because of its striking weirdness, will be able to find this text again simply by looking for the images associated with it. Notice, too, that the psalms in this book are not numbered; as was usual throughout the Middle Ages, each psalm was known and referred to by its incipit, its opening words, which would recall for the trained reader or listener the entire text. For example, in Langland’s Piers Plowman, which assumes an extensive knowledge of the Latin Bible, quoting the first line is sufficient to bring to mind the rest of the psalm.

The second type of information retrieval is text-dependent. This was a natural development of medieval memory techniques, which involved, among other things, dividing up a text into units that could be stored in the mind in organised ways. In other words, medieval memory training created structured data for the purpose of information retrieval. It was then natural to transfer these mental structures to the written text, and to represent them visually in documents. In the early Middle Ages, an educated person was supposed to have internalised the Bible (or at least the Psalms and Gospels) well enough to navigate the text by memory, but strategies existed for more complex kinds of analysis. For example, the section numbers assigned by Eusebius of Caesarea to the Gospels allowed the comparison of parallel passages in those four biblical books, and the correspondences between those passages were expressed visually in canon tables. As well, from the twelfth century onwards a standard commentary on the Bible, the Glossa ordinaria, became a secondary text that inhabited the margins of medieval Bibles and circulated as a text in its own right; it consisted in large part of extracts from patristic authors.

Glossed gospel of Luke, France, 13th century. Western Michigan University MS 141, verso of leaf. Image: WMU.

Columbia University MS Western 85, fol. 352v. Image: Digital Scriptorium.
Even more basically, a more-or-less standard division of the Bible into numbered chapters was introduced in the early 1200s, and is still used today. (The image to the right is from Columbia University MS Western 85, a 13th-century Bible with chapter numbers in the margins.) Such a system is text-dependent because it travels with the text; whatever copy of the Bible you have, in whatever medium, translated into whatever language, it will probably have the chapter numbers assigned to the text in the thirteenth century.

A third method of information retrieval is document-dependent. When a text is encoded in material form, it becomes possible to design visual and material systems for searching the physical document. Such systems are usually specific to the material object itself; for example, dog-earing a book or inserting a bookmark in it is of use only to a person interacting with that particular copy of the book.

Brendan Missal, University of Saskatchewan Special Collections
One example of a document-dependent retrieval system is the leather tabs sewn into this early 15th-century missal, marking particular sections of the book for liturgical use. Another document-dependent retrieval system is page numbering; for the most part, the division of a medieval manuscript into folios, or a modern printed book into pages, has no close relationship with the structure of the text itself. The arbitrary nature of page numbering is one basis of the common student complaint (at least I used to complain of this when I was a student) that a publisher can force students to purchase a new edition of a textbook not by making substantial changes to the text, but by making some small changes to layout that alter the pagination. Digital text documents posted on the Web, for example articles in online scholarly journals, often require an alternate method of dividing the text into sections, since page numbers make no sense for a text read from a screen.
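The difference between text-dependent and document-dependent references can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-chapter "text", the placeholder line names, and the helper functions `paginate`, `chapters_on_page`, and `chapter_text` are all hypothetical, and the two "editions" differ only in how many lines fit on a page.

```python
from typing import Dict, List, Tuple

# A hypothetical text divided into numbered chapters.
# The chapter numbers belong to the text itself (text-dependent).
CHAPTERS: Dict[int, List[str]] = {
    1: ["line a", "line b", "line c"],
    2: ["line d", "line e"],
    3: ["line f", "line g", "line h", "line i"],
}

Page = List[Tuple[int, str]]  # each entry is (chapter number, line of text)


def paginate(chapters: Dict[int, List[str]], lines_per_page: int) -> Dict[int, Page]:
    """Lay the text out on numbered pages (document-dependent)."""
    flat = [(ch, line) for ch in sorted(chapters) for line in chapters[ch]]
    pages: Dict[int, Page] = {}
    for i, item in enumerate(flat):
        pages.setdefault(i // lines_per_page + 1, []).append(item)
    return pages


def chapters_on_page(pages: Dict[int, Page], page: int) -> List[int]:
    """Which chapters does a given page number lead to in this copy?"""
    return sorted({ch for ch, _ in pages.get(page, [])})


def chapter_text(pages: Dict[int, Page], chapter: int) -> List[str]:
    """Recover a chapter's lines from a paginated copy."""
    return [line for p in sorted(pages) for ch, line in pages[p] if ch == chapter]


# Two "editions" of the same text with different layouts.
first_edition = paginate(CHAPTERS, lines_per_page=4)
second_edition = paginate(CHAPTERS, lines_per_page=3)

# A chapter reference finds the same words in both editions...
same_text = chapter_text(first_edition, 2) == chapter_text(second_edition, 2)

# ...but "page 1" leads to different material in each.
same_page = chapters_on_page(first_edition, 1) == chapters_on_page(second_edition, 1)

print(same_text, same_page)
```

In this sketch a reference to chapter 2 survives the change of layout, while a reference to page 1 does not, which is the situation the student with the wrong edition of the textbook runs into.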

Each method of information retrieval has advantages and disadvantages, and requires different elements to be in place to be effective. For example, a user-dependent system presupposes trained users. Text-dependent systems work well for communication and interchange as long as encoding is standardised: that is, as long as everyone uses the same system, as came to be the case for chapter numbering in the text of the Christian Bible. Document-dependent systems, which rely least on human memory, are preserved as long as the physical artifacts are themselves preserved. Both text- and document-dependent systems extend the human capacity to store, organise, and retrieve information, but all systems need to interact with human users in some way, and thus training and interface design will always be essential elements of information retrieval.

Of course, most information retrieval systems are actually combinations of the three types. For example, finding all references to popes in Ralph of Diceto's histories, or a particular passage in a medieval book of hours, involved human memory interacting with illustrations and decorations that were associated with structured text. The history of medieval information encoding and retrieval is a history of innovation and refinement, as scribes and scholars invented and experimented with different ways of realising these methods in a wide range of contexts. But when people refer to ‘information retrieval’ today, they are most likely thinking of computer-assisted methods, which we might regard as a fourth type: machine-dependent (or perhaps algorithmic) retrieval. The ways in which this new type of information retrieval interacts with older methods are still being explored.

Yin Liu

References and Further Reading

A very thorough account of medieval memory training, and its implications for the design of textual documents, is Mary Carruthers’ The Book of Memory (2nd ed. 2008), and M. T. Clanchy has also explored the role of memory in post-Conquest English record keeping, in From Memory to Written Record (3rd ed. 2013; see especially pp. 174-186).

On referencing the Psalms in medieval England, see Eyal Poleg, Approaching the Bible in Medieval England (2013), especially pp. 131-151.

For medieval practices of Bible reading, a classic is Beryl Smalley, The Study of the Bible in the Middle Ages (1964). On Eusebius, see Anthony Grafton and Megan Williams, Christianity and the Transformation of the Book: Origen, Eusebius, and the Library of Caesarea (2006). For the history of the Glossa ordinaria and its incorporation into the design of medieval Bibles, see C. F. R. de Hamel, Glossed Books of the Bible and the Origins of the Paris Booktrade (1984). The division of the Bible into numbered chapters is usually attributed to Stephen Langton, but a carefully argued dissenting view is presented by Paul Saenger and Laura Bruck in ‘The Anglo-Hebraic Origins of the Modern Chapter Division of the Latin Bible’ (2008).
