Digitizing Books post

This week’s readings are particularly relevant for my Pet Book project, as I am working in the digital realm. The source objects that I am working with, of course, are digital – i.e., they are digital images of the book, rather than the physical object itself – but also, the project ultimately involves the production of a digital book.

Susan Schreibman traces the development of digital editions since the early 1990s, from the days of SGML to today’s XML formats like TEI. Julia Flanders, on the other hand, describes a particular project employing TEI markup – the Women Writers Project (WWP) here at Northeastern University.

Flanders addresses the distinction between digitizing “specific documents—particular copies of particular books” and digitizing works – that is, constructing Platonic-ideal chimæras. For a variety of reasons, the WWP makes no attempt to produce a digital edition of Charlotte Smith’s Elegiac Sonnets, and Other Poems, say; rather, its product is a digital representation of one specific copy of Smith’s work. Similarly, my project is to begin producing a digital representation of the Dragon Prayer Book – one specific book’s text. However, the Dragon Prayer Book is rather more distant from a contemporary American university audience than are most of the WWP’s texts; it is written not only in Latin, but in a highly abbreviated mediæval Latin. It therefore inevitably requires more editorial intervention to make its contents comprehensible to a nonspecialist reader. (As an aside, the idiosyncrasies of mediæval orthography the book exhibits, such as the substitution of simple e for æ or the spelling of -tio word endings as -cio, in fact present less of an obstacle to a modern reader than do the idiosyncrasies of Early Modern English orthography.)

While Flanders’s piece on the WWP does not emphasize it, that project employs a customization of TEI, tailored to the project’s needs, as Sarah Connell explained in her presentation to the class. For my purposes, a largely suitable TEI customization already exists: EpiDoc. EpiDoc was, as its name implies, developed for the use of epigraphers, that is, scholars working with inscriptional texts, generally from the ancient Mediterranean. Its features have proved useful to papyrologists as well, whose needs are quite similar to those of manuscript scholars, though their texts tend to be quite fragmentary and rarely of book length. The Leiden Conventions for transcription, on which EpiDoc is closely based, were developed for use with Greek and Latin inscriptions and papyri, but they differ little from the conventions used by palæographers transcribing mediæval Latin texts.

As an example of transcription using the EpiDoc customization of TEI, consider the following word, which appears in the manuscript:


Here we have the word intercede, written īteꝛcede. We can encode it thus:
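A minimal sketch of such an encoding, consistent with the description that follows. The type values on the <rdg> elements and the exact nesting are assumptions made for illustration, not necessarily the markup used in the project:

```xml
<!-- Hypothetical sketch: the type values "diplomatic" and "normalized"
     are illustrative assumptions, not taken from the project. -->
<app>
  <!-- Reading as written in the manuscript, with its expansion -->
  <rdg type="diplomatic">
    <!-- <am> holds the combining macron (U+0304), the abbreviation marker;
         <ex> holds the letter supplied by the editor -->
    <expan>i<am>&#x0304;</am><ex>n</ex>teꝛcede</expan>
  </rdg>
  <!-- Normalized spelling, with r rotunda replaced by a modern r -->
  <rdg type="normalized">intercede</rdg>
</app>
```

Dropping the <ex> content yields the abbreviated form īteꝛcede, while dropping the <am> content yields the expanded form inteꝛcede.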


Here the markup shows that we have an abbreviation – the contents of the <expan> element – encoded along with its expansion. The <ex> elements contain the expansion text (i.e., that which we would include in parentheses in a traditional epigraphic or palæographic transcription), while the <am> element contains an abbreviation marker (a hexadecimal entity reference to Unicode character U+0304, the combining macron), denoting a glyph which is present in the abbreviated source text but should be omitted from the expanded text. Taken as a whole, the <expan> element indicates that the manuscript contains the glyph sequence īteꝛcede, an abbreviation that expands to inteꝛcede. The remainder of the markup indicates that a normalized reading of that expansion would be intercede.

It is conventional in palæographic transcriptions to perform various substitutions, replacing ꝛ (“r rotunda”) characters with modern rs, similarly eliminating long ses (ſ), and replacing v and j with u and i, respectively. For my purpose, though, such silent emendation is undesirable, because the goal is to present the reader with a representation of the actual book text, not an idealized form of it. However, it is also desirable to encode the normalized form of the word, both for the convenience of the reader and to facilitate automated lookup with a tool like Perseus’s Word Study Tool, which can look up inflected Greek and Latin word forms, returning possible dictionary entries. Thus we use the <app> and <rdg> elements to provide normalized spellings here and for other cases like Mariae (written marie in the book) or subsidiis (written ſubſidijs).
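As a rough illustration of how those two cases might be encoded with <app> and <rdg>, assuming hypothetical type values to distinguish the manuscript spelling from the normalized one:

```xml
<!-- Hypothetical sketch: the type values are illustrative assumptions. -->
<app>
  <rdg type="diplomatic">marie</rdg>       <!-- as written in the book -->
  <rdg type="normalized">Mariae</rdg>      <!-- classical spelling with ae -->
</app>
<app>
  <rdg type="diplomatic">ſubſidijs</rdg>   <!-- long s and j preserved -->
  <rdg type="normalized">subsidiis</rdg>   <!-- modern s and i substituted -->
</app>
```

Keeping both readings side by side lets a display layer show the manuscript form while a lookup tool receives the normalized one.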
