Thursday, April 17, 2014

Oral Poetics and the Homer Multitext

One of the central research questions that drives the Homer Multitext is this: “How do you make a critical edition of an oral tradition, like that of the Homeric Iliad and Odyssey, that spanned a thousand years or more? What is the best way to represent the textual history of songs that were created in and for performance, but survive only in textual forms from later eras?” In our 2010 book, Iliad 10 and the Poetics of Ambush, Mary Ebbott and I attempted to demonstrate that a “multitextual” approach to Homeric poetry is useful not only for understanding the transmission of the text of the epics, but also for better understanding the poetics of oral poetry. We could not have written that book, which is meant to be a sustained demonstration of the workings of oral poetry over the course of an entire book of the Iliad, without the data and tools of the Homer Multitext that were available to us at that time.

As new ways of viewing and working with the surviving documents that transmit Homeric poetry become possible, Mary and I would like to continue to use them to enhance our understanding of the poetics of the Iliad and Odyssey. With that in mind, we have decided to revive our long-neglected Oral Poetry blog, which we will maintain alongside this one and in close coordination with it. The Oral Poetry blog will be devoted primarily to questions of poetics, while we will continue to post here about the manuscripts and papyri and what they tell us about the system of oral poetry in which the Iliad and Odyssey were composed.

To kick off this phase of the Oral Poetry blog, we are planning a series of posts about the poetics of Iliad 2. You can read my initial post about this work here. You can also read a much older post on this blog about the transmission of the Catalogue of Ships from Book 2 here. It is the special treatment and seemingly controversial place of the Catalogue in the surviving manuscripts and papyri that drives us to try to better understand the poetics of this fascinating record of names and places. 

Sunday, April 13, 2014

Testing the HMT project’s technical underpinnings

In February, we noted the release of new draft specifications for the CTS URN notation that we use to cite texts, and the CTS protocol that we use to retrieve texts in the Homer Multitext project. Since the publication of the draft specifications, we have released updates of a suite of test data and of software using the test data to assess the compliance of a given CTS service with the current version of the protocol.

Today, together with version 1.6 of that software, the ctsvalidator servlet, we are releasing version 0.9.0 of sparqlcts, our implementation of the CTS protocol. The new version of sparqlcts passes 100% of the ctsvalidator tests.

To recapitulate what we have released in 2014 in our work on CTS:
  • Formal specifications for the Canonical Text Services protocol, and CTS URNs. The specifications include Relax NG schemas for a CTS Text Inventory (the catalogue of a CTS library), and Relax NG schemas for validating the responses to CTS requests.
  • A test data set, documented in a valid CTS Text Inventory, and available in three formats:
    • valid and well-formed XML
    • tabular data in simple delimited text files
    • RDF triples in .ttl format
  • A set of 68 tests applying CTS requests to the test data set. The tests are defined in an XML file listing the request and parameters to be submitted to a running CTS installation. For each test, a corresponding XML file gives the expected responses to the request.
  • The CTS Validator, a web-app that runs the tests against any online CTS service hosting the corpus of test data.
  • An implementation of the Canonical Text Services protocol, sparqlcts, a Java web-app drawing its data from a SPARQL endpoint. When the SPARQL endpoint is hosting the corpus of test data, sparqlcts passes 68 out of 68 of our defined tests.
This of course does not mean that sparqlcts is necessarily flawless (there may be problems that ctsvalidator does not test for), but it is an important milestone. One of the most profound implications of digital scholarship is that when we can automate the testing of digital work, we should invert the humanist’s traditional order of composition and assessment: specify the automated test first, then work until you pass the test. This applies to the software we use, too. When we next update our online services, we can be confident that our text service has successfully passed 100% of a challenging series of tests.
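
To see the pattern in miniature, here is a sketch (in Python, purely for illustration: it is not the ctsvalidator code, and the service URL and file names are hypothetical) of how each test pairs a CTS request with a stored file recording the expected reply; a service passes when its actual reply matches the expectation.

    # Illustrative sketch of the test-first pattern (not the actual ctsvalidator code):
    # submit a stored request to a running CTS service and compare the reply
    # with the expected response recorded in an XML file.
    from pathlib import Path
    from urllib.parse import urlencode
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    SERVICE = "http://localhost:8080/cts"  # hypothetical test installation

    def normalize(xml_text):
        # Round-trip through a parser so superficial serialization differences
        # (attribute quoting, self-closing tags) do not count as failures.
        return ET.tostring(ET.fromstring(xml_text))

    def passes(params, expected_file):
        # True if the live reply matches the stored expected response.
        with urlopen(SERVICE + "?" + urlencode(params)) as reply:
            actual = reply.read().decode("utf-8")
        expected = Path(expected_file).read_text(encoding="utf-8")
        return normalize(actual) == normalize(expected)

    # One check in the spirit of the 68 defined tests (the file name is hypothetical).
    result = passes({"request": "GetValidReff", "urn": "urn:cts:greekLit:tlg0012.tlg001"},
                    "expected/GetValidReff-1.xml")
    print("PASS" if result else "FAIL")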

Christopher Blackwell and Neel Smith, project architects

Monday, March 31, 2014

Got git?

Over the past several years, github (https://github.com/) has emerged as the primary site for sharing openly licensed software. More recently, it has begun to assume a comparably important role in sharing openly licensed data; from the perspective of the Homer Multitext project, it’s tempting to say that you can have “as many githubs as you please.” (For a few examples of open datasets on github ranging from wedding invitations to arrangements of organ accompaniment for Gregorian chant, see this recent article.)

While the Homer Multitext project has relied on publicly available version control systems for years, and has specifically used the git version control system since 2012, we have only recently taken advantage of github’s option to group repositories by organization. To simplify the task of following the varied work in progress connected with the HMT, we have created two github organizations, homermultitext and cite-architecture.
  1. Repositories for the HMT data archive and digital services of the Homer Multitext project belong to the homermultitext organization. See http://homermultitext.github.io/.
  2. The CITE architecture is a generic architecture for working with citable scholarly data. It was originally developed specifically for the Homer Multitext project, and underpins all the digital work on HMT. See links to its repositories at http://cite-architecture.github.io/.

If you are new to version control in general or git in particular, you can probably find local expertise on your college or university campus; a little googling for help will certainly turn up an embarrassment of riches.

Christopher Blackwell and Neel Smith, project architects

Thursday, February 27, 2014

Publishing the HMT archive

The editorial work of the Homer Multitext project is ongoing, and, as good photography of more manuscripts and papyri becomes available, is open-ended. While we have provided openly licensed access to our source images and editorial work in progress since our first digital photography in 2007, we have not previously offered packaged publications of our archive.

That is changing in 2014. The project’s editors have decided on a publishing cycle of roughly three issues a year (since our work tends to be concentrated around an academic calendar of fall term, spring term, and summer work). Published issues of the project archive must satisfy four requirements.
  1. The issue must be clearly identified. Our releases are labelled with a year and issue number: our first issue is 2014.1.
  2. All content published in a given issue must pass a clearly identified review process. Teams of contributing editors work in individual workspaces. (We use github repositories to track the work history of these teams.) When a block of work passes a series of manual reviews and automated tests, it migrates from “draft” to “provisionally accepted” status and is added to the project’s central archival repository. This is the repository that we are publishing for the first time this week.
  3. All published material must be in appropriate open digital formats. Apart from our binary image data, all the data we create are structured in simple tabular text files or XML files with published schemas.
  4. All published material must be appropriately licensed for scholarly use. All of our work is published under a Creative Commons Attribution-ShareAlike license. (Licenses for some of our image collections additionally include a “non-commercial” clause: in those cases, a license for commercial reuse must be separately negotiated with the copyright holder.)

Access to the Published Digital Archive

The published packages are available for download from http://beta.hpcc.uh.edu/hmt/archival-publications as zip files. An accompanying README explains the contents of each zip file.
We are also distributing our published issues as nexus artifacts (previously mentioned briefly here); the nexus system allows software to identify and retrieve published versions automatically. Whether the packages are downloaded by hand or retrieved automatically, scholars (and their software) can now work with citable data sets drawn from the constantly changing archive of the HMT project.
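
As a concrete illustration of the manual route, fetching and unpacking a published issue might look like the following sketch (the zip file name is only a guess for illustration; the README at the download page lists the actual package names).

    # Sketch: download one published issue of the archive and unpack it locally.
    # The zip file name is an assumption; consult the README for the real names.
    from io import BytesIO
    from urllib.request import urlopen
    from zipfile import ZipFile

    BASE = "http://beta.hpcc.uh.edu/hmt/archival-publications"
    ISSUE = "hmt-2014.1.zip"  # hypothetical file name for issue 2014.1

    with urlopen(BASE + "/" + ISSUE) as reply:
        ZipFile(BytesIO(reply.read())).extractall("hmt-2014.1")
    print("Unpacked issue 2014.1 into ./hmt-2014.1")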

Tracking Work in Progress

We will continue to make our work in progress available. For easy access to the current state of “provisionally accepted” material in our archive, we also generate a nightly set of packages. These are available for manual download here, but are not distributed through our nexus server.
They should be considered unpublished: other publications should cite only published issues of the archive.

Like the workspaces of our individual editorial teams, our publication repository is managed through github: http://homermultitext.github.io/. The data archive includes a publicly available issue tracker where you can submit questions or bug reports, and follow our progress.

More technical information

If you’re interested in technical information about how we develop the published archive and use it to build applications, Christopher Blackwell and I have recently published a discussion here.

Saturday, February 8, 2014

Technically speaking ...

For over a decade, the Homer Multitext project has been exploring how to represent a multitext in digital form.  For some of our essential work, we have been able to adopt well understood practices (such as how to use XML markup to structure a diplomatic edition of a text).  In other aspects of our work, we are faced with issues that have not been explored in prior work on digital scholarship, and have had to define new standards.

We have devoted special attention to the fundamental question of how to cite texts in a form that is independent of any specific technology and sufficiently rigorously defined for computers to use.  We have defined the syntax and semantics for a notation for citing texts that is based on the Internet Engineering Task Force's Uniform Resource Name (URN) notation.  We call this notation the Canonical Text Services URN, or CTS URN.

We have also defined a protocol for a networked service that understands the CTS URN notation, and can retrieve passages of texts.  Unsurprisingly, we call this the Canonical Text Services protocol, or CTS protocol.
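
To make the notation concrete: a CTS URN such as urn:cts:greekLit:tlg0012.tlg001:1.1 cites the first line of the first book of the Iliad by canonical reference, independently of any particular edition or page layout, and a CTS service resolves such citations over plain HTTP. The sketch below (in Python) is only an illustration; the endpoint URL is a placeholder, not an actual HMT service address.

    # Minimal sketch of retrieving a passage from a CTS service.
    # The endpoint is a placeholder, not a real HMT service address.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    CTS_ENDPOINT = "http://example.org/cts"  # hypothetical CTS installation

    def get_passage(urn):
        # Ask the service for the passage identified by a CTS URN; return its XML reply.
        query = urlencode({"request": "GetPassage", "urn": urn})
        with urlopen(CTS_ENDPOINT + "?" + query) as reply:
            return reply.read().decode("utf-8")

    # Iliad 1.1, cited by canonical reference rather than by any one edition's pagination.
    print(get_passage("urn:cts:greekLit:tlg0012.tlg001:1.1"))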

We have worked hard to ensure that the technical design of our notation and service fully satisfies the needs of the Homer Multitext project, but is not limited to or in any way specific to the HMT project's corpus of texts.  Both of us have applied the CTS notation and CTS service protocol to a range of other projects, not limited to Greek or Latin texts.  As our work on these two technical projects has matured, we have found more and more interest in it from scholars working with canonically citable texts.

This week, we were able to complete revisions for a new version of the specification for both the CTS URN notation and CTS protocol.  It was especially gratifying that we were able to complete this work during a visit to Leiden University, where we were graciously hosted by Ineke Sluiter and her colleagues, a new group of collaborators on HMT who first participated in the summer 2013 seminar at the Center for Hellenic Studies.

The specifications:
  • The CTS URN specification
  • The Canonical Text Services protocol specification

Christopher Blackwell and Neel Smith, HMT project architects

Friday, December 13, 2013

Images of the Geneva Iliad have been posted!

Images of the Geneva Iliad, which has undergone extensive restoration and digitization in a partnership between the E-Codices project of Switzerland and the Homer Multitext, have now been published. Here is an excerpt from the E-Codices press release:

The “Geneva Iliad” was most likely produced in Constantinople in the 13th century. The manuscript was purchased in the 16th century, probably in Venice, by Henri Estienne, who used this manuscript as a basis for his 1566 edition of the Iliad, which remained the standard edition into the 18th century. This manuscript is unique for numerous scholia, which are not found in any other similar manuscript.
The digital publication of this manuscript was requested in 2010 by the “Homer Multitext”, a project of the Center for Hellenic Studies at Harvard University, which uses digital techniques to facilitate research regarding the multiformity of the textual tradition in Homeric epics.
We are extremely excited to be able to start making use of these images within the Homer Multitext. The E-Codices Creative Commons non-commercial licensing on the images will allow us to use these images for education and research. As with the other manuscripts we have brought together digitally within the project, we will now be able to compare the Geneva Iliad side by side with other manuscripts, and will no doubt learn a great deal about how the manuscript was conceived and constructed, the information contained in its scholia, and the sources for those scholia.

This collaboration between the Mellon Foundation, the E-Codices Virtual Manuscript Library of Switzerland, and the Homer Multitext has been four years in the making. We are so grateful to the Bibliothèque de Genève for overseeing the restoration, digitization, and scholarship on the manuscript that made the publication of these images possible. Stay tuned as we learn more about this remarkable manuscript with its unusual layout and unique set of scholia.

Wednesday, November 20, 2013

The Homer Multitext on the road this weekend

This weekend the Homer Multitext will be part of two conferences. Christopher Blackwell will lead a workshop entitled “Scholarship Outside the Codex: Citation-based digital workflows for integrating objects, images and texts without making a mess” at the Sixth Annual Lawrence J. Schoenberg Symposium on Manuscript Studies in the Digital Age: Thinking Outside the Codex. The symposium is being held at the University of Pennsylvania in Philadelphia.

Meanwhile, Mary Ebbott will be speaking about “Rethinking the Role of Editors in the Homer Multitext” as part of the New Testament Textual Criticism panel (offered as a joint session with Digital Humanities in Biblical, Early Jewish, and Christian Studies) at the Society of Biblical Literature annual meeting, held in Baltimore this year.

The participation of the HMT at both events highlights our interest in cross-disciplinary conversations about the use of digital tools for scholarship on manuscripts. We have much to learn from our colleagues in other disciplines that also focus on manuscripts as primary sources, and we hope we have something to contribute in fruitful collaboration and sharing of ideas and methods.

One of the beauties of collaboration within our project is that it allows us to bring the HMT to two conferences on the same weekend!

How can digital tools help us understand a complex document like the Venetus A manuscript, and publish it for others to study? (Seen here is folio 15v of that manuscript.) Such questions will be considered at two conferences this weekend.