Exploiting the content of the HathiTrust

I have been exploring how we might exploit the content in the HathiTrust to a greater degree, and this blog posting outlines some of my initial ideas.

The OCLC Research Library Partnership program recently sent us here at the University of Notre Dame an overlap report: a report describing and illustrating the number and types of materials held by both the University of Notre Dame and the HathiTrust.

As illustrated by the pie chart from the report, approximately 1/3 of our collection is in the HathiTrust. It might be interesting to link our local library catalog records to the records in the ‘Trust. I suppose the people who wrote the original report would be able to supply us with a list of our overlapping titles. Links could then be added to our local records, enhancing the services we offer our readers. “Service excellence.”

Percentage of University of Notre Dame and HathiTrust overlap (pie chart)
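If the overlap list arrives as a set of OCLC numbers, the linking step might look something like the Python sketch below. It assumes the HathiTrust Bib API and its “brief” lookup URL pattern (`https://catalog.hathitrust.org/api/volumes/brief/oclc/<number>.json`), which I understand returns a JSON object whose `records` entries each carry a `recordURL`. Please treat the URL pattern and the response shape as my assumptions and check them against the current API documentation; the function names are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: URL pattern of the HathiTrust Bib API "brief" lookup by OCLC number.
# Verify against the current API documentation before relying on it.
BIB_API = "https://catalog.hathitrust.org/api/volumes/brief/oclc/{}.json"


def bib_api_url(oclc_number: str) -> str:
    """Build the lookup URL for one OCLC number.

    Local records often store numbers like "(OCoLC)ocm00424023", so every
    non-digit character is stripped before the number is inserted.
    """
    digits = "".join(ch for ch in oclc_number if ch.isdigit())
    return BIB_API.format(digits)


def record_links(response: dict) -> list[str]:
    """Return the HathiTrust catalog record URLs found in one API response.

    ASSUMPTION: the response has a "records" object whose values may
    contain a "recordURL" key.
    """
    records = response.get("records", {})
    return sorted(rec["recordURL"] for rec in records.values() if "recordURL" in rec)


def fetch(oclc_number: str) -> dict:
    """Query the live API for one OCLC number (requires network access)."""
    with urllib.request.urlopen(bib_api_url(oclc_number)) as resp:
        return json.load(resp)
```

In practice one would loop over the (hypothetical) overlap list, call `fetch()` for each number, and write any links returned by `record_links()` into a link field of the matching local record. Keeping the URL building and parsing apart from the network call makes those pieces easy to test offline.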

According to the second chart, of our approximately 1,000,000 overlapping titles, about 121,000 (roughly 12%) are in the public domain, and the majority of those public domain documents are government documents. Even so, about 55,000 of our overlapping titles are both in the public domain and part of our collection’s strengths (literature, philosophy, and history). It might be interesting to mirror any or all of these public domain documents locally. Doing so would enhance our local collections and could let us provide services (text mining, printing, etc.) against them. “Lots of copies keep stuff safe.”


Subject coverage of the overlapping materials

According to the HathiTrust website, about 250,000 items in the ‘Trust are freely available because they are in the public domain. For example, somebody has created a collection of public domain titles called English Short Title Catalog; the ESTC is apparently the basis of EEBO (Early English Books Online), and the titles in the collection are in the public domain. [2] Maybe we could query the ‘Trust for public domain items of interest and mirror them locally too? Maybe we could “simply” add those public domain records to our catalog? The same process could be applied to collections from the Internet Archive.
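Picking out the public domain items from such a query could be sketched as a filter on each item’s rights code. Again, this is a sketch under assumptions: it presumes a HathiTrust Bib API response with an `items` list whose entries carry `htid`, `itemURL`, and `rightsCode`, and it treats the codes `pd` and `pdus` as public domain for a U.S. reader. The function name is hypothetical, and actually downloading the texts for a local mirror is a separate step, subject to HathiTrust’s terms, that is not shown here.

```python
# ASSUMPTION: rights codes treated as "public domain" for a U.S. reader.
# "pd" = public domain, "pdus" = public domain in the U.S. only.
# Check the HathiTrust rights documentation for the complete list.
PUBLIC_DOMAIN_CODES = {"pd", "pdus"}


def public_domain_items(response: dict) -> list[tuple[str, str]]:
    """Return (htid, itemURL) pairs for public domain items in one API response.

    ASSUMPTION: the response has an "items" list whose entries contain
    "htid", "itemURL", and "rightsCode" keys.
    """
    return [
        (item["htid"], item["itemURL"])
        for item in response.get("items", [])
        if item.get("rightsCode") in PUBLIC_DOMAIN_CODES
    ]
```

The resulting pairs could serve as a candidate list for mirroring or for adding records to the local catalog; the identifiers in any test data should be placeholders, not real HathiTrust volumes.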

The primary purpose of the HathiTrust is to archive digitized items for its membership. A secondary purpose is to provide some public access to the materials. With a bit of creative thinking on our part, I believe we can extend the definition of public access, provide enhanced services against some of the content in the archive, and fulfill our mission as a research library.

I think I will spend some time trying to get a better idea of exactly which public domain titles are both in our collection and in the HathiTrust. Wish me luck.
