{"id":686,"date":"2015-06-11T20:25:53","date_gmt":"2015-06-11T15:25:53","guid":{"rendered":"http:\/\/blogs.nd.edu\/emorgan\/?p=686"},"modified":"2015-06-11T20:25:53","modified_gmt":"2015-06-11T15:25:53","slug":"eebo-browser","status":"publish","type":"post","link":"https:\/\/sites.nd.edu\/emorgan\/2015\/06\/eebo-browser\/","title":{"rendered":"EEBO-TCP Workset Browser"},"content":{"rendered":"<p>\nI have begun creating a &#8220;browser&#8221; for content from EEBO-TCP in the same way I created a browser for worksets from the HathiTrust. The goal is to provide &#8220;distant reading&#8221; services against subsets of Early English poetry and prose. You can see these fledgling efforts applied to a complete set of Richard Baxter&#8217;s works. Baxter was an English Puritan church leader, poet, and hymn-writer. [1, 2, 3]\n<\/p>\n<p>\n<img align='right' src='http:\/\/blogs.nd.edu\/emorgan\/files\/2015\/06\/cloud1.png' \/>EEBO is an acronym for Early English Books Online. It is intended to be a complete collection of English literature published between 1475 and 1700. TCP is an acronym for Text Creation Partnership, a consortium of libraries dedicated to making EEBO freely available as XML marked up according to the TEI (Text Encoding Initiative) guidelines. [4, 5]\n<\/p>\n<p>\nThe EEBO-TCP initiative is releasing its work in stages. The content of Stage I is available from a number of (rather hidden) venues. I found the copy on a University of Michigan Box site to be the easiest to use, albeit not necessarily the most current. [6] Once the content is cached (in the fullest of TEI glory), it is possible to search and browse the collection. I created a local, terminal-only interface to the cache and used authority lists, controlled vocabularies, and free-text searching of metadata to create subsets of the cache. 
[7] The subsets are akin to HathiTrust &#8220;worksets&#8221;: items of particular interest to me.\n<\/p>\n<p>\nOnce a subset was identified, I mirrored the necessary XML files locally and began deeper analysis. For example, I can create a dictionary of all the words in the &#8220;workset&#8221; and tabulate their frequencies. Baxter used the word &#8220;god&#8221; more than any other: 65,230 times. [8] I can also pull out sets of unique words and count how many times Baxter used words from three locally defined &#8220;lexicons&#8221;: colors, &#8220;big&#8221; names, and &#8220;great&#8221; ideas. Furthermore, I am able to chart and graph trends across the works, such as when they were written and how they cluster together in terms of word usage or lexicons. [9, 10]\n<\/p>\n<p>\nI was then able to repeat the process for other subsets, covering lutes, astronomy, Unitarians, and, of course, Shakespeare. [11, 12, 13, 14]\n<\/p>\n<p>\nThe EEBO-TCP Workset Browser is not as mature as my HathiTrust Workset Browser, but it is coming along. [15] Next steps include calculating the number of pages in each item, implementing a Web-based search interface to a subset&#8217;s full text as well as its metadata, and putting the source code (written in Python and Bash) on GitHub. After that I need to identify more robust ways to create subsets from the whole of EEBO, provide links to the raw TEI\/XML as well as HTML versions of items, implement a number of cosmetic enhancements, and, most importantly, support the means to compare &amp; contrast items of interest in each subset. 
Wish me luck?\n<\/p>\n<p>\nMore fun with well-structured data, open-access content, and the definition of librarianship.\n<\/p>\n<ol>\n<li>Richard Baxter (the person) &#8211; <a href=\"http:\/\/en.wikipedia.org\/wiki\/Richard_Baxter\">http:\/\/en.wikipedia.org\/wiki\/Richard_Baxter<\/a><\/li>\n<li>Richard Baxter (works) &#8211; <a href=\"http:\/\/bit.ly\/ebbo-browser-baxter-works\">http:\/\/bit.ly\/ebbo-browser-baxter-works<\/a><\/li>\n<li>Richard Baxter (analysis of works) &#8211; <a href=\"http:\/\/bit.ly\/eebo-browser-baxter-analysis\">http:\/\/bit.ly\/eebo-browser-baxter-analysis<\/a><\/li>\n<li>EEBO-TCP &#8211; <a href=\"http:\/\/www.textcreationpartnership.org\/tcp-eebo\/\">http:\/\/www.textcreationpartnership.org\/tcp-eebo\/<\/a><\/li>\n<li>TEI &#8211; <a href=\"http:\/\/www.tei-c.org\/\">http:\/\/www.tei-c.org\/<\/a><\/li>\n<li>University of Michigan Box site &#8211; <a href=\"http:\/\/bit.ly\/1QcvxLP\">http:\/\/bit.ly\/1QcvxLP<\/a><\/li>\n<li>Local cache of EEBO-TCP &#8211; <a href=\"http:\/\/bit.ly\/eebo-cache\">http:\/\/bit.ly\/eebo-cache<\/a><\/li>\n<li>Dictionary of all Baxter words &#8211; <a href=\"http:\/\/bit.ly\/eebo-browser-baxter-dictionary\">http:\/\/bit.ly\/eebo-browser-baxter-dictionary<\/a><\/li>\n<li>Histogram of dates &#8211; <a href=\"http:\/\/bit.ly\/eebo-browser-baxter-dates\">http:\/\/bit.ly\/eebo-browser-baxter-dates<\/a><\/li>\n<li>Clusters of &#8220;great&#8221; ideas &#8211; <a href=\"http:\/\/bit.ly\/eebo-browser-baxter-cluster\">http:\/\/bit.ly\/eebo-browser-baxter-cluster<\/a><\/li>\n<li>Lute &#8211; <a href=\"http:\/\/bit.ly\/eebo-browser-lute\">http:\/\/bit.ly\/eebo-browser-lute<\/a><\/li>\n<li>Astronomy &#8211; <a href=\"http:\/\/bit.ly\/eebo-browser-astronomy\">http:\/\/bit.ly\/eebo-browser-astronomy<\/a><\/li>\n<li>Unitarians &#8211; <a href=\"http:\/\/bit.ly\/eebo-browser-unitarian\">http:\/\/bit.ly\/eebo-browser-unitarian<\/a><\/li>\n<li>Shakespeare &#8211; <a 
href=\"http:\/\/bit.ly\/eebo-browser-shakespeare\">http:\/\/bit.ly\/eebo-browser-shakespeare<\/a><\/li>\n<li>HathiTrust Workset Browser &#8211; <a href=\"https:\/\/github.com\/ericleasemorgan\/HTRC-Workset-Browser\">https:\/\/github.com\/ericleasemorgan\/HTRC-Workset-Browser<\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>I have begun creating a &#8220;browser&#8221; against content from EEBO-TCP in the same way I have created a browser against worksets from the HathiTrust. The goal is to provide &#8220;distant reading&#8221; services against subsets of the Early English poetry and prose. You can see these fledgling efforts against a complete set of Richard Baxter&#8217;s works. [&hellip;]<\/p>\n","protected":false},"author":92,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-686","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/686","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/users\/92"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/comments?post=686"}],"version-history":[{"count":4,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/686\/revisions"}],"predecessor-version":[{"id":692,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/686\/revisions\/692"}],"wp:attachment":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/media?parent=686"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/categories?post=686"},{"ta
xonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/tags?post=686"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}