{"id":420,"date":"2013-09-12T20:35:55","date_gmt":"2013-09-12T15:35:55","guid":{"rendered":"http:\/\/blogs.nd.edu\/emorgan\/?p=420"},"modified":"2014-04-15T20:46:47","modified_gmt":"2014-04-15T15:46:47","slug":"htrc-lib","status":"publish","type":"post","link":"https:\/\/sites.nd.edu\/emorgan\/2013\/09\/htrc-lib\/","title":{"rendered":"HathiTrust Research Center Perl Library"},"content":{"rendered":"<p>\n<a href=\"http:\/\/blogs.nd.edu\/emorgan\/files\/2013\/09\/irises.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blogs.nd.edu\/emorgan\/files\/2013\/09\/irises.jpg\" alt=\"Irises\" width=\"120\" height=\"160\" class=\"alignright size-full wp-image-421\" \/><\/a>This is the README file for a tiny library of Perl subroutines to be used against the HathiTrust Research Center (<a href=\"http:\/\/www.hathitrust.org\/htrc\">HTRC<\/a>) application programming interfaces (<a href=\"http:\/\/www.hathitrust.org\/htrc\/api-guide\">API<\/a>). The <a href=\"https:\/\/github.com\/ericleasemorgan\/htrc\">GitHub distribution<\/a> ought to contain a number of files, each briefly described below:\n<\/p>\n<ul>\n<li><strong>README.md<\/strong> &#8211; this file<\/li>\n<li><strong>LICENSE<\/strong> &#8211; a copy of the GNU General Public License<\/li>\n<li><strong>htrc-lib.pl<\/strong> &#8211; our raison d&#8217;\u00eatre; more below<\/li>\n<li><strong>search.pl<\/strong> &#8211; given a Solr query, return a list of no more than 100 HTRC identifiers<\/li>\n<li><strong>authorize.pl<\/strong> &#8211; given a client identifier and secret, return an authorization token<\/li>\n<li><strong>retrieve.pl<\/strong> &#8211; given a list of HTRC identifiers, return a zip stream of no more than 100 text and METS files<\/li>\n<li><strong>search-retrieve.pl<\/strong> &#8211; given a Solr query, return a zip stream of no more than 100 text and METS files<\/li>\n<\/ul>\n<p>\nThe file doing the heavy lifting is htrc-lib.pl. 
It contains only three subroutines:\n<\/p>\n<ol>\n<li><strong>search<\/strong> &#8211; given a Solr query, return a list of no more than 100 HTRC identifiers<\/li>\n<li><strong>obtainOAuth2Token<\/strong> &#8211; given a client ID and secret (supplied by the HTRC), return an authorization token; the token is expected to be included in the HTTP header of any HTRC Data API request<\/li>\n<li><strong>retrieve<\/strong> &#8211; given a client ID, secret, and list of HTRC identifiers, return a zip stream of no more than 100 HTRC text and METS files<\/li>\n<\/ol>\n<p>\nThe library is configured at the beginning of the file with three constants:\n<\/p>\n<ol>\n<li><strong>SOLR<\/strong> &#8211; a stub URL pointing to the location of the HTRC Solr index; the number of search results returned can be changed here<\/li>\n<li><strong>AUTHORIZE<\/strong> &#8211; the URL pointing to the authorization engine<\/li>\n<li><strong>DATAAPI<\/strong> &#8211; the URL pointing to the HTRC Data API, specifically the API to get volumes<\/li>\n<\/ol>\n<p>\nThe other .pl files in this distribution are the simplest of scripts demonstrating how to use the library.\n<\/p>\n<p>\nBe forewarned. The library does very little error checking, and there is no documentation beyond what you are reading here.\n<\/p>\n<p>\nBefore you can use the obtainOAuth2Token and retrieve subroutines, you will need to acquire a client identifier and secret from the HTRC. These are required so the Center can track who is using its services.\n<\/p>\n<p>\nThe home page for the HTRC is <a href=\"http:\/\/www.hathitrust.org\/htrc\">http:\/\/www.hathitrust.org\/htrc<\/a>. 
From there you ought to be able to read more about the Center and its supported APIs.\n<\/p>\n<p>\nThis software is distributed under the GNU General Public License.\n<\/p>\n<p>\nFinally, here is a suggestion of how to use this library:\n<\/p>\n<ol>\n<li>Use your Web browser to search the HTRC for content (<a href=\"https:\/\/htrc2.pti.indiana.edu\/HTRC-UI-Portal2\/\">https:\/\/htrc2.pti.indiana.edu\/HTRC-UI-Portal2\/<\/a> or <a href=\"https:\/\/sandbox.htrc.illinois.edu:8443\/blacklight\">https:\/\/sandbox.htrc.illinois.edu:8443\/blacklight<\/a>), ultimately generating a list of HTRC identifiers.<\/li>\n<li>Programmatically feed the list of identifiers to the retrieve subroutine.<\/li>\n<li>&#8220;Inflate&#8221; the zip stream into its constituent text and METS files.<\/li>\n<li>Do analysis against the result.<\/li>\n<\/ol>\n<p>\nI&#8217;m tired. That is enough for now. Enjoy.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the README file for a tiny library of Perl subroutines to be used against the HathiTrust Research Center (HTRC) application programmer interfaces (API). 
The Github distribution ought to contain a number of files, each briefly described below: README.md &#8211; this file LICENSE &#8211; a copy of the GNU Public License htrc-lib.pl &#8211; our [&hellip;]<\/p>\n","protected":false},"author":92,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-420","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/420","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/users\/92"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/comments?post=420"}],"version-history":[{"count":9,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/420\/revisions"}],"predecessor-version":[{"id":430,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/420\/revisions\/430"}],"wp:attachment":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/media?parent=420"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/categories?post=420"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/tags?post=420"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}