HathiTrust Workset Browser on GitHub

I have put my (fledgling) HathiTrust Workset Browser on GitHub. Try:


The Browser is a tool for doing “distant reading” against HathiTrust “worksets”. Given a workset rsync file, it caches the workset’s content locally, indexes it, creates some reports against the content, and provides the means to search and browse the collection. It should run out of the box on Linux and Macintosh computers. It requires the bash shell and Python, both of which come for free on these operating systems. Some sample content is available at:


Developing code with and through GitHub is interesting. I’m learning.
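
For anyone curious what the index, report, and search steps described above might look like in miniature, here is a rough Python sketch. It is not the Browser's actual code. It assumes the workset has already been cached as a directory of plain-text files, and the function names (`tokenize`, `build_index`, `report`, `search`) are hypothetical ones made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(corpus_dir):
    """Map each .txt file name in corpus_dir to a Counter of its words.

    Assumption: the cached workset is a flat directory of UTF-8 text files.
    """
    index = {}
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        words = tokenize(path.read_text(encoding="utf-8"))
        index[path.name] = Counter(words)
    return index


def report(index):
    """Return (file name, total words, unique words) for each indexed file."""
    return [(name, sum(c.values()), len(c)) for name, c in index.items()]


def search(index, word):
    """Return (file name, count) for files containing word, most hits first."""
    word = word.lower()
    hits = [(name, c[word]) for name, c in index.items() if c[word] > 0]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Calling `build_index` on a directory of cached texts and passing the result to `report` or `search` gives simple per-volume word counts, which is about the smallest useful form of “distant reading” I can think of.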

