The Distant Reader: An update
Posted on December 6, 2021 in Distant Reader by Eric Lease Morgan
A number of things have been happening with the Distant Reader, and I hope to share them here.
- Reader in use – The Reader is currently being used in a number of ways. For example, it has become a part of a data science class here at Notre Dame. It is being used in a project to predict possible violent attacks. I use it on a regular basis and it has helped me understand: 1) the role of small farmers in developing countries, 2) how the Psalms have evolved over time, 3) the degree to which I can summarize thousands of medical documents, and 4) the similarities and differences between the novels of Jane Austen.
- Bibliography – Over the past few years, I have written a number of blog postings describing the Reader, and they are linked here in the hopes of providing you with more meaningful messages about: 1) what the Reader is, 2) what it is designed to do, and 3) how to use it.
- Reader Toolbox – The Distant Reader takes sets of unstructured data as input, applies various text mining techniques against them, and outputs a set of structured data intended for analysis, or “reading”. These data sets, called “study carrels”, are very amenable to computer processing. Consequently, I have developed a tool called the Reader Toolbox, which makes it easy to do all sorts of feature extraction (ngrams, parts-of-speech, named entities, URLs, etc.), topic modeling, semantic indexing, and full text indexing against study carrels. Give it a try!
- Reader Library – A fledgling collection of previously created study carrels is in the process of being curated. The collection includes about 3,000 carrels on topics ranging from big ideas (love, honor, truth, justice, etc.) to COVID. The content of the carrels comes from places like Project Gutenberg, the HathiTrust, the ‘Net in general, and a data set called CORD-19. In the coming months I hope to create various indexes against the collection. Right now the interface is functional but pretty raw.
- Sponsorships – The Reader has been supported by a number of groups over the past few years. Most recently, support has come from Microsoft’s AI for Health initiative as well as the Pittsburgh Supercomputing Center through an organization called XSEDE. All good things come to an end, and this will soon be true of their support. Thank you very much! I will be looking for a new home for the Distant Reader. Got any ideas?
- Just for fun – Lastly, you might want to listen to a podcast. Excellently produced by the folks at Lost In The Stacks, it is a humorous interview where I describe the Reader.
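
For a flavor of what "feature extraction" means in the Toolbox item above, here is a minimal sketch in plain Python that counts bigrams (two-word ngrams) in a sentence. This is not the Reader Toolbox's own code or interface; the function name `ngrams`, the simple tokenizer, and the sample sentence (the opening of Pride and Prejudice) are assumptions made only for illustration.

```python
import re
from collections import Counter


def ngrams(text, n=2):
    """Count word n-grams in text (hypothetical helper, not part of the Toolbox)."""
    # Lowercase the text and treat runs of letters and apostrophes as tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of size n across the tokens and tally each n-gram
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


sample = (
    "It is a truth universally acknowledged, that a single man in possession "
    "of a good fortune, must be in want of a wife."
)

# List the three most frequent bigrams in the sample sentence
for gram, count in ngrams(sample, 2).most_common(3):
    print(" ".join(gram), count)
```

Run against a whole corpus instead of a single sentence, tallies like these are the sort of structured data that can make a collection amenable to analysis.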
Thank you for… reading.