Distant Reader Bibliography To-date

Over the past few years, I have written a number of blog postings which describe a thing called the Distant Reader. I have reorganized many of these postings into the following bibliography in the hopes of providing you with a more meaningful message about: 1) what the Reader is, 2) what it is designed to do, and 3) how to use it. Happy… reading!

If nothing else, read these

  • What is the Distant Reader and why should I care? – The Distant Reader is a tool for reading. It takes an arbitrary amount of unstructured data (text) as input, and it outputs sets of structured data for analysis — reading. Given a corpus of any size, the Distant Reader will analyze the corpus, and it will output a myriad of reports…
  • The Distant Reader and its five different types of input – The Distant Reader can take five different types of input, and this blog posting describes what they are.
  • Distant Reader “study carrels”: A manifest – The result of the Distant Reader process is the creation of a “study carrel”, a set of structured data files intended to help you further “read” your corpus. This blog posting describes these files in greater detail.
  • Introducing the Distant Reader Toolbox – The Distant Reader Toolbox is a command-line tool for interacting with data sets created by the Distant Reader. This posting describes the Toolbox in greater detail.

Tools and features

  • Distant Reader Workshop Hands-On Activities – This is a small set of hands-on activities presented for the Keystone Digital Humanities 2021 annual meeting. The intent of the activities is to familiarize participants with the use and creation of Distant Reader study carrels. This page is also available as a PDF file designed for printing. Introduction The Distant Reader is a tool for…
  • Searching Project Gutenberg at the Distant Reader – The venerable Project Gutenberg is a collection of about 60,000 transcribed editions of classic literature in the public domain, mostly from the Western canon. A subset of about 30,000 Project Gutenberg items has been cached locally, indexed, and made available through a website called the Distant Reader. Search the index and create a study carrel from the result. Great practice.
  • Searching CORD-19 at the Distant Reader – This blog posting documents the query syntax for an index of scientific journal articles called CORD-19. CORD-19 is a data set of scientific journal articles on the topic of COVID-19. As of this writing, it includes more than 750,000 items. Search the index and create a study carrel from the results.
  • OpenRefine and the Distant Reader – The student, researcher, or scholar can use OpenRefine to open one or more different types of delimited files, files created by the Distant Reader. This posting is an elaboration of this idea.
  • The Distant Reader and concordancing with AntConc – Concordancing is really a process of finding, and AntConc is a very useful program for this purpose. Given a Distant Reader study carrel, AntConc can be very useful.
  • Wordle and the Distant Reader – Visualized word frequencies, while often considered sophomoric, can be quite useful when it comes to understanding a text, especially when the frequencies are focused on things like parts-of-speech, named entities, or co-occurrences. This posting describes this in more detail.


  • How to use text mining to address research questions – This tiny blog posting outlines a sort of recipe for the use of text mining to address research questions. Through the use of this process the student, researcher, or scholar can easily supplement the traditional reading process. Articulate a research question – This is one of the more difficult parts of the process, and the…
  • The Distant Reader Workbook – I am in the process of writing a/the Distant Reader workbook, which will make its debut at a Code4Lib preconference workshop in March 2020. Below is both the “finished” introduction and table-of-contents of the workbook.
  • PTPBio and the Reader – The following missive was written via an email message to a former colleague, and it is a gentle introduction to Distant Reader “study carrels”.
  • Invitation to hack the Distant Reader – We invite you to write a cool hack enabling students & scholars to “read” an arbitrarily large corpus of textual materials. Introduction A website called The Distant Reader takes an arbitrary number of files or links to files as input. [1] The Reader then amasses the files locally, transforms them into plain text files, and…
  • The Distant Reader and a Web-based demonstration – The following is an announcement of a Web-based demonstration of the Distant Reader: Please join us for a web-based demo and Q&A on The Distant Reader, a web-based text analysis toolset for reading and analyzing texts that removes the hurdle of acquiring computational expertise. The Distant Reader offers a ready way to onboard scholars to…
  • A Distant Reader Field Trip to Bloomington – Yesterday I was in Bloomington (Indiana) for a Distant Reader field trip. More specifically, I met with Marlon Pierce and Team XSEDE to talk about Distant Reader next steps. We discussed the possibility of additional grant opportunities, possible ways to exploit the Airavata/Django front-end, and Distant Reader embellishments such as: Distant Reader Lite – a…
