Catholic Youth Literature Project: A Beginning
Posted on August 27, 2011 in Uncategorized by Eric Lease Morgan
This posting outlines some of the beginnings behind the Catholic Youth Literature Project.
The Catholic Youth Literature Project is about digitizing, teaching, and learning from public domain literature from the 1800’s intended for Catholic children. The idea is to bring together some of this literature, make it easily available for downloading and reading, and enable learners to “read” texts in new & different ways. I am working with Jean McManus, Pat Lawton, and Sean O’Brien on this Project. My specific tasks are to:
- assemble a corpus of documents, in this case about 50 PDF files of books written for Catholic children from the 1800’s
- “catalog” the each item in the corpus and describe them using author names, titles, size (measured in words), readability (measured by grade level, etc.), statistically significant key words, names & places programmatically extracted from the texts, and dates
- enable to learners to download the book and read it in the traditional manner
- provide “services against the texts” where these services include things such as but not limited to: list the most frequently used words or phrases in the book, list all the words starting with a given letter, chart & graph where those words and phrases exist in the text, employ a concordance against the texts so the reader can see how the words are used in context, list all the names & places from the text an allow the reader to look them up in Wikipedia as well as plot them on a world map, programmatically summarize the book, extract all the date-related values from the book and plot the result on a timeline, tabulate the parts-of-speech (nouns, verbs, adjectives, etc.) in a document and graph the result, and provide the means for centrally discussing the content of the books with fellow learners
- finally, provide all of these services on an iPad
Written (and spoken) language follow sets of loosely defined rules. If this were not the case, then none of us would be able to understand one another. If I have digital versions of books, I can use a computer to extract and tabulate the words/phrases it contains, once that is done I can then look for patterns or anomalies. For example, I might use these tools to see how Thoreau uses the word “woodchuck” in Walden. When I do I see that he doesn’t like woodchucks because they eat his beans. In addition, I can see how Thoreau used the word “woodchuck” in a different book and literally see how he used it differently. In the second book he discusses woodchucks in relation to other small animals. A reader could learn these things through the traditional reading process, but the time and effort to do so is laborious. These tools will enable the reader to do such things across many books at the same time.
In the Spring O’Brien is teaching a class on children and Catholicism. He will be using my tool as a part of his class.
I do not advocate this tool as a replacement for traditional “close” reading. This is a supplement. It is an addition. These tools are analogous to tables-of-contents and back-of-the-book indexes. Just because a person reads those things does not mean they understand the book. Similarly, just because they use my tools does not mean they know what the book contains.
I have created the simplest of “catalogs” so far, and here is screen dump:
You can also try the “catalog” for yourself, but remember, the interface is designed for iPads (and other Webkit-based browsers). Your milage may vary.
Wish us luck, and ‘more later.