What’s Eric Reading?

I have resurrected an application/system of files used to archive and disseminate things (mostly articles) I’ve been reading. I call it What’s Eric Reading? From the original About page:

I have been having fun recently indexing PDF files.

For the past six months or so I have been keeping the articles I’ve read in a pile, and I was rather amazed at the size of the pile. It was about a foot tall. When I read these articles I “actively” read them, meaning I write, scribble, highlight, and annotate the text with my own special notation denoting names, keywords, definitions, citations, quotations, list items, examples, etc. This active reading process: 1) makes for better comprehension on my part, and 2) makes it easier to review the articles and pick out the ideas I thought were salient. Being the librarian I am, I thought it might be cool (“kewl”) to make the articles into a collection. Thus, the beginnings of Highlights & Annotations: A Value-Added Reading List.

The techno-weenie process for creating and maintaining the content is something this community might find interesting:

  1. Print article and read it actively.
  2. Convert the printed article into a PDF file — complete with embedded OCR — with my handy-dandy ScanSnap scanner.
  3. Use MyLibrary to create metadata (author, title, date published, date read, note, keywords, facet/term combinations, local and remote URLs, etc.) describing the article.
  4. Save the PDF to my file system.
  5. Use pdftotext to extract the OCRed text from the PDF and index it along with the MyLibrary metadata using Solr.
  6. Provide a searchable/browsable user interface to the collection through a mod_perl module.
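The post doesn't show the code behind steps 5 and 6 (the original interface was a mod_perl module), so what follows is only a rough sketch in Python of how the extract-and-index part might look. It assumes poppler's `pdftotext` is on the PATH and a Solr core is reachable at a hypothetical URL (`SOLR_URL`). The field names (`title`, `keyword`, `facet_term`, `fulltext`, and so on) are hypothetical and would have to match the core's actual schema. MyLibrary's real record format isn't shown either, so the metadata arrives here as a plain dictionary.

```python
import json
import subprocess
import urllib.parse
import urllib.request

# Hypothetical Solr location and core name; adjust to the actual installation.
SOLR_URL = "http://localhost:8983/solr/reading"


def extract_text(pdf_path: str) -> str:
    """Return the embedded (OCRed) text layer of a PDF via poppler's pdftotext.

    The trailing "-" tells pdftotext to write to standard output.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout


def build_solr_doc(record: dict, fulltext: str) -> dict:
    """Merge a metadata record (hypothetical keys) with extracted text."""
    doc = {
        "id": record["id"],
        "title": record.get("title"),
        "author": record.get("author", []),
        "date_published": record.get("date_published"),
        "date_read": record.get("date_read"),
        "note": record.get("note", ""),
        "keyword": record.get("keywords", []),
        # Facet/term pairs flattened into "Facet:Term" strings for faceting.
        "facet_term": [f"{f}:{t}" for f, t in record.get("facets", [])],
        "local_url": record.get("local_url"),
        # Collapse the ragged whitespace that OCR output tends to contain.
        "fulltext": " ".join(fulltext.split()),
    }
    # Drop empty fields so they are simply absent from the Solr document.
    return {k: v for k, v in doc.items() if v not in (None, "", [])}


def index(docs: list) -> None:
    """POST a list of documents to Solr's JSON update handler and commit."""
    req = urllib.request.Request(
        f"{SOLR_URL}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def search_url(query: str, rows: int = 10) -> str:
    """Build a Solr select URL of the kind a search interface might issue."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_URL}/select?{params}"
```

For a single scanned article, something like `index([build_solr_doc(record, extract_text("article.pdf"))])` would cover step 5. A browse/search front end (step 6) would then issue requests like `search_url("annotation")` and render the JSON that Solr returns.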

Software is never done, and if it were then it would be called hardware. Accordingly, I know there are some things I need to do before I can truly deem the system version 1.0. At the same time my excitement is overflowing, and I thought I’d share some geekdom with my fellow hackers.

Fun with PDF files and open source software.
