Hands-on text analysis workshop

Posted on January 9, 2015 in Uncategorized by Eric Lease Morgan

I have all but finished writing a hands-on text analysis workshop. From the syllabus:

The purpose of this 5-week workshop is to increase the knowledge of text mining principles among participants. By the end of the workshop, students will be able to describe the range of basic text mining techniques (everything from the creation of a corpus, to the counting/tabulating of words, to classification & clustering, and visualizing the results of text analysis) and have garnered hands-on experience with all of them. All the materials for this workshop are available online. There are no prerequisites except for two things: 1) a sincere willingness to learn, and 2) a willingness to work at a computer’s command line interface. Students are really encouraged to bring their own computers to class.

The workshop is divided into the following five 90-minute sessions, one per week:

Overview of text mining and working from the command line

Building a corpus

Word and phrase frequencies

Extracting meaning with dictionaries, parts-of-speech analysis, and named entity recognition

Classification and topic modeling

For better or for worse, the workshop’s computing environment will be the Linux command line. Besides the usual command-line suspects, participants will get their hands dirty with wget, Tika, a bit of Perl, a lot of Python, WordNet, TreeTagger, Stanford’s Named Entity Recognizer, and MALLET.
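To give a flavor of the third session, counting word and phrase frequencies might look something like the Python sketch below. This is only an illustration and not code from the workshop's materials; the function names (`tokenize`, `ngram_counts`) and the sample sentence are made up for the example, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    # Hypothetical, deliberately simple tokenizer; real corpora need more care.
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n=1):
    """Count every run of n consecutive tokens (n=1 words, n=2 two-word phrases)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample text standing in for a real corpus.
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)

word_counts = ngram_counts(tokens, 1)
phrase_counts = ngram_counts(tokens, 2)

print(word_counts.most_common(2))    # [('the', 3), ('cat', 2)]
print(phrase_counts.most_common(1))  # [('the cat', 2)]
```

In practice one would read the text from files in a corpus and probably remove stop words before counting, but the tabulation itself stays about this small.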

For more detail, see the syllabus, sample code, and corpus.
