The Distant Reader and concordancing with AntConc
Posted on January 31, 2020 in Distant Reader by Eric Lease Morgan
Concordancing is really a process of finding, and AntConc is a very useful program for this purpose. Given one or more plain text files, AntConc enables the student, researcher, or scholar to: find all the occurrences of a word, illustrate where the word is located, navigate through the document(s) where the word occurs, list word collocations, and calculate quite a number of useful statistics regarding a word. Concordancing, dating from the 13th Century, is the oldest form of text mining. Think of it as control-F (^f) on steroids. AntConc does all this and more. For example, one can load all of the Iliad and the Odyssey into AntConc, find all the occurrences of the word ship, visualize where ship appears in each chapter, and list the most significant words associated with the word ship.
AntConc recipes
This recipe simply implements search:
- Download and install AntConc
- Use the “Open File(s)…” menu option to open all files in the txt directory
- Select the Concordance tab
- Enter a word of interest into the search box
- Click the Start button
The result ought to be a list of phrases where the word of interest is displayed in the middle of the screen. In modern-day terms, such a list is called a “key word in context” (KWIC) index.
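For readers who like to see the idea in code, here is a minimal sketch of a KWIC index in Python. It is not how AntConc is implemented; the function name `kwic`, the `width` parameter, and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, word, width=30):
    """Return one keyword-in-context line per occurrence of `word`."""
    # Collapse line breaks and runs of spaces so each context fits on one line
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-justify the left context so the keyword lines up in a column
        lines.append(f"{left:>{width}} {match.group(0)} {right}")
    return lines

# Hypothetical sample text; in practice this would be the content of a file
sample = "The ship sailed at dawn. Every ship in the fleet followed the first ship."
for line in kwic(sample, "ship", width=20):
    print(line)
```

Each printed line puts the search word in the same column, with a fixed amount of text on either side, which is essentially what the Concordance tab displays.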
This recipe combines search with “control-F”:
- Select the Concordance tab
- Enter a word of interest into the search box
- Click the Start button
- Peruse the resulting phrases and click on one of interest; the result ought to be a display of the text with the search term(s) highlighted in the larger context
- Go to Step #1 until tired
This recipe produces a dispersion plot, an illustration of where a search term appears in a document:
- Select the Concordance tab
- Enter a word of interest into the search box
- Select the “Concordance Plot” tab
The result will be a list of illustrations. Each illustration will include zero or more vertical lines denoting the location of your search term in a given file. The more lines in an illustration, the more times the search term appears in the document.
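The idea behind a dispersion plot can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the function name `dispersion`, the `bins` parameter, and the very simple tokenizer are mine, and AntConc draws its plots graphically rather than as text.

```python
import re

def dispersion(text, word, bins=40):
    """Return a text bar with a '|' wherever `word` falls in the document."""
    # Naive tokenizer: lowercase runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    bar = ["."] * bins
    for index, token in enumerate(tokens):
        if token == word.lower():
            # Map the token's position onto one of `bins` columns
            bar[index * bins // len(tokens)] = "|"
    return "".join(bar)

# Hypothetical sample text; the word of interest appears early and late
sample = "the ship left port " + "and the sea was calm " * 10 + "until the ship returned"
print(dispersion(sample, "ship"))
```

Running the function once per file and stacking the bars gives a rough, text-only equivalent of the list of illustrations described above.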
This recipe counts & tabulates the frequency of words:
- Select the “Word List” tab
- Click the Start button; the result will be a list of all the words and their frequencies
- Scroll up and down the list to get a feel for what is common
- Select a word of interest; the result will be the same as if you entered the word in Recipe #1
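Counting and tabulating words is simple enough to sketch in Python. The tokenizer below is an assumption (lowercase runs of letters and apostrophes), so the counts may differ slightly from AntConc's, and the name `word_frequencies` is only illustrative.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how many times each word occurs in `text`."""
    # Naive tokenizer: lowercase runs of letters and apostrophes
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Hypothetical sample text
sample = "The ship and the sea. The sea and the sky."
for word, count in word_frequencies(sample).most_common(3):
    print(count, word)
```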
It is quite probable that the most frequent words will be “stop words” like the, a, an, etc. AntConc supports the elimination of stop words, and the Reader supplies a stop word list. Describing how to implement this functionality is too difficult to put into words. (No puns intended.) But here is an outline:
- Select the “Tool Preferences” menu option
- Select the “Word List” category
- Use the resulting dialog box to select a stop word list; the Reader's list is called stopwords.txt and is found in the etc directory
- Click the Apply button
- Go to Step #1; the result will be a frequency list sans any stop words, which will be much more meaningful
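The effect of a stop word list can also be sketched in Python. I am assuming here that the list holds one word per line, which may not match the layout of the Reader's stopwords.txt; the function names and the sample data are hypothetical.

```python
import re
from collections import Counter

def load_stop_words(lines):
    """Build a set of stop words from an iterable of lines (assumed one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}

def content_word_frequencies(text, stop_words):
    """Count words in `text`, skipping anything found in `stop_words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)

# Hypothetical stop word lines; an open file handle would work the same way
stop_words = load_stop_words(["the\n", "a\n", "an\n", "and\n"])
sample = "The ship and the sea. The sea and the sky."
print(content_word_frequencies(sample, stop_words).most_common(3))
```

Because `load_stop_words` accepts any iterable of lines, passing it an opened stop word file should work without changes, provided the one-word-per-line assumption holds.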
Ideas are rarely articulated through the use of individual words; ideas are usually articulated through the use of sets of words (ngrams, sentences, paragraphs, etc.). Thus, as John Rupert Firth once said, “You shall know a word by the company it keeps.” This recipe outlines how to list word co-occurrences and collocations:
- Select the “Cluster/N-grams” tab
- Enter a word of interest in the search box
- Click the Start button; the result ought to be a list of two-word phrases (bigrams) sorted in frequency order
- Select a phrase of interest, and the result will be just as if you had searched for the phrase in Recipe #1
- Go to Step #1 until tired
- Select the Collocates tab
- Enter a word of interest in the search box
- Click the Start button; the result ought to be a list of words and associated scores, and the scores compare the frequencies of the search word and the given word; words with higher scores can be considered “more interesting”
- Select “Sort by Freq” from the “Sort by” pop-up menu
- Click the Sort button; the result will be the same list of words and associated scores, but this time the list will be sorted by the frequency of the search term/given word combination
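Both ideas, bigrams and collocates, can be sketched in Python. The scoring below uses pointwise mutual information as one plausible way to compare frequencies; it is an assumption on my part, and AntConc's own statistic may be computed differently. The function names, the `span` parameter, and the sample sentence are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def bigrams_with(tokens, word):
    """Count the two-word phrases that contain `word`."""
    pairs = zip(tokens, tokens[1:])
    return Counter(" ".join(pair) for pair in pairs if word in pair)

def collocates(tokens, word, span=5):
    """Return (co-occurrence counts, scores) for words within `span` tokens of `word`."""
    totals = Counter(tokens)
    near = Counter()
    for i, token in enumerate(tokens):
        if token == word:
            # Words to the left and right of this occurrence
            near.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    n = len(tokens)
    # Pointwise mutual information: observed co-occurrence versus chance
    scores = {
        other: math.log2(joint * n / (totals[word] * totals[other]))
        for other, joint in near.items()
    }
    return near, scores

# Hypothetical sample text
tokens = tokenize("The black ship and the swift ship sailed past the hollow ship.")
print(bigrams_with(tokens, "ship").most_common())
near, scores = collocates(tokens, "ship", span=2)
for other in sorted(scores, key=scores.get, reverse=True):
    print(f"{scores[other]:.2f} {near[other]} {other}")
```

Sorting by `scores` corresponds roughly to the default collocate listing, while sorting by `near` corresponds to sorting by frequency.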
Again, a word is known by the company it keeps. Use the co-occurrences and collocations features to learn how a given word (or phrase) is associated with other words.
There is much more to AntConc than the recipes outlined above. Learning more is left up to you, the student, researcher, or scholar.