Some automated analysis of Ralph Waldo Emerson’s works
Posted on June 12, 2015 in Uncategorized by Eric Lease Morgan
This page describes a corpus named emerson, and it was created programmatically with the HathiTrust Research Center Workset Browser.
General statistics
An analysis of the corpus’s metadata provides an overview of what and how many things it contains, when things were published, and the sizes of its items:
- Number of items – 62
- Publication date range – 1838 to 1956 (histogram : boxplot)
- Sizes in pages – 20 to 660 (histogram : boxplot)
- Total number of pages – 11866
- Average number of pages per item – 191
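Figures like these can be derived from a plain list of per-item page counts. The following is a minimal sketch, not the Workset Browser's own code; the function name `summarize` and the three sample page counts are assumptions made up for illustration.

```python
from statistics import mean

def summarize(values):
    """Return count, minimum, maximum, total, and rounded mean of a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "total": sum(values),
        "average": round(mean(values)),
    }

# Hypothetical page counts for a tiny three-item corpus (not real catalog data)
pages = [20, 191, 660]
summary = summarize(pages)
print(summary)
```

The same arithmetic ties the reported totals together: 11866 pages divided by 62 items rounds to 191 pages per item.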
Possible correlations between numeric characteristics of records in the catalog can be illustrated through a matrix of scatter plots. As you would expect, there is almost always a correlation between the number of pages and the number of words. Do others exist? For more detail, browse the catalog.
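One way to put a number on such a relationship is the Pearson correlation coefficient. The sketch below computes it by hand for two columns of a catalog; the function name `pearson` and the sample values are assumptions for illustration, and nothing here is taken from the Workset Browser itself.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical catalog columns: pages and words for four items
pages = [20, 100, 200, 400]
words = [1000, 5200, 9800, 20500]
r = pearson(pages, words)
print(round(r, 3))
```

A value near 1 means longer items reliably contain more words; values near 0 suggest no linear relationship between the two columns.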
Notes on word usage
By counting and tabulating the words in each item of the corpus, it is possible to measure additional characteristics:
- Sizes of items in words – 1032 to 78195 (histogram : boxplot)
- Total number of words – 1176289
- Average number of words per item – 18972
- Total number of unique words – 30507
- Most common words – man (8911) one (7704) men (6260) every (5108) nature (4837) us (4652) life (4072) good (3904) must (3825) new (3812) great (3760) like (3518) world (3455) shall (3391) would (3359) see (3349) yet (3099) may (3092) much (2978) time (2928) thought (2902) never (2750) old (2700) mind (2692) day (2605)
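Counting and tabulating words can be sketched in a few lines. This is an illustration under stated assumptions, not the program that produced the numbers above: the tokenizer (lowercase runs of the letters a to z) and the tiny `STOP_WORDS` set are hypothetical, chosen only because the list above evidently omits function words such as "the" and "and".

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real one would be much longer
STOP_WORDS = {"the", "and", "of", "a", "to", "in", "is", "it", "that"}

def tokenize(text):
    """Lowercase the text and return its runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count every token that is not a stop word."""
    return Counter(word for word in tokenize(text) if word not in stop_words)

sample = "The man of nature is the man of thought, and nature is good."
freq = word_frequencies(sample)

total_words = sum(freq.values())   # words counted after stop word removal
unique_words = len(freq)           # distinct words
print(total_words, unique_words, freq.most_common(2))
```

Different tokenizing rules (hyphens, apostrophes, accented letters) will shift the counts, so totals from a sketch like this should not be expected to match the figures reported above.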
Perusing the list of all words in the corpus (and their frequencies) as well as all unique words can prove to be quite insightful. Are there one or more words in these lists connoting an idea of interest to you, and if so, then to what degree do these words occur in the corpus?
To begin to see how words of your choosing occur in specific items, search the collection.
Through the creation of locally defined “dictionaries” or “lexicons”, it is possible to count and tabulate how specific sets of words are used across a corpus. This particular corpus employs three such dictionaries — sets of: 1) “big” names, 2) “great” ideas, and 3) colors. Their frequencies are listed below:
- Most common “big” names – plato (424) goethe (364) milton (283) james (250) montaigne (250) homer (206) bacon (189) plutarch (172) newton (141) swift (136) dante (131) shakespeare (116) smith (99) aristotle (88) mill (72) chaucer (71) augustine (51) hegel (51) plotinus (51) kant (46) locke (43) gibbon (41) berkeley (37) archimedes (35) aristophanes (35) For more detail, see the list of “big” name frequencies.
- Most common “great” ideas – man (8912) one (7705) nature (4838) life (4073) good (3905) world (3456) time (2929) mind (2693) love (2333) many (2247) soul (1907) god (1742) history (1639) truth (1602) beauty (1434) art (1380) law (1207) state (1135) form (1058) sense (927) poetry (898) virtue (844) religion (698) science (693) matter (691) For more detail, see the list of “great” idea frequencies.
- Colors – white (310) green (236) red (219) black (211) brown (206) blue (167) gray (129) purple (102) yellow (85) orange (23) For more detail, see the list of color word frequencies.
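Counting against a dictionary amounts to keeping only the tokens that appear in a predefined set. A minimal sketch, assuming a lexicon made of the ten color words listed above (the actual dictionary may well contain more) and a hypothetical helper named `lexicon_counts`:

```python
import re
from collections import Counter

# The ten color words reported above; the real dictionary may be larger
COLORS = {"white", "green", "red", "black", "brown",
          "blue", "gray", "purple", "yellow", "orange"}

def lexicon_counts(text, lexicon):
    """Count how often each word of the lexicon occurs in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token in lexicon)

sample = "The white snow lay on the green hills; a red barn and a white fence stood below."
counts = lexicon_counts(sample, COLORS)
print(counts.most_common())
```

Swapping `COLORS` for a set of names or ideas gives the other two tabulations; only the lexicon changes, not the counting.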
The distribution of words (histograms and boxplots), the frequency of words (wordclouds), and how these frequencies “cluster” together can all be illustrated:
- Histograms – “big” names; “great” ideas; colors
- Boxplots – “big” names; “great” ideas; colors
- Wordclouds – most common words; “big” names; “great” ideas; colors
- Cluster dendrograms – most common words; “big” names; “great” ideas; colors
Items of interest
Based on the information above, the following items (and their associated links) are of possible interest:
- Shortest item (20 p.) – The wisest words ever written on war / by R.W. Emerson … Preface by Henry Ford. (HathiTrust : WorldCat : plain text)
- Longest item (660 p.) – Representative men : nature, addresses and lectures. (HathiTrust : WorldCat : plain text)
- Oldest item (1838) – An address delivered before the senior class in Divinity College, Cambridge, Sunday evening, 15 July, 1838 / by Ralph Waldo Emerson. (HathiTrust : WorldCat : plain text)
- Most recent (1956) – Emerson at Dartmouth; a reprint of his oration, Literary ethics. With an introd. by Herbert Faulkner West. (HathiTrust : WorldCat : plain text)
- Most thoughtful item – Transcendentalism : and other addresses / by Ralph Waldo Emerson. (HathiTrust : WorldCat : plain text)
- Least thoughtful item – Emerson-Clough letters, edited by Howard F. Lowry and Ralph Leslie Rusk. (HathiTrust : WorldCat : plain text)
- Biggest name dropper – A letter of Emerson : being the first publication of the reply of Ralph Waldo Emerson to Solomon Corner of Baltimore in 1842 ; With analysis and notes by Willard Reed. (HathiTrust : WorldCat : plain text)
- Fewest quotations – The wisest words ever written on war / by R.W. Emerson … Preface by Henry Ford. (HathiTrust : WorldCat : plain text)
- Most colorful – Excursions. Illustrated by Clifton Johnson. (HathiTrust : WorldCat : plain text)
- Ugliest – An address delivered before the senior class in Divinity College, Cambridge, Sunday evening, 15 July, 1838 / by Ralph Waldo Emerson. (HathiTrust : WorldCat : plain text)
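This page does not say how labels such as "most colorful" are computed. One plausible reading, offered purely as an assumption, is that each item is scored by the share of its words found in a given dictionary, and the highest and lowest scores are then reported. The sketch below follows that reading; `lexicon_density`, the item names, and the sample texts are all hypothetical.

```python
import re

# The ten color words reported above; the real dictionary may be larger
COLORS = {"white", "green", "red", "black", "brown",
          "blue", "gray", "purple", "yellow", "orange"}

def lexicon_density(text, lexicon):
    """Fraction of the text's tokens that belong to the lexicon (0.0 for empty text)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)

# Hypothetical two-item corpus
items = {
    "item-a": "The green hills rolled under a blue sky and white clouds.",
    "item-b": "The mind must think before the hand can act.",
}

scores = {name: lexicon_density(text, COLORS) for name, text in items.items()}
most_colorful = max(scores, key=scores.get)
least_colorful = min(scores, key=scores.get)
print(most_colorful, least_colorful)
```

Dividing by the item's length keeps a 660-page volume from winning simply because it is long; whether the original analysis normalizes this way is not stated.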