Early English love was black & white
Posted on June 15, 2015 in Uncategorized by Eric Lease Morgan
Apparently, when it comes to the idea of love during the Early English period, everything is black & white.
I have harvested the totality of the EEBO-TCP (Early English Books Online – Text Creation Partnership) corpus. Using an extraordinarily simple (but very effective) locally developed indexing system, I extracted all the EEBO-TCP identifiers whose content was cataloged with the word love. I then fed these identifiers to a suite of software which: 1) caches the EEBO-TCP TEI files locally, 2) indexes them, 3) creates a browsable catalog of them, 4) supports a simple full text search engine against them, and 5) reports on the whole business (below).
Through this process I have employed three sets of “themes” akin to the opposite of stop (function) words. Instead of specifically eliminating these words from the analysis, I specifically base the analysis on them. One theme is “big” names. Another theme is “great” ideas. The third theme is colors: white, black, red, yellow, blue, etc. Based on the ratio of each item’s number of words compared to the number of times specific color words appear, I can generate a word cloud of color (or colour) words, and you can “see” that in terms of love, everything is black & white. Moreover, the “most colorful” item is entitled The whole work of love, or, A new poem, on a young lady, who is violently in love with a gentleman of Lincolns-Inn by a student in the said art. It is a charming, one-page document whose first two lines are:
LOVE is a thing that’s not on Reaſon laid,
But upon Nature and her Dictates made.
The corpus of the EEBO-TCP is some of the cleanest data I’ve ever seen. The XML is not only well-formed, but also conforms to the TEI schema. The metadata is thorough, (almost) 100% complete, and (usually) consistently applied. It comes with very effective stylesheets, and the content is made freely and easily available in a number of places. It has been a real joy to work with!
General statistics
An analysis of the corpus’s metadata provides an overview of what and how many things it contains, when things were published, and the sizes of its items:
- Number of items – 156
- Publication date range – 1493 to 9999 (histogram : boxplot)
- Sizes in pages – 1 to 606 (histogram : boxplot)
- Total number of pages – 12332
- Average number of pages per item – 79
Possible correlations between numeric characteristics of records in the catalog can be illustrated through a matrix of scatter plots. As you would expect, there is almost always a correlation between the number of pages and the number of words. Do others exist? For more detail, browse the catalog.
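One rough way to put a number on such a correlation is the Pearson coefficient. The sketch below is only an illustration, not the software used to produce this report; the function name `pearson` and the page and word values are invented for the example.

```python
import math

def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (pages, words) values, invented for illustration only;
# they are not taken from the catalog.
pages = [1, 12, 40, 79, 200]
words = [97, 1500, 5200, 8600, 23000]

r = pearson(pages, words)
print(f"pages vs. words: r = {r:.3f}")
```

A value near 1 means longer items tend to have proportionally more words. The same function could be applied to any other pair of numeric columns in the catalog.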
Notes on word usage
By counting and tabulating the words in each item of the corpus, it is possible to measure additional characteristics:
- Sizes of items in words – 97 to 127471 (histogram : boxplot)
- Total number of words – 1348641
- Average number of words per item – 8645
- Total number of unique words – 49942
- Most common words – god (15610) love (12184) may (8301) one (6911) man (6862) us (6838) yet (6475) hath (6124) good (6067) upon (5721) would (5523) men (5307) things (4956) much (4819) lord (4515) loue (4269) great (4201) make (4027) doth (3850) world (3736) shall (3677) thing (3556) know (3549) heart (3476) therefore (3383)
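The counting and tabulating described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual indexing code: the tokenizer, the tiny stop word list, and the rule that drops one-letter tokens are all choices made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption); a real list would be much longer.
STOP_WORDS = {"the", "a", "and", "of", "is", "that", "to", "in", "on", "not", "but", "her"}

def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count every token that is longer than one letter and not a stop word."""
    return Counter(
        w for w in tokenize(text)
        if len(w) > 1 and w not in stop_words
    )

# The two lines quoted at the top of this post
sample = (
    "LOVE is a thing that’s not on Reaſon laid, "
    "But upon Nature and her Dictates made."
)

counts = word_frequencies(sample)
for word, n in counts.most_common(5):
    print(word, n)
```

No spelling normalization is attempted here, so variants such as love and loue are tallied separately, which matches how they appear in the list above.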
Perusing the list of all words in the corpus (and their frequencies) as well as all unique words can prove to be quite insightful. Are there one or more words in these lists connoting an idea of interest to you, and if so, then to what degree do these words occur in the corpus?
To begin to see how words of your choosing occur in specific items, search the collection.
Through the creation of locally defined “dictionaries” or “lexicons”, it is possible to count and tabulate how specific sets of words are used across a corpus. This particular corpus employs three such dictionaries — sets of: 1) “big” names, 2) “great” ideas, and 3) colors. Their frequencies are listed below:
- Most common “big” names – smith (108) plato (108) galen (87) hippocrates (80) james (67) homer (49) swift (44) plutarch (36) augustine (25) aristotle (25) virgil (23) euripides (18) lucretius (13) mill (13) sterne (11) aquinas (10) tacitus (9) herodotus (9) sophocles (8) plotinus (8) apollonius (6) ptolemy (6) aristophanes (5) locke (5) aurelius (5) For more detail, see the list of “big” name frequencies.
- Most common “great” ideas – god (15611) love (12185) one (6912) man (6863) good (6068) world (3737) many (3107) life (2945) time (2850) nature (2494) soul (1810) death (1774) law (1536) desire (1315) mind (1238) truth (1234) art (1209) knowledge (1051) peace (971) matter (970) duty (963) sin (943) religion (882) evil (880) cause (838) For more detail, see the list of “great” idea frequencies.
- Colors – black (235) white (207) red (84) green (61) purple (47) yellow (20) brown (17) blue (13) gray (13) orange (4) For more detail, see the list of color word frequencies.
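Counting against a dictionary is the mirror image of stop word removal: only words in the list are kept. The sketch below assumes a simple letter-based tokenizer and uses the ten color words listed above; the function name `lexicon_counts` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# The color words that appear in the frequency list above
COLORS = {
    "black", "white", "red", "green", "purple",
    "yellow", "brown", "blue", "gray", "orange",
}

def lexicon_counts(text, lexicon):
    """Count only those tokens that appear in the given lexicon."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if t in lexicon)

# Hypothetical sentence, invented for illustration only
sample = "Her cheeks were red, her gown was white, and black was the night; white the moon."

color_counts = lexicon_counts(sample, COLORS)
print(color_counts.most_common())
```

Swapping `COLORS` for a set of “big” names or “great” ideas would give the other two tabulations in the same way.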
The distribution of words (histograms and boxplots), the frequency of words (wordclouds), and how these frequencies “cluster” together can all be illustrated:
- Histograms – “big” names; “great” ideas; colors
- Boxplots – “big” names; “great” ideas; colors
- Wordclouds – most common words; “big” names; “great” ideas; colors
- Cluster dendrograms – most common words; “big” names; “great” ideas; colors
Items of interest
Based on the information above, the following items (and their associated links) are of possible interest:
- Shortest item (1 p.) – Now she that I louyd trewly beryth a full fayre face hath chosen her … (TEI : HTML : plain text)
- Longest item (606 p.) – Psyche, or, Loves mysterie in XX canto’s, displaying the intercourse betwixt Christ and the soule / by Joseph Beaumont … (TEI : HTML : plain text)
- Oldest item (1493) – This tretyse is of loue and spekyth of iiij of the most specyall louys that ben in the worlde and shewyth veryly and perfitely bi gret resons and causis, how the meruelous [and] bounteuous loue that our lord Ihesu cryste had to mannys soule excedyth to ferre alle other loues … Whiche tretyse was translatid out of frenshe into englyshe, the yere of our lord M cccc lxxxxiij, by a persone that is vnperfight insuche werke … (TEI : HTML : plain text)
- Most recent (9999) – Ovid’s Art of love; in three books: : together with his Remedy of love: / translated into English verse, by several eminent hands: ; to which are added, The court of love, The history of love, and Armstrong’s Oeconomy of love. (TEI : HTML : plain text)
- Most thoughtful item – A sermon directing what we are to do, after strict enquiry whether or no we truly love God preached April 29, 1688. (TEI : HTML : plain text)
- Least thoughtful item – Amoris effigies, sive, Quid sit amor? efflagitanti responsum (TEI : HTML : plain text)
- Biggest name dropper – Wit for money, or, Poet Stutter a dialogue between Smith, Johnson, and Poet Stutter : containing reflections on some late plays and particularly, on Love for money, or, The boarding school. (TEI : HTML : plain text)
- Fewest quotations – Mount Ebal, or A heavenly treatise of divine love Shewing the equity and necessity of his being accursed that loves not the Lord Iesus Christ. Together with the motives meanes markes of our love towards him. By that late faithfull and worthy divine, John Preston, Doctor in Divinitie, chaplaine in ordinary to his Majestie, master of Emmanuel Colledge in Cambridge, and sometimes preacher of Lincolnes Inne. (TEI : HTML : plain text)
- Most colorful – The whole work of love, or, A new poem, on a young lady, who is violently in love with a gentleman of Lincolns-Inn by a student in the said art. (TEI : HTML : plain text)
- Ugliest – Eubulus, or A dialogue, where-in a rugged Romish rhyme, (inscrybed, Catholicke questions, to the Protestaut [sic]) is confuted, and the questions there-of answered. By P.A. (TEI : HTML : plain text)
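A superlative such as “most colorful” can be derived by ranking items on the ratio of theme words to total words. The sketch below is one plausible reading of that approach, not the report's actual code: it assumes the ratio is theme count divided by total word count, and the item identifiers and numbers are hypothetical.

```python
def theme_ratio(theme_count, total_words):
    """Share of an item's words that belong to a theme (0.0 for empty items)."""
    return theme_count / total_words if total_words else 0.0

# Hypothetical items, invented for illustration only
items = {
    "item-a": {"words": 120, "colors": 6},
    "item-b": {"words": 9000, "colors": 45},
    "item-c": {"words": 500, "colors": 0},
}

# Sort identifiers from the highest color ratio to the lowest
ranked = sorted(
    items,
    key=lambda k: theme_ratio(items[k]["colors"], items[k]["words"]),
    reverse=True,
)

most_colorful = ranked[0]
least_colorful = ranked[-1]
print(most_colorful, least_colorful)
```

Ranking by ratio rather than raw count keeps a long item from winning simply because it is long, which would explain how a one-page poem can come out as the most colorful item.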