{"id":617,"date":"2014-11-19T22:54:28","date_gmt":"2014-11-19T17:54:28","guid":{"rendered":"http:\/\/blogs.nd.edu\/emorgan\/?p=617"},"modified":"2014-11-19T22:54:28","modified_gmt":"2014-11-19T17:54:28","slug":"dispersion","status":"publish","type":"post","link":"https:\/\/sites.nd.edu\/emorgan\/2014\/11\/dispersion\/","title":{"rendered":"My second Python script, dispersion.py"},"content":{"rendered":"<p>\nThis is my second Python script, dispersion.py, and it illustrates where common words appear in a text.\n<\/p>\n<blockquote>\n<pre><code>#!\/usr\/bin\/env python2\r\n\r\n# dispersion.py - illustrate where common words appear in a text\r\n#\r\n# usage: .\/dispersion.py &lt;file&gt;\r\n\r\n# Eric Lease Morgan &lt;emorgan@nd.edu&gt;\r\n# November 19, 2014 - my second real python script; \"Thanks for the idioms, Don!\"\r\n\r\n\r\n# configure\r\nMAXIMUM = 25\r\nPOS     = 'NN'\r\n\r\n# require\r\nimport nltk\r\nimport operator\r\nimport sys\r\n\r\n# sanity check; sys.exit() prints the usage message to stderr and exits with status 1\r\nif len( sys.argv ) != 2 :\r\n  sys.exit( \"Usage: \" + sys.argv[ 0 ] + \" &lt;file&gt;\" )\r\n\r\n# get input\r\nfilename = sys.argv[ 1 ]\r\n\r\n# initialize\r\nwith open( filename, 'r' ) as handle : text = handle.read()\r\nsentences = nltk.sent_tokenize( text )\r\npos       = {}\r\n\r\n# process each sentence\r\nfor sentence in sentences : \r\n  \r\n  # POS the sentence and then process each of the resulting words\r\n  for word in nltk.pos_tag( nltk.word_tokenize( sentence ) ) :\r\n    \r\n    # check for configured POS, and increment the dictionary accordingly\r\n    if word[ 1 ] == POS : pos[ word[ 0 ] ] = pos.get( word[ 0 ], 0 ) + 1\r\n\r\n# sort the dictionary\r\npos = sorted( pos.items(), key = operator.itemgetter( 1 ), reverse = True )\r\n\r\n# do the work; create a dispersion chart of the MAXIMUM most frequent pos words\r\ntext = nltk.Text( nltk.word_tokenize( text ) )\r\ntext.dispersion_plot( [ p[ 0 ] for p in pos[ : MAXIMUM ] ] )\r\n\r\n# done\r\nsys.exit()<\/code><\/pre>\n<\/blockquote>\n<p>\nI used the 
program to analyze two works: 1) Thoreau&#8217;s <cite>Walden<\/cite>, and 2) Emerson&#8217;s <cite>Representative Men<\/cite>. From the dispersion plots displayed below, we can draw a few tentative conclusions:\n<\/p>\n<ul>\n<li>The words &#8220;man&#8221;, &#8220;life&#8221;, &#8220;day&#8221;, and &#8220;world&#8221; are common to both works.<\/li>\n<li>Thoreau tends to discuss water, ponds, shores, and surfaces in the same passages.<\/li>\n<li>Emerson seemingly discusses man and nature in the same breath, but none of his core concepts are discussed as densely as Thoreau&#8217;s.<\/li>\n<\/ul>\n<div id=\"attachment_619\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blogs.nd.edu\/emorgan\/files\/2014\/11\/thoreau.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-619\" src=\"http:\/\/blogs.nd.edu\/emorgan\/files\/2014\/11\/thoreau-300x225.png\" alt=\"Thoreau&#039;s Walden\" width=\"300\" height=\"225\" class=\"size-medium wp-image-619\" srcset=\"https:\/\/sites.nd.edu\/emorgan\/files\/2014\/11\/thoreau-300x225.png 300w, https:\/\/sites.nd.edu\/emorgan\/files\/2014\/11\/thoreau.png 800w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-619\" class=\"wp-caption-text\">Thoreau&#8217;s Walden<\/p><\/div>\n<div id=\"attachment_621\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blogs.nd.edu\/emorgan\/files\/2014\/11\/emerson.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-621\" src=\"http:\/\/blogs.nd.edu\/emorgan\/files\/2014\/11\/emerson-300x225.png\" alt=\"Emerson&#039;s Representative Men\" width=\"300\" height=\"225\" class=\"size-medium wp-image-621\" srcset=\"https:\/\/sites.nd.edu\/emorgan\/files\/2014\/11\/emerson-300x225.png 300w, https:\/\/sites.nd.edu\/emorgan\/files\/2014\/11\/emerson.png 800w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-621\" 
class=\"wp-caption-text\">Emerson&#8217;s Representative Men<\/p><\/div>\n<p>\nPython&#8217;s Natural Language Toolkit (<a href=\"http:\/\/www.nltk.org\">NLTK<\/a>) is a good library for digital humanists to get started with, though I still have more to learn. The jury is still out regarding which is better, Perl or Python. So far, they have more in common than not.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is my second Python script, dispersion.py, and it illustrates where common words appear in a text. #!\/usr\/bin\/env python2 # dispersion.py &#8211; illustrate where common words appear in a text # # usage: .\/dispersion.py &lt;file&gt; # Eric Lease Morgan &lt;emorgan@nd.edu&gt; # November 19, 2014 &#8211; my second real python script; &#8220;Thanks for the idioms, Don!&#8221; [&hellip;]<\/p>\n","protected":false},"author":92,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-617","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/617","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/users\/92"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/comments?post=617"}],"version-history":[{"count":3,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/617\/revisions"}],"predecessor-version":[{"id":622,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/617\/revisions\/622"}],"wp:attachment":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/media?parent=617"}],"wp:term":[{"taxonomy":"catego
ry","embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/categories?post=617"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/tags?post=617"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}