{"id":608,"date":"2014-11-10T21:43:41","date_gmt":"2014-11-10T16:43:41","guid":{"rendered":"http:\/\/blogs.nd.edu\/emorgan\/?p=608"},"modified":"2014-11-10T21:43:41","modified_gmt":"2014-11-10T16:43:41","slug":"python","status":"publish","type":"post","link":"https:\/\/sites.nd.edu\/emorgan\/2014\/11\/python\/","title":{"rendered":"My first Python script, concordance.py"},"content":{"rendered":"<p>\nBelow is my first Python script, concordance.py:\n<\/p>\n<blockquote><p><code><\/p>\n<pre>\r\n#!\/usr\/bin\/env python2\r\n\r\n# concordance.py - do KWIC search against a text\r\n#\r\n# usage: .\/concordance.py &lt;file&gt; &lt;word&gt;\r\n\r\n# Eric Lease Morgan &lt;emorgan@nd.edu&gt;\r\n# November 5, 2014 - my first real python script!\r\n\r\n\r\n# require\r\nimport sys\r\nimport nltk\r\n\r\n# get input; needs sanity checking\r\nfile = sys.argv[ 1 ]\r\nword = sys.argv[ 2 ]\r\n\r\n# do the work\r\ntext = nltk.Text( nltk.word_tokenize( open( file ).read( ) ) )\r\ntext.concordance( word )\r\n\r\n# done\r\nquit()\r\n<\/pre>\n<p><\/code><\/p><\/blockquote>\n<p>\nGiven the path to a plain text file as well as a word, the script will output no more than twenty-five lines containing the given word. It is a keyword-in-context (KWIC) search engine, one of the oldest text mining tools in existence.\n<\/p>\n<p>\nThe script is my first foray into Python scripting. While Perl is cool (and &#8220;kewl&#8221;), it behooves me to learn the language of others if I expect good communication to happen. This includes others using my code and me using the code of others. 
Moreover, Python has a third-party library (module) called the Natural Language Toolkit (<a href=\"http:\/\/www.nltk.org\">NLTK<\/a>), which makes it relatively easy to get my feet wet with text mining in this environment.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Below is my first Python script, concordance.py: #!\/usr\/bin\/env python2 # concordance.py &#8211; do KWIC search against a text # # usage: .\/concordance.py &lt;file&gt; &lt;word&gt; # Eric Lease Morgan &lt;emorgan@nd.edu&gt; # November 5, 2014 &#8211; my first real python script! # require import sys import nltk # get input; needs sanity checking file = sys.argv[ 1 [&hellip;]<\/p>\n","protected":false},"author":92,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-608","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/608","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/users\/92"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/comments?post=608"}],"version-history":[{"count":2,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/608\/revisions"}],"predecessor-version":[{"id":610,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/posts\/608\/revisions\/610"}],"wp:attachment":[{"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/media?parent=608"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.nd.edu\/emorgan\/wp-json\/wp\/v2\/categories?post=608"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.nd.ed
u\/emorgan\/wp-json\/wp\/v2\/tags?post=608"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}