My first R script, wordcloud.r

This is my first R script, wordcloud.r:

#!/usr/bin/env Rscript

# wordcloud.r - output a wordcloud from a set of files in a given directory

# Eric Lease Morgan <eric_morgan@infomotions.com>
# November 8, 2014 - my first R script!


# configure
MAXWORDS    = 100
RANDOMORDER = FALSE
ROTPER      = 0

# require
library( NLP )
library( tm )
library( methods )
library( RColorBrewer )
library( wordcloud )

# get input; needs error checking!
input <- commandArgs( trailingOnly = TRUE )
  
# create and normalize corpus
corpus <- VCorpus( DirSource( input[ 1 ] ) )
corpus <- tm_map( corpus, content_transformer( tolower ) )
corpus <- tm_map( corpus, removePunctuation )
corpus <- tm_map( corpus, removeNumbers )
corpus <- tm_map( corpus, removeWords, stopwords( "english" ) )
corpus <- tm_map( corpus, stripWhitespace )

# do the work
wordcloud( corpus, max.words = MAXWORDS, random.order = RANDOMORDER, rot.per = ROTPER )

# done
quit()

Given the path to a directory containing a set of plain text files, the script will generate a wordcloud.

Like Python, R has a library well-suited for text mining — tm. Its approach to text mining (or natural language processing) is both similar and dissimilar to Python’s. They are similar in that they both hope to provide a means for analyzing large volumes of texts. It is similar in that they use different underlying data structures to get there. R might be more for analytic person. Think statistics. Python may be more for the “literal” person, all puns intended. I will see if I can exploit the advantages of both.

Comments are closed.