Dictionary-Based Tool Selection

We can give you a hammer here…

But not everything is a nail!

So you want to use dictionary-based computer-aided text analysis? Let’s help you find the right tool for your research question. There are probably a hundred text analysis tools out there, but I’ve used three of them extensively (LIWC, DICTION, and CAT Scanner), so those are the three I’ll discuss.

Allow me to outline what I know about these tools to help you decide which tool is right for you.

LIWC

LIWC was developed by Jamie Pennebaker and his associates at the University of Texas at Austin. Pennebaker developed this tool within the psychology literature, looking at how language both influences and is influenced by psychological phenomena. For instance, he has a very interesting book about the unexpected importance of pronoun use in psychology.

LIWC2015 ships with nearly 100 built-in dictionaries. Dictionaries range from parts of speech (e.g., pronouns) to deeper psychological constructs (e.g., need for affiliation).

LIWC can handle the most diverse range of file types. For instance, LIWC can read Excel spreadsheets, Word documents, plain text files, PDFs, and so on.

My take: If you’re doing research in psychology/social psychology, this is the package for you. DICTION and CAT Scanner will work, but this is the tool most closely aligned with what you do.

DICTION

DICTION was developed by Roderick Hart and his associates at the University of Texas at Austin. Hart developed this tool within the political science literature, looking at the linguistic characteristics of political discourse.

DICTION 7.0 ships with nearly 40 dictionaries and calculated features (e.g., combinations of the dictionaries). The dictionaries generally emphasize important elements of political text, such as certainty, commonality, and optimism. DICTION also has results from over 50,000 previously analyzed texts of various kinds built in, against which comparisons can be made.

DICTION can also handle several different file types, such as HTML, plain text, PDF, and Word files.

My take: If you’re doing research in political science, this is the package for you. LIWC and CAT Scanner will work, but this is the tool most closely aligned with what you do.

CAT Scanner

CAT Scanner was developed by Aaron McKenny of Indiana University and Jeremy Short of the University of Oklahoma. They developed this tool within the management and entrepreneurship literatures, looking at the influence of language on organizational phenomena.

CAT Scanner ships with nearly 20 dictionaries, fewer than LIWC or DICTION, but these dictionaries are NOT proprietary, and over 100 additional dictionaries can be downloaded from the CAT Scanner website.

CAT Scanner can ONLY handle plain text (.txt) files at the moment and is a bit slower than both DICTION and LIWC.

CAT Scanner offers garbage-character removal and inductive word-list generation tools that LIWC and DICTION do not.
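To make those two features a little more concrete, here is a rough Python sketch of the general idea: strip out characters that are not printable text, then build a frequency-ranked list of the words that actually appear in your texts so a human can review them for possible inclusion in a dictionary. This is NOT how CAT Scanner implements either feature; the function names, the tiny stopword list, and the min_count threshold are all my own assumptions for illustration.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "are", "our"})


def remove_garbage(text):
    """Drop control/format/unassigned characters and the Unicode replacement
    character, while keeping ordinary spaces, tabs, and newlines."""
    kept = []
    for ch in text:
        if ch in " \t\n":
            kept.append(ch)
        elif ch == "\ufffd":
            continue  # leftover marker from a failed character decoding
        elif unicodedata.category(ch).startswith("C"):
            continue  # control, format, private-use, surrogate, unassigned
        else:
            kept.append(ch)
    return "".join(kept)


def candidate_words(texts, min_count=2, stopwords=STOPWORDS):
    """Return (word, count) pairs, most frequent first, for words that occur
    at least min_count times across all texts and are not stopwords."""
    counts = Counter()
    for text in texts:
        cleaned = remove_garbage(text).lower()
        counts.update(re.findall(r"[a-z']+", cleaned))
    return [(word, n) for word, n in counts.most_common()
            if n >= min_count and word not in stopwords]


corpus = [
    "Our inno\x00vation strategy drives growth.",
    "Innovation and growth are the strategy\ufffd.",
]
for word, n in candidate_words(corpus):
    print(word, n)
```

The output is only a list of candidates. Deciding which words belong in a dictionary is still a judgment call for the researcher.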

My take: If you’re not doing research in political science or psychology/social psychology and you do not need the built-in dictionaries from either LIWC or DICTION, consider starting here. It’s free, so if you end up deciding to go with one of the other packages, you will not have lost anything by testing the waters with CAT Scanner. If it does work out for you, you will have saved some money.

Other Options

NVivo and other Computer-Assisted Qualitative Data Analysis Software (CAQDAS) tools can often be used for dictionary-based analysis indirectly, by way of their query functions.

These tools tend to be considerably more expensive than dictionary-based CATA software; however, if your university has a site license for these tools, it could be the way to go.

Also, if you plan to do manual coding in addition to your dictionary coding, CAQDAS tools can make your life a lot easier. It’s something to consider, but with the price tag of some of these tools, make sure it’s what you want before you spring for it.

Let’s be honest: dictionary-based computer-aided text analysis is NOT rocket science. You can recreate the basic functionality of these tools in probably 30 lines of code (considerably fewer if you want indecipherable code).

If you are already an R or Python aficionado, it may make sense to add text analysis to your R/Python toolbelt. This will not only enable you to do dictionary-based analyses, but will enable you to use more advanced methods as well.

I am a fan of Python, and the nltk and stanfordnlp packages are my go-to tools; however, most dictionary-based analysis can be done using a simple Counter from the collections module in the standard library.
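To give a sense of how little code the Counter approach takes, here is a minimal sketch. The two word lists are made up for illustration and are not taken from LIWC, DICTION, or CAT Scanner; the tokenizer is a simple regular expression that assumes English text; and the function names are my own. Reporting scores per 100 words is one common way to adjust for text length, not the only one.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; swap in your own validated dictionaries.
DICTIONARIES = {
    "optimism": {"hope", "hopeful", "confident", "promising"},
    "certainty": {"always", "never", "definitely", "certain"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text, dictionaries=DICTIONARIES):
    """Return the word count plus, for each dictionary, the raw number of
    matching words and that number expressed per 100 words of text."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    results = {"word_count": total}
    for name, words in dictionaries.items():
        hits = sum(counts[word] for word in words)
        results[name] = hits
        results[name + "_per_100"] = 100 * hits / total if total else 0.0
    return results


sample = "We are confident and hopeful. We will always deliver, and we never quit."
print(score_text(sample))
```

This sketch only matches exact words. The commercial packages do more than this (for example, handling word stems or phrases), so treat it as a starting point and not as a replacement.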

I am not an R user, but Stanford’s CoreNLP is available in R as well (in the coreNLP package), and there are a number of other tools available in R (tm, tidytext, corpustools, etc.).