A few questions come up frequently regarding CAT Scanner, so I include them below in case they are helpful to you as well.
We hope to make the next version of CAT Scanner available on Windows, Macintosh, and Linux. In the meantime, LIWC is a fantastic tool and has a Macintosh version available.
If you do not know where your dictionary folder is, go into the Execute Computer-Aided Text Analysis tool, click on Dictionaries --> Set Dictionaries Folder. It will open up a window with your Dictionaries Folder highlighted (just make sure not to click anything other than the scroll bars as doing so could select a different folder). Click 'Cancel' when done unless you intend to change the dictionary folder.
Aaron's Tip: If you use Dropbox or a similar file-syncing tool, consider using a folder there as your dictionaries folder. This way, your dictionaries stay in sync across multiple devices. You will have to change the dictionaries folder in CAT Scanner on each device individually so that it points to that directory, but once you have done so, you will only ever have to add or change dictionaries in one place for them to be available on all devices.
I'm getting a message "The following dictionaries are malformed:" when I open the Execute Computer-Aided Text Analysis tool.
This message usually means that one of the following has happened:
- Deleting or changing one of the dictionary section markers (i.e., #1-MEMO#, #2-METADATA#, #3-WORDLIST#, ###)
- Changing something outside of the quotation marks in the metadata section
- Adding or changing something other than words in the word list
- A problem with the quotation marks: usually a missing quotation mark around one or more of your words or in the metadata section, or a stylized quotation mark (e.g., “ ”) instead of a straight one (e.g., " ). It can also be that you included a 'word' that itself contains quotation marks, which is not allowed.
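If you are comfortable running a short script, the structural causes above can also be checked automatically. The sketch below is a hypothetical Python helper, not part of CAT Scanner. It assumes a .dict file is plain text in which each section marker (#1-MEMO#, #2-METADATA#, #3-WORDLIST#, ###) sits on a line of its own, in that order, and it flags missing, duplicated, or out-of-order markers plus any stylized quotation marks. CAT Scanner's real parsing rules may be stricter than this.

```python
from typing import List

# Section markers named in the FAQ; treating each as a line of its own is an assumption.
MARKERS = ["#1-MEMO#", "#2-METADATA#", "#3-WORDLIST#", "###"]
CURLY_QUOTES = ("\u201c", "\u201d")  # the stylized quotation marks “ and ”


def find_dictionary_problems(text: str) -> List[str]:
    """Return a description of each likely structural problem in a dictionary's text."""
    lines = [line.strip() for line in text.splitlines()]
    problems = []

    # Each marker should appear exactly once...
    positions = []
    for marker in MARKERS:
        hits = [i for i, line in enumerate(lines) if line == marker]
        if len(hits) == 1:
            positions.append(hits[0])
        else:
            problems.append(f"expected {marker} exactly once, found {len(hits)}")

    # ...and, when all of them are present, in the documented order.
    if len(positions) == len(MARKERS) and positions != sorted(positions):
        problems.append("section markers are out of order")

    # Stylized quotation marks anywhere in the file are a common culprit.
    for number, line in enumerate(lines, start=1):
        if any(quote in line for quote in CURLY_QUOTES):
            problems.append(f"line {number}: stylized quotation mark")

    return problems
```

For example, `find_dictionary_problems(Path("mydictionary.dict").read_text(encoding="utf-8"))` (with `from pathlib import Path`; the file name and UTF-8 encoding are placeholders) returns an empty list when none of these particular problems is found.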
You can usually rule these issues out quickly with the following process:
1. Back up your original dictionary first, of course.
2. Create a copy of a dictionary that currently works, or download or generate a template that you have tested and know works (for clarity, I'll refer to this as the 'template' below).
3. Replace the template memo with the new memo. If there are any pound signs (#) in the memo, try deleting them; they *should* be fine, but I always remove them when debugging.
4. Replace the template title with the new title, making sure to change only the contents between the quotation marks.
5. Replace the template variable name with the new variable name, making sure to change only the contents between the quotation marks.
6. Delete all but one of the template's words between #3-WORDLIST# and ###.
7. Save it as a new .dict file (so as not to overwrite your other dictionary) and try loading it in CAT Scanner. It contains only one word, but it should load. If it doesn't, contact me, send me the file, and I'll take a look.
8. Start copying your word list in small chunks, saving and reloading the dictionary in CAT Scanner after each chunk.
9. When the error appears, one or more of the words in the chunk you just added is causing it. For each of the words in that chunk:
   - Make sure that there is both an opening and a closing quotation mark.
   - Make sure that the quotation marks aren't stylized (they should look like two straight vertical lines, ", not the curly marks MS Office uses, “ ”).
   - Make sure that there is only one word per row.
   - Make sure that there is nothing outside of the quotation marks.
   - Repeat this step until the error goes away, then resume copying in chunks per step 8.
10. If you can't get the error to go away, start deleting words from the most recent chunk until it does. When the error goes away, you'll know that the word you deleted last was an offending word, and you can look at it more carefully to see whether something stands out.
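The per-word checks listed above can be scripted in the same spirit. This is again a hypothetical Python sketch, not a CAT Scanner feature. It assumes each row of the word list should hold exactly one token wrapped in straight quotation marks, and it treats blank rows as harmless (also an assumption). Running it over a pasted chunk reports which rows break which rule, which can save some of the trial and error of deleting words one at a time.

```python
import re
from typing import List, Optional, Tuple

CURLY_QUOTES = ("\u201c", "\u201d")  # stylized “ and ”
# Assumed shape of a good row: one straight-quoted token with no spaces or quotes inside.
WORD_ROW = re.compile(r'^"([^"\s]+)"$')


def diagnose_word_row(row: str) -> Optional[str]:
    """Describe what is wrong with one word-list row, or return None if it looks fine."""
    stripped = row.strip()
    if not stripped:
        return None  # blank rows are treated as harmless in this sketch (assumption)
    if any(quote in stripped for quote in CURLY_QUOTES):
        return "stylized quotation mark"
    if WORD_ROW.match(stripped):
        return None
    if len(stripped) < 2 or not (stripped.startswith('"') and stripped.endswith('"')):
        return "missing quotation mark or text outside the quotation marks"
    inner = stripped[1:-1]
    if '"' in inner:
        return "quotation mark inside the word"
    if not inner:
        return "empty word"
    return "more than one word on this row"


def diagnose_word_list(rows: List[str]) -> List[Tuple[int, str]]:
    """Return (row number, problem) for every suspicious row, counting rows from 1."""
    report = []
    for number, row in enumerate(rows, start=1):
        problem = diagnose_word_row(row)
        if problem is not None:
            report.append((number, problem))
    return report
```

For a chunk copied from Notepad, `diagnose_word_list(chunk.splitlines())` lists the rows to look at first.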
This is the process I follow when I have issues. What often catches me is stylized quotation marks; I show an example below. Fortunately, if this is the issue, you can use Notepad's Edit > Replace function to find all the stylized quotation marks and replace them with straight ones. Copy and paste a stylized quotation mark into the "Find what" box, type a straight quotation mark into the "Replace with" box, and click the "Replace All" button. You will have to do this separately for the opening (“) and closing (”) stylizations, but it can make the process much faster.
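If you have many dictionaries to fix, the same find-and-replace can be scripted. The hypothetical Python sketch below maps the two curly double quotation marks (U+201C “ and U+201D ”) to a straight quotation mark and rewrites the file in place. It assumes the file is saved as UTF-8 (pass a different `encoding` if yours is not), leaves curly single quotes alone, and should only be run on a backed-up dictionary.

```python
from pathlib import Path

# Map both curly double quotes (“ and ”) to the straight quotation mark (").
STRAIGHTEN = str.maketrans({"\u201c": '"', "\u201d": '"'})


def straighten_quotes(text: str) -> str:
    """Return text with every curly double quotation mark replaced by a straight one."""
    return text.translate(STRAIGHTEN)


def straighten_file(path: Path, encoding: str = "utf-8") -> int:
    """Rewrite a dictionary file in place and return how many quotation marks were replaced.

    Back up the file first: the original contents are overwritten.
    """
    text = path.read_text(encoding=encoding)
    replaced = text.count("\u201c") + text.count("\u201d")
    path.write_text(straighten_quotes(text), encoding=encoding)
    return replaced
```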
If a dictionary you created does not show up in CAT Scanner, the problem often arises from Microsoft trying to make your life easier (but failing in this instance). When you save a file in Notepad, the default is to add .txt to whatever name you type in, so when you type "myfile.dict", Notepad saves it as "myfile.dict.txt". CAT Scanner will not pick this file up as a dictionary because it thinks it is a text file, like the ones you analyze with CAT Scanner!
There are two ways around this. First, when saving in Notepad, change the "Save as type" dropdown from "Text Documents (*.txt)" to "All Files (*.*)" (see the image below for an example). Notepad will then NOT add .txt to the end when you save. Save your file as a .dict in your dictionaries folder, reload CAT Scanner (or click 'refresh' if you're in the CATA tool), and you'll be good to go.
The other way is to rename your file that currently ends in '.dict.txt' so that it ends in just '.dict'. In Windows Explorer, make sure that "File name extensions" on the 'View' tab is checked (see the image below). Then select the file ending in '.dict.txt' and press the 'F2' key on your keyboard to rename it. Remove the '.txt' from the end and press 'Enter'. Windows may ask you to confirm the change; say yes. Reload CAT Scanner (or click 'refresh' if you're in the CATA tool), and you'll be good to go.
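If several files are affected, the rename can be done for the whole folder at once. This hypothetical Python sketch uses the standard `pathlib` module to rename every file ending in '.dict.txt' so that it ends in '.dict', skipping any file whose '.dict' name is already taken so that no existing dictionary is overwritten.

```python
from pathlib import Path
from typing import List


def fix_double_extensions(folder: Path) -> List[Path]:
    """Rename every '*.dict.txt' file in folder to '*.dict' and return the new paths."""
    renamed = []
    # sorted() builds the full list before any renaming starts.
    for path in sorted(folder.glob("*.dict.txt")):
        target = path.with_suffix("")  # drops only the final '.txt'
        if target.exists():
            continue  # never overwrite a dictionary that already exists
        path.rename(target)
        renamed.append(target)
    return renamed
```

Usage would look like `fix_double_extensions(Path(r"C:\path\to\your\Dictionaries"))`, where the path is a placeholder for your own dictionaries folder. Reload CAT Scanner afterwards, as with a manual rename.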
The last thing I would check is that the file is in the right folder. If you do not know where your dictionaries folder is, go into the Execute Computer-Aided Text Analysis tool and click on Dictionaries --> Set Dictionaries Folder. It will open a window with your dictionaries folder highlighted (just make sure not to click anything other than the scroll bars, as doing so could select a different folder). Click 'Cancel' when done unless you intend to change the dictionaries folder.
CAT Scanner cannot analyze text from PDFs, Word documents, Excel files, or other formats; it works with plain-text ('.txt') files. Support for more file types is in the works for the next version of CAT Scanner, but in the meantime, LIWC can work with a greater number of file types. You may also be able to use "Save As" in some software (e.g., Word) and choose a plain-text format to get your files into '.txt' format.
You may be able to analyze texts in other languages; however, if those languages use characters that are not in the English alphabet (e.g., â, ℵ), CAT Scanner is likely to handle those characters incorrectly, invalidating your analysis. If you use CAT Scanner in this way, test that it is working correctly by feeding it small files with known numbers of words and checking that the analysis returns the results you expect.
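Both parts of that check can be prepared with a few lines of code. The hypothetical Python sketch below counts every character outside basic ASCII (which covers the unaccented English letters), since those are the characters most likely to be mishandled. It also writes a small test file in which a chosen word occurs a known number of times, so you can compare CAT Scanner's count with the expected one. Writing the test file as UTF-8 is an assumption; match whatever encoding your real texts use.

```python
from collections import Counter
from pathlib import Path


def non_ascii_characters(text: str) -> Counter:
    """Count each character outside 7-bit ASCII (code points above 127)."""
    return Counter(ch for ch in text if ord(ch) > 127)


def write_known_count_file(path: Path, word: str, repeats: int, filler: str = "the") -> int:
    """Write `word` exactly `repeats` times, alternating with a filler word.

    Returns the total number of words written (2 * repeats).
    """
    words = [word, filler] * repeats
    path.write_text(" ".join(words), encoding="utf-8")
    return len(words)
```

If CAT Scanner then reports a different word count or a different number of hits for the test word, its handling of those characters is not reliable for your analysis.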
Here are a few things that I hope to include in CAT Scanner 2.0:
- Python-based development - enable Mac and Linux users to use the tool, and make it easier for those who want to tinker with it to do so.
- More granular control of garbage character removal - we'll provide you with a few suggested options, but you will ultimately get to control what characters stay and go. This will help with conducting analyses of non-English languages.
- More granular control over inductive word list generation - Change the number of occurrences required (either per text file or across the entire corpus) for a word to appear in the generated inductive word list.
- More granular control over technical aspects of dictionary-based CATA - Do you want the analysis to be case-sensitive? Do you want to split texts into words differently than the default method? This will be possible.
- New feature: Word weighting - Right now, all words are weighted at 1. In the next round, you will be able to set weights for words so that some words 'count' more than others.
- New feature: Text cleaning - Not all text problems arise from garbage characters. Sometimes words are simply misspelled or are missing a space between them. The next version will have a feature that compares all words in your texts against an English-language dictionary, shows you a list of the words that are not found in it, and lets you decide what to do with them (if anything).
- New feature: KWIC (keyword-in-context) analysis - When writing papers, it is often helpful to provide samples of the dictionary words in the context of your texts to show how the language is being used. KWIC analysis will provide you with these.
- New feature: Centralized corpus file - Working with thousands of individual text files can be cumbersome. The next version should be able to save the entire corpus into one file, which is easier to store and share with colleagues than zip files of thousands of individual files (bonus: the analyses will run faster, too).
- New feature: Centralized dictionaries - Many of our constructs have multiple dimensions, and for some, there may be more than one version of the dictionary. In the next version, I hope to enable you to consolidate multiple dictionaries into one file so that you have less to keep track of.
- More advanced text analytics - Things like topic modeling, text classification, and other more advanced analyses. I'd love to do it... but making it user-friendly while ensuring rigor remains priority #1.
- Dictionary conversion - Automate the conversion of other software file formats into CAT Scanner dictionary format and vice versa to enable you to jump between packages more seamlessly.
- Corpus/text file metadata - Very rarely do we work only with linguistic data. Often we are tying this data to other things (e.g., financial performance). Right now, this is possible through the indexing text feature, but it's clunky and forces you to change the text file itself. I'd like to find a way to let you store non-text data in the corpus alongside the texts so that you don't have to match up the CATA data and external data in a separate program.
- Online dictionaries - Rather than having to download dictionaries, it'd be cool to be able to access all of the dictionaries on catscanner.net in real time and use those in addition to the dictionary files you have stored locally. This way you always have access to the most recent dictionaries for your constructs of interest.
- More reporting options - Want to see a disaggregated score for each word in your dictionary so you can get a sense of which words are and are not being used? Yeah, me too. I'd like to find a way to let you slice and dice the results in more ways.
- Facilitated dictionary creation - There are several really cool tools out there (e.g., WordNet) that can make dictionary creation less time-consuming. I'd like to try to automate more of the dictionary creation process.
- Online results - I'd like to find a way to let you see the results of your analysis before exporting it to a file.
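To make the KWIC idea concrete: a keyword-in-context listing shows each dictionary hit with a few words on either side. The Python sketch below is a generic illustration of that technique only, not a description of how CAT Scanner 2.0 will implement it; the simple regular-expression tokenizer and the default three-word window are arbitrary choices.

```python
import re
from typing import List, Set, Tuple


def kwic(text: str, keywords: Set[str], window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each keyword occurrence.

    Keywords are expected in lowercase; matching is case-insensitive.
    """
    tokens = re.findall(r"[\w']+", text)  # crude tokenizer: runs of word characters and apostrophes
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() in keywords:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits
```

Printing each hit as `f"{left:>30} [{word}] {right}"` lines the keywords up in a column, which is the usual way KWIC samples are presented in papers.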
The reality: I am pre-tenure and, while CAT Scanner is a labor of love, I only work on this in my personal time because my work time needs to remain dedicated to research and teaching. This is going to take a while...