Does anyone know how to do this in SPSS? I have a column of sentences, and wanted to count how many words were in each, and then how many times particular words (e.g. cat, dog, run) occurred in each.
I haven't seen a built-in function that does this, and the macros I saw online were a bit confusing.
Thank you, Joseph
Have you considered using Python to do this?
I have a script or two that I wrote not too long ago that can be modified to do exactly what you want. Currently, my script reads lines/sentences and removes the 1000 most common words according to a referenced dictionary, leaving behind the "key" words.
I don't see why you/we/I couldn't adjust it to read lines, count words, and report frequencies of word usage (it would be a heavy mod to the existing script, and it may just be more entertaining to write it from scratch). Either way, I know Python would make easy work of it.
-J
On Thu, Jun 21, 2012 at 12:18 PM, Joseph Williams <[hidden email]> wrote:
> Does anyone know how to do this in SPSS? I have a column of sentences, and wanted to count how many words were in each, and then how many times particular words (e.g. cat, dog, run) occurred in each.
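For readers curious what the from-scratch Python route might look like, here is a minimal sketch. It is not J. R. Carroll's script (which was not posted); the function names `tokenize` and `count_words`, the `TARGETS` tuple, and the tokenization rule (lower-cased runs of letters, digits, and apostrophes) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical target words, taken from the examples in the question.
TARGETS = ("cat", "dog", "run")


def tokenize(sentence):
    """Lower-case the sentence and return its words.

    Assumption: a "word" is a run of letters, digits, or apostrophes,
    so punctuation and repeated spaces are ignored.
    """
    return re.findall(r"[a-z0-9']+", sentence.lower())


def count_words(sentence, targets=TARGETS):
    """Return the total word count plus one count per target word."""
    words = tokenize(sentence)
    freq = Counter(words)
    result = {"n_words": len(words)}
    for target in targets:
        # Counter returns 0 for words that never occurred.
        result["n_" + target] = freq[target]
    return result


if __name__ == "__main__":
    for s in ["The cat saw the dog run.", "Dog, dog, dog!"]:
        print(s, count_words(s))
```

Because whole tokens are compared, "run" would not be counted inside "running" here; whether that is the desired behaviour depends on the analysis.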
In reply to this post by Joseph Williams
Joseph, I saw J. R. Carroll's reply, and it sounds like he has an existing, quite powerful Python program and would be interested in adapting it. That acknowledged, here is my untested method using syntax. I'll assume that by 'cell' you mean a string variable, which I will call 'line'. The number of words is the number of spaces + 1, under the assumption that there are no double or triple spaces. If there are, that is another issue.

* Count the spaces in each line, then add 1 for non-empty lines.
Compute words=0.
Compute #l=char.length(line).
Do if (#l gt 0).
Loop #i=1 to #l.
+ if (char.substr(line,#i,1) eq ' ') words=words+1.
End loop.
Compute words=words+1.
End if.

To count particular words, slide a window of the word's length along the line. Note that this matches substrings (so 'run' is also found in 'running') and that the comparison is case sensitive.

* x = target word, y = its length, N = the variable receiving the count.
Compute #l=char.length(line).
Do repeat x='cat' 'dog' 'run'/y=3 3 3/N=ncat ndog nrun.
Compute N=0.
Do if (#l gt 0).
* The last possible starting position is #l-y+1.
Loop #i=1 to #l-y+1.
+ if (char.substr(line,#i,y) eq x) N=N+1.
End loop.
End if.
End repeat.

Gene Maguin
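To make the two assumptions in this approach easy to see, here is a small Python sketch of the same logic: counting spaces plus one, and sliding a fixed-width window along the line. The function names `words_by_spaces` and `count_substring` are hypothetical, and this is an illustration of the idea rather than a translation of the syntax above.

```python
def words_by_spaces(line):
    """Estimate the word count as (number of spaces) + 1.

    Only valid for single-spaced text; an empty line counts as 0 words.
    """
    line = line.rstrip()  # fixed-width strings are padded with trailing blanks
    if not line:
        return 0
    return line.count(" ") + 1


def count_substring(line, target):
    """Count occurrences of target by sliding a window along line.

    The last valid start index is len(line) - len(target), so the loop
    covers len(line) - len(target) + 1 starting positions.
    """
    width = len(target)
    n = 0
    for i in range(len(line) - width + 1):
        if line[i:i + width] == target:
            n += 1
    return n


if __name__ == "__main__":
    print(words_by_spaces("the cat ran"))           # single-spaced text
    print(words_by_spaces("the  cat"))              # a double space inflates the count
    print(count_substring("run running", "run"))    # substrings match too
```

The double-space and substring cases are the ones to check in real data before trusting the counts.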
In reply to this post by Joseph Williams
Here is an 'old school' approach using an elementary parsing technique.
One could 'VECTOR' the words then use VARSTOCASES, but that presumes knowing the max words for any sentence. I hence opted for the 'old fashioned' XSAVE approach.
--
DATA LIST /Sentence (A200).
BEGIN DATA
The lazy dog jumped over the quick brown fox and the fox jumped over the moon.
Every SPSS user must eventually learn syntax.
There exists a quite powerful Python program and some would be interested in adapting it.
Teaching an old dog new tricks is always lots of fun.
I met a fox in a bar but all she wanted to do was turn some tricks.
END DATA.

STRING #Cpy(A200).
COMPUTE #Cpy=SUBSTR(Sentence,1,LENGTH(RTRIM(Sentence))-1).
COMPUTE SrcLine=$CASENUM.
STRING Word(A20).
LOOP.
+ COMPUTE #=INDEX(#Cpy," ").
+ COMPUTE Word=SUBSTR(#Cpy,1,#-1).
+ COMPUTE #Cpy=SUBSTR(#Cpy,#+1).
+ XSAVE OUTFILE "C:\TEMP\Words.sav" / KEEP SrcLine Word.
END LOOP IF #Cpy=" ".
EXE.

GET FILE "C:\TEMP\Words.sav" .
AGGREGATE OUTFILE * / BREAK SrcLine Word / N=N.
AGGREGATE OUTFILE * / MODE=ADDVARIABLES / BREAK SrcLine / NWords=SUM(N).
ERASE FILE "C:\TEMP\Words.sav" .
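For comparison, the shape of the result produced by the parse-then-aggregate approach above (one row per sentence and distinct word, with that word's count and the sentence's total word count) can be sketched in Python. The function name `word_table` and the tuple layout are hypothetical; like the syntax, this sketch drops a trailing period and compares words case-sensitively.

```python
from collections import Counter


def word_table(sentences):
    """Return rows of (src_line, word, n, n_words).

    src_line is the 1-based sentence number, n is how often the word
    occurs in that sentence, and n_words is the sentence's word total.
    """
    rows = []
    for src_line, sentence in enumerate(sentences, start=1):
        # Drop padding and a final period, then split on whitespace.
        words = sentence.rstrip().rstrip(".").split()
        counts = Counter(words)
        n_words = len(words)
        for word in sorted(counts):
            rows.append((src_line, word, counts[word], n_words))
    return rows


if __name__ == "__main__":
    data = [
        "The lazy dog jumped over the quick brown fox and the fox jumped over the moon.",
        "Every SPSS user must eventually learn syntax.",
    ]
    for row in word_table(data):
        print(row)
```

Since the comparison is case sensitive, "The" and "the" appear as separate rows; lower-casing the sentence first would merge them.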