Counting number of words in a cell containing a sentence, counting number of times a word occurs in a cell


Joseph Williams
Does anyone know how to do this in SPSS? I have a column of sentences, and wanted to count how many words were in each, and then how many times particular words (e.g. cat, dog, run) occurred in each.

I haven't seen a built-in function that does this, and the macros I saw online were a bit confusing.

Thank you,

Joseph
Re: Counting number of words in a cell containing a sentence, counting number of times a word occurs in a cell

J. R. Carroll
Have you considered using Python to do this?

I have a script or two that I wrote not too long ago that can be modified to do exactly what you want to do.  

Currently, my script reads lines/sentences and removes the 1000 most common words according to a referenced dictionary, leaving behind the "key" words.

I don't see why we couldn't adjust it to read lines, count words, and report word frequencies. That would be a heavy modification of the existing script, so it may be more entertaining to write it from scratch. Either way, I know Python would make easy work of it.
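For anyone curious what such a script might look like, here is a minimal sketch. It is not the script described above. The function name `count_words` and the tokenizing rule (lowercase, runs of letters and apostrophes) are assumptions made for illustration.

```python
import re
from collections import Counter

def count_words(sentence, targets):
    """Return (total word count, {target: occurrences}) for one sentence."""
    # Assumed tokenizing rule: lowercase, then keep runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", sentence.lower())
    freq = Counter(words)
    # Counter returns 0 for words that never occur.
    return len(words), {t: freq[t] for t in targets}

sentences = [
    "The cat saw the dog run.",
    "Run, cat, run!",
]
for s in sentences:
    total, hits = count_words(s, ["cat", "dog", "run"])
    print(total, hits)
```

Because matching is done on whole tokens, "cat" is not counted inside a longer word such as "concatenate".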

-J

 



On Thu, Jun 21, 2012 at 12:18 PM, Joseph Williams <[hidden email]> wrote:
Does anyone know how to do this in SPSS? I have a column of sentences, and wanted to count how many words were in each, and then how many times particular words (e.g. cat, dog, run) occurred in each.

I haven't seen a built-in function that does this, and the macros I saw online were a bit confusing.

Thank you,

Joseph

Re: Counting number of words in a cell containing a sentence, counting number of times a word occurs in a cell

Maguin, Eugene
In reply to this post by Joseph Williams

Joseph,

 

I saw J. R. Carroll’s reply and it sounds like he has an existing, quite powerful Python program and would be interested in adapting it. That acknowledged, here is my untested method using syntax. I’ll assume that by ‘cell’ you mean a string variable, which I will call ‘line’.

 

The number of words is the number of spaces plus one, assuming there are no double or triple spaces. If there are, that is a separate issue.
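To show why the single-space assumption matters, here is a small Python sketch (Python only because it came up earlier in the thread). The function names are hypothetical, and stripping leading and trailing blanks first is an added assumption.

```python
def words_by_spaces(line):
    """Spaces-plus-one rule; only right when words are separated by single spaces."""
    line = line.strip()
    return line.count(" ") + 1 if line else 0

def words_by_split(line):
    """str.split() with no argument treats any run of whitespace as one separator."""
    return len(line.split())

clean = "the quick brown fox"
# Same four words, but with a double and a triple space between some of them.
messy = "the" + " " * 2 + "quick" + " " * 3 + "brown fox"

print(words_by_spaces(clean), words_by_split(clean))
print(words_by_spaces(messy), words_by_split(messy))
```

The two rules agree on the clean line and disagree on the messy one, where every extra space is counted as an extra word by the spaces-plus-one rule.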

 

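* Word count: number of single spaces plus one (assumes no runs of spaces).
Compute words=0.
Compute #l= char.length(line).
Do if (#l gt 0).
+  Compute words=1.
+  Loop #i=1 to #l.
+    if (substr(line,#i,1) eq ' ') words=words+1.
+  End loop.
End if.

* Occurrences of each target string (case-sensitive, also matches inside longer words).
Compute #l= char.length(line).
Do repeat x='cat' 'dog' 'run'/y=3 3 3/n=ncat ndog nrun.
+  Compute n=0.
+  Do if (#l gt 0).
+    Loop #i=1 to #l-y+1.
+      if (substr(line,#i,y) eq x) n=n+1.
+    End loop.
+  End if.
End repeat.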
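The sliding-window comparison in the second block can be sketched in Python to check the loop bound: a target of length y can start at any of the first (length - y + 1) positions of the line. The sketch also contrasts plain substring matching with whole-word matching; the function names are made up for illustration.

```python
import re

def count_substring(line, target):
    """Count every position where target starts, including inside longer words."""
    n = len(target)
    # Valid 0-based start positions are 0 .. len(line) - n inclusive.
    return sum(1 for i in range(len(line) - n + 1) if line[i:i + n] == target)

def count_whole_word(line, target):
    """Count target only where it stands alone, using regex word boundaries."""
    return len(re.findall(r"\b" + re.escape(target) + r"\b", line))

line = "the cat will concatenate a cat"
print(count_substring(line, "cat"), count_whole_word(line, "cat"))
```

The substring count picks up the "cat" inside "concatenate" as well as the one at the very end of the line, which a loop that stops too early would miss.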

 

Gene Maguin

 

 

 

Re: Counting number of words in a cell containing a sentence, counting number of times a word occurs in a cell

David Marso
Administrator
In reply to this post by Joseph Williams
Here is an 'old school' approach using an elementary parsing technique.
One could put the words into a VECTOR and then use VARSTOCASES, but that presumes knowing the maximum number of words in any sentence, so I opted for the 'old-fashioned' XSAVE approach.
--
DATA LIST /Sentence (A200).
BEGIN DATA
The lazy dog jumped over the quick brown fox and the fox jumped over the moon.
Every SPSS user must eventually learn syntax.
There exists a quite powerful Python program and some would be interested in adapting it.
Teaching an old dog new tricks is always lots of fun.
I met a fox in a bar but all she wanted to do was turn some tricks.
END DATA.

* Working copy of the sentence with its final character (assumed to be closing punctuation) removed.
STRING #Cpy(A200).
COMPUTE #Cpy=SUBSTR(Sentence,1,LENGTH(RTRIM(Sentence))-1).
COMPUTE SrcLine=$CASENUM.

* Peel off one word per pass and write it out as its own case.
STRING Word(A20).
LOOP.
+  COMPUTE #=INDEX(#Cpy," ").
+  COMPUTE Word=SUBSTR(#Cpy,1,#-1).
+  COMPUTE #Cpy=SUBSTR(#Cpy,#+1).
+  XSAVE OUTFILE="C:\TEMP\Words.sav" /KEEP=SrcLine Word.
END LOOP IF #Cpy=" ".
EXECUTE.

* N is the count of each word within a sentence and NWords is the total per sentence.
GET FILE="C:\TEMP\Words.sav".
AGGREGATE OUTFILE=* /BREAK=SrcLine Word /N=N.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=SrcLine /NWords=SUM(N).
ERASE FILE="C:\TEMP\Words.sav".
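For comparison, the same one-row-per-word idea can be sketched in Python. This is only an illustration of the logic, not a replacement for the syntax above. The function name and the tuple layout (SrcLine, Word, N, NWords) are assumptions chosen to mirror the variables in the SPSS job.

```python
from collections import Counter

def words_long_format(sentences):
    """Return rows (src_line, word, n, n_words), one per distinct word per sentence."""
    rows = []
    for src_line, sentence in enumerate(sentences, start=1):
        # Drop the final character (assumed to be closing punctuation), then split.
        words = sentence.rstrip()[:-1].split()
        counts = Counter(words)  # case-sensitive, so "The" and "the" are distinct
        for word, n in sorted(counts.items()):
            rows.append((src_line, word, n, len(words)))
    return rows

sample = ["The lazy dog jumped over the quick brown fox and the fox jumped over the moon."]
for row in words_long_format(sample):
    print(row)
```

As with the XSAVE approach, no maximum number of words per sentence has to be known in advance.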


