Hi! I'm new to this list and came from spsstools. I'm an academic user, so SPSS won't provide assistance. My problem:

I'm doing medical research and have millions of rows. One variable, called 'description', is a string variable; that field holds sentences on the patient's condition. I need a way to count the number of times a word like 'hospital' shows up. The word will always be lowercase and may or may not end with a period (end of sentence). My goal is to compute a new variable called 'ntimes' that holds the number of times my target word shows up in the description field for each row.

Lastly, if the above is possible, is it also possible to expand your solution to find a two-word phrase like 'hospitalization time', which also may or may not end with a period?

Thank you kindly for any assistance!
Bill

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
This would be pretty easy to do if you can use the Python programmability plugin. What version are you on?
And what do you mean by "a word like hospital"? Do you have a list of words you need to count?
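As an illustration of the kind of Python logic that could do this, here is a plain-Python sketch. It is independent of the SPSS plugin API, and the helper name `count_term` and the sample descriptions are hypothetical. The regular expression puts a word boundary (`\b`) on both sides of the term, so "hospital" is not counted inside "hospitalization" or "hospitality", and a following period neither requires nor blocks a match. Two-word phrases are handled by joining the words with `\s+`, which also tolerates double spaces.

```python
import re


def count_term(text, term):
    """Count whole-word (or whole-phrase) occurrences of a lowercase term.

    Hypothetical helper for illustration. Words of a multi-word term may be
    separated by any run of whitespace; a trailing period does not prevent
    a match because \\b matches between a letter and punctuation.
    """
    if not text:
        return 0
    words = [re.escape(w) for w in term.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text))


# Hypothetical sample values of the 'description' variable.
descriptions = [
    "admitted to hospital. discharged from hospital",
    "hospitalization time was short. no hospitalization time noted.",
    "seen at the hospital gift shop; hospitality staff helped.",
    "",
]
ntimes = [count_term(d, "hospital") for d in descriptions]
nphrase = [count_term(d, "hospitalization time") for d in descriptions]
print(ntimes)   # [2, 0, 1, 0]
print(nphrase)  # [0, 2, 0, 0]
```

Applied row by row, this gives one `ntimes` value per case; how the values are written back to the active dataset depends on the plugin version and is not shown here.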
hi!
define !word () "hospital" !enddefine.
compute ntimes = 0.
loop #letter = 1 to length (description).
+if (substr(description, #letter, length(!word)) eq !word ) ntimes = ntimes + 1.
end loop.
fre ntimes.

If you're not sure about the cAsInG, use the LOWER function.

Cheers!!
Albert-Jan
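For readers following along in Python, here is a rough transliteration of the loop above. It is plain Python, not SPSS syntax or the SPSS plugin, and `count_substring_scan` is a hypothetical name. Python indexes from 0, so `i` corresponds to `#letter - 1`. Because every start position is tested, it counts substrings and overlapping matches: "hospital" inside "hospitalization" is counted, and "aa" is found twice in "aaa" (Python's built-in `str.count` would report 1).

```python
def count_substring_scan(description, word):
    """Mimic the SPSS LOOP: test every start position for a match."""
    ntimes = 0
    # 0-based positions; the SPSS scratch variable #letter runs 1..LENGTH.
    for i in range(len(description)):
        # Near the end the slice is shorter than word, so it cannot match.
        if description[i:i + len(word)] == word:
            ntimes += 1
    return ntimes


print(count_substring_scan("hospital stay. hospitalization time.", "hospital"))  # 2
print(count_substring_scan("aaa", "aa"))  # 2
```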
...Or it can all be packed into a macro like the following. Beware: it searches for substrings, not whole words, so it will also count "hospitality" even if you look only for "hospital".
Jan

data list fixed / text (a200).
begin data.
night town.
night town a glass.
color mahogany.
color mahogany center.
rose is a rose is a rose is a rose.
loveliness extreme.
extra gaiters.
loveliness extreme.
sweetest ice-cream.
page ages page ages page ages.
wiped wiped wire wire.
sweeter than peaches and pears and cream.
wiped wire wiped wire.
end data.

define !counter (haystack=!tok(1) / out=!tok(1) / needle=!encl ('(', ')') ).
* counts occurrences of the string needle in the string variable haystack; the results are saved in the variable out.
compute !out = 0.
formats !out (f3).
loop #i = 1 to length( !haystack ) - !length( !needle ) + 1 .
- if ( INDEX(substr(!haystack, #i), !quote(!needle)) = 1 ) !out = !out + 1 .
end loop.
execute.
!enddefine.

!counter haystack=text out=wires needle=(wire).
!counter haystack=text out=roses needle=(rose is a rose).
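To check what `!counter` should produce on the sample data, here is a Python sketch, assuming each sentence above is a separate record (`count_occurrences` is a hypothetical name). `haystack.startswith(needle, i)` plays the role of `INDEX(SUBSTR(haystack, #i), needle) = 1`, that is, "does the needle start at this position?". Overlapping matches are counted, which is why "rose is a rose" is found three times in the Stein line.

```python
def count_occurrences(haystack, needle):
    """Count (possibly overlapping) occurrences of needle in haystack."""
    count = 0
    # 0-based start positions 0 .. len(haystack) - len(needle), i.e. the
    # 1-based positions 1 .. LENGTH(haystack) - LENGTH(needle) + 1.
    for i in range(len(haystack) - len(needle) + 1):
        if haystack.startswith(needle, i):
            count += 1
    return count


# Assumption: one record per sentence of the sample data.
records = [
    "night town.",
    "night town a glass.",
    "color mahogany.",
    "color mahogany center.",
    "rose is a rose is a rose is a rose.",
    "loveliness extreme.",
    "extra gaiters.",
    "loveliness extreme.",
    "sweetest ice-cream.",
    "page ages page ages page ages.",
    "wiped wiped wire wire.",
    "sweeter than peaches and pears and cream.",
    "wiped wire wiped wire.",
]
wires = [count_occurrences(r, "wire") for r in records]
roses = [count_occurrences(r, "rose is a rose") for r in records]
print(sum(wires), max(roses))  # 4 3
```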
SPSS wasn't designed to do this. There must be other software from the qualitative side that would work; try Surrey Univ (UK) for their material on coding text.
However, you could try exporting the data to Word and then using Find & Replace to replace "hospital" with "hospital", etc. Tedious, but Word reports how many replacements it made, so it gives a count and might work.
Hi,
I am new to Python and have just recently started learning it. The code worked on my example just fine, but I was wondering if you could explain the third line of your code. Is #letter a defined variable? How does the loop know to go to the next letter in the second iteration? Thank you!

define !word () "hospital" !enddefine.
*this line declares a temporary variable !word
compute ntimes = 0.
*this line starts an iteration that declares a variable ntimes that would increase by an increment of one if the program finds that a condition is satisfied
loop #letter = 1 to length (allstatus).
*this line starts a loop. I don't know what #letter or length do. Does it mean that it tells the loop to run from the first to the last letter of the expression? How does it know to skip to the next letter?
+if (substr(allstatus, #letter, length(!word)) eq !word ) ntimes = ntimes + 1.
*defines substring within var, start with the first position as #letter is assigned 1, separate what is defined as word and compare it to word, then add one count to ntimes
end loop.
Anastasia Vishnyakova
This IS NOT Python! It is a simple macro.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
I thought it was a Python-based macro.
Anastasia Vishnyakova
I found your first post hard to read with all the comments interspersed. Here is the same info reorganized.
define !word () "hospital" !enddefine. /* [1] .
compute ntimes = 0. /* [2] .
loop #letter = 1 to length(allstatus). /* [3] .
+ if (substr(allstatus, #letter, length(!word)) eq !word ) ntimes = ntimes + 1. /* [4] .
end loop. /* [5] .

/* [1] this defines a macro called !word that expands to "hospital" in this case */
/* [2] this line initializes variable ntimes to 0 */
/* [3] Scratch variable #letter will loop from 1 to the length of a string variable called allstatus */
/* [4] each time the string defined by the !word macro is found in variable allstatus, variable ntimes is incremented by 1 */
/* [5] this line ends the loop */

As David noted, line [1] is a simple SPSS macro. The other lines are ordinary SPSS syntax.

You could reduce the number of times through the loop, because there's no point in looking for "hospital" in allstatus when fewer than 8 characters (the length of "hospital") remain. So you could change line [3] to:

loop #letter = 1 to length(allstatus)-length(!word)+1.

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
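A small Python sketch (not SPSS; `count_with_bound` and its `shortened` flag are hypothetical names) of the two loop bounds discussed above. As in the SPSS loop, the 1-based index `letter` simply goes up by 1 on each pass, and each pass compares the `len(word)`-character window starting there with the word. Stopping at `len(text) - len(word) + 1` gives the same count with `len(word) - 1` fewer passes per case.

```python
def count_with_bound(text, word, shortened=True):
    """Count matches of word in text by scanning 1-based start positions.

    shortened=False scans to len(text), like the original loop;
    shortened=True stops at the last position where a full window fits.
    Returns (number of matches, number of loop passes).
    """
    last = len(text) - len(word) + 1 if shortened else len(text)
    ntimes = 0
    passes = 0
    for letter in range(1, last + 1):  # letter plays the role of #letter
        passes += 1
        # The len(word)-character window starting at position letter.
        window = text[letter - 1:letter - 1 + len(word)]
        if window == word:
            ntimes += 1
    return ntimes, passes


s = "hospital visit, then back to hospital."
print(count_with_bound(s, "hospital", shortened=False))  # (2, 38)
print(count_with_bound(s, "hospital", shortened=True))   # (2, 31)
```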