dear fellow list visitors,
Please help me with this problem. I have the following syntax, which works perfectly. It 'replaces' the strings in the old variables (name1 to name14) with numeric values in new variables (newname1 to newname14).

Example:

vector name=name1 to name14.
vector newname(14).
loop i=1 to 14.
do repeat a="alpha" "beta" "gamma" / b=1 2 3.
- if name(i) = a newname(i)=b.
end repeat print.
end loop.

However, instead of three values (alpha, beta and gamma) I have thousands of unique string values stored in a separate system file, each identified by a unique numerical code. How can I insert these values into the do repeat (after 'a=' and after 'b=')?

The reason I want to change these from string to numeric values is that I know the system file will be smaller and hopefully also faster to read.

Your help is much appreciated.

sincerely
Maurice

--
___________________________________________________________________
Maurice Vergeer
Department of communication, Radboud University (www.ru.nl)
PO Box 9104, NL-6500 HE Nijmegen, The Netherlands

Visiting Professor Yeungnam University, Gyeongsan, South Korea

Recent publications:
-Vergeer, M., Hermans, L., & Sams, S. (accepted for publication). Online social networks and micro-blogging in political campaigning: The exploration of a new campaign tool and a new campaign style. Party Politics.
-Eisinga, R., Franses, Ph.H., & Vergeer, M. (2010). Weather conditions and daily television use in the Netherlands, 1996–2005. International Journal of Meteorology.

Webspace
www.mauricevergeer.nl
http://blog.mauricevergeer.nl/
www.journalisteninhetdigitaletijdperk.nl
maurice.vergeer (skype)
___________________________________________________________________

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
Hi Maurice. Does AUTORECODE with /GROUP give you what you want?
AUTORECODE VARIABLES=name1 to name14 /INTO newname1 to newname14 /GROUP.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Maurice Vergeer
From <help>:

Overview (AUTORECODE command)

AUTORECODE recodes the values of string and numeric variables to consecutive integers and puts the recoded values into a new variable called a target variable. The value labels or values of the original variable are used as value labels for the target variable. AUTORECODE is useful for creating numeric independent (grouping) variables from string variables for procedures such as ONEWAY and DISCRIMINANT. AUTORECODE can also recode the values of factor variables to consecutive integers, which may be required by some procedures and which reduces the amount of workspace needed by some statistical procedures.

AUTORECODE VARIABLES=varlist
 /INTO new varlist
 [/BLANK={VALID**}]
         {MISSING}
 [/GROUP]
 [/DESCENDING]
 [/PRINT]

If this will be applied to many files, there are two more specifications that are possible:

 [/APPLY TEMPLATE='filespec']
 [/SAVE TEMPLATE='filespec']

If you want blanks to be labeled as missing, you will need /BLANK=MISSING.
Art Kendall
Social Research Consultants
On 3/29/2011 2:59 AM, Maurice Vergeer wrote:
> [snip]
|
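For readers following along, the two AUTORECODE suggestions can be combined into one command. This is only a sketch using the variable names from the original question; whether it finishes in reasonable time on a very large file is a separate matter.

```spss
* Sketch: variable names are taken from the original question.
* /GROUP gives all 14 variables one shared coding scheme.
* /BLANK=MISSING makes blank strings user-missing instead of valid codes.
* /PRINT lists the old string values next to their new codes.
AUTORECODE VARIABLES=name1 TO name14
  /INTO newname1 TO newname14
  /GROUP
  /BLANK=MISSING
  /PRINT.
```

Note that AUTORECODE assigns its own consecutive integers, so it does not reuse codes that already exist in a separate file.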
In reply to this post by Maurice Vergeer
Hi Maurice,
If AUTORECODE ... /GROUP is not what you want (i.e. your numeric codes have some specific meaning):
SORT your external system file by the string variable and save it.
Transform your master file from wide to long using VARSTOCASES, retaining the case identifier, the string, and the index.
SORT by the string.
MATCH FILES, using the external file as a table with the string as the key.
Transform the file from long to wide.
Done.
HTH, David
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
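The steps David lists might look roughly like the following. This is a sketch under assumptions, not his actual syntax: the file names `lookup.sav` and `master.sav`, the case identifier `id`, the index `slot`, and the lookup variables `name` (string) and `code` (numeric) are all hypothetical. It also assumes the string key has the same declared width in both files, which MATCH FILES may require depending on the SPSS version.

```spss
* Sketch only: lookup.sav, master.sav, id, slot, name and code are assumed names.
* Step 1: sort the external lookup file by the string key and save it.
GET FILE='lookup.sav'.
SORT CASES BY name.
SAVE OUTFILE='lookup_sorted.sav'.

* Step 2: restructure the data file from wide to long, keeping only the id.
GET FILE='master.sav'.
VARSTOCASES
  /MAKE name FROM name1 TO name14
  /INDEX=slot
  /KEEP=id
  /NULL=KEEP.

* Step 3: sort by the string key and look up the numeric code.
SORT CASES BY name.
MATCH FILES /FILE=*
  /TABLE='lookup_sorted.sav'
  /BY name.

* Step 4: drop the strings and return to one row per id.
SORT CASES BY id slot.
CASESTOVARS
  /ID=id
  /INDEX=slot
  /DROP=name.
```

With the default separator, CASESTOVARS names the resulting variables code.1 to code.14. Because only `id` is kept in step 2, the result would still need to be merged back with the other variables of the data file.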
dear all,
Thanks for your suggestions.

Regarding AUTORECODE (David and Art's suggestion): I tried this, but it took enormously long, so I interrupted it. The point is, there are thousands of unique values, but approx. 4.5 million records (file size over 3 gigabytes). So, it's large.

Regarding the VARSTOCASES option, I'm not sure whether SPSS allows so many columns. The values as such are not necessarily meaningful but need to stay unique.

It appears there is no easy or obvious solution. One option not explored yet is just inserting the string values and numerical values in the do repeat. This would result in a very large syntax file. This is a dirty solution, and I'm not sure whether it's quick either.

Tonight I'll try to run one of the options above and see whether it has finished when I return from work tomorrow afternoon.

I'll let you know whether it worked.

thanks again
Maurice

On Tue, Mar 29, 2011 at 20:37, David Marso <[hidden email]> wrote:
> [snip]
|
Maurice,
There's another solution to this. I don't know whether it is faster than AUTORECODE or David's VARSTOCASES restructure. (I think you should try the VARSTOCASES because, if I understand you correctly, you have only 14 variables. Not to be snarky, but that is a trivial number of variables.) However, maybe it's not.

So, the basic problem is that across the 14 variables you want to make sure the same string value gets the same numeric value in the corresponding new variable.

Sort the master file (the file holding the unique string values) by string value and use $casenum to assign the numeric value. Save it.
Open up the file with the 14 variables and sort by the first variable, call it var1.
Do a match files with the master file as a table, renaming its two variables so that the string value variable is named var1 and the numeric value variable gets a unique name, and use var1 as the by variable.
You'll need to repeat that sequence of sort cases-match files 14 times, once for each of your 14 variables.

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Maurice Vergeer
Sent: Tuesday, March 29, 2011 8:29 AM
To: [hidden email]
Subject: Re: loop and do repeat problem with thousands of unique values to insert
> [snip]
|
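Gene's sort-and-match sequence could be sketched as below for the first two variables. The file names `lookup.sav` and `master.sav` and the lookup variables `name` and `code` are assumptions made for illustration. If the lookup file already carries its own numeric codes, the COMPUTE step can be skipped and the existing code variable kept instead.

```spss
* Sketch only: lookup.sav, master.sav, name and code are assumed names.
* Build the lookup table: one row per unique string, numbered by sort position.
GET FILE='lookup.sav' /KEEP=name.
SORT CASES BY name.
COMPUTE code = $CASENUM.
SAVE OUTFILE='lookup_sorted.sav'.

* First pass: sort the data by name1 and bring in its code as newname1.
GET FILE='master.sav'.
SORT CASES BY name1.
MATCH FILES /FILE=*
  /TABLE='lookup_sorted.sav' /RENAME=(name code = name1 newname1)
  /BY name1.

* Second pass: the same pattern for name2, and so on through name14.
SORT CASES BY name2.
MATCH FILES /FILE=*
  /TABLE='lookup_sorted.sav' /RENAME=(name code = name2 newname2)
  /BY name2.
EXECUTE.
```

The lookup file only needs to be sorted once, because renaming its key on each pass does not change the sort order. The string key should have the same declared width as name1 to name14.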
Dear Gene,
This solution had crossed my mind, but I dismissed it. Sometimes the solution is right under your nose and you don't see it because it seems too simple. It's straightforward and a bit inefficient (running it 14 times), but it gets the job done.

The only thing I worry about is having to sort on the 14 variables. I could do it as follows:

sort cases by name1 name2 name3 ... name14.

or

sort cases by name1.
match files etc.

and this another 13 times.

The first option is more efficient in syntax, but hard on SPSS. Still, with two hours before going to bed, this seems the most likely candidate.

thanks

On Tue, Mar 29, 2011 at 22:12, Gene Maguin <[hidden email]> wrote:
> [snip]
|
Maurice,
No. Take the 14 variables one at a time. The syntax load is pretty trivial. Besides, you can copy and modify the syntax.

Gene Maguin

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Maurice Vergeer
Sent: Tuesday, March 29, 2011 9:23 AM
To: Gene Maguin
Cc: [hidden email]
Subject: Re: loop and do repeat problem with thousands of unique values to insert
> [snip]
|
that's true.
thanks

On Tue, Mar 29, 2011 at 22:32, Gene Maguin <[hidden email]> wrote:
> Maurice,
>
> No. Take the 14 variables one at a time. The syntax load is pretty trivial.
> Besides, you can copy and modify the syntax.
>
> Gene Maguin
|
In reply to this post by Maurice Vergeer
"regarding vartstocase option, I'm not sure whether spss allows so many columns."

Note that you are only bringing back the 14 mapped fields, not thousands of variables, so it shouldn't be a problem.

Hard to say whether the DO REPEAT would work with thousands of values. Aside from that, it potentially does an enormous number of IF statements to place a given value, and even after it has placed the value it continues to the end of the list. So if you have 1000 values it will take 14000 comparisons to fill one case (even if all 14 values are at the beginning of the list).

Here is a mock-up of what I had in mind. Omit any unnecessary sorts/saves depending upon your data files, i.e. if you already have a sequential unique ID variable in your master file, omit the COMPUTE ID=$CASENUM and SAVE.
HTH, David

* Mock-up of the external mapping file: string value and its numeric code.
data list free / strvar (a3) nummap (f4).
begin data
abc 1 def 2 ghi 3 jkl 4 bcd 5
efg 6 rst 7 ijk 8 kml 9 uvw 10
end data.
sort cases by strvar.
save outfile 'c:\temp\temp.sav'.

* Mock-up of the master file: 4 string variables plus other stuff.
data list list / strvar1 to strvar4 (4A3) stuff01 to stuff20 (20f4).
begin data
abc def ghi jkl 3 2 3 4 3 4 2 4 3 2 3 4 2 2 4 3 2 6 7 5
bcd efg rst ijk 6 7 2 3 4 5 6 7 5 6 7 5 3 4 2 6 5 6 7 5
ghi kml rst uvw 3 4 2 6 7 5 7 6 2 3 4 6 7 5 6 7 5 7 2 5
end data.
compute ID=$CASENUM.
SAVE OUTFILE 'c:\temp\raw.sav'.

* Keep only the ID and the strings, then go from wide to long.
match files / file * / keep ID strvar1 to strvar4.
VARSTOCASES MAKE strvar FROM strvar1 TO strvar4 / INDEX=Index1(4) / KEEP ID.

* Look up the numeric code with a single table match.
SORT CASES BY strvar.
MATCH FILES / FILE * / TABLE 'c:\temp\temp.sav' / BY strvar.

* Back from long to wide, then rejoin the original file.
VECTOR numvars (4).
COMPUTE numvars(index1)=nummap.
AGGREGATE outfile * / BREAK id / numvars1 to numvars4=MAX(numvars1 to numvars4).
MATCH FILES FILE 'c:\temp\raw.sav' / file * / BY id.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
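For readers who think more easily in a general-purpose language, the restructure-and-match idea in the mock-up above can be sketched in plain Python. This is only an illustration under assumptions: the function name `map_wide_rows` and the in-memory lists are made up for the example and are not part of SPSS, and on a 4.5 million record file the real work would still be done with VARSTOCASES and MATCH FILES.

```python
def map_wide_rows(rows, lookup):
    """Map each string in each row to its numeric code via a table lookup.

    rows   -- list of (case_id, [str1, str2, ...]) tuples (hypothetical layout)
    lookup -- dict from string value to numeric code (plays the 'table' file)
    """
    # Wide to long: one record per (case id, position, string value).
    long_records = [
        (case_id, index, value)
        for case_id, values in rows
        for index, value in enumerate(values)
    ]
    # Table match: one dictionary lookup per record; unmatched values stay None.
    coded = [(case_id, index, lookup.get(value))
             for case_id, index, value in long_records]
    # Long back to wide: put each code in the position its string came from.
    wide = {case_id: [None] * len(values) for case_id, values in rows}
    for case_id, index, code in coded:
        wide[case_id][index] = code
    return wide


lookup = {"abc": 1, "def": 2, "ghi": 3, "jkl": 4, "bcd": 5,
          "efg": 6, "rst": 7, "ijk": 8, "kml": 9, "uvw": 10}
rows = [
    (1, ["abc", "def", "ghi", "jkl"]),
    (2, ["bcd", "efg", "rst", "ijk"]),
    (3, ["ghi", "kml", "rst", "uvw"]),
]
result = map_wide_rows(rows, lookup)
print(result[3])
```

Each string costs one lookup here, rather than one comparison per entry in the mapping as in the DO REPEAT approach.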
In reply to this post by Maurice Vergeer
Best bet in these cases is to write syntax once against a properly normalized data structure (in this case a sorted long format). Do the necessary merge and then mop up with an aggregate and a final match; see my example. Sorting the huge file multiple times with all of the variables in the file in place is going to kill your hard disk. Trust me, I've been doing this stuff for over 20 years!
|
Dear David,
thanks for the suggestion you provided in the earlier email. I need to let it sink in what really happens in the example.

True, sorting is nasty in SPSS. Now you mention normalizing: the later the variable in line (name14), the more blanks there are.

thanks
Maurice
|
"True, sorting is nasty in SPSS"

Especially if you are sorting the entire file 14 times. The nasty part comes from dragging everything else along in the sort. In my example you are only sorting on strvar and carrying along the original ID. If you already have an ID variable in your file (recommended best practice, IMNSHO) you can skip a few expensive steps. The normalized version (lean-mean-long data structure) is going to run faster than 14 sorts of the entire file (I assume you have much more than just these string variables in your file).

"now you mention normalizing: the variable later in line (name14) the more blanks there are."

Please complete this last thought... If you have many blanks then SELECT IF strvar NE " " prior to the sort of the long file. Any blanks will be unmapped and you can recode the sysmis later across the 14 vars. OTOH, VARSTOCASES has a NULL subcommand which might be useful.
HTH, David
|
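The blank-handling advice can be illustrated with a small Python sketch. It is a toy under stated assumptions, not SPSS code: `code_nonblank` is a made-up name, blanks are dropped before the lookup (much like a SELECT IF before the sort), and positions that were blank simply stay unmapped (None here, system-missing in SPSS).

```python
def code_nonblank(values, lookup):
    """Return the codes for one case, skipping blank strings before the lookup."""
    codes = [None] * len(values)
    # Keep only non-blank entries, remembering their original position.
    nonblank = [(i, v.strip()) for i, v in enumerate(values) if v.strip()]
    for i, v in nonblank:
        codes[i] = lookup.get(v)
    return codes


lookup = {"abc": 1, "def": 2}
codes = code_nonblank(["abc", "   ", "def", ""], lookup)
print(codes)
```

The fewer non-blank records that reach the sort and match, the less work the long file costs, which is the point of filtering first.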
In reply to this post by Maurice Vergeer
but some of these possibilities in some combination may help.

Try running on a machine with more memory.
Make sure that you have a lot of WORKSPACE so your run does not go virtual.
Make sure that you are running local, not from a server.
Make sure that you have lots of disk space.
When you GET the original file, /KEEP only the 14 variables and a caseid. AUTORECODE that file, SAVE with /KEEP for the 14 new variables, and MATCH FILES back to the original.
Use N OF CASES to get a subset for developing your syntax and then a larger subset to generate a template.
Write VARSTOCASES that saves a new file with a single variable and use that to generate a template, then apply the template to the original.
Run AUTORECODE on a few variables to generate a template and try that template on all 14.

HTH
Art Kendall
Social Research Consultants |
In reply to this post by Bruce Weaver
Hi Maurice,
I'd store all the char-to-num translations of your reference file into one Python dictionary and generate spss syntax based on that. It could be a long list of IF statements, or perhaps one big fat recode. I can help you with that if you want. I don't have spss on my laptop though, so just let me know. Here's a first sketch:

import spss

# Run this while the reference (mapping) file is the active dataset.
cursor = spss.Cursor([0, 1])  # char var and num codes in first and second column
# this may not be entirely correct; string values come back padded with blanks
mytable = dict((rec[0].rstrip(), int(rec[1])) for rec in cursor.fetchall())
cursor.close()

# The data file to recode must be the active dataset when this is submitted.
# myvar and mynewvar are placeholder names; string-to-numeric RECODE needs INTO.
recode = "recode myvar "
for ch, num in mytable.items():
    recode += "('%s'=%d) " % (ch, num)
spss.Submit(recode + "into mynewvar.\nexe.")

Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
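The string-building part of the sketch above can be tried without SPSS installed. The following is a minimal version under assumptions: `build_recode` and the variable names `myvar` and `mynewvar` are placeholders, and doubling embedded apostrophes follows the usual SPSS convention for quoted strings. It only returns the command text; handing it to `spss.Submit` is left out.

```python
def build_recode(mapping, source="myvar", target="mynewvar"):
    """Build one RECODE ... INTO command from a {string: code} mapping."""
    pairs = []
    for text, code in sorted(mapping.items()):
        # Double embedded apostrophes so the literal stays a valid quoted string.
        literal = text.replace("'", "''")
        pairs.append("('%s'=%d)" % (literal, code))
    return "RECODE %s %s INTO %s." % (source, " ".join(pairs), target)


syntax = build_recode({"alpha": 1, "beta": 2, "gamma": 3})
print(syntax)
```

Sorting the items is only there to make the generated text reproducible; the order of the value pairs does not matter when every string is unique.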
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
"or perhaps one big fat recode."

RECODE would be the way to go rather than all those IF statements!
----------------------
That would likely work. Kind of reminds me (in spirit) of the "old school" syntax generation/include code I first inflicted upon the world in the early 90's ;-)
Recap (may be wrong in details, but the basic idea is as follows). Since my version doesn't support Python... In all likelihood this approach (using Python) will be more efficient than the wholesale butchery, the match and aggregate mop-up of the file I suggested previously.
For historical reference only! Can you believe this actually works (sort of ;-)))) OTOH: I believe we have grown beyond it.
---------------------------------------------
GET FILE "REFERENCEMappingFILE".
DO IF $CASENUM=1.
WRITE OUTFILE 'OMG I cant believe Im posting this.txt' /"RECODE var01 TO var14 ".
END IF.
WRITE OUTFILE 'OMG I cant believe Im posting this.txt' /" ("," '",strval,"'=", numvalue,")" .
* As written, the INTO line below is emitted right after the first value pair.
* It belongs after the last pair, so it needs an end-of-file test instead.
DO IF $CASENUM=1.
WRITE OUTFILE 'OMG I cant believe Im posting this.txt' /" INTO newvar01 TO newvar14 ".
END IF.
Similar gobblygook for VALUE LABELS.
EXE.
GET FILE "bigfatfile".
INCLUDE 'OMG I cant believe Im posting this.txt'.
(or INSERT if that floats your boat more steady)....
|
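The same generate-then-INSERT idea can be sketched in Python for those who have it. Everything named here is hypothetical (`write_recode_file`, the chunk size of 500, the file name, the variable lists). Splitting the mapping over several RECODE commands is just a cautious choice in case one very long command turns out to be a problem, not a documented requirement, and it assumes that a later RECODE ... INTO leaves target values set by an earlier one alone for input values it does not mention, which is worth checking in the Command Syntax Reference.

```python
import os
import tempfile


def write_recode_file(mapping, path, source, target, chunk_size=500):
    """Write RECODE commands to a syntax file, chunk_size value pairs per command."""
    items = sorted(mapping.items())
    commands = 0
    with open(path, "w") as handle:
        for start in range(0, len(items), chunk_size):
            chunk = items[start:start + chunk_size]
            handle.write("RECODE %s\n" % source)
            for text, code in chunk:
                # Double embedded apostrophes inside the quoted literal.
                handle.write("  ('%s'=%d)\n" % (text.replace("'", "''"), code))
            # The INTO clause closes each command, after its last value pair.
            handle.write("  INTO %s.\n" % target)
            commands += 1
    return commands


# Made-up mapping of 1200 strings, written to a temporary syntax file.
mapping = {"value%04d" % i: i for i in range(1, 1201)}
path = os.path.join(tempfile.mkdtemp(), "recode_generated.sps")
n_commands = write_recode_file(mapping, path,
                               "var01 TO var14", "newvar01 TO newvar14")
print(n_commands)
```

The resulting file could then be run against the big data file with INSERT or INCLUDE, as in the WRITE OUTFILE version.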
In reply to this post by David Marso
I like it, so I must be old-school. But I think Jon P is cringing. ;-)
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by David Marso
;-)
Hmmm, the good ol' WRITE OUTFILE. I remember I generated 25 pages of IF statements and my colleague printed that syntax. *sigh* so many trees. I wonder if there's a limit to the number of recodes within one RECODE though.

Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From: David Marso <[hidden email]>
To: [hidden email]
Sent: Tue, March 29, 2011 8:59:47 PM
Subject: Re: [SPSSX-L] loop and do repeat problem with thousands of unique values to insert

"or perhaps one big fat recode."
RECODE would be the way to go rather than all those IF statements!
----------------------
That would likely work. Kind of reminds me (in spirit) of the "old school" syntax generation/include code I first inflicted upon the world in the early 90's ;-)
recap (may be wrong in details, but basic idea is as follows....
Since my version doesn't support Python ...In all likelihood this approach (using python) will be more efficient than the wholesale butchery, the match and aggregate mop-up of the file I suggested previously.
For historical reference only! Can you believe this actually works (sort of ;-)))) OTOH: I believe we have grown beyond it.
---------------------------------------------
GET FILE "REFERENCEMappingFILE".
DO IF $CASENUM=1.
WRITE OUTFILE 'OMG I cant believe Im posting this.txt'
  /"RECODE var01 TO var14 ".
END IF.
WRITE OUTFILE 'OMG I cant believe Im posting this.txt'
  /" ("," '",strval,"'=", numvalue,")" .
DO IF $CASENUM=1.
WRITE OUTFILE 'OMG I cant believe Im posting this.txt'
  /" INTO newvar01 TO newvar14 ".
END IF.
Similar gobblygook for VALUE LABELS.
EXE.
GET FILE "bigfatfile".
INCLUDE 'OMG I cant believe Im posting this.txt'.
(or INSERT if that floats your boat more steady)....

Albert-Jan Roskam wrote:
>
> Hi Maurice,
>
> I'd store all the char-to-num translations of your reference file into one
> Python dictionary and generate spss syntax based on that. It could be a long
> list of IF statements, or perhaps one big fat recode.
>
> I can help you with that if you want. I don't have spss on my laptop though, so
> just let me know. Here's a first sketch:
>
> import spss
> cursor = spss.Cursor([0:2]). # char var and num codes in first and second column
> mytable = dict([rec for rec in cursor.fetchall()]) # this may not be entirely correct
> cursor.close()
> recode = "recode myvar "
> for ch, num in mytable.iteritems():
>     recode += "('%s'=%d) "
> spss.Submit(recode + ".\nexe.")
>
> Cheers!!
> Albert-Jan
>
> ________________________________
> From: Bruce Weaver <[hidden email]>
> To: [hidden email]
> Sent: Tue, March 29, 2011 1:08:51 PM
> Subject: Re: [SPSSX-L] loop and do repeat problem with thousands of unique
> values to insert
>
> Hi Maurice. Does AUTORECODE with /GROUP give you what you want?
>
> AUTORECODE VARIABLES=name1 to name14
>   /INTO newname1 to newname14
>   /GROUP.
>
> Maurice Vergeer wrote:
> >
> > dear fellow list visitors,
> >
> > please help me with this problem.
> > I have the following syntax which works perfectly.
> >
> > It 'replaces' strings in old variables (name1 to name14) into
> > numerical ones in a new variable (newname1 to newname14).
> >
> > example:
> > vector name=name1 to name14.
> > vector newname(14).
> > loop i=1 to 14.
> > do repeat a="alpha" "beta" "gamma" / b=1 2 3.
> > - if name(i) = a newname(i)=b.
> > end repeat print.
> > end loop.
> >
> > However, instead of three values (alpha beta and gamma) I have
> > thousands of unique string values stored in a separate system file,
> > each identified with a unique numerical code.
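For anyone who wants to try the "one big fat recode" idea without a running SPSS session, here is a minimal sketch in plain Python of the syntax-generation step only. It is not runnable-as-quoted code from the thread: the sketch quoted above never fills in its '%s'/%d placeholders, so this version does that substitution explicitly. The function name build_recode_syntax and the chunk_size parameter are hypothetical, the mapping is assumed to be already loaded into a dictionary, and splitting into several RECODE commands rests on the assumption that RECODE ... INTO leaves the target unchanged for values it does not mention. Whether a single RECODE has a practical limit is not established here.

```python
def build_recode_syntax(mapping, source, target, chunk_size=500):
    """Return SPSS RECODE commands that translate string values to numeric codes.

    mapping    -- dict of {string value: integer code} (assumed already loaded)
    source     -- source variable list, e.g. "name1 TO name14"
    target     -- target variable list, e.g. "newname1 TO newname14"
    chunk_size -- hypothetical knob: number of value pairs per RECODE command
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    pairs = sorted(mapping.items())  # sorted only to make the output reproducible
    commands = []
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        # SPSS string literals escape an embedded single quote by doubling it.
        specs = "\n".join(
            "  ('%s'=%d)" % (text.replace("'", "''"), code)
            for text, code in chunk
        )
        commands.append("RECODE %s\n%s\n  INTO %s." % (source, specs, target))
    return "\n".join(commands)


# Toy mapping using the three values from the original question.
mapping = {"alpha": 1, "beta": 2, "gamma": 3}
syntax = build_recode_syntax(mapping, "name1 TO name14", "newname1 TO newname14")
print(syntax)
```

The resulting text could then be written to a file and run with INCLUDE or INSERT, or handed to spss.Submit as in the quoted sketch. Those steps are left out here because they need a working SPSS installation.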
Administrator
|
In reply to this post by Bruce Weaver
"But I think Jon P is cringing. ;-)"
Of course, why else would I possibly post such banal tripe ;)))) Vi Anne is fighting back an urge to respond and Jon Fry is probably chuckling.
---
Back in the "good old days" it was sometimes necessary to resort to hard-core, ass-backwards methods to get the job done. We didn't have dialog boxes! Hell, we didn't even have the data editor to look at. The only sanity checks we had were PRINT statements nested in DO IF statements. For those lucky enough to have experienced SPSS PC+, we had a limit of 512K of memory (more in later versions). I recall being overjoyed when my 386 got a memory upgrade (from 16M to 32M) and a math coprocessor installed. I recall at one time in my life being totally mystified by MACRO language and wondering why !LET !ARG=!ARG+1 didn't work. I could go on and on but won't. LOL!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"