Re: loop and do repeat problem with thousands of unique values to insert

Posted by Maurice Vergeer on
URL: http://spssx-discussion.165.s1.nabble.com/loop-and-do-repeat-problem-with-thousands-of-unique-values-to-insert-tp4268902p4269402.html

Dear Gene,

This solution had crossed my mind, but I dismissed it.
Sometimes the solution is right under your nose and you don't see it
because it seems too simple.
It's straightforward and a bit inefficient (it has to run 14 times), but it
gets the job done.
The only thing I worry about is having to sort on the 14 variables. I
could do it as follows:
sort cases by name1 name2 name3 ... name14.
or
sort cases by name1.
match files etc.
and then the same another 13 times, once per variable.

The first option is more compact in syntax, but hard on SPSS.
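
A minimal sketch of the second option, following Gene's description below. The file names (lookup.sav, lookup_sorted.sav, bigfile.sav) and the lookup variable names (strvalue, code) are assumptions made for illustration; they do not come from the thread. Note that a single sort on name1 ... name14 only leaves name1 fully in order, so a later match keyed on name2 would presumably still need its own sort, which is why the sketch sorts before each match. Depending on the SPSS version, the string key may also need the same declared width in both files.

```spss
* Sketch only: file and lookup variable names are hypothetical.
* Sort the lookup table (one row per unique string) once by its string key.
GET FILE='lookup.sav'.
SORT CASES BY strvalue.
SAVE OUTFILE='lookup_sorted.sav'.

* Pass 1: sort the large file on name1 and attach the code as newname1.
GET FILE='bigfile.sav'.
SORT CASES BY name1.
MATCH FILES /FILE=*
  /TABLE='lookup_sorted.sav' /RENAME=(strvalue code = name1 newname1)
  /BY name1.
EXECUTE.

* Pass 2: same pattern for name2, then repeat through name14.
SORT CASES BY name2.
MATCH FILES /FILE=*
  /TABLE='lookup_sorted.sav' /RENAME=(strvalue code = name2 newname2)
  /BY name2.
EXECUTE.
```

Each pass renames the two lookup variables on the fly, so the same sorted lookup file can be reused for all 14 variables without saving 14 copies.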

Still, with two hours before going to bed, this seems the most likely candidate.

thanks


On Tue, Mar 29, 2011 at 22:12, Gene Maguin <[hidden email]> wrote:

> Maurice,
>
> There's another solution to this. I don't know whether it is faster than
> autorecode or David's varstocases restructure. (I think you should try the
> varstocases because, if I understand you correctly, you have only 14
> variables. Not to be snarky, but that is a trivial number of variables.)
> However, maybe it's not. So, the basic problem is that across the 14
> variables you want to make sure the same string value gets the same numeric
> value in the corresponding new variable. Sort the master file by string
> value and use $casenum to assign the numeric value. Save it. Open up the
> file with the 14 variables and sort by the first variable, call it var1. Do
> a match files with table and a rename for the two variables in the master
> file so that the string value variable name is var1 and the numeric value
> variable will be unique and the var1 as the by variable. You'll need to
> repeat that sequence of sort cases-match files 14 times, once for each of
> your 14 variables.
>
> Gene Maguin
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Maurice Vergeer
> Sent: Tuesday, March 29, 2011 8:29 AM
> To: [hidden email]
> Subject: Re: loop and do repeat problem with thousands of unique values to
> insert
>
> dear all,
>
> thanks for your suggestions.
>
> Regarding autorecode (David and Art's suggestion): I tried this, but
> it took enormously long, so I interrupted it. The point is, there are
> thousands of unique values, but appr. 4.5 million records (file size
> over 3 gigabyte). So, it's large.
>
> Regarding the varstocases option, I'm not sure whether SPSS allows so many
> columns. The values as such are not necessarily meaningful but need to
> stay unique.
>
> It appears there is no easy or obvious solution.
> One option not explored yet is just inserting the string values and
> numerical values in the do repeat.
> This would result in a very large syntax file. This is a dirty
> solution, not sure whether it's quick either.
>
> Tonight I'll try to run one of the options above and see whether it'll be
> finished when I return from work tomorrow afternoon.
>
> I'll let you know whether it worked.
>
> thanks again
> Maurice
>
>
>
>
> On Tue, Mar 29, 2011 at 20:37, David Marso <[hidden email]> wrote:
>> Hi Maurice,
>> If the AUTORECODE ../GROUP is not what you wish (i.e. your numeric codes have
>> some specific meaning):
>> SORT your external system file by the string variable and save it.
>> Transform your master file from wide to long using VARSTOCASES retaining
>> caseidentifier and string and index.
>> SORT by string.
>> MATCH FILES using the external file as a table with the string as a key.
>> transform the file from long to wide.
>> Done.
>> HTH, David
>> --
>>
>> Maurice Vergeer wrote:
>>>
>>> dear fellow list visitors,
>>>
>>> please help me with this problem.
>>> I have the following syntax which works perfectly.
>>>
>>> It 'replaces' strings in old variables (name1 to name14) into
>>> numerical ones in a new variable (newname1 to newname14).
>>>
>>> example:
>>> vector name=name1 to name14.
>>> vector newname(14).
>>> loop i=1 to 14.
>>> do repeat a="alpha" "beta" "gamma" / b=1 2 3.
>>> - if name(i) = a newname(i)=b.
>>> end repeat print.
>>> end loop.
>>>
>>>
>>> However, instead of three values (alpha beta and gamma) I have
>>> thousands of unique string values stored in a separate system file,
>>> each identified with a unique numerical code.
>>> How can I insert these values in the do repeat function (after 'a='
>>> and after 'b=')?
>>>
>>> The reason why I want to change these from string to numeric ones is
>>> that I know the system file will be smaller and hopefully also faster
>>> to read.
>>>
>>> Your help is much appreciated.
>>>
>>> sincerely
>>> Maurice
>>>
>>>
>>>
>>>
>>> --
>>> ___________________________________________________________________
>>> Maurice Vergeer
>>> Department of communication, Radboud University (www.ru.nl)
>>> PO Box 9104, NL-6500 HE Nijmegen, The Netherlands
>>>
>>> Visiting Professor Yeungnam University, Gyeongsan, South Korea
>>>
>>> Recent publications:
>>> -Vergeer, M., Hermans, L., & Sams, S. (accepted for publication).
>>> Online social networks and micro-blogging in political campaigning:
>>> The exploration of a new campaign tool and a new campaign style. Party
>>> Politics.
>>> -Eisinga, R., Franses, Ph.H., & Vergeer, M. (2010). Weather conditions
>>> and daily television use in the Netherlands, 1996-2005. International
>>> Journal of Meteorology.
>>>
>>> Webspace
>>> www.mauricevergeer.nl
>>> http://blog.mauricevergeer.nl/
>>> www.journalisteninhetdigitaletijdperk.nl
>>> maurice.vergeer (skype)
>>> ___________________________________________________________________
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> [hidden email] (not to SPSSX-L), with no body text except the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>>
>>
>>


