scratch variables in do-repeat loop

scratch variables in do-repeat loop

mpirritano

All,

 

I have a very large file (over 4 GB) and I don’t want to add any unnecessary variables to it. I am identifying claims that carry a diagnosis code indicating a mammography was done. There are eight diagnosis code fields, so I need to check each one of them. I could create a new variable for each diagnosis code flagging whether it matches the ICD-9 code for mammography, then count the hits across these new variables; at least one hit would mean there was a mammography. I’m trying to do this with DO REPEAT and scratch variables so that I don’t have to create eight new variables. Let’s say for the sake of argument that “50000” is the code for mammography.

 

data list / test1 1-5 (A) test2 7-11 (A) test3 13-17 (A).
begin data
50000 64233 44444
88888 00321 23456
89989 50000 23242
end data.

dataset name test.
dataset activate test.

do repeat diagnosis = test1 to test3
         /mammo_diag = #mammo_diag1 to #mammo_diag3.
    if (ltrim(rtrim(diagnosis)) = "50000") mammo_diag = 1.
end repeat.
exe.

if (sum.1(#mammo_diag1 to #mammo_diag3)>0) mammo = 1.
exe.

 

“mammo” is the variable that should be 1 when a mammography was done on that line. What actually happens is that once “50000” is found on line one of the data, every subsequent line also gets a 1.

 

Can someone explain to me why this is not working? I’m sure I’ve done this before.

 

Thanks

Matt

 

Matthew Pirritano, Ph.D.

Research Analyst IV

Medical Services Initiative (MSI)

Orange County Health Care Agency

(714) 568-5648

 


scratch variables in do-repeat loop

mpirritano

There should not have been an ‘exe’ after the DO REPEAT:

dataset name test.
dataset activate test.

do repeat diagnosis = test1 to test3
         /mammo_diag = #mammo_diag1 to #mammo_diag3.
    if (ltrim(rtrim(diagnosis)) = "50000") mammo_diag = 1.
end repeat.

if (sum.1(#mammo_diag1 to #mammo_diag3)>0) mammo = 1.
exe.

I’ve just discovered that this does work if I replace the IF statement inside the DO REPEAT with a COMPUTE statement:

do repeat diagnosis = test1 to test3
         /mammo_diag = #mammo_diag1 to #mammo_diag3.
    compute mammo_diag = ltrim(rtrim(diagnosis)) = "50000".
end repeat.

 

 

Thanks

Matt

 

 


Re: scratch variables in do-repeat loop

Maguin, Eugene
In reply to this post by mpirritano
Matt,
 
I think your DO REPEAT is unnecessarily complex. I think this ought to work:

compute mammo_diag=0.
do repeat diagnosis = test1 to test3.
+   if (diagnosis = "50000") mammo_diag = mammo_diag + 1.
end repeat.

 
 
Gene Maguin


From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Pirritano, Matthew
Sent: Tuesday, February 22, 2011 11:13 AM
To: [hidden email]
Subject: scratch variables in do-repeat loop

[snip]

Re: scratch variables in do-repeat loop

Art Kendall
In reply to this post by mpirritano
try this.

data list / test1 1-5 (A) test2 7-11 (A) test3 13-17 (A).
begin data
50000 64233 44444
88888 00321 23456
89989 50000 23242
end data.
dataset name test.
dataset activate test.
compute mammo = any("50000", ltrim(rtrim(test1)),ltrim(rtrim(test2)),ltrim(rtrim(test3))).
compute mammo2 =  index(concat(test1 to test3),"50000") gt 0.
list.

Art Kendall
Social Research Consultants

On 2/22/2011 11:13 AM, Pirritano, Matthew wrote:

[snip]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: scratch variables in do-repeat loop

Art Kendall
In reply to this post by mpirritano
Replace the previous syntax with this definition of mammo2, which puts a separator between the fields in case the target string spans the boundary between two adjacent fields.
I don't know whether mammo or mammo2 is more efficient.

data list / test1 1-5 (A) test2 7-11 (A) test3 13-17 (A).
begin data
50000 64233 44444
88888 00321 23456
89989 50000 23242
end data.
dataset name test.
dataset activate test.
compute mammo = any("50000", ltrim(rtrim(test1)),ltrim(rtrim(test2)),ltrim(rtrim(test3))).
compute mammo2 =  index(concat(test1,'x',test2,'x',test3),"50000") gt 0.
list.
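
To see why the separator matters, here is a minimal sketch. It assumes two hypothetical five-character fields where the first ends in "50" and the second starts with "000". The names hit_plain and hit_sep are made up for illustration. Without a separator the concatenated string contains "50000" even though neither field equals it.

```spss
* Hypothetical data: case 1 has no field equal to 50000, case 2 does.
data list / test1 1-5 (A) test2 7-11 (A).
begin data
12350 00099
50000 11111
end data.
* No separator, so a match can straddle the boundary between the two fields.
compute hit_plain = index(concat(test1, test2), "50000") gt 0.
* With a separator, a match can only fall inside a single field.
compute hit_sep = index(concat(test1, 'x', test2), "50000") gt 0.
list.
```

On this data hit_plain should be 1 for both cases, while hit_sep should be 0 for the first case and 1 for the second. The separator only helps if it can never occur inside the target code.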

Art Kendall
Social Research Consultants


Re: scratch variables in do-repeat loop

David Marso
Administrator
In reply to this post by mpirritano
Yet another method (it might be marginally faster than the DO REPEAT because it short-circuits the lookup).
It would be interesting to compare the three approaches (LOOP / DO REPEAT / ANY).
HTH, David
--
data list / test1 1-5 (A) test2 7-11 (A) test3 13-17 (A).
begin data
50000 64233 44444
88888 00321 23456
89989 50000 23242
end data.
VECTOR test=test1 to test3.
LOOP #=1 to 3.
COMPUTE MAMM=LTRIM(RTRIM(test(#)))="50000".
END LOOP IF MAMM.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: scratch variables in do-repeat loop

Art Kendall
I ran the syntax below and got the following output:

what         useLOOP       useANY        useCONCAT     useREPEAT

Processor    0:00:00.141   0:00:00.157   0:00:00.094   0:00:00.078
Elapsed      0:00:00.140   0:00:00.157   0:00:00.094   0:00:00.079


I guess that the processor time is legitimately more than the elapsed time when using the LOOP method because I have 4 processors.

However, the 0.079-second difference between ANY and REPEAT is far too small to matter when choosing an approach.

Art Kendall
Social Research Consultants

new file.
input program.
string test1 to test3(a7).
loop #i = 1 to 10000.
compute test1 =' 50000 '.
compute test2 =' 64233 '.
compute test3 =' 44444 '.
end case.
compute test1 =' 88888 '.
compute test2 =' 00321 '.
compute test3 =' 23456 '.
end case.
compute test1 =' 89988 '.
compute test2 =' 50000 '.
compute test3 =' 23432 '.
end case.
end loop.
end file.
end input program.
*need actual procedure to get notes.
frequencies vars = test1.
VECTOR test=test1 to test3.
LOOP #=1 to 3.
COMPUTE MAMM=LTRIM(RTRIM(test(#)))="50000".
END LOOP IF MAMM.
frequencies vars = mamm.
compute mamm = any("50000", ltrim(rtrim(test1)),ltrim(rtrim(test2)),ltrim(rtrim(test3))).
frequencies vars = mamm.
compute mamm =  index(concat(test1,'x',test2,'x',test3),"50000") gt 0.
frequencies vars = mamm.
compute mamm=0.
do repeat test = test1 to test3.
+   if (test = "50000") mamm = mamm + 1.
end repeat.
frequencies vars = mamm.

On 2/22/2011 1:57 PM, David Marso wrote:
[snip]

Re: scratch variables in do-repeat loop

David Marso
Administrator
Try the following. What numbers do you get?
----
With 10,000 cases and only 3 fields, as in your example, it is probably difficult to get a good signal.
Notes from FREQUENCIES are probably *not* the best way to get timings from SPSS transformations ;-(

-----

input program.
VECTOR test(10,A7).
loop #i = 1 to 1000000.
DO REPEAT t=test1 to test10.
compute t=concat(" ",string(trunc(uniform(99999)),F5)," ").
compute #=uniform(1).
if # > .90 t=" 50000 ".
END REPEAT.
end case.
end loop.
end file.
end input program.
exe.

DO IF $CASENUM=1.
PRINT / $TIME.
END IF.
EXE.

VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  COMPUTE MAMM01=LTRIM(RTRIM(test(#)))="50000".
+  DO IF MAMM01.
+    COMPUTE last01=#.
+    BREAK.
+  END IF.
END LOOP.
DO IF $CASENUM=1.
PRINT / $TIME.
END IF.
exe.

compute mamm02 = any("50000", ltrim(rtrim(test1)),
                              ltrim(rtrim(test2)),
                              ltrim(rtrim(test3)),
                              ltrim(rtrim(test4)),  
                              ltrim(rtrim(test5)),
                              ltrim(rtrim(test6)),
                              ltrim(rtrim(test7)),
                              ltrim(rtrim(test8)),
                              ltrim(rtrim(test9)),
                              ltrim(rtrim(test10))     ).
DO IF $CASENUM=1.
PRINT / $TIME.
END IF.
EXE.


compute mamm03 =  index(concat(test1,
                         'x',test2,
                         'x',test3,
                         'x',test4,
                         'x',test5,
                         'x',test6,
                         'x',test7,
                         'x',test8,
                         'x',test9,
                         'x',test10),"50000") gt 0.

DO IF $CASENUM=1.
PRINT / $TIME.
END IF.
EXE.
compute mamm04=0.
do repeat test = test1 to test10.
+   if (test = " 50000 ") mamm04 = mamm04 + 1.
end repeat.
DO IF $CASENUM=1.
PRINT / $TIME.
END IF.
EXE.

Re: scratch variables in do-repeat loop

Rich Ulrich
In reply to this post by mpirritano

As to the original question about "Why ..."
 - You were using scratch variables, the ones that
start with #. These have the *important* property
(at times like this) that the value from a previous case
is held over to a new case. That is to say, the scratch
variable is *not* re-initialized to missing for a new record.

That screwed me up once, too.
 - You can make use of this (odd) feature by, for instance,
using a scratch variable to create a cumulative sum, when the
scratch variable is re-initialized only for a new variable.
 - It has always seemed more transparent (and more
sure to remain a feature) to use an explicit Lag for that
sort of thing, instead.
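
A minimal sketch of that carry-over, using the three-case test data from the original post. The names #hit, #hit2, carried, and fresh are hypothetical, and only test1 is checked to keep it short. It relies on the documented behavior that numeric scratch variables start at 0 and are not reset when a new case is read.

```spss
* Same test data as the original post.
data list / test1 1-5 (A) test2 7-11 (A) test3 13-17 (A).
begin data
50000 64233 44444
88888 00321 23456
89989 50000 23242
end data.
* IF assigns only when the condition is true, so the 1 set on case 1 is never cleared.
if (test1 = "50000") #hit = 1.
compute carried = #hit.
* COMPUTE assigns on every case, so the scratch value is overwritten each time.
compute #hit2 = (test1 = "50000").
compute fresh = #hit2.
list.
```

Here carried should come out as 1 on all three cases, which is the symptom Matt described, while fresh should be 1, 0, 0. Either resetting the scratch variable at the start of each case or assigning it with COMPUTE, as Matt did, avoids the carry-over.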

--
Rich Ulrich



[snip, most]
 

>“mammo” is the variable that says 1 for yes there was a mammography done on that line. But what is happening is that when “50000” is identified on line one of the data every subsequent line is getting a 1.

 

>Can someone explain to me why this is not working? I’m sure I’ve done this before.

 



Re: scratch variables in do-repeat loop

Jon K Peck
Or use LEAVE.
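
For reference, a minimal sketch of the LEAVE idiom, assuming a hypothetical numeric variable x and a hypothetical running-total variable named running. LEAVE keeps an ordinary (non-scratch) variable from being reinitialized for each new case, so the total carries forward much as a scratch variable would, except that it stays in the file.

```spss
* Hypothetical data: three cases with one numeric variable.
data list free / x.
begin data
3 5 2
end data.
* Add the current x to the total retained from the previous case.
compute running = running + x.
* LEAVE must come after the command that defines the variable.
leave running.
list.
```

This follows the COMPUTE-then-LEAVE pattern given in the Command Syntax Reference. Setting running to 0 with an unconditional COMPUTE beforehand would reset the total on every case and defeat the purpose.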

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435




From:        Rich Ulrich <[hidden email]>
To:        [hidden email]
Date:        02/22/2011 05:30 PM
Subject:        Re: [SPSSX-L] scratch variables in do-repeat loop
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





[snip]

Re: scratch variables in do-repeat loop

Art Kendall
In reply to this post by David Marso
I tried David's syntax with a few mods.
1) Moved the vertical location of the PRINT commands so I could see what additional info to put on each PRINT.
2) Added a label to each PRINT to say which step the printed time belongs to.

And then
3) In <edit> <options>, unchecked printing commands in the output.
4) Copied and pasted from the output to between the begin data ... end data in the second set of syntax below.
5) Ran the second set of syntax.

This is what I found for 100,000 cases!
what        clocktime   elapsed (seconds)

start     13517843692      .
loop      13517843703     11
any       13517843717     14
concat    13517843724      7
repeat    13517843728      4



What results do other members of the list get if they try the same thing?

Art Kendall
Social Research Consultants

new file.
dataset name biggie.
input program.
VECTOR test(10,A7).
loop #i = 1 to 1000000.
DO REPEAT t=test1 to test10.
compute t=concat(" ",string(trunc(uniform(99999)),F5)," ").
compute #=uniform(1).
if # > .90 t=" 50000 ".
END REPEAT.
end case.
end loop.
end file.
end input program.
exe.

DO IF $CASENUM=1.
PRINT /"'start'" $TIME.
END IF.
VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  COMPUTE MAMM01=LTRIM(RTRIM(test(#)))="50000".
+  DO IF MAMM01.
+    COMPUTE last01=#.
+    BREAK.
+  END IF.
END LOOP.
exe.

DO IF $CASENUM=1.
PRINT /"'loop'" $TIME.
END IF.
compute mamm02 = any("50000", ltrim(rtrim(test1)),
                              ltrim(rtrim(test2)),
                              ltrim(rtrim(test3)),
                              ltrim(rtrim(test4)),
                              ltrim(rtrim(test5)),
                              ltrim(rtrim(test6)),
                              ltrim(rtrim(test7)),
                              ltrim(rtrim(test8)),
                              ltrim(rtrim(test9)),
                              ltrim(rtrim(test10))     ).
EXE.
DO IF $CASENUM=1.
PRINT /"'any'" $TIME.
END IF.

compute mamm03 =  index(concat(test1,
                         'x',test2,
                         'x',test3,
                         'x',test4,
                         'x',test5,
                         'x',test6,
                         'x',test7,
                         'x',test8,
                         'x',test9,
                         'x',test10),"50000") gt 0.
EXE.
DO IF $CASENUM=1.
PRINT / "'concat'" $TIME.
END IF.
compute mamm04=0.
do repeat test = test1 to test10.
+   if (test = " 50000 ") mamm04 = mamm04 + 1.
end repeat.
EXE.
DO IF $CASENUM=1.
PRINT /"'repeat'" $TIME.
END IF.
execute.

----------
new file.
dataset name timediff.
data list list/ what (a9) clocktime (f11).
begin data
'start'         13517843692
'loop'         13517843703
'any'         13517843717
'concat'         13517843724
'repeat'         13517843728
end data.
compute elapsed = clocktime - lag(clocktime).
format elapsed (f5).
list.




On 2/22/2011 4:43 PM, David Marso wrote:
<snip>

Re: scratch variables in do-repeat loop

David Marso
Administrator
Looks like LTRIM and RTRIM ***really*** slow it down.
So if you know there are leading or trailing spaces, accommodate that directly in the code!!!
A simple string compare is fast in comparison.
I wonder how another version using NUMBER(test(#),F6) would compare?
I am VERY surprised that the INDEX(CONCAT...) method is this fast.
One other variation would be to use the direct string comparison within the loop.
Also get rid of the extra COMPUTE, which was simply a check to verify early loop termination.
The early termination *should* be faster than the DO REPEAT, at least in the second version below, unless there is a *huge* overhead to allocate a vector!!!
You might also want to run multiple repetitions of the various methods and counterbalance their order.
Are you running 100,000 cases or 1,000,000 cases as in my program?

VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  COMPUTE MAMM01=test(#)=" 50000 ".
+  DO IF MAMM01.
+    BREAK.
+  END IF.
END LOOP.

Also the following variation:

VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  DO IF test(#)=" 50000 ".
+    COMPUTE MAMM01=1.
+    BREAK.
+  END IF.
END LOOP.

Re: scratch variables in do-repeat loop

Art Kendall
In reply to this post by Art Kendall
Correction 1 million cases.

On 2/23/2011 1:03 PM, Art Kendall wrote:
[snip]

Re: scratch variables in do-repeat loop

Art Kendall
In reply to this post by David Marso
I did not change that part of the syntax. My runs were for 1 million cases!

I watched the meter: the whole process got up to almost 30% at times during the run.

I was using exactly your syntax except for the two changes I mentioned.
I changed the format for printing the time to show more precision. The syntax is below.

The new syntax adds your two suggested additional methods and a version of ANY with the TRIMs taken off. It is below my sig.

"Loop 1" is the old loop approach. "Loop 2" and "Loop 3" are the two in David's post. "Any 2" matches the complete 7-character variables.
Clearly the TRIMs make a difference.

what                 clocktime              elapsed
 
start        3754960:41:15.314                 .
loop 1       3754960:41:26.517          0:00:11.203
loop 2       3754960:41:29.376          0:00:02.859
loop 3       3754960:41:32.173          0:00:02.797
any          3754960:41:47.423          0:00:15.250
any 2        3754960:41:48.658          0:00:01.235
concat       3754960:41:55.470          0:00:06.812
repeat       3754960:41:58.298          0:00:02.828


I'll be on the road and I'll put the new syntax on my laptop. It should take longer since it only has two processors.

Art Kendall
Social Research Consultants

new file.
dataset name timediff.
data list list/ what (a9) clocktime (time20.3).
begin data
'start'   3754960:41:15.314
'loop 1'   3754960:41:26.517
'loop 2'   3754960:41:29.376
'loop 3'   3754960:41:32.173
'any'   3754960:41:47.423
'any 2'   3754960:41:48.658
'concat'   3754960:41:55.470
'repeat'   3754960:41:58.298
end data.
compute elapsed = clocktime - lag(clocktime).
format elapsed (time20.3).
list.


--------

new file.
dataset name biggie.
input program.
VECTOR test(10,A7).
loop #i = 1 to 1000000.
DO REPEAT t=test1 to test10.
compute t=concat(" ",string(trunc(uniform(99999)),F5)," ").
compute #=uniform(1).
if # > .90 t=" 50000 ".
END REPEAT.
end case.
end loop.
end file.
end input program.
exe.

DO IF $CASENUM=1.
PRINT /"'start'"   $time (time20.3).
END IF.
VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  COMPUTE MAMM01=LTRIM(RTRIM(test(#)))="50000".
+  DO IF MAMM01.
+    COMPUTE last01=#.
+    BREAK.
+  END IF.
END LOOP.
exe.

DO IF $CASENUM=1.
PRINT /"'loop 1'"   $time (time20.3).
END IF.
VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  COMPUTE MAMM01=test(#)=" 50000 ".
+  DO IF MAMM01.
+    BREAK.
+  END IF.
END LOOP.
exe.

DO IF $CASENUM=1.
PRINT /"'loop 2'"   $time (time20.3).
END IF.
VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  DO IF test(#)=" 50000 ".
+  COMPUTE MAMM01=1.
+    BREAK.
+  END IF.
END LOOP.
EXECUTE.

DO IF $CASENUM=1.
PRINT /"'loop 3'"   $time (time20.3).
END IF.
compute mamm02 = any("50000", ltrim(rtrim(test1)),
                              ltrim(rtrim(test2)),
                              ltrim(rtrim(test3)),
                              ltrim(rtrim(test4)),
                              ltrim(rtrim(test5)),
                              ltrim(rtrim(test6)),
                              ltrim(rtrim(test7)),
                              ltrim(rtrim(test8)),
                              ltrim(rtrim(test9)),
                              ltrim(rtrim(test10))     ).
EXE.
DO IF $CASENUM=1.
PRINT /"'any'"   $time (time20.3).
END IF.
compute mamm02 = any(" 50000 ",test1 to test10).
EXE.
DO IF $CASENUM=1.
PRINT /"'any 2'"   $time (time20.3).
END IF.

compute mamm03 =  index(concat(test1,
                         'x',test2,
                         'x',test3,
                         'x',test4,
                         'x',test5,
                         'x',test6,
                         'x',test7,
                         'x',test8,
                         'x',test9,
                         'x',test10),"50000") gt 0.
EXE.
DO IF $CASENUM=1.
PRINT / "'concat'"   $time (time20.3).
END IF.
compute mamm04=0.
do repeat test = test1 to test10.
+   if (test = " 50000 ") mamm04 = mamm04 + 1.
end repeat.
EXE.
DO IF $CASENUM=1.
PRINT /"'repeat'"   $time (time20.3).
END IF.
execute.




On 2/23/2011 2:14 PM, David Marso wrote:
Looks like LTRIM and RTRIM ***really*** slow it down.
So if one knows that there are leading or trailing spaces accommodate that
in the code directly!!!
Simple string compare is fast is comparison.
Wonder how another version using NUMBER(test(#),F6) would compare?
I am VERY surprised that the INDEX(CONCAT...) method is this fast.
One other variation would be to use the direct string comparison within the
loop.
Also get rid of the extra compute which was simply a check to verify early
loop termination.
The early termination *should* be faster than the DO REPEAT at least in the
second version below unless there is a *huge* overhead to allocate a
vector!!!
Might also want to run multiple repetitions of the various methods and
counterbalance their order.
are you running 100,000 cases or 1,000,000 cases as in my program?

VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  COMPUTE MAMM01=test(#)=" 50000 ".
+  DO IF MAMM01.
+    BREAK.
+  END IF.
END LOOP.

also the following variation!

VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  DO IF test(#)=" 50000 "
+  COMPUTE MAMM01=1.
+    BREAK.
+  END IF.
END LOOP.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/scratch-variables-in-do-repeat-loop-tp3395727p3397561.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall
Social Research Consultants

Re: scratch variables in do-repeat loop

Bruce Weaver
Administrator
Art Kendall wrote
I did not change that part of the syntax.  My runs were for 1 million cases!

I watched the meter; the whole process got up to almost 30% at times during the run.

I was using exactly your syntax except for the two changes I mentioned.
I changed the format for printing the time to show more precision. The syntax is below.

The new syntax adds your two suggested additional methods and takes the TRIMs off the ANY.  It is below my sig.

"Loop 1" is the old loop approach.   "Loop 2" and "Loop 3" are the two in David's post. "Any 2" matches the complete 7-character variables.
Clearly the TRIMs make a difference.

what                 clocktime              elapsed
 
start        3754960:41:15.314                 .
loop 1       3754960:41:26.517          0:00:11.203
loop 2       3754960:41:29.376          0:00:02.859
loop 3       3754960:41:32.173          0:00:02.797
any          3754960:41:47.423          0:00:15.250
any 2        3754960:41:48.658          0:00:01.235
concat       3754960:41:55.470          0:00:06.812
repeat       3754960:41:58.298          0:00:02.828

I'll be on the road and I'll put the new syntax on my laptop. It should take longer since it only has two processors.

Art Kendall
Social Research Consultants

Art, here are my results for v18 running on a 4 or 5 year-old ThinkPad laptop that is painfully slow, apparently.  ;-)

what                 clocktime              elapsed

start        3754961:07:00.640                 .
loop 1       3754961:07:35.125          0:00:34.485
loop 2       3754961:07:44.812          0:00:09.687
loop 3       3754961:07:54.765          0:00:09.953
any          3754961:08:44.859          0:00:50.094
any 2        3754961:08:49.687          0:00:04.828
concat       3754961:09:13.234          0:00:23.547
repeat       3754961:09:23.468          0:00:10.234


And again, sorted by "elapsed".

what                   elapsed

any 2              0:00:04.828
loop 2             0:00:09.687
loop 3             0:00:09.953
repeat             0:00:10.234
concat             0:00:23.547
loop 1             0:00:34.485
any                0:00:50.094
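
A sorted listing like the one above can be produced with the same approach as the timediff syntax posted in this thread. A minimal sketch, assuming the clock times from the v18 ThinkPad run are typed in by hand; the dataset name timediff_sorted is made up for illustration:

```spss
new file.
dataset name timediff_sorted.
data list list/ what (a9) clocktime (time20.3).
begin data
'start'   3754961:07:00.640
'loop 1'  3754961:07:35.125
'loop 2'  3754961:07:44.812
'loop 3'  3754961:07:54.765
'any'     3754961:08:44.859
'any 2'   3754961:08:49.687
'concat'  3754961:09:13.234
'repeat'  3754961:09:23.468
end data.
* Elapsed time is the difference from the previous row.
compute elapsed = clocktime - lag(clocktime).
format elapsed (time20.3).
* The start row has no previous row so its elapsed value is system-missing.
sort cases by elapsed (A).
list variables = what elapsed.
```

The 'start' row is kept in this sketch, so it is expected to appear in the listing with a missing elapsed value.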

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: scratch variables in do-repeat loop

Art Kendall
In reply to this post by Art Kendall
From my laptop.

what                 clocktime              elapsed
 
start        3754961:30:51.689                 .
loop 1       3754961:31:12.889          0:00:21.200
loop 2       3754961:31:17.975          0:00:05.086
loop 3       3754961:31:23.091          0:00:05.116
any          3754961:31:52.576          0:00:29.485
any 2        3754961:31:54.697          0:00:02.121
concat       3754961:32:07.505          0:00:12.808
repeat       3754961:32:12.902          0:00:05.397


Art

On 2/23/2011 4:55 PM, Art Kendall wrote:

new file.
dataset name timediff.
data list list/ what (a9) clocktime (time20.3).
begin data
'start'   3754960:41:15.314
'loop 1'   3754960:41:26.517
'loop 2'   3754960:41:29.376
'loop 3'   3754960:41:32.173
'any'   3754960:41:47.423
'any 2'   3754960:41:48.658
'concat'   3754960:41:55.470
'repeat'   3754960:41:58.298
end data.
compute elapsed = clocktime - lag(clocktime).
format elapsed (time20.3).
list.


--------

new file.
dataset name biggie.
input program.
VECTOR test(10,A7).
loop #i = 1 to 1000000.
DO REPEAT t=test1 to test10.
compute t=concat(" ",string(trunc(uniform(99999)),F5)," ").
compute #=uniform(1).
if # > .90 t=" 50000 ".
END REPEAT.
end case.
end loop.
end file.
end input program.
exe.

DO IF $CASENUM=1.
PRINT /"'start'"   $time (time20.3).
END IF.
VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  COMPUTE MAMM01=LTRIM(RTRIM(test(#)))="50000".
+  DO IF MAMM01.
+    COMPUTE last01=#.
+    BREAK.
+  END IF.
END LOOP.
exe.

DO IF $CASENUM=1.
PRINT /"'loop 1'"   $time (time20.3).
END IF.
VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  COMPUTE MAMM01=test(#)=" 50000 ".
+  DO IF MAMM01.
+    BREAK.
+  END IF.
END LOOP.
exe.

DO IF $CASENUM=1.
PRINT /"'loop 2'"   $time (time20.3).
END IF.
VECTOR test=test1 to test10.
LOOP #=1 to 10.
+  DO IF test(#)=" 50000 ".
+    COMPUTE MAMM01=1.
+    BREAK.
+  END IF.
END LOOP.
EXECUTE.

DO IF $CASENUM=1.
PRINT /"'loop 3'"   $time (time20.3).
END IF.
compute mamm02 = any("50000", ltrim(rtrim(test1)),
                              ltrim(rtrim(test2)),
                              ltrim(rtrim(test3)),
                              ltrim(rtrim(test4)),
                              ltrim(rtrim(test5)),
                              ltrim(rtrim(test6)),
                              ltrim(rtrim(test7)),
                              ltrim(rtrim(test8)),
                              ltrim(rtrim(test9)),
                              ltrim(rtrim(test10))     ).
EXE.
DO IF $CASENUM=1.
PRINT /"'any'"   $time (time20.3).
END IF.
compute mamm02 = any(" 50000 ",test1 to test10).
EXE.
DO IF $CASENUM=1.
PRINT /"'any 2'"   $time (time20.3).
END IF.

compute mamm03 =  index(concat(test1,
                         'x',test2,
                         'x',test3,
                         'x',test4,
                         'x',test5,
                         'x',test6,
                         'x',test7,
                         'x',test8,
                         'x',test9,
                         'x',test10),"50000") gt 0.
EXE.
DO IF $CASENUM=1.
PRINT / "'concat'"   $time (time20.3).
END IF.
compute mamm04=0.
do repeat test = test1 to test10.
+   if (test = " 50000 ") mamm04 = mamm04 + 1.
end repeat.
EXE.
DO IF $CASENUM=1.
PRINT /"'repeat'"   $time (time20.3).
END IF.
execute.
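
One further variant, not timed in this thread, would be the COUNT command, which tallies matches across a variable list in a single transformation. A sketch that could be appended to the timing script above, assuming the codes keep the padded 7-character form " 50000 "; the variable name mamm_cnt is made up for illustration:

```spss
* Hypothetical COUNT variant - not one of the methods timed in this thread.
count mamm_cnt = test1 to test10 (" 50000 ").
EXE.
DO IF $CASENUM=1.
PRINT /"'count'"   $time (time20.3).
END IF.
execute.
```

Like the DO REPEAT version, this yields the number of matching variables per case rather than a 0/1 flag, so a value above zero means at least one hit.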





Art Kendall
Social Research Consultants