Interesting bit of code ;-)


Interesting bit of code ;-)

David Marso
Administrator
DUMMY Variables...
--------------------
*G4 exists in the data file and has integer values between 1 and 4.
**COMMENTS??**.
RECODE G4 (1=1) INTO DX1
     / G4 (2=1) INTO DX2
     / G4 (3=1) INTO DX3
     / DX1 DX2 DX3 (MISSING=0).

OTOH:  The following is more concise with a large number of groups ;-)

NUMERIC DX1 TO DX4 (F1).
RECODE DX1 TO DX4 (ELSE=0).
VECTOR DX=DX1 TO DX4.
COMPUTE DX(G4)=1.

The following IMNSHO is abysmal.
DO IF G4=1.
+  COMPUTE DX1=1.
ELSE IF G4=2.
+  COMPUTE DX2=1.
ELSE IF G4=3.
+  COMPUTE DX3=1.
END IF.
RECODE DX1 TO DX4 (MISSING=0)(ELSE=COPY).
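
For readers without SPSS at hand, here is a minimal Python sketch of what the single RECODE at the top of this post does for each case. It is an illustration under assumptions, not SPSS itself: None stands in for system-missing, and the function name recode_into_dummies is made up for this example.

```python
# Sketch of the single-RECODE idea; None stands in for SPSS system-missing.
def recode_into_dummies(g, levels=(1, 2, 3)):
    """Return one 0/1 indicator per listed level; an unlisted level (here 4) is the reference."""
    # Step 1: "(level=1) INTO DXn" sets 1 on a match and leaves the new variable missing otherwise.
    dummies = [1 if g == level else None for level in levels]
    # Step 2: "(MISSING=0)" on the same variables fills the untouched ones with 0.
    return [0 if d is None else d for d in dummies]


for g in (1, 2, 3, 4, None):
    print(g, recode_into_dummies(g))
```

In this sketch a case whose G4 is missing ends up as all zeros, the same as reference group 4, because the second step fills every untouched indicator. That may or may not be what you want for such cases.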

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Interesting bit of code ;-)

Jon K Peck
Or use the SPSSINC CREATE DUMMIES extension command.  It can handle multiple variables in one command, doesn't require specification of the categories, and makes appropriate variable labels, too.  It can even create 2- and 3-way interaction terms.

Example:
SPSSINC CREATE DUMMIES VARIABLE=y
ROOTNAME = ydummies
/OPTIONS ORDER=A USEVALUELABELS=YES
MACRONAME="!ydummies" OMITFIRST=YES.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        David Marso <[hidden email]>
To:        [hidden email]
Date:        02/01/2012 01:39 PM
Subject:        [SPSSX-L] Interesting bit of code ;-)
Sent by:        "SPSSX(r) Discussion" <[hidden email]>







--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Interesting-bit-of-code-tp5448698p5448698.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.


Re: Interesting bit of code ;-)

Marks, Jim
In reply to this post by David Marso
Also interesting-- DO REPEAT + TO keyword instead of multiple DO IF
statements:

new file.
data list free /g10 (f8.0).
begin data
 1 2 3 4 5 6 7 8 9 10
End data.

Dataset name g10 window = front.
list.

do repeat x = dx1 to dx10 /y = 1 TO 10.
COMPUTE x = 0.
if g10 = y x = 1.
end repeat.
list.



Jim Marks
Director, Market Research
x1616



Re: Interesting bit of code ;-)

David Marso
Administrator
In reply to this post by Jon K Peck
I had a feeling ;-)
---
OTOH:  I have been using SPSS for some 25+ years and had never thought
to use a single RECODE to map a single variable onto 3 new ones and to
then RECODE these new variables which are defined on the same recode
to resolve missing values.
Hats off to Tex Hull, Jon Fry, Bill Hoskins and I'm sure others who
designed the elegant guts of this crazy program ;-))!


Re: Interesting bit of code ;-)

David Marso
Administrator
In reply to this post by Marks, Jim
Or :
do repeat x = dx1 to dx10 /y = 1 TO 10.
COMPUTE x = (g10 EQ y).
end repeat.
list.
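
One difference between the two DO REPEAT bodies is worth spelling out: they agree whenever g10 is valid, but not when it is missing. The Python sketch below models that under the assumption that None stands in for system-missing; the function names two_step and one_step are invented for this illustration.

```python
# None stands in for SPSS system-missing.
def two_step(g, level):
    """Models: COMPUTE x = 0.  followed by  IF g10 = y x = 1."""
    x = 0
    if g is not None and g == level:  # IF does nothing when the test is missing
        x = 1
    return x


def one_step(g, level):
    """Models: COMPUTE x = (g10 EQ y)."""
    if g is None:  # a comparison with a missing operand is itself missing
        return None
    return int(g == level)


for g in (3, None):
    print(g, [two_step(g, y) for y in (1, 2, 3)], [one_step(g, y) for y in (1, 2, 3)])
```

So the one-line version propagates a missing g10 into every indicator, while the two-line version quietly codes such cases as 0 everywhere.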


Re: Interesting bit of code ;-)

Bruce Weaver
Administrator
In reply to this post by David Marso
I like DO-REPEAT for this task.  It's very transparent, I think, and not significantly more likely to cause RSI than the other methods you show.  ;-)

DATA LIST FREE / g4 (F1).
BEGIN DATA
1 2 3 4
END DATA.

NUMERIC dx1 TO dx4 (F1).
DO REPEAT dx = dx1 to dx4 / # = 1 to 4 .
- COMPUTE dx = g4 EQ #.
END REPEAT.
LIST.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Interesting bit of code ;-)

David Marso
Administrator
OTOH It is less efficient than the VECTOR approach.
N computes rather than 1.
I'm not sure about the RECODE WRT processing efficiency.  My point was the interesting way that RECODE allows multiple vars to be created from one variable and the ability to subsequently recode these new variables in the single recode statement.
--
Remember DO REPEAT cycles through the entire list of stand in 'variables'.
--
NUMERIC dx1 TO dx4 (F1).

DO REPEAT dx = dx1 to dx4 / # = 1 to 4 .
- COMPUTE dx = g4 EQ #.
END REPEAT PRINT.
 
  18  0 +COMPUTE        DX1 = G4 EQ 1
  19  0 +COMPUTE        DX2 = G4 EQ 2
  20  0 +COMPUTE        DX3 = G4 EQ 3
  21  0 +COMPUTE        DX4 = G4 EQ 4
 
LIST.
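
The PRINT listing above shows DO REPEAT acting as a text expander: one generated command per position in the stand-in lists. A rough Python sketch of that expansion follows. The helper expand_do_repeat and the {dx}/{n} placeholder style are assumptions made for illustration and are not how SPSS implements it.

```python
def expand_do_repeat(template, **stand_ins):
    """Substitute the stand-in lists into the template, position by position."""
    lengths = {len(values) for values in stand_ins.values()}
    if len(lengths) != 1:
        raise ValueError("all stand-in lists must have the same length")
    names = list(stand_ins)
    # One generated command per position in the lists.
    return [
        template.format(**dict(zip(names, values)))
        for values in zip(*stand_ins.values())
    ]


commands = expand_do_repeat(
    "COMPUTE {dx} = G4 EQ {n}",
    dx=["DX1", "DX2", "DX3", "DX4"],
    n=[1, 2, 3, 4],
)
print("\n".join(commands))
```

With four levels this yields four COMPUTE lines, each evaluated for every case, which is the "N computes rather than 1" point.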

Re: Interesting bit of code ;-)

Jon K Peck
There would not be any significant difference between DO REPEAT and unrolling the computation into specific computes.  DO REPEAT saves a little parsing time, but that would be trivial compared to the calculation time in most cases.

RECODE is generally much more efficient than separate COMPUTEs or a DO IF approach.





From:        David Marso <[hidden email]>
To:        [hidden email]
Date:        02/01/2012 04:07 PM
Subject:        Re: [SPSSX-L] Interesting bit of code ;-)
Sent by:        "SPSSX(r) Discussion" <[hidden email]>






--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Interesting-bit-of-code-tp5448698p5449105.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.


Re: Interesting bit of code ;-)

Bruce Weaver
Administrator
In reply to this post by David Marso
You're right about efficiency.  But unless one is working with a HUGE data file, the difference will likely be imperceptible (to the human eye, at least).  And in that case, transparency should trump efficiency.  

I feel like I'm stealing material from Art Kendall here.  ;-)

p.s. - I'd never noticed that one can use PRINT like that on END REPEAT.  Thanks for educating me (once again).



Re: Interesting bit of code ;-)

David Marso
Administrator
Yeah, BUT I find the following to be as transparent as glass ;-).
ONE compute per case rather than 50 and NO logical comparison required.
NUMERIC state_01 TO state_50 (F1).
RECODE state_01 TO state_50 (ELSE=0).
VECTOR state_dummy=state_01 TO state_50.
COMPUTE state_dummy(state)=1.

"p.s. - I'd never noticed that one can use PRINT like that on END REPEAT.  Thanks for educating me (once again)."
Glad to be of service!
--
** You can also use:
END REPEAT NOPRINT (but WTF? there for the sake of completeness???).
--
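
To make the "ONE compute per case" argument concrete, here is a small Python sketch that contrasts indexing with comparing. It is only an analogy to the VECTOR code: the function names are hypothetical, and the range guard is an assumption of this sketch, since the thread does not say how out-of-range codes should be treated.

```python
def dummies_by_index(code, n_levels):
    """Zero-fill n_levels indicators, then set the one addressed by code (1-based)."""
    row = [0] * n_levels                       # like RECODE ... (ELSE=0)
    if isinstance(code, int) and 1 <= code <= n_levels:
        row[code - 1] = 1                      # like COMPUTE state_dummy(state) = 1
    return row


def dummies_by_comparison(code, n_levels):
    """One logical comparison per level, as the expanded DO REPEAT does."""
    return [int(code == level) for level in range(1, n_levels + 1)]


# Both routes agree for every valid state code from 1 to 50.
agree = all(
    dummies_by_index(s, 50) == dummies_by_comparison(s, 50) for s in range(1, 51)
)
print(agree)
```

The indexed version does a single assignment per case after the zero fill; the comparison version does fifty tests per case to reach the same row.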

Re: Interesting bit of code ;-)

Bruce Weaver
Administrator
Yes, it becomes more transparent the longer one looks at it.  I was just thinking about how to extend it to the case of a factorial design (only for folks with old versions that won't support the Python-based dummy variable generator, of course).  Something like this, I suppose.

* Generate A and B variables for a 3x4 factorial design.

data list free / a b (2f1).
begin data
1 1  1 2  1 3  1 4
2 1  2 2  2 3  2 4
3 1  3 2  3 3  3 4
end data.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /maxB 'Max value of B'=MAX(B).
FORMATS a b maxB (f1.0).

* Now generate indicator variables for A, B, and A*B .

NUMERIC A1 TO A3 B1 TO B4
        A1B1 A1B2 A1B3 A1B4
        A2B1 A2B2 A2B3 A2B4
        A3B1 A3B2 A3B3 A3B4 (F1).
RECODE A1 TO A3B4 (ELSE=0). /* Initialize all indicators to 0.
VECTOR AV = A1 TO A3 / BV = B1 TO B4 / ABV = A1B1 TO A3B4 .
COMPUTE AV(A) = 1.
COMPUTE BV(B) = 1.
COMPUTE ABV((A-1)*maxB+B) = 1. /* Note the use of maxB here .
LIST A B A1 to A3B4.
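
The subscript (A-1)*maxB+B is easy to check by hand or with a few lines of Python. The sketch below assumes the interaction variables sit in the order A1B1, A1B2, ..., A3B4 (B varying fastest), as in the NUMERIC command above; the function name interaction_index is made up for this example.

```python
def interaction_index(a, b, max_b):
    """1-based position of cell (a, b) when b varies fastest."""
    return (a - 1) * max_b + b


MAX_A, MAX_B = 3, 4
# Variable names in the order the NUMERIC command creates them.
names = [f"A{a}B{b}" for a in range(1, MAX_A + 1) for b in range(1, MAX_B + 1)]

for a in range(1, MAX_A + 1):
    for b in range(1, MAX_B + 1):
        position = interaction_index(a, b, MAX_B)
        print(a, b, position, names[position - 1])
```

Each of the 12 cells maps to its own position from 1 to 12. The multiplier has to be the number of B levels because B is the faster-changing factor in the variable order; using the number of A levels instead would send different cells to the same element.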




Re: Interesting bit of code ;-)

Bruce Weaver
Administrator
Here's a small improvement to the NUMERIC command used to generate the indicator variables:

NUMERIC
  A1 TO A3
  B1 TO B4
  A1B1 TO A1B4
  A2B1 TO A2B4
  A3B1 TO A3B4 (F1).



Re: Interesting bit of code ;-)

David Marso
Administrator
In reply to this post by Bruce Weaver
And for a real mind-blower consider this MATRIX code ;-))
----
data list free / a b (2f1).
begin data
1 1  1 2  1 3  1 4
2 1  2 2  2 3  2 4
3 1  3 2  3 3  3 4
end data.
MATRIX.
GET X /VAR a b.
COMPUTE DESIGNX={X,DESIGN(X),KRONEKER(IDENT(CMAX(X(:,1))),IDENT(CMAX(X(:,2))))}.
SAVE DESIGNX / OUTFILE *.
END MATRIX.
-----
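
A plain-Python look at the Kronecker term may help explain why this works. The sketch uses hand-written ident and kron helpers (hypothetical names, no libraries) rather than the MATRIX functions, so treat it as an illustration of the arithmetic only.

```python
def ident(n):
    """n x n identity matrix as a list of rows."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


def kron(p, q):
    """Kronecker product: every element of p scales a full copy of q."""
    return [
        [pv * qv for pv in prow for qv in qrow]
        for prow in p
        for qrow in q
    ]


interaction = kron(ident(3), ident(4))
print(interaction == ident(12))

# Cells in the order of the example data: a = 1..3, b = 1..4 within each a.
cells = [(a, b) for a in range(1, 4) for b in range(1, 5)]
for row, (a, b) in zip(interaction, cells):
    print(a, b, row.index(1) + 1)
```

The product of a 3x3 and a 4x4 identity is simply a 12x12 identity, so row k flags interaction column k regardless of the values in that case. It lines up with the a and b values here because the example data has exactly one case per cell, sorted by a and then b; with replicated or unsorted cases the interaction block would presumably need to be built differently.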
Bruce Weaver wrote
Yes, it becomes more transparent the longer one looks at it.  I was just thinking about how to extend it to the case of a factorial design (only for folks with old versions that won't support the Python-based dummy variable generator, of course).  Something like this, I suppose.

* Generate A and B variables for a 3x4 factorial design.

data list free / a b (2f1).
begin data
1 1  1 2  1 3  1 4
2 1  2 2  2 3  2 4
3 1  3 2  3 3  3 4
end data.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /maxB 'Max value of B'=MAX(B).
FORMATS a b maxB (f1.0).

* Now generate indicator variables for A, B, and A*B .

NUMERIC A1 TO A3 B1 TO B4
        A1B1 A1B2 A1B3 A1B4
        A2B1 A2B2 A2B3 A2B4
        A3B1 A3B2 A3B3 A3B4 (F1).
RECODE A1 TO A3B4 (ELSE=0). /* Initialize all indicators to 0.
VECTOR AV = A1 TO A3 / BV = B1 TO B4 / ABV = A1B1 TO A3B4 .
COMPUTE AV(A) = 1.
COMPUTE BV(B) = 1.
COMPUTE ABV((A-1)*maxB+B) = 1. /* Note the use of maxB here .
LIST A B A1 to A3B4.
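
The subscript (A-1)*maxB+B used for ABV above can be checked outside SPSS. The following is a minimal Python sketch, assuming A runs 1 to 3 and B runs 1 to 4 as in the example; the function names are hypothetical and only mirror the arithmetic, not any SPSS API.

```python
def interaction_index(a, b, max_b):
    """1-based position of cell (a, b) when cells are laid out row by row."""
    return (a - 1) * max_b + b


max_a, max_b = 3, 4

# Same order as the NUMERIC declaration: A1B1 ... A1B4, A2B1 ... A3B4.
names = [f"A{a}B{b}" for a in range(1, max_a + 1) for b in range(1, max_b + 1)]


def interaction_dummies(a, b):
    """All interaction indicators for one case, keyed by variable name."""
    row = [0] * (max_a * max_b)                  # RECODE ... (ELSE=0)
    row[interaction_index(a, b, max_b) - 1] = 1  # COMPUTE ABV(...) = 1
    return dict(zip(names, row))


# Every (a, b) pair lands on the variable that carries its own name.
for a in range(1, max_a + 1):
    for b in range(1, max_b + 1):
        assert names[interaction_index(a, b, max_b) - 1] == f"A{a}B{b}"
```

The mapping only works if the interaction variables are declared in row-major order (all B levels within A1, then A2, and so on), which is why the order in the NUMERIC statement matters.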



David Marso wrote
Yeah, BUT I find the following to be as transparent as glass ;-).
ONE compute per case rather than 50 and NO logical comparison required.
NUMERIC state_01 TO state_50 (F1).
RECODE state_01 TO state_50 (ELSE=0).
VECTOR state_dummy=state_01 TO state_50.
COMPUTE state_dummy(state)=1.
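
For anyone who wants to see the "one compute, no comparison" point outside SPSS, here is a rough Python sketch contrasting the two strategies. It assumes integer codes from 1 to n_levels with no missing values; both function names are invented for the illustration.

```python
def dummies_by_index(value, n_levels):
    """VECTOR-style: zero everything, then set one element by position."""
    if not 1 <= value <= n_levels:
        raise ValueError("code outside 1..n_levels")
    row = [0] * n_levels   # RECODE state_01 TO state_50 (ELSE=0)
    row[value - 1] = 1     # COMPUTE state_dummy(state)=1
    return row


def dummies_by_comparison(value, n_levels):
    """DO REPEAT-style: one logical comparison per indicator."""
    return [int(value == level) for level in range(1, n_levels + 1)]


# Both strategies give the same indicators for every valid code.
for state in range(1, 51):
    assert dummies_by_index(state, 50) == dummies_by_comparison(state, 50)
```

The indexed version does one assignment per case after the zero fill, while the comparison version evaluates n_levels comparisons per case. The results are the same for valid codes; how each SPSS approach treats missing or out-of-range codes is not covered by this sketch.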

"p.s. - I'd never noticed that one can use PRINT like that on END REPEAT.  Thanks for educating me (once again)."
Glad to be of service!
--
** You can also use:
END REPEAT NOPRINT (but WTF? there for the sake of completeness???).
--
Bruce Weaver wrote
You're right about efficiency.  But unless one is working with a HUGE data file, the difference will likely be imperceptible (to the human eye, at least).  And in that case, transparency should trump efficiency.  

I feel like I'm stealing material from Art Kendall here.  ;-)

p.s. - I'd never noticed that one can use PRINT like that on END REPEAT.  Thanks for educating me (once again).


David Marso wrote
OTOH, it is less efficient than the VECTOR approach:
N computes per case rather than 1.
I'm not sure about the RECODE WRT processing efficiency.  My point was the interesting way that RECODE allows multiple vars to be created from one variable and the ability to subsequently recode these new variables in the single recode statement.
--
Remember that DO REPEAT cycles through the entire list of stand-in 'variables'.
--
NUMERIC dx1 TO dx4 (F1).

DO REPEAT dx = dx1 to dx4 / # = 1 to 4 .
- COMPUTE dx = g4 EQ #.
END REPEAT PRINT.
 
  18  0 +COMPUTE        DX1 = G4 EQ 1
  19  0 +COMPUTE        DX2 = G4 EQ 2
  20  0 +COMPUTE        DX3 = G4 EQ 3
  21  0 +COMPUTE        DX4 = G4 EQ 4
 
LIST.
Bruce Weaver wrote
I like DO-REPEAT for this task.  It's very transparent, I think, and not significantly more likely to cause RSI than the other methods you show.  ;-)

DATA LIST FREE / g4 (F1).
BEGIN DATA
1 2 3 4
END DATA.

NUMERIC dx1 TO dx4 (F1).
DO REPEAT dx = dx1 to dx4 / # = 1 to 4 .
- COMPUTE dx = g4 EQ #.
END REPEAT.
LIST.


David Marso wrote
DUMMY Variables...
--------------------
*G4 exists in the data file and has integer values between 1 and 4.
**COMMENTS??**.
RECODE G4 (1=1) INTO DX1
     / G4 (2=1) INTO DX2
     / G4 (3=1) INTO DX3
     / DX1 DX2 DX3 (MISSING=0).

OTOH:  The following is more concise with a large number of groups ;-)

NUMERIC DX1 TO DX4 (F1).
RECODE DX1 TO DX4 (ELSE=0).
VECTOR DX=DX1 TO DX4.
COMPUTE DX(G4)=1.

The following IMNSHO is abysmal.
DO IF G4=1.
+  COMPUTE DX1=1.
ELSE IF G4=2.
+  COMPUTE DX2=1.
ELSE IF G4=3.
+  COMPUTE DX3=1.
END IF.
RECODE DX1 TO DX4 (MISSING=0)(ELSE=COPY).

Re: Interesting bit of code ;-)

Garry Gelade
In reply to this post by Bruce Weaver
> Bruce Weaver wrote
>>
>> You're right about efficiency.  But unless one is working with a HUGE
>> data file, the difference will likely be imperceptible (to the human eye,
>> at least).  And in that case, transparency should trump efficiency.

I agree. In fact transparency should **ALWAYS** trump efficiency except in
extreme cases.

On my first course in commercial programming, the instructor began by saying
"Your objective as a professional programmer should be to write clear and
simple code."

It's advice I have no hesitation in passing on. The time spent in condensing
code to an irreducible and indecipherable minimum is usually time wasted,
and rarely necessary in an SPSS context. Writing transparent code that can
be easily understood and maintained (by oneself and others) is far more
important. Geeks who delight in producing obscure compressed code don't
usually last very long in a professional programming team. Their colleagues
soon get fed up with trying to decipher what they've done and working out
where the (inevitable) error is.


Garry
