SAS-Code -> SPSS-Code = spaghetti code ?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

SAS-Code -> SPSS-Code = spaghetti code ?

Hans-Peter Oskamp
Hi,

I'm new to SPSS and habe some experience in SAS.
I managed to "translate" below SAS-Statements into SPSS (i.e. correct result) but my SPSS-Code looks like spaghetti code compared to the - in my eyes - quite readable SAS-Code (multiple Match Statements for 1 step in SAS).

I'd like to ask if somebody knows a way to code one or both of the two tasks below in SPSS/Python that is quite elegant in terms of readability.



/* TabA & TabB unique by sorted by <key1,key2> */
/* create inner join &  2 result files wit the rows that are not matched, keep only certain vars*/

Data in_A_and_in_B
        in_A_and_not_in_B
        not_in_A_and_in_B
;
SET TabA (IN=A KEEP= key1 key2 var1a var2a)
        TabB (IN=B KEEP= key1 key2 var1a var2a)
;
BY key1 key2;
IF A AND B THEN OUTPUT in_A_and_in_B;
ELSE IF A THEN OUTPUT in_A_and_not_in_B;
ELSE OUTPUT not_in_A_and_in_B;
END;
RUN;



/* cartesian product with SAS, TabC & TabD not unique by <key1,key2> */

PROC SQL;
CREATE TABLE cartesian AS
SELECT C.* D.*
FROM   TabC, TabD
WHERE  TabC.key1 = TabD.key1
   AND  TabC.key2 = TabD.key2
;
QUIT;
Reply | Threaded
Open this post in threaded view
|

Re: SAS-Code -> SPSS-Code I. Combining files

Richard Ristow
At 08:48 AM 10/11/2010, Hans-Peter Oskamp wrote:

I managed to "translate" below SAS-Statements into SPSS (i.e. correct
result) but my SPSS-Code looks like spaghetti code. I'd like to ask if somebody knows a way to code one or both of the two tasks below in SPSS/Python that is quite elegant in terms of readability.

SAS task I: combining files.

/* TabA & TabB unique by sorted by <key1,key2>       */
/* create inner join &  2 result files with the      */
/* rows that are not matched, keep only certain vars */

Data    in_A_and_in_B
        in_A_and_not_in_B
        not_in_A_and_in_B;
SET     TabA (IN=A KEEP= key1 key2 var1a var2a)
        TabB (IN=B KEEP= key1 key2 var1a var2a)
;
BY key1 key2;
IF      A AND B THEN OUTPUT   in_A_and_in_B;
ELSE IF A       THEN OUTPUT   in_A_and_not_in_B;
ELSE                 OUTPUT   not_in_A_and_in_B;
END;
RUN;

This SAS code may not be right. First, is the "END;" statement valid here? Second, I believe it will result in "in_A_and_not_in_B" having the same records as "TabA"; "not_in_A_and_in_B" having the same records as "TabB"; and "in_A_and_in_B" being empty.

However, one can often translate this kind of SAS code to SPSS by using SPSS statement "ADD FILES" for SAS "SET"; "XSAVE" for "OUTPUT"; and "EXECUTE" for "RUN", like this (code not tested):

ADD FILES
   /FILE=TabA /IN=A
   /FILE=TabB /IN=B
   /BY   key1 key2
   /KEEP=key1 key2 var1a var2a.

DO IF    A AND B.
.  XSAVE OUTFILE=in_A_and_in_B     /DROP=A B.
ELSE IF  A.
.  XSAVE OUTFILE=in_A_and_not_in_B /DROP=A B.
ELSE.
.  XSAVE OUTFILE=not_in_A_and_in_B /DROP=A B.
END IF.
EXECUTE /* which IS needed, here */.

In this code, "in_A_and_in_B", "in_A_and_not_in_B", and "not_in_A_and_in_B" must be files, not SPSS datasets. You can use the SPSS "FILE HANDLE" statement to associate a name to a complete data path, filename, and extension.

-Best of luck,
 Richard Ristow
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: SAS-Code -> SPSS-Code I. Combining files

David Marso
Administrator
ADD FILES
   /FILE=TabA
   /IN=A
   /FILE=TabB
   /IN=B
   /BY  key1 key2
   /KEEP=key1 key2 var1a var2a.

DO IF  A AND B.
.....

I suspect that ADD FILES needs to be replaced with MATCH FILES for the flags
to end up on the same records.



On Sun, 17 Oct 2010 03:35:04 -0400, Richard Ristow <[hidden email]>
wrote:

><html>
><body>
>At 08:48 AM 10/11/2010, Hans-Peter Oskamp wrote:<br><br>
><blockquote type=cite class=cite cite="">I managed to
>&quot;translate&quot; below SAS-Statements into SPSS (i.e. correct<br>
>result) but my SPSS-Code looks like spaghetti code. I'd like to ask if
>somebody knows a way to code one or both of the two tasks below in
>SPSS/Python that is quite elegant in terms of
>readability.</blockquote><br>
><b>SAS task I:</b> combining files. <br><br>
><blockquote type=cite class=cite cite=""><font size=2>/* TabA &amp; TabB
>unique by sorted by &lt;key1,key2&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>*/<br>
>/* create inner join &amp;&nbsp; 2 result files with
>the&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; */<br>
>/* rows that are not matched, keep only certain vars */<br><br>
>Data&nbsp;&nbsp;&nbsp; in_A_and_in_B<br>
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; in_A_and_not_in_B<br>
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; not_in_A_and_in_B;<br>
>SET&nbsp;&nbsp;&nbsp;&nbsp; TabA (IN=A KEEP= key1 key2 var1a var2a)<br>
>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TabB (IN=B KEEP= key1 key2
>var1a var2a)<br>
>;<br>
>BY key1 key2;<br>
>IF&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; A AND B THEN OUTPUT&nbsp;&nbsp;
>in_A_and_in_B;<br>
>ELSE IF A&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; THEN OUTPUT&nbsp;&nbsp;
>in_A_and_not_in_B;<br>
>ELSE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>OUTPUT&nbsp;&nbsp; not_in_A_and_in_B;<br>
>END;<br>
>RUN;</font></blockquote><br>
>This SAS code may not be right. First, is the
>&quot;<font size=2>END;</font>&quot; statement valid here? Second, I
>believe it will result in
>&quot;<font size=2>in_A_and_not_in_B</font>&quot; having the same records
>as &quot;<font size=2>TabA</font>&quot;;
>&quot;<font size=2>not_in_A_and_in_B</font>&quot; having the same records
>as &quot;<font size=2>TabB</font>&quot;; and
>&quot;<font size=2>in_A_and_in_B</font>&quot; being empty.<br><br>
>However, one can often translate this kind of SAS code to SPSS by using
>SPSS statement &quot;<font size=2>ADD FILES</font>&quot; for SAS
>&quot;<font size=2>SET</font>&quot;;
>&quot;<font size=2>XSAVE</font>&quot; for
>&quot;<font size=2>OUTPUT</font>&quot;; and
>&quot;<font size=2>EXECUTE</font>&quot; for
>&quot;<font size=2>RUN</font>&quot;, like this (code not
>tested):<br><br>
><font size=2>ADD FILES<br>
>&nbsp;&nbsp; /FILE=TabA /IN=A<br>
>&nbsp;&nbsp; /FILE=TabB /IN=B<br>
>&nbsp;&nbsp; /BY&nbsp;&nbsp; key1 key2<br>
>&nbsp;&nbsp; /KEEP=key1 key2 var1a var2a.<br><br>
>DO IF&nbsp;&nbsp;&nbsp; A AND B.<br>
>.&nbsp; XSAVE OUTFILE=in_A_and_in_B&nbsp;&nbsp;&nbsp;&nbsp; /DROP=A
>B.<br>
>ELSE IF&nbsp; A.<br>
>.&nbsp; XSAVE OUTFILE=in_A_and_not_in_B /DROP=A B.<br>
>ELSE.<br>
>.&nbsp; XSAVE OUTFILE=not_in_A_and_in_B /DROP=A B.<br>
>END IF.<br>
>EXECUTE /* which IS needed, here */.<br><br>
></font>In this code, &quot;<font size=2>in_A_and_in_B</font>&quot;,
>&quot;<font size=2>in_A_and_not_in_B</font>&quot;, and
>&quot;<font size=2>not_in_A_and_in_B</font>&quot; must be <i>files,</i>
>not SPSS datasets. You can use the SPSS &quot;<font size=2>FILE
>HANDLE</font>&quot; statement to associate a name to a complete data
>path, filename, and extension.<br><br>
>-Best of luck,<br>
>&nbsp;Richard Ristow</body>
><br>
></html>
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
>[hidden email] (not to SPSSX-L), with no body text except the
>command. To leave the list, send the command
>SIGNOFF SPSSX-L
>For a list of commands to manage subscriptions, send the command
>INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: SAS-Code -> SPSS-Code I. Combining files

Richard Ristow
At 09:28 AM 10/18/2010, David Marso wrote:

[In code]

ADD FILES
   /FILE=TabA
   /IN=A
   /FILE=TabB
   /IN=B
   /BY  key1 key2
   /KEEP=key1 key2 var1a var2a.

DO IF  A AND B.

I suspect that ADD FILES needs to be replaced with MATCH FILES for the flags to end up on the same records.

Something like that may well be the case; it would correspond to using "MERGE" instead of "SET" in the SAS code.

Hans-Peter, what do you make of this? I posted with "ADD FILES" because, first I didn't want to tweak your logic without being sure I understood what you're trying to do; second, "MERGE" (SAS) or "MATCH FILES" (SPSS) are rarely advisable for input files that have exactly the same sets of variables, as (after the "KEEP" clauses) your inputs do.
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: SAS-Code -> SPSS-Code II. Inner join

Richard Ristow
In reply to this post by Hans-Peter Oskamp
At 08:48 AM 10/11/2010, Hans-Peter Oskamp wrote:
At 08:48 AM 10/11/2010, Hans-Peter Oskamp wrote:

I managed to "translate" below SAS-Statements into SPSS (i.e. correct
result) but my SPSS-Code looks like spaghetti code. I'd like to ask if somebody knows a way to code one or both of the two tasks below in SPSS/Python that is quite elegant in terms of readability.

SAS task II: inner join

/* cartesian product with SAS,           */
/* TabC & TabD not unique by <key1,key2> */

PROC SQL;
CREATE TABLE cartesian AS
SELECT C.* D.*
FROM   TabC, TabD
WHERE  TabC.key1 = TabD.key1
   AND TabC.key2 = TabD.key2
;
QUIT;

In SAS, now, PROC SQL directly supports this operation, the 'inner join' in SQL terminology, and commonly called a 'many-to-many merge' here on the SPSS list. (Strictly, 'cartesian product' applies where there's no equivalent of the 'WHERE' clause.)

SPSS does not have an equivalent built-in facility. However, there is a satisfactory algorithm, which I credit to David Marso and myself. A tested version, with test data, was posted to SPSSX-L as

Date:     Thu, 26 Nov 2009 14:34:54 -0500
From:     Richard Ristow <[hidden email]>
Subject:  Many-to-many merge in SPSS
To:       [hidden email]
X-ELNK-Received-Info: spv=0;
X-ELNK-AV: 0
X-ELNK-Info: sbv=0; sbrc=.0; sbf=0b; sbw=000;
X-Antivirus: AVG for E-mail 8.5.426 [270.14.83/2528]
Content-Type: multipart/mixed; boundary="=======AVGMAIL-244336AA======="

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD