SPSSX Discussion

SAS-Code -> SPSS-Code = spaghetti code ?

Classic

List

Threaded

5 messages Options

Hans-Peter Oskamp

SAS-Code -> SPSS-Code = spaghetti code ?

Hi,

I'm new to SPSS and habe some experience in SAS.
I managed to "translate" below SAS-Statements into SPSS (i.e. correct result) but my SPSS-Code looks like spaghetti code compared to the - in my eyes - quite readable SAS-Code (multiple Match Statements for 1 step in SAS).

I'd like to ask if somebody knows a way to code one or both of the two tasks below in SPSS/Python that is quite elegant in terms of readability.

/* TabA & TabB unique by sorted by <key1,key2> */
/* create inner join & 2 result files wit the rows that are not matched, keep only certain vars*/

Data in_A_and_in_B
in_A_and_not_in_B
not_in_A_and_in_B
;
SET TabA (IN=A KEEP= key1 key2 var1a var2a)
TabB (IN=B KEEP= key1 key2 var1a var2a)
;
BY key1 key2;
IF A AND B THEN OUTPUT in_A_and_in_B;
ELSE IF A THEN OUTPUT in_A_and_not_in_B;
ELSE OUTPUT not_in_A_and_in_B;
END;
RUN;

/* cartesian product with SAS, TabC & TabD not unique by <key1,key2> */

PROC SQL;
CREATE TABLE cartesian AS
SELECT C.* D.*
FROM TabC, TabD
WHERE TabC.key1 = TabD.key1
AND TabC.key2 = TabD.key2
;
QUIT;

Richard Ristow

Re: SAS-Code -> SPSS-Code I. Combining files

At 08:48 AM 10/11/2010, Hans-Peter Oskamp wrote:

I managed to "translate" below SAS-Statements into SPSS (i.e. correct
result) but my SPSS-Code looks like spaghetti code. I'd like to ask if somebody knows a way to code one or both of the two tasks below in SPSS/Python that is quite elegant in terms of readability.

SAS task I: combining files.

/* TabA & TabB unique by sorted by <key1,key2> */
/* create inner join & 2 result files with the */
/* rows that are not matched, keep only certain vars */

Data in_A_and_in_B
 in_A_and_not_in_B
 not_in_A_and_in_B;
SET TabA (IN=A KEEP= key1 key2 var1a var2a)
 TabB (IN=B KEEP= key1 key2 var1a var2a)
;
BY key1 key2;
IF A AND B THEN OUTPUT in_A_and_in_B;
ELSE IF A THEN OUTPUT in_A_and_not_in_B;
ELSE OUTPUT not_in_A_and_in_B;
END;
RUN;

This SAS code may not be right. First, is the "END;" statement valid here? Second, I believe it will result in "in_A_and_not_in_B" having the same records as "TabA"; "not_in_A_and_in_B" having the same records as "TabB"; and "in_A_and_in_B" being empty.

However, one can often translate this kind of SAS code to SPSS by using SPSS statement "ADD FILES" for SAS "SET"; "XSAVE" for "OUTPUT"; and "EXECUTE" for "RUN", like this (code not tested):

ADD FILES
   /FILE=TabA /IN=A
   /FILE=TabB /IN=B
   /BY   key1 key2
   /KEEP=key1 key2 var1a var2a.

DO IF    A AND B.
. XSAVE OUTFILE=in_A_and_in_B     /DROP=A B.
ELSE IF A.
. XSAVE OUTFILE=in_A_and_not_in_B /DROP=A B.
ELSE.
. XSAVE OUTFILE=not_in_A_and_in_B /DROP=A B.
END IF.
EXECUTE /* which IS needed, here */.

In this code, "in_A_and_in_B", "in_A_and_not_in_B", and "not_in_A_and_in_B" must be files, not SPSS datasets. You can use the SPSS "FILE HANDLE" statement to associate a name to a complete data path, filename, and extension.

-Best of luck,
Richard Ristow
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

David Marso

Re: SAS-Code -> SPSS-Code I. Combining files

Administrator

ADD FILES
/FILE=TabA
/IN=A
/FILE=TabB
/IN=B
/BY key1 key2
/KEEP=key1 key2 var1a var2a.

DO IF A AND B.
.....

I suspect that ADD FILES needs to be replaced with MATCH FILES for the flags
to end up on the same records.

On Sun, 17 Oct 2010 03:35:04 -0400, Richard Ristow <[hidden email]>
wrote:

><html>
><body>
>At 08:48 AM 10/11/2010, Hans-Peter Oskamp wrote: 
><blockquote type=cite class=cite cite="">I managed to
>"translate" below SAS-Statements into SPSS (i.e. correct 
>result) but my SPSS-Code looks like spaghetti code. I'd like to ask if
>somebody knows a way to code one or both of the two tasks below in
>SPSS/Python that is quite elegant in terms of
>readability.</blockquote> 
>SAS task I: combining files. 
><blockquote type=cite class=cite cite="">/* TabA & TabB
>unique by sorted by <key1,key2>      
>*/ 
>/* create inner join &  2 result files with
>the      */ 
>/* rows that are not matched, keep only certain vars */ 
>Data    in_A_and_in_B 
>        in_A_and_not_in_B 
>        not_in_A_and_in_B; 
>SET     TabA (IN=A KEEP= key1 key2 var1a var2a) 
>        TabB (IN=B KEEP= key1 key2
>var1a var2a) 
>; 
>BY key1 key2; 
>IF      A AND B THEN OUTPUT  
>in_A_and_in_B; 
>ELSE IF A       THEN OUTPUT  
>in_A_and_not_in_B; 
>ELSE                
>OUTPUT   not_in_A_and_in_B; 
>END; 
>RUN;</blockquote> 
>This SAS code may not be right. First, is the
>"END;" statement valid here? Second, I
>believe it will result in
>"in_A_and_not_in_B" having the same records
>as "TabA";
>"not_in_A_and_in_B" having the same records
>as "TabB"; and
>"in_A_and_in_B" being empty. 
>However, one can often translate this kind of SAS code to SPSS by using
>SPSS statement "ADD FILES" for SAS
>"SET";
>"XSAVE" for
>"OUTPUT"; and
>"EXECUTE" for
>"RUN", like this (code not
>tested): 
>ADD FILES 
>   /FILE=TabA /IN=A 
>   /FILE=TabB /IN=B 
>   /BY   key1 key2 
>   /KEEP=key1 key2 var1a var2a. 
>DO IF    A AND B. 
>.  XSAVE OUTFILE=in_A_and_in_B     /DROP=A
>B. 
>ELSE IF  A. 
>.  XSAVE OUTFILE=in_A_and_not_in_B /DROP=A B. 
>ELSE. 
>.  XSAVE OUTFILE=not_in_A_and_in_B /DROP=A B. 
>END IF. 
>EXECUTE /* which IS needed, here */. 
>In this code, "in_A_and_in_B",
>"in_A_and_not_in_B", and
>"not_in_A_and_in_B" must be files,
>not SPSS datasets. You can use the SPSS "FILE
>HANDLE" statement to associate a name to a complete data
>path, filename, and extension. 
>-Best of luck, 
> Richard Ristow</body>
> 
></html>
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
>[hidden email] (not to SPSSX-L), with no body text except the
>command. To leave the list, send the command
>SIGNOFF SPSSX-L
>For a list of commands to manage subscriptions, send the command
>INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Richard Ristow

Re: SAS-Code -> SPSS-Code I. Combining files

At 09:28 AM 10/18/2010, David Marso wrote:

[In code]

ADD FILES /FILE=TabA /IN=A /FILE=TabB /IN=B /BY key1 key2 /KEEP=key1 key2 var1a var2a. DO IF A AND B.
I suspect that ADD FILES needs to be replaced with MATCH FILES for the flags to end up on the same records.

Something like that may well be the case; it would correspond to using "MERGE" instead of "SET" in the SAS code.

Hans-Peter, what do you make of this? I posted with "ADD FILES" because, first I didn't want to tweak your logic without being sure I understood what you're trying to do; second, "MERGE" (SAS) or "MATCH FILES" (SPSS) are rarely advisable for input files that have exactly the same sets of variables, as (after the "KEEP" clauses) your inputs do.
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Richard Ristow

Re: SAS-Code -> SPSS-Code II. Inner join

In reply to this post by Hans-Peter Oskamp

At 08:48 AM 10/11/2010, Hans-Peter Oskamp wrote:
At 08:48 AM 10/11/2010, Hans-Peter Oskamp wrote:

I managed to "translate" below SAS-Statements into SPSS (i.e. correct
result) but my SPSS-Code looks like spaghetti code. I'd like to ask if somebody knows a way to code one or both of the two tasks below in SPSS/Python that is quite elegant in terms of readability.

SAS task II: inner join

/* cartesian product with SAS, */ /* TabC & TabD not unique by <key1,key2> */ PROC SQL; CREATE TABLE cartesian AS SELECT C.* D.* FROM TabC, TabD WHERE TabC.key1 = TabD.key1 AND TabC.key2 = TabD.key2 ; QUIT;

In SAS, now, PROC SQL directly supports this operation, the 'inner join' in SQL terminology, and commonly called a 'many-to-many merge' here on the SPSS list. (Strictly, 'cartesian product' applies where there's no equivalent of the 'WHERE' clause.)

SPSS does not have an equivalent built-in facility. However, there is a satisfactory algorithm, which I credit to David Marso and myself. A tested version, with test data, was posted to SPSSX-L as

Date: Thu, 26 Nov 2009 14:34:54 -0500
From: Richard Ristow <[hidden email]>
Subject: Many-to-many merge in SPSS
To: [hidden email]
X-ELNK-Received-Info: spv=0;
X-ELNK-AV: 0
X-ELNK-Info: sbv=0; sbrc=.0; sbf=0b; sbw=000;
X-Antivirus: AVG for E-mail 8.5.426 [270.14.83/2528]
Content-Type: multipart/mixed; boundary="=======AVGMAIL-244336AA======="

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD