Cartesian product in SPSS

Lakshmikanth Makaraju
Dear Listers,

I am trying to compute a Cartesian product in SPSS. I have two datasets: one
has about 700 cases and the other has 40000 cases. I want the Cartesian
product of the two.

I am trying to use the syntax provided by [David Marso <[hidden email]>] in
his post, reproduced below, but I am getting errors. The syntax may be right;
I may not be using it correctly.

Can anybody help me get the syntax for a Cartesian product working?

Regards
Lakshmikanth

data list / ID1 1 var1 3 a1 4 z1 5 .
begin data
1 334
2 523
3 313
4 256
end data.
save outfile 'f1'.


data list / ID2 1 var2 3 a2 4  z2 5.
begin data
5 234
6 567
7 123
8 611
end data.
save outfile 'f2'.


* CARTESIAN PRODUCT USING SPSS *.
GET FILE 'f1'.
compute case=$CASENUM.


*spread file 1*.
* The vector length (4 here) must equal the number of cases in file 1.
vector v(4).
compute v(case)=ID1.
compute merge=1.
aggregate outfile 'file1' / break merge / v1 to v4=MAX(v1 to v4).




*Table lookup*.
get file 'f2'.
compute merge=1.
match files file * / TABLE='file1'  / By merge.




*Stack the product of the files*.
vector v=V1 to v4.
loop I=1 to 4.
compute ID1=V(I).
XSAVE OUTFILE 'cartesian.sav' / keep ID1 ID2 var2 a2 z2.
end loop.
execute.


* Map the data from file 1 back to the product *.
get file 'cartesian.sav' .
sort cases by ID1.
MATCH FILES FILE * / TABLE 'f1' / BY ID1.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: Cartesian product in SPSS

Maguin, Eugene
Lakshmikanth,

>>I am trying to compute cartesian product using SPSS. I have two datasets
>>one is having some 700 cases and another with 40000 cases. I want cartesian
>>product of these two.

>>I am trying to use the syntax provided by [David Marso <[hidden email]>]
>>but I am getting errors. Can any one help me in computing cartesian product
>>in SPSS.

I'm pretty sure that SPSS has a 32,000-variable limit. In terms of Dave's
syntax, which file is it that has the 40000 cases? What errors are you
getting? Exact text, please.

Also, it seems to me that Dave Marso's syntax could be simplified via
CASESTOVARS and VARSTOCASES.
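
To put rough numbers on the question of which file gets spread, here is a small sketch in Python (not SPSS syntax, just the arithmetic). The 32,000 figure is the limit as recalled above, not a checked SPSS specification, and the helper name `spread_plan` is hypothetical.

```python
# Assumed figure: the variable limit as recalled in the post (unverified).
ASSUMED_VARIABLE_LIMIT = 32_000


def spread_plan(n_spread: int, n_other: int, limit: int = ASSUMED_VARIABLE_LIMIT) -> dict:
    """Summarize the cost of spreading the IDs of one file across the other.

    n_spread: cases in the file whose IDs become one column each.
    n_other:  cases in the file that keeps one row per case.
    """
    return {
        "id_columns": n_spread,              # one vector element per spread case
        "product_rows": n_spread * n_other,  # rows in the Cartesian product
        "fits_limit": n_spread <= limit,     # ignores the file's other variables
    }


small_first = spread_plan(700, 40_000)   # spread the 700-case file
large_first = spread_plan(40_000, 700)   # spread the 40000-case file
print(small_first)
print(large_first)
```

Either way the product has 700 x 40000 = 28,000,000 rows. Spreading the 700-case file needs only 700 ID columns, while spreading the 40000-case file would exceed the assumed limit.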

Gene Maguin

Re: Cartesian product in SPSS

Richard Ristow
In reply to this post by Lakshmikanth Makaraju
At 11:27 AM 11/19/2009, Lakshmikanth wrote:

>I am trying to compute cartesian product using SPSS. I have two datasets one
>is having some 700 cases and another with 40000 cases. I want cartesian
>product of these two. I am trying to use the syntax provided by
>[David Marso <[hidden email]>] but I am getting errors.

In general, writing "I am getting errors" doesn't help much. If we're to work with it, we need to know what errors and when they occur.

However, I've tried out David Marso's syntax as you posted it, changing it to use datasets instead of scratch files, changing the test data, some reformatting. It appears to work for me (note to self: this is v2 of the code). Notice that the number of cases in file 1 (4, in this test data) must be given explicitly in the code.

data list LIST
   / ID1  var1  a1
    (A1,  A5,   F4.1).
begin data
     A    Alpha 1.2
     B    Beta  3.4
     C    Gamma 5.6
     D    Delta 7.8
end data.
DATASET NAME     f1.

data list LIST
   / ID2  var2  a2
    (A5,  A5,   F4).
begin data
       I  One   987
      II  Two   654
     III  Three 321
      IV  Four    9
end data.
COMPUTE ID2 = LPAD(RTRIM(ID2),LENGTH(ID2)).
DATASET NAME     f2.


* CARTESIAN PRODUCT USING SPSS *.

ADD FILES /FILE= f1.

compute case=$CASENUM.

*spread file 1*.
vector  ID1.(4,A1).
compute ID1.(case)=ID1.

compute merge=1.
FORMATS merge (F2).

DATASET DECLARE   file1.
aggregate outfile file1
  / break merge
  / ID1.1 TO ID1.4 =MAX(ID1.1 TO ID1.4).

.  /**/  DATASET ACTIVATE file1  /*-*/.
.  /**/  LIST                    /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-NOV-2009 00:34:23       |
|-----------------------------|---------------------------|
[file1]
 
merge ID1.1 ID1.2 ID1.3 ID1.4

   1  A     B     C     D

Number of cases read:  1    Number of cases listed:  1

 
*Table lookup*.
NEW FILE.
ADD FILES /file= f2.
DATASET NAME     file2.
compute merge=1.
FORMATS merge(F2).
match files
      file  = *
    / TABLE = file1
    / By merge.

.  /**/  DATASET ACTIVATE file2  /*-*/.
.  /**/  LIST                    /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-NOV-2009 00:34:29       |
|-----------------------------|---------------------------|
[file2]
 
ID2   var2    a2 merge ID1.1 ID1.2 ID1.3 ID1.4

I     One    987    1  A     B     C     D
II    Two    654    1  A     B     C     D
III   Three  321    1  A     B     C     D
IV    Four     9    1  A     B     C     D

Number of cases read:  4    Number of cases listed:  4
 
*Stack the product of the files*.
STRING ID1(A1).
vector ID1v=ID1.1 TO ID1.4.
loop I=1 to 4.
.  compute ID1=ID1v(I).
.  XSAVE OUTFILE=Cartesian
       / keep ID1 ID2 var2 a2.
end loop.
execute.


* Map the data from file 1 back to the product *.

get file Cartesian .
.  /**/  LIST                    /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-NOV-2009 00:34:34       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents
  \Temporary\SPSS\
   2009-11-19 Lakshmikanth-Cartesian product in SPSS - CARTESIAN.SAV
 
ID1 ID2   var2    a2

A   I     One    987
B   I     One    987
C   I     One    987
D   I     One    987
A   II    Two    654
B   II    Two    654
C   II    Two    654
D   II    Two    654
A   III   Three  321
B   III   Three  321
C   III   Three  321
D   III   Three  321
A   IV    Four     9
B   IV    Four     9
C   IV    Four     9
D   IV    Four     9

Number of cases read:  16    Number of cases listed:  16

 
sort cases by ID1.
MATCH FILES
     FILE  = *
   / TABLE = f1
   / BY ID1.

.  /**/  LIST                    /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-NOV-2009 00:34:35       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents
  \Temporary\SPSS\
   2009-11-19 Lakshmikanth-Cartesian product in SPSS - CARTESIAN.SAV
 
ID1 ID2   var2    a2 var1     a1

A   I     One    987 Alpha   1.2
A   II    Two    654 Alpha   1.2
A   III   Three  321 Alpha   1.2
A   IV    Four     9 Alpha   1.2
B   I     One    987 Beta    3.4
B   II    Two    654 Beta    3.4
B   III   Three  321 Beta    3.4
B   IV    Four     9 Beta    3.4
C   I     One    987 Gamma   5.6
C   II    Two    654 Gamma   5.6
C   III   Three  321 Gamma   5.6
C   IV    Four     9 Gamma   5.6
D   I     One    987 Delta   7.8
D   II    Two    654 Delta   7.8
D   III   Three  321 Delta   7.8
D   IV    Four     9 Delta   7.8

Number of cases read:  16    Number of cases listed:  16
=============================
APPENDIX: Test data, and code
=============================
*  C:\Documents and Settings\Richard\My Documents                    .
*    \Technical\spssx-l\Z-2009d\                                     .
*     2009-11-19 Lakshmikanth-Cartesian product in SPSS.SPS          .


*  In response to posting                                            .
*  Date:    Thu, 19 Nov 2009 11:27:59 -0500                          .
*  From:    Lakshmikanth <[hidden email]>           .
*  Subject: Cartesian product in SPSS                                .
*  To:      [hidden email]                                 .


*  I am trying to compute cartesian product using SPSS. I have two   .
*  datasets one is having some 700 cases and another with 40000      .
*  cases. I want cartesian product of these two.                     .
*                                                                    .
*  I am trying to use the syntax provided by [David Marso            .
*  <[hidden email]>] but I am getting errors. Can any one help me    .
*  in computing cartesian product in SPSS. Here is the syntax        .
*  provided by David Marso...in his post. But I am getting errors.   .
*  This syntax may be right but i may not using that correctly.      .

*  ................................................................. .
*  The following is the posted syntax, reformatted. See 'v1' of this .
*  file for its original form. In this version, the two identifiers  .
*  are both alphabetic.                                              .
*  ................................................................. .


FILE HANDLE Cartesian
 /NAME='C:\Documents and Settings\Richard\My Documents'              +
         '\Temporary\SPSS\'                                          +
         '2009-11-19 Lakshmikanth-Cartesian product in SPSS'         +
         ' - '                                                       +
         'CARTESIAN.SAV'.


data list LIST
   / ID1  var1  a1
    (A1,  A5,   F4.1).
begin data
     A    Alpha 1.2
     B    Beta  3.4
     C    Gamma 5.6
     D    Delta 7.8
end data.

DATASET NAME     f1.


data list LIST
   / ID2  var2  a2
    (A5,  A5,   F4).
begin data
       I  One   987
      II  Two   654
     III  Three 321
      IV  Four    9
end data.
COMPUTE ID2 = LPAD(RTRIM(ID2),LENGTH(ID2)).
DATASET NAME     f2.


* CARTESIAN PRODUCT USING SPSS *.

*  ... GET FILE 'f1'.
ADD FILES /FILE= f1.

compute case=$CASENUM.

*spread file 1*.
vector  ID1.(4,A1).
compute ID1.(case)=ID1.

compute merge=1.
FORMATS merge (F2).

DATASET DECLARE   file1.
aggregate outfile file1
  / break merge
  / ID1.1 TO ID1.4 =MAX(ID1.1 TO ID1.4).

.  /**/  DATASET ACTIVATE file1  /*-*/.
.  /**/  LIST                    /*-*/.


*Table lookup*.
NEW FILE.
* ... get  file 'f2'.
ADD FILES /file= f2.
DATASET NAME     file2.

compute merge=1.
FORMATS merge(F2).
match files
      file  = *
    / TABLE = file1 
    / By merge.

.  /**/  DATASET ACTIVATE file2  /*-*/.
.  /**/  LIST                    /*-*/.


*Stack the product of the files*.
STRING ID1(A1).
vector ID1v=ID1.1 TO ID1.4.
loop I=1 to 4.
.  compute ID1=ID1v(I).
.  XSAVE OUTFILE=Cartesian
       / keep ID1 ID2 var2 a2.
end loop.
execute.


* Map the data from file 1 back to the product *.

get file Cartesian .
.  /**/  LIST                    /*-*/.

sort cases by ID1.
MATCH FILES
     FILE  = *
   / TABLE = f1
   / BY ID1.

.  /**/  LIST                    /*-*/.

Re: Cartesian product in SPSS

Maguin, Eugene
Richard,

Lakshmikanth sent me a message late last week naming this message.

>Warning # 525
>An attempt was made to store a value into an element of a vector the
>subscript of which was missing or otherwise invalid. The subscript must be a
>positive integer and must not be greater than the length of the vector. No
>store can occur.
>Command line: 12 Current case: 4 Current splitfile group: 1


As I read Dave Marso's syntax, it seemed to me that it was a
pre-VARSTOCASES/CASESTOVARS structure. It also concerned me that if the
40000-case file were converted to a wide format there would be trouble
because, as I think I remember you saying, SPSS has a 32000-variable limit.
I think the cause of the warning may have been that Lakshmikanth did not
change the vector dimension to however many cases he had, but I didn't
confirm that with him.

Lakshmikanth, would you tell us whether you have got things to work
correctly and, if not, what problems you have had?

Gene Maguin

Re: Cartesian product in SPSS

Richard Ristow
At 09:26 AM 11/24/2009, Gene Maguin wrote:

>Lakshmikanth sent me a message late last week naming this message.
>
>Warning # 525
>An attempt was made to store a value into an element of a vector the
>subscript of which was missing or otherwise invalid. The subscript must be a
>positive integer and must not be greater than the length of the vector. No
>store can occur.
>Command line: 12 Current case: 4 Current splitfile group: 1
>
>I think the cause of the warning may have been that Lakshmikanth did not
>change the vector dimension to however many cases he had, I didn't confirm
>that with him.

Yep. You learn the most amazing things actually reading the messages, don't you? As I wrote, the number of cases in file 1 has to be hard-coded in the syntax. This sure looks like what happens if that isn't done.
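
A minimal sketch of that failure mode, written in Python rather than SPSS: a vector declared with a fixed length receives one store per case, indexed by case number. The helper `spread_ids` is hypothetical; it mimics the "no store can occur" behaviour by skipping out-of-range subscripts and recording them.

```python
def spread_ids(ids, vector_length):
    """Mimic `vector v(n)` plus `compute v(case)=ID1` over every case in file 1.

    Returns the filled vector and the 1-based case numbers whose store was
    rejected because the subscript exceeded the declared vector length.
    """
    vector = [None] * vector_length
    rejected = []
    for case, value in enumerate(ids, start=1):  # case plays the role of $CASENUM
        if 1 <= case <= vector_length:
            vector[case - 1] = value
        else:
            rejected.append(case)                # analogous to the rejected store
    return vector, rejected


# Vector length matches the number of cases: every store succeeds.
ok_vector, ok_rejected = spread_ids(["A", "B", "C", "D"], vector_length=4)

# More cases than vector elements: the extra cases are not stored.
short_vector, short_rejected = spread_ids(["A", "B", "C", "D", "E", "F"], vector_length=4)

print(ok_vector, ok_rejected)
print(short_vector, short_rejected)
```

With 700 cases in file 1, every 4 in the posted syntax (the vector declaration, the AGGREGATE variable list, and the LOOP bound) would need to become 700.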

>As I read Dave Marso's syntax it seemed to me that it was a pre
>varstocases-casestovars structure.

Yes, though I think that would change surprisingly little. The following syntax:

*spread file 1*.
vector  ID1.(4,A1).
compute ID1.(case)=ID1.

compute merge=1.
FORMATS merge (F2).

DATASET DECLARE   file1.
aggregate outfile file1
  / break merge
  / ID1.1 TO ID1.4 =MAX(ID1.1 TO ID1.4).

could easily be recast, and made simpler, as CASESTOVARS logic. But that isn't the crucial part of the code.

>It also concerned me that if the 40000 case file were to be converted to a
>wide format there would be trouble because, I think I remember you saying,
>there is a 32000 variable limit to spss.

Look at David Marso's syntax: he never converts either file into 'wide' form. (That would be practically impossible without CASESTOVARS, anyway.) He is using an ingenious form of what I call 'spine' logic: the 'wide' portion of the file is file 2 with the indices of the records in file 1 spread over all records - not the contents of those records. This is his 'spread' file:

ID2   var2    a2 merge ID1.1 ID1.2 ID1.3 ID1.4

I     One    987    1  A     B     C     D
II    Two    654    1  A     B     C     D
III   Three  321    1  A     B     C     D
IV    Four     9    1  A     B     C     D
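
The 'spine' logic can be sketched in Python as a stand-in for the SPSS steps, using the test data from this thread. The function name `cartesian_via_spine` is made up for illustration. Only the file 1 IDs are spread across file 2; the remaining file 1 variables are looked up by ID at the end.

```python
def cartesian_via_spine(file1, file2, key1="ID1"):
    """Cartesian product of two lists of dict records via a spread ID list."""
    # Spread: collect only the IDs of file 1 (the vector / AGGREGATE step).
    spread_ids = [rec[key1] for rec in file1]
    # Lookup table keyed by ID (plays the role of TABLE=f1 in MATCH FILES).
    lookup = {rec[key1]: rec for rec in file1}

    # Stack: for each file 2 record, emit one row per spread ID (LOOP + XSAVE).
    stacked = [{key1: id1, **rec2} for rec2 in file2 for id1 in spread_ids]

    # Map back: sort by ID, then attach the remaining file 1 variables.
    stacked.sort(key=lambda row: row[key1])  # stable: keeps file 2 order per ID
    return [{**row, **lookup[row[key1]]} for row in stacked]


f1 = [
    {"ID1": "A", "var1": "Alpha", "a1": 1.2},
    {"ID1": "B", "var1": "Beta", "a1": 3.4},
    {"ID1": "C", "var1": "Gamma", "a1": 5.6},
    {"ID1": "D", "var1": "Delta", "a1": 7.8},
]
f2 = [
    {"ID2": "I", "var2": "One", "a2": 987},
    {"ID2": "II", "var2": "Two", "a2": 654},
    {"ID2": "III", "var2": "Three", "a2": 321},
    {"ID2": "IV", "var2": "Four", "a2": 9},
]

product = cartesian_via_spine(f1, f2)
print(len(product))
print(product[0])
```

The spread list holds one entry per file 1 case no matter how many variables file 1 has, which is why neither file is ever fully converted to 'wide' form.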

Interestingly, I think that even that spread can be avoided, as can the necessity of hard-coding the number of cases in the logic. I have discovered a truly wonderful proof of this, which this message is insufficient to hold.
Re: Cartesian product in SPSS

Maguin, Eugene
Ah, come on Richard, Don't hold back on your discovery.

Gene Maguin

>>Interestingly, I think that even that spread can be avoided, as can the
>>necessity of hard-coding the number of cases in the logic. I have discovered
>>a truly wonderful proof of this, which this message is insufficient to hold.

Re: Cartesian product in SPSS

Richard Ristow
At 05:02 PM 11/24/2009, Gene Maguin wrote:

>>Interestingly, I think that even spreading IDs into 'wide' form can
>>be avoided, as can the necessity of hard-coding the number of cases
>>in the logic. I have discovered a truly wonderful proof of this,
>>which this message is insufficient to hold.
>
>Ah, come on Richard, Don't hold back on your discovery.

;-) I just wanted to tease everybody, and to work out all the details.
See today's posting, "Many-to-many merge in SPSS", which also gives, I
believe, a satisfactory general method for many-to-many merging in SPSS. 8-)

Happy Thanksgiving! :-)

-Rather proudly,
  Richard
