SPSSX Discussion

Cartesian product in SPSS

Lakshmikanth Makaraju

Cartesian product in SPSS

Dear Listers,

I am trying to compute a Cartesian product in SPSS. I have two datasets: one
has about 700 cases and the other has 40000 cases. I want the Cartesian
product of the two.

I am trying to use the syntax provided by [David Marso <[hidden email]>], but
I am getting errors. Can anyone help me compute a Cartesian product in
SPSS?

Here is the syntax David Marso provided in his post. The syntax may be
right, but I may not be using it correctly.

Can anybody help me with the syntax for a Cartesian product?

Regards
Lakshmikanth

data list / ID1 1 var1 3 a1 4 z1 5 .
begin data
1 334
2 523
3 313
4 256
end data.
save outfile 'f1'.

data list / ID2 1 var2 3 a2 4 z2 5.
begin data
5 234
6 567
7 123
8 611
end data.
save outfile 'f2'.

* CARTESIAN PRODUCT USING SPSS *.
GET FILE 'f1'.
compute case=$CASENUM.

*spread file 1*.
vector v(4).
compute v(case)=ID1.
compute merge=1.
aggregate outfile 'file1' / break merge / v1 to v4=MAX(v1 to v4).

*Table lookup*.
get file 'f2'.
compute merge=1.
match files file * / TABLE='file1' / By merge.

*Stack the product of the files*.
vector v=V1 to v4.
loop I=1 to 4.
compute ID1=V(I).
XSAVE OUTFILE 'cartesian.sav' / keep ID1 ID2 var2 a2 z2.
end loop.
execute.

* Map the data from file 1 back to the product *.
get file 'cartesian.sav' .
sort cases by ID1.
MATCH FILES FILE * / TABLE 'f1' / BY ID1.
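
For orientation, the result this syntax is meant to produce can be sketched outside SPSS. The following is a minimal Python illustration only: it is not SPSS syntax and not part of David Marso's post, and the helper name `cartesian_size` is invented here. It pairs every ID1 in the small test data with every ID2, and shows how many cases the requested 700 x 40000 product would contain.

```python
from itertools import product

# ID values from the two small test files in the posted syntax.
ids_file1 = [1, 2, 3, 4]
ids_file2 = [5, 6, 7, 8]

# Cartesian product: every case of file 1 paired with every case of file 2.
pairs = list(product(ids_file1, ids_file2))


def cartesian_size(n_cases_1: int, n_cases_2: int) -> int:
    """Number of cases in the product of two files (hypothetical helper)."""
    return n_cases_1 * n_cases_2


print(len(pairs))                  # 16 cases for the 4 x 4 test data
print(cartesian_size(700, 40000))  # 28000000 cases for the real files
```

The output file grows with the product of the two case counts, so the real data would yield 28 million cases even though neither input is large.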

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Cartesian product in SPSS

Lakshmikanth,

>>I am trying to compute a Cartesian product in SPSS. I have two datasets:
>>one has about 700 cases and the other has 40000 cases. I want the Cartesian
>>product of the two.
>>
>>I am trying to use the syntax provided by [David Marso <[hidden email]>],
>>but I am getting errors. Can anyone help me compute a Cartesian product in
>>SPSS?

I'm pretty sure that SPSS has a 32,000 variable limit. In terms of Dave's
syntax, which file is it that has the 40000 cases? What errors are you
getting? Exact text please.

Also, it seems to me that Dave Marso's syntax could be simplified via
CASESTOVARS and VARSTOCASES.

Gene Maguin


Richard Ristow

Re: Cartesian product in SPSS

In reply to this post by Lakshmikanth Makaraju

At 11:27 AM 11/19/2009, Lakshmikanth wrote:

>I am trying to compute a Cartesian product in SPSS. I have two datasets: one
>has about 700 cases and the other has 40000 cases. I want the Cartesian
>product of the two. I am trying to use the syntax provided by
>[David Marso <[hidden email]>], but I am getting errors.

In general, writing "I am getting errors" doesn't help much. If we're to work with it, we need to know what errors and when they occur.

However, I've tried out David Marso's syntax as you posted it, changing it to use datasets instead of scratch files, changing the test data, some reformatting. It appears to work for me (note to self: this is v2 of the code). Notice that the number of cases in file 1 (4, in this test data) must be given explicitly in the code.

data list LIST / ID1 var1 a1 (A1, A5, F4.1).
begin data
A Alpha 1.2
B Beta 3.4
C Gamma 5.6
D Delta 7.8
end data.
DATASET NAME f1.

data list LIST / ID2 var2 a2 (A5, A5, F4).
begin data
I One 987
II Two 654
III Three 321
IV Four 9
end data.
COMPUTE ID2 = LPAD(RTRIM(ID2),LENGTH(ID2)).
DATASET NAME f2.

* CARTESIAN PRODUCT USING SPSS *.
ADD FILES /FILE= f1.
compute case=$CASENUM.

*spread file 1*.
vector ID1.(4,A1).
compute ID1.(case)=ID1.
compute merge=1.
FORMATS merge (F2).

DATASET DECLARE file1.
aggregate outfile file1 / break merge
   / ID1.1 TO ID1.4 =MAX(ID1.1 TO ID1.4).

. /**/ DATASET ACTIVATE file1 /*-*/.
. /**/ LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-NOV-2009 00:34:23       |
|-----------------------------|---------------------------|
[file1]

merge ID1.1 ID1.2 ID1.3 ID1.4

    1 A     B     C     D

Number of cases read: 1 Number of cases listed: 1

*Table lookup*.
NEW FILE.
ADD FILES /file= f2.
DATASET NAME file2.
compute merge=1.
FORMATS merge(F2).
match files file = * / TABLE = file1 / By merge.

. /**/ DATASET ACTIVATE file2 /*-*/.
. /**/ LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-NOV-2009 00:34:29       |
|-----------------------------|---------------------------|
[file2]

ID2   var2   a2 merge ID1.1 ID1.2 ID1.3 ID1.4

I     One   987     1 A     B     C     D
II    Two   654     1 A     B     C     D
III   Three 321     1 A     B     C     D
IV    Four    9     1 A     B     C     D

Number of cases read: 4 Number of cases listed: 4

*Stack the product of the files*.
STRING ID1(A1).
vector ID1v=ID1.1 TO ID1.4.
loop I=1 to 4.
. compute ID1=ID1v(I).
. XSAVE OUTFILE=Cartesian / keep ID1 ID2 var2 a2.
end loop.
execute.

* Map the data from file 1 back to the product *.
get file Cartesian .

. /**/ LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-NOV-2009 00:34:34       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents
  \Temporary\SPSS\
  2009-11-19 Lakshmikanth-Cartesian product in SPSS - CARTESIAN.SAV

ID1 ID2   var2   a2

A   I     One   987
B   I     One   987
C   I     One   987
D   I     One   987
A   II    Two   654
B   II    Two   654
C   II    Two   654
D   II    Two   654
A   III   Three 321
B   III   Three 321
C   III   Three 321
D   III   Three 321
A   IV    Four    9
B   IV    Four    9
C   IV    Four    9
D   IV    Four    9

Number of cases read: 16 Number of cases listed: 16

sort cases by ID1.
MATCH FILES FILE = * / TABLE = f1 / BY ID1.

. /**/ LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |24-NOV-2009 00:34:35       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents
  \Temporary\SPSS\
  2009-11-19 Lakshmikanth-Cartesian product in SPSS - CARTESIAN.SAV

ID1 ID2   var2   a2 var1    a1

A   I     One   987 Alpha  1.2
A   II    Two   654 Alpha  1.2
A   III   Three 321 Alpha  1.2
A   IV    Four    9 Alpha  1.2
B   I     One   987 Beta   3.4
B   II    Two   654 Beta   3.4
B   III   Three 321 Beta   3.4
B   IV    Four    9 Beta   3.4
C   I     One   987 Gamma  5.6
C   II    Two   654 Gamma  5.6
C   III   Three 321 Gamma  5.6
C   IV    Four    9 Gamma  5.6
D   I     One   987 Delta  7.8
D   II    Two   654 Delta  7.8
D   III   Three 321 Delta  7.8
D   IV    Four    9 Delta  7.8

Number of cases read: 16 Number of cases listed: 16

=============================
APPENDIX: Test data, and code
=============================
* C:\Documents and Settings\Richard\My Documents .
* \Technical\spssx-l\Z-2009d\ .
* 2009-11-19 Lakshmikanth-Cartesian product in SPSS.SPS .

* In response to posting .
* Date: Thu, 19 Nov 2009 11:27:59 -0500 .
* From: Lakshmikanth <[hidden email]> .
* Subject: Cartesian product in SPSS .
* To: [hidden email] .

* I am trying to compute cartesian product using SPSS. I have two .
* datasets one is having some 700 cases and another with 40000 .
* cases. I want cartesian product of these two. .
* .
* I am trying to use the syntax provided by [David Marso .
* <[hidden email]>] but I am getting errors. Can any one help me .
* in computing cartesian product in SPSS. Here is the syntax .
* provided by David Marso...in his post. But I am getting errors. .
* This syntax may be right but i may not using that correctly. .

* ................................................................. .
* The following is the posted syntax, reformatted. See 'v1' of this .
* file for its original form. In this version, the two identifiers .
* are both alphabetic. .
* ................................................................. .

FILE HANDLE Cartesian
   /NAME='C:\Documents and Settings\Richard\My Documents' +
         '\Temporary\SPSS\' +
         '2009-11-19 Lakshmikanth-Cartesian product in SPSS' +
         ' - ' +
         'CARTESIAN.SAV'.

data list LIST / ID1 var1 a1 (A1, A5, F4.1).
begin data
A Alpha 1.2
B Beta 3.4
C Gamma 5.6
D Delta 7.8
end data.
DATASET NAME f1.

data list LIST / ID2 var2 a2 (A5, A5, F4).
begin data
I One 987
II Two 654
III Three 321
IV Four 9
end data.
COMPUTE ID2 = LPAD(RTRIM(ID2),LENGTH(ID2)).
DATASET NAME f2.

* CARTESIAN PRODUCT USING SPSS *.
* ... GET FILE 'f1'.
ADD FILES /FILE= f1.
compute case=$CASENUM.

*spread file 1*.
vector ID1.(4,A1).
compute ID1.(case)=ID1.
compute merge=1.
FORMATS merge (F2).

DATASET DECLARE file1.
aggregate outfile file1 / break merge
   / ID1.1 TO ID1.4 =MAX(ID1.1 TO ID1.4).

. /**/ DATASET ACTIVATE file1 /*-*/.
. /**/ LIST /*-*/.

*Table lookup*.
NEW FILE.
* ... get file 'f2'.
ADD FILES /file= f2.
DATASET NAME file2.
compute merge=1.
FORMATS merge(F2).
match files file = * / TABLE = file1 / By merge.

. /**/ DATASET ACTIVATE file2 /*-*/.
. /**/ LIST /*-*/.

*Stack the product of the files*.
STRING ID1(A1).
vector ID1v=ID1.1 TO ID1.4.
loop I=1 to 4.
. compute ID1=ID1v(I).
. XSAVE OUTFILE=Cartesian / keep ID1 ID2 var2 a2.
end loop.
execute.

* Map the data from file 1 back to the product *.
get file Cartesian .

. /**/ LIST /*-*/.

sort cases by ID1.
MATCH FILES FILE = * / TABLE = f1 / BY ID1.

. /**/ LIST /*-*/.

Maguin, Eugene

Re: Cartesian product in SPSS

Richard,

Lakshmikanth sent me a message late last week naming this message.

>Warning # 525
>An attempt was made to store a value into an element of a vector the
>subscript of which was missing or otherwise invalid. The subscript must be
>a positive integer and must not be greater than the length of the vector.
>No store can occur.
>Command line: 12 Current case: 4 Current splitfile group: 1

As I read Dave Marso's syntax, it seemed to me to be a
pre-VARSTOCASES/CASESTOVARS structure. It also concerned me that if the 40000
case file were to be converted to a wide format there would be trouble
because, I think I remember you saying, there is a 32000 variable limit in
SPSS. Although I think the cause of the warning may have been that
Lakshmikanth did not change the vector dimension to however many cases he
had, I didn't confirm that with him.

Lakshmikanth, would you tell us whether you have got things to work
correctly and, if not, what problems you have had?

Gene Maguin


Richard Ristow

Re: Cartesian product in SPSS

At 09:26 AM 11/24/2009, Gene Maguin wrote:

>Lakshmikanth sent me a message late last week naming this message.
>
>Warning # 525
>An attempt was made to store a value into an element of a vector the
>subscript of which was missing or otherwise invalid. The subscript must be
>a positive integer and must not be greater than the length of the vector.
>No store can occur.
>Command line: 12 Current case: 4 Current splitfile group: 1
>
>I think the cause of the warning may have been that Lakshmikanth did not
>change the vector dimension to however many cases he had, I didn't confirm
>that with him.

Yep. You learn the most amazing things actually reading the messages, don't you? As I wrote, the number of cases in file 1 has to be hard-coded in the syntax. This sure looks like what happens if that isn't done.
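
The diagnosis above can be illustrated with a small Python emulation. This is not SPSS, and the function name `spread_ids` is made up for the sketch. It assumes a vector declared with a fixed length and one store per case, indexed by case number, as in `compute v(case)=ID1`. A store whose subscript exceeds the declared length is skipped, which is the situation Warning 525 describes. The lengths used are arbitrary and are not meant to reproduce the case number shown in the quoted warning.

```python
def spread_ids(ids, declared_length):
    """Emulate `vector v(n)` followed by `compute v(case)=ID1`.

    Returns the filled vector and the 1-based case numbers whose store
    was rejected because the subscript exceeded the declared length.
    """
    vector = [None] * declared_length
    rejected_cases = []
    for case_number, id_value in enumerate(ids, start=1):
        if case_number > declared_length:
            # SPSS would issue Warning 525 here: no store can occur.
            rejected_cases.append(case_number)
        else:
            vector[case_number - 1] = id_value
    return vector, rejected_cases


# Vector still declared for 4 cases, but the file has 6.
vector, rejected = spread_ids(["A", "B", "C", "D", "E", "F"], declared_length=4)
print(vector)    # ['A', 'B', 'C', 'D']
print(rejected)  # [5, 6]
```

In this emulation the IDs of the rejected cases never reach the vector, so they would be missing from the resulting product as well.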

>As I read Dave Marso's syntax, it seemed to me to be a
>pre-VARSTOCASES/CASESTOVARS structure.

Yes, though I think that would change surprisingly little. The following syntax:

*spread file 1*.
vector ID1.(4,A1).
compute ID1.(case)=ID1.
compute merge=1.
FORMATS merge (F2).

DATASET DECLARE file1.
aggregate outfile file1 / break merge
   / ID1.1 TO ID1.4 =MAX(ID1.1 TO ID1.4).

could easily be recast, and made simpler, as CASESTOVARS logic. But that isn't the crucial part of the code.

>It also concerned me that if the 40000 case file were to be converted to a
>wide format there would be trouble because, I think I remember you saying,
>there is a 32000 variable limit in SPSS.

Look at David Marso's syntax: he never converts either file into 'wide' form. (That would be practically impossible without CASESTOVARS, anyway.) He is using an ingenious form of what I call 'spine' logic: the 'wide' portion of the file is file 2 with the indices of the records in file 1 spread over all records - not the contents of those records. This is his 'spread' file:

ID2   var2   a2 merge ID1.1 ID1.2 ID1.3 ID1.4

I     One   987     1 A     B     C     D
II    Two   654     1 A     B     C     D
III   Three 321     1 A     B     C     D
IV    Four    9     1 A     B     C     D
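
The 'spine' logic described here can be traced step by step in a short Python sketch. It is only an illustration of the data flow, not SPSS syntax and not the method of the later "Many-to-many merge" posting. The variable names (`id1_wide`, `spread`, `stacked`, `lookup`) are chosen for this sketch, and the data are the test cases from the listing earlier in the thread.

```python
# Test data from the listing in this thread.
file1 = [("A", "Alpha", 1.2), ("B", "Beta", 3.4),
         ("C", "Gamma", 5.6), ("D", "Delta", 7.8)]   # ID1, var1, a1
file2 = [("I", "One", 987), ("II", "Two", 654),
         ("III", "Three", 321), ("IV", "Four", 9)]    # ID2, var2, a2

# Step 1, "spread": collect only the file 1 IDs into one wide record.
id1_wide = [row[0] for row in file1]

# Step 2, "table lookup": attach that wide ID record to every file 2 case.
spread = [(row2, id1_wide) for row2 in file2]

# Step 3, "stack": write one output case per (file 2 case, file 1 ID).
stacked = [(id1, *row2) for row2, ids in spread for id1 in ids]

# Step 4, "map back": look up the remaining file 1 variables by ID1.
lookup = {row[0]: row[1:] for row in file1}
stacked.sort(key=lambda case: case[0])  # sort cases by ID1 (stable sort)
cartesian = [case + lookup[case[0]] for case in stacked]

print(len(cartesian))  # 16
print(cartesian[0])    # ('A', 'I', 'One', 987, 'Alpha', 1.2)
```

Only the IDs of file 1 are ever held in wide form; the other file 1 variables are attached by key at the end, which is why the width of the spread file depends on the number of cases in file 1 and not on the number of variables.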

Interestingly, I think that even that spread can be avoided, as can the necessity of hard-coding the number of cases in the logic. I have discovered a truly wonderful proof of this, which this message is insufficient to hold.

Maguin, Eugene

Re: Cartesian product in SPSS

Ah, come on, Richard, don't hold back on your discovery.

Gene Maguin

>>Interestingly, I think that even that spread can be avoided, as can the
>>necessity of hard-coding the number of cases in the logic. I have discovered
>>a truly wonderful proof of this, which this message is insufficient to hold.


Richard Ristow

Re: Cartesian product in SPSS

At 05:02 PM 11/24/2009, Gene Maguin wrote:

>>Interestingly, I think that even spreading IDs into 'wide' form can
>>be avoided, as can the necessity of hard-coding the number of cases
>>in the logic. I have discovered a truly wonderful proof of this,
>>which this message is insufficient to hold.
>
>Ah, come on Richard, Don't hold back on your discovery.

;-) I just wanted to tease everybody, and to work out all the details.
See today's posting "Many-to-many merge in SPSS", which I believe is also
a satisfactory general method for many-to-many merging in SPSS. 8-)

Happy Thanksgiving! :-)

-Rather proudly,
Richard
