Macro !Mny2Mny (joins and Cartesian products), v.2

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Macro !Mny2Mny (joins and Cartesian products), v.2

Richard Ristow
Since I just wrote to Peter Spangler that I'd posted a macro that can
do Cartesian product, I thought I'd better post the latest revision,
though it isn't fully tested.

This version incorporates David Marso's code using MATRIX to 'unroll'
the records containing pairs of maximum sequence values from the two
inputs, to the set of all pairs of sequence values thereby
represented. It will, therefore, handle a much larger Cartesian
product than the earlier version I posted, though I don't give it
much chance of handling the 1.5 giga-record file Peter Spangler seems
to be needing.

*  ****************************************************************  .
*  Macro for many-to-many merge of two datasets or .SAV files.       .
*  Richard Ristow, March 2014                                        .
*  Contact: [hidden email]                                  .
*                                                                    .
*  The technique of representing a set of consecutive records in a   .
*  file by the largest value of a sequence number was introducted to .
*  the author by David Marso. David also wrote code using MATRIX to  .
*  'unroll' a record containing a pair of such numbers into the set  .
*  of all pairs of sequence numbers represented, which has been      .
*  adapted here.                                                     .
*  ================================================================= .
*                                                                    .
*  All arguments are terminated by '/'.                              .
*                                                                    .
*  Required arguments:                                               .
*  LeftIn   Name of one of the datasets, or saved files, to be joined.
*           (referred to as the "Left input")                        .
*  RightIn  Name of the other  dataset,  or saved file,  to be joined.
*           (referred to as the "Right input")                       .
*                                                                    .
*  Optional arguments for the join operation:                        .
*  BYvar    Variable or variables on which the inputs are to be      .
*           joined; if not specified, the result will be a Cartesian .
*           product, matching all pairs of records from the inputs.  .
*           Requirements are as for BY variables in MATCH FILES:     .
*           all variables must be present and of the same type in    .
*           both inputs; and both inputs must be in ascending order  .
*           on the BYvar variable(s).                                .
*  Join     Type of join (relevant only if BYvar is specified):      .
*           Outer (default), LeftOuter, RightOuter, or Inner;        .
*           Outer        Output includes contents of all records     .
*                        from both inputs                            .
*           Left Outer   Output includes contents of all records     .
*        or LeftOuter    from the Left input, but only records from  .
*        or LOuter       the Right input that match some left-input  .
*                        record                                      .
*           Right Outer  Output includes contents of all records     .
*        or RightOuter   from the Right input, but only records from .
*        or ROuter       the Left input that match some right-input  .
*                        record                                      .
*           Inner        Output contains only records from each      .
*                        input that match some record in the other   .
*  Left#    Name of variable to hold sequence numbers of Left  input .
*           records, within values of BYvar. Name should not occur   .
*           in either input file.                  (default: Left#)  .
*  Right#   Name of variable to hold sequence numbers of Right input .
*           records, within values of BYvar. Name should not occur   .
*           in either input file.                  (default: Right#) .
*                                                                    .
*  Optional arguments for the inputs:                                .
*  LRename  'RENAME' to be applied to the Left  input                .
*  LDrop    'DROP'   to be applied to the Left  input                .
*  LKeep    'KEEP'   to be applied to the Left  input (default: ALL) .
*  RRename  'RENAME' to be applied to the Right input                .
*  RDrop    'DROP'   to be applied to the Right input                .
*  RKeep    'KEEP'   to be applied to the Right input (default: ALL) .
*                                                                    .
*  Optional argument  for the output:                                .
*  DSname   Name of the output, joined dataset. The joined datasets  .
*           will always be active, but will be unnamed unless        .
*           DSname is specified.                                     .
*  ................................................................  .
*  Reserved names:                                                   .
*                                                                    .
*  By default, variables Left# and Right# are created, and retained  .
*  in the output. (These variables may be given other names by       .
*  specifying macro arguments Left#= and Right#=)                    .
*                                                                    .
*  Variables @Mny2Mny_NoBreak, @Mny2Mny_NewGrp                       .
*            @Mny2Mny_LMaxRec  @Mny2Mny_RMaxRec                      .
*            @Mny2Mny_Grp#     @Mny2Mny_GrpRecs                      .
*        are created, used, and discarded.                           .
*                                                                    .
*  Datasets  @Mny2Mny_Left#d   @Mny2Mny_Right#d                      .
*            @Mny2Mny_Left##   @Mny2Mny_GrpSmry                      .
*        are created, used, and discarded.                           .
*  ****************************************************************  .

DEFINE !Mny2Mny(LeftIn  = !CHAREND('/')
                /RightIn = !CHAREND('/')
                /BYvar   = !CHAREND('/')
                /LRename = !CHAREND('/')
                /LDrop   = !CHAREND('/')
                /LKeep   = !CHAREND('/') !DEFAULT(ALL)
                /RRename = !CHAREND('/')
                /RDrop   = !CHAREND('/')
                /RKeep   = !CHAREND('/') !DEFAULT(ALL)
                /Join    = !CHAREND('/') !DEFAULT(Outer)
                /Left#   = !CHAREND('/') !DEFAULT(Left#)
                /Right#  = !CHAREND('/') !DEFAULT(Right#)
                /DSname  = !CHAREND('/'))

    ECHO " ".
    ECHO !QUOTE(!CONCAT('Macro !','Mny2Mny:')).

*  .......     Section I:   Input parsing and error checking   ....... .
    !LET !Error = No

*  .......     I.A:         Required inputs                    ....... .

    !IF (   !LeftIn  !EQ !NULL
        !OR !RightIn !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Arguments LeftIn and RightIn (the two inputs)".
        ECHO "      are required.".
        !IF  (!LeftIn  !NE !NULL) !THEN
        ECHO !QUOTE(!CONCAT("      (Left input specified: ",
                                   !LeftIn,')')).
        !IFEND
        !IF  (!RightIn !NE !NULL) !THEN
        ECHO !QUOTE(!CONCAT("      (Right input specified: ",
                                   !RightIn,')')).
        !IFEND
    !IFEND

    !IF (   !RightIn !EQ '*') !THEN
        !LET !Error = Yes
        ECHO "The Right input may not be '*'".
    !IFEND

*  .......     I.B:         One-token inputs                   ....... .

    !IF (!TAIL(!Left#) !NE !NULL) !THEN
        !LET !Error = Yes
        ECHO "'Left#', if specified, must be a single token".
        ECHO !QUOTE(!CONCAT('(Entered: ','Left#,')')).
    !IFEND

    !IF (!TAIL(!Right) !NE !NULL) !THEN
        !LET !Error = Yes
        ECHO "'Right#', if specified, must be a single token".
        ECHO !QUOTE(!CONCAT('(Entered: ','Right#,')')).
    !IFEND

    !IF (!TAIL(!DSname) !NE !NULL) !THEN
        !LET !Error = Yes
        ECHO "'DSname', if specified, must be a single token".
        ECHO !QUOTE(!CONCAT('(Entered: ',!DSname,')')).
    !IFEND

*  .......     I.C:         Join type                          ....... .

    !IF (!UPCASE(!Join) !EQ 'OUTER') !THEN
       !LET !JType = 'OUTER'
       !LET !JMsg  = 'Outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFT OUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFTOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'RIGHT OUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'RIGHTOUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'ROUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'INNER') !THEN
       !LET !JType = 'INNER'
       !LET !JMsg  = 'Inner join'
    !IFEND

    !IF (!BYvar         !EQ !NULL)   !THEN
       !LET !JType = 'CARTPROD'
       !LET !JMsg  = 'Cartesian product'
    !IFEND

    !IF (!JType !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Invalid join type:".
        ECHO !QUOTE(!CONCAT('    Join=',!Join)).
    !IFEND


*  .......     Section II:  Informational messages            ........ .
    !IF (!Error !EQ No) !THEN
        ECHO !QUOTE(!CONCAT(!JMsg,' of')).
        ECHO !QUOTE(!CONCAT('   ',!LeftIn)).
        !IF (        !LRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!LRename))
        !IFEND
        !IF (        !LDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!LDrop))
        !IFEND
        !IF (!UPCASE(!LKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!LKeep))
        !IFEND

        ECHO !QUOTE(!CONCAT('to ',!RightIn)).
        !IF (        !RRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!RRename))
        !IFEND
        !IF (        !RDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!RDrop))
        !IFEND
        !IF (!UPCASE(!RKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!RKeep))
        !IFEND
        !IF (!BYvar  !NE !NULL)          !THEN
           ECHO !QUOTE(!CONCAT('by ',!BYvar)).
        !IFEND
        !IF (!DSname !NE !NULL)          !THEN
           ECHO !QUOTE(!CONCAT('Output to dataset ',!DSname)).
        !IFEND
      !IFEND

*  .......     Section III: Processing                          ....... .
*  .......     III.A    Sequence-numbered copies of the inputs  ....... .
*  .......     III.A.1  Left  input -> dataset @Mny2Mny_Left#d  ....... .
    !IF (!Error !EQ No) !THEN

        !IF (!BYvar !EQ !NULL)
        !THEN
        * ... Left  input, when there are no BYvar vars  ... .
        ADD FILES
           /FILE=!Leftin
          !IF (!LRename !NE !NULL) !THEN
           /RENAME = !LRename
          !IFEND
          !IF (!LDrop   !NE !NULL) !THEN
           /DROP   = !LDrop
          !IFEND
           /KEEP   = !BYvar !LKeep.

        COMPUTE @Mny2Mny_NewGrp  = ($CASENUM EQ 1).
        COMPUTE !Left#           =  $CASENUM.

        !ELSE
         * ... Left  input, when there are    BYvar vars  ... .
         ADD FILES
           /FILE=!Leftin
          !IF (!LRename !NE !NULL) !THEN
           /RENAME = !LRename
          !IFEND
          !IF (!LDrop   !NE !NULL) !THEN
           /DROP   = !LDrop
          !IFEND
           /KEEP   = !BYvar !LKeep
           /BY      !BYvar
           /FIRST  = @Mny2Mny_NewGrp.

         DO IF    @Mny2Mny_NewGrp.
         .  COMPUTE !Left# = 1.
         ELSE.
         .  COMPUTE !Left# = LAG(!Left#) + 1.
         END IF.
         !IFEND

        * ... Left  input, code common to both cases     ... .

        COMPUTE        @Mny2Mny_NoBreak = 1.
        FORMATS        @Mny2Mny_NoBreak
                       @Mny2Mny_NewGrp (F2).
        FORMATS        !Left#          (F4).
        VARIABLE WIDTH !Left#          (6).

        DATASET NAME      @Mny2Mny_Left#d WINDOW=ASIS.
        *  ... End of processing Left  input .
        /*-- LIST /*-*/.

*  .......     III.A.2  Right input -> dataset @Mny2Mny_Right#d ....... .

        !IF (!BYvar !EQ !NULL)
        !THEN
        * ... Right input, when there are no BYvar vars ... .
        ADD FILES
           /FILE=!RightIn
        !IF (!RRename !NE !NULL) !THEN
           /RENAME = !RRename
        !IFEND
        !IF (!RDrop   !NE !NULL) !THEN
           /DROP   = !RDrop
        !IFEND
           /KEEP   = !BYvar !RKeep.

        COMPUTE @Mny2Mny_NewGrp  = ($CASENUM EQ 1).
        COMPUTE !Right#          =  $CASENUM.

        !ELSE
        * ... Right input, when there are    BYvar vars ... .
        ADD FILES
           /FILE=!RightIn
        !IF (!RRename !NE !NULL) !THEN
           /RENAME = !RRename
        !IFEND
        !IF (!RDrop   !NE !NULL) !THEN
           /DROP   = !RDrop
        !IFEND
           /KEEP   = !BYvar !RKeep
           /BY !BYvar
           /FIRST  = @Mny2Mny_NewGrp.

        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE !Right# = 1.
        ELSE.
        .  COMPUTE !Right# = LAG(!Right#) + 1.
        END IF.
        !IFEND

        * ... Right input, code common to both cases    ... .

        COMPUTE        @Mny2Mny_NoBreak = 1.
        FORMATS        @Mny2Mny_NoBreak
                       @Mny2Mny_NewGrp (F2).
        FORMATS        !Right#         (F4).
        VARIABLE WIDTH !Right#         (6).

        DATASET NAME      @Mny2Mny_Right#d WINDOW=ASIS.
        *  ... End of processing Right input .
        /*-- LIST /*-*/.

*  .......     III.B  Build 'spine' of all keys for all records ....... .
*  .......     III.B.1  Combine record counts from both inputs  ....... .

        DATASET ACTIVATE  @Mny2Mny_Left#d  WINDOW=ASIS.
        DATASET DECLARE   @Mny2Mny_Left##.
        AGGREGATE OUTFILE=@Mny2Mny_Left##
           /BREAK = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_LMaxRec = MAX(!Left#).


        DATASET ACTIVATE  @Mny2Mny_Right#d WINDOW=ASIS.
        AGGREGATE OUTFILE=*
           /BREAK   = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_RMaxRec = MAX(!Right#).

        MATCH FILES
           /FILE    = @Mny2Mny_Left##
           /FILE    = *
           /BY @Mny2Mny_NoBreak !BYvar.

        RECODE @Mny2Mny_LMaxRec
               @Mny2Mny_RMaxRec (MISSING=0).

        DATASET NAME  @Mny2Mny_GrpSmry.

        /*-- LIST /*-*/.

*  .......     III.B.2  Select, based on join type              ....... .
       !IF  (!Jtype !EQ 'INNER') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0
               AND @Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'LEFTOUT') !THEN
        SELECT IF (@Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'RIGHTOUT') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0).
       !IFEND

        * ..................................... .
        * For OUTER join and Cartesian product, .
        * no selection is necessary.            .
        * ..................................... .

        /*--  LIST /*-*/.

*  .......     III.B.3  Expand the counts to get list of        ........ .
*                       pairs of record numbers from the        ........ .
*                       groups, by group number                 ........ .

        COMPUTE @Mny2Mny_Grp#    = $CASENUM.

        COMPUTE @Mny2Mny_GrpRecs = MAX(@Mny2Mny_LMaxRec,1)
                                  *MAX(@Mny2Mny_RMaxRec,1).

*  .......     MATRIX code by David Marso, adapted here.        ........ .
*  ... OMS  /SELECT TABLES
             /IF COMMANDS = ["Matrix"]
                 SUBTYPES = ["Notes"]
             /DESTINATION   VIEWER = NO
             /TAG = "ZapMATRIX".

        MATRIX.
        +  GET GrpSmry
           /FILE=*
           /VARIABLES=@Mny2Mny_Grp#
                      @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec
                      @Mny2Mny_GrpRecs
           /NAMES=GSnames.

        +  LOOP #=1 TO NROW(GrpSmry).

        +     COMPUTE LeftNos=T({1:MMAX({GrpSmry(#,2),1})}).
        +     DO IF (GrpSmry(#,2) EQ 0).
        +        COMPUTE LeftNos = 0*LeftNos.
        +     END IF.

        +     COMPUTE RightNos=T({1:MMAX({GrpSmry(#,3),1})}).
        +     DO IF (GrpSmry(#,3) EQ 0).
        +        COMPUTE RightNos = 0*RightNos.
        +     END IF.

        +     SAVE ({MAKE(GrpSmry(#,4),1,GrpSmry(#,1)),
                     KRONEKER(MAKE(NROW(RightNos),1,1),LeftNos),
                     KRONEKER(RightNos,MAKE(NROW(LeftNos),1,1)) })
               / OUTFILE *
               / VARIABLES @Mny2Mny_Grp# !Left# !Right#.

        +  END LOOP.
        END MATRIX.

        FORMATS @Mny2Mny_Grp# !Left# !Right# (F4).
        VARIABLE WIDTH        !Left# !Right# (6).
        /*--  LIST /*-*/.

*  .......     III.B.4  Combine with summary file by group key  ........ .
*                       to get the 'spine': list of all pairs   ........ .
*                       of record numbers by group key.         ........ .

        MATCH FILES
           /TABLE=@Mny2Mny_GrpSmry
           /FILE =*
           /BY    @Mny2Mny_Grp#.

        /*-- DATASET COPY Spine      /*-*/.

*  .......     III.C  Add the substantive data                  ....... .

        MATCH FILES
           /FILE =*
           /TABLE=@Mny2Mny_Right#d
           /BY   !BYvar !Right#
           /DROP  @Mny2Mny_Grp#    @Mny2Mny_GrpRecs
                  @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec
                  @Mny2Mny_NoBreak @Mny2Mny_NewGrp.

        SORT CASES
               BY !BYvar !Left# !Right#.

        MATCH FILES
           /TABLE=@Mny2Mny_Left#d
           /FILE =*
           /BY   !BYvar !Left#
           /DROP  @Mny2Mny_NoBreak @Mny2Mny_NewGrp
           /KEEP !BYvar !Left# !Right# ALL.

        EXECUTE /* so input datasets can be dropped */.

        !IF (!DSname !NE !NULL) !THEN
           DATASET NAME !DSname WINDOW=FRONT.
        !IFEND
        DATASET CLOSE @Mny2Mny_Left#d.
        DATASET CLOSE @Mny2Mny_Right#d.
        DATASET CLOSE @Mny2Mny_Left##.
        DATASET CLOSE @Mny2Mny_GrpSmry.

    !IFEND
!ENDDEFINE.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny (joins and Cartesian products), v.2

Andy W
IMO posting code here on Nabble is not a great solution. Ignoring whether Nabble will be around forever or not. I would suggest posting a link to the code somewhere, and there you can just update the code so people following the same link will always get the most up to date version.

There are alot of options for posting the code online. I frequently go the easiest route and just share a link to my dropbox account, but I've used google code previous. The SPSS developerworks site you can upload code chunks. Although most (all?) examples are SPD and SPE files, you can upload anything. See here I just uploaded a chart template to my account.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/
Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny (joins and Cartesian products), v.2

Jon K Peck
There are traditional code tools both from IBM and independent authors posted on the SPSS Community site (www.ibm.com/developerworks/spssdevcentral) mostly in the Statistical Tools collection (https://www.ibm.com/developerworks/community/files/app?lang=en#/collection/bbe88aaf-f3cd-466a-83fb-592d48eecb1c).  Sorry for the long url, but you can easily navigate to the collection from the first one.

We welcome user contributions.  See the Sharing topic on the first page of the site for details on how to do this.  Contributors retain full ownership and control of their submissions subject to the developerWorks Terms of Use.



Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Andy W <[hidden email]>
To:        [hidden email],
Date:        03/20/2014 06:05 AM
Subject:        Re: [SPSSX-L] Macro !Mny2Mny (joins and Cartesian products), v.2
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




IMO posting code here on Nabble is not a great solution. Ignoring whether
Nabble will be around forever or not. I would suggest posting a link to the
code somewhere, and there you can just update the code so people following
the same link will always get the most up to date version.

There are alot of options for posting the code online. I frequently go the
easiest route and just share a link to my dropbox account, but I've used
google code previous. The SPSS developerworks site you can upload code
chunks. Although most (all?) examples are SPD and SPE files, you can upload
anything. See  here
<
https://www.ibm.com/developerworks/community/files/app?lang=en#/file/8937d380-f86e-4790-b871-dd14ce6e50d7>
I just uploaded a chart template to my account.



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Macro-Mny2Mny-joins-and-Cartesian-products-v-2-tp5724939p5724943.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny (joins and Cartesian products), v.2

David Marso
Administrator
In reply to this post by Richard Ristow
Rather than create a Cartesian product of 1800 x 800000, consider that
1. the 800,000 may not be unique values so substantial reduction here via aggregation.
2. Of the reduced set consider the number of unique words in the entire set as well as the 1800.
3. If one treats a word as the unit of analysis then one needs only consider cases in which one or more words found in the 1800 match with those found in the reduced (aggregated) 800000.
4. Since the MATRIX code builds the KP for a given row (word) of the input data and saves it within the loop, one will never have to hold the entire possibly huge KP of 800000 x 1800.
Leaving it to P. Spangler to ponder and write necessary code.

Richard Ristow wrote
Since I just wrote to Peter Spangler that I'd posted a macro that can
do Cartesian product, I thought I'd better post the latest revision,
though it isn't fully tested.

This version incorporates David Marso's code using MATRIX to 'unroll'
the records containing pairs of maximum sequence values from the two
inputs, to the set of all pairs of sequence values thereby
represented. It will, therefore, handle a much larger Cartesian
product than the earlier version I posted, though I don't give it
much chance of handling the 1.5 giga-record file Peter Spangler seems
to be needing.
<SNIP macro code>.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny (joins and Cartesian products), v.2

pspangler1
Thank you all for this valuable conversation and suggestions. I am beginning to see there are better ways of conceptualizing the analysis and much to learn!

Sent from my iPhone

> On Mar 20, 2014, at 6:38 AM, David Marso <[hidden email]> wrote:
>
> Rather than create a Cartesian product of 1800 x 800000, consider that
> 1. the 800,000 may not be unique values so substantial reduction here via
> aggregation.
> 2. Of the reduced set consider the number of unique words in the entire set
> as well as the 1800.
> 3. If one treats a word as the unit of analysis then one needs only consider
> cases in which one or more words found in the 1800 match with those found in
> the reduced (aggregated) 800000.
> 4. Since the MATRIX code builds the KP for a given row (word) of the input
> data and saves it within the loop, one will never have to hold the entire
> possibly huge KP of 800000 x 1800.
> Leaving it to P. Spangler to ponder and write necessary code.
>
>
> Richard Ristow wrote
>> Since I just wrote to Peter Spangler that I'd posted a macro that can
>> do Cartesian product, I thought I'd better post the latest revision,
>> though it isn't fully tested.
>>
>> This version incorporates David Marso's code using MATRIX to 'unroll'
>> the records containing pairs of maximum sequence values from the two
>> inputs, to the set of all pairs of sequence values thereby
>> represented. It will, therefore, handle a much larger Cartesian
>> product than the earlier version I posted,
> *
>> though I don't give it
>> much chance of handling the 1.5 giga-record file Peter Spangler seems
>> to be needing.
> *
> <SNIP macro code>.
>
>
>
>
> -----
> Please reply to the list and not to my personal email.
> Those desiring my consulting or training services please feel free to email me.
> ---
> "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
> Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Macro-Mny2Mny-joins-and-Cartesian-products-v-2-tp5724939p5724946.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

--
This email may contain confidential information for the sole use of the
intended recipient(s). If you are not an intended recipient, please notify
the sender and delete all copies immediately.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD