MATRIX code to create 'spine' for R. Ristows Macro !Mny2Mny, for many-to-many merge and Cartesian product

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

MATRIX code to create 'spine' for R. Ristows Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
This post was updated on .
Hi Richard and whoever is interested:
This is a very crude basic sketch and I have made no effort to turn it into a macro at this point.
Basically rather than have the variables as a set be the key, imagine one has mapped these to a single variable using a counter or some other device.
The logic in the matrix program then generates the necessary number of records for merging into the two data sets.  Since the 'spine' is already ordered properly for the left file we immediately merge it in as a TABLE.  SORT to align with the 'right' file and match that.
I hope this makes sense ;-)))
----
NEW FILE.
DATASET CLOSE ALL.
DATA LIST / Category (A16) School (A30).
BEGIN DATA
Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH
END DATA.
SORT CASES BY Category.
DATASET NAME f1.
LIST.

DATA LIST / Category  (A16) Subject (A30) .
BEGIN DATA
Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering
END DATA.
SORT CASES BY Category.
DATASET NAME f2.
LIST.
ADD FILES / FILE f1  / FILE f2 / IN=@IN2@.
AUTORECODE Category / INTO AR_Category.
IF $CASENUM EQ 1 OR @IN2@ NE LAG(@IN2@) OR Category NE LAG(Category) InternalCount=1.
IF MISSING(InternalCount) InternalCount=LAG(InternalCount)+1.
DATASET NAME stacked.
DATASET DECLARE agg.
AGGREGATE OUTFILE agg / BREAK @IN2@ Category AR_Category/ NperCat=N .
DATASET Activate stacked.
DATASET COPY stack2.
SELECT IF NOT(@IN2@).
EXECUTE.
DELETE VARIABLES Subject.
RENAME VARIABLES INTERNALCount=F1INstance.

DATASET Activate stack2.
SELECT IF @IN2@.
EXECUTE.
DELETE VARIABLES School.

RENAME VARIABLES INTERNALCount=F2INstance.
DATASET ACTIVATE agg.
AGGREGATE OUTFILE * MODE ADDVARIABLES / BREAK @IN2@ / NCatPerFile=N.

DATASET DECLARE spine .
MATRIX.
GET DATA /FILE * / VARIABLES @IN2@  AR_Category NPerCat NCatPerFile.
COMPUTE Base1=Data(1:Data(1,4),2:3).
COMPUTE Base2=Data((Data(1,4)+1):NROW(Data),2:3).
COMPUTE MaxCat=MMAX(Data(:,2)).
COMPUTE X1=MAKE(MaxCat,1,0).
COMPUTE X2=MAKE(MaxCat,1,0).
LOOP #=1 TO NROW(Base1) .
COMPUTE X1(Base1(#,1))=Base1(#,2).
END LOOP.
LOOP #=1 TO NROW(Base2) .
COMPUTE X2(Base2(#,1))=Base2(#,2).
END LOOP.
COMPUTE M12=RMAX({{X1 &*X2},{X1,X2}}).
COMPUTE C={T({1:MaxCat}),X1,X2,M12}.
COMPUTE Spine=MAKE(MSUM(M12),3,0).
COMPUTE CurRow=1.
LOOP #=1 TO NRow(C).
COMPUTE LastRow=CurRow+C(#,4)-1.
COMPUTE T1=T({1:MMAX({C(#,2),1})}).
COMPUTE T2=T({1:MMAX({C(#,3),1})}).
COMPUTE T0=MAKE(C(#,4),1,C(#,1)).
COMPUTE Spine(CurRow:LastRow,:)=
          {T0,KRONEKER(T1,MAKE(NROW(T2),1,1)),
              KRONEKER(MAKE(NROW(T1),1,1),T2)}.
COMPUTE CurRow=LastRow+1.
END LOOP.
SAVE Spine   / OUTFILE Spine    / VARIABLES AR_Category F1Instance F2Instance.
END MATRIX.
MATCH FILES FILE spine  / TABLE stacked  / BY AR_Category F1Instance.
SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE stack2  / BY AR_Category F2Instance.



Richard Ristow wrote
At 08:57 AM 3/10/2014, David Marso wrote, under subject line
"Re: News from the SPSS Community":

>Building Richard's 'spine' can be done easily with KRONEKER.

I'd be interested in how to do this, since I don't find the
double-VARSTOCASES approach wholly satisfactory(*).

The input from which the 'spine' is generated consists of an
indefinite number of records, each containing,
. One key value, where the key can consist of any number of variables
of any datatypes. (The key value is unique in this file.)
. A left-hand record count, GE 0:  Rl
. A right-hand record count, GE 0: Rr

The output, the spine itself, contains the same key values, but with
Rl*Rr records for each (or Rl or Rr records, when an outer join is
being performed and one of the values is 0). There's one record for
each {l,r:1<=l<=Rl,1<=r<=Rr} (except that l is 0 if Rl is 0, and r is
0 if Rr is 0.)

I'm appending Appendix I (with the output) and Appendix II (with the
test data and all code, including the macro definition) from a small
demonstration run. In the output, the pre-spine and spine are listed,
following the listing of the merged data.  You'll see that both have
variables beyond the basics I listed above. Those are variables used
in earlier stages, which are carried forward past when they are no
longer needed, to the first time they can conveniently be eliminated.

Finally, notice that this is a true outer join, with unmatched
records from both inputs appearing in the output.
======================================
(*)Definition and use of the 'spine': see
Date:    Sun, 9 Mar 2014 17:08:56 -0500
From:    Richard Ristow <[hidden email]>
Subject: Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product
To:      [hidden email]
=======================================
APPENDIX I:  Test and demonstration run
=======================================
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:05       |
|-----------------------------|---------------------------|
  [Schools]
Category        School

Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH

Number of cases read:  9    Number of cases listed:  9
...
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:05       |
|-----------------------------|---------------------------|
  [Subjects]
Category        Subject

Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering

Number of cases read:  11    Number of cases listed:  11

!Mny2Mny LeftIn=Schools/
          RightIn=Subjects/
          BYvar=Category.

Macro !Mny2Mny:
Outer join of
    Schools
to Subjects
by Category

DATASET NAME     Merged    WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:07       |
|-----------------------------|---------------------------|
  [Merged]
Category        Left# Right# School                 Subject

Art                 1     1  RI School of Design    Painting
Art                 1     2  RI School of Design    Drawing
Art                 1     3  RI School of Design    Sculpture
Art                 2     1  Metropolitan Museum    Painting
Art                 2     2  Metropolitan Museum    Drawing
Art                 2     3  Metropolitan Museum    Sculpture
Culinary            1     0  Johnson & Wales
Liberal Arts        1     1  Oberlin                Literature
Liberal Arts        1     2  Oberlin                Philosophy
Liberal Arts        2     1  Antioch                Literature
Liberal Arts        2     2  Antioch                Philosophy
Liberal Arts        3     1  Swarthmore             Literature
Liberal Arts        3     2  Swarthmore             Philosophy
Psychoceramics      0     1                         Carberry
Psychoceramics      0     2                         Library science
Technical           1     1  MIT                    Calculus
Technical           1     2  MIT                    Physics
Technical           1     3  MIT                    Chemistry
Technical           1     4  MIT                    Engineering
Technical           2     1  Cal Tech               Calculus
Technical           2     2  Cal Tech               Physics
Technical           2     3  Cal Tech               Chemistry
Technical           2     4  Cal Tech               Engineering
Technical           3     1  ETH                    Calculus
Technical           3     2  ETH                    Physics
Technical           3     3  ETH                    Chemistry
Technical           3     4  ETH                    Engineering

Number of cases read:  27    Number of cases listed:  27

DATASET ACTIVATE Pre_Spine WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:07       |
|-----------------------------|---------------------------|
  [Pre_Spine]
@Mny2Mny_NoBreak Category        @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec

         1        Art                      2                3
         1        Culinary                 1                0
         1        Liberal Arts             3                2
         1        Psychoceramics           0                2
         1        Technical                3                4


Number of cases read:  5    Number of cases listed:  5

DATASET ACTIVATE Spine     WINDOW=FRONT.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:08       |
|-----------------------------|---------------------------|
  [Spine] [listing has been hand-reformatted]
@Mny2Mny_    Category        @Mny2Mny_  @Mny2Mny_  Right#  Left#
NoBreak                      LMaxRec    RMaxRec

     1        Art                 2          3           1      1
     1        Art                 2          3           1      2
     1        Art                 2          3           2      1
     1        Art                 2          3           2      2
     1        Art                 2          3           3      1
     1        Art                 2          3           3      2
     1        Culinary            1          0           0      1
     1        Liberal Arts        3          2           1      1
     1        Liberal Arts        3          2           1      2
     1        Liberal Arts        3          2           1      3
     1        Liberal Arts        3          2           2      1
     1        Liberal Arts        3          2           2      2
     1        Liberal Arts        3          2           2      3
     1        Psychoceramics      0          2           1      0
     1        Psychoceramics      0          2           2      0
     1        Technical           3          4           1      1
     1        Technical           3          4           1      2
     1        Technical           3          4           1      3
     1        Technical           3          4           2      1
     1        Technical           3          4           2      2
     1        Technical           3          4           2      3
     1        Technical           3          4           3      1
     1        Technical           3          4           3      2
     1        Technical           3          4           3      3
     1        Technical           3          4           4      1
     1        Technical           3          4           4      2
     1        Technical           3          4           4      3

Number of cases read:  27    Number of cases listed:  27
=======================================
APPENDIX II: Test data, and all code
=======================================
*  C:\Documents and Settings\Richard\My Documents            .
*    \Technical\spssx-l\Z-2014                               .
*    \2014-03-10 Marso-Re News from the SPSS Community.SPS   .

*  In response to posting                                    .
*  Date:     Mon, 10 Mar 2014 06:57:24 -0700                 .
*  From:     David Marso <[hidden email]>             .
*  Subject:  Re: News from the SPSS Community                .
*  To:       [hidden email]                        .

*  The response will be posted under subject line                   .
*  Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product .

*  David Marso wrote,                                               .
*  "Building Richard's 'spine' can be done easily with [MATRIX      .
*  function] KRONEKER."                                             .
*                                                                   .
*  See thread beginning with                                        .
*     Date:    Sun, 9 Mar 2014 17:54:52 -0500                       .
*     From:    Richard Ristow <[hidden email]>             .
*     Subject: Macro !Mny2Mny, for many-to-many merge               .
*              and Cartesian product                                .
*     To:      [hidden email]                             .
*  especially                                                       .
*     Date:    Sun, 9 Mar 2014 17:08:56 -0500                       .
*     From:    Richard Ristow <[hidden email]>             .
*     Subject: Re: Macro !Mny2Mny, for many-to-many merge           .
*              and Cartesian product                                .
*     To:      [hidden email]                             .

*  .................   Matrix <!>Mny2Mny       .................... .
*  ... This version modified, to save copies of the dataset from    .
*  ... which the 'spine' is built, and the spine itself.            .

DEFINE !Mny2Mny(LeftIn  = !CHAREND('/')
                /RightIn = !CHAREND('/')
                /BYvar   = !CHAREND('/')
                /LRename = !CHAREND('/')
                /LDrop   = !CHAREND('/')
                /LKeep   = !CHAREND('/') !DEFAULT(ALL)
                /RRename = !CHAREND('/')
                /RDrop   = !CHAREND('/')
                /RKeep   = !CHAREND('/') !DEFAULT(ALL)
                /Join    = !CHAREND('/') !DEFAULT(Outer)
                /MaxGrp  = !TOKENS(1)    !DEFAULT(50))

    ECHO " ".
    ECHO !QUOTE(!CONCAT('Macro !','Mny2Mny:')).

*  .......     Section I:   Input parsing and error checking   ....... .
    !LET !Error = No

    !IF (   !LeftIn  !EQ !NULL
        !OR !RightIn !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Arguments LeftIn and RightIn (the two inputs)".
        ECHO "      are required.".
    !IFEND

    !IF (   !RightIn !EQ '*') !THEN
        !LET !Error = Yes
        ECHO "The Right input may not be '*'".
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'OUTER') !THEN
       !LET !JType = 'OUTER'
       !LET !JMsg  = 'Outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFT OUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFTOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'RIGHT OUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'RIGHTOUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'ROUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'INNER') !THEN
       !LET !JType = 'INNER'
       !LET !JMsg  = 'Inner join'
    !IFEND

    !IF (!BYvar         !EQ !NULL)   !THEN
       !LET !JType = 'CARTPROD'
       !LET !JMsg   = 'Cartesian product'
    !IFEND

    !IF (!JType !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Invalid join type:".
        ECHO !QUOTE(!CONCAT('    Join=',!Join)).
    !IFEND


*  .......     Section II:  Informational messages            ........ .
    !IF (!Error !EQ No) !THEN
        ECHO !QUOTE(!CONCAT(!JMsg,' of')).
        ECHO !QUOTE(!CONCAT('   ',!LeftIn)).
        !IF (        !LRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!LRename))
        !IFEND
        !IF (        !LDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!LDrop))
        !IFEND
        !IF (!UPCASE(!LKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!LKeep))
        !IFEND

        ECHO !QUOTE(!CONCAT('to ',!RightIn)).
        !IF (        !RRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!RRename))
        !IFEND
        !IF (        !RDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!RDrop))
        !IFEND
        !IF (!UPCASE(!RKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!RKeep))
        !IFEND
        !IF (!BYvar  !NE !NULL)          !THEN
           ECHO !QUOTE(!CONCAT('by ',!BYvar)).
        !IFEND
      !IFEND

*  .......     Section III: Processing                          ....... .
*  .......     III.A  Sequence-numbered copies of the inputs    ....... .
    !IF (!Error !EQ No) !THEN
        ADD FILES
           /FILE=!Leftin.
        NUMERIC @Mny2Mny_NoBreak (F2).
        COMPUTE @Mny2Mny_NoBreak = 1.
        ADD FILES
           /FILE=*
        !IF (!LRename !NE !NULL) !THEN
           /RENAME = !LRename
        !IFEND
           /BY      @Mny2Mny_NoBreak !BYvar
           /FIRST = @Mny2Mny_NewGrp
        !IF (!LDrop   !NE !NULL) !THEN
           /DROP   = !LDrop
        !IFEND
           /KEEP   = @Mny2Mny_NoBreak !BYvar !LKeep.
        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE Left# = 1.
        ELSE.
        .  COMPUTE Left# = LAG(Left#) + 1.
        END IF.
        FORMATS    Left# (F4).
        DATASET NAME      @Mny2Mny_Left#d WINDOW=ASIS.
        /*-- LIST /*-*/.

        DATASET DECLARE   @Mny2Mny_Left##.
        AGGREGATE OUTFILE=@Mny2Mny_Left##
           /BREAK = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_LMaxRec = MAX(Left#).


        ADD FILES
           /FILE=!RightIn.
        NUMERIC @Mny2Mny_NoBreak(F2).
        COMPUTE @Mny2Mny_NoBreak = 1.
        ADD FILES
           /FILE=*
        !IF (!LRename !NE !NULL) !THEN
           /RENAME = !RRename
        !IFEND
           /BY @Mny2Mny_NoBreak !BYvar
           /FIRST = @Mny2Mny_NewGrp
        !IF (!RDrop   !NE !NULL) !THEN
           /DROP   = !RDrop
        !IFEND
           /KEEP   = @Mny2Mny_NoBreak !BYvar !RKeep.
        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE Right# = 1.
        ELSE.
        .  COMPUTE Right# = LAG(Right#) + 1.
        END IF.
        FORMATS    Right# (F4).
        DATASET NAME      @Mny2Mny_Right#d WINDOW=ASIS.
        /*-- LIST /*-*/.

*  .......     III.B  Build 'spine' of all keys for all records ....... .
*  .......     III.B.1  Combine record counts from the inputs   ....... .
        AGGREGATE OUTFILE=*
           /BREAK   = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_RMaxRec = MAX(Right#).
        /*-- LIST /*-*/.

        MATCH FILES
           /FILE    = @Mny2Mny_Left##
           /FILE    = *
           /BY @Mny2Mny_NoBreak !BYvar.

        RECODE @Mny2Mny_LMaxRec
               @Mny2Mny_RMaxRec (MISSING=0).

*  .......     III.B.2  Select, based on join type              ....... .
       !IF  (!Jtype !EQ 'INNER') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0
               AND @Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'LEFTOUT') !THEN
        SELECT IF (@Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'RIGHTOUT') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0).
       !IFEND

        * ..................................... .
        * For OUTER join and Cartesian product, .
        * no selection is necessary.            .
        * ..................................... .

*  .......     III.B.3  Error message, if record count exceeds  ....... .
*                       the allowed maximum                     ....... .

        IF $CASENUM EQ 1 #Overflow = 0.

        DO IF    @Mny2Mny_LMaxRec GT !MaxGrp.
        .  DO IF #Overflow EQ 0.
        .     PRINT / 'Left  input: ' @Mny2Mny_LMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '
                !ELSE
                      ' records for key ', !BYvar
                !IFEND
                      !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')).
        .     COMPUTE #Overflow = 1.
        .  ELSE.
        .     PRINT / 'Left  input: ' @Mny2Mny_LMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '.
                !ELSE
                      ' records for key ', !BYvar.
                !IFEND
        .  END IF.
        END IF.

        DO IF    @Mny2Mny_RMaxRec GT !MaxGrp.
        .  DO IF #Overflow EQ 0.
        .     PRINT / 'Right input: ' @Mny2Mny_RMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '
                !ELSE
                      ' records for key ', !BYvar
                !IFEND
                      !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')).
        .     COMPUTE #Overflow = 1.
        .  ELSE.
        .     PRINT / 'Right input: ' @Mny2Mny_RMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '.
                !ELSE
                      ' records for key ', !BYvar.
                !IFEND
        .  END IF.
        END IF.

*  .......     III.B.4  Build the 'spine', of all record keys   ........ .

*  .......  From the file with counts of records from both       ........ .
*  .......  inputs by BYvar values (which is currently active),  ........ .
*  .......  'unroll' to build a file with all combinations of    ........ .
*  .......  BYvar values, and pairs of record sequence numbers,  ........ .
*  .......  that will occur in the output.                       ........ .
*  .......  This file is the 'spine'.                            ........ .

        /**/ DATASET COPY Pre_Spine  /*-*/.

        * 'Unroll' the Right record numbers.
        VECTOR  Rec#(!MaxGrp,F4).

        DO IF   @Mny2Mny_RMaxRec EQ 0.
        .     COMPUTE Rec#(1)     = 0.
        ELSE.
        .  LOOP #Idx = 1 TO MIN(@Mny2Mny_RMaxRec,!MaxGrp).
        .     COMPUTE Rec#(#Idx) = #Idx.
        .  END LOOP.
        END IF.
        EXECUTE  /* apparently necessary */.

        !LET !Sfx = !LENGTH (!BLANKS (!MaxGrp))

        OMS /SELECT ALL
            /IF COMMANDS = ["Variables to Cases"]
            /DESTINATION   VIEWER = NO
            /TAG = "@MNY2MNY_V2C_ZAP".

        VARSTOCASES
           /MAKE Right# FROM Rec#1 TO !CONCAT('Rec#',!Sfx).

        * 'Unroll' the Left  record numbers.
        VECTOR  Rec#(!MaxGrp,F4).

        DO IF   @Mny2Mny_LMaxRec EQ 0.
        .     COMPUTE Rec#(1)     = 0.
        ELSE.
        .  LOOP #Idx = 1 TO MIN(@Mny2Mny_LMaxRec,!MaxGrp).
        .     COMPUTE Rec#(#Idx) = #Idx.
        .  END LOOP.
        END IF.
        EXECUTE  /* apparently necessary */.

        VARSTOCASES
           /MAKE Left#  FROM Rec#1 TO !CONCAT('Rec#',!Sfx).

        OMSEND TAG=["@MNY2MNY_V2C_ZAP"].

        /**/ DATASET COPY Spine      /*-*/.

*  .......     III.C  Add the substantive data                  ....... .
        MATCH FILES
           /FILE=*
           /TABLE=@Mny2Mny_Right#d
           /BY !BYvar Right#
           /DROP @Mny2Mny_NoBreak @Mny2Mny_NewGrp
                 @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec.

        SORT CASES BY !BYvar Left# Right#.

        MATCH FILES
           /TABLE=@Mny2Mny_Left#d
           /FILE=*
           /BY   !BYvar Left#
           /DROP  @Mny2Mny_NoBreak @Mny2Mny_NewGrp
           /KEEP !BYvar Left# Right# ALL.

        EXECUTE /* so input datasets can be dropped */.

        DATASET CLOSE @Mny2Mny_Left#d.
        DATASET CLOSE @Mny2Mny_Right#d.
        DATASET CLOSE @Mny2Mny_Left##.
    !IFEND
!ENDDEFINE.

*  .................   Test data               .................... .

DATA LIST FIXED
    /Category      04-18 (A)
     School        19-40 (A).
BEGIN DATA
    Art            RI School of Design
    Art            Metropolitan Museum
    Culinary       Johnson & Wales
    Liberal Arts   Oberlin
    Liberal Arts   Antioch
    Liberal Arts   Swarthmore
    Technical      MIT
    Technical      Cal Tech
    Technical      ETH
END DATA.
DATASET NAME     Schools.
.  /*-- LIST /*-*/.

DATA LIST FIXED
    /Category      04-18 (A)
     Subject       19-35 (A).
BEGIN DATA
    Art            Painting
    Art            Drawing
    Art            Sculpture
    Liberal Arts   Literature
    Liberal Arts   Philosophy
    Psychoceramics Carberry
    Psychoceramics Library science
    Technical      Calculus
    Technical      Physics
    Technical      Chemistry
    Technical      Engineering
END DATA.
DATASET NAME    Subjects.
*  ...................................................... .

DATASET ACTIVATE Schools    WINDOW=FRONT.
LIST.
DATASET ACTIVATE Subjects   WINDOW=FRONT.
LIST.


!Mny2Mny LeftIn=Schools/
          RightIn=Subjects/
          BYvar=Category.

DATASET NAME     Merged    WINDOW=FRONT.
LIST.

DATASET ACTIVATE Pre_Spine WINDOW=FRONT.
LIST.

DATASET ACTIVATE Spine     WINDOW=FRONT.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
A *VERY* MUCH improved MATRIX solution from previous.

DATASET DECLARE spine .
MATRIX.
GET DATA /FILE * / VARIABLES @IN2@  AR_Category NPerCat NCatPerFile.
COMPUTE Base1=Data(1:Data(1,4),2:3).
COMPUTE Base2=Data((Data(1,4)+1):NROW(Data),2:3).
COMPUTE MaxCat=MMAX(Data(:,2)).
COMPUTE XX=MAKE(MaxCat,2,0).
LOOP #=1 TO MaxCat .
+  DO IF NRow(Base1) GE #.
+    COMPUTE XX(Base1(#,1),1)=Base1(#,2).
+  END IF.
+  DO IF NRow(Base2) GE #.
+    COMPUTE XX(Base2(#,1),2)=Base2(#,2).
+  END IF.
END LOOP.
COMPUTE C={T({1:MaxCat}),XX,RMAX({XX(:,1) &* XX(:,2),XX})}.
LOOP #=1 TO NRow(C).
COMPUTE T1=T({1:MMAX({C(#,2),1})}).
COMPUTE T2=T({1:MMAX({C(#,3),1})}).
SAVE ({MAKE(C(#,4),1,C(#,1)),
       KRONEKER(T1,MAKE(NROW(T2),1,1)),
       KRONEKER(MAKE(NROW(T1),1,1),T2)})
   / OUTFILE Spine / VARIABLES AR_Category F1Instance F2Instance.
END LOOP.
END MATRIX.
MATCH FILES FILE spine  / TABLE stacked  / BY AR_Category F1Instance.
SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE stack2  / BY AR_Category F2Instance.
EXECUTE.
DATASET NAME MergedCartProd.
DATASET ACTIVATE MergedCartProd.

David Marso wrote
Hi Richard and whoever is interested:
This is a very crude basic sketch and I have made no effort to turn it into a macro at this point.
Basically rather than have the variables as a set be the key, imagine one has mapped these to a single variable using a counter or some other device.
The logic in the matrix program then generates the necessary number of records for merging into the two data sets.  Since the 'spine' is already ordered properly for the left file we immediately merge it in as a TABLE.  SORT to align with the 'right' file and match that.
I hope this makes sense ;-)))
----
NEW FILE.
DATASET CLOSE ALL.
DATA LIST / Category (A16) School (A30).
BEGIN DATA
Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH
END DATA.
SORT CASES BY Category.
DATASET NAME f1.
LIST.

DATA LIST / Category  (A16) Subject (A30) .
BEGIN DATA
Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering
END DATA.
SORT CASES BY Category.
DATASET NAME f2.
LIST.
ADD FILES / FILE f1  / FILE f2 / IN=@IN2@.
AUTORECODE Category / INTO AR_Category.
IF $CASENUM EQ 1 OR @IN2@ NE LAG(@IN2@) OR Category NE LAG(Category) InternalCount=1.
IF MISSING(InternalCount) InternalCount=LAG(InternalCount)+1.
DATASET NAME stacked.
DATASET DECLARE agg.
AGGREGATE OUTFILE agg / BREAK @IN2@ Category AR_Category/ NperCat=N .
DATASET Activate stacked.
DATASET COPY stack2.
SELECT IF NOT(@IN2@).
EXECUTE.
DELETE VARIABLES Subject.
RENAME VARIABLES INTERNALCount=F1INstance.

DATASET Activate stack2.
SELECT IF @IN2@.
EXECUTE.
DELETE VARIABLES School.

RENAME VARIABLES INTERNALCount=F2INstance.
DATASET ACTIVATE agg.
AGGREGATE OUTFILE * MODE ADDVARIABLES / BREAK @IN2@ / NCatPerFile=N.

DATASET DECLARE spine .
MATRIX.
GET DATA /FILE * / VARIABLES @IN2@  AR_Category NPerCat NCatPerFile.
COMPUTE Base1=Data(1:Data(1,4),2:3).
COMPUTE Base2=Data((Data(1,4)+1):NROW(Data),2:3).
COMPUTE MaxCat=MMAX(Data(:,2)).
COMPUTE X1=MAKE(MaxCat,1,0).
COMPUTE X2=MAKE(MaxCat,1,0).
LOOP #=1 TO NROW(Base1) .
COMPUTE X1(Base1(#,1))=Base1(#,2).
END LOOP.
LOOP #=1 TO NROW(Base2) .
COMPUTE X2(Base2(#,1))=Base2(#,2).
END LOOP.
COMPUTE M12=RMAX({{X1 &*X2},{X1,X2}}).
COMPUTE C={T({1:MaxCat}),X1,X2,M12}.
COMPUTE Spine=MAKE(MSUM(M12),3,0).
COMPUTE CurRow=1.
LOOP #=1 TO NRow(C).
COMPUTE LastRow=CurRow+C(#,4)-1.
COMPUTE T1=T({1:MMAX({C(#,2),1})}).
COMPUTE T2=T({1:MMAX({C(#,3),1})}).
COMPUTE T0=MAKE(C(#,4),1,C(#,1)).
COMPUTE Spine(CurRow:LastRow,:)=
          {T0,KRONEKER(T1,MAKE(NROW(T2),1,1)),
              KRONEKER(MAKE(NROW(T1),1,1),T2)}.
COMPUTE CurRow=LastRow+1.
END LOOP.
SAVE Spine   / OUTFILE Spine    / VARIABLES AR_Category F1Instance F2Instance.
END MATRIX.
MATCH FILES FILE spine  / TABLE stacked  / BY AR_Category F1Instance.
SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE stack2  / BY AR_Category F2Instance.



Richard Ristow wrote
At 08:57 AM 3/10/2014, David Marso wrote, under subject line
"Re: News from the SPSS Community":

>Building Richard's 'spine' can be done easily with KRONEKER.

I'd be interested in how to do this, since I don't find the
double-VARSTOCASES approach wholly satisfactory(*).

The input from which the 'spine' is generated consists of an
indefinite number of records, each containing,
. One key value, where the key can consist of any number of variables
of any datatypes. (The key value is unique in this file.)
. A left-hand record count, GE 0:  Rl
. A right-hand record count, GE 0: Rr

The output, the spine itself, contains the same key values, but with
Rl*Rr records for each (or Rl or Rr records, when an outer join is
being performed and one of the values is 0). There's one record for
each {l,r:1<=l<=Rl,1<=r<=Rr} (except that l is 0 if Rl is 0, and r is
0 if Rr is 0.)

I'm appending Appendix I (with the output) and Appendix II (with the
test data and all code, including the macro definition) from a small
demonstration run. In the output, the pre-spine and spine are listed,
following the listing of the merged data.  You'll see that both have
variables beyond the basics I listed above. Those are variables used
in earlier stages, which are carried forward past when they are no
longer needed, to the first time they can conveniently be eliminated.

Finally, notice that this is a true outer join, with unmatched
records from both inputs appearing in the output.
======================================
(*)Definition and use of the 'spine': see
Date:    Sun, 9 Mar 2014 17:08:56 -0500
From:    Richard Ristow <[hidden email]>
Subject: Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product
To:      [hidden email]
=======================================
APPENDIX I:  Test and demonstration run
=======================================
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:05       |
|-----------------------------|---------------------------|
  [Schools]
Category        School

Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH

Number of cases read:  9    Number of cases listed:  9
...
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:05       |
|-----------------------------|---------------------------|
  [Subjects]
Category        Subject

Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering

Number of cases read:  11    Number of cases listed:  11

!Mny2Mny LeftIn=Schools/
          RightIn=Subjects/
          BYvar=Category.

Macro !Mny2Mny:
Outer join of
    Schools
to Subjects
by Category

DATASET NAME     Merged    WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:07       |
|-----------------------------|---------------------------|
  [Merged]
Category        Left# Right# School                 Subject

Art                 1     1  RI School of Design    Painting
Art                 1     2  RI School of Design    Drawing
Art                 1     3  RI School of Design    Sculpture
Art                 2     1  Metropolitan Museum    Painting
Art                 2     2  Metropolitan Museum    Drawing
Art                 2     3  Metropolitan Museum    Sculpture
Culinary            1     0  Johnson & Wales
Liberal Arts        1     1  Oberlin                Literature
Liberal Arts        1     2  Oberlin                Philosophy
Liberal Arts        2     1  Antioch                Literature
Liberal Arts        2     2  Antioch                Philosophy
Liberal Arts        3     1  Swarthmore             Literature
Liberal Arts        3     2  Swarthmore             Philosophy
Psychoceramics      0     1                         Carberry
Psychoceramics      0     2                         Library science
Technical           1     1  MIT                    Calculus
Technical           1     2  MIT                    Physics
Technical           1     3  MIT                    Chemistry
Technical           1     4  MIT                    Engineering
Technical           2     1  Cal Tech               Calculus
Technical           2     2  Cal Tech               Physics
Technical           2     3  Cal Tech               Chemistry
Technical           2     4  Cal Tech               Engineering
Technical           3     1  ETH                    Calculus
Technical           3     2  ETH                    Physics
Technical           3     3  ETH                    Chemistry
Technical           3     4  ETH                    Engineering

Number of cases read:  27    Number of cases listed:  27

DATASET ACTIVATE Pre_Spine WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:07       |
|-----------------------------|---------------------------|
  [Pre_Spine]
@Mny2Mny_NoBreak Category        @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec

         1        Art                      2                3
         1        Culinary                 1                0
         1        Liberal Arts             3                2
         1        Psychoceramics           0                2
         1        Technical                3                4


Number of cases read:  5    Number of cases listed:  5

DATASET ACTIVATE Spine     WINDOW=FRONT.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:08       |
|-----------------------------|---------------------------|
  [Spine] [listing has been hand-reformatted]
@Mny2Mny_    Category        @Mny2Mny_  @Mny2Mny_  Right#  Left#
NoBreak                      LMaxRec    RMaxRec

     1        Art                 2          3           1      1
     1        Art                 2          3           1      2
     1        Art                 2          3           2      1
     1        Art                 2          3           2      2
     1        Art                 2          3           3      1
     1        Art                 2          3           3      2
     1        Culinary            1          0           0      1
     1        Liberal Arts        3          2           1      1
     1        Liberal Arts        3          2           1      2
     1        Liberal Arts        3          2           1      3
     1        Liberal Arts        3          2           2      1
     1        Liberal Arts        3          2           2      2
     1        Liberal Arts        3          2           2      3
     1        Psychoceramics      0          2           1      0
     1        Psychoceramics      0          2           2      0
     1        Technical           3          4           1      1
     1        Technical           3          4           1      2
     1        Technical           3          4           1      3
     1        Technical           3          4           2      1
     1        Technical           3          4           2      2
     1        Technical           3          4           2      3
     1        Technical           3          4           3      1
     1        Technical           3          4           3      2
     1        Technical           3          4           3      3
     1        Technical           3          4           4      1
     1        Technical           3          4           4      2
     1        Technical           3          4           4      3

Number of cases read:  27    Number of cases listed:  27
=======================================
APPENDIX II: Test data, and all code
=======================================
*  C:\Documents and Settings\Richard\My Documents            .
*    \Technical\spssx-l\Z-2014                               .
*    \2014-03-10 Marso-Re News from the SPSS Community.SPS   .

*  In response to posting                                    .
*  Date:     Mon, 10 Mar 2014 06:57:24 -0700                 .
*  From:     David Marso <[hidden email]>             .
*  Subject:  Re: News from the SPSS Community                .
*  To:       [hidden email]                        .

*  The response will be posted under subject line                   .
*  Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product .

*  David Marso wrote,                                               .
*  "Building Richard's 'spine' can be done easily with [MATRIX      .
*  function] KRONEKER."                                             .
*                                                                   .
*  See thread beginning with                                        .
*     Date:    Sun, 9 Mar 2014 17:54:52 -0500                       .
*     From:    Richard Ristow <[hidden email]>             .
*     Subject: Macro !Mny2Mny, for many-to-many merge               .
*              and Cartesian product                                .
*     To:      [hidden email]                             .
*  especially                                                       .
*     Date:    Sun, 9 Mar 2014 17:08:56 -0500                       .
*     From:    Richard Ristow <[hidden email]>             .
*     Subject: Re: Macro !Mny2Mny, for many-to-many merge           .
*              and Cartesian product                                .
*     To:      [hidden email]                             .

*  .................   Matrix <!>Mny2Mny       .................... .
*  ... This version modified, to save copies of the dataset from    .
*  ... which the 'spine' is built, and the spine itself.            .

DEFINE !Mny2Mny(LeftIn  = !CHAREND('/')
                /RightIn = !CHAREND('/')
                /BYvar   = !CHAREND('/')
                /LRename = !CHAREND('/')
                /LDrop   = !CHAREND('/')
                /LKeep   = !CHAREND('/') !DEFAULT(ALL)
                /RRename = !CHAREND('/')
                /RDrop   = !CHAREND('/')
                /RKeep   = !CHAREND('/') !DEFAULT(ALL)
                /Join    = !CHAREND('/') !DEFAULT(Outer)
                /MaxGrp  = !TOKENS(1)    !DEFAULT(50))

    ECHO " ".
    ECHO !QUOTE(!CONCAT('Macro !','Mny2Mny:')).

*  .......     Section I:   Input parsing and error checking   ....... .
    !LET !Error = No

    !IF (   !LeftIn  !EQ !NULL
        !OR !RightIn !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Arguments LeftIn and RightIn (the two inputs)".
        ECHO "      are required.".
    !IFEND

    !IF (   !RightIn !EQ '*') !THEN
        !LET !Error = Yes
        ECHO "The Right input may not be '*'".
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'OUTER') !THEN
       !LET !JType = 'OUTER'
       !LET !JMsg  = 'Outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFT OUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFTOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'RIGHT OUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'RIGHTOUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'ROUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'INNER') !THEN
       !LET !JType = 'INNER'
       !LET !JMsg  = 'Inner join'
    !IFEND

    !IF (!BYvar         !EQ !NULL)   !THEN
       !LET !JType = 'CARTPROD'
       !LET !JMsg   = 'Cartesian product'
    !IFEND

    !IF (!JType !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Invalid join type:".
        ECHO !QUOTE(!CONCAT('    Join=',!Join)).
    !IFEND


*  .......     Section II:  Informational messages            ........ .
    !IF (!Error !EQ No) !THEN
        ECHO !QUOTE(!CONCAT(!JMsg,' of')).
        ECHO !QUOTE(!CONCAT('   ',!LeftIn)).
        !IF (        !LRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!LRename))
        !IFEND
        !IF (        !LDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!LDrop))
        !IFEND
        !IF (!UPCASE(!LKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!LKeep))
        !IFEND

        ECHO !QUOTE(!CONCAT('to ',!RightIn)).
        !IF (        !RRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!RRename))
        !IFEND
        !IF (        !RDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!RDrop))
        !IFEND
        !IF (!UPCASE(!RKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!RKeep))
        !IFEND
        !IF (!BYvar  !NE !NULL)          !THEN
           ECHO !QUOTE(!CONCAT('by ',!BYvar)).
        !IFEND
      !IFEND

*  .......     Section III: Processing                          ....... .
*  .......     III.A  Sequence-numbered copies of the inputs    ....... .
    !IF (!Error !EQ No) !THEN
        ADD FILES
           /FILE=!Leftin.
        NUMERIC @Mny2Mny_NoBreak (F2).
        COMPUTE @Mny2Mny_NoBreak = 1.
        ADD FILES
           /FILE=*
        !IF (!LRename !NE !NULL) !THEN
           /RENAME = !LRename
        !IFEND
           /BY      @Mny2Mny_NoBreak !BYvar
           /FIRST = @Mny2Mny_NewGrp
        !IF (!LDrop   !NE !NULL) !THEN
           /DROP   = !LDrop
        !IFEND
           /KEEP   = @Mny2Mny_NoBreak !BYvar !LKeep.
        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE Left# = 1.
        ELSE.
        .  COMPUTE Left# = LAG(Left#) + 1.
        END IF.
        FORMATS    Left# (F4).
        DATASET NAME      @Mny2Mny_Left#d WINDOW=ASIS.
        /*-- LIST /*-*/.

        DATASET DECLARE   @Mny2Mny_Left##.
        AGGREGATE OUTFILE=@Mny2Mny_Left##
           /BREAK = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_LMaxRec = MAX(Left#).


        ADD FILES
           /FILE=!RightIn.
        NUMERIC @Mny2Mny_NoBreak(F2).
        COMPUTE @Mny2Mny_NoBreak = 1.
        ADD FILES
           /FILE=*
        !IF (!LRename !NE !NULL) !THEN
           /RENAME = !RRename
        !IFEND
           /BY @Mny2Mny_NoBreak !BYvar
           /FIRST = @Mny2Mny_NewGrp
        !IF (!RDrop   !NE !NULL) !THEN
           /DROP   = !RDrop
        !IFEND
           /KEEP   = @Mny2Mny_NoBreak !BYvar !RKeep.
        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE Right# = 1.
        ELSE.
        .  COMPUTE Right# = LAG(Right#) + 1.
        END IF.
        FORMATS    Right# (F4).
        DATASET NAME      @Mny2Mny_Right#d WINDOW=ASIS.
        /*-- LIST /*-*/.

*  .......     III.B  Build 'spine' of all keys for all records ....... .
*  .......     III.B.1  Combine record counts from the inputs   ....... .
        AGGREGATE OUTFILE=*
           /BREAK   = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_RMaxRec = MAX(Right#).
        /*-- LIST /*-*/.

        MATCH FILES
           /FILE    = @Mny2Mny_Left##
           /FILE    = *
           /BY @Mny2Mny_NoBreak !BYvar.

        RECODE @Mny2Mny_LMaxRec
               @Mny2Mny_RMaxRec (MISSING=0).

*  .......     III.B.2  Select, based on join type              ....... .
       !IF  (!Jtype !EQ 'INNER') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0
               AND @Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'LEFTOUT') !THEN
        SELECT IF (@Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'RIGHTOUT') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0).
       !IFEND

        * ..................................... .
        * For OUTER join and Cartesian product, .
        * no selection is necessary.            .
        * ..................................... .

*  .......     III.B.3  Error message, if record count exceeds  ....... .
*                       the allowed maximum                     ....... .

        IF $CASENUM EQ 1 #Overflow = 0.

        DO IF    @Mny2Mny_LMaxRec GT !MaxGrp.
        .  DO IF #Overflow EQ 0.
        .     PRINT / 'Left  input: ' @Mny2Mny_LMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '
                !ELSE
                      ' records for key ', !BYvar
                !IFEND
                      !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')).
        .     COMPUTE #Overflow = 1.
        .  ELSE.
        .     PRINT / 'Left  input: ' @Mny2Mny_LMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '.
                !ELSE
                      ' records for key ', !BYvar.
                !IFEND
        .  END IF.
        END IF.

        DO IF    @Mny2Mny_RMaxRec GT !MaxGrp.
        .  DO IF #Overflow EQ 0.
        .     PRINT / 'Right input: ' @Mny2Mny_RMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '
                !ELSE
                      ' records for key ', !BYvar
                !IFEND
                      !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')).
        .     COMPUTE #Overflow = 1.
        .  ELSE.
        .     PRINT / 'Right input: ' @Mny2Mny_RMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '.
                !ELSE
                      ' records for key ', !BYvar.
                !IFEND
        .  END IF.
        END IF.

*  .......     III.B.4  Build the 'spine', of all record keys   ........ .

*  .......  From the file with counts of records from both       ........ .
*  .......  inputs by BYvar values (which is currently active),  ........ .
*  .......  'unroll' to build a file with all combinations of    ........ .
*  .......  BYvar values, and pairs of record sequence numbers,  ........ .
*  .......  that will occur in the output.                       ........ .
*  .......  This file is the 'spine'.                            ........ .

        /**/ DATASET COPY Pre_Spine  /*-*/.

        * 'Unroll' the Right record numbers.
        VECTOR  Rec#(!MaxGrp,F4).

        DO IF   @Mny2Mny_RMaxRec EQ 0.
        .     COMPUTE Rec#(1)     = 0.
        ELSE.
        .  LOOP #Idx = 1 TO MIN(@Mny2Mny_RMaxRec,!MaxGrp).
        .     COMPUTE Rec#(#Idx) = #Idx.
        .  END LOOP.
        END IF.
        EXECUTE  /* apparently necessary */.

        !LET !Sfx = !LENGTH (!BLANKS (!MaxGrp))

        OMS /SELECT ALL
            /IF COMMANDS = ["Variables to Cases"]
            /DESTINATION   VIEWER = NO
            /TAG = "@MNY2MNY_V2C_ZAP".

        VARSTOCASES
           /MAKE Right# FROM Rec#1 TO !CONCAT('Rec#',!Sfx).

        * 'Unroll' the Left  record numbers.
        VECTOR  Rec#(!MaxGrp,F4).

        DO IF   @Mny2Mny_LMaxRec EQ 0.
        .     COMPUTE Rec#(1)     = 0.
        ELSE.
        .  LOOP #Idx = 1 TO MIN(@Mny2Mny_LMaxRec,!MaxGrp).
        .     COMPUTE Rec#(#Idx) = #Idx.
        .  END LOOP.
        END IF.
        EXECUTE  /* apparently necessary */.

        VARSTOCASES
           /MAKE Left#  FROM Rec#1 TO !CONCAT('Rec#',!Sfx).

        OMSEND TAG=["@MNY2MNY_V2C_ZAP"].

        /**/ DATASET COPY Spine      /*-*/.

*  .......     III.C  Add the substantive data                  ....... .
        MATCH FILES
           /FILE=*
           /TABLE=@Mny2Mny_Right#d
           /BY !BYvar Right#
           /DROP @Mny2Mny_NoBreak @Mny2Mny_NewGrp
                 @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec.

        SORT CASES BY !BYvar Left# Right#.

        MATCH FILES
           /TABLE=@Mny2Mny_Left#d
           /FILE=*
           /BY   !BYvar Left#
           /DROP  @Mny2Mny_NoBreak @Mny2Mny_NewGrp
           /KEEP !BYvar Left# Right# ALL.

        EXECUTE /* so input datasets can be dropped */.

        DATASET CLOSE @Mny2Mny_Left#d.
        DATASET CLOSE @Mny2Mny_Right#d.
        DATASET CLOSE @Mny2Mny_Left##.
    !IFEND
!ENDDEFINE.

*  .................   Test data               .................... .

DATA LIST FIXED
    /Category      04-18 (A)
     School        19-40 (A).
BEGIN DATA
    Art            RI School of Design
    Art            Metropolitan Museum
    Culinary       Johnson & Wales
    Liberal Arts   Oberlin
    Liberal Arts   Antioch
    Liberal Arts   Swarthmore
    Technical      MIT
    Technical      Cal Tech
    Technical      ETH
END DATA.
DATASET NAME     Schools.
.  /*-- LIST /*-*/.

DATA LIST FIXED
    /Category      04-18 (A)
     Subject       19-35 (A).
BEGIN DATA
    Art            Painting
    Art            Drawing
    Art            Sculpture
    Liberal Arts   Literature
    Liberal Arts   Philosophy
    Psychoceramics Carberry
    Psychoceramics Library science
    Technical      Calculus
    Technical      Physics
    Technical      Chemistry
    Technical      Engineering
END DATA.
DATASET NAME    Subjects.
*  ...................................................... .

DATASET ACTIVATE Schools    WINDOW=FRONT.
LIST.
DATASET ACTIVATE Subjects   WINDOW=FRONT.
LIST.


!Mny2Mny LeftIn=Schools/
          RightIn=Subjects/
          BYvar=Category.

DATASET NAME     Merged    WINDOW=FRONT.
LIST.

DATASET ACTIVATE Pre_Spine WINDOW=FRONT.
LIST.

DATASET ACTIVATE Spine     WINDOW=FRONT.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product

Maguin, Eugene
Hi David,
I was running your code to really see how it worked, particularly the Kronecker section because my reading and understanding of the function is that it is a product. I'm not asking for a comment on the kronecker function. You're a highly skilled programmer; let me see if I can get it.

But before I got there, I ran into a problem with the code.  Following is a section of your code with my comments.

DATASET Activate stacked.
DATASET COPY stack2.

*  these 4 lines operate on stack2 and functionally delete the f2 dataset.
SELECT IF NOT(@IN2@).
EXECUTE.
DELETE VARIABLES Subject.
RENAME VARIABLES INTERNALCount=F1INstance.

DATASET Activate stack2.
SELECT IF @IN2@.
EXECUTE.  /* with the execution of this command, stack2 has no records and variables of category, @in2@, AR_category, F1Instance.
DELETE VARIABLES School.
RENAME VARIABLES INTERNALCount=F2INstance. /*<<< InternalCount does not exist in stack2.


Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
Actually the copy is not active at the point of the code you indicate.
Stacked is the active file.
I could probably have done this using AUTORECODE with templates but it would be messier.




On Wed, Mar 12, 2014 at 10:11 AM, Maguin, Eugene [via SPSSX Discussion] <[hidden email]> wrote:
Hi David,
I was running your code to really see how it worked, particularly the Kronecker section because my reading and understanding of the function is that it is a product. I'm not asking for a comment on the kronecker function. You're a highly skilled programmer; let me see if I can get it.

But before I got there, I ran into a problem with the code.  Following is a section of your code with my comments.

DATASET Activate stacked.
DATASET COPY stack2.

*  these 4 lines operate on stack2 and functionally delete the f2 dataset.
SELECT IF NOT(@IN2@).
EXECUTE.
DELETE VARIABLES Subject.
RENAME VARIABLES INTERNALCount=F1INstance.

DATASET Activate stack2.
SELECT IF @IN2@.
EXECUTE.  /* with the execution of this command, stack2 has no records and variables of category, @in2@, AR_category, F1Instance.
DELETE VARIABLES School.
RENAME VARIABLES INTERNALCount=F2INstance. /*<<< InternalCount does not exist in stack2.


Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



To unsubscribe from MATRIX code to create 'spine' for R. Ristows Macro !Mny2Mny, for many-to-many merge and Cartesian product, click here.
NAML

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product

Bruce Weaver
Administrator
In reply to this post by David Marso
Once again, I find that referring to a dataset name on the SAVE command in the MATRIX program is generating an error for me.  

Here is the error message that appears after END MATRIX:

Run MATRIX procedure:
 
>Error # 34 in column 12.  Text: spine
>SPSS Statistics cannot access a file with the given file specification.  The
>file specification is either syntactically invalid, specifies an invalid
>drive, specifies a protected directory, specifies a protected file, or
>specifies a non-sharable file.
>Execution of this command stops.
 
------ END MATRIX -----

To avoid it, I have to change spine to * in the SAVE command.  
Version info:  SPSS 21.0 0.2, 64-bit version; Windows 7 Professional, SP1

David, I assume you are not having a problem with this on your system?  What is your version info?  


Here is a slightly modified version of David's syntax that runs error-free for me.


NEW FILE.
DATASET CLOSE ALL.
DATA LIST / Category (A16) School (A30).
BEGIN DATA
Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH
END DATA.
SORT CASES BY Category.
DATASET NAME f1.
LIST.

DATA LIST / Category  (A16) Subject (A30) .
BEGIN DATA
Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering
END DATA.
SORT CASES BY Category.
DATASET NAME f2.
LIST.
ADD FILES / FILE f1  / FILE f2 / IN=@IN2@.
AUTORECODE Category / INTO AR_Category.
IF $CASENUM EQ 1 OR @IN2@ NE LAG(@IN2@) OR Category NE LAG(Category) InternalCount=1.
IF MISSING(InternalCount) InternalCount=LAG(InternalCount)+1.
DATASET NAME stacked.
DATASET DECLARE agg.
AGGREGATE OUTFILE agg / BREAK @IN2@ Category AR_Category/ NperCat=N .
DATASET Activate stacked.
DATASET COPY stack2.
SELECT IF NOT(@IN2@).
EXECUTE.
DELETE VARIABLES Subject.
RENAME VARIABLES INTERNALCount=F1INstance.

DATASET Activate stack2.
SELECT IF @IN2@.
EXECUTE.
DELETE VARIABLES School.

RENAME VARIABLES INTERNALCount=F2INstance.
DATASET ACTIVATE agg.
AGGREGATE OUTFILE * MODE ADDVARIABLES / BREAK @IN2@ / NCatPerFile=N.

DATASET DECLARE spine . /* Removed this from DM's code .
MATRIX.
GET DATA /FILE * / VARIABLES @IN2@  AR_Category NPerCat NCatPerFile.
COMPUTE Base1=Data(1:Data(1,4),2:3).
COMPUTE Base2=Data((Data(1,4)+1):NROW(Data),2:3).
COMPUTE MaxCat=MMAX(Data(:,2)).
COMPUTE XX=MAKE(MaxCat,2,0).
LOOP #=1 TO MaxCat .
+  DO IF NRow(Base1) GE #.
+    COMPUTE XX(Base1(#,1),1)=Base1(#,2).
+  END IF.
+  DO IF NRow(Base2) GE #.
+    COMPUTE XX(Base2(#,1),2)=Base2(#,2).
+  END IF.
END LOOP.
COMPUTE C={T({1:MaxCat}),XX,RMAX({XX(:,1) &* XX(:,2),XX})}.
LOOP #=1 TO NRow(C).
COMPUTE T1=T({1:MMAX({C(#,2),1})}).
COMPUTE T2=T({1:MMAX({C(#,3),1})}).

***** Change "OUTFILE spine" to "OUTFILE *"  *******.

SAVE ({MAKE(C(#,4),1,C(#,1)),
       KRONEKER(T1,MAKE(NROW(T2),1,1)),
       KRONEKER(MAKE(NROW(T1),1,1),T2)})
   / OUTFILE spine / VARIABLES AR_Category F1Instance F2Instance.
END LOOP.
END MATRIX.

******************* .
DATASET NAME spine. /* Added this to DM's code.
******************* .

MATCH FILES FILE spine  / TABLE stacked  / BY AR_Category F1Instance.
SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE stack2  / BY AR_Category F2Instance.
EXECUTE.
DATASET NAME MergedCartProd.
DATASET ACTIVATE MergedCartProd.
FORMATS F1instance F2instance(f2.0).
LIST.



David Marso wrote
A *VERY* MUCH improved MATRIX solution from previous.

DATASET DECLARE spine .
MATRIX.
GET DATA /FILE * / VARIABLES @IN2@  AR_Category NPerCat NCatPerFile.
COMPUTE Base1=Data(1:Data(1,4),2:3).
COMPUTE Base2=Data((Data(1,4)+1):NROW(Data),2:3).
COMPUTE MaxCat=MMAX(Data(:,2)).
COMPUTE XX=MAKE(MaxCat,2,0).
LOOP #=1 TO MaxCat .
+  DO IF NRow(Base1) GE #.
+    COMPUTE XX(Base1(#,1),1)=Base1(#,2).
+  END IF.
+  DO IF NRow(Base2) GE #.
+    COMPUTE XX(Base2(#,1),2)=Base2(#,2).
+  END IF.
END LOOP.
COMPUTE C={T({1:MaxCat}),XX,RMAX({XX(:,1) &* XX(:,2),XX})}.
LOOP #=1 TO NRow(C).
COMPUTE T1=T({1:MMAX({C(#,2),1})}).
COMPUTE T2=T({1:MMAX({C(#,3),1})}).
SAVE ({MAKE(C(#,4),1,C(#,1)),
       KRONEKER(T1,MAKE(NROW(T2),1,1)),
       KRONEKER(MAKE(NROW(T1),1,1),T2)})
   / OUTFILE Spine / VARIABLES AR_Category F1Instance F2Instance.
END LOOP.
END MATRIX.
MATCH FILES FILE spine  / TABLE stacked  / BY AR_Category F1Instance.
SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE stack2  / BY AR_Category F2Instance.
EXECUTE.
DATASET NAME MergedCartProd.
DATASET ACTIVATE MergedCartProd.

David Marso wrote
Hi Richard and whoever is interested:
This is a very crude basic sketch and I have made no effort to turn it into a macro at this point.
Basically rather than have the variables as a set be the key, imagine one has mapped these to a single variable using a counter or some other device.
The logic in the matrix program then generates the necessary number of records for merging into the two data sets.  Since the 'spine' is already ordered properly for the left file we immediately merge it in as a TABLE.  SORT to align with the 'right' file and match that.
I hope this makes sense ;-)))
----
NEW FILE.
DATASET CLOSE ALL.
DATA LIST / Category (A16) School (A30).
BEGIN DATA
Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH
END DATA.
SORT CASES BY Category.
DATASET NAME f1.
LIST.

DATA LIST / Category  (A16) Subject (A30) .
BEGIN DATA
Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering
END DATA.
SORT CASES BY Category.
DATASET NAME f2.
LIST.
ADD FILES / FILE f1  / FILE f2 / IN=@IN2@.
AUTORECODE Category / INTO AR_Category.
IF $CASENUM EQ 1 OR @IN2@ NE LAG(@IN2@) OR Category NE LAG(Category) InternalCount=1.
IF MISSING(InternalCount) InternalCount=LAG(InternalCount)+1.
DATASET NAME stacked.
DATASET DECLARE agg.
AGGREGATE OUTFILE agg / BREAK @IN2@ Category AR_Category/ NperCat=N .
DATASET Activate stacked.
DATASET COPY stack2.
SELECT IF NOT(@IN2@).
EXECUTE.
DELETE VARIABLES Subject.
RENAME VARIABLES INTERNALCount=F1INstance.

DATASET Activate stack2.
SELECT IF @IN2@.
EXECUTE.
DELETE VARIABLES School.

RENAME VARIABLES INTERNALCount=F2INstance.
DATASET ACTIVATE agg.
AGGREGATE OUTFILE * MODE ADDVARIABLES / BREAK @IN2@ / NCatPerFile=N.

DATASET DECLARE spine .
MATRIX.
GET DATA /FILE * / VARIABLES @IN2@  AR_Category NPerCat NCatPerFile.
COMPUTE Base1=Data(1:Data(1,4),2:3).
COMPUTE Base2=Data((Data(1,4)+1):NROW(Data),2:3).
COMPUTE MaxCat=MMAX(Data(:,2)).
COMPUTE X1=MAKE(MaxCat,1,0).
COMPUTE X2=MAKE(MaxCat,1,0).
LOOP #=1 TO NROW(Base1) .
COMPUTE X1(Base1(#,1))=Base1(#,2).
END LOOP.
LOOP #=1 TO NROW(Base2) .
COMPUTE X2(Base2(#,1))=Base2(#,2).
END LOOP.
COMPUTE M12=RMAX({{X1 &*X2},{X1,X2}}).
COMPUTE C={T({1:MaxCat}),X1,X2,M12}.
COMPUTE Spine=MAKE(MSUM(M12),3,0).
COMPUTE CurRow=1.
LOOP #=1 TO NRow(C).
COMPUTE LastRow=CurRow+C(#,4)-1.
COMPUTE T1=T({1:MMAX({C(#,2),1})}).
COMPUTE T2=T({1:MMAX({C(#,3),1})}).
COMPUTE T0=MAKE(C(#,4),1,C(#,1)).
COMPUTE Spine(CurRow:LastRow,:)=
          {T0,KRONEKER(T1,MAKE(NROW(T2),1,1)),
              KRONEKER(MAKE(NROW(T1),1,1),T2)}.
COMPUTE CurRow=LastRow+1.
END LOOP.
SAVE Spine   / OUTFILE Spine    / VARIABLES AR_Category F1Instance F2Instance.
END MATRIX.
MATCH FILES FILE spine  / TABLE stacked  / BY AR_Category F1Instance.
SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE stack2  / BY AR_Category F2Instance.



Richard Ristow wrote
At 08:57 AM 3/10/2014, David Marso wrote, under subject line
"Re: News from the SPSS Community":

>Building Richard's 'spine' can be done easily with KRONEKER.

I'd be interested in how to do this, since I don't find the
double-VARSTOCASES approach wholly satisfactory(*).

The input from which the 'spine' is generated consists of an
indefinite number of records, each containing,
. One key value, where the key can consist of any number of variables
of any datatypes. (The key value is unique in this file.)
. A left-hand record count, GE 0:  Rl
. A right-hand record count, GE 0: Rr

The output, the spine itself, contains the same key values, but with
Rl*Rr records for each (or Rl or Rr records, when an outer join is
being performed and one of the values is 0). There's one record for
each {l,r:1<=l<=Rl,1<=r<=Rr} (except that l is 0 if Rl is 0, and r is
0 if Rr is 0.)

I'm appending Appendix I (with the output) and Appendix II (with the
test data and all code, including the macro definition) from a small
demonstration run. In the output, the pre-spine and spine are listed,
following the listing of the merged data.  You'll see that both have
variables beyond the basics I listed above. Those are variables used
in earlier stages, which are carried forward past when they are no
longer needed, to the first time they can conveniently be eliminated.

Finally, notice that this is a true outer join, with unmatched
records from both inputs appearing in the output.
======================================
(*)Definition and use of the 'spine': see
Date:    Sun, 9 Mar 2014 17:08:56 -0500
From:    Richard Ristow <[hidden email]>
Subject: Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product
To:      [hidden email]
=======================================
APPENDIX I:  Test and demonstration run
=======================================
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:05       |
|-----------------------------|---------------------------|
  [Schools]
Category        School

Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH

Number of cases read:  9    Number of cases listed:  9
...
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:05       |
|-----------------------------|---------------------------|
  [Subjects]
Category        Subject

Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering

Number of cases read:  11    Number of cases listed:  11

!Mny2Mny LeftIn=Schools/
          RightIn=Subjects/
          BYvar=Category.

Macro !Mny2Mny:
Outer join of
    Schools
to Subjects
by Category

DATASET NAME     Merged    WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:07       |
|-----------------------------|---------------------------|
  [Merged]
Category        Left# Right# School                 Subject

Art                 1     1  RI School of Design    Painting
Art                 1     2  RI School of Design    Drawing
Art                 1     3  RI School of Design    Sculpture
Art                 2     1  Metropolitan Museum    Painting
Art                 2     2  Metropolitan Museum    Drawing
Art                 2     3  Metropolitan Museum    Sculpture
Culinary            1     0  Johnson & Wales
Liberal Arts        1     1  Oberlin                Literature
Liberal Arts        1     2  Oberlin                Philosophy
Liberal Arts        2     1  Antioch                Literature
Liberal Arts        2     2  Antioch                Philosophy
Liberal Arts        3     1  Swarthmore             Literature
Liberal Arts        3     2  Swarthmore             Philosophy
Psychoceramics      0     1                         Carberry
Psychoceramics      0     2                         Library science
Technical           1     1  MIT                    Calculus
Technical           1     2  MIT                    Physics
Technical           1     3  MIT                    Chemistry
Technical           1     4  MIT                    Engineering
Technical           2     1  Cal Tech               Calculus
Technical           2     2  Cal Tech               Physics
Technical           2     3  Cal Tech               Chemistry
Technical           2     4  Cal Tech               Engineering
Technical           3     1  ETH                    Calculus
Technical           3     2  ETH                    Physics
Technical           3     3  ETH                    Chemistry
Technical           3     4  ETH                    Engineering

Number of cases read:  27    Number of cases listed:  27

DATASET ACTIVATE Pre_Spine WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:07       |
|-----------------------------|---------------------------|
  [Pre_Spine]
@Mny2Mny_NoBreak Category        @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec

         1        Art                      2                3
         1        Culinary                 1                0
         1        Liberal Arts             3                2
         1        Psychoceramics           0                2
         1        Technical                3                4


Number of cases read:  5    Number of cases listed:  5

DATASET ACTIVATE Spine     WINDOW=FRONT.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:08       |
|-----------------------------|---------------------------|
  [Spine] [listing has been hand-reformatted]
@Mny2Mny_    Category        @Mny2Mny_  @Mny2Mny_  Right#  Left#
NoBreak                      LMaxRec    RMaxRec

     1        Art                 2          3           1      1
     1        Art                 2          3           1      2
     1        Art                 2          3           2      1
     1        Art                 2          3           2      2
     1        Art                 2          3           3      1
     1        Art                 2          3           3      2
     1        Culinary            1          0           0      1
     1        Liberal Arts        3          2           1      1
     1        Liberal Arts        3          2           1      2
     1        Liberal Arts        3          2           1      3
     1        Liberal Arts        3          2           2      1
     1        Liberal Arts        3          2           2      2
     1        Liberal Arts        3          2           2      3
     1        Psychoceramics      0          2           1      0
     1        Psychoceramics      0          2           2      0
     1        Technical           3          4           1      1
     1        Technical           3          4           1      2
     1        Technical           3          4           1      3
     1        Technical           3          4           2      1
     1        Technical           3          4           2      2
     1        Technical           3          4           2      3
     1        Technical           3          4           3      1
     1        Technical           3          4           3      2
     1        Technical           3          4           3      3
     1        Technical           3          4           4      1
     1        Technical           3          4           4      2
     1        Technical           3          4           4      3

Number of cases read:  27    Number of cases listed:  27
=======================================
APPENDIX II: Test data, and all code
=======================================
*  C:\Documents and Settings\Richard\My Documents            .
*    \Technical\spssx-l\Z-2014                               .
*    \2014-03-10 Marso-Re News from the SPSS Community.SPS   .

*  In response to posting                                    .
*  Date:     Mon, 10 Mar 2014 06:57:24 -0700                 .
*  From:     David Marso <[hidden email]>             .
*  Subject:  Re: News from the SPSS Community                .
*  To:       [hidden email]                        .

*  The response will be posted under subject line                   .
*  Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product .

*  David Marso wrote,                                               .
*  "Building Richard's 'spine' can be done easily with [MATRIX      .
*  function] KRONEKER."                                             .
*                                                                   .
*  See thread beginning with                                        .
*     Date:    Sun, 9 Mar 2014 17:54:52 -0500                       .
*     From:    Richard Ristow <[hidden email]>             .
*     Subject: Macro !Mny2Mny, for many-to-many merge               .
*              and Cartesian product                                .
*     To:      [hidden email]                             .
*  especially                                                       .
*     Date:    Sun, 9 Mar 2014 17:08:56 -0500                       .
*     From:    Richard Ristow <[hidden email]>             .
*     Subject: Re: Macro !Mny2Mny, for many-to-many merge           .
*              and Cartesian product                                .
*     To:      [hidden email]                             .

*  .................   Matrix <!>Mny2Mny       .................... .
*  ... This version modified, to save copies of the dataset from    .
*  ... which the 'spine' is built, and the spine itself.            .

DEFINE !Mny2Mny(LeftIn  = !CHAREND('/')
                /RightIn = !CHAREND('/')
                /BYvar   = !CHAREND('/')
                /LRename = !CHAREND('/')
                /LDrop   = !CHAREND('/')
                /LKeep   = !CHAREND('/') !DEFAULT(ALL)
                /RRename = !CHAREND('/')
                /RDrop   = !CHAREND('/')
                /RKeep   = !CHAREND('/') !DEFAULT(ALL)
                /Join    = !CHAREND('/') !DEFAULT(Outer)
                /MaxGrp  = !TOKENS(1)    !DEFAULT(50))

    ECHO " ".
    ECHO !QUOTE(!CONCAT('Macro !','Mny2Mny:')).

*  .......     Section I:   Input parsing and error checking   ....... .
    !LET !Error = No

    !IF (   !LeftIn  !EQ !NULL
        !OR !RightIn !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Arguments LeftIn and RightIn (the two inputs)".
        ECHO "      are required.".
    !IFEND

    !IF (   !RightIn !EQ '*') !THEN
        !LET !Error = Yes
        ECHO "The Right input may not be '*'".
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'OUTER') !THEN
       !LET !JType = 'OUTER'
       !LET !JMsg  = 'Outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFT OUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFTOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'RIGHT OUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'RIGHTOUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'ROUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'INNER') !THEN
       !LET !JType = 'INNER'
       !LET !JMsg  = 'Inner join'
    !IFEND

    !IF (!BYvar         !EQ !NULL)   !THEN
       !LET !JType = 'CARTPROD'
       !LET !JMsg   = 'Cartesian product'
    !IFEND

    !IF (!JType !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Invalid join type:".
        ECHO !QUOTE(!CONCAT('    Join=',!Join)).
    !IFEND


*  .......     Section II:  Informational messages            ........ .
    !IF (!Error !EQ No) !THEN
        ECHO !QUOTE(!CONCAT(!JMsg,' of')).
        ECHO !QUOTE(!CONCAT('   ',!LeftIn)).
        !IF (        !LRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!LRename))
        !IFEND
        !IF (        !LDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!LDrop))
        !IFEND
        !IF (!UPCASE(!LKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!LKeep))
        !IFEND

        ECHO !QUOTE(!CONCAT('to ',!RightIn)).
        !IF (        !RRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!RRename))
        !IFEND
        !IF (        !RDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!RDrop))
        !IFEND
        !IF (!UPCASE(!RKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!RKeep))
        !IFEND
        !IF (!BYvar  !NE !NULL)          !THEN
           ECHO !QUOTE(!CONCAT('by ',!BYvar)).
        !IFEND
      !IFEND

*  .......     Section III: Processing                          ....... .
*  .......     III.A  Sequence-numbered copies of the inputs    ....... .
    !IF (!Error !EQ No) !THEN
        ADD FILES
           /FILE=!Leftin.
        NUMERIC @Mny2Mny_NoBreak (F2).
        COMPUTE @Mny2Mny_NoBreak = 1.
        ADD FILES
           /FILE=*
        !IF (!LRename !NE !NULL) !THEN
           /RENAME = !LRename
        !IFEND
           /BY      @Mny2Mny_NoBreak !BYvar
           /FIRST = @Mny2Mny_NewGrp
        !IF (!LDrop   !NE !NULL) !THEN
           /DROP   = !LDrop
        !IFEND
           /KEEP   = @Mny2Mny_NoBreak !BYvar !LKeep.
        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE Left# = 1.
        ELSE.
        .  COMPUTE Left# = LAG(Left#) + 1.
        END IF.
        FORMATS    Left# (F4).
        DATASET NAME      @Mny2Mny_Left#d WINDOW=ASIS.
        /*-- LIST /*-*/.

        DATASET DECLARE   @Mny2Mny_Left##.
        AGGREGATE OUTFILE=@Mny2Mny_Left##
           /BREAK = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_LMaxRec = MAX(Left#).


        ADD FILES
           /FILE=!RightIn.
        NUMERIC @Mny2Mny_NoBreak(F2).
        COMPUTE @Mny2Mny_NoBreak = 1.
        ADD FILES
           /FILE=*
        !IF (!LRename !NE !NULL) !THEN
           /RENAME = !RRename
        !IFEND
           /BY @Mny2Mny_NoBreak !BYvar
           /FIRST = @Mny2Mny_NewGrp
        !IF (!RDrop   !NE !NULL) !THEN
           /DROP   = !RDrop
        !IFEND
           /KEEP   = @Mny2Mny_NoBreak !BYvar !RKeep.
        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE Right# = 1.
        ELSE.
        .  COMPUTE Right# = LAG(Right#) + 1.
        END IF.
        FORMATS    Right# (F4).
        DATASET NAME      @Mny2Mny_Right#d WINDOW=ASIS.
        /*-- LIST /*-*/.

*  .......     III.B  Build 'spine' of all keys for all records ....... .
*  .......     III.B.1  Combine record counts from the inputs   ....... .
        AGGREGATE OUTFILE=*
           /BREAK   = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_RMaxRec = MAX(Right#).
        /*-- LIST /*-*/.

        MATCH FILES
           /FILE    = @Mny2Mny_Left##
           /FILE    = *
           /BY @Mny2Mny_NoBreak !BYvar.

        RECODE @Mny2Mny_LMaxRec
               @Mny2Mny_RMaxRec (MISSING=0).

*  .......     III.B.2  Select, based on join type              ....... .
       !IF  (!Jtype !EQ 'INNER') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0
               AND @Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'LEFTOUT') !THEN
        SELECT IF (@Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'RIGHTOUT') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0).
       !IFEND

        * ..................................... .
        * For OUTER join and Cartesian product, .
        * no selection is necessary.            .
        * ..................................... .

*  .......     III.B.3  Error message, if record count exceeds  ....... .
*                       the allowed maximum                     ....... .

        IF $CASENUM EQ 1 #Overflow = 0.

        DO IF    @Mny2Mny_LMaxRec GT !MaxGrp.
        .  DO IF #Overflow EQ 0.
        .     PRINT / 'Left  input: ' @Mny2Mny_LMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '
                !ELSE
                      ' records for key ', !BYvar
                !IFEND
                      !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')).
        .     COMPUTE #Overflow = 1.
        .  ELSE.
        .     PRINT / 'Left  input: ' @Mny2Mny_LMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '.
                !ELSE
                      ' records for key ', !BYvar.
                !IFEND
        .  END IF.
        END IF.

        DO IF    @Mny2Mny_RMaxRec GT !MaxGrp.
        .  DO IF #Overflow EQ 0.
        .     PRINT / 'Right input: ' @Mny2Mny_RMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '
                !ELSE
                      ' records for key ', !BYvar
                !IFEND
                      !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')).
        .     COMPUTE #Overflow = 1.
        .  ELSE.
        .     PRINT / 'Right input: ' @Mny2Mny_RMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '.
                !ELSE
                      ' records for key ', !BYvar.
                !IFEND
        .  END IF.
        END IF.

*  .......     III.B.4  Build the 'spine', of all record keys   ........ .

*  .......  From the file with counts of records from both       ........ .
*  .......  inputs by BYvar values (which is currently active),  ........ .
*  .......  'unroll' to build a file with all combinations of    ........ .
*  .......  BYvar values, and pairs of record sequence numbers,  ........ .
*  .......  that will occur in the output.                       ........ .
*  .......  This file is the 'spine'.                            ........ .

        /**/ DATASET COPY Pre_Spine  /*-*/.

        * 'Unroll' the Right record numbers.
        VECTOR  Rec#(!MaxGrp,F4).

        DO IF   @Mny2Mny_RMaxRec EQ 0.
        .     COMPUTE Rec#(1)     = 0.
        ELSE.
        .  LOOP #Idx = 1 TO MIN(@Mny2Mny_RMaxRec,!MaxGrp).
        .     COMPUTE Rec#(#Idx) = #Idx.
        .  END LOOP.
        END IF.
        EXECUTE  /* apparently necessary */.

        !LET !Sfx = !LENGTH (!BLANKS (!MaxGrp))

        OMS /SELECT ALL
            /IF COMMANDS = ["Variables to Cases"]
            /DESTINATION   VIEWER = NO
            /TAG = "@MNY2MNY_V2C_ZAP".

        VARSTOCASES
           /MAKE Right# FROM Rec#1 TO !CONCAT('Rec#',!Sfx).

        * 'Unroll' the Left  record numbers.
        VECTOR  Rec#(!MaxGrp,F4).

        DO IF   @Mny2Mny_LMaxRec EQ 0.
        .     COMPUTE Rec#(1)     = 0.
        ELSE.
        .  LOOP #Idx = 1 TO MIN(@Mny2Mny_LMaxRec,!MaxGrp).
        .     COMPUTE Rec#(#Idx) = #Idx.
        .  END LOOP.
        END IF.
        EXECUTE  /* apparently necessary */.

        VARSTOCASES
           /MAKE Left#  FROM Rec#1 TO !CONCAT('Rec#',!Sfx).

        OMSEND TAG=["@MNY2MNY_V2C_ZAP"].

        /**/ DATASET COPY Spine      /*-*/.

*  .......     III.C  Add the substantive data                  ....... .
        MATCH FILES
           /FILE=*
           /TABLE=@Mny2Mny_Right#d
           /BY !BYvar Right#
           /DROP @Mny2Mny_NoBreak @Mny2Mny_NewGrp
                 @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec.

        SORT CASES BY !BYvar Left# Right#.

        MATCH FILES
           /TABLE=@Mny2Mny_Left#d
           /FILE=*
           /BY   !BYvar Left#
           /DROP  @Mny2Mny_NoBreak @Mny2Mny_NewGrp
           /KEEP !BYvar Left# Right# ALL.

        EXECUTE /* so input datasets can be dropped */.

        DATASET CLOSE @Mny2Mny_Left#d.
        DATASET CLOSE @Mny2Mny_Right#d.
        DATASET CLOSE @Mny2Mny_Left##.
    !IFEND
!ENDDEFINE.

*  .................   Test data               .................... .

DATA LIST FIXED
    /Category      04-18 (A)
     School        19-40 (A).
BEGIN DATA
    Art            RI School of Design
    Art            Metropolitan Museum
    Culinary       Johnson & Wales
    Liberal Arts   Oberlin
    Liberal Arts   Antioch
    Liberal Arts   Swarthmore
    Technical      MIT
    Technical      Cal Tech
    Technical      ETH
END DATA.
DATASET NAME     Schools.
.  /*-- LIST /*-*/.

DATA LIST FIXED
    /Category      04-18 (A)
     Subject       19-35 (A).
BEGIN DATA
    Art            Painting
    Art            Drawing
    Art            Sculpture
    Liberal Arts   Literature
    Liberal Arts   Philosophy
    Psychoceramics Carberry
    Psychoceramics Library science
    Technical      Calculus
    Technical      Physics
    Technical      Chemistry
    Technical      Engineering
END DATA.
DATASET NAME    Subjects.
*  ...................................................... .

DATASET ACTIVATE Schools    WINDOW=FRONT.
LIST.
DATASET ACTIVATE Subjects   WINDOW=FRONT.
LIST.


!Mny2Mny LeftIn=Schools/
          RightIn=Subjects/
          BYvar=Category.

DATASET NAME     Merged    WINDOW=FRONT.
LIST.

DATASET ACTIVATE Pre_Spine WINDOW=FRONT.
LIST.

DATASET ACTIVATE Spine     WINDOW=FRONT.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
I am running version 22 on Doze 8.1
---
Here is a fresh version of the code.  
I have revised the hacky crap involving the copied data set by creating the instance counter at the beginning in each source file. Creating a secondary map file to bridge the spine to the actual category codes back to the source files (f1,f2)
spine       (AR_Category, Instance1, Instance2)
-->  map (AR_Category, Category)
--> f1 (Category, Instance1)
--> f2 (Category, Instance2)

Also did a bit of tweaking of the MATRIX code itself.
This should handle some pretty large problems as the full KP matrix/spine is not actually stored in full.
It writes out subsets (one KP matrix for each original matching category) to the active file as they are computed.  
One immense advantage is there is NO XSAVE to external data sets as required by Jon's extension or Richard's code.

Pats self on back......
-----
NEW FILE.
DATASET CLOSE ALL.
DATA LIST / Category (A16) School (A30).
BEGIN DATA
Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH
END DATA.
SORT CASES BY Category.
IF $CASENUM EQ 1 OR Category NE LAG(Category) F1INstance=1.
IF MISSING(F1INstance) F1INstance=LAG(F1INstance)+1.
DATASET NAME f1.
LIST.

DATA LIST / Category  (A16) Subject (A30) .
BEGIN DATA
Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering
END DATA.

SORT CASES BY Category.
IF $CASENUM EQ 1 OR Category NE LAG(Category) F2INstance=1.
IF MISSING(F2INstance) F2INstance=LAG(F2INstance)+1.
DATASET NAME f2.
LIST.

ADD FILES / FILE f1  / FILE f2 / IN=@IN2@.
AUTORECODE Category / INTO AR_Category.
DATASET NAME stacked.

DATASET DECLARE agg.
AGGREGATE OUTFILE agg / BREAK @IN2@ Category AR_Category/ NperCat=N .

DATASET DECLARE map.
DATASET ACTIVATE agg.
AGGREGATE OUTFILE map / BREAK Category AR_Category/ NperCat=N .

DATASET ACTIVATE agg.
AGGREGATE OUTFILE * MODE ADDVARIABLES / BREAK @IN2@ / NCatPerFile=N.

MATRIX.
GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
COMPUTE B1=Data(1:Data(1,3),1:3).
COMPUTE B2=Data((Data(1,3)+1):NROW(Data),1:3).
COMPUTE MaxCat=MMAX(Data(:,1)).
COMPUTE C={T({1:MaxCat}),MAKE(MaxCat,3,0)}.

LOOP #=1 TO NROW(B1).
+  COMPUTE C(B1(#,1),2)=B1(#,2).
END LOOP.

LOOP #=1 TO NROW(B2).
+  COMPUTE C(B2(#,1),3)=B2(#,2).
END LOOP.

LOOP #=1 TO MaxCat.
+  COMPUTE C(#,2:3) = CMAX ({C(#,2),C(#,3);1,1}).
+  COMPUTE C(#,4)=C(#,2) * C(#,3).
+  SAVE ({MAKE(C(#,4),1,C(#,1)),
          KRONEKER(T({1:C(#,2)}),MAKE(C(#,3),1,1)),
          KRONEKER(MAKE(C(#,2),1,1),T({1:C(#,3)}))})
    / OUTFILE *
    / VARIABLES AR_Category F1Instance F2Instance.
END LOOP.
END MATRIX.

/* Merge 'spine' into map file to link to source files */.
MATCH FILES FILE * / TABLE map  / BY AR_Category .
MATCH FILES FILE * / TABLE f1 / BY Category F1Instance.

SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE f2  / BY Category F2Instance.
DATASET NAME MergedCartProd.

EXECUTE.

DATASET CLOSE agg.
DATASET CLOSE map.
DATASET CLOSE stacked.

DATASET ACTIVATE MergedCartProd.
DELETE VARIABLES F1Instance F2Instance AR_Category NperCat.
LIST.

 
 
 
 
Category         School                         Subject
 
Art              RI School of Design            Painting
Art              Metropolitan Museum            Painting
Art              RI School of Design            Drawing
Art              Metropolitan Museum            Drawing
Art              RI School of Design            Sculpture
Art              Metropolitan Museum            Sculpture
Culinary         Johnson & Wales
Liberal Arts     Oberlin                        Literature
Liberal Arts     Antioch                        Literature
Liberal Arts     Swarthmore                     Literature
Liberal Arts     Oberlin                        Philosophy
Liberal Arts     Antioch                        Philosophy
Liberal Arts     Swarthmore                     Philosophy
Psychoceramics                                  Carberry
Psychoceramics                                  Library science
Technical        MIT                            Calculus
Technical        Cal Tech                       Calculus
Technical        ETH                            Calculus
Technical        MIT                            Physics
Technical        Cal Tech                       Physics
Technical        ETH                            Physics
Technical        MIT                            Chemistry
Technical        Cal Tech                       Chemistry
Technical        ETH                            Chemistry
Technical        MIT                            Engineering
Technical        Cal Tech                       Engineering
Technical        ETH                            Engineering
 
 
Number of cases read:  27    Number of cases listed:  27



Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

bdates
FYI,

I'm running v21.0.0.2 and an old version of v15.  This runs on both. Thanks for this.  I'm doing more work with strings, and this came along serendipitously.

B

-----
NEW FILE.
DATASET CLOSE ALL.
DATA LIST / Category (A16) School (A30).
BEGIN DATA
Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH
END DATA.
SORT CASES BY Category.
IF $CASENUM EQ 1 OR Category NE LAG(Category) F1INstance=1.
IF MISSING(F1INstance) F1INstance=LAG(F1INstance)+1.
DATASET NAME f1.
LIST.

DATA LIST / Category  (A16) Subject (A30) .
BEGIN DATA
Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering
END DATA.

SORT CASES BY Category.
IF $CASENUM EQ 1 OR Category NE LAG(Category) F2INstance=1.
IF MISSING(F2INstance) F2INstance=LAG(F2INstance)+1.
DATASET NAME f2.
LIST.

ADD FILES / FILE f1  / FILE f2 / IN=@IN2@.
AUTORECODE Category / INTO AR_Category.
DATASET NAME stacked.

DATASET DECLARE agg.
AGGREGATE OUTFILE agg / BREAK @IN2@ Category AR_Category/ NperCat=N .

DATASET DECLARE map.
DATASET ACTIVATE agg.
AGGREGATE OUTFILE map / BREAK Category AR_Category/ NperCat=N .

DATASET ACTIVATE agg.
AGGREGATE OUTFILE * MODE ADDVARIABLES / BREAK @IN2@ / NCatPerFile=N.

MATRIX.
GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
COMPUTE B1=Data(1:Data(1,3),1:3).
COMPUTE B2=Data((Data(1,3)+1):NROW(Data),1:3).
COMPUTE MaxCat=MMAX(Data(:,1)).
COMPUTE C={T({1:MaxCat}),MAKE(MaxCat,3,0)}.

LOOP #=1 TO NROW(B1).
+  COMPUTE C(B1(#,1),2)=B1(#,2).
END LOOP.

LOOP #=1 TO NROW(B2).
+  COMPUTE C(B2(#,1),3)=B2(#,2).
END LOOP.

LOOP #=1 TO MaxCat.
+  COMPUTE C(#,2:3) = CMAX ({C(#,2),C(#,3);1,1}).
+  COMPUTE C(#,4)=C(#,2) * C(#,3).
+  SAVE ({MAKE(C(#,4),1,C(#,1)),
          KRONEKER(T({1:C(#,2)}),MAKE(C(#,3),1,1)),
          KRONEKER(MAKE(C(#,2),1,1),T({1:C(#,3)}))})
    / OUTFILE *
    / VARIABLES AR_Category F1Instance F2Instance.
END LOOP.
END MATRIX.

/* Merge 'spine' into map file to link to source files */.
MATCH FILES FILE * / TABLE map  / BY AR_Category .
MATCH FILES FILE * / TABLE f1 / BY Category F1Instance.

SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE f2  / BY Category F2Instance.
DATASET NAME MergedCartProd.

EXECUTE.

DATASET CLOSE agg.
DATASET CLOSE map.
DATASET CLOSE stacked.

DATASET ACTIVATE MergedCartProd.
DELETE VARIABLES F1Instance F2Instance AR_Category NperCat.
LIST.





Category         School                         Subject

Art              RI School of Design            Painting
Art              Metropolitan Museum            Painting
Art              RI School of Design            Drawing
Art              Metropolitan Museum            Drawing
Art              RI School of Design            Sculpture
Art              Metropolitan Museum            Sculpture
Culinary         Johnson & Wales
Liberal Arts     Oberlin                        Literature
Liberal Arts     Antioch                        Literature
Liberal Arts     Swarthmore                     Literature
Liberal Arts     Oberlin                        Philosophy
Liberal Arts     Antioch                        Philosophy
Liberal Arts     Swarthmore                     Philosophy
Psychoceramics                                  Carberry
Psychoceramics                                  Library science
Technical        MIT                            Calculus
Technical        Cal Tech                       Calculus
Technical        ETH                            Calculus
Technical        MIT                            Physics
Technical        Cal Tech                       Physics
Technical        ETH                            Physics
Technical        MIT                            Chemistry
Technical        Cal Tech                       Chemistry
Technical        ETH                            Chemistry
Technical        MIT                            Engineering
Technical        Cal Tech                       Engineering
Technical        ETH                            Engineering


Number of cases read:  27    Number of cases listed:  27







-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/MATRIX-code-to-create-spine-for-R-Ristows-Macro-Mny2Mny-for-many-to-many-merge-and-Cartesian-product-tp5724820p5724853.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
I'm curious Brian, do the previous incarnations work on your v 21.x.x and 15?
In part:
NEW FILE.
DATASET CLOSE all.
DATASET DECLARE tmp.
MATRIX.
SAVE UNIFORM(100,10) / OUTFILE tmp / VARIABLES x01 TO x10.
END MATRIX.

I know MATRIX throws a fit when READING from datasets, but in my experience (IIRC) writing to datasets from MATRIX was working fine in code that I have living happily in the wild).
(ahem... FIX THIS IBM in v 22 fixpack please.  I can't afford to step up to 23 when it is released)
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

bdates
David,

Yes.  Works like a charm in both versions.  I, like you I believe, think first in matrix; but there are admittedly times when it's not as friendly as non-matrix macros.

B

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Wednesday, March 12, 2014 4:59 PM
To: [hidden email]
Subject: Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

I'm curious Brian, do the previous incarnations work on your v 21.x.x and 15?
In part:
NEW FILE.
DATASET CLOSE all.
DATASET DECLARE tmp.
MATRIX.
SAVE UNIFORM(100,10) / OUTFILE tmp / VARIABLES x01 TO x10.
END MATRIX.

*I know MATRIX throws a fit when READING from datasets*, but in my
experience (IIRC) writing to datasets from MATRIX was working fine in code
that I have living happily in the wild).
(ahem... FIX THIS IBM in v 22 fixpack please.  I can't afford to step up to
23 when it is released)



-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/MATRIX-code-to-create-spine-for-R-Ristows-Macro-Mny2Mny-for-many-to-many-merge-and-Cartesian-product-tp5724820p5724859.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

Maguin, Eugene
In reply to this post by David Marso
I'm continuing to have trouble with the code, which is the new version. The problem occurs here

MATRIX.
GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.

And the message is
Line: 58 Command: Matrix Information: Matrix syntax error: unrecognized statement name. is it spelled correctly?


So I ran a little Input program job to create a data file with three numeric variables and a few hundred cases and then wrote

matrix.
get x/file */variables x y.
end matrix.

I got the same message as before when I ran it and the following in the output file. My impression was that the matrix-end matrix command set was part of the 'base'. Am I wrong?

Run MATRIX procedure:

>Error # 12546 in column 1.  Text: _SLINE
>MATRIX syntax error: unrecognized statement name.  Is it spelled correctly?
>Execution of this command stops.
Scan error detected in parser.

The MATRIX statement skipped.


>Error # 12546 in column 1.  Text: _SLINE
>MATRIX syntax error: unrecognized statement name.  Is it spelled correctly?
>Execution of this command stops.
Scan error detected in parser.

The MATRIX statement skipped.


------ END MATRIX -----




-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Wednesday, March 12, 2014 11:25 AM
To: [hidden email]
Subject: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

I am running version 22 on Doze 8.1
---
Here is a fresh version of the code.
I have revised the hacky crap involving the copied data set by creating the instance counter at the beginning in each source file. Creating a secondary map file to bridge the spine to the actual category codes back to the source files (f1,f2)
spine       (AR_Category, Instance1, Instance2)
-->  map (AR_Category, Category)
--> f1 (Category, Instance1)
--> f2 (Category, Instance2)

Also did a bit of tweaking of the MATRIX code itself.
This should handle some pretty large problems as the full KP matrix/spine is not actually stored in full.
It writes out subsets (one KP matrix for each original matching category) to the active file as they are computed.
One immense advantage is there is NO XSAVE to external data sets as required by Jon's extension or Richard's code.

Pats self on back......
-----
NEW FILE.
DATASET CLOSE ALL.
DATA LIST / Category (A16) School (A30).
BEGIN DATA
Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH
END DATA.
SORT CASES BY Category.
IF $CASENUM EQ 1 OR Category NE LAG(Category) F1INstance=1.
IF MISSING(F1INstance) F1INstance=LAG(F1INstance)+1.
DATASET NAME f1.
LIST.

DATA LIST / Category  (A16) Subject (A30) .
BEGIN DATA
Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering
END DATA.

SORT CASES BY Category.
IF $CASENUM EQ 1 OR Category NE LAG(Category) F2INstance=1.
IF MISSING(F2INstance) F2INstance=LAG(F2INstance)+1.
DATASET NAME f2.
LIST.

ADD FILES / FILE f1  / FILE f2 / IN=@IN2@.
AUTORECODE Category / INTO AR_Category.
DATASET NAME stacked.

DATASET DECLARE agg.
AGGREGATE OUTFILE agg / BREAK @IN2@ Category AR_Category/ NperCat=N .

DATASET DECLARE map.
DATASET ACTIVATE agg.
AGGREGATE OUTFILE map / BREAK Category AR_Category/ NperCat=N .

DATASET ACTIVATE agg.
AGGREGATE OUTFILE * MODE ADDVARIABLES / BREAK @IN2@ / NCatPerFile=N.

MATRIX.
GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
COMPUTE B1=Data(1:Data(1,3),1:3).
COMPUTE B2=Data((Data(1,3)+1):NROW(Data),1:3).
COMPUTE MaxCat=MMAX(Data(:,1)).
COMPUTE C={T({1:MaxCat}),MAKE(MaxCat,3,0)}.

LOOP #=1 TO NROW(B1).
+  COMPUTE C(B1(#,1),2)=B1(#,2).
END LOOP.

LOOP #=1 TO NROW(B2).
+  COMPUTE C(B2(#,1),3)=B2(#,2).
END LOOP.

LOOP #=1 TO MaxCat.
+  COMPUTE C(#,2:3) = CMAX ({C(#,2),C(#,3);1,1}).
+  COMPUTE C(#,4)=C(#,2) * C(#,3).
+  SAVE ({MAKE(C(#,4),1,C(#,1)),
          KRONEKER(T({1:C(#,2)}),MAKE(C(#,3),1,1)),
          KRONEKER(MAKE(C(#,2),1,1),T({1:C(#,3)}))})
    / OUTFILE *
    / VARIABLES AR_Category F1Instance F2Instance.
END LOOP.
END MATRIX.

/* Merge 'spine' into map file to link to source files */.
MATCH FILES FILE * / TABLE map  / BY AR_Category .
MATCH FILES FILE * / TABLE f1 / BY Category F1Instance.

SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE f2  / BY Category F2Instance.
DATASET NAME MergedCartProd.

EXECUTE.

DATASET CLOSE agg.
DATASET CLOSE map.
DATASET CLOSE stacked.

DATASET ACTIVATE MergedCartProd.
DELETE VARIABLES F1Instance F2Instance AR_Category NperCat.
LIST.





Category         School                         Subject

Art              RI School of Design            Painting
Art              Metropolitan Museum            Painting
Art              RI School of Design            Drawing
Art              Metropolitan Museum            Drawing
Art              RI School of Design            Sculpture
Art              Metropolitan Museum            Sculpture
Culinary         Johnson & Wales
Liberal Arts     Oberlin                        Literature
Liberal Arts     Antioch                        Literature
Liberal Arts     Swarthmore                     Literature
Liberal Arts     Oberlin                        Philosophy
Liberal Arts     Antioch                        Philosophy
Liberal Arts     Swarthmore                     Philosophy
Psychoceramics                                  Carberry
Psychoceramics                                  Library science
Technical        MIT                            Calculus
Technical        Cal Tech                       Calculus
Technical        ETH                            Calculus
Technical        MIT                            Physics
Technical        Cal Tech                       Physics
Technical        ETH                            Physics
Technical        MIT                            Chemistry
Technical        Cal Tech                       Chemistry
Technical        ETH                            Chemistry
Technical        MIT                            Engineering
Technical        Cal Tech                       Engineering
Technical        ETH                            Engineering


Number of cases read:  27    Number of cases listed:  27







-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/MATRIX-code-to-create-spine-for-R-Ristows-Macro-Mny2Mny-for-many-to-many-merge-and-Cartesian-product-tp5724820p5724853.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
"My impression was that the matrix-end matrix command set was part of the 'base'. Am I wrong?"
You are correct: MATRIX is part of the base.
eg:

data list free / x y.
begin data
1 2
3 4
5 6
end data.
MATRIX.
GET data / FILE * /VARIABLES x y .
PRINT data.
END MATRIX.

Run MATRIX procedure:

DATA
  1  2
  3  4
  5  6

------ END MATRIX -----
Maguin, Eugene wrote
I'm continuing to have trouble with the code, which is the new version. The problem occurs here

MATRIX.
GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.

And the message is
Line: 58 Command: Matrix Information: Matrix syntax error: unrecognized statement name. is it spelled correctly?


So I ran a little Input program job to create a data file with three numeric variables and a few hundred cases and then wrote

matrix.
get x/file */variables x y.
end matrix.

I got the same message as before when I ran it and the following in the output file. My impression was that the matrix-end matrix command set was part of the 'base'. Am I wrong?

Run MATRIX procedure:

>Error # 12546 in column 1.  Text: _SLINE
>MATRIX syntax error: unrecognized statement name.  Is it spelled correctly?
>Execution of this command stops.
Scan error detected in parser.

The MATRIX statement skipped.


>Error # 12546 in column 1.  Text: _SLINE
>MATRIX syntax error: unrecognized statement name.  Is it spelled correctly?
>Execution of this command stops.
Scan error detected in parser.

The MATRIX statement skipped.


------ END MATRIX -----




-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Wednesday, March 12, 2014 11:25 AM
To: [hidden email]
Subject: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

I am running version 22 on Doze 8.1
---
Here is a fresh version of the code.
I have revised the hacky crap involving the copied data set by creating the instance counter at the beginning in each source file. Creating a secondary map file to bridge the spine to the actual category codes back to the source files (f1,f2)
spine       (AR_Category, Instance1, Instance2)
-->  map (AR_Category, Category)
--> f1 (Category, Instance1)
--> f2 (Category, Instance2)

Also did a bit of tweaking of the MATRIX code itself.
This should handle some pretty large problems as the full KP matrix/spine is not actually stored in full.
It writes out subsets (one KP matrix for each original matching category) to the active file as they are computed.
One immense advantage is there is NO XSAVE to external data sets as required by Jon's extension or Richard's code.

Pats self on back......
-----
NEW FILE.
DATASET CLOSE ALL.
DATA LIST / Category (A16) School (A30).
BEGIN DATA
Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH
END DATA.
SORT CASES BY Category.
IF $CASENUM EQ 1 OR Category NE LAG(Category) F1INstance=1.
IF MISSING(F1INstance) F1INstance=LAG(F1INstance)+1.
DATASET NAME f1.
LIST.

DATA LIST / Category  (A16) Subject (A30) .
BEGIN DATA
Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering
END DATA.

SORT CASES BY Category.
IF $CASENUM EQ 1 OR Category NE LAG(Category) F2INstance=1.
IF MISSING(F2INstance) F2INstance=LAG(F2INstance)+1.
DATASET NAME f2.
LIST.

ADD FILES / FILE f1  / FILE f2 / IN=@IN2@.
AUTORECODE Category / INTO AR_Category.
DATASET NAME stacked.

DATASET DECLARE agg.
AGGREGATE OUTFILE agg / BREAK @IN2@ Category AR_Category/ NperCat=N .

DATASET DECLARE map.
DATASET ACTIVATE agg.
AGGREGATE OUTFILE map / BREAK Category AR_Category/ NperCat=N .

DATASET ACTIVATE agg.
AGGREGATE OUTFILE * MODE ADDVARIABLES / BREAK @IN2@ / NCatPerFile=N.

MATRIX.
GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
COMPUTE B1=Data(1:Data(1,3),1:3).
COMPUTE B2=Data((Data(1,3)+1):NROW(Data),1:3).
COMPUTE MaxCat=MMAX(Data(:,1)).
COMPUTE C={T({1:MaxCat}),MAKE(MaxCat,3,0)}.

LOOP #=1 TO NROW(B1).
+  COMPUTE C(B1(#,1),2)=B1(#,2).
END LOOP.

LOOP #=1 TO NROW(B2).
+  COMPUTE C(B2(#,1),3)=B2(#,2).
END LOOP.

LOOP #=1 TO MaxCat.
+  COMPUTE C(#,2:3) = CMAX ({C(#,2),C(#,3);1,1}).
+  COMPUTE C(#,4)=C(#,2) * C(#,3).
+  SAVE ({MAKE(C(#,4),1,C(#,1)),
          KRONEKER(T({1:C(#,2)}),MAKE(C(#,3),1,1)),
          KRONEKER(MAKE(C(#,2),1,1),T({1:C(#,3)}))})
    / OUTFILE *
    / VARIABLES AR_Category F1Instance F2Instance.
END LOOP.
END MATRIX.

/* Merge 'spine' into map file to link to source files */.
MATCH FILES FILE * / TABLE map  / BY AR_Category .
MATCH FILES FILE * / TABLE f1 / BY Category F1Instance.

SORT CASES BY AR_Category F2Instance.
MATCH FILES FILE *  / TABLE f2  / BY Category F2Instance.
DATASET NAME MergedCartProd.

EXECUTE.

DATASET CLOSE agg.
DATASET CLOSE map.
DATASET CLOSE stacked.

DATASET ACTIVATE MergedCartProd.
DELETE VARIABLES F1Instance F2Instance AR_Category NperCat.
LIST.





Category         School                         Subject

Art              RI School of Design            Painting
Art              Metropolitan Museum            Painting
Art              RI School of Design            Drawing
Art              Metropolitan Museum            Drawing
Art              RI School of Design            Sculpture
Art              Metropolitan Museum            Sculpture
Culinary         Johnson & Wales
Liberal Arts     Oberlin                        Literature
Liberal Arts     Antioch                        Literature
Liberal Arts     Swarthmore                     Literature
Liberal Arts     Oberlin                        Philosophy
Liberal Arts     Antioch                        Philosophy
Liberal Arts     Swarthmore                     Philosophy
Psychoceramics                                  Carberry
Psychoceramics                                  Library science
Technical        MIT                            Calculus
Technical        Cal Tech                       Calculus
Technical        ETH                            Calculus
Technical        MIT                            Physics
Technical        Cal Tech                       Physics
Technical        ETH                            Physics
Technical        MIT                            Chemistry
Technical        Cal Tech                       Chemistry
Technical        ETH                            Chemistry
Technical        MIT                            Engineering
Technical        Cal Tech                       Engineering
Technical        ETH                            Engineering


Number of cases read:  27    Number of cases listed:  27







-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/MATRIX-code-to-create-spine-for-R-Ristows-Macro-Mny2Mny-for-many-to-many-merge-and-Cartesian-product-tp5724820p5724853.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

Rich Ulrich
In reply to this post by Maguin, Eugene
For your first problem, consider this from the manual --

• A matrix variable name follows the same rules as those applicable to an ordinary IBM® SPSS® Statistics variable name, except matrix variable names cannot exceed 8 bytes.

For your problem with the test program...  I see "get x" which
names the matrix x, and also a variable x.  Would that mess it up?

--
Rich Ulrich

> Date: Thu, 13 Mar 2014 14:42:52 +0000

> From: [hidden email]
> Subject: Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product
> To: [hidden email]
>
> I'm continuing to have trouble with the code, which is the new version. The problem occurs here
>
> MATRIX.
> GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
>
> And the message is
> Line: 58 Command: Matrix Information: Matrix syntax error: unrecognized statement name. is it spelled correctly?
>
>
> So I ran a little Input program job to create a data file with three numeric variables and a few hundred cases and then wrote
>
> matrix.
> get x/file */variables x y.
> end matrix.
>
> I got the same message as before when I ran it and the following in the output file. My impression was that the matrix-end matrix command set was part of the 'base'. Am I wrong?
>
> Run MATRIX procedure:
>
> >Error # 12546 in column 1. Text: _SLINE
> >MATRIX syntax error: unrecognized statement name. Is it spelled correctly?
> >Execution of this command stops.
> Scan error detected in parser.
>
> The MATRIX statement skipped.
>
>
> >Error # 12546 in column 1. Text: _SLINE
> >MATRIX syntax error: unrecognized statement name. Is it spelled correctly?
> >Execution of this command stops.
> Scan error detected in parser.
>
> The MATRIX statement skipped.
>
>
> ------ END MATRIX -----
>
>
>
>
> -----Original Message-----
[snip]
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
No, that would be fine as can be seen here.
the VARIABLES clause refers to the variables in the SPSS data file.
GET x assigns the contents of whatever is in the SPSS variables x and y to the matrix object x.
Do wish they would up the limit on MATRIX object names to something reasonable ( > 8 chars).

data list free / x y.
begin data
1 2
3 4
5 6
end data.
MATRIX.
GET x / FILE * /VARIABLES x y .
PRINT x.
END MATRIX.

Run MATRIX procedure:
 
X
  1  2
  3  4
  5  6
 
------ END MATRIX -----

Rich Ulrich wrote
For your first problem, consider this from the manual --

• A matrix variable name follows the same rules as
those applicable to an ordinary IBM® SPSS® Statistics variable name, except matrix variable names cannot exceed 8 bytes.

For your problem with the test program...  I see "get x" which
names the matrix x, and also a variable x.  Would that mess it up?

--
Rich Ulrich

> Date: Thu, 13 Mar 2014 14:42:52 +0000
> From: [hidden email]
> Subject: Re: Revised: Macro !Mny2Mny,              for many-to-many merge and Cartesian product
> To: [hidden email]
>
> I'm continuing to have trouble with the code, which is the new version. The problem occurs here
>
> MATRIX.
> GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
>
> And the message is
> Line: 58 Command: Matrix Information: Matrix syntax error: unrecognized statement name. is it spelled correctly?
>
>
> So I ran a little Input program job to create a data file with three numeric variables and a few hundred cases and then wrote
>
> matrix.
> get x/file */variables x y.
> end matrix.
>
> I got the same message as before when I ran it and the following in the output file. My impression was that the matrix-end matrix command set was part of the 'base'. Am I wrong?
>
> Run MATRIX procedure:
>
> >Error # 12546 in column 1.  Text: _SLINE
> >MATRIX syntax error: unrecognized statement name.  Is it spelled correctly?
> >Execution of this command stops.
> Scan error detected in parser.
>
> The MATRIX statement skipped.
>
>
> >Error # 12546 in column 1.  Text: _SLINE
> >MATRIX syntax error: unrecognized statement name.  Is it spelled correctly?
> >Execution of this command stops.
> Scan error detected in parser.
>
> The MATRIX statement skipped.
>
>
> ------ END MATRIX -----
>
>
>
>
> -----Original Message-----
[snip]
Rich Ulrich wrote
For your first problem, consider this from the manual --

• A matrix variable name follows the same rules as
those applicable to an ordinary IBM® SPSS® Statistics variable name, except matrix variable names cannot exceed 8 bytes.

For your problem with the test program...  I see "get x" which
names the matrix x, and also a variable x.  Would that mess it up?

--
Rich Ulrich

> Date: Thu, 13 Mar 2014 14:42:52 +0000
> From: [hidden email]
> Subject: Re: Revised: Macro !Mny2Mny,              for many-to-many merge and Cartesian product
> To: [hidden email]
>
> I'm continuing to have trouble with the code, which is the new version. The problem occurs here
>
> MATRIX.
> GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
>
> And the message is
> Line: 58 Command: Matrix Information: Matrix syntax error: unrecognized statement name. is it spelled correctly?
>
>
> So I ran a little Input program job to create a data file with three numeric variables and a few hundred cases and then wrote
>
> matrix.
> get x/file */variables x y.
> end matrix.
>
> I got the same message as before when I ran it and the following in the output file. My impression was that the matrix-end matrix command set was part of the 'base'. Am I wrong?
>
> Run MATRIX procedure:
>
> >Error # 12546 in column 1.  Text: _SLINE
> >MATRIX syntax error: unrecognized statement name.  Is it spelled correctly?
> >Execution of this command stops.
> Scan error detected in parser.
>
> The MATRIX statement skipped.
>
>
> >Error # 12546 in column 1.  Text: _SLINE
> >MATRIX syntax error: unrecognized statement name.  Is it spelled correctly?
> >Execution of this command stops.
> Scan error detected in parser.
>
> The MATRIX statement skipped.
>
>
> ------ END MATRIX -----
>
>
>
>
> -----Original Message-----
[snip]
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

Maguin, Eugene
The problem is my not/mis-understanding how it, matrix-end matrix works. I selected/executed the first two lines (Matrix. Get ...) rather than the whole Matrix-end matrix sequence. When I did the first two lines I got the error I earlier posted about. When i did the whole sequence, I didn’t get the error.  I haven't yet ran the whole matrix-end matrix sequence in David's code but I'll bet it'll work when I do.
Gene Maguin



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Thursday, March 13, 2014 12:46 PM
To: [hidden email]
Subject: Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

No, that would be fine as can be seen here.
the VARIABLES clause refers to the variables in the SPSS data file.
GET x assigns the contents of whatever is in the SPSS variables x and y to the matrix object x.
Do wish they would up the limit on MATRIX object names to something reasonable ( > 8 chars).

data list free / x y.
begin data
1 2
3 4
5 6
end data.
MATRIX.
GET x / FILE * /VARIABLES x y .
PRINT x.
END MATRIX.

Run MATRIX procedure:

X
  1  2
  3  4
  5  6

------ END MATRIX -----


Rich Ulrich wrote

> For your first problem, consider this from the manual --
>
> • A matrix variable name follows the same rules as those applicable to
> an ordinary IBM® SPSS® Statistics variable name, except matrix
> variable names cannot exceed 8 bytes.
>
> For your problem with the test program...  I see "get x" which names
> the matrix x, and also a variable x.  Would that mess it up?
>
> --
> Rich Ulrich
>
>> Date: Thu, 13 Mar 2014 14:42:52 +0000
>> From:

> emaguin@

>> Subject: Re: Revised: Macro !Mny2Mny,              for many-to-many merge
>> and Cartesian product
>> To:

> SPSSX-L@.UGA

>>
>> I'm continuing to have trouble with the code, which is the new version.
>> The problem occurs here
>>
>> MATRIX.
>> GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
>>
>> And the message is
>> Line: 58 Command: Matrix Information: Matrix syntax error:
>> unrecognized statement name. is it spelled correctly?
>>
>>
>> So I ran a little Input program job to create a data file with three
>> numeric variables and a few hundred cases and then wrote
>>
>> matrix.
>> get x/file */variables x y.
>> end matrix.
>>
>> I got the same message as before when I ran it and the following in
>> the output file. My impression was that the matrix-end matrix command
>> set was part of the 'base'. Am I wrong?
>>
>> Run MATRIX procedure:
>>
>> >Error # 12546 in column 1.  Text: _SLINE MATRIX syntax error:
>> >unrecognized statement name.  Is it spelled
>> correctly?
>> >Execution of this command stops.
>> Scan error detected in parser.
>>
>> The MATRIX statement skipped.
>>
>>
>> >Error # 12546 in column 1.  Text: _SLINE MATRIX syntax error:
>> >unrecognized statement name.  Is it spelled
>> correctly?
>> >Execution of this command stops.
>> Scan error detected in parser.
>>
>> The MATRIX statement skipped.
>>
>>
>> ------ END MATRIX -----
>>
>>
>>
>>
>> -----Original Message-----
> [snip]


Rich Ulrich wrote

> For your first problem, consider this from the manual --
>
> • A matrix variable name follows the same rules as those applicable to
> an ordinary IBM® SPSS® Statistics variable name, except matrix
> variable names cannot exceed 8 bytes.
>
> For your problem with the test program...  I see "get x" which names
> the matrix x, and also a variable x.  Would that mess it up?
>
> --
> Rich Ulrich
>
>> Date: Thu, 13 Mar 2014 14:42:52 +0000
>> From:

> emaguin@

>> Subject: Re: Revised: Macro !Mny2Mny,              for many-to-many merge
>> and Cartesian product
>> To:

> SPSSX-L@.UGA

>>
>> I'm continuing to have trouble with the code, which is the new version.
>> The problem occurs here
>>
>> MATRIX.
>> GET DATA /FILE * / VARIABLES AR_Category NPerCat NCatPerFile.
>>
>> And the message is
>> Line: 58 Command: Matrix Information: Matrix syntax error:
>> unrecognized statement name. is it spelled correctly?
>>
>>
>> So I ran a little Input program job to create a data file with three
>> numeric variables and a few hundred cases and then wrote
>>
>> matrix.
>> get x/file */variables x y.
>> end matrix.
>>
>> I got the same message as before when I ran it and the following in
>> the output file. My impression was that the matrix-end matrix command
>> set was part of the 'base'. Am I wrong?
>>
>> Run MATRIX procedure:
>>
>> >Error # 12546 in column 1.  Text: _SLINE MATRIX syntax error:
>> >unrecognized statement name.  Is it spelled
>> correctly?
>> >Execution of this command stops.
>> Scan error detected in parser.
>>
>> The MATRIX statement skipped.
>>
>>
>> >Error # 12546 in column 1.  Text: _SLINE MATRIX syntax error:
>> >unrecognized statement name.  Is it spelled
>> correctly?
>> >Execution of this command stops.
>> Scan error detected in parser.
>>
>> The MATRIX statement skipped.
>>
>>
>> ------ END MATRIX -----
>>
>>
>>
>>
>> -----Original Message-----
> [snip]





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/MATRIX-code-to-create-spine-for-R-Ristows-Macro-Mny2Mny-for-many-to-many-merge-and-Cartesian-product-tp5724820p5724869.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

Richard Ristow
In reply to this post by David Marso
At 11:24 AM 3/12/2014, David Marso wrote:
>One immense advantage is there is NO XSAVE to external data sets as required
>by Jon's extension or Richard's code.
>
>Pats self on back......

By all means pat yourself on the back, but my macro did NOT do an
XSAVE; it does two MATCH FILES. Read the code, or the documentation I
posted, or both.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
Hi Richard,
  I took another look at your code and indeed, you do not XSAVE to disk.
I apologize for that misattribution.
Other than that doesn't the MATRIX approach seem a bit more intuitive than the double VARSTOCASES?
David
Richard Ristow wrote
At 11:24 AM 3/12/2014, David Marso wrote:
>One immense advantage is there is NO XSAVE to external data sets as required
>by Jon's extension or Richard's code.
>
>Pats self on back......

By all means pat yourself on the back, but my macro did NOT do an
XSAVE; it does two MATCH FILES. Read the code, or the documentation I
posted, or both.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

Richard Ristow
At 04:41 PM 3/13/2014, David Marso wrote:

>Other than that doesn't the MATRIX approach seem a bit more
>intuitive than the double VARSTOCASES?

I don't know whether it's more intuitive -- the MATRIX syntax takes a
lot of hard reading!

But MATRIX is definitely the better method, because it doesn't
require that the maximum size of an incoming group be specified in
the code, or as a macro parameter. That's why I asked if you could
work out the MATRIX logic for the spine; and I thank you for doing so.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Revised: Macro !Mny2Mny, for many-to-many merge and Cartesian product

David Marso
Administrator
I took a few minutes to add commentary (WTF is going on assurance) to the code .
Should make the reading of the code a bit easier.
Here is just the MATRIX piece (slightly modified from previous version).
MATRIX.
GET DATA /FILE */VARIABLES AR_Category NPerCat NCatPerFile.
/* Read first file's index variables into B1 */.
COMPUTE B1=Data(1:Data(1,3),:).
/* Read second file's index variables into B2 */.
COMPUTE B2=Data((Data(1,3)+1):NROW(Data),:).
/* Store largest value of the key variable in MaxCat */.
COMPUTE MaxCat=MMAX(Data(:,1)).
/* Create placeholder matrix indexed by category value */.
COMPUTE C={T({1:MaxCat}),MAKE(MaxCat,2,0)}.
/* Store number of category instances from each file in C(:,2:3) */.
LOOP #=1 TO NROW(B1).
+  COMPUTE C(B1(#,1),2)=B1(#,2).
END LOOP.
LOOP #=1 TO NROW(B2).
+  COMPUTE C(B2(#,1),3)=B2(#,2).
END LOOP.
/* Loop over each category and build KP and save to active file */.
LOOP #=1 TO MaxCat.
/* Need to set zeroes to 1 to build single row for missing categories */.
+  COMPUTE C(#,2:3) = CMAX ({C(#,2),C(#,3);1,1}).
/* Example given C(1,:)={1,2,3}, */.
/* Result saved to file is  {1,1,1;   1,1,2;   1,1,3;   1,2,1;   1,2,2;   1,2,3}  */.
  SAVE ({MAKE(C(#,2)*C(#,3),1,C(#,1)),
          KRONEKER(T({1:C(#,2)})   ,MAKE(C(#,3),1,1)),
          KRONEKER(MAKE(C(#,2),1,1),T({1:C(#,3)}))})
    / OUTFILE *
    / VARIABLES AR_Category F1Instance F2Instance.
END LOOP.
END MATRIX.
Richard Ristow wrote
At 04:41 PM 3/13/2014, David Marso wrote:

>Other than that doesn't the MATRIX approach seem a bit more
>intuitive than the double VARSTOCASES?

I don't know whether it's more intuitive -- the MATRIX syntax takes a
lot of hard reading!

But MATRIX is definitely the better method, because it doesn't
require that the maximum size of an incoming group be specified in
the code, or as a macro parameter. That's why I asked if you could
work out the MATRIX logic for the spine; and I thank you for doing so.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"