News from the SPSS Community

classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: News from the SPSS Community

Jon K Peck
Menu typo corrected.  That's what happens in Juniper pollen season.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

Reply | Threaded
Open this post in threaded view
|

Re: News from the SPSS Community

Richard Ristow
In reply to this post by David Marso
At 08:57 AM 3/10/2014, David Marso wrote, among other things:

>Building Richard's 'spine' can be done easily with KRONEKER.

Substantive response has been moved to thread
"Macro !Mny2Mny, for many-to-many merge and Cartesian product"

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product

Richard Ristow
In reply to this post by David Marso
At 08:57 AM 3/10/2014, David Marso wrote, under subject line
"Re: News from the SPSS Community":

>Building Richard's 'spine' can be done easily with KRONEKER.

I'd be interested in how to do this, since I don't find the
double-VARSTOCASES approach wholly satisfactory(*).

The input from which the 'spine' is generated consists of an
indefinite number of records, each containing,
. One key value, where the key can consist of any number of variables
of any datatypes. (The key value is unique in this file.)
. A left-hand record count, GE 0:  Rl
. A right-hand record count, GE 0: Rr

The output, the spine itself, contains the same key values, but with
Rl*Rr records for each (or Rl or Rr records, when an outer join is
being performed and one of the values is 0). There's one record for
each {l,r:1<=l<=Rl,1<=r<=Rr} (except that l is 0 if Rl is 0, and r is
0 if Rr is 0.)

I'm appending Appendix I (with the output) and Appendix II (with the
test data and all code, including the macro definition) from a small
demonstration run. In the output, the pre-spine and spine are listed,
following the listing of the merged data.  You'll see that both have
variables beyond the basics I listed above. Those are variables used
in earlier stages, which are carried forward past when they are no
longer needed, to the first time they can conveniently be eliminated.

Finally, notice that this is a true outer join, with unmatched
records from both inputs appearing in the output.
======================================
(*)Definition and use of the 'spine': see
Date:    Sun, 9 Mar 2014 17:08:56 -0500
From:    Richard Ristow <[hidden email]>
Subject: Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product
To:      [hidden email]
=======================================
APPENDIX I:  Test and demonstration run
=======================================
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:05       |
|-----------------------------|---------------------------|
  [Schools]
Category        School

Art             RI School of Design
Art             Metropolitan Museum
Culinary        Johnson & Wales
Liberal Arts    Oberlin
Liberal Arts    Antioch
Liberal Arts    Swarthmore
Technical       MIT
Technical       Cal Tech
Technical       ETH

Number of cases read:  9    Number of cases listed:  9
...
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:05       |
|-----------------------------|---------------------------|
  [Subjects]
Category        Subject

Art             Painting
Art             Drawing
Art             Sculpture
Liberal Arts    Literature
Liberal Arts    Philosophy
Psychoceramics  Carberry
Psychoceramics  Library science
Technical       Calculus
Technical       Physics
Technical       Chemistry
Technical       Engineering

Number of cases read:  11    Number of cases listed:  11

!Mny2Mny LeftIn=Schools/
          RightIn=Subjects/
          BYvar=Category.

Macro !Mny2Mny:
Outer join of
    Schools
to Subjects
by Category

DATASET NAME     Merged    WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:07       |
|-----------------------------|---------------------------|
  [Merged]
Category        Left# Right# School                 Subject

Art                 1     1  RI School of Design    Painting
Art                 1     2  RI School of Design    Drawing
Art                 1     3  RI School of Design    Sculpture
Art                 2     1  Metropolitan Museum    Painting
Art                 2     2  Metropolitan Museum    Drawing
Art                 2     3  Metropolitan Museum    Sculpture
Culinary            1     0  Johnson & Wales
Liberal Arts        1     1  Oberlin                Literature
Liberal Arts        1     2  Oberlin                Philosophy
Liberal Arts        2     1  Antioch                Literature
Liberal Arts        2     2  Antioch                Philosophy
Liberal Arts        3     1  Swarthmore             Literature
Liberal Arts        3     2  Swarthmore             Philosophy
Psychoceramics      0     1                         Carberry
Psychoceramics      0     2                         Library science
Technical           1     1  MIT                    Calculus
Technical           1     2  MIT                    Physics
Technical           1     3  MIT                    Chemistry
Technical           1     4  MIT                    Engineering
Technical           2     1  Cal Tech               Calculus
Technical           2     2  Cal Tech               Physics
Technical           2     3  Cal Tech               Chemistry
Technical           2     4  Cal Tech               Engineering
Technical           3     1  ETH                    Calculus
Technical           3     2  ETH                    Physics
Technical           3     3  ETH                    Chemistry
Technical           3     4  ETH                    Engineering

Number of cases read:  27    Number of cases listed:  27

DATASET ACTIVATE Pre_Spine WINDOW=FRONT.
LIST.
List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:07       |
|-----------------------------|---------------------------|
  [Pre_Spine]
@Mny2Mny_NoBreak Category        @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec

         1        Art                      2                3
         1        Culinary                 1                0
         1        Liberal Arts             3                2
         1        Psychoceramics           0                2
         1        Technical                3                4


Number of cases read:  5    Number of cases listed:  5

DATASET ACTIVATE Spine     WINDOW=FRONT.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |10-MAR-2014 14:37:08       |
|-----------------------------|---------------------------|
  [Spine] [listing has been hand-reformatted]
@Mny2Mny_    Category        @Mny2Mny_  @Mny2Mny_  Right#  Left#
NoBreak                      LMaxRec    RMaxRec

     1        Art                 2          3           1      1
     1        Art                 2          3           1      2
     1        Art                 2          3           2      1
     1        Art                 2          3           2      2
     1        Art                 2          3           3      1
     1        Art                 2          3           3      2
     1        Culinary            1          0           0      1
     1        Liberal Arts        3          2           1      1
     1        Liberal Arts        3          2           1      2
     1        Liberal Arts        3          2           1      3
     1        Liberal Arts        3          2           2      1
     1        Liberal Arts        3          2           2      2
     1        Liberal Arts        3          2           2      3
     1        Psychoceramics      0          2           1      0
     1        Psychoceramics      0          2           2      0
     1        Technical           3          4           1      1
     1        Technical           3          4           1      2
     1        Technical           3          4           1      3
     1        Technical           3          4           2      1
     1        Technical           3          4           2      2
     1        Technical           3          4           2      3
     1        Technical           3          4           3      1
     1        Technical           3          4           3      2
     1        Technical           3          4           3      3
     1        Technical           3          4           4      1
     1        Technical           3          4           4      2
     1        Technical           3          4           4      3

Number of cases read:  27    Number of cases listed:  27
=======================================
APPENDIX II: Test data, and all code
=======================================
*  C:\Documents and Settings\Richard\My Documents            .
*    \Technical\spssx-l\Z-2014                               .
*    \2014-03-10 Marso-Re News from the SPSS Community.SPS   .

*  In response to posting                                    .
*  Date:     Mon, 10 Mar 2014 06:57:24 -0700                 .
*  From:     David Marso <[hidden email]>             .
*  Subject:  Re: News from the SPSS Community                .
*  To:       [hidden email]                        .

*  The response will be posted under subject line                   .
*  Re: Macro !Mny2Mny, for many-to-many merge and Cartesian product .

*  David Marso wrote,                                               .
*  "Building Richard's 'spine' can be done easily with [MATRIX      .
*  function] KRONEKER."                                             .
*                                                                   .
*  See thread beginning with                                        .
*     Date:    Sun, 9 Mar 2014 17:54:52 -0500                       .
*     From:    Richard Ristow <[hidden email]>             .
*     Subject: Macro !Mny2Mny, for many-to-many merge               .
*              and Cartesian product                                .
*     To:      [hidden email]                             .
*  especially                                                       .
*     Date:    Sun, 9 Mar 2014 17:08:56 -0500                       .
*     From:    Richard Ristow <[hidden email]>             .
*     Subject: Re: Macro !Mny2Mny, for many-to-many merge           .
*              and Cartesian product                                .
*     To:      [hidden email]                             .

*  .................   Matrix <!>Mny2Mny       .................... .
*  ... This version modified, to save copies of the dataset from    .
*  ... which the 'spine' is built, and the spine itself.            .

DEFINE !Mny2Mny(LeftIn  = !CHAREND('/')
                /RightIn = !CHAREND('/')
                /BYvar   = !CHAREND('/')
                /LRename = !CHAREND('/')
                /LDrop   = !CHAREND('/')
                /LKeep   = !CHAREND('/') !DEFAULT(ALL)
                /RRename = !CHAREND('/')
                /RDrop   = !CHAREND('/')
                /RKeep   = !CHAREND('/') !DEFAULT(ALL)
                /Join    = !CHAREND('/') !DEFAULT(Outer)
                /MaxGrp  = !TOKENS(1)    !DEFAULT(50))

    ECHO " ".
    ECHO !QUOTE(!CONCAT('Macro !','Mny2Mny:')).

*  .......     Section I:   Input parsing and error checking   ....... .
    !LET !Error = No

    !IF (   !LeftIn  !EQ !NULL
        !OR !RightIn !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Arguments LeftIn and RightIn (the two inputs)".
        ECHO "      are required.".
    !IFEND

    !IF (   !RightIn !EQ '*') !THEN
        !LET !Error = Yes
        ECHO "The Right input may not be '*'".
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'OUTER') !THEN
       !LET !JType = 'OUTER'
       !LET !JMsg  = 'Outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFT OUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LEFTOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'LOUTER') !THEN
       !LET !JType = 'LEFTOUT'
       !LET !JMsg  = 'Left outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'RIGHT OUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'RIGHTOUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND
    !IF (!UPCASE(!Join) !EQ 'ROUTER') !THEN
       !LET !JType = 'RIGHTOUT'
       !LET !JMsg  = 'Right outer join'
    !IFEND

    !IF (!UPCASE(!Join) !EQ 'INNER') !THEN
       !LET !JType = 'INNER'
       !LET !JMsg  = 'Inner join'
    !IFEND

    !IF (!BYvar         !EQ !NULL)   !THEN
       !LET !JType = 'CARTPROD'
       !LET !JMsg   = 'Cartesian product'
    !IFEND

    !IF (!JType !EQ !NULL) !THEN
        !LET !Error = Yes
        ECHO "Invalid join type:".
        ECHO !QUOTE(!CONCAT('    Join=',!Join)).
    !IFEND


*  .......     Section II:  Informational messages            ........ .
    !IF (!Error !EQ No) !THEN
        ECHO !QUOTE(!CONCAT(!JMsg,' of')).
        ECHO !QUOTE(!CONCAT('   ',!LeftIn)).
        !IF (        !LRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!LRename))
        !IFEND
        !IF (        !LDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!LDrop))
        !IFEND
        !IF (!UPCASE(!LKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!LKeep))
        !IFEND

        ECHO !QUOTE(!CONCAT('to ',!RightIn)).
        !IF (        !RRename !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /RENAME=',!RRename))
        !IFEND
        !IF (        !RDrop   !NE !NULL) !THEN
           ECHO !QUOTE(!CONCAT('      /DROP  =',!RDrop))
        !IFEND
        !IF (!UPCASE(!RKeep)  !NE 'ALL') !THEN
           ECHO !QUOTE(!CONCAT('      /KEEP  =',!BYVar,' ',!RKeep))
        !IFEND
        !IF (!BYvar  !NE !NULL)          !THEN
           ECHO !QUOTE(!CONCAT('by ',!BYvar)).
        !IFEND
      !IFEND

*  .......     Section III: Processing                          ....... .
*  .......     III.A  Sequence-numbered copies of the inputs    ....... .
    !IF (!Error !EQ No) !THEN
        ADD FILES
           /FILE=!Leftin.
        NUMERIC @Mny2Mny_NoBreak (F2).
        COMPUTE @Mny2Mny_NoBreak = 1.
        ADD FILES
           /FILE=*
        !IF (!LRename !NE !NULL) !THEN
           /RENAME = !LRename
        !IFEND
           /BY      @Mny2Mny_NoBreak !BYvar
           /FIRST = @Mny2Mny_NewGrp
        !IF (!LDrop   !NE !NULL) !THEN
           /DROP   = !LDrop
        !IFEND
           /KEEP   = @Mny2Mny_NoBreak !BYvar !LKeep.
        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE Left# = 1.
        ELSE.
        .  COMPUTE Left# = LAG(Left#) + 1.
        END IF.
        FORMATS    Left# (F4).
        DATASET NAME      @Mny2Mny_Left#d WINDOW=ASIS.
        /*-- LIST /*-*/.

        DATASET DECLARE   @Mny2Mny_Left##.
        AGGREGATE OUTFILE=@Mny2Mny_Left##
           /BREAK = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_LMaxRec = MAX(Left#).


        ADD FILES
           /FILE=!RightIn.
        NUMERIC @Mny2Mny_NoBreak(F2).
        COMPUTE @Mny2Mny_NoBreak = 1.
        ADD FILES
           /FILE=*
        !IF (!LRename !NE !NULL) !THEN
           /RENAME = !RRename
        !IFEND
           /BY @Mny2Mny_NoBreak !BYvar
           /FIRST = @Mny2Mny_NewGrp
        !IF (!RDrop   !NE !NULL) !THEN
           /DROP   = !RDrop
        !IFEND
           /KEEP   = @Mny2Mny_NoBreak !BYvar !RKeep.
        DO IF    @Mny2Mny_NewGrp.
        .  COMPUTE Right# = 1.
        ELSE.
        .  COMPUTE Right# = LAG(Right#) + 1.
        END IF.
        FORMATS    Right# (F4).
        DATASET NAME      @Mny2Mny_Right#d WINDOW=ASIS.
        /*-- LIST /*-*/.

*  .......     III.B  Build 'spine' of all keys for all records ....... .
*  .......     III.B.1  Combine record counts from the inputs   ....... .
        AGGREGATE OUTFILE=*
           /BREAK   = @Mny2Mny_NoBreak !BYvar
           /@Mny2Mny_RMaxRec = MAX(Right#).
        /*-- LIST /*-*/.

        MATCH FILES
           /FILE    = @Mny2Mny_Left##
           /FILE    = *
           /BY @Mny2Mny_NoBreak !BYvar.

        RECODE @Mny2Mny_LMaxRec
               @Mny2Mny_RMaxRec (MISSING=0).

*  .......     III.B.2  Select, based on join type              ....... .
       !IF  (!Jtype !EQ 'INNER') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0
               AND @Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'LEFTOUT') !THEN
        SELECT IF (@Mny2Mny_RMaxRec GT 0).
       !IFEND

       !IF  (!Jtype !EQ 'RIGHTOUT') !THEN
        SELECT IF (@Mny2Mny_LMaxRec GT 0).
       !IFEND

        * ..................................... .
        * For OUTER join and Cartesian product, .
        * no selection is necessary.            .
        * ..................................... .

*  .......     III.B.3  Error message, if record count exceeds  ....... .
*                       the allowed maximum                     ....... .

        IF $CASENUM EQ 1 #Overflow = 0.

        DO IF    @Mny2Mny_LMaxRec GT !MaxGrp.
        .  DO IF #Overflow EQ 0.
        .     PRINT / 'Left  input: ' @Mny2Mny_LMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '
                !ELSE
                      ' records for key ', !BYvar
                !IFEND
                      !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')).
        .     COMPUTE #Overflow = 1.
        .  ELSE.
        .     PRINT / 'Left  input: ' @Mny2Mny_LMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '.
                !ELSE
                      ' records for key ', !BYvar.
                !IFEND
        .  END IF.
        END IF.

        DO IF    @Mny2Mny_RMaxRec GT !MaxGrp.
        .  DO IF #Overflow EQ 0.
        .     PRINT / 'Right input: ' @Mny2Mny_RMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '
                !ELSE
                      ' records for key ', !BYvar
                !IFEND
                      !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')).
        .     COMPUTE #Overflow = 1.
        .  ELSE.
        .     PRINT / 'Right input: ' @Mny2Mny_RMaxRec
                !IF  (!Jtype !EQ 'CARTPROD') !THEN
                      ' records '.
                !ELSE
                      ' records for key ', !BYvar.
                !IFEND
        .  END IF.
        END IF.

*  .......     III.B.4  Build the 'spine', of all record keys   ........ .

*  .......  From the file with counts of records from both       ........ .
*  .......  inputs by BYvar values (which is currently active),  ........ .
*  .......  'unroll' to build a file with all combinations of    ........ .
*  .......  BYvar values, and pairs of record sequence numbers,  ........ .
*  .......  that will occur in the output.                       ........ .
*  .......  This file is the 'spine'.                            ........ .

        /**/ DATASET COPY Pre_Spine  /*-*/.

        * 'Unroll' the Right record numbers.
        VECTOR  Rec#(!MaxGrp,F4).

        DO IF   @Mny2Mny_RMaxRec EQ 0.
        .     COMPUTE Rec#(1)     = 0.
        ELSE.
        .  LOOP #Idx = 1 TO MIN(@Mny2Mny_RMaxRec,!MaxGrp).
        .     COMPUTE Rec#(#Idx) = #Idx.
        .  END LOOP.
        END IF.
        EXECUTE  /* apparently necessary */.

        !LET !Sfx = !LENGTH (!BLANKS (!MaxGrp))

        OMS /SELECT ALL
            /IF COMMANDS = ["Variables to Cases"]
            /DESTINATION   VIEWER = NO
            /TAG = "@MNY2MNY_V2C_ZAP".

        VARSTOCASES
           /MAKE Right# FROM Rec#1 TO !CONCAT('Rec#',!Sfx).

        * 'Unroll' the Left  record numbers.
        VECTOR  Rec#(!MaxGrp,F4).

        DO IF   @Mny2Mny_LMaxRec EQ 0.
        .     COMPUTE Rec#(1)     = 0.
        ELSE.
        .  LOOP #Idx = 1 TO MIN(@Mny2Mny_LMaxRec,!MaxGrp).
        .     COMPUTE Rec#(#Idx) = #Idx.
        .  END LOOP.
        END IF.
        EXECUTE  /* apparently necessary */.

        VARSTOCASES
           /MAKE Left#  FROM Rec#1 TO !CONCAT('Rec#',!Sfx).

        OMSEND TAG=["@MNY2MNY_V2C_ZAP"].

        /**/ DATASET COPY Spine      /*-*/.

*  .......     III.C  Add the substantive data                  ....... .
        MATCH FILES
           /FILE=*
           /TABLE=@Mny2Mny_Right#d
           /BY !BYvar Right#
           /DROP @Mny2Mny_NoBreak @Mny2Mny_NewGrp
                 @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec.

        SORT CASES BY !BYvar Left# Right#.

        MATCH FILES
           /TABLE=@Mny2Mny_Left#d
           /FILE=*
           /BY   !BYvar Left#
           /DROP  @Mny2Mny_NoBreak @Mny2Mny_NewGrp
           /KEEP !BYvar Left# Right# ALL.

        EXECUTE /* so input datasets can be dropped */.

        DATASET CLOSE @Mny2Mny_Left#d.
        DATASET CLOSE @Mny2Mny_Right#d.
        DATASET CLOSE @Mny2Mny_Left##.
    !IFEND
!ENDDEFINE.

*  .................   Test data               .................... .

DATA LIST FIXED
    /Category      04-18 (A)
     School        19-40 (A).
BEGIN DATA
    Art            RI School of Design
    Art            Metropolitan Museum
    Culinary       Johnson & Wales
    Liberal Arts   Oberlin
    Liberal Arts   Antioch
    Liberal Arts   Swarthmore
    Technical      MIT
    Technical      Cal Tech
    Technical      ETH
END DATA.
DATASET NAME     Schools.
.  /*-- LIST /*-*/.

DATA LIST FIXED
    /Category      04-18 (A)
     Subject       19-35 (A).
BEGIN DATA
    Art            Painting
    Art            Drawing
    Art            Sculpture
    Liberal Arts   Literature
    Liberal Arts   Philosophy
    Psychoceramics Carberry
    Psychoceramics Library science
    Technical      Calculus
    Technical      Physics
    Technical      Chemistry
    Technical      Engineering
END DATA.
DATASET NAME    Subjects.
*  ...................................................... .

DATASET ACTIVATE Schools    WINDOW=FRONT.
LIST.
DATASET ACTIVATE Subjects   WINDOW=FRONT.
LIST.


!Mny2Mny LeftIn=Schools/
          RightIn=Subjects/
          BYvar=Category.

DATASET NAME     Merged    WINDOW=FRONT.
LIST.

DATASET ACTIVATE Pre_Spine WINDOW=FRONT.
LIST.

DATASET ACTIVATE Spine     WINDOW=FRONT.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
12