|
A following posting will give the definition of macro !Mny2Mny, an
improved general-use macro for many-to-many merges ('joins', in database terms) and Cartesian products. Use: !Mny2Mny LeftIn={'file'/} {dataset/} {*/} [LRename=(old varnames=new varnames).../] [LDrop=varlist/] [LKeep={ALL** }/] RightIn={'file'/} {dataset/} [RRename=(old varnames=new varnames).../] [RDrop=varlist/] [RKeep={ALL** }/] [BYvar=varlist/] [Join= {Outer**}/] [MaxGrp={50**}/] !Mny2Mny combines variables from two SPSS-format files or datasets, called respectively the 'left input' (keyword 'LeftIn') and the 'right input' (keyword 'RightIn'). The left input, but not the right input, may be specified as '*'. As for MATCH FILES, you may specify RENAME (keywords LRename, RRename), DROP (keywords LDROP, RDrop), and KEEP (keywords LKeep, RKeep) for either or both inputs. (Unlike MATCH FILES, !Mny2Mny takes separate DROP and KEEP lists for each input.) In general, !Mny2Mny matches each record from each file with several records from the other. If BYvar is not specified, every record from each file matches every record from the other; i.e., the result is a Cartesian product of the files. If BYvar is specified, records from each file match those records from the other with the same value of the BYvar variables. As for MATCH FILES, if BYvar is specified, all variables named must have the same name, data type, and (if strings) length in both inputs. (If necessary, use LRename or RRename to make variable names match). When BYvar is used, both inputs must be sorted in ascending order of the key variables named on BYvar. The combined file will be the active dataset, and will be unnamed. Order of variables will be . The variables named on BYvar, if any . New variables 'Left#' and 'Right#'. . Remaining variables from the left input . Remaining variables from the right input Variables 'Left#' and 'Right#' identify which records from the inputs were the sources for the output record. They are sequence numbers, respectively, for the left and right inputs, within the given value of the BYvar variables. (If BYvar is not specified, 'Left#' and 'Right#' are sequence numbers for all records in the inputs.) The combined file will by in ascending order of the key variables named on BYvar; within a value of the BYvar variables, in ascending order of Left#, then Right#. If BYvar is specified, you may use subcommand JOIN= to control matching. It may be one of, . Inner: Records from either input that do not match any records in the other, are excluded . Outer: All records from both inputs are included, whether the match any record in the othe or not. (DEFAULT) . Left Outer: All records from the left input, but only matching records from the right input, are included . Right outer: All records from the right input, but only matching records from the left input, are included. When an output record has information from only one input, the sequence number (Left# or Right#) from the input that is present will be as usual; the sequence number for the absent input will be 0. Option MaxGrp (default: 50) gives the largest number of records that can be handled from either input, for one value of the BYvar variables. (If BYvar is not specified, MaxGrp must be at least as large as the number of records in the larger input.) Restrictions and side effects: . Variables Left#, Right#; and @Mny2Mny_NoBreak, @Mny2Mny_NewGrp, @Mny2Mny_LMaxRec, and @Mny2Mny_RMaxRec, should not occur in either input. (However, ones that do, e.g. Left# and Right# because the input is the result of a prior run of !Mny2Mny, may be renamed using LRename or RRename; or dropped, using LDrop or RDrop.) . Datasets @Mny2Mny_Left#d, @Mny2Mny_Right#d, and @Mny2Mny_Left## should not exist, and will be lost if they do. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Here's the definition. It's in a late stage of testing, rather than
finished; I sent the documentation and discussion to the list earlier than I intended, so I don't want to hold up the code. * **************************************************************** . * Macro for many-to-many merge of two datasets or .SAV files, . * by Richard Ristow, March 2014 . * Contact: [hidden email] . * . * Required arguments: . * LeftIn Name of one of the datasets, or saved files, to be joined. * (referred to as the "Left input") . * RightIn Name of the other dataset, or saved file, to be joined. * (referred to as the "Right input") . * . * Optional arguments for the inputs: . * LRename 'RENAME' to be applied to the Left input . * LDrop 'DROP' to be applied to the Left input . * LKeep 'KEEP' to be applied to the Left input (default: ALL) . * RRename 'RENAME' to be applied to the Right input . * RDrop 'DROP' to be applied to the Right input . * RKeep 'KEEP' to be applied to the Right input (default: ALL) . * . * Optional arguments for the join operation: . * Join Outer (default), LeftOuter, RightOuter, or Inner: . * Outer Output includes contents of all records . * from both inputs . * Left Outer Output includes contents of all records . * or LeftOuter from the Left input, but only records from . * or LOuter the Right input that match some left-input . * record . * Right Outer Output includes contents of all records . * or RightOuter from the Right input, but only records from . * or ROuter the Left input that match some right-input . * record . * Inner Output contains only records from each . * input that match some record in the other . * BYvar Variable or variables on which the two are to be joined;. * if not specified, a Cartesian product will be performed. . * Requirements are as for BY variables in MATCH FILES: . * all variables must be present and of the same type in . * both inputs; and both inputs must be in ascending order . * on the BYvar variable(s). . * . * MaxGrp Largest number of records allowed for any value of the . * BYvar variable(s) in either input; for Cartesian . * product, larget number of records in either intput. . * Default: 50. . * ................................................................ . * Reserved names: . * Variables Left# and Right# are created, and retained in the output. * Variables @Mny2Mny_NoBreak @Mny2Mny_NewGrp . * and @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec . * are used, and discarded. . * Datasets @Mny2Mny_Left#d @Mny2Mny_Right#d . * and @Mny2Mny_Left## . * are created, used, and discarded. * **************************************************************** . DEFINE !Mny2Mny(LeftIn = !CHAREND('/') /RightIn = !CHAREND('/') /BYvar = !CHAREND('/') /LRename = !CHAREND('/') /LDrop = !CHAREND('/') /LKeep = !CHAREND('/') !DEFAULT(ALL) /RRename = !CHAREND('/') /RDrop = !CHAREND('/') /RKeep = !CHAREND('/') !DEFAULT(ALL) /Join = !CHAREND('/') !DEFAULT(Outer) /MaxGrp = !TOKENS(1) !DEFAULT(50)) ECHO " ". ECHO !QUOTE(!CONCAT('Macro !','Mny2Mny:')). * ....... Section I: Input parsing and error checking ....... . !LET !Error = No !IF ( !LeftIn !EQ !NULL !OR !RightIn !EQ !NULL) !THEN !LET !Error = Yes ECHO "Arguments LeftIn and RightIn (the two inputs)". ECHO " are required.". !IFEND !IF ( !RightIn !EQ '*') !THEN !LET !Error = Yes ECHO "The Right input may not be '*'". !IFEND !IF (!UPCASE(!Join) !EQ 'OUTER') !THEN !LET !JType = 'OUTER' !LET !JMsg = 'Outer join' !IFEND !IF (!UPCASE(!Join) !EQ 'LEFT OUTER') !THEN !LET !JType = 'LEFTOUT' !LET !JMsg = 'Left outer join' !IFEND !IF (!UPCASE(!Join) !EQ 'LEFTOUTER') !THEN !LET !JType = 'LEFTOUT' !LET !JMsg = 'Left outer join' !IFEND !IF (!UPCASE(!Join) !EQ 'LOUTER') !THEN !LET !JType = 'LEFTOUT' !LET !JMsg = 'Left outer join' !IFEND !IF (!UPCASE(!Join) !EQ 'RIGHT OUTER') !THEN !LET !JType = 'RIGHTOUT' !LET !JMsg = 'Right outer join' !IFEND !IF (!UPCASE(!Join) !EQ 'RIGHTOUTER') !THEN !LET !JType = 'RIGHTOUT' !LET !JMsg = 'Right outer join' !IFEND !IF (!UPCASE(!Join) !EQ 'ROUTER') !THEN !LET !JType = 'RIGHTOUT' !LET !JMsg = 'Right outer join' !IFEND !IF (!UPCASE(!Join) !EQ 'INNER') !THEN !LET !JType = 'INNER' !LET !JMsg = 'Inner join' !IFEND !IF (!BYvar !EQ !NULL) !THEN !LET !JType = 'CARTPROD' !LET !JMsg = 'Cartesian product' !IFEND !IF (!JType !EQ !NULL) !THEN !LET !Error = Yes ECHO "Invalid join type:". ECHO !QUOTE(!CONCAT(' Join=',!Join)). !IFEND * ....... Section II: Informational messages ........ . !IF (!Error !EQ No) !THEN ECHO !QUOTE(!CONCAT(!JMsg,' of')). ECHO !QUOTE(!CONCAT(' ',!LeftIn)). !IF ( !LRename !NE !NULL) !THEN ECHO !QUOTE(!CONCAT(' /RENAME=',!LRename)) !IFEND !IF ( !LDrop !NE !NULL) !THEN ECHO !QUOTE(!CONCAT(' /DROP =',!LDrop)) !IFEND !IF (!UPCASE(!LKeep) !NE 'ALL') !THEN ECHO !QUOTE(!CONCAT(' /KEEP =',!BYVar,' ',!LKeep)) !IFEND ECHO !QUOTE(!CONCAT('to ',!RightIn)). !IF ( !RRename !NE !NULL) !THEN ECHO !QUOTE(!CONCAT(' /RENAME=',!RRename)) !IFEND !IF ( !RDrop !NE !NULL) !THEN ECHO !QUOTE(!CONCAT(' /DROP =',!RDrop)) !IFEND !IF (!UPCASE(!RKeep) !NE 'ALL') !THEN ECHO !QUOTE(!CONCAT(' /KEEP =',!BYVar,' ',!RKeep)) !IFEND !IF (!BYvar !NE !NULL) !THEN ECHO !QUOTE(!CONCAT('by ',!BYvar)). !IFEND !IFEND * ....... Section III: Processing ....... . * ....... III.A Sequence-numbered copies of the inputs ....... . !IF (!Error !EQ No) !THEN ADD FILES /FILE=!Leftin. NUMERIC @Mny2Mny_NoBreak (F2). COMPUTE @Mny2Mny_NoBreak = 1. ADD FILES /FILE=* !IF (!LRename !NE !NULL) !THEN /RENAME = !LRename !IFEND /BY @Mny2Mny_NoBreak !BYvar /FIRST = @Mny2Mny_NewGrp !IF (!LDrop !NE !NULL) !THEN /DROP = !LDrop !IFEND /KEEP = @Mny2Mny_NoBreak !BYvar !LKeep. DO IF @Mny2Mny_NewGrp. . COMPUTE Left# = 1. ELSE. . COMPUTE Left# = LAG(Left#) + 1. END IF. FORMATS Left# (F4). DATASET NAME @Mny2Mny_Left#d WINDOW=ASIS. /*-- LIST /*-*/. DATASET DECLARE @Mny2Mny_Left##. AGGREGATE OUTFILE=@Mny2Mny_Left## /BREAK = @Mny2Mny_NoBreak !BYvar /@Mny2Mny_LMaxRec = MAX(Left#). ADD FILES /FILE=!RightIn. NUMERIC @Mny2Mny_NoBreak(F2). COMPUTE @Mny2Mny_NoBreak = 1. ADD FILES /FILE=* !IF (!LRename !NE !NULL) !THEN /RENAME = !RRename !IFEND /BY @Mny2Mny_NoBreak !BYvar /FIRST = @Mny2Mny_NewGrp !IF (!RDrop !NE !NULL) !THEN /DROP = !RDrop !IFEND /KEEP = @Mny2Mny_NoBreak !BYvar !RKeep. DO IF @Mny2Mny_NewGrp. . COMPUTE Right# = 1. ELSE. . COMPUTE Right# = LAG(Right#) + 1. END IF. FORMATS Right# (F4). DATASET NAME @Mny2Mny_Right#d WINDOW=ASIS. /*-- LIST /*-*/. * ....... III.B Build 'spine' of all keys for all records ....... . * ....... III.B.1 Combine record counts from the inputs ....... . AGGREGATE OUTFILE=* /BREAK = @Mny2Mny_NoBreak !BYvar /@Mny2Mny_RMaxRec = MAX(Right#). /*-- LIST /*-*/. MATCH FILES /FILE = @Mny2Mny_Left## /FILE = * /BY @Mny2Mny_NoBreak !BYvar. RECODE @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec (MISSING=0). * ....... III.B.2 Select, based on join type ....... . !IF (!Jtype !EQ 'INNER') !THEN SELECT IF (@Mny2Mny_LMaxRec GT 0 AND @Mny2Mny_RMaxRec GT 0). !IFEND !IF (!Jtype !EQ 'LEFTOUT') !THEN SELECT IF (@Mny2Mny_RMaxRec GT 0). !IFEND !IF (!Jtype !EQ 'RIGHTOUT') !THEN SELECT IF (@Mny2Mny_LMaxRec GT 0). !IFEND * ..................................... . * For OUTER join and Cartesian product, . * no selection is necessary. . * ..................................... . * ....... III.B.3 Error message, if record count exceeds ....... . * the allowed maximum ....... . IF $CASENUM EQ 1 #Overflow = 0. DO IF @Mny2Mny_LMaxRec GT !MaxGrp. . DO IF #Overflow EQ 0. . PRINT / 'Left input: ' @Mny2Mny_LMaxRec !IF (!Jtype !EQ 'CARTPROD') !THEN ' records ' !ELSE ' records for key ', !BYvar !IFEND !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')). . COMPUTE #Overflow = 1. . ELSE. . PRINT / 'Left input: ' @Mny2Mny_LMaxRec !IF (!Jtype !EQ 'CARTPROD') !THEN ' records '. !ELSE ' records for key ', !BYvar. !IFEND . END IF. END IF. DO IF @Mny2Mny_RMaxRec GT !MaxGrp. . DO IF #Overflow EQ 0. . PRINT / 'Right input: ' @Mny2Mny_RMaxRec !IF (!Jtype !EQ 'CARTPROD') !THEN ' records ' !ELSE ' records for key ', !BYvar !IFEND !QUOTE(!CONCAT(' (max is ',!MaxGrp,')')). . COMPUTE #Overflow = 1. . ELSE. . PRINT / 'Right input: ' @Mny2Mny_RMaxRec !IF (!Jtype !EQ 'CARTPROD') !THEN ' records '. !ELSE ' records for key ', !BYvar. !IFEND . END IF. END IF. * ....... III.B.4 'Unroll' to build file of all record ....... . * keys ....... . * 'Unroll' the Right record numbers. VECTOR Rec#(!MaxGrp,F4). DO IF @Mny2Mny_RMaxRec EQ 0. . COMPUTE Rec#(1) = 0. ELSE. . LOOP #Idx = 1 TO MIN(@Mny2Mny_RMaxRec,!MaxGrp). . COMPUTE Rec#(#Idx) = #Idx. . END LOOP. END IF. EXECUTE /* apparently necessary */. !LET !Sfx = !LENGTH (!BLANKS (!MaxGrp)) OMS /SELECT ALL /IF COMMANDS = ["Variables to Cases"] /DESTINATION VIEWER = NO /TAG = "@MNY2MNY_V2C_ZAP". VARSTOCASES /MAKE Right# FROM Rec#1 TO !CONCAT('Rec#',!Sfx). * 'Unroll' the Left record numbers. VECTOR Rec#(!MaxGrp,F4). DO IF @Mny2Mny_LMaxRec EQ 0. . COMPUTE Rec#(1) = 0. ELSE. . LOOP #Idx = 1 TO MIN(@Mny2Mny_LMaxRec,!MaxGrp). . COMPUTE Rec#(#Idx) = #Idx. . END LOOP. END IF. EXECUTE /* apparently necessary */. VARSTOCASES /MAKE Left# FROM Rec#1 TO !CONCAT('Rec#',!Sfx). OMSEND TAG=["@MNY2MNY_V2C_ZAP"]. * ....... III.C Add the substantive data ....... . MATCH FILES /FILE=* /TABLE=@Mny2Mny_Right#d /BY !BYvar Right# /DROP @Mny2Mny_NoBreak @Mny2Mny_NewGrp @Mny2Mny_LMaxRec @Mny2Mny_RMaxRec. SORT CASES BY !BYvar Left# Right#. MATCH FILES /TABLE=@Mny2Mny_Left#d /FILE=* /BY !BYvar Left# /DROP @Mny2Mny_NoBreak @Mny2Mny_NewGrp /KEEP !BYvar Left# Right# ALL. EXECUTE /* so input datasets can be dropped */. DATASET CLOSE @Mny2Mny_Left#d. DATASET CLOSE @Mny2Mny_Right#d. DATASET CLOSE @Mny2Mny_Left##. !IFEND !ENDDEFINE. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
