|
Here's what I'm trying to do. I have two SPSS files with multiple records of
each class name with start dates. I want to search the second file for the class name and start date and if it matches to a class name-start date +/- 2 days in the first file to create a "match" variable. Does anyone know how I can do this ? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Alina,
The short answer that this can not be done without a merge. You will have to do a match files. Using syntax only, spss can not move back and forth between two files in the way that you would need to be able to do. You might be able to do this with python or script. I don't know. Others can comment and offer examples, please. This is not very easy to do even with matching. So let me make sure I understand some things. First, your language is not so clear about which file is to be matched to which. So, let file 1 be Classname startdate D9edmf 9/10/08 Wru39nv 9/18/08 Io3489 9/18/08 Admm394 9/15/08 Ppo098 8/31/08 Let file 2 by D9edmf 9/12/08 Wru39nv 9/18/08 Io3489 9/13/08 Admm394 9/11/08 As I read your message you want to mark a new variable in file 2, call it 'match', as 'yes' if a record from file 2 matches a record from file 1 exactly for classname and the file 1 startdate is +/- 2 days of the file 2 startdate. Is this exactly true? So in the above example file 2, when finished would look like: Classname startdate match D9edmf 9/12/08 yes Wru39nv 9/18/08 yes Io3489 9/13/08 Admm394 9/11/08 Is this exactly correct? Lastly (for the moment), does either file 1 or file 2 have duplicate combinations of classname and startdate? For example: Admm394 9/15/08 Ppo098 8/31/08 Admm394 9/15/08 Has this been checked? Gene Maguin ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Alina Sheyman-3
At 12:38 PM 10/1/2008, Alina Sheyman wrote:
>I have two SPSS files with multiple records of each class name with >start dates. I want to search the second file for the class name and >start date and if it matches to a class name-start date +/- 2 days >in the first file to create a "match" variable. I wonder if this is the same problem you're trying to solve in thread '"do repeat" code'? That looks similar, except with 'wide' organization -- all dates in one record -- as if you used CASESTOVARS on this problem. Pardon my guessing. Anyway, I think this is a solution to your problem, as posted in this thread. The trick is to compare dates from File2 with the 4-day 'windows' from File1 by >interleaving<, not merging, the files; and setting the dates in File1 two days back, so they'll come before any File2 date in their 'window'. The data here avoids multiple matches by having all dates in File1 more than 4 days apart. Unless there is one file in which the 4-day 'windows' don't overlap, there can be multiple-to-multiple matches, and the logic musts be more complicated. * ................. Show the test data ..................... . DATASET ACTIVATE File1 WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |04-OCT-2008 01:26:27 | |-----------------------------|---------------------------| [File1] Class StartDate Sequence1 Class_A 09/15/2008 a Class_A 09/21/2008 b Class_A 10/01/2008 c Class_A 10/11/2008 d Class_A 10/16/2008 e Class_B 09/15/2008 A Class_B 09/21/2008 B Class_B 10/01/2008 C Class_B 10/11/2008 D Class_B 10/16/2008 E Class_C 09/15/2008 a Class_C 09/21/2008 b Class_C 10/01/2008 c Class_C 10/11/2008 d Class_C 10/16/2008 e Number of cases read: 15 Number of cases listed: 15 DATASET ACTIVATE File2 WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |04-OCT-2008 01:26:27 | |-----------------------------|---------------------------| [File2] Class StartDate Sequence2 Class_A 09/01/2008 01 Class_A 09/10/2008 02 Class_A 09/19/2008 03 Class_A 09/28/2008 04 Class_A 10/07/2008 05 Class_B 09/15/2008 01 Class_B 09/19/2008 02 Class_B 09/24/2008 03 Class_B 09/29/2008 04 Class_B 10/04/2008 05 Class_B 10/09/2008 06 Class_B 10/12/2008 07 Class_B 10/17/2008 08 Class_C 10/01/2008 01 Class_C 10/11/2008 02 Class_C 10/19/2008 03 Class_C 10/24/2008 04 Class_C 10/31/2008 05 Class_C 11/09/2008 06 Number of cases read: 19 Number of cases listed: 19 * ................................................................. . * ................. Computations ..................... . * I. Copy of File1 records keeping only the key variables, ..... . * with real start date as 'F1Start', and 'StartDate' ..... . * set back two days. ..... . * (The latter is the 'cute' trick. When the files are ..... . * interleaved, it makes File 1 come before File 2 ..... . * records that are as much as two days earlier.) ..... . * * ----- Load File1: ..... . ADD FILES /FILE=File1 /KEEP=Class StartDate. DATASET NAME IntrLeav WINDOW=FRONT. * ----- 'Remember' the real starting date, and set variable ..... . * "StartDate" back by two days ..... . COMPUTE F1StDate = StartDate. FORMATS F1StDate (ADATE10). VAR LABEL F1StDate 'Matching start date, from File 1'. COMPUTE StartDate = StartDate - TIME.DAYS(2). . /**/ LIST /*-*/. [output suppressed] * II. >Interleave< (not merge) with File 2. ..... . * The result has all data from File 2, ..... . * and the key variables from File 1. ..... . ADD FILES /FILE=* /IN=File1Rcd /FILE=File2 /BY Class StartDate /KEEP=Class StartDate ALL. . /**/ LIST /*-*/. [output suppressed] * III. If a date in File 2 matches one from File 1 within ..... . * two days, keep it as a matching date in the File 2 ..... . * record ..... . * III.A From File 1 records, 'remember' the starting date ..... . DO IF File1Rcd. . COMPUTE #LtstF1Dt = F1StDate. . FORMATS #LtstF1Dt (ADATE10). * III.B For File 2 records, ..... . ELSE /* (Records not from File 1 are from File 2) */. * ----- 'StartDate' becomes the File 2 start date ..... . . COMPUTE F2StDate = StartDate. . VAR LAB F2StDate 'Starting date of this File 2 record'. . FORMATS F2StDAte (ADATE10). * ----- If the last File 1 date seen is not from the ..... . * current class, 'forget' it. ..... . . IF Class NE LAG(CLASS) #LtstF1Dt = $SYSMIS. * ----- If the last File 1 date seen is within two days ..... . * the date in File 2, store it as a matching date. ..... . * (Remember: "F1StDate" is initially missing in File 2 ..... . * records.) ..... . . IF NOT MISSING(#LtstF1Dt) AND ABS(#LtstF1Dt - F2StDate) LE TIME.DAYS(2) F1StDate = #LtstF1Dt. END IF. * IV. Add File 1 data to matching File 2 records ..... . * ----- For non-matching File 2 records -- those without a ..... . * start date from File 1 -- fill in File 1 start date ..... . * with an artificial value that can't match anything, ..... . * but is in the correct sort order. ..... . DO IF MISSING(F1StDate). . DO IF Class EQ LAG(Class). . COMPUTE F1StDate = LAG(F1StDate) + TIME.HMS(0,0,1). . ELSE. . COMPUTE F1StDate = DATE.MDY(01,01,1900) + TIME.HMS(0,0,1). . END IF. END IF. * ----- Merge in the data from matching File 1 ..... . MATCH FILES /TABLE=File1 /RENAME=(StartDate=F1StDate) /FILE =* /BY Class F1StDate /DROP =File1Rcd StartDate /KEEP =Class F1StDate F2StDate ALL. * ----- Remove the artifical 'File 1 start dates' ..... . IF XDATE.TIME(F1StDate) GT 0 F1StDate = $SYSMIS. * ----- Drop the File 1 records ..... . SELECT IF NOT MISSING(F2StDate) /* File 1 records have no File 2 date */. LIST. List |-----------------------------|---------------------------| |Output Created |04-OCT-2008 01:26:34 | |-----------------------------|---------------------------| [IntrLeav] Class F1StDate F2StDate Sequence1 Sequence2 Class_A . 09/01/2008 01 Class_A . 09/10/2008 02 Class_A 09/21/2008 09/19/2008 b 03 Class_A . 09/28/2008 04 Class_A . 10/07/2008 05 Class_B 09/15/2008 09/15/2008 A 01 Class_B 09/21/2008 09/19/2008 B 02 Class_B . 09/24/2008 03 Class_B 10/01/2008 09/29/2008 C 04 Class_B . 10/04/2008 05 Class_B 10/11/2008 10/09/2008 D 06 Class_B 10/11/2008 10/12/2008 D 07 Class_B 10/16/2008 10/17/2008 E 08 Class_C 10/01/2008 10/01/2008 c 01 Class_C 10/11/2008 10/11/2008 d 02 Class_C . 10/19/2008 03 Class_C . 10/24/2008 04 Class_C . 10/31/2008 05 Class_C . 11/09/2008 06 Number of cases read: 19 Number of cases listed: 19 ============================= APPENDIX: Test data, and code ============================= * C:\Documents and Settings\Richard\My Documents . * \Technical\spssx-l\Z-2008d . * \2008-10-10 Sheyman - working with multiple files without merge.SPS. * In response to posting . * Date: Wed, 1 Oct 2008 12:38:55 -0400 . * From: Alina Sheyman <[hidden email]> . * Subject: working with multiple files without merge . * To: [hidden email] . * I have two SPSS files with multiple records of each class name . * with start dates. I want to search the second file for the class . * name and start date and if it matches to a class name-start date . * +/- 2 days in the first file to create a "match" variable. . * ................................................................. . * ................. Test data ..................... . SET RNG = MT /* 'Mersenne twister' random number generator */ . SET MTINDEX = 0039 /* Providence, RI telephone book */ . * ..... File 1: In this file, ..... . * All start dates are 15 Sept to 15 Oct ..... . * They're spaced 5, 6, or 10 days apart ..... . * They're the same for all three 'Classes' ..... . NEW FILE. INPUT PROGRAM. . STRING Class (A8). . NUMERIC StartDate (ADATE10). . STRING Sequence1 (A1). . LEAVE Class, StartDate Sequence1. . COMPUTE StartDate = DATE.MDY(09,15,2008). . LOOP #Day = 01 TO 30 /* Much more than needed */. . DO IF #Day GT 01. . COMPUTE #Select = TRUNC(RV.UNIFORM(1,4)). . RECODE #Select (1 = 5) (2 = 6) (3 = 10) INTO #Interval. . COMPUTE StartDate = StartDate + TIME.DAYS(#Interval). . END IF. . COMPUTE Sequence1 = SUBSTR('abcdefghijklmnopqrstuvwxyz',#Day,1). . COMPUTE Class = 'Class_A'. . END CASE. . COMPUTE Class = 'Class_C'. . END CASE. . COMPUTE Class = 'Class_B'. . COMPUTE Sequence1 = UPCASE(Sequence1). . END CASE. . END LOOP IF StartDate GT DATE.MDY(10,15,2008). END FILE. END INPUT PROGRAM. DATASET NAME File1. SORT CASES BY Class StartDate. . /*-- LIST /*-*/. * ..... File 2: In this file, ..... . * Class A: Dates in September, spaced 08-10 days ..... . * Class B: Dates 15 Sept-15 Oct., spaced 01-05 days ..... . * Class C: Dates in October, spaced 04-10 days ..... . NEW FILE. INPUT PROGRAM. . STRING Class (A8). . NUMERIC StartDate (ADATE10) Sequence2 (N2). . LEAVE Class, StartDate Sequence2. * Class A: Dates in September, spaced 08-10 days ..... . . COMPUTE Class = 'Class_A'. . COMPUTE StartDate = DATE.MDY(09,01,2008). . LOOP #Day = 01 TO 30 /* Much more than needed */. . DO IF #Day GT 01. . COMPUTE #Interval = TRUNC(RV.UNIFORM(08,11)). . COMPUTE StartDate = StartDate + TIME.DAYS(#Interval). . END IF. . COMPUTE Sequence2 = #Day. . END CASE. . END LOOP IF StartDate GT DATE.MDY(09,30,2008). * Class B: Dates 15 Sept-15 Oct., spaced 01-05 days ..... . . COMPUTE Class = 'Class_B'. . COMPUTE StartDate = DATE.MDY(09,15,2008). . LOOP #Day = 01 TO 30 /* Many more than needed */. . DO IF #Day GT 01. . COMPUTE #Interval = TRUNC(RV.UNIFORM(01,06)). . COMPUTE StartDate = StartDate + TIME.DAYS(#Interval). . END IF. . COMPUTE Sequence2 = #Day. . END CASE. . END LOOP IF StartDate GT DATE.MDY(10,15,2008). * Class C: Dates in October, spaced 04-10 days ..... . . COMPUTE Class = 'Class_C'. . COMPUTE StartDate = DATE.MDY(10,01,2008). . LOOP #Day = 01 TO 30 /* Many more than needed */. . DO IF #Day GT 01. . COMPUTE #Interval = TRUNC(RV.UNIFORM(03,11)). . COMPUTE StartDate = StartDate + TIME.DAYS(#Interval). . END IF. . COMPUTE Sequence2 = #Day. . END CASE. . END LOOP IF StartDate GT DATE.MDY(10,31,2008). END FILE. END INPUT PROGRAM. DATASET NAME File2. . /*-- LIST /*-*/. * ................. Post after this point ..................... . * ................................................................. . * ................. Show the test data ..................... . DATASET ACTIVATE File1 WINDOW=FRONT. LIST. DATASET ACTIVATE File2 WINDOW=FRONT. LIST. * ................................................................. . * ................. Computations ..................... . * I. Copy of File1 records keeping only the key variables, ..... . * with real start date as 'F1Start', and 'StartDate' ..... . * set back two days. ..... . * (The latter is the 'cute' trick. When the files are ..... . * interleaved, it makes File 1 come before File 2 ..... . * records that are as much as two days earlier.) ..... . * * ----- Load File1: ..... . ADD FILES /FILE=File1 /KEEP=Class StartDate. DATASET NAME IntrLeav WINDOW=FRONT. * ----- 'Remember' the real starting date, and set variable ..... . * "StartDate" back by two days ..... . COMPUTE F1StDate = StartDate. FORMATS F1StDate (ADATE10). VAR LABEL F1StDate 'Matching start date, from File 1'. COMPUTE StartDate = StartDate - TIME.DAYS(2). . /**/ LIST /*-*/. * II. >Interleave< (not merge) with File 2. ..... . * The result has all data from File 2, ..... . * and the key variables from File 1. ..... . ADD FILES /FILE=* /IN=File1Rcd /FILE=File2 /BY Class StartDate /KEEP=Class StartDate ALL. . /**/ LIST /*-*/. * III. If a date in File 2 matches one from File 1 within ..... . * two days, keep it as a matching date in the File 2 ..... . * record ..... . * III.A From File 1 records, 'remember' the starting date ..... . DO IF File1Rcd. . COMPUTE #LtstF1Dt = F1StDate. . FORMATS #LtstF1Dt (ADATE10). * III.B For File 2 records, ..... . ELSE /* (Records not from File 1 are from File 2) */. * ----- 'StartDate' becomes the File 2 start date ..... . . COMPUTE F2StDate = StartDate. . VAR LAB F2StDate 'Starting date of this File 2 record'. . FORMATS F2StDAte (ADATE10). * ----- If the last File 1 date seen is not from the ..... . * current class, 'forget' it. ..... . . IF Class NE LAG(CLASS) #LtstF1Dt = $SYSMIS. * ----- If the last File 1 date seen is within two days ..... . * the date in File 2, store it as a matching date. ..... . * (Remember: "F1StDate" is initially missing in File 2 ..... . * records.) ..... . . IF NOT MISSING(#LtstF1Dt) AND ABS(#LtstF1Dt - F2StDate) LE TIME.DAYS(2) F1StDate = #LtstF1Dt. END IF. * IV. Add File 1 data to matching File 2 records ..... . * ----- For non-matching File 2 records -- those without a ..... . * start date from File 1 -- fill in File 1 start date ..... . * with an artificial value that can't match anything, ..... . * but is in the correct sort order. ..... . DO IF MISSING(F1StDate). . DO IF Class EQ LAG(Class). . COMPUTE F1StDate = LAG(F1StDate) + TIME.HMS(0,0,1). . ELSE. . COMPUTE F1StDate = DATE.MDY(01,01,1900) + TIME.HMS(0,0,1). . END IF. END IF. * ----- Merge in the data from matching File 1 ..... . MATCH FILES /TABLE=File1 /RENAME=(StartDate=F1StDate) /FILE =* /BY Class F1StDate /DROP =File1Rcd StartDate /KEEP =Class F1StDate F2StDate ALL. * ----- Remove the artifical 'File 1 start dates' ..... . IF XDATE.TIME(F1StDate) GT 0 F1StDate = $SYSMIS. * ----- Drop the File 1 records ..... . SELECT IF NOT MISSING(F2StDate) /* File 1 records have no File 2 date */. LIST. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
