|
Hi all,
I have received some great help so far from a few of you, but have run into another puzzler on some data work. I want to insert missing cases within an id beginning from the first year someone appears in the file to the cut-off point, 2005. I have got everything down so that I can add the cases and flag the first and last year, but I am a bit stuck on how I might use a macro or some other approach in the LOOP command. Any ideas? CURRENT FORMAT DATA LIST LIST/ caseid YearOnly campus var2. BEGIN DATA 1 1999 2 3 1 2001 5 2 1 2002 8 2 2 2000 2 5 2 2002 3 3 2 2003 8 5 3 2004 9 6 4 2005 0 2. END DATA. DESIRED FORMAT 1 1999 2 3 1 2000 1 2001 5 2 1 2002 8 2 1 2003 1 2004 1 2005 2 2000 2 5 2 2001 2 2002 3 3 2 2003 8 5 2 2004 2 2005 3 2004 9 6 3 2005 4 2005 0 2. END DATA. MY CODE SO FAR DATA LIST LIST/ caseid YearOnly campus var2. BEGIN DATA 1 2000 2 3 1 2001 5 2 1 2002 8 2 2 2000 2 5 2 2002 3 3 2 2003 8 5 3 2004 9 6 4 2005 0 2. END DATA. SORT CASES BY CASEID(A) YearOnly (A) . MATCH FILES /FILE = * /BY CASEID /FIRST = FirstCase . VARIABLE LABELS FirstCase 'Indicator of each first matching case as Primary' . VALUE LABELS FirstCase 0 'Duplicate Case' 1 'Primary Case'. VARIABLE LEVEL FirstCase (ORDINAL). EXECUTE. SORT CASES BY CASEID(A) YearOnly (A) . MATCH FILES /FILE = * /BY CASEID /Last = LastCase . VARIABLE LEVEL Last (ORDINAL). EXECUTE. DO IF (firstcase=1). COMPUTE FirstYear=YearOnly. END IF. DO IF (lastcase=1). COMPUTE LastYear=YearOnly. END IF. EXE. SORT CASES BY CASEID(A) YearOnly (A) . SAVE OUTFILE='C:\temp.sav'. AGGREGATE /OUTFILE=* /BREAK=CASEID /YearOnly = N(YearOnly). LOOP yr = (FirstYear) TO (LastYear). + XSAVE OUTFILE='temp.sav' /KEEP=CASEID yr . END LOOP. EXECUTE. GET FILE='temp.sav'. RENAME VARIABLE (yr=YearOnly). ****Matching the Files and filling in the gaps in years. MATCH FILES /FILE=* /FILE='C:\temp.sav' /BY CASEID YearOnly. EXECUTE. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
At 08:04 AM 4/4/2008, Jake Gross wrote:
>I want to insert missing cases within an id beginning from the first >year someone appears in the file to the cut-off point, 2005. How is this different from the problem you posed in your posting, Date: Thu, 3 Apr 2008 15:37:34 -0400 From: "Gross, Paul Jacob" <[hidden email]> Subject: Re: Adding missing cases to univariate data file To: [hidden email] except for starting from 1999 instead of 2000? Is the solution given in Date: Fri, 4 Apr 2008 12:25:15 -0400 From: Richard Ristow <[hidden email]> Subject: Re: Adding missing cases to univariate data file To: [hidden email] insufficient? Is the problem that you have an SPSS before release 14, and the DATASET commands don't work? But your code below seems to be for a different problem: fill in missing cases from the first year to the LAST YEAR OBSERVED, rather than to the year 2005. I'll go with that problem. Anyway, let me look at your approach. At the end, I'll post an XSAVE solution. (It could be done with VARSTOCASES, too.) The big change is using a single AGGREGATE, rather than various MATCH FILES, etc., to get first and last years for an ID in a record to run XSAVE from. Considering your code: >I have got everything down so that I can add the cases and flag the first >and last year, but I am a bit stuck on how I might use a macro or >some other approach in the LOOP command. Any ideas? Briefly, by the time you reach your LOOP command, you've wiped out the variables denoting first and last year. See comments, below. You have these variables in input and desired output: caseid YearOnly campus var2. (I'm not repeating the listing of input and desired output data. But we MUCH appreciate your posting those; it makes the job far easier.) [Code to read the test data omitted] *WRR: I didn't do the SORT, because your test ... * data came in sorted that way: ... >SORT CASES BY CASEID(A) YearOnly (A) . > >MATCH FILES > /FILE = * /BY CASEID > /FIRST = FirstCase . >VARIABLE LABELS > FirstCase > 'Indicator of each first matching case as Primary' . >VALUE LABELS FirstCase > 0 'Duplicate Case' > 1 'Primary Case'. >VARIABLE LEVEL FirstCase (ORDINAL). *WRR: The following EXECUTE isn't useful: ... >EXECUTE. *WRR: The file is already sorted as below; ... * there's no reason to sort again. ... >SORT CASES BY CASEID(A) YearOnly (A) . >MATCH FILES /FILE = * /BY CASEID > /Last = LastCase . >VARIABLE LEVEL Last (ORDINAL). *WRR: And this EXECUTE also isn't useful: ... >EXECUTE. *WRR: All code above should work. ... * However, it can be simplified by having ... * /FIRST and /LAST on the same MATCH FILES... * (The SORT may or may not be needed): ... SORT CASES BY CASEID(A) YearOnly (A) . MATCH FILES /FILE = * /BY CASEID /FIRST = FirstCase /Last = LastCase . *WRR: Below, label LastCase too, if desired ... VARIABLE LABELS FirstCase 'Indicator of each first matching case as Primary' . VALUE LABELS FirstCase 0 'Duplicate Case' 1 'Primary Case'. *WRR: The following is irrelevant, I think ... VARIABLE LEVEL FirstCase LastCase (ORDINAL). *WRR: Below is your first problem. You want ... * variables "FirstYear" and "LastYear" ... * to have values in the same case; but the ... * following gives "FirstYear" a value only ... * in the first case for a >DO IF (firstcase=1). >. COMPUTE FirstYear=YearOnly. >END IF. >DO IF (lastcase=1). >. COMPUTE LastYear=YearOnly. >END IF. *WRR: Again, the following SORT is redundant ... >SORT CASES BY CASEID(A) YearOnly (A) . *WRR: Better to do the following SAVE at the ... * beginning, to save the data as it was ... * before you added any new variables ... >SAVE OUTFILE='C:\temp.sav'. *WRR: Below is your second problem: the ... * AGGREGATE loses variables FirstYear and ... * LastYear. (It also saves the count of ... * records as YearOnly, which is confusing ... * since that variable started out meaning ... * some calendar year.) ... >AGGREGATE > /OUTFILE=* > /BREAK=CASEID > /YearOnly = N(YearOnly). *WRR: And by the time you try this LOOP, it's ... * too late; you've lost the variables you ... * need. ... * (It doesn't help that you're using both ... * 'temp.save' and 'C:\temp.sav' as scratch ... * files; that's confusing.) ... >LOOP yr = (FirstYear) TO (LastYear). >+ XSAVE OUTFILE='temp.sav' /KEEP=CASEID yr . >END LOOP. *WRR: The following EXECUTE is one of those ... * that *IS* necessary: ... >EXECUTE. [...I'm skipping further annotation...] =================================================== Solution using XSAVE (not saved separately): * ... Save original data in a dataset, rather than a scratch ... . * ... file (but the latter could be done easily) ... . DATASET NAME Original WINDOW=FRONT. LIST. List |-----------------------------|---------------------------| |Output Created |04-APR-2008 19:22:20 | |-----------------------------|---------------------------| [Original] caseid YearOnly campus var2 1 1999 2 3 1 2001 5 2 1 2002 8 2 2 2000 2 5 2 2002 3 3 2 2003 8 5 3 2004 9 6 4 2005 0 2 Number of cases read: 8 Number of cases listed: 8 * ... And activate a copy to work on: ... . DATASET COPY WorkOnIt. DATASET ACTIVATE WorkOnIT WINDOW=FRONT. * ... BIG CHANGE IN LOGIC: ... . * ... It's much easier to get the first and last years for an ... . * ... ID with a single AGGREGATE statement: ... . AGGREGATE OUTFILE=* /BREAK=caseid /FirstYear 'Earliest year with data for the ID' = MIN(YearOnly) /LastYear 'Latest year with data for the ID' = MAX(YearOnly). . /**/ LIST /*-*/. List |-----------------------------|---------------------------| |Output Created |04-APR-2008 19:22:21 | |-----------------------------|---------------------------| caseid FirstYear LastYear 1 1999 2002 2 2000 2003 3 2004 2004 4 2005 2005 Number of cases read: 4 Number of cases listed: 4 * ... The XSAVE loop isn't too different from what you wrote. ... . * ... (But there's no need to call the output year 'yr' and ... . * ... then have to rename it to 'YearOnly') ... . NUMERIC YearOnly (F4). LOOP YearOnly = (FirstYear) TO (LastYear). + XSAVE OUTFILE=BlankTemp /KEEP=CASEID YearOnly . END LOOP. EXECUTE /* this IS a necessary EXECUTE statement */. GET FILE=BlankTemp. . /**/ LIST /*-*/. List |-----------------------------|---------------------------| |Output Created |04-APR-2008 19:22:21 | |-----------------------------|---------------------------| C:\Documents and Settings\Richard\My Documents \Temporary\SPSS\ Adding cases (within caseid) from a varying year forward - BLANKCASES.SAV caseid YearOnly 1 1999 1 2000 1 2001 1 2002 2 2000 2 2001 2 2002 2 2003 3 2004 4 2005 Number of cases read: 10 Number of cases listed: 10 * ... And the final MATCH is almost the same, too: ... . ****Matching the Files and filling in the gaps in years. MATCH FILES /FILE=* /FILE=Original /BY CASEID YearOnly. LIST. List |-----------------------------|---------------------------| |Output Created |04-APR-2008 19:22:22 | |-----------------------------|---------------------------| C:\Documents and Settings\Richard\My Documents \Temporary\SPSS\ Adding cases (within caseid) from a varying year forward - BLANKCASES.SAV caseid YearOnly campus var2 1 1999 2 3 1 2000 . . 1 2001 5 2 1 2002 8 2 2 2000 2 5 2 2001 . . 2 2002 3 3 2 2003 8 5 3 2004 9 6 4 2005 0 2 Number of cases read: 10 Number of cases listed: 10 ============================ APPENDIX: Test data and code ============================ * ................................................................. . * ................. Target file for XSAVE ..................... . FILE HANDLE BlankTemp /NAME='C:\Documents and Settings\Richard\My Documents' + '\Temporary\SPSS\' + 'Adding cases (within caseid) from a varying year forward ' + ' - ' + 'BLANKCASES.SAV'. * ................. Test data ..................... . DATA LIST LIST/ caseid YearOnly campus var2. BEGIN DATA 1 1999 2 3 1 2001 5 2 1 2002 8 2 2 2000 2 5 2 2002 3 3 2 2003 8 5 3 2004 9 6 4 2005 0 2. END DATA. FORMATS caseid (F2) YearOnly (F4) campus (F3) var2 (F2). * ... Save original data in a dataset, rather than a scratch ... . * ... file (but the latter could be done easily) ... . DATASET NAME Original WINDOW=FRONT. LIST. * ... And activate a copy to work on: ... . DATASET COPY WorkOnIt. DATASET ACTIVATE WorkOnIT WINDOW=FRONT. * ... BIG CHANGE IN LOGIC: ... . * ... It's much easier to get the first and last years for an ... . * ... ID with a single AGGREGATE statement: ... . AGGREGATE OUTFILE=* /BREAK=caseid /FirstYear 'Earliest year with data for the ID' = MIN(YearOnly) /LastYear 'Latest year with data for the ID' = MAX(YearOnly). . /**/ LIST /*-*/. * ... The XSAVE loop isn't too different from what you wrote. ... . * ... (But there's no need to call the output year 'yr' and ... . * ... then have to rename it to 'YearOnly') ... . NUMERIC YearOnly (F4). LOOP YearOnly = (FirstYear) TO (LastYear). + XSAVE OUTFILE=BlankTemp /KEEP=CASEID YearOnly . END LOOP. EXECUTE /* this IS a necessary EXECUTE statement */. GET FILE=BlankTemp. . /**/ LIST /*-*/. * ... And the final MATCH is almost the same, too: ... . ****Matching the Files and filling in the gaps in years. MATCH FILES /FILE=* /FILE=Original /BY CASEID YearOnly. LIST. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
