Adding cases (within caseid) from a varying year forward

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Adding cases (within caseid) from a varying year forward

Jake Gross
Hi all,

I have received some great help so far from a few of you, but have run into
another puzzler on some data work. I want to insert missing cases within an
id beginning from the first year someone appears in the file to the cut-off
point, 2005.

I have got everything down so that I can add the cases and flag the first
and last year, but I am a bit stuck on how I might use a macro or some other
approach in the LOOP command. Any ideas?

CURRENT FORMAT

DATA LIST LIST/ caseid YearOnly campus var2.
BEGIN DATA
1 1999 2 3
1 2001 5 2
1 2002 8 2
2 2000 2 5
2 2002 3 3
2 2003 8 5
3 2004 9 6
4 2005 0 2.
END DATA.

DESIRED FORMAT
1 1999 2 3
1 2000
1 2001 5 2
1 2002 8 2
1 2003
1 2004
1 2005
2 2000 2 5
2 2001
2 2002 3 3
2 2003 8 5
2 2004
2 2005
3 2004 9 6
3 2005
4 2005 0 2.
END DATA.


MY CODE SO FAR

DATA LIST LIST/ caseid YearOnly campus var2.
BEGIN DATA
1 2000 2 3
1 2001 5 2
1 2002 8 2
2 2000 2 5
2 2002 3 3
2 2003 8 5
3 2004 9 6
4 2005 0 2.
END DATA.

SORT CASES BY CASEID(A) YearOnly (A) .
MATCH FILES /FILE = *  /BY CASEID
 /FIRST = FirstCase .
VARIABLE LABELS FirstCase 'Indicator of each first matching case as Primary' .
VALUE LABELS FirstCase 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL FirstCase (ORDINAL).
EXECUTE.


SORT CASES BY CASEID(A) YearOnly (A) .
MATCH FILES /FILE = *  /BY CASEID
 /Last = LastCase .
VARIABLE LEVEL Last (ORDINAL).
EXECUTE.

DO IF (firstcase=1).
COMPUTE FirstYear=YearOnly.
END IF.
DO IF  (lastcase=1).
COMPUTE LastYear=YearOnly.
END IF.
EXE.

SORT CASES BY CASEID(A) YearOnly (A) .
SAVE OUTFILE='C:\temp.sav'.

AGGREGATE
  /OUTFILE=*
  /BREAK=CASEID
  /YearOnly = N(YearOnly).

LOOP yr = (FirstYear) TO (LastYear).
+ XSAVE OUTFILE='temp.sav' /KEEP=CASEID yr .
END LOOP.
EXECUTE.
GET FILE='temp.sav'.
RENAME VARIABLE (yr=YearOnly).

****Matching the Files and filling in the gaps in years.
MATCH FILES /FILE=*
  /FILE='C:\temp.sav'
  /BY CASEID YearOnly.
EXECUTE.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Adding cases (within caseid) from a varying year forward

Richard Ristow
At 08:04 AM 4/4/2008, Jake Gross wrote:

>I want to insert missing cases within an id beginning from the first
>year someone appears in the file to the cut-off point, 2005.

How is this different from the problem you posed in your posting,
  Date:    Thu, 3 Apr 2008 15:37:34 -0400
  From:    "Gross, Paul Jacob" <[hidden email]>
  Subject: Re: Adding missing cases to univariate data file
  To:      [hidden email]
except for starting from 1999 instead of 2000? Is the solution given in
  Date:    Fri, 4 Apr 2008 12:25:15 -0400
  From:    Richard Ristow <[hidden email]>
  Subject: Re: Adding missing cases to univariate data file
  To:      [hidden email]
insufficient? Is the problem that you have an SPSS before release 14,
and the DATASET commands don't work?

But your code below seems to be for a different problem: fill in
missing cases from the first year to the LAST YEAR OBSERVED, rather
than to the year 2005. I'll go with that problem.

Anyway, let me look at your approach. At the end, I'll post an XSAVE
solution. (It could be done with VARSTOCASES, too.) The big change is
using a single AGGREGATE, rather than various MATCH FILES, etc., to
get first and last years for an ID in a record to
run XSAVE from.


Considering your code:
>I have got everything down so that I can add the cases and flag the first
>and last year, but I am a bit stuck on how I might use a macro or
>some other approach in the LOOP command. Any ideas?

Briefly, by the time you reach your LOOP command, you've wiped out
the variables denoting first and last year. See comments, below.

You have these variables in input and desired output:
  caseid YearOnly campus var2.
(I'm not repeating the listing of input and desired output data. But
we MUCH appreciate your posting those; it makes the job far easier.)

[Code to read the test data omitted]

*WRR:   I didn't do the SORT, because your test ...
*       data came in sorted that way:           ...

>SORT CASES BY CASEID(A) YearOnly (A) .
>
>MATCH FILES
>   /FILE  = *  /BY CASEID
>   /FIRST = FirstCase .
>VARIABLE LABELS
>              FirstCase
>             'Indicator of each first matching case as Primary' .
>VALUE LABELS FirstCase
>              0 'Duplicate Case'
>              1 'Primary Case'.
>VARIABLE LEVEL FirstCase (ORDINAL).

*WRR:   The following EXECUTE isn't useful:     ...
>EXECUTE.


*WRR:   The file is already sorted as below;    ...
*       there's no reason to sort again.        ...
>SORT CASES BY CASEID(A) YearOnly (A) .
>MATCH FILES /FILE = *  /BY CASEID
>  /Last = LastCase .
>VARIABLE LEVEL Last (ORDINAL).
*WRR:   And this EXECUTE also isn't useful:     ...
>EXECUTE.

*WRR:   All code above should work.             ...
*       However, it can be simplified by having ...
*       /FIRST and /LAST on the same MATCH FILES...
*       (The SORT may or may not be needed):    ...
SORT CASES BY CASEID(A) YearOnly (A) .

MATCH FILES
   /FILE  = *  /BY CASEID
   /FIRST =   FirstCase
   /Last  =   LastCase .

*WRR:   Below, label LastCase too, if desired   ...
VARIABLE LABELS
              FirstCase
             'Indicator of each first matching case as Primary' .
VALUE LABELS FirstCase
              0 'Duplicate Case'
              1 'Primary Case'.

*WRR:   The following is irrelevant, I think    ...
VARIABLE LEVEL FirstCase LastCase (ORDINAL).



*WRR:  Below is your first problem. You want    ...
*      variables "FirstYear" and "LastYear"     ...
*      to have values in the same case; but the ...
*      following gives "FirstYear" a value only ...
*      in the first case for a
>DO IF (firstcase=1).
>.  COMPUTE FirstYear=YearOnly.
>END IF.
>DO IF  (lastcase=1).
>.  COMPUTE LastYear=YearOnly.
>END IF.

*WRR:  Again, the following SORT is redundant   ...
>SORT CASES BY CASEID(A) YearOnly (A) .

*WRR:  Better to do the following SAVE at the   ...
*      beginning, to save the data as it was    ...
*      before you added any new variables       ...
>SAVE OUTFILE='C:\temp.sav'.

*WRR:  Below is your second problem: the        ...
*      AGGREGATE loses variables FirstYear and  ...
*      LastYear.  (It also saves the count of   ...
*      records as YearOnly, which is confusing  ...
*      since that variable started out meaning  ...
*      some calendar year.)                     ...

>AGGREGATE
>   /OUTFILE=*
>   /BREAK=CASEID
>   /YearOnly = N(YearOnly).

*WRR:  And by the time you try this LOOP, it's  ...
*      too late; you've lost the variables you  ...
*      need.                                    ...
*      (It doesn't help that you're using both  ...
*      'temp.save' and 'C:\temp.sav' as scratch ...
*      files; that's confusing.)                ...
>LOOP yr = (FirstYear) TO (LastYear).
>+ XSAVE OUTFILE='temp.sav' /KEEP=CASEID yr .
>END LOOP.
*WRR:  The following EXECUTE is one of those    ...
*      that *IS* necessary:                     ...
>EXECUTE.
[...I'm skipping further annotation...]
===================================================
Solution using XSAVE (not saved separately):

*  ...   Save original data in a dataset, rather than a scratch  ... .
*  ...   file (but the latter could be done easily)              ... .
DATASET NAME     Original WINDOW=FRONT.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |04-APR-2008 19:22:20       |
|-----------------------------|---------------------------|
[Original]

caseid YearOnly campus var2

    1     1999       2    3
    1     2001       5    2
    1     2002       8    2
    2     2000       2    5
    2     2002       3    3
    2     2003       8    5
    3     2004       9    6
    4     2005       0    2

Number of cases read:  8    Number of cases listed:  8


*  ...   And activate a copy to work on:                         ... .
DATASET COPY     WorkOnIt.
DATASET ACTIVATE WorkOnIT  WINDOW=FRONT.


*  ...   BIG CHANGE IN LOGIC:                                    ... .
*  ...   It's much easier to get the first and last years for an ... .
*  ...   ID with a single AGGREGATE statement:                   ... .

AGGREGATE OUTFILE=*
    /BREAK=caseid
    /FirstYear 'Earliest year with data for the ID' = MIN(YearOnly)
    /LastYear  'Latest   year with data for the ID' = MAX(YearOnly).

.  /**/ LIST /*-*/.

List

|-----------------------------|---------------------------|
|Output Created               |04-APR-2008 19:22:21       |
|-----------------------------|---------------------------|
caseid FirstYear LastYear

    1      1999     2002
    2      2000     2003
    3      2004     2004
    4      2005     2005

Number of cases read:  4    Number of cases listed:  4


*  ...   The XSAVE loop isn't too different from what you wrote. ... .
*  ...   (But there's no need to call the output year 'yr' and   ... .
*  ...   then have to rename it to 'YearOnly')                   ... .

NUMERIC YearOnly (F4).
LOOP    YearOnly = (FirstYear) TO (LastYear).
+ XSAVE OUTFILE=BlankTemp /KEEP=CASEID YearOnly .
END LOOP.
EXECUTE /* this IS a necessary EXECUTE statement */.
GET FILE=BlankTemp.

.  /**/ LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |04-APR-2008 19:22:21       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents
   \Temporary\SPSS\
   Adding cases (within caseid) from a varying year forward    - BLANKCASES.SAV

caseid YearOnly

    1     1999
    1     2000
    1     2001
    1     2002
    2     2000
    2     2001
    2     2002
    2     2003
    3     2004
    4     2005

Number of cases read:  10    Number of cases listed:  10


*  ...   And the final MATCH is almost the same, too:            ... .
****Matching the Files and filling in the gaps in years.
MATCH FILES
   /FILE=*
   /FILE=Original
   /BY CASEID YearOnly.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |04-APR-2008 19:22:22       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents
   \Temporary\SPSS\
   Adding cases (within caseid) from a varying year forward    - BLANKCASES.SAV

caseid YearOnly campus var2

    1     1999       2    3
    1     2000       .    .
    1     2001       5    2
    1     2002       8    2
    2     2000       2    5
    2     2001       .    .
    2     2002       3    3
    2     2003       8    5
    3     2004       9    6
    4     2005       0    2

Number of cases read:  10    Number of cases listed:  10
============================
APPENDIX: Test data and code
============================

*  ................................................................. .
*  .................   Target file for XSAVE   ..................... .
FILE HANDLE BlankTemp
  /NAME='C:\Documents and Settings\Richard\My Documents'              +
          '\Temporary\SPSS\'                                          +
        'Adding cases (within caseid) from a varying year forward   ' +
        ' - '                                                         +
        'BLANKCASES.SAV'.

*  .................   Test data               ..................... .
DATA LIST LIST/ caseid YearOnly campus var2.
BEGIN DATA
1 1999 2 3
1 2001 5 2
1 2002 8 2
2 2000 2 5
2 2002 3 3
2 2003 8 5
3 2004 9 6
4 2005 0 2.
END DATA.

FORMATS   caseid   (F2)
           YearOnly (F4)
           campus   (F3)
           var2     (F2).

*  ...   Save original data in a dataset, rather than a scratch  ... .
*  ...   file (but the latter could be done easily)              ... .
DATASET NAME     Original WINDOW=FRONT.
LIST.

*  ...   And activate a copy to work on:                         ... .
DATASET COPY     WorkOnIt.
DATASET ACTIVATE WorkOnIT  WINDOW=FRONT.

*  ...   BIG CHANGE IN LOGIC:                                    ... .
*  ...   It's much easier to get the first and last years for an ... .
*  ...   ID with a single AGGREGATE statement:                   ... .


AGGREGATE OUTFILE=*
    /BREAK=caseid
    /FirstYear 'Earliest year with data for the ID' = MIN(YearOnly)
    /LastYear  'Latest   year with data for the ID' = MAX(YearOnly).

.  /**/ LIST /*-*/.

*  ...   The XSAVE loop isn't too different from what you wrote. ... .
*  ...   (But there's no need to call the output year 'yr' and   ... .
*  ...   then have to rename it to 'YearOnly')                   ... .

NUMERIC YearOnly (F4).
LOOP    YearOnly = (FirstYear) TO (LastYear).
+ XSAVE OUTFILE=BlankTemp /KEEP=CASEID YearOnly .
END LOOP.
EXECUTE /* this IS a necessary EXECUTE statement */.
GET FILE=BlankTemp.

.  /**/ LIST /*-*/.


*  ...   And the final MATCH is almost the same, too:            ... .
****Matching the Files and filling in the gaps in years.
MATCH FILES
   /FILE=*
   /FILE=Original
   /BY CASEID YearOnly.

LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD