using LAG to check the validity of missing data/create new vars

using LAG to check the validity of missing data/create new vars

Matthew Boswell
Let's say I have the following time series data:

YEAR   ID       X
1980   A       40
1981   A        .
1982   A       35
1980   B        1
1981   B        .
1982   B        0


I think that the value for X in 1981 for A is actually
missing data.  However, it looks like the value for B
in 1981 should be 0.  We made the decision rule that
if the preceding value for X was less than or equal
to 2, and X was sysmis, then we would code that
sysmis as 0, rather than leaving it as sysmis.

I have tried a few ways to compare the variable X with
LAG(X), filter the cases that meet the criteria, and then
use a COMPUTE statement to recode the sysmis as
0, but to no avail.  If anyone has any suggestions, I
would appreciate it!!

thanks-
Matt B.


Re: using LAG to check the validity of missing data/create new vars

Maguin, Eugene
Matthew,

I'm guessing that you want to reset the value only if the prior case
has the same value of ID as the sysmis case does and its X is less than
or equal to 2. I'll assume that to be true. This should do it.

Do if (id eq lag(id)).
+  if (sysmis(x) and lag(x) le 2) x=0.
End if.


Gene Maguin
Re: using LAG to check the validity of missing data/create new vars

Richard Ristow
In reply to this post by Matthew Boswell
At 02:59 PM 4/27/2007, Matthew Boswell wrote:

>Lets say I have the following time series data:
>
>YEAR   ID       X
>1980   A       40
>1981   A        .
>1982   A       35
>1980   B        1
>1981   B        .
>1982   B        0
>
>We made the decision rule that if  the preceeding value for X was less
>than or equal to 2,  and x was sysmis, then we would code that sysmis
>as 0, rather than sysmis.

Well, assuming that this propagates (i.e., a re-assigned 0 is as good
as a 0 in the data), try this (SPSS 15 draft output -WRR, not saved
separately). I've computed into a new variable so it'll be clearer
what's been done, and I *strongly* recommend doing the same for your
data. That is, keep the original value, missings and all, so you'll
never lose what you have, and you can rethink your decision.

|-----------------------------|---------------------------|
|Output Created               |27-APR-2007 16:42:20       |
|-----------------------------|---------------------------|
YEAR ID   X

1980 A   40
1981 A    .
1982 A   35
1980 B    1
1981 B    .
1982 B    0

Number of cases read:  6    Number of cases listed:  6


NUMERIC NEW_X (F3).
COMPUTE NEW_X = X.

IF    LAG(ID) EQ ID   /* Same person. (Assumes ID is never blank)
   AND LAG(X)  LE  2   /* "preceeding value less than or equal to 2"
   AND SYSMIS(NEW_X)   /* "x was sysmis"
       NEW_X = 0.      /* "code that sysmis as 0"                   */.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |27-APR-2007 16:42:20       |
|-----------------------------|---------------------------|
YEAR ID   X NEW_X

1980 A   40   40
1981 A    .    .
1982 A   35   35
1980 B    1    1
1981 B    .    0
1982 B    0    0

Number of cases read:  6    Number of cases listed:  6

===================
APPENDIX: Test data
===================
DATA LIST LIST SKIP=1
   /YEAR(F4) ID(A2) X(F3).
BEGIN DATA
YEAR   ID       X
1980   A       40
1981   A        .
1982   A       35
1980   B        1
1981   B        .
1982   B        0
END DATA.
SORT CASES BY ID YEAR.
LIST.
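
As a cross-check outside SPSS, the decision rule discussed in the thread can be sketched in plain Python. This is only an illustration under stated assumptions, not code from any of the posters: the function name `recode_missing` and the parameters `threshold` and `propagate` are hypothetical, `None` stands in for sysmis, and the rows are assumed to be sorted by ID and then YEAR. The `propagate` flag makes explicit the question raised above of whether a re-assigned 0 should itself count as a "preceding value"; it makes no claim about how LAG behaves in SPSS.

```python
from typing import List, Optional, Tuple

# (YEAR, ID, X); None stands in for SPSS system-missing (sysmis).
Row = Tuple[int, str, Optional[float]]


def recode_missing(rows: List[Row], threshold: float = 2,
                   propagate: bool = False) -> List[Optional[float]]:
    """Return NEW_X for rows already sorted by ID, then YEAR.

    A missing X becomes 0 when the previous row has the same ID and a
    non-missing value <= threshold. With propagate=True the previous
    value is the recoded one; otherwise it is the original X.
    """
    new_x: List[Optional[float]] = []
    prev_id: Optional[str] = None
    prev_x: Optional[float] = None
    for _year, case_id, x in rows:
        value = x
        if (value is None and case_id == prev_id
                and prev_x is not None and prev_x <= threshold):
            value = 0
        new_x.append(value)
        prev_id = case_id
        # Hypothetical switch: compare against the recoded or original value.
        prev_x = value if propagate else x
    return new_x


# Test data from the thread.
rows: List[Row] = [
    (1980, "A", 40), (1981, "A", None), (1982, "A", 35),
    (1980, "B", 1), (1981, "B", None), (1982, "B", 0),
]
rows.sort(key=lambda r: (r[1], r[0]))  # same order as SORT CASES BY ID YEAR
print(recode_missing(rows))  # [40, None, 35, 1, 0, 0]
```

On the thread's test data the result matches the NEW_X column in the listing above. The two settings of `propagate` only differ when an ID has two or more consecutive missing values after a small one, which the test data does not contain.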