Using Syntax to create a data cut-off point

[Ela Bonbevan]
Hi all,

I have 2 datasets which have MANY variables that contain dates -
essentially activities of individuals over a 20 year period.


Ideally, I would like to compare the 2 datasets, but due to a data
retrieval delay, one of the sets was obtained roughly 6 months after the
first dataset, so it contains more recent records.  I want to set a data
cutoff point equal to the most recent record in the first dataset and then
write some syntax to set all of the dates past the cutoff to sysmis in
dataset 2. Is this possible to do as some kind of global command, or must I
do this variable by variable?

Many thanks
Diane

Re: Using Syntax to create a data cut-off point

Maguin, Eugene
Diane,

The first thing that occurred to me when I read your message is that it would be
simpler to select cases that satisfy the date criteria. In answer to your
specific question, there is no global command. You have to do it variable by
variable. However, you can use a Do repeat structure in combination with the
$sysmis keyword to 'blank out' dates.

Example.

Do repeat in=d1 d2 d3.
+  if (in lt date.mdy(mm,dd,yyyy)) in=$sysmis.
End repeat.

Gene Maguin

Re: Using Syntax to create a data cut-off point

Richard Ristow
At 04:46 PM 8/30/2006, [Ela Bonbevan] wrote:

>I have 2 datasets which have MANY variables that contain dates -
>essentially activities of individuals over a 20 year period.
>
>I would like to compare the 2 datasets, but one of the sets was
>obtained roughly 6 months after the first dataset so it contains more
>recent records.  I want to set a data cut of point equal to the most
>recent record in the first dataset and then write some syntax to fill
>all of the dates past the cutoff as sysmis in dataset 2.

You've seen Gene's suggestion, which I think is about as well as you
can do.

(Although, Gene, where you have
+  Do repeat in=d1 d2 d3.
+     if (in lt date.mdy(mm,dd,yyyy)) in=$sysmis.
+  End repeat.
should it be 'GT' instead of 'lt'? I think the goal was to blank
variables after the cutoff date.)
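
For illustration, the GT version might look like the sketch below. The cutoff date (28 Feb 2006) and the variable names date1 TO date20 are hypothetical placeholders, not taken from the original data. The real cutoff would be the most recent date found in the first dataset, and the TO list assumes the date variables sit next to each other in the file.

```spss
* Sketch only: the cutoff date and variable names are assumptions.
* Run this against dataset 2, the later extract.
* #cutoff is a scratch variable, so it is not saved with the file.
COMPUTE #cutoff = DATE.MDY(2, 28, 2006).

* Blank out every date that falls after the cutoff.
DO REPEAT d = date1 TO date20.
+  IF (d GT #cutoff) d = $SYSMIS.
END REPEAT.
EXECUTE.
```

Dates on or before the cutoff are left untouched, and dates that are already missing stay missing because the comparison evaluates to missing and the IF does nothing.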

Now, Ela, you have "2 datasets which have MANY variables that contain
dates - essentially activities of individuals over a 20 year period."
If I have that right, each record records a lot of different events.
This is called 'wide' organization. Without giving all the details and
reasons now, 'long' organization, with a separate record for each
event, is much easier to do most things with. I'll conjecture that
includes your comparison of the two datasets. VARSTOCASES is pretty
good for converting 'wide' files to 'long'.
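
A rough sketch of that restructuring, assuming a person identifier named id and date variables date1 TO date20 (both names are hypothetical, as is the cutoff date), might be:

```spss
* Sketch only: id, date1 TO date20, and the cutoff date are assumptions.
* Convert the wide file to one record per person per event.
VARSTOCASES
  /MAKE eventdate FROM date1 TO date20
  /INDEX = eventnum
  /KEEP = id
  /NULL = DROP.

* With one date variable, a single SELECT IF applies the cutoff.
SELECT IF (eventdate LE DATE.MDY(2, 28, 2006)).
EXECUTE.
```

Here eventnum records which original variable each date came from (1 for date1, 2 for date2, and so on), and /NULL = DROP discards records whose date was missing in the wide file. Any other variables needed for the comparison would have to be added to /KEEP.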