Fixing multiple records for same time interval...

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Fixing multiple records for same time interval...

rich reeves
Hi,

I have a data set where individuals have multiple rows with interval
information. I want to create a maximum interval for each set of overlapping
intervals for an individual.  That is for a set of rows with overlapping
time intervals I want the minimum begin date and the maximum end date.

What I have looks like:

     id

 time_begin
 time_end
 a

 8/15/2006

 12/15/2006
 a

 8/1/2006

 12/11/2006
 a

 8/31/2006

 11/20/2006
 a

 9/15/2006

 12/31/2006
 a

 1/15/2007

 5/1/2007
 a

 5/31/2007

 8/1/2007
 a

 6/15/2007

 8/31/2007



I want that reduced to:
     a

 8/1/2006

 12/31/2006
 a

 1/15/2007

 5/1/2007
 a

 5/31/2007

 8/31/2007
Any help is appreciated.
rich

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Fixing multiple records for same time interval...

Melissa Ives
Seems like you could use varstocases by id to create records like..

A time_begin1 time_begin2...time_end1 time_end2...
B time_begin1 time_begin2...time_end1 time_end2...
Then use min and max

Compute Begin_first=min(time_begin1 to time_beginX).
Compute End_last=min(time_begin1 to time_beginX).
Compute maxint=date.dif(End_last,Begin_first,"interval").

I'm sure there are other ways (e.g. Aggregate with first and last, then
calculate maxint.)

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
rich reeves
Sent: Wednesday, December 19, 2007 9:01 AM
To: [hidden email]
Subject: [SPSSX-L] Fixing multiple records for same time interval...

Hi,

I have a data set where individuals have multiple rows with interval
information. I want to create a maximum interval for each set of
overlapping intervals for an individual.  That is for a set of rows with
overlapping time intervals I want the minimum begin date and the maximum
end date.

What I have looks like:

     id

 time_begin
 time_end
 a

 8/15/2006

 12/15/2006
 a

 8/1/2006

 12/11/2006
 a

 8/31/2006

 11/20/2006
 a

 9/15/2006

 12/31/2006
 a

 1/15/2007

 5/1/2007
 a

 5/31/2007

 8/1/2007
 a

 6/15/2007

 8/31/2007



I want that reduced to:
     a

 8/1/2006

 12/31/2006
 a

 1/15/2007

 5/1/2007
 a

 5/31/2007

 8/31/2007
Any help is appreciated.
rich

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command SIGNOFF SPSSX-L For a list
of commands to manage subscriptions, send the command INFO REFCARD


PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Fixing multiple records for same time interval...

Melissa Ives
In reply to this post by rich reeves
Absolutely---that's what comes of copy and paste without editing ALL of
the paste
Nice catch.
Melissa

-----Original Message-----
From: Gene Maguin [mailto:[hidden email]]
Sent: Wednesday, December 19, 2007 9:20 AM
To: Melissa Ives
Subject: RE: Fixing multiple records for same time interval...

Melissa,

I was looking at your code to see how you do the problem and I have a
couple of questions on what you posted.

A time_begin1 time_begin2...time_end1 time_end2...
B time_begin1 time_begin2...time_end1 time_end2...
Then use min and max

Compute Begin_first=min(time_begin1 to time_beginX).
Compute End_last=min(time_begin1 to time_beginX).
Compute maxint=date.dif(End_last,Begin_first,"interval").


Shouldn't this line be

Compute End_last=max(time_end1 to time_endX).

Gene Maguin




PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Fixing multiple records for same time interval...

rich reeves
Thanks for that,
I have 9.7 million records with some intervals overlapping and some not and
many individuals have 40+ records with other indicators.  I was hoping for a
solution that wouldn't create 400 variables.  Maybe the aggregate can work.
I may try it if I can figure out a way to identify the overlapping intervals
using lag commands.
Thanks again.

On Dec 19, 2007 10:32 AM, Melissa Ives <[hidden email]> wrote:

> Absolutely---that's what comes of copy and paste without editing ALL of
> the paste
> Nice catch.
> Melissa
>
> -----Original Message-----
> From: Gene Maguin [mailto:[hidden email]]
> Sent: Wednesday, December 19, 2007 9:20 AM
> To: Melissa Ives
> Subject: RE: Fixing multiple records for same time interval...
>
> Melissa,
>
> I was looking at your code to see how you do the problem and I have a
> couple of questions on what you posted.
>
> A time_begin1 time_begin2...time_end1 time_end2...
> B time_begin1 time_begin2...time_end1 time_end2...
> Then use min and max
>
> Compute Begin_first=min(time_begin1 to time_beginX).
> Compute End_last=min(time_begin1 to time_beginX).
> Compute maxint=date.dif(End_last,Begin_first,"interval").
>
>
> Shouldn't this line be
>
> Compute End_last=max(time_end1 to time_endX).
>
> Gene Maguin
>
>
>
>
> PRIVILEGED AND CONFIDENTIAL INFORMATION
> This transmittal and any attachments may contain PRIVILEGED AND
> CONFIDENTIAL information and is intended only for the use of the
> addressee. If you are not the designated recipient, or an employee
> or agent authorized to deliver such transmittals to the designated
> recipient, you are hereby notified that any dissemination,
> copying or publication of this transmittal is strictly prohibited. If
> you have received this transmittal in error, please notify us
> immediately by replying to the sender and delete this copy from your
> system. You may also call us at (309) 827-6026 for assistance.
>
>
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Fixing multiple records for same time interval...

Maguin, Eugene
In reply to this post by rich reeves
Rich,

Yes, 9.7 million is a few. Melissa's alternate suggestion of using Aggregate
is would be better but it requires a certain file structure, which you don't
seem to have. And, by the way, her first suggestion won't work without a
certain file structure.

Your posting suggests to me that you have four (or five) records per case,
case being defined as an id value, a blank line, a beginning time value, a
possible additional blank line, and an ending time value. Is this true?

What I have looks like:

     id

 time_begin
 time_end
 a

 8/15/2006

 12/15/2006


Or do you have

     id time_begin time_end
 a 8/15/2006 12/15/2006

Then you say

I want that reduced to:
     a

 8/1/2006

 12/31/2006
 a

 1/15/2007

 5/1/2007
 a

 5/31/2007

 8/31/2007


Do you mean what you wrote or do you mean

 a 8/1/2006 12/31/2006
 a 1/15/2007 5/1/2007
 a 5/31/2007 8/31/2007

If you mean what you wrote then, there are some additional steps to do. If
you mean the alternative I described, then you can use Melissa's aggregate
suggestion straight away.

Please clarify.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Fixing multiple records for same time interval...

Richard Ristow
In reply to this post by rich reeves
A programming exercise that got put off, in hectic holiday travel.

At 10:00 AM 12/19/2007, rich reeves wrote:

>I have a data set where individuals have multiple rows with interval
>information. What I have looks like:

|-----------------------------|---------------------------|
|Output Created               |03-JAN-2008 21:58:42       |
|-----------------------------|---------------------------|
id time_begin   time_end

a  08/15/2006 12/15/2006
a  08/01/2006 12/11/2006
a  08/31/2006 11/20/2006
a  09/15/2006 12/31/2006
a  01/15/2007 05/01/2007
a  05/31/2007 08/01/2007
a  06/15/2007 08/31/2007

Number of cases read:  7    Number of cases listed:  7
(This is reformatted to what you probably actually have. Your example
came through with many line-breaks you probably didn't intend; I
wonder why this happens so often?)

>I want to create a maximum interval for each set of overlapping
>intervals for an individual.  That is for a set of rows with
>overlapping time intervals I want the minimum begin date and the
>maximum end date:

    id    time_begin   time_end
     a    8/1/2006     12/31/2006
     a    1/15/2007    5/1/2007
     a    5/31/2007    8/31/2007

That is, you want one record for every interval in which all days
fell within the interval for some one of the records, by ID. Here's a
solution using LAG and AGGREGATE. Variable #LATEST is a scratch
variable (see the SPSS *Command Syntax Reference*), which means that
(a) it keeps its value from one case to the next (b) it isn't kept in
the file. SPSS 14 draft output (WRR:not saved separately):

*   Assign each record to an interval of .
*   continuous 'activity':               .

SORT CASES BY time_begin.
NUMERIC INTERVAL (F3).
NUMERIC #LATEST  (ADATE10).

DO IF   $CASENUM EQ 1         /* First record in the file */.
.  COMPUTE INTERVAL = 1.
.  COMPUTE #LATEST  = time_end.
ELSE IF id NE LAG(ID)         /* First record for an id   */.
.  COMPUTE INTERVAL = 1.
.  COMPUTE #LATEST  = time_end.
ELSE IF time_begin LE #LATEST /* Continue current interval*/.
.  COMPUTE INTERVAL = LAG(INTERVAL).
.  COMPUTE #LATEST  = MAX(#LATEST,time_end).
ELSE                          /* Start new interval       */.
.  COMPUTE INTERVAL = LAG(INTERVAL) + 1.
.  COMPUTE #LATEST  = time_end.
END IF.

.  /**/  LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |03-JAN-2008 22:16:26       |
|-----------------------------|---------------------------|
id time_begin   time_end INTERVAL

a  08/01/2006 12/11/2006      1
a  08/15/2006 12/15/2006      1
a  08/31/2006 11/20/2006      1
a  09/15/2006 12/31/2006      1
a  01/15/2007 05/01/2007      2
a  05/31/2007 08/01/2007      3
a  06/15/2007 08/31/2007      3

Number of cases read:  7    Number of cases listed:  7


*   Get earliest and latest dates for    .
*   each interval:                       .

AGGREGATE OUTFILE=*
   /BREAK = ID INTERVAL
   /time_begin = MIN(time_begin)
   /time_end   = MAX(time_end).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |03-JAN-2008 22:16:27       |
|-----------------------------|---------------------------|
id INTERVAL time_begin   time_end

a       1   08/01/2006 12/31/2006
a       2   01/15/2007 05/01/2007
a       3   05/31/2007 08/31/2007

Number of cases read:  3    Number of cases listed:  3

===================
APPENDIX: Test data
===================
DATA LIST LIST/
    id    time_begin   time_end
   (A1    ADATE10      ADATE10).
BEGIN DATA
     a    8/15/2006    12/15/2006
     a    8/1/2006     12/11/2006
     a    8/31/2006    11/20/2006
     a    9/15/2006    12/31/2006
     a    1/15/2007    5/1/2007
     a    5/31/2007    8/1/2007
     a    6/15/2007    8/31/2007
END DATA.
LIST.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD