|
Hi,
I have a data set where individuals have multiple rows with interval information. I want to create a maximum interval for each set of overlapping intervals for an individual. That is for a set of rows with overlapping time intervals I want the minimum begin date and the maximum end date. What I have looks like: id time_begin time_end a 8/15/2006 12/15/2006 a 8/1/2006 12/11/2006 a 8/31/2006 11/20/2006 a 9/15/2006 12/31/2006 a 1/15/2007 5/1/2007 a 5/31/2007 8/1/2007 a 6/15/2007 8/31/2007 I want that reduced to: a 8/1/2006 12/31/2006 a 1/15/2007 5/1/2007 a 5/31/2007 8/31/2007 Any help is appreciated. rich ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Seems like you could use varstocases by id to create records like..
A time_begin1 time_begin2...time_end1 time_end2... B time_begin1 time_begin2...time_end1 time_end2... Then use min and max Compute Begin_first=min(time_begin1 to time_beginX). Compute End_last=min(time_begin1 to time_beginX). Compute maxint=date.dif(End_last,Begin_first,"interval"). I'm sure there are other ways (e.g. Aggregate with first and last, then calculate maxint.) Melissa -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of rich reeves Sent: Wednesday, December 19, 2007 9:01 AM To: [hidden email] Subject: [SPSSX-L] Fixing multiple records for same time interval... Hi, I have a data set where individuals have multiple rows with interval information. I want to create a maximum interval for each set of overlapping intervals for an individual. That is for a set of rows with overlapping time intervals I want the minimum begin date and the maximum end date. What I have looks like: id time_begin time_end a 8/15/2006 12/15/2006 a 8/1/2006 12/11/2006 a 8/31/2006 11/20/2006 a 9/15/2006 12/31/2006 a 1/15/2007 5/1/2007 a 5/31/2007 8/1/2007 a 6/15/2007 8/31/2007 I want that reduced to: a 8/1/2006 12/31/2006 a 1/15/2007 5/1/2007 a 5/31/2007 8/31/2007 Any help is appreciated. rich ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by rich reeves
Absolutely---that's what comes of copy and paste without editing ALL of
the paste Nice catch. Melissa -----Original Message----- From: Gene Maguin [mailto:[hidden email]] Sent: Wednesday, December 19, 2007 9:20 AM To: Melissa Ives Subject: RE: Fixing multiple records for same time interval... Melissa, I was looking at your code to see how you do the problem and I have a couple of questions on what you posted. A time_begin1 time_begin2...time_end1 time_end2... B time_begin1 time_begin2...time_end1 time_end2... Then use min and max Compute Begin_first=min(time_begin1 to time_beginX). Compute End_last=min(time_begin1 to time_beginX). Compute maxint=date.dif(End_last,Begin_first,"interval"). Shouldn't this line be Compute End_last=max(time_end1 to time_endX). Gene Maguin PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Thanks for that,
I have 9.7 million records with some intervals overlapping and some not and many individuals have 40+ records with other indicators. I was hoping for a solution that wouldn't create 400 variables. Maybe the aggregate can work. I may try it if I can figure out a way to identify the overlapping intervals using lag commands. Thanks again. On Dec 19, 2007 10:32 AM, Melissa Ives <[hidden email]> wrote: > Absolutely---that's what comes of copy and paste without editing ALL of > the paste > Nice catch. > Melissa > > -----Original Message----- > From: Gene Maguin [mailto:[hidden email]] > Sent: Wednesday, December 19, 2007 9:20 AM > To: Melissa Ives > Subject: RE: Fixing multiple records for same time interval... > > Melissa, > > I was looking at your code to see how you do the problem and I have a > couple of questions on what you posted. > > A time_begin1 time_begin2...time_end1 time_end2... > B time_begin1 time_begin2...time_end1 time_end2... > Then use min and max > > Compute Begin_first=min(time_begin1 to time_beginX). > Compute End_last=min(time_begin1 to time_beginX). > Compute maxint=date.dif(End_last,Begin_first,"interval"). > > > Shouldn't this line be > > Compute End_last=max(time_end1 to time_endX). > > Gene Maguin > > > > > PRIVILEGED AND CONFIDENTIAL INFORMATION > This transmittal and any attachments may contain PRIVILEGED AND > CONFIDENTIAL information and is intended only for the use of the > addressee. If you are not the designated recipient, or an employee > or agent authorized to deliver such transmittals to the designated > recipient, you are hereby notified that any dissemination, > copying or publication of this transmittal is strictly prohibited. If > you have received this transmittal in error, please notify us > immediately by replying to the sender and delete this copy from your > system. You may also call us at (309) 827-6026 for assistance. > > > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by rich reeves
Rich,
Yes, 9.7 million is a few. Melissa's alternate suggestion of using Aggregate is would be better but it requires a certain file structure, which you don't seem to have. And, by the way, her first suggestion won't work without a certain file structure. Your posting suggests to me that you have four (or five) records per case, case being defined as an id value, a blank line, a beginning time value, a possible additional blank line, and an ending time value. Is this true? What I have looks like: id time_begin time_end a 8/15/2006 12/15/2006 Or do you have id time_begin time_end a 8/15/2006 12/15/2006 Then you say I want that reduced to: a 8/1/2006 12/31/2006 a 1/15/2007 5/1/2007 a 5/31/2007 8/31/2007 Do you mean what you wrote or do you mean a 8/1/2006 12/31/2006 a 1/15/2007 5/1/2007 a 5/31/2007 8/31/2007 If you mean what you wrote then, there are some additional steps to do. If you mean the alternative I described, then you can use Melissa's aggregate suggestion straight away. Please clarify. Gene Maguin ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by rich reeves
A programming exercise that got put off, in hectic holiday travel.
At 10:00 AM 12/19/2007, rich reeves wrote: >I have a data set where individuals have multiple rows with interval >information. What I have looks like: |-----------------------------|---------------------------| |Output Created |03-JAN-2008 21:58:42 | |-----------------------------|---------------------------| id time_begin time_end a 08/15/2006 12/15/2006 a 08/01/2006 12/11/2006 a 08/31/2006 11/20/2006 a 09/15/2006 12/31/2006 a 01/15/2007 05/01/2007 a 05/31/2007 08/01/2007 a 06/15/2007 08/31/2007 Number of cases read: 7 Number of cases listed: 7 (This is reformatted to what you probably actually have. Your example came through with many line-breaks you probably didn't intend; I wonder why this happens so often?) >I want to create a maximum interval for each set of overlapping >intervals for an individual. That is for a set of rows with >overlapping time intervals I want the minimum begin date and the >maximum end date: id time_begin time_end a 8/1/2006 12/31/2006 a 1/15/2007 5/1/2007 a 5/31/2007 8/31/2007 That is, you want one record for every interval in which all days fell within the interval for some one of the records, by ID. Here's a solution using LAG and AGGREGATE. Variable #LATEST is a scratch variable (see the SPSS *Command Syntax Reference*), which means that (a) it keeps its value from one case to the next (b) it isn't kept in the file. SPSS 14 draft output (WRR:not saved separately): * Assign each record to an interval of . * continuous 'activity': . SORT CASES BY time_begin. NUMERIC INTERVAL (F3). NUMERIC #LATEST (ADATE10). DO IF $CASENUM EQ 1 /* First record in the file */. . COMPUTE INTERVAL = 1. . COMPUTE #LATEST = time_end. ELSE IF id NE LAG(ID) /* First record for an id */. . COMPUTE INTERVAL = 1. . COMPUTE #LATEST = time_end. ELSE IF time_begin LE #LATEST /* Continue current interval*/. . COMPUTE INTERVAL = LAG(INTERVAL). . COMPUTE #LATEST = MAX(#LATEST,time_end). ELSE /* Start new interval */. . COMPUTE INTERVAL = LAG(INTERVAL) + 1. . COMPUTE #LATEST = time_end. END IF. . /**/ LIST /*-*/. List |-----------------------------|---------------------------| |Output Created |03-JAN-2008 22:16:26 | |-----------------------------|---------------------------| id time_begin time_end INTERVAL a 08/01/2006 12/11/2006 1 a 08/15/2006 12/15/2006 1 a 08/31/2006 11/20/2006 1 a 09/15/2006 12/31/2006 1 a 01/15/2007 05/01/2007 2 a 05/31/2007 08/01/2007 3 a 06/15/2007 08/31/2007 3 Number of cases read: 7 Number of cases listed: 7 * Get earliest and latest dates for . * each interval: . AGGREGATE OUTFILE=* /BREAK = ID INTERVAL /time_begin = MIN(time_begin) /time_end = MAX(time_end). LIST. List |-----------------------------|---------------------------| |Output Created |03-JAN-2008 22:16:27 | |-----------------------------|---------------------------| id INTERVAL time_begin time_end a 1 08/01/2006 12/31/2006 a 2 01/15/2007 05/01/2007 a 3 05/31/2007 08/31/2007 Number of cases read: 3 Number of cases listed: 3 =================== APPENDIX: Test data =================== DATA LIST LIST/ id time_begin time_end (A1 ADATE10 ADATE10). BEGIN DATA a 8/15/2006 12/15/2006 a 8/1/2006 12/11/2006 a 8/31/2006 11/20/2006 a 9/15/2006 12/31/2006 a 1/15/2007 5/1/2007 a 5/31/2007 8/1/2007 a 6/15/2007 8/31/2007 END DATA. LIST. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
