Here’s one I can’t get my head around. I can get part of this via the lag function and it’s easy with one complication that I can’t figure out. I’m assuming it needs some type of macro or a few nested loops. What’s the best way to do the following. Each person has an ID and can have several different rows. Each row as a begin date (DateIn) and end date (DateOut) but also date of an occurrence (compare date). I have to compare each of the CompareDates with the DateIn and DateOut periods to see whether any fall within the period from DateIn to DateOut (inclusive of the DateIn and DateOut) across all rows with the same ID number. Result = 1 when the CompareDate falls within any of the DateIn to DateOut periods within the same Person ID number (either in rows before or after). So it’s possible that the CompareDate can be within an interval on the row either before it or after it and that’s the part that’s throwing me. In getting started, I can do the following easily, but unsure how to finish. If((CompareDate >= DateIn) and (CompareDate <= DateOut)) Result = 1. If((CompareDate >= lag(DateIn)) and (CompareDate <= lag(DateOut))) Result = 1. If((CompareDate >= lag(DateIn 2)) and (CompareDate <= lag(DateOut,2))) Result = 1. If((CompareDate >= lag(DateIn,3)) and (CompareDate <= lag(DateOut 3))) Result = 1. Example with just the first 2 people. Each person will have a different number of rows. ID DateIn DateOut CompareDate Result 1 01-Sep-04 09-Sep-04 05-Oct-04 1 1 23-Sep-04 30-Sep-04 15-Jan-04 0 1 10-Sep-04 09-Nov-04 06-Sep-04 1 2 01-Jan-04 09-Feb-04 05-Oct-04 0 2 20-Feb-04 15-Mar-04 24-Sept-04 0 Thanks in advance. Jeff |
Administrator
|
If I follow, on each row, you wish to set a flag to 1 if CompareDate on that
row falls between MIN(DateIn) and MAX(DateOut), where the min and max values are across all rows for a given ID. Right? If so, use AGGREGATE with ID as the BREAK variable and the "add variables" mode to add new variables MinDateIn and MaxDateOut to the data set. Then, something like this (untested): COMPUTE Flag = RANGE(CompareDate, MinDateIn, MaxDateOut). FORMATS Flag (F1). FREQUENCIES Flag. By the way, MinDateIn and MaxDateOut will be ordinary numeric variables. Use ALTER TYPE if you want to make them display as date variables. HTH. Jeff A wrote > Here's one I can't get my head around. > > > > I can get part of this via the lag function and it's easy with one > complication that I can't figure out. I'm assuming it needs some type of > macro or a few nested loops. > > > > > > What's the best way to do the following. > > > > Each person has an ID and can have several different rows. Each row as a > begin date (DateIn) and end date (DateOut) but also date of an occurrence > (compare date). > > > > I have to compare each of the CompareDates with the DateIn and DateOut > periods to see whether any fall within the period from DateIn to DateOut > (inclusive of the DateIn and DateOut) across all rows with the same ID > number. Result = 1 when the CompareDate falls within any of the DateIn to > DateOut periods within the same Person ID number (either in rows before or > after). > > > > So it's possible that the CompareDate can be within an interval on the row > either before it or after it and that's the part that's throwing me. > > > > In getting started, I can do the following easily, but unsure how to > finish. > > > > > If((CompareDate >= DateIn) and (CompareDate <= DateOut)) Result = > 1. > > If((CompareDate >= lag(DateIn)) and (CompareDate <= lag(DateOut))) > Result = 1. > > If((CompareDate >= lag(DateIn 2)) and (CompareDate <= lag(DateOut,2))) > Result = 1. > > If((CompareDate >= lag(DateIn,3)) and (CompareDate <= lag(DateOut 3))) > Result = 1. > > > > Example with just the first 2 people. Each person will have a different > number of rows. > > > > ID DateIn DateOut CompareDate Result > > 1 01-Sep-04 09-Sep-04 05-Oct-04 1 > > 1 23-Sep-04 30-Sep-04 15-Jan-04 0 > > 1 10-Sep-04 09-Nov-04 06-Sep-04 1 > > > > 2 01-Jan-04 09-Feb-04 05-Oct-04 0 > > 2 20-Feb-04 15-Mar-04 24-Sept-04 0 > > > > > > Thanks in advance. > > > > Jeff > > > > > > -- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
@Bruce: I think aggregating can create problems if there are dates between min and max that are not included in any time interval. @Jeff: This can be done along the following logic. I'm sure there are more elegant solutions but this is straightforward and easy to understand. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 END DATA. * Restructuring to wide format. CASESTOVARS /ID = ID. * Test if CompareDate lies between In and Out. DEFINE !result(!POS !TOKENS(1) /!POS !TOKENS(1) /!POS !TOKENS(1) /!POS !TOKENS(1)) COMPUTE test1 = DATEDIFF(!3, !1, 'days'). COMPUTE test2 = DATEDIFF(!3, !2, 'days'). IF (test1 >= 0) AND (test2 <= 0) !4 = 1. !ENDDEFINE. * Initializing (add more variables if needed). COMPUTE Result.1 = 0. COMPUTE Result.2 = 0. COMPUTE Result.3 = 0. * Repeat for each available CompareDate several times (add additional variables if needed - same principle). !result DateIn.1 DateOut.1 CompareDate.1 Result.1. !result DateIn.2 DateOut.2 CompareDate.1 Result.1. !result DateIn.3 DateOut.3 CompareDate.1 Result.1. !result DateIn.1 DateOut.1 CompareDate.2 Result.2. !result DateIn.2 DateOut.2 CompareDate.2 Result.2. !result DateIn.3 DateOut.3 CompareDate.2 Result.2. !result DateIn.1 DateOut.1 CompareDate.3 Result.3. !result DateIn.2 DateOut.2 CompareDate.3 Result.3. !result DateIn.3 DateOut.3 CompareDate.3 Result.3. * ... . FREQUENCIES Result.1 Result.2 Result.3. MATCH FILES FILE = * /DROP test1 test2. * Append additional measurement rows if needed. VARSTOCASES /MAKE DateIn FROM DateIn.1 DateIn.2 DateIn.3 /MAKE DateOut FROM DateOut.1 DateOut.2 DateOut.3 /MAKE CompareDate FROM CompareDate.1 CompareDate.2 CompareDate.3 /MAKE Result FROM Result.1 Result.2 Result.3 . SELECT IF NOT SYSMIS(CompareDate). EXECUTE. Good luck! Mario Munich, Germany
Am Samstag, 27. Juni 2020, 03:58:33 MESZ hat Bruce Weaver <[hidden email]> Folgendes geschrieben:
If I follow, on each row, you wish to set a flag to 1 if CompareDate on that row falls between MIN(DateIn) and MAX(DateOut), where the min and max values are across all rows for a given ID. Right? If so, use AGGREGATE with ID as the BREAK variable and the "add variables" mode to add new variables MinDateIn and MaxDateOut to the data set. Then, something like this (untested): COMPUTE Flag = RANGE(CompareDate, MinDateIn, MaxDateOut). FORMATS Flag (F1). FREQUENCIES Flag. By the way, MinDateIn and MaxDateOut will be ordinary numeric variables. Use ALTER TYPE if you want to make them display as date variables. HTH. Jeff A wrote > Here's one I can't get my head around. > > > > I can get part of this via the lag function and it's easy with one > complication that I can't figure out. I'm assuming it needs some type of > macro or a few nested loops. > > > > > > What's the best way to do the following. > > > > Each person has an ID and can have several different rows. Each row as a > begin date (DateIn) and end date (DateOut) but also date of an occurrence > (compare date). > > > > I have to compare each of the CompareDates with the DateIn and DateOut > periods to see whether any fall within the period from DateIn to DateOut > (inclusive of the DateIn and DateOut) across all rows with the same ID > number. Result = 1 when the CompareDate falls within any of the DateIn to > DateOut periods within the same Person ID number (either in rows before or > after). > > > > So it's possible that the CompareDate can be within an interval on the row > either before it or after it and that's the part that's throwing me. > > > > In getting started, I can do the following easily, but unsure how to > finish. > > > > > If((CompareDate >= DateIn) and (CompareDate <= DateOut)) Result = > 1. > > If((CompareDate >= lag(DateIn)) and (CompareDate <= lag(DateOut))) > Result = 1. > > If((CompareDate >= lag(DateIn 2)) and (CompareDate <= lag(DateOut,2))) > Result = 1. > > If((CompareDate >= lag(DateIn,3)) and (CompareDate <= lag(DateOut 3))) > Result = 1. > > > > Example with just the first 2 people. Each person will have a different > number of rows. > > > > ID DateIn DateOut CompareDate Result > > 1 01-Sep-04 09-Sep-04 05-Oct-04 1 > > 1 23-Sep-04 30-Sep-04 15-Jan-04 0 > > 1 10-Sep-04 09-Nov-04 06-Sep-04 1 > > > > 2 01-Jan-04 09-Feb-04 05-Oct-04 0 > > 2 20-Feb-04 15-Mar-04 24-Sept-04 0 > > > > > > Thanks in advance. > > > > Jeff > > > > > > -- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] > (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
Here's what I had in mind, using Mario's sample dataset.
NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 END DATA. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES /BREAK=ID /MinDateIn=MIN(DateIn) /MaxDateOut=MAX(DateOut). ALTER TYPE MinDateIn MaxDateOut (DATE11). COMPUTE Result = RANGE(CompareDate, MinDateIn, MaxDateOut). FORMATS Result (F1). LIST ID DateIn DateOut CompareDate Result . Here is the output from my LIST command: ID DateIn DateOut CompareDate Result 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 1 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 0 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 1 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 0 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 0 Here is the output I get if I run Mario's code, but change his final EXECUTE to LIST: ID DateIn DateOut CompareDate Result 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 1.00 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 .00 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 1.00 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 .00 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 .00 The results look the same to me. ;-) Mario Giesel-2 wrote > @Bruce: I think aggregating can create problems if there are dates between > min and max that are not included in any time interval. > @Jeff: This can be done along the following logic. I'm sure there are more > elegant solutions but this is straightforward and easy to understand. > --- snip--- ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Bruce,
Yabbut ... His is always right, and yours can fail. I'm reading
the requirement like he does - that the separate ranges have
to be used. Using RANGE( ) would make his Macro easier to read.
The problem with his is, Do you have a (reasonably small) maximum
number of lines for a case?
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 9:55 AM To: [hidden email] <[hidden email]> Subject: Re: Difficult date question Here's what I had in mind, using Mario's sample dataset.
NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 END DATA. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES /BREAK=ID /MinDateIn=MIN(DateIn) /MaxDateOut=MAX(DateOut). ALTER TYPE MinDateIn MaxDateOut (DATE11). COMPUTE Result = RANGE(CompareDate, MinDateIn, MaxDateOut). FORMATS Result (F1). LIST ID DateIn DateOut CompareDate Result . Here is the output from my LIST command: ID DateIn DateOut CompareDate Result 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 1 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 0 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 1 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 0 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 0 Here is the output I get if I run Mario's code, but change his final EXECUTE to LIST: ID DateIn DateOut CompareDate Result 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 1.00 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 .00 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 1.00 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 .00 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 .00 The results look the same to me. ;-) Mario Giesel-2 wrote > @Bruce: I think aggregating can create problems if there are dates between > min and max that are not included in any time interval. > @Jeff: This can be done along the following logic. I'm sure there are more > elegant solutions but this is straightforward and easy to understand. > --- snip--- ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
Hi Rich. You prompted me to re-read the original post more carefully, and I
believe you and Mario are right. Here's my first stab at it using multiple ranges. My main aim was to eliminate the need to manually intervene to indicate the maximum number of rows per person in the code. As a test, I added a 3rd person with 5 rows to Mario's dataset. NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 3,01-Sep-04,09-Sep-04,05-Oct-04 3,23-Sep-04,30-Sep-04,15-Jan-04 3,10-Sep-04,09-Nov-04,06-Sep-04 3,01-Jan-04,09-Feb-04,05-Oct-04 3,20-Feb-04,15-Mar-04,24-Sep-04 END DATA. DATASET NAME raw. DATASET COPY indates. DATASET COPY outdates. * Restructure InDates to wide format. DATASET ACTIVATE indates. DELETE VARIABLES CompareDate DateOut. CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX. NUMERIC junk1 (F1). EXECUTE. * Restructure OutDates to wide format. DATASET ACTIVATE outdates. DELETE VARIABLES CompareDate DateIn. CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX. NUMERIC junk2 (F1). EXECUTE. * Add wide InDate and OutDate variables to the original dataset. DATASET ACTIVATE raw. MATCH FILES FILE = * / TABLE='indates' / TABLE='outdates' / BY=ID. EXECUTE. DATASET CLOSE indates. DATASET CLOSE outdates. COMPUTE RESULT = 0. * Set Result = 1 if CompareDate falls within any of the intervals for that ID. DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2. IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result = 1. END REPEAT. EXECUTE. DELETE VARIABLES DateIn1 to junk2. FORMATS Result (F1). LIST. Output from the LIST command (spaces between IDs inserted manually): ID DateIn DateOut CompareDate RESULT 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 1 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 0 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 1 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 0 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 0 3 01-SEP-2004 09-SEP-2004 05-OCT-2004 1 3 23-SEP-2004 30-SEP-2004 15-JAN-2004 1 3 10-SEP-2004 09-NOV-2004 06-SEP-2004 1 3 01-JAN-2004 09-FEB-2004 05-OCT-2004 1 3 20-FEB-2004 15-MAR-2004 24-SEP-2004 1 I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT so that I could exit the loop as soon as Result = 0. But I could not work out how to supply the maximum number of loops needed without hands on intervention. So I went with the (probably) less efficient DO-REPEAT. Cheers, Bruce Rich Ulrich wrote > Bruce, > Yabbut ... His is always right, and yours can fail. I'm reading > the requirement like he does - that the separate ranges have > to be used. Using RANGE( ) would make his Macro easier to read. > > The problem with his is, Do you have a (reasonably small) maximum > number of lines for a case? > -- > Rich Ulrich > > --- snip --- ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Bruce, and all,
I've noticed a solution that allows an infinite number of
lines for one case, so long as all the dates fall in one limited
range. For the data in these examples, you could use day-of-
year, for the year 2004. I don't know how much the processing
would slow down when using 2 or 3 or 10 years.
Define a vector, InTreat001 to InTreat366, for day of year.
Initialize: RECODE InTreat001 to InTreat366(Else=0).
Compute DateIn_day and DateOut_day as day-of-year (for each).
First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1.
Aggregate by ID, keeping the MAX value, putting it on each line.
Next pass - COMPUTE Response = Intreat(CompareDate_day).
Drop InTreat. I would be tempted to count the days while I had them.
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Saturday, June 27, 2020 3:45 PM To: [hidden email] <[hidden email]> Subject: Re: Difficult date question Hi Rich. You prompted me to re-read the original post more carefully, and I
believe you and Mario are right. Here's my first stab at it using multiple ranges. My main aim was to eliminate the need to manually intervene to indicate the maximum number of rows per person in the code. As a test, I added a 3rd person with 5 rows to Mario's dataset. NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 3,01-Sep-04,09-Sep-04,05-Oct-04 3,23-Sep-04,30-Sep-04,15-Jan-04 3,10-Sep-04,09-Nov-04,06-Sep-04 3,01-Jan-04,09-Feb-04,05-Oct-04 3,20-Feb-04,15-Mar-04,24-Sep-04 END DATA. DATASET NAME raw. DATASET COPY indates. DATASET COPY outdates. * Restructure InDates to wide format. DATASET ACTIVATE indates. DELETE VARIABLES CompareDate DateOut. CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX. NUMERIC junk1 (F1). EXECUTE. * Restructure OutDates to wide format. DATASET ACTIVATE outdates. DELETE VARIABLES CompareDate DateIn. CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX. NUMERIC junk2 (F1). EXECUTE. * Add wide InDate and OutDate variables to the original dataset. DATASET ACTIVATE raw. MATCH FILES FILE = * / TABLE='indates' / TABLE='outdates' / BY=ID. EXECUTE. DATASET CLOSE indates. DATASET CLOSE outdates. COMPUTE RESULT = 0. * Set Result = 1 if CompareDate falls within any of the intervals for that ID. DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2. IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result = 1. END REPEAT. EXECUTE. DELETE VARIABLES DateIn1 to junk2. FORMATS Result (F1). LIST. Output from the LIST command (spaces between IDs inserted manually): ID DateIn DateOut CompareDate RESULT 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 1 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 0 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 1 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 0 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 0 3 01-SEP-2004 09-SEP-2004 05-OCT-2004 1 3 23-SEP-2004 30-SEP-2004 15-JAN-2004 1 3 10-SEP-2004 09-NOV-2004 06-SEP-2004 1 3 01-JAN-2004 09-FEB-2004 05-OCT-2004 1 3 20-FEB-2004 15-MAR-2004 24-SEP-2004 1 I was trying to figure out a way to use VECTOR & LOOP rather than DO-REPEAT so that I could exit the loop as soon as Result = 0. But I could not work out how to supply the maximum number of loops needed without hands on intervention. So I went with the (probably) less efficient DO-REPEAT. Cheers, Bruce Rich Ulrich wrote > Bruce, > Yabbut ... His is always right, and yours can fail. I'm reading > the requirement like he does - that the separate ranges have > to be used. Using RANGE( ) would make his Macro easier to read. > > The problem with his is, Do you have a (reasonably small) maximum > number of lines for a case? > -- > Rich Ulrich > > --- snip --- ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
The following should equally do the job assuming that the earliest day is one of the DayIns and the latest day is one of the DayOuts. **Its easy to modify if that is not the case. ***Obtain Start date and End date for each person. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DIn_Min 'Start Date in'=MIN(DateIn) /DOut_max 'Last day out'=MAX(DateOut). COMPUTE RESULT = CompareDate >= DIn_Min) & (CompareDate <= DOut_max. VARIABLE LABELS Day_Within 'Compare day is within Start and end'. EXECUTE. FORMATS Day_Within (F1). DELETE VARIABLES DIn_Min DOut_max. Best wishes Forcheh tested using Bruce data augmented by the original NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 3,01-Sep-04,09-Sep-04,05-Oct-04 3,23-Sep-04,30-Sep-04,15-Jan-04 3,10-Sep-04,09-Nov-04,06-Sep-04 3,01-Jan-04,09-Feb-04,05-Oct-04 3,20-Feb-04,15-Mar-04,24-Sep-04 5,01-Sep-04,09-Sep-04,05-Oct-04 5,23-Sep-04,30-Sep-04,15-Jan-04 5,10-Sep-04,09-Nov-04,06-Sep-04 6,01-Jan-04,09-Feb-04,05-Oct-04 6,20-Feb-04,15-Mar-04,24-Sept-04 END DATA. EXECUTE. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DIn_Min 'Start Date in'=MIN(DateIn) /DOut_max 'Last day out'=MAX(DateOut). COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max. VARIABLE LABELS Day_Within 'Compare day is within Start and end'. EXECUTE. FORMATS Result (F1). DELETE VARIABLES DIn_Min DOut_max. best wishes Forcheh On Sat, Jun 27, 2020 at 11:36 PM Rich Ulrich <[hidden email]> wrote:
|
corrected line : COMPUTE RESULT = CompareDate >= DIn_Min) & (CompareDate <= DOut_max. to: COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max. On Sun, Jun 28, 2020 at 1:11 AM Nkem Ntonghanwah <[hidden email]> wrote:
|
In reply to this post by Rich Ulrich
Interesting approach.
The use of RECODE to initialize is something I'll try to remember. I appreciate that you frequently bring insight to the questions posed on this list. "I don't know how much the processing would slow down when using 2 or 3 or 10 years. " That gives you away as one of us old-timers. It is several years since I timed SPSS runs using different approaches. YMMV. With today's computers the difference in CPU time, memory, or storage is usually negligible. It is even becoming increasingly rare for me to bother redrafting syntax only on a subset of the data. Often the personnel time (1) to quality-assurance check the syntax and (2) explain the syntax to clients or critics is much more of a consideration. So I emphasize the readability of syntax. ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
In reply to this post by Rich Ulrich
Dear All, I included the following cases to the data and tested using Bruce update with restructuring etc. Bruce version gives error messages - "In range 1st value greater than the second value, result set to missing" But the final result is exactly the same as using the simple aggregate and compute that I posted (below) which never generates any errors and don't depend on date range, etc.. 5,01-Jul-04,09-Sep-04,05-Oct-04 5,23-Jan-05,30-Dec-04,15-Jan-04 5,10-Sep-04,09-Nov-04,06-Sep-04 AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DIn_Min 'Start Date in'=MIN(DateIn) /DOut_max 'Last day out'=MAX(DateOut). COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max. VARIABLE LABELS Result 'Compare day is within Start and end'. EXECUTE. FORMATS Result (F1). DELETE VARIABLES DIn_Min DOut_max. I would like a counter example where this doesn't work Thanks. Forcheh On Sat, Jun 27, 2020 at 11:36 PM Rich Ulrich <[hidden email]> wrote:
|
Administrator
|
In reply to this post by Rich Ulrich
I had a go at implementing Rich's pseudo-code. And after enlisting his help
on one point that was confusing me, I believe I've now got a working version. * Implementation of Rich Ulrich's suggestion, posted 27-Jun-2020. NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 3,01-Sep-04,09-Sep-04,05-Oct-04 3,23-Sep-04,30-Sep-04,15-Jan-04 3,10-Sep-04,09-Nov-04,06-Sep-04 3,01-Jan-04,09-Feb-04,05-Oct-04 3,20-Feb-04,15-Mar-04,24-Sep-04 END DATA. COMPUTE doy1 = XDATE.JDATE(DateIn). COMPUTE doy2 = XDATE.JDATE(DateOut). COMPUTE doycd = XDATE.JDATE(CompareDate). FORMATS doy1 to doycd (F5.0). * Define a vector, InTreat001 to InTreat366, for day of year. * Initialize: RECODE InTreat001 to InTreat366(Else=0). NUMERIC InTx001 to InTx366 (F1). RECODE InTx001 to InTx366 (ELSE=0). * First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1. VECTOR InTx = InTx001 to InTx366. LOOP # = doy1 to doy2. COMPUTE InTx(#) = 1. END LOOP. EXECUTE. * Aggregate by ID, keeping the MAX values of InTx, writing them to each line. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES /BREAK=ID /InTx001 to InTx366 = MAX(InTx001 to InTx366). * Next pass - COMPUTE Response = Intreat(CompareDate_day). VECTOR InTx = InTx001 to InTx366. COMPUTE Result = InTx(doycd). COMPUTE DaysInTx = SUM(InTx001 to InTx366). FORMATS Result (F1) / DaysInTx (F5.0). EXECUTE. /* Needed before DELETE VARIABLES. DELETE VARIABLES InTx001 to InTx366. LIST. Output from LIST: ID DateIn DateOut CompareDate doy1 doy2 doycd Result DaysInTx 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 245 253 279 1 70 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 267 274 15 0 70 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 254 314 250 1 70 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 1 40 279 0 65 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 51 75 268 0 65 3 01-SEP-2004 09-SEP-2004 05-OCT-2004 245 253 279 1 135 3 23-SEP-2004 30-SEP-2004 15-JAN-2004 267 274 15 1 135 3 10-SEP-2004 09-NOV-2004 06-SEP-2004 254 314 250 1 135 3 01-JAN-2004 09-FEB-2004 05-OCT-2004 1 40 279 1 135 3 20-FEB-2004 15-MAR-2004 24-SEP-2004 51 75 268 1 135 As Rich noted, tweaks would be needed if dates are not all within the same calendar year. Rich Ulrich wrote > Bruce, > Yabbut ... His is always right, and yours can fail. I'm reading > the requirement like he does - that the separate ranges have > to be used. Using RANGE( ) would make his Macro easier to read. > > The problem with his is, Do you have a (reasonably small) maximum > number of lines for a case? > -- > Rich Ulrich > > --- snip --- ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Bruce, thanks for showing it.
It occurs to me that the method can save more information that might
be interesting if the Aggregate uses SUM( ) instead of MAX( ).
In that case, Result will contain the number of treatments that
a person was enrolled in at the time of the event, instead of No/Yes.
(To obtain No/Yes like the original Result, I would use RECODE .... /INTO ) .
And SUM( ) for DaysInTx will report a subtly different value - the sum
of days in treatments, instead of the count of calendar days for which
the person was in any treatment. You could get the latter from the
vector by using COUNT(1,2,... max num of simultaneous treatments).
As an old-timer, I respect the possibility that someone might want to
generalize the algorithm. "Thousands" will offer little slowdown so long
as everything stays in memory, not on disk. A file of millions of records
could slow down. Using a vector-length that is in the millions (seconds,
not days?) would be slow merely if that initializing takes time; I suspect
that initializing isn't required. Still, anything that has to page to disk will
take time.
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Sunday, June 28, 2020 1:45 PM To: [hidden email] <[hidden email]> Subject: Re: Difficult date question I had a go at implementing Rich's pseudo-code. And after enlisting his help
on one point that was confusing me, I believe I've now got a working version. * Implementation of Rich Ulrich's suggestion, posted 27-Jun-2020. NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 3,01-Sep-04,09-Sep-04,05-Oct-04 3,23-Sep-04,30-Sep-04,15-Jan-04 3,10-Sep-04,09-Nov-04,06-Sep-04 3,01-Jan-04,09-Feb-04,05-Oct-04 3,20-Feb-04,15-Mar-04,24-Sep-04 END DATA. COMPUTE doy1 = XDATE.JDATE(DateIn). COMPUTE doy2 = XDATE.JDATE(DateOut). COMPUTE doycd = XDATE.JDATE(CompareDate). FORMATS doy1 to doycd (F5.0). * Define a vector, InTreat001 to InTreat366, for day of year. * Initialize: RECODE InTreat001 to InTreat366(Else=0). NUMERIC InTx001 to InTx366 (F1). RECODE InTx001 to InTx366 (ELSE=0). * First pass - Loop from DateIn_day to DateOut_day to set values of InTreat to 1. VECTOR InTx = InTx001 to InTx366. LOOP # = doy1 to doy2. COMPUTE InTx(#) = 1. END LOOP. EXECUTE. * Aggregate by ID, keeping the MAX values of InTx, writing them to each line. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES /BREAK=ID /InTx001 to InTx366 = MAX(InTx001 to InTx366). * Next pass - COMPUTE Response = Intreat(CompareDate_day). VECTOR InTx = InTx001 to InTx366. COMPUTE Result = InTx(doycd). COMPUTE DaysInTx = SUM(InTx001 to InTx366). FORMATS Result (F1) / DaysInTx (F5.0). EXECUTE. /* Needed before DELETE VARIABLES. DELETE VARIABLES InTx001 to InTx366. LIST. Output from LIST: ID DateIn DateOut CompareDate doy1 doy2 doycd Result DaysInTx 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 245 253 279 1 70 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 267 274 15 0 70 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 254 314 250 1 70 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 1 40 279 0 65 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 51 75 268 0 65 3 01-SEP-2004 09-SEP-2004 05-OCT-2004 245 253 279 1 135 3 23-SEP-2004 30-SEP-2004 15-JAN-2004 267 274 15 1 135 3 10-SEP-2004 09-NOV-2004 06-SEP-2004 254 314 250 1 135 3 01-JAN-2004 09-FEB-2004 05-OCT-2004 1 40 279 1 135 3 20-FEB-2004 15-MAR-2004 24-SEP-2004 51 75 268 1 135 As Rich noted, tweaks would be needed if dates are not all within the same calendar year. Rich Ulrich wrote > Bruce, > Yabbut ... His is always right, and yours can fail. I'm reading > the requirement like he does - that the separate ranges have > to be used. Using RANGE( ) would make his Macro easier to read. > > The problem with his is, Do you have a (reasonably small) maximum > number of lines for a case? > -- > Rich Ulrich > > --- snip --- ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Nkem Ntonghanwah
Counter example: 5,01-Jul-04,09-Sep-04,05-Oct-04 5,23-Jan-05,30-Apr-04,15-Jun-04 5,10-Sep-04,09-Nov-04,06-Sep-04 --------------------------------------------
Am Dienstag, 30. Juni 2020, 13:50:26 MESZ hat Nkem Ntonghanwah <[hidden email]> Folgendes geschrieben:
Dear All, I included the following cases to the data and tested using Bruce update with restructuring etc. Bruce version gives error messages - "In range 1st value greater than the second value, result set to missing" But the final result is exactly the same as using the simple aggregate and compute that I posted (below) which never generates any errors and don't depend on date range, etc.. 5,01-Jul-04,09-Sep-04,05-Oct-04 5,23-Jan-05,30-Dec-04,15-Jan-04 5,10-Sep-04,09-Nov-04,06-Sep-04 AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DIn_Min 'Start Date in'=MIN(DateIn) /DOut_max 'Last day out'=MAX(DateOut). COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max. VARIABLE LABELS Result 'Compare day is within Start and end'. EXECUTE. FORMATS Result (F1). DELETE VARIABLES DIn_Min DOut_max. I would like a counter example where this doesn't work Thanks. Forcheh On Sat, Jun 27, 2020 at 11:36 PM Rich Ulrich <[hidden email]> wrote:
|
Administrator
|
In reply to this post by Nkem Ntonghanwah
For some reason, I did not see any of Forcheh's posts in the Nabble archive
until today. Presumably, they were visible sooner to those who receive posts via email. If the reference below is to the revised code I posted on 27-Jun-2020 (after being corrected by MG and RU), then yes, when I use Forcheh's new cases for ID=5 where DateIn > DateOut on one row, I also get those warning messages. (Note that they are warning messages, not error messages.) But I was able to eliminate them by making a change in the DO-IF structure near the end of that code. Here are the relevant bits of code, original version first followed by the revised version. From my code posted on 27-Jun-2020: COMPUTE RESULT = 0. * Set Result = 1 if CompareDate falls within any of the intervals for that ID. DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2. IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result = 1. END REPEAT. EXECUTE. Revised version posted on 30-Jun-2020: COMPUTE RESULT = 0. * Set Result = 1 if CompareDate falls within any of the intervals for that ID. DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2. DO IF NOT MISSING(in) AND (in LE out) and NOT Result. IF RANGE(CompareDate,in,out) Result = 1. END IF. END REPEAT. EXECUTE. HTH. PS- To avoid confusion about which version this is referring to, here is the complete code. * Revised version of BW's second attempt after being corrected by MG & RU. * Original version of this code was posted on Jun 27, 2020. * In the dataset below, DateIn > DateOut for ID 5's 2nd row. NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 3,01-Sep-04,09-Sep-04,05-Oct-04 3,23-Sep-04,30-Sep-04,15-Jan-04 3,10-Sep-04,09-Nov-04,06-Sep-04 3,01-Jan-04,09-Feb-04,05-Oct-04 3,20-Feb-04,15-Mar-04,24-Sep-04 5,01-Jul-04,09-Sep-04,05-Oct-04 5,23-Jan-05,30-Dec-04,15-Jan-04 5,10-Sep-04,09-Nov-04,06-Sep-04 END DATA. * Notice that DateIn > DateOut ID 5's second observation. * My original code would result in Warning messages for ID 5. * I have changed the conditional statements in the DO-REPEAT * structure below to eliminate those warnings. DATASET NAME raw. DATASET COPY indates. DATASET COPY outdates. * Restructure InDates to wide format. DATASET ACTIVATE indates. DELETE VARIABLES CompareDate DateOut. CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX. NUMERIC junk1 (F1). EXECUTE. * Restructure OutDates to wide format. DATASET ACTIVATE outdates. DELETE VARIABLES CompareDate DateIn. CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX. NUMERIC junk2 (F1). EXECUTE. * Add wide InDate and OutDate variables to the original dataset. DATASET ACTIVATE raw. MATCH FILES FILE = * / TABLE='indates' / TABLE='outdates' / BY=ID. EXECUTE. DATASET CLOSE indates. DATASET CLOSE outdates. COMPUTE RESULT = 0. * Set Result = 1 if CompareDate falls within any of the intervals for that ID. DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2. DO IF NOT MISSING(in) AND (in LE out) and NOT Result. IF RANGE(CompareDate,in,out) Result = 1. END IF. END REPEAT. EXECUTE. DELETE VARIABLES DateIn1 to junk2. FORMATS Result (F1). LIST. Nkem Ntonghanwah wrote > Dear All, > I included the following cases to the data and tested using Bruce update > with restructuring etc. > Bruce version gives error messages - "In range 1st value greater than the > second value, result set to missing" > But the final result is exactly the same as using the simple aggregate and > compute that I posted (below) which never generates any errors and > don't depend on date range, etc.. > > 5,01-Jul-04,09-Sep-04,05-Oct-04 > 5,23-Jan-05,30-Dec-04,15-Jan-04 > 5,10-Sep-04,09-Nov-04,06-Sep-04 > > > AGGREGATE > /OUTFILE=* MODE=ADDVARIABLES > /BREAK=ID > /DIn_Min 'Start Date in'=MIN(DateIn) > /DOut_max 'Last day out'=MAX(DateOut). > > COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max. > VARIABLE LABELS Result 'Compare day is within Start and end'. > EXECUTE. > > FORMATS Result (F1). > DELETE VARIABLES DIn_Min DOut_max. > > I would like a counter example where this doesn't work > Thanks. > Forcheh ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Data clean-up should come first, if qualiity wasn't assured on data entry.
If I am not working with "clean data" - so that IN may be later than OUT -
then I should prefer to see early warning that the data need fixing, instead
of facing awkward questions (days later) about inconsistencies.
Patching to code to avoid warnings is not a good solution.
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Tuesday, June 30, 2020 10:48 AM To: [hidden email] <[hidden email]> Subject: Re: Difficult date question For some reason, I did not see any of Forcheh's posts in the Nabble archive
until today. Presumably, they were visible sooner to those who receive posts via email. If the reference below is to the revised code I posted on 27-Jun-2020 (after being corrected by MG and RU), then yes, when I use Forcheh's new cases for ID=5 where DateIn > DateOut on one row, I also get those warning messages. (Note that they are warning messages, not error messages.) But I was able to eliminate them by making a change in the DO-IF structure near the end of that code. Here are the relevant bits of code, original version first followed by the revised version. From my code posted on 27-Jun-2020: COMPUTE RESULT = 0. * Set Result = 1 if CompareDate falls within any of the intervals for that ID. DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2. IF NOT Result AND NOT MISSING(in) and RANGE(CompareDate,in,out) Result = 1. END REPEAT. EXECUTE. Revised version posted on 30-Jun-2020: COMPUTE RESULT = 0. * Set Result = 1 if CompareDate falls within any of the intervals for that ID. DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2. DO IF NOT MISSING(in) AND (in LE out) and NOT Result. IF RANGE(CompareDate,in,out) Result = 1. END IF. END REPEAT. EXECUTE. HTH. PS- To avoid confusion about which version this is referring to, here is the complete code. * Revised version of BW's second attempt after being corrected by MG & RU. * Original version of this code was posted on Jun 27, 2020. * In the dataset below, DateIn > DateOut for ID 5's 2nd row. NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 3,01-Sep-04,09-Sep-04,05-Oct-04 3,23-Sep-04,30-Sep-04,15-Jan-04 3,10-Sep-04,09-Nov-04,06-Sep-04 3,01-Jan-04,09-Feb-04,05-Oct-04 3,20-Feb-04,15-Mar-04,24-Sep-04 5,01-Jul-04,09-Sep-04,05-Oct-04 5,23-Jan-05,30-Dec-04,15-Jan-04 5,10-Sep-04,09-Nov-04,06-Sep-04 END DATA. * Notice that DateIn > DateOut ID 5's second observation. * My original code would result in Warning messages for ID 5. * I have changed the conditional statements in the DO-REPEAT * structure below to eliminate those warnings. DATASET NAME raw. DATASET COPY indates. DATASET COPY outdates. * Restructure InDates to wide format. DATASET ACTIVATE indates. DELETE VARIABLES CompareDate DateOut. CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX. NUMERIC junk1 (F1). EXECUTE. * Restructure OutDates to wide format. DATASET ACTIVATE outdates. DELETE VARIABLES CompareDate DateIn. CASESTOVARS /ID = ID /SEPARATOR="" /GROUPBY=INDEX. NUMERIC junk2 (F1). EXECUTE. * Add wide InDate and OutDate variables to the original dataset. DATASET ACTIVATE raw. MATCH FILES FILE = * / TABLE='indates' / TABLE='outdates' / BY=ID. EXECUTE. DATASET CLOSE indates. DATASET CLOSE outdates. COMPUTE RESULT = 0. * Set Result = 1 if CompareDate falls within any of the intervals for that ID. DO REPEAT in = DateIn1 to junk1 / out = DateOut1 to junk2. DO IF NOT MISSING(in) AND (in LE out) and NOT Result. IF RANGE(CompareDate,in,out) Result = 1. END IF. END REPEAT. EXECUTE. DELETE VARIABLES DateIn1 to junk2. FORMATS Result (F1). LIST. Nkem Ntonghanwah wrote > Dear All, > I included the following cases to the data and tested using Bruce update > with restructuring etc. > Bruce version gives error messages - "In range 1st value greater than the > second value, result set to missing" > But the final result is exactly the same as using the simple aggregate and > compute that I posted (below) which never generates any errors and > don't depend on date range, etc.. > > 5,01-Jul-04,09-Sep-04,05-Oct-04 > 5,23-Jan-05,30-Dec-04,15-Jan-04 > 5,10-Sep-04,09-Nov-04,06-Sep-04 > > > AGGREGATE > /OUTFILE=* MODE=ADDVARIABLES > /BREAK=ID > /DIn_Min 'Start Date in'=MIN(DateIn) > /DOut_max 'Last day out'=MAX(DateOut). > > COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max. > VARIABLE LABELS Result 'Compare day is within Start and end'. > EXECUTE. > > FORMATS Result (F1). > DELETE VARIABLES DIn_Min DOut_max. > > I would like a counter example where this doesn't work > Thanks. > Forcheh ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by spss.giesel@yahoo.de
What is the correct result for each of the 3 lines? This is what I get. 5,01-Jul-04,09-Sep-04,05-Oct-04 1 5,23-Jan-05,30-Apr-04,15-Jun-04 0 5,10-Sep-04,09-Nov-04,06-Sep-04 1 But Bruce code fails to produce any results when these lines are added -- See below. NEW FILE. DATASET CLOSE ALL. DATA LIST FREE(",") /ID (F8) DateIn (DATE) DateOut (DATE) CompareDate (DATE). BEGIN DATA 1,01-Sep-04,09-Sep-04,05-Oct-04 1,23-Sep-04,30-Sep-04,15-Jan-04 1,10-Sep-04,09-Nov-04,06-Sep-04 2,01-Jan-04,09-Feb-04,05-Oct-04 2,20-Feb-04,15-Mar-04,24-Sep-04 3,01-Sep-04,09-Sep-04,05-Oct-04 3,23-Sep-04,30-Sep-04,15-Jan-04 3,10-Sep-04,09-Nov-04,06-Sep-04 3,01-Jan-04,09-Feb-04,05-Oct-04 3,20-Feb-04,15-Mar-04,24-Sep-04 5,01-Jul-04,09-Sep-04,05-Oct-04 5,23-Jan-05,30-Dec-04,15-Jan-04 5,10-Sep-04,09-Nov-04,06-Sep-04 6,01-Jan-04,09-Feb-05,05-Oct-04 6,20-Feb-04,15-Mar-04,24-Sept-04 4,01-Jul-04,09-Sep-04,05-Oct-04 4,23-Jan-05,30-Apr-04,15-Jun-04 4,10-Sep-04,09-Nov-04,06-Sep-04 END DATA. EXECUTE. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /DIn_Min 'Start Date in'=MIN(DateIn) /DOut_max 'Last day out'=MAX(DateOut). COMPUTE Result=CompareDate >= DIn_Min & CompareDate <= DOut_max. VARIABLE LABELS Result 'Compare day is within Start and end'. EXECUTE. FORMATS Result (F1). list. ID DateIn DateOut CompareDate DIn_Min DOut_max Result 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 01-SEP-2004 09-NOV-2004 1 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 01-SEP-2004 09-NOV-2004 0 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 01-SEP-2004 09-NOV-2004 1 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 01-JAN-2004 15-MAR-2004 0 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 01-JAN-2004 15-MAR-2004 0 3 01-SEP-2004 09-SEP-2004 05-OCT-2004 01-JAN-2004 09-NOV-2004 1 3 23-SEP-2004 30-SEP-2004 15-JAN-2004 01-JAN-2004 09-NOV-2004 1 3 10-SEP-2004 09-NOV-2004 06-SEP-2004 01-JAN-2004 09-NOV-2004 1 3 01-JAN-2004 09-FEB-2004 05-OCT-2004 01-JAN-2004 09-NOV-2004 1 3 20-FEB-2004 15-MAR-2004 24-SEP-2004 01-JAN-2004 09-NOV-2004 1 5 01-JUL-2004 09-SEP-2004 05-OCT-2004 01-JUL-2004 30-DEC-2004 1 5 23-JAN-2005 30-DEC-2004 15-JAN-2004 01-JUL-2004 30-DEC-2004 0 5 10-SEP-2004 09-NOV-2004 06-SEP-2004 01-JUL-2004 30-DEC-2004 1 6 01-JAN-2004 09-FEB-2005 05-OCT-2004 01-JAN-2004 09-FEB-2005 1 6 20-FEB-2004 15-MAR-2004 24-SEP-2004 01-JAN-2004 09-FEB-2005 1 4 01-JUL-2004 09-SEP-2004 05-OCT-2004 01-JUL-2004 09-NOV-2004 1 4 23-JAN-2005 30-APR-2004 15-JUN-2004 01-JUL-2004 09-NOV-2004 0 4 10-SEP-2004 09-NOV-2004 06-SEP-2004 01-JUL-2004 09-NOV-2004 1 Number of cases read: 18 Number of cases listed: 18 USING BRUCE OLD AND REVISED CODE, I get warning message "The first sequence error was found at case 16. The result data may not be correct." AND the following output ID DateIn DateOut CompareDate RESULT 1 01-SEP-2004 09-SEP-2004 05-OCT-2004 0 1 23-SEP-2004 30-SEP-2004 15-JAN-2004 0 1 10-SEP-2004 09-NOV-2004 06-SEP-2004 0 2 01-JAN-2004 09-FEB-2004 05-OCT-2004 0 2 20-FEB-2004 15-MAR-2004 24-SEP-2004 0 3 01-SEP-2004 09-SEP-2004 05-OCT-2004 0 3 23-SEP-2004 30-SEP-2004 15-JAN-2004 0 3 10-SEP-2004 09-NOV-2004 06-SEP-2004 0 3 01-JAN-2004 09-FEB-2004 05-OCT-2004 0 3 20-FEB-2004 15-MAR-2004 24-SEP-2004 0 5 01-JUL-2004 09-SEP-2004 05-OCT-2004 0 5 23-JAN-2005 30-DEC-2004 15-JAN-2004 0 5 10-SEP-2004 09-NOV-2004 06-SEP-2004 0 6 01-JAN-2004 09-FEB-2005 05-OCT-2004 0 6 20-FEB-2004 15-MAR-2004 24-SEP-2004 0 4 01-JUL-2004 09-SEP-2004 05-OCT-2004 0 4 23-JAN-2005 30-APR-2004 15-JUN-2004 0 4 10-SEP-2004 09-NOV-2004 06-SEP-2004 0 Regards Forcheh
|
Administrator
|
In reply to this post by Rich Ulrich
Fair point.
Rich Ulrich wrote > Data clean-up should come first, if qualiity wasn't assured on data entry. > > If I am not working with "clean data" - so that IN may be later than OUT - > then I should prefer to see early warning that the data need fixing, > instead > of facing awkward questions (days later) about inconsistencies. > > Patching to code to avoid warnings is not a good solution. > > -- > Rich Ulrich ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Rich Ulrich
Hear! Hear!
Data quality checks should be done before any further analysis steps are done. For example, Sysmis reserved as a debugging help. All missing values are labeled user missing with meaningful labels. All variable values are in a plausible range. A reasonable set of between variable checks should have been done, e.g., post-test date should be before pre-test date, intake-date before discharge-date, cells in age by grade crosstab should be checked, etc. The "check for anomalies" procedure is a major benefit. YMMV but I am amazed when less than 95% of the work time in a project is in cleaning and checking the data. Out of many hundreds of datasets I have seen over the years, I can only recall 3 that surprised me by being clean. One of the problems with how stat is taught is that there is no mention that the example data sets have the peculiarity of being clean. ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
"Patching to code to avoid warnings is not a good solution."
That is an understatement. The consensus in social and behavioral sciences is that Murphy was an optimist When there is complex logic it is often helpful to code to *get *warnings. For example, in a long DO IF sequence, it is often helpful to end with something like ELSE. PRINT 'OOPS unaccounted for combination of conditions.'. PRINT /$casenum ID CondVar1 Condvar2 CondVar3. END IF. ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
Free forum by Nabble | Edit this page |