Hello Listers,
I hope you'll doing well. I searched everywhere to find an answer to this question but i was unlucky. I have a data that has more than 11 million observation (hospital visits) and i want to choose a sample of cases if certain binary variable is there. The problem is that, i want to make sure that this is a true case. The way to do so is to use both ( var1=1) and (var2=1). Both variables don't have to be =1 in the same observation. They can be =1 within 7 days window (+/- 7 days from each others). Example: ID Var1 Var1_date Var2 Var2_date 1 1 1/Jan/2015 0 0 1 0 0 1 27/DEC/2014 1 0 0 1 20/DEC/1995 2 1 1/June/2010 0 0 2 0 0 1 1/JUL/2005 As you can see, patient number 1 is a confirmed case because he has both var1 and var2 =1 within 7 days period. And patient 2 is not a case. How to select only patient 1? and aggregate both visits? My iam is to look at annual incidence. Thank you, |
Is there ever only one date available for each patient under Var1_Date? And is this date always prior to any date seen under Var2_Date?
It look like there can be multiple dates under Var2_Date, is the 7 day criteria to be judged between dates between Var1_Date and Var2_Date or could it be within Var2_Dates also? Could you explain what the variables represent, what each record of row represents please. |
Hey Jignesh,
Thank you for your interest. I have many observations (hospital visits for each patients in each row). Each patients should have a disease code (var1) and a procedure code (var2) within 7 days of each others to be considered a case. Dates of var1 and var2 could be prior or after each other. In my case it should be within 7 days before or after, and the 7 day criteria is to be judged between dates between Var1_Date and Var2_Date only. I hope i answered all of your questions. |
Administrator
|
In reply to this post by TheGhazi
As a start I would suggest looking at the SELECT IF command, the LAG function and CREATE or SHIFT VALUES commands (also SPLIT FILE and or CASESTOVARS might be useful).
Going one level deeper also see VECTOR and LOOP? Meanwhile perhaps post additional sample records and the nature of the proposed AGGREGATE? --
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by TheGhazi
How can a procedure take place (Var2) before a diagnosis is identified (Var1)? On 10 March 2015 at 17:45, TheGhazi <[hidden email]> wrote: Hey Jignesh, |
In reply to this post by David Marso
Great ideas, David.
I will think about it. Thank you so much. Date: Tue, 10 Mar 2015 11:02:01 -0700 From: [hidden email] To: [hidden email] Subject: Re: Aggregating observations if within a time frame As a start I would suggest looking at the SELECT IF command, the LAG function and CREATE or SHIFT VALUES commands (also SPLIT FILE and or CASESTOVARS might be useful). Going one level deeper also see VECTOR and LOOP? Meanwhile perhaps post additional sample records and the nature of the proposed AGGREGATE? --
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728973.html
To unsubscribe from Aggregating observations if within a time frame, click here. <a href="http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble:email.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble:email.naml-instant_emails%21nabble:email.naml-send_instant_email%21nabble:email.naml" rel=nofollow style="font:9px serif;" target=_blank>NAML |
In reply to this post by Jignesh Sutar
It is a matter of coding, Jignesh.
The data is a result of 2 merged datasets. The diagnosis code gets coded at the end of the hospital stay while the procedure gets claimed right away. Date: Tue, 10 Mar 2015 11:14:28 -0700 From: [hidden email] To: [hidden email] Subject: Re: Aggregating observations if within a time frame How can a procedure take place (Var2) before a diagnosis is identified (Var1)? On 10 March 2015 at 17:45, TheGhazi <[hidden email]> wrote: Hey Jignesh, Thank you for your interest. I have many observations (hospital visits for each patients in each row). Each patients should have a disease code (var1) and a procedure code (var2) within 7 days of each others to be considered a case. Dates of var1 and var2 could be prior or after each other. In my case it should be within 7 days before or after, and the 7 day criteria is to be judged between dates between Var1_Date and Var2_Date only. I hope i answered all of your questions. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728972.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728974.html
To unsubscribe from Aggregating observations if within a time frame, click here. <a href="http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble:email.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble:email.naml-instant_emails%21nabble:email.naml-send_instant_email%21nabble:email.naml" rel=nofollow style="font:9px serif;" target=_blank>NAML |
In reply to this post by Jignesh Sutar
It is a matter of coding, Jignesh. The data is a result of 2 merged datasets. The diagnosis code gets coded at the end of the hospital stay while the procedure gets claimed right away. Date: Tue, 10 Mar 2015 11:14:28 -0700 From: [hidden email] To: [hidden email] Subject: Re: Aggregating observations if within a time frame How can a procedure take place (Var2) before a diagnosis is identified (Var1)? On 10 March 2015 at 17:45, TheGhazi <[hidden email]> wrote: Hey Jignesh, Thank you for your interest. I have many observations (hospital visits for each patients in each row). Each patients should have a disease code (var1) and a procedure code (var2) within 7 days of each others to be considered a case. Dates of var1 and var2 could be prior or after each other. In my case it should be within 7 days before or after, and the 7 day criteria is to be judged between dates between Var1_Date and Var2_Date only. I hope i answered all of your questions. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728972.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728974.html
To unsubscribe from Aggregating observations if within a time frame, click here. <a href="http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble:email.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble:email.naml-instant_emails%21nabble:email.naml-send_instant_email%21nabble:email.naml" rel=nofollow style="font:9px serif;" target=_blank>NAML |
In reply to this post by TheGhazi
I think there may be a slight language issue because you say
>>I have many observations (hospital visits for each patients in each row). I think you mean that "... millions of records. Each record is either of this type 1 1 1/Jan/2015 0 0, where var1+var1_date= disease code+DX? date Or this type 1 0 0 1 27/DEC/201, where var2+var2_date is procedure code+Procedure data. But never this type 1 1 1/Jan/2015 1 7/jan/2015 Even though you said " Both variables don't have to be =1 in the same observation." I'm assuming each record is an observation. And never this type 1 0 0 0 0 Are the only valid values for var1 and var2 0 or 1? Let's say this is so, then I'd do this. Save all var1=1 records to file A0 keeping only id var1 var1_date. Save all var2=1 records to file B0 keeping only id var2 var2_date. Sort file A0 by id and var1_date then do a casestovars (as David pointed out) and save file A1. Repeat for file B0 using var2_date to make file B1. You have two files with one record per id: Match files. Vectorize (Vector) var1_date and var2_date. Set up double loops and set marker variable=1 if ABS(datediff(var1_date,var2_date,"days")) le 7 then Break to jump out of the loop. Gene Maguin ID Var1 Var1_date Var2 Var2_date 1 1 1/Jan/2015 0 0 1 0 0 1 27/DEC/2014 1 0 0 1 20/DEC/1995 2 1 1/June/2010 0 0 2 0 0 1 1/JUL/2005 -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of TheGhazi Sent: Tuesday, March 10, 2015 1:45 PM To: [hidden email] Subject: Re: Aggregating observations if within a time frame Hey Jignesh, Thank you for your interest. I have many observations (hospital visits for each patients in each row). Each patients should have a disease code (var1) and a procedure code (var2) within 7 days of each others to be considered a case. Dates of var1 and var2 could be prior or after each other. In my case it should be within 7 days before or after, and the 7 day criteria is to be judged between dates between Var1_Date and Var2_Date only. I hope i answered all of your questions. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Aggregating-observations-if-within-a-time-frame-tp5728970p5728972.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
This makes a lot of assumptions about your data but may help as a stepping stone...
DATA LIST LIST / ID (f1.0) Var1 (f1.0) Var1_date (a10) Var2 (f1.0) Var2_date (a10). BEGIN DATA. 1 1 01/01/2015 0 0 1 0 0 1 27/12/2014 1 0 0 1 20/12/1995 2 1 1/01/2010 0 0 2 0 0 1 1/12/2005 END DATA. alter type Var1_date Var2_date (date). numeric Type (f1.0). value labels Type 1 "Diagnosis" 2 "Procedure". compute Type=sum(Var1, Var2*2). compute Date=sum(Var1_date, Var2_date). alter type Date (date). sort cases by ID Date. do if ID=lag(ID) and Type<>lag(Type). compute DayDiff=ctime.days(Date-(lag(date))). end if. do if DayDiff<8. compute Filter=1. end if. create previousrecord= lead(Filter,1). compute Filter=sum(Filter,previousrecord). select if Filter=1. exe. |
Administrator
|
In reply to this post by TheGhazi
*/ So off the cuff (obviously untested but I'll bet something of this sort will work ) */.
DATASET NAME rawdata. DATASET COPY work. DATASET ACTIVATE work. /* Retain only cases with var1 or var2 =1. SELECT IF ANY(1,var1,var2) /* Could use SELECT IF var1 | var2. [ if (0,1) ]*/. /* Presuming only one or other date is filled create a common field with both*/ COMPUTE date=MAX(date1,date2). /* Could possibly omit this SORT if you know cases are already ordered */. SORT CASES BY Unique_IDentifier date. /*Compare adjacent cases and flag second matching ID with ABS(d2-d1) within 7 days*/. COMPUTE flag=ABS((Unique_IDentifier EQ LAG(Unique_IDentifier))*DATEDIFF(date,LAG(date),'days')) LE 7. /* Add the flag to both records */. AGGREGATE OUTFILE * Mode ADDVARIABLES/ BREAK Unique_IDentifier / ValidCase=MAX(flag). /* Add flagged records to raw data */. MATCH FILES /FILE rawdata/TABLE * / BY Unique_IDentifier. DATASET Name flagged. SELECT IF ValidCase.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
Actually, don't use MODE ADDVARIABLES.
DATASET NAME rawdata. DATASET COPY work. DATASET ACTIVATE work. /* Retain only cases with var1 or var2 =1. SELECT IF ANY(1,var1,var2) /* Could use SELECT IF var1 | var2. [ if (0,1) ]*/. /* Presuming only one or other date is filled create a common field with both*/ COMPUTE date=MAX(date1,date2). /* Could possibly omit this SORT if you know cases are already ordered */. SORT CASES BY Unique_IDentifier date. /*Compare adjacent cases and flag second matching ID with ABS(d2-d1) within 7 days*/. COMPUTE flag=ABS((Unique_IDentifier EQ LAG(Unique_IDentifier)) * DATEDIFF(date,LAG(date),'days')) LE 7. /* Add the flag to both records */. AGGREGATE OUTFILE * / BREAK Unique_IDentifier / ValidCase=MAX(flag). /* Add flagged records to raw data */. MATCH FILES /FILE rawdata/TABLE * / BY Unique_IDentifier. DATASET Name flagged. SELECT IF ValidCase.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Free forum by Nabble | Edit this page |