Finding matching cases across 2 datasets based on a Date variable

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

Finding matching cases across 2 datasets based on a Date variable

devoidx
This post was updated on .
Hi guys, new day, new problem.  So this time I have two datasets. dataset1 includes the patients (uniqueID) that saw a doctor for disease X (Diagnosis_var). dataset2 contains the same patients (uniqueID) used the drug Y(Drug_var). both datasets contain  a date variable which specifies the date that the patient either went to the doctor (in dataset1) or the date that the patient got his drug Y (in dataset2).

what I want to do is to some how match these two databases so that if a patient got his drug Y plus or minus 2 weeks of this doctor visit for Disease X, return 1, otherwise 0.

What complicates this is that each database can have multiple entries for the same patient since a patient can visit a doctor multiple times a year for the same reason or he might have gotten the same drug multiple times a year

So basically as long as one of the drug Y acquisition dates for a patient falls within two weeks (before or after) of one his doctor visits for disease X , I want a new variable lets say within2weeks=1. if none of the drug acquisitions dates of drug Y for a patient doesn't fall within 2 weeks of any of his clinic visits i want within2weeks=0.

in a nutshell, I want to come up with a list of patients that got drug Y because of disease X but since the disease database and the drug database are separate, and the only link between them is the UNIQUEID of the patient my only option is to find the drug Y acquisition dates that are close to when the patient went in for disease X checkup.

Any clue how I can go about this?

Thanks as always!

Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

Maguin, Eugene
Some clarifications.
Are you interested, for example, in only Diagnosis X = ICD756.89 only and Drug Y = PhenPhen or X as any legal diagnostic code and Y as any prescribable drug?
Just out of curiosity, how may a person be prescribed a drug and subsequently see a doctor and receive a diagnosis for which the already-prescribed drug is a qualifying treatment for the diagnosis?

Gene Maguin



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of devoidx
Sent: Friday, October 11, 2013 3:36 PM
To: [hidden email]
Subject: Finding matching cases across 2 datasets based on a Date variable

Hi guy, new day, new problem.  So this time I have two datasets. dataset1 includes the patients (uniqueID) that saw a doctor for disease X (Diagnosis_var). dataset2 contains the same patients (uniqueID) used the drug Y(Drug_var). both datasets contain  a date variable which specifies the date that the patient either went to the doctor (in dataset1) or the date that the patient got his drug Y (in dataset2).

what I want to do is to some how match these two databases so that if a patient got his drug Y plus or minus 2 weeks of this doctor visit for Disease X, return 1, otherwise 0.

What complicates this is that each database can have multiple entries for the same patient since a patient can visit a doctor multiple times a year for the same reason or he might have gotten the same drug multiple times a year

So basically as long as one of the drug Y acquisition dates for a patient falls within two weeks (before or after) of one his doctor visits for disease X , I want a new variable lets say within2weeks=1. if none of the drug acquisitions dates of drug Y for a patient doesn't fall within 2 weeks of any of his clinic visits i want within2weeks=0.

in a nutshell, I want to come up with a list of patients that got drug Y because of disease X but since the disease database and the drug database are separate, and the only link between them is the UNIQUEID of the patient my only option is to find the drug Y acquisition dates that are close to when the patient went in for disease X checkup.

Any clue how I can go about this?

Thanks as always!





--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Finding-matching-cases-across-2-datasets-based-on-a-Date-variable-tp5722507.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

devoidx
Answer to you questions:

My two datasets already contain ONLY my subpoulations of interest (so Dataset1 has only patients who visited for disease X) and dataset2 contains patients with disease X who acquired drug Y...but since Drug Y can be used for many other diseases than X, for me to be able to say that patient was prescriebed drug Y for disease X, the office visit date and drug aquistion dates must be very close to each other

To answer your other question, depending on which claim is filed first, (ie. medical or presription drug) ...if a patient gets his drug before his medical bill is filed, the date of his prescription drug will be before his date of medical visit claim record which is why I am doing plus minus 2 weeks of the doctor visit.

Hope this helps
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

Rich Ulrich
I am not sure about all the complications here.  But it sounds like
you can get what you want if you attach the date from dataset1 to
every record in dataset2 as the first step.

Then you do your date comparison and set the flag when appropriate,
and use AGG in order to reduce the data back to one record per person,
preserving the Flag value.

--
Rich Ulrich


> Date: Fri, 11 Oct 2013 13:48:34 -0700

> From: [hidden email]
> Subject: Re: Finding matching cases across 2 datasets based on a Date variable
> To: [hidden email]
>
> Answer to you questions:
>
> My two datasets already contain ONLY my subpoulations of interest (so
> Dataset1 has only patients who visited for disease X) and dataset2 contains
> patients with disease X who acquired drug Y...but since Drug Y can be used
> for many other diseases than X, for me to be able to say that patient was
> prescriebed drug Y for disease X, the office visit date and drug aquistion
> dates must be very close to each other
>
> To answer your other question, depending on which claim is filed first, (ie.
> medical or presription drug) ...if a patient gets his drug before his
> medical bill is filed, the date of his prescription drug will be before his
> date of medical visit claim record which is why I am doing plus minus 2
> weeks of the doctor visit.
>
> Hope this helps
>
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Finding-matching-cases-across-2-datasets-based-on-a-Date-variable-tp5722507p5722509.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

David Marso
Administrator
In reply to this post by devoidx


It might be helpful if you were to mock up an example of the two data sets (what variables exist in each of the 2 data sets?, A before/After snapshot of inputs and desired outputs etc.
In the meanwhile:
1. Look at the /BY subcommand on ADD FILES.
2.  Look at LEAD and LAG (on CREATE command) or better see SHIFT VALUES command.

devoidx wrote
Hi guys, new day, new problem.  So this time I have two datasets. dataset1 includes the patients (uniqueID) that saw a doctor for disease X (Diagnosis_var). dataset2 contains the same patients (uniqueID) used the drug Y(Drug_var). both datasets contain  a date variable which specifies the date that the patient either went to the doctor (in dataset1) or the date that the patient got his drug Y (in dataset2).

what I want to do is to some how match these two databases so that if a patient got his drug Y plus or minus 2 weeks of this doctor visit for Disease X, return 1, otherwise 0.

What complicates this is that each database can have multiple entries for the same patient since a patient can visit a doctor multiple times a year for the same reason or he might have gotten the same drug multiple times a year

So basically as long as one of the drug Y acquisition dates for a patient falls within two weeks (before or after) of one his doctor visits for disease X , I want a new variable lets say within2weeks=1. if none of the drug acquisitions dates of drug Y for a patient doesn't fall within 2 weeks of any of his clinic visits i want within2weeks=0.

in a nutshell, I want to come up with a list of patients that got drug Y because of disease X but since the disease database and the drug database are separate, and the only link between them is the UNIQUEID of the patient my only option is to find the drug Y acquisition dates that are close to when the patient went in for disease X checkup.

Any clue how I can go about this?

Thanks as always!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

devoidx
This post was updated on .
In reply to this post by Rich Ulrich
Rich, I am not sure how that is gonna work...each patient has 5-20 different office visits dates (ie. 5-20 difference cases in the dataset)  in dataset1 and also has 5-10 different drug aquisition dates (ie.5-10 separate cases) in dataset 2. Each date of drug acquisition needs to be compared with each date of office visit so for each patients there are 5-20 X 5-10 date comparisons that needs to be done
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

devoidx
This post was updated on .
Hi David, Here is a tiny mock up example:

Dataset1                                                                        Dataset2

ID           Diagnosis          ClinicDate                         ID            Drug        Aquistion Date                        
1                  X               19-aug-2012                        1               Y          25-Aug-2012
1                  X               21-Aug-2012                        1               Y          08-Nov-2012
1                  X               15 OCT-2012                        2              Y          28-May-2011
2                  X               13-June-2011                        2             Y          12-April-2011
2                  X               17-May-2011

My goal is to see if any of the drug aquisition dates for patientID=1 are within 2 weeks of any of the clinic dates of the same patient and return 1 if there is such occurence and 0 if there isn't...same with patient 2, etc.
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

David Marso
Administrator
Here is where I was hinting ;-)
I'll leave it to you to finish the thought ..
You will also want to LEAD and LAG the ID and InData1 flag to properly line up the ducks and then use CTIME.DAYS or DATEDIFF to thin the flock.
The ADD FILES with a BY is a powerful technique for interleaving cases.
Have a great weekend (It's Golden Monkey time here in PA so I'm out of here for now (You really don't want me posting after a couple Golden Monkeys hit my brain ;-) !.

DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
BEGIN DATA                                            
1                  X               19-aug-2012                      
1                  X               21-Aug-2012                        
1                  X               15-Oct-2012                      
2                  X               13-June-2011                    
2                  X               17-May-2011
END DATA.
SORT CASES BY ID ClinicDate.
COMPUTE COPYDATE=ClinicDate.
DATASET NAME data1.
DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
BEGIN DATA      
1               Y          25-Aug-2012
1               Y          08-Nov-2012
2              Y          28-May-2011
2             Y          12-April-2011
END DATA.

SORT CASES BY ID Aquistion.
COMPUTE COPYDATE=Aquistion.
DATASET NAME data2.

ADD FILES / FILE data1 / IN=INData1 / FILE data2 / BY ID COPYDATE.
SHIFT VALUES
  VARIABLE=CopyDate RESULT=PrevDate LAG=1
  /VARIABLE=CopyDate RESULT=NextDate LEAD=1.
FORMATS CopyDate PrevDate NextDate (DATE).
LIST.





ID Diagnosis  ClinicDate    COPYDATE Drug   Aquistion INData1    PrevDate    NextDate

 1 X         19-AUG-2012 19-AUG-2012                .    1              . 21-AUG-2012
 1 X         21-AUG-2012 21-AUG-2012                .    1    19-AUG-2012 25-AUG-2012
 1                     . 25-AUG-2012 Y    25-AUG-2012    0    21-AUG-2012 15-OCT-2012
 1 X         15-OCT-2012 15-OCT-2012                .    1    25-AUG-2012 08-NOV-2012
 1                     . 08-NOV-2012 Y    08-NOV-2012    0    15-OCT-2012 12-APR-2011
 2                     . 12-APR-2011 Y    12-APR-2011    0    08-NOV-2012 17-MAY-2011
 2 X         17-MAY-2011 17-MAY-2011                .    1    12-APR-2011 28-MAY-2011
 2                     . 28-MAY-2011 Y    28-MAY-2011    0    17-MAY-2011 13-JUN-2011
 2 X         13-JUN-2011 13-JUN-2011                .    1    28-MAY-2011           .


Number of cases read:  9    Number of cases listed:  9


devoidx wrote
Hi David, Here is a tiny mock up example:

Dataset1                                                                        Dataset2

ID           Diagnosis          ClinicDate                         ID            Drug        Aquistion Date                        
1                  X               19-aug-2012                        1               Y          25-Aug-2012
1                  X               21-Aug-2012                        1               Y          08-Nov-2012
1                  X               15 OCT-2012                        2              Y          28-May-2011
2                  X               13-June-2011                        2             Y          12-April-2011
2                  X               17-May-2011

My goal is to see if any of the drug aquisition dates for patientID=1 are within 2 weeks of any of the clinic dates of the same patient and return 1 if there is such occurence and 0 if there isn't...same with patient 2, etc.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

devoidx
Nice David, that looks like a pretty clever strategy! im gonna implement it on my own database, hopefully it will go smooth.

Thanks again!
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

David Marso
Administrator
In reply to this post by David Marso
Ah hell, Girlfriend is running late and I'll end up corpse-like if I begin sucking suds without her ;-)))
Here is the rest of the Marso-Madness for your perusal, study and possible epiphany .

SHIFT VALUES
  VARIABLE=CopyDate RESULT=PrevDate LAG=1
  /VARIABLE=CopyDate RESULT=NextDate LEAD=1
 /VARIABLE=ID RESULT=PrevID LAG=1
  /VARIABLE=ID RESULT=NextID LEAD=1
 /VARIABLE=INDATA1 RESULT=PrevSource LAG=1
/VARIABLE=INDATA1 RESULT=NextSource LEAD=1.


DO IF NOT(InData1).
IF ID=PrevID  AND PrevSource DeltaPrev=CTIME.DAYS(CopyDate-PrevDate).
IF ID=NextID  AND NextSource DeltaNext=CTIME.DAYS(NextDate-CopyDate).
COMPUTE Within2Weeks=ANY(1,DeltaPrev LE 14,DeltaNext LE 14).
END IF.
FORMATS CopyDate PrevDate NextDate (DATE).

* Alternatively for last piece (Flanked by the DO IF .... END IF block *
-Downside is you don't get the deltas..
-Upside is you don't add 2 new variables to the file.
-DS2:  You may be muttering WTF to yourself 6 mos from now if you don't
          comment the bejeebus out of it.

COMPUTE Within2Weeks_Alt=
    NOT(InData1) AND
        ((ID=PrevID  AND PrevSource AND CTIME.DAYS(CopyDate-PrevDate) LE 14)
   OR  (ID=NextID  AND NextSource AND CTIME.DAYS(NextDate-CopyDate) LE 14)).

Afterthought you could Flag the med file rather than the Doc file  and avoid the NOT(InData1)
I find Negatives more difficult to process mentally.  
This has been shown in Psych studies of reasoning as well .

David Marso wrote
Here is where I was hinting ;-)
I'll leave it to you to finish the thought ..
You will also want to LEAD and LAG the ID and InData1 flag to properly line up the ducks and then use CTIME.DAYS or DATEDIFF to thin the flock.
The ADD FILES with a BY is a powerful technique for interleaving cases.
Have a great weekend (It's Golden Monkey time here in PA so I'm out of here for now (You really don't want me posting after a couple Golden Monkeys hit my brain ;-) !.

DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
BEGIN DATA                                            
1                  X               19-aug-2012                      
1                  X               21-Aug-2012                        
1                  X               15-Oct-2012                      
2                  X               13-June-2011                    
2                  X               17-May-2011
END DATA.
SORT CASES BY ID ClinicDate.
COMPUTE COPYDATE=ClinicDate.
DATASET NAME data1.
DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
BEGIN DATA      
1               Y          25-Aug-2012
1               Y          08-Nov-2012
2              Y          28-May-2011
2             Y          12-April-2011
END DATA.

SORT CASES BY ID Aquistion.
COMPUTE COPYDATE=Aquistion.
DATASET NAME data2.

ADD FILES / FILE data1 / IN=INData1 / FILE data2 / BY ID COPYDATE.
SHIFT VALUES
  VARIABLE=CopyDate RESULT=PrevDate LAG=1
  /VARIABLE=CopyDate RESULT=NextDate LEAD=1.
FORMATS CopyDate PrevDate NextDate (DATE).
LIST.





ID Diagnosis  ClinicDate    COPYDATE Drug   Aquistion INData1    PrevDate    NextDate

 1 X         19-AUG-2012 19-AUG-2012                .    1              . 21-AUG-2012
 1 X         21-AUG-2012 21-AUG-2012                .    1    19-AUG-2012 25-AUG-2012
 1                     . 25-AUG-2012 Y    25-AUG-2012    0    21-AUG-2012 15-OCT-2012
 1 X         15-OCT-2012 15-OCT-2012                .    1    25-AUG-2012 08-NOV-2012
 1                     . 08-NOV-2012 Y    08-NOV-2012    0    15-OCT-2012 12-APR-2011
 2                     . 12-APR-2011 Y    12-APR-2011    0    08-NOV-2012 17-MAY-2011
 2 X         17-MAY-2011 17-MAY-2011                .    1    12-APR-2011 28-MAY-2011
 2                     . 28-MAY-2011 Y    28-MAY-2011    0    17-MAY-2011 13-JUN-2011
 2 X         13-JUN-2011 13-JUN-2011                .    1    28-MAY-2011           .


Number of cases read:  9    Number of cases listed:  9


devoidx wrote
Hi David, Here is a tiny mock up example:

Dataset1                                                                        Dataset2

ID           Diagnosis          ClinicDate                         ID            Drug        Aquistion Date                        
1                  X               19-aug-2012                        1               Y          25-Aug-2012
1                  X               21-Aug-2012                        1               Y          08-Nov-2012
1                  X               15 OCT-2012                        2              Y          28-May-2011
2                  X               13-June-2011                        2             Y          12-April-2011
2                  X               17-May-2011

My goal is to see if any of the drug aquisition dates for patientID=1 are within 2 weeks of any of the clinic dates of the same patient and return 1 if there is such occurence and 0 if there isn't...same with patient 2, etc.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

David Marso
Administrator

OK (1 Monkey + 1 Bud) :
Slicker than snot on a doorknob version consolidated ;-)
Minimized # of new variables created (can't avoid it on the LEADS can on the LAGS)
Note RENAMES on ADD
Likely an issue considering you probably are still dealing with a HUGE dataset.

DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
BEGIN DATA                                            
1                  X               19-aug-2012                      
1                  X               21-Aug-2012                        
1                  X               15-Oct-2012                      
2                  X               13-June-2011                    
2                  X               17-May-2011
END DATA.

SORT CASES BY ID ClinicDate.
*COMPUTE COPYDATE=ClinicDate.
DATASET NAME data1.

DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
BEGIN DATA      
1               Y          25-Aug-2012
1               Y          08-Nov-2012
2              Y          28-May-2011
2             Y          12-April-2011
END DATA.

SORT CASES BY ID Aquistion.
*COMPUTE COPYDATE=Aquistion.
DATASET NAME data2.

ADD FILES
  / FILE data1 / RENAME(ClinicDate=CopyDate)
  / FILE data2 / RENAME (Aquistion=CopyDate)/ IN=INData2
  / BY ID COPYDATE.

SHIFT VALUES
   VARIABLE=CopyDate RESULT=NextDate   LEAD=1
  /VARIABLE=ID       RESULT=NextID     LEAD=1
  /VARIABLE=InData2  RESULT=NextSource LEAD=1.

COMPUTE WithIn2Weeks=
   InData2
  AND
    ((NOT(LAG(InData2)) AND ID EQ LAG(ID) AND CTIME.DAYS(CopyDate-LAG(CopyDate)) LE 14)
  OR (NOT(NextSource)   AND ID EQ NextID  AND CTIME.DAYS(NextDate-CopyDate)      LE 14)).

FORMATS CopyDate NextDate (DATE).
LIST.

ID Diagnosis    CopyDate Drug INData2    NextDate NextID NextSource WithIn2Weeks

 1 X         19-AUG-2012         0    21-AUG-2012    1        0            .00
 1 X         21-AUG-2012         0    25-AUG-2012    1        1            .00
 1           25-AUG-2012 Y       1    15-OCT-2012    1        0           1.00
 1 X         15-OCT-2012         0    08-NOV-2012    1        1            .00
 1           08-NOV-2012 Y       1    12-APR-2011    2        1            .00
 2           12-APR-2011 Y       1    17-MAY-2011    2        0            .00
 2 X         17-MAY-2011         0    28-MAY-2011    2        1            .00
 2           28-MAY-2011 Y       1    13-JUN-2011    2        0           1.00
 2 X         13-JUN-2011         0              .    .        .            .00


Number of cases read:  9    Number of cases listed:  9

David Marso wrote
Ah hell, Girlfriend is running late and I'll end up corpse-like if I begin sucking suds without her ;-)))
Here is the rest of the Marso-Madness for your perusal, study and possible epiphany .

SHIFT VALUES
  VARIABLE=CopyDate RESULT=PrevDate LAG=1
  /VARIABLE=CopyDate RESULT=NextDate LEAD=1
 /VARIABLE=ID RESULT=PrevID LAG=1
  /VARIABLE=ID RESULT=NextID LEAD=1
 /VARIABLE=INDATA1 RESULT=PrevSource LAG=1
/VARIABLE=INDATA1 RESULT=NextSource LEAD=1.


DO IF NOT(InData1).
IF ID=PrevID  AND PrevSource DeltaPrev=CTIME.DAYS(CopyDate-PrevDate).
IF ID=NextID  AND NextSource DeltaNext=CTIME.DAYS(NextDate-CopyDate).
COMPUTE Within2Weeks=ANY(1,DeltaPrev LE 14,DeltaNext LE 14).
END IF.
FORMATS CopyDate PrevDate NextDate (DATE).

* Alternatively for last piece (Flanked by the DO IF .... END IF block *
-Downside is you don't get the deltas..
-Upside is you don't add 2 new variables to the file.
-DS2:  You may be muttering WTF to yourself 6 mos from now if you don't
          comment the bejeebus out of it.

COMPUTE Within2Weeks_Alt=
    NOT(InData1) AND
        ((ID=PrevID  AND PrevSource AND CTIME.DAYS(CopyDate-PrevDate) LE 14)
   OR  (ID=NextID  AND NextSource AND CTIME.DAYS(NextDate-CopyDate) LE 14)).

Afterthought you could Flag the med file rather than the Doc file  and avoid the NOT(InData1)
I find Negatives more difficult to process mentally.  
This has been shown in Psych studies of reasoning as well .

David Marso wrote
Here is where I was hinting ;-)
I'll leave it to you to finish the thought ..
You will also want to LEAD and LAG the ID and InData1 flag to properly line up the ducks and then use CTIME.DAYS or DATEDIFF to thin the flock.
The ADD FILES with a BY is a powerful technique for interleaving cases.
Have a great weekend (It's Golden Monkey time here in PA so I'm out of here for now (You really don't want me posting after a couple Golden Monkeys hit my brain ;-) !.

DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
BEGIN DATA                                            
1                  X               19-aug-2012                      
1                  X               21-Aug-2012                        
1                  X               15-Oct-2012                      
2                  X               13-June-2011                    
2                  X               17-May-2011
END DATA.
SORT CASES BY ID ClinicDate.
COMPUTE COPYDATE=ClinicDate.
DATASET NAME data1.
DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
BEGIN DATA      
1               Y          25-Aug-2012
1               Y          08-Nov-2012
2              Y          28-May-2011
2             Y          12-April-2011
END DATA.

SORT CASES BY ID Aquistion.
COMPUTE COPYDATE=Aquistion.
DATASET NAME data2.

ADD FILES / FILE data1 / IN=INData1 / FILE data2 / BY ID COPYDATE.
SHIFT VALUES
  VARIABLE=CopyDate RESULT=PrevDate LAG=1
  /VARIABLE=CopyDate RESULT=NextDate LEAD=1.
FORMATS CopyDate PrevDate NextDate (DATE).
LIST.





ID Diagnosis  ClinicDate    COPYDATE Drug   Aquistion INData1    PrevDate    NextDate

 1 X         19-AUG-2012 19-AUG-2012                .    1              . 21-AUG-2012
 1 X         21-AUG-2012 21-AUG-2012                .    1    19-AUG-2012 25-AUG-2012
 1                     . 25-AUG-2012 Y    25-AUG-2012    0    21-AUG-2012 15-OCT-2012
 1 X         15-OCT-2012 15-OCT-2012                .    1    25-AUG-2012 08-NOV-2012
 1                     . 08-NOV-2012 Y    08-NOV-2012    0    15-OCT-2012 12-APR-2011
 2                     . 12-APR-2011 Y    12-APR-2011    0    08-NOV-2012 17-MAY-2011
 2 X         17-MAY-2011 17-MAY-2011                .    1    12-APR-2011 28-MAY-2011
 2                     . 28-MAY-2011 Y    28-MAY-2011    0    17-MAY-2011 13-JUN-2011
 2 X         13-JUN-2011 13-JUN-2011                .    1    28-MAY-2011           .


Number of cases read:  9    Number of cases listed:  9


devoidx wrote
Hi David, Here is a tiny mock up example:

Dataset1                                                                        Dataset2

ID           Diagnosis          ClinicDate                         ID            Drug        Aquistion Date                        
1                  X               19-aug-2012                        1               Y          25-Aug-2012
1                  X               21-Aug-2012                        1               Y          08-Nov-2012
1                  X               15 OCT-2012                        2              Y          28-May-2011
2                  X               13-June-2011                        2             Y          12-April-2011
2                  X               17-May-2011

My goal is to see if any of the drug aquisition dates for patientID=1 are within 2 weeks of any of the clinic dates of the same patient and return 1 if there is such occurence and 0 if there isn't...same with patient 2, etc.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

hillel vardi
In reply to this post by devoidx

Shalom

If there is only one acquisition per one
doctor visit  using “lag” or “shift value” will work, but if there is an unknown acquisitions per visit 2 visits that fits the same acquisitions I believe One should use some king of “leave” .

Here is the way I will try to solve the problem .

 


* >>> create an example file from the SPSS sample files  .
GET
  FILE='C:\Program Files (x86)\EpiTools\SPSSInc\PASWStatistics18\Samples\English\insurance_claims.sav'
 / keep=  income   incident_date         retire    .
RENAME VARIABLES ( income   incident_date         retire=sno date case) .
VALUE LABELS   case 1 'doctor visit' 0 'drug  acquisition dates'
formats  date(edate10).
DATASET NAME dates.
select if  sno  ge 40 and sno le 80 .
*>>>  end of create example file.

sort cases by sno date.
select if  case eq 0 and date gt lag(date,1) or case eq 1 .
if case eq 1  #casedate=date .
if case eq 0 and (sno gt lag(sno,1))  #casedate=$sysmis .
if case eq 0   diff_affter =  datediff(date,#casedate,'days') .
sort cases by sno(a) date(d).
if case eq 1  #casedate=date .
if case eq 0 and (sno gt lag(sno,1))  #casedate=$sysmis .
if case eq 0   diff_befor =  datediff(date,#casedate,'days') .
execute .
recode diff_affter(15 thru hi=sysmis) .
recode diff_befor(lo thru -15  1 thru hi=sysmis) .
select if   case eq 1  or  diff_affter le 14 or diff_befor ge -14 .
sort cases by sno date.


Hillel vardi
bgu

On 11/10/2013 22:35, devoidx wrote:
Hi guy, new day, new problem.  So this time I have two datasets. dataset1
includes the patients (uniqueID) that saw a doctor for disease X
(Diagnosis_var). dataset2 contains the same patients (uniqueID) used the
drug Y(Drug_var). both datasets contain  a date variable which specifies the
date that the patient either went to the doctor (in dataset1) or the date
that the patient got his drug Y (in dataset2).

what I want to do is to some how match these two databases so that if a
patient got his drug Y plus or minus 2 weeks of this doctor visit for
Disease X, return 1, otherwise 0.

What complicates this is that each database can have multiple entries for
the same patient since a patient can visit a doctor multiple times a year
for the same reason or he might have gotten the same drug multiple times a
year

So basically as long as one of the drug Y acquisition dates for a patient
falls within two weeks (before or after) of one his doctor visits for
disease X , I want a new variable lets say within2weeks=1. if none of the
drug acquisitions dates of drug Y for a patient doesn't fall within 2 weeks
of any of his clinic visits i want within2weeks=0.

in a nutshell, I want to come up with a list of patients that got drug Y
because of disease X but since the disease database and the drug database
are separate, and the only link between them is the UNIQUEID of the patient
my only option is to find the drug Y acquisition dates that are close to
when the patient went in for disease X checkup.

Any clue how I can go about this?

Thanks as always!





--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Finding-matching-cases-across-2-datasets-based-on-a-Date-variable-tp5722507.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

Andy W
In reply to this post by David Marso
Another option besides David's are to flatten one of the files, then do the merge, then do repeat to see if the diagnosis + drug dates fall within the specified period. If you really only have 5-20 repeat per IDs in each set that isn't so burdensome.  

*************************************************************************************.
DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
BEGIN DATA                                            
1                  X               19-aug-2012                      
1                  X               21-Aug-2012      
1                  X               27-Aug-2012                          
1                  X               15-Oct-2012                      
2                  X               13-June-2011                    
2                  X               17-May-2011
END DATA.
DATASET NAME Diag.
SORT CASES BY ID ClinicDate.
DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
BEGIN DATA    
1               Y          18-Aug-2012    
1               Y          20-Aug-2012    
1               Y          25-Aug-2012
1               Y          30-Aug-2012
1               Y          08-Nov-2012
2              Y          28-May-2011
2             Y          12-April-2011
2              Y          12-June-2011
2              Y          15-June-2011
2              Z          17-June-2011
END DATA.
DATASET NAME Drug.
SORT CASES BY ID Aquistion.
*Flatten the drug file.
CASESTOVARS
/ID = ID
/SEPARATOR = "_".
*MERGE BACK INTO Diag file.
DATASET ACTIVATE Diag.
MATCH FILES FILE = *
/TABLE = 'Drug'
/BY ID.
*Flag for if any drug is within two weeks.
COMPUTE TwoWeek = 0.
DO REPEAT DrugD = Aquistion_1 to Aquistion_5.
  IF ABS(CTIME.DAYS(DrugD-ClinicDate)) LE 14 TwoWeek = 1.
END REPEAT.
EXECUTE.
*************************************************************************************.

David's approach seems to answer the question at hand - although I would note the LAG-LEAD approach is difficult if you have an indeterminate number of potential matches (e.g. more than 1 drug dates within 2 weeks of a diagnoses) - although if all you want is a flag for any drug date is within the time period then it doesn't matter. Flattening one of the files allows you to extract more info though, such as multiple drug dates within two weeks.

A more ghoulish suggestion might be to compute all 29 days (14 before + 14 after + exact date) and then do 29 match files for the exact dates (or better VARSTOCASES and then do one MATCH FILES). I don't think this would ever be beneficial over the other suggestions, but it does have the benefit that (besides correct sorting) you only need to manipulate one of the files. (So if one is really big and the other small you can just play with the small one.)
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

Maguin, Eugene
In reply to this post by David Marso
Ok, "Golden Monkey" is too good to pass up on considering the "FM" from a while back. So, could Golden Monkey be a very local to Philly craft beer?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Friday, October 11, 2013 10:37 PM
To: [hidden email]
Subject: Re: Finding matching cases across 2 datasets based on a Date variable

OK (1 Monkey + 1 Bud) :
Slicker than snot on a doorknob version consolidated ;-) Minimized # of new variables created (can't avoid it on the LEADS can on the
LAGS)
Note RENAMES on ADD
Likely an issue considering you probably are still dealing with a HUGE dataset.

DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
BEGIN DATA
1                  X               19-aug-2012
1                  X               21-Aug-2012
1                  X               15-Oct-2012
2                  X               13-June-2011
2                  X               17-May-2011
END DATA.

SORT CASES BY ID ClinicDate.
*COMPUTE COPYDATE=ClinicDate.
DATASET NAME data1.

DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
BEGIN DATA
1               Y          25-Aug-2012
1               Y          08-Nov-2012
2              Y          28-May-2011
2             Y          12-April-2011
END DATA.

SORT CASES BY ID Aquistion.
*COMPUTE COPYDATE=Aquistion.
DATASET NAME data2.

ADD FILES
  / FILE data1 / RENAME(ClinicDate=CopyDate)
  / FILE data2 / RENAME (Aquistion=CopyDate)/ IN=INData2
  / BY ID COPYDATE.

SHIFT VALUES
   VARIABLE=CopyDate RESULT=NextDate   LEAD=1
  /VARIABLE=ID       RESULT=NextID     LEAD=1
  /VARIABLE=InData2  RESULT=NextSource LEAD=1.

COMPUTE WithIn2Weeks=
   InData2
  AND
    ((NOT(LAG(InData2)) AND ID EQ LAG(ID) AND
CTIME.DAYS(CopyDate-LAG(CopyDate)) LE 14)
  OR (NOT(NextSource)   AND ID EQ NextID  AND CTIME.DAYS(NextDate-CopyDate)
LE 14)).

FORMATS CopyDate NextDate (DATE).
LIST.

ID Diagnosis    CopyDate Drug INData2    NextDate NextID NextSource
WithIn2Weeks

 1 X         19-AUG-2012         0    21-AUG-2012    1        0
.00
 1 X         21-AUG-2012         0    25-AUG-2012    1        1
.00
 1           25-AUG-2012 Y       1    15-OCT-2012    1        0
1.00
 1 X         15-OCT-2012         0    08-NOV-2012    1        1
.00
 1           08-NOV-2012 Y       1    12-APR-2011    2        1
.00
 2           12-APR-2011 Y       1    17-MAY-2011    2        0
.00
 2 X         17-MAY-2011         0    28-MAY-2011    2        1
.00
 2           28-MAY-2011 Y       1    13-JUN-2011    2        0
1.00
 2 X         13-JUN-2011         0              .    .        .
.00


Number of cases read:  9    Number of cases listed:  9


David Marso wrote
> Ah hell, Girlfriend is running late and I'll end up corpse-like if I
> begin sucking suds without her ;-))) Here is the rest of the
> Marso-Madness for your perusal, study and possible epiphany .
>
> SHIFT VALUES
>   VARIABLE=CopyDate RESULT=PrevDate LAG=1
>   /VARIABLE=CopyDate RESULT=NextDate LEAD=1
*
>  /VARIABLE=ID RESULT=PrevID LAG=1
>   /VARIABLE=ID RESULT=NextID LEAD=1
>  /VARIABLE=INDATA1 RESULT=PrevSource LAG=1
> /VARIABLE=INDATA1 RESULT=NextSource LEAD=1.
*

>
> DO IF NOT(InData1).
> IF ID=PrevID  AND PrevSource DeltaPrev=CTIME.DAYS(CopyDate-PrevDate).
> IF ID=NextID  AND NextSource DeltaNext=CTIME.DAYS(NextDate-CopyDate).
> COMPUTE Within2Weeks=ANY(1,DeltaPrev LE 14,DeltaNext LE 14).
> END IF.
> FORMATS CopyDate PrevDate NextDate (DATE).
>
> * Alternatively for last piece (Flanked by the DO IF .... END IF block
> * -Downside is you don't get the deltas..
> -Upside is you don't add 2 new variables to the file.
> -DS2:  You may be muttering WTF to yourself 6 mos from now if you don't
>           comment the bejeebus out of it.
>
> COMPUTE Within2Weeks_Alt=
>     NOT(InData1) AND
>         ((ID=PrevID  AND PrevSource AND CTIME.DAYS(CopyDate-PrevDate)
> LE
> 14)
>    OR  (ID=NextID  AND NextSource AND CTIME.DAYS(NextDate-CopyDate) LE
> 14)).
>
> Afterthought you could Flag the med file rather than the Doc file  and
> avoid the NOT(InData1) I find Negatives more difficult to process
> mentally.
> This has been shown in Psych studies of reasoning as well .
> David Marso wrote
>> Here is where I was hinting ;-)
>> I'll leave it to you to finish the thought ..
>> You will also want to LEAD and LAG the ID and InData1 flag to
>> properly line up the ducks and then use CTIME.DAYS or DATEDIFF to thin the flock.
>> The ADD FILES with a BY is a powerful technique for interleaving cases.
>> Have a great weekend (It's Golden Monkey time here in PA so I'm out
>> of here for now (You really don't want me posting after a couple
>> Golden Monkeys hit my brain ;-) !.
>>
>> DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
>> BEGIN DATA
>> 1                  X               19-aug-2012
>> 1                  X               21-Aug-2012
>> 1                  X               15-Oct-2012
>> 2                  X               13-June-2011
>> 2                  X               17-May-2011
>> END DATA.
>> SORT CASES BY ID ClinicDate.
>> COMPUTE COPYDATE=ClinicDate.
>> DATASET NAME data1.
>> DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
>> BEGIN DATA
>> 1               Y          25-Aug-2012
>> 1               Y          08-Nov-2012
>> 2              Y          28-May-2011
>> 2             Y          12-April-2011
>> END DATA.
>>
>> SORT CASES BY ID Aquistion.
>> COMPUTE COPYDATE=Aquistion.
>> DATASET NAME data2.
>>
>> ADD FILES / FILE data1 / IN=INData1 / FILE data2 / BY ID COPYDATE.
>> SHIFT VALUES
>>   VARIABLE=CopyDate RESULT=PrevDate LAG=1
>>   /VARIABLE=CopyDate RESULT=NextDate LEAD=1.
>> FORMATS CopyDate PrevDate NextDate (DATE).
>> LIST.
>>
>>
>>
>>
>>
>> ID Diagnosis  ClinicDate    COPYDATE Drug   Aquistion INData1    PrevDate
>> NextDate
>>
>>  1 X         19-AUG-2012 19-AUG-2012                .    1              .
>> 21-AUG-2012
>>  1 X         21-AUG-2012 21-AUG-2012                .    1    19-AUG-2012
>> 25-AUG-2012
>>  1                     . 25-AUG-2012 Y    25-AUG-2012    0    21-AUG-2012
>> 15-OCT-2012
>>  1 X         15-OCT-2012 15-OCT-2012                .    1    25-AUG-2012
>> 08-NOV-2012
>>  1                     . 08-NOV-2012 Y    08-NOV-2012    0    15-OCT-2012
>> 12-APR-2011
>>  2                     . 12-APR-2011 Y    12-APR-2011    0    08-NOV-2012
>> 17-MAY-2011
>>  2 X         17-MAY-2011 17-MAY-2011                .    1    12-APR-2011
>> 28-MAY-2011
>>  2                     . 28-MAY-2011 Y    28-MAY-2011    0    17-MAY-2011
>> 13-JUN-2011
>>  2 X         13-JUN-2011 13-JUN-2011                .    1    28-MAY-2011
>> .
>>
>>
>> Number of cases read:  9    Number of cases listed:  9
>>
>> devoidx wrote
>>> Hi David, Here is a tiny mock up example:
>>>
>>> Dataset1
>>> Dataset2
>>>
>>> ID           Diagnosis          ClinicDate                         ID
>>> Drug        Aquistion Date
>>> 1                  X               19-aug-2012                        1
>>> Y          25-Aug-2012
>>> 1                  X               21-Aug-2012                        1
>>> Y          08-Nov-2012
>>> 1                  X               15 OCT-2012                        2
>>> Y          28-May-2011
>>> 2                  X               13-June-2011                        2
>>> Y          12-April-2011
>>> 2                  X               17-May-2011
>>>
>>> My goal is to see if any of the drug aquisition dates for
>>> patientID=1 are within 2 weeks of any of the clinic dates of the
>>> same patient and return 1 if there is such occurence and 0 if there
>>> isn't...same with patient 2, etc.





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Finding-matching-cases-across-2-datasets-based-on-a-Date-variable-tp5722507p5722519.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

David Marso
Administrator
Golden Monkey according to the Victory Brewing website distributes to 30 states as well as Japan and Singapore.  It sports a 9.5 ABV, so it definitely packs a wallop.  A bit too cloyingly sweet for a session beer but a true treat for fans of Belgian style tripels.  I tend to prefer hoppier styles myself, but do enjoy my Belgian styles.  I brew my own beers and mead (honey wine) and plan to one day create a Golden Monkey 'clone' (5 gal might take awhile to consume, so probably will brew a second batch of something more along the lines if an IPA or Czech Pilsner style- made a barley wine awhile back.  It took over a year to ferment through and we let it bottle condition for another year-).  Brewing is a fun hobby.  Don't expect to save much money over and above buying 6 packs unless doing whole grain (more effort and mess/equipment than I have time/patience to invest).

What is the "FM" from a while back you refer to?

Maguin, Eugene wrote
Ok, "Golden Monkey" is too good to pass up on considering the "FM" from a while back. So, could Golden Monkey be a very local to Philly craft beer?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Friday, October 11, 2013 10:37 PM
To: [hidden email]
Subject: Re: Finding matching cases across 2 datasets based on a Date variable

OK (1 Monkey + 1 Bud) :
Slicker than snot on a doorknob version consolidated ;-) Minimized # of new variables created (can't avoid it on the LEADS can on the
LAGS)
Note RENAMES on ADD
Likely an issue considering you probably are still dealing with a HUGE dataset.

DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
BEGIN DATA
1                  X               19-aug-2012
1                  X               21-Aug-2012
1                  X               15-Oct-2012
2                  X               13-June-2011
2                  X               17-May-2011
END DATA.

SORT CASES BY ID ClinicDate.
*COMPUTE COPYDATE=ClinicDate.
DATASET NAME data1.

DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
BEGIN DATA
1               Y          25-Aug-2012
1               Y          08-Nov-2012
2              Y          28-May-2011
2             Y          12-April-2011
END DATA.

SORT CASES BY ID Aquistion.
*COMPUTE COPYDATE=Aquistion.
DATASET NAME data2.

ADD FILES
  / FILE data1 / RENAME(ClinicDate=CopyDate)
  / FILE data2 / RENAME (Aquistion=CopyDate)/ IN=INData2
  / BY ID COPYDATE.

SHIFT VALUES
   VARIABLE=CopyDate RESULT=NextDate   LEAD=1
  /VARIABLE=ID       RESULT=NextID     LEAD=1
  /VARIABLE=InData2  RESULT=NextSource LEAD=1.

COMPUTE WithIn2Weeks=
   InData2
  AND
    ((NOT(LAG(InData2)) AND ID EQ LAG(ID) AND
CTIME.DAYS(CopyDate-LAG(CopyDate)) LE 14)
  OR (NOT(NextSource)   AND ID EQ NextID  AND CTIME.DAYS(NextDate-CopyDate)
LE 14)).

FORMATS CopyDate NextDate (DATE).
LIST.

ID Diagnosis    CopyDate Drug INData2    NextDate NextID NextSource
WithIn2Weeks

 1 X         19-AUG-2012         0    21-AUG-2012    1        0
.00
 1 X         21-AUG-2012         0    25-AUG-2012    1        1
.00
 1           25-AUG-2012 Y       1    15-OCT-2012    1        0
1.00
 1 X         15-OCT-2012         0    08-NOV-2012    1        1
.00
 1           08-NOV-2012 Y       1    12-APR-2011    2        1
.00
 2           12-APR-2011 Y       1    17-MAY-2011    2        0
.00
 2 X         17-MAY-2011         0    28-MAY-2011    2        1
.00
 2           28-MAY-2011 Y       1    13-JUN-2011    2        0
1.00
 2 X         13-JUN-2011         0              .    .        .
.00


Number of cases read:  9    Number of cases listed:  9


David Marso wrote
> Ah hell, Girlfriend is running late and I'll end up corpse-like if I
> begin sucking suds without her ;-))) Here is the rest of the
> Marso-Madness for your perusal, study and possible epiphany .
>
> SHIFT VALUES
>   VARIABLE=CopyDate RESULT=PrevDate LAG=1
>   /VARIABLE=CopyDate RESULT=NextDate LEAD=1
*
>  /VARIABLE=ID RESULT=PrevID LAG=1
>   /VARIABLE=ID RESULT=NextID LEAD=1
>  /VARIABLE=INDATA1 RESULT=PrevSource LAG=1
> /VARIABLE=INDATA1 RESULT=NextSource LEAD=1.
*
>
> DO IF NOT(InData1).
> IF ID=PrevID  AND PrevSource DeltaPrev=CTIME.DAYS(CopyDate-PrevDate).
> IF ID=NextID  AND NextSource DeltaNext=CTIME.DAYS(NextDate-CopyDate).
> COMPUTE Within2Weeks=ANY(1,DeltaPrev LE 14,DeltaNext LE 14).
> END IF.
> FORMATS CopyDate PrevDate NextDate (DATE).
>
> * Alternatively for last piece (Flanked by the DO IF .... END IF block
> * -Downside is you don't get the deltas..
> -Upside is you don't add 2 new variables to the file.
> -DS2:  You may be muttering WTF to yourself 6 mos from now if you don't
>           comment the bejeebus out of it.
>
> COMPUTE Within2Weeks_Alt=
>     NOT(InData1) AND
>         ((ID=PrevID  AND PrevSource AND CTIME.DAYS(CopyDate-PrevDate)
> LE
> 14)
>    OR  (ID=NextID  AND NextSource AND CTIME.DAYS(NextDate-CopyDate) LE
> 14)).
>
> Afterthought you could Flag the med file rather than the Doc file  and
> avoid the NOT(InData1) I find Negatives more difficult to process
> mentally.
> This has been shown in Psych studies of reasoning as well .
> David Marso wrote
>> Here is where I was hinting ;-)
>> I'll leave it to you to finish the thought ..
>> You will also want to LEAD and LAG the ID and InData1 flag to
>> properly line up the ducks and then use CTIME.DAYS or DATEDIFF to thin the flock.
>> The ADD FILES with a BY is a powerful technique for interleaving cases.
>> Have a great weekend (It's Golden Monkey time here in PA so I'm out
>> of here for now (You really don't want me posting after a couple
>> Golden Monkeys hit my brain ;-) !.
>>
>> DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
>> BEGIN DATA
>> 1                  X               19-aug-2012
>> 1                  X               21-Aug-2012
>> 1                  X               15-Oct-2012
>> 2                  X               13-June-2011
>> 2                  X               17-May-2011
>> END DATA.
>> SORT CASES BY ID ClinicDate.
>> COMPUTE COPYDATE=ClinicDate.
>> DATASET NAME data1.
>> DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
>> BEGIN DATA
>> 1               Y          25-Aug-2012
>> 1               Y          08-Nov-2012
>> 2              Y          28-May-2011
>> 2             Y          12-April-2011
>> END DATA.
>>
>> SORT CASES BY ID Aquistion.
>> COMPUTE COPYDATE=Aquistion.
>> DATASET NAME data2.
>>
>> ADD FILES / FILE data1 / IN=INData1 / FILE data2 / BY ID COPYDATE.
>> SHIFT VALUES
>>   VARIABLE=CopyDate RESULT=PrevDate LAG=1
>>   /VARIABLE=CopyDate RESULT=NextDate LEAD=1.
>> FORMATS CopyDate PrevDate NextDate (DATE).
>> LIST.
>>
>>
>>
>>
>>
>> ID Diagnosis  ClinicDate    COPYDATE Drug   Aquistion INData1    PrevDate
>> NextDate
>>
>>  1 X         19-AUG-2012 19-AUG-2012                .    1              .
>> 21-AUG-2012
>>  1 X         21-AUG-2012 21-AUG-2012                .    1    19-AUG-2012
>> 25-AUG-2012
>>  1                     . 25-AUG-2012 Y    25-AUG-2012    0    21-AUG-2012
>> 15-OCT-2012
>>  1 X         15-OCT-2012 15-OCT-2012                .    1    25-AUG-2012
>> 08-NOV-2012
>>  1                     . 08-NOV-2012 Y    08-NOV-2012    0    15-OCT-2012
>> 12-APR-2011
>>  2                     . 12-APR-2011 Y    12-APR-2011    0    08-NOV-2012
>> 17-MAY-2011
>>  2 X         17-MAY-2011 17-MAY-2011                .    1    12-APR-2011
>> 28-MAY-2011
>>  2                     . 28-MAY-2011 Y    28-MAY-2011    0    17-MAY-2011
>> 13-JUN-2011
>>  2 X         13-JUN-2011 13-JUN-2011                .    1    28-MAY-2011
>> .
>>
>>
>> Number of cases read:  9    Number of cases listed:  9
>>
>> devoidx wrote
>>> Hi David, Here is a tiny mock up example:
>>>
>>> Dataset1
>>> Dataset2
>>>
>>> ID           Diagnosis          ClinicDate                         ID
>>> Drug        Aquistion Date
>>> 1                  X               19-aug-2012                        1
>>> Y          25-Aug-2012
>>> 1                  X               21-Aug-2012                        1
>>> Y          08-Nov-2012
>>> 1                  X               15 OCT-2012                        2
>>> Y          28-May-2011
>>> 2                  X               13-June-2011                        2
>>> Y          12-April-2011
>>> 2                  X               17-May-2011
>>>
>>> My goal is to see if any of the drug aquisition dates for
>>> patientID=1 are within 2 weeks of any of the clinic dates of the
>>> same patient and return 1 if there is such occurence and 0 if there
>>> isn't...same with patient 2, etc.





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Finding-matching-cases-across-2-datasets-based-on-a-Date-variable-tp5722507p5722519.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Finding matching cases across 2 datasets based on a Date variable

Maguin, Eugene
FM from RTFM. Golden Monkey sounds interesting, thank you.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Sunday, October 13, 2013 6:57 PM
To: [hidden email]
Subject: Re: Finding matching cases across 2 datasets based on a Date variable

Golden Monkey according to the Victory Brewing website distributes to 30 states as well as Japan and Singapore.  It sports a 9.5 ABV, so it definitely packs a wallop.  A bit too cloyingly sweet for a session beer but a true treat for fans of Belgian style tripels.  I tend to prefer hoppier styles myself, but do enjoy my Belgian styles.  I brew my own beers and mead (honey wine) and plan to one day create a Golden Monkey 'clone' (5 gal might take awhile to consume, so probably will brew a second batch of something more along the lines if an IPA or Czech Pilsner style- made a barley wine awhile back.  It took over a year to ferment through and we let it bottle condition for another year-).  Brewing is a fun hobby.  Don't expect to save much money over and above buying 6 packs unless doing whole grain (more effort and mess/equipment than I have time/patience to invest).

What is the "FM" from a while back you refer to?


Maguin, Eugene wrote
> Ok, "Golden Monkey" is too good to pass up on considering the "FM"
> from a while back. So, could Golden Monkey be a very local to Philly craft beer?
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:

> SPSSX-L@.UGA

> ] On Behalf Of David Marso
> Sent: Friday, October 11, 2013 10:37 PM
> To:

> SPSSX-L@.UGA

> Subject: Re: Finding matching cases across 2 datasets based on a Date
> variable
>
> OK (1 Monkey + 1 Bud) :
> Slicker than snot on a doorknob version consolidated ;-) Minimized #
> of new variables created (can't avoid it on the LEADS can on the
> LAGS)
> Note RENAMES on ADD
> Likely an issue considering you probably are still dealing with a HUGE
> dataset.
>
> DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
> BEGIN DATA
> 1                  X               19-aug-2012
> 1                  X               21-Aug-2012
> 1                  X               15-Oct-2012
> 2                  X               13-June-2011
> 2                  X               17-May-2011
> END DATA.
>
> SORT CASES BY ID ClinicDate.
> *COMPUTE COPYDATE=ClinicDate.
> DATASET NAME data1.
>
> DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
> BEGIN DATA
> 1               Y          25-Aug-2012
> 1               Y          08-Nov-2012
> 2              Y          28-May-2011
> 2             Y          12-April-2011
> END DATA.
>
> SORT CASES BY ID Aquistion.
> *COMPUTE COPYDATE=Aquistion.
> DATASET NAME data2.
>
> ADD FILES
>   / FILE data1 / RENAME(ClinicDate=CopyDate)
>   / FILE data2 / RENAME (Aquistion=CopyDate)/ IN=INData2
>   / BY ID COPYDATE.
>
> SHIFT VALUES
>    VARIABLE=CopyDate RESULT=NextDate   LEAD=1
>   /VARIABLE=ID       RESULT=NextID     LEAD=1
>   /VARIABLE=InData2  RESULT=NextSource LEAD=1.
>
> COMPUTE WithIn2Weeks=
>    InData2
>   AND
>     ((NOT(LAG(InData2)) AND ID EQ LAG(ID) AND
> CTIME.DAYS(CopyDate-LAG(CopyDate)) LE 14)
>   OR (NOT(NextSource)   AND ID EQ NextID  AND
> CTIME.DAYS(NextDate-CopyDate)
> LE 14)).
>
> FORMATS CopyDate NextDate (DATE).
> LIST.
>
> ID Diagnosis    CopyDate Drug INData2    NextDate NextID NextSource
> WithIn2Weeks
>
>  1 X         19-AUG-2012         0    21-AUG-2012    1        0
> .00
>  1 X         21-AUG-2012         0    25-AUG-2012    1        1
> .00
>  1           25-AUG-2012 Y       1    15-OCT-2012    1        0
> 1.00
>  1 X         15-OCT-2012         0    08-NOV-2012    1        1
> .00
>  1           08-NOV-2012 Y       1    12-APR-2011    2        1
> .00
>  2           12-APR-2011 Y       1    17-MAY-2011    2        0
> .00
>  2 X         17-MAY-2011         0    28-MAY-2011    2        1
> .00
>  2           28-MAY-2011 Y       1    13-JUN-2011    2        0
> 1.00
>  2 X         13-JUN-2011         0              .    .        .
> .00
>
>
> Number of cases read:  9    Number of cases listed:  9
>
>
> David Marso wrote
>> Ah hell, Girlfriend is running late and I'll end up corpse-like if I
>> begin sucking suds without her ;-))) Here is the rest of the
>> Marso-Madness for your perusal, study and possible epiphany .
>>
>> SHIFT VALUES
>>   VARIABLE=CopyDate RESULT=PrevDate LAG=1
>>   /VARIABLE=CopyDate RESULT=NextDate LEAD=1
> *
>>  /VARIABLE=ID RESULT=PrevID LAG=1
>>   /VARIABLE=ID RESULT=NextID LEAD=1
>>  /VARIABLE=INDATA1 RESULT=PrevSource LAG=1
>> /VARIABLE=INDATA1 RESULT=NextSource LEAD=1.
> *
>>
>> DO IF NOT(InData1).
>> IF ID=PrevID  AND PrevSource DeltaPrev=CTIME.DAYS(CopyDate-PrevDate).
>> IF ID=NextID  AND NextSource DeltaNext=CTIME.DAYS(NextDate-CopyDate).
>> COMPUTE Within2Weeks=ANY(1,DeltaPrev LE 14,DeltaNext LE 14).
>> END IF.
>> FORMATS CopyDate PrevDate NextDate (DATE).
>>
>> * Alternatively for last piece (Flanked by the DO IF .... END IF
>> block
>> * -Downside is you don't get the deltas..
>> -Upside is you don't add 2 new variables to the file.
>> -DS2:  You may be muttering WTF to yourself 6 mos from now if you don't
>>           comment the bejeebus out of it.
>>
>> COMPUTE Within2Weeks_Alt=
>>     NOT(InData1) AND
>>         ((ID=PrevID  AND PrevSource AND CTIME.DAYS(CopyDate-PrevDate)
>> LE
>> 14)
>>    OR  (ID=NextID  AND NextSource AND CTIME.DAYS(NextDate-CopyDate)
>> LE 14)).
>>
>> Afterthought you could Flag the med file rather than the Doc file
>> and avoid the NOT(InData1) I find Negatives more difficult to process
>> mentally.
>> This has been shown in Psych studies of reasoning as well .
>> David Marso wrote
>>> Here is where I was hinting ;-)
>>> I'll leave it to you to finish the thought ..
>>> You will also want to LEAD and LAG the ID and InData1 flag to
>>> properly line up the ducks and then use CTIME.DAYS or DATEDIFF to
>>> thin the flock.
>>> The ADD FILES with a BY is a powerful technique for interleaving cases.
>>> Have a great weekend (It's Golden Monkey time here in PA so I'm out
>>> of here for now (You really don't want me posting after a couple
>>> Golden Monkeys hit my brain ;-) !.
>>>
>>> DATA LIST LIST / ID  (F1)    Diagnosis (A1)   ClinicDate (DATE).
>>> BEGIN DATA
>>> 1                  X               19-aug-2012
>>> 1                  X               21-Aug-2012
>>> 1                  X               15-Oct-2012
>>> 2                  X               13-June-2011
>>> 2                  X               17-May-2011
>>> END DATA.
>>> SORT CASES BY ID ClinicDate.
>>> COMPUTE COPYDATE=ClinicDate.
>>> DATASET NAME data1.
>>> DATA LIST LIST / ID  (F1)    Drug (A1)   Aquistion (DATE).
>>> BEGIN DATA
>>> 1               Y          25-Aug-2012
>>> 1               Y          08-Nov-2012
>>> 2              Y          28-May-2011
>>> 2             Y          12-April-2011
>>> END DATA.
>>>
>>> SORT CASES BY ID Aquistion.
>>> COMPUTE COPYDATE=Aquistion.
>>> DATASET NAME data2.
>>>
>>> ADD FILES / FILE data1 / IN=INData1 / FILE data2 / BY ID COPYDATE.
>>> SHIFT VALUES
>>>   VARIABLE=CopyDate RESULT=PrevDate LAG=1
>>>   /VARIABLE=CopyDate RESULT=NextDate LEAD=1.
>>> FORMATS CopyDate PrevDate NextDate (DATE).
>>> LIST.
>>>
>>>
>>>
>>>
>>>
>>> ID Diagnosis  ClinicDate    COPYDATE Drug   Aquistion INData1
>>> PrevDate
>>> NextDate
>>>
>>>  1 X         19-AUG-2012 19-AUG-2012                .    1
>>> .
>>> 21-AUG-2012
>>>  1 X         21-AUG-2012 21-AUG-2012                .    1
>>> 19-AUG-2012
>>> 25-AUG-2012
>>>  1                     . 25-AUG-2012 Y    25-AUG-2012    0
>>> 21-AUG-2012
>>> 15-OCT-2012
>>>  1 X         15-OCT-2012 15-OCT-2012                .    1
>>> 25-AUG-2012
>>> 08-NOV-2012
>>>  1                     . 08-NOV-2012 Y    08-NOV-2012    0
>>> 15-OCT-2012
>>> 12-APR-2011
>>>  2                     . 12-APR-2011 Y    12-APR-2011    0
>>> 08-NOV-2012
>>> 17-MAY-2011
>>>  2 X         17-MAY-2011 17-MAY-2011                .    1
>>> 12-APR-2011
>>> 28-MAY-2011
>>>  2                     . 28-MAY-2011 Y    28-MAY-2011    0
>>> 17-MAY-2011
>>> 13-JUN-2011
>>>  2 X         13-JUN-2011 13-JUN-2011                .    1
>>> 28-MAY-2011
>>> .
>>>
>>>
>>> Number of cases read:  9    Number of cases listed:  9
>>>
>>> devoidx wrote
>>>> Hi David, Here is a tiny mock up example:
>>>>
>>>> Dataset1
>>>> Dataset2
>>>>
>>>> ID           Diagnosis          ClinicDate                         ID
>>>> Drug        Aquistion Date
>>>> 1                  X               19-aug-2012                        1
>>>> Y          25-Aug-2012
>>>> 1                  X               21-Aug-2012                        1
>>>> Y          08-Nov-2012
>>>> 1                  X               15 OCT-2012                        2
>>>> Y          28-May-2011
>>>> 2                  X               13-June-2011
>>>> 2
>>>> Y          12-April-2011
>>>> 2                  X               17-May-2011
>>>>
>>>> My goal is to see if any of the drug aquisition dates for
>>>> patientID=1 are within 2 weeks of any of the clinic dates of the
>>>> same patient and return 1 if there is such occurence and 0 if there
>>>> isn't...same with patient 2, etc.
>
>
>
>
>
> -----
> Please reply to the list and not to my personal email.
> Those desiring my consulting or training services please feel free to
> email me.
> ---
> "Nolite dare sanctum canibus neque mittatis margaritas vestras ante
> porcos ne forte conculcent eas pedibus suis."
> Cum es damnatorum possederunt porcos iens ut salire off sanguinum
> cliff in abyssum?"
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/Finding-matching-cases-a
> cross-2-datasets-based-on-a-Date-variable-tp5722507p5722519.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the command. To leave the
> list, send the command SIGNOFF SPSSX-L For a list of commands to
> manage subscriptions, send the command INFO REFCARD
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

>  (not to SPSSX-L), with no body text except the command. To leave the
> list, send the command SIGNOFF SPSSX-L For a list of commands to
> manage subscriptions, send the command INFO REFCARD





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Finding-matching-cases-across-2-datasets-based-on-a-Date-variable-tp5722507p5722531.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD