|
Hi,
I want to match a dataset of patients in hospital for specified periods (data set 1) with a dataset of patients and review dates (data set 2), so that each review date for the patient is linked to the relevant hospital period (via id and review date), to produce a 'long' file (ie each case has the period of care dates *and* the review dates). The end product should look like data set 3. Data set (1). * DATA SET 1. * NB 2nd record has a null end date to show the period of care has not yet terminated. DATA LIST FREE/ id(A2) POCs(DATE11) POCend(DATE11). BEGIN DATA 01 04-JAN-2004 05-MAY-2005 01 05-FEB-2006 . 02 04-JUL-2003 06-DEC-2003 END DATA. Data set (2) * DATA SET 2. DATA LIST FREE/ id(A2) revdate(DATE11). BEGIN DATA 01 05-FEB-2004 01 12-MARCH-2004 01 04-MAR-2006 01 5-JUL-2006 02 05-AUG-2003 02 06-SEP-2003 02 08-NOV-2003 END DATA. * DATA SET 3. 01 04-JAN-2004 05-MAY-2005 05-FEB-2004 01 04-JAN-2004 05-MAY-2005 12-MARCH-2004 01 05-FEB-2006 (SYSMIS) 04-MAR-2006 01 05-FEB-2006 (SYSMIS) 01 5-JUL-2006 02 04-JUL-2003 06-DEC-200302 05-AUG-2003 02 04-JUL-2003 06-DEC-200302 06-SEP-2003 02 04-JUL-2003 06-DEC-200302 08-NOV-2003 Richard Ristow posted an excellent solution to a similar problem (see Wed. 23 2008) using the XSAVE command to produce a new dataset, but in that problem, there was only one dataset (periods of care) analysed to give number of patients in hospital per day, whereas here there are two datasets involved. I wonder if there is a command similar to XSAVE that allows you to input one record from a file at a time, and compare that with the active dataset, producing a new dataset that can be built up with XSAVE? I hope my request conforms to the excellent guidelines suggested by Marta Garcia Granero (19 Aug. 2008)! If any one can suggest a solution I would be very grateful! Many thanks, Clive. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Clive,
Oh, yes, this is a very interesting problem. Please verify an observation. 1) All review dates have to fall within the hospital enter and leave dates where a sysmis leave date is understood to mean some far future date. My first scheme (absolutely untested) is to convert dataset 2 to 'wide' using casetovars. Call the resulting file 'dataset2A' Then do a match files with dataset 1 and dataset 2A as the table. So now, each hospital stay has all the review dates attached. Next, put the review dates in a vector and test each against the enter and leave dates. Review dates that are outside the enter-leave date interval are set to sysmis. It might be a good idea to reposition review date values in the vector so that the list of review dates have no preceding or embedded sysmis values. Then do varstocases to 'unwind', as Richard says, the dataset. Gene Maguin >>I want to match a dataset of patients in hospital for specified periods (data set 1) with a dataset of patients and review dates (data set 2), so that each review date for the patient is linked to the relevant hospital period (via id and review date), to produce a 'long' file (ie each case has the period of care dates *and* the review dates). The end product should look like data set 3. Data set (1). * DATA SET 1. * NB 2nd record has a null end date to show the period of care has not yet terminated. DATA LIST FREE/ id(A2) POCs(DATE11) POCend(DATE11). BEGIN DATA 01 04-JAN-2004 05-MAY-2005 01 05-FEB-2006 . 02 04-JUL-2003 06-DEC-2003 END DATA. Data set (2) * DATA SET 2. DATA LIST FREE/ id(A2) revdate(DATE11). BEGIN DATA 01 05-FEB-2004 01 12-MARCH-2004 01 04-MAR-2006 01 5-JUL-2006 02 05-AUG-2003 02 06-SEP-2003 02 08-NOV-2003 END DATA. * DATA SET 3. 01 04-JAN-2004 05-MAY-2005 05-FEB-2004 01 04-JAN-2004 05-MAY-2005 12-MARCH-2004 01 05-FEB-2006 (SYSMIS) 04-MAR-2006 01 05-FEB-2006 (SYSMIS) 01 5-JUL-2006 02 04-JUL-2003 06-DEC-200302 05-AUG-2003 02 04-JUL-2003 06-DEC-200302 06-SEP-2003 02 04-JUL-2003 06-DEC-200302 08-NOV-2003 Richard Ristow posted an excellent solution to a similar problem (see Wed. 23 2008) using the XSAVE command to produce a new dataset, but in that problem, there was only one dataset (periods of care) analysed to give number of patients in hospital per day, whereas here there are two datasets involved. I wonder if there is a command similar to XSAVE that allows you to input one record from a file at a time, and compare that with the active dataset, producing a new dataset that can be built up with XSAVE? I hope my request conforms to the excellent guidelines suggested by Marta Garcia Granero (19 Aug. 2008)! If any one can suggest a solution I would be very grateful! Many thanks, Clive. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Clive Downs
Hi Gene,
Thanks again for your response. Since posting the question I have looked at another approach, namely to (a) produce in Access the Cartesian product of the reviews file and the hospital stay date files (done via a simple point and click query) (b)import the Access query into SPSS using GET DATA but using the criteria screen to specify review dates between the hospital start and end dates (not forgetting to cater for null end dates). This generates the SQL in syntax, and produces a dataset with reviews matched to the right hospital stay period. I have tested this with the data shown in the message and it seems to work. The syntax with SQL is: GET DATA /TYPE=ODBC /CONNECT= 'DSN=MS Access Database;DBQ=H:\mtarry\POC.mdb;DriverId=25;FIL=MS'+ ' Access;MaxBufferSize=2048;PageTimeout=5;' /SQL = 'SELECT id, POCs, POCend, revdate FROM getreviewPOCs WHERE'+ ' (revdate >= POCs AND revdate <= POCend OR revdate >= POCs ' 'AND POCend IS null )' /ASSUMEDSTRWIDTH=5 . CACHE. It would be neater to be able to do it without having to do the step in Access first, but on the other hand, there isn't a lot to do in Access, and the code for the query is only a few lines. Also, it is conceptually perhaps a bit simpler than having to set up vectors. However, I haven't looked at your solution yet - I will do so and get back to you to see how it works out, Thanks again, regards, Clive. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Clive:
Can you describe with some detail how to "produce the Cartesian product" of the two data files Jim Marks Director of Market Research x 1616 -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Clive Downs Sent: Thursday, December 04, 2008 9:51 AM To: [hidden email] Subject: Re: Patient stays revisited Hi Gene, Thanks again for your response. Since posting the question I have looked at another approach, namely to (a) produce in Access the Cartesian product of the reviews file and the hospital stay date files (done via a simple point and click query) (b)import the Access query into SPSS using GET DATA but using the criteria screen to specify review dates between the hospital start and end dates (not forgetting to cater for null end dates). This generates the SQL in syntax, and produces a dataset with reviews matched to the right hospital stay period. I have tested this with the data shown in the message and it seems to work. The syntax with SQL is: GET DATA /TYPE=ODBC /CONNECT= 'DSN=MS Access Database;DBQ=H:\mtarry\POC.mdb;DriverId=25;FIL=MS'+ ' Access;MaxBufferSize=2048;PageTimeout=5;' /SQL = 'SELECT id, POCs, POCend, revdate FROM getreviewPOCs WHERE'+ ' (revdate >= POCs AND revdate <= POCend OR revdate >= POCs ' 'AND POCend IS null )' /ASSUMEDSTRWIDTH=5 . CACHE. It would be neater to be able to do it without having to do the step in Access first, but on the other hand, there isn't a lot to do in Access, and the code for the query is only a few lines. Also, it is conceptually perhaps a bit simpler than having to set up vectors. However, I haven't looked at your solution yet - I will do so and get back to you to see how it works out, Thanks again, regards, Clive. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
It's the same as a cross join every row of one table are joined to every row of another table. So you get a potentially huuuuuuuge table of n x m rows.
--- On Thu, 12/4/08, Marks, Jim <[hidden email]> wrote: > From: Marks, Jim <[hidden email]> > Subject: Re: Patient stays revisited > To: [hidden email] > Date: Thursday, December 4, 2008, 5:30 PM > Clive: > > Can you describe with some detail how to "produce the > Cartesian product" > of the two data files > > Jim Marks > Director of Market Research > x 1616 > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] > On Behalf Of > Clive Downs > Sent: Thursday, December 04, 2008 9:51 AM > To: [hidden email] > Subject: Re: Patient stays revisited > > Hi Gene, > > Thanks again for your response. Since posting the question > I have looked > at > another approach, namely to > > (a) produce in Access the Cartesian product of the reviews > file and the > hospital stay date files (done via a simple point and click > query) > > (b)import the Access query into SPSS using GET DATA but > using the > criteria > screen to specify review dates between the hospital start > and end dates > (not forgetting to cater for null end dates). > > This generates the SQL in syntax, and produces a dataset > with reviews > matched to the right hospital stay period. > > I have tested this with the data shown in the message and > it seems to > work. > The syntax with SQL is: > > GET DATA /TYPE=ODBC /CONNECT= > 'DSN=MS Access > Database;DBQ=H:\mtarry\POC.mdb;DriverId=25;FIL=MS'+ > ' Access;MaxBufferSize=2048;PageTimeout=5;' > /SQL = 'SELECT id, POCs, POCend, revdate FROM > getreviewPOCs > WHERE'+ > ' (revdate >= POCs AND revdate <= POCend OR > revdate >= POCs ' > 'AND POCend IS null )' > /ASSUMEDSTRWIDTH=5 > . > CACHE. > > > It would be neater to be able to do it without having to do > the step in > Access first, but on the other hand, there isn't a lot > to do in Access, > and > the code for the query is only a few lines. > > Also, it is conceptually perhaps a bit simpler than having > to set up > vectors. > > However, I haven't looked at your solution yet - I will > do so and get > back > to you to see how it works out, > > Thanks again, > > regards, > > Clive. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body > text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the > command > INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body > text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the > command > INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Thanks-- My files are likely to go beyond the limits of access, so I'll
pursue the other option. -jim -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Albert-jan Roskam Sent: Thursday, December 04, 2008 10:49 AM To: [hidden email] Subject: Re: Patient stays revisited It's the same as a cross join every row of one table are joined to every row of another table. So you get a potentially huuuuuuuge table of n x m rows. --- On Thu, 12/4/08, Marks, Jim <[hidden email]> wrote: > From: Marks, Jim <[hidden email]> > Subject: Re: Patient stays revisited > To: [hidden email] > Date: Thursday, December 4, 2008, 5:30 PM > Clive: > > Can you describe with some detail how to "produce the > Cartesian product" > of the two data files > > Jim Marks > Director of Market Research > x 1616 > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] > On Behalf Of > Clive Downs > Sent: Thursday, December 04, 2008 9:51 AM > To: [hidden email] > Subject: Re: Patient stays revisited > > Hi Gene, > > Thanks again for your response. Since posting the question > I have looked > at > another approach, namely to > > (a) produce in Access the Cartesian product of the reviews > file and the > hospital stay date files (done via a simple point and click > query) > > (b)import the Access query into SPSS using GET DATA but > using the > criteria > screen to specify review dates between the hospital start > and end dates > (not forgetting to cater for null end dates). > > This generates the SQL in syntax, and produces a dataset > with reviews > matched to the right hospital stay period. > > I have tested this with the data shown in the message and > it seems to > work. > The syntax with SQL is: > > GET DATA /TYPE=ODBC /CONNECT= > 'DSN=MS Access > Database;DBQ=H:\mtarry\POC.mdb;DriverId=25;FIL=MS'+ > ' Access;MaxBufferSize=2048;PageTimeout=5;' > /SQL = 'SELECT id, POCs, POCend, revdate FROM > getreviewPOCs > WHERE'+ > ' (revdate >= POCs AND revdate <= POCend OR > revdate >= POCs ' > 'AND POCend IS null )' > /ASSUMEDSTRWIDTH=5 > . > CACHE. > > > It would be neater to be able to do it without having to do > the step in > Access first, but on the other hand, there isn't a lot > to do in Access, > and > the code for the query is only a few lines. > > Also, it is conceptually perhaps a bit simpler than having > to set up > vectors. > > However, I haven't looked at your solution yet - I will > do so and get > back > to you to see how it works out, > > Thanks again, > > regards, > > Clive. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body > text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the > command > INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body > text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the > command > INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Clive Downs
Hi,
Thanks to Jim and Albert for the further comments. First of all, my mistake - I said produce the Cartesian product of the two tables, in fact, this is not the type of join to produce. I meant a join where all records from reviews are matched with only those from Periods of Care where the id matches (I think this is a right inner join in relational database terminology). And to answer Gene's question, yes the null period of care end date means the period is 'censored' ie at the time of data collection, the period had not terminated. Anyway, the join described produces the following SQL, (it is easy to produce in Access by point and click) SELECT POC.id, POC.POCs, POC.POCend, reviews.revdate FROM POC RIGHT JOIN reviews ON POC.id = reviews.id ORDER BY POC.id, POC.POCs; and produces the following dataset of 11 records: id POCs POCend revdate 01 04/01/2004 05/05/2005 05/07/2006 01 04/01/2004 05/05/2005 04/03/2006 01 04/01/2004 05/05/2005 12/03/2004 01 04/01/2004 05/05/2005 05/02/2004 01 05/02/2006 05/07/2006 01 05/02/2006 04/03/2006 01 05/02/2006 12/03/2004 01 05/02/2006 05/02/2004 02 04/07/2003 06/12/2003 08/11/2003 02 04/07/2003 06/12/2003 06/09/2003 02 04/07/2003 06/12/2003 05/08/2003 It is then very easy to use the SPSS menu system to import the query from Access by specifying the conditions (ie that review date is between the care start date and care end date). Hope this clarifies the process, Regards, Clive. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Clive Downs
At 05:03 AM 12/4/2008, Clive Downs wrote:
>I want to match a dataset of patients in hospital for specified >periods (data set 1): |-----------------------------|---------------------------| |Output Created |06-DEC-2008 10:33:07 | |-----------------------------|---------------------------| [D1.CarePeriod] id POCs POCend 01 04-JAN-2004 05-MAY-2005 01 05-FEB-2006 . 02 04-JUL-2003 06-DEC-2003 Number of cases read: 3 Number of cases listed: 3 >with a dataset of patients and review dates (data set 2): |-----------------------------|---------------------------| |Output Created |06-DEC-2008 10:33:24 | |-----------------------------|---------------------------| [D2.ReviewDate] id revdate 01 05-FEB-2004 01 12-MAR-2004 01 04-MAR-2006 01 05-JUL-2006 02 05-AUG-2003 02 06-SEP-2003 02 08-NOV-2003 Number of cases read: 7 Number of cases listed: 7 >so that each review date for the patient is linked to the relevant >hospital period (via id and review date). The end product should look like, > >id POCs POCend revdate >01 04-JAN-2004 05-MAY-2005 05-FEB-2004 >01 04-JAN-2004 05-MAY-2005 12-MARCH-2004 >01 05-FEB-2006 (SYSMIS) 04-MAR-2006 >01 05-FEB-2006 (SYSMIS) 01 5-JUL-2006 >02 04-JUL-2003 06-DEC-200302 05-AUG-2003 >02 04-JUL-2003 06-DEC-200302 06-SEP-2003 >02 04-JUL-2003 06-DEC-200302 08-NOV-2003 As has been proposed, this can be solved using CASESTOVARS & VECTOR logic; or, when the data is in a SQL database, by SQL joins. But there's also a natural solution with SPSS sequential logic. The following is tested, with thanks for excellent test data: * For merging: define the 'review date' for care periods as . * the beginning of the period: . ADD FILES /FILE=D1.CarePeriod. NUMERIC revdate(DATE11). COMPUTE revdate=POCs. * Interleave the files, with period-of-care records preceding . * review-date records when dates are tied: . ADD FILES /FILE=* /IN=PeriodRec /FILE=D2.ReviewDate /BY id revdate /KEEP=id POCs POCend revdate ALL. * Carry over period-of-care dates to review-date records, and . * drop period-of-care records: . DO IF NOT PeriodRec. . COMPUTE POCs = LAG(POCs). . COMPUTE POCend = LAG(POCend). END IF. SELECT IF NOT PeriodRec. LIST. |-----------------------------|---------------------------| |Output Created |06-DEC-2008 10:52:54 | |-----------------------------|---------------------------| id POCs POCend revdate PeriodRec 01 04-JAN-2004 05-MAY-2005 05-FEB-2004 0 01 04-JAN-2004 05-MAY-2005 12-MAR-2004 0 01 05-FEB-2006 . 04-MAR-2006 0 01 05-FEB-2006 . 05-JUL-2006 0 02 04-JUL-2003 06-DEC-2003 05-AUG-2003 0 02 04-JUL-2003 06-DEC-2003 06-SEP-2003 0 02 04-JUL-2003 06-DEC-2003 08-NOV-2003 0 Number of cases read: 7 Number of cases listed: 7 ===================================== APPENDIX: Test data, and code Contains LIST statements that are not included in the posted output. (WRR: not saved separately) ===================================== * DATA SET 1 - Periods of care . * NB 2nd record has a null end date to show the period of care . * has not yet terminated. . DATA LIST FREE/ id(A2) POCs(DATE11) POCend(DATE11). BEGIN DATA 01 04-JAN-2004 05-MAY-2005 01 05-FEB-2006 . 02 04-JUL-2003 06-DEC-2003 END DATA. DATASET NAME D1.CarePeriod WINDOW=FRONT. * DATA SET 2 - Review dates . DATA LIST FREE/ id(A2) revdate(DATE11). BEGIN DATA 01 05-FEB-2004 01 12-MARCH-2004 01 04-MAR-2006 01 5-JUL-2006 02 05-AUG-2003 02 06-SEP-2003 02 08-NOV-2003 END DATA. DATASET NAME D2.ReviewDate WINDOW=FRONT. * ..... Post after this point ........ . DATASET ACTIVATE D1.CarePeriod WINDOW=FRONT. LIST. DATASET ACTIVATE D2.ReviewDate WINDOW=FRONT. LIST. NEW FILE. * For merging: define the 'review date' for care periods as . * the beginning of the period: . ADD FILES /FILE=D1.CarePeriod. NUMERIC revdate(DATE11). COMPUTE revdate=POCs. * Interleave the files, with period-of-care records preceding . * review-date records when dates are tied: . ADD FILES /FILE=* /IN=PeriodRec /FILE=D2.ReviewDate /BY id revdate /KEEP=id POCs POCend revdate ALL. . /**/ LIST /*-*/. * Carry over period-of-care dates to review-date records, and . * drop period-of-care records: . DO IF NOT PeriodRec. . COMPUTE POCs = LAG(POCs). . COMPUTE POCend = LAG(POCend). END IF. . /**/ LIST /*-*/. SELECT IF NOT PeriodRec. LIST. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Clive Downs
Hi Richard,
Thank you very much for the self-contained solution in SPSS syntax. That's a useful further option as it bypasses the need to put the data into an Access database. Also I wasn't aware of the facilities for interleaving records via ADD FILES - thanks. Regards Clive. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
