Hi listers:
I am so sorry for the prior posted information. I do my best to explain it.

The original big claims data (hospitalization) are “dd2001, dd2002, dd2003, dd2004, dd2005, and dd2006.” They are monthly claims data and have same variables. If patients were hospitalized longer than the monthly reporting date, the claims data had >= 1 record for the patients at the same admission and discharge dates. I saw one patient (identified by id, birthday, in_date, and out_date) who was hospitalized for >= 1 year, the claims data had around 12 records (or lines or rows) at the same date of admission (e.g., 20010101) and discharge (e.g., 20020202). In_date is the admission date and the out_date is discharge date.

My target population is adults (>= 18 years) with hematological cancers receiving bone marrow transplant (BMT) from 2001 to 2005. First, I have selected hematological cancers from dd2001 to dd2006 using ICD-9-CM diagnostic codes (from icd9cd to icd9cd4) and added annual data set as DATA1. Second, I have limited the target population to patients undergoing BMT using 10 ICD-9-CM procedure codes (from icdopcd to icdopcd4). 10 ICD-9-CM procedure codes for BMT are from 4100 to 4109. Third, I converted birthday and admission dates and calculated ages. Fourth, I recoded age into 2 groups and selected age >= 18 years old. Fifth, I have created an index dd2001_2006 using aggregating (selecting the first record and last record and summing different fees) and merging functions (adding cases again). Thus, DATA1 is an index dd2001_2006 and only 1 record per patient. If patients had received 2nd, 3rd, 4th, or subsequent BMT, those variables will be added to the DATA1 using different names of variables. It is occasionally hard to judge the admission date only for BMT due to coding problems so that I need pre-BMT chemotherapy records for checking and making decisions (exclude or not exclude patients).
2 outcomes are overall survival (from Jan 1, 2001 to Dec 31, 2005) and 30 day readmission of discharge. The variables of death and date of death have existed in the DATA1 for several patients because patients have died during BMT. Thus, the variables of overall survival for remaining patients, who survive during BMT, will be obtained from dd2001 to dd2006. Also, the variable of with readmission or without readmission will be obtained from dd2001 to dd2006 again.

Hence, I have created syntax for selecting those adult patients undergoing BMT using their unique ID (32 length) and saved as “DATA2.” However, data2 include all records (rows) with respect to pre-, during, and post-BMT records. I am thinking how to create syntax for keeping pre-BMT chemotherapy records as one dataset and post-BMT records as one dataset or dropping BMT records from DATA2.

The key variables for identifying pre-, during, or post-BMT are each admission date and discharge date from dd2001 to dd2006, although patients have same id and birthday. The in_date and out_date of pre-BMT records occur before in_date and out_date of BMT procedures, whereas the in_date and out_date of post-BMT occur after in_date and out_date of BMT procedures. Please see below examples:

DATA1 (Index dd2001_2006 -> only BMT records):

id            id_sex  birthday  in_date
1122ab33c5..  F       19580210  20011215
1134ac34c6..  M       19751122  20050719
2456b578ef..  F       19690516  20030113
ab2457cdg3..  M       19501030  20050413

out_date  e_bedd  tran_cd  icd9cd  icd9cd1  icdopcd
20020208  48      1        20500   6822     4103
20051130  134     4        20153   99685    4105
20030204  22      3        20021   2880     8607
20050720  98      3        20500   2880     9925

icdopcd1  dx_am  room_am  drug_am  med_am…
9925      11664  44160    315227   473461
8607      69120  904218   722973   2579172
4101      11897  137262   138717   378661
4105      40099  358053   831632   1482244

DATA2 (including pre-BMT, during BMT, and post-BMT records):

id            id_sex  birthday  in_date   out_date
2456b578ef..  F       19690516  20030113
2456b578ef..  F       19690516  20031025  20031204
2456b578ef..  F       19690516  20031025  20031204
1122ab33c5..  F       19580210  20030805  20031001
1122ab33c5..  F       19580210  20030805  20031001
1122ab33c5..  F       19580210  20011215  20020208
ab2457cdg3..  M       19501030  20050413  20050720
ab2457cdg3..  M       19501030  20050413  20050720
ab2457cdg3..  M       19501030  20050413  20050720
ab2457cdg3..  M       19501030  20050817  20051011
ab2457cdg3..  M       19501030  20050817  20051011

e_bedd  tran_cd  icd9cd  icd9cd1  icdopcd
22      3        20021   2880     8607
40      2        20400   2880     Blank
40      4        20400   486      0392
57      2        20500   03482    9925
1       3        20500   1975     9925
48      1        20500   6822     4101
49      2        20500   2880     9925
30      2        20500   2880     9925
19      3        20500   03842    3324
45      2        20500   Blank    Blank
10      5        20500   2880     9925

icdopcd1  dx_am  room_am  drug_am  med_am…
4101      11897  137262   138717   378661
Blank     9963   34155    59627    177133
9925      2184   7245     55737    88606
8607      15942  61320    237694   462431
8607      546    1095     0        2005
8607      69120  904218   722973   2579172
3893      16107  55125    364826   633212
3893      15075  196530   119210   471444
9925      10469  147747   80434    295218
Blank     13885  50625    190254   418414
Blank     3573   11250    95807    173013

The same color is stand for the same patients. Please show me how to create syntax for keeping pre-BMT and post-BMT records as two separated files. Thank you so much.

Amy Hsieh

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
(It is helpful always to use a subject line, and to keep the subject line the same for follow-ups in the same thread.)

At 01:55 PM 12/19/2008, SUH-ING (AMY) HSIEH wrote:

>I am so sorry for the prior posted information. I do my best to explain it.

Here, I'm editing what you wrote, for readability (and greetings to umaryland, since I'm briefly in the Washington area):

>The original big claims data (hospitalization) are "dd2001, dd2002,
>dd2003, dd2004, dd2005, and dd2006" They are monthly claims data and
>have same variables. If patients were hospitalized longer than the
>monthly reporting date, the claims data had >= 1 record for the
>patients at the same admission and discharge dates. I saw one
>patient (identified by id, birthday, in_date, and out_date) who was
>hospitalized for >= 1 year, the claims data had around 12 records
>(or lines or rows) at the same date of admission (e.g., 20010101)
>and discharge (e.g., 20020202). In_date is the admission date and
>the out_date is discharge date.
>
>My target population is adults (>= 18 years) with hematological
>cancers receiving bone marrow transplant (BMT) from 2001 to 2005.
>[Details of selection logic omitted.] DATA1 is an index dd2001_2006
>and only 1 record per patient.
>
>2 outcomes are overall survival (from Jan 1, 2001 to Dec 31, 2005)
>and 30 day readmission of discharge. The variables of death and date
>of death have existed in the DATA1 for several patients because
>patients have died during BMT. Thus, the variables of overall
>survival for remaining patients, who survive during BMT, will be
>obtained from dd2001 to dd2006 [of other records?]. Also, the
>variable of with readmission or without readmission will be obtained
>from dd2001 to dd2006 [of other admission records?] again. I have
>created syntax for selecting those adult patients undergoing BMT
>using their unique ID (32 length) and saved as "DATA2" However,
>data2 include all records (rows) with respect to pre-, during, and
>post-BMT records.
>I am thinking how to create syntax for keeping
>pre-BMT chemotherapy records as one dataset and post-BMT records as
>one dataset or dropping BMT records from DATA2.
>
>The key variables for identifying pre-, during, or post-BMT are each
>admission date and discharge date from dd2001 to dd2006, although
>patients have same id and birthday. The in_date and out_date of
>pre-BMT records occur before in_date and out_date of BMT procedures,
>whereas the in_date and out_date of post-BMT occur after in_date and
>out_date of BMT procedures. Please see below examples:

The test data came through very, very badly unwrapped, not only with every column head and datum on a separate line, but many additional line breaks. (I wonder why that happens so often?) See if this is easier to understand:

>DATA1 (Index dd2001_2006 [with?] only BMT records):
>
> id            Id_sex  birthday  In_date   Out_date
>
> 1122ab33c5..  F       19580210  20011215  20020208
> 1134ac34c6..  M       19751122  20050719  20051130
> 2456b578ef..  F       19690516  20030113  20030204
> ab2457cdg3..  M       19501030  20050413  20050720
>
> E_bedd  Tran_cd  Icd9cd  Icd9cd1  icdopcd  Icdopcd1
> 48      1        20500   6822     4103     9925
> 134     4        20153   99685    4105     8607
> 22      3        20021   2880     8607     4101
> 98      3        20500   2880     9925     4105
>
>DATA2 (including pre-BMT, during BMT, and post-BMT records):
>
> id            Id_sex  birthday  In_date   Out_date
> 1122ab33c5..  F       19580210  20030805  20031001
> 1122ab33c5..  F       19580210  20030805  20031001
> 1122ab33c5..  F       19580210  20011215  20020208
> 1134ac34c6..  M       19751122  20050719  20051130
> 2456b578ef..  F       19690516  20030113  20030204
> 2456b578ef..  F       19690516  20031025  20031204
> 2456b578ef..  F       19690516  20031025  20031204
> ab2457cdg3..  M       19501030  20050413  20050720
> ab2457cdg3..  M       19501030  20050413  20050720
> ab2457cdg3..  M       19501030  20050413  20050720
> ab2457cdg3..  M       19501030  20050817  20051011
> ab2457cdg3..  M       19501030  20050817  20051011
>
> E_bedd  Tran_cd  Icd9cd  Icd9cd1  icdopcd  Icdopcd1
> 57      2        20500   03482    9925     8607
> 1       3        20500   1975     9925     8607
> 48      1        20500   6822     4103     9925
> 134     4        20153   99685    4105     8607
> 22      3        20021   2880     8607     4101
> 40      2        20400   2880     Blank    Blank
> 40      4        20400   486      0392     9925
> 49      2        20500   2880     9925     4105
> 30      2        20500   2880     9925     3893
> 19      3        20500   03842    3324     9925
> 45      2        20500   Blank    Blank    Blank
> 10      5        20500   2880     9925     Blank
>
> Dx_am  Room_am  Drug_am  Med_am
> 15942  61320    237694   462431
> 546    1095     0        2005
> 69120  904218   722973   2579172
> 69120  904218   722973   2579172
> 11897  137262   138717   378661
> 9963   34155    59627    177133
> 2184   7245     55737    88608
> 16107  55125    364826   633212
> 15075  196530   119210   471444
> 10469  147747   80434    295218
> 13885  50625    190254   418414
> 3573   11250    95807    173013

It sounds like you want to attach data regarding the bone-marrow transplant (BMT) from DATA1, to every record in DATA2, for selection and comparison. See how far this gets you. It's not tested, and I don't think I understand everything:

 id            Id_sex  birthday  In_date   Out_date
 1122ab33c5..  F       19580210  20011215  20020208
 1134ac34c6..  M       19751122  20050719  20051130
 2456b578ef..  F       19690516  20030113  20030204
 ab2457cdg3..  M       19501030  20050413  20050720

* Both files must already be sorted by id.
GET FILE=DATA1
   /RENAME= (In_date Out_date = BMT_InDt BMT_OutDt)
   /KEEP= id BMT_InDt BMT_OutDt.

MATCH FILES
   /TABLE=*
   /FILE =DATA2
   /BY id.

Then, you have the BMT dates on all the admission records, and you can compare for 'before' and 'after', etc. I assume that you have all your dates stored as SPSS date-format variables; if you don't, you should.

-Good luck, and onward,
 Richard
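
The comparison step described above might look something like the following. This is an untested sketch, not Richard's syntax: it assumes that In_date, Out_date, BMT_InDt and BMT_OutDt are all numeric values of the form yyyymmdd (string dates would need NUMBER() first), and the new variable names in_dt, out_dt, bmt_in, bmt_out and period are hypothetical.

```spss
* Untested sketch. Assumes numeric yyyymmdd dates on the matched file.
* in_dt, out_dt, bmt_in, bmt_out and period are hypothetical names.
COMPUTE in_dt   = DATE.MDY(TRUNC(MOD(In_date, 10000) / 100), MOD(In_date, 100), TRUNC(In_date / 10000)).
COMPUTE out_dt  = DATE.MDY(TRUNC(MOD(Out_date, 10000) / 100), MOD(Out_date, 100), TRUNC(Out_date / 10000)).
COMPUTE bmt_in  = DATE.MDY(TRUNC(MOD(BMT_InDt, 10000) / 100), MOD(BMT_InDt, 100), TRUNC(BMT_InDt / 10000)).
COMPUTE bmt_out = DATE.MDY(TRUNC(MOD(BMT_OutDt, 10000) / 100), MOD(BMT_OutDt, 100), TRUNC(BMT_OutDt / 10000)).
FORMATS in_dt out_dt bmt_in bmt_out (DATE11).

* 1 = discharged before the BMT admission.
* 2 = falls within the BMT stay.
* 3 = admitted after the BMT discharge.
* 9 = anything else, to be checked by hand.
COMPUTE period = 9.
IF (out_dt < bmt_in) period = 1.
IF (in_dt >= bmt_in AND out_dt <= bmt_out) period = 2.
IF (in_dt > bmt_out) period = 3.
VALUE LABELS period 1 'pre-BMT' 2 'BMT' 3 'post-BMT' 9 'check'.
FREQUENCIES VARIABLES=period.
```

Records that straddle a BMT admission or discharge date stay in the 'check' group rather than being forced into one side.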
In reply to this post by Suh-Ing Amy Hsieh
Amy,
You should know that color coding does not come through to the list.

OK. I've rearranged what you posted in far more usable structure (see below). To summarize how I now understand things: you have two files, Data1 and Data2, each made as you describe. Data1 is a file of patients meeting your selection criteria and having one record per patient. That record is for the bone marrow transplant (BMT) treatment. Data2 has multiple records per patient, each record being an incident of chemotherapy. You want to separate the chemotherapy incidents in Data2 into two groups based on the BMT incident date in Data1.

I'm now going to assume that you are very skilled with SPSS. I think you can do a MATCH FILES using the TABLE subcommand to match Data1 as the table file to Data2 using ID. I think you need only a subset of the variables in Data1, probably just ID and the in and out date variables. This little operation explicitly assumes that you have exactly one record per patient in Data1 and exactly one record in Data2 for each combination of ID and in and out date. If you don't, then you have more trouble. Not insurmountable trouble, but definitely more.

Once the MATCH FILES is complete, you can compare in and out dates from the Data2 records against those from the Data1 records to identify pre and post BMT incidents.

Does this help you?

Gene Maguin

****************************************

The examples of data are messy. So, I repost it again.

The original big claims data (hospitalization) are “dd2001, dd2002, dd2003, dd2004, dd2005, and dd2006.” They are monthly claims data and have same variables. If patients were hospitalized longer than the monthly reporting date, the claims data had > 1 record for the patients at the same admission and discharge dates.
I saw one patient (identified by id, birthday, in_date, and out_date) who was hospitalized for > 1 year, the claims data had around 12 records (or lines or rows) at the same date of admission (e.g., 20010101) and discharge (e.g., 20020202). In_date is the admission date and the out_date is discharge date.

My target population is adults (> 18 years) with hematological cancers receiving bone marrow transplant (BMT) from 2001 to 2005. First, I have selected hematological cancers from dd2001 to dd2006 using ICD-9-CM diagnostic codes (from icd9cd to icd9cd4) and added annual data set as DATA1. Second, I have limited the target population to patients undergoing BMT using 10 ICD-9-CM procedure codes (from icdopcd to icdopcd4). 10 ICD-9-CM procedure codes for BMT are from 4100 to 4109. Third, I converted birthday and admission dates and calculated ages. Fourth, I recoded age into 2 groups and selected age >= 18 years old. Fifth, I have created an index dd2001_2006 using aggregating (selecting the first record and last record and summing different fees) and merging functions (adding cases again). Thus, DATA1 is an index dd2001_2006 and only 1 record per patient. If patients had received 2nd, 3rd, 4th, or subsequent BMT, those variables will be added to the DATA1 using different names of variables. It is occasionally hard to judge the admission date only for BMT due to coding problems so that I need pre-BMT chemotherapy records for checking and making decisions (exclude or not exclude patients).

2 outcomes are overall survival (from Jan 1, 2001 to Dec 31, 2005) and 30 day readmission of discharge. The variables of death and date of death have existed in the DATA1 for several patients because patients have died during BMT. Thus, the variables of overall survival for remaining patients, who survive during BMT, will be obtained from dd2001 to dd2006. Also, the variable of with readmission or without readmission will be obtained from dd2001 to dd2006 again.
Hence, I have created syntax for selecting those adult patients undergoing BMT using their unique ID (32 length) and saved as “DATA2.” However, data2 include all records (rows) with respect to pre-, during, and post-BMT records. I am thinking how to create syntax for keeping pre-BMT chemotherapy records as one dataset and post-BMT records as one dataset or dropping BMT records from DATA2.

The key variables for identifying pre-, during, or post-BMT are each admission date and discharge date from dd2001 to dd2006, although patients have same id and birthday. The in_date and out_date of pre-BMT records occur before in_date and out_date of BMT procedures, whereas the in_date and out_date of post-BMT occur after in_date and out_date of BMT procedures. Please see below examples:

DATA1 (Index dd2001_2006 -> only BMT records):

id            Id_sex  birthday  In_date   Out_date  E_bedd  Tran_cd  Icd9cd  Icd9cd1  icdopcd  Icdopcd1  Dx_am  Room_am  Drug_am  Med_am
1122ab33c5..  F       19580210  20011215  20020208  48      1        20500   6822     4103     9925      11664  44160    315227   473461
1134ac34c6..  M       19751122  20050719  20051130  134     4        20153   99685    4105     8607      69120  904218   722973   2579172
2456b578ef..  F       19690516  20030113  20030204  22      3        20021   2880     8607     4101      11897  137262   138717   378661
ab2457cdg3..  M       19501030  20050413  20050720  98      3        20500   2880     9925     4105      40099  358053   831632   1482244

DATA2 (including pre-BMT, during BMT, and post-BMT records):

id            Id_sex  birthday  In_date   Out_date  E_bedd  Tran_cd  Icd9cd  Icd9cd1  icdopcd  Icdopcd1  Dx_am  Room_am  Drug_am  Med_am
1122ab33c5..  F       19580210  20030805  20031001  57      2        20500   03482    9925     8607      15942  61320    237694   462431
1122ab33c5..  F       19580210  20030805  20031001  1       3        20500   1975     9925     8607      546    1095     0        2005
1122ab33c5..  F       19580210  20011215  20020208  48      1        20500   6822     4103     9925      69120  904218   722973   2579172
1134ac34c6..  M       19751122  20050719  20051130  134     4        20153   99685    4105     8607      69120  904218   722973   2579172
2456b578ef..  F       19690516  20030113  20030204  22      3        20021   2880     8607     4101      11897  137262   138717   378661
2456b578ef..  F       19690516  20031025  20031204  40      2        20400   2880     Blank    Blank     9963   34155    59627    177133
2456b578ef..  F       19690516  20031025  20031204  40      4        20400   486      0392     9925      2184   7245     55737    88608
ab2457cdg3..  M       19501030  20050413  20050720  49      2        20500   2880     9925     4105      16107  55125    364826   633212
ab2457cdg3..  M       19501030  20050413  20050720  30      2        20500   2880     9925     3893      15075  196530   119210   471444
ab2457cdg3..  M       19501030  20050413  20050720  19      3        20500   03842    3324     9925      10469  147747   80434    295218
ab2457cdg3..  M       19501030  20050817  20051011  45      2        20500   Blank    Blank    Blank     13885  50625    190254   418414
ab2457cdg3..  M       19501030  20050817  20051011  10      5        20500   2880     9925     Blank     3573   11250    95807    173013

Please show me how to create syntax for keeping pre-BMT and post-BMT records as two separated files. Thank you so much.
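
The one-record assumption Gene mentions can be checked before any matching is done. The following is an untested sketch, assuming DATA1 and DATA2 are defined as file handles and that the key variables are named id, in_date and out_date; n_rec is a hypothetical variable name.

```spss
* Untested sketch. DATA1 and DATA2 are assumed to be file handles.
* n_rec is a hypothetical variable name.
GET FILE=DATA1.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /n_rec=N.
* Every value of n_rec should be 1 if DATA1 has one record per patient.
FREQUENCIES VARIABLES=n_rec.

GET FILE=DATA2.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id in_date out_date
  /n_rec=N.
* Values above 1 flag repeated combinations of id, in_date and out_date.
FREQUENCIES VARIABLES=n_rec.
```

In the sample DATA2 above, several combinations of id and dates appear more than once, so values above 1 are to be expected there. A table match on id alone still attaches the BMT dates to each of those records.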
Hi Gene and Amy,
Assuming Gene's interpretation of the problem is correct, I have suggested some syntax to do what I think is needed. I have used a highly simplified version of the two datasets that I hope captures the essential part of the problem. The second dataset has records of chemo only for patient 001. The syntax should identify each chemo record as pre, during or post. You can then filter as needed. I hope this helps.

--------------------------------------------------------
* set up data 1 patients and BMT in and out dates.
*--------------------------------------------------------------------.
DATA LIST FREE/ id(A3) BMin(DATE) BMout(DATE).
BEGIN DATA
001 1/Mar/2002 30/Jun/2002
002 1/Jun/2003 31/Dec/2003
003 1/Jan/2004 30/Nov/2004
END DATA.
SAVE OUTFILE='H:\SPSS-listserve\BMTdata.sav' /COMPRESSED.

* set up data 2 - same patients but chemo in and out dates, with comment showing pre-, during- or post BMT.
*------------------------------------------------.
DATA LIST FREE/ id(A3) Chemo_in(DATE) Chemo_out(DATE) comment(A6).
BEGIN DATA
001 1/Jan/2002 15/Jan/2002 pre
001 1/Apr/2002 15/Apr/2002 during
001 1/Jul/2002 15/Jul/2002 post
END DATA.
SAVE OUTFILE='H:\SPSS-listserve\Chemodata.sav' /COMPRESSED.

* match files to get BMT dates for each chemo record.
* the chemo file supplies the cases and the BMT file is the lookup table.
* both files must be sorted by id.
MATCH FILES /FILE='H:\SPSS-listserve\Chemodata.sav'
  /TABLE='H:\SPSS-listserve\BMTdata.sav'
  /BY id.
EXECUTE.

* save resulting dataset with matched records.
SAVE OUTFILE='H:\SPSS-listserve\ChemoBMTmatched.sav' /COMPRESSED.

* work out if chemo is pre- during- or post- BMT or exception (code 1, 2, 3, or 4).
*-----------------------------------------------------------------------.
DO IF Chemo_in < BMin AND Chemo_out < BMin.
  COMPUTE time = 1.
ELSE IF Chemo_in > BMin AND Chemo_out < BMout.
  COMPUTE time = 2.
ELSE IF Chemo_in > BMout.
  COMPUTE time = 3.
ELSE.
  COMPUTE time = 4.
END IF.
EXE.

*Define Variable Properties.
*time.
VALUE LABELS time 1 'pre' 2 'during' 3 'post' 4 'exception'.
EXECUTE.

Regards
Clive.
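
To get from the time code to the two separate files originally asked for, something like the following could be run next. This is an untested sketch, assuming the matched data with the time variable is the active dataset; the output file names preBMT.sav and postBMT.sav are hypothetical.

```spss
* Untested sketch. Assumes time is coded 1 = pre, 2 = during, 3 = post, 4 = exception.
* The output file names are hypothetical.
TEMPORARY.
SELECT IF (time = 1).
SAVE OUTFILE='H:\SPSS-listserve\preBMT.sav' /COMPRESSED.

TEMPORARY.
SELECT IF (time = 3).
SAVE OUTFILE='H:\SPSS-listserve\postBMT.sav' /COMPRESSED.

* List the exceptions so they can be reviewed by hand.
TEMPORARY.
SELECT IF (time = 4).
LIST VARIABLES=id Chemo_in Chemo_out BMin BMout.
```

Because each SELECT IF follows TEMPORARY, it applies only to the next procedure, so the full matched dataset stays available after each SAVE.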
