SPSSX Discussion

(no subject)

Suh-Ing Amy Hsieh

(no subject)

Hi listers:

I am sorry about my earlier post. I will do my best to explain the problem.

The original large claims data sets (hospitalizations) are “dd2001, dd2002,
dd2003, dd2004, dd2005, and dd2006.” They are monthly claims data and have
the same variables. If a patient was hospitalized past the monthly
reporting date, the claims data have ≥ 1 record for that patient with the
same admission and discharge dates. I saw one patient (identified by id,
birthday, in_date, and out_date) who was hospitalized for ≥ 1 year; the
claims data had around 12 records (lines, or rows) with the same date of
admission (e.g., 20010101) and discharge (e.g., 20020202). In_date is the
admission date and out_date is the discharge date.

My target population is adults (≥ 18 years) with hematological cancers who
received a bone marrow transplant (BMT) from 2001 to 2005. First, I
selected hematological cancers from dd2001 to dd2006 using ICD-9-CM
diagnostic codes (icd9cd to icd9cd4) and added the annual data sets
together as DATA1. Second, I limited the target population to patients
undergoing BMT using 10 ICD-9-CM procedure codes (icdopcd to icdopcd4);
the 10 procedure codes for BMT are 4100 to 4109. Third, I converted the
birthday and admission dates and calculated ages. Fourth, I recoded age
into 2 groups and selected age ≥ 18 years. Fifth, I created an index
dd2001_2006 by aggregating (selecting the first and last record and
summing the different fees) and merging (adding cases again). Thus, DATA1
is an index dd2001_2006 with only 1 record per patient. If patients
received a 2nd, 3rd, 4th, or subsequent BMT, those variables will be added
to DATA1 under different variable names. It is occasionally hard to judge
the admission date for the BMT alone because of coding problems, so I need
the pre-BMT chemotherapy records for checking and for deciding whether to
exclude patients.

The 2 outcomes are overall survival (from Jan 1, 2001 to Dec 31, 2005) and
readmission within 30 days of discharge. The variables for death and date
of death already exist in DATA1 for several patients because those
patients died during BMT. The overall-survival variables for the remaining
patients, who survived the BMT, will be obtained from dd2001 to dd2006.
The readmission variable (with or without readmission) will also be
obtained from dd2001 to dd2006. Hence, I created syntax to select those
adult BMT patients using their unique ID (length 32) and saved the result
as “DATA2.” However, DATA2 includes all records (rows): pre-, during-, and
post-BMT. I am trying to work out how to write syntax that keeps the
pre-BMT chemotherapy records as one dataset and the post-BMT records as
another, or that drops the BMT records from DATA2.
The key variables for identifying pre-, during-, or post-BMT records are
the admission and discharge dates of each record from dd2001 to dd2006,
since all of a patient's records share the same id and birthday. The
in_date and out_date of pre-BMT records fall before the in_date and
out_date of the BMT procedure, whereas those of post-BMT records fall
after them. Please see the examples below:

DATA1 (Index dd2001_2006 -> only BMT records):

id            id_sex  birthday  in_date   out_date  e_bedd  tran_cd  icd9cd  icd9cd1  icdopcd  icdopcd1  dx_am  room_am  drug_am  med_am…
1122ab33c5..  F       19580210  20011215  20020208  48      1        20500   6822     4103     9925      11664  44160    315227   473461
1134ac34c6..  M       19751122  20050719  20051130  134     4        20153   99685    4105     8607      69120  904218   722973   2579172
2456b578ef..  F       19690516  20030113  20030204  22      3        20021   2880     8607     4101      11897  137262   138717   378661
ab2457cdg3..  M       19501030  20050413  20050720  98      3        20500   2880     9925     4105      40099  358053   831632   1482244

DATA2 (including pre-BMT, during BMT, and post-BMT records):

id            id_sex  birthday  in_date   out_date  e_bedd  tran_cd  icd9cd  icd9cd1  icdopcd  icdopcd1  dx_am  room_am  drug_am  med_am…
2456b578ef..  F       19690516  20030113  20030204  22      3        20021   2880     8607     4101      11897  137262   138717   378661
2456b578ef..  F       19690516  20031025  20031204  40      2        20400   2880     Blank    Blank     9963   34155    59627    177133
2456b578ef..  F       19690516  20031025  20031204  40      4        20400   486      0392     9925      2184   7245     55737    88606
1122ab33c5..  F       19580210  20030805  20031001  57      2        20500   03482    9925     8607      15942  61320    237694   462431
1122ab33c5..  F       19580210  20030805  20031001  1       3        20500   1975     9925     8607      546    1095     0        2005
1122ab33c5..  F       19580210  20011215  20020208  48      1        20500   6822     4101     8607      69120  904218   722973   2579172
ab2457cdg3..  M       19501030  20050413  20050720  49      2        20500   2880     9925     3893      16107  55125    364826   633212
ab2457cdg3..  M       19501030  20050413  20050720  30      2        20500   2880     9925     3893      15075  196530   119210   471444
ab2457cdg3..  M       19501030  20050413  20050720  19      3        20500   03842    3324     9925      10469  147747   80434    295218
ab2457cdg3..  M       19501030  20050817  20051011  45      2        20500   Blank    Blank    Blank     13885  50625    190254   418414
ab2457cdg3..  M       19501030  20050817  20051011  10      5        20500   2880     9925     Blank     3573   11250    95807    173013

The same color stands for the same patient.
Please show me how to create syntax that keeps the pre-BMT and post-BMT
records as two separate files. Thank you so much.
Amy Hsieh

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Richard Ristow

Re: Syntax for keeping or dropping records

(It is helpful always to use a subject line, and to keep the subject
line the same for follow-ups in the same thread.)

At 01:55 PM 12/19/2008, SUH-ING (AMY) HSIEH wrote:

>I am so sorry for the prior posted information. I do my best to explain it.

Here, I'm editing what you wrote, for readability (and greetings to
umaryland, since I'm briefly in the Washington area):

>The original big claims data (hospitalization) are "dd2001, dd2002,
>dd2003, dd2004, dd2005, and dd2006" They are monthly claims data and
>have same variables. If patients were hospitalized longer than the
>monthly reporting date, the claims data had >= 1 record for the
>patients at the same admission and discharge dates. I saw one
>patient (identified by id, birthday, in_date, and out_date) who was
>hospitalized for >= 1 year, the claims data had around 12 records
>(or lines or rows) at the same date of admission (e.g., 20010101)
>and discharge (e.g., 20020202). In_date is the admission date and
>the out_date is discharge date.
>
>My target population is adults (>= 18 years) with hematological
>cancers receiving bone marrow transplant (BMT) from 2001 to 2005.
>[Details of selection logic omitted.] DATA1 is an index dd2001_2006
>and only 1 record per patient.
>
>2 outcomes are overall survival (from Jan 1, 2001 to Dec 31, 2005)
>and 30 day readmission of discharge. The variables of death and date
>of death have existed in the DATA1 for several patients because
>patients have died during BMT. Thus, the variables of overall
>survival for remaining patients, who survive during BMT, will be
>obtained from dd2001 to dd2006 [of other records?]. Also, the
>variable of with readmission or without readmission will be obtained
>from dd2001 to dd2006 [of other admission records?] again. I have
>created syntax for selecting those adult patients undergoing BMT
>using their unique ID (32 length) and saved as "DATA2" However,
>data2 include all records (rows) with respect to pre-, during, and
>post-BMT records. I am thinking how to create syntax for keeping
>pre-BMT chemotherapy records as one dataset and post-BMT records as
>one dataset or dropping BMT records from DATA2.
>
>The key variables for identifying pre-, during, or post-BMT are each
>admission date and discharge date from dd2001 to dd2006, although
>patients have same id and birthday. The in_date and out_date of
>pre-BMT records occur before in_date and out_date of BMT procedures,
>whereas the in_date and out_date of post-BMT occur after in_date and
>out_date of BMT procedures. Please see below examples:

========
The test data came through very, very badly unwrapped, not only with
every column head and datum on a separate line, but many additional
line breaks. (I wonder why that happens so often?) See if this is
easier to understand:

>DATA1 (Index dd2001_2006 [with?] only BMT records):
>
> id Id_sex birthday In_date Out_date
>
> 1122ab33c5.. F 19580210 20011215 20020208
> 1134ac34c6.. M 19751122 20050719 20051130
> 2456b578ef.. F 19690516 20030113 20030204
> ab2457cdg3.. M 19501030 20050413 20050720
>
> E_bedd Tran_cd Icd9cd Icd9cd1 icdopcd Icdopcd1
> 48 1 20500 6822 4103 9925
> 134 4 20153 99685 4105 8607
> 22 3 20021 2880 8607 4101
> 98 3 20500 2880 9925 4105
>
>DATA2 (including pre-BMT, during BMT, and post-BMT records):
>
> id Id_sex birthday In_date Out_date
> 1122ab33c5.. F 19580210 20030805 20031001
> 1122ab33c5.. F 19580210 20030805 20031001
> 1122ab33c5.. F 19580210 20011215 20020208
> 1134ac34c6.. M 19751122 20050719 20051130
> 2456b578ef.. F 19690516 20030113 20030204
> 2456b578ef.. F 19690516 20031025 20031204
> 2456b578ef.. F 19690516 20031025 20031204
> ab2457cdg3.. M 19501030 20050413 20050720
> ab2457cdg3.. M 19501030 20050413 20050720
> ab2457cdg3.. M 19501030 20050413 20050720
> ab2457cdg3.. M 19501030 20050817 20051011
> ab2457cdg3.. M 19501030 20050817 20051011
>
> E_bedd Tran_cd Icd9cd Icd9cd1 icdopcd Icdopcd1
> 57 2 20500 03482 9925 8607
> 1 3 20500 1975 9925 8607
> 48 1 20500 6822 4103 9925
> 134 4 20153 99685 4105 8607
> 22 3 20021 2880 8607 4101
> 40 2 20400 2880 Blank Blank
> 40 4 20400 486 0392 9925
> 49 2 20500 2880 9925 4105
> 30 2 20500 2880 9925 3893
> 19 3 20500 03842 3324 9925
> 45 2 20500 Blank Blank Blank
> 10 5 20500 2880 9925 Blank
>
> Dx_am Room_am Drug_am Med_am
> 15942 61320 237694 462431
> 546 1095 0 2005
> 69120 904218 722973 2579172
> 69120 904218 722973 2579172
> 11897 137262 138717 378661
> 9963 34155 59627 177133
> 2184 7245 55737 88608
> 16107 55125 364826 633212
> 15075 196530 119210 471444
> 10469 147747 80434 295218
> 13885 50625 190254 418414
> 3573 11250 95807 173013

It sounds like you want to attach data regarding the bone-marrow
transplant (BMT) from DATA1, to every record in DATA2, for selection
and comparison.

See how far this gets you. It's not tested, and I don't think I
understand everything:

id Id_sex birthday In_date Out_date

1122ab33c5.. F 19580210 20011215 20020208
1134ac34c6.. M 19751122 20050719 20051130
2456b578ef.. F 19690516 20030113 20030204
ab2457cdg3.. M 19501030 20050413 20050720

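* DATA1 and DATA2 stand for your own file specifications.
GET FILE=DATA1
/RENAME= (In_date Out_date =
BMT_InDt BMT_OutDt)
/KEEP= id BMT_InDt BMT_OutDt.
* MATCH FILES with BY needs every input sorted by the key.
SORT CASES BY id.

* DATA2 must already be sorted by id as well.
MATCH FILES
/TABLE=*
/FILE =DATA2
/BY id.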

Then, you have the BMT dates on all the admission records, and you
can compare for 'before' and 'after', etc. I assume that you have
all your dates stored as SPSS date-format variables; if you don't, you should.
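
If the dates are still numeric values of the form yyyymmdd, as the sample
data suggest, a conversion along the following lines might be a starting
point. It is an untested sketch: the new variable names (in_dt, out_dt,
bmt_in_dt, bmt_out_dt, days_after_bmt) are made up for illustration, and it
assumes all four source variables are numeric rather than string.

```spss
* Sketch only: assumes numeric yyyymmdd values such as 20030113.
* The target names in the second list are hypothetical.
DO REPEAT src = In_date Out_date BMT_InDt BMT_OutDt
 / dst = in_dt out_dt bmt_in_dt bmt_out_dt.
* DATE.MDY takes month, day, year in that order.
COMPUTE dst = DATE.MDY(TRUNC(MOD(src, 10000) / 100),
                       MOD(src, 100),
                       TRUNC(src / 10000)).
END REPEAT.
FORMATS in_dt out_dt bmt_in_dt bmt_out_dt (DATE11).

* Whole days from BMT discharge to the admission date of each record.
COMPUTE days_after_bmt = DATEDIFF(in_dt, bmt_out_dt, "days").
EXECUTE.
```

With real date values, a 30-day readmission check can be written as a test
on whether days_after_bmt falls between 0 and 30.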

-Good luck, and onward,
Richard


Maguin, Eugene

Re: Syntax for keeping or dropping records

In reply to this post by Suh-Ing Amy Hsieh

Amy,

You should know that color coding does not come through to the list.

OK. I've rearranged what you posted into a far more usable structure (see
below). To summarize how I now understand things: you have two files, Data1
and Data2, each made as you describe. Data1 is a file of patients meeting
your selection criteria, with one record per patient. That record is
for the bone marrow transplant (BMT) treatment. Data2 has multiple records
per patient, each record being an incident of chemotherapy. You want to
separate the chemotherapy incidents in Data2 into two groups based on the
BMT incident date in Data1.

I'm now going to assume that you are very skilled with SPSS. I think you can
run MATCH FILES with the TABLE subcommand to match Data1, as the table file,
to Data2 by ID. I think you need only a subset of the variables in Data1,
probably just ID and the in and out date variables. This little operation
explicitly assumes that you have exactly one record per patient in Data1 and
exactly one record in Data2 for each combination of ID and in and out date.
If you don't, then you have more trouble. Not insurmountable trouble, but
definitely more.
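
If Data2 does hold several monthly claim rows for one admission, as the
first post describes, one way to get back to one record per admission
might be AGGREGATE. This is an untested sketch: 'DATA2.sav' stands for your
own file, and n_claims and the four *_sum names are hypothetical. Summing
the fee variables follows what the first post says was done for DATA1.

```spss
* Sketch only: collapse Data2 to one row per patient and admission.
GET FILE='DATA2.sav'.
AGGREGATE OUTFILE=*
 /BREAK=id In_date Out_date
 /n_claims = N
 /Dx_sum Room_sum Drug_sum Med_sum = SUM(Dx_am Room_am Drug_am Med_am).
```

The n_claims count shows how many claim rows were folded into each
admission, which may help when checking the result against the raw data.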

Once the match files is complete, you can compare in and out dates from the
Data2 records against those from the Data1 records to identify pre and post
BMT incidents.

Does this help you?

Gene Maguin

****************************************
The examples of data are messy. So, I repost it again. The original big
claims data (hospitalization) are "dd2001, dd2002, dd2003, dd2004,
dd2005, and dd2006." They are monthly claims data and have same
variables. If patients were hospitalized longer than the monthly reporting
date, the claims data had > 1 record for the patients at the same admission
and discharge dates. I saw one patient (identified by id, birthday, in_date,
and out_date) who was hospitalized for > 1 year, the claims data had around
12 records (or lines or rows) at the same date of admission (e.g., 20010101)
and discharge (e.g., 20020202). In_date is the admission date and the
out_date is discharge date.

My target population is adults (> 18 years) with hematological cancers
receiving bone marrow transplant (BMT) from 2001 to 2005. First, I have
selected hematological cancers from dd2001 to dd2006 using ICD-9-CM
diagnostic codes (from icd9cd to icd9cd4) and added annual data set as
DATA1. Second, I have limited the target population to patients undergoing
BMT using 10 ICD-9-CM procedure codes (from icdopcd to icdopcd4). 10
ICD-9-CM procedure codes for BMT are from 4100 to 4109. Third, I converted
birthday and admission dates and calculated ages. Fourth, I recoded age into
2 groups and selected age >= 18 years old. Fifth, I have created an index
dd2001_2006 using aggregating (selecting the first record and last record
and summing different fees) and merging functions (adding cases again).
Thus, DATA1 is an index dd2001_2006 and only 1 record per patient. If
patients had received 2nd, 3rd, 4th, or subsequent BMT, those variables will
be added to the DATA1 using different names of variables. It is
occasionally hard to judge the admission date only for BMT due to coding
problems so that I need pre-BMT chemotherapy records for checking and making
decisions (exclude or not exclude patients).

2 outcomes are overall survival (from Jan 1, 2001 to Dec 31, 2005) and 30
day readmission of discharge. The variables of death and date of death have
existed in the DATA1 for several patients because patients have died during
BMT. Thus, the variables of overall survival for remaining patients, who
survive during BMT, will be obtained from dd2001 to dd2006. Also, the
variable of with readmission or without readmission will be obtained from
dd2001 to dd2006 again. Hence, I have created syntax for selecting those
adult patients undergoing BMT using their unique ID (32 length) and saved as
"DATA2." However, data2 include all records (rows) with respect
to pre-, during, and post-BMT records. I am thinking how to create syntax
for keeping pre-BMT chemotherapy records as one dataset and post-BMT records
as one dataset or dropping BMT records from DATA2. The key variables for
identifying pre-, during, or post-BMT are each admission date and discharge
date from dd2001 to dd2006, although patients have same id and birthday. The
in_date and out_date of pre-BMT records occur before in_date and out_date of
BMT procedures, whereas the in_date and out_date of post-BMT occur after
in_date and out_date of BMT procedures. Please see below examples:

DATA1 (Index dd2001_2006 -> only BMT records):

id Id_sex birthday In_date Out_date E_bedd Tran_cd Icd9cd Icd9cd1 icdopcd
Icdopcd1 Dx_am Room_am Drug_am Med_am
1122ab33c5.. F 19580210 20011215 20020208 48 1 20500 6822 4103 9925 11664
44160 315227 473461
1134ac34c6.. M 19751122 20050719 20051130 134 4 20153 99685 4105 8607 69120
904218 722973 2579172
2456b578ef.. F 19690516 20030113 20030204 22 3 20021 2880 8607 4101 11897
137262 138717 378661
ab2457cdg3.. M 19501030 20050413 20050720 98 3 20500 2880 9925 4105 40099
358053 831632 1482244

DATA2 (including pre-BMT, during BMT, and post-BMT records):

id Id_sex birthday In_date Out_date E_bedd Tran_cd Icd9cd Icd9cd1 icdopcd
Icdopcd1 Dx_am Room_am Drug_am Med_am
1122ab33c5.. F 19580210 20030805 20031001 57 2 20500 03482 9925 8607 15942
61320 237694 462431
1122ab33c5.. F 19580210 20030805 20031001 1 3 20500 1975 9925 8607 546 1095
0 2005
1122ab33c5.. F 19580210 20011215 20020208 48 1 20500 6822 4103 9925 69120
904218 722973 2579172
1134ac34c6.. M 19751122 20050719 20051130 134 4 20153 99685 4105 8607 69120
904218 722973 2579172
2456b578ef.. F 19690516 20030113 20030204 22 3 20021 2880 8607 4101 11897
137262 138717 378661
2456b578ef.. F 19690516 20031025 20031204 40 2 20400 2880 Blank Blank 9963
34155 59627 177133
2456b578ef.. F 19690516 20031025 20031204 40 4 20400 486 0392 9925 2184 7245
55737 88608
ab2457cdg3.. M 19501030 20050413 20050720 49 2 20500 2880 9925 4105 16107
55125 364826 633212
ab2457cdg3.. M 19501030 20050413 20050720 30 2 20500 2880 9925 3893 15075
196530 119210 471444
ab2457cdg3.. M 19501030 20050413 20050720 19 3 20500 03842 3324 9925 10469
147747 80434 295218
ab2457cdg3.. M 19501030 20050817 20051011 45 2 20500 Blank Blank Blank 13885
50625 190254 418414
ab2457cdg3.. M 19501030 20050817 20051011 10 5 20500 2880 9925 Blank 3573
11250 95807 173013

Please show me how to create syntax for keeping pre-BMT and post-BMT records
as two separated files. Thank you so much.


Clive Downs

Re: Syntax for keeping or dropping records

Hi Gene and Amy,

Assuming Gene's interpretation of the problem is correct, I have suggested
some syntax to do what I think is needed. I have used a highly simplified
version of the two datasets that I hope captures the essential part of the
problem. The second dataset has chemo records only for patient 001.

The syntax should identify each chemo record as pre, during or post. You
can then filter as needed.

I hope this helps.

--------------------------------------------------------

* set up data 1 patients and BMT in and out dates.
*--------------------------------------------------------------------.

DATA LIST FREE/ id(A3) BMin(DATE) BMout(DATE).
BEGIN DATA

001 1/Mar/2002 30/Jun/2002
002 1/Jun/2003 31/Dec/2003
003 1/Jan/2004 30/Nov/2004
END DATA.

SAVE OUTFILE='H:\SPSS-listserve\BMTdata.sav'
/COMPRESSED.

* set up data 2 - same patients but chemo in and out dates , with comment
showing pre-, during- or post BMT.
*------------------------------------------------.

DATA LIST FREE/ id(A3) Chemo_in(DATE) Chemo_out(DATE) comment(A6).

BEGIN DATA

001 1/Jan/2002 15/Jan/2002 pre
001 1/Apr/2002 15/Apr/2002 during
001 1/Jul/2002 15/Jul/2002 post
END DATA.

SAVE OUTFILE='H:\SPSS-listserve\Chemodata.sav'
/COMPRESSED.
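* open the BMT dates and name that dataset so it stays available.
GET
FILE='H:\SPSS-listserve\BMTdata.sav'.
SORT CASES BY id.
DATASET NAME BMT.

* reopen the chemo records so that they are the active dataset.
GET
FILE='H:\SPSS-listserve\Chemodata.sav'.
SORT CASES BY id.

* match files to get BMT dates for each patient.
MATCH FILES /FILE=*
/TABLE='BMT'
/BY id.
EXECUTE.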

* save resulting dataset with matched records.
SAVE OUTFILE='H:\SPSS-listserve\ChemoBMTmatched.sav'
/COMPRESSED.

* work out if chemo is pre- during- or post- BMT or exception (code 1, 2,
3, or 4).
*-----------------------------------------------------------------------.
* 'during' includes records whose dates equal the BMT dates.
DO IF Chemo_in < BMin AND Chemo_out < BMin.
COMPUTE time = 1.
ELSE IF Chemo_in >= BMin AND Chemo_out <= BMout.
COMPUTE time = 2.
ELSE IF Chemo_in > BMout.
COMPUTE time = 3.
ELSE.
COMPUTE time = 4.
END IF.
EXE.
*Define Variable Properties.
*time.
VALUE LABELS time
1 'pre'
2 'during'
3 'post'
4 'exception'.
EXECUTE.
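
To end up with the two separate files asked for in the original question,
one possibility is to follow the classification with TEMPORARY and
SELECT IF before each SAVE. This is a sketch, not tested on the real data;
the output file names PreBMT.sav and PostBMT.sav are hypothetical.

```spss
* Sketch only: assumes the active dataset holds the matched records
* and the variable time computed above (1 = pre, 3 = post).
TEMPORARY.
SELECT IF (time = 1).
SAVE OUTFILE='H:\SPSS-listserve\PreBMT.sav'
 /COMPRESSED.

TEMPORARY.
SELECT IF (time = 3).
SAVE OUTFILE='H:\SPSS-listserve\PostBMT.sav'
 /COMPRESSED.
```

Because TEMPORARY limits each SELECT IF to the next command that reads the
data, the full matched dataset is still intact after both saves.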

Regards

Clive.
