creating categorical variables

Anthony Santella
Dear All,

I have a large hospital discharge database, and have a question regarding
creating categorical variables.  For each patient, I have the following
variables:

Diagnoses1 ICD-9 Code
Diagnosis1 Name
Diagnosis1 Description
Diagnoses2 ICD-9 Code
Diagnosis2 Name
Diagnosis2 Description
....
Diagnoses8 ICD-9 Code
Diagnosis8 Name
Diagnosis9 Description

I am trying to create a new categorical variable, number of diagnoses,
with the following groups:

0-1 diagnoses
2-3 diagnoses
4-5 diagnoses
6-7 diagnoses
7-8 diagnoses

Does anyone have any suggestions on how to accomplish this in SPSS?

Thank you in advance,
Anthony J. Santella, MPH
Doctoral Student
Department of Health Systems Management
Tulane University School of Public Health and Tropical Medicine
Cell: 404-276-5961
Email: [hidden email]

Re: creating categorical variables

Melissa Ives
I would think you could use the aggregate function to count the valid
responses* in the ICD-9 code variables, then recode this response into
your N of diagnoses groups.

*you would probably need to aggregate using N and NMISS (or NU/NUMISS)
and then subtract to get the total N of valid responses.
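
A minimal, untested sketch of that idea, assuming the file had one row per patient per diagnosis slot. The names patient_id and icd9_num are hypothetical, and the code variable is assumed to be numeric with system-missing values for empty slots (a blank string is not treated as missing by default):

```spss
* Assumed long-format layout: one row per patient per diagnosis slot.
* patient_id and icd9_num are hypothetical names, not from the original post.
AGGREGATE
  /OUTFILE=*
  /BREAK=patient_id
  /n_slots = N
  /n_empty = NMISS(icd9_num).

* Valid diagnoses = all slots minus the empty ones.
COMPUTE num_dx = n_slots - n_empty.
EXECUTE.
```

The resulting num_dx could then be recoded into the diagnosis groups.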

Melissa

PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.

Re: creating categorical variables

Scott Czepiel
On 9/21/06, Melissa Ives <[hidden email]> wrote:
>
> I would think you could use the aggregate function to count the valid
> responses* in the ICD-9 code variables, then recode this response into
> your N of diagnoses groups.


That suggestion assumes there are multiple rows per respondent.  It appears
that the original dataset only contains one row per respondent, with 9 sets
of variables pertaining to the diagnoses.  This sounds like a job for the
"count" function.  Let's say you recode whichever variable is most
appropriate to a set of 9 indicators:  diag1, diag2, ..., diag9 where
1=there was a diagnosis and 0=no.  Then:

count totalnum = diag1 to diag9 (1).
freq totalnum.

will tell you how many (from a minimum of 0 to a maximum of 9) diagnoses
each respondent had.
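
One untested way to build such indicators, assuming the ICD-9 codes are string variables with the hypothetical names ICD9_01 to ICD9_08 (the original post lists eight diagnosis slots) and that an empty slot is stored as a blank string:

```spss
* Hypothetical variable names. Adjust them to the actual file.
* Each new indicator is 1 if the ICD-9 code is non-blank, otherwise 0.
DO REPEAT dx = ICD9_01 ICD9_02 ICD9_03 ICD9_04
               ICD9_05 ICD9_06 ICD9_07 ICD9_08
         /flag = diag1 TO diag8.
COMPUTE flag = (dx NE ' ').
END REPEAT.

* The new indicators are contiguous, so TO can be used here.
COUNT totalnum = diag1 TO diag8 (1).
FREQUENCIES VARIABLES=totalnum.
```

With eight slots, totalnum would range from 0 to 8.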

Re: creating categorical variables

Melissa Ives
In reply to this post by Anthony Santella
Yes, in re-reading, I think you are correct.

Another option for variables in a single row per client would be the
nvalid computation....

Compute totalnum=nvalid(dx1,dx2,dx3,...,dx8).
Recode totalnum (0,1=1)(2,3=2)(4,5=3)(6,7=4)(8=5) into ndxr.
Var labels ndxr 'Number of diagnoses-recoded'.

NVALID. NVALID(variable[,..]). Numeric. Returns a count of the arguments
that have valid, non-missing values. This function requires one or more
arguments, which should be variable names in the active dataset.
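
One caveat, offered as an untested sketch: if the diagnosis codes are string variables, a blank string counts as a valid value by default, so NVALID would include empty slots unless blank is declared user-missing. This assumes dx1 to dx8 are short string variables (the full list of names is an assumption):

```spss
* Assumes dx1 ... dx8 are short string variables (hypothetical names).
* Declare blank as user-missing so that NVALID skips empty slots.
MISSING VALUES dx1 dx2 dx3 dx4 dx5 dx6 dx7 dx8 (' ').
COMPUTE totalnum = NVALID(dx1, dx2, dx3, dx4, dx5, dx6, dx7, dx8).
EXECUTE.
```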

Melissa


Re: creating categorical variables

Beadle, ViAnn
I'm not sure why Anthony would want to lose precision here by collapsing values unless trying to replicate some other study.


Re: creating categorical variables

Richard Ristow
In reply to this post by Anthony Santella
At 09:58 PM 9/20/2006, Anthony Santella wrote:

>I have a large hospital discharge database. For each patient, I have
>the following
>variables:
>
>Diagnoses1 ICD-9 Code
>Diagnosis1 Name
>Diagnosis1 Description
>....
>Diagnoses8 ICD-9 Code
>Diagnosis8 Name
>Diagnosis9 Description
>
>I am trying to create a new categorical variable, number of
>diagnoses, with the following groups:
>
>0-1 diagnoses
>2-3 diagnoses
>4-5 diagnoses
>6-7 diagnoses
>7-8 diagnoses

It depends on how the presence or absence of a diagnosis is indicated.
I'll guess that the ICD-9 codes are string variables, to accommodate
the codes with 'E', 'V', and maybe other prefixes. There are various
ways to count the non-blank ICD-9 codes; the following is probably not
the neatest, but it's straightforward. (Not tested)

Assuming that the variables are named ICD9_01 to ICD9_08,

NUMERIC    NUM_DX (F3).
VAR LABEL  NUM_DX 'Number of diagnoses'.
COMPUTE    NUM_DX = 0.

DO REPEAT DX= ICD9_01 ICD9_02 ICD9_03 ICD9_04
               ICD9_05 ICD9_06 ICD9_07 ICD9_08.
.  IF (DX NE ' ') NUM_DX = NUM_DX + 1.
END REPEAT.

RECODE NUM_DX
    (0,1 = 1)
    (2,3 = 2)
    (4,5 = 3)
    (6,7 = 4)
    (8   = 5).
VAL LABEL NUM_DX
    1 '0-1 Dx'
    2 '2-3 Dx'
    3 '4-5 Dx'
    4 '6-7 Dx'
    5 '8   Dx'.

Unfortunately, since your variables aren't contiguous in the file, you
can't use a convention like "ICD9_01 TO ICD9_08" to make up the
variable list for DO REPEAT. (There are some tricks to make it easier,
including Python in SPSS 14+, but probably not worth the trouble for 8
variables.)

Comment: if your categorization is this coarse-grained, only two input
categories to an output, you might consider simply keeping the number
of diagnoses as is, rather than collapsing.
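
If both the exact count and the grouped version are wanted, a variant of the RECODE above could write the groups to a new variable instead of overwriting NUM_DX. This is an untested sketch meant to be run in place of the RECODE and VAL LABEL commands above; NUM_DX_GRP is a hypothetical name.

```spss
* Keep NUM_DX as the raw count and put the collapsed groups in a new variable.
* NUM_DX_GRP is a hypothetical name, not from the original post.
RECODE NUM_DX
    (0,1 = 1)
    (2,3 = 2)
    (4,5 = 3)
    (6,7 = 4)
    (8   = 5)
    INTO NUM_DX_GRP.
VARIABLE LABELS NUM_DX_GRP 'Number of diagnoses (grouped)'.
VALUE LABELS NUM_DX_GRP
    1 '0-1 Dx'
    2 '2-3 Dx'
    3 '4-5 Dx'
    4 '6-7 Dx'
    5 '8 Dx'.
FREQUENCIES VARIABLES=NUM_DX NUM_DX_GRP.
```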

If you do collapse categories, think what they should be. For example,
0 is very special, probably always an error; it shouldn't be lumped
with 1, which will be very common. And very large numbers may be rare;
perhaps '5-8' should be a category, rather than broken down,