Dear All,
I have a large hospital discharge database and a question about creating categorical variables. For each patient, I have the following variables:

Diagnosis1 ICD-9 Code
Diagnosis1 Name
Diagnosis1 Description
Diagnosis2 ICD-9 Code
Diagnosis2 Name
Diagnosis2 Description
....
Diagnosis8 ICD-9 Code
Diagnosis8 Name
Diagnosis8 Description

I am trying to create a new categorical variable, number of diagnoses, with the following groups:

0-1 diagnoses
2-3 diagnoses
4-5 diagnoses
6-7 diagnoses
8 diagnoses

Does anyone have any suggestions on how to accomplish this in SPSS?

Thank you in advance,
Anthony J. Santella, MPH
Doctoral Student
Department of Health Systems Management
Tulane University School of Public Health and Tropical Medicine
Cell: 404-276-5961
Email: [hidden email]
I would think you could use AGGREGATE to count the valid responses* in the ICD-9 code variables, then recode that count into your number-of-diagnoses groups.

*You would probably need to aggregate using N and NMISS (or NU and NUMISS) and then subtract to get the total number of valid responses.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Anthony Santella
Sent: Wednesday, September 20, 2006 8:59 PM
To: [hidden email]
Subject: [SPSSX-L] creating categorical variables

PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance.
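The AGGREGATE idea only works when the data are in "long" form, with one row per patient per diagnosis slot. For anyone who wants to see the logic outside SPSS, here is a minimal plain-Python sketch (not SPSS syntax, and not the SPSS Python plug-in). The records, patient IDs and function name are hypothetical; it counts all rows per patient, counts the missing ones, and subtracts, mirroring the N-minus-NMISS trick described above. As an assumption, a blank code is treated as missing along with None.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per (patient, diagnosis slot).
# None (or a blank string) stands in for a missing ICD-9 code.
rows = [
    ("P001", "250.00"),
    ("P001", "401.9"),
    ("P001", None),
    ("P002", "V58.61"),
    ("P002", None),
]

def count_valid_per_patient(records):
    """Count non-missing codes per patient, like an AGGREGATE break on patient ID."""
    n_total = defaultdict(int)    # all rows in each patient's break group
    n_missing = defaultdict(int)  # rows whose code is missing or blank
    for patient_id, code in records:
        n_total[patient_id] += 1
        if code is None or code.strip() == "":
            n_missing[patient_id] += 1
    # Valid = total - missing, the same subtraction as N minus NMISS.
    return {pid: n_total[pid] - n_missing[pid] for pid in n_total}

print(count_valid_per_patient(rows))  # {'P001': 2, 'P002': 1}
```

The per-patient counts would then be recoded into the diagnosis groups, exactly as with a count computed in wide format.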
On 9/21/06, Melissa Ives <[hidden email]> wrote:
> I would think you could use the aggregate function to count the valid
> responses* in the ICD-9 code variables, then recode this response into
> your N of diagnoses groups.

That suggestion assumes there are multiple rows per respondent. It appears that the original dataset contains only one row per respondent, with 8 sets of variables pertaining to the diagnoses.

This sounds like a job for the COUNT command. Let's say you first recode whichever variable is most appropriate into a set of 8 indicators, diag1, diag2, ..., diag8, where 1 = there was a diagnosis and 0 = there was not. Then:

count totalnum = diag1 to diag8 (1).
freq totalnum.

will tell you how many diagnoses (from a minimum of 0 to a maximum of 8) each respondent had.
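The COUNT approach works on "wide" data, one row per respondent. As a rough plain-Python analogue (a sketch, not SPSS code), assuming each row already holds eight hypothetical 0/1 indicators standing in for diag1 to diag8, counting the 1s per row gives totalnum, and tallying those counts plays the role of FREQUENCIES:

```python
from collections import Counter

# Hypothetical wide-format data: one row per patient, eight indicator
# values (standing in for diag1 ... diag8), 1 = diagnosis present, 0 = absent.
patients = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 0, 0],
]

def count_value(row, value=1):
    """Number of variables in the row equal to `value` (like COUNT ... (1))."""
    return sum(1 for v in row if v == value)

totalnum = [count_value(row) for row in patients]
freq = Counter(totalnum)  # rough analogue of FREQUENCIES totalnum

print(totalnum)              # [3, 1, 8, 3]
print(sorted(freq.items()))  # [(1, 1), (3, 2), (8, 1)]
```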
Yes, in re-reading, I think you are correct.
Another option for variables in a single row per client would be the NVALID function:

Compute totalnum=nvalid(dx1,dx2,dx3,...,dx8).
Recode totalnum (0,1=1)(2,3=2)(4,5=3)(6,7=4)(8=8) into ndxr.
Var labels ndxr 'Number of diagnoses-recoded'.

NVALID. NVALID(variable[,..]). Numeric. Returns a count of the arguments that have valid, non-missing values. This function requires one or more arguments, which should be variable names in the active dataset.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Scott Czepiel
Sent: Thursday, September 21, 2006 9:12 AM
To: [hidden email]
Subject: Re: [SPSSX-L] creating categorical variables
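Here is a plain-Python sketch of the count-then-recode idea, assuming None marks a missing value and using hypothetical names (`nvalid`, `recode_ndx`, `LABELS`). It codes the 8-diagnosis group as 5 so the group codes run 1 to 5. For counts 0 to 8, the recode can be written arithmetically as `n // 2 + 1`. One caution worth checking in SPSS itself: if the ICD-9 codes are string variables, a blank code may count as valid unless blanks have been declared missing.

```python
# Hypothetical value labels for the recoded groups (codes 1-5).
LABELS = {1: "0-1 Dx", 2: "2-3 Dx", 3: "4-5 Dx", 4: "6-7 Dx", 5: "8 Dx"}

def nvalid(values):
    """Count arguments that are not missing (None), a rough analogue of NVALID."""
    return sum(1 for v in values if v is not None)

def recode_ndx(total):
    """Map 0-8 diagnoses to groups: (0,1=1)(2,3=2)(4,5=3)(6,7=4)(8=5)."""
    if not 0 <= total <= 8:
        raise ValueError(f"expected 0-8 diagnoses, got {total}")
    return total // 2 + 1  # pairs of counts share a group; 8 lands in group 5

# Hypothetical row: eight diagnosis slots standing in for dx1 ... dx8.
row = ["250.00", "401.9", None, None, None, None, None, None]
totalnum = nvalid(row)
ndxr = recode_ndx(totalnum)
print(totalnum, ndxr, LABELS[ndxr])  # 2 2 2-3 Dx
```

The range check is there so that an unexpected count (for example, from a file with more than eight slots) fails loudly instead of being silently lumped into a group.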
I'm not sure why Anthony would want to lose precision here by collapsing values unless trying to replicate some other study.
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Melissa Ives
Sent: Thursday, September 21, 2006 9:22 AM
To: [hidden email]
Subject: Re: creating categorical variables
At 09:58 PM 9/20/2006, Anthony Santella wrote:
>I have a large hospital discharge database. For each patient, I have
>the following variables:
>
>Diagnosis1 ICD-9 Code
>Diagnosis1 Name
>Diagnosis1 Description
>....
>Diagnosis8 ICD-9 Code
>Diagnosis8 Name
>Diagnosis8 Description
>
>I am trying to create a new categorical variable, number of
>diagnoses, with the following groups:
>
>0-1 diagnoses
>2-3 diagnoses
>4-5 diagnoses
>6-7 diagnoses
>8 diagnoses

It depends on how the presence or absence of a diagnosis is indicated. I'll guess that the ICD-9 codes are string variables, to accommodate the codes with 'E', 'V', and maybe other prefixes. There are various ways to count the non-blank ICD-9 codes; the following is probably not the neatest, but it's straightforward. (Not tested.) Assuming that the variables are named ICD9_01 to ICD9_08:

NUMERIC NUM_DX (F3).
VAR LABEL NUM_DX 'Number of diagnoses'.
COMPUTE NUM_DX = 0.
DO REPEAT DX = ICD9_01 ICD9_02 ICD9_03 ICD9_04
               ICD9_05 ICD9_06 ICD9_07 ICD9_08.
.  IF (DX NE ' ') NUM_DX = NUM_DX + 1.
END REPEAT.
RECODE NUM_DX (0,1 = 1) (2,3 = 2) (4,5 = 3) (6,7 = 4) (8 = 5).
VAL LABEL NUM_DX 1 '0-1 Dx' 2 '2-3 Dx' 3 '4-5 Dx' 4 '6-7 Dx' 5 '8 Dx'.

Unfortunately, since your variables aren't contiguous in the file, you can't use a convention like "ICD9_01 TO ICD9_08" to make up the variable list for DO REPEAT. (There are some tricks to make it easier, including Python in SPSS 14+, but they're probably not worth the trouble for 8 variables.)

Comment: if your grouping is this fine, with only two input values going to each output category, you might consider simply keeping the number of diagnoses as is, rather than collapsing. If you do collapse categories, think about what they should be. For example, 0 is very special, probably always an error; it shouldn't be lumped with 1, which will be very common. And very large numbers may be rare; perhaps '5-8' should be a single category rather than broken down.
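To show the non-blank-string counting outside SPSS, here is a plain-Python sketch that assumes hypothetical string fields ICD9_01 to ICD9_08, where an empty or all-blank value means "no diagnosis". `alt_group` illustrates one grouping along the lines of the comment above: 0 kept separate as something to check, and 5-8 pooled. Leaving 1-4 as single values is my own assumption, not a recommendation from the thread.

```python
# Hypothetical string ICD-9 fields ICD9_01 ... ICD9_08.
ICD9_VARS = [f"ICD9_{i:02d}" for i in range(1, 9)]

def num_dx(record):
    """Count non-blank ICD-9 codes, mirroring IF (DX NE ' ') NUM_DX = NUM_DX + 1."""
    n = 0
    for name in ICD9_VARS:
        code = record.get(name, "")  # an absent field is treated as blank
        if code.strip() != "":       # all-blank strings do not count
            n += 1
    return n

def alt_group(n):
    """One possible grouping: 0 flagged separately, 1-4 kept, 5-8 pooled."""
    if n == 0:
        return "0 (check)"
    if n <= 4:
        return str(n)
    return "5-8"

rec = {"ICD9_01": "V58.61", "ICD9_02": "E849.0", "ICD9_03": "   "}
print(num_dx(rec), alt_group(num_dx(rec)))  # 2 2
```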