> I'm hoping to get some help with an epidemiology project. I'm using a
> data set that has multiple values (medical diagnoses) in one variable:
> . So in the variable Diagnosis, there are up to five separate values
> in each entry, with hundreds of possible values. I could make a new
> variable for each diagnosis, but like I said there are hundreds of
> different ones. We are trying to associate different risk factors with
> each diagnosis. So, what would be a good strategy to separate the
> different diagnoses? Thanks

Art and Stephen Brand ("Statisticsdoc") have asked questions about your data representation. Those are important, but more critical is: what does your data mean, and what do you want to do with it?

Strictly, a "variable with multiple values" is impossible in SPSS; that's just a feature of SPSS's representation of data. However, in the real world there are frequently categorical concepts where the categories are not mutually exclusive, and one case may have several: Which magazines do you subscribe to? Or, in your case, diagnoses.

In SPSS, these are called 'multiple response' sets or groups. See commands MRSETS and MULT RESPONSE in the Command Syntax reference, plus documentation for the CTABLES module, if you have it. There are two ways to represent them.

In 'multiple dichotomies', there is one yes/no variable for each category; that's the "new variable for each diagnosis", and indeed it's clumsy when there are many categories.

In 'multiple response groups' or 'multiple category sets', there is a set of categorical variables, each giving one category applicable to the case; or, of course, 'N/A'. The group has enough variables for a reasonable estimate of the maximum number of categories per case. That's usual for diagnoses: variables like DX1 through DX5, or some such. Your data may already be in this form; "variable with multiple values" makes sense as a concept, and may simply be a mis-phrasing in SPSS terminology.
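To make the two representations concrete, here is a minimal sketch in plain Python (not SPSS syntax). It assumes, purely for illustration, that the Diagnosis variable is a single comma-separated string; the original post does not say how the values are actually stored. The diagnosis names, the case ids, and the helper names `split_diagnoses` and `to_dichotomies` are all invented for the example.

```python
MAX_DX = 5  # the question mentions up to five diagnoses per entry


def split_diagnoses(raw, sep=","):
    """Turn one delimited string into DX1..DX5, padded with None for 'N/A'."""
    codes = [part.strip() for part in raw.split(sep) if part.strip()]
    if len(codes) > MAX_DX:
        raise ValueError(f"more than {MAX_DX} diagnoses in {raw!r}")
    return codes + [None] * (MAX_DX - len(codes))


def to_dichotomies(dx_slots, categories):
    """Multiple-dichotomy form: one 0/1 flag per possible category."""
    present = {code for code in dx_slots if code is not None}
    return {cat: int(cat in present) for cat in categories}


# Invented example cases: case id -> raw Diagnosis string (assumed format).
cases = {
    1: "asthma, diabetes",
    2: "hypertension",
    3: "diabetes, asthma, copd",
}

# Multiple-category-set form: five slots per case, like DX1 through DX5.
category_set = {cid: split_diagnoses(raw) for cid, raw in cases.items()}

# Multiple-dichotomy form: one flag per category seen anywhere in the data.
all_categories = sorted(
    {code for slots in category_set.values() for code in slots if code is not None}
)
dichotomies = {
    cid: to_dichotomies(slots, all_categories)
    for cid, slots in category_set.items()
}

for cid in cases:
    print(cid, category_set[cid], dichotomies[cid])
```

With only four categories the dichotomy form is manageable, but it grows by one variable per diagnosis, which is why the category-set form is usually preferred when there are hundreds of possible values.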
As Stephen and Art have said, if your data doesn't have this form, it probably should; and you should tell us how the diagnoses are represented now, so we can help you convert.

Now, the big question: How to analyze your data? You're "trying to associate different risk factors with each diagnosis". You have two conceptual problems:

. With hundreds of possible diagnoses, many of them are probably too rare in your data to assess risk factors. The usual solutions are to drop these and assess only the more common diagnoses; or to combine related diagnosis categories into larger categories, and analyze by those categories.

. With multiple diagnoses per patient, you need a conceptual way to decide which diagnoses should be considered associated with the risk factors you're assessing. A common approach is to drop all but one of the diagnoses (the 'primary diagnosis'), and analyze by the primary diagnosis.

I could imagine associating the risk factors with *all* the diagnoses for a case, in which case you'd 'unroll' your data: create one record for each diagnosis present for the patient. That would raise difficulties, because the records for one patient would clearly not be independent. I'm getting out of my depth here; if you like this approach, maybe others can say if there's a legitimate way to analyze it.

Without a clear description of the data it is difficult to make a recommendation. But it seems that some multivariate technique could help here, e.g. clustering, factor analysis, or principal components.

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
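The data-reduction options raised in the first reply (keeping only the primary diagnosis, 'unrolling' to one record per diagnosis, and dropping rare diagnoses) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not SPSS syntax: the patient records, the `smoker` risk factor, the threshold of 2, and the idea that the first slot (DX1) holds the primary diagnosis are all invented for the example.

```python
from collections import Counter

# Invented data: one record per patient, diagnoses in five slots (DX1..DX5).
patients = [
    {"id": 1, "smoker": 1, "dx": ["asthma", "copd", None, None, None]},
    {"id": 2, "smoker": 0, "dx": ["diabetes", None, None, None, None]},
    {"id": 3, "smoker": 1, "dx": ["copd", "asthma", "gout", None, None]},
]


def primary_only(records):
    """Keep one diagnosis per patient, assuming DX1 is the primary one."""
    return [
        {"id": p["id"], "smoker": p["smoker"], "dx": p["dx"][0]}
        for p in records
        if p["dx"][0] is not None
    ]


def unroll(records):
    """One output record per diagnosis present; risk factors are repeated."""
    rows = []
    for p in records:
        for slot, code in enumerate(p["dx"], start=1):
            if code is not None:
                rows.append(
                    {"id": p["id"], "smoker": p["smoker"], "dx_slot": slot, "dx": code}
                )
    return rows


def common_diagnoses(rows, min_count):
    """Diagnoses that occur at least min_count times in the unrolled rows."""
    counts = Counter(row["dx"] for row in rows)
    return {dx for dx, n in counts.items() if n >= min_count}


primary = primary_only(patients)
long_rows = unroll(patients)
keep = common_diagnoses(long_rows, min_count=2)
analysis_rows = [row for row in long_rows if row["dx"] in keep]

print(len(long_rows), sorted(keep), len(analysis_rows))
```

Note that the unrolled rows repeat each patient's risk factors once per diagnosis, so rows sharing an `id` are not independent; that is the difficulty the reply warns about, and the sketch does nothing to resolve it.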