SPSSX Discussion

(no subject)

Ornelas, Fermin

>I'm hoping to get some help with an epidemiology project. I'm using a
>data set that has multiple values (medical diagnoses) in one variable:
>. So in the variable Diagnosis, there are up to five separate values
>in each entry, with hundreds of possible values. I could make a new
>variable for each diagnosis, but like I said there are hundreds of
>different ones. We are trying to associate different risk factors with
>each diagnosis. So, what would be a good strategy to separate the
>different diagnoses? Thanks

Art and Stephen Brand ("Statisticsdoc") have asked questions about your
data representation. Those are important, but more critical is: what
does your data mean, and what do you want to do with it?

Strictly, a "variable with multiple values" is impossible in SPSS;
that's just a feature of SPSS's representation of data. However, in the
real world there are frequently categorical concepts where the
categories are not mutually exclusive, and one case may have several:

Which magazines do you subscribe to? Or, in your case, which diagnoses
does a patient have?

In SPSS, these are called 'multiple response' sets or groups. See
commands MRSETS and MULT RESPONSE in the Command Syntax reference, plus
documentation for the CTABLES module, if you have it. There are two ways
to represent them. In 'multiple dichotomies', there is one yes/no
variable for each category; that's the "new variable for each
diagnosis", and indeed it's clumsy when there are many categories. In
'multiple response groups' or 'multiple category sets', there is a set
of categorical variables, each giving one category applicable to the
case; or, of course, 'N/A'. The group has enough variables for a
reasonable estimate of the maximum number of categories per case.

That's usual for diagnoses: variables like DX1 through DX5, or some
such.
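
To make the two representations concrete, here is a minimal sketch in Python rather than SPSS syntax. It starts from a multiple category set (five diagnosis slots per case) and derives the multiple dichotomies from it. The case records, the diagnosis labels, and the function name to_dichotomies are all made up for illustration; nothing here comes from the original poster's data.

```python
# Hypothetical example data: each case has five diagnosis slots (DX1..DX5).
# None marks an unused slot ('N/A'). The labels are invented for illustration.
cases = [
    {"id": 1, "dx": ["asthma", "diabetes", None, None, None]},
    {"id": 2, "dx": ["diabetes", None, None, None, None]},
    {"id": 3, "dx": ["asthma", "hypertension", "diabetes", None, None]},
]

def to_dichotomies(cases):
    """Turn a multiple category set into multiple dichotomies (one 0/1 flag per category)."""
    # Every category that occurs anywhere in the data, in a stable order.
    categories = sorted({d for c in cases for d in c["dx"] if d is not None})
    rows = []
    for c in cases:
        present = set(c["dx"])
        row = {"id": c["id"]}
        for cat in categories:
            row[cat] = 1 if cat in present else 0
        rows.append(row)
    return categories, rows

categories, dichotomies = to_dichotomies(cases)
for row in dichotomies:
    print(row)
```

With three categories this is manageable; with hundreds, the dichotomy form grows to hundreds of columns while the category set stays at five, which is why the latter is usual for diagnoses.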

Your data may already be in this form; "variable with multiple values"
makes sense, and may be simply a mis-phrasing, in SPSS terminology. As
Stephen and Art have said, if your data doesn't have this form, it
probably should; and you should tell us how the diagnoses are
represented now, so we can help you convert.
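
Since we don't yet know how the diagnoses are stored, the following is only a guess at one possible conversion, sketched in Python. It assumes the worst case: a single string variable holding up to five diagnoses separated by commas. The separator, the field name Diagnosis as a string, the sample record, and the helper split_diagnoses are all assumptions; adjust them to the real layout.

```python
MAX_DX = 5  # the question mentions up to five diagnoses per entry

def split_diagnoses(raw, sep=",", max_dx=MAX_DX):
    """Split one delimited diagnosis string into exactly max_dx slots, padded with None."""
    parts = [p.strip() for p in raw.split(sep)]
    parts = [p for p in parts if p]  # drop empty pieces such as trailing separators
    if len(parts) > max_dx:
        raise ValueError(f"more than {max_dx} diagnoses in {raw!r}")
    return parts + [None] * (max_dx - len(parts))

# Hypothetical record with an assumed comma-separated Diagnosis field.
record = {"id": 7, "Diagnosis": "asthma, diabetes ,hypertension"}
slots = split_diagnoses(record["Diagnosis"])

# One variable per slot: DX1 through DX5.
wide = {"id": record["id"], **{f"DX{i + 1}": dx for i, dx in enumerate(slots)}}
print(wide)
```

Raising an error when a record has more than five diagnoses is a deliberate choice in this sketch: silently dropping the extras would hide a data problem.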

Now, the big question: How to analyze your data? You're "trying to
associate different risk factors with each diagnosis". You have two
conceptual problems:

. With hundreds of possible diagnoses, many of them are probably too
rare in your data to assess risk factors. The usual solutions are to
drop these and assess only for the more common diagnoses; or to combine
related diagnosis categories into larger categories, and analyze by
those categories.

. With multiple diagnoses per patient, you need a conceptual way to
decide which diagnoses should be considered associated with the risk
factors you're assessing. A common approach is to drop all but one of
the diagnoses (the 'primary diagnosis'), and analyze by the primary
diagnosis. I could imagine associating the risk factors with *all* the
diagnoses for a case, in which case you'd 'unroll' your data: create one
record for each diagnosis present for the patient. That would raise
difficulties because you'd clearly have non-independent records. I'm
getting out of my depth here; if you like this approach, maybe others
can say if there's a legitimate way to analyze it.
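
The data handling behind both points can be sketched as follows, in Python rather than SPSS syntax (within SPSS, VARSTOCASES is the command that restructures variables into cases). The sketch 'unrolls' the data to one record per diagnosis, counts how often each diagnosis occurs, keeps only the more common ones, and separately picks a primary diagnosis. The sample cases, the threshold MIN_CASES, and the rule that the first slot is the primary diagnosis are all assumptions made for illustration. It does nothing about the non-independence of the unrolled records.

```python
from collections import Counter

# Hypothetical cases with five diagnosis slots each; None marks an unused slot.
cases = [
    {"id": 1, "dx": ["asthma", "diabetes", None, None, None]},
    {"id": 2, "dx": ["diabetes", None, None, None, None]},
    {"id": 3, "dx": ["asthma", "hypertension", "diabetes", None, None]},
]

def unroll(cases):
    """Yield one (case id, diagnosis) record per diagnosis present for the case."""
    for c in cases:
        for dx in c["dx"]:
            if dx is not None:
                yield (c["id"], dx)

long_rows = list(unroll(cases))

# How many records carry each diagnosis (assumes no diagnosis repeats within a case).
counts = Counter(dx for _, dx in long_rows)

MIN_CASES = 2  # arbitrary threshold for "common enough to analyze"
common = sorted(dx for dx, n in counts.items() if n >= MIN_CASES)

# Assumption: the first slot holds the primary diagnosis.
primary = {c["id"]: c["dx"][0] for c in cases}

print(long_rows)
print(counts)
print(common)
print(primary)
```

Combining rare diagnoses into broader categories would need a mapping from each diagnosis to its group, which only someone who knows the coding scheme can supply.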

Without a clear description of the data it is difficult to make a
recommendation. But it seems that some multivariate technique could help
here, e.g. clustering, factor analysis, or principal components.

Fermin Ornelas, Ph.D.

Management Analyst III, AZ DES

Tel: (602) 542-5639
