stepwise regression how to include all cases despite missing data

Hunna Watson
Hi all,
 
I'm running a stepwise regression to find which organizational practices on construction projects predict project cost growth. I have data for 115 projects, but some organizational practices were not applicable on some projects (in a random fashion). The missing data are therefore purposeful and not due to questionnaires being left unfilled, etc. SPSS automatically excludes cases with any missing values, or wants to substitute a value, so I end up with a regression being carried out on 10 projects, which is obviously not useful. Any suggestions for syntax to include all cases, or other suggestions to rectify this problem?
 
Thanks in advance,
Hunna Watson
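
To illustrate why excluding every case with any missing value shrinks the sample so sharply, here is a minimal sketch in Python (not SPSS syntax). The toy data and the helper name complete_cases are invented for illustration; only the rule itself (a case is dropped as soon as one predictor is blank) comes from the post above.

```python
# Hypothetical toy data: each row is a project, each column a practice rating.
# None marks a practice that was not applicable (left blank on the questionnaire).
projects = [
    [4, 3, None, 5],
    [2, None, 1, 4],
    [5, 4, 3, 2],
    [None, 2, 2, 3],
    [3, 3, 4, None],
]


def complete_cases(rows):
    """Return only the rows with no missing value (listwise deletion)."""
    return [row for row in rows if all(value is not None for value in row)]


kept = complete_cases(projects)
print(f"{len(kept)} of {len(projects)} projects survive listwise deletion")
```

Each column in this toy example is blank only once, yet four of the five projects are lost. With many predictors the same mechanism could plausibly reduce 115 projects to 10.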

Re: stepwise regression how to include all cases despite missing data

Hector Maletta
         Hunna,
         Your use of the stepwise method for regression instead of running
with all the variables at once is immaterial for your problem. What you need
is some way of dealing with projects where some organizational practice does
not apply.
         You do not give many details about the variables, but I imagine
that each organizational practice might be a dummy variable, either present
or absent. In such a case, you may posit its effect on costs not as a result
of "choosing or not choosing it when it is adequate to choose it" but as a
result of its mere presence. A project may benefit from a practice if (a)
the practice is applicable and (b) it is actually used; otherwise the
project does not benefit from that practice. The absence of a practice may
thus be a result of deliberate choice or impossibility of application, but
in either case it would result in its effect not being observed. In other
words, you may (if your particular situation affords this interpretation)
treat the "missing" cases as negative instances, as zeroes in the dummies,
and proceed with the regression.
         If this road is not conceptually adequate, you're in trouble.
         Hector
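
The recode suggested above can be sketched as follows. This is an illustration in Python, not SPSS syntax. The practice names, the toy data and the function name recode_missing_as_absent are assumptions, and the recode is only defensible if "not applicable" really can be read as "practice absent".

```python
# Hypothetical dummy-coded practices: 1 = used, 0 = not used,
# None = not applicable on that project (blank in the data file).
projects = [
    {"incentives": 1, "design_review": None, "site_audit": 0},
    {"incentives": None, "design_review": 1, "site_audit": 1},
    {"incentives": 0, "design_review": 1, "site_audit": None},
]


def recode_missing_as_absent(rows):
    """Treat 'not applicable' as 'practice absent' by replacing None with 0."""
    return [
        {name: 0 if value is None else value for name, value in row.items()}
        for row in rows
    ]


recoded = recode_missing_as_absent(projects)
n_complete = sum(all(v is not None for v in row.values()) for row in recoded)
print(f"{n_complete} of {len(recoded)} projects are now complete")
```

The function returns new rows and leaves the original data untouched, so the "not applicable" information is not lost if the zero interpretation later turns out to be inadequate.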


Re: stepwise regression how to include all cases despite missing data

Mark A Davenport
Hector,

When I worked at ACT, Inc we often treated the student's school identifier
this way, usually with great success.  Granted, we had many thousands of
cases to draw from.  Hunna only has 115?  She is going to run out of cases
pretty quickly, don't you think?


***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)

Re: stepwise regression how to include all cases despite missing data

Hector Maletta
Hunna:

Even with a scale, the "missing" responses can be reinterpreted as saying
"This practice was not effective in this case because, for one reason or
another, it was not used". This is not quite clean conceptually, but it is your
only choice unless you put up with working with 10 cases.

The problem, apparently, is in the design of the questionnaire, asking for
the effectiveness of a practice that is not in universal use among the cases
under analysis. In any case, a practice cannot have any effectiveness if it
is not used, so I insist you can treat it as having zero effectiveness when
it was not used.

On the other hand, since your variables seem to be many, and your cases seem
to be few, perhaps you should consider a more artisanal approach for
identifying effective strategies instead of your all-in-one regression.
With 87 predictors and 115 cases you don't have a chance even without a
single missing value.



Hector
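
The arithmetic behind that last remark can be checked quickly. This is a small sketch in Python: the case and predictor counts come from the thread (the 160 extra projects are mentioned in the forwarded message below), while the function name residual_df and the choice to count one intercept are assumptions. It describes the situation if all 87 predictors were entered at once.

```python
def residual_df(n_cases, n_predictors):
    """Residual degrees of freedom for a linear regression with an intercept."""
    return n_cases - n_predictors - 1


n_predictors = 87
for label, n_cases in [("current data set", 115), ("after merging", 115 + 160)]:
    ratio = n_cases / n_predictors
    df = residual_df(n_cases, n_predictors)
    print(f"{label}: {n_cases} cases, {ratio:.2f} cases per predictor, {df} residual df")
```

With 115 cases this gives about 1.32 cases per predictor and 27 residual degrees of freedom. Merging in the further 160 projects would raise that to about 3.16 cases per predictor, which is still few.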

  _____

From: Hunna Watson [mailto:[hidden email]]
Sent: 25 July 2007 11:36
To: Hector Maletta
Subject: FW: Re: stepwise regression how to include all cases despite
missing data



Thanks for your reply. I've just come on board this project in the past two
weeks and the data have already been collected. Simplifying a little, this is
essentially what happened: respondents rated how effective the use of each
strategy was for preventing cost growth in the form of work that had to be
done again on the project. So I have data on a scale and no possibility of
coding absent or present :S



Extra information:



Yes, I know all the horrible things about stepwise, but it is the only
suitable method I can think of to answer the research questions. I have just
come on board the project in the last two weeks. The research is very
exploratory and the topic hasn't been examined before. Data have been
collected on many different predictor variables (design-related sources,
subcontractor sources, site management sources, contract documentation, the
list goes on and on, up to a terrible 87 predictors). There are 115
projects, so each is a case if you like, and we want to look at this
data set first (no options there), but after that we can merge it with another
data set containing information on a further 160 projects. Some predictors
weren't relevant to some projects. For instance, some didn't use incentives.
We have ratings on scales of 1 to 5 (assessing raters' perceptions of the
contribution of that method to costs) and we are seeking to predict
costs from the predictor variables. If a method wasn't applicable, e.g. use
of a particular incentive plan, it was left blank on the questionnaire. No
logical ordering.
