Help calculating predictor variable

Marc Mackert
I am currently helping a colleague who is working on her dissertation
and need some advice on how to calculate a predictor variable for a
multiple regression model.

The research question is whether timely case review of children in
out-of-home care affects their overall outcomes in out-of-home care
(i.e., length of stay in foster care and the number of times they move
placements while in foster care). The problem we are facing is coming
up with a way to create a single "timely case review" variable, because
children require a case review at 6 months in care and every 6 months
thereafter, and could have many patterns of timely case review (e.g.,
the first was timely, the second was late, the third was timely, and
they missed their 4th; or all were timely; or none were timely; etc.).

Currently, she is considering a variable based on the number of reviews
the child required, the number completed, and the number completed
timely, basically requiring that the child be timely on all required
reviews. So, if they were timely on all of them they get a 1, and if
they are late on even one review, they receive a 0.
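For concreteness, here is a minimal Python sketch of that all-or-nothing
indicator. It assumes (hypothetically; the post doesn't say how the
records are stored) that each child's history is a list with one entry
per required review, each marked "timely", "late", or "missed", and it
counts a missed review against the child. The function name all_timely
is made up for illustration. One wrinkle the sketch makes explicit: a
child in care for less than 6 months has no required review yet, so the
indicator is left undefined (None) rather than scored 1.

```python
from typing import List, Optional

# Hypothetical layout: one entry per *required* review, in order
# (6 months, 12 months, ...), each "timely", "late", or "missed".

def all_timely(reviews: List[str]) -> Optional[int]:
    """1 if every required review was held on time, 0 otherwise.

    A missed review counts against the child just as a late one does.
    Returns None when no review has been required yet.
    """
    if not reviews:
        return None  # in care < 6 months: the criterion doesn't apply
    return int(all(r == "timely" for r in reviews))

examples = {
    "all timely": ["timely", "timely", "timely"],
    "mixed": ["timely", "late", "timely", "missed"],
    "none required": [],
}
for label, reviews in examples.items():
    print(label, all_timely(reviews))
```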

She is also considering creating some sort of continuous variable that
scales the child on the timeliness factor, for example a ratio of late
reviews to all reviews held, although there may be other ways she could
do this as well.
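A minimal Python sketch of two continuous versions, under the same
hypothetical record layout (one "timely"/"late"/"missed" entry per
required review). late_ratio follows the suggestion above: late reviews
over reviews actually held. timely_share is an alternative assumed here
for illustration; it divides by required reviews, so a missed review
lowers the score instead of silently dropping out. Both function names
are hypothetical.

```python
from typing import List, Optional

def late_ratio(reviews: List[str]) -> Optional[float]:
    """Late reviews as a share of reviews actually held (timely + late).

    Missed reviews were never held, so they are excluded from both the
    numerator and the denominator.
    """
    held = [r for r in reviews if r in ("timely", "late")]
    if not held:
        return None  # nothing held yet: ratio undefined
    return sum(r == "late" for r in held) / len(held)

def timely_share(reviews: List[str]) -> Optional[float]:
    """Timely reviews as a share of all *required* reviews."""
    if not reviews:
        return None  # nothing required yet: share undefined
    return sum(r == "timely" for r in reviews) / len(reviews)

history = ["timely", "late", "timely", "missed"]
print(late_ratio(history))    # 1 late out of 3 held
print(timely_share(history))  # 2 timely out of 4 required
```

Note how the same history scores differently: the missed 4th review is
invisible to late_ratio but counts against the child in timely_share.
Which behavior is right is a substantive call for the researcher.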

Finally, is there a better way to handle this that we have not thought
of?

Thanks in advance for any help!

Marc


Marc J. Mackert, Ph.D.
Data Management Analyst
Administrative Review Division
4045 S. Lowell Blvd.
Denver, Colorado 80236
Office: (303) 866-7137
Cell: (303) 947-3106


Re: Help calculating predictor variable

Richard Ristow
At 05:39 PM 1/12/2007, Marc Mackert wrote:

>I am currently helping a colleague who is working on her dissertation
>and need some advice on how to calculate a predictor variable for a
>multiple regression model.

To be precise, which matters here, she is trying to *define* her
predictor variable, not *calculate* it. (Calculating it is unlikely to
pose problems.)

Defining a value, conceptually, is fundamentally the job of the subject
specialist, the researcher, rather than of the methodologist, the guys
like us. The researcher knows, or defines, the meanings that are
assigned to values. That's part of defining what the study means, and
that's part of what being the researcher is about.

If we're doing our job, and having a good time, we'll learn a good deal
about what the study means; but we'll never know as much as the
researcher, and in any case the researcher's decision is final.

We can make suggestions, though, and otherwise help. Among other
things, we're likely to have a clearer sense of what can be calculated
from the data, and what can't.

In your case,

>The research question involves whether or not timely case review of
>children in out of home care affects their overall outcomes. [...] The
>problem we are facing is coming up with a way to create a single
>variable of "timely case review" as children require a case review at
>6 months in care and then every 6 months thereafter and could have
>many types of timely case review patterns - (e.g., the first was
>timely, the second was late, the third was timely, they missed their
>4th or all were timely, no reviews were timely, etc.)

Right. So far, so complex. The first thing to note, and to accept, is
that the data do not provide an unambiguously best definition of
'timely review'. The researcher, if not experienced, may expect that
all such questions DO have unambiguous answers. Disabusing researchers
of this is part of our job.

However, it sounds as if your researcher knows this already.

>She is considering using a variable that involves looking at the
>number of reviews the child required, the number completed, and the
>number completed timely - basically requiring that the child be timely
>on all required reviews. So, if they were timely on all they get a 1,
>and if they are late on even one review, they receive a 0.

Repeating, and repeating again, that the methodologist must NOT make
this decision, here are a few points the researcher may find helpful.
Off the top of my head:

. On the face of it, this is the most stringent criterion that's at all
reasonable. Possibly it's so stringent it loses the distinction she
needs. ALWAYS run frequencies on any candidate categorization of the
data. If, say, only 10% receive 'timely' reviews, the researcher should
consider whether (a) the criterion is too stringent; (b) the system is
hopelessly broken; or (c) something else is going on. Don't rely on the
methodologist to think of everything.

. The longer the child is in the system, the more reviews are required,
and the greater the opportunity for one not to be timely. The criterion
will partly be a proxy for short-term clients; it may or may not be
enough of a proxy to distort the results.
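Both checks are easy to sketch in Python. The records below are made-up
toy data for exercising the code, not results from any study, and the
layout (one "timely"/"late"/"missed" entry per required review) is an
assumption. The first step gives frequencies of the all-timely
indicator; the second crosstabs it against the number of required
reviews, which stands in here for length of stay.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical toy records, invented for illustration (NOT real data).
children: Dict[str, List[str]] = {
    "A": ["timely"],
    "B": ["timely", "timely"],
    "C": ["timely", "late"],
    "D": ["late", "timely", "timely"],
    "E": ["timely", "timely", "missed", "timely"],
    "F": ["timely"],
    "G": [],  # in care < 6 months: no review required yet
}

def all_timely(reviews: List[str]) -> int:
    """1 if every required review was timely, else 0."""
    return int(all(r == "timely" for r in reviews))

# Skip children with no required review; the indicator is undefined.
flags = {cid: all_timely(r) for cid, r in children.items() if r}

# 1. Frequencies: how many children meet the all-timely criterion?
freq = Counter(flags.values())
n = len(flags)
for value in sorted(freq):
    print(f"all_timely={value}: {freq[value]} ({100 * freq[value] / n:.0f}%)")

# 2. Crosstab against number of required reviews (proxy for length of stay).
by_count: Dict[int, List[int]] = defaultdict(list)
for cid, flag in flags.items():
    by_count[len(children[cid])].append(flag)
for k in sorted(by_count):
    vals = by_count[k]
    print(f"{k} required review(s): {sum(vals)} of {len(vals)} all timely")
```

With real data, a share of 1s that falls steadily as the number of
required reviews rises is the warning sign described above.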

>She is also considering creating some sort of continuous variable that
>scales the child on the timeliness factor, for example maybe a ratio
>of late to all reviews held, although there may be other suggestions
>on how she might do this as well.

That's certainly something else to try.

. Since she's going to run a multiple regression model, I'd worry a
little about a non-linear response to this quantity, though a study
like this rarely has a good enough model fit to distinguish non-linear
effects.

. The first measure is the same as this, but dichotomized with a
cutpoint of 100% timely. That suggests dichotomizing at other levels.
One criterion for a cutpoint is that it puts a reasonable number of
cases in each group; again, always check by frequencies. (In fact, run
descriptives on the continuous variable, including the 10th, 20th,
..., 90th percentiles.) But a cutpoint must NEVER be accepted on that
criterion alone; it must also be arguably reasonable in the study
circumstances.

. Frequencies again (or good old crosstabs): what percentage of first,
second, ... reviews are timely? If they differ a lot, is the
continuous variable partly a proxy for length of time in the program?
And if they do differ a lot, what does the researcher think that shows
about the program?
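A Python sketch of the three checks in the bullets above: decile cut
points of the continuous score, a dichotomy at a chosen cutpoint, and
percent timely by review position. The toy records are invented for
illustration only, the 0.75 cutpoint is an arbitrary placeholder rather
than a recommendation, and the "timely"/"late"/"missed" layout is
assumed. statistics.quantiles requires Python 3.8 or later.

```python
import statistics
from collections import defaultdict
from typing import Dict, List

# Hypothetical toy records, invented for illustration (NOT real data).
children: Dict[str, List[str]] = {
    "A": ["timely"],
    "B": ["timely", "timely"],
    "C": ["timely", "late"],
    "D": ["late", "timely", "timely"],
    "E": ["timely", "timely", "missed", "timely"],
    "F": ["timely"],
}

def timely_share(reviews: List[str]) -> float:
    """Share of required reviews that were held on time."""
    return sum(r == "timely" for r in reviews) / len(reviews)

# Continuous score for every child with at least one required review.
shares = [timely_share(r) for r in children.values() if r]

# Descriptives: the 10th, 20th, ..., 90th percentile cut points.
deciles = statistics.quantiles(shares, n=10)
print("deciles:", [round(d, 2) for d in deciles])

# Dichotomize at a cutpoint; 0.75 is an arbitrary placeholder, and the
# researcher must be able to defend whatever value is actually used.
CUTPOINT = 0.75
at_or_above = sum(s >= CUTPOINT for s in shares)
print(f"{at_or_above} of {len(shares)} children at or above {CUTPOINT:.0%} timely")

# Percent timely by review position (1st, 2nd, 3rd, ... required review).
by_position: Dict[int, List[bool]] = defaultdict(list)
for reviews in children.values():
    for position, status in enumerate(reviews, start=1):
        by_position[position].append(status == "timely")
for position in sorted(by_position):
    flags = by_position[position]
    pct = 100 * sum(flags) / len(flags)
    print(f"review {position}: {pct:.0f}% timely (n={len(flags)})")
```

Note that n shrinks at later positions, since only longer-stay children
reach them; that is the same length-of-stay confound in another form.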

>Finally, is there a better way to handle this that we have not
>thought of and someone could suggest?

There's no unambiguously BETTER way. There are always OTHER ways.
Again, thinking very quickly,

. About as unbiased a measure as I can think of, though it throws away
information: score timely/untimely for the first required review only.
It *may* be that this is the most important review anyway; that's for
the researcher to judge.

. I might consider trichotomizing. (Here, better methodologists than I
may have contrary opinions.) That is, on some scale such as the percent
of reviews that are timely (her continuous variable), establish
cutpoints that categorize into 'pretty bad', 'middling OK', and 'good'.
Even trying that, and looking at what cutpoints you come up with and
how the categories are distributed in the population, will tell you
something.
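Both alternatives are simple to compute once the researcher has made
the substantive choices. In this Python sketch the function names and
the record layout (one "timely"/"late"/"missed" entry per required
review) are hypothetical, and the two cut points passed to trichotomize
are placeholders that the researcher, not the code, has to choose.

```python
from typing import List, Optional

def first_review_timely(reviews: List[str]) -> Optional[int]:
    """1/0 for the first required review only; None if none required yet."""
    if not reviews:
        return None
    return int(reviews[0] == "timely")

def trichotomize(share: float, low_cut: float, high_cut: float) -> str:
    """Map a percent-timely score (0..1) into three ordered categories.

    low_cut and high_cut are placeholders; choosing them is the
    researcher's job, after looking at the distribution.
    """
    if low_cut > high_cut:
        raise ValueError("low_cut must not exceed high_cut")
    if share < low_cut:
        return "pretty bad"
    if share < high_cut:
        return "middling OK"
    return "good"

history = ["late", "timely", "timely", "timely"]
share = sum(r == "timely" for r in history) / len(history)
print(first_review_timely(history))   # the first review was late
print(trichotomize(share, 0.5, 0.9))  # 0.75 falls in the middle band
```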

>Thanks in advance for any help!

You're welcome. I hope you enjoy our daily special on un-canned worms.
Richard


>Marc
>
>
>Marc J. Mackert, Ph.D.
>Data Management Analyst
>Administrative Review Division
>4045 S. Lowell Blvd.
>Denver, Colorado 80236
>Office: (303) 866-7137
>Cell: (303) 947-3106
>
>This e-mail and any attachments thereto, is intended only for use by
>the addressees named herein and may contain legally privileged and/or
>confidential information.  If you are not the intended recipient of
>this
>e-mail, you are hereby notified that any dissemination, distribution
>or
>copying of this e-mail and any attachments thereto, is strictly
>prohibited.  If you have received this e-mail in error, please
>immediately notify me at 303.866.7137 and permanently delete the
>original and any copy of any e-mail and any printout thereof.
>
>
>
>--
>No virus found in this incoming message.
>Checked by AVG Free Edition.
>Version: 7.5.432 / Virus Database: 268.16.10/624 - Release Date:
>1/12/2007 2:04 PM