I am currently helping a colleague who is working on her dissertation and need some advice on how to calculate a predictor variable for a multiple regression model.

The research question involves whether or not timely case review of children in out-of-home care affects their overall outcomes in out-of-home care (i.e., length of stay in foster care, number of times they move placements while in foster care).

The problem we are facing is coming up with a way to create a single variable of "timely case review", as children require a case review at 6 months in care and then every 6 months thereafter, and could have many types of timely case review patterns (e.g., the first was timely, the second was late, the third was timely, and they missed their fourth; or all were timely; or no reviews were timely; etc.).

Currently, she is considering using a variable that involves looking at the number of reviews the child required, the number completed, and the number completed timely, basically requiring that the child be timely on all required reviews. So, if they were timely on all, they get a 1, and if they are late on even one review, they receive a 0.

She is also considering creating some sort of continuous variable that scales the child on the timeliness factor, for example a ratio of late to all reviews held, although there may be other suggestions on how she might do this as well.

Finally, is there a better way to handle this that we have not thought of and that someone could suggest?

Thanks in advance for any help!

Marc

Marc J. Mackert, Ph.D.
Data Management Analyst
Administrative Review Division
4045 S. Lowell Blvd.
Denver, Colorado 80236
Office: (303) 866-7137
Cell: (303) 947-3106

At 05:39 PM 1/12/2007, Marc Mackert wrote:
>I am currently helping a colleague who is working on her dissertation
>and need some advice on how to calculate a predictor variable for a
>multiple regression model.

To be precise, which matters here, she is trying to *define* her predictor variable, not *calculate* it. (Calculating it is unlikely to pose problems.)

Defining a value, conceptually, is fundamentally the job of the subject specialist, the researcher, rather than of the methodologist, the guys like us. The researcher knows, or defines, the meanings that are assigned to values. That's part of defining what the study means, and that's part of what being the researcher is about. If we're doing our job, and having a good time, we'll learn a good deal about what the study means; but we'll never know as much as the researcher, and in any case the researcher's decision is final.

We can make suggestions, though, and otherwise help. Among other things, we're likely to have a clearer sense of what can be calculated from the data, and what can't. In your case,

>The research question involves whether or not timely case review of
>children in out of home care affects their overall outcomes. [...] The
>problem we are facing is coming up with a way to create a single
>variable of "timely case review" as children require a case review at
>6 months in care and then every 6 months thereafter and could have
>many types of timely case review patterns - (e.g., the first was
>timely, the second was late, the third was timely, they missed their
>4th or all were timely, no reviews were timely, etc..)

Right. So far, so complex.

The first thing to note, and to accept, is that the data do not provide an unambiguously best definition of 'timely review'. The researcher, if not experienced, may expect that all such questions DO have unambiguous answers. Disabusing researchers of this is part of our job. However, it sounds as if your researcher knows this already.

>She is considering using a variable that involves looking at the
>number of reviews the child required, the number completed, and the
>number completed timely - basically requiring that the child be timely
>on all required reviews. So, if they were timely on all they get a 1,
>and if they are late on even one review, they receive a 0.

Repeating, repeating, that the methodologist must NOT make this decision, there are a few points the researcher may find helpful. Quickly, I think of:

. On the face of it, this is the most stringent criterion that's at all reasonable. Possibly it's so stringent it loses the distinction she needs. ALWAYS run frequencies on any candidate categorization of the data. If, say, only 10% receive 'timely' reviews, the researcher should consider whether (a) the criterion is too stringent; (b) the system is hopelessly broken; (c) some other possibilities; don't rely on the methodologist to think of everything.

. The longer the child is in the system, the more reviews are required, and the greater the opportunity for one not to be timely. The criterion will partly be a proxy for short-term clients; it may, or may not, be so much so as to distort the results.

>She is also considering creating some sort of continuous variable that
>scales the child on the timeliness factor, for example maybe a ratio
>of late to all reviews held, although there may be other suggestions
>on how she might do this as well.

That's certainly something else to try.

. Since she's going to run a multiple regression model, I'd worry a little about non-linear response to this quantity, though a study like this rarely has a good enough model fit to distinguish non-linear effects.

. The first measure is the same as this, but dichotomized with a cutpoint of 100%. That suggests dichotomizing at other levels. One criterion for a cutpoint is that it puts a reasonable number of cases in each group; again, always check by frequencies.
(In fact, run descriptives on the continuous variable, including 10%, 20%, ... 90% percentile points.) But a cutpoint must NEVER be accepted by such a criterion unless it can be argued it's reasonable in the study circumstances.

. Frequencies again (or good old crosstabs): what is the percent of first, second, ... reviews that are timely? If they differ a lot, is the continuous variable partly a proxy for length of time in program? If they differ a lot, what does the researcher think that shows about the program?

>Finally, is there is a better way to handle this that we have not
>thought of and someone could suggest?

There's no unambiguously BETTER way. There are always OTHER ways. Again, thinking very quickly,

. About as unbiased as I can think of, but throwing away information: score timely/untimely for the first required review only. It *may* be that this is the most important review anyway; that's for the researcher to judge.

. I might consider trichotomizing. (Here, better methodologists than I may have contrary opinions.) That is, on some scale such as the percent of reviews that are timely (her continuous variable), establish cutpoints that categorize into 'pretty bad', 'middling OK', and 'good'. Even trying that, and looking at what cutpoints you come up with and how the categories are distributed in the population, will tell you something.

>Thanks in advance for any help!

You're welcome. I hope you enjoy our daily special on un-canned worms.

Richard
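
The candidate definitions discussed in this thread can be put side by side and checked with frequencies. The following is a minimal Python sketch under stated assumptions: the data are invented, a missed review is counted as untimely, the continuous measure is taken as the share of required reviews that were timely, and the trichotomy cutpoints (0.5 and 0.9) are placeholders that would need a substantive justification from the researcher. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical data: each child's required reviews in order.
# True = timely, False = late or missed (an assumption of this sketch).
histories = {
    "c1": [True, True],
    "c2": [True, False, True, False],
    "c3": [False, False, False],
    "c4": [True, True, True, True, True, False],
    "c5": [True],
    "c6": [False, True, True],
}


def all_timely(history):
    """Most stringent criterion: 1 only if every required review was timely."""
    return int(all(history))


def share_timely(history):
    """Continuous measure: share of required reviews that were timely."""
    return sum(history) / len(history)


def first_timely(history):
    """Score the first required review only."""
    return int(history[0])


def trichotomize(share, low=0.5, high=0.9):
    """Three categories from the continuous measure; cutpoints are placeholders."""
    if share < low:
        return "pretty bad"
    if share < high:
        return "middling OK"
    return "good"


def timely_rate_by_position(all_histories):
    """Share of first, second, ... reviews that were timely, among children due one."""
    longest = max(len(h) for h in all_histories.values())
    rates = []
    for i in range(longest):
        outcomes = [h[i] for h in all_histories.values() if len(h) > i]
        rates.append(sum(outcomes) / len(outcomes))
    return rates


# Frequencies for each candidate categorization.
freq_all = Counter(all_timely(h) for h in histories.values())
freq_first = Counter(first_timely(h) for h in histories.values())
freq_tri = Counter(trichotomize(share_timely(h)) for h in histories.values())
position_rates = timely_rate_by_position(histories)

print("all timely:", dict(freq_all))
print("first review timely:", dict(freq_first))
print("trichotomy:", dict(freq_tri))
print("timely rate by review number:", [round(r, 2) for r in position_rates])
```

In this toy data, both children scored 1 on the all-timely criterion have the shortest histories (one and two reviews), which is the length-of-stay proxy problem in miniature. Whether that happens in the real data is exactly what the frequencies are meant to reveal.
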