A lot of helpful feedback from a lot of
people - thanks everyone. I'm reading Rosenbaum & Rubin's 1983
paper on propensity scores right now. My impression had been that
these would be used for post-hoc matching of participants and controls
within subgroups by propensity score. Is that what you're suggesting,
or is there a way to use propensity scores to "correct" the assignment
of units to participant or control status up front?
By the way, my outcome variable will be measured energy savings (electricity plus natural gas) based on a "treatment" of two years' duration consisting of feedback regarding energy performance relative to a benchmark pool of buildings and one-on-one consultation regarding efficiency measures. Owners' investment in these measures as well as operational changes are intermediate outcomes but are not a main focus.

From: "Poes, Matthew Joseph" <[hidden email]>
To: 'Martha Hewett' <[hidden email]>, "'[hidden email]'" <[hidden email]>
Date: 02/23/2012 11:28 AM
Subject: RE: guidance on assigning cases to participant and control groups

Hi Martha,

I believe that if you have variables that can identify groups of owner types, you need to use a stratified random sampling method. The stratifications are the owner types. This helps ensure that there is a random assignment within each owner type, but that owner types are equally distributed between treatment and control.

The next consideration is that, especially with smaller samples, this still may lead to a bias (it may not be feasible to perfectly balance without creating a researcher-biased sample assignment). As a result, you can move to a propensity score matching approach (after the stratified assignment), which indicates the likelihood of assignment to the treatment group (regardless of actual assignment) based on these same descriptive variables. This way you can get a better unbiased causal effect estimate.

Since I do see you noting that you feel that there aren’t enough people within these strata, I would still say, shoot for the above method as the gold standard approach. However, you will need to reduce your stratifications to a small number. If true stratification isn’t possible, then simply go with random assignment and don’t worry about it. By following it up with the propensity score matching, you can control for the bias.
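The stratified assignment described above (random assignment of owners within each owner type) could be sketched roughly as follows. This is only an illustration in Python rather than SPSS syntax, and it assumes each owner has already been given a single stratum label; the function name `assign_within_strata`, the owner IDs, and the owner types are all hypothetical and not from the thread.

```python
import random
from collections import defaultdict


def assign_within_strata(owners, stratum_of, seed=2012):
    """Randomly split owners into treatment/control inside each stratum.

    owners     : list of owner IDs (assignment is by owner, not by property)
    stratum_of : dict mapping owner ID -> stratum label (e.g. owner type)
    Returns a dict mapping owner ID -> "treatment" or "control".
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced

    # Group owners by their stratum label.
    by_stratum = defaultdict(list)
    for owner in owners:
        by_stratum[stratum_of[owner]].append(owner)

    assignment = {}
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)
        n_treat = len(members) // 2
        # For an odd-sized stratum, a coin flip decides which arm gets the extra owner.
        if len(members) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1
        for i, owner in enumerate(members):
            assignment[owner] = "treatment" if i < n_treat else "control"
    return assignment


# Hypothetical data: seven owners in two made-up owner types.
owners = ["o1", "o2", "o3", "o4", "o5", "o6", "o7"]
stratum_of = {
    "o1": "large", "o2": "large", "o3": "large",
    "o4": "small", "o5": "small", "o6": "small", "o7": "small",
}
assignment = assign_within_strata(owners, stratum_of)
for owner in owners:
    print(owner, stratum_of[owner], assignment[owner])
```

Within every stratum the two arms differ in size by at most one owner, which is the balance property the stratification is meant to buy. All of an owner's properties would then inherit that owner's arm.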
If you feel that you simply have a sample that is in no way homogeneous, that it’s simply a smattering of unique cases, then maybe a statistical analysis isn’t going to be feasible (what would the interpretation of the statistics really tell you?). If you want to identify the effect of different strata, you need to have at least 10 people in each stratum to do anything with them; otherwise you simply need to ignore them and accept these limitations.

It also seems like your subgroups or strata are not well defined. That’s a problem. You either need to find a way to define these strata, or potentially accept that they aren’t really representative of meaningful subgroups. If you want to use cluster analysis to define your strata, I would recommend considering very carefully how you will approach this. First, by the nature of cluster analysis, these groupings will be totally sample dependent. Is this OK? Once the strata (clusters) are defined, will you be able to apply a theoretical rationale for their grouping? Is the size of each subgroup meaningfully large? Can you divide your sample randomly in half and compare the clustering between the two halves in order to ensure the clusters are robust? Would they hold up in another sample from the same population you are making estimates about?

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: mpoes@...

From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Martha Hewett
Sent: Thursday, February 23, 2012 11:00 AM
To: [hidden email]
Subject: Re: guidance on assigning cases to participant and control groups

I'm concerned that with such a small sample, and a significant variation in types of owners and properties, random assignment might result in gross mismatches. For example, maybe I have a large owner who has 10 high-rise subsidized housing properties, and another owner who has just one 12-unit three-story walk-up property.
And so on. If I had a large sample, these sorts of variations would even out, but with a small sample I'm afraid they won't. I could end up with most of the high-rises in one of the two groups, most of the large (multiple-property) owners in one of the two groups, most of the older buildings in one of the two groups, etc. What I thought might make sense is to divide the recruits into a small number of similar subgroups, and then do random assignment to participant and control status within subgroups. In other words, sort of a stratified random sample, except that I don't necessarily want to use just one or two dimensions to define the strata, and I don't have enough cases to use lots of dimensions to define the strata. I thought about using cluster analysis to define appropriate subgroups before randomly assigning to participant and control status within each subgroup.
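The worry above, that one group could end up with most of the high-rises or most of the older buildings, can be checked after any candidate assignment by comparing the groups on each covariate. One common summary is the standardized mean difference. The Python sketch below is only an illustration: the function name and the owner-level numbers are made up, and it assumes property-level factors have already been summarized to one value per owner.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in group means divided by a pooled standard deviation.

    Uses sqrt((var_t + var_c) / 2) with sample variances, so each group
    needs at least two values.
    """
    diff = mean(treated) - mean(control)
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # No spread in either group: identical means give 0, otherwise unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


# Hypothetical owner-level covariate: share of each owner's properties
# that are high-rises, listed separately for the two groups.
highrise_share_participants = [1.0, 0.8, 0.0, 0.5]
highrise_share_controls = [0.0, 0.2, 0.0, 0.5]

smd = standardized_mean_difference(
    highrise_share_participants, highrise_share_controls
)
print(f"standardized mean difference: {smd:.2f}")
```

Values near zero suggest the two groups look alike on that covariate. An often-quoted rule of thumb treats absolute values above roughly 0.1 as worth a closer look, though with samples this small some imbalance is to be expected by chance.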
From: Gene Maguin <emaguin@...>
To: [hidden email]
Date: 02/23/2012 10:47 AM
Subject: Re: guidance on assigning cases to participant and control groups
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Martha,

Maybe I’m not understanding something, but why have you rejected random assignment of owners to condition?

Gene Maguin

From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Martha Hewett
Sent: Thursday, February 23, 2012 11:14 AM
To: [hidden email]
Subject: guidance on assigning cases to participant and control groups

I'm involved with a project that is recruiting 100 to 200 owners of multifamily properties, with 1 to 10 properties each (average 5), for an energy efficiency project. Because the project's effectiveness depends in significant part on influencing the owners, we need to assign all of any given owner's properties to either participant or control status. Beyond the owners, however, there are many property-specific factors that may affect savings, such as the age of the buildings, number of units, type of heating equipment, etc. Obviously there will not be enough owners or properties to match the participant and control groups on all of these dimensions.

So my question is, when you have a large number of more or less independent (but possibly somewhat correlated) dimensions that may affect outcomes relative to the size of your sample, what is the best way to assign cases to participant or control status? What additional considerations are introduced by the fact that I have to assign participant/control status by owner but need to consider other factors by property? If someone can point me to a suitable methodology, I'm sure I can figure out how to apply it. And yes, I agree, wouldn't it be great if n could be bigger?