Martha,

Maybe I’m not understanding something, but why have you rejected random assignment of owners to condition?

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Martha Hewett

I'm involved with a project that is recruiting 100 to 200 owners of multifamily properties, with 1 to 10 properties each (average 5), for an energy efficiency project. Because the project's effectiveness depends in significant part on influencing the owners, we need to assign all of any given owner's properties to either participant or control status. Beyond the owners, however, there are many property-specific factors that may affect savings, such as the age of the buildings, number of units, type of heating equipment, etc. Obviously there will not be enough owners or properties to match the participant and control groups on all of these dimensions.
I'm concerned that with such a small sample, and significant variation in the types of owners and properties, random assignment might result in gross mismatches. For example, maybe I have a large owner who has 10 high-rise subsidized housing properties, and another owner who has just one 12-unit, three-story walk-up property. And so on. If I had a large sample, these sorts of variations would even out, but with a small sample I'm afraid they won't. I could end up with most of the high-rises in one of the two groups, most of the large (multiple-property) owners in one of the two groups, most of the older buildings in one of the two groups, etc.

What I thought might make sense is to divide the recruits into a small number of similar subgroups, and then do random assignment to participant and control status within subgroups. In other words, sort of a stratified random sample, except that I don't necessarily want to use just one or two dimensions to define the strata, and I don't have enough cases to use lots of dimensions to define the strata. I thought about using cluster analysis to define appropriate subgroups before randomly assigning within subgroup to participant and control.
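The within-subgroup assignment described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the owner ids, property ids, and subgroup labels are made up, and the function name `assign_within_subgroups` is hypothetical. How the subgroups are formed (cluster analysis or otherwise) is left outside the sketch. The point it shows is that randomization happens at the owner level, so every property inherits its owner's status.

```python
import random
from collections import defaultdict

def assign_within_subgroups(owner_subgroup, seed=2012):
    """Randomly split the owners in each subgroup between the two arms.

    owner_subgroup maps an owner id to a subgroup label. How the labels
    are produced (cluster analysis, judgment, ...) is outside this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    by_subgroup = defaultdict(list)
    for owner, subgroup in sorted(owner_subgroup.items()):
        by_subgroup[subgroup].append(owner)

    status = {}
    for subgroup in sorted(by_subgroup):
        owners = by_subgroup[subgroup]
        rng.shuffle(owners)
        half = len(owners) // 2
        # With an odd count, a coin flip decides which arm gets the extra owner.
        if len(owners) % 2 and rng.random() < 0.5:
            half += 1
        for i, owner in enumerate(owners):
            status[owner] = "participant" if i < half else "control"
    return status

# Hypothetical data: (property id, owner id) pairs and made-up subgroup labels.
properties = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "C"),
              ("p5", "C"), ("p6", "D"), ("p7", "E")]
owner_subgroup = {"A": "large", "B": "small", "C": "large",
                  "D": "small", "E": "small"}

owner_status = assign_within_subgroups(owner_subgroup)
# Every property inherits its owner's status, so no owner is split.
property_status = {prop: owner_status[owner] for prop, owner in properties}
```

Recording the seed alongside the assignment makes it possible to show later that the split was random rather than hand-picked.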
From: Gene Maguin <[hidden email]>
To: [hidden email]
Date: 02/23/2012 10:47 AM
Subject: Re: guidance on assigning cases to participant and control groups
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Martha,

Maybe I’m not understanding something but why have you rejected random assignment of owners to condition?

Gene Maguin

From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Martha Hewett
Sent: Thursday, February 23, 2012 11:14 AM
To: [hidden email]
Subject: guidance on assigning cases to participant and control groups

I'm involved with a project that is recruiting 100 to 200 owners of multifamily properties, with 1 to 10 properties each (average 5), for an energy efficiency project. Because the project's effectiveness depends in significant part on influencing the owners, we need to assign all of any given owner's properties to either participant or control status. Beyond the owners, however, there are many property-specific factors that may affect savings, such as the age of the buildings, number of units, type of heating equipment, etc. Obviously there will not be enough owners or properties to match the participant and control groups on all of these dimensions.

So my question is, when you have a large number of more or less independent (but possibly somewhat correlated) dimensions that may affect outcomes relative to the size of your sample, what is the best way to assign cases to participant or control status? What additional considerations are introduced by the fact that I have to assign participant/control status by owner but need to consider other factors by property?

If someone can point me to a suitable methodology, I'm sure I can figure out how to apply it.

And yes, I agree, wouldn't it be great if n could be bigger?
"Propensity scoring" is the general topic for matching on
multiple variables that are expected to relate to an outcome.

I have no idea what it is that you will be measuring for outcome, but I believe that you need to design your eventual analyses, and start imagining what you will be able to write as conclusions, before you can make effective use of propensity matching.

Doing a cluster analysis and matching by cluster sounds like a way to *grope* your way forward. That could be what you have, if this is exploratory work.

--
Rich Ulrich

Date: Thu, 23 Feb 2012 10:59:34 -0600
From: [hidden email]
Subject: Re: guidance on assigning cases to participant and control groups
To: [hidden email]

[snip]
In reply to this post by Martha Hewett
It sounds like you'll be using some kind of multilevel model to analyze the data (owners at Level 2, dwellings at Level 1). In that context, 100-200 level 2 units is a lot more than one often sees, it seems to me. So I'm a bit surprised to hear you say it's a small sample.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
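One rough way to see why the same study can look both large and small is the standard design-effect approximation for clustered data, 1 + (m - 1) * ICC, where m is the number of properties per owner and ICC is the intraclass correlation of the outcome within owners. The sketch below is arithmetic only. The ICC values are purely hypothetical (nothing in this thread reports one), and the formula assumes equal cluster sizes, whereas the project has 1 to 10 properties per owner, so treat the numbers as a loose guide.

```python
def effective_sample_size(n_owners, props_per_owner, icc):
    """Approximate number of independent properties under clustering.

    Uses the equal-cluster-size design effect 1 + (m - 1) * ICC.
    """
    n_properties = n_owners * props_per_owner
    design_effect = 1 + (props_per_owner - 1) * icc
    return n_properties / design_effect

for n_owners in (100, 200):          # range given in the original post
    for icc in (0.05, 0.3, 0.8):     # hypothetical ICC values
        ess = effective_sample_size(n_owners, 5, icc)
        print(f"{n_owners} owners, ICC={icc}: about {ess:.0f} effective properties")
```

When properties under one owner behave alike (high ICC), the effective sample size shrinks toward the number of owners, which is the unit actually being randomized.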
I wonder if a better, more acceptable strategy than simple random assignment might be to stratify the population on one, two, or three variables (but not too many), all of which have some correlation with your DV or DVs, and then randomly assign within strata. You see this sort of scheme in the literature on school-based interventions, where a plausible set of community covariates is used to score the areas served by different schools; the schools are then grouped into strata, and random assignment is made within the strata.
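A minimal sketch of the score, group, and randomize scheme described above, applied to owners instead of schools. Everything specific here is an assumption made for illustration: the covariate names, the equal weighting of standardized covariates in the composite score, the block size of four, and the function name `stratify_and_assign`. In practice the covariates should be ones believed to correlate with the outcome.

```python
import random
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of numbers; a constant column becomes all zeros."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]

def stratify_and_assign(owners, covariates, block_size=4, seed=2012):
    """Rank owners on a composite score, cut the ranking into strata,
    and randomize within each stratum.

    owners: dict owner_id -> dict of owner-level covariate values.
    Returns dict owner_id -> (stratum index, status).
    """
    rng = random.Random(seed)
    ids = sorted(owners)
    # Equal-weight sum of standardized covariates (an assumption of this sketch).
    columns = [zscores([owners[o][c] for o in ids]) for c in covariates]
    score = {o: sum(col[i] for col in columns) for i, o in enumerate(ids)}
    ranked = sorted(ids, key=lambda o: score[o])

    result = {}
    for start in range(0, len(ranked), block_size):
        block = ranked[start:start + block_size]
        rng.shuffle(block)
        half = len(block) // 2
        # An odd-sized final block gets its extra owner placed by coin flip.
        if len(block) % 2 and rng.random() < 0.5:
            half += 1
        for i, owner in enumerate(block):
            status = "participant" if i < half else "control"
            result[owner] = (start // block_size, status)
    return result

# Hypothetical owner-level covariates, aggregated from each owner's properties.
owners = {
    "A": {"mean_building_age": 60, "n_properties": 10},
    "B": {"mean_building_age": 15, "n_properties": 1},
    "C": {"mean_building_age": 45, "n_properties": 6},
    "D": {"mean_building_age": 30, "n_properties": 3},
    "E": {"mean_building_age": 55, "n_properties": 8},
    "F": {"mean_building_age": 20, "n_properties": 2},
    "G": {"mean_building_age": 40, "n_properties": 5},
    "H": {"mean_building_age": 25, "n_properties": 4},
}
assignment = stratify_and_assign(owners, ["mean_building_age", "n_properties"])
```

Because each stratum contributes equally to both arms, the two groups cannot differ much on the composite score, whatever the random draw.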
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
Sent: Thursday, February 23, 2012 1:34 PM
To: [hidden email]
Subject: Re: guidance on assigning cases to participant and control groups

[snip]

> Martha Hewett
> Director of Research | 612.335.5865
> Center for Energy and Environment
> 212 Third Avenue North, Suite 560 | Minneapolis, MN 55401
> (cell) 612.839.2358 | (fax) 612.335.5888 | www.mncee.org

View this message in context: http://spssx-discussion.1045642.n5.nabble.com/guidance-on-assigning-cases-to-participant-and-control-groups-tp5508554p5508943.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
In reply to this post by Rich Ulrich
See my recent reply to Matthew Poes. The
outcome variable is total energy savings. For each property we will
have at least two years of pre-treatment energy use data as well as data
during and after the treatment period. The pre-treatment data is
used to create a model of energy use as a function of weather and vacancy
rate. Post-treatment energy use is compared to the model to assess
the change in usage for each property. The question then is whether
the participant properties saved more than the control properties, and
how much more. An average savings will be sufficient information
for a go-no-go decision on a full scale program by the entities that are
funding this pilot project.
The savings analysis will probably be normalized per sq. ft. or per unit (apartment), so minor differences in building size aren't a problem. However, some buildings may have more potential for savings up front: they're older and therefore less efficient, or they have central heating systems for which more cost-effective retrofit strategies are available, etc. So we need to be able to control for that in the analysis.

The reason the owners come into it, and not just the properties, is threefold. They tend to take operational or retrofit strategies they find to be successful in one building and apply them to others (so we don't want an individual owner's properties divided between participant and control). They tend to have different incentives (depending on whether the housing is subsidized or not). And they have different capabilities (capital, staff, etc.) to execute suggested measures.

Finally, I'd like to have all of the pre-treatment energy use data in hand and analyzed prior to assignment of the participant and control groups, since pre-treatment energy intensity is an important factor in savings potential, but the project timeline isn't going to allow that to happen. However, energy intensity is correlated to some degree with the other characteristics data (age of building, number of stories, etc.) that we will have at the time of the assignment.
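The savings calculation described in this post might look something like the sketch below for a single property: fit a baseline model on pre-treatment data, predict what post-treatment use would have been, and take the difference. All numbers are invented, and several details are assumptions: weather is represented by monthly heating degree-days, the model is a plain linear regression, vacancy is in percent, and the floor area is hypothetical. The toy pre-treatment data are exactly linear, which real meter data would not be.

```python
def fit_ols(rows, y):
    """Least-squares coefficients for y ~ 1 + x1 + x2 via the normal equations."""
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    """Baseline energy use for one period given (hdd, vacancy_pct)."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Hypothetical pre-treatment months: (heating degree-days, vacancy %) and energy use.
pre_rows = [(1000, 5), (800, 10), (400, 0), (100, 5), (0, 10), (600, 2)]
pre_energy = [2085, 1670, 900, 285, 70, 1294]

# Hypothetical post-treatment months with metered energy use.
post_rows = [(900, 5), (300, 10)]
post_energy = [1700, 600]

coefs = fit_ols(pre_rows, pre_energy)
# Savings = what the baseline model predicts minus what was actually used.
total_savings = sum(predict(coefs, r) - actual
                    for r, actual in zip(post_rows, post_energy))
floor_area_sqft = 10_000  # hypothetical
savings_per_sqft = total_savings / floor_area_sqft
```

Repeating this per property and then comparing mean normalized savings between participant and control properties gives the group difference the funders want. Control properties matter because they absorb changes the weather and vacancy model does not capture.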
From: Rich Ulrich <[hidden email]>
To: <[hidden email]>, SPSS list <[hidden email]>
Date: 02/23/2012 12:27 PM
Subject: RE: guidance on assigning cases to participant and control groups

[snip]
That's interesting. From your description, it seems that how you will measure "total energy savings" as a single score for a single landlord is a tricky question, and one yet to be determined. Standardizing for square feet is problematic when you seek "total savings".

I never had data needing a propensity score, so I never used them, but they have been used before randomization. But if you won't have your "pre" data well in hand before randomization, that could be irrelevant.

Since "savings" is the dimension of outcome, the potential for savings is what you might use to stratify your sampling. I suspect that 10 or 20% of the owners will contain 90% of the complications: each may have several properties that have diverse characteristics. So: sample the easy ones by something like "total value" and "age", and split up the complicated ones, perhaps after ranking them subjectively. Spend your worry-time worrying about the tough ones.

--
Rich Ulrich

Date: Thu, 23 Feb 2012 15:08:16 -0600
From: [hidden email]
Subject: Re: guidance on assigning cases to participant and control groups
To: [hidden email]

[snip]