guidance on assigning cases to participant and control groups

Martha Hewett
I'm involved with a project that is recruiting 100 to 200 owners of multifamily properties, with 1 to 10 properties each (average 5), for an energy efficiency project.  Because the project's effectiveness depends in significant part on influencing the owners, we need to assign all of any given owner's properties to either participant or control status.  Beyond the owners, however, there are many property-specific factors that may affect savings, such as the age of the buildings, number of units, type of heating equipment, etc.  Obviously there will not be enough owners or properties to match the participant and control groups on all of these dimensions.  

So my question is, when you have a large number of more or less independent (but possibly somewhat correlated) dimensions that may affect outcomes relative to the size of your sample, what is the best way to assign cases to participant or control status?  What additional considerations are introduced by the fact that I have to assign participant/control status by owner but need to consider other factors by property?

If someone can point me to a suitable methodology, I'm sure I can figure out how to apply it.

And yes, I agree, wouldn't it be great if n could be bigger?
Martha Hewett
Director of Research | 612.335.5865
Center for Energy and Environment
212 Third Avenue North, Suite 560 | Minneapolis, MN 55401
(cell) 612.839.2358 | (fax) 612.335.5888 | www.mncee.org

Re: guidance on assigning cases to participant and control groups

Maguin, Eugene

Martha,

Maybe I'm not understanding something, but why have you rejected random assignment of owners to condition?

Gene Maguin

Re: guidance on assigning cases to participant and control groups

Martha Hewett
I'm concerned that with such a small sample, and significant variation in the types of owners and properties, random assignment might result in gross mismatches. For example, maybe I have a large owner who has 10 high-rise subsidized housing properties, and another owner who has just one 12-unit, three-story walk-up property. And so on. If I had a large sample, these sorts of variations would even out, but with a small sample I'm afraid they won't. I could end up with most of the high-rises in one of the two groups, most of the large (multiple-property) owners in one of the two groups, most of the older buildings in one of the two groups, etc.

What I thought might make sense is to divide the recruits into a small number of similar subgroups, and then do random assignment to participant and control status within subgroups. In other words, sort of a stratified random sample, except that I don't necessarily want to use just one or two dimensions to define the strata, and I don't have enough cases to use lots of dimensions to define the strata. I thought about using cluster analysis to define appropriate subgroups before randomly assigning within subgroup to participant and control.
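
The cluster-then-randomize idea in this post can be sketched roughly as follows. This is only an illustration, not a recommended procedure: the property fields (`age`, `units`, `highrise`), the choice of four clusters, and the plain k-means routine are all assumptions made for the example. Property-level characteristics are first collapsed to one profile per owner, because assignment has to happen at the owner level.

```python
import random
from statistics import mean, pstdev

def owner_profile(properties):
    """Collapse one owner's property records into owner-level features."""
    return [
        float(len(properties)),                                   # portfolio size
        mean(p["age"] for p in properties),                       # mean building age
        mean(p["units"] for p in properties),                     # mean units per property
        mean(1.0 if p["highrise"] else 0.0 for p in properties),  # high-rise share
    ]

def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]

def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = rng.sample(points, k)  # assumes k <= number of owners
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = [mean(c) for c in zip(*members)]
    return labels

def assign_within_clusters(owners, k=4, seed=2012):
    """owners: dict owner_id -> list of property dicts (hypothetical layout)."""
    rng = random.Random(seed)
    ids = sorted(owners)
    feats = standardize([owner_profile(owners[i]) for i in ids])
    labels = kmeans(feats, k, rng)
    assignment = {}
    for j in range(k):
        block = [i for i, lab in zip(ids, labels) if lab == j]
        rng.shuffle(block)
        offset = rng.randrange(2)  # coin flip: which arm gets the extra owner in an odd block
        for pos, owner_id in enumerate(block):
            arm = "participant" if (pos + offset) % 2 == 0 else "control"
            assignment[owner_id] = arm
    return assignment, dict(zip(ids, labels))
```

Every property inherits its owner's arm, so no owner is split between groups, and each cluster contributes owners to both arms in numbers that differ by at most one.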

Re: guidance on assigning cases to participant and control groups

Rich Ulrich
"Propensity scoring" is the general topic for matching on
multiple variables that are expected to relate to an outcome. 

I have no idea what it is that you will be measuring for outcome,
but I believe that you need to design your eventual analyses,
and start imagining what you will be able to write as conclusions,
before you can make effective use of propensity matching.

Doing a cluster analysis and matching by cluster sounds like a
way to *grope* your way forward.  That could be what you have,
if this is exploratory work.

--
Rich Ulrich



Re: guidance on assigning cases to participant and control groups

Bruce Weaver
Administrator
In reply to this post by Martha Hewett
It sounds like you'll be using some kind of multilevel model to analyze the data (owners at Level 2, dwellings at Level 1).  In that context, 100-200 level 2 units is a lot more than one often sees, it seems to me.  So I'm a bit surprised to hear you say it's a small sample.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: guidance on assigning cases to participant and control groups

Maguin, Eugene
I wonder if a better, more acceptable strategy than simple random assignment might be to stratify the population on a variable or two or three (but not too many), all of which have some correlation with your DV or DVs, and then randomly assign within strata. You see this sort of sampling scheme in the literature on school-based interventions, where a plausible set of community covariates is used to score the areas served by different schools; the schools are then grouped into strata, and random assignment is made within the strata.

Gene Maguin
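
A minimal sketch of stratifying on two owner-level variables and randomizing within strata might look like the following. The field names (`n_properties`, `median_year_built`) and the cut points are hypothetical; in practice the stratifiers would be whichever few variables are expected to correlate with the outcome.

```python
import random
from collections import defaultdict

def stratum_key(owner):
    """Coarse stratum from two owner-level variables (cut points are assumptions)."""
    size = "large" if owner["n_properties"] >= 5 else "small"
    era = "pre1970" if owner["median_year_built"] < 1970 else "post1970"
    return (size, era)

def stratified_assignment(owners, seed=42):
    """Randomly split the owners in each stratum as evenly as possible."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for owner in owners:
        strata[stratum_key(owner)].append(owner["id"])

    assignment = {}
    for key in sorted(strata):          # fixed order keeps the result reproducible
        ids = strata[key]
        rng.shuffle(ids)
        half = len(ids) // 2
        if len(ids) % 2 and rng.random() < 0.5:
            half += 1                   # odd stratum: a coin flip places the extra owner
        for pos, owner_id in enumerate(ids):
            assignment[owner_id] = "participant" if pos < half else "control"
    return assignment
```

With two binary stratifiers there are only four strata, so even 100 owners leave a reasonable number per cell; adding a third variable doubles or triples the cell count and thins each stratum accordingly.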



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/guidance-on-assigning-cases-to-participant-and-control-groups-tp5508554p5508943.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

Re: guidance on assigning cases to participant and control groups

Martha Hewett
In reply to this post by Rich Ulrich
See my recent reply to Matthew Poes. The outcome variable is total energy savings. For each property we will have at least two years of pre-treatment energy use data as well as data during and after the treatment period. The pre-treatment data is used to create a model of energy use as a function of weather and vacancy rate. Post-treatment energy use is compared to the model to assess the change in usage for each property. The question then is whether the participant properties saved more than the control properties, and how much more. An average savings will be sufficient information for a go/no-go decision on a full-scale program by the entities that are funding this pilot project.

The savings analysis will probably be normalized per sq. ft. or per unit (apartment), so minor differences in building size aren't a problem. However, some buildings may have more potential for savings up front: they're older and therefore less efficient, or they have central heating systems for which more cost-effective retrofit strategies are available, etc. So we need to be able to control for that in the analysis.

Owners come into it, and not just the properties, for three reasons: they tend to take operational or retrofit strategies that succeed in one building and apply them to others (so we don't want an individual owner's properties divided between participant and control); they have different incentives (depending on whether the housing is subsidized); and they have different capabilities (capital, staff, etc.) to execute suggested measures.

Finally, I'd like to have all of the pre-treatment energy use data in hand and analyzed prior to assignment of the participant and control groups, since pre-treatment energy intensity is an important factor in savings potential, but the project timeline isn't going to allow that to happen.  But energy intensity is correlated to some degree with the other characteristics data (age of building, number of stories, etc) that we will have at the time of the assignment.
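
The per-property savings calculation described in this post (a pre-treatment model of energy use as a function of weather and vacancy rate, compared against post-treatment use) could be sketched as below. The linear form, the use of heating degree days (`hdd`) as the weather variable, and the monthly record layout are assumptions for illustration; the post does not specify the model.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_baseline(pre_months):
    """Least-squares fit of energy = b0 + b1*hdd + b2*vacancy on pre-treatment months."""
    X = [[1.0, mth["hdd"], mth["vacancy"]] for mth in pre_months]
    y = [mth["energy"] for mth in pre_months]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return solve(xtx, xty)   # normal equations; fine for a three-parameter model

def savings_per_sqft(coefs, post_months, sqft):
    """Model-predicted minus actual post-period energy, normalized by floor area."""
    b0, b1, b2 = coefs
    predicted = sum(b0 + b1 * m["hdd"] + b2 * m["vacancy"] for m in post_months)
    actual = sum(m["energy"] for m in post_months)
    return (predicted - actual) / sqft

def mean_difference(savings, assignment):
    """Difference in mean normalized savings, participant minus control."""
    part = [s for pid, s in savings.items() if assignment[pid] == "participant"]
    ctrl = [s for pid, s in savings.items() if assignment[pid] == "control"]
    return sum(part) / len(part) - sum(ctrl) / len(ctrl)
```

The raw mean difference ignores the fact that properties are nested within owners, so it is only a first look; an analysis that accounts for owner-level clustering would still be needed for inference.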
Re: guidance on assigning cases to participant and control groups

Rich Ulrich
That's interesting.  From your description, it seems that it is a tricky
question, and yet to be determined, how you will measure "total energy
savings"  as a single score for a single landlord... Standardizing for square
feet is problematic when you seek "total savings".

I never had data needing a propensity score, so I never used them, but
they have been used before randomization.  But if you won't have your
"pre" data well in hand before randomization, that could be irrelevant.

Since "savings" is the dimension of outcome, the potential for savings
is what you might use to stratify your sampling.  I suspect that 10 or 20%
of the owners will contain 90% of the complications: each may have several
properties that have diverse characteristics. 

So:  sample the easy ones by something like "total value" and "age", and
split up the complicated ones, perhaps after ranking them subjectively.
Spend your worry-time worrying about the tough ones.

--
Rich Ulrich
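
One way to read the suggestion above, stratifying on the potential for savings, is a matched-pair split: rank owners by some savings-potential score, pair neighbors in the ranking, and flip a coin within each pair. The sketch below assumes such a score already exists for each owner (for the complicated portfolios it could simply be a subjective rank); how to build that score is not addressed here.

```python
import random

def paired_assignment(potential, seed=0):
    """potential: dict owner_id -> savings-potential score (higher means more potential).

    Owners adjacent in the ranking form a pair; one member of each pair is
    randomly made a participant and the other a control.
    """
    rng = random.Random(seed)
    # Ties are broken by owner id so the ranking is deterministic.
    ranked = sorted(potential, key=lambda o: (potential[o], o), reverse=True)
    assignment = {}
    for i in range(0, len(ranked) - 1, 2):
        first, second = ranked[i], ranked[i + 1]
        if rng.random() < 0.5:
            first, second = second, first
        assignment[first] = "participant"
        assignment[second] = "control"
    if len(ranked) % 2:  # odd count: the lowest-ranked owner is assigned by coin flip
        assignment[ranked[-1]] = rng.choice(["participant", "control"])
    return assignment
```

Because each pair is split, the two groups end up with nearly the same distribution of the score, which is the property the stratification is meant to deliver.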


