interactive dummies


Dimitris Nikolaou
Hello to everybody!

In my regression I use dummies for marital status: married, single, and
divorced (the base dummy); for sex (female=0 and male=1); and for whether
someone has any children: child, nochild (the base dummy), and other. I
created an interactive dummy for someone who is a man and married
(sex*married) and another for someone who is a man and has children
(man*child).
My first question is what each of them is compared to. For example, in the
first case, do I compare married men to unmarried men or to women?
Also, should I use both the interactive and the original dummies in my
regression (sex*married, sex and married)?
Finally, can I use interactive dummies for both sexes at the same time? By
this I mean, can I have in my regression male*married, male*child,
woman*married and woman*child?

Thank you very much for your valuable help!
Dimitris!

Re: interactive dummies

Sean McKenzie
Here's another one I get to add two cents on.  I'm a time series guy.

Suppose someone decides to adjust for seasonality in monthly data by
inserting 0/1 dummies for each month of the year.  If you also include a
constant of regression, a vector of 1's, you get an error message and it
doesn't run...the error message in my old EViews program was Near Singular
Matrix....

Ok that's perfect multicollinearity...

The problem with too many dummies is that you start to get near
multicollinearity.  It will depend on your data, but when you have mutually
exclusive categories like months of the year you will get the singularity
problem once the number of dummies equals the number of categories...then you
can dump your regression constant and it will run, but now what you are
actually calculating is not some supposed effect of your dummies, but the
combined constant of regression/dummy....
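For anyone who wants to see the monthly-dummy problem in numbers, here is a
minimal sketch in Python with NumPy. The data are simulated and every name
in it (month, D, X_trap and so on) is made up for illustration; it is not
from any real data set or from EViews.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 10 years of monthly observations (illustrative only).
n_years = 10
month = np.tile(np.arange(12), n_years)            # 0..11 repeated each year
y = 5.0 + 0.5 * month + rng.normal(size=month.size)

# One 0/1 dummy per month: 12 mutually exclusive, exhaustive columns.
D = (month[:, None] == np.arange(12)).astype(float)
const = np.ones((month.size, 1))

# Constant + all 12 dummies: 13 columns but only rank 12,
# because the dummies sum to the constant (perfect multicollinearity).
X_trap = np.hstack([const, D])
rank_trap = np.linalg.matrix_rank(X_trap)

# Drop the constant: the regression runs, and each coefficient is simply
# that month's mean of y (the "combined constant/dummy").
beta_no_const, *_ = np.linalg.lstsq(D, y, rcond=None)
month_means = np.array([y[month == m].mean() for m in range(12)])

# Keep the constant and drop one dummy instead (January as the base):
# the remaining coefficients are differences from the January mean.
X_base = np.hstack([const, D[:, 1:]])
beta_base, *_ = np.linalg.lstsq(X_base, y, rcond=None)

print("columns:", X_trap.shape[1], "rank:", rank_trap)
```

Dropping one dummy rather than the constant is the usual choice when you
want the coefficients to read as effects relative to a base category.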

Essentially it's a multicollinearity/near multicollinearity problem, a too
many variables problem....

What you may want to do for your analysis is run it using just one or a few
dummies at a time...

Technically you may be able to do something and the machine will calculate
it, but it is usually not a good idea, not in anything I have done.  Much of
my panel data knowledge comes from the Current Population Survey etc.,
analysing participation in the labor force by sex, but then we also account
for race etc...

You have people who are black/white and male/female, so 4 combinations of
categories, and then you also have whether someone is living in an SMSA (they
don't actually use that word any more).

You can put dummies on 3 of the 4 race/sex categories, but you don't
necessarily want to.  If you put dummies on all 4 categories, which are
mutually exclusive but also exhaustive (everyone is in exactly one category),
then you have reproduced the constant of regression, a vector of ones, as a
linear combination of your dummies: in this case the sum of your dummy
vectors is a vector of ones.

The SMSA dummy can in general be used with anything because it is not
perfectly correlated with any of the other dummies or any combination of
them, but you may get the problem that it is very correlated with something
else.

Many things related to race/sex have issues because race and sex are
frequently highly correlated with other variables.  One of the problems is
that as you use different mixtures of dummies, you can potentially
manipulate any particular dummy to be what you want.  The other issue is the
usual one: when you add a variable you improve your likelihood function or R2
etc., but now your coefficients which had previously been good are bad.

What groups you wish to compare depends on what question you are trying to
answer, but why not compare all groups, this is as they say, serendipitous.
What are you interested in: the difference between married and unmarried,
between men and women, or the interaction of the two, i.e., how the effect
of being married is different for men than it is for women?  Based on US data
that I have seen, marriage is very 'good' for men, and somewhat dubious for
women.  Children tend to be 'good' for married of either sex, and very very
'bad' for women, where good/bad refers to miscellaneous social
indicators, especially income/poverty status.  Supposedly the age of the
woman when she got married has a high positive correlation with 'good', but
I have not worked on a data set that had that information since graduate
school.
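On the original question of what the interactive dummy is compared to, a
small simulation may help. This is only a sketch under simplifying
assumptions: marital status is reduced to two categories (married or not),
the data and effect sizes are invented, and cell_mean is a helper made up for
this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical 0/1 data; variable names and effect sizes are invented.
male = rng.integers(0, 2, size=n)
married = rng.integers(0, 2, size=n)
y = (10.0 + 1.0 * male + 0.5 * married + 2.0 * male * married
     + rng.normal(size=n))

# Model: y = b0 + b1*male + b2*married + b3*(male*married) + error
X = np.column_stack([np.ones(n), male, married, male * married])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

def cell_mean(m, r):
    """Mean of y for the group with male == m and married == r."""
    return y[(male == m) & (married == r)].mean()

# With sex, married and sex*married all in the model, the fit reproduces
# the four group means, so the coefficients read as:
#   b0 = unmarried women (the base group)
#   b1 = unmarried men minus unmarried women
#   b2 = married women minus unmarried women (marriage gap for women)
#   b3 = marriage gap for men minus marriage gap for women
gap_men = cell_mean(1, 1) - cell_mean(1, 0)
gap_women = cell_mean(0, 1) - cell_mean(0, 0)

print("b3:", round(b3, 3), "gap_men - gap_women:", round(gap_men - gap_women, 3))
```

So in this setup the married man is compared to the unmarried man through
b2 + b3, and the interaction coefficient b3 alone says how much the marriage
effect differs between men and women. That reading relies on keeping sex and
married in the regression alongside sex*married.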

>this I mean, can I have in my regression male*married, male*child,
>woman*married and woman*child<

These do not appear to be mutually exclusive and exhaustive categories, so you
will not have the perfect no-go problem, but you could have the near no-go
problem.
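One way to check for the perfect no-go case before running anything is to
look at the rank of the design matrix. The sketch below uses simulated 0/1
indicators, and is_full_rank is a made-up helper, not a library function.
It also shows one caveat: the four interactions are fine together with sex,
but adding married itself makes the matrix singular.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical 0/1 indicators (illustrative only).
male = rng.integers(0, 2, size=n)
woman = 1 - male
married = rng.integers(0, 2, size=n)
child = rng.integers(0, 2, size=n)

const = np.ones(n)
inter = [male * married, male * child, woman * married, woman * child]

def is_full_rank(columns):
    """True if the design matrix built from these columns has full column rank."""
    X = np.column_stack(columns).astype(float)
    return np.linalg.matrix_rank(X) == X.shape[1]

# Constant + sex + the four interactions: no exact linear dependence.
ok_four = is_full_rank([const, male] + inter)

# Adding 'married' as well breaks it, because
# male*married + woman*married == married for every observation.
ok_with_married = is_full_rank([const, male, married] + inter)

print("four interactions + sex:", ok_four)
print("same plus married:      ", ok_with_married)
```

The same check works on real data: build the columns you plan to use and
compare the rank with the number of columns.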

In this day and age, as my Grandmother used to say, and especially where I
am, Alaska, I would not be surprised if the probability that someone,
especially a woman, had a child (child living with them) was higher for the
unmarried than for the married.  But if your population is such that all or
nearly all of your people with children are also married, you may be hitting
yet another multicollinearity type problem.

I would suggest that in general 4 dummies within the same set of relatively
few related categories is a lot.  You could, I expect, technically do it, but
whether it is a good idea is another thing.  Try the assorted permutations of
your regressions and see what kind of results you get, and which coefficients
are stable across permutations and which are not.  Consider the question you
are trying to answer and the known characteristics of your population.
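Trying the permutations can be automated. The following is a rough sketch,
not a recommended procedure: the data are simulated so that child is strongly
tied to married, and fit, estimates and spread are names invented for this
example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Hypothetical data where 'child' agrees with 'married' about 90% of the time.
male = rng.integers(0, 2, size=n).astype(float)
married = rng.integers(0, 2, size=n).astype(float)
child = np.where(rng.random(n) < 0.9, married, 1 - married)
y = 1.0 + 0.5 * male + 1.0 * married + 1.0 * child + rng.normal(size=n)

regressors = {"male": male, "married": married, "child": child}

def fit(names):
    """OLS with a constant; returns {name: coefficient} for the given dummies."""
    X = np.column_stack([np.ones(n)] + [regressors[k] for k in names])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return dict(zip(names, beta[1:]))

# Fit every non-empty subset of the dummies and collect each coefficient.
estimates = {k: [] for k in regressors}
for r in range(1, len(regressors) + 1):
    for names in itertools.combinations(regressors, r):
        for k, b in fit(names).items():
            estimates[k].append(b)

# Range (max - min) of each coefficient across specifications.
spread = {k: max(v) - min(v) for k, v in estimates.items()}
for k, s in spread.items():
    print(f"{k:8s} spread across specifications: {s:.3f}")
```

A coefficient whose value moves a lot depending on which other dummies are
in the regression is one to be careful about interpreting.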

Hope that was sufficiently confusing.



>From: Dimitris Nikolaou <[hidden email]>
>Reply-To: Dimitris Nikolaou <[hidden email]>
>To: [hidden email]
>Subject: interactive dummies
>Date: Thu, 5 Oct 2006 19:25:28 +0300