transformation of variable into a normally distributed variable


transformation of variable into a normally distributed variable

drfg2008
I would like to transform a random variable that is not normally distributed into a normally distributed random variable (if possible). I have different transformation functions, mostly based on the logarithm of the variable. Depending on the type of distribution, there are supposed to be so-called 'optimal transformations' (e.g., the Box-Cox transformation?). Does anyone know of suitable transformations, or has anyone heard of a 'Box-Cox' transformation?

Thanks
Dr. Frank Gaeth


Fwd: transformation of variable into a normally distributed variable

Matthias Spörrle
Hi Frank,

you might want to try
http://epm.sagepub.com/content/55/4/625.abstract

It helped me on various occasions.

HTH
Matthias



---------- Forwarded message ----------
From: drfg2008 <[hidden email]>
Date: Sun, May 1, 2011 at 9:31 PM
Subject: transformation of variable into a normally distributed variable
To: [hidden email]


I would like to transform a random variable that is not normally
distributed into a normally distributed random variable (if possible). I have
different transformation functions, mostly based on the logarithm of the
variable. Depending on the type of distribution, there are supposed to be
so-called 'optimal transformations' (e.g., the Box-Cox transformation?). Does
anyone know of suitable transformations, or has anyone heard of a 'Box-Cox'
transformation?

Thanks

-----
Dr. Frank Gaeth
FU Berlin
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/transformation-of-variable-into-a-normally-distributed-variable-tp4363370p4363370.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: transformation of variable into a normally distributed variable

Jon K Peck
In reply to this post by drfg2008
The Box-Cox transformation is available in the Data Preparation option. If you have that, look for Transform > Prepare Data for Modeling > Interactive. You will find it on the Rescale tab under Continuous Targets, or in the ADP command.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: transformation of variable into a normally distributed variable

Mike
In reply to this post by drfg2008
A Google search will turn up many hits for the Box-Cox transformation, as
will a search of scholar.google.com. There is a Wikipedia entry titled
"Power transform" that gives brief coverage of Box-Cox transformations;
see: http://en.wikipedia.org/wiki/Power_transform

A somewhat more informative presentation is given by Garson in his
notes on the assumptions underlying linear model analysis. He discusses
several different types of transformation in addition to Box-Cox; see:
http://faculty.chass.ncsu.edu/garson/PA765/assumpt.htm

A more technical review of Box-Cox transformations is provided by
the SpringerLink website for its Encyclopedia of Mathematics; see:
http://eom.springer.de/B/b110790

If you have the SPSS manual/PDF for Data Preparation, there is a
section on how to get the Box-Cox transformation in the "Rescale Fields"
part of the Automated Data Preparation (ADP) procedure (for v19,
see page 26; for a syntax example see pp. 86-97). I presume there
are additional sources on how SPSS does Box-Cox as well as on
how to set up your own equations.

There appears to be a sizable literature on the Box-Cox transformation,
but you should decide whether it is relevant to your needs.

-Mike Palij
New York University
[hidden email]



Re: transformation of variable into a normally distributed variable

Garry Gelade
In reply to this post by Jon K Peck

Hi Frank,

The Box-Cox transformation estimates the power to which the DV must be raised to minimize the mean square error in a regression. In my way of doing it (probably ages out of date by now!) you need to specify the DV and the predictors. ADP requires you to specify a DV (target), but I'm not sure how it knows what your IVs are.
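For readers without ADP, the idea of estimating the power can be sketched outside SPSS. The Python below is an illustration only, not the ADP algorithm and not the macro mentioned in this post. It assumes strictly positive data and an intercept-only model (no predictors), and it picks lambda from a coarse grid by maximizing the Box-Cox profile log-likelihood. With predictors, the variance term would be replaced by the residual variance of the regression, which is the mean square error idea described above. The function names and the data are hypothetical.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of strictly positive values y for a given power lam."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        # The limit of (y**lam - 1) / lam as lam -> 0 is log(y).
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Profile log-likelihood of lam, assuming an intercept-only normal model."""
    z = box_cox(y, lam)
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # ML variance of the transformed data
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(var) + jacobian

def best_lambda(y, grid=None):
    """Grid search for lam; the default grid (-2.0 to 2.0 by 0.1) is an assumption."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: profile_loglik(y, lam))

# Hypothetical right-skewed data: exponentials of evenly spaced values.
y = [math.exp(0.5 * k) for k in range(-4, 5)]
lam = best_lambda(y)
z = box_cox(y, lam)
```

A lambda near 1 suggests leaving the variable alone, near 0.5 a square root, near 0 a logarithm, and near -1 a reciprocal.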

 

If you haven’t got ADP I can let you have a macro.

 

Garry Gelade

 


Re: transformation of variable into a normally distributed variable

Bruce Weaver
Administrator
In reply to this post by drfg2008
What kind of model do you want to use, and what role does this non-normal variable play in it (i.e., explanatory variable or outcome)?  What does the distribution look like for the non-normal variable?


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: transformation of variable into a normally distributed variable

Rich Ulrich
In reply to this post by drfg2008
What is suitable?  The best guide to that, in my own experience,
is information about how the numbers are generated.  The textbooks
by John Tukey (et al.) include unusually good discussions.  Counts
often deserve square-root, distances or latencies may deserve
reciprocals, and so on.  The further guide to what is appropriate
is, What transformation yields "equal intervals"  within the context
of what you are modeling?  For example if "twice as much"  is
a natural way to describe equal differences, then the log is apt
to be the appropriate transformation.
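The "twice as much" remark can be checked with a few numbers. This is a small Python illustration with made-up values, not part of the original post: when each value is double the previous one, the raw differences grow, but the differences of the logs are constant, which is the "equal intervals" property.

```python
import math

# Hypothetical measurements where each step is "twice as much" as the last.
values = [1.0, 2.0, 4.0, 8.0, 16.0]

# Differences between neighbours on the raw scale grow with the values.
raw_steps = [b - a for a, b in zip(values, values[1:])]

# On the log scale (base 2 here) every doubling is the same step.
log_steps = [math.log2(b) - math.log2(a) for a, b in zip(values, values[1:])]

# Rules of thumb from the post, expressed as functions (names are illustrative).
RULES_OF_THUMB = {
    "counts": math.sqrt,                   # square root for counts
    "latencies": lambda t: 1.0 / t,        # reciprocal for distances or latencies
    "ratios": math.log,                    # log when "twice as much" is natural
}
```

The base of the logarithm only rescales the result, so natural logs or base-10 logs give equal steps as well.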

Box-Cox transformations, as Tukey discusses, are applied to scales
that have a natural zero - real "quantities," for instance. You may
want to re-score a Test where subjects score near 100-max  as the
score of errors (near zero).  However, a scale with both a minimum
and a maximum (and scores near both extremes) may deserve a "folded"
transformation, such as the logit.
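As an illustration of a "folded" transformation, here is a minimal Python sketch of the logit for a bounded score. The function name and the 0 to 100 bounds in the usage lines are assumptions for the example. Scores exactly at a bound have no finite logit, so this sketch simply rejects them; how to adjust such scores is a separate decision.

```python
import math

def logit_score(score, minimum, maximum):
    """Logit of a score that lies strictly between its minimum and maximum."""
    p = (score - minimum) / (maximum - minimum)  # rescale to the interval (0, 1)
    if not 0.0 < p < 1.0:
        raise ValueError("score must lie strictly between the bounds")
    return math.log(p / (1.0 - p))

# Hypothetical test scores on a 0-100 scale.
middle = logit_score(50, 0, 100)
high = logit_score(90, 0, 100)
low = logit_score(10, 0, 100)
```

The transformation is symmetric around the midpoint: scores equally far from the two bounds get values of equal size and opposite sign, and both tails are stretched.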


--
Rich Ulrich



Re: transformation of variable into a normally distributed variable

drfg2008
Thanks everyone!

I didn't know that from SPSS 19 upwards there is a Box-Cox transformation available. My version is 17.
Thank you also for the literature, which I'm checking right now, and thank you for the hint concerning the external VB-Program (I try to keep the programming completely within the syntax).


Just an additional question to that:

If I compute normal scores with RANK VARIABLES (Blom's method) for a skewed variable, like this:

RANK VARIABLES=zv1 (A)
  /NORMAL
  /PRINT=NO
  /TIES=MEAN
  /FRACTION=BLOM.

I get an (approximately) normally distributed variable with µ=0 and s=1. Now let's assume I want to run a test of location. Since I do not have V19, would it be acceptable to transform the DV with Blom first and then fit a GLM (for example) to the transformed DV?
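For reference, the calculation behind the RANK syntax above can be sketched in Python. This is an illustration under stated assumptions, not SPSS's own code: it assumes unweighted cases, gives tied values the mean of their ranks (as /TIES=MEAN requests), and uses Blom's proportion (r - 3/8) / (n + 1/4) followed by the inverse standard normal CDF. The function names and data are hypothetical.

```python
from statistics import NormalDist

def mean_ranks(values):
    """Ascending ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def blom_scores(values):
    """Normal scores from Blom's proportion estimate (r - 3/8) / (n + 1/4)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in mean_ranks(values)]

# Hypothetical right-skewed sample with one tie.
skewed = [1, 2, 2, 3, 5, 8, 40]
scores = blom_scores(skewed)
```

Only the order of the values enters the scores, so the extreme value 40 is pulled in just as far as any other largest observation would be.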

(Remark: Spearman is nothing more than Pearson over the ranks, and the U test gives a result very similar to a t-test over the ranks.)
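The Spearman part of this remark is easy to verify numerically. The Python sketch below uses made-up data without ties and compares Pearson's r computed on the ranks with the textbook Spearman formula 1 - 6*sum(d^2) / (n*(n^2 - 1)); the function names are illustrative.

```python
import math

def ranks(values):
    """Ranks 1..n for values without ties (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_classic(x, y):
    """Textbook formula based on rank differences, valid when there are no ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data without ties.
x = [3.1, 0.4, 12.0, 5.5, 7.2]
y = [2.0, 1.0, 30.0, 3.5, 3.0]

rho_from_ranks = pearson(ranks(x), ranks(y))
rho_classic = spearman_classic(x, y)
```

Both routes give the same coefficient for these data. With ties, the Pearson-on-mean-ranks route is the one that stays valid.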

Frank
Dr. Frank Gaeth


Re: transformation of variable into a normally distributed variable

Bruce Weaver
Administrator
This partially answers the question I asked earlier--i.e., I think you want to run an OLS linear model with the non-normal variable as the dependent.  If so, the first thing to do is run the model using the raw variable, and then examine the residuals.  It is the errors (which are estimated by the residuals*) that are assumed to be normal, not the variable itself. If the residuals are too non-normal for comfort (or if they are too heteroscedastic), then start looking for a transformation.

* http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics
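The "fit first, then look at the residuals" advice can be sketched as follows. This Python example is only an illustration with made-up data and a single predictor; in SPSS one would save the residuals from the fitted model and inspect them there. Skewness is used as one simple summary of non-normality, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def residuals(x, y):
    """Observed minus fitted values from the least-squares line."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def skewness(values):
    """Moment-based skewness; 0 for a symmetric sample, > 0 for a long right tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data: a roughly linear trend with one large positive departure.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 5.0, 5.8, 7.1, 8.0, 15.0]

res = residuals(x, y)
res_skew = skewness(res)
```

A residual skewness far from zero, or a residual plot with a clear fan shape, would be the signal to start looking for a transformation of the dependent variable.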

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: transformation of variable into a normally distributed variable

drfg2008
Thanks Bruce,

I am thinking about using the Blom and/or Box-Cox transformation together with the CHAID algorithm. Since CHAID uses ANOVA (if the DV is metric) or chi-square (if the DV is discrete), the transformation could make a difference to the performance of CHAID, especially with skewed distributions, where ANOVA is not robust.
Dr. Frank Gaeth


Re: transformation of variable into a normally distributed variable

drbk
In reply to this post by Garry Gelade
Hi Jon Peck (Senior Software Engineer, IBM) and Mike Palij (New York University)

I hope that you guys are still monitoring this old thread of discussion.

I note your response on the Box-Cox procedure on SPSS.

1. How do we find out what lambda was used in the transformation?
I am using version 20 (Premium edition) of SPSS.


2. I also have the version 19 Base package of SPSS on another computer. How do I write a script/macro to do the transformation?


3. Does anyone know how to do other transformations besides Box-Cox in SPSS, e.g. Atkinson's score test (1973) for transforming responses and Box and Tidwell (1962) for transforming predictors?



Ref:
Atkinson, A. C. (1973). Testing transformations to normality. J. R. Statist. Soc. B, 35, 473–479.
Box, G. E. P. and Tidwell, P. W. (1962). Transformations of the independent variables. Technometrics, 4, 531–550.





Thanks!
Ben