Transformed data

Transformed data

Peter Spangler
Dear SPSS List Folks,

I have data that was transformed to meet the assumptions of parametric tests. The transformation is as follows: V1..V4 -->Transformed to Log10 --> Saved Standard values --> Saved all as Mean=50, SD=10.
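
As a rough illustration of the chain just described (Log10, then standard values, then rescaling to mean 50 and SD 10), here is a minimal Python sketch. The function name `to_t_scores` and the sample values are hypothetical, and the sketch assumes the saved standard values were computed with the sample SD (n - 1). The mean and SD of the logged variable are returned alongside the scores because any statement in original units has to undo each step using them.

```python
import math
import statistics


def to_t_scores(values):
    """Log10 -> z-score -> T-score (mean 50, SD 10)."""
    logs = [math.log10(v) for v in values]   # only defined for values > 0
    log_mean = statistics.mean(logs)
    log_sd = statistics.stdev(logs)          # sample SD (n - 1); an assumption
    t_scores = [50 + 10 * (x - log_mean) / log_sd for x in logs]
    return t_scores, log_mean, log_sd


# Hypothetical raw dollar values, for illustration only (not from the thread).
gmv = [120.0, 950.0, 5735.0, 48000.0, 1200000.0]
t_gmv, log_mean, log_sd = to_t_scores(gmv)

# Undoing the chain for one case recovers the raw value.
recovered = 10 ** (log_mean + log_sd * (t_gmv[2] - 50) / 10)
```

After this chain the coefficient of a regression is expressed in T-score points, not in percent or dollars, which is why the stored `log_mean` and `log_sd` matter later.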

I now have standardized and unstandardized beta coefficients from my linear regression output that I would like to make statements about in their original units. Is there a typical way of handling these conditions such that I can say a 1-unit increase in my IV predicts an X-unit increase in my DV, or that a 1-unit increase in my IV predicts an X standard deviation increase in my DV?

Trying to keep this clear...

Peter



Re: Transformed data

Ryan
Peter,
 
Can you describe the dependent variables in their original form in as much detail as possible, and why you felt the need to transform them? (Keep in mind that the normality assumption in regression analyses applies to the errors, not to the variables themselves.)
 
Thanks,
 
Ryan



Re: Transformed data

Peter Spangler
For example, the original dependent variable of interest is in dollars (gross market value) and the IV is repeat buyers. Both are scale variables. I transformed them because the distributions were very skewed and so that they would share the same scale.



Re: Transformed data

Peter Spangler
Ryan, 

Would it be correct to say that a 1% increase in the IV would predict an average .558% increase in the DV?
Such that: a repeat-buyer increase of .2 (1% of the mean of 20) would predict a $32 increase in GMV.

Change in DV = (.558/100)*5735 = 32.0013

             Unstandardized Beta
log_rb       .558

Mean GMV = $5735
Mean Repeat Buyer = 20
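
The arithmetic above can be checked with a short Python sketch. It assumes the coefficient comes from a regression of a plain logged DV on a plain logged IV (same log base on both sides), which is the case the "1% predicts b%" reading applies to; it does not apply as-is to variables that were further standardized. The function name `loglog_effect` is hypothetical, and using the mean GMV as the reference level is a convenience, not something the model requires.

```python
def loglog_effect(b, pct_increase_iv, dv_level):
    """Predicted DV change in a log-log model for a given % increase in the IV."""
    ratio = (1 + pct_increase_iv / 100) ** b   # multiplicative change in the DV
    pct_change_dv = 100 * (ratio - 1)
    return pct_change_dv, dv_level * (ratio - 1)


b = 0.558          # unstandardized coefficient quoted in the post
mean_gmv = 5735    # mean GMV quoted in the post, used as the reference level

# Shortcut used in the post: treat b as "percent per 1 percent".
approx_dollars = (b / 100) * mean_gmv

# Exact multiplicative version for a 1% increase in the IV.
exact_pct, exact_dollars = loglog_effect(b, 1, mean_gmv)

print(round(approx_dollars, 4), round(exact_pct, 4), round(exact_dollars, 4))
```

For a 1% change the shortcut and the exact figure are very close; they drift apart as the percentage change in the IV gets larger.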









Re: Transformed data

Ryan
In reply to this post by Peter Spangler
I no little about use of statistics in economics except for the occasional example I have come across in a textbook and/or online forum. I think I dabbled in cost-benefit analyses a while back, but that was a long time ago. So bear with me for a moment...Additional questions:
 
What is the possible range of values that the dependent variable can take on (e.g., 1 dollar to infinitely many dollars, 1 dollar to a fixed upper limit, 0 to ...)? I assume the values are positive integers, and that the distribution is positively skewed.
 
How in the world is the dependent variable (number of dollars spent) linked to the independent variable (repeat buyers)? In fact, what do you mean by repeat buyers? Repeat buyers of a specific product? So does that mean that each record represents a different product?
 
Sorry, but I am still not clear.
 
Ryan



Re: Transformed data

Ryan
In reply to this post by Peter Spangler
Peter,
 
Without understanding your model, I will simply direct you to a specific answer with respect to interpretation:
 
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/log_transformed_regression.htm
 
Go to the last section of this page, which discusses interpretation of regression coefficients when the DV and predictor(s) are log-transformed.
 
HTH,
 
Ryan



Re: Transformed data

Peter Spangler
In reply to this post by Ryan
Thanks for your patience, Ryan.
Range for the DV is $1 to $5 million. Positively skewed indeed.

The IV can be thought of as the number of individuals who have purchased a product from a distinct seller more than once. The greater the number of buyers who come back to purchase from the same seller, the greater the sales.


Re: Transformed data

Peter Spangler
In reply to this post by Ryan
Yes, this section is very helpful. I guess my question remains: if the unstandardized coefficient is .11, is it divided by 100 (i.e., read as .11%) before multiplying by the mean of the DV in order to get the actual unit increase in the DV?


Re: Transformed data

Ryan
In reply to this post by Peter Spangler
Okay. I'm going to cut off this back-and-forth because it would take a long time in order to obtain all necessary information for me to provide any substantive advice (e.g., are the DV units in hundreds, thousands, millions, etc.). Let's not go down this path because time is against me. Perhaps somebody else will pick up where I have left off; I simply do not have the experience with these kinds of data. Moreover, I would need a lot more information before providing any advice.
 
I have provided you with information on how to interpret coefficients when the variables are log-transformed. Hope that information proves useful.
 
I will make three general statements:
1. The solution in dealing with skewed data is not always a transformation (see 3rd point).
2. There are no distributional assumptions about IVs in regression, but highly skewed IVs can have certain implications. Note that whether two variables are skewed in the same or in different directions influences the possible range of their Pearson correlation.
3. There is a large exponential family of distributions beyond the Gaussian (e.g., binomial, Poisson, negative binomial, gamma, beta, etc.) that may be entirely appropriate for modeling this type of data (dollars spent). There are zero-truncated variations of these models as well that may be appropriate. I'm not suggesting that linear regression is necessarily inappropriate, but you ought to familiarize yourself with generalized linear models, and see how economists commonly model these data. I wouldn't be surprised to see that they do not always transform the DV; perhaps sometimes they consider other distributions.
 
Best,
 
Ryan



Re: Transformed data

Ryan
In reply to this post by Ryan
Correction: I *know*

May the grammar gods forgive me for the grammatically incorrect messages I post. ;-)

Rarely do I take the time to double check my messages. I usually type the message, submit, and move on....

Ryan


Re: Transformed data

David Marso
Administrator
In reply to this post by Peter Spangler
Considering the fact that you haven't even bothered to post the actual regression model, anyone jumping further into your rabbit hole is bound to become a mad hatter!
I decline!
--
Peter Spangler wrote
Yes, this section is very helpful. I guess my question remains:   if the
unstandardized coefficient is .11, is it  divided by 100 to get .11% before
multiplying by the mean of the DV in order to get the actual unit increase
in the DV?

Sent from my iPhone

On Apr 16, 2013, at 6:04 PM, R B <[hidden email]> wrote:

Peter,

Without understanding your model, I will simply direct you to a specific
answer with respect to interpretation:

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/log_transformed_regression.htm

Go to the last section of this page that discusses interpretation of
regression coefficients when the DV and predictor(s) are log-transformed.

HTH,

Ryan


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Transformed data

Art Kendall
In reply to this post by Ryan
<insert tongue in cheek>
 
> Rarely do I take the time to double check my messages
Hey, you are weakening my soapbox about always checking syntax because it is like any other kind of writing.

<remove tongue from cheek>
Art Kendall
Social Research Consultants

Re: Transformed data

Peter Spangler
In reply to this post by David Marso
The regression model is a simple linear regression using two log-transformed variables:
DV = nlog_gmv (scale variable in dollars, $1 - $5 million)
IV = nlog_rb (scale variable, the number of buyers that a seller had more than one transaction with)

REGRESSION 
  /MISSING LISTWISE 
  /STATISTICS COEFF OUTS R ANOVA 
  /CRITERIA=PIN(.05) POUT(.10) 
  /NOORIGIN 
  /DEPENDENT nlog_gmv 
  /METHOD=ENTER nlog_rb.

Coefficients        Unstandardized Beta
nlog                                   .558
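
If the model is a regression of one logged variable on another, the question "what does one more repeat buyer predict?" has no single answer, because the predicted change depends on the starting level. The Python sketch below illustrates this under the assumption that both variables enter as plain logs with the same base; the function name `predicted_ratio` and the starting levels are hypothetical, and only the coefficient .558 comes from the post.

```python
def predicted_ratio(b, x_old, x_new):
    """Multiplicative change in the DV when the IV moves from x_old to x_new."""
    return (x_new / x_old) ** b


b = 0.558  # coefficient reported in the post

# Hypothetical starting levels of repeat buyers; each row adds one buyer.
for start in (1, 5, 20, 100):
    pct = 100 * (predicted_ratio(b, start, start + 1) - 1)
    print(f"{start} -> {start + 1} repeat buyers: about {pct:.2f}% higher predicted GMV")
```

The same one-person increase predicts a smaller percentage gain at higher starting levels, so a statement in original units needs a stated reference point (for example, the mean of 20 repeat buyers).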



On Tue, Apr 16, 2013 at 9:00 PM, David Marso <[hidden email]> wrote:
Considering the fact that you haven't even bothered to post the actual
regression model, anyone jumping further into your rabbit hole is bound to
become a mad hatter!
I decline!
--

Peter Spangler wrote
> Yes, this section is very helpful. I guess my question remains:   if the
> unstandardized coefficient is .11, is it  divided by 100 to get .11%
> before
> multiplying by the mean of the DV in order to get the actual unit increase
> in the DV?
>
> Sent from my iPhone
>
> On Apr 16, 2013, at 6:04 PM, R B &lt;

> ryan.andrew.black@

> &gt; wrote:
>
> Peter,
>
> Without understanding your model, I will simply direct you to a specific
> answer with respect to interpretation:
>
> http://www.ats.ucla.edu/stat/mult_pkg/faq/general/log_transformed_regression.htm
>
> Go to the last section of this page that discusses interpretation of
> regression coefficients when the DV and predictor(s) are log-transformed.
>
> HTH,
>
> Ryan
>
>
> On Tue, Apr 16, 2013 at 8:50 PM, Peter Spangler &lt;

> pspangler@

> &gt;wrote:
>
>> Ryan,
>>
>> Would it be correct to say that a 1% increase in the IV would predict an
>> average .558% increase in the DV.
>> Such that : A repeat Buyer increase of .2 would predict a $32 increase in
>> GMV
>>
>> Change in DV = (.558/100)*5735 = 32.0013
>>
>>                                 *Unstandardized Beta*
>> log_rb                                .558
>>
>> Mean GMV = $5735
>> Mean Repeat Buyer = 20





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Transformed-data-tp5719509p5719519.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Transformed data

Peter Spangler
Trying this again for clarity and completeness: My data are made up of two scale variables: the DV (gmv), in dollars, and the IV (repeat buyers), in persons.
Both variables were transformed to t_gmv and t_repeat_buyers: Log10 --> Z scores --> Mean = 50, SD = 10.

My goal is to calculate the change in GMV in its original units (dollars) for a one-unit (one-person) increase in Repeat Buyers. I essentially need to back-transform the following:

t_GMV = Bo + B1 (t_repeat buyers) + E1

t_GMV = 5.37 + .426 (t_repeat buyers) + E1



REGRESSION 
  /MISSING LISTWISE 
  /STATISTICS COEFF OUTS R ANOVA 
  /CRITERIA=PIN(.05) POUT(.10) 
  /NOORIGIN 
  /DEPENDENT t_gmv 
  /METHOD=ENTER t_rb.

Coefficients    Unstandardized Beta
constant        5.37
t_rb            .426

On Wed, Apr 17, 2013 at 8:48 AM, Peter Spangler <[hidden email]> wrote:
The regression model is simple linear using two log transformed variables:
DV = nlog_gmv (scale variable in dollars, $1 - $5 million)
IV = nlog_rb (scale variable, the number of buyers that a seller had more than one transaction with)

REGRESSION 
  /MISSING LISTWISE 
  /STATISTICS COEFF OUTS R ANOVA 
  /CRITERIA=PIN(.05) POUT(.10) 
  /NOORIGIN 
  /DEPENDENT nlog_gmv 
  /METHOD=ENTER nlog_rb.

Coefficients        Unstandardized Beta
nlog                                   .558



On Tue, Apr 16, 2013 at 9:00 PM, David Marso <[hidden email]> wrote:
Considering the fact that you haven't even bothered to post the actual
regression model, anyone jumping further into your rabbit hole is bound to
become a mad hatter!
I decline!
--

Peter Spangler wrote
> Yes, this section is very helpful. I guess my question remains:   if the
> unstandardized coefficient is .11, is it  divided by 100 to get .11%
> before
> multiplying by the mean of the DV in order to get the actual unit increase
> in the DV?
>
> Sent from my iPhone
>
> On Apr 16, 2013, at 6:04 PM, R B &lt;

> ryan.andrew.black@

> &gt; wrote:
>
> Peter,
>
> Without understanding your model, I will simply direct you to a specific
> answer with respect to interpretation:
>
> http://www.ats.ucla.edu/stat/mult_pkg/faq/general/log_transformed_regression.htm
>
> Go to the last section of this page that discusses interpretation of
> regression coefficients when the DV and predictor(s) are log-transformed.
>
> HTH,
>
> Ryan

Re: Transformed data

Ryan
Peter,
 
Okay. I've given this some thought...
 
Taking the derivative of both sides of the log-log simple regression equation w.r.t. x results in a straightforward interpretation of the unstandardized slope; that is,

unstandardized slope = <unstandardized slope value> percent change in y given a one percent change in x.
 
The unstandardized slope is the point elasticity of y with respect to x. I would abandon the notion of back-transforming the unstandardized slope from a log-log simple regression since the linear relationship is on a multiplicative or percentage scale. That's how I see it; perhaps someone else will have a different perspective. Frankly, I tend to avoid transforming variables as it tends to complicate interpretation. Furthermore, there is usually a misunderstanding as to when it is appropriate to employ certain transformations, and often I find that people (not you) mistakenly transform data for the wrong reason(s) (e.g., examining the distribution of the DV as opposed to the distribution of the residuals).
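
As a rough illustration of the elasticity reading, here is a small Python sketch using the slope (.558) and the mean GMV ($5735) quoted earlier in this thread. The function name `pct_change_in_y` is hypothetical. It assumes only that both variables were logged with the same base, in which case the slope is the same whether Log10 or the natural log was used.

```python
def pct_change_in_y(b, pct_change_in_x):
    """Exact percent change in y implied by a log-log slope b."""
    return ((1 + pct_change_in_x / 100) ** b - 1) * 100

b = 0.558          # slope from the log-log model posted in this thread
mean_gmv = 5735.0  # mean GMV quoted earlier in the thread, in dollars

exact_pct = pct_change_in_y(b, 1.0)   # exact effect of a 1% increase in x
approx_pct = b * 1.0                  # the "b percent" rule of thumb

# Dollar change for a seller whose GMV happens to equal the quoted mean.
dollars_at_mean = mean_gmv * exact_pct / 100

# The rule of thumb is local: for a doubling of x it no longer holds.
exact_pct_doubling = pct_change_in_y(b, 100.0)
```

For a 1% change the exact figure (roughly 0.557%, or about $31.93 at the quoted mean) is close to the rule of thumb. For a doubling of x the implied change is roughly 47%, not 55.8%. The dollar amount also depends on the GMV level at which it is evaluated, which is why a single back-transformed "dollars per buyer" slope does not exist for this model.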
 
What you should ask yourself, IMHO:
1. Did you find that the assumption(s) of a simple linear regression model did not hold when using the variables in their original forms? If so, which assumption(s) were not tenable? How did taking the log of both variables resolve the problem(s)? You will need to be able to defend these transformations if and when you submit this for peer review.
2. Further, why did you standardize the variables after the logarithmic transformations? Again, you will need to defend this decision. While I can see why someone would perform a log transformation to linearize a relationship, I really do not see why one would standardize the variables to a mean of 50 and sd of 10  after the transformation.
 
HTH,
 
Ryan



Re: Transformed data

Bruce Weaver
Administrator
I concur with Ryan's comment about people often transforming for the wrong reasons, and with the two questions he posed.  As he says, people often fail to understand that the assumptions for OLS linear regression concern the errors, not to be confused with the residuals.  I think the Wikipedia page on errors versus residuals is quite good.

  http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics

The assumptions for OLS linear regression are that the errors are independently and identically distributed as Normal with a mean of 0 and variance = sigma-squared.  In the usual notation, the errors are assumed to be i.i.d. N(0, sigma-squared).  That's it.  And as Herman Rubin has frequently reminded readers of the sci.stat.* newsgroups, the independence assumption is by far the most important one, followed by identically distributed (i.e., homoscedasticity).  One way to think of it is that if those assumptions were met perfectly, then the statistical tests associated with OLS regression would be exact tests; but as the assumptions are never met perfectly, the tests are always approximate, and the question is whether the approximation is good enough for them to be useful.  (And yes, I am thinking of what George Box said about models being wrong, but still useful.)
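
A small simulation may make the point concrete. This is a sketch on made-up data, not the poster's. The errors are normal by construction and the predictor is skewed, so the DV comes out strongly skewed while the residuals (the observable stand-ins for the errors) do not. The helper `skewness` is a hypothetical name for the usual moment-based estimate.

```python
import random
import statistics

def skewness(values):
    """Moment-based sample skewness (near 0 for a symmetric distribution)."""
    m = statistics.mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(0)
n = 5000
x = [random.expovariate(1.0) for _ in range(n)]       # positively skewed predictor
y = [10 + 3 * xi + random.gauss(0, 1) for xi in x]    # errors are N(0, 1) by construction

# Ordinary least squares fit of y on x.
mx, my = statistics.mean(x), statistics.mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

skew_y = skewness(y)              # clearly positive: the DV looks "horrid"
skew_resid = skewness(residuals)  # near zero: nothing here calls for a transformation
```

In this setup a histogram of the DV alone would suggest a problem that the model does not actually have, which is the reason to examine the residuals rather than the raw DV.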

As Ryan pointed out in another post, if the error distribution is too far from normal, one may wish to consider a generalized linear model that employs a different error distribution (e.g., via GENLIN).

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Transformed data

Peter Spangler
In reply to this post by Ryan
Ryan and Bruce, thank you very much indeed!

After some further reading today, I better understand Ryan's interpretation that a single unit percent change in x predicts an <unstandardized slope value> percent change in y. 

The reason I transformed the data was not only to handle a horrid positive skew but to minimize the variance among scores. I believe Andy Field mentions log transformation as a way of handling data that test significant on Levene's test of homoscedasticity. 

Log transform of the variables, saving them as z scores and setting means and std deviations removed the different units of some of the other variables (ratios, etc) and allowed scores to be added to create an overall score that could rank cases. 


Sent from my iPhone

On Apr 17, 2013, at 6:16 PM, R B <[hidden email]> wrote:

Peter,
 
Okay. I've given this some thought...
 
If you take the derivative of both sides of the log-log simple regression equation w.r.t. x results in a straightforward interpretation of the unstandardized slope; that is,
 
unstandardized slope  = <unstandardized slope value> percent change in y given unit percent change in x.
 
The unstandardized slope is the point elasticity of y with respect to x. I would abandon the notion of back-transforming the unstandardized slope from a log-log simple regression since the linear relationship is on a multiplicative or percentage scale. That's how I see it; perhaps someone else will have a different perspective. Frankly, I tend to avoid transforming variables as it tends to complicate interpretation. Furthermore, there is usually a misunderstanding as to when it is appropriate to employ certain transformations, and often I find that people (not you) mistakenly transform data for the wrong reason(s) (e.g., examining the distribution of the DV as opposed to the distribution of the residuals).
 
What you should ask yourself, IMHO:
1. Did you find that the assumption(s) of a simple linear regression model did not hold when using the variables in their original forms? If so, which assumption(s) were not tenable? How did taking the log of both variables resolve the problem(s)? You will need to be able to defend these transformations if and when you submit this for peer review.
2. Further, why did you standardize the variables after the logarithmic transformations? Again, you will need to defend this decision. While I can see why someone would perform a log transformation to linearize a relationship, I really do not see why one would standardize the variables to a mean of 50 and sd of 10  after the transformation.
 
HTH,
 
Ryan


On Wed, Apr 17, 2013 at 3:45 PM, Peter Spangler <[hidden email]> wrote:
Trying this again for clarity and completeness: My data is made up of two scale variables DV (gmv) -- in dollars and IV (repeat buyers) in persons. 
Both variables transformed to t_gmv and t_repeat_buyers: Log10 --> Z scores --> Mean = 50, SD = 10.

My goal is to calculate GMV in its original units (dollars) based on a one unit (person) in crease in Repeat Buyers. I need to essentially back transform to calculate:

t_GMV = Bo + B1 (t_repeat buyers) + E1

t_GMV = 5.37 + .426 (t_repeat buyers) + E1



REGRESSION 
  /MISSING LISTWISE 
  /STATISTICS COEFF OUTS R ANOVA 
  /CRITERIA=PIN(.05) POUT(.10) 
  /NOORIGIN 
  /DEPENDENT t_gmv 
  /METHOD=ENTER t_rb.

Coefficients        Unstandardized Beta
constant                             5.37
t_ rb                                  .426


On Wed, Apr 17, 2013 at 8:48 AM, Peter Spangler <[hidden email]> wrote:
The regression model is simple linear using two log transformed variables: DV = nlog_gmv (scale variable in dollars, $1 - $5 million)
                                                                                                           IV = nlog_rb (scale variable, the number of buyers that a seller had more than one transactions with)   

REGRESSION 
  /MISSING LISTWISE 
  /STATISTICS COEFF OUTS R ANOVA 
  /CRITERIA=PIN(.05) POUT(.10) 
  /NOORIGIN 
  /DEPENDENT nlog_gmv 
  /METHOD=ENTER nlog_rb.

Coefficients        Unstandardized Beta
nlog                                   .558



On Tue, Apr 16, 2013 at 9:00 PM, David Marso <[hidden email]> wrote:
Considering the fact that you haven't even bothered to post the actual
regression model, anyone jumping further into your rabbit hole is bound to
become a mad hatter!
I decline!
--

Peter Spangler wrote
> Yes, this section is very helpful. I guess my question remains:   if the
> unstandardized coefficient is .11, is it  divided by 100 to get .11%
> before
> multiplying by the mean of the DV in order to get the actual unit increase
> in the DV?
>
> Sent from my iPhone
>
> On Apr 16, 2013, at 6:04 PM, R B &lt;

> ryan.andrew.black@

> &gt; wrote:
>
> Peter,
>
> Without understanding your model, I will simply direct you to a specific
> answer with respect to interpretation:
>
> http://www.ats.ucla.edu/stat/mult_pkg/faq/general/log_transformed_regression.htm
>
> Go to the last section of this page that discusses interpretation of
> regression coefficients when the DV and predictor(s) are log-transformed.
>
> HTH,
>
> Ryan
>
>
> On Tue, Apr 16, 2013 at 8:50 PM, Peter Spangler &lt;

> pspangler@

> &gt;wrote:
>
>> Ryan,
>>
>> Would it be correct to say that a 1% increase in the IV would predict an
>> average .558% increase in the DV.
>> Such that : A repeat Buyer increase of .2 would predict a $32 increase in
>> GMV
>>
>> Change in DV = (.558/100)*5735 = 32.0013
>>
>>                                 *Unstandardized Beta*
>> log_rb                                .558
>>
>> Mean GMV = $5735
>> Mean Repeat Buyer = 20
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Apr 16, 2013 at 5:40 PM, Peter Spangler &lt;

> pspangler@

> &gt;wrote:
>>
>>> For example the original dependent variable of interest is in dollars
>>> (gross market value) and IV is repeat buyers. Both scale variables. I
>>> transformed them because the distribution was very skewed and for them
>>> to
>>> share the same scale.
>>>
>>>
>>> On Tue, Apr 16, 2013 at 5:31 PM, R B &lt;

> ryan.andrew.black@

> &gt; wrote:
>>>
>>>> Peter,
>>>>
>>>> Can you describe the dependent variables in their original form in as
>>>> much detail as possible, and why you felt the need to transform them?
>>>> (Keep
>>>> in mind that one assumes the errors are normally distributed when
>>>> performing regression analyses.)
>>>>
>>>> Thanks,
>>>>
>>>> Ryan
>>>>
>>>>
>>>> On Tue, Apr 16, 2013 at 8:17 PM, Peter Spangler &lt;

> pspangler@

> &gt;wrote:
>>>>
>>>>> Dear SPSS List Folks,
>>>>>
>>>>> I have data that was transformed to meet the assumptions of parametric
>>>>> tests. The transformation is as follows: V1..V4 -->Transformed to
>>>>> Log10 -->
>>>>> Saved Standard values --> Saved all as Mean=50, SD=10.
>>>>>
>>>>> I now have standardized and unstandardized beta coefficients from my
>>>>> linear regression output that I would like to make statements about in
>>>>> their original units. Is there a typical way of handling these
>>>>> conditions
>>>>> such that I can a 1 unit increase in my IV predicts X unit increase in
>>>>> my
>>>>> DV. Or a 1 unit increase in my IV predicts X standard deviation
>>>>> increase in
>>>>> my DV.
>>>>>
>>>>> Trying to keep this clear...
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>





-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Transformed-data-tp5719509p5719519.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD




Re: Transformed data

Ryan
Responses are interspersed below.
On Wed, Apr 17, 2013 at 10:37 PM, Peter Spangler <[hidden email]> wrote:
Ryan and Bruce, thank you very much indeed!
 
***You are welcome.  

After some further reading today, I better understand Ryan's interpretation that a single unit percent change in x predicts an <unstandardized slope value> percent change in y. 

The reason I transformed the data was not only to handle a horrid positive skew
 
***Use of the term "horrid" suggests that you think something is wrong with positively skewed data. It is not uncommon to observe positively skewed sample data that arise from Poisson, negative binomial, and other distributions.
 
but to minimize the variance among scores.
 
***Why would you want to minimize variance among scores?
 
I believe Andy Field mentions log transformation as a way of handling data that fail Levene's test of homoscedasticity.
 
***What do you think is the source of the heteroscedasticity? I fear that you are trying to force your data to meet the assumptions of OLS regression without considering other estimation methods and models.
 

Log transforming the variables, saving them as z scores, and setting the means and standard deviations removed the different units of some of the other variables (ratios, etc.) and allowed scores to be added
 
***As someone who lives in the world of psychometrics, what you just stated above is very concerning. A simple algebra trick does not give someone permission to sum scores across variables. I assume you have good reason to do so, aside from simply forcing the distributions to have the same mean and sd.
 
to create an overall score that could rank cases.  
 
***I don't recall you stating that you were ranking cases, and I have no idea how that has anything to do with the two variables you described initially (but perhaps you did). ***Anyway, I will just assume that you understand what you are doing.
 
***Good luck.
 
***Ryan
 

Sent from my iPhone

On Apr 17, 2013, at 6:16 PM, R B <[hidden email]> wrote:

Peter,
 
Okay. I've given this some thought...
 
Taking the derivative of both sides of the log-log simple regression equation w.r.t. x results in a straightforward interpretation of the unstandardized slope; that is,

unstandardized slope = <unstandardized slope value> percent change in y given a one percent change in x.
 
The unstandardized slope is the point elasticity of y with respect to x. I would abandon the notion of back-transforming the unstandardized slope from a log-log simple regression since the linear relationship is on a multiplicative or percentage scale. That's how I see it; perhaps someone else will have a different perspective. Frankly, I tend to avoid transforming variables as it tends to complicate interpretation. Furthermore, there is usually a misunderstanding as to when it is appropriate to employ certain transformations, and often I find that people (not you) mistakenly transform data for the wrong reason(s) (e.g., examining the distribution of the DV as opposed to the distribution of the residuals).
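
A minimal numerical sketch of the elasticity reading, in Python. The data are generated from a hypothetical power curve (the constants 3.0 and 0.5 are made up for illustration and are not taken from the thread). It only aims to show that the log-log slope is the same whether Log10 or natural logs are used, and that it matches the percent change in y for a one percent change in x.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data lying exactly on y = 3 * x**0.5
x = [1, 2, 5, 10, 20, 50, 100]
y = [3.0 * v ** 0.5 for v in x]

# Slope of log(y) on log(x), once with base 10 and once with base e
slope_log10 = ols_slope([math.log10(v) for v in x], [math.log10(v) for v in y])
slope_ln = ols_slope([math.log(v) for v in x], [math.log(v) for v in y])

# Percent change in y when x rises by one percent (here from 20 to 20.2)
pct_change_y = 100 * (3.0 * 20.2 ** 0.5 / (3.0 * 20 ** 0.5) - 1)

print(round(slope_log10, 6), round(slope_ln, 6), round(pct_change_y, 3))
```

Both slopes come out at the exponent of the power curve, and the percent change in y is just under that value, which is why the slope can be read directly as a percent-for-percent effect without any back-transformation.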
 
What you should ask yourself, IMHO:
1. Did you find that the assumption(s) of a simple linear regression model did not hold when using the variables in their original forms? If so, which assumption(s) were not tenable? How did taking the log of both variables resolve the problem(s)? You will need to be able to defend these transformations if and when you submit this for peer review.
2. Further, why did you standardize the variables after the logarithmic transformations? Again, you will need to defend this decision. While I can see why someone would perform a log transformation to linearize a relationship, I really do not see why one would standardize the variables to a mean of 50 and sd of 10  after the transformation.
 
HTH,
 
Ryan


On Wed, Apr 17, 2013 at 3:45 PM, Peter Spangler <[hidden email]> wrote:
Trying this again for clarity and completeness: My data is made up of two scale variables: DV (gmv) in dollars and IV (repeat buyers) in persons.
Both variables were transformed to t_gmv and t_repeat_buyers: Log10 --> Z scores --> Mean = 50, SD = 10.

My goal is to calculate GMV in its original units (dollars) based on a one unit (person) increase in Repeat Buyers. I need to essentially back transform to calculate:

t_GMV = B0 + B1 (t_repeat_buyers) + E1

t_GMV = 5.37 + .426 (t_repeat_buyers) + E1



REGRESSION 
  /MISSING LISTWISE 
  /STATISTICS COEFF OUTS R ANOVA 
  /CRITERIA=PIN(.05) POUT(.10) 
  /NOORIGIN 
  /DEPENDENT t_gmv 
  /METHOD=ENTER t_rb.

Coefficients        Unstandardized Beta
constant            5.37
t_rb                .426
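
One way to see what the Mean = 50, SD = 10 rescaling does to the slope is sketched below in Python. The raw values are hypothetical (they are not the poster's data), and the helper names `ols_slope` and `t_score` are made up for this illustration. Because the rescaling is linear, the slope on the rescaled variables can be converted back to the log-log slope by multiplying by the ratio of the standard deviations of the logged variables, which therefore need to be kept.

```python
import math
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def t_score(values):
    """Rescale to mean 50 and SD 10 (z score times 10, plus 50)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [50 + 10 * (v - m) / s for v in values]

# Hypothetical raw data, for illustration only
repeat_buyers = [1, 2, 4, 8, 15, 30, 60, 120]
gmv = [40, 90, 150, 420, 600, 1500, 2400, 6100]

log_rb = [math.log10(v) for v in repeat_buyers]
log_gmv = [math.log10(v) for v in gmv]

b_log = ols_slope(log_rb, log_gmv)                  # slope on the Log10 scale
b_t = ols_slope(t_score(log_rb), t_score(log_gmv))  # slope after rescaling

# Undo the rescaling: multiply by SD(log y) / SD(log x)
b_recovered = b_t * statistics.stdev(log_gmv) / statistics.stdev(log_rb)

print(round(b_log, 4), round(b_t, 4), round(b_recovered, 4))
```

The recovered value equals the Log10 slope, so the percent-for-percent interpretation is still available after rescaling, provided the two standard deviations of the logged variables are on hand.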


On Wed, Apr 17, 2013 at 8:48 AM, Peter Spangler <[hidden email]> wrote:
The regression model is simple linear using two log transformed variables:
DV = nlog_gmv (scale variable in dollars, $1 - $5 million)
IV = nlog_rb (scale variable, the number of buyers that a seller had more than one transaction with)

REGRESSION 
  /MISSING LISTWISE 
  /STATISTICS COEFF OUTS R ANOVA 
  /CRITERIA=PIN(.05) POUT(.10) 
  /NOORIGIN 
  /DEPENDENT nlog_gmv 
  /METHOD=ENTER nlog_rb.

Coefficients        Unstandardized Beta
nlog                                   .558
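
As a rough worked example in Python, here is the percent reading applied to the figures quoted in this thread (slope .558, mean GMV of $5735, mean of 20 repeat buyers). Evaluating the change at the arithmetic means is an assumption made only for illustration; the model itself speaks in percentages, not dollars.

```python
slope = 0.558              # unstandardized log-log slope reported in the thread
mean_gmv = 5735.0          # mean GMV in dollars, as quoted in the thread
mean_repeat_buyers = 20.0  # mean repeat buyers, as quoted in the thread

# A one percent increase in repeat buyers, evaluated at the mean
extra_buyers = 0.01 * mean_repeat_buyers

# Multiplicative effect implied by the power relationship gmv ~ rb**slope
pct_change_gmv = 100 * (1.01 ** slope - 1)
dollar_change = mean_gmv * pct_change_gmv / 100

print(extra_buyers, round(pct_change_gmv, 3), round(dollar_change, 2))
```

The dollar figure lands slightly below the (.558/100)*5735 shortcut, because the shortcut treats the percent effect as exactly linear. It also changes with the GMV level it is evaluated at, which is one reason a single "dollars per buyer" statement does not follow from this model.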





Re: Transformed data

David Marso
Administrator
I believe I mentioned a rabbit hole?
One reason I rarely involve myself with stat discussions on X-L...
Many unstated assumptions and background issues which take up too much precious time.  I decline on grounds of sanity and hair preservation related issues.



Re: Transformed data

Richard Ristow
In reply to this post by Peter Spangler
At 11:48 AM 4/17/2013, Peter Spangler wrote:

>The regression model is simple linear using two log transformed
>variables: DV = nlog_gmv (scale variable in dollars, $1 - $5 million)
>
>IV = nlog_rb (scale variable, the number of buyers that a seller had
>more than one transactions with)

All right. First of all, others have noted that it's doubtful
practice to transform variables to make them 'look' better -- to
reduce skewness, for example. In addition to commonly being
statistically inadvisable, it has the great drawback you've run into:
when you make a transformation that doesn't have theoretical backing,
you have a hard time understanding what the resulting model means.

Now, there are legitimate reasons to transform variables, especially
when theory supports the transformation. In particular, when a
variable has a very wide dynamic range (ratio of largest to smallest
values), and the behavior over the whole range is of interest, a log
transformation is frequently recommended. Taking the log
transformation asserts, implicitly, that the same percentage change
is about equally important over the whole range; and that the same
absolute change is less important toward the high end of the range.
There are often good reasons to accept this.

Your case, where the dynamic range is 5,000,000:1, is a good one for
log transformation.

Now, you're also log-transforming your independent variable. That
gives you a power model: the model

log(gmv)=a*log(rv)+b

corresponds (taking anti-logs) to

gmv = B*(rv**a), where B = exp(b) for natural logs (or 10**b for Log10)

If you're fitting such a model, make sure it makes theoretical sense.
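
A small Python check of the anti-log step for the power model, assuming natural logs. The slope 0.558 is the one reported in this thread; the intercept 2.0 is hypothetical. The multiplier in the anti-logged form is exp(b), and doubling rv multiplies the prediction by 2**a at any starting value.

```python
import math

a, b = 0.558, 2.0   # a: slope from the thread; b: hypothetical intercept

def predict_log_scale(rv):
    """Prediction from the fitted line log(gmv) = a*log(rv) + b."""
    return math.exp(a * math.log(rv) + b)

def predict_power_form(rv):
    """Same prediction written as the power model gmv = exp(b) * rv**a."""
    return math.exp(b) * rv ** a

for rv in (1, 20, 400):
    print(rv, predict_log_scale(rv), predict_power_form(rv))

# Doubling rv multiplies gmv by 2**a, whatever the starting value
doubling_factor = predict_power_form(40) / predict_power_form(20)
print(doubling_factor, 2 ** a)
```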

Often in a case like yours, one log-transforms only the dependent
variable, fitting model

log(gmv)=a*rv + b

That corresponds to an exponential growth model,

gmv = exp(b)*exp(a)**rv = B*A**rv

and it's one you may well consider, depending on the theory you're
working from and the particulars of variable rv.
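
For comparison, a short Python sketch of the model log(gmv) = a*rv + b, again assuming natural logs. Both coefficients here are hypothetical, not estimates from the thread. Each additional unit of rv multiplies the predicted gmv by the same factor A = exp(a), at any level of rv.

```python
import math

a, b = 0.05, 6.0   # hypothetical coefficients, for illustration only

A = math.exp(a)    # growth factor per additional unit of rv
B = math.exp(b)    # predicted gmv when rv = 0

def predict(rv):
    """gmv implied by log(gmv) = a*rv + b."""
    return math.exp(a * rv + b)

# Each extra unit of rv multiplies the prediction by A, regardless of level
ratio_low = predict(11) / predict(10)
ratio_high = predict(101) / predict(100)
pct_per_unit = 100 * (A - 1)

print(ratio_low, ratio_high, round(pct_per_unit, 2))
```

Here the slope is read as a percent change in gmv per one unit (not one percent) of rv, which is the practical difference from the power model.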
