Statistics Challenge: Does analysis metric matter? Are normal based methods robust?

Kornbrot, Diana
Greetings and apologies for cross-posting

It is often claimed that normal based methods such as linear regression are 'robust' and do not give misleading results, even when data are far from normally distributed.

To investigate this claim, several real data sets have been analysed, both using normal based methods and using methods based on various non-normal distributions. The first scenario, Scenario 1, is given below.

We want to compare the actual concordance of two alternative methods with the concordance predicted by statistical practitioners, such as the committed users of this list. So we are asking for your predictions about concordance for various scenarios.

Scenario 1: Multiple linear regression is performed with a raw and a transformed metric.
Predict % agreement between results from the 2 metrics.
The analyst wants to know which of 21 features significantly predict overall satisfaction.
The raw metric is the proportion of respondents favourable, p.
BUT p is not, and cannot be, normally distributed. So an alternative is the inverse normal, z, corresponding to p.
Best subset linear regression was conducted for 51 separate units: (a) using p as metric; (b) using z as metric.
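
For readers unfamiliar with the transformation, here is a minimal sketch of the inverse normal mapping from p to z, using only the Python standard library. The function name inverse_normal and the example proportions are assumptions made for illustration; they are not taken from the project's data or code.

```python
from statistics import NormalDist


def inverse_normal(p: float) -> float:
    """Return z such that the standard normal CDF at z equals p.

    NormalDist.inv_cdf raises StatisticsError unless 0 < p < 1, so
    proportions of exactly 0 or 1 need separate handling.
    """
    return NormalDist().inv_cdf(p)


# Hypothetical proportions favourable, for illustration only.
proportions = [0.10, 0.25, 0.50, 0.75, 0.90]
z_scores = [inverse_normal(p) for p in proportions]

for p, z in zip(proportions, z_scores):
    print(f"p = {p:.2f}  ->  z = {z:+.3f}")
```

The mapping is symmetric about p = 0.5 (where z = 0) and stretches the scale increasingly as p approaches 0 or 1.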

Concordance Question: How much difference does it make?
Predict, out of all the significant predictors, what:
% are significant at the 95% confidence level for both p and z
% are significant only for p
% are significant only for z.
Please give your expert predictions at https://www.surveymonkey.com/s/9SY7V7Z
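
One possible way to compute the three percentages for a single unit is sketched below. It assumes the denominator is the set of predictors that are significant under at least one metric, which is one reading of "all the significant predictors"; the function name concordance and the feature labels are hypothetical.

```python
def concordance(sig_p: set, sig_z: set) -> dict:
    """Percentage breakdown of predictors significant under p, z, or both.

    Assumption: percentages are taken over the union of the two sets,
    i.e. every predictor that is significant for at least one metric.
    """
    union = sig_p | sig_z
    if not union:
        return {"both": 0.0, "p_only": 0.0, "z_only": 0.0}
    n = len(union)
    return {
        "both": 100 * len(sig_p & sig_z) / n,
        "p_only": 100 * len(sig_p - sig_z) / n,
        "z_only": 100 * len(sig_z - sig_p) / n,
    }


# Hypothetical sets of significant features for one unit.
significant_with_p = {"f1", "f2", "f3", "f4"}
significant_with_z = {"f2", "f3", "f4", "f5"}

result = concordance(significant_with_p, significant_with_z)
print(result)
```

How the 51 units are combined (pooled counts versus an average of per-unit percentages) is not stated in the post, so the sketch stops at one unit.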

Dissemination of Results
The actual concordance and a summary of the predicted concordance of experts will be published on 16 Feb 2014 at http://dianakornbrot.wordpress.com/projects/methods-matter/

Many thanks for reading this long screed. Comments on the project are very welcome.

best

Diana
_____________________

Professor Diana Kornbrot
Work
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
voice: +44 (0) 170 728 4626
email:   [hidden email]
skype:  kornbrotme
Home
19 Elmhurst Avenue
London N2 0LT, UK
voice:    +44 (0) 208 444 2081


Re: Statistics Challenge: Does analysis metric matter? Are normal based methods robust?

Maguin, Eugene

Diana,

After I read your posting I looked at your website, and while I think I understand the overall question you are asking, I don’t understand the construction of your dataset. To summarize the dataset: People rated 21 features of something and also rated their overall satisfaction, all on a 1-5 scale. Feature ratings were recoded 1-3=0, 4-5=1. There seem to have been 51 groups of people. Each group is analyzed separately because different relationships may be expected in each group. This I don’t get: It seems that the feature ratings were converted to either proportions or “z’s” via an inverse normal distribution mapping of the proportions. Either way, haven’t you converted your 51*n(g) dataset to an N=51 dataset?
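
The recoding and aggregation described above can be sketched as follows. This follows the summary in this message, not a confirmed description of the project's data preparation; the function names, unit labels, and ratings are made up for illustration.

```python
def dichotomise(rating: int) -> int:
    """Recode a 1-5 rating: 1-3 becomes 0, 4-5 becomes 1."""
    if rating not in range(1, 6):
        raise ValueError(f"rating must be 1-5, got {rating!r}")
    return 1 if rating >= 4 else 0


def proportion_favourable(ratings: list) -> float:
    """Proportion of respondents in one group whose recoded rating is 1."""
    return sum(dichotomise(r) for r in ratings) / len(ratings)


# Hypothetical ratings of one feature by respondents in two groups.
groups = {
    "unit_A": [5, 4, 2, 3, 4],
    "unit_B": [1, 2, 4, 3, 3],
}

# Each group collapses to a single number per feature.
summary = {name: proportion_favourable(r) for name, r in groups.items()}
print(summary)
```

Under this reading, each feature contributes one value per group, so the respondent-level rows disappear and the number of cases equals the number of groups.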

Gene Maguin

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Kornbrot, Diana
Sent: Tuesday, January 28, 2014 7:36 AM
To: [hidden email]
Subject: Statistics Challenge: Does analysis metric matter? Are normal based methods robust?

 


 


Re: Statistics Challenge: Does analysis metric matter? Are normal based methods robust?

Rich Ulrich
In reply to this post by Kornbrot, Diana
Complaints:
1. The importance of transformations is especially seen in
data where there are outliers. You don't have outliers, to
speak of, when your initial data are Likert items.  After
dichotomizing and choosing (yes/no) whether to use Probits,
no difference is likely to matter so long as the proportions
are between 20% and 80%.

2. Since data are measured with Likert scaling of the items,
it seems that the natural comparison would be between
analyzing the data either (a) while assuming that they
are normal, by the usual regression; or (b) while assuming
that they should be rank-transformed, by performing regression
on the rank-transformed version of the raw scores.

It is well-known that when there is sufficient power, artifacts
of rank-transformed data tend to introduce extra variables
(including suppressors) in order to account for the bad fit
at the extremes -- assuming that you have the large amount of
data needed to find reproducible predictors.

3. "Best subset linear regression" is such a bad idea that
this project should be rejected for the fact that it seems to
imply that it is *not* (almost always) a bad idea.
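
The claim in point 1, that the probit makes little difference for proportions between 20% and 80%, can be checked numerically. The sketch below measures how close to linear the p-to-z mapping is over an evenly spaced grid, using the Pearson correlation between p and z. The grid, the wide comparison range of 1% to 99%, and the function names are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def probit_linearity(lo: float, hi: float, steps: int = 61) -> float:
    """Correlation between p and its inverse normal z on a grid in [lo, hi]."""
    ps = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    zs = [NormalDist().inv_cdf(p) for p in ps]
    return pearson_r(ps, zs)


r_mid = probit_linearity(0.20, 0.80)   # range named in point 1
r_wide = probit_linearity(0.01, 0.99)  # includes extreme proportions

print(f"r over 20%-80%: {r_mid:.4f}")
print(f"r over  1%-99%: {r_wide:.4f}")
```

A correlation very close to 1 in the middle range means z is almost a linear rescaling of p there, so regressions on either metric would be expected to agree closely; the two metrics diverge mainly when proportions fall near 0 or 1.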

--
Rich Ulrich


=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Statistics Challenge: Does analysis metric matter? Are normal based methods robust?

Art Kendall
As usual Rich made a very insightful response.

In addition, CATREG can be used to model the data specifying different assumptions about level of measurement and to compare the results.
Art Kendall
Social Research Consultants
On 1/30/2014 1:44 AM, Rich Ulrich [via SPSSX Discussion] wrote: