Best subset regression


Best subset regression

Kornbrot, Diana
Hi
I am using best subset regression via automated linear modelling in SPSS 21.
Very useful, BUT in 21 the only output is via the ghastly Model Viewer, which does not export to Excel. Furthermore, automated linear modelling does not work well with split files/BY variables: if one BY group encounters problems, it aborts. This is unlike REGRESSION, which issues a warning and goes on to the next item in the split file. Is any of this better in v22?
  1. In 22, can one choose pivot table and chart output instead of the Model Viewer in prefs? In 21 this is only implemented for linear mixed models?
  2. In 22, can one successfully use split file with automated linear modelling?

An additional request for ALL models: please supply a normality test for the residuals. Automated linear modelling does have rather nice P-P plots, as do other regression procedures, BUT I would like to know whether the RESIDUALS are normally distributed. That is more important than knowing whether the separate variables in the model are normally distributed.

NB: automated linear modelling has an option under build options > basic to ‘automatically prepare the data’ [the default value]. In my view, this is a VERY bad idea in many situations. E.g. it clumps all the outliers at 3 SD [?Winsorising] for each variable separately. For my data, this REDUCED the adjusted r^2 from 80.2 to 76.4.
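
To make the "clumping at 3 sd" concrete, here is a minimal Python sketch of that kind of clamping, applied to one variable on its own. It is only an illustration of the idea described above, not SPSS's actual data-preparation code; the helper name `clamp_at_3sd` and the data are made up.

```python
import statistics

def clamp_at_3sd(values, k=3.0):
    """Pull values beyond mean +/- k*SD back to that cutoff (hypothetical helper)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    lo, hi = mean - k * sd, mean + k * sd
    return [min(max(v, lo), hi) for v in values]

# Made-up variable: 29 ordinary values plus one extreme case.
x = [float(v) for v in range(1, 30)] + [200.0]
clamped = clamp_at_3sd(x)

# Only the extreme case moves; every other value is left alone.
print("largest value before:", max(x))
print("largest value after: ", round(max(clamped), 1))
```

Because the cutoff is computed for each variable separately, a case that is extreme on several variables gets moved by a different amount on each, which changes where it sits relative to the fitted line.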

Best

Diana

Professor Diana Kornbrot
email: d.e.kornbrot@...
web:    http://dianakornbrot.wordpress.com/
            http://go.herts.ac.uk/diana_kornbrot
Work
Department of Psychology
School of Life and Medical Sciences
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
voice:   +44 (0) 170 728 4626
Home
19 Elmhurst Avenue
London N2 0LT, UK
voice:   +44 (0) 208  444 2081
mobile: +44 (0) 740 318 1612

Re: Best subset regression

Bruce Weaver
Administrator
DK:  "An additional request for ALL models. Please supply normality test for residuals."

I find that a curious request, Diana.  You already know the errors are not truly normally distributed.  As George Box put it:

“…the statistician knows…that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world.” (JASA, 1976, Vol. 71, 791-799; emphasis added)

What will you gain by having a p-value from a test of normality?  That test will have too little power when samples are small, and too much power when samples are large, IMO.  

Furthermore, OLS models are quite robust to non-normality of the errors.  Independence of the errors is far more important.  
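
The point about power can be illustrated with a small simulation. This is a rough sketch under stated assumptions: the errors come from a made-up, mildly skewed distribution, and the Jarque-Bera test stands in for whatever normality test one would actually run (it is used here only because its p-value needs nothing beyond the standard library). Its chi-square approximation is itself crude at n = 30.

```python
import math
import random
import statistics

def jarque_bera_p(sample):
    """Return the p-value of the Jarque-Bera normality test."""
    n = len(sample)
    mean = statistics.fmean(sample)
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # JB is asymptotically chi-square with 2 df, whose upper tail is exp(-x/2).
    return math.exp(-jb / 2.0)

def mildly_skewed(n, rng):
    """Normal noise plus a small exponential component (made-up error model)."""
    return [rng.gauss(0.0, 1.0) + 0.5 * rng.expovariate(1.0) for _ in range(n)]

rng = random.Random(2013)

# Same non-normal distribution, two very different sample sizes.
small_rejections = sum(
    jarque_bera_p(mildly_skewed(30, rng)) < 0.05 for _ in range(200)
)
p_large = jarque_bera_p(mildly_skewed(20000, rng))

print(f"n=30: rejected in {small_rejections} of 200 samples")
print(f"n=20000: p = {p_large:.3g}")
```

The distribution is identical in both cases; only the sample size changes what the test reports.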

There's my tuppence!  ;-)

Cheers,
Bruce



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Best subset regression

Jon K Peck
In reply to this post by Kornbrot, Diana
In V22 the ALM output is still a Model Viewer, but it can be exported to Excel, although the exported quality is mediocre.  The PDF export is better.  Be sure to check the "Export all views" box in the Change Options subdialog.

ALM does honor splits and will note problems with a split and continue to the next one.

You can save the predicted values from ALM to the active dataset and hence compute the residuals, so normality tests and plots can be done as with any other variable if desired.  You do get a residual histogram or P-P plot in the MV.
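
As a sketch of the arithmetic involved (outside SPSS, in Python, with made-up numbers standing in for the target and the saved predictions): the residual is observed minus predicted, and a normal P-P plot compares each sorted residual's observed cumulative proportion with the proportion a fitted normal distribution would expect. The function name `pp_points` and the plotting position (i + 0.5)/n are assumptions chosen for illustration; SPSS may use a different convention.

```python
import statistics

def pp_points(observed, predicted):
    """Return (expected, observed) cumulative proportions for a normal P-P plot of residuals."""
    residuals = sorted(o - p for o, p in zip(observed, predicted))
    normal = statistics.NormalDist(
        statistics.fmean(residuals), statistics.stdev(residuals)
    )
    n = len(residuals)
    return [(normal.cdf(r), (i + 0.5) / n) for i, r in enumerate(residuals)]

# Made-up values standing in for the target and the saved ALM predictions.
y     = [10.2, 11.9, 14.1, 15.8, 18.3, 19.7, 22.4, 23.6]
y_hat = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]

points = pp_points(y, y_hat)
worst_gap = max(abs(e - o) for e, o in points)
print(f"largest departure from the diagonal: {worst_gap:.3f}")
```

Points close to the diagonal (expected equal to observed) suggest residuals that are roughly normal; the same residual variable could be fed to any formal test if one is wanted.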


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        "Kornbrot, Diana" <[hidden email]>
To:        Jon K Peck/Chicago/IBM@IBMUS, "[hidden email]" <[hidden email]>,
Date:        11/12/2013 08:21 AM
Subject:        Best subset regression



