Number of coefficients in stepwise linear regression query

Mark Webb-5
I'm repeating a survey in which the questionnaire has not changed.
The statements are rated 0-100.
I'm using Linear Regression [from the menu], stepwise, with F to enter = 0.04 and to remove = 0.10.
Last year, 20 of the 50 statements were entered. This year that has increased to 30.
What does this imply?
Thank you in advance - Regards
-- 
Mark Webb

Line +27 (21) 786 1124
Cell +27 (72) 199 1000 [Poor reception]
Fax  +27 (86) 260 1946 

Skype       tomarkwebb 
Email       [hidden email] 
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Number of coefficients in stepwise linear regression query

Ware, William B

That there are some “random” fluctuations in the magnitudes of the intercorrelations from one year to the next?

 

wbw

 

William B. Ware, Ph.D.

McMichael Term Professor of Education, 2011-2013

Educational Psychology, Measurement, and Evaluation

CB #3500 - 118 Peabody Hall 

University of North Carolina at Chapel Hill

Chapel Hill, NC     27599-3500

Office: (919)-962-2511

Fax:    (919)-962-1533

Office:  118 Peabody Hall

EMAIL: [hidden email]

Adjunct Professor, School of Social Work

Academy of Distinguished Teaching Scholars at UNC-Chapel Hill

 


Re: Number of coefficients in stepwise linear regression query

Rich Ulrich
In reply to this post by Mark Webb-5

Well, stepwise regression is a crappy method in general, so you've received a typical confirmation. Its overall fault is that (for most examples you see) it does not lead to correct inferences. For huge samples, folks end up reporting artifacts; for smaller samples, they report noise.


It mainly works when you want an arbitrary, short equation: one with only two or three variables, selected from a large, redundant set.


The Stata FAQ includes Frank Harrell's comments on stepwise, along with references. (I promoted his list in the usenet stat.* groups, for years after he originally posted it to one of them.)


--

Rich Ulrich



Re: Number of coefficients in stepwise linear regression query

Mark Webb-5
What would you recommend if I want to do linear regression with a reduction of variables? Thanks for the response. Regards


Re: Number of coefficients in stepwise linear regression query

Rich Ulrich

What is your purpose? What are you trying to learn, or to prove?


"Reduction of variables" is a starting point. A-priori selection based on previous knowledge and experience is helpful if you are aiming for testing of hypotheses. Then you set up a hierarchy: there are a few main tests; when you validate that (say) one composite score is appropriate, then you may look to see whether various items reflected by that composite are more or less important.


You should hope to /test/ only a few hypotheses at the top level, using selected variables or composite scores. Are there just a few variables that /really/ matter? - or that you /know/ have to be accounted for?  If not a few "variables" in your list, do you have latent constructs in mind? If you /know/ that something has to be accounted for - sex, maybe, or age - then it may be proper to consider it as a covariate for whatever else will happen.


 - What is your sample size? Does your survey lend itself to factor analysis? Arbitrary grouping of items?

For data sets with really huge N's, the proper approach probably includes sub-sampling (either random, or carefully systematic) and cross-validation.
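To make the sub-sampling idea concrete, here is a minimal Python sketch (not from the thread) of a split-sample check, the simplest form of cross-validation: pick the "best" predictor on one random part of the data, then see how it holds up on the part that played no role in the selection. The data are hypothetical pure noise, and the function names, the 50/50 split and the seeds are all assumptions made for illustration.

```python
import random

def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_indices(n, fraction, rng):
    """Randomly split range(n) into a selection part and a hold-out part."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * fraction)
    return idx[:cut], idx[cut:]

def select_then_validate(columns, y, fraction=0.5, seed=0):
    """Pick the predictor with the largest |r| on one part; re-check it on the other."""
    rng = random.Random(seed)
    sel, hold = split_indices(len(y), fraction, rng)
    take = lambda v, rows: [v[i] for i in rows]
    y_sel, y_hold = take(y, sel), take(y, hold)
    r_sel = [pearson_r(take(c, sel), y_sel) for c in columns]
    best = max(range(len(columns)), key=lambda j: abs(r_sel[j]))
    r_hold = pearson_r(take(columns[best], hold), y_hold)
    return best, r_sel[best] ** 2, r_hold ** 2

# Hypothetical data: 50 predictors that are pure noise, unrelated to y.
rng = random.Random(1)
n = 400
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(50)]
y = [rng.gauss(0, 1) for _ in range(n)]

best, r2_selection, r2_holdout = select_then_validate(columns, y)
print(f"picked item {best}: r^2 = {r2_selection:.3f} where it was selected, "
      f"{r2_holdout:.3f} on the hold-out part")
```

The r^2 in the selection part is inflated because the item was chosen for having the largest correlation there; the hold-out figure is the more honest one. The same logic applies when the selection step is a whole stepwise run rather than a single pick.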

--
Rich Ulrich



Re: Number of coefficients in stepwise linear regression query

Art Kendall
Rich asked some good questions.

In addition:
Were the variables intended to be parts of a summative scale?

How large were the samples? How were they drawn: stratified? multilevel? etc.?

Are the populations the samples were drawn from different in some way?
Countries? Political parties? etc.?

Are you looking at changes in structure over time, i.e., are the correlation
matrices systematically different?

How many sets of data are there?







-----
Art Kendall
Social Research Consultants

Re: Number of coefficients in stepwise linear regression query

Anthony Babinec
Art and Rich asked some good questions.

You did not mention the sample size in either study.

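Further, stepwise variable selection is a "high-variance" method, meaning
that the set of variables it selects depends heavily on the particular
training sample it sees. A fresh sample from the same population can yield
a noticeably different model.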
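As an illustration of that instability, here is a rough Python sketch (not from the thread) that runs a simplified selection on two independent samples from the same hypothetical population and compares what gets entered. Assumptions: it does forward entry only (no removal step), uses a fixed F-to-enter value of 3.84 instead of a probability, and the simulated "survey" (50 correlated items, of which only the first five affect the outcome) is invented for the example.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def centre(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def sweep(v, q):
    """Remove from v its projection on q."""
    b = dot(q, v) / dot(q, q)
    return [u - b * w for u, w in zip(v, q)]

def forward_select(columns, y, f_to_enter=3.84):
    """Forward selection on an F-to-enter cut-off; returns entered indices in order."""
    n = len(y)
    resid = centre(y)                      # centring stands in for the intercept
    cand = {j: centre(c) for j, c in enumerate(columns)}
    entered = []
    while cand:
        df = n - len(entered) - 2          # residual df once one more term is in
        if df <= 0:
            break
        rss = dot(resid, resid)
        # Drop in residual sum of squares if column j were entered next.
        gain = {j: dot(c, resid) ** 2 / dot(c, c)
                for j, c in cand.items() if dot(c, c) > 1e-12}
        if not gain:
            break
        best = max(gain, key=gain.get)
        f_stat = gain[best] / (max(rss - gain[best], 1e-12) / df)
        if f_stat < f_to_enter:
            break
        q = cand.pop(best)
        # Partial the new column out of the response and the remaining candidates.
        resid = sweep(resid, q)
        cand = {j: sweep(c, q) for j, c in cand.items()}
        entered.append(best)
    return entered

def draw_survey(n, rng, n_items=50):
    """Hypothetical survey: items share a common factor; only items 0-4 drive y."""
    columns = [[] for _ in range(n_items)]
    y = []
    for _ in range(n):
        common = rng.gauss(0, 1)
        row = [0.6 * common + 0.8 * rng.gauss(0, 1) for _ in range(n_items)]
        for j, value in enumerate(row):
            columns[j].append(value)
        y.append(0.3 * sum(row[:5]) + rng.gauss(0, 1))
    return columns, y

rng = random.Random(2018)
year1 = forward_select(*draw_survey(300, rng))
year2 = forward_select(*draw_survey(300, rng))
print(len(year1), "entered in sample 1;", len(year2), "in sample 2;",
      len(set(year1) & set(year2)), "in common")
```

Because both samples come from an identical population, any difference in the number or identity of the entered items is due to sampling variation alone. Changing the seed or the sample size shows how much the result can move.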

For Frank Harrell's "classic" take on stepwise regression, see

https://www.stata.com/support/faqs/statistics/stepwise-regression-problems/

Lastly, you might explore methods for regularized regression such as
the lasso and the elastic net. You can do these in the CATREG procedure.
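For readers who want to see what the lasso does mechanically, here is a small Python sketch (not from the thread, and not how CATREG is implemented) using cyclic coordinate descent on standardised predictors. The penalty values, the simulated data and the function names are assumptions for illustration; the elastic net, which adds a ridge term, is not covered.

```python
import random

def standardise(v):
    """Centre to mean 0 and scale to unit (population) variance."""
    n = len(v)
    m = sum(v) / n
    sd = (sum((a - m) ** 2 for a in v) / n) ** 0.5
    return [(a - m) / sd for a in v]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within lam of zero become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(columns, y, lam, sweeps=200):
    """Minimise (1/2n)*RSS + lam*sum|b_j| by cyclic coordinate descent.

    Predictors are standardised and y is centred, so the coefficients are on
    the standardised-predictor scale and no intercept is needed.
    """
    n = len(y)
    X = [standardise(c) for c in columns]
    my = sum(y) / n
    resid = [a - my for a in y]
    beta = [0.0] * len(X)
    for _ in range(sweeps):
        biggest_change = 0.0
        for j, x in enumerate(X):
            # Covariance of x_j with the partial residual (x_j's own term added back).
            z = sum(a * r for a, r in zip(x, resid)) / n + beta[j]
            new = soft_threshold(z, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, x)]
                beta[j] = new
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < 1e-8:
            break
    return beta

# Hypothetical data: 10 items, only the first two related to y.
rng = random.Random(7)
n = 200
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(10)]
y = [2.0 * columns[0][i] - 1.0 * columns[1][i] + rng.gauss(0, 1) for i in range(n)]

for lam in (0.05, 0.3, 1.0):
    kept = [j for j, b in enumerate(lasso(columns, y, lam)) if b != 0.0]
    print(f"lam={lam}: items kept {kept}")
```

A larger penalty sets more coefficients exactly to zero, which is how the lasso reduces the variable list. In practice the penalty is usually chosen by cross-validation, not fixed in advance.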

Tony Babinec
