overestimated model (GLM)

overestimated model (GLM)

drfg2008
SPSS 20

Am I right that if, in a GLM procedure, SPSS encounters an overestimated model (for example, more columns than rows, i.e. more predictors than cases), SPSS does not just return an error message and stop, but tries to solve the equations by deleting 'unnecessary' predictors?

(R just returned error messages in these cases).

Frank
Dr. Frank Gaeth


Re: overestimated model (GLM)

David Marso
Administrator
Frank,
 Try it and report back!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: overestimated model (GLM)

drfg2008
David,
Just because it works in the following example doesn't mean it always works like that.
(In the following example the predicted value PRE_1 fits exactly: R = 1. However, the t and Sig. values are no longer printed.)


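* Create 100 cases.
INPUT PROGRAM.
LOOP a = 1 TO 100 BY 1.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Create 120 standard normal predictors, i.e. more predictors than cases.
VECTOR v(120).

DO REPEAT #i = v1 TO v120.
COMPUTE #i = RV.NORMAL(0,1).
END REPEAT.
EXECUTE.

* Dependent variable: sum of all predictors plus noise.
COMPUTE w = SUM(v1 TO v120) + RV.NORMAL(0,1).
EXECUTE.

* Regress w on all 120 predictors and save the predicted values as PRE_1.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT w
  /METHOD=ENTER v1 TO v120
  /SAVE PRED.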
Dr. Frank Gaeth


Re: overestimated model (GLM)

David Marso
Administrator
What else would you expect?

Re: overestimated model (GLM)

drfg2008
Well, R would just return an error message.


The R extension 'Robust Regression' does the same:

Warnings
Subcommand : ENTER must specify a valid variable list.
Execution of this command stops.


So, am I right that SPSS <always> deletes variables from the list until the model is no longer overestimated?

Frank
Dr. Frank Gaeth


Re: overestimated model (GLM)

David Marso
Administrator
Frank,
  Look at the algorithms ( ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/en/client/Manuals/IBM_SPSS_Statistics_Algorithms.pdf )! Variables are *NOT* deleted; they are simply not entered if their tolerance is too low. With more variables than cases, at some point (k <= N) the remaining variables will have a tolerance of 0. Why are you trying to regress more variables than you have cases? Sounds fishy to me.
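
A minimal sketch of the tolerance check described above, in plain Python rather than SPSS syntax, and not SPSS's actual algorithm: each candidate predictor is centered, projected onto the predictors already entered, and entered only if the share of its variance left unexplained (its tolerance) exceeds a threshold. The function name `enter_by_tolerance`, the threshold of 0.0001, and the toy sizes (10 cases, 15 predictors) are assumptions chosen for illustration.

```python
import random


def center(column):
    """Subtract the mean, which stands in for fitting an intercept."""
    mean = sum(column) / len(column)
    return [x - mean for x in column]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def enter_by_tolerance(columns, min_tolerance=1e-4):
    """Return the indices of entered predictors and every tolerance seen.

    Tolerance is the fraction of a predictor's centered sum of squares
    that is NOT explained by the predictors already entered.
    """
    basis = []        # orthonormal vectors spanning the entered predictors
    entered = []
    tolerances = []
    for index, column in enumerate(columns):
        centered = center(column)
        total_ss = dot(centered, centered)
        residual = centered[:]
        # Modified Gram-Schmidt: strip the part explained by entered predictors.
        for q in basis:
            coef = dot(residual, q)
            residual = [r - coef * qi for r, qi in zip(residual, q)]
        residual_ss = dot(residual, residual)
        tolerance = residual_ss / total_ss if total_ss > 0 else 0.0
        tolerances.append(tolerance)
        if tolerance > min_tolerance:
            norm = residual_ss ** 0.5
            basis.append([r / norm for r in residual])
            entered.append(index)
    return entered, tolerances


random.seed(20)
n_cases, n_predictors = 10, 15  # assumed toy sizes: more predictors than cases
predictors = [[random.gauss(0, 1) for _ in range(n_cases)]
              for _ in range(n_predictors)]

entered, tolerances = enter_by_tolerance(predictors)
print("entered:", len(entered), "of", n_predictors)
print("largest tolerance among skipped:",
      max(t for i, t in enumerate(tolerances) if i not in entered))
```

Nothing is deleted in this sketch: a predictor is just skipped when its tolerance is too low. Because the columns are centered (an intercept is assumed), no more than N - 1 predictors can pass the check, and the ones after that should show a tolerance that is zero up to rounding error.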

Re: overestimated model (GLM)

drfg2008
> Why are you trying to regress more variables than you have cases? Sounds fishy to me.


Because I'm a fisher ;-)

Fishing for the best odds in sports betting. (Engineering approach: "erlaubt ist, was funktioniert" *)

Frank


*)
allowed is, what works -
Dr. Frank Gaeth


Re: overestimated model (GLM)

David Marso
Administrator
If you aren't careful you'll find this sucker in your net ;-)
[Image: Frank's dinner]

Re: overestimated model (GLM)

drfg2008
I'll bet.
Dr. Frank Gaeth


Re: overestimated model (GLM)

Rich Ulrich
For more effective fishing: use Step 1 to put into the equation
only the one or two variables that you are sure must belong
there. Then ask SPSS to print out the statistics on everything
"not in the equation." That's what you want to look at, anyway.

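A rough sketch of that idea in plain Python, with made-up data. It is an illustration only, not SPSS's own computation: two must-belong predictors are entered, and for every remaining candidate the partial correlation with the dependent variable and the tolerance are computed, which are two of the statistics SPSS lists for variables not in the equation. The helper names, the sample size, and the number of candidates are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(column, basis):
    """Center a column and remove what the orthonormal basis explains."""
    mean = sum(column) / len(column)
    resid = [x - mean for x in column]
    for q in basis:
        coef = dot(resid, q)
        resid = [r - coef * qi for r, qi in zip(resid, q)]
    return resid


def build_basis(columns):
    """Orthonormal basis for the centered, force-entered predictors."""
    basis = []
    for column in columns:
        resid = residualize(column, basis)
        norm = dot(resid, resid) ** 0.5
        basis.append([r / norm for r in resid])
    return basis


def out_statistics(y, forced, candidates):
    """Partial correlation with y and tolerance for each excluded candidate."""
    basis = build_basis(forced)
    y_res = residualize(y, basis)
    stats = []
    for column in candidates:
        mean = sum(column) / len(column)
        total_ss = sum((x - mean) ** 2 for x in column)
        x_res = residualize(column, basis)
        resid_ss = dot(x_res, x_res)
        partial_r = dot(x_res, y_res) / (resid_ss * dot(y_res, y_res)) ** 0.5
        stats.append((partial_r, resid_ss / total_ss))
    return stats


random.seed(1)
n = 60
forced = [[random.gauss(0, 1) for _ in range(n)] for _ in range(2)]
candidates = [[random.gauss(0, 1) for _ in range(n)] for _ in range(5)]
# Hypothetical outcome: driven by the two forced predictors plus noise.
y = [forced[0][i] + forced[1][i] + random.gauss(0, 1) for i in range(n)]

stats = out_statistics(y, forced, candidates)
for k, (partial_r, tolerance) in enumerate(stats):
    print(f"candidate {k}: partial r = {partial_r:+.3f}, tolerance = {tolerance:.3f}")
```
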
--
Rich Ulrich

> Date: Sun, 19 Feb 2012 00:38:43 -0800
> *)
> allowed is, what works -

"Whatever works, do it."  Or just,
"Whatever works ...."

Re: overestimated model (GLM)

drfg2008
@ Rich Ulrich


"Whatever works, do it."  Or just,
"Whatever works ...."



Sure. Sorry for my Google translations.


However, would it make sense to use blocks: to put into block 1 of the equation
only the one or two variables that I am sure must belong
there, and to put all the others into block 2?

(Printing anything out wouldn't make sense, since the system is expected to run automatically, as fast as possible, and during the night, when I want to sleep ;-)
Dr. Frank Gaeth


Re: overestimated model (GLM)

Rich Ulrich
What I had in mind:
Enter the must-belong variables in block 1, and examine the out-statistics on the rest.
Then figure out what to do next.
There never was a block 2, so far as entering was concerned.

Data-mining is potentially legitimate, but stepwise inclusion
from EVERYTHING  has extremely limited value. Almost none.
When you start with a very large sample, you can use some
cases for "training" and most of the cases for very extensive
cross-validation. Otherwise, your results are mainly capitalizing
on chance.
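
To illustrate capitalizing on chance, here is a small sketch in plain Python with made-up data: the outcome and all 100 candidate predictors are pure noise, the predictor with the largest absolute correlation is picked on a small training subset, and the same correlation is recomputed on the held-out cases. The sample sizes, the seed, and the helper `pearson_r` are assumptions for illustration.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(2012)
n_train, n_holdout, n_predictors = 50, 200, 100

# Everything is independent noise, so no predictor truly relates to y.
n_total = n_train + n_holdout
y = [random.gauss(0, 1) for _ in range(n_total)]
predictors = [[random.gauss(0, 1) for _ in range(n_total)]
              for _ in range(n_predictors)]

# "Fishing": choose the predictor that looks best on the training cases.
train_r = [abs(pearson_r(p[:n_train], y[:n_train])) for p in predictors]
best = max(range(n_predictors), key=lambda k: train_r[k])

# Cross-validation: recompute the correlation on cases not used for selection.
holdout_r = abs(pearson_r(predictors[best][n_train:], y[n_train:]))

print(f"best predictor on training cases: v{best + 1}, |r| = {train_r[best]:.3f}")
print(f"same predictor on holdout cases:  |r| = {holdout_r:.3f}")
```

The training correlation of the selected predictor is inflated by the selection itself. The holdout value is the more honest estimate and will typically sit much closer to zero.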

I presume that you want something that might replicate.
Selecting from 100 variables almost guarantees that your
next variables, beyond the obvious and face-valid ones, will
include a large share of "random contributors".  You can
Google for < Frank Harrell stepwise >  to get some good
comments on the drawbacks of stepwise.

Especially with limited N -- I would want to get rid of variables,
either by dumping a bunch entirely, or by creating composites
to replace them.

--
Rich Ulrich


> Date: Sun, 19 Feb 2012 14:10:29 -0800