cross validation using SPSS


cross validation using SPSS

Mehrshad Koleini

Dear all


Hi. During a cross-validation procedure for building a regression model, I need to obtain PRESSp (prediction sum of squares) and MSPR (mean squared prediction error). Does anybody know how I can calculate them using the SPSS 17.0 Professor Package, or should I use other software?


Kind regards


Mehrshad




Re: cross validation using SPSS

David Marso
Administrator
OK! First of all, it would be nice if you were to provide a reference to these quantities or a formula.
Sure, I can Google, but AFAIC it is a pain, and you really should save us the extra research effort!
I made the effort to Google "prediction sum of squares" and located something which might be useful.
http://webscripts.softpedia.com/script/Scientific-Engineering-Ruby/Statistics-and-Probability/press-35784.html
Given the definition, one might be inclined to run a bajillion different regressions, leaving one case out for each regression and then calculating the residual for the omitted case from the regression weights estimated on the remaining cases. OTOH, this is sheer folly, as there is a much nicer way to achieve this.
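For reference, the leave-one-out definition and the standard least-squares shortcut that avoids the n refits, in the usual textbook notation (an addition, not taken from the linked page). Here \hat{Y}_{i(i)} is the prediction for case i from the fit that omits case i, e_i is the ordinary residual from the full fit, and h_{ii} is the i-th diagonal element of the hat matrix H = X(X'X)^{-1}X':

```latex
\mathrm{PRESS}_p = \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_{i(i)}\bigr)^2,
\qquad
Y_i - \hat{Y}_{i(i)} = \frac{e_i}{1 - h_{ii}},
\qquad
h_{ii} = \mathbf{x}_i^{\top}\,(\mathbf{X}^{\top}\mathbf{X})^{-1}\,\mathbf{x}_i .
```

So a single fit plus the leverages gives every leave-one-out residual.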
My initial idea was to create a MATRIX program to calculate the 'hat' matrix and then go to town with that. My second idea was to see what SPSS gives you in terms of useful stuff in the SAVE subcommand. Rather than spoil all the fun, I leave you with the following.
You should run this as is and look at the data file after running all three regressions... Hmmmmm.

data list free / a b c y .
begin data
1 6 3 1 6 3 6 1 5 3 6 5 3 6 1 5 6 3 1 5 6 3 1 5 6 7 3 5 1 2 6 7 3 5 1 7 6 3 7 6 1 3 5 6 7 1 3 6 7 1 5 3 6 7 1 5 3 6 7 1 7 6
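end data.
* The 62 values above form 15 complete (a b c y) cases plus two leftover values.
compute id=$casenum.
* Fit without case 1, saving deleted residuals (DRESID) and ordinary residuals (RESID).
reg  / var a b c y   / select id NE 1  / dep y   / method enter a b c   / SAVE DRESID (h1) RESID (e1).
* Same, without case 2.
reg  / var a b c y   / select id NE 2  / dep y   / method enter a b c   / SAVE DRESID (h2) RESID (e2).
* Fit on all cases, saving deleted residuals only.
reg  / var a b c y   / dep y   / method enter a b c   / SAVE DRESID (h_all) .
*Note this is merely a pointer in the (hopefully right) direction.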
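The hat-matrix route mentioned above can be cross-checked outside SPSS. The following is a minimal sketch, not SPSS's own algorithm: it assumes an intercept plus a, b, c as predictors and uses the 15 complete cases from the data above (the two trailing values do not form a full case). It solves the normal equations with plain Gauss-Jordan elimination for brevity, which is fine for a tiny example but less numerically robust than QR-based methods. All function names are hypothetical.

```python
# Sketch: PRESS from ONE least-squares fit using the leverages (hat-matrix diagonal):
#   deleted residual d_i = e_i / (1 - h_ii),  h_ii = x_i' (X'X)^-1 x_i,  PRESS = sum of d_i^2.

DATA = [1, 6, 3, 1, 6, 3, 6, 1, 5, 3, 6, 5, 3, 6, 1, 5, 6, 3, 1, 5, 6, 3, 1, 5, 6, 7, 3, 5, 1, 2,
        6, 7, 3, 5, 1, 7, 6, 3, 7, 6, 1, 3, 5, 6, 7, 1, 3, 6, 7, 1, 5, 3, 6, 7, 1, 5, 3, 6, 7, 1, 7, 6]
# Keep only complete (a, b, c, y) records: 62 values -> 15 cases, 2 leftover values dropped.
CASES = [DATA[k:k + 4] for k in range(0, len(DATA) - len(DATA) % 4, 4)]


def solve(A, rhs):
    """Solve A z = rhs (A square, nonsingular) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def cross(X):
    """X'X for a list of rows."""
    p = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    return solve(cross(X), [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)])


def press_residuals(X, y):
    """Deleted (PRESS) residuals computed from a single full-data fit."""
    beta, A = ols(X, y), cross(X)
    out = []
    for row, yi in zip(X, y):
        e = yi - sum(b * x for b, x in zip(beta, row))       # ordinary residual e_i
        h = sum(x * z for x, z in zip(row, solve(A, row)))   # leverage h_ii
        out.append(e / (1.0 - h))
    return out


X = [[1.0, a, b, c] for a, b, c, _ in CASES]   # leading 1.0 = intercept column
y = [float(v) for *_, v in CASES]
d = press_residuals(X, y)
print("PRESS =", sum(v * v for v in d))
```

If the sketch is right, each d_i should equal the residual from refitting without case i, and the printed PRESS should match the sum of the squared h_all values saved by the third REG.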

Regarding MSPR (mean squared prediction error): I think you will need to provide an explicit, publicly available citation or formula. I found a few references but did not feel like attempting to make sense of them in the context of linear regression. OTOH, I did see a reference to Mallows' Cp as a scaled version of MSPR.
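MSPR is not defined in the thread. Assuming the usual textbook definition for a validation (hold-out) sample, MSPR = sum of (Y_i - Yhat_i)^2 over the n* validation cases, divided by n*, where Yhat_i uses coefficients estimated on the model-building sample only, a minimal sketch looks like this. The coefficients would presumably be copied from a REGRESSION run on the training cases; the function names are hypothetical. MSPR is then commonly compared with the MSE of the model-building fit: a value much larger than that MSE suggests the model predicts new cases worse than its fit statistics imply.

```python
# Sketch: MSPR for a hold-out (validation) sample, assuming
#   MSPR = sum over the n* validation cases of (Y_i - Yhat_i)^2, divided by n*,
# where Yhat_i uses coefficients estimated on the model-building sample only.


def predict(beta, row):
    """Linear predictor b0*row[0] + b1*row[1] + ...; put 1.0 in row[0] for the intercept."""
    return sum(b * x for b, x in zip(beta, row))


def mspr(beta, X_valid, y_valid):
    """Mean squared prediction error over the validation cases."""
    if len(X_valid) == 0 or len(X_valid) != len(y_valid):
        raise ValueError("need a non-empty validation sample with matching X and y")
    sq_errors = [(yv - predict(beta, xv)) ** 2 for xv, yv in zip(X_valid, y_valid)]
    return sum(sq_errors) / len(sq_errors)


# Toy usage: hypothetical coefficients (intercept 0, slope 1) and two validation cases.
print(mspr([0.0, 1.0], [[1.0, 1.0], [1.0, 2.0]], [1.0, 4.0]))   # 2.0
```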
HTH, David




Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Do not give what is holy to the dogs, nor cast your pearls before swine, lest they trample them under their feet."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: cross validation using SPSS

ViAnn Beadle
Perhaps these are available via the GLM procedure?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
David Marso
Sent: Monday, June 27, 2011 4:49 PM
To: [hidden email]
Subject: Re: cross validation using SPSS



--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/cross-validation-using-SPSS-tp4528990p4530101.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: cross validation using SPSS

Mike
It would help if the OP were more specific about what they wanted, but
if one looks for the PRESS statistic, there are several articles that focus
on it, such as:
http://www.sciencedirect.com/science/article/pii/S1572312706000529
and
http://www.jstor.org/pss/2686028
and
http://www.jstor.org/pss/1391469

It appears that the PRESS statistic is available in R; see:
http://www.oga-lab.net/RGM2/func.php?rd_id=qpcR:PRESS
And there appears to be a Mean Squared Error of Prediction
measure as well, but it is called MSEP and RMSEP in R; see:
http://www.oga-lab.net/RGM2/func.php?rd_id=lspls:MSEP.lsplsCv

So, if one has R along with SPSS, one could probably call the
R procedures.  I'll leave it to an R maven to show how this can
be done.

-Mike Palij
New York University
[hidden email]



----- Original Message -----
From: "ViAnn Beadle" <[hidden email]>
To: <[hidden email]>
Sent: Monday, June 27, 2011 8:30 PM
Subject: Re: cross validation using SPSS



Re: cross validation using SPSS

David Marso
Administrator
From my previous code (the last REG, without the SELECT subcommand), simply compute the squared DRESID values and then sum them using AGGREGATE or DESC ;-). I was going to leave that detail as an exercise for the OP.
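Inside SPSS this is just a COMPUTE followed by AGGREGATE or DESC, as described above. If the saved variables are instead exported and checked outside SPSS, a minimal sketch could look like the following. It assumes the full-data DRESID was saved as h_all (as in the earlier syntax), that the data were exported to a CSV file with a header row, and that system-missing values appear as empty cells; the function name, and any file name you open with it, are hypothetical.

```python
import csv
import io


def press_from_csv(f, column="h_all"):
    """Sum of squared deleted residuals (PRESS) from one CSV column.

    Empty cells are treated as missing and skipped, assuming that is how
    system-missing values were written in the export.
    Returns (PRESS, number of cases used).
    """
    total, n = 0.0, 0
    for rec in csv.DictReader(f):
        cell = (rec.get(column) or "").strip()
        if not cell:
            continue
        total += float(cell) ** 2
        n += 1
    return total, n


# Usage with an in-memory stand-in for an exported file (values are made up):
sample = io.StringIO("id,h_all\n1,0.5\n2,-1.5\n3,\n")
press, n = press_from_csv(sample)
print(press, n)   # 2.5 2
```

For a real export, open the file with `open(path, newline="")` and pass the file object instead of the StringIO stand-in.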

Re: cross validation using SPSS

David Marso
Administrator
DRESID is the same thing as the PRESS residual (it is actually referred to by that name in the GLM algorithms).
My initial two REGs were intended to illustrate that the RESID for the deleted case is the same as its DRESID, and that when SELECT is not specified, the DRESID values for the entire data set correspond (as expected) to the residuals obtained by using SELECT to omit the specific cases.
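The equivalence described above, that the residual of a case left out via SELECT equals that case's deleted residual from the full fit, can be checked numerically. This minimal sketch is simplified to a single predictor (a, with y, from the 15 complete cases of the data above) so that the leverage has the closed form h_ii = 1/n + (x_i - mean(x))^2 / Sxx; the identity itself holds for any least-squares linear model, including the three-predictor one in the syntax. Function names are hypothetical.

```python
# Sketch: brute-force leave-one-out residuals vs. the full-fit deleted residual,
# simplified to ONE predictor (a) so the algebra stays short.

xs = [1, 6, 5, 3, 6, 6, 6, 1, 3, 6, 1, 7, 7, 6, 3]   # column a of the 15 complete cases
ys = [1, 1, 5, 5, 5, 5, 5, 7, 7, 6, 6, 6, 3, 5, 1]   # column y of the same cases


def fit_line(xv, yv):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xv)
    mx, my = sum(xv) / n, sum(yv) / n
    sxx = sum((x - mx) ** 2 for x in xv)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xv, yv))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def loo_residual(xv, yv, i):
    """Analogue of SELECT id NE i+1: refit without case i, then take case i's residual."""
    b0, b1 = fit_line(xv[:i] + xv[i + 1:], yv[:i] + yv[i + 1:])
    return yv[i] - (b0 + b1 * xv[i])


def deleted_residual(xv, yv, i):
    """Full-fit shortcut e_i / (1 - h_ii), with h_ii = 1/n + (x_i - mean)^2 / Sxx."""
    n = len(xv)
    b0, b1 = fit_line(xv, yv)
    mx = sum(xv) / n
    sxx = sum((x - mx) ** 2 for x in xv)
    h = 1.0 / n + (xv[i] - mx) ** 2 / sxx
    return (yv[i] - (b0 + b1 * xv[i])) / (1.0 - h)


for i in (0, 1):   # the cases with id = 1 and id = 2
    print(i + 1, loo_residual(xs, ys, i), deleted_residual(xs, ys, i))
```

Each printed pair should agree up to floating-point rounding, mirroring the e1/h_all and e2/h_all comparison in the SPSS data file.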