A MATRIX mystery

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

A MATRIX mystery

Bruce Weaver
Administrator
Here's a little MATRIX mystery someone might be able to help me solve.  

Background:  I ran a linear regression model of the form Y = b0 + b1*X + b2*X**2 + E via the REGRESSION procedure, and everything worked fine.  Then I tried to run the same model via matrix equations, and it didn't run.  The error message said "Source operand is singular for INV."  I was using old MATRIX code that worked in the past, so I tried with a different explanatory variable, and it ran fine.  Finally, I went back to my original X-variable, but divided it by 100 to make the numbers smaller.  After that rescaling, it worked just fine.

This brings me to my question:  Why does the model with X=WEIGHT run without problems using REGRESSION, but not using MATRIX?  Apparently MATRIX runs into some limitation that REGRESSION doesn't.  Is there something I can do to raise that limit?  

My syntax is appended below for anyone who wants to try the various models.  The data come from the Cars.sav file found in your "Samples" folder.

Cheers,
Bruce


*--- Syntax for the various models --- .

* This demo uses data from the "Cars.sav" file that
* comes with SPSS.  It runs two regression models
* of the form Y = b0 + b1*X + b2*X**2 + E.
* Both models run when I use REGRESSION.
* But only the model with HORSE runs using MATRIX.
* The model using WEIGHT results in a singularity.
* If I rescale WEIGHT to WEIGHT/100, the model runs.
* I don't know why the model using WEIGHT runs via
* REGRESSION, but not via MATRIX .

new file.
dataset close all.

* Change path below as necessary.
get file = "C:\SPSSdata\Cars.sav".

select if nmiss(mpg,weight,horse) EQ 0.

GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

* Delete the odd case where vehicle weight < 1000 lbs.
select if weight GT 1000.
GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

compute hpsq = horse**2. /* horsepower squared.
compute wsq = weight**2. /* weight-squared variable .
compute w100 = weight/100.
compute w100sq = w100**2.


REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER horse hpsq
.
REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER weight wsq
.
REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER w100 w100sq
.

* ------------------------------------------ .
* Now run the same models via MATRIX.
* Run the MATRIX section 3 times, commenting
* out two of the "GET x" commands each time
* to choose the appropriate design matrix.
* ------------------------------------------ .

compute c = 1. /* First column of design matrix.
exe.

matrix.

GET y
 /file = *
 /var = MPG.

* Comment out one of the "GET x" commands below .
GET x
 /file = *
 /var = c horse hpsq.
*GET x
 /file = *
 /var = c weight wsq.
*GET x
 /file = *
 /var = c w100 w100sq.

compute B = inv(t(x)*x)*t(x)*y.  /* Vector of regression coefficients.

compute n = nrow(y).            /* N = number of cases .
compute ybar = msum(y)/n.     /* Mean of Y values .
compute dev.y = y-ybar.       /* Matrix of (Y-Ybar) deviation scores.
compute ss.y = t(dev.y)*dev.y.  /* SS(Y) = dev'dev.
compute yhat = x*b.           /* Yhat = XB .
compute e = y - yhat.         /* E = Y-Yhat .
compute ss.e = t(e)*e.        /* SSE = E'E .
compute ss.reg = ss.y - ss.e.   /* SS_regression .
compute df1 = ncol(x)-1.        /* df for numerator of F-ratio.
compute df2 = n-df1-1.           /* df for denominator of F-ratio .
compute ms.reg = ss.reg/df1.     /* MS regression .
compute ms.e = ss.e/df2.       /* Variance of residuals .
compute covB = inv(t(x)*x)*ms.e. /* (X'X)^-1*MS_error .
compute var.b = diag(covB).    /* Variances of B .
compute se.b = sqrt(var.b).    /* SE of B .
compute f = ms.reg/ms.e.       /* F-statistic .
compute p = 1 - FCDF(f,df1,df2). /* p-value for F-ratio .
compute Rsq = ss.reg / ss.y .    /* R-squared value .

compute BSE = { B,se.b }.        /* Regression coefficients & standard errors .
compute Summary = MAKE(1,6,0).
compute Summary(1,1) = Rsq.        /* R-squared value .
compute Summary(1,2) = SQRT(ms.e). /* Root mean square error .
compute Summary(1,3) = f.          /* F-ratio for regression model.
compute Summary(1,4) = df1.        /* df for numerator.
compute Summary(1,5) = df2.        /* df for denominator.
compute Summary(1,6) = p.          /* p-value for F-test .

print BSE / format = "f8.3" /
 title= "Regression coefficients & standard errors" /
 clabel = "B" "SE(B)"
.

print Summary / format = "f8.3" /
 title= "Summary statistics for the regression model" /
 clabel = "R-sq" "RMSE" "F" "df1" "df2" "p"
.

end matrix.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Jon K Peck
My guess is that the difference is due to using a numerically unstable way to compute the regression.  The Regression procedure does not proceed by just calculating with the textbook expression.  It uses numerical methods that are designed to minimize precision loss due to extreme magnitudes and high collinearity.

If you were to rewrite your regression code, say, to use a Cholesky factorization rather than a brute force matrix inversion, I'll bet that you could get the same results.  It could also be that the INV function singularity criterion is too sensitive to scale differences.  I haven't looked into that code.

Regards,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Bruce Weaver <[hidden email]>
To:        [hidden email]
Date:        12/07/2011 07:50 AM
Subject:        [SPSSX-L] A MATRIX mystery
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Here's a little MATRIX mystery someone might be able to help me solve.

*Background*:  I ran a linear regression model of the form Y = b0 + b1*X +
b2*X**2 + E via the REGRESSION procedure, and everything worked fine.  Then
I tried to run the same model via matrix equations, and it didn't run.  The
error message said "Source operand is singular for INV."  I was using old
MATRIX code that worked in the past, so I tried with a different explanatory
variable, and it ran fine.  Finally, I went back to my original X-variable,
but divided it by 100 to make the numbers smaller.  After that rescaling, it
worked just fine.

This brings me to my question:  Why does the model with X=WEIGHT run without
problems using REGRESSION, but not using MATRIX?  Apparently MATRIX runs
into some limitation that REGRESSION doesn't.  Is there something I can do
to raise that limit?

My syntax is appended below for anyone who wants to try the various models.
The data come from the Cars.sav file found in your "Samples" folder.

Cheers,
Bruce


*--- Syntax for the various models --- .

* This demo uses data from the "Cars.sav" file that
* comes with SPSS.  It runs two regression models
* of the form Y = b0 + b1*X + b2*X**2 + E.
* Both models run when I use REGRESSION.
* But only the model with HORSE runs using MATRIX.
* The model using WEIGHT results in a singularity.
* If I rescale WEIGHT to WEIGHT/100, the model runs.
* I don't know why the model using WEIGHT runs via
* REGRESSION, but not via MATRIX .

new file.
dataset close all.

* Change path below as necessary.
get file = "C:\SPSSdata\Cars.sav".

select if nmiss(mpg,weight,horse) EQ 0.

GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

* Delete the odd case where vehicle weight < 1000 lbs.
select if weight GT 1000.
GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

compute hpsq = horse**2. /* horsepower squared.
compute wsq = weight**2. /* weight-squared variable .
compute w100 = weight/100.
compute w100sq = w100**2.


REGRESSION
/STATISTICS COEFF R ANOVA CI(95)
/DEPENDENT=MPG
/METHOD=ENTER horse hpsq
.
REGRESSION
/STATISTICS COEFF R ANOVA CI(95)
/DEPENDENT=MPG
/METHOD=ENTER weight wsq
.
REGRESSION
/STATISTICS COEFF R ANOVA CI(95)
/DEPENDENT=MPG
/METHOD=ENTER w100 w100sq
.

* ------------------------------------------ .
* Now run the same models via MATRIX.
* Run the MATRIX section 3 times, commenting
* out two of the "GET x" commands each time
* to choose the appropriate design matrix.
* ------------------------------------------ .

compute c = 1. /* First column of design matrix.
exe.

matrix.

GET y
/file = *
/var = MPG.

* Comment out one of the "GET x" commands below .
GET x
/file = *
/var = c horse hpsq.
*GET x
/file = *
/var = c weight wsq.
*GET x
/file = *
/var = c w100 w100sq.

compute B = inv(t(x)*x)*t(x)*y.  /* Vector of regression coefficients.

compute n = nrow(y).                /* N = number of cases .
compute ybar = msum(y)/n.                  /* Mean of Y values .
compute dev.y = y-ybar.                        /* Matrix of (Y-Ybar) deviation scores.
compute ss.y = t(dev.y)*dev.y.    /* SS(Y) = dev'dev.
compute yhat = x*b.                                /* Yhat = XB .
compute e = y - yhat.                            /* E = Y-Yhat .
compute ss.e = t(e)*e.                          /* SSE = E'E .
compute ss.reg = ss.y - ss.e.           /* SS_regression .
compute df1 = ncol(x)-1.                /* df for numerator of F-ratio.
compute df2 = n-df1-1.           /* df for denominator of F-ratio .
compute ms.reg = ss.reg/df1.     /* MS regression .
compute ms.e = ss.e/df2.                   /* Variance of residuals .
compute covB = inv(t(x)*x)*ms.e.        /* (X'X)^-1*MS_error .
compute var.b = diag(covB).                 /* Variances of B .
compute se.b = sqrt(var.b).                 /* SE of B .
compute f = ms.reg/ms.e.                       /* F-statistic .
compute p = 1 - FCDF(f,df1,df2). /* p-value for F-ratio .
compute Rsq = ss.reg / ss.y .    /* R-squared value .

compute BSE = { B,se.b }.        /* Regression coefficients & standard
errors .
compute Summary = MAKE(1,6,0).
compute Summary(1,1) = Rsq.        /* R-squared value .
compute Summary(1,2) = SQRT(ms.e). /* Root mean square error .
compute Summary(1,3) = f.          /* F-ratio for regression model.
compute Summary(1,4) = df1.        /* df for numerator.
compute Summary(1,5) = df2.        /* df for denominator.
compute Summary(1,6) = p.          /* p-value for F-test .

print BSE / format = "f8.3" /
title= "Regression coefficients & standard errors" /
clabel = "B" "SE(B)"
.

print Summary / format = "f8.3" /
title= "Summary statistics for the regression model" /
clabel = "R-sq" "RMSE" "F" "df1" "df2" "p"
.

end matrix.



-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/A-MATRIX-mystery-tp5055860p5055860.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Ryan
In reply to this post by Bruce Weaver
Bruce,

Use the generalized Inverse (GINV) function instead of INV function. 

Ryan

On Wed, Dec 7, 2011 at 9:46 AM, Bruce Weaver <[hidden email]> wrote:
Here's a little MATRIX mystery someone might be able to help me solve.

*Background*:  I ran a linear regression model of the form Y = b0 + b1*X +
b2*X**2 + E via the REGRESSION procedure, and everything worked fine.  Then
I tried to run the same model via matrix equations, and it didn't run.  The
error message said "Source operand is singular for INV."  I was using old
MATRIX code that worked in the past, so I tried with a different explanatory
variable, and it ran fine.  Finally, I went back to my original X-variable,
but divided it by 100 to make the numbers smaller.  After that rescaling, it
worked just fine.

This brings me to my question:  Why does the model with X=WEIGHT run without
problems using REGRESSION, but not using MATRIX?  Apparently MATRIX runs
into some limitation that REGRESSION doesn't.  Is there something I can do
to raise that limit?

My syntax is appended below for anyone who wants to try the various models.
The data come from the Cars.sav file found in your "Samples" folder.

Cheers,
Bruce


*--- Syntax for the various models --- .

* This demo uses data from the "Cars.sav" file that
* comes with SPSS.  It runs two regression models
* of the form Y = b0 + b1*X + b2*X**2 + E.
* Both models run when I use REGRESSION.
* But only the model with HORSE runs using MATRIX.
* The model using WEIGHT results in a singularity.
* If I rescale WEIGHT to WEIGHT/100, the model runs.
* I don't know why the model using WEIGHT runs via
* REGRESSION, but not via MATRIX .

new file.
dataset close all.

* Change path below as necessary.
get file = "C:\SPSSdata\Cars.sav".

select if nmiss(mpg,weight,horse) EQ 0.

GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

* Delete the odd case where vehicle weight < 1000 lbs.
select if weight GT 1000.
GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

compute hpsq = horse**2. /* horsepower squared.
compute wsq = weight**2. /* weight-squared variable .
compute w100 = weight/100.
compute w100sq = w100**2.


REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER horse hpsq
.
REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER weight wsq
.
REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER w100 w100sq
.

* ------------------------------------------ .
* Now run the same models via MATRIX.
* Run the MATRIX section 3 times, commenting
* out two of the "GET x" commands each time
* to choose the appropriate design matrix.
* ------------------------------------------ .

compute c = 1. /* First column of design matrix.
exe.

matrix.

GET y
 /file = *
 /var = MPG.

* Comment out one of the "GET x" commands below .
GET x
 /file = *
 /var = c horse hpsq.
*GET x
 /file = *
 /var = c weight wsq.
*GET x
 /file = *
 /var = c w100 w100sq.

compute B = inv(t(x)*x)*t(x)*y.  /* Vector of regression coefficients.

compute n = nrow(y).                /* N = number of cases .
compute ybar = msum(y)/n.                  /* Mean of Y values .
compute dev.y = y-ybar.                        /* Matrix of (Y-Ybar) deviation scores.
compute ss.y = t(dev.y)*dev.y.    /* SS(Y) = dev'dev.
compute yhat = x*b.                                /* Yhat = XB .
compute e = y - yhat.                            /* E = Y-Yhat .
compute ss.e = t(e)*e.                          /* SSE = E'E .
compute ss.reg = ss.y - ss.e.           /* SS_regression .
compute df1 = ncol(x)-1.                /* df for numerator of F-ratio.
compute df2 = n-df1-1.           /* df for denominator of F-ratio .
compute ms.reg = ss.reg/df1.     /* MS regression .
compute ms.e = ss.e/df2.                   /* Variance of residuals .
compute covB = inv(t(x)*x)*ms.e.        /* (X'X)^-1*MS_error .
compute var.b = diag(covB).                 /* Variances of B .
compute se.b = sqrt(var.b).                 /* SE of B .
compute f = ms.reg/ms.e.                       /* F-statistic .
compute p = 1 - FCDF(f,df1,df2). /* p-value for F-ratio .
compute Rsq = ss.reg / ss.y .    /* R-squared value .

compute BSE = { B,se.b }.        /* Regression coefficients & standard
errors .
compute Summary = MAKE(1,6,0).
compute Summary(1,1) = Rsq.        /* R-squared value .
compute Summary(1,2) = SQRT(ms.e). /* Root mean square error .
compute Summary(1,3) = f.          /* F-ratio for regression model.
compute Summary(1,4) = df1.        /* df for numerator.
compute Summary(1,5) = df2.        /* df for denominator.
compute Summary(1,6) = p.          /* p-value for F-test .

print BSE / format = "f8.3" /
 title= "Regression coefficients & standard errors" /
 clabel = "B" "SE(B)"
.

print Summary / format = "f8.3" /
 title= "Summary statistics for the regression model" /
 clabel = "R-sq" "RMSE" "F" "df1" "df2" "p"
.

end matrix.



-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/A-MATRIX-mystery-tp5055860p5055860.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Bruce Weaver
Administrator
Aha...that's fixed it.  Thanks Ryan.  For the benefit  of others, here's some relevant info from the fine manual.

GINV(M):
Moore-Penrose generalized inverse of a matrix. Takes a single argument.
Returns a matrix with the same dimensions as the transpose of the
argument. If A is the generalized inverse of a matrix M,then M*A*M=M
and A*M*A=A.Both A*M and M*A are symmetric.

INV(M):
Inverse of a matrix. Takes a single argument, which must be square and
nonsingular (that is, its determinant must not be 0). Returns a square
matrix having the same dimensions as the argument. If A is the inverse of
M,then M*A=A*M=I,where I is the identity matrix.


R B wrote
Bruce,

Use the generalized Inverse (GINV) function instead of INV function.

Ryan

On Wed, Dec 7, 2011 at 9:46 AM, Bruce Weaver <[hidden email]>wrote:

> Here's a little MATRIX mystery someone might be able to help me solve.
>
> *Background*:  I ran a linear regression model of the form Y = b0 + b1*X +
> b2*X**2 + E via the REGRESSION procedure, and everything worked fine.  Then
> I tried to run the same model via matrix equations, and it didn't run.  The
> error message said "Source operand is singular for INV."  I was using old
> MATRIX code that worked in the past, so I tried with a different
> explanatory
> variable, and it ran fine.  Finally, I went back to my original X-variable,
> but divided it by 100 to make the numbers smaller.  After that rescaling,
> it
> worked just fine.
>
> This brings me to my question:  Why does the model with X=WEIGHT run
> without
> problems using REGRESSION, but not using MATRIX?  Apparently MATRIX runs
> into some limitation that REGRESSION doesn't.  Is there something I can do
> to raise that limit?
>
> My syntax is appended below for anyone who wants to try the various models.
> The data come from the Cars.sav file found in your "Samples" folder.
>
> Cheers,
> Bruce
>
>
> *--- Syntax for the various models --- .
>
> * This demo uses data from the "Cars.sav" file that
> * comes with SPSS.  It runs two regression models
> * of the form Y = b0 + b1*X + b2*X**2 + E.
> * Both models run when I use REGRESSION.
> * But only the model with HORSE runs using MATRIX.
> * The model using WEIGHT results in a singularity.
> * If I rescale WEIGHT to WEIGHT/100, the model runs.
> * I don't know why the model using WEIGHT runs via
> * REGRESSION, but not via MATRIX .
>
> new file.
> dataset close all.
>
> * Change path below as necessary.
> get file = "C:\SPSSdata\Cars.sav".
>
> select if nmiss(mpg,weight,horse) EQ 0.
>
> GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .
>
> * Delete the odd case where vehicle weight < 1000 lbs.
> select if weight GT 1000.
> GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .
>
> compute hpsq = horse**2. /* horsepower squared.
> compute wsq = weight**2. /* weight-squared variable .
> compute w100 = weight/100.
> compute w100sq = w100**2.
>
>
> REGRESSION
>  /STATISTICS COEFF R ANOVA CI(95)
>  /DEPENDENT=MPG
>  /METHOD=ENTER horse hpsq
> .
> REGRESSION
>  /STATISTICS COEFF R ANOVA CI(95)
>  /DEPENDENT=MPG
>  /METHOD=ENTER weight wsq
> .
> REGRESSION
>  /STATISTICS COEFF R ANOVA CI(95)
>  /DEPENDENT=MPG
>  /METHOD=ENTER w100 w100sq
> .
>
> * ------------------------------------------ .
> * Now run the same models via MATRIX.
> * Run the MATRIX section 3 times, commenting
> * out two of the "GET x" commands each time
> * to choose the appropriate design matrix.
> * ------------------------------------------ .
>
> compute c = 1. /* First column of design matrix.
> exe.
>
> matrix.
>
> GET y
>  /file = *
>  /var = MPG.
>
> * Comment out one of the "GET x" commands below .
> GET x
>  /file = *
>  /var = c horse hpsq.
> *GET x
>  /file = *
>  /var = c weight wsq.
> *GET x
>  /file = *
>  /var = c w100 w100sq.
>
> compute B = inv(t(x)*x)*t(x)*y.  /* Vector of regression coefficients.
>
> compute n = nrow(y).                /* N = number of cases .
> compute ybar = msum(y)/n.                  /* Mean of Y values .
> compute dev.y = y-ybar.                        /* Matrix of (Y-Ybar)
> deviation scores.
> compute ss.y = t(dev.y)*dev.y.    /* SS(Y) = dev'dev.
> compute yhat = x*b.                                /* Yhat = XB .
> compute e = y - yhat.                            /* E = Y-Yhat .
> compute ss.e = t(e)*e.                          /* SSE = E'E .
> compute ss.reg = ss.y - ss.e.           /* SS_regression .
> compute df1 = ncol(x)-1.                /* df for numerator of F-ratio.
> compute df2 = n-df1-1.           /* df for denominator of F-ratio .
> compute ms.reg = ss.reg/df1.     /* MS regression .
> compute ms.e = ss.e/df2.                   /* Variance of residuals .
> compute covB = inv(t(x)*x)*ms.e.        /* (X'X)^-1*MS_error .
> compute var.b = diag(covB).                 /* Variances of B .
> compute se.b = sqrt(var.b).                 /* SE of B .
> compute f = ms.reg/ms.e.                       /* F-statistic .
> compute p = 1 - FCDF(f,df1,df2). /* p-value for F-ratio .
> compute Rsq = ss.reg / ss.y .    /* R-squared value .
>
> compute BSE = { B,se.b }.        /* Regression coefficients & standard
> errors .
> compute Summary = MAKE(1,6,0).
> compute Summary(1,1) = Rsq.        /* R-squared value .
> compute Summary(1,2) = SQRT(ms.e). /* Root mean square error .
> compute Summary(1,3) = f.          /* F-ratio for regression model.
> compute Summary(1,4) = df1.        /* df for numerator.
> compute Summary(1,5) = df2.        /* df for denominator.
> compute Summary(1,6) = p.          /* p-value for F-test .
>
> print BSE / format = "f8.3" /
>  title= "Regression coefficients & standard errors" /
>  clabel = "B" "SE(B)"
> .
>
> print Summary / format = "f8.3" /
>  title= "Summary statistics for the regression model" /
>  clabel = "R-sq" "RMSE" "F" "df1" "df2" "p"
> .
>
> end matrix.
>
>
>
> -----
> --
> Bruce Weaver
> [hidden email]
> http://sites.google.com/a/lakeheadu.ca/bweaver/
>
> "When all else fails, RTFM."
>
> NOTE: My Hotmail account is not monitored regularly.
> To send me an e-mail, please use the address shown above.
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/A-MATRIX-mystery-tp5055860p5055860.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

David Marso
Administrator
In reply to this post by Jon K Peck
This can certainly be a problem (squaring large values then using them RAW).
matrix.

GET y /file = * /var = MPG.
GET x /file = * /var = c weight wsq.
compute XTX=t(x)*x).
CALL EIGEN(XTX,EVECS,EVALS).
PRINT XTX /FORMAT "F20.16".
PRINT EVALS / FORMAT "F20.16" / TITLE "Eigenvalues".
PRINT EVECS / FORMAT "F20.16" / TITLE "Eigen Vectors".
end matrix.
Run MATRIX procedure:

XTX
 391.0000000000000000 1162481.000000000000 3735183665.000000000
 1162481.000000000000 3735183665.000000000 12886361633495.00000
 3735183665.000000000 12886361633495.00000 47237850341145600.00

Eigenvalues
 47237853856511500.00
 219818099.1347533000
   1.9308414318132260

Eigen Vectors
    .0000000790718362   -.0006529613287495    .9999997868207250
    .0002727973673845   -.9999997496115200   -.0006529613260240
    .9999999627907940    .0002727973608606    .0000000990543151

------ END MATRIX -----
AFAIK: Regression uses the Correlation matrix rather than the raw data matrix and the SWEEP operator to extract the regression coefficients!

---
Jon K Peck wrote
My guess is that the difference is due to using a numerically unstable way
to compute the regression.  The Regression procedure does not proceed by
just calculating with the textbook expression.  It uses numerical methods
that are designed to minimize precision loss due to extreme magnitudes and
high collinearity.

If you were to rewrite your regression code, say, to use a Cholesky
factorization rather than a brute force matrix inversion, I'll bet that
you could get the same results.  It could also be that the INV function
singularity criterion is too sensitive to scale differences.  I haven't
looked into that code.

Regards,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:   Bruce Weaver <[hidden email]>
To:     [hidden email]
Date:   12/07/2011 07:50 AM
Subject:        [SPSSX-L] A MATRIX mystery
Sent by:        "SPSSX(r) Discussion" <[hidden email]>



Here's a little MATRIX mystery someone might be able to help me solve.

*Background*:  I ran a linear regression model of the form Y = b0 + b1*X +
b2*X**2 + E via the REGRESSION procedure, and everything worked fine. Then
I tried to run the same model via matrix equations, and it didn't run. The
error message said "Source operand is singular for INV."  I was using old
MATRIX code that worked in the past, so I tried with a different
explanatory
variable, and it ran fine.  Finally, I went back to my original
X-variable,
but divided it by 100 to make the numbers smaller.  After that rescaling,
it
worked just fine.

This brings me to my question:  Why does the model with X=WEIGHT run
without
problems using REGRESSION, but not using MATRIX?  Apparently MATRIX runs
into some limitation that REGRESSION doesn't.  Is there something I can do
to raise that limit?

My syntax is appended below for anyone who wants to try the various
models.
The data come from the Cars.sav file found in your "Samples" folder.

Cheers,
Bruce


*--- Syntax for the various models --- .

* This demo uses data from the "Cars.sav" file that
* comes with SPSS.  It runs two regression models
* of the form Y = b0 + b1*X + b2*X**2 + E.
* Both models run when I use REGRESSION.
* But only the model with HORSE runs using MATRIX.
* The model using WEIGHT results in a singularity.
* If I rescale WEIGHT to WEIGHT/100, the model runs.
* I don't know why the model using WEIGHT runs via
* REGRESSION, but not via MATRIX .

new file.
dataset close all.

* Change path below as necessary.
get file = "C:\SPSSdata\Cars.sav".

select if nmiss(mpg,weight,horse) EQ 0.

GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

* Delete the odd case where vehicle weight < 1000 lbs.
select if weight GT 1000.
GRAPH /SCATTERPLOT(BIVAR)=weight WITH mpg .

compute hpsq = horse**2. /* horsepower squared.
compute wsq = weight**2. /* weight-squared variable .
compute w100 = weight/100.
compute w100sq = w100**2.


REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER horse hpsq
.
REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER weight wsq
.
REGRESSION
 /STATISTICS COEFF R ANOVA CI(95)
 /DEPENDENT=MPG
 /METHOD=ENTER w100 w100sq
.

* ------------------------------------------ .
* Now run the same models via MATRIX.
* Run the MATRIX section 3 times, commenting
* out two of the "GET x" commands each time
* to choose the appropriate design matrix.
* ------------------------------------------ .

compute c = 1. /* First column of design matrix.
exe.

matrix.

GET y
 /file = *
 /var = MPG.

* Comment out one of the "GET x" commands below .
GET x
 /file = *
 /var = c horse hpsq.
*GET x
 /file = *
 /var = c weight wsq.
*GET x
 /file = *
 /var = c w100 w100sq.

compute B = inv(t(x)*x)*t(x)*y.  /* Vector of regression coefficients.

compute n = nrow(y).                /* N = number of cases .
compute ybar = msum(y)/n.                  /* Mean of Y values .
compute dev.y = y-ybar.                        /* Matrix of (Y-Ybar)
deviation scores.
compute ss.y = t(dev.y)*dev.y.    /* SS(Y) = dev'dev.
compute yhat = x*b.                                /* Yhat = XB .
compute e = y - yhat.                            /* E = Y-Yhat .
compute ss.e = t(e)*e.                          /* SSE = E'E .
compute ss.reg = ss.y - ss.e.           /* SS_regression .
compute df1 = ncol(x)-1.                /* df for numerator of F-ratio.
compute df2 = n-df1-1.           /* df for denominator of F-ratio .
compute ms.reg = ss.reg/df1.     /* MS regression .
compute ms.e = ss.e/df2.                   /* Variance of residuals .
compute covB = inv(t(x)*x)*ms.e.        /* (X'X)^-1*MS_error .
compute var.b = diag(covB).                 /* Variances of B .
compute se.b = sqrt(var.b).                 /* SE of B .
compute f = ms.reg/ms.e.                       /* F-statistic .
compute p = 1 - FCDF(f,df1,df2). /* p-value for F-ratio .
compute Rsq = ss.reg / ss.y .    /* R-squared value .

compute BSE = { B,se.b }.        /* Regression coefficients & standard
errors .
compute Summary = MAKE(1,6,0).
compute Summary(1,1) = Rsq.        /* R-squared value .
compute Summary(1,2) = SQRT(ms.e). /* Root mean square error .
compute Summary(1,3) = f.          /* F-ratio for regression model.
compute Summary(1,4) = df1.        /* df for numerator.
compute Summary(1,5) = df2.        /* df for denominator.
compute Summary(1,6) = p.          /* p-value for F-test .

print BSE / format = "f8.3" /
 title= "Regression coefficients & standard errors" /
 clabel = "B" "SE(B)"
.

print Summary / format = "f8.3" /
 title= "Summary statistics for the regression model" /
 clabel = "R-sq" "RMSE" "F" "df1" "df2" "p"
.

end matrix.



-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/A-MATRIX-mystery-tp5055860p5055860.html

Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Angeliki
In reply to this post by Ryan
Hi,

I have the same problems on a mediation analysis I m trying to run. How do I change it to GINV?

Thanks,
Angeliki
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Bruce Weaver
Administrator
For the benefit of those not reading via Nabble, here is the original thread Angeliki is responding to:

http://spssx-discussion.1045642.n5.nabble.com/A-MATRIX-mystery-td5055860.html

Angeliki, are you "rolling your own" mediation analysis using the MATRIX language?  Or are you using the REGRESSION command to estimate a series of models in the manner described by Baron & Kenny (1986), among others (e.g., http://davidakenny.net/cm/mediate.htm)?  I suspect the latter, in which case you may be able to fix things by centering some of your variables.  But it would help if you posted  your syntax with error or warning messages.

By the way, Andrew Hayes' PROCESS macro seems to be extremely popular with SPSS-using mediation enthusiasts these days.  

If you inferred from that last sentence that I am not a mediation enthusiast, you inferred correctly.  ;-)  My main concern is that many people seem to forget that, "Mediation and confounding are identical statistically and can be distinguished only on conceptual grounds" (Source:  https://www.ncbi.nlm.nih.gov/pubmed/11523746).  

HTH.

Angeliki wrote
Hi,

I have the same problems on a mediation analysis I m trying to run. How do I change it to GINV?

Thanks,
Angeliki
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

transhumanist
Hi, I used Hayes' PROCESS macro and got exactly the same error message "source operand is singular for inv". Is it possible to change the function from INV to GINV in PROCESS macro?


Bruce Weaver wrote
For the benefit of those not reading via Nabble, here is the original thread Angeliki is responding to:

http://spssx-discussion.1045642.n5.nabble.com/A-MATRIX-mystery-td5055860.html

Angeliki, are you "rolling your own" mediation analysis using the MATRIX language?  Or are you using the REGRESSION command to estimate a series of models in the manner described by Baron & Kenny (1986), among others (e.g., http://davidakenny.net/cm/mediate.htm)?  I suspect the latter, in which case you may be able to fix things by centering some of your variables.  But it would help if you posted  your syntax with error or warning messages.

By the way, Andrew Hayes' PROCESS macro seems to be extremely popular with SPSS-using mediation enthusiasts these days.  

If you inferred from that last sentence that I am not a mediation enthusiast, you inferred correctly.  ;-)  My main concern is that many people seem to forget that, "Mediation and confounding are identical statistically and can be distinguished only on conceptual grounds" (Source:  https://www.ncbi.nlm.nih.gov/pubmed/11523746).  

HTH.

Angeliki wrote
Hi,

I have the same problems on a mediation analysis I m trying to run. How do I change it to GINV?

Thanks,
Angeliki
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Bruce Weaver
Administrator
Hi Angeliki.  Here's what I would try:

1. Make a copy of the syntax file containing the PROCESS macro definition.
2. In the copy, search for "inv(" and replace with "ginv(".  Without the quotes, of course.

Then run the modified macro definition, and try your analysis again.


transhumanist wrote
Hi, I used Hayes' PROCESS macro and got exactly the same error message "source operand is singular for inv". Is it possible to change the function from INV to GINV in PROCESS macro?


Bruce Weaver wrote
For the benefit of those not reading via Nabble, here is the original thread Angeliki is responding to:

http://spssx-discussion.1045642.n5.nabble.com/A-MATRIX-mystery-td5055860.html

Angeliki, are you "rolling your own" mediation analysis using the MATRIX language?  Or are you using the REGRESSION command to estimate a series of models in the manner described by Baron & Kenny (1986), among others (e.g., http://davidakenny.net/cm/mediate.htm)?  I suspect the latter, in which case you may be able to fix things by centering some of your variables.  But it would help if you posted  your syntax with error or warning messages.

By the way, Andrew Hayes' PROCESS macro seems to be extremely popular with SPSS-using mediation enthusiasts these days.  

If you inferred from that last sentence that I am not a mediation enthusiast, you inferred correctly.  ;-)  My main concern is that many people seem to forget that, "Mediation and confounding are identical statistically and can be distinguished only on conceptual grounds" (Source:  https://www.ncbi.nlm.nih.gov/pubmed/11523746).  

HTH.

Angeliki wrote
Hi,

I have the same problems on a mediation analysis I m trying to run. How do I change it to GINV?

Thanks,
Angeliki
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Kirill Orlov
I wouldn't advice blunt changing INV to GINV is some (not yours and complex) code without knowing the theory behind the algorithm profoundly. I is a bad idea. Many statistical algorithms ruin conceptually when used with collinear data - even if computationally GINV might help to "solve" it: solve it and gives you perhaps almost random, unreasonable, unstable results.

17.10.2016 15:57, Bruce Weaver пишет:
Hi Angeliki.  Here's what I would try:

1. Make a copy of the syntax file containing the PROCESS macro definition.
2. In the copy, search for "inv(" and replace with "ginv(".  Without the
quotes, of course.

Then run the modified macro definition, and try your analysis again.





Avast logo

Это сообщение проверено на вирусы антивирусом Avast.
www.avast.com


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

David Marso
Administrator

Indeed!
Consider the following demo ;-(

MATRIX.
COMPUTE X0=MAKE(100,1,1).
COMPUTE X1=UNIFORM(100,1).
COMPUTE X2=X1*2.
COMPUTE X3=(X1+X2)/2 .
COMPUTE X={X0,X1,X2,X3}.
COMPUTE Y=X0 + .5*X1 + .7*X2 + X3.
/* Following fails due to Singular X'X */.
/*COMPUTE B=INV(T(X)*X)*T(X)*Y.
COMPUTE B=GINV(T(X)*X)*T(X)*Y.
/* Estimates do not reproduce coefficients from generating function */ .
PRINT B.
SAVE {Y,X} /OUTFILE * / VARIABLES Y X0 X1 X2 X3.
END MATRIX.
REGRESSION DEPENDENT Y / METHOD ENTER X1 TO X3.

Kirill Orlov wrote
I wouldn't advice blunt changing INV to GINV is some (not yours and
complex) code without knowing the theory behind the algorithm
profoundly. I is a bad idea. Many statistical algorithms ruin
conceptually when used with collinear data - even if computationally
GINV might help to "solve" it: solve it and gives you perhaps almost
random, unreasonable, unstable results.

17.10.2016 15:57, Bruce Weaver пишет:
> Hi Angeliki.  Here's what I would try:
>
> 1. Make a copy of the syntax file containing the PROCESS macro definition.
> 2. In the copy, search for "inv(" and replace with "ginv(".  Without the
> quotes, of course.
>
> Then run the modified macro definition, and try your analysis again.
>



---
Это сообщение проверено на вирусы антивирусом Avast.
https://www.avast.com/antivirus

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

David Marso
Administrator
Examining the Process Macro I arrive at the conclusion that at best the implementation is SLOPPY.  Have never seen an uncommented 5000 line matrix macro until now.  Rather hard to follow it with all those magic numbers and repetitive coding.  I would be temped to do some slice and dice were it not for

"/* Permission is hereby granted, free of charge, to any person obtaining a copy of this software */.
/* and associated documentation files (the "Software"), to use the software in this form.  Distribution */.
/* after modification is prohibited
, ..."
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Kirill Orlov
I've never used PROCESS macro and didn't try to follow what it does inside. But my very surface, immediate impression was that it could be re-written probably some 4 times shorter and times faster to run.


18.10.2016 18:04, David Marso пишет:
Examining the Process Macro I arrive at the conclusion that at best the
implementation is SLOPPY.  Have never seen an uncommented 5000 line matrix
macro until now.  Rather hard to follow it with all those magic numbers and
repetitive coding.  I would be temped to do some slice and dice were it not
for

"/* Permission is hereby granted, free of charge, to any person obtaining a
copy of this software */.
/* and associated documentation files (the "Software"), to use the software
in this form.  *Distribution */.
/* after modification is prohibited*, ..."





Avast logo

Это сообщение проверено на вирусы антивирусом Avast.
www.avast.com


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: A MATRIX mystery

Kirill Orlov
In reply to this post by David Marso
The more that ginv is slower than inv, too. Ginv is typically based on svd internally, and svd is an expensive operation.

Btw, the fastest way to solve OLS regression (under non-concollinearity, of course, else it is crap, as you just showed below) is via solve() of linear equations. I used that sweet way in my matrix !regress() function on my web-page.


18.10.2016 17:48, David Marso пишет:
Indeed!
Consider the following demo ;-(

MATRIX.
COMPUTE X0=MAKE(100,1,1).
COMPUTE X1=UNIFORM(100,1).
COMPUTE X2=X1*2.
COMPUTE X3=(X1+X2)/2 .
COMPUTE X={X0,X1,X2,X3}.
COMPUTE Y=X0 + .5*X1 + .7*X2 + X3.
/* Following fails due to Singular X'X */.
/*COMPUTE B=INV(T(X)*X)*T(X)*Y.
COMPUTE B=GINV(T(X)*X)*T(X)*Y.
/* Estimates do not reproduce coefficients from generating function */ .
PRINT B.
SAVE {Y,X} /OUTFILE * / VARIABLES Y X0 X1 X2 X3.
END MATRIX.
REGRESSION DEPENDENT Y / METHOD ENTER X1 TO X3.





Avast logo

Это сообщение проверено на вирусы антивирусом Avast.
www.avast.com


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD