One of the often-ignored assumptions of regression is that the predictor variable must be measured without error. Since mixed models incorporate the assumptions of regression, I am assuming SPSS MIXED also requires precise predictor measurements. Admittedly, I have had a hard time finding the exact assumptions of mixed models, since they are not covered even in the most recent book by Brady West. As a result, I am wondering which statistical tests are appropriate when the data have errors in both x and y (predictor and response variables), and which of these tests can be done in SPSS.

Some research turned up total/generalized least squares as the answer, but SPSS does not seem to have this option (the R plug-in allows partial least squares). I am also not sure whether the robust regression or bootstrap regression available in SPSS would address the issue. Suggestions or solutions would be appreciated. A further possible complication is data that are nonnormal or nonlinear, or both.

Emil
I have never worried much about errors-in-variables because it does not affect the testing. I was concerned about the tests, and never took exact coefficients too seriously. Neither did anyone else in my particular area; we paid attention to the testing. If the tests are your concern, then do not be worried. A method that takes e-in-v into account will produce a different estimate of the coefficients. The notion here is the same as "correcting for attenuation" when looking at simple Pearson r's. That is not something that most people do. Or expect.

"Normal" is an assumption that applies to the residuals. It is the big outliers, or correlated outliers, that affect the robustness of the testing.

"Nonlinearity" is often misunderstood as a characteristic of the predictor, whereas it should be applied to the relationship between predictor and outcome. I find it useful to think of the "equal-interval" relationship, where equal intervals of change in the predictor should result in equal intervals of change in the outcome.

-- Rich Ulrich
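(For reference, the "correcting for attenuation" Rich mentions is Spearman's classic formula: the observed correlation is divided by the square root of the product of the two measures' reliabilities,

   r_corrected = r_xy / sqrt(r_xx * r_yy).

As an illustration with assumed numbers: an observed r of .80 between measures with reliabilities .90 and .80 corrects to .80 / sqrt(.72), or about .94.)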
Hi Rich,
When you say it does not affect the tests, do you mean that the statistical results would be identical whether errors-in-variables are ignored or included? I am wondering if there are any references or simulations to this end. When I was searching the scientific literature (biology), I found that authors always ignored predictor errors and even excluded horizontal error bars, but biological papers aren't a benchmark of good statistics, so I wasn't sure if that approach was correct.

Much could be said about residual normality, since SPSS only outputs conditional residuals for MIXED, yet West says that normality should be assessed using the studentized or standardized residuals/eBLUPs.

Interesting point on the definition of nonlinearity. It would seem that definition will always be satisfied automatically, unless one of the axes is categorical data that is treated as continuous without proper transformation. I've seen colleagues make a graph using x-axis values that are not equidistant in terms of measurement (20V, 60V, 100V, ...) yet are plotted and analyzed as if they were (1, 2, 3, ...). This is essentially a rank transformation. For categorical analyses this doesn't matter, except when these values are repeated measures and the users are trying to establish polynomial relationships with rANOVA.

Emil
In reply to this post by Rudobeck, Emil (LLU)
Issues of errors-in-x and errors-in-y are covered briefly in the following source:

Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Psychology Press.

NOTE: LEA originally published the text in 1991, but Psychology Press, which bought the LEA catalog, re-issued a non-updated version in 2013, which is why one may see 2013 as the publication year, as on books.google.com:
https://books.google.com/books?hl=en&lr=&id=WXt_NSiqV7wC&oi=fnd&pg=PR2&dq=pedhazur+schmelkin&ots=7svoK7egOQ&sig=o1I40WF9mrHUqnO_kkoHySHzy3I#v=onepage&q=pedhazur%20schmelkin&f=false

Quoting from page 391:

|(for detailed discussions, see Blalock, Wells, & Carter, 1970;
|Bohrnstedt & Carter, 1971; Cochran, 1968, 1970; Linn & Werts,
|1982). Unlike simple regression analysis, random measurement
|errors in multiple regression may lead to either overestimation or
|underestimation of regression coefficients. Further, the biasing
|effects of measurement errors are not limited to the estimation
|of the regression coefficient for the variable being measured but
|affect also estimates of regression coefficients for other variables
|correlated with the variable in question. Thus, estimates of regression
|coefficients for variables measured with high reliability may be
|biased as a result of their correlations with variables measured
|with low reliability.
|
|Generally speaking, the lower the reliabilities of the measures
|used and the higher the intercorrelations among the variables,
|the more adverse the biasing effects of measurement errors.
|Under such circumstances, regression coefficients should be
|interpreted with great circumspection. Caution is particularly
|called for when attempting to interpret magnitudes of standardized
|regression coefficients as indicating the relative importance
|of the variables with which they are associated. It would be wiser
|to refrain from such attempts altogether when measurement
|errors are prevalent.
|
|Thus far, we have not dealt with the effects of errors in the
|measurement of the dependent variable. Such errors do not lead
|to bias in the estimation of the unstandardized regression
|coefficient (b). They do, however, lead to the attenuation of the
|correlation between the independent and the dependent variable,
|hence, to the attenuation of the standardized regression
|coefficient (beta). (footnote 18) Because 1 - r^2 (or 1 - R^2 in
|multiple regression analysis) is part of the error term, it can
|be seen that measurement errors in the dependent variable
|reduce the sensitivity of the statistical analysis.
|
|Of various approaches and remedies for managing the magnitude
|of the errors and taking into account their impact
|on the estimation of model parameters, probably the most
|promising are those incorporated in structural equation modeling
|(SEM). Chapters 23 and 24 are devoted to analytic approaches
|for such models, where it is also shown how measurement
|errors are taken into account when estimating the parameters
|of the model.
|
|Although approaches to managing measurement errors are useful,
|greater benefits would be reaped if researchers were to pay
|more attention to the validity and reliability of measures; if they
|directed their efforts towards optimizing them instead of attempting
|to counteract adverse effects of poorly conceived and poorly
|constructed measures.
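(To state the simple-regression case compactly: if the observed predictor is x = T + u, with error u independent of the true score T, the OLS slope is attenuated toward zero by the reliability ratio,

   b_observed ~ beta_true * var(T) / (var(T) + var(u)),

so a predictor with reliability .8 yields a slope roughly 80% of the true one. This is the standard errors-in-variables result behind the quoted passage; with multiple correlated predictors, as the quote notes, the bias can go in either direction.)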
In psychology it has been traditional to use SEM to construct a measurement model for the predictors X (true value + error), relating the latent variables to each other and to the outcome or dependent variable (Y or, if Y is measured with error, its latent variable). For an example using AMOS, see:
http://www.spss.com.hk/amos/measurement_error_application.htm

The answer to the question "can one ignore measurement error in the x-variables" depends on the differences between analyses that ignore it (traditional) and those that incorporate it (e.g., SEM).

For a more extensive presentation on the role of measurement error in linear and nonlinear models, see:

Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models: A modern perspective. CRC Press.

Parts can be previewed on books.google.com; see:
https://books.google.com/books?id=9kBx5CPZCqkC&pg=PA52&dq=%22multiple+regression%22+%22measurement+error%22&hl=en&sa=X&ved=0ahUKEwiZoIqT0sPMAhUEHx4KHWg1CCoQ6AEIMDAD#v=onepage&q=%22multiple%20regression%22%20%22measurement%20error%22&f=false

Finally, Bayesian methods have also been employed to deal with this problem, and examples of doing this in R are available on the R-bloggers website; see:
http://www.r-bloggers.com/bayesian-type-ii-regression/
and
http://www.r-bloggers.com/errors-in-variables-models-in-stan/

-Mike Palij
New York University
Thanks Mike. As expected, the solution isn't simple. I will need to read up on SEM; luckily I have AMOS. The Bayesian approach will be a much bigger leap. I found out that Deming regression, which can also be used, has been submitted as an enhancement request for SPSS; I don't know when they will incorporate it. I also found out that two-stage least squares regression (2SLS) is an alternative to SEM and is available in SPSS. If anyone here knows the differences between the 2SLS and SEM approaches, it would be interesting to find out. I will have to find some good sources on the practical application of SPSS and AMOS for using either 2SLS or SEM.
If one has to run SEM only to show that the results aren't much different from regression, then I don't see the point, since for a publication both analyses would need to be reported. As such, it would make sense to do only SEM to begin with and simplify the findings.

There is a third approach which to me doesn't seem to be incorrect: if you are experimentally measuring Y11, Y12 in the same animal, then Y21 and Y22 in another animal, all with the same predefined X (without error), then instead of graphing the means of Y11, Y21 vs Y12, Y22 (which would result in horizontal error), another approach would be simply to use ratios to get rid of the horizontal error. So in this case it would be the means of Y11/Y12, Y21/Y22 vs X (no measurement error). The main issue with the latter approach is that ratios are more difficult to explain than a simple Y vs X graph. If I'm overlooking something statistically, let me know.

Emil
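(For readers unfamiliar with it: Deming regression takes the ratio delta = error variance of y / error variance of x as known, and in its usual textbook form estimates the slope as

   b = ( s_yy - delta*s_xx + sqrt( (s_yy - delta*s_xx)^2 + 4*delta*s_xy^2 ) ) / ( 2*s_xy ),

where s_xx, s_yy, and s_xy are the sample variances and covariance. With delta = 1 this reduces to orthogonal regression.)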
I was going to mention 2SLS, which is based on instrumental variables. IVs were the original (as far as I know) method for dealing with errors-in-variables problems. Besides the built-in 2SLS command in Statistics, there is an extension command called STATS EQNSYSTEM that provides a variety of estimators for equation systems.
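For concreteness, a minimal sketch of the built-in procedure (variable names hypothetical: x1 is the error-prone predictor, z1 its instrument, and the exogenous predictor x2 is listed among the instruments as usual; see the Command Syntax Reference for the full specification):

2SLS y WITH x1 x2
  /INSTRUMENTS = z1 x2
  /CONSTANT.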
Also, assuming that you have some idea of the error variance, you can explore the effect on your estimates and tests by adding random noise to the variables to see how it affects the results. Generally you will find that multicollinearity exacerbates the effect of errors in variables.
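In syntax, that exploration can be as simple as the following (a sketch; the variable names and the assumed error SD of 0.5 are illustrative):

* Perturb the predictor with random measurement error and refit.
SET SEED=20160505.
COMPUTE x_noisy = x + RV.NORMAL(0, 0.5).
EXECUTE.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x_noisy.

Repeating the run with different seeds or error SDs shows how sensitive the coefficients and tests are to a given amount of error.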
In reply to this post by Rudobeck, Emil (LLU)
Emil, to take your paragraphs in order:
1. The /test/ results (but not the coefficients) will be robust when ignoring e-in-v. In some cases, this result is obvious from inspection of the computation of the e-in-v error terms, which simply translate the robust tests and pretend that the errors still apply to the new coefficients -- despite some extra (and not error-less) manipulation. And if you are considering SEM (or AMOS), they are famous for not providing meaningful tests ... so you compare your /set/ of models and have to be happy if the best one seems much better than the others. They are for estimation of coefficients and alternate paths, not for testing the minimal existence of an effect.

2. "Residuals" matter because it is their squared sums that form the chi-squared variate making up the ANOVA F-test. They don't have to be really great, but you don't want one or two outliers contributing half the sum of squares. If your sample is small, you don't have much power for /testing/ normality; if your sample is large enough, you don't have much concern, because the F-test will still be pretty good. (Pay more attention to heterogeneity, or correlation.) Consider what is being measured ... does it seem like equal intervals (with the outcome in mind)?

3. Nonlinearity. Apparently you missed my meaning entirely. Obviously, a linear equation will produce equal predictions for equal intervals. But: does common sense tell you that is realistic? If you don't have a particular outcome in mind, consider the "latent factor" that is supposed to be measured by your score. There is a fairly big difference in status (outcome) between scoring 3 errors (failing?) on a dementia scale versus 0 errors (healthy) out of 31, whereas there is very little difference between scoring 20 versus 23 (seriously dysfunctional). For "20 versus 23", you might seriously wonder whether the patient would vary by that much if re-tested a few hours later. For the particular scale I have in mind, for a sample that spanned the range of scores, I think I recommended using the square root of the number of errors. Almost any model building with that score /ought/ to be concerned with that latent factor, and not with the count of errors. [Actually, the protocol scored up to 31 -- which created an unfortunate bias toward "demented" when a patient was deaf/blind/whatever and could not be scored on some items.]

Weekly evaluation of a new psychiatric treatment is not done at "equal intervals" in time from the start of treatment. Doing follow-ups at (4 days, a week, 2 weeks, 4 weeks, 8 weeks) will be an approximation of equal intervals for outcome; it will not be perfect, but it will be economical, it will avoid the negative effects of "overtesting", and it will be far more linear in changes than counting days as equal. Clinical investigators typically /choose/ their intervals to represent what they expect to approximate equal amounts of change, up to the maintenance phase. If you know the "linearity" that the clinician expects, it makes sense to build your default model on the spacing and then, if you want, test for the departure from the expected linearity.

-- Rich Ulrich
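In syntax, the kind of re-expression Rich describes in point 3 is a one-liner (a sketch; the variable name and the square-root choice are illustrative):

* Re-express an error count so equal intervals better track the latent severity.
COMPUTE errors_t = SQRT(errors).
EXECUTE.

The same logic applies to the follow-up times: code the visits as 1, 2, 3, 4, 5 (the clinician's intended equal-change spacing) rather than as 4, 7, 14, 28, 56 days, and then, if desired, test for departure from that expected linearity.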
In reply to this post by Jon Peck
I think that there is some confusion about (a) instrumental variables and (b) 2SLS analysis. Let me suggest the following chapter by Ken Bollen:

Bollen, K. A. (2012). Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38, 37-72.

In his Figure 1, he provides the common definition of instrumental variables (IV), namely, that there is a covariance/correlation between a predictor X and the MODEL error e. This can occur even if X is not a latent variable (i.e., X = Xi + epsilon-x, in LISREL notation).

In his Figure 2, Bollen identifies several conditions where X can be correlated with epsilon-y, the model error:

(1) Figure 2b represents the "measurement error in X" that we have been discussing so far. OLS regression uses the empirical X, which combines Xi + epsilon-x -- because Xi is correlated with Y, epsilon-x will become correlated with epsilon-y (see page 39). Creating the appropriate measurement model for X, that is, Xi and epsilon-x, allows one to use Xi in the regression, and epsilon-x then stands alone, independent of all other entities.

(2) Figure 2a represents a model where empirical X and Y have a feedback relationship (reciprocal causation) and both have epsilon terms that are correlated and influence their associated empirical indicators (i.e., epsilon-x is causally related to X and epsilon-y is causally related to Y).

(3) Figure 2d assumes X is a lagged version of Y (a measure of Y at a prior time), which induces an autoregressive relationship between epsilon-x and epsilon-y, each affecting X and Y similarly to Figure 2a.

(4) Figure 2c assumes that a variable L is omitted but is causally related to X and Y. L's relationship to Y is expressed through epsilon-x.

So, 2SLS can be used to correct for the correlation between epsilon-x and epsilon-y, but measurement error in X is just one situation where this occurs -- one has to determine whether one's data represent the model in Figure 2b or one of the other models, which will require different solutions. Ken Bollen has been studying this situation for a while, and he has suggested various alternative analyses (he does not identify software package solutions, so one would have to write the code to identify the appropriate model and then modify the regression appropriately). On page 59, Bollen cites one of his papers in which he proposed a two-stage analysis strategy that can assist in determining the number of instrumental variables to use; he also reviews other methods that can be used to check one's model. His Table 1 (page 64) contains a list of references and which method of analysis was used (e.g., SEM, 2SLS, etc.) to evaluate each model.

This was published in 2012, but one might want to look at an earlier paper by Bollen where he argues for two-stage analyses:

Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1), 109-121.

Bollen has published post-2012 papers that one might also want to look at:

Bollen, K. A., & Pearl, J. (2013). Eight myths about causality and structural equation models. In Handbook of causal analysis for social research (pp. 301-328). Springer Netherlands.

Bollen, K. A., Kolenikov, S., & Bauldry, S. (2014). Model-implied instrumental variable -- generalized method of moments (MIIV-GMM) estimators for latent variable models. Psychometrika, 79(1), 20-50.
-Mike Palij
New York University
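Whichever of Bollen's Figure 2 models applies, an instrument is only useful if it is strongly related to the predictor it stands in for. A quick first-stage check in syntax (a sketch; variable names hypothetical):

* First stage: regress the error-prone predictor on the candidate instruments.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT x1
  /METHOD=ENTER z1 z2.

A weak first stage (low R-square, small F) is a warning that the 2SLS estimates will be unstable.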
In reply to this post by Rich Ulrich
I have not been able to find any references about safely ignoring measurement errors and still achieving unbiased results. Tellinghuisen's Monte Carlo simulations showed that OLS should be used if both X and Y are homoscedastic, which is not easy to satisfy in biological situations. But that's still different, because under such conditions the coefficients are accurate, not just the test results. I don't really understand how the coefficients can be inaccurate but the test results accurate, when significance testing compares those very same coefficients. Does anyone have any references, especially Monte Carlo simulations? Maybe I'm not using the right keywords in my searches.

For understanding the results or publishing them, the coefficients themselves are rather important to give some idea of the effect size, even if it is not formally calculated/standardized.

Rich, your point about latent variables and their relationship to linearity seems to be more about the biological theory. If one is using a particular scale or method (hence theory) that has been developed by prior scientists, then any nonlinearity is valid based on that method. One needs to come up with a new method and a scale/relationship if there is reason to believe the latent variables are not properly represented. But in either case, even nonlinear data can sometimes be transformed, purely mathematically, into a linear counterpart before worrying about nonlinear analyses, be it a square root or some other transformation that works. If I understood your example, the approach there was mathematical as well -- the theory of how errors should be scaled or measured, or the use of a different measurement system, was not addressed.

Here is the link to the article again, in case the hyperlink above doesn't work:
http://www.ncbi.nlm.nih.gov/pubmed/20577693
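In the absence of a published simulation, a small one of exactly this kind can be run directly in syntax (a sketch under assumed values: true slope 2, unit true-score variance, error SD 0.5 on x):

* Simulate a predictor measured with error, then fit OLS on the observed x.
SET SEED=12345.
INPUT PROGRAM.
LOOP #i = 1 TO 1000.
  COMPUTE xtrue = RV.NORMAL(0, 1).
  COMPUTE xobs = xtrue + RV.NORMAL(0, 0.5).
  COMPUTE y = 2*xtrue + RV.NORMAL(0, 1).
  END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
REGRESSION /DEPENDENT y /METHOD=ENTER xobs.

With these values the reliability of xobs is 1/(1 + 0.25) = 0.8, so the fitted slope should come out near 2 * 0.8 = 1.6 rather than 2: the coefficient is attenuated, yet the t-test of "slope = 0" remains a valid (if less powerful) test of association -- which is the distinction Rich is drawing.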
In reply to this post by Mike
Instrumental variables for errors in the independent variables have a long history -- back to 1945. While 2SLS is focused on dealing with the problem of correlation between endogenous regressors and the error term(s) in a set of equations, the use of instrumental variables is not confined to this situation. In a simultaneous-equations setting, instruments are taken from exogenous variables in all the equations of the model. Absent such a model, selection of instruments may be more ad hoc, but variables that are correlated with the systematic portion of the regressors and with neither the equation error terms nor the measurement error are suitable. Accuracy requires strong correlation between the instruments and the systematic part of the regressors.

I'm having trouble at the moment logging in to JSTOR to access good online references, but on my shelf I have Malinvaud, Statistical Methods of Econometrics, and C10, Linear Models with Errors in Variables, section 7 discusses the mathematical details and properties of the IV estimator.

The 2SLS procedure in Statistics allows you to specify the instrumental variables along with the dependent and predictor variables without explicitly specifying other equations of the model.
Hi Jon,
With all due respect I have to ask the following questions: (1) Did you read any of the references that I listed in my previous posts? Specifically: Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1), 109-121. I assume that you are up on the current literature and would not rely on out-of-date references. (2) In the IBM SPSS website that provides information on the 2SLS procedure, they list as the source for the algorithms two books by Thiel from 1953; see: http://www.ibm.com/support/knowledgecenter/SSLVMB_20.0.0/com.ibm.spss.statistics.help/alg_2sls_references.htm?lang=sl I note that this is for ver 20 of SPSS but I have to ask: have there been no updates to the algorithms or how to solve these problems since 1953? My reading of Bollen and others suggest that there may have been. SIDENOTE: One reason I ask about the use of possibly outdated algorithms is because I recently became aware that AMOS uses a formula for the "Modification Indexes" or Lagrangian multipliers that originally comes from LISREL V and which was replaced by a newer algorithm in LISREL VI. The SEM software EQS, LISREL, and SAS Calis all use the newer algorithm and provide the same values for the modification indexes but AMOS provides different (lower) values. Tabachnick and Fidell 5th Edition shows how all of these program handle a common dataset and do not comment on why AMOS gives different results..Examination of the AMOS manual would lead one to think that it also uses the newer algorithm that LISREL VI and other programs use but this is clearly wrong. Which leads one to wonder why no one has (a) clearly explained why this is being done and that one should expect results that do not agree with other software, and (b) who decides to keep this as a "feature" instead of updating the software? (3) I note that the most recent edition of the text by Malinvaud that you refer to below is 1980 (3rd ed) according to WorldCat. I can't make out what the C10 reference is. You seem to imply that there have been no new developments in these areas since 1980 (though Bollen and others appear to imply otherwise). Is this true? Just wondering. -Mike Palij New York University [hidden email] ----- Original Message ----- From: Jon Peck To: Mike Palij Cc: [hidden email] Sent: Friday, May 06, 2016 3:30 PM Subject: Re: Statistics with errors in the x-variable Instrumental variables for errors in the independent variables has a long history - back to 1945. While 2SLS is focused on dealing with the problem of correlation between endogenous regressors and the error term(s) in a set of equations, the use of instrumental variables is not confined to this situation. In a simultaneous equations setting, instruments are taken from exogenous variables in all the equations of the model. Absent such a model, selection of instruments may be more ad hoc, but variables that are correlated with the systematic portion of the regressors and neither the equation error terms nor the measurement error are suitable. Accuracy requires strong correlation between the instruments and the systematic part of the regressors. I'm having trouble at the moment logging in to JSTOR to access good online references, but on my shelf I have Malinvaud, Statistics Methods of Econometrics, and C10, Linear Models with Errors in Variables, section 7 discusses the mathematical details and properties of the IV estimator. 
The 2SLS procedure in Statistics allows you to specify the instrumental variables along with the dependent and predictor variables without explicitly specifying other equations of the model. On Fri, May 6, 2016 at 11:24 AM, Mike Palij <[hidden email]> wrote: I think that there is some confusion about (a) instrumental variables and (b) 2SLS analysis. Let me suggest the following chapter by Ken Bollen: Bollen, K. A. (2012). Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38, 37-72. In his Figure 1, he provide the common definition of Instrumental Variables (IV), namely, there is a covariance/correlation between a predictor X' and the MODEL error e. This can occur evem if X is not a latent variable (i.e., X = Xi/Ksi + epsilon-x, in LISREL notiation). In his Figure 2, Bollwns identifies several conditions where X can be correlated with epsilon-y, the model error: (1) Figure 2b represents the "measurement error in X" that we have been disccusiong so far. OLS regression uses the empirical X which combine Xi + epsilon-x -- becaise Xi is correlated with Y, epsilon-x will become correlated with epsilon-y (see page 39). Creating the appropriate measurement model for X, that is, Xi and epsilon-x, allows one to use Xi in the regression and epsilon-x now stands alone, independent of all other entities. (2) Figure 2a represents a model where empirical X and Y have a feedback relationship (reciprical causation) and both have epsilon terms that are correlated and influence their associated empirical indicators (i.e., epsilon-x is causally related to X and epsilon-y is causally related to Y). (3) Figure 2d assumes X is a lagged version of Y (a measure of Y at a prior time) which induced an autoregressive relationship between epsilon-x and expsilong-y, and each affect X and Y similar to that in Figure 2a. (4) Figure 2c assumes that a variable L is omitted but is causally related to X and Y. L relationship to Y is expressed through epsilon-x. So, 2SLS can be used to correct for the correlation between epsilon-x and epsilon-y but measurement error in x is just one situation whete this occurs -- one has to determine whether one data represents the model in Figure 2b or the in the other models, which will require different solutions. Ken Bollen has been studying this situation for a while and he has suggested various alternative analyses (he does not identify software package solutions, so one would have to write the code to identify the appropriate model and then modify the regression appropriately. On page 59 Bollen cites one of his papers where he proposed a 2-stage analysis strategy that can assist in determining the number of instrumental variables to use; he reviews other methods tht can be used to check one's model. His Table 1 (page 64) contains a list of references, which method of analysis was used (e.g., SEM, 2SLS, etc.) to evaluate a model. This was published in 2012 but one might want to look at an earlier paper by Bollen where he argues for 2-stage analyses: Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1), 109-121. Bollen has published post 2012 papers that one might also want to look at: Bollen, K. A., & Pearl, J. (2013). Eight myths about causality and structural equation models. In Handbook of causal analysis for social research (pp. 301-328). Springer Netherlands. Bollen, K. A., Kolenikov, S., & Bauldry, S. (2014). 
Model-Implied Instrumental Variable—Generalized Method of Moments (MIIV-GMM) estimators for latent variable models. Psychometrika, 79(1), 20-50.

-Mike Palij
New York University
[hidden email]

----- Original Message -----
From: Jon Peck
To: [hidden email]
Sent: Thursday, May 05, 2016 10:34 PM
Subject: Re: Statistics with errors in the x-variable

I was going to mention 2SLS, which is based on instrumental variables. IVs were the original (as far as I know) method for dealing with errors-in-variables problems. Besides the built-in 2SLS command in Statistics, there is an extension command called STATS EQNSYSTEM that provides a variety of estimators for equation systems. Also, assuming that you have some idea of the error variance, you can explore the effect on your estimates and tests by adding random noise to the variables to see how it affects the results (also sketched at the end of this post). Generally you will find that multicollinearity exacerbates the effect of errors in variables.

--------------------------------------------
On Thursday, May 5, 2016, Rudobeck, Emil (LLU) <[hidden email]> wrote:

Thanks Mike. As expected, the solution isn't simple. I will need to read up on SEM. Luckily I have AMOS. The Bayesian approach will be a much bigger leap. I found out that Deming regression, which can also be used, has been submitted as an enhancement request for SPSS; I don't know when they will incorporate it. I also found out that two-stage least squares regression (2SLS) is an alternative to SEM and is available in SPSS. If anyone here knows the differences between the 2SLS and SEM approaches, it would be interesting to find out. I will have to find some good sources on the practical application of SPSS and AMOS for using either 2SLS or SEM. If one has to run SEM to show that the results aren't much different from regression, then I don't see the point, since for a publication both tests would need to be reported. As such, it would make sense to do only SEM to begin with and simplify the findings.

There is a third approach which, to me, doesn't seem incorrect: if you are experimentally measuring Y11, Y12 in the same animal, then Y21 and Y22 in another animal, all with the same predefined X (without error), then instead of graphing the means of Y11, Y21 vs Y12, Y22 (which would result in horizontal error), another approach would be to simply use ratios to get rid of the horizontal error. So in this case it would be the means of Y11/Y12, Y21/Y22 vs X (no measurement error). The main issue with the latter approach is that ratios are more difficult to explain than a simple Y vs X graph. If I'm overlooking something statistically, let me know.

-- Jon K Peck
[hidden email]
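P.S. For concreteness, here are minimal sketches of the two suggestions above. The variable names (y, x, z, xnoisy) are hypothetical stand-ins; z would be an instrument such as a second, independently obtained measurement of x, and the error SD of 0.5 is a placeholder for whatever error estimate you actually have.

The built-in 2SLS command, for a single equation with the error-prone predictor instrumented:

  * Instrument the error-prone predictor x with z.
  2SLS y WITH x
    /INSTRUMENTS = z.

And the noise-sensitivity check, perturbing x by the assumed measurement error and re-fitting:

  * Perturb x by the assumed measurement-error SD (0.5 here) and re-fit.
  SET SEED=20160506.
  COMPUTE xnoisy = x + RV.NORMAL(0, 0.5).
  EXECUTE.
  REGRESSION
    /DEPENDENT y
    /METHOD=ENTER xnoisy.

Comparing the slope and its t statistic before and after the perturbation gives a rough sense of how much the suspected error in x matters for your particular data.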
In reply to this post by Rudobeck, Emil (LLU)
For understanding or publishing results in clinical studies like those I participated in, almost
everyone gives the (potentially) biased coefficients and is comfortable with them. You want to do what is conventional in your area, if there is a convention.

UNBIASED/attenuation. Suppose the IQ of identical twins is correlated at about 0.94. That is usually sufficient to say. However, a theoretical work might take into account the "true score" variability of the IQ test and report that this correlation is (say) 0.97 "when corrected for attenuation" (the standard formula is sketched later in this exchange). Note that I am describing the r and not the regression coefficient, which your citation favors. Also, here is what I have done myself: I decided that the correlation between two scales was essentially 1.0, based on inter-correlations at one time and across a short time, so I did not want to look at the scales as if they were distinct. But I would not attempt to put a confidence limit on the 0.97 or on the 1.00.

What I said about ignoring measurement errors is, "Do it, if the purpose is testing, and /not/ unbiased coefficients." On the other hand, I never mentioned 2SLS. If you add information from elsewhere, it is possible that you get tighter tests. I am not sure whether your citation suggests that one method discussed in that abstract gives tighter tests after re-estimating x: I can imagine that working in the instance of extremely high r's, but not in general.

NONLINEARITY. I really don't much understand your comments or your objections. Yes, I'm all for using natural units when they make sense. Yes, I'm very familiar with having PIs come to me with arbitrary scores or measures that deserve transformation in order to make sense ... and, at the same time, to fix problems that we might otherwise detect only with subtle testing.

-- Rich Ulrich

Date: Fri, 6 May 2016 18:44:49 +0000
From: [hidden email]
Subject: Re: Statistics with errors in the x-variable
To: [hidden email]

I have not been able to find any references about safely ignoring measurement errors and still achieving unbiased results.
Tellinghuisen's Monte Carlo simulations showed that OLS should be used if both X and Y are homoscedastic, which is not easy to satisfy in biological situations. But that's still different, because under such conditions the coefficients are accurate, not just the test results. I don't really understand how the coefficients can be inaccurate but the test results accurate when significance testing compares those very same coefficients. Does anyone have any references, especially Monte Carlo simulations? Maybe I'm not using the right keywords in my searches.
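One way to reconcile the two claims is the attenuation algebra, in the same errors-in-variables notation sketched earlier in the thread: OLS converges to $\lambda\beta$, where $\lambda = \sigma_\xi^2/(\sigma_\xi^2 + \sigma_\delta^2) \le 1$ is the reliability of $x$. The estimate is shrunk by $\lambda$, but if the null hypothesis $\beta = 0$ is true, then $\lambda\beta = 0$ as well, so a significance test of "no effect" keeps its nominal size; what you lose is power and an unbiased effect size. The same factor underlies the "correction for attenuation" of a correlation mentioned above,

$$ r_{\mathrm{corrected}} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}, $$

where $r_{xx}$ and $r_{yy}$ are the reliabilities of the two measures. With hypothetical reliabilities of about 0.97, for example, an observed 0.94 corrects to roughly $0.94/0.97 \approx 0.97$.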
For understanding the results or publishing them, the coefficients themselves are rather important for giving some idea of the effect size, even if it is not formally calculated or standardized.

Rich, your point about latent variables and their relationship to linearity seems to be more about the biological theory. If one is using a particular scale or method (hence theory) which has been developed by prior scientists, then any nonlinearity is valid based on that method. One would need to come up with a new method and a new scale/relationship if one believes that the latent variables are not properly represented. But in either case, even nonlinear data can sometimes be transformed, purely mathematically, into a linear counterpart before worrying about nonlinear analyses, be it by a square root or some other transformation that works. If I understood your example, the approach there was mathematical as well -- the theory of how errors should be scaled or measured, or the use of a different measurement system, was not addressed.

Here is the link to the article again, in case the hyperlink above doesn't work: http://www.ncbi.nlm.nih.gov/pubmed/20577693

From: Rich Ulrich [[hidden email]]
Sent: Friday, May 06, 2016 12:20 AM
To: Rudobeck, Emil (LLU); SPSS list
Subject: RE: Statistics with errors in the x-variable

Emil, to take your paragraphs in order:

1. The /test/ results (but not the coefficients) will be robust when ignoring e-in-v. In some cases, this result is obvious from inspection of the computation of the e-in-v error terms, which simply carry over the robust tests and pretend that the errors still apply to the new coefficients -- despite some extra (and not error-free) manipulation. And if you are considering SEM (or AMOS), those methods are famous for not providing meaningful tests ... so you compare your /set/ of models and have to be happy if the best one seems much better than the others. They are for estimation of coefficients and alternate paths, not for testing the minimal existence of an effect.

2. "Residuals" matter because it is their sum of squares that forms the chi-squared variate making up the ANOVA F-test. They don't have to look really great, but you don't want one or two outliers contributing half the sum of squares. If your sample is small, you don't have much power for /testing/ normality; if your sample is large enough, you don't have much concern, because the F-test will still be pretty good. (Pay more attention to heterogeneity, or correlation.) Consider what is being measured ... does it seem like equal intervals (with the outcome in mind)?

3. Nonlinearity. Apparently you missed my meaning entirely. Obviously, a linear equation will produce equal predictions for equal intervals. But: does common sense tell you that that is realistic? If you don't have a particular outcome in mind, consider the "latent factor" that is supposed to be measured by your score. There is a fairly big difference in status (outcome) between scoring 3 errors (failing?) on a dementia scale versus 0 errors (healthy) out of 31, whereas there is very little difference between scoring 20 versus 23 (seriously dysfunctional). For "20 versus 23", you might seriously wonder whether the patient would vary by that much if retested a few hours later. For the particular scale I have in mind, for a sample that spanned the range of scores, I think I recommended using the square root of the number of errors (a one-line sketch of that rescaling follows at the end of this post). Almost any model-building with that score /ought/ to be concerned with that latent factor, and not with the count of errors. [Actually, the protocol scoring reported scores out of 31 -- which created an unfortunate bias toward "demented" when a patient was deaf/blind/whatever and could not be scored on some item.]

Weekly evaluation of a new psychiatric treatment is not done at "equal intervals" in time from the start of treatment. Doing follow-ups at (4 days, a week, 2 weeks, 4 weeks, 8 weeks) will be an approximation of equal intervals for the outcome; it will not be perfect, but it will be economical, it will avoid the negative effects of "overtesting", and it will be far more linear in changes than counting days as equal. Clinical investigators typically /choose/ their intervals to represent what they expect to be approximately equal amounts of change, up to the maintenance phase. If you know the "linearity" that the clinician expects, it makes sense to build your default model on that spacing and then, if you want, test for departure from the expected linearity.

-- Rich Ulrich

[strip, earlier notes]
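The rescaling mentioned above is one line of syntax; the variable name errors is a hypothetical stand-in for the raw error count:

  * Model the square root of the error count rather than the raw count,
  * so that a one-error difference near 0 counts for more than one near the maximum.
  COMPUTE errors_rt = SQRT(errors).
  EXECUTE.

Whether a square root, a log, or some other power is the right choice depends on the latent severity the scale is meant to track.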