Dear All,

I am conducting a factor analysis with missing data and would like to treat the missing data as MAR (missing at random). I have a question about how the SPSS FACTOR procedure handles missing data with the subcommand /MISSING INCLUDE. Is it based on the MAR assumption? Any help is greatly appreciated. Thanks in advance.

Regards,
Placide

Placide Poba-Nzaou
Associate Professor, University of Quebec in Montreal, Montreal, Canada
You are putting more on INCLUDE than it can bear. All that INCLUDE means is that values marked as user-missing are treated as valid and included in the analysis as regular values. (System-missing values are always excluded.) This generally does not make sense for continuous variables. You can impute values with the MVA procedure, but multiple imputation is not supported in the FACTOR procedure.

-- Jon K Peck
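For readers following along, a minimal sketch of the FACTOR /MISSING choices being discussed in this thread; the item names are hypothetical, and LISTWISE is the default:

* MEANSUB replaces each missing value with that variable's mean.
FACTOR VARIABLES=item1 TO item10
  /MISSING=MEANSUB
  /EXTRACTION=PAF
  /ROTATION=VARIMAX.
* The default is /MISSING=LISTWISE (drop any case with a missing value);
* /MISSING=INCLUDE instead treats user-missing codes as ordinary data values.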
Thanks so much for your prompt response. I already tried the subcommand /MISSING MEANSUB, which gave me a different result compared to /MISSING INCLUDE. What does "Cases with user-missing values are treated as valid" really mean? Does it mean that the missing values are imputed with the means? The key point: with MAR data, how can I conduct a factor analysis in SPSS using all the data? (All missing values are user-missing values.)

Regards
In reply to this post by Poba-Nzaou
Hello Placide. You could use MVA to generate a matrix of EM correlations (or covariances) and use that matrix as input to the FACTOR command. Unfortunately, MVA does not have a /MATRIX subcommand to facilitate this, but my colleague Hillary Maxwell and I wrote a couple of macros to do that job. See the links below for details.

https://sites.google.com/a/lakeheadu.ca/bweaver/Home/statistics/spss/my-spss-page/emcorr
http://tqmp.org/Content/vol10-2/p143/p143.pdf

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
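A rough sketch of the workflow Bruce describes, with hypothetical item names; the step of turning MVA's printed EM correlations into a matrix dataset is what the linked macros automate:

* Estimate EM means, covariances, and correlations.
MVA VARIABLES=item1 TO item10
  /EM.

* Once the EM correlations are available as a matrix dataset in the active file
* (e.g., via the macros linked above), FACTOR can read them directly.
FACTOR
  /MATRIX=IN(COR=*)
  /EXTRACTION=PAF
  /ROTATION=OBLIMIN.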
In reply to this post by Poba-Nzaou
"Cases with user-missing values are treated as valid" means the worst thing you might imagine it means. Example: a height coded as -999 (for MISSING)
gets treated as a real, observed value of -999. You don't want this. I've had data where,
for a few variables, I could consider /zero/ as either MISSING or real. However, this is one
of those options where a clever, artificial-intelligence interface would warn you: YOU
ALMOST NEVER WANT TO DO THIS.
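To make that concrete, a small sketch (the -999 height code is from Rich's example; the other variables are hypothetical): once -999 is declared user-missing, /MISSING=INCLUDE tells FACTOR to feed the literal -999 values into the computations.

MISSING VALUES height (-999).

* With INCLUDE, any -999 codes enter the correlations as if they were real heights.
FACTOR VARIABLES=height weight age
  /MISSING=INCLUDE
  /EXTRACTION=PC.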
Mean-substituting is not /terrible/ for MAR. If I had a bunch of sparse and scattered
MISSING values that were MAR, I might do two analyses: does the result using mean-substitution
look essentially the same as the reduced-sample result (from using the default of
omitting every case with a missing value)?
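A sketch of that two-run comparison (item names hypothetical); if the two solutions differ noticeably, the missingness is doing real work and a more principled approach such as the EM matrix suggested earlier in the thread is safer:

* Analysis 1: default listwise deletion (reduced sample).
FACTOR VARIABLES=item1 TO item10
  /MISSING=LISTWISE
  /EXTRACTION=PAF
  /ROTATION=VARIMAX.

* Analysis 2: mean substitution (all cases retained).
FACTOR VARIABLES=item1 TO item10
  /MISSING=MEANSUB
  /EXTRACTION=PAF
  /ROTATION=VARIMAX.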
Thank you so much!!!
In reply to this post by Bruce Weaver
Bruce,
So is my understanding correct that one should use the EM correlation (or covariance) matrix as input to FACTOR to extract the loadings, and should use the imputed data (MVA can save EM-imputed data) if one then wants to compute factor scores?
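For reference, "MVA can save EM-imputed data" presumably refers to the OUTFILE keyword on MVA's /EM subcommand; a minimal sketch, with a hypothetical file path and item names:

* Write a single dataset in which missing values are replaced by EM-based imputations.
MVA VARIABLES=item1 TO item10
  /EM(OUTFILE='C:\temp\em_imputed.sav').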
In reply to this post by Rich Ulrich
Rich Ulrich wrote
> --- snip ---
> Mean-substituting is not /terrible/ for MAR.
> --- snip ---

Rich, John Graham (a well-known author on missing data) would not agree with you. This is an excerpt from his book, Missing Data: Analysis and Design (p. 51).

--- start of excerpt ---
Mean substitution is a strategy in which the mean is calculated for the variable based on all cases that have data for that variable. This mean is then used in place of any missing value on that variable. This is the worst of all possible strategies. Inserting the mean in place of the missing value reduces variance on the variable and plays havoc with covariances and correlations. Also, there is no straightforward way to estimate standard errors. Because of all the problems with this strategy, I believe that using it amounts to nothing more than pretending that no data are missing. I recommend that people should NEVER use this procedure. If you absolutely must pretend that you have no missing data, a much better strategy, and one that is almost as easy to implement, is to impute a single data set from EM parameters (see Chaps. 3 and 7) and use that.
--- end of excerpt ---

Here is a PDF of the book. It is on the Springer website, so linking to it would seem not to violate copyright:
https://link.springer.com/content/pdf/10.1007%2F978-1-4614-4018-5.pdf

Another nice (shorter) resource by Graham is his 2009 Annual Review of Psychology chapter:
https://www.personal.psu.edu/jxb14/M554/articles/Graham2009.pdf

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
Bruce,
Those are worthwhile comments.

I wish I had said,
> Mean-substituting is not /terrible/ for MAR... for factor analysis.

The choice may be "conservative results" versus "results based on artifacts."
And I did say: do the factoring two ways and compare the results. A problem
with other replacement methods for factor analysis is that the replacement
algorithm shapes the factor results.

MISSINGs create other problems for inference and testing, even when you
meet the assumptions of Missing at Random. And I don't like to trust that
missings are at random.

If you replace a large number of missings, the resulting over-estimate of the
d.f. for tests might be too much.

Also, in a sample of size N with k replacements, you are messing with a k/N
(expected) share of the variance (though, you hope, the "mess" is small for
each case). That k/N suggests that, in an ANOVA setting, an R-squared
near 1.0 is much more disrupted than an R-squared near zero: compare the
fraction k/N to an error variance of the underlying data of 5% versus one
of 95%. Roughly speaking.

When there is a bunch missing, you really need to be careful, and I don't
think there's a single answer that fits all cases for multivariate data.
--
Rich Ulrich
In reply to this post by Bruce Weaver
The approach described in my article with Maxwell is based directly on this excerpt from John Graham's Annual Review of Psychology article (https://www.personal.psu.edu/jxb14/M554/articles/Graham2009.pdf).

--- start of excerpt (p. 556) ---
Good uses of the EM algorithm. Although the EM algorithm provides excellent parameter estimates, the lack of convenient standard errors means that EM is not particularly good for hypothesis testing. On the other hand, several important analyses, often preliminary analyses, don't use standard errors anyway, so the EM estimates are very useful. First, it is often desirable to report means, standard deviations, and sometimes a correlation matrix in one's paper. I would argue that the best estimates for these quantities are the ML estimates provided by EM. Second, data quality analyses, for example, coefficient alpha analyses, because they typically do not involve standard errors, can easily be based on the EM covariance matrix (e.g., see Enders 2003; Graham et al. 2002, 2003). The EM covariance matrix is also an excellent basis for exploratory factor analysis with missing data. This is especially easy with the SAS/STAT software program (SAS Institute); one simply includes the relevant variables in Proc MI, asking for the EM matrix to be output. That matrix may then be used as input for Proc Factor using the "type = cov" option.
--- end of excerpt ---

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
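For readers without SAS, the SPSS analogue of that last step is FACTOR's matrix input. A sketch with a made-up three-variable correlation matrix and sample size; in practice the matrix dataset would come from MVA's EM output via the macros linked above:

MATRIX DATA VARIABLES=item1 item2 item3
  /FORMAT=LOWER DIAGONAL
  /CONTENTS=N CORR.
BEGIN DATA
200 200 200
1
.52  1
.38  .44  1
END DATA.

FACTOR
  /MATRIX=IN(COR=*)
  /EXTRACTION=PAF
  /ROTATION=OBLIMIN
  /PRINT=INITIAL EXTRACTION ROTATION.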