SPSSX Discussion

Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

Poba-Nzaou

Dec 05, 2018; 6:27pm

Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

Dear All,

I am conducting a factor analysis with missing data and would like to treat the missing data as MAR (missing at random). I have a question about how the SPSS FACTOR procedure handles missing data under the « missing INCLUDE » subcommand. Is it based on the MAR assumption? Any help is greatly appreciated.

Thanks in advance for your help.

Regards,

Placide

Placide Poba-Nzaou

Associate Professor,

University of Quebec in Montreal, Montreal, Canada

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Jon Peck

Dec 05, 2018; 8:04pm

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

You are putting more on INCLUDE than it can bear. All that the INCLUDE keyword means is that values marked as user-missing are treated as valid and included in the analysis as regular values. (System-missing values are always excluded.)

This generally does not make sense for continuous variables. You can impute values with the MVA procedure, but multiple imputation is not supported in the FACTOR procedure.

Jon K Peck
[hidden email]

Poba-Nzaou

Dec 05, 2018; 9:08pm

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

Thank you so much for your prompt response.

I already tried the subcommand « missing meansub », which gave me a different result from the subcommand « missing include ».

What does « Cases with user-missing values are treated as valid » really mean? Does it mean that the missing values are imputed by the means?

The key point: with MAR data, how can I conduct a factor analysis in SPSS using all the data? (All missing values are user-missing values.)

Regards

Bruce Weaver

Dec 05, 2018; 9:39pm

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

Administrator

In reply to this post by Poba-Nzaou

Hello Placide. You could use MVA to generate a matrix of EM correlations (or
covariances), and use that matrix as input to the FACTOR command.
Unfortunately, MVA does not have a /MATRIX sub-command to facilitate this.
But my colleague Hillary Maxwell and I wrote a couple macros to do that job.
See the links below for details.

https://sites.google.com/a/lakeheadu.ca/bweaver/Home/statistics/spss/my-spss-page/emcorr
http://tqmp.org/Content/vol10-2/p143/p143.pdf

HTH.

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Rich Ulrich

Dec 05, 2018; 10:24pm

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

In reply to this post by Poba-Nzaou

« Cases with user-missing values are treated as valid » means the worst thing that you imagine it might mean. For example, height coded as -999 (for MISSING) gets treated as a real, observed value of -999. You don't want this. I've had data where, for a few variables, I could consider /zero/ as either MISSING or real. However, this is one of those options where a clever, artificial-intelligence interface would warn you, YOU ALMOST NEVER WANT TO DO THIS.
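
To make the -999 example concrete, here is a minimal Python sketch (not SPSS syntax, and not how FACTOR is implemented). The heights, weights, and the `MISSING` code are made up for illustration. It compares the correlation computed after dropping the coded cases with the correlation computed when -999 is kept as if it were a real height.

```python
from math import sqrt

MISSING = -999  # hypothetical user-missing code, as in the height example


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up heights (cm) and weights (kg); two heights were not recorded.
height = [160, 165, 170, MISSING, 175, 180, MISSING, 185]
weight = [55, 60, 66, 90, 72, 77, 50, 84]

# Listwise deletion: drop every case whose height carries the missing code.
complete = [(h, w) for h, w in zip(height, weight) if h != MISSING]
r_listwise = pearson([h for h, _ in complete], [w for _, w in complete])

# "Include" behaviour: -999 is treated as an observed height.
r_include = pearson(height, weight)

print(f"listwise r = {r_listwise:.3f}")
print(f"include  r = {r_include:.3f}")
```

With these made-up numbers the strong height-weight correlation among the complete cases all but disappears once the two -999 codes are counted as real heights, and a correlation matrix distorted like that is what the factor extraction would then be based on.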

Mean-substituting is not /terrible/ for MAR. If I had a bunch of sparse and scattered MISSING, MAR, I might do two analyses. Does the result using mean-substitution look essentially the same as the reduced-sample result (from using the default of omitting every case with a missing value)?

Poba-Nzaou

Dec 06, 2018; 5:48am

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

Thank you so much !!!

Kirill Orlov

Dec 06, 2018; 3:05pm

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

In reply to this post by Bruce Weaver

Bruce,
So is my understanding correct: one should use the EM correlation (covariance) matrix in FACTOR to extract the loadings, and then use the imputed data (MVA can save EM-imputed data) if one wants to compute factor scores?

Bruce Weaver

Dec 06, 2018; 3:19pm

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

Administrator

In reply to this post by Rich Ulrich

Rich Ulrich wrote
> --- snip ---
> Mean-substituting is not /terrible/ for MAR.
> --- snip ---

Rich, John Graham (a well-known author on missing data) would not agree with
you. This is an excerpt from his book, Missing Data: Analysis and Design
(p. 51).

--- start of excerpt --
Mean substitution is a strategy in which the mean is calculated for the
variable based on all cases that have data for that variable. This mean is
then used in place of any missing value on that variable.

This is the worst of all possible strategies. Inserting the mean in place of
the missing value reduces variance on the variable and plays havoc with
covariances and correlations. Also, there is no straightforward way to
estimate standard errors. Because of all the problems with this strategy, I
believe that using it amounts to nothing more than pretending that no data
are missing. I recommend that people should NEVER use this procedure. If
you absolutely must pretend that you have no missing data, a much better
strategy, and one that is almost as easy to implement, is to impute a single
data set from EM parameters (see Chaps. 3 and 7) and use that.
-- end of excerpt ---
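
The variance reduction mentioned in the excerpt can be checked with a small Python sketch. The scores are made up and `mean_substitute` is a hypothetical helper, not an SPSS function. Because each substituted value sits exactly at the observed mean, it adds nothing to the sum of squares, so with k of N values replaced the sample variance shrinks by the factor (N - k - 1) / (N - 1).

```python
from statistics import mean, variance


def mean_substitute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


# Made-up scores; None marks a missing value.
scores = [2.0, 4.0, None, 6.0, None, 8.0, 10.0, None]

observed = [v for v in scores if v is not None]
filled = mean_substitute(scores)

n = len(scores)          # total number of cases
k = scores.count(None)   # number of substituted values

var_obs = variance(observed)       # sample variance of the observed values
var_filled = variance(filled)      # sample variance after mean substitution
shrink = (n - k - 1) / (n - 1)     # expected shrinkage factor

print(f"observed variance:         {var_obs:.3f}")
print(f"after mean substitution:   {var_filled:.3f}")
print(f"observed * (N-k-1)/(N-1):  {var_obs * shrink:.3f}")
```

The last two printed values agree, which is the shrinkage in question. Covariances shrink in a similar way, and since correlations are the input to a factor analysis, this is one route by which mean substitution can alter the factor solution.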

Here is a PDF of the book. It is on the Springer website, so it would seem not
to violate copyright.

https://link.springer.com/content/pdf/10.1007%2F978-1-4614-4018-5.pdf

Another nice (shorter) resource by Graham is his 2009 Annual Review of
Psychology chapter:

https://www.personal.psu.edu/jxb14/M554/articles/Graham2009.pdf

HTH.

Rich Ulrich

Dec 06, 2018; 6:14pm

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

Bruce,

- those are worthwhile comments.

I wish I had said

> Mean-substituting is not /terrible/ for MAR... for factor analysis.

The choice may be "conservative results" versus "results based on artifacts."

And I did say, do the factoring two ways and compare the results. A problem with other replacement methods for factor analysis is that the replacement algorithm shapes the factor results.

MISSINGs create other problems for inference and testing, even when you meet the assumptions of Missing at Random. And I don't like to trust that Missings are at random.

If you replace a large number of Missings, the over-estimate of d.f. for tests might be too much.

Also, in a sample of size N with k replacements, you are messing with a k/N (expected) share of the variance (though, you hope, the "mess" is small for each case). But that k/N suggests that in an ANOVA setting, an R-squared near 1.0 is much more disrupted than an R-squared near zero. Compare the fraction k/N to an error variance of the underlying data that is 5%, and then to the case where it is 95%. Roughly speaking.

When there is a bunch missing, you really need to be careful, and I don't think there's a single answer that fits all cases for multivariate data.

Rich Ulrich

Bruce Weaver

Dec 06, 2018; 7:43pm

Re: Question on INCLUDE instruction when managing missing data in a FACTOR ANALYSIS

Administrator

In reply to this post by Bruce Weaver

The approach described in my article with Maxwell is based directly on this
excerpt from John Graham's Annual Review of Psychology article
(https://www.personal.psu.edu/jxb14/M554/articles/Graham2009.pdf).

--- start of excerpt (p. 556) ---
Good uses of the EM algorithm. Although the EM algorithm provides excellent
parameter estimates, the lack of convenient standard errors means that EM is
not particularly good for hypothesis testing. On the other hand, several
important analyses, often preliminary analyses, don’t use standard errors
anyway, so the EM estimates are very useful. First, it is often desirable
to report means, standard deviations, and sometimes a correlation matrix in
one's paper. I would argue that the best estimates for these quantities are
the ML estimates provided by EM. Second, data quality analyses, for
example, coefficient alpha analyses, because they typically do not involve
standard errors, can easily be based on the EM covariance matrix (e.g., see
Enders 2003; Graham et al. 2002, 2003). The EM covariance matrix is also
an excellent basis for exploratory factor analysis with missing data. This
is especially easy with the SAS/STAT software program (SAS Institute); one
simply includes the relevant variables in Proc MI, asking for the EM matrix
to be output. That matrix may then be used as input for Proc Factor using
the "type = cov" option.
--- end of excerpt ---
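
For readers who want to see where EM means and covariances come from, here is a minimal Python sketch. It is restricted to two variables with x fully observed and y partly missing, and assumes bivariate normality. It is not the algorithm as implemented in SPSS MVA or SAS Proc MI, and the function name `em_bivariate` and the data are hypothetical.

```python
def em_bivariate(x, y, iterations=200):
    """EM estimates (ML, divisor n) of means, variances and covariance
    for bivariate normal data; x is complete, y may contain None."""
    n = len(x)
    y_obs = [b for b in y if b is not None]

    # x is fully observed, so its ML mean and variance never change.
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x) / n

    # Starting values for the y parameters: observed-case moments, zero covariance.
    my = sum(y_obs) / len(y_obs)
    syy = sum((b - my) ** 2 for b in y_obs) / len(y_obs)
    sxy = 0.0

    for _ in range(iterations):
        slope = sxy / sxx
        resid_var = syy - slope * sxy  # conditional variance of y given x

        # E-step: expected sufficient statistics for y, y^2 and x*y.
        sum_y = sum_yy = sum_xy = 0.0
        for a, b in zip(x, y):
            if b is None:
                ey = my + slope * (a - mx)   # E[y | x]
                eyy = ey * ey + resid_var    # E[y^2 | x]
            else:
                ey, eyy = b, b * b
            sum_y += ey
            sum_yy += eyy
            sum_xy += a * ey

        # M-step: update the parameters from the expected statistics.
        my = sum_y / n
        syy = sum_yy / n - my * my
        sxy = sum_xy / n - mx * my

    return mx, my, sxx, syy, sxy


# Made-up data: y is missing for three of the eight cases.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.2, None, 4.8, None, 7.1, None]

mx, my, sxx, syy, sxy = em_bivariate(x, y)
print(f"EM mean of y: {my:.3f}")
print(f"EM correlation: {sxy / (sxx * syy) ** 0.5:.3f}")
```

The E-step adds the residual variance to the expected squares, so the missing values are not simply replaced by regression predictions, and the estimated variance of y is not artificially shrunk. With more variables the same idea yields the full EM covariance matrix that the excerpt recommends as input to an exploratory factor analysis.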

HTH.

Ann De Jonghe

Dec 06, 2018; 7:50pm

Re: SIGNOFF SPSSX-L

In reply to this post by Rich Ulrich
