SPSSX Discussion

Treatment of missing data in Mixed

Maguin, Eugene

Treatment of missing data in Mixed

All,

Sorry, hit the wrong key!

Somebody told me that Mixed uses the EM algorithm to include cases with missing data. My understanding, based on the documentation for the missing subcommand, is that EM was not used.

- Cases, which contain system-missing values in one of the variables, are always deleted.
- The keywords EXCLUDE and INCLUDE are mutually exclusive. Only one of them can be specified at once.

EXCLUDE  Exclude both user-missing and system-missing values. This is the default.
INCLUDE  User-missing values are treated as valid. System-missing values cannot be included in the analysis.

Furthermore, I don't find any mention of 'EM' or 'expectation-maximization' in the Mixed section of the Algorithms documentation.

I conclude that Mixed does not use EM to include cases with missing data. Is my understanding correct?

Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Ryan

Re: Treatment of missing data in Mixed

Gene,

Suppose you have the following dataset:

id  time  x   y
1   1     12  23
1   2     11  13
1   3     45  .
1   4     54  10
2   1     76  11
2   2     44  39
2   3     83  26
2   4     52  14
.
.
.

In the example above, the third case (id=1, time=3) is missing y, so it would not be used in parameter estimation.

Ryan
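
A minimal Python sketch of the row-wise deletion described above. It is not SPSS code and makes no claim about how MIXED is implemented. It assumes the blank in the third row is a missing y (as in Gene's later version of the data) and that x and y are the only model variables; `complete_rows` and `MODEL_VARS` are names made up for the illustration.

```python
# Ryan's example in long format; None marks the (assumed) missing y.
rows = [
    {"id": 1, "time": 1, "x": 12, "y": 23},
    {"id": 1, "time": 2, "x": 11, "y": 13},
    {"id": 1, "time": 3, "x": 45, "y": None},
    {"id": 1, "time": 4, "x": 54, "y": 10},
    {"id": 2, "time": 1, "x": 76, "y": 11},
    {"id": 2, "time": 2, "x": 44, "y": 39},
    {"id": 2, "time": 3, "x": 83, "y": 26},
    {"id": 2, "time": 4, "x": 52, "y": 14},
]

MODEL_VARS = ("x", "y")  # assumption: y is the response, x the covariate


def complete_rows(rows, variables):
    """Keep only rows with an observed value for every model variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


kept = complete_rows(rows, MODEL_VARS)
dropped = [(r["id"], r["time"]) for r in rows if r not in kept]
print(len(kept), dropped)
```

Only the row for id=1 at time=3 is dropped; the other three rows for id=1 remain available.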

Ryan

Re: Treatment of missing data in Mixed

Gene,

Another example would be a case that is missing data only for the
covariate, x. Even in this scenario, that case would be excluded from
the analysis.

Does this answer your question?

Ryan

On Fri, Jul 15, 2011 at 11:51 AM, R B <[hidden email]> wrote:

> Gene,
>
> Suppose you have the following dataset:
>
> id  time  x   y
> 1   1     12  23
> 1   2     11  13
> 1   3     45  .
> 1   4     54  10
> 2   1     76  11
> 2   2     44  39
> 2   3     83  26
> 2   4     52  14
> .
> .
> .
>
>
> In the example above, the third case would not be used in parameter estimation.
>
> Ryan

Ryan

Re: Treatment of missing data in Mixed

Gene,

Quite busy today, but a couple of brief comments are interspersed below.

On Fri, Jul 15, 2011 at 1:05 PM, Gene Maguin <[hidden email]> wrote:

> Ryan,
>
> Thank you for taking time to reply. The other person's comments came up in
> the context of a multilevel growth model. So I'll modify your example data
> to illustrate two scenarios.
>
> Suppose you have the following dataset:
>
> id  time  x   y   a
> 1   1     12  23  0
> 1   2     11  13  0
> 1   3     45  .   0
> 1   4     54  10  0
> 2   1     76  11  .
> 2   2     44  39  .
> 2   3     83  26  .
> 2   4     52  14  .
> 3   1     63  11  1
> 3   2     58  .   1
> 3   3     81  .   1
> 3   4     66  .   1
>
> So let's analyze this as a growth curve model with y as the DV and with
> level 1 (x) and level 2 (a) covariates. I'll write the equations for my benefit:
>
> Y(i,t) = B0(i)+ B1(i)*x+e(i,t).
> B0(i) = G00+G01*a+f0(i).
> B1(i) = G10+G11*a+f1(i).
>
> 1) What happens for the id=2 cases? The level 1 equations can be solved, but
> the level 2 covariate is missing.

Therefore, all cases associated with id=2 will be excluded from the analysis.

>
> 2) What happens for the id=3 cases? The level 1 equations can't be solved, and
> that would seem to cause that case to be excluded from the level 2
> computation as well. Or does something else happen?

Yes, cases for which there is no data on the response ("y") will also
be excluded from the analysis.

>
> I understand that the level 1 equations are not solved first, with the
> calculated values then inserted into the level 2 equations. They are all solved
> together. But I don't understand the mechanics, or maybe the accounting,
> that goes on. I've never seen this discussed in a text, or at least not
> in a way that lets me understand what is being stated.

My understanding is that any case that has missing data on any
variable incorporated into the model will be excluded from the
analysis. I suggest you create a small data set and test this out. I
bet you'll find that a case which is missing data on a first level
covariate, higher level covariate, or the response variable will not
be used. Certainly write back if I'm incorrect!

HTH,

Ryan

>
> Thanks, Gene
>
>
>
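
For readers following the algebra: substituting the two level 2 equations from Gene's message into his level 1 equation gives a single combined equation. This is only a restatement of the quoted equations, with subscripts in place of the parenthesized indices; it is not a description of MIXED's internal computations.

```latex
Y_{it} = G_{00} + G_{01}\,a_i + G_{10}\,x_{it} + G_{11}\,a_i\,x_{it}
         + f_{0i} + f_{1i}\,x_{it} + e_{it}
```

Written this way, each row of the long-format data needs y, x, and a to appear in the equation at all, which is consistent with Ryan's expectation that a row missing any one of them is not used.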

Richard Ristow

Re: Treatment of missing data in Mixed

At 11:33 AM 7/15/2011, Gene Maguin wrote:

>Somebody told me that Mixed uses the EM algorithm to include cases
>with missing data. My understanding, based on the documentation for
>the missing subcommand, is that EM was not used.

As a wild guess, could they be referring to MIXED's ability to work
with an unbalanced design, with a different number of replicates per subject?

I believe that mixed-model analysis is a generalization of
repeated-measures ANOVA, which usually required a balanced design;
that generalizing to unbalanced designs raised some mathematical
difficulties and also greatly increased the computational load; and
that one reason mixed-model procedures are now widely available is
that the capacity of commonly used machines has increased enough to
handle that load.

-Richard
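
To make the distinction concrete, here is a small Python sketch (not SPSS code) that counts how many rows per subject survive in Gene's three-subject example if every row with a missing y, x, or a is dropped, as Ryan expects. The row-wise deletion rule is taken from the thread, not verified against MIXED; `rows_per_subject` is a hypothetical helper name.

```python
# Gene's example as (id, time, x, y, a); None marks a missing value.
data = [
    (1, 1, 12, 23, 0), (1, 2, 11, 13, 0), (1, 3, 45, None, 0), (1, 4, 54, 10, 0),
    (2, 1, 76, 11, None), (2, 2, 44, 39, None), (2, 3, 83, 26, None), (2, 4, 52, 14, None),
    (3, 1, 63, 11, 1), (3, 2, 58, None, 1), (3, 3, 81, None, 1), (3, 4, 66, None, 1),
]


def rows_per_subject(data):
    """Count, per subject id, the rows with x, y, and a all observed."""
    counts = {row[0]: 0 for row in data}  # start every subject at zero
    for row in data:
        if all(value is not None for value in row[2:]):
            counts[row[0]] += 1
    return counts


remaining = rows_per_subject(data)
print(remaining)
```

Under that rule the subjects are left with 3, 0, and 1 usable rows: an unbalanced design in which partially observed subjects still contribute their complete rows, which is different from filling in the missing values.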
