SPSSX Discussion

missing values in mixed

Frans Marcelissen-4

missing values in mixed

Hi.
I am trying to understand how missing values are handled by MIXED.
As far as I understand, missing values are removed listwise, meaning that a whole row is removed when one value on it is missing.
However, repeated measurements in MIXED are spread out over multiple rows, so MIXED will still use the rows with valid values for a given subject.
I have now done some tests with the syntax and the data from the book "Multilevel and Longitudinal Modeling with IBM SPSS".
The syntax is as follows:
MIXED test WITH time quadtime
/FIXED=time quadtime | SSTYPE(3)
/METHOD=REML
/PRINT=G SOLUTION TESTCOV
/RANDOM=INTERCEPT time | SUBJECT(id) COVTYPE(UN).

test is a variable with three repeated measures (time=1,2,3).

First I create 50% missing values on quadtime, but only at time=1. Of course this gives different results than the full dataset.
Then I remove all the cases (subjects) that have a missing value on quadtime, so for these cases not only time=1 is removed but also time=2 and 3.
I would expect the results to be different, but they are exactly the same as in the analysis with missing values only at time=1.
This gives me the impression that missing data are handled listwise at the case level (maybe because of SUBJECT(id))?
Could anyone explain how this works?
Frans

Ryan

Re: missing values in mixed

Frans,

MIXED does not remove missing *response* data listwise. That is, if a subject is missing data on "y" at one time point, then that row will be excluded. However, that does not mean that the rows in which response data were obtained from the same subject at other time points will be discarded.

If you have a time-invariant predictor (e.g., gender), then all cases for a subject whose gender is missing will be removed (aka listwise deletion). On the other hand, if you have a time-varying predictor, then only the cases for the time points at which that predictor is missing will be excluded.

To sum up, the MIXED procedure will remove an entire row if data are missing on that row for any of the variables listed in the MIXED code.
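
The row-wise rule summarized above can be checked directly on a long-format file. What follows is only a minimal sketch, assuming the variables test, time, and quadtime from the original post; n_miss is a hypothetical helper name, not something from the thread.

```spss
* Count, per row, how many of the model variables are missing.
COUNT n_miss = test time quadtime (MISSING).
* Rows with n_miss = 0 are the ones expected to enter the analysis.
FREQUENCIES VARIABLES=n_miss.
* Number of complete rows per time point.
TEMPORARY.
SELECT IF (n_miss = 0).
FREQUENCIES VARIABLES=time.
```

If the count of complete rows per time point is what you intended, the missing-data pattern in the file matches the design you had in mind.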

Ryan

> On May 22, 2014, at 5:58 AM, Frans Marcelissen <[hidden email]> wrote:

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Frans Marcelissen-4

Re: missing values in mixed

Hi Ryan (and others),

May I ask you to reread my question? It probably was not clear.

What you describe is exactly what I understood. But when I tested it, it appeared to work differently. I have a test dataset with 3 repeated measures. T1 has 50% missing values. When I run the analysis it gives some results. But when I remove all the cases with missing values on t1 (so removing not only t1 of these cases but also t2, t3, etc.), it gives exactly the same results. This is not what I expected. Could anyone explain this to me?

Here is the syntax:

compute selectie1=rnd(id/2)*2=id.
if not (selectie1) quadtime=$sysmis.

* That is, quadtime is missing when id is odd.

MIXED test WITH time quadtime
/FIXED=time quadtime | SSTYPE(3)
/METHOD=REML
/PRINT=G SOLUTION TESTCOV
/RANDOM=INTERCEPT time | SUBJECT(id) COVTYPE(UN).

select if selectie1.

* That is, remove all odd-numbered cases (the cases with missing values on t1).

MIXED test WITH time quadtime
/FIXED=time quadtime | SSTYPE(3)
/METHOD=REML
/PRINT=G SOLUTION TESTCOV
/RANDOM=INTERCEPT time | SUBJECT(id) COVTYPE(UN).
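
One thing worth checking in the syntax above: the IF command has no condition on time, so as written it appears to set quadtime to system-missing on all three rows of every odd-numbered id, not only at time = 1. If so, both MIXED runs would use exactly the same rows, which would explain the identical results. A minimal sketch of restricting the missing values to time 1, assuming the long-format variables id, time, and quadtime from the post (qt_missing is a hypothetical helper name):

```spss
* Even ids get selectie1 = 1, odd ids get 0.
COMPUTE selectie1 = (MOD(id, 2) = 0).
* Set quadtime to missing only on the time = 1 row of the odd ids.
IF (selectie1 = 0 AND time = 1) quadtime = $SYSMIS.
EXECUTE.
* Check where the missing values actually ended up.
COMPUTE qt_missing = SYSMIS(quadtime).
CROSSTABS /TABLES=time BY qt_missing.
```

With this version the CROSSTABS table should show missing values at time 1 only, so dropping the odd ids entirely afterwards removes additional valid rows (time 2 and 3) and the two analyses are no longer based on the same data.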

2014-05-22 15:30 GMT+02:00 GMAIL <[hidden email]>:

Ryan

Re: missing values in mixed

Frans,

Let's start by making sure you have correctly constructed your dataset. Run the SPSS syntax below, investigate the artificial dataset, and conduct your testing. When conducting your testing, please make sure the syntax you write to filter the data works as you expect.

Again, any row that has a missing value on any variable used in the linear mixed model will be excluded from the analysis. Rows that do not have any missing values on the variables used in the linear mixed model will be included in the analysis.

Ryan

/*Generate Data*/.

/*seed for random generator*/.
set seed 987879546.

new file.
input program.

/*initialize all variables with placeholder values*/.
compute ID = -99.
compute #Gamma00 = -99.
compute #Gamma10 = -99.
compute #V11 = -99.
compute #V22 = -99.
compute #rho = -99.
compute #V21 = -99.
compute #a11 = -99.
compute #a21 = -99.
compute #a22 = -99.
compute #x0j = -99.
compute #x1j = -99.
compute #u0j = -99.
compute #u1j = -99.
compute #B0J = -99.
compute #B1J = -99.
compute #eij = -99.
compute time = -99.
leave ID to time.

/*10000 subjects*/.
loop ID = 1 to 10000.

/*fixed intercept*/.
compute #Gamma00 = 0.50.
/*fixed slope*/.
compute #Gamma10 = 0.30.
/*random intercept var*/.
compute #V11 = 0.80.
/*random slope var*/.
compute #V22 = 0.50.
/*random intercept and slope corr*/.
compute #rho = 0.35.
/*random intercept and slope cov*/.
compute #V21 = #rho*sqrt(#V11*#V22).

/*Cholesky factor of the random-effects covariance matrix*/.
compute #a11 = sqrt(#V11).
compute #a21 = #V21/#a11.
compute #a22 = sqrt(#V22 - #a21*#a21).

/*norm. dist r.v.*/.
compute #x0j = rv.normal(0,1).
/*norm. dist r.v.*/.
compute #x1j = rv.normal(0,1).

/*random intercept error term*/.
compute #u0j = #a11*#x0j.
/*random slope error term*/.
compute #u1j = #a21*#x0j + #a22*#x1j.

/*random intercept term*/.
compute #B0J = #Gamma00 + #u0j.
/*random slope term*/.
compute #B1J = #Gamma10 + #u1j.

/*3 time points*/.
loop time = 1 to 3.
/*error term*/.
compute #eij = rv.normal(0,1).
/*full equation*/.
compute test = #B0J + #B1J*time + #eij.
end case.
end loop.

end loop.
end file.
end input program.
execute.

/*End Data Generation*/.

MIXED test WITH time
/FIXED=time | SSTYPE(3)
/METHOD=REML
/PRINT=SOLUTION G
/RANDOM=INTERCEPT time | SUBJECT(ID) COVTYPE(UN).
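
For reference, the model that the data-generation syntax above simulates can be written out as follows. This is a sketch read off the COMPUTE statements; the subscripts i (time point) and j (subject) and the symbols x, u, a are notation assumed here to mirror the scratch-variable names, and x_{0j}, x_{1j} are taken to be independent standard normal draws.

```latex
\begin{aligned}
\text{test}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{time}_{ij} + e_{ij}, & e_{ij} &\sim N(0, 1),\\
\beta_{0j} &= \gamma_{00} + u_{0j}, & \gamma_{00} &= 0.50,\\
\beta_{1j} &= \gamma_{10} + u_{1j}, & \gamma_{10} &= 0.30,\\
u_{0j} &= a_{11}\,x_{0j}, & u_{1j} &= a_{21}\,x_{0j} + a_{22}\,x_{1j},\\
a_{11} &= \sqrt{V_{11}}, \quad a_{21} = V_{21}/a_{11}, & a_{22} &= \sqrt{V_{22} - a_{21}^{2}},\\
\operatorname{Var}(u_{0j}) &= a_{11}^{2} = V_{11} = 0.80, & \operatorname{Var}(u_{1j}) &= a_{21}^{2} + a_{22}^{2} = V_{22} = 0.50,\\
\operatorname{Cov}(u_{0j}, u_{1j}) &= a_{11}a_{21} = V_{21} = \rho\sqrt{V_{11}V_{22}} & &= 0.35\sqrt{0.40} \approx 0.221.
\end{aligned}
```

On the complete data, the estimates printed by /PRINT=SOLUTION G should therefore land close to these values, up to sampling error, which gives a baseline against which to compare runs with missing values.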

On Fri, May 23, 2014 at 3:52 AM, Frans Marcelissen <[hidden email]> wrote: