SPSSX Discussion

Filling in the top half of a correlation matrix

Bruce Weaver

Filling in the top half of a correlation matrix

Administrator

In another thread (http://spssx-discussion.1045642.n5.nabble.com/Multiple-Imputation-td4994372.html), I suggested that one can use the EM correlations from MVA as input for FACTOR when one is doing exploratory factor analysis, but has missing data. As I've been exploring how to do that, I've run up against a small problem: The matrix of EM correlations can be captured via OMS, but it contains only the lower half (and main diagonal) of the correlation matrix. But FACTOR wants the full matrix as input. The only way I could think of to fill in the top half was with a little MATRIX program, like the one shown below. Dataset LowerHalf holds the lower half of the EM correlation matrix, and looks like this, for example:

V1 V2 V3 V4 V5
1.000 . . . .
.508 1.000 . . .
.347 .583 1.000 . .
.204 .243 .294 1.000 .
.108 .166 .213 .250 1.000

On first attempt to grab variables V1 to V5 with the GET command in the MATRIX program, it objected to the system missing values. So I first recoded SYSMIS to 99 in the upper half of the correlation matrix.

DATASET ACTIVATE LowerHalf.
recode V1 to V5 (sysmis=99).
execute.

MATRIX.
get CM / file = * / variables = V1 to V5.
print CM / format = "f5.3".
loop r = 1 to nrow(CM)-1.
loop c = 2 to ncol(CM).
compute CM(r,c) = CM(c,r).
end loop.
end loop.
print CM / format = "f5.3".
msave CM /TYPE=CORR /OUTFILE = * /VARIABLES=V1 to V5.
END MATRIX.

The two PRINT statements in the matrix program were included initially to check that things were working as expected. But later, I found that it didn't work properly if I removed them.

So, I have two questions:

1. Is there some easier alternative to the double loop in a matrix program for filling in the top half of the correlation matrix? (I was thinking MCONVERT or something like that, but found nothing suitable.)

2. Any thoughts on why removal of the two PRINT commands in my matrix program is causing it to go all FUBAR? (I tried removing the PRINTs and including an EXECUTE after the double-loop, but that did not fix it.)

Thanks,
Bruce

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Art Kendall

Re: Filling in the top half of a correlation matrix

Does the EM Correlations procedure have a /MATRIX option?

Art Kendall
Social Research Consultants

David Marso

Re: Filling in the top half of a correlation matrix

Administrator

In reply to this post by Bruce Weaver

Does something like the following work?
--
DATA LIST LIST /V1 V2 V3 V4 V5 .
BEGIN DATA
1.000 . . . .
.508 1.000 . . .
.347 .583 1.000 . .
.204 .243 .294 1.000 .
.108 .166 .213 .250 1.000
END DATA.
DATASET NAME lower .
DATASET DECLARE cout.
DATASET ACTIVATE lower.
MATRIX.
get CM / file = lower / variables = V1 to V5/MISSING=ACCEPT / VALUE=0.
COMPUTE CM=CM+T(CM).
SAVE CM / OUTFILE COut .
END MATRIX.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Bruce Weaver

Re: Filling in the top half of a correlation matrix

Administrator

In reply to this post by Art Kendall

Hi Art. The table of EM correlations is generated via the MVA command, like this:

DATASET DECLARE EM1.
OMS
/SELECT TABLES
/IF COMMANDS=['MVA'] SUBTYPES=['EOUT_EM CORRELATIONS']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='EM1'.

dataset activate raw.
MVA VARIABLES= {variable list} /EM.

OMSEND.

I wrap it in OMS, because that's the only way I can see to get the EM correlations into a data file.

You can add an OUTFILE option to the /EM sub-command to write a file of "raw" data with missing values imputed*, but I see no /MATRIX option.

* As John Graham notes on p. 556 of the following, use of this imputed dataset is not recommended. One should use MI instead to get multiple imputed datasets.

http://www.stats.ox.ac.uk/~snijders/Graham2009.pdf

Cheers,
Bruce

David Marso

Re: Filling in the top half of a correlation matrix

Administrator

In reply to this post by David Marso

Or the following?
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST /V1 V2 V3 V4 V5 .
BEGIN DATA
1.000 . . . .
.508 1.000 . . .
.347 .583 1.000 . .
.204 .243 .294 1.000 .
.108 .166 .213 .250 1.000
END DATA.

DATASET NAME lower.
COMPUTE ID=$CASENUM.
FLIP.
DATASET NAME flipped.
RENAME VARIABLES (var001 TO var005 = V1 TO V5).
COMPUTE ID=$CASENUM.
UPDATE FILE flipped / FILE lower/ BY ID.
SELECT IF (CASE_LBL NE 'ID').
EXECUTE.
DELETE VARIABLES ID.
LIST.

Kirill Orlov

Re: Filling in the top half of a correlation matrix

In reply to this post by Bruce Weaver

Bruce,
If you chose to do it via MATRIX, here it is.

matrix.
get m /vari= v1 to v5 /miss= 0 /names= names.
comp m= m+t(m).
print m.
call setdiag(m,1).
save m /outfile= * /names= names.
end matrix.

The above example takes only the matrix body - i.e. without variables ROWTYPE_ and VARNAME_ (if you have such there) - but you can modify it to take in those, too.

You can do a similar thing using MGET / MSAVE matrix statements, but I don't recommend it ever.

Kirill Orlov

Re: Filling in the top half of a correlation matrix

In reply to this post by David Marso

Ah, that FLIP / UPDATE approach was an interesting piece. +1, David.

Bruce Weaver

Re: Filling in the top half of a correlation matrix

Administrator

In reply to this post by David Marso

Aha...CM = CM + T(CM) was what I was looking for! Nice one, David. Thanks. I'm dead chuffed, as our British friends might say. ;-)

For some reason, /MISSING=ACCEPT / VALUE=0 is not working as expected--I end up with sysmis in the top half of the matrix. But no big deal. I can just recode sysmis to 0 before running the matrix program.

So for the record, here's what it looks like now. !MyVarList is a macro defining the list of variables in the correlation matrix.

********************************************* .

dataset activate EM1. /* the EM correlations from OMS.
recode !MyVarList (sysmis=0).
execute.

MATRIX.
get CM / file = * / variables = !MyVarList .
compute CM=CM+T(CM).
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
END MATRIX.
DATASET NAME EM2.

********************************************* .

There's still a bit to do after that (e.g., adding an N row with the count of cases in the raw data file), but that's all fairly straightforward. Here's an example of the final result in a matrix file ready for input to FACTOR:

ROWTYPE_ VARNAME_        V1        V2        V3        V4        V5
N                  235.0000  235.0000  235.0000  235.0000  235.0000
CORR     V1         1.00000    .50810    .34713    .20362    .10753
CORR     V2          .50810   1.00000    .58283    .24337    .16637
CORR     V3          .34713    .58283   1.00000    .29423    .21310
CORR     V4          .20362    .24337    .29423   1.00000    .24957
CORR     V5          .10753    .16637    .21310    .24957   1.00000
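
For anyone following along, a matrix dataset in this layout can be handed to FACTOR through its MATRIX subcommand. The following is only a sketch: it assumes the matrix dataset (with the N row already added) is named EM2, and the PRINT, EXTRACTION and ROTATION settings are placeholders rather than the settings used in the actual analysis.

```spss
* Assumes EM2 holds the N row plus the full CORR rows, as in the listing above.
DATASET ACTIVATE EM2.

* VARIABLES is omitted because the matrix is read with MATRIX=IN.
FACTOR
  /MATRIX=IN(COR=*)
  /PRINT=INITIAL EXTRACTION ROTATION
  /EXTRACTION=PAF
  /ROTATION=PROMAX.
```

The asterisk in IN(COR=*) tells FACTOR to read the correlation matrix from the active dataset.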

Art Kendall

Re: Filling in the top half of a correlation matrix

In reply to this post by Bruce Weaver

Although pairwise deletion can result in a very problematic matrix, it would be an interesting exercise to compare that matrix to the one output by the MVA procedure and the one produced by listwise deletion.
It might also be an interesting exercise to run INDSCAL on matrices from listwise deletion, pairwise deletion, and each of the missing-value imputation methods, or from each of the different imputed data sets.

Art Kendall
Social Research Consultants

Bruce Weaver

Re: Filling in the top half of a correlation matrix

Administrator

In reply to this post by Kirill Orlov

Another aha moment! Thank you Kirill for showing me that I was misunderstanding how the /MISSING sub-command works for GET. With /MISSING=0, I no longer need my recode out front.

I didn't really need the /NAMES=NAMES bit, because I have my variable list as a macro.

My final syntax (I think) looks like this, without the recode out front:

dataset activate EM1. /* EM correlations from OMS.
MATRIX.
get CM / file = * / variables = !MyVarList / missing=0 .
compute CM=CM+T(CM).
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
END MATRIX.
DATASET NAME EM2.

Note that I use MSAVE rather than SAVE, as this gives me the ROWTYPE_ and VARNAME_ variables I need. (They are not present in the EM1 dataset obtained via OMS.)
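
Since MSAVE accepts TYPE=N as well as TYPE=CORR, the N row could possibly be written in the same MATRIX program. The following is a hedged, self-contained sketch and not the syntax actually used here: the 3 x 3 matrix, the names NVEC and V1 TO V3, and the explicit diagonal correction are illustrative assumptions, and 235 is the case count from the example earlier in this thread.

```spss
NEW FILE.
DATASET CLOSE ALL.
MATRIX.
* Lower half with 0 in the top half, as read by GET with /MISSING=0.
COMPUTE CM = {1, 0, 0 ;
              .508, 1, 0 ;
              .347, .583, 1}.
* Mirror the lower half, then remove the doubled diagonal.
COMPUTE CM = CM + T(CM) - MDIAG(DIAG(CM)).
* Row vector holding the (assumed) case count for each variable.
COMPUTE NVEC = MAKE(1, NCOL(CM), 235).
* The first MSAVE names the output file and the variables.
MSAVE NVEC / TYPE=N / OUTFILE=* / VARIABLES=V1 TO V3.
MSAVE CM / TYPE=CORR.
END MATRIX.
LIST.
```

Both MSAVE statements write to the same matrix file, so OUTFILE and VARIABLES are given only on the first one.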

Thanks fellas. ;-)

Bruce Weaver

Re: Filling in the top half of a correlation matrix

Administrator

In reply to this post by Art Kendall

I was thinking along similar lines, Art. In a first draft of the syntax, I run the factor analysis using the EM correlations as input; but then I follow up using the raw data as input, and the following /MISSING options (in 3 separate models, obviously):

/MISSING LISTWISE
/MISSING PAIRWISE
/MISSING MEANSUB

I figure we should be prepared in case any inquisitive members of the team want to know how the results from EM correlations differ from any of these other approaches (given that the latter approaches are probably more familiar to them). We don't have the complete dataset yet, but judging from a quick glance at what I do have to date, I don't think it is going to make a huge difference to the final solution in this case. But I'll still argue for using the EM correlations, given the well-known limitations of the other methods for dealing with missing data.
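
The three comparison runs described above could be set up along these lines. This is only a sketch: the dataset name raw comes from earlier in the thread, while the variable list V1 TO V5 and the extraction and rotation settings are placeholders, not the settings from the real project.

```spss
DATASET ACTIVATE raw.

* Model 1: listwise deletion.
FACTOR
  /VARIABLES=V1 TO V5
  /MISSING=LISTWISE
  /EXTRACTION=PAF
  /ROTATION=PROMAX.

* Model 2: pairwise deletion.
FACTOR
  /VARIABLES=V1 TO V5
  /MISSING=PAIRWISE
  /EXTRACTION=PAF
  /ROTATION=PROMAX.

* Model 3: mean substitution.
FACTOR
  /VARIABLES=V1 TO V5
  /MISSING=MEANSUB
  /EXTRACTION=PAF
  /ROTATION=PROMAX.
```

Keeping everything except the MISSING subcommand identical makes any difference in the solutions attributable to the missing-data treatment alone.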

Cheers,
Bruce

Bruce Weaver

Re: Filling in the top half of a correlation matrix

Administrator

In reply to this post by Bruce Weaver

After some off-list discussion with David, I inserted a PRINT after CM=CM+T(CM) and discovered that the main diagonal terms were all equal to 2 -- that is why Kirill included the CALL SETDIAG line in his solution. But interestingly, the MSAVE I used to write out the matrix seems to have fixed things, because the matrix in my EM2 dataset had 1's on the main diagonal.

dataset activate EM1. /* EM correlations from OMS.
MATRIX.
get CM / file = * / variables = !MyVarList / missing=0 .
compute CM=CM+T(CM).
print CM / format = "f5.3". /* 2's on the main diagonal at this point */
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
END MATRIX.
DATASET NAME EM2. /* 1's on the main diagonal by this point /*

Meanwhile, it occurred to me this morning that I could replace missing with 1 when grabbing the matrix via GET, and then use element-wise multiplication (&*) of CM and T(CM). Now there is definitely no need for SETDIAG. Here is a self-contained example, which demonstrates that it works just fine:

EDIT: Note that in the original version of this post, I showed %* as the element-wise multiplication function. It should have been &*, as in the example below!

NEW FILE.
DATASET CLOSE all.
MATRIX.
* Create lower half of a correlation matrix, but with 1's in the top half .
COMPUTE CM =
{1,1,1,1,1 ;
.508,1,1,1,1 ;
.347,.583,1,1,1;
.204,.243,.294,1,1;
.108,.166,.213,.250,1 }.
PRINT CM / format = "f5.3".
COMPUTE CM = CM &* T(CM).
PRINT CM / format = "f5.3".
MSAVE CM /TYPE=CORR /OUTFILE=* /VARIABLES=V1 to V5.
END MATRIX.
FORMATS V1 to V5 (f5.3).
LIST.

There are 1's on the main diagonal at all points, and here is the final result of the MSAVE:

ROWTYPE_ VARNAME_ V1 V2 V3 V4 V5

CORR V1 1.000 .508 .347 .204 .108
CORR V2 .508 1.000 .583 .243 .166
CORR V3 .347 .583 1.000 .294 .213
CORR V4 .204 .243 .294 1.000 .250
CORR V5 .108 .166 .213 .250 1.000
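
For completeness, here is a rough sketch of how the matrix dataset written by MSAVE might then be handed to FACTOR. The extraction and rotation choices (PAF, PROMAX) are only assumptions for illustration and were not part of the discussion; the sketch also assumes the matrix dataset is the active dataset.

```spss
* Assumes the active dataset is the matrix dataset written by MSAVE.
* PAF and PROMAX are illustrative choices only.
FACTOR
  /MATRIX=IN(COR=*)
  /PRINT=INITIAL EXTRACTION ROTATION
  /EXTRACTION=PAF
  /ROTATION=PROMAX.
```

Note that MATRIX=IN reads the variable names from VARNAME_, so no VARIABLES subcommand is given. Because the MSAVE above writes only CORR rows, output that depends on the sample size may not be available unless an N row is added.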

Bruce Weaver wrote

Another aha moment! Thank you Kirill for showing me that I was misunderstanding how the /MISSING sub-command works for GET. With /MISSING=0, I no longer need my recode out front.

I didn't really need the /NAMES=NAMES bit, because I have my variable list as a macro.

My final syntax (I think) looks like this, without the recode out front:

dataset activate EM1. /* EM correlations from OMS.
MATRIX.
get CM / file = * / variables = !MyVarList / missing=0 .
compute CM=CM+T(CM).
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
END MATRIX.
DATASET NAME EM2.

Note that I use MSAVE rather than SAVE, as this gives me the ROWTYPE_ and VARNAME_ variables I need. (They are not present in the EM1 dataset obtained via OMS.)

Thanks fellas. ;-)

Kirill Orlov wrote

Bruce,
If you choose to do it via MATRIX, here it is.

matrix.
get m /vari= v1 to v5 /miss= 0 /names= names.
comp m= m+t(m).
print m.
call setdiag(m,1).
save m /outfile= * /names= names.
end matrix.

The above example takes only the matrix body - i.e. without variables
ROWTYPE_ and VARNAME_ (if you have such there) - but you can modify it
to take in those, too.

You can do a similar thing using MGET / MSAVE matrix statements, but I
don't recommend it ever.

Bruce Weaver

Re: Filling in the top half of a correlation matrix

Administrator

At the risk of beating a dead horse, here is one final (I hope) variation on the matrix program for filling in the top half of my correlation matrix. In a nutshell, I have gone back to setting the SYSMIS values in the top half of the matrix to 0, using CM = CM + T(CM) to fill in the top half, and using CALL SETDIAG to restore the values on the main diagonal to their correct original values. The reason for changing back to this approach is that it is more general--e.g., it will work for a covariance matrix just as well as a correlation matrix. Using my previous method with a covariance matrix would have resulted in the terms on the main diagonal being equal to the squares of the variances. With the approach I'm using now, they are twice the size of the correct values for both covariance and correlation matrices.
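
To make the difference concrete, here is a small sketch that applies both methods to a made-up 3x3 covariance matrix. The variances 4, 9 and 16 and the matrix names L0, L1, A and B are hypothetical, chosen only for illustration.

```spss
MATRIX.
* Hypothetical covariance matrix, lower half only, zeros in the top half.
COMPUTE L0 = {4,0,0; 2,9,0; 1,3,16}.
* Additive method: fill the top half, then halve the doubled diagonal.
COMPUTE A = L0 + T(L0).
CALL SETDIAG(A, DIAG(A)/2).
PRINT A / TITLE = "Sum method, diagonal halved".
* Same lower half, but with 1's in the top half.
COMPUTE L1 = {4,1,1; 2,9,1; 1,3,16}.
* Multiplicative method: the diagonal terms get squared.
COMPUTE B = L1 &* T(L1).
PRINT B / TITLE = "Product method".
END MATRIX.
```

Matrix A should have 4, 9 and 16 on the main diagonal, whereas B should have 16, 81 and 256. The off-diagonal terms are the same in both.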

Thanks again to Kirill & David for their help.

From my syntax file...

* In an earlier version of this matrix program, I set the missing
* values in the top half of the correlation matrix to 1, and then
* filled in the top half by setting CM = CM &* T(CM), where CM =
* the correlation matrix, &* is the element-wise multiplication
* function, and T(CM) is the transpose of matrix CM. But I have
* now changed it to fill in the top half as follows:
* 1) Set the missing correlations in the top half of the matrix to 0;
* 2) Let CM = CM + T(CM), at which point the terms on the diagonal
*    are doubled (2's for a correlation matrix);
* 3) Use CALL SETDIAG to divide the terms on the diagonal by 2.
* I have changed to this more general approach because it will also
* work for covariance matrices, where the main diagonal holds variances,
* not 1's. Using CM = CM &* T(CM) would give me a matrix where the terms
* on the main diagonal are equal to the squares of the variances.
* With the approach I now use below, I always end up with the terms
* on the diagonal being twice as large as they should be, and this
* is so for either correlation or covariance matrices. Therefore,
* I can simply divide the terms on the diagonal by 2 in either case.

dataset activate EM1. /* bottom half of matrix of EM correlations.
MATRIX.
get CM / file = * / variables = !MyVarList / missing=0 .
* MISSING=0 on the previous line replaces the SYSMIS values with zeroes.
compute CM = CM + T(CM).
* At this point, the terms on the main diagonal are twice
* as large as they should be, so divide them by 2.
call setdiag(CM,DIAG(CM)/2).
*print CM / format = "f5.3".
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
* MSAVE saves the specified matrix as a matrix file, the
* sort of file needed as input to FACTOR.
* If you are modifying this code to use with a covariance
* matrix, change TYPE=CORR to TYPE=COV.
END MATRIX.
DATASET NAME EM2.
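
An equivalent way to restore the diagonal is to save it in a vector before the addition and write it back afterwards. Here is a minimal self-contained sketch of that variant, using the example correlations from earlier in the thread; the vector name D is just a convenient choice.

```spss
MATRIX.
* Lower half of the example correlation matrix, zeros in the top half.
COMPUTE CM = {1,0,0,0,0;
 .508,1,0,0,0;
 .347,.583,1,0,0;
 .204,.243,.294,1,0;
 .108,.166,.213,.250,1}.
* Save the original diagonal before it gets doubled.
COMPUTE D = DIAG(CM).
COMPUTE CM = CM + T(CM).
* Write the saved diagonal back.
CALL SETDIAG(CM, D).
PRINT CM / FORMAT = "f5.3".
END MATRIX.
```

Halving the doubled diagonal and restoring a saved copy should give the same matrix; the saved-copy version just avoids the division.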

I suppose this could all be stuck in a macro with CORR vs COV as an argument. Maybe I'll do that someday if I need to do this with a covariance matrix.
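
A rough, untested sketch of what such a macro might look like follows. The macro name !FillTop and the argument names vars and type are hypothetical, and it assumes the active dataset holds the lower half of the matrix with SYSMIS in the top half.

```spss
* Hypothetical macro: vars = variable list, type = CORR (default) or COV.
DEFINE !FillTop (vars = !CHAREND('/')
                /type = !DEFAULT(CORR) !TOKENS(1))
MATRIX.
GET CM / FILE = * / VARIABLES = !vars / MISSING = 0.
COMPUTE CM = CM + T(CM).
CALL SETDIAG(CM, DIAG(CM)/2).
MSAVE CM / TYPE = !type / OUTFILE = * / VARIABLES = !vars.
END MATRIX.
!ENDDEFINE.

* Example call for a covariance matrix.
!FillTop vars = V1 TO V5 / type = COV.
```

The variable list is ended by a slash in the call because a list such as V1 TO V5 has more than one token.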
