Administrator
In another thread (http://spssx-discussion.1045642.n5.nabble.com/Multiple-Imputation-td4994372.html), I suggested that one can use the EM correlations from MVA as input for FACTOR when one is doing exploratory factor analysis, but has missing data. As I've been exploring how to do that, I've run up against a small problem: The matrix of EM correlations can be captured via OMS, but it contains only the lower half (and main diagonal) of the correlation matrix. But FACTOR wants the full matrix as input. The only way I could think of to fill in the top half was with a little MATRIX program, like the one shown below. Dataset LowerHalf holds the lower half of the EM correlation matrix, and looks like this, for example:
V1    V2    V3    V4    V5
1.000 .     .     .     .
.508  1.000 .     .     .
.347  .583  1.000 .     .
.204  .243  .294  1.000 .
.108  .166  .213  .250  1.000

On my first attempt to grab variables V1 to V5 with the GET command in the MATRIX program, it objected to the system-missing values. So I first recoded SYSMIS to 99 in the upper half of the correlation matrix.

DATASET ACTIVATE LowerHalf.
recode V1 to V5 (sysmis=99).
execute.
MATRIX.
get CM / file = * / variables = V1 to V5.
print CM / format = "f5.3".
loop r = 1 to nrow(CM)-1.
loop c = 2 to ncol(CM).
compute CM(r,c) = CM(c,r).
end loop.
end loop.
print CM / format = "f5.3".
msave CM /TYPE=CORR /OUTFILE = * /VARIABLES=V1 to V5.
END MATRIX.

The two PRINT statements in the matrix program were included initially to check that things were working as expected. But later, I found that the program didn't work properly if I removed them. So, I have two questions:

1. Is there some easier alternative to the double loop in a matrix program for filling in the top half of the correlation matrix? (I was thinking of MCONVERT or something like that, but found nothing suitable.)
2. Any thoughts on why removing the two PRINT commands from my matrix program makes it go all FUBAR? (I tried removing the PRINTs and including an EXECUTE after the double loop, but that did not fix it.)

Thanks,
Bruce
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
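For readers who want to check the logic of the double loop outside SPSS, here is a minimal plain-Python sketch (an illustration, not SPSS syntax; None stands in for a system-missing cell, which is an assumption of the sketch). Unlike the MATRIX loop in the question, the inner loop starts at c = r + 1, so only cells above the diagonal are visited. The original loop from c = 2 also reassigns some lower-half cells, but each of those receives its own value back, so the end result is the same.

```python
# Mirror the lower triangle of a correlation matrix into the upper triangle
# with an explicit double loop. None plays the role of SPSS system-missing.

def fill_upper_by_loop(cm):
    """Return a full symmetric copy of a lower-triangular list-of-lists matrix."""
    n = len(cm)
    full = [row[:] for row in cm]      # copy so the input is left untouched
    for r in range(n - 1):
        for c in range(r + 1, n):      # only cells strictly above the diagonal
            full[r][c] = full[c][r]
    return full

lower = [
    [1.000, None,  None,  None,  None],
    [0.508, 1.000, None,  None,  None],
    [0.347, 0.583, 1.000, None,  None],
    [0.204, 0.243, 0.294, 1.000, None],
    [0.108, 0.166, 0.213, 0.250, 1.000],
]

full = fill_upper_by_loop(lower)
for row in full:
    print(" ".join(f"{x:5.3f}" for x in row))
```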
(Replying to Bruce Weaver's post of 9/25/2013 4:06 PM)
Does the EM Correlations procedure have a /MATRIX option?
Art Kendall
Social Research Consultants
In reply to this post by Bruce Weaver
Does something like the following work?

DATA LIST LIST /V1 V2 V3 V4 V5 .
BEGIN DATA
1.000 . . . .
.508 1.000 . . .
.347 .583 1.000 . .
.204 .243 .294 1.000 .
.108 .166 .213 .250 1.000
END DATA.
DATASET NAME lower .
DATASET DECLARE cout.
DATASET ACTIVATE lower.
MATRIX.
get CM / file = lower / variables = V1 to V5/MISSING=ACCEPT / VALUE=0.
COMPUTE CM=CM+T(CM).
SAVE CM / OUTFILE COut .
END MATRIX.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
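A plain-Python sketch (an illustration, not SPSS) of what COMPUTE CM=CM+T(CM) does once the empty upper cells are treated as 0: every off-diagonal cell ends up holding the single stored value, but each diagonal cell is added to itself. The helper names below are made up for the sketch and only mimic T() and matrix addition.

```python
# Illustration of COMPUTE CM = CM + T(CM) on a lower-half correlation matrix.
# None marks an empty (system-missing) cell; it is replaced by 0 first,
# which is the effect of recoding SYSMIS to 0 before the sum.

def replace_missing(m, value):
    """Return a copy of m with None cells replaced by value."""
    return [[value if x is None else x for x in row] for row in m]

def transpose(m):
    """Equivalent of T(m) for a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def add(a, b):
    """Cell-by-cell sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

lower = [
    [1.000, None,  None],
    [0.508, 1.000, None],
    [0.347, 0.583, 1.000],
]

cm = replace_missing(lower, 0.0)
cm = add(cm, transpose(cm))

for row in cm:
    print(row)
# Off-diagonal cells are now symmetric, but every diagonal cell was counted
# twice, so it holds 2.0 instead of 1.0.
```

So with a correlation matrix the diagonal comes out as 2's and has to be reset afterwards, for example with CALL SETDIAG as Kirill does later in the thread.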
In reply to this post by Art Kendall
Hi Art. The table of EM correlations is generated via the MVA command, like this:
DATASET DECLARE EM1.
OMS
  /SELECT TABLES
  /IF COMMANDS=['MVA'] SUBTYPES=['EOUT_EM CORRELATIONS']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='EM1'.
dataset activate raw.
MVA VARIABLES= {variable list} /EM.
OMSEND.

I wrap it in OMS because that's the only way I can see to get the EM correlations into a data file. You can add an OUTFILE option to the /EM subcommand to write a file of "raw" data with missing values imputed*, but I see no /MATRIX option.

* As John Graham notes on p. 556 of the following, use of this imputed dataset is not recommended; one should use MI instead to get multiple imputed datasets. http://www.stats.ox.ac.uk/~snijders/Graham2009.pdf

Cheers,
Bruce
--
Bruce Weaver
In reply to this post by David Marso
Or the following?
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST /V1 V2 V3 V4 V5 .
BEGIN DATA
1.000 . . . .
.508 1.000 . . .
.347 .583 1.000 . .
.204 .243 .294 1.000 .
.108 .166 .213 .250 1.000
END DATA.
DATASET NAME lower.
COMPUTE ID=$CASENUM.
FLIP.
DATASET NAME flipped.
RENAME VARIABLES (var001 TO var005 = V1 TO V5).
COMPUTE ID=$CASENUM.
UPDATE FILE flipped / FILE lower/ BY ID.
SELECT IF (CASE_LBL NE 'ID').
EXECUTE.
DELETE VARIABLES ID.
LIST.
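The FLIP + UPDATE idea can be mimicked in plain Python as a sanity check (an analogy, not SPSS behaviour; None stands in for system-missing). The transposed copy plays the role of the master file and the original lower half the transaction file; as I understand the UPDATE documentation, system-missing values in a transaction file never overwrite master values, which is what makes the trick work. Note that the diagonal is not doubled with this approach.

```python
# FLIP: transpose the lower half. UPDATE: overwrite master cells with
# transaction cells, except where the transaction cell is missing (None).

def flip(m):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def update(master, transaction):
    """Replace master cells with transaction cells unless the latter are missing."""
    return [
        [t if t is not None else m for m, t in zip(mrow, trow)]
        for mrow, trow in zip(master, transaction)
    ]

lower = [
    [1.000, None,  None],
    [0.508, 1.000, None],
    [0.347, 0.583, 1.000],
]

full = update(flip(lower), lower)
for row in full:
    print(row)
```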
In reply to this post by Bruce Weaver
Bruce,
If you choose to do it via MATRIX, here it is:

matrix.
get m /vari= v1 to v5 /miss= 0 /names= names.
comp m= m+t(m).
print m.
call setdiag(m,1).
save m /outfile= * /names= names.
end matrix.

The above example takes only the matrix body, i.e. without the variables ROWTYPE_ and VARNAME_ (if you have them there), but you can modify it to take those in, too. You can do a similar thing using the MGET / MSAVE matrix statements, but I never recommend it.

(Replying to Bruce Weaver, 26.09.2013 0:06; original message: http://spssx-discussion.1045642.n5.nabble.com/Filling-in-the-top-half-of-a-correlation-matrix-tp5722229.html)
In reply to this post by David Marso
Ah, that was an interesting piece of code (David Marso's FLIP/UPDATE solution, posted 26.09.2013 1:12). +1, David.
In reply to this post by David Marso
Aha...CM = CM + T(CM) was what I was looking for! Nice one, David. Thanks. I'm dead chuffed, as our British friends might say. ;-)
For some reason, /MISSING=ACCEPT / VALUE=0 is not working as expected: I end up with sysmis in the top half of the matrix. But no big deal; I can just recode sysmis to 0 before running the matrix program. So, for the record, here's what it looks like now (!MyVarList is a macro defining the list of variables in the correlation matrix):

********************************************* .
dataset activate EM1. /* the EM correlations from OMS.
recode !MyVarList (sysmis=0).
execute.
MATRIX.
get CM / file = * / variables = !MyVarList .
compute CM=CM+T(CM).
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
END MATRIX.
DATASET NAME EM2.
********************************************* .

There's still a bit to do after that (e.g., adding an N row with the count of cases in the raw data file), but that's all fairly straightforward. Here's an example of the final result in a matrix file ready for input to FACTOR:

ROWTYPE_ VARNAME_        V1       V2       V3       V4       V5
N                  235.0000 235.0000 235.0000 235.0000 235.0000
CORR     V1         1.00000   .50810   .34713   .20362   .10753
CORR     V2          .50810  1.00000   .58283   .24337   .16637
CORR     V3          .34713   .58283  1.00000   .29423   .21310
CORR     V4          .20362   .24337   .29423  1.00000   .24957
CORR     V5          .10753   .16637   .21310   .24957  1.00000
--
Bruce Weaver
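To make the layout of that matrix file explicit, here is a small plain-Python sketch that builds the same kind of rows: ROWTYPE_ and VARNAME_ first, one N row, then one CORR row per variable. It only assembles tuples; it does not write an SPSS matrix file, and the function name matrix_file_rows is hypothetical.

```python
# Build the rows of a correlation "matrix file" layout as plain tuples:
# (ROWTYPE_, VARNAME_, value for each variable).

def matrix_file_rows(names, corr, n_cases):
    """Return an N row followed by one CORR row per variable."""
    rows = [("N", "") + tuple(float(n_cases) for _ in names)]
    for name, row in zip(names, corr):
        rows.append(("CORR", name) + tuple(row))
    return rows

names = ["V1", "V2", "V3"]
corr = [
    [1.00000, 0.50810, 0.34713],
    [0.50810, 1.00000, 0.58283],
    [0.34713, 0.58283, 1.00000],
]

rows = matrix_file_rows(names, corr, 235)
print(("ROWTYPE_", "VARNAME_") + tuple(names))
for r in rows:
    print(r)
```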
In reply to this post by Bruce Weaver
(Replying to Bruce Weaver's post of 9/25/2013 5:12 PM)
Although pairwise deletion can result in a very problematic matrix, it would be an interesting exercise to compare that matrix to the one output by the MVA procedure and the one output by listwise deletion.
It might also be an interesting exercise to do an INDSCAL on the matrices from listwise deletion, pairwise deletion, and each of the missing-value imputation methods, or from each of the different imputed data sets.
Art Kendall
Social Research Consultants
In reply to this post by Kirill Orlov
Another aha moment! Thank you, Kirill, for showing me that I was misunderstanding how the /MISSING subcommand works for GET. With /MISSING=0, I no longer need my recode out front.
I didn't really need the /NAMES=NAMES bit, because I have my variable list as a macro. My final syntax (I think) looks like this, without the recode out front:

dataset activate EM1. /* EM correlations from OMS.
MATRIX.
get CM / file = * / variables = !MyVarList / missing=0 .
compute CM=CM+T(CM).
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
END MATRIX.
DATASET NAME EM2.

Note that I use MSAVE rather than SAVE, as this gives me the ROWTYPE_ and VARNAME_ variables I need. (They are not present in the EM1 dataset obtained via OMS.) Thanks, fellas. ;-)
--
Bruce Weaver
In reply to this post by Art Kendall
I was thinking along similar lines, Art. In a first draft of the syntax, I run the factor analysis using the EM correlations as input, but then I follow up using the raw data as input with the following /MISSING options (in 3 separate models, obviously):

/MISSING LISTWISE
/MISSING PAIRWISE
/MISSING MEANSUB

I figure we should be prepared in case any inquisitive members of the team want to know how the results from the EM correlations differ from those of these other approaches (given that the latter are probably more familiar to them). We don't have the complete dataset yet, but judging from a quick glance at what I have to date, I don't think it is going to make a huge difference to the final solution in this case. But I'll still argue for using the EM correlations, given the well-known limitations of the other methods for dealing with missing data.

Cheers,
Bruce
--
Bruce Weaver
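The LISTWISE and PAIRWISE options differ in which cases enter each correlation. The plain-Python sketch below computes Pearson correlations both ways on a tiny invented dataset (the numbers are made up for illustration; None marks a missing value) and reports the n behind each coefficient. MEANSUB is not shown, and nothing here reproduces FACTOR's internals.

```python
# Listwise vs pairwise deletion for a Pearson correlation matrix.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def corr_listwise(data):
    """Drop every case with any missing value, then correlate all pairs."""
    complete = [row for row in data if None not in row]
    cols = list(zip(*complete))
    k = len(cols)
    r = [[pearson(cols[i], cols[j]) for j in range(k)] for i in range(k)]
    return r, len(complete)

def corr_pairwise(data):
    """Correlate each pair of variables using all cases valid on that pair."""
    k = len(data[0])
    r = [[0.0] * k for _ in range(k)]
    n = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            pairs = [(row[i], row[j]) for row in data
                     if row[i] is not None and row[j] is not None]
            xs, ys = zip(*pairs)
            r[i][j] = pearson(xs, ys)
            n[i][j] = len(pairs)
    return r, n

# Invented data: three variables, five cases, two missing values.
data = [
    (1.0, 2.0, 1.0),
    (2.0, 1.0, 3.0),
    (3.0, 4.0, None),
    (4.0, None, 2.0),
    (5.0, 5.0, 5.0),
]

r_list, n_list = corr_listwise(data)
r_pair, n_pair = corr_pairwise(data)
print("listwise n =", n_list, " r(V1,V2) =", round(r_list[0][1], 3))
print("pairwise n(V1,V2) =", n_pair[0][1], " r(V1,V2) =", round(r_pair[0][1], 3))
```

Each pairwise coefficient can rest on a different subset of cases, which is one reason a pairwise matrix can behave badly as FACTOR input.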
In reply to this post by Bruce Weaver
After some off-list discussion with David, I inserted a PRINT after CM=CM+T(CM) and discovered that the main diagonal terms were all equal to 2. That is why Kirill included the CALL SETDIAG line in his solution. Interestingly, though, the MSAVE I used to write out the matrix seems to have fixed things, because the matrix in my EM2 dataset had 1's on the main diagonal.

dataset activate EM1. /* EM correlations from OMS.
MATRIX.
get CM / file = * / variables = !MyVarList / missing=0 .
compute CM=CM+T(CM).
print CM / format = "f5.3". /* 2's on the main diagonal at this point /*
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
END MATRIX.
DATASET NAME EM2. /* 1's on the main diagonal by this point /*

Meanwhile, it occurred to me this morning that I could replace missing values with 1 when grabbing the matrix via GET, and then use element-wise multiplication (&*) of CM and T(CM). Then there is definitely no need for SETDIAG. Here is a self-contained example, which demonstrates that it works just fine:

EDIT: Note that in the original version of this post, I showed %* as the element-wise multiplication function. It should have been &*, as in the example below!

NEW FILE.
DATASET CLOSE all.
MATRIX.
* Create lower half of a correlation matrix, but with 1's in the top half .
COMPUTE CM = {1,1,1,1,1 ; .508,1,1,1,1 ; .347,.583,1,1,1; .204,.243,.294,1,1; .108,.166,.213,.250,1 }.
PRINT CM / format = "f5.3".
COMPUTE CM = CM &* T(CM).
PRINT CM / format = "f5.3".
MSAVE CM /TYPE=CORR /OUTFILE=* /VARIABLES=V1 to V5.
END MATRIX.
FORMATS V1 to V5 (f5.3).
LIST.

There are 1's on the main diagonal at all points, and here is the final result of the MSAVE:

ROWTYPE_ VARNAME_    V1    V2    V3    V4    V5
CORR     V1       1.000  .508  .347  .204  .108
CORR     V2        .508 1.000  .583  .243  .166
CORR     V3        .347  .583 1.000  .294  .213
CORR     V4        .204  .243  .294 1.000  .250
CORR     V5        .108  .166  .213  .250 1.000
--
Bruce Weaver
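A plain-Python sketch of the element-wise trick (an illustration of the algebra, not SPSS): with 1's in the empty upper cells, multiplying each cell by its transposed partner leaves correlations intact and keeps 1's on the diagonal. The second example uses a hypothetical covariance matrix (variances 4 and 9, covariance 1.5) to show the catch: diagonal cells are multiplied by themselves, so the variances get squared.

```python
# Illustration of CM = CM &* T(CM) with 1's in the empty upper cells.

def transpose(m):
    """Equivalent of T(m) for a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def elementwise_product(a, b):
    """Equivalent of a &* b: multiply matching cells."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Lower half of a correlation matrix with 1's in the top half.
corr = [
    [1.000, 1.0,   1.0],
    [0.508, 1.000, 1.0],
    [0.347, 0.583, 1.000],
]
corr_full = elementwise_product(corr, transpose(corr))
print(corr_full)  # symmetric, with 1's still on the diagonal

# Hypothetical covariance matrix, again with 1's in the top half.
cov = [
    [4.0, 1.0],
    [1.5, 9.0],
]
cov_full = elementwise_product(cov, transpose(cov))
print(cov_full)  # diagonal becomes 16.0 and 81.0: the variances are squared
```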
At the risk of beating a dead horse, here is one final (I hope) variation on the matrix program for filling in the top half of my correlation matrix. In a nutshell, I have gone back to setting the SYSMIS values in the top half of the matrix to 0, using CM = CM + T(CM) to fill in the top half, and using CALL SETDIAG to restore the values on the main diagonal to their correct original values. The reason for changing back to this approach is that it is more general: it will work for a covariance matrix just as well as for a correlation matrix. Using my previous method with a covariance matrix would have made the terms on the main diagonal equal to the squares of the variances. With the approach I'm using now, the diagonal terms after CM = CM + T(CM) are exactly twice the correct values for both covariance and correlation matrices, so halving them restores the originals.
Thanks again to Kirill & David for their help. From my syntax file:

* In an earlier version of this matrix program, I set the missing
* values in the top half of the correlation matrix to 1, and then
* filled in the top half by setting CM = CM &* T(CM), where CM =
* the correlation matrix, &* is the element-wise multiplication
* function, and T(CM) is the transpose of matrix CM. But I have
* now changed it to fill in the top half as follows:
* 1) Set the missing correlations in the top half of the matrix to 0;
* 2) Let CM = CM + T(CM), at which point the diagonal terms are
*    doubled (2's for a correlation matrix);
* 3) Use CALL SETDIAG to replace the diagonal with DIAG(CM)/2.
* I have changed to this more general approach because it will also
* work for covariance matrices, where the main diagonal holds variances,
* not 1's. Using CM = CM &* T(CM) would give me a matrix where the terms
* on the main diagonal are equal to the squares of the variances.
* With the approach I now use below, I always end up with the terms
* on the diagonal being twice as large as they should be, and this
* is so for either correlation or covariance matrices. Therefore,
* I can simply divide the terms on the diagonal by 2 in either case.
dataset activate EM1. /* bottom half of matrix of EM correlations.
MATRIX.
get CM / file = * / variables = !MyVarList / missing=0 .
* MISSING=0 on the previous line replaces the SYSMIS values with zeroes.
compute CM = CM + T(CM).
* At this point, the terms on the main diagonal are twice
* as large as they should be, so divide them by 2.
call setdiag(CM,DIAG(CM)/2).
*print CM / format = "f5.3".
msave CM /TYPE=CORR /OUTFILE=* /VARIABLES=!MyVarList.
* MSAVE saves the specified matrix as a matrix file, the
* sort of file needed as input to FACTOR.
* If you are modifying this code to use with a covariance
* matrix, change TYPE=CORR to TYPE=COV.
END MATRIX.
DATASET NAME EM2.
I suppose this could all be stuck in a macro with CORR vs COV as an argument. Maybe I'll do that someday if I need to do this with a covariance matrix.
--
Bruce Weaver
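A plain-Python sketch of the final approach (an illustration of the algebra, not SPSS): empty upper cells set to 0, CM + T(CM), then the diagonal halved as CALL SETDIAG(CM,DIAG(CM)/2) does. The covariance example (variances 4 and 9, covariance 1.5) is hypothetical and shows that the same steps recover the original variances, which is the generality argued for above. The function name fill_from_lower is made up for the sketch.

```python
# Zero-filled upper half -> CM + T(CM) -> halve the doubled diagonal.

def fill_from_lower(lower):
    """Return the full symmetric matrix from a lower half whose upper cells are 0."""
    n = len(lower)
    # CM + T(CM): off-diagonal cells pick up the one stored value,
    # diagonal cells are doubled.
    full = [[lower[i][j] + lower[j][i] for j in range(n)] for i in range(n)]
    # Halve the doubled diagonal, mirroring SETDIAG with DIAG(CM)/2.
    for i in range(n):
        full[i][i] = full[i][i] / 2
    return full

corr_lower = [
    [1.000, 0.0,   0.0],
    [0.508, 1.000, 0.0],
    [0.347, 0.583, 1.000],
]
# Hypothetical covariance matrix: variances 4 and 9, covariance 1.5.
cov_lower = [
    [4.0, 0.0],
    [1.5, 9.0],
]

print(fill_from_lower(corr_lower))
print(fill_from_lower(cov_lower))
```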