|
Hi - I've got a large dataset (over 500 variables, 150K rows) and would like to detect
a) variables that are highly correlated with one another, and
b) linear combinations of variables likely to cause conditioning problems or correlation matrices that fail to be positive definite.

Whether I sample or not, the CORRELATIONS procedure won't take more than 100 variables, and it wouldn't help with b) anyway, so I'm working with FACTOR and /EXTRACTION PC.

Question:
---------
Before chiseling the wheel, does someone have code handy that produces the coefficients of the linear combinations of input variables that lead to singularities?

Thanks.
Marc. |
|
When I pseudorandomly generate 150 cases with 550 variables, I of
course get singularities.
Please describe the nature of your data. Then we may be able to make suggestions. Are these some sort of repeated measures, e.g., items intended to be in scales, prices over time, energy at different wavelengths, etc.?

RELIABILITY can be useful for tracking down singularities. Open a new instance of SPSS. Copy the syntax below to a syntax file. Click <run>. Click <all>. Then go back to the syntax and put fewer items into the scale. Finally, try using just 150 items. You will see that the SMC (squared multiple correlation) column now has entries, but they are all 1.000.

You can edit the RELIABILITY syntax to produce the whole correlation matrix, but in this instance that would be futile.

new file.
input program.
vector x (550,f3).
loop id = 1 to 150.
  loop #p = 1 to 550.
    compute x(#p) = rnd(rv.normal(50,10)).
  end loop.
  end case.
end loop.
end file.
end input program.
reliability variables= x1 to x550
  /scale (bigbunch) = x1 to x550
  /SUMMARY =all.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
Art Kendall
Social Research Consultants |
|
In reply to this post by M-24
|
|
Art Kendall
Social Research Consultants |
|
Thanks for the code & advice on RELIABILITY.
The data in question is census-based data at the zip level. All values are proportions, with no missing values. There are several groups of variables (ethnicity, household descriptors and such). The highest correlations are around 0.98.

One shortcut is to sample down the data, work through these variable groups by listing pairs of variables with a correlation coefficient > 0.95, and decide which variables to drop. A PC analysis within these variable families can help get an idea of redundancies. A canonical correlation analysis would probably help too when checking across groups.

SAS' PROC PRINCOMP would provide a linear combination of the columns when it detected a singularity, if I remember well, and that's the kind of fast diagnostic output I had in mind at this point.

Regards,
Marc.

Date: Sun, 7 Jun 2009 07:37:46 -0400
Subject: Re: detecting linear combinations/high correlations in a data set
|
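For readers following along, the "list pairs with a correlation coefficient > 0.95" shortcut described above can be sketched in a few lines. This is only an illustration in plain Python, not the poster's code and not SPSS syntax; the function names, the variable names and the data values are all made up for the example.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant column: correlation is undefined
    return sxy / math.sqrt(sxx * syy)

def high_corr_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| > threshold."""
    hits = []
    for (na, xa), (nb, xb) in combinations(columns.items(), 2):
        r = pearson(xa, xb)
        if abs(r) > threshold:  # a NaN never passes this comparison
            hits.append((na, nb, r))
    # Strongest correlations first
    return sorted(hits, key=lambda t: -abs(t[2]))

# Hypothetical zip-level proportions (invented values, illustration only).
data = {
    "pct_owner":  [0.62, 0.55, 0.71, 0.48, 0.66, 0.59],
    "pct_renter": [0.38, 0.45, 0.29, 0.52, 0.34, 0.41],  # 1 - pct_owner
    "pct_age65":  [0.12, 0.20, 0.15, 0.09, 0.18, 0.11],
}
for a, b, r in high_corr_pairs(data):
    print(f"{a:12s} {b:12s} r = {r:+.3f}")
```

With about 500 variables this loop visits roughly 125,000 pairs, so running it on a sample of rows, as suggested in the post, keeps it quick. It only finds pairwise redundancy; a dependency that involves three or more variables can hide behind modest pairwise correlations.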
|
What you have is compositional data. Each subset sums to 1.00, so within each subset any one variable is an exact linear function of the others, and the subset's correlation matrix will fail to have an inverse.

If I understand correctly, correspondence analysis can deal with compositional data. An N-battery canonical correlation might be useful if you dropped one variable from each subset. I have not done this, but I have heard that CATEGORIES can do it.

People from the Leiden group are sometimes on this list. You may also want to post on the Classification Society list, asking about analyzing several kinds of compositional data:
http://lists.sunysb.edu/index.cgi?A0=CLASS-L
Art Kendall
Social Research Consultants |
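The point about compositional data also gives the kind of diagnostic asked for at the top of the thread: the coefficients of an exact linear dependency are a vector in the null space of the centered data. The sketch below is an assumption-laden illustration in plain Python (Gauss-Jordan elimination with a fixed tolerance), not the output of FACTOR, RELIABILITY or PROC PRINCOMP; the helper names, variable names and data are hypothetical.

```python
def center_columns(rows):
    """Subtract each column's mean, so 'sums to a constant' becomes 'sums to 0'."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def null_space(rows, tol=1e-9):
    """Basis of {c : A c = 0} for matrix A (list of rows), found by
    Gauss-Jordan elimination with partial pivoting."""
    a = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        p = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[p][c]) < tol:
            continue  # no usable pivot: column c depends on earlier columns
        a[r], a[p] = a[p], a[r]
        pv = a[r][c]
        a[r] = [v / pv for v in a[r]]
        for i in range(n_rows):
            if i != r and a[i][c] != 0.0:
                f = a[i][c]
                a[i] = [vi - f * vr for vi, vr in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for fc in free:
        v = [0.0] * n_cols
        v[fc] = 1.0
        for row_idx, pc in enumerate(pivots):
            v[pc] = -a[row_idx][fc]
        basis.append(v)
    return basis

# Hypothetical data: the first three columns are proportions summing to 1,
# the fourth is unrelated (invented values, illustration only).
names = ["pct_a", "pct_b", "pct_c", "pct_other"]
rows = [
    (0.50, 0.30, 0.20, 0.12),
    (0.40, 0.35, 0.25, 0.20),
    (0.60, 0.10, 0.30, 0.15),
    (0.25, 0.45, 0.30, 0.09),
    (0.70, 0.20, 0.10, 0.18),
    (0.35, 0.25, 0.40, 0.11),
]
for vec in null_space(center_columns(rows)):
    terms = [f"{coef:+.3f}*{name}" for coef, name in zip(vec, names)
             if abs(coef) > 1e-6]
    print(" ".join(terms), "= constant")
```

For the made-up data this should report equal coefficients on pct_a, pct_b and pct_c and none on pct_other, which is the "subset sums to 1.00" dependency. With 150K rows it would be cheaper to run the same elimination on the 500 x 500 covariance or correlation matrix, which has the same null space. Near-dependencies, such as the correlations around 0.98 mentioned earlier, are not exact and will not show up here; small eigenvalues of the correlation matrix are the usual place to look for those.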
