I don't have SPSS where I am right now, but thought somebody on this list might have written syntax to do this.

As a demo of uncertainty in, e.g., correlations, t-test, an article has suggested randomly sorting values of a variable many times. (This could be thought of as a reduced instance of parallel analysis for factor analysis. If I had access to SPSS right now, I would just cannibalize the syntax for parallel analysis.)

The steps are:
1. compute a correlation or other stat between X and Y
2. loop 10,000 times
-- 2.1 randomly sort Y leaving X in place
-- 2.2 compute the correlation or other stat between X and Y
-- 2.3 put the stat in a file
3. end loop
4. examine the distribution of the computed correlation or other stat
5. report numerically and visually how the original correlation fits in the distribution
6. compare (a) the SE by conventional methods of the original correlation with (b) the SE of the randomized correlation

-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
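For readers without SPSS at hand, the six steps in the post above can be sketched roughly as follows. This is Python rather than SPSS syntax, the data are made up for illustration (n = 100, true correlation 0.3), and the names `pearson_r` and `permutation_reference` are hypothetical helpers, not anything from the article or the list.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_reference(x, y, n_perm=10_000, seed=0):
    """Steps 1-3: observed r plus r for n_perm random re-orderings of y."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)                    # step 1
    y_shuffled = list(y)
    perm_rs = []
    for _ in range(n_perm):                       # step 2
        rng.shuffle(y_shuffled)                   # 2.1: reorder Y, X stays put
        perm_rs.append(pearson_r(x, y_shuffled))  # 2.2 and 2.3
    return observed, perm_rs


# Made-up illustrative data (an assumption, not from the thread).
rng = random.Random(1)
n = 100
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.3 * a + math.sqrt(1 - 0.3 ** 2) * rng.gauss(0, 1) for a in x]

observed, perm_rs = permutation_reference(x, y, n_perm=2000)

# Steps 4-5: where does the observed r sit in the reference distribution?
p_two_sided = sum(abs(r) >= abs(observed) for r in perm_rs) / len(perm_rs)

# Step 6: spread of the shuffled r's next to two textbook standard errors.
se_perm = statistics.stdev(perm_rs)
se_null = 1 / math.sqrt(n - 1)                    # SD of r under shuffling
se_conventional = math.sqrt((1 - observed ** 2) / (n - 2))

print(f"observed r = {observed:.3f}, permutation p = {p_two_sided:.4f}")
print(f"SD of shuffled r = {se_perm:.3f}, 1/sqrt(n-1) = {se_null:.3f}")
print(f"conventional SE of observed r = {se_conventional:.3f}")
```

One point worth keeping in mind for step 6: shuffling Y breaks any association with X, so the shuffled r's are centered on zero with a standard deviation of 1/sqrt(n-1). That spread describes what r looks like when there is no relationship, which is not the same quantity as the standard error of the observed r.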
The word you are looking for is "permutations". I have a macro that will generate permutations of a particular variable and add them into a dataset, see https://dl.dropboxusercontent.com/s/2hgi2fqupeyorff/MACRO_Permutations.sps?dl=0. Below is an example applying that to a correlation coefficient to generate a reference distribution.

************************************************************************************************.
DEFINE !PermData (Var = !TOKENS(1)
                 /N = !TOKENS(1)
                 /Base = !TOKENS(1)
                 /File = !TOKENS(1) )
PRESERVE.
SET MXLOOPS=!N.
DATASET ACTIVATE !File.
COMPUTE XX_TempID_XX = $casenum.
MATRIX.
GET z /FILE = * /VARIABLES = !Var.
GET Id /FILE = * /VARIABLES = XX_TempID_XX.
COMMENT generating permutation distributions.
COMPUTE Res = MAKE(NROW(z),!N+1,0).
COMPUTE Res(:,1) = Id.
LOOP #I = 1 TO !N.
COMPUTE zP = !PERMC(z).
COMPUTE Res(:,#I+1) = zP.
END LOOP.
SAVE Res /VARIABLES = XX_TempID_XX !CONCAT(!Base,"1") TO !CONCAT(!Base,!N) /OUTFILE = *.
END MATRIX.
DATASET NAME XX_TempResults_XX.
DATASET ACTIVATE !File.
MATCH FILES FILE = * /FILE = 'XX_TempResults_XX' /BY XX_TempID_XX.
DATASET CLOSE XX_TempResults_XX.
MATCH FILES FILE = * /DROP XX_TempID_XX.
RESTORE.
!ENDDEFINE.

*Permutes the order of a column vector (or rows of a matrix).
DEFINE !PERMC (!POSITIONAL !ENCLOSE("(",")") )
(!1(GRADE(UNIFORM(NROW(!1),1)),:))
!ENDDEFINE.

*Now making fake data to show permutation approach.
SET SEED 10.
INPUT PROGRAM.
LOOP Id = 1 TO 1000.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Sim.
COMPUTE #Corr = 0.2.
COMPUTE X = RV.NORMAL(0,1).
COMPUTE Y = #Corr*X + RV.NORMAL(0,SQRT(1-#Corr**2)).
FREQ X Y /FORMAT = NOTABLE /STATISTICS MEAN STDDEV.

!PermData Var = Y N = 1000 Base = YPerm File = Sim.

VARSTOCASES /MAKE YPerm FROM YPerm1 TO YPerm1000 /INDEX Perm.
SORT CASES BY Perm Id.
SPLIT FILE BY Perm.

DATASET DECLARE Corrs.
OMS /SELECT TABLES /IF SUBTYPES='Correlations'
  /DESTINATION FORMAT=SAV OUTFILE='Corrs' VIEWER=NO /TAG = 'CorrOut'.
CORRELATIONS X WITH Y YPerm.
OMSEND TAG='CorrOut'.
SPLIT FILE OFF.

DATASET ACTIVATE Corrs.
SELECT IF Var3 = "Pearson Correlation".
FORMATS YPerm Y (F3.2).
EXECUTE.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=YPerm Y MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: YPerm=col(source(s), name("YPerm"))
  DATA: Y=col(source(s), name("Y"))
  TRANS: top=eval(100)
  TRANS: bot=eval(0)
  GUIDE: axis(dim(1), label("Correlation"), delta(0.05))
  GUIDE: axis(dim(2), label("Frequency"))
  ELEMENT: interval(position(summary.count(bin.rect(YPerm))), shape.interior(shape.square))
  ELEMENT: edge(position(region.spread.range(Y*(bot+top))), color.interior(color.red))
END GPL.
************************************************************************************************.

Here you can see that the permutation distribution has correlations that are centered on zero and range from about -0.1 to 0.1. So in this sample the observed correlation of 0.17 is not very likely if the null of zero correlation were true.

<http://spssx-discussion.1045642.n5.nabble.com/file/t329824/HistoPerm.png>

-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/ |
Great! All *permutations* are even better than randomizations for a reference distribution. When I get back I'll give this a try. I look forward to seeing how this compares to the conventional SE.
Art Kendall
Social Research Consultants |
In reply to this post by Art Kendall
I did precisely that as an experiment and the correlations were uniformly minuscule. Why would you expect that the magnitude of the correlation would be preserved under a randomization of one of the two vectors?

Here is my code. Maybe I screwed the pooch along the way -Doubtful- ;-)?
---
DEFINE !InputProgram (N !TOKENS (1) /Corr !TOKENS (1) /Vars !CMDEND )
MATRIX.
SAVE MAKE(!N,2,0) /OUTFILE * / VARIABLES !Vars .
END MATRIX.
DO REPEAT v=!Vars.
COMPUTE v=RV.NORMAL (0,1).
END REPEAT.
FACTOR VARIABLES !Vars / CRITERIA FACTORS(2)/ SAVE REG (2,FS).
DELETE VARIABLES !Vars .
RENAME VARIABLES (FS1 FS2 = !Vars ).
MATRIX.
COMPUTE R={1,!Corr;!Corr,1}.
GET Data /FILE * / VARIABLES=!Vars.
COMPUTE Chol_R=CHOL(R).
COMPUTE CorrData=Data * Chol_R .
SAVE CorrData /OUTFILE * / VARIABLES =!Vars.
END MATRIX.
CORRELATIONS VARIABLES x y.
!ENDDEFINE.

DEFINE !DoPermutations ( NIter !TOKENS (1) / OUTFILE !TOKENS (1) / Vars !CMDEND )
PRESERVE.
FILE HANDLE fpFolder_With_ReadWriteRights /NAME='%userprofile%\Desktop' .
CD fpFolder_With_ReadWriteRights .
DATASET DECLARE !OUTFILE.
SET MXLOOPS=!NIter.
MATRIX.
GET Data / FILE * /VARIABLES !Vars.
COMPUTE N=NROW(Data).
COMPUTE Sums=T(MAKE(N,1,1))*Data.
COMPUTE S12Term=Sums(1)* Sums(2)/N .
COMPUTE SDs=SQRT((DIAG(T(Data)*Data)-DIAG(T(Sums)*Sums)/N)/(N-1)).
COMPUTE SD12= SDs(1) * SDs(2).
COMPUTE Tx=T(Data(:,1)).
COMPUTE y=Data(:,2).
COMPUTE Perm_Y=y.
LOOP #=1 TO !NIter.
+ COMPUTE g_y=GRADE(UNIFORM(N,1)).
+ LOOP ##= 1 TO N.
+ COMPUTE Y(##)=Perm_Y(g_y(##)).
+ END LOOP.
+ SAVE ((Tx*y-S12Term)/(N-1) / SD12) /OUTFILE !OUTFILE /VARIABLES Corr.
END LOOP.
END MATRIX.
RESTORE.
!ENDDEFINE.

NEW FILE.
DATASET CLOSE ALL.
!InputProgram N =100 Corr =.5 Vars x y .
!DoPermutations NIter=10000 OUTFILE=test Vars=x y .
DATASET ACTIVATE test.
DESCRIPTIVES VARIABLES Corr / STATISTICS ALL.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
David Marso wrote
> Why would you expect that the magnitude of the correlation would be preserved under a randomization of one of the two vectors?

That is exactly my question.

Art, can you provide details on the article you mentioned?

Cheers,
Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by David Marso
I would not expect the correlation to be preserved. I would expect that for most randomizations of Y most r would be smaller. In the instance of a perfect correlation, all other orderings would yield smaller r's.

The idea is to get a reference distribution with which to compare the obtained/original r. The goal is to see how plausible it is that the observed r is simply due to randomness. Since Andy's post I now know of 3 reference distributions:
1) the conventional p.
2) Andy's using all permutations (at some point it would yield too many alternative variables, but would be fantastic for small N's).
3) many randomizations of Y.
Art Kendall
Social Research Consultants |
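To make the difference between reference distributions 2) and 3) concrete, here is a small sketch, in Python rather than SPSS syntax, that enumerates every ordering of Y for a toy sample. The data are hypothetical and `pearson_r` is an illustrative helper. As far as the posted syntax shows, the macro earlier in the thread draws random permutations via GRADE(UNIFORM(...)) rather than enumerating all of them, so full enumeration is shown here only as the small-N alternative.

```python
import itertools
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: n = 6 gives 6! = 720 orderings of y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.1, 5.5, 5.8]

observed = pearson_r(x, y)
all_rs = [pearson_r(x, perm) for perm in itertools.permutations(y)]

# Exact two-sided p: share of orderings at least as extreme as the observed
# one (small tolerance so floating-point ties are counted as ties).
p_exact = sum(abs(r) >= abs(observed) - 1e-12 for r in all_rs) / len(all_rs)

print("orderings enumerated:", len(all_rs))
print("observed r:", round(observed, 3), "exact p:", round(p_exact, 4))
print("variance of r over all orderings:", statistics.pvariance(all_rs))
print("1/(n-1):", 1 / (len(x) - 1))

# The number of orderings grows as n!, so enumeration stops being practical
# very quickly and random shuffles take over.
for n in (8, 10, 12, 15):
    print(n, math.factorial(n))
```

Over the full set of orderings the mean of r is zero and its variance is 1/(n-1), whatever the data, which is one way to see why shuffled correlations come out small for any reasonably large N.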
Why not bootstrap the correlations? It's not clear to me what the point of randomly permuted reference distributions would be.

On Wed, Dec 20, 2017 at 9:57 AM, Art Kendall <[hidden email]> wrote:
> I would not expect the correlation to be preserved. I would expect that for
|
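A rough sketch of what bootstrapping the correlation could look like, in Python rather than SPSS syntax and with made-up data; `pearson_r` and `bootstrap_rs` are illustrative names. The difference from shuffling Y is that whole (X, Y) pairs are resampled with replacement, so the association is kept and the resulting distribution is centered near the observed r instead of near zero.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_rs(x, y, n_boot=2000, seed=0):
    """Resample (x, y) pairs with replacement and recompute r each time."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # cases drawn as pairs
        rs.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    return rs


# Made-up illustrative data (an assumption): n = 100, true correlation 0.3.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [0.3 * a + math.sqrt(1 - 0.3 ** 2) * rng.gauss(0, 1) for a in x]

observed = pearson_r(x, y)
rs = sorted(bootstrap_rs(x, y))

# Simple percentile interval from the sorted bootstrap correlations.
lo = rs[int(0.025 * len(rs))]
hi = rs[int(0.975 * len(rs)) - 1]

print(f"observed r = {observed:.3f}")
print(f"bootstrap SE = {statistics.stdev(rs):.3f}")
print(f"95% percentile interval = ({lo:.3f}, {hi:.3f})")
```

The two resampling schemes answer different questions: the bootstrap spread estimates the uncertainty of the observed r, while the shuffled distribution shows what r looks like when there is no association at all.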
"Random permutation" is one justification that Fisher used for the F-test. Assuming
near-normal distributions, the F reproduces the p-values that would be achieved
by randomization. Art wants to show natural variability of results.
However, for an exercise that demonstrates the variability of results, I think I would start with something other than real data. And I would plan on plotting the results. If you repeat the experiment with ordinary "normal" data, all the
plots will have predictable variability, with greater variance (S.E.) for smaller N.
If you take two distributions that are exponential, rather than normal, the set of
correlations has a fatter tail. Or if you start with "normal" and a moderate number of
extreme outliers, your set of correlations will be symmetric but will better match a plot based on smaller Ns (not much more than the count of outliers). (Say, 90 cases with SD=1,
10 cases with SD=4. "Mixtures of distributions" like this are sometimes used for testing
robustness of proposed tests.)
-- Rich Ulrich
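The exercise described above can be sketched along these lines, in Python rather than SPSS syntax. Everything here is an assumption made for illustration: the helper names, the choice of 2000 replications, and in particular the reading of the mixture as 10 of the 100 cases having SD=4 on both variables.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_rs(n, n_outliers=0, outlier_sd=4.0, rho=0.0, reps=2000, seed=0):
    """r from `reps` fresh samples of size n. The first n_outliers cases have
    both variables scaled by outlier_sd (a simple mixture of distributions)."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        xs, ys = [], []
        for i in range(n):
            scale = outlier_sd if i < n_outliers else 1.0
            x = rng.gauss(0, 1)
            y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
            xs.append(scale * x)
            ys.append(scale * y)
        rs.append(pearson_r(xs, ys))
    return rs


sd_n100 = statistics.stdev(simulate_rs(100))                 # plain normal
sd_n25 = statistics.stdev(simulate_rs(25))                   # smaller N
sd_mixture = statistics.stdev(simulate_rs(100, n_outliers=10))  # 90 + 10

print(f"SD of r, normal, N=100:       {sd_n100:.3f}")
print(f"SD of r, normal, N=25:        {sd_n25:.3f}")
print(f"SD of r, 90/10 mixture, N=100: {sd_mixture:.3f}")
```

Plotting histograms of the three sets of correlations, as suggested above, would show the same thing visually; the printed standard deviations are only a compact summary.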
From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]>
Sent: Wednesday, December 20, 2017 12:26 PM
To: [hidden email]
Subject: Re: syntax for correlation randomly sort Y, say, 10,000 times

Why not bootstrap the correlations? It's not clear to me what the point of randomly permuted reference distributions would be.
...
|
The article is
Grice, J. W. (2014). Observation Oriented Modeling. /Comprehensive Psychology/, 3, 3.

*This approach focuses on the degree to which the expected model exactly fits individual cases.*

IMO the article reinvents some wheels and is based on some "straw man" caricatures of psychological research. It uses other terminology to suggest crosstabs for pairs of items in Likert scales, the DFA classification phase to look further at t-test data, and vertically displayed histograms in the same graph. Some of the ideas would be useful in getting students/clients to think of stat as more than magic tools.

In sum, it has some ideas that would widen perspectives, but those ideas can be implemented in many existing stat packages. But the presentation that these ideas should replace rather than extend conventional stat/methods teaching hints of disingenuity.

The author has the data available as SPSS files, so could have been clearer when (s)he was discussing cases, variables, and values.

I drafted a brief overview of the article for the other functional consultants in Statistics Without Borders which I can send if you are interested.

Jon's suggestion of bootstrapping is another good way to help students get a broader understanding of correlations etc.

P.S. On another list someone asked for an example of "odd vocabulary". I replied with the use of "Crossed Observations" for "crosstabs". It is a good idea when presenting crosstabs to make it explicit that one is creating a table (matrix) of how values of one variable go together with values of another variable.
Art Kendall
Social Research Consultants |
Thanks Art. It appears to be open access, so anyone who is interested should be able to view it here:

http://journals.sagepub.com/doi/full/10.2466/05.08.IT.3.3
http://journals.sagepub.com/doi/pdf/10.2466/05.08.IT.3.3

Cheers,
Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
This approach has been brought to the attention of the Functional Consultants for Statistics Without Borders. This note has details of techniques suggested in the article. Most if not all of this can be done in SPSS.

Brief description of Observation Oriented Modeling.

This article uses the word “observation” in the sense of the value of a measurement on a case/entity/respondent/unit of analysis. It does not refer to methods of recording codes/variable values for behaviors, etc. It does not refer to ‘observations’ as the term for cases/entities/respondents.

It rightfully decries that models and techniques are thought of in a very mechanistic way and that insufficient attention is paid to the meaning of the variables and the questions the models represent. It reinforces the idea that a statistical model does not necessarily identically fit every case used in building the model. It advocates replacing conventional methods by examining the ‘accuracy’ with which a model fits cases. It emphasizes looking at the particular/concrete rather than the general/abstract portions of a model. It advocates examining data visually rather than with equations.

In much of psychology and other social sciences, it is customary to look at both the statistical model and how well it fits individual cases. It is also customary to look at the data both numerically and visually.

Details.

I have seen these approaches since at least the mid 70s.

It uses a variety of techniques to get at how “accurate” a model is. It emphasizes the “percent correctly classified”. Although it does not use these words, it is much the same thing as “flipping” the roles of an independent variable with 2 values and a continuous dependent variable. In practice this is conventionally done by following a t-test with a 2 group discriminant function analysis (DFA). The estimation phase would calculate predicted scores and assigned group membership for each case. The classification phase of the DFA would crosstab the original group membership and the membership assigned by the DFA.

It talks about creating a reference distribution for a correlation by randomly reordering one of the variables many times. Again, not in these words. This is a lot like jackknifing and bootstrapping to enhance understanding of the uncertainty inherent in a model. It is also like the parallel analysis typically done in principal component and principal factor analysis, only with two variables rather than many.

To look at the data for 2 groups and one variable, it suggests using side-by-side horizontal bar graphs. The vertical axis represents the variable. The portion of the bar representing exact fit to the hypothesis is shaded. It calls this a multigram.

It suggests progressively coarsening measurement by collapsing variables to see how that changes the visual impression.

It suggests cross tabulating pairs of individual items in a summative scale.

It suggests cross tabulating a pair of continuous variables and shading cells to see what the picture would look like IF there were perfect correlation/fit.
Art Kendall
Social Research Consultants |
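The "percent correctly classified" idea in the note above (a t-test followed by the classification phase of a 2-group DFA) can be sketched as follows, in Python rather than SPSS syntax. The data are hypothetical, and the classifier is a simplifying assumption: with one predictor, equal priors and a pooled variance, a two-group linear discriminant reduces to cutting at the midpoint between the two group means. A shuffled-labels reference for the percent correct is included as well.

```python
import random
import statistics
from collections import Counter


def classify_by_midpoint(score, group):
    """Assign each case to the group whose mean lies on its side of the
    midpoint between the two group means (groups are coded 0 and 1)."""
    m0 = statistics.fmean(s for s, g in zip(score, group) if g == 0)
    m1 = statistics.fmean(s for s, g in zip(score, group) if g == 1)
    cut = (m0 + m1) / 2
    high = 1 if m1 > m0 else 0          # which group sits above the cut
    return [high if s > cut else 1 - high for s in score]


def percent_correct(score, group):
    predicted = classify_by_midpoint(score, group)
    return 100 * sum(g == p for g, p in zip(group, predicted)) / len(group)


# Hypothetical data: two groups of 50, means 0 and 1.5, SD 1.
rng = random.Random(3)
group = [0] * 50 + [1] * 50
score = [rng.gauss(1.5 * g, 1.0) for g in group]

# Classification phase: crosstab of original by assigned membership.
predicted = classify_by_midpoint(score, group)
crosstab = Counter(zip(group, predicted))
pct_correct = percent_correct(score, group)

# Reference distribution: percent correct after shuffling group labels.
shuffled = list(group)
reference = []
for _ in range(1000):
    rng.shuffle(shuffled)
    reference.append(percent_correct(score, shuffled))
share_at_least = sum(a >= pct_correct for a in reference) / len(reference)

for (g, p), count in sorted(crosstab.items()):
    print(f"original group {g}, assigned group {p}: {count}")
print(f"percent correctly classified: {pct_correct:.1f}")
print(f"mean percent correct with shuffled labels: "
      f"{statistics.fmean(reference):.1f}")
print(f"share of shuffles doing at least as well: {share_at_least:.3f}")
```

Reading the crosstab row by row gives the accuracy for each group separately, which is the sensitivity/specificity style of comparison.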
Just to weigh in on this, it's not the values of the correlations that need to be greater, but the predictive value arising from the correlations with Y against the predictive value arising from the correlations with YPerm. The entire idea is to examine the proportion of accurate predictive capacities of the original correlation with respect to the proportion of accurate predictive capacities based on randomization. Art's explanation above is, as he suggests, shorthand for accomplishing the same outcome.

I used a dataset of my own, and Art's method above correctly reproduced 86.7% of the original treatment group membership. With OOM, I reproduced 86.3%. Art's method is a lot more straightforward, easy to develop syntax for, and seems to produce nearly the same values. When I added clustered bar charts to the output, I was able to determine relative accuracy in reproducing each of the treatment groups, e.g., the predictive equation was more accurate in reproducing the control group than the treatment group, akin to sensitivity and specificity.
Brian
________________________________________
From: SPSSX(r) Discussion [[hidden email]] on behalf of Art Kendall [[hidden email]]
Sent: Monday, January 08, 2018 10:21 AM
To: [hidden email]
Subject: Re: syntax for correlation randomly sort Y, say, 10,000 times
|