|
I have 69 separate variables that are all knowledge test questions (call them v1 to v69), each scored as correct (1) or incorrect (0). I would like to create a subscale (in this case a partial knowledge score) from these variables by randomly sampling 11 items out of the 69. The subscale is then computed by taking the mean of the 11 items. I would then like to use the subscale in a regression (with other variables in my data as the DV and as additional IVs) and save the regression results, specifically the change in R-squared when I enter the subscale into the model in the 2nd block.

The trick is that I want to repeat this procedure of creating a new random subscale and running and saving the regression estimates 1,000 times. This will give me an empirical distribution (pdf) of the improvement in model fit, so that I can test whether a particular 11-item subscale is significantly better than other random 11-item combinations.

I would greatly appreciate suggestions for syntax/macro code to automate this procedure. Thanks!

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
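The comparison described in the question (one particular subscale against 1,000 random ones) can be sketched as an empirical p-value. This is a minimal Python illustration, not SPSS syntax. The function name `empirical_p` and the numbers in the usage line are made up for illustration, and the add-one correction is a common Monte Carlo convention, assumed here rather than taken from the thread.

```python
def empirical_p(observed, random_values):
    """Share of random subscales whose R-squared change is at least as
    large as the observed one, with the usual add-one correction."""
    hits = sum(1 for value in random_values if value >= observed)
    return (hits + 1) / (len(random_values) + 1)


# Hypothetical values only: four random-subscale R-squared changes and one
# observed change of 0.05. One random value (0.06) is at least as large.
p_value = empirical_p(0.05, [0.01, 0.02, 0.06, 0.03])
print(p_value)  # 0.4
```

With the real analysis, `random_values` would hold the 1,000 saved R-squared changes and `observed` the change for the subscale of interest.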
|
You would need to set the seed and the filespec for the output text.

HOWEVER, are you sure you want to do this? If you do this as an exercise, see how the results compare to using reliability on all 69 items and whittling away 1) all those that lower the internal consistency reliability then 2) one or two items at a time until you are down to 11. Then try that in your regressions. That way you will work toward having as reliable a scale as you can.

Are you sure that there is a single latent dimension in the set of items?

*Create syntax to 1000 times randomly select 11 of 69 items to form a scale.
set seed = 200908221.
input program.
vector SCALE (1000,f10.8).
loop ITEM = 1 to 69.
loop #k = 1 to 1000.
compute SCALE(#k) = rv.uniform(0,2**31).
end loop.
end case.
end loop.
end file.
end input program.
RANK VARIABLES = SCALE1 TO SCALE1000 /RANK INTO RSCALE1 TO RSCALE1000.
FORMATS RSCALE1 TO RSCALE1000 (F2).
MULT RESPONSE GROUPS=$FIRST100 'FIRST 100' (rscale1 TO rscale100 (1,69))
  /FREQUENCIES=$FIRST100.
RECODE RSCALE1 TO RSCALE1000(1 THRU 11 =1)(ELSE=0) INTO KEEPER1 TO KEEPER1000.
FORMATS KEEPER1 TO KEEPER1000 (F1).
FLIP VARIABLES=KEEPER1 TO KEEPER1000.
FORMATS VAR001 TO VAR069(F1).
STRING SCALESTRING (A4).
STRING INDSTRING (A2).
COMPUTE SCALESTRING = STRING($CASENUM,N4).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS'
  /'COMPUTE SCALE' SCALESTRING ' = SUM('.
COMPUTE ITEMCOUNT = 1.
DO REPEAT INDEX = 1 TO 69/FLAG =VAR001 TO VAR069.
DO IF ITEMCOUNT LT 11 AND FLAG EQ 1.
COMPUTE INDSTRING = STRING(INDEX,N2).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS'
  /' ITEM' INDSTRING ' + '.
COMPUTE ITEMCOUNT = ITEMCOUNT+1.
ELSE IF ITEMCOUNT EQ 11 AND FLAG EQ 1.
COMPUTE INDSTRING = STRING(INDEX,N2).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS'
  /' ITEM' INDSTRING ' ). '.
END IF.
END REPEAT.
EXECUTE.
Art Kendall
Social Research Consultants |
|
Art,
I read your reply with interest because the problem seemed especially challenging. I think I understand your solution with two exceptions. One problem that comes up is that of sampling items without replacement. My first question is how you solved that problem.

My second question is about the purpose of the MULT RESPONSE command. What problem does that solve?

Thanks, Gene Maguin |
|
The sampling without replacement was done by creating a matrix of uniform random numbers with 69 cases (in this instance the items) by 1000 variables (in this instance the proposed samples). The 1000 variables were then RANKed. The matrix of ranks was then RECODEd so that only the 11 items ranked 1 through 11 in each column are flagged 1 and all other items are flagged 0. This is the same thing as doing 1000 sorts and keeping the 11 lowest random numbers each time (RANK assigns rank 1 to the smallest value by default; keeping the 11 highest would work equally well). This provides a sample of fixed size without replacement.

The MULT RESPONSE was a leftover procedure from checking that the syntax did what it was supposed to.

Art
Art Kendall
Social Research Consultants |
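The rank-the-random-numbers technique explained in the reply above can be sketched outside SPSS. This is a minimal Python illustration under stated assumptions: the function name `sample_items` is hypothetical, the seed value is simply borrowed from the posted syntax, and Python's generator will not reproduce the selections SPSS makes with that seed.

```python
import random


def sample_items(n_items=69, n_keep=11, rng=None):
    """Pick n_keep of n_items without replacement by ranking random keys."""
    rng = rng or random.Random()
    # One uniform random number per item: one column of the 69 x 1000 matrix.
    keys = [rng.random() for _ in range(n_items)]
    # Rank the items by key; order[0] is the item holding the smallest key.
    order = sorted(range(n_items), key=lambda i: keys[i])
    # Keep the items ranked 1 through n_keep, reported as 1-based item numbers.
    return sorted(i + 1 for i in order[:n_keep])


rng = random.Random(200908221)  # seed borrowed from the posted syntax
samples = [sample_items(rng=rng) for _ in range(1000)]
print(samples[0])  # eleven distinct item numbers between 1 and 69
```

Because each item gets exactly one key and each rank is used once, no item can appear twice in a sample, which is what makes this a fixed-size draw without replacement.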
|
|
Jill Stoltzfus wrote:
> Hello everyone. I'm looking for information on calculating sample
> size for Poisson regression and would appreciate your comments.
> Specifically, what's the best formula to use?
>
> Thanks in advance for your help.

Hi Jill:

Perhaps you should try to Google a bit. For instance, I tried the following search "sample size Poisson regression", and the first answer looked very promising (provided you have access to Biometrika articles on-line, which I don't have right now, since I'm at home, not at the University). Here's the link to the article, titled "Sample size calculations for logistic and Poisson regression models":

http://biomet.oxfordjournals.org/cgi/content/abstract/88/4/1193

HTH,
Marta GG

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/ |
|
|
In reply to this post by Jill Stoltzfus
Hi all,

I need to find all possible combinations of 8 dichotomous variables, choose the top 10 most frequently occurring, and also determine the extent to which any of the combinations (not just the top 10) are associated with a continuous dependent variable.

Is there an easy way to compute these variables in SPSS? I do not have SPSS Classification Trees.

Thanks,
Kim
|
|
Kim,
I don't know that there is a 'procedural' way to do this in SPSS. Here, though, is one way. Let your 8 dichotomous variables be x1 to x8, all coded 0,1 and with format F1.0. How they are coded doesn't matter, but a consistent coding for all variables will help in interpreting the patterns that result. Also, I assume missing is coded as 9 and there are no sysmis.

String pattern(a8).
Compute pattern=concat(string(x1,f1.0),string(x2,f1.0),string(x3,f1.0),
  string(x4,f1.0),string(x5,f1.0),string(x6,f1.0),string(x7,f1.0),
  string(x8,f1.0)).

Frequencies pattern.

Note. The maximum number of patterns for 8 dichotomous variables is 2**8=256 and for 8 trichotomous variables (0,1,9) is 3**8=6561. If a number of your variables have 9=missing, it would be better to do an aggregate command followed by a list command to get the frequencies because SPSS 16+ with Java will probably choke on the frequencies (and if you try printing that frequency table you should consider it an overnight job).

That takes care of the all possible combinations part.

The top ten combinations will fall out from either the frequencies listing or from the aggregate+list commands. However, another way is to use the Rank command and just print the first ten rank values (but pay attention to how you treat ties because you should have quite a few, by definition).

Evaluating associations with a continuous variable is simply an Anova type operation, but you have so many possible combinations that you MAY exceed the capacity of the anova type commands (GLM, Unianova, Means, etc). That said, combinations with one case are pointless to keep as you need two cases to compute the within group sum of squares. So, all combinations with n=1 can be discarded immediately.

Ok, this will get you started and other people will probably comment.

Gene Maguin |
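The pattern-string approach in the reply above (concatenate the eight values, count the patterns, drop patterns with a single case) can be sketched in Python as follows. This is only an illustration of the logic, not SPSS syntax. The six toy cases and the names `pattern`, `cases`, and `usable` are hypothetical.

```python
from collections import Counter, defaultdict


def pattern(row):
    """Concatenate eight 0/1 values (x1..x8) into a string like '01100101'."""
    return "".join(str(value) for value in row)


# Hypothetical toy data: ((x1, ..., x8), y), with y the continuous outcome.
cases = [
    ((0, 1, 1, 0, 0, 1, 0, 1), 2.5),
    ((0, 1, 1, 0, 0, 1, 0, 1), 3.1),
    ((1, 1, 1, 1, 1, 1, 1, 1), 4.0),
    ((0, 0, 0, 0, 0, 0, 0, 0), 1.2),
    ((0, 0, 0, 0, 0, 0, 0, 0), 0.8),
    ((0, 0, 0, 0, 0, 0, 0, 0), 1.0),
]

# Frequency of each pattern that actually occurs, most common first.
counts = Counter(pattern(x) for x, _ in cases)
top10 = counts.most_common(10)

# Group the outcome by pattern and discard patterns with only one case,
# since they contribute nothing to a within-group sum of squares.
groups = defaultdict(list)
for x, y in cases:
    groups[pattern(x)].append(y)
usable = {p: ys for p, ys in groups.items() if len(ys) >= 2}

print(top10[0])        # ('00000000', 3)
print(sorted(usable))  # ['00000000', '01100101']
```

Ties in the frequency counts are returned in first-seen order by `most_common`, so a real top-10 cut still needs a decision about how to treat tied patterns.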
|
Hi,
Below is the 'numerical' version of Gene's approach. An eight-digit variable called 'combination' is computed, which has to be interpreted digit by digit. Reading from the left, digit #1 refers to variable x8, #2 to x7, and so forth.

* sample data.
* with rng = mt the generator is seeded through mtindex (seed applies to rng = mc).
set rng = mt mtindex = 12345.
input program.
+ loop #case = 1 to 1000.
+   compute x1 = rnd(rv.uniform(0,1)).
+   compute x2 = rnd(rv.uniform(0,1)).
+   compute x3 = rnd(rv.uniform(0,1)).
+   compute x4 = rnd(rv.uniform(0,1)).
+   compute x5 = rnd(rv.uniform(0,1)).
+   compute x6 = rnd(rv.uniform(0,1)).
+   compute x7 = rnd(rv.uniform(0,1)).
+   compute x8 = rnd(rv.uniform(0,1)).
+   end case.
+ end loop.
+ end file.
end input program.
exe.
formats all (f1).

* actual code.
compute combination = x1 + x2 * 10**1 + x3 * 10**2 + x4 * 10**3 + x5 * 10**4 +
  x6 * 10**5 + x7 * 10**6 + x8 * 10**7.
formats combination (n8).
fre combination / format = dfreq. /* combinations sorted from common to rare.
aggr out = * / break = combination / n = n.
show n. /* shows the number of actually occurring combinations.

Cheers!!
Albert-Jan |
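The digit-level reading of 'combination' can be checked with a small Python sketch of the same positional encoding. The helper names `encode` and `decode` are hypothetical; the arithmetic mirrors the posted COMPUTE, with x1 in the units digit and x8 in the leftmost digit.

```python
def encode(xs):
    """xs = [x1, ..., x8]; x1 lands in the units digit, x8 in the leftmost."""
    return sum(x * 10**i for i, x in enumerate(xs))


def decode(code, n=8):
    """Recover [x1, ..., x8] from the numeric code, units digit first."""
    return [(code // 10**i) % 10 for i in range(n)]


# x1 = 1, x7 = 1, x8 = 1, everything else 0.
code = encode([1, 0, 0, 0, 0, 0, 1, 1])
print(f"{code:08d}")  # 11000001 (zero-padded, like the N8 format)
print(decode(code))   # [1, 0, 0, 0, 0, 0, 1, 1]
```

The zero-padded display matters: without it a pattern such as x1 = 1 alone would print as `1` rather than `00000001`, which is why the posted syntax applies the N8 format.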
|
In reply to this post by Art Kendall
you can adapt it to your situation by changing the strings that are red or blue. The colors might disappear when you copy the syntax into SPSS/PASW.

See my earlier response to Gene Maguin for the technique of sorting random numbers to get a sample without replacement.

Art Kendall
Social Research Consultants

*Create syntax to 1000 times randomly select 11 of 69 items to form a scale.
*set the seed for the random number generator.
set seed = 200908221.
input program.
vector SCALE (1000,f10.8).
loop ITEM = 1 to 69.
loop #k = 1 to 1000.
compute SCALE(#k) = rv.uniform(0,2**31).
end loop.
end case.
end loop.
end file.
end input program.
*there is now a 1000 variable by 69 case file of random numbers.
*create 1000 new variables by RANKING the random numbers.
RANK VARIABLES = SCALE1 TO SCALE1000 /RANK INTO RSCALE1 TO RSCALE1000.
FORMATS RSCALE1 TO RSCALE1000 (F2).
*double check the ranking.
MULT RESPONSE GROUPS=$FIRST100 'FIRST 100' (rscale1 TO rscale100 (1,69))
  /FREQUENCIES=$FIRST100.
*create 1000 new variables that are flags for whether to keep the item in a scale.
RECODE RSCALE1 TO RSCALE1000(1 THRU 11 =1)(ELSE=0) INTO KEEPER1 TO KEEPER1000.
FORMATS KEEPER1 TO KEEPER1000 (F1).
*Flip the matrix of flags that items go into scales.
FLIP VARIABLES=KEEPER1 TO KEEPER1000.
FORMATS VAR001 TO VAR069(F1).
*there are now 1000 cases with flags indicating whether the item is on that scale.
*begin the portion that writes a syntax file that you can INSERT into your existing syntax.
*change the filespec to match your situation.
*change the string ITEM to match your situation.
STRING SCALESTRING (A4).
STRING INDSTRING (A2).
COMPUTE SCALESTRING = STRING($CASENUM,N4).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS'
  /'COMPUTE SCALE' SCALESTRING ' = SUM('.
COMPUTE ITEMCOUNT = 1.
DO REPEAT INDEX = 1 TO 69/FLAG =VAR001 TO VAR069.
DO IF ITEMCOUNT LT 11 AND FLAG EQ 1.
COMPUTE INDSTRING = STRING(INDEX,N2).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS'
  /' ITEM' INDSTRING ' + '.
COMPUTE ITEMCOUNT = ITEMCOUNT+1.
ELSE IF ITEMCOUNT EQ 11 AND FLAG EQ 1.
COMPUTE INDSTRING = STRING(INDEX,N2).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS'
  /' ITEM' INDSTRING ' ). '.
END IF.
END REPEAT.
EXECUTE.

Shayne Gary wrote:
> Thanks Art! I'll let you know how the analysis turns out. |
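For readers who want to see what the generated file is meant to contain, here is a minimal Python sketch that builds the same kind of COMPUTE lines as text. It departs from the posted syntax in three labeled ways, all assumptions: it uses the variable names v1 to v69 and the MEAN function from the original question (the posted syntax writes SUM over names starting with ITEM), it draws each sample with Python's `random.sample` rather than by ranking random numbers, and the helper name `compute_line` is hypothetical.

```python
import random


def compute_line(scale_no, items, prefix="v"):
    """Build one SPSS COMPUTE command averaging the chosen items."""
    names = ", ".join(f"{prefix}{i}" for i in items)
    return f"COMPUTE SCALE{scale_no:04d} = MEAN({names})."


rng = random.Random(200908221)  # seed borrowed from the posted syntax
lines = [
    # random.sample draws 11 distinct item numbers from 1..69.
    compute_line(k, sorted(rng.sample(range(1, 70), 11)))
    for k in range(1, 1001)
]

# Joining the lines gives the text of a syntax file that could be saved
# and then run from the main job.
syntax_text = "\n".join(lines)
print(lines[0])
```

Each line defines one subscale (SCALE0001 to SCALE1000), so the regressions can then be run against those variables one at a time.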
