SPSSX Discussion

Computing 1,000 random subscales from existing variables

Shayne-5

Computing 1,000 random subscales from existing variables

I have 69 separate variables that are all knowledge test questions (call
them v1 to v69) that are scored as correct (1) or incorrect (0). I would
like to create a subscale (in this case a partial knowledge score) from
these variables by randomly sampling 11 items out of the 69. The subscale is
then computed by taking the mean of the 11 items.

I would then like to use the subscale in a regression (with other variables
in my data as the DV and as additional IVs) and save the regression results,
specifically the change in R-squared when the subscale is entered into the
model in the 2nd block.
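For readers who want to check the quantity being saved, here is a minimal pure-Python sketch of the R-squared change when one subscale is entered after a first block of predictors. It is not SPSS syntax and not how SPSS REGRESSION computes it internally. The function names `ols_r_squared` and `r_squared_change` are hypothetical. The sketch solves the OLS normal equations directly and assumes the predictors are not perfectly collinear.

```python
def ols_r_squared(columns, y):
    """R-squared of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented p x (p + 1) matrix.
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(coef, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r_squared_change(block1, subscale, y):
    """R-squared change when `subscale` is entered after the block-1 predictors."""
    return ols_r_squared(block1 + [subscale], y) - ols_r_squared(block1, y)
```

In the procedure described above, `subscale` would be each respondent's mean of the 11 sampled items. A mean and a sum of the same 11 items differ only by a constant factor, so either gives the same R-squared change.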

The trick is that I want to repeat this procedure (create a new random
subscale, run the regression, save the estimates) 1,000 times. This will
let me build a pdf (empirical distribution) of the improvement in model fit,
so that I can test whether a particular 11-item subscale is significantly
better than other random 11-item combinations.
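The comparison described here amounts to a Monte Carlo test. You draw many random 11-item sets, compute the R-squared change for each, and see where the theory-based subscale falls in that distribution. Below is a minimal Python sketch of the two generic pieces, assuming items are numbered 1 to 69. The function names, the default seed, and the (1 + count) / (1 + N) p-value convention are assumptions of this sketch, not anything specified in the thread.

```python
import random


def random_item_sets(n_items=69, k=11, n_draws=1000, seed=200908221):
    """Draw n_draws random k-item subsets of the items numbered 1..n_items.

    Items are drawn without replacement within each subset; different subsets
    may of course overlap.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(1, n_items + 1), k)) for _ in range(n_draws)]


def empirical_p_value(observed, null_values):
    """One-sided Monte Carlo p-value for an observed R-squared change.

    Counts how many random subscales did at least as well as the observed one and
    applies the common (1 + count) / (1 + N) correction, so the result is never 0.
    """
    at_least_as_large = sum(1 for v in null_values if v >= observed)
    return (1 + at_least_as_large) / (1 + len(null_values))
```

With 1,000 random sets, the smallest p-value attainable under this convention is 1/1001, roughly 0.001.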

I would greatly appreciate suggestions for syntax/macro code to
automate this procedure. Thanks!

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Re: Computing 1,000 random subscales from existing variables

I cobbled together syntax to sample exactly 11 of 69 items 1000 times. See below the signature block.
You would need to set the seed and the filespec for the output syntax file.

HOWEVER, are you sure you want to do this?
If you do this as an exercise, compare the results with those from running RELIABILITY on all 69 items and whittling away 1) all items that lower the internal-consistency reliability, then 2) one or two items at a time until you are down to 11. Then try that scale in your regressions.
That way you will work toward having as reliable a scale as you can.
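Here is a rough sketch of the "whittling away" idea, written in Python rather than SPSS RELIABILITY. It computes Cronbach's alpha and then greedily drops, one at a time, the item whose deletion gives the highest alpha, until a target number of items remains. This collapses the two-phase procedure described above into a single greedy loop. The names `cronbach_alpha` and `whittle_to` are hypothetical, and the target must be at least 2.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns (one score per respondent)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def whittle_to(items, names, target):
    """Greedy backward elimination: repeatedly drop the item whose removal gives the highest alpha."""
    items, names = list(items), list(names)
    while len(items) > target:
        best = max(range(len(items)),
                   key=lambda j: cronbach_alpha(items[:j] + items[j + 1:]))
        del items[best]
        del names[best]
    return names, cronbach_alpha(items)
```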

Are you sure that there is a single latent dimension in the set of items?

Art Kendall
Social Research Consultants

*Create syntax to 1000 times randomly select 11 of 69 items to form a scale.
set seed = 200908221.
input program.
vector SCALE (1000,f10.8).
loop ITEM = 1 to 69.
loop #k = 1 to 1000.
compute SCALE(#k) = rv.uniform(0,2**31).
end loop.
end case.
end loop.
end file.
end input program.
RANK VARIABLES = SCALE1 TO SCALE1000 /RANK INTO RSCALE1 TO RSCALE1000.
FORMATS RSCALE1 TO RSCALE1000 (F2).
MULT RESPONSE GROUPS=$FIRST100 'FIRST 100' (rscale1 TO rscale100 (1,69))
  /FREQUENCIES=$FIRST100.
RECODE RSCALE1 TO RSCALE1000 (1 THRU 11 = 1) (ELSE = 0) INTO KEEPER1 TO KEEPER1000.
FORMATS KEEPER1 TO KEEPER1000 (F1).
FLIP VARIABLES=KEEPER1 TO KEEPER1000.
FORMATS VAR001 TO VAR069 (F1).
STRING SCALESTRING (A4).
STRING INDSTRING (A2).
COMPUTE SCALESTRING = STRING($CASENUM,N4).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS' /'COMPUTE SCALE' SCALESTRING ' = SUM('.
COMPUTE ITEMCOUNT = 1.
DO REPEAT INDEX = 1 TO 69 /FLAG = VAR001 TO VAR069.
DO IF ITEMCOUNT LT 11 AND FLAG EQ 1.
COMPUTE INDSTRING = STRING(INDEX,N2).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS' /' ITEM' INDSTRING ' + '.
COMPUTE ITEMCOUNT = ITEMCOUNT + 1.
ELSE IF ITEMCOUNT EQ 11 AND FLAG EQ 1.
COMPUTE INDSTRING = STRING(INDEX,N2).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS' /' ITEM' INDSTRING ' ). '.
END IF.
END REPEAT.
EXECUTE.
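The syntax above writes one COMPUTE command per random item set into a text file that can then be INSERTed. The same idea can be sketched in Python. `scale_syntax` is a hypothetical helper, and its `prefix` and `zero_pad` parameters are assumptions about how the item variables are named: the original question calls them v1 to v69, whereas the SPSS syntax writes ITEM01 to ITEM69. The sketch uses the SPSS MEAN function to match the question's "mean of the 11 items" rather than SUM.

```python
def scale_syntax(item_sets, prefix="v", zero_pad=0):
    """Return SPSS COMPUTE commands, one per item set, e.g.
    COMPUTE SCALE0001 = MEAN(v3, v17, ...).

    prefix and zero_pad describe the (assumed) item variable names:
    prefix="v", zero_pad=0 gives v5; prefix="ITEM", zero_pad=2 gives ITEM05.
    """
    lines = []
    for n, items in enumerate(item_sets, start=1):
        names = ", ".join(prefix + str(i).zfill(zero_pad) for i in items)
        lines.append(f"COMPUTE SCALE{n:04d} = MEAN({names}).")
    return "\n".join(lines)
```

For example, `scale_syntax([sorted(random.sample(range(1, 70), 11)) for _ in range(1000)])` (after `import random`) produces 1,000 COMPUTE commands that could be written to a file such as SCORE.SPS; the file name is illustrative. Note that MEAN averages whichever items are non-missing, whereas SUM(ITEM05 + ...) is missing if any single item is missing.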

Maguin, Eugene

Re: Computing 1,000 random subscales from existing variables

Art,

I read your reply with interest because the problem seemed especially
challenging. I think I understand your solution with two exceptions. One
problem that comes up is that of sampling items without replacement. My
first question is how you solved that problem.

My second question is about the purpose of the MULT RESPONSE command. What
problem does that solve?

Thanks, Gene Maguin

Art Kendall

Re: Computing 1,000 random subscales from existing variables

The sampling without replacement was done by creating a 69-case (in this
instance, the items) by 1000-variable (in this instance, the proposed samples)
matrix of uniform random numbers.
The 1000 variables were then RANKed, and the matrix of ranks was RECODEd so
that only the 11 items ranked 1 through 11 in each column are flagged to keep.
This is the same thing as doing 1000 sorts and taking the items with the 11
lowest random numbers. It provides a sample of fixed size without replacement.
The MULT RESPONSE was a leftover procedure from checking that the syntax
did what it was supposed to.
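The rank trick can be sketched in Python as follows. This is a minimal illustration, not Art's SPSS code. Each of the 1,000 samples gives the 69 items uniform random numbers, and the items with the 11 smallest numbers (ranks 1 to 11) are flagged, which yields a random sample of fixed size without replacement. The function name and defaults are assumptions. Python's generator will not reproduce SPSS's random stream even with the same seed.

```python
import random


def keep_flags(n_items=69, k=11, n_samples=1000, seed=200908221):
    """Return n_samples lists of 0/1 flags, one flag per item, with exactly k ones each."""
    rng = random.Random(seed)
    flags = []
    for _ in range(n_samples):
        u = [rng.random() for _ in range(n_items)]            # one uniform per item
        order = sorted(range(n_items), key=lambda i: u[i])    # item indices by rank
        keep = set(order[:k])                                 # ranks 1..k
        flags.append([1 if i in keep else 0 for i in range(n_items)])
    return flags
```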

Art

Jill Stoltzfus

Sample size for Poisson regression

Hello everyone. I'm looking for information on calculating sample size for Poisson regression and would appreciate your comments. Specifically, what's the best formula to use?

Thanks in advance for your help.

Jill

Marta Garcia-Granero

Re: Sample size for Poisson regression

Hi Jill:

Perhaps you should try to Google a bit. For instance, I tried the
following search "sample size Poisson regression", and the first answer
looked very promising (provided you have access to Biometrika articles
on-line, which I don't have right now, since I'm at home, not at the
University).

Here's the link to the article, titled "Sample size calculations for
logistic and Poisson regression models":

http://biomet.oxfordjournals.org/cgi/content/abstract/88/4/1193

HTH,
Marta GG

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/

Jill Stoltzfus

Re: Sample size for Poisson regression

Thanks, Marta. I saw that Biometrika reference myself while I was searching online and will check it out.

Jill

--- On Mon, 8/24/09, Marta García-Granero <[hidden email]> wrote:

Kim Jinnett

Re: All possible combinations of 8 dichotomous variables

Hi all,

I need to find all possible combinations of 8 dichotomous variables, choose the top 10 most frequently occurring, and also determine the extent to which any of the combinations (not just the top 10) are associated with a continuous dependent variable.

Is there an easy way to compute these variables in SPSS? I do not have SPSS Classification Trees.

Thanks,

Kim

Maguin, Eugene

Re: All possible combinations of 8 dichotomous variables

Kim,

I don't know that there is a 'procedural' way to do this in spss. Here,
though, is one way. Let your 8 dichotomous variables be x1 to x8, all coded
0,1 and with format F1.0. How they are coded doesn't matter, but a consistent
coding for all variables will help in interpreting the patterns that result.
Also, I assume missing is coded as 9 and there are no sysmis.

String pattern(a8).
Compute pattern=concat(string(x1,f1.0),string(x2,f1.0),string(x3,f1.0),
string(x4,f1.0),string(x5,f1.0),string(x6,f1.0),string(x7,f1.0),
string(x8,f1.0)).

Frequencies pattern.

Note. The maximum number of patterns for 8 dichotomous variables is
2**8=256, and for 8 trichotomous variables (0,1,9) it is 3**8=6561. If a
number of your variables have 9=missing, it would be better to do an
aggregate command followed by a list command to get the frequencies,
because spss 16+ with java will probably choke on the frequencies (and if
you try printing that frequency table you should consider it an overnight
job).

That takes care of the all possible combinations part.
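Here is a Python sketch of the same pattern-string idea, assuming each case is a tuple of integer codes 0, 1 or 9. It also confirms the maximum pattern counts quoted in the note by enumerating them with `itertools.product`. The names are hypothetical.

```python
from collections import Counter
from itertools import product

# Upper bounds on the number of distinct patterns for 8 variables.
MAX_DICHOTOMOUS = len(list(product((0, 1), repeat=8)))       # 2**8 = 256
MAX_WITH_MISSING = len(list(product((0, 1, 9), repeat=8)))   # 3**8 = 6561


def pattern_counts(rows):
    """rows: one tuple of integer codes (x1..x8) per case.
    Returns a Counter mapping pattern strings such as '01000001' to case counts."""
    return Counter("".join(str(v) for v in row) for row in rows)
```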

The top ten combinations will fall out from either the frequencies listing
or from the aggregate+list commands. However, another way is to use the Rank
command and just print the first ten rank values (but pay attention to how
you treat ties because you should have quite a few, by definition.)
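One way to take the "top ten" without silently dropping ties is sketched below in Python, using a hypothetical helper `top_patterns`. It keeps every pattern whose count is at least the count of the tenth most common pattern, so the result can have more than ten entries when there are ties at the cut-off.

```python
from collections import Counter


def top_patterns(counts, n=10):
    """Patterns at least as frequent as the n-th most common one (ties at the cut-off kept)."""
    ranked = counts.most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    return [(p, c) for p, c in ranked if c >= cutoff]
```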

Evaluating associations with a continuous variable is simply an Anova-type
operation, but you have so many possible combinations that you MAY exceed the
capacity of the anova-type commands (GLM, Unianova, Means, etc). That said,
combinations with one case are pointless to keep, because you need at least
two cases to compute the within-group sum of squares. So all combinations
with n=1 can be discarded immediately.
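Here is a minimal Python sketch of the one-way ANOVA F statistic across patterns. As suggested above, it drops groups with fewer than two cases first. `oneway_anova` is a hypothetical helper, and it returns only F and its degrees of freedom, not a p-value.

```python
from collections import defaultdict


def oneway_anova(patterns, y, min_n=2):
    """One-way ANOVA of continuous y across pattern groups.
    Groups with fewer than min_n cases are dropped first.
    Returns (F, df_between, df_within)."""
    groups = defaultdict(list)
    for p, v in zip(patterns, y):
        groups[p].append(v)
    groups = {p: vs for p, vs in groups.items() if len(vs) >= min_n}
    values = [v for vs in groups.values() for v in vs]
    grand = sum(values) / len(values)
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in groups.values())
    ss_within = sum((v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs)
    df_b = len(groups) - 1
    df_w = len(values) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```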

Ok, this will get you started and other people will probably comment.

Gene Maguin

Albert-Jan Roskam

Re: All possible combinations of 8 dichotomous variables

Hi,

Below is the 'numerical' version of Gene's approach. An eight-digit variable called 'combination' is computed, which has to be interpreted by looking at the digit level. Digit #1 refers to variable x8, #2 to var x7, and so forth.

* sample data.
set rng = mt mtindex = 12345.
input program.
+ loop #case = 1 to 1000.
+ compute x1 = rnd(rv.uniform(0,1)).
+ compute x2 = rnd(rv.uniform(0,1)).
+ compute x3 = rnd(rv.uniform(0,1)).
+ compute x4 = rnd(rv.uniform(0,1)).
+ compute x5 = rnd(rv.uniform(0,1)).
+ compute x6 = rnd(rv.uniform(0,1)).
+ compute x7 = rnd(rv.uniform(0,1)).
+ compute x8 = rnd(rv.uniform(0,1)).
+ end case.
+ end loop.
+ end file.
end input program.
exe.
formats all (f1).

* actual code.
compute combination = x1 + x2 * 10**1 + x3 * 10**2 + x4 * 10**3 + x5 * 10**4 + x6 * 10**5 + x7 * 10**6 + x8 * 10**7.
formats combination (n8).
fre combination / format = dfreq. /* combinations sorted from common to rare.
aggr out = * / break = combination / n = n.
show n. /* shows the number of actually occurring combinations.
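The digit encoding can be checked in Python. This sketch uses hypothetical helpers `encode`, `decode` and `as_n8` and mirrors the COMPUTE above. x1 is the units digit and x8 is the 10**7 digit, so in the zero-padded N8 display the leftmost digit is x8.

```python
def encode(xs):
    """[x1, ..., x8] (each 0 or 1) -> x1 + x2*10**1 + ... + x8*10**7."""
    return sum(x * 10 ** i for i, x in enumerate(xs))


def decode(code, n=8):
    """Inverse of encode: returns [x1, ..., xn]."""
    return [(code // 10 ** i) % 10 for i in range(n)]


def as_n8(code):
    """Zero-padded 8-digit display, like SPSS format N8; the leftmost digit is x8."""
    return f"{code:08d}"
```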

Cheers!!
Albert-Jan

--- On Mon, 8/24/09, Gene Maguin <[hidden email]> wrote:

> From: Gene Maguin <[hidden email]>
> Subject: Re: [SPSSX-L] All possible combinations of 8 dichotomous variables
> To: [hidden email]
> Date: Monday, August 24, 2009, 10:05 PM
> Kim,
>
> I don't know that there is a 'procedural' way to do this in
> spss. Here,
> though, is one way. Let your 8 dichotomous variables be x1
> to x8, all coded
> 0,1 and with format F1.0. How they are coded doesn't matter
> but a consistent
> coding for all variables will help interpreting the
> patterns that result.
> Also, I assume missing is coded as 9 and there are no
> sysmis.
>
> String pattern(a8).
> Compute
> pattern=concat(string(x1,f1.0),string(x2,f1.0),string(x3,f1.0),
> string(x4,f1.0),string(x5,f1.0),string(x6,f1.0),string(x7,f1.0),
> string(x8,f1.0)).
>
> Frequencies pattern.
>
> Note. The maximum number of patterns for 8 dichotomous
> variables is 2**8=256
> and for 8
> trichotomous variables (0,1,9) is 3**8=6561. If a number of
> your variables
> have 9=missing, it would be better to do an aggregate
> command followed by a
> list command to get the frequencies because spss 16+ with
> java will probably
> choke on the frequencies (and if you try printing that
> frequency table you
> should consider it an overnight job).
>
> That takes care of the all possible combinations part.
>
> The top ten combinations will fall out from either the
> frequencies listing
> or from the aggregate+list commands. However, another way
> is to use the Rank
> command and just print the first ten rank values (but pay
> attention to how
> you treat ties because you should have quite a few, by
> definition.)
>
> Evaluating associations with a continuous variables is
> simply an Anova type
> operation but you have so many possible combinations that
> you MAY exceed the
> capacity of the anova type commands (GLM, Unianova, Means,
> etc). That said,
> combinations with one case are pointless to keep as you
> need two cases to
> compute the within group sum of squares. So, all
> combinations with n=1 can
> be discared immediately.
>
> Ok, this will get you started and other people will
> probably comment.
>
>
> Gene Maguin
>
> >>I need to find all possible combinations of 8
> dichotomous variables, chose
> the top 10 most frequently occurring, and also determine
> the extent to which
> any of the combinations (not just the top 10) are
> associated with a
> continuous dependent variable.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email]
> (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the
> command
> INFO REFCARD
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall-2

Re: Computing 1,000 random subscales from existing variables

I have inserted comments to help clarify the process.
You can adapt it to your situation by changing the strings that were marked in red or blue in the original message (the output filespec and the item-name string ITEM, as the comments note).
The colors might disappear when you copy the syntax into SPSS/PASW.

See my earlier response to Gene Maguin for the technique of sorting random numbers to get a sample without replacement.

Art Kendall
Social Research Consultants

*Create syntax to 1000 times randomly select 11 of 69 items to form a scale.
*set the seed for the random number generator.
set seed = 200908221.
input program.
vector SCALE (1000,f10.8).
loop ITEM = 1 to 69.
loop #k = 1 to 1000.
compute SCALE(#k) = rv.uniform(0,2**31).
end loop.
end case.
end loop.
end file.
end input program.
*there is now a 1000 variable by 69 case file of random numbers.
*create 1000 new variables by RANKING the random numbers.
RANK VARIABLES = SCALE1 TO SCALE1000 /RANK INTO RSCALE1 TO RSCALE1000.
FORMATS RSCALE1 TO RSCALE1000 (F2).
*double check the ranking.
MULT RESPONSE GROUPS=$FIRST100 'FIRST 100' (rscale1 TO rscale100 (1,69))
  /FREQUENCIES=$FIRST100.
*create 1000 new variables that are flags for whether to keep the item in a scale.
RECODE RSCALE1 TO RSCALE1000 (1 THRU 11 = 1) (ELSE = 0) INTO KEEPER1 TO KEEPER1000.
FORMATS KEEPER1 TO KEEPER1000 (F1).
*Flip the matrix of flags that items go into scales.
FLIP VARIABLES=KEEPER1 TO KEEPER1000.
FORMATS VAR001 TO VAR069 (F1).
*there are now 1000 cases with flags indicating whether the item is on that scale.
*begin the portion that writes a syntax file that you can INSERT into your existing syntax.
*change the filespec to match your situation.
*change the string ITEM to match your situation.
*if your items are named without leading zeros, for example v1 to v69, also replace STRING(INDEX,N2) with LTRIM(STRING(INDEX,F2)).
STRING SCALESTRING (A4).
STRING INDSTRING (A2).
COMPUTE SCALESTRING = STRING($CASENUM,N4).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS' /'COMPUTE SCALE' SCALESTRING ' = SUM('.
COMPUTE ITEMCOUNT = 1.
DO REPEAT INDEX = 1 TO 69 /FLAG = VAR001 TO VAR069.
DO IF ITEMCOUNT LT 11 AND FLAG EQ 1.
COMPUTE INDSTRING = STRING(INDEX,N2).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS' /' ITEM' INDSTRING ' + '.
COMPUTE ITEMCOUNT = ITEMCOUNT + 1.
ELSE IF ITEMCOUNT EQ 11 AND FLAG EQ 1.
COMPUTE INDSTRING = STRING(INDEX,N2).
WRITE OUTFILE= 'D:\PROJECT\SCORE.SPS' /' ITEM' INDSTRING ' ). '.
END IF.
END REPEAT.
EXECUTE.

Shayne Gary wrote:

Thanks Art! I'll let you know how the analysis turns out.

Is there any chance you could document (in very brief notes) the syntax code you sent me to help me follow the logic? That would be great. I want to make sure I understand every detail of what is happening and looking up each command in isolation in the reference manual is proving VERY slow and incomplete since the explanations of the commands do not lend insight about the logic of your set of commands put together.

Also, when I open the syntax file (SCORE.SPS) that is created at the end of the code you sent, the COMPUTE command tries to sum variables named ITEM05, ITEM20, ... ITEMn. However, I get an error message saying these variable names are not defined.

Thanks for your help!

Shayne
On Tue, Aug 25, 2009 at 10:36 PM, Art Kendall <[hidden email]> wrote:
> I do not believe there are latent dimensions in the set of items. Knowledge about one aspect of the domain is not necessarily correlated with knowledge about other aspects. In other words, these are not multiple items geared towards imperfectly measuring a smaller set of latent constructs. Let me know if this makes sense to you.

I think you are saying that all of the items are intended to measure a single construct.
How do they perform as a single scale in RELIABILITY?
Do they load on a single factor in a PAF?
Assuming that you have thousands of cases:

What is the improvement in R**2 if you predict the DV from all 69 items in one equation? (Step 1: METHOD=ENTER all the other variables; step 2: METHOD=ENTER the 69 items.)
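For reference, the change in R-squared at step 2 is conventionally tested with the F statistic below. This is the standard hierarchical-regression test, and it is what SPSS REGRESSION reports as "F Change" among its change statistics. Here m is the number of predictors added at step 2 (69 in this example), k is the total number of predictors in the step-2 model, and n is the number of cases. The statistic has (m, n - k - 1) degrees of freedom.

```latex
F_{\text{change}} = \frac{\Delta R^2 / m}{\left(1 - R^2_{\text{step 2}}\right) / (n - k - 1)},
\qquad
\Delta R^2 = R^2_{\text{step 2}} - R^2_{\text{step 1}}
```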

There are many reasons to avoid stepwise regression. If you take the results with a big helping of salt and hold your nose, you might try
step 1, METHOD=ENTER all the other variables; step 2, METHOD=STEPWISE the 69 items.
step 1, METHOD=ENTER all the other variables; step 2, METHOD=ENTER the 11 items; step 3, METHOD=STEPWISE.
Keep in mind that the goal of scale creation is to create a summative score that most reliably and validly measures a single construct.
In model development, by contrast, internally consistent items used as separate predictor variables would produce a lot of collinearity.

It would be interesting to see what you find out.

Art Kendall
Social Research Consultants
Shayne Gary wrote:
Thanks Art! I will try running this syntax and may get back to you if I get lost. I am new to spss syntax.

I was trying to cobble together code that was on a different path. First, I flipped the 69 vars and then used the sampling command normally used for sampling cases, flipped the sampled results back, and then computed the scales. I was having some trouble with this syntax, so I am delighted you sent your syntax.

In terms of whether I really want to do this... there are a couple of issues at work. The first is pragmatic. A reviewer has asked me to do this analysis on a paper I have under review at a journal, so I figured it is easier to do the analysis and then make an argument against incorporating the results into the paper if I do not think it makes sense.

The second issue is that I think it might actually make sense. I have identified 11 of the 69 items in my knowledge test that SHOULD be a significant predictor of performance. These 11 items are key principles in this knowledge domain (well established in prior research). When I compute the subscale using these 11 items, I find they are indeed a significant predictor of performance. However, the reviewer has asked, quite rightly I think, whether these 11 items are really 'special' or different from any other randomly chosen 11 items from the knowledge test. S/he suggested randomly sampling from the 69 items 1000 times and then running the regression with each computed subscale to create a pdf of improvement of model fit. This will enable me to test whether the 11-item scale identified in prior research as the key principles is significantly better than other random 11-item combinations.

I do not believe there are latent dimensions in the set of items. Knowledge about one aspect of the domain is not necessarily correlated with knowledge about other aspects. In other words, these are not multiple items geared towards imperfectly measuring a smaller set of latent constructs. Let me know if this makes sense to you.

Thanks again!

Best,
Shayne
On Sun, Aug 23, 2009 at 1:05 AM, Art Kendall <[hidden email]> wrote: