SPSSX Discussion

randomly selecting variables

nina

randomly selecting variables

Hello,

I have an unusually large number of items (k=60) which other researchers
have used to build a scale. I think it could make sense to drastically
reduce the number of items. Because the intercorrelations of the items are
highly similar, I would like to draw random samples of items (not cases) to
check the internal consistency of a scale with e.g. 30, 15 or 5 items.
The question is: which syntax could be used to tell SPSS to draw random samples
of a selected number of variables, not cases?

Many thanks for your replies
N.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Re: randomly selecting variables

Are you sure that there is only one factor (concept, construct, idea)
underlying the 60 items?!?

How many cases do you have?
What is the response scale, i.e., what legitimate values can the item
responses have?

Did you try principal axis factoring? What did you use as a stopping rule
to determine the number of factors to retain?

Did you run a parallel analysis to compare your eigenvalues with those
obtained from random data?

If there is a single construct, it is amazing that the matrix is not
singular.

What is the substantive nature of the scale?

-----
Art Kendall
Social Research Consultants

Baker, Harley

Re: randomly selecting variables

In reply to this post by nina

Hello,

You could do this, I suppose. But that is not the approach psychometricians would recommend. A better strategy would be to run a principal components analysis and focus solely on the first principal component. Rank-order the items by their loadings on that component (and by other important criteria, e.g., floor/ceiling effects) and "create" scales consisting of the highest-loading 30 items for a 30-item scale, the highest-loading 15 items for a 15-item scale, etc. Rerun the same analysis on just those items and then run reliability on the scales so composed. This iterative procedure, though a bit clunky, is likely a better approach psychometrically and statistically than what you propose.
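
As a rough illustration of the ranking step described above, the following plain-Python sketch computes loadings on the first principal component by power iteration and keeps the k items with the largest absolute loadings. The function names (correlation_matrix, first_component_loadings, top_items) and the simulated data are hypothetical and not part of SPSS; in practice the loadings would more likely be read from the output of the factor procedure.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlations between equal-length numeric columns (one per item).

    Assumes no column is constant (a constant item has zero variance).
    """
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        standardized.append([(x - mean) / sd for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in standardized]
            for zi in standardized]


def first_component_loadings(corr, iterations=200):
    """Loadings on the first principal component, found by power iteration.

    Assumes the items are mostly positively correlated, so the equal-weight
    start vector is not orthogonal to the first component.
    """
    p = len(corr)
    vec = [1.0 / math.sqrt(p)] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(row[j] * vec[j] for j in range(p)) for row in corr]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vec = [x / eigenvalue for x in product]
    # A loading is the eigenvector element scaled by sqrt(eigenvalue).
    return [v * math.sqrt(eigenvalue) for v in vec]


def top_items(names, columns, k):
    """Return the k (name, loading) pairs with the largest absolute loadings."""
    loadings = first_component_loadings(correlation_matrix(columns))
    ranked = sorted(zip(names, loadings), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Simulated data (an assumption): 6 items share a trait, 2 are pure noise.
    rng = random.Random(1)
    trait = [rng.gauss(0, 1) for _ in range(300)]
    columns = [[t + rng.gauss(0, 1) for t in trait] for _ in range(6)]
    columns += [[rng.gauss(0, 1) for _ in range(300)] for _ in range(2)]
    names = ["item%d" % (i + 1) for i in range(8)]
    for name, loading in top_items(names, columns, 4):
        print(name, round(loading, 2))
```

The sketch covers only the ranking; rerunning the analysis and the reliability check on the retained items would still be separate steps.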

There are better ways of doing this using IRT, but SPSS doesn't really do IRT modeling (yet).

Harley

Jon Peck

Re: randomly selecting variables

The STATS IRM extension command fits a three-parameter item response model, and the STATS EXRASCH extension command fits a variety of Rasch models.

Picking a random set of variables is easily done with a little Python code, as shown here.

begin program.
import random
import spss, spssaux

vardict = spssaux.VariableDict()
# specify variables using TO or a specific list here
somevars = vardict.expand("ID to salary")
# specify the number of variables to select here (3 in this example)
ranvars = random.sample(somevars, 3)
print(ranvars)
# create a macro listing these variables, separated by blanks,
# so that !ranvars can be used wherever a variable list is expected
spss.SetMacroValue("!ranvars", " ".join(ranvars))
end program.
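
To connect the random draw back to the original question, here is a minimal stand-alone sketch that draws repeated random subsets of k items and computes Cronbach's alpha for each. It runs outside SPSS on simulated data; simulate_items, cronbach_alpha and alpha_for_random_subsets are hypothetical helper names, and the one-factor data model is an assumption made only so the example has something to work on.

```python
import random
from statistics import mean, variance


def simulate_items(n_cases, n_items, seed=0):
    """Simulated responses (an assumption): each item = common trait + noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_cases):
        trait = rng.gauss(0, 1)
        rows.append({"item%02d" % j: trait + rng.gauss(0, 1)
                     for j in range(1, n_items + 1)})
    return rows


def cronbach_alpha(rows, items):
    """Cronbach's alpha for the named items; rows is a list of dicts (one per case)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[name] for row in rows]) for name in items]
    totals = [sum(row[name] for name in items) for row in rows]
    return k / (k - 1) * (1 - sum(item_variances) / variance(totals))


def alpha_for_random_subsets(rows, items, k, draws=100, seed=None):
    """Alpha for `draws` random subsets of k items, sampled without replacement."""
    rng = random.Random(seed)
    return [cronbach_alpha(rows, rng.sample(items, k)) for _ in range(draws)]


if __name__ == "__main__":
    rows = simulate_items(n_cases=200, n_items=60, seed=42)
    items = sorted(rows[0])
    for k in (30, 15, 5):
        alphas = alpha_for_random_subsets(rows, items, k, draws=50, seed=k)
        print(k, round(min(alphas), 3), round(mean(alphas), 3), round(max(alphas), 3))
```

Looking at the spread of alpha across draws, and not only at a single subset, gives some sense of how much the result depends on which items happened to be selected.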

Jon K Peck
[hidden email]


Rich Ulrich

Re: randomly selecting variables

In reply to this post by nina

When I read about a large number of "items which other researchers
have used to build a scale", I'm reminded of a model that others are not
taking into account. That is, you have collected items from elsewhere, and
there is too much redundancy between certain pairs (or sets) of items, owing
mainly to slight variations in wording. That's not a problem to be fixed by
looking at the first principal component.

If I'm wrong about that, then the following comments are offered as generic
advice, for whomever it might help.

Factor analysis might help (though you have not mentioned the N) if you
want to identify a "duplex" or "triplex". For example, rotated factor #4 or #5
with 2 or 3 variables each loading above 0.90, and little else loading, shows
the redundancy.

Before getting to that: you also have not mentioned how the items are
scaled/scored. Likert-type items have reliability less than 0.70, and scaled
dichotomies usually less than 0.45. Look at your basic Pearson correlations.
Pairs of items that correlate near or above those levels are likely to share
"error variance" because they are re-wordings of the same concept, so you want
to drop one of each pair.
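
A minimal sketch of that screening step, in plain Python: compute the Pearson correlation for every pair of items and list the pairs at or above a cutoff. The names pearson and redundant_pairs, the toy data, and the default cutoff of 0.70 (taken from the Likert-type figure above) are illustrative assumptions; the cutoff would need adjusting for dichotomous items.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(scores, threshold=0.70):
    """Return (item_a, item_b, r) for pairs with r >= threshold, highest r first.

    scores maps item name -> list of responses, in the same case order
    for every item.
    """
    flagged = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if r >= threshold:
            flagged.append((a, b, r))
    return sorted(flagged, key=lambda triple: triple[2], reverse=True)


if __name__ == "__main__":
    # Toy data (an assumption): q2 is nearly a re-wording of q1, q3 is unrelated.
    scores = {
        "q1": [1, 2, 3, 4, 5, 4, 3, 2],
        "q2": [1, 2, 3, 4, 5, 4, 3, 3],
        "q3": [2, 5, 1, 4, 2, 5, 1, 4],
    }
    for a, b, r in redundant_pairs(scores):
        print(a, b, round(r, 3))
```

The flagged pairs are only candidates; deciding which member of a pair to drop is still a judgment about wording.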

One of my earliest projects featured data of this sort. The PI was trying to do
everything precisely right, not realizing that experienced people were far more
casual. Anyway, he had recruited several "experts" to review his potential items,
to criticize the wording or whatever, and to rate how well they would serve. You
don't want items that can be misinterpreted; you do want items that are generally
consistent in the way they are presented. Your first pass might start with variables
sorted by their highest intercorrelation, and ask your best experts to criticize or to
choose.

With 60 items and an N of several hundred, I would expect a factor analysis (other
than 1's on the diagonal) to produce at least a half-dozen factors with eigenvalues
above 1.0, because there is so much variance to distribute. A first, informal check on
whether this universe of items does represent a single "latent concept" is whether
the first unrotated factor is several times as large as the second factor. (Someone
probably has more exact advice on that; it has never been my concern.)
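
The informal first-to-second comparison can be sketched in plain Python as the ratio of the two largest eigenvalues of the item correlation matrix, found by power iteration with deflation. Two simplifications should be noted: the sketch uses the ordinary correlation matrix with 1's on the diagonal (so these are principal-component eigenvalues, not common-factor ones), and the names leading_eigenpair and first_to_second_ratio are hypothetical.

```python
import math


def leading_eigenpair(matrix, iterations=300):
    """Largest eigenvalue and its unit eigenvector of a symmetric PSD matrix.

    Uses power iteration; convergence is slow when the two largest
    eigenvalues are close, so more iterations may be needed in practice.
    """
    p = len(matrix)
    # Slightly uneven start so it is unlikely to be orthogonal to the target.
    vec = [1.0 + 0.01 * i for i in range(p)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    value = 0.0
    for _ in range(iterations):
        product = [sum(row[j] * vec[j] for j in range(p)) for row in matrix]
        value = math.sqrt(sum(x * x for x in product))
        if value == 0.0:
            break
        vec = [x / value for x in product]
    return value, vec


def first_to_second_ratio(corr):
    """Ratio of the largest to the second-largest eigenvalue of corr."""
    p = len(corr)
    first, vec = leading_eigenpair(corr)
    # Deflation: remove the first component, then find the next largest.
    deflated = [[corr[i][j] - first * vec[i] * vec[j] for j in range(p)]
                for i in range(p)]
    second, _ = leading_eigenpair(deflated)
    return math.inf if second == 0.0 else first / second


if __name__ == "__main__":
    # Toy correlation matrix (an assumption): four items, all correlating 0.6.
    corr = [[1.0 if i == j else 0.6 for j in range(4)] for i in range(4)]
    print(round(first_to_second_ratio(corr), 2))
```

How large a ratio counts as "several times" is left open here, as it is in the text above.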

Hope this helps.

Rich Ulrich
