Random selecting variables for a score

Random selecting variables for a score

Christopher Lowenkamp-2
I have a dataset that has 15 individual variables that are summed together to create an overall risk score.
 
I would like to create a series of risk scores that include a random selection of those 15 variables. For instance, a risk score that is made up of 10 of the 15 variables, where the variables are chosen at random for each case.
 
How can I go about doing this?
 
Thanks
Chris
 
   

Re: Random selecting variables for a score

Albert-Jan Roskam
Hi!

Something like this? This sums up 3 randomly chosen vars from a list of 5 vars. The example is nonsensical, but it just demonstrates one way to do it.

GET FILE = 'c:/program files/spss/employee data.sav'.
SET MPRINT = ON.
BEGIN PROGRAM.
import random, spss
def selvars (vars, size, outvar):
        # The sample is drawn once and a single COMPUTE is submitted,
        # so the same variables are summed for every case.
        if size <= len (vars):
                samp = ", ".join(random.sample(vars, size))
                spss.Submit("compute %s = sum(%s)." % (outvar, samp))
        else:
                print("--> Error: Sample > Population!")
selvars (vars = ['id', 'jobcat', 'jobtime', 'educ', 'minority'], size = 3, outvar = 'score')
END PROGRAM.
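
To see what the program above hands to SPSS, the command string can be built without the spss module. This is only a sketch: build_compute is a hypothetical helper (not part of the SPSS Python plug-in), and the variable names are those from the example. Because random.sample runs once per call, one COMPUTE command comes out, and it would apply the same three variables to every case.

```python
import random

def build_compute(varnames, size, outvar, rng=random):
    """Return the COMPUTE command text (hypothetical helper, for illustration)."""
    if size > len(varnames):
        raise ValueError("sample size exceeds number of variables")
    # One draw without replacement; the result is fixed for the whole command.
    chosen = rng.sample(varnames, size)
    return "compute %s = sum(%s)." % (outvar, ", ".join(chosen))

if __name__ == "__main__":
    names = ['id', 'jobcat', 'jobtime', 'educ', 'minority']
    print(build_compute(names, 3, 'score'))
```

Running it several times prints a different trio of variables each time, but always inside a single command.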



--- On Thu, 4/16/09, Christopher Lowenkamp <[hidden email]> wrote:

> From: Christopher Lowenkamp <[hidden email]>
> Subject: Random selecting variables for a score
> To: [hidden email]
> Date: Thursday, April 16, 2009, 4:41 AM
> I have a dataset that has 15 individual variables that are summed
> together to create an overall risk score.
>
> I would like to create a series of risk scores that include a random
> selection of those 15 variables. For instance, a risk score that is
> made up of 10 of the 15 variables, where the variables are chosen at
> random for each case.
>
> How can I go about doing this?
>
> Thanks
> Chris

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Random selecting variables for a score

Richard Ristow
In reply to this post by Christopher Lowenkamp-2
First, a question for anyone: How do the output and code in my postings look? Now that the list takes HTML, I've been putting code and output in a small fixed-pitch font, instead of unformatted. Is it readable?

At 10:41 PM 4/15/2009, Christopher Lowenkamp wrote:

I have a dataset that has 15 variables that are summed together to create an overall risk score. I would like to create a series of risk scores that include a random selection of those 15 variables.  For instance a risk score that is made up of 10 of the 15 variables where the variables are chosen at random for each case.

The following code creates a risk score as the sum of 3 of the 5 input variables, with the 3 selected at random, independently for each case. (Albert-Jan, I'm not sure I'm reading your Python code right. What SPSS does it expand into? Does it select a different set of components for each case?)

Test data:
|-----------------------------|---------------------------|
|Output Created               |16-APR-2009 22:58:08       |
|-----------------------------|---------------------------|
CaseID  IndV1  IndV2  IndV3  IndV4  IndV5

  001   -1.14   2.90    .22    .05   4.22
  002    1.57    .36   2.18    .44   2.80
  003    5.33  -1.01    .16   2.08   2.80
  004    2.80   5.08  -1.37   1.05   2.60
  005    1.83   3.29   3.60    .61   2.54
  006    3.00   1.22   -.25   3.91   1.91
  007    2.12   1.44   1.51   1.07   4.12
  008    5.87    .56   1.69    .00   1.67

Number of cases read:  8    Number of cases listed:  8

Code and output:
 
*  .....   The list of variables that may contribute to the    ..... .
*  .....   risk score.                                         ..... .
VECTOR IndV=IndV1 TO IndV5.

*  .....   Flag variables, indicating whether the corresponding..... .
*  .....   input variable contributes to the risk score.       ..... .
*  .....   Omit, if this is not needed.                        ..... .
VECTOR Ctr(5,F2).

NUMERIC RiskScore  (F6.2).
COMPUTE RiskScore = 0.


*  .....   Select 3 of the 5 independent variables, using     ..... .
*  .....   the 'k/n' sampling algorithm                       ..... .

COMPUTE #K = 3  /* Sample size     */.
COMPUTE #N = 5  /* Population size */.

LOOP #Idx = 1 TO 5.

*  ... "UseIt" indicates whether or not this variable        ...... .
*  ... contributes to the risk score.                        ...... .

*  ... Randomly select #UseIt:                               ...... .

.      COMPUTE #UseIt = RV.BERNOULLI(#K/#N).
.      COMPUTE #K     = #K - #UseIt.
.      COMPUTE #N     = #N -1.

*  ... If desired, set a permanent flag indicating whether  ...... .
*  ... or not this variable contributes to the score.       ...... .

.      COMPUTE Ctr(#Idx) = #UseIt.

*  ...  Update the risk score, adding the current           ....... .
*  ...  variable's value if it is to contribute to the      ....... .
*  ...  risk score.                                         ....... .

*  ...  SUM is used instead of simple "+", so missing       ....... .
*  ...  input values don't cause the result to be missing.  ....... .

.       IF   #UseIt
        RiskScore  = SUM(RiskScore,IndV(#Idx)).

END LOOP.

TEMPORARY.
STRING SPACE(A8).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |16-APR-2009 22:58:10       |
|-----------------------------|---------------------------|
CaseID  IndV1  IndV2  IndV3  IndV4  IndV5  Ctr1  Ctr2  Ctr3  Ctr4  Ctr5  RiskScore

  001   -1.14   2.90    .22    .05   4.22     0     1     0     1     1       7.18
  002    1.57    .36   2.18    .44   2.80     0     0     1     1     1       5.42
  003    5.33  -1.01    .16   2.08   2.80     1     0     0     1     1      10.21
  004    2.80   5.08  -1.37   1.05   2.60     1     0     0     1     1       6.45
  005    1.83   3.29   3.60    .61   2.54     1     0     1     1     0       6.04
  006    3.00   1.22   -.25   3.91   1.91     0     1     1     0     1       2.88
  007    2.12   1.44   1.51   1.07   4.12     1     1     1     0     0       5.07
  008    5.87    .56   1.69    .00   1.67     1     0     1     1     0       7.56

Number of cases read:  8    Number of cases listed:  8
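
For readers more at home in Python, the per-case 'k/n' selection in the loop above can be sketched as follows. This is an illustration, not a translation of the SPSS run: the function names are hypothetical, Python 3 division is assumed, and None stands in for a missing value so that it is skipped the way SUM skips missing inputs. Each variable is taken with probability (still needed)/(still available), which always yields exactly k selected variables.

```python
import random

def select_k_of_n(k, n, rng=random):
    """Return n 0/1 flags with exactly k ones (sequential 'k/n' selection)."""
    flags = []
    remaining_k, remaining_n = k, n
    for _ in range(n):
        # Probability = items still needed / items still available.
        use_it = 1 if rng.random() < remaining_k / remaining_n else 0
        flags.append(use_it)
        remaining_k -= use_it
        remaining_n -= 1
    return flags

def risk_score(values, k, rng=random):
    """Sum k randomly chosen values for one case, skipping None (missing)."""
    flags = select_k_of_n(k, len(values), rng)
    score = sum(v for v, f in zip(values, flags) if f and v is not None)
    return flags, score

if __name__ == "__main__":
    case_001 = [-1.14, 2.90, 0.22, 0.05, 4.22]  # first row of the test data
    print(risk_score(case_001, 3))
```

Calling risk_score once per case gives each case its own independently drawn set of contributing variables.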
=============================
APPENDIX: Test data, and code
=============================
*  C:\Documents and Settings\Richard\My Documents                      .
*    \Technical\spssx-l\Z-2009b                                        .
*    \2009-04-15 Lowenkamp - Random selecting variables for a score.SPS.

*  In response to posting                                            .
*  Date:    Wed, 15 Apr 2009 22:41:49 -0400                          .
*  From:    Christopher Lowenkamp <[hidden email]>             .
*  Subject: Random selecting variables for a score                   .
*  To:      [hidden email]                                 .

*  "I have 15 variables that are summed to create an overall risk    .
*  score. I would like to create a series of risk scores that        .
*  include a random selection of those 15 variables, 10 of the 15    .
*  variables where the variables are chosen at random for each       .
*  case."                                                            .

*  ................................................................. .
*  .................   Test data               ..................... .
SET RNG = MT       /* 'Mersenne twister' random number generator  */ .
SET MTINDEX = 6111 /*  Providence, RI telephone book              */ .

INPUT PROGRAM.
.  NUMERIC CaseID (N3).
.  VECTOR  IndV(5,F6.2).
.  LOOP CaseID = 1 TO 8.
.     LOOP  #Idx = 1 TO 5.
.        COMPUTE IndV(#Idx) = RV.NORMAL(2,2).
.     END LOOP.
.     END CASE.
.  END LOOP.
END FILE.
END INPUT PROGRAM.


LIST.

*  ................................................................. .
*  .................   Logic                   ..................... .

*  .....   The list of variables that may contribute to the    ..... .
*  .....   risk score.                                         ..... .
VECTOR IndV=IndV1 TO IndV5.

*  .....   Flag variables, indicating whether the corresponding..... .
*  .....   input variable contributes to the risk score.       ..... .
*  .....   Omit, if this is not needed.                        ..... .
VECTOR Ctr(5,F2).

NUMERIC RiskScore  (F6.2).
COMPUTE RiskScore = 0.


*  .....   Select 3 of the 5 independent variables, using     ..... .
*  .....   the 'k/n' sampling algorithm                       ..... .

COMPUTE #K = 3  /* Sample size     */.
COMPUTE #N = 5  /* Population size */.

LOOP #Idx = 1 TO 5.

*  ... "UseIt" indicates whether or not this variable        ...... .
*  ... contributes to the risk score.                        ...... .

*  ... Randomly select #UseIt:                               ...... .

.      COMPUTE #UseIt = RV.BERNOULLI(#K/#N).
.      COMPUTE #K     = #K - #UseIt.
.      COMPUTE #N     = #N -1.

*  ... If desired, set a permanent flag indicating whether  ...... .
*  ... or not this variable contributes to the score.       ...... .

.      COMPUTE Ctr(#Idx) = #UseIt.

*  ...  Update the risk score, adding the current           ....... .
*  ...  variable's value if it is to contribute to the      ....... .
*  ...  risk score.                                         ....... .

*  ...  SUM is used instead of simple "+", so missing       ....... .
*  ...  input values don't cause the result to be missing.  ....... .

.       IF   #UseIt
        RiskScore  = SUM(RiskScore,IndV(#Idx)).

END LOOP.

TEMPORARY.
STRING SPACE(A8).
LIST.
