SPSSX Discussion

sampling from a list of variables


progster

sampling from a list of variables

I have a list of variables (a vector) and I want to randomly sample a certain share of them, e.g. 30%.

I then want to repeat the extraction several times, e.g. 50 times, so that in the end I have 50 different lists.

My idea was to put the names into a variable in a .sav file and then sample that data, but I think there must be something more efficient.

My aim is to use those lists as different inputs for launching Do macros.

For instance, if my list is v1 to v10, the first sample could be v7, v9, v10, the second v2, v4, v10, etc.

Any hints? Is that possible?

David Marso

Re: sampling from a list of variables

Administrator

"My aim is to use those lists as different inputs for launching Do macros."

Could you please clarify this concept?
Meanwhile, search this list for "Bootstrapping" (except that you are sampling variables rather than cases). The same principle applies.
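
As a rough illustration of that principle applied to variables rather than cases, here is a minimal Python sketch. It assumes the variable names are available as a plain list and that each draw is made without replacement (so no variable appears twice in one list). The function name `draw_variable_samples` and its parameters are hypothetical, chosen only for this example.

```python
import random
from math import comb


def draw_variable_samples(names, fraction=0.3, n_draws=50, seed=None):
    """Return n_draws distinct random subsets of names.

    Each subset holds round(fraction * len(names)) names, drawn
    without replacement and kept in the original variable order.
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(names)))
    # There are only comb(n, k) different subsets; refuse impossible requests.
    if n_draws > comb(len(names), k):
        raise ValueError("not enough distinct subsets of that size")
    seen = set()
    samples = []
    while len(samples) < n_draws:
        subset = tuple(sorted(rng.sample(names, k), key=names.index))
        if subset not in seen:  # skip a list that was already drawn
            seen.add(subset)
            samples.append(list(subset))
    return samples


names = ["v%d" % i for i in range(1, 11)]  # v1 ... v10
samples = draw_variable_samples(names, fraction=0.3, n_draws=50, seed=1)
for subset in samples[:3]:
    print(" ".join(subset))
```

With v1 to v10 and a 30% share, each list holds 3 names, and since 120 different 3-name subsets exist, 50 distinct lists are always attainable.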

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

progster

Re: sampling from a list of variables

Yes, I looked at that for the bootstrapping (and, at the end, some Python code lets me randomly select a list of variables).

My aim is to launch several different bootstrap runs, each with a different list of variables, without manually pasting a random list of names into "indvars" for list 1, then changing it for list 2, and so on.

***oms_bootstrapping.sps***.

PRESERVE.
SET TVARS NAMES.

***first OMS command controls Viewer output (VIEWER=NO would suppress it, here it is left visible)***.
OMS /DESTINATION VIEWER=YES.

DATASET DECLARE bootstrap_example.

***select regression coefficients tables and write to data file***.
***Note that DIMNAMES values vary based on output language***.
***/COLUMNS SEQUENCE=[R2 C1] will achieve the same result in all languages***.

OMS /SELECT TABLES
/IF COMMANDS=['Logistic Regression'] SUBTYPES = ['Variables in the Equation']
/DESTINATION FORMAT=SAV OUTFILE='bootstrap_example'
/COLUMNS DIMNAMES=['Variables' 'Statistics']
/TAG='logeg_coeff'.

DATASET DECLARE CF_MATRIX.
OMS
/SELECT TABLES
/IF COMMANDS=['Logistic Regression'] SUBTYPES=['Classification Table']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='CF_MATRIX' VIEWER=YES.

***define a macro to draw samples with replacement and run Regression commands***.
DEFINE regression_bootstrap (samples=!TOKENS(1)
/depvar=!TOKENS(1)
/indvars=!CMDEND)

COMPUTE dummyvar=1.
AGGREGATE
/OUTFILE = * MODE = ADDVARIABLES
/BREAK=dummyvar
/filesize=N.
!DO !other=1 !TO !samples
SET SEED RANDOM.
WEIGHT OFF.
FILTER OFF.
DO IF $casenum=1.
- COMPUTE #samplesize=filesize.
- COMPUTE #filesize=filesize.
END IF.
DO IF (#samplesize>0 and #filesize>0).
- COMPUTE sampleWeight=rv.binom(#samplesize, 1/#filesize).
- COMPUTE #samplesize=#samplesize-sampleWeight.
- COMPUTE #filesize=#filesize-1.
ELSE.
- COMPUTE sampleWeight=0.
END IF.
WEIGHT BY sampleWeight.
FILTER BY sampleWeight.

LOGISTIC REGRESSION VARIABLES !depvar
/METHOD=ENTER !indvars
/SAVE=PRED
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

!DOEND
!ENDDEFINE.

***insert any valid path\data file name***.
*GET FILE='e:\miscellaneous\ms_rbs_data2.sav'.
***Call the macro, and specify number of samples, dependent variable, and independent variables***.

*list 1.
regression_bootstrap
samples=10
depvar=target_v
indvars= v1 v2 v3 v6 v7 v8 v9 v10.

*list 2.
regression_bootstrap
samples=10
depvar=target_v
indvars= v1 v2 v3 v6 v5 v8 v9 v10.

OMSEND.

DATASET ACTIVATE bootstrap_example.

RESTORE.
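
For readers unfamiliar with the sampleWeight logic inside the macro, the following Python sketch mirrors it outside SPSS. It is only an illustration under stated assumptions: the function name `bootstrap_weights` is hypothetical, and the binomial draw is simulated with repeated uniform draws rather than with SPSS's rv.binom. Each case receives a binomial count with probability 1/(cases still to visit), and the draws still to hand out shrink accordingly, so the weights always add up to the file size.

```python
import random


def bootstrap_weights(n_cases, rng):
    """Return one resampling weight per case, summing to n_cases."""
    remaining_draws = n_cases  # plays the role of #samplesize
    remaining_cases = n_cases  # plays the role of #filesize
    weights = []
    for _ in range(n_cases):
        if remaining_draws > 0 and remaining_cases > 0:
            p = 1.0 / remaining_cases
            # Binomial(remaining_draws, p) as a sum of Bernoulli trials.
            w = sum(rng.random() < p for _ in range(remaining_draws))
            remaining_draws -= w
            remaining_cases -= 1
        else:
            w = 0  # every draw has already been assigned
        weights.append(w)
    return weights


rng = random.Random(42)
weights = bootstrap_weights(20, rng)
print(weights, sum(weights))
```

A weight of 0 means the case is left out of that bootstrap sample, and a weight of 2 means it was drawn twice, which is what WEIGHT BY and FILTER BY then act on.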

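import csv
import random

# Candidate variable names, kept as a list because random.sample() needs a sequence.
items = ['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7', 'v8', 'v9', 'v10']

# Write 10 random lists of 7 variables each, one list per CSV row.
with open('file.csv', 'w', newline='') as csvfile:
    sampling = csv.writer(csvfile, delimiter=',')
    for _ in range(10):
        sampling.writerow(random.sample(items, 7))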
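
To avoid pasting each list by hand, lists such as the ones sampled above could be turned into macro calls as text. The sketch below is one possible way to do that, not a tested production solution: the helper name `build_macro_calls` is hypothetical, and it assumes the macro is called regression_bootstrap with the samples, depvar and indvars arguments from the syntax above, with target_v as the dependent variable.

```python
def build_macro_calls(variable_lists, samples=10, depvar="target_v",
                      macro="regression_bootstrap"):
    """Return one macro call (as SPSS syntax text) per variable list."""
    calls = []
    for number, variables in enumerate(variable_lists, start=1):
        calls.append(
            "*list %d.\n%s\nsamples=%d\ndepvar=%s\nindvars= %s."
            % (number, macro, samples, depvar, " ".join(variables))
        )
    return calls


# The two lists from the post, used here as example input.
variable_lists = [
    ["v1", "v2", "v3", "v6", "v7", "v8", "v9", "v10"],
    ["v1", "v2", "v3", "v6", "v5", "v8", "v9", "v10"],
]
syntax = "\n\n".join(build_macro_calls(variable_lists))
print(syntax)
```

The resulting text could be saved to a .sps file and run with INSERT, or, if the Python integration plug-in is installed, passed to spss.Submit inside a BEGIN PROGRAM block.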