SPSSX Discussion

Randomly Sort a variable

Vishal Lala

Randomly Sort a variable

Hello,

For a paper we are working on, I need to randomly sort (shuffle) the data
from one column/variable into another. So, if the data in the mother
column/variable are 2, 7, 5, 9, 4, the next column needs to hold the same
values in random order, say 4, 5, 9, 7, 2. One way to do this is to create a
random ID variable and sort on it, but the problem is that SORT CASES
reorders whole cases, not just the column of interest. I could copy and
paste, but that would take an inordinately long time for the 10,000
random columns we have planned. I would greatly appreciate any
suggestions.

thanks,
Vishal

Dr. Vishal Lala
Assistant Professor of Marketing
Pace University

Marta García-Granero

Re: Randomly Sort a variable

Hi Vishal

This task is easy with MATRIX.

* Tiny example dataset *.
DATA LIST FREE/var1 (F8).
BEGIN DATA.
2 7 5 9 4
END DATA.

MATRIX.
GET data /VAR=var1.
COMPUTE randoms=UNIFORM(NROW(data),1).
COMPUTE sdata=data.
COMPUTE sdata(GRADE(randoms))=data.
COMPUTE vname={'shuffled'}.
SAVE sdata /OUTFILE='C:\Temp\RandomlyShuffledData.sav'/NAMES=vname.
END MATRIX.
MATCH FILES /FILE=* /FILE='C:\Temp\RandomlyShuffledData.sav'.
LIST.
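The key line is `COMPUTE sdata(GRADE(randoms))=data.`: GRADE turns the vector of uniform random numbers into ranks 1..n, and because the random numbers are in random order, those ranks form a random permutation. Assigning through that index vector scatters each value of `data` to a random position. Below is a minimal Python sketch of the same idea, not SPSS code. The helper names `grade` and `shuffle_by_grade` are hypothetical, and the sketch assumes that ties among the random keys (practically impossible with continuous uniforms) are broken by position.

```python
import random

def grade(values):
    """Return 1-based ranks, smallest value -> 1 (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def shuffle_by_grade(data, rng=None):
    """Shuffle `data` by scattering each value to a random rank position."""
    rng = rng or random.Random()
    randoms = [rng.random() for _ in data]    # like UNIFORM(NROW(data),1)
    sdata = list(data)                        # like COMPUTE sdata=data (sizes the result)
    for idx, rank in enumerate(grade(randoms)):
        sdata[rank - 1] = data[idx]           # like sdata(GRADE(randoms))=data
    return sdata

print(shuffle_by_grade([2, 7, 5, 9, 4], random.Random(2006)))
```

Scattering by ranks and gathering in sorted-key order both yield a uniformly random permutation. The MATRIX code uses the scatter form because GRADE returns ranks directly.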

If you plan to do this for many variables (10,000!), the wisest
approach would be to turn this code into a simple MACRO that loops
through the whole file. If you need help with that, please do not
hesitate to ask me.
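The looping idea can be sketched in Python. This is an illustration only, not the SPSS MACRO offered above. It treats the file as a set of columns and shuffles each one independently while leaving the originals untouched. Instead of the sort-by-random-key step, the sketch uses Python's built-in `random.Random.shuffle` (a Fisher-Yates shuffle). The `_shuffled` suffix for the new names is a hypothetical convention.

```python
import random

def shuffle_each_column(columns, rng=None):
    """Return a new column for every input column, each shuffled independently.

    `columns` maps a variable name to its list of values (one per case).
    """
    rng = rng or random.Random()
    result = {}
    for name, values in columns.items():
        copy = list(values)          # leave the original column untouched
        rng.shuffle(copy)            # in-place Fisher-Yates shuffle of the copy
        result[name + "_shuffled"] = copy
    return result

table = {"var1": [2, 7, 5, 9, 4], "var2": [1, 2, 3, 4, 5]}
print(shuffle_each_column(table, random.Random(42)))
```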

--
Regards,
Dr. Marta García-Granero, PhD mailto:[hidden email]
Statistician

-----------------------------------------------------------------------
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from the WinPepi manual - I'm sure Joe Abramson will not mind)

Antoon Smulders

Re: Randomly Sort a variable

In reply to this post by Vishal Lala

Hi Marta and Vishal (and others)

I am not very familiar with MATRIX, but this can easily be done without
it. Just create a file containing only the variable to be randomly sorted,
follow the procedure that Vishal suggested (a random key plus SORT CASES),
then MATCH the result back to the original file.
For example (v1 is the variable to be "shuffled"):

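GET FILE "s:\test.sav".
* Keep only the variable to be shuffled.
MATCH FILES FILE * /KEEP v1.
* The following is an arbitrary random function; any continuous random key will do.
COMPUTE rn = rv.normal(1, 10).
* Put the one-variable file in random order.
SORT CASES BY rn.
MATCH FILES FILE * /RENAME v1 = RandomCopyV1 /KEEP RandomCopyV1.
* Match the shuffled copy back to the original cases by position (no BY).
MATCH FILES FILE * /FILE "s:\test.sav".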
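The three steps above (isolate the column, sort it on a random key, match it back by position) can be sketched in Python. This is an illustration under the assumption that each case is a dict, and the function name `add_shuffled_copy` is hypothetical. `rng.gauss(1, 10)` plays the role of `rv.normal(1, 10)` (mean 1, standard deviation 10). As the syntax comment says, any continuous random key would do.

```python
import random

def add_shuffled_copy(cases, var, new_name, rng=None):
    """Return copies of `cases` (a list of dicts) with a shuffled copy of `var` added."""
    rng = rng or random.Random()
    # 1. Keep only the variable of interest, paired with a random key
    #    (MATCH FILES /KEEP v1 + COMPUTE rn = rv.normal(1, 10)).
    keyed = [(rng.gauss(1, 10), case[var]) for case in cases]
    # 2. Sort the one-variable "file" by the random key (SORT CASES BY rn).
    keyed.sort(key=lambda pair: pair[0])
    # 3. Match it back to the original cases by position (MATCH FILES without BY).
    return [{**case, new_name: value} for case, (_, value) in zip(cases, keyed)]

cases = [{"v1": x} for x in [2, 7, 5, 9, 4]]
for case in add_shuffled_copy(cases, "v1", "RandomCopyV1", random.Random(7)):
    print(case)
```

Only the new column is reordered. The original cases keep their sequence, which is exactly what a plain SORT CASES on the full file could not do.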

Antoon Smulders

-----Original message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Marta
García-Granero
Sent: Tuesday, 25 July 2006 20:19
To: [hidden email]
Subject: Re: Randomly Sort a variable

Marta García-Granero

Re: Randomly Sort a variable

In reply to this post by Vishal Lala

Hi Vishal

VL> Thank you very much for your response to my question on randomly
VL> sorting a variable. Your syntax using Matrix is very elegant.
VL> Pardon my ignorance of SPSS syntax but I am only beginning to play
VL> around with the syntax. To this point, I have largely relied on
VL> the point and click options in SPSS not realizing what I have been
VL> missing out on. I was wondering if you could also help me run this
VL> code repeatedly, say 10,000 times. Is a Loop a good way to go?

Assuming you want to shuffle one single variable 10,000 times, the
MATRIX code can easily be adapted:

DATA LIST FREE/var1 (F8).
BEGIN DATA.
2 7 5 9 4 3 1 6 8
END DATA.

PRESERVE.
SET MXLOOPS=10000.
MATRIX.
GET data /VAR=var1.
COMPUTE shuffled=MAKE(NROW(data),10000,0).
LOOP i=1 TO 10000.
. COMPUTE randoms=UNIFORM(NROW(data),1).
. COMPUTE sdata=data.
. COMPUTE sdata(GRADE(randoms))=data.
. COMPUTE shuffled(:,i)=sdata.
END LOOP.
SAVE shuffled /OUTFILE='C:\Temp\RandomlyShuffledData.sav'.
PRINT /TITLE='10,000 shuffled columns saved to disk'.
PRINT /TITLE='Matching them to active dataset will take some time...'.
END MATRIX.
RESTORE.

* This step can take quite a long time, depending on the sample size *.
MATCH FILES
/FILE=* /FILE='C:\Temp\RandomlyShuffledData.sav'.
FORMAT col1 TO col10000(F8).
EXE. /* Not really necessary *.
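The loop above fills an NROW(data) x 10000 matrix one column at a time (`shuffled(:,i)=sdata`). Note that SET MXLOOPS is needed because MATRIX loops are capped by that setting. Below is a hedged Python sketch of the same layout, not the SPSS code itself. The function name `shuffled_matrix` is hypothetical, and each column is produced by gathering values in sorted-random-key order rather than by the GRADE scatter. Both give a uniformly random permutation.

```python
import random

def shuffled_matrix(data, n_copies, rng=None):
    """Return an n x n_copies matrix (list of rows); column i is an
    independent shuffle of `data`, like COMPUTE shuffled(:,i)=sdata."""
    rng = rng or random.Random()
    n = len(data)
    shuffled = [[0] * n_copies for _ in range(n)]   # like MAKE(NROW(data),10000,0)
    for i in range(n_copies):                       # like LOOP i=1 TO 10000
        keys = [rng.random() for _ in range(n)]     # like UNIFORM(NROW(data),1)
        order = sorted(range(n), key=keys.__getitem__)
        for row, src in enumerate(order):
            shuffled[row][i] = data[src]            # fill column i in random-key order
    return shuffled

data = [2, 7, 5, 9, 4, 3, 1, 6, 8]
k = 3
matrix = shuffled_matrix(data, k, random.Random(1))
# MATRIX SAVE without /NAMES uses default names COL1, COL2, ...
print(" ".join(f"{'col' + str(i):>5}" for i in range(1, k + 1)))
for row in matrix:
    print(" ".join(f"{v:5d}" for v in row))
```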

(Now, please, people who read this message, DON'T start sending me
private messages telling me of better (?) ways of handling this
situation without using MATRIX, as if I were doing something wrong
by using this tool. I know there is usually more than one way of solving
a problem, but I like MATRIX; it is a very powerful tool for data
handling. If you want to give an alternative answer, send the message
to this list, and everybody will benefit.)

If you want to shuffle 10,000 different variables once each, tell me and
I'll adapt the code again.

Regards,
Marta

hillel vardi

Re: Randomly Sort a variable

In reply to this post by Vishal Lala

Shalom

Here is how to get as many random columns as you need without using any
macro.

title Randomly Sort a variable.
input program .
loop i=1 to 12 .
compute val=trunc(uniform(200))+1.
end case .
end loop .
end file .
end input program .
execute .
formats val(f4).
* >>>>>>>>>>>>>>>>>> Change the upper bound here to get as many copies as needed <<<<<<.
loop i=1 to 8 .
xsave outfile=tmp.sav.
end loop .
* Run the pending XSAVE so that tmp.sav is written before it is read.
execute .
get file=tmp.sav.
compute var1=uniform(2000000).
sort cases by i var1 .
add files file=* /first=start/by i .
numeric seq(f4).
leave seq.
if start eq 1 seq=0.
compute seq=sum(seq,1) .
sort cases by seq i .
CASESTOVARS
/ID = seq
/INDEX = i
/separator=""
/drop=var1 start
/GROUPBY = VARIABLE .
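This syntax works on a long file and then restructures it. XSAVE writes every case once per copy index `i`. Sorting by `i` and a random key puts each copy in random order, and `seq` numbers the cases within each copy. CASESTOVARS then turns copy `i` into variable `val<i>`. The same steps in Python, as an illustrative sketch (the function name is hypothetical, and a list of tuples stands in for the SPSS file):

```python
import random

def shuffled_copies_via_long_file(values, n_copies, rng=None):
    """Replicate, sort by (copy, random key), number within copy, go wide."""
    rng = rng or random.Random()
    # XSAVE inside LOOP i=1 TO n_copies: one long record per (copy i, original value).
    long_file = [(i, rng.random(), v) for v in values for i in range(1, n_copies + 1)]
    # SORT CASES BY i var1: each copy's records end up in random order.
    long_file.sort(key=lambda rec: (rec[0], rec[1]))
    # ADD FILES /FIRST=start /BY i plus LEAVE seq: number records 1, 2, ... within each copy.
    wide = {}
    seq, prev_i = 0, None
    for i, _key, v in long_file:
        seq = 1 if i != prev_i else seq + 1
        prev_i = i
        # CASESTOVARS /ID=seq /INDEX=i /SEPARATOR="": row seq gets variable val<i>.
        wide.setdefault(seq, {})[f"val{i}"] = v
    return [wide[s] for s in sorted(wide)]

gen = random.Random(2006)
values = [gen.randint(1, 200) for _ in range(12)]  # like trunc(uniform(200))+1
for row in shuffled_copies_via_long_file(values, 8, gen):
    print(row)
```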

Hillel Vardi
Ben Gurion U
Israel
