Hello,
For a paper we are working on, I need to randomly sort data from one column/variable into another. So, if the data in the mother column/variable is 2, 7, 5, 9, 4, the data in the next column needs to be randomly sorted like, say, 4, 5, 9, 7, 2. One way to do this is to create a random id variable and sort based on that, but the problem is that "Sort" will change the sequence of all cases, not just the column of interest. I could copy and paste, but this will take an inordinately long time for the 10,000 random columns we have planned. I would greatly appreciate any suggestions.

thanks,
Vishal

Dr. Vishal Lala
Assistant Professor of Marketing
Pace University
Hi Vishal
VL> For a paper we are working on, I need to randomly sort data from one
VL> column/variable into another. So, if the data in the mother
VL> column/variable is 2, 7, 5, 9, 4, the data in the next column needs to be
VL> randomly sorted like say 4, 5, 9, 7, 2. One way to do this is to create a
VL> random id variable and sort based on that but the problem is that "Sort"
VL> will change the sequence of all cases not just the column of interest. I
VL> could copy and paste but this will take an inordinately long time for the
VL> 10,000 random columns we have planned. I would greatly appreciate any
VL> suggestions.

This task is easy with MATRIX.

* Tiny example dataset *.
DATA LIST FREE/var1 (F8).
BEGIN DATA.
2 7 5 9 4
END DATA.

MATRIX.
GET data /VAR=var1.
COMPUTE randoms=UNIFORM(NROW(data),1).
COMPUTE sdata=data.
COMPUTE sdata(GRADE(randoms))=data.
COMPUTE vname={'shuffled'}.
SAVE sdata /OUTFILE='C:\Temp\RandomlyShuffledData.sav'/NAMES=vname.
END MATRIX.

MATCH FILES /FILE=* /FILE='C:\Temp\RandomlyShuffledData.sav'.
LIST.

If you plan to do that for a lot of variables (10,000!), then the wisest thing to do would be to turn this code into a simple MACRO that loops through the whole file. If you need help to do that, please do not hesitate to ask me.

--
Regards,
Dr. Marta García-Granero, PhD
mailto:[hidden email]
Statistician

-----------------------------------------------------------------------
"It is unwise to use a statistical procedure whose use one does not understand. SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results".
(Adapted from WinPepi manual - I'm sure Joe Abramson will not mind)
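For readers who do not use MATRIX, the indexing trick in the code above can be sketched outside SPSS. The following is a minimal Python illustration, not SPSS syntax. It assumes that GRADE returns the rank of each random number (1 for the smallest), so that assigning through those ranks scatters every value to a random position. The helper names `grade` and `shuffle_by_grade` and the seed are made up for this example.

```python
import random


def grade(values):
    """Return 1-based ranks (1 = smallest), one rank per element."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, position in enumerate(order, start=1):
        ranks[position] = rank
    return ranks


def shuffle_by_grade(data, rng):
    """Scatter each value to the slot given by the rank of its random draw."""
    randoms = [rng.random() for _ in data]   # like COMPUTE randoms=UNIFORM(n,1)
    ranks = grade(randoms)                   # like GRADE(randoms)
    shuffled = list(data)                    # like COMPUTE sdata=data
    for value, rank in zip(data, ranks):
        shuffled[rank - 1] = value           # like sdata(GRADE(randoms))=data
    return shuffled


data = [2, 7, 5, 9, 4]
shuffled = shuffle_by_grade(data, random.Random(2006))  # arbitrary seed
print(shuffled)
```

Because the ranks of distinct random numbers form a permutation of 1..n, every value lands in exactly one slot and none is lost or duplicated.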
In reply to this post by Vishal Lala
Hi Marta and Vishal (and others)
I am not so familiar with the MATRIX statement, but it can easily be done without it. Just create a file with only the variable to be randomly sorted and follow the procedure that Vishal suggested, then MATCH the file with the original file. As an example (V1 is the variable to be "shuffled"):

GET FILE "s:\test.sav".
MATCH FILES FILE * /KEEP v1.
* the following is some arbitrary random function.
COMPUTE rn = rv.normal(1, 10).
SORT CASES BY rn.
MATCH FILES FILE * /RENAME v1 = RandomCopyV1 /KEEP RandomCopyV1.
MATCH FILES FILE * /FILE "s:\test.sav".

Antoon Smulders

-----Original message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Marta García-Granero
Sent: Tuesday, 25 July 2006 20:19
To: [hidden email]
Subject: Re: Randomly Sort a variable
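The sort-and-match procedure described in this reply can be sketched as follows. This is a hedged Python illustration of the idea only, not SPSS syntax: one column is pulled out, sorted by a random key on its own, and pasted back next to the untouched original rows. The function name `add_shuffled_copy`, the `id` column, and the seed are assumptions made for the example.

```python
import random


def add_shuffled_copy(rows, source, target, rng):
    """Return new rows with `target` holding a random reordering of `source`."""
    # Step 1: pull out the one column and give each value a random key.
    keyed = [(rng.random(), row[source]) for row in rows]
    # Step 2: sort the extracted column by the random key only.
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: paste it back case by case; the original rows keep their order.
    return [{**row, target: value} for row, (_, value) in zip(rows, keyed)]


# Hypothetical five-case file with an id column and the variable v1.
rows = [{"id": k, "v1": v} for k, v in enumerate([2, 7, 5, 9, 4], start=1)]
result = add_shuffled_copy(rows, "v1", "RandomCopyV1", random.Random(1))
for row in result:
    print(row)
```

The point of working on a separate copy of the column is that the sort never touches the other variables, which was the concern raised in the original question.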
Hi Vishal
VL> Thank you very much for your response to my question on randomly
VL> sorting a variable. Your syntax using Matrix is very elegant.
VL> Pardon my ignorance of SPSS syntax but I am only beginning to play
VL> around with the syntax. To this point, I have largely relied on
VL> the point and click options in SPSS not realizing what I have been
VL> missing out on. I was wondering if you could also help me run this
VL> code repeatedly, say 10,000 times. Is a Loop a good way to go?

Assuming you want to randomly sort one single variable 10,000 times, the MATRIX code can be easily adapted:

DATA LIST FREE/var1 (F8).
BEGIN DATA
2 7 5 9 4 3 1 6 8
END DATA.

PRESERVE.
SET MXLOOPS=10000.
MATRIX.
GET data /VAR=var1.
COMPUTE shuffled=MAKE(NROW(data),10000,0).
LOOP i=1 TO 10000.
. COMPUTE randoms=UNIFORM(NROW(data),1).
. COMPUTE sdata=data.
. COMPUTE sdata(GRADE(randoms))=data.
. COMPUTE shuffled(:,i)=sdata.
END LOOP.
SAVE shuffled /OUTFILE='C:\Temp\RandomlyShuffledData.sav'.
PRINT /TITLE='10,000 shuffled columns saved to disk'.
PRINT /TITLE='Matching them to active dataset will take some time...'.
END MATRIX.
RESTORE.

* This step takes quite long, depending on the sample size *.
MATCH FILES /FILE=* /FILE='C:\Temp\RandomlyShuffledData.sav'.
FORMAT col1 TO col10000(F8).
EXE. /* Not really necessary *.

(Now, please, people who read this message, DON'T start sending me private messages telling me of better (?) ways of handling this situation without using MATRIX, as if I was doing something wrong using this tool. I know there is usually more than one way of solving a problem, but I like MATRIX; it is a very powerful tool for data handling. If you want to give an alternative answer, send the message to this list, and everybody will benefit.)

If you want to sort 10,000 different variables one time, tell me and I'll adapt the code again.

Regards,
Marta
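The shape of the result built by the loop above (one row per case, one column per shuffle) can be illustrated with a short Python sketch. This is not a translation of the MATRIX code: it swaps in the standard library's `random.sample` to produce each shuffled column instead of the GRADE indexing. The function name `shuffled_columns` and the seed are assumptions for the example.

```python
import random


def shuffled_columns(data, n_columns, rng):
    """Return a list of rows; column j holds the j-th random reordering of data."""
    # random.sample with k == len(data) returns a shuffled copy of the list.
    columns = [rng.sample(data, len(data)) for _ in range(n_columns)]
    # Transpose so that each row corresponds to one case, as in the saved file.
    return [list(row) for row in zip(*columns)]


data = [2, 7, 5, 9, 4, 3, 1, 6, 8]
wide = shuffled_columns(data, 10000, random.Random(1))  # arbitrary seed

print(len(wide), "rows by", len(wide[0]), "columns")
print(wide[0][:5])  # first case, first five shuffled columns
```

Each column is an independent permutation of the same nine values, so column sums and means are identical across all 10,000 columns while the row order differs.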
Shalom
Here is how to get as many random columns as you need without using any macro.

title Randomly Sort a variable.
input program .
loop i=1 to 12 .
compute val=trunc(uniform(200))+1.
end case .
end loop .
end file .
end input program .
execute .
formats val(f4).

* >>>>>>>>>>>>>>>>>> change here to get as many copies as needed <<<<<< .
loop i=1 to 8 .
xsave outfile=tmp.sav.
end loop .
execute .

get file=tmp.sav.
compute var1=uniform(2000000).
sort cases by i var1 .
add files file=* /first=start/by i .
numeric seq(f4).
leave seq.
if start eq 1 seq=0.
compute seq=sum(seq,1) .
sort cases by seq i .
CASESTOVARS
 /ID = seq
 /INDEX = i
 /separator=""
 /drop=var1 start
 /GROUPBY = VARIABLE .

Hillel Vardi
Ben Gurion U
Israel
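The steps in the syntax above (replicate the cases once per copy, attach a random key, sort within each copy, number the cases, then restructure from long to wide) can be sketched in Python. This is a hedged illustration rather than a translation. The function name `replicate_and_pivot` and the seed are invented, and the output names val1 to val8 are an assumption about how CASESTOVARS would name the new variables with an empty separator.

```python
import random


def replicate_and_pivot(values, n_copies, rng):
    """Return wide rows: one dict per sequence number, one key per copy."""
    # Long format: every value repeated once per copy, with a random sort key.
    long_rows = [(copy, rng.random(), value)
                 for copy in range(1, n_copies + 1)
                 for value in values]
    # Sort by copy index, then by the random key (random order within a copy).
    long_rows.sort(key=lambda r: (r[0], r[1]))
    wide = {}
    seq, previous_copy = 0, None
    for copy, _, value in long_rows:
        # Restart the running sequence number at the first record of each copy.
        seq = 1 if copy != previous_copy else seq + 1
        previous_copy = copy
        wide.setdefault(seq, {})["val" + str(copy)] = value
    return [wide[s] for s in sorted(wide)]


rng = random.Random(7)  # arbitrary seed
values = [rng.randint(1, 200) for _ in range(12)]  # 12 test values in 1..200
wide_rows = replicate_and_pivot(values, 8, rng)
print(wide_rows[0])
```

Raising `n_copies` plays the same role as changing the upper limit of the second loop in the syntax: it controls how many shuffled columns come out.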