I wish to express my sincere thanks to the following people:
Hillel Vardi, Beadle ViAnn, and Richard Ristow, for your contributions (advice and syntax) on the above subject.

Sorry to say that I have not been able to achieve my desired objective.

Below is a restatement of what I want to do.

Given a data file c:\Temp\OriginalData.sav with variables x (1-4), y (1-3), z1 (1-15), z2 (1-15), z3 (1-15), and z4 (1-15).
z1, z2, z3 and z4 have identical categories (15).

I want to do the following:

1. Identify the 10 most frequently occurring categories of Z (where Z is a combination of z1, z2, z3 and z4).
2. Set z1, z2, z3 and z4 to missing for values of z1, z2, z3, z4 not in the 10 most frequently occurring categories.
   Thus if the categories (2, 4, 7, 9, 12) of z1, z2, z3 and z4 are not in the 10 most frequently occurring categories, set them to sysmis.
3. Save the new file as c:\Temp\NewData.sav with variables x, y, z1, z2, z3 and z4.

Any help on this task will be very much appreciated.

Warm regards to all.
OK, let's try this from a different tack because I don't think anybody understands what you mean by most frequently occurring categories. Do you want to count occurrences of values across all four variables so that if z1 and z2 each have the value 14, that counts for two occurrences of 14?
Perhaps if you would tell us why you want to do this, we would better understand your question. Or if you could give us a small set of data for the 4 variables and tell us what you think are the top 2 values (so you don't have to provide so much data that we can't read it), we could provide more help here.
In reply to this post by Edward Boadi
OK, Beadle and List.
Let's consider this data file:

DATA LIST FREE/x y z1 z2 z3 z4.
BEGIN DATA
2 1 1 3 3 5
1 2 4 1 4 3
1 3 4 5 9 1
1 4 5 2 5 4
2 2 5 1 3 1
1 3 2 2 2 5
1 2 1 2 1 1
1 1 9 4 1 1
1 1 2 4 5 1
1 3 1 5 1 1
1 1 2 4 4 4
1 2 2 9 4 4
2 4 5 1 2 3
1 1 1 1 9 2
1 2 5 1 1 3
1 4 5 1 2 1
1 3 1 2 4 4
END DATA.

SAVE OUTFILE ='C:\Temp\originaldata.sav'.

In the above data file:

1. The value "1" has the highest occurrence across z1, z2, z3 and z4.
2. The next is the value "4" across z1, z2, z3 and z4.

I want to create a new data file 'C:\Temp\Newdata.sav' with the same variables (x, y, z1, z2, z3, z4) but with the values of z1, z2, z3 and z4 set to sysmis EXCEPT when (z1, z2, z3, z4) = (1 or 4).

Thus I want to keep the top 2 values (1 and 4) across z1, z2, z3 and z4.

Regards.
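For readers following along outside SPSS, the counting rule described in this post (every occurrence of a value in any of z1 to z4 counts once, so a case with the same value in two variables contributes two occurrences) can be sketched in Python. This is only an illustration of the logic, not SPSS syntax; the names `ROWS` and `count_z_values` are made up for the sketch, and the rows are the ones posted in the DATA LIST example.

```python
from collections import Counter

# Rows from the posted example, in the order x, y, z1, z2, z3, z4.
ROWS = [
    (2, 1, 1, 3, 3, 5), (1, 2, 4, 1, 4, 3), (1, 3, 4, 5, 9, 1),
    (1, 4, 5, 2, 5, 4), (2, 2, 5, 1, 3, 1), (1, 3, 2, 2, 2, 5),
    (1, 2, 1, 2, 1, 1), (1, 1, 9, 4, 1, 1), (1, 1, 2, 4, 5, 1),
    (1, 3, 1, 5, 1, 1), (1, 1, 2, 4, 4, 4), (1, 2, 2, 9, 4, 4),
    (2, 4, 5, 1, 2, 3), (1, 1, 1, 1, 9, 2), (1, 2, 5, 1, 1, 3),
    (1, 4, 5, 1, 2, 1), (1, 3, 1, 2, 4, 4),
]


def count_z_values(rows):
    """Pool z1..z4 (columns 2..5) and count how often each value occurs."""
    counts = Counter()
    for row in rows:
        counts.update(row[2:6])
    return counts


counts = count_z_values(ROWS)
# most_common(2) returns the two values with the largest pooled counts.
top_two = [value for value, _ in counts.most_common(2)]
print(counts)
print(top_two)
```

On the 17 posted rows the two most frequent pooled values come out as 1 and then 4, which agrees with the description in the post.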
In reply to this post by Edward Boadi
Use the MULT RESPONSE procedure to tabulate your 4 z variables together as a multiple response set.

MULT RESPONSE GROUPS=z (z1 z2 z3 z4 (1,15))
  /FREQUENCIES=z.

Look at the table and find the top ten. Use RECODE to recode all other values to sysmis.
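The recode step suggested here (keep the chosen values, turn everything else into system-missing) can be pictured with a small Python sketch. It is not SPSS syntax: `keep_only` is a hypothetical helper, and `None` merely stands in for system-missing.

```python
def keep_only(row_z, keep):
    """Return the z values with anything outside `keep` replaced by None.

    None plays the role of SPSS system-missing in this sketch.
    """
    return [z if z in keep else None for z in row_z]


# Keep the values 1 and 4, as in the top-2 example from this thread.
print(keep_only([4, 5, 9, 1], keep={1, 4}))
```

The set passed as `keep` is whatever was read off the frequency table; in the manual approach it is typed in by hand.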
In reply to this post by Edward Boadi
Thanks Beadle,
Is there a way to automate the whole process without having to look at the table created by:

MULT RESPONSE GROUPS=z (z1 z2 z3 z4 (1,15))
  /FREQUENCIES=z.

Thanks.
In reply to this post by Edward Boadi
varstocases make z from z1 to z4.
flip.
sel if case_lbl='z'.
compute end=$sysmis.
do repeat f=f1 to f15 /n=1 to 15.
count f = var001 to end (n).
end repeat.
match file file=*/keep=f1 to f15.
do repeat f=f1 to f15 /n=1 to 15.
if max(f1 to f15) =f mostfreq1=n.
end repeat.
do repeat f=f1 to f15 /n=1 to 15.
if n=mostfreq1 f=$sysmis.
end repeat.
do repeat f=f1 to f15 /n=1 to 15.
if max(f1 to f15) =f mostfreq2=n.
end repeat.
exe.
write outfile='c:\temp\recodes.sps'
  /"recode z1 z2 z3 z4 ( " mostfreq1 " = " mostfreq1 " ) ( " mostfreq2 " = " mostfreq2 " ) (else = sysmis )".

get file = 'c:\temp\originaldata.sav'.
include file = 'c:\temp\recodes.sps'.
exe.

Research Database Manager and Analyst
Melbourne Institute of Applied Economic and Social Research
The University of Melbourne
Melbourne VIC 3010 Australia
New Tel: (03) 8344 2085 New Fax: (03) 8344 2111
http://www.melbourneinstitute.com/hilda/
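The selection logic in the syntax of this post (take the category with the largest count, blank that count out, then take the largest of what remains) may be easier to follow in a short Python sketch. This is a reading of the posted syntax, not a replacement for it: `top_two_by_repeated_max` is a hypothetical name, the counts in the example are invented for illustration, and the tie rule is an assumption based on the fact that each later pass of the DO REPEAT loop overwrites the earlier assignment.

```python
def top_two_by_repeated_max(freq):
    """freq maps category -> count; mirrors the max / blank-out / max steps."""
    remaining = dict(freq)  # work on a copy so the caller's dict is untouched
    picks = []
    for _ in range(2):
        best = max(remaining.values())
        # Later categories overwrite earlier ones in the loop, so a tie
        # goes to the highest-numbered category (assumed reading).
        winner = max(cat for cat, n in remaining.items() if n == best)
        picks.append(winner)
        del remaining[winner]  # plays the role of setting f to $sysmis
    return picks


# Invented counts: categories 1 and 3 are tied for first place.
print(top_two_by_repeated_max({1: 7, 2: 3, 3: 7, 4: 5}))
```

With tied counts the order of the picks depends on that tie rule, so it is worth checking the frequency table whenever two categories are close.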
In reply to this post by Edward Boadi
Is this what you need?

DATA LIST FREE/x y z1 z2 z3 z4.
BEGIN DATA
2 1 1 3 3 5
1 2 4 1 4 3
1 3 4 5 9 1
1 4 5 2 5 4
2 2 5 1 3 1
1 3 2 2 2 5
1 2 1 2 1 1
1 1 9 4 1 1
1 1 2 4 5 1
1 3 1 5 1 1
1 1 2 4 4 4
1 2 2 9 4 4
2 4 5 1 2 3
1 1 1 1 9 2
1 2 5 1 1 3
1 4 5 1 2 1
1 3 1 2 4 4
END DATA.
SAVE OUTFILE ='C:\Temp\originaldata.sav'.
MULT RESPONSE GROUPS=$z 'the 4 z variables' (z1 z2 z3 z4 (1,9))
  /FREQUENCIES=$z .
recode z1 to z4 (1,4=copy) (else=0) into newz1 to newz4.
MULT RESPONSE GROUPS=$newz 'the 4 z variables' (newz1 newz2 newz3 newz4 (1,9))
  /FREQUENCIES=$newz .

Art

Art Kendall
Social Research Consultants
In reply to this post by Edward Boadi
Thanks a million Simon for your amazing syntax.
It does exactly what I need.

Suppose I want to extend the syntax to the "top 3, 10, 25 etc." values. What changes to the syntax are required?
Change all the 15s to the maximum of the allowed values, and insert, for each additional value, a block like:

do repeat f=f1 to f15 /n=1 to 15.
if n=mostfreq1 f=$sysmis.
end repeat.
do repeat f=f1 to f15 /n=1 to 15.
if max(f1 to f15) =f mostfreq2=n.
end repeat.

Change the mostfreq variable names as you do so. For the 3rd value, change mostfreq1 -> mostfreq2 and mostfreq2 -> mostfreq3.

Then add the additional value to the write:

write outfile='c:\temp\recodes.sps'
  /"recode z1 z2 z3 z4 ( " mostfreq1 " = " mostfreq1 " ) ( " mostfreq2 " = " mostfreq2 " ) ( " mostfreq3 " = " mostfreq3 " ) (else = sysmis )".
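Copying one block per extra value gets long for a top 10 or top 25, so it may help to see the same idea written once for any N. The Python sketch below only illustrates the logic and the shape of the RECODE line that ends up in the generated file; it is not SPSS syntax, `build_recode` is a hypothetical helper, the counts are invented, and breaking ties in favour of the higher-numbered category is an assumption.

```python
def build_recode(freq, n, variables=("z1", "z2", "z3", "z4")):
    """Build a RECODE command that keeps the n most frequent categories.

    freq maps category -> pooled count. Ties are broken in favour of the
    higher-numbered category (an assumption made for this sketch).
    """
    ranked = sorted(freq, key=lambda cat: (-freq[cat], -cat))
    keep = ranked[:n]
    pairs = " ".join(f"({cat} = {cat})" for cat in keep)
    return f"recode {' '.join(variables)} {pairs} (else = sysmis)."


# Invented counts, keeping the top 3 categories.
print(build_recode({1: 9, 2: 4, 3: 6, 4: 8}, n=3))
```

Changing `n` is then the only edit needed to go from the top 3 to the top 10 or top 25, provided there are at least that many categories.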
In reply to this post by Edward Boadi
Thanks Simon, you are a star.
Your first syntax and the suggested modification work the way I want.

Thanks again to everyone who contributed to this topic.

Warm regards to all.
Edward