Re: 10 most frequent occurring values of a multiple response set ( REVISITED )


Edward Boadi
I wish to express my sincere thanks to Hillel Vardi, Beadle ViAnn, and Richard Ristow for their contributions (advice and syntax) on the above subject.

Sorry to say, I have not been able to achieve my objective. Below is a restatement of what I want to do.

Given a data file c:\Temp\OriginalData.sav with variables x (1-4), y (1-3), z1 (1-15), z2 (1-15), z3 (1-15), and z4 (1-15),
where z1, z2, z3 and z4 have the same 15 categories.

I want to do the following:

1. Identify the 10 most frequently occurring categories of Z (where Z is the combination of z1, z2, z3 and z4).
2. Set z1, z2, z3 and z4 to missing for values that are not among the 10 most frequently occurring categories.
   Thus, if the categories (2,4,7,9,12) of z1, z2, z3 and z4 are not among the 10 most frequent, set
   them to sysmis.
3. Save the new file as c:\Temp\NewData.sav with variables x, y, z1, z2, z3 and z4.


Any help on this task will be very much appreciated.

Warm regards to all.
Reply | Threaded
Open this post in threaded view
|

Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Beadle, ViAnn
OK, let's try a different tack, because I don't think anybody understands what you mean by most frequently occurring categories. Do you want to count occurrences of values across all four variables, so that if z1 and z2 each have the value 14, that counts as two occurrences of 14?

Perhaps if you would tell us why you want to do this, we would better understand your question. Or if you could give us a small set of data for the 4 variables and tell us what you think are the top 2 values (so you don't have to provide so much data that we can't read it), we could provide more help here.

Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Edward Boadi
In reply to this post by Edward Boadi
OK, Beadle and List.
Let's consider this data file:

DATA LIST FREE/x y  z1 z2 z3 z4.
BEGIN DATA
2       1       1       3       3       5
1       2       4       1       4       3
1       3       4       5       9       1
1       4       5       2       5       4
2       2       5       1       3       1
1       3       2       2       2       5
1       2       1       2       1       1
1       1       9       4       1       1
1       1       2       4       5       1
1       3       1       5       1       1
1       1       2       4       4       4
1       2       2       9       4       4
2       4       5       1       2       3
1       1       1       1       9       2
1       2       5       1       1       3
1       4       5       1       2       1
1     3     1     2     4     4
END DATA.

SAVE OUTFILE  ='C:\Temp\originaldata.sav'.

In the above data file:

1. The value "1" occurs most often across z1, z2, z3 and z4.
2. The next most frequent value across z1, z2, z3 and z4 is "4".

I want to create a new data file 'C:\Temp\Newdata.sav' with the same variables (x, y, z1, z2, z3, z4) but with
the values of z1, z2, z3 and z4 set to sysmis EXCEPT where the value is 1 or 4.

Thus I want to keep the top 2 values (1 and 4) across z1, z2, z3 and z4.

Regards.
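
The counts claimed in this post can be checked by stacking the four z variables into a single variable and tabulating it. The following is a minimal sketch, assuming the sample file has been saved to the path shown in the post; the stacked variable name z is only illustrative. Counting the posted data by hand gives 68 responses in total: the value 1 occurs 22 times, 4 occurs 13 times, 2 occurs 12 times, 5 occurs 11 times, 3 occurs 6 times and 9 occurs 4 times, so 1 and 4 are indeed the top two.

```spss
* Sketch: tabulate z1 to z4 as one stacked variable, most frequent value first.
get file='c:\temp\originaldata.sav'.
* VARSTOCASES replaces the active data with one row per response.
varstocases /make z from z1 z2 z3 z4.
frequencies variables=z /format=dfreq.
```

Only the active data are restructured here; the saved file on disk is left unchanged.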


Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Beadle, ViAnn
In reply to this post by Edward Boadi
Use the mult response procedure to tabulate your 4 z variables together as a multiple response set.

MULT RESPONSE  GROUPS=z  (z1 z2 z3 z4 (1,15))
   /FREQUENCIES=z.

Look at the table and find the top ten. Use recode to recode all other values to sysmis.
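
For the small sample file posted earlier in the thread, where the two most frequent values are 1 and 4, this manual approach might look like the sketch below. The kept values are typed in by hand after reading the MULT RESPONSE table, and the file paths are the ones used in the thread; for the real data, the recode would list the ten values found in the table instead.

```spss
* Sketch: keep the values read off the frequency table, set the rest to sysmis.
get file='c:\temp\originaldata.sav'.
recode z1 z2 z3 z4 (1=1) (4=4) (else=sysmis).
save outfile='c:\temp\newdata.sav' /keep=x y z1 z2 z3 z4.
```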

Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Edward Boadi
In reply to this post by Edward Boadi
Thanks Beadle,
Is there a way to automate the whole process, without having to look at the
table created by:

MULT RESPONSE  GROUPS=z  (z1 z2 z3 z4 (1,15))
   /FREQUENCIES=z.

Thanks.


Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Simon Phillip Freidin
In reply to this post by Edward Boadi
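* Stack z1 to z4 into one variable z, then transpose so each response becomes a variable.
varstocases /make z from z1 to z4.
flip.
select if case_lbl='z'.
* Dummy variable marking the end of the transposed list.
compute end=$sysmis.
* f1 to f15 hold the number of occurrences of the values 1 to 15.
do repeat f=f1 to f15 /n=1 to 15.
count f = var001 to end (n).
end repeat.
match files file=* /keep=f1 to f15.
* Most frequent value.
do repeat f=f1 to f15 /n=1 to 15.
if max(f1 to f15) =f mostfreq1=n.
end repeat.
* Blank out its count, then find the second most frequent value.
do repeat f=f1 to f15 /n=1 to 15.
if n=mostfreq1 f=$sysmis.
end repeat.
do repeat f=f1 to f15 /n=1 to 15.
if max(f1 to f15) =f mostfreq2=n.
end repeat.
* Write a recode command that keeps only those two values.
write outfile='c:\temp\recodes.sps'
  /"recode z1 z2 z3 z4 ( " mostfreq1 " = " mostfreq1 " ) ( " mostfreq2 " = " mostfreq2 " ) (else = sysmis )".
* The data pass below is what actually runs the WRITE.
exe.

get file = 'c:\temp\originaldata.sav'.
include file = 'c:\temp\recodes.sps'.
exe.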



Research Database Manager and Analyst
Melbourne Institute of Applied Economic and Social Research
The University of Melbourne
Melbourne VIC 3010 Australia
New Tel: (03) 8344 2085 New Fax: (03) 8344 2111
http://www.melbourneinstitute.com/hilda/
Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Art Kendall
In reply to this post by Edward Boadi
Is this what you need?

DATA LIST FREE/x y  z1 z2 z3 z4.
BEGIN DATA
2       1       1       3       3       5
1       2       4       1       4       3
1       3       4       5       9       1
1       4       5       2       5       4
2       2       5       1       3       1
1       3       2       2       2       5
1       2       1       2       1       1
1       1       9       4       1       1
1       1       2       4       5       1
1       3       1       5       1       1
1       1       2       4       4       4
1       2       2       9       4       4
2       4       5       1       2       3
1       1       1       1       9       2
1       2       5       1       1       3
1       4       5       1       2       1
1       3       1       2       4       4
END DATA.

SAVE OUTFILE  ='C:\Temp\originaldata.sav'.

MULT RESPONSE
  GROUPS=$z 'the 4 z variables' (z1 z2 z3 z4 (1,9))
  /FREQUENCIES=$z  .

recode z1 to z4 (1,4=copy) (else=0) into newz1 to newz4.

MULT RESPONSE
  GROUPS=$newz 'the 4 z variables' (newz1 newz2 newz3 newz4 (1,9))
  /FREQUENCIES=$newz  .


Art

Art Kendall
Social Research Consultants
Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Edward Boadi
In reply to this post by Edward Boadi
Thanks a million, Simon, for your amazing syntax.
It does exactly what I need.

Suppose I want to extend the syntax to the top 3, 10, 25, etc. values. What changes to the syntax are required?



Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Simon Phillip Freidin
Change all the 15s to the maximum allowed value, and insert, for each
additional value, a block like:

do repeat f=f1 to f15 /n=1 to 15.
if n=mostfreq1 f=$sysmis.
end repeat.
do repeat f=f1 to f15 /n=1 to 15.
if max(f1 to f15) =f mostfreq2=n.
end repeat.

Change the mostfreq variable names as you go. For the 3rd value, change
mostfreq1 -> mostfreq2 and mostfreq2 -> mostfreq3.

Then add the additional value to the write:

write outfile='c:\temp\recodes.sps'
   /"recode z1 z2 z3 z4 ( "
   mostfreq1 " = " mostfreq1
   " ) ( "
   mostfreq2 " =  " mostfreq2
   " ) ( "
   mostfreq3 " =  " mostfreq3
   " ) (else = sysmis )".
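
Applied to the instructions above, the extra block for a third value would look like the following sketch. It is the block quoted in this post with the names shifted by one, and it would sit after the existing mostfreq2 block and before the write command; mostfreq3 is simply the next name in the series.

```spss
* Sketch: blank out the count of the second value, then find the third.
do repeat f=f1 to f15 /n=1 to 15.
if n=mostfreq2 f=$sysmis.
end repeat.
do repeat f=f1 to f15 /n=1 to 15.
if max(f1 to f15) =f mostfreq3=n.
end repeat.
```

One point to keep in mind: when two values have the same count, every matching IF inside the DO REPEAT fires and the last one wins, so the tied value with the higher code is picked first. A tie at the cutoff is therefore resolved by code order rather than by anything in the data.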


On 05/08/2006, at 12:33 AM, Edward Boadi wrote:

> Thanks a million Simon for your amazing syntax.
> It does exactly what I need.
>
> Suppose I want to extend the syntax to "top 3, 10,25 etc"  values
> What changes to the syntax is required.
>
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]]On
> Behalf Of
> Simon Freidin
> Sent: Thursday, August 03, 2006 8:45 PM
> To: [hidden email]
> Subject: Re: 10 most frequent occurring values of a multiple response
> set ( REVISITED )
>
>
> varstocases make z from z1 to z4.
> flip.
> sel if case_lbl='z'.
> compute end=$sysmis.
> do repeat f=f1 to f15 /n=1 to 15.
> count f = var001 to end (n).
> end repeat.
> match file file=*/keep=f1 to f15.
> do repeat f=f1 to f15 /n=1 to 15.
> if max(f1 to f15) =f mostfreq1=n.
> end repeat.
> do repeat f=f1 to f15 /n=1 to 15.
> if n=mostfreq1 f=$sysmis.
> end repeat.
> do repeat f=f1 to f15 /n=1 to 15.
> if max(f1 to f15) =f mostfreq2=n.
> end repeat.
> exe.
> write outfile='c:\temp\recodes.sps'
>   /"recode z1 z2 z3 z4 ( " mostfreq1 " = " mostfreq1 " ) ( "
> mostfreq2 " =
> " mostfreq2 " ) (else = sysmis )".
>
> get file = 'c:\temp\originaldata.sav'.
> include file = 'c:\temp\recodes.sps'.
> exe.
>
> At 06:29 AM 4/08/2006, you wrote:
>> OK, Beadle and List.
>> Lets consider this data file
>>
>> DATA LIST FREE/x y  z1 z2 z3 z4.
>> BEGIN DATA
>> 2       1       1       3       3       5
>> 1       2       4       1       4       3
>> 1       3       4       5       9       1
>> 1       4       5       2       5       4
>> 2       2       5       1       3       1
>> 1       3       2       2       2       5
>> 1       2       1       2       1       1
>> 1       1       9       4       1       1
>> 1       1       2       4       5       1
>> 1       3       1       5       1       1
>> 1       1       2       4       4       4
>> 1       2       2       9       4       4
>> 2       4       5       1       2       3
>> 1       1       1       1       9       2
>> 1       2       5       1       1       3
>> 1       4       5       1       2       1
>> 1       3       1       2       4       4
>> END DATA.
>>
>> SAVE OUTFILE  ='C:\Temp\originaldata.sav'.
>>
>> In the above data file:
>>
>> 1. The value "1" has the highest occurrence across z1, z2, z3 and z4.
>> 2. The next is the value "4" across z1, z2, z3 and z4.
>>
>> I want to create a new data file 'C:\Temp\Newdata.sav' with the same
>> variables (x, y, z1, z2, z3, z4) but with the values of z1, z2, z3 and z4
>> set to sysmis EXCEPT when the value is 1 or 4.
>>
>> Thus I want to keep the top 2 values (1 and 4) across z1, z2, z3 and z4.
>>
>> Regards.
>>
>>
>
>
> Research Database Manager and Analyst
> Melbourne Institute of Applied Economic and Social Research
> The University of Melbourne
> Melbourne VIC 3010 Australia
> New Tel: (03) 8344 2085 New Fax: (03) 8344 2111
> http://www.melbourneinstitute.com/hilda/

Re: 10 most frequent occurring values of a multiple response set ( REVISITED )

Edward Boadi
Thanks Simon, you are a star.
Your first syntax and the suggested modification work the way I want.

Thanks again to everyone who contributed to this topic.

Warm regards to all.

Edward


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Simon Freidin
Sent: Friday, August 04, 2006 10:35 PM
To: [hidden email]
Subject: Re: 10 most frequent occurring values of a multiple response
set ( REVISITED )


Change all the 15s to the maximum allowed value, and insert, for each
additional value, a block like:

do repeat f=f1 to f15 /n=1 to 15.
if n=mostfreq1 f=$sysmis.
end repeat.
do repeat f=f1 to f15 /n=1 to 15.
if max(f1 to f15) =f mostfreq2=n.
end repeat.

Change the mostfreq variable names as you go. For the 3rd value, change
mostfreq1 -> mostfreq2 and mostfreq2 -> mostfreq3 in the inserted block.

Then add the additional value to the write command:

write outfile='c:\temp\recodes.sps'
   /"recode z1 z2 z3 z4 ( "
   mostfreq1 " = " mostfreq1
   " ) ( "
   mostfreq2 " =  " mostfreq2
   " ) ( "
   mostfreq3 " =  " mostfreq3
   " ) (else = sysmis )".

