|
I often deal with skewed distributions, with hundreds if not thousands of cases. I often want to know exactly how many cases are involved on and around the mode, but don't really care much about the details of the remaining distribution.

For example, Service Cost (ServCost) is a skewed variable with a mode of 0, many 1's, and then a long tail with a scattering of costs.

If I do a normal Frequency analysis, I get a very long table with pages of useless information. All I want to see is the number of 0's and 1's.

How can I obtain this output without all the other stuff?

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
|
Filter out all but the 0's (SELECT IF), then tabulate the number of cases for the variable of interest; then do the same for the 1's. Does that sound workable?

alex shackman

On Fri, Feb 29, 2008 at 1:54 PM, Bob Schacht <[hidden email]> wrote:
> All I want to see is the number of 0's and 1's.
>
> How can I obtain this output without all the other stuff?

--
Alexander J. Shackman
Laboratory for Affective Neuroscience
Waisman Laboratory for Brain Imaging & Behavior
University of Wisconsin-Madison
1202 West Johnson Street
Madison, Wisconsin 53706

Telephone: +1 (608) 358-5025
FAX: +1 (608) 265-2875
EMAIL: [hidden email]
http://psyphz.psych.wisc.edu/~shackman
Calendar {still under construction}:
http://www.google.com/calendar/embed?src=ajshackman%40gmail.com |
|
In reply to this post by Bob Schacht-3
Why not just use a filter restricting the sample to values of 0 and 1 and then run Frequencies?
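
One way the filter suggestion could look in syntax is sketched below. It is untested and only an illustration: it assumes the variable is named ServCost, as in the original question, and it uses TEMPORARY with SELECT IF as the "filter", so the selection applies to the FREQUENCIES run only and no cases are dropped from the working file.

```spss
* Untested sketch: tabulate only the 0's and 1's of ServCost.
* TEMPORARY limits the SELECT IF to the next procedure only.
TEMPORARY.
SELECT IF (ServCost = 0 OR ServCost = 1).
FREQUENCIES VARIABLES=ServCost.
```

Note that the percentages in the resulting table are based on the selected cases only, not on the whole file; the counts are the part that answers the question.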
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bob Schacht
Sent: Friday, February 29, 2008 12:55 PM
To: [hidden email]
Subject: [SPSSX-L] Skewed distributions

All I want to see is the number of 0's and 1's.

How can I obtain this output without all the other stuff? |
|
You can also create a new variable coded 0=0, 1=1, and 2=more than 1.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: 29 February 2008 18:31
To: [hidden email]
Subject: Re: Skewed distributions

Why not just use a filter restricting the sample to values of 0 and 1 and then run Frequencies? |
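
The new-variable approach (0=0, 1=1, 2=more than 1) might be sketched as follows. This is untested, and ServCat3 is a hypothetical name chosen here for the new variable; ServCost is the variable from the original question. The sketch relies on RECODE using the first specification that matches a value, so 1 stays 1 even though it also falls inside the 1 THRU HI range.

```spss
* Untested sketch: collapse ServCost into three categories.
* ServCat3 is a hypothetical name for the new variable.
RECODE ServCost (0 = 0) (1 = 1) (1 THRU HI = 2) INTO ServCat3.
VALUE LABELS ServCat3 0 '0' 1 '1' 2 'More than 1'.
FREQUENCIES VARIABLES=ServCat3.
```

Because the result goes INTO a new variable, ServCost itself is left untouched. Any value not covered by a specification (for example a missing value or a cost between 0 and 1) ends up system-missing in ServCat3.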
|
In reply to this post by Peck, Jon
At 10:31 AM 2/29/2008, Peck, Jon wrote:
>Why not just use a filter restricting the sample to values of 0 and 1 and
>then run Frequencies?

Jon, Hector, and Alexander,

Thank you for your replies. In case it makes any difference, I should have mentioned that I am using ver. 12.

I guess what I was hoping for is that there was more interval control in the Frequencies function so that I could get a frequency distribution by a controlled set of intervals.

Here's a different case along the same lines. I have data for which the hourly wage is calculated by dividing weekly earnings by hours worked per week. This can produce very similar wages that differ by less than a penny per hour. The distribution can also be skewed, with a long positive tail. It would occasionally be useful to generate a frequency distribution in controlled intervals of varying width, such as

* Less than $10/hour
* $10-$20 per hour
* $20-$30 per hour
* $30-$40 per hour
* $40-$50 per hour
* $50-$70 per hour
* $70-$100 per hour
* $100-$150 per hour
* More than $150 per hour

Yes, I know I could recode, but I was hoping for something easier. :-)

Bob |
|
If you had at least SPSS 14, you could do things like this very easily with the Visual Bander (now called Visual Binner). You can define your recode with a few clicks in the distribution, or you can have it generate a recode based on equal width intervals, equal percentiles, or based on the moments.
With older versions, you'll have to put up with doing the recode manually. Specifying that really isn't any more work than spelling out what intervals you would like for Frequencies, though.

Regards,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bob Schacht
Sent: Friday, February 29, 2008 2:34 PM
To: [hidden email]
Subject: Re: [SPSSX-L] Skewed distributions

In case it makes any difference, I should have mentioned that I am using ver. 12.

I guess what I was hoping for is that there was more interval control in the Frequencies function so that I could get a frequency distribution by a controlled set of intervals. |
|
In reply to this post by Bob Schacht-3
At 02:54 PM 2/29/2008, Bob Schacht wrote (I'm also responding to his posting of 04:34 PM 2/29/2008):

>I often deal with skewed distributions. I often want to know exactly
>how many cases are involved on and around the mode, but don't really
>care much about the details of the remaining distribution.
>
>For example, Service Cost (ServCost) is a skewed variable with a
>mode of 0, many 1's, and then a long tail with a scattering of
>costs. If I do a normal Frequency analysis, I get a very long table
>with pages of useless information. All I want to see is the number
>of 0's and 1's.

Now, at 04:34 PM 2/29/2008, Bob Schacht followed up:

>Yes, I know I could recode, but I was hoping for something easier. :-)

Goodness! Am I hopelessly old-school? I thought RECODE was about the easiest syntax there was. Anyway, what I'd do with your problem is (untested),

TEMPORARY.
RECODE ServCost (1 = 1) (1 THRU HI = 2).
VALUE LABELS ServCost 2 ' > 1'.
FREQUENCIES ServCost.

That RECODE statement looks funny, doesn't it, recoding 1 into itself and then putting it in another range, too? Here's what it means:

. The last specification recodes everything from 1 upwards into 2.
. The first specification keeps this from applying to value 1, itself.

When a value is included in more than one recode specification, RECODE applies the first one. You can do some useful things with that.

Continuing, at 04:34 PM 2/29/2008, Bob Schacht wrote:

>Along the same lines, I have data for which the hourly wage is
>calculated from weekly earnings by hours worked per week. This can
>produce wages that differ by less than a penny per hour.

Of course, if you just want all values rounded to, say, the nearest 50 cents, you can do this. (Use this form *only* if you've kept 'weekly earnings' and 'hours worked' in your data. Otherwise, this loses information. NEVER do that; compute a new rounded variable, instead.)

COMPUTE HrlyWage = 0.5*RND(2*HrlyWage).

>It would occasionally be useful to generate a frequency distribution
>in controlled intervals of varying width, such as
> * Less than $10/hour
> * $10-$20 per hour
> * $20-$30 per hour
> * $30-$40 per hour
> * $40-$50 per hour
> * $50-$70 per hour
> * $70-$100 per hour
> * $100-$150 per hour
> * More than $150 per hour
>
>Yes, I know I could recode, but I was hoping for something easier. :-)

Again, what is easier than RECODE? You have to specify every interval; but if you're going to control your intervals, wouldn't you have to, no matter what? Not tested, but

RECODE HrlyWage
   (MISSING      = 999)
   (150 THRU HI  = 150)
   (100 THRU 150 = 100)
   ( 70 THRU 100 =  70)
   ( 50 THRU  70 =  50)
   ( 40 THRU  50 =  40)
   ( 30 THRU  40 =  30)
   ( 20 THRU  30 =  20)
   ( 10 THRU  20 =  10)
   (  0          =   0)
   (  0 THRU  10 =   5)
   INTO WageCatg.
VALUE LABELS WageCatg
     0 'Nothing'
     5 '< $10'
    10 '$ 10-$ 20'
    20 '$ 20-$ 30'
    30 '$ 30-$ 40'
    40 '$ 40-$ 50'
    50 '$ 50-$ 70'
    70 '$ 70-$100'
   100 '$100-$150'
   150 '>= $150'
   999 'Missing'.
MISSING VALUES WageCatg (999).
FORMATS WageCatg (F4).
VARIABLE LABELS WageCatg 'Hourly wages, categorized by range'.
FREQUENCIES WageCatg.

Notice that I've put the RECODE specifications in *descending* order by range. That puts the ends of the ranges into the higher interval; that is, $10 is in the $10-$20 range, $20 in the $20-$30 range, etc. |
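
The advice to compute a new rounded variable, rather than overwrite HrlyWage, could be followed along these lines. This is an untested sketch: HrlyWg50 is a hypothetical name for the rounded copy, and HrlyWage is the wage variable named in the post above.

```spss
* Untested sketch: round to the nearest 50 cents in a new variable.
* HrlyWg50 is a hypothetical name; HrlyWage itself is left unchanged.
COMPUTE HrlyWg50 = 0.5 * RND(2 * HrlyWage).
FORMATS HrlyWg50 (DOLLAR8.2).
FREQUENCIES VARIABLES=HrlyWg50.
```

Keeping the unrounded value means the 50-cent grid can be changed later (say, to whole dollars) without going back to weekly earnings and hours worked.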
