SPSSX Discussion

Skewed distributions

Bob Schacht-3

Skewed distributions

I often deal with skewed distributions, with hundreds if not thousands of
cases. I often want to know exactly how many cases are involved on and
around the mode, but don't really care much about the details of the
remaining distribution.

For example, Service Cost (ServCost) is a skewed variable with a mode of 0,
many 1's, and then a long tail with a scattering of costs.

If I do a normal Frequency analysis, I get a very long table with pages of
useless information. All I want to see is the number of 0's and 1's.

How can I obtain this output without all the other stuff?

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Alexander J. Shackman-2

Re: Skewed distributions

Filter out everything but the 0s (SELECT IF), then tabulate the number of
cases for the variable of interest.
Then do the same for the 1s.

Sound workable?

alex shackman

--
Alexander J. Shackman
Laboratory for Affective Neuroscience
Waisman Laboratory for Brain Imaging & Behavior
University of Wisconsin-Madison
1202 West Johnson Street
Madison, Wisconsin 53706

Telephone: +1 (608) 358-5025
FAX: +1 (608) 265-2875
EMAIL: [hidden email]
http://psyphz.psych.wisc.edu/~shackman
Calendar {still under construction}:
http://www.google.com/calendar/embed?src=ajshackman%40gmail.com

Peck, Jon

Re: Skewed distributions

In reply to this post by Bob Schacht-3

Why not just use a filter restricting the sample to values of 0 and 1 and then run Frequencies?
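
A minimal sketch of that suggestion in syntax, assuming the variable is named ServCost as in the original question. TEMPORARY limits the selection to the next procedure, so the other cases stay in the working file.

```spss
* Restrict the next procedure to cases where ServCost is 0 or 1.
TEMPORARY.
SELECT IF ANY(ServCost, 0, 1).
FREQUENCIES VARIABLES=ServCost.
```

Note that the percentages in this table are based only on the selected cases, not on the full sample.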

Hector Maletta

Re: Skewed distributions

You can also create a new variable coded 0=0, 1=1, and 2=more than 1.
Hector
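
One way this could look in syntax, as a sketch only: ServCat is a hypothetical name for the new variable, and ServCost is the variable from the original question.

```spss
* Collapse ServCost into three groups in a new variable.
* ServCat is a hypothetical name chosen for this example.
RECODE ServCost (0 = 0) (1 = 1) (1 THRU HI = 2) INTO ServCat.
VALUE LABELS ServCat 0 '0' 1 '1' 2 'More than 1'.
FREQUENCIES VARIABLES=ServCat.
```

Because the target is a new variable, any value not covered by a specification (for example, a cost between 0 and 1) is left system-missing in ServCat. Unlike filtering, this keeps the percentages based on all valid cases.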

Bob Schacht-3

Re: Skewed distributions

In reply to this post by Peck, Jon

At 10:31 AM 2/29/2008, Peck, Jon wrote:
>Why not just use a filter restricting the sample to values of 0 and 1 and
>then run Frequencies?

Jon, Hector, and Alexander,
Thank you for your replies. In case it makes any difference, I should have
mentioned that I am using ver. 12.
I guess what I was hoping for is that there was more interval control in
the Frequencies function so that I could get a frequency distribution by a
controlled set of intervals.

Here's a different case along the same lines. I have data for which the
hourly wage is calculated by dividing weekly earnings by hours worked per week.
This can produce very similar wages that differ by less than a penny per
hour. The distribution can also be skewed, with a long positive tail. It
would occasionally be useful to generate a frequency distribution in
controlled intervals of varying width, such as
* Less than $10/hour
* $10-$20 per hour
* $20-$30 per hour
* $30-$40 per hour
* $40-$50 per hour
* $50-$70 per hour
* $70-$100 per hour
* $100-$150 per hour
* More than $150 per hour

Yes, I know I could recode, but I was hoping for something easier. :-)

Bob

Peck, Jon

Re: Skewed distributions

If you had at least SPSS 14, you could do things like this very easily with the Visual Bander (now called Visual Binner). You can define your recode with a few clicks in the distribution, or you can have it generate a recode based on equal width intervals, equal percentiles, or based on the moments.

With older versions, you'll have to put up with doing the recode manually. Specifying that really isn't any more work than spelling out what intervals you would like for Frequencies, though.

Regards,
Jon Peck

Richard Ristow

Re: Skewed distributions

In reply to this post by Bob Schacht-3

At 02:54 PM 2/29/2008, Bob Schacht wrote (I'm also responding to his
posting of 04:34 PM 2/29/2008):

>I often deal with skewed distributions. I often want to know exactly
>how many cases are involved on and around the mode, but don't really
>care much about the details of the remaining distribution.
>
>For example, Service Cost (ServCost) is a skewed variable with a
>mode of 0, many 1's, and then a long tail with a scattering of
>costs. If I do a normal Frequency analysis, I get a very long table
>with pages of useless information. All I want to see is the number
>of 0's and 1's.

Now, at 04:34 PM 2/29/2008, Bob Schacht followed up:
>Yes, I know I could recode, but I was hoping for something easier. :-)

Goodness! Am I hopelessly old-school? I thought RECODE was about the
easiest syntax there was. Anyway, what I'd do with your problem is (untested),

* TEMPORARY makes the recode apply only to the next procedure.
TEMPORARY.
RECODE ServCost
  (1 = 1)
  (1 THRU HI = 2).
VALUE LABELS ServCost 2 ' > 1'.
FREQUENCIES VARIABLES=ServCost.

That RECODE statement looks funny, doesn't it, recoding 1 into itself
and then putting it in another range, too? Here's what it means:
. The last specification recodes everything from 1 upwards into 2.
. The first specification keeps this from applying to the value 1
itself. When a value is included in more than one recode
specification, RECODE applies the first one. You can do some useful
things with that.
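
The first-match rule can be checked on a tiny made-up data set. This is only a sketch; the variable x and its four values are invented for the illustration.

```spss
* Four made-up values to show which specification RECODE applies.
DATA LIST FREE / x.
BEGIN DATA
0 1 2 5
END DATA.
RECODE x (1 = 1) (1 THRU HI = 2).
LIST.
```

After the RECODE, x should list as 0, 1, 2, 2: the 0 matches no specification and is left alone, the 1 is caught by the first specification, and 2 and 5 fall through to the second.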

Continuing,

At 04:34 PM 2/29/2008, Bob Schacht wrote:

>Along the same lines, I have data for which the hourly wage is
>calculated from weekly earnings by hours worked per week. This can
>produce wages that differ by less than a penny per hour.

Of course, if you just want all values rounded to, say, the nearest 50
cents, you can do this. (Use this form *only* if you've kept 'weekly
earnings' and 'hours worked' in your data. Otherwise, it overwrites the
original wage and loses information. NEVER do that; compute a new
rounded variable instead.)

COMPUTE HrlyWage = 0.5*RND(2*HrlyWage).
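
A sketch of the safer variant, which writes the rounded value to a separate variable and leaves the original untouched. HrlyWage50 is a hypothetical name chosen for this example.

```spss
* Keep HrlyWage as is and store the rounded copy separately.
* HrlyWage50 is a hypothetical name for the new variable.
COMPUTE HrlyWage50 = 0.5 * RND(2 * HrlyWage).
FREQUENCIES VARIABLES=HrlyWage50.
```

As a worked case, a wage of 12.37 is doubled to 24.74, rounded to 25, and halved to 12.50; a wage of 12.20 is doubled to 24.40, rounded to 24, and halved to 12.00.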

>It would occasionally be useful to generate a frequency distribution
>in controlled intervals of varying width, such as
> * Less than $10/hour
> * $10-$20 per hour
> * $20-$30 per hour
> * $30-$40 per hour
> * $40-$50 per hour
> * $50-$70 per hour
> * $70-$100 per hour
> * $100-$150 per hour
> * More than $150 per hour
>
>Yes, I know I could recode, but I was hoping for something easier. :-)

Again, what is easier than RECODE? You have to specify every
interval; but if you're going to control your intervals, wouldn't you
have to, no matter what? Not tested, but

RECODE HrlyWage
(MISSING = 999)
(150 THRU HI = 150)
(100 THRU 150 = 100)
( 70 THRU 100 = 70)
( 50 THRU 70 = 50)
( 40 THRU 50 = 40)
( 30 THRU 40 = 30)
( 20 THRU 30 = 20)
( 10 THRU 20 = 10)
( 0 = 0)
( 0 THRU 10 = 5)
INTO WageCatg.

VALUE LABELS WageCatg
0 'Nothing'
5 '< $10'
10 '$ 10-$ 20'
20 '$ 20-$ 30'
30 '$ 30-$ 40'
40 '$ 40-$ 50'
50 '$ 50-$ 70'
70 '$ 70-$100'
100 '$100-$150'
150 '>= $150'
999 'Missing'.

MISSING VALUES WageCatg (999).
FORMATS WageCatg (F4).
VARIABLE LABELS WageCatg 'Hourly wages, categorized by range'.

FREQUENCIES WageCatg.

Notice that I've put the RECODE specifications in *descending* order
by range. That puts the ends of the ranges into the higher interval;
that is, $10 is in the $10-$20 range, $20 in the $20-$30 range, etc.
