Sum statistic in descriptives

Rebecca Burzette
I am using IBM SPSS Version 21 and have run into a little problem using DESCRIPTIVES.  When I request the SUM stat, I get a 0 for SUM in a variable which I know has 1 case.  Here is the syntax I'm using:
DESC HYPERTENSIVE /STAT DEF SUM.
My variable is coded 0 or 1 for absence or presence.  The syntax works fine with other variables with the same coding, but with more than 1 case -- e.g., HYPOTENSIVE has an incidence of 110 and its SUM stat = 110.  The minimum is 0 and the maximum is 1, so I have no idea what's going on or how to fix it.

Here is the output I get...
Descriptive Statistics
                 N  Minimum  Maximum  Sum  Mean  Std. Deviation
HYPERTENSIVE   244        0        1    0   .00            .064
HYPOTENSIVE    244        0        1  110   .45            .499

Thanks,

Rebecca G. Burzette, Ph.D.
Assistant Scientist
Office of Curricular & Student Assessment
2259 Veterinary Medicine
Iowa State University
1800 Christensen Drive
Ames, Iowa 50011-1134

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Sum statistic in descriptives

Jon Peck
Those statistics suggest that p (the mean) for HYPERTENSIVE is very small, so the result is probably being rounded to 0 at the number of decimals displayed.  Try increasing the number of decimals shown, either by changing the variable format or by editing the pivot table and increasing the display precision there.

--
Jon K Peck
[hidden email]

Re: Sum statistic in descriptives

Rebecca Burzette
Hi Jon,

I see my table did not translate very well...  Yes, the mean is very small (.044) and that shows up when I click on it in the output.  My problem is that the SUM is 0 when it should be 1.  The maximum of 1 tells me that there is at least one case with "1" as the value and in fact, when I look at the raw data, I do have one case of hypertension.

Thanks!

Becky Burzette

Re: Sum statistic in descriptives

Bruce Weaver
Administrator
Your SD for HYPERTENSIVE is what you would get if there is one observation with HYPERTENSIVE = 1, as shown below.  So it is a mystery to me why you are getting a sum = 0.  The code below shows me a sum of 1, with everything else matching your output.

NEW FILE.
DATASET CLOSE ALL.

INPUT PROGRAM.
LOOP ID = 1 to 244.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
COMPUTE Hypertensive = ID EQ 1.  
COMPUTE Hypotensive = ID LE 110.

FORMATS Hypertensive Hypotensive (F1).
DESCRIPTIVES VARIABLES=Hypertensive Hypotensive
  /STATISTICS=MEAN SUM STDDEV MIN MAX.
* This duplicates Rebecca's pasted output except that
* the SUM for Hypertensive = 1, as expected.

FORMATS Hypertensive Hypotensive (F5.4).
DESCRIPTIVES VARIABLES=Hypertensive Hypotensive
  /STATISTICS=MEAN SUM STDDEV MIN MAX.
* Formatting the variables to display more decimals
* shows more decimals for the stats too, as JP suggested.
* But it is not changing the sums.
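
The standard deviation in the pasted output can also be checked by hand. This is a worked sketch, assuming DESCRIPTIVES reports the usual sample standard deviation (the N - 1 form); k is notation introduced here for the number of cases coded 1. For a 0/1 variable the sum of squared deviations simplifies to k(N - k)/N, so:

```latex
\[
s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{N-1}}
  = \sqrt{\frac{k\,(N-k)}{N\,(N-1)}},
\qquad
s\big|_{N=244,\;k=1} = \sqrt{\frac{243}{244 \cdot 243}} = \sqrt{\frac{1}{244}} \approx 0.064 .
\]
```

A single case coded 1 therefore gives the .064 shown in the output, whereas no cases coded 1 would give a standard deviation of 0. The reported SD is itself evidence that the true sum is 1.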


You could try the following to check for any cases where Hypertensive = a value other than 1 or 0.  Change ID to whatever your ID variable is called (if you have one).    

TEMPORARY.
SELECT IF NOT ANY(Hypertensive,0,1).
LIST ID Hypertensive.

If you don't have an ID variable, generate one via $CASENUM so that you can find the problematic case (if there is one).

COMPUTE ID = $CASENUM.

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Sum statistic in descriptives

Jon Peck
The problem is solved.  The user's system had not been patched.

I don't have V21, but with the oldest version I have, V23, the sum is shown as 1.  To understand what you are getting, you should know that sums are not calculated in the straightforward way that people would expect.  To avoid roundoff errors in floating point numbers, they are computed with a more complicated algorithm such as this: https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BinMath/addFloat.html

SPSS Statistics changed to using this type of formula in, I think, V21, but the algorithm did not correctly handle the special case of a sequence of numbers that is almost all zeros (which you might think was the easiest :-)).  It was off by one as you have observed.  I believe that was the maximum error.  Users were notified, and this was quickly fixed in a hot fix or fixpack once it was discovered, but apparently you do not have that fixpack installed.  You can check by looking at the version number in Help > About.  The last digit should be nonzero.

You can get the latest fixpack for V21 here
You might need administrative rights to install it.

That should fix the problem.

Re: Sum statistic in descriptives

Maguin, Eugene
I think there are some other odd things going on, which Jon's reply made me think about. For hypertensive the mean is .064 for an N of 244, so 244*.064=15.616. If there were one case with hypertensive=1, the mean should be .0041. The same story applies to hypotensive: the mean is .499 but the sum is 110.45; .499*244=121.756, but 110.45/244=.453. I'm inclined to think that while you think both variables are 0/1, there are other values in the dataset. So FREQUENCIES would be a useful command.
Gene Maguin
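
One way to test this arithmetic is to recompute the statistics for strictly 0/1 data. The sketch below is in Python rather than SPSS syntax, and the helper name `binary_descriptives` is made up for illustration. It assumes the flattened columns are read in the order N, Minimum, Maximum, Sum, Mean, Std. Deviation, so that .064 and .499 are standard deviations rather than means, and that the SD is the sample (N - 1) form.

```python
import statistics

def binary_descriptives(n, ones):
    """Return (sum, mean, sample SD) for n cases coded 0/1, `ones` of them equal to 1."""
    data = [1] * ones + [0] * (n - ones)
    return sum(data), statistics.mean(data), statistics.stdev(data)

# Counts taken from the thread: 1 hypertensive case and 110 hypotensive cases out of 244.
for name, ones in [("HYPERTENSIVE", 1), ("HYPOTENSIVE", 110)]:
    total, mean, sd = binary_descriptives(244, ones)
    print(f"{name:13s} sum={total:3d} mean={mean:.2f} sd={sd:.3f}")
```

Under those assumptions, pure 0/1 data reproduces the means (.00 and .45) and standard deviations (.064 and .499) in the posted table, which leaves the sum of 0 for HYPERTENSIVE as the only value that does not fit. Running FREQUENCIES on the real data remains the direct way to rule out stray values.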

Re: Sum statistic in descriptives

Maguin, Eugene
Jon, thanks for posting that reference. I’ve never known how the floating point representation and computation worked.

Gene Maguin

Re: Sum statistic in descriptives

Jon Peck
The link in my earlier email explains the basics of floating point computation, but the link I really meant to include explains the high precision way of computing a sum via the Kahan algorithm, which is what Statistics uses.  A sum is computed by a roundabout algorithm that calculates the floating point error as each value is added and keeps correcting it.  For most real data this makes no difference, but when the values vary a great deal in magnitude from each other or relative to the sum, the algorithm gives better accuracy.

Here is a link to an explanation of the Kahan algorithm.
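
The compensation idea described above can be sketched as follows. This is the textbook Kahan summation written in Python for illustration. It is not SPSS's actual code, and the function names `naive_sum` and `kahan_sum` are made up for this sketch.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total

def kahan_sum(values):
    """Kahan compensated summation: carry each addition's rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # apply the correction from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost (zero in exact arithmetic)
        total = t
    return total

# One large value followed by many tiny ones: each tiny value alone vanishes when added to 1.0.
values = [1.0] + [1e-16] * 10_000
print(naive_sum(values))
print(kahan_sum(values))
print(math.fsum(values))  # correctly rounded reference from the standard library

# The case from this thread: 243 zeros and a single 1.
print(kahan_sum([0.0] * 243 + [1.0]))
```

In the first example the plain loop never moves off 1.0, while the compensated sum stays close to the `math.fsum` reference. In the second, a correct implementation returns exactly 1, which is why the 0 reported by the unpatched V21 was a bug rather than a rounding effect.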


Re: Sum statistic in descriptives

Rich Ulrich

This reminds me of some sayings that end in something like, "It takes a computer to really mess things up."

Now that computers can do so much, so fast, a lesson to take away might be that a computer should be programmed to check the clever solution against the obvious one ... and do further checking when they disagree.
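
That suggestion can be sketched in a few lines. This is an illustration in Python, not anything SPSS is known to do. `checked_sum`, its `clever` parameter, the tolerances, and the deliberately broken `off_by_one_sum` are all hypothetical.

```python
import math

def checked_sum(values, clever=math.fsum, rel_tol=1e-9, abs_tol=1e-9):
    """Compute a sum two ways and raise if the results disagree beyond rounding noise."""
    values = list(values)
    obvious = 0.0
    for x in values:          # the obvious method: plain accumulation
        obvious += x
    result = clever(values)   # the clever method under test
    if not math.isclose(result, obvious, rel_tol=rel_tol, abs_tol=abs_tol):
        raise ValueError(f"sums disagree: clever={result!r}, obvious={obvious!r}")
    return result

def off_by_one_sum(values):
    """Deliberately broken stand-in for a buggy summation routine."""
    return math.fsum(values) - 1.0

data = [0.0] * 243 + [1.0]
print(checked_sum(data))  # both methods agree
try:
    checked_sum(data, clever=off_by_one_sum)
except ValueError as err:
    print("caught:", err)
```

The tolerances are a judgment call: for data whose values differ greatly in magnitude, the two methods can legitimately differ by more than this, so a disagreement is a prompt for further checking rather than proof of a bug.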


-- 

Rich Ulrich

