
Re: Sum statistic in descriptives

Posted by Rich Ulrich on Jul 21, 2017; 5:44pm
URL: http://spssx-discussion.165.s1.nabble.com/Sum-statistic-in-descriptives-tp5734557p5734566.html

This reminds me of some sayings that end in something like,

"It takes a computer to really mess things up."

Now that computers can do so much, so fast, a lesson to take away might be that a computer should be programmed to check the clever solution against the obvious one ... and do further checking when they disagree.


-- 

Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]>
Sent: Friday, July 21, 2017 12:53:00 PM
To: [hidden email]
Subject: Re: Sum statistic in descriptives
 
The link in my earlier email explains the basics of floating point computation, but the link I really meant to include explains the high precision way of computing a sum via the Kahan algorithm, which is what Statistics uses.  A sum is computed by a roundabout algorithm that calculates the floating point error as each value is added and keeps correcting it.  For most real data this makes no difference, but when the values vary a great deal in magnitude from each other or relative to the sum, the algorithm gives better accuracy.

Here is a link to an explanation of the Kahan algorithm.
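
For readers without the link, the idea can be sketched in a few lines of Python. This is a minimal textbook version of Kahan (compensated) summation, not the code SPSS Statistics actually uses; the function names kahan_sum and naive_sum and the test data are made up for illustration.

```python
import math

def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # apply the correction carried from earlier steps
        t = total + y                   # low-order bits of y may be lost in this addition
        compensation = (t - total) - y  # recover what was lost (zero in exact arithmetic)
        total = t
    return total

def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total

# One large value followed by many values too small to register individually.
values = [1.0] + [1e-16] * 100_000

print("naive:", repr(naive_sum(values)))
print("kahan:", repr(kahan_sum(values)))
print("fsum: ", repr(math.fsum(values)))  # Python's accurately rounded reference sum
```

With these values the naive loop never moves off 1.0, because each 1e-16 is smaller than half the spacing between adjacent doubles near 1, while the compensated sum stays close to the true value of 1 + 1e-11.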


On Fri, Jul 21, 2017 at 7:23 AM, Maguin, Eugene <[hidden email]> wrote:

Jon, thanks for posting that reference. I’ve never known how the floating point representation and computation worked. Gene Maguin

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon Peck
Sent: Thursday, July 20, 2017 7:19 PM
To: [hidden email]
Subject: Re: Sum statistic in descriptives

 

The problem is solved.  The user's system had not been patched.

 

I don't have V21, but with the oldest version I have, V23, the sum is shown as 1.  To understand what you are getting, you should know that sums are not calculated in the straightforward way that people would expect, but are computed with a more complicated algorithm, such as this https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BinMath/addFloat.html, in order to reduce roundoff errors in floating point numbers.

 

SPSS Statistics changed to using this type of formula in, I think, V21, but the algorithm did not correctly handle the special case of a sequence of numbers that is almost all zeros (which you might think was the easiest :-)).  It was off by one as you have observed.  I believe that was the maximum error.  Users were notified, and this was quickly fixed in a hot fix or fixpack once it was discovered, but apparently you do not have that fixpack installed.  You can check by looking at the version number in Help > About.  The last digit should be nonzero.

 

You can get the latest fixpack for V21 here

You might need administrative rights to install it.

 

That should fix the problem.

 

On Thu, Jul 20, 2017 at 3:15 PM, Bruce Weaver <[hidden email]> wrote:

Your SD for HYPERTENSIVE is what you would get if there is one observation
with HYPERTENSIVE = 1, as shown below.  So it is a mystery to me why you are
getting a sum = 0.  The code below shows me a sum of 1, with everything else
matching your output.

NEW FILE.
DATASET CLOSE ALL.

INPUT PROGRAM.
LOOP ID = 1 to 244.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
COMPUTE Hypertensive = ID EQ 1.
COMPUTE Hypotensive = ID LE 110.

FORMATS Hypertensive Hypotensive (F1).
DESCRIPTIVES VARIABLES=Hypertensive Hypotensive
  /STATISTICS=MEAN SUM STDDEV MIN MAX.
* This duplicates Rebecca's pasted output except that
* the SUM for Hypertensive = 1, as expected.

FORMATS Hypertensive Hypotensive (F5.4).
DESCRIPTIVES VARIABLES=Hypertensive Hypotensive
  /STATISTICS=MEAN SUM STDDEV MIN MAX.
* Formatting the variables to display more decimals
* shows more decimals for the stats too, as JP suggested.
* But it is not changing the sums.
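
The same check can be done outside SPSS. The sketch below is a small Python illustration (not part of the original syntax) that rebuilds a 244-case 0/1 variable with a single 1 and computes the sum, mean, and sample standard deviation; the variable name hypertensive is only a stand-in for the real data.

```python
import math
import statistics

# 244 cases, exactly one of which is coded 1 (hypothetical stand-in for the real variable).
hypertensive = [1.0] + [0.0] * 243

total = sum(hypertensive)
mean = statistics.mean(hypertensive)
sd = statistics.stdev(hypertensive)  # sample SD, divisor n - 1

print("sum :", total)
print("mean:", round(mean, 2))
print("sd  :", round(sd, 3))

# With a single 1 among n cases, the sample variance reduces to 1/n.
print("sqrt(1/244):", round(math.sqrt(1 / 244), 3))
```

The mean rounds to .00 and the SD to .064, matching the pasted output, so the data themselves are consistent with a sum of 1.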


You could try the following to check for any cases where Hypertensive = a
value other than 1 or 0.  Change ID to whatever your ID variable is called
(if you have one).

TEMPORARY.
SELECT IF NOT ANY(Hypertensive,0,1).
LIST ID Hypertensive.

If you don't have an ID variable, generate one via $CASENUM so that you can
find the problematic case (if there is one).

COMPUTE ID = $CASENUM.

HTH.



Rebecca Burzette wrote
> I am using IBM SPSS Version 21 and have run into a little problem using
> DESCRIPTIVES.  When I request the SUM stat, I get a 0 for SUM in a
> variable which I know has 1 case.  Here is the syntax I'm using:
> DESC HYPERTENSIVE /STAT DEF SUM.
> My variable is coded 0 or 1 for absence or presence.  The syntax works
> fine with other variables with the same coding, but with more than 1 case
> -- e.g., HYPOTENSIVE has an incidence of 110 and its SUM stat = 110.  The
> minimum is 0 and the maximum is 1, so I have no idea what's going on or
> how to fix it.
>
> Here is the output I get...
> Descriptive Statistics
>                 N    Minimum  Maximum  Sum   Mean  Std. Deviation
> HYPERTENSIVE    244  0        1        0     .00   .064
> HYPOTENSIVE     244  0        1        110   .45   .499
>
>
> Thanks,
>
> Rebecca G. Burzette, Ph.D.
> Assistant Scientist
> Office of Curricular & Student Assessment
> 2259 Veterinary Medicine
> Iowa State University
> 1800 Christensen Drive
> Ames, Iowa 50011-1134
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.






 

--

Jon K Peck
[hidden email]





--
Jon K Peck
[hidden email]
