I am using IBM SPSS Version 21 and have run into a little problem using DESCRIPTIVES. When I request the SUM stat, I get a 0 for SUM in a variable which I know has 1 case. Here is the syntax I'm using:
DESC HYPERTENSIVE /STAT DEF SUM.

My variable is coded 0 or 1 for absence or presence. The syntax works fine with other variables with the same coding, but with more than 1 case -- e.g., HYPOTENSIVE has an incidence of 110 and its SUM stat = 110. The minimum is 0 and the maximum is 1, so I have no idea what's going on or how to fix it. Here is the output I get...

Descriptive Statistics
               N    Minimum  Maximum  Sum  Mean  Std. Deviation
HYPERTENSIVE   244  0        1        0    .00   .064
HYPOTENSIVE    244  0        1        110  .45   .499

Thanks,
Rebecca G. Burzette, Ph.D.
Assistant Scientist
Office of Curricular & Student Assessment
2259 Veterinary Medicine
Iowa State University
1800 Christensen Drive
Ames, Iowa 50011-1134

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
Those statistics suggest that p (the mean) for HYPERTENSIVE is very small, so the result is probably rounding to 0 at the number of decimals displayed. Try increasing the number of decimals shown, either by changing the variable format to show more decimals or by editing the pivot table and increasing the display precision there.

On Thu, Jul 20, 2017 at 12:54 PM, Rebecca Burzette <[hidden email]> wrote:
|
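A quick way to see what display rounding can and cannot explain is to format the expected values at different precisions. The sketch below is Python rather than SPSS syntax, and it assumes the figures given in the original post (244 cases, exactly one coded 1); the variable names are illustrative only.

```python
# Assumed figures from the original post: 244 cases, exactly one coded 1.
n_cases = 244
n_positive = 1

mean = n_positive / n_cases

# The same mean at two display precisions.
mean_2dp = f"{mean:.2f}"   # two decimals, as in the pasted table
mean_4dp = f"{mean:.4f}"   # four decimals

# The expected sum is a whole number, so it survives even zero decimals.
sum_0dp = f"{float(n_positive):.0f}"

print("mean at 2 decimals:", mean_2dp)
print("mean at 4 decimals:", mean_4dp)
print("sum at 0 decimals: ", sum_0dp)
```

Under these assumptions, display precision would account for a mean shown as .00, but a true sum of 1 would still be displayed as 1.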
In reply to this post by Rebecca Burzette
Hi Jon,
I see my table did not translate very well... Yes, the mean is very small (.044) and that shows up when I click on it in the output. My problem is that the SUM is 0 when it should be 1. The maximum of 1 tells me that there is at least one case with "1" as the value and in fact, when I look at the raw data, I do have one case of hypertension.

Thanks!
Becky Burzette
|
|
In reply to this post by Rebecca Burzette
Your SD for HYPERTENSIVE is what you would get if there is one observation with HYPERTENSIVE = 1, as shown below. So it is a mystery to me why you are getting a sum = 0. The code below shows me a sum of 1, with everything else matching your output.
NEW FILE.
DATASET CLOSE ALL.
INPUT PROGRAM.
LOOP ID = 1 to 244.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

COMPUTE Hypertensive = ID EQ 1.
COMPUTE Hypotensive = ID LE 110.
FORMATS Hypertensive Hypotensive (F1).
DESCRIPTIVES VARIABLES=Hypertensive Hypotensive
  /STATISTICS=MEAN SUM STDDEV MIN MAX.

* This duplicates Rebecca's pasted output except that
* the SUM for Hypertensive = 1, as expected.

FORMATS Hypertensive Hypotensive (F5.4).
DESCRIPTIVES VARIABLES=Hypertensive Hypotensive
  /STATISTICS=MEAN SUM STDDEV MIN MAX.

* Formatting the variables to display more decimals
* shows more decimals for the stats too, as JP suggested.
* But it is not changing the sums.

You could try the following to check for any cases where Hypertensive = a value other than 1 or 0. Change ID to whatever your ID variable is called (if you have one).

TEMPORARY.
SELECT IF NOT ANY(Hypertensive,0,1).
LIST ID Hypertensive.

If you don't have an ID variable, generate one via $CASENUM so that you can find the problematic case (if there is one).

COMPUTE ID = $CASENUM.

HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
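Bruce's observation that an SD of .064 is what a single case coded 1 would produce can also be checked outside SPSS. The following is a minimal Python sketch, not a replacement for the syntax above; it assumes the same simulated data (244 cases, 1 hypertensive, 110 hypotensive), and `describe` is a hypothetical helper written for this illustration.

```python
import statistics


def describe(values):
    """Return the statistics requested in the DESCRIPTIVES command."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
    }


# Simulated data matching the thread: 244 cases in each variable.
hypertensive = [1] + [0] * 243        # one case coded 1
hypotensive = [1] * 110 + [0] * 134   # 110 cases coded 1

for name, values in (("HYPERTENSIVE", hypertensive),
                     ("HYPOTENSIVE", hypotensive)):
    d = describe(values)
    print(f"{name:<13} N={d['n']} min={d['min']} max={d['max']} "
          f"sum={d['sum']} mean={d['mean']:.4f} sd={d['sd']:.3f}")
```

With one case coded 1, the sample variance is 1/244, so the SD is about .064 and the sum is 1; only the sum differs from the pasted output.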
The problem is solved. The user's system had not been patched. I don't have V21, but with the oldest version I have, 23, the sum is shown as 1.

To understand what you are getting, you should know that sums are not calculated in the straightforward way that people would suspect but are computed with a complicated algorithm such as this
https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BinMath/addFloat.html
in order to avoid roundoff errors in floating point numbers. SPSS Statistics changed to using this type of formula in, I think, V21, but the algorithm did not correctly handle the special case of a sequence of numbers that is almost all zeros (which you might think was the easiest :-)). It was off by one, as you have observed. I believe that was the maximum error.

Users were notified, and this was quickly fixed in a hot fix or fixpack once it was discovered, but apparently you do not have that fixpack installed. You can check by looking at the version number in Help > About. The last digit should be nonzero. You can get the latest fixpack for V21 here. You might need administrative rights to install it. That should fix the problem.

On Thu, Jul 20, 2017 at 3:15 PM, Bruce Weaver <[hidden email]> wrote:
|
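The kind of roundoff that motivates a more careful summation routine is easy to reproduce. The sketch below is Python, not SPSS, and says nothing about how SPSS Statistics is implemented internally; it only shows a plain running total drifting away from the correctly rounded result. `math.fsum` is Python's own accurate summation function and is used here purely as a reference value.

```python
import math

# Ten copies of 0.1, a value with no exact binary representation.
values = [0.1] * 10

# Straightforward running total, accumulating one rounding error per step.
naive = 0.0
for v in values:
    naive += v

# Correctly rounded sum of the same values.
exact = math.fsum(values)

print("naive running total:", repr(naive))
print("math.fsum:          ", repr(exact))
print("difference:         ", exact - naive)
```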
In reply to this post by Rebecca Burzette
I think there are some other odd things going on, which Jon's reply made me think about. For hypertensive the mean is .064 for an N of 244. So 244*.064=15.616. If there were one case with hypertensive=1, the mean should be .0041. The same story applies to hypotensive; the mean is .499 but the sum is 110.45. .499*244=121.756 but 110.45/244=.453. I'm inclined to think that while you think both variables are 0/1, there are other values in the dataset. So: frequencies would be a useful command.
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rebecca Burzette
Sent: Thursday, July 20, 2017 2:54 PM
To: [hidden email]
Subject: Sum statistic in descriptives
|
In reply to this post by Jon Peck
Jon, thanks for posting that reference. I’ve never known how the floating point representation and computation worked.

Gene Maguin
|
The link in my earlier email explains the basics of floating point computation, but the link I really meant to include explains the high precision way of computing a sum via the Kahan algorithm, which is what Statistics uses. A sum is computed by a roundabout algorithm that calculates the floating point error as each value is added and keeps correcting it. For most real data this makes no difference, but when the values vary a great deal in magnitude from each other or relative to the sum, the algorithm gives better accuracy. Here is a link to an explanation of the Kahan algorithm. On Fri, Jul 21, 2017 at 7:23 AM, Maguin, Eugene <[hidden email]> wrote:
|
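For readers who want to see the idea Jon describes, here is a minimal Python sketch of the textbook Kahan (compensated) summation loop. It is not the SPSS Statistics implementation, which is not shown anywhere in this thread; the function names and the test values are assumptions chosen to make the effect visible when values differ greatly in magnitude.

```python
def naive_sum(values):
    """Plain running total, for comparison."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Textbook Kahan summation: track the rounding error and feed it back."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        y = value - compensation        # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost (negated)
        total = t
    return total


# One large value followed by two small ones: each 1.0 is below the
# spacing of representable doubles near 1e16, so the plain total drops it.
values = [1e16, 1.0, 1.0]

print("naive:", repr(naive_sum(values)))
print("kahan:", repr(kahan_sum(values)))
```

In this example the plain total stays at 1e16, while the compensated loop carries the lost 1.0 forward and returns 1e16 + 2.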
This reminds me of some sayings that end in something like, "It takes a computer to really mess things up."
Now that computers can do so much, so fast, a lesson to take away might be that a computer should be programmed to check the clever solution against the obvious one ... and do further checking when they disagree.
-- Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of Jon Peck <[hidden email]>
Sent: Friday, July 21, 2017 12:53:00 PM
To: [hidden email]
Subject: Re: Sum statistic in descriptives
|