I am calculating percentages in IBM Statistics 26 and finding a small number of cases where I expect a whole-number percentage but am not getting one. For instance, 14/25*100 yields 56.00000000000001 instead of 56. I would get a value of 56, with no resulting decimals, when performing the same computation in Excel or SQL. In other cases, such as 2/4, I get 50 with no decimals, which matches the MS products. This difference seems to occur in only a handful of cases, but I'm not sure why.
I know I have come across this issue before, but can’t remember why this occurs. I believe IBM also has some documentation on this issue but can’t seem to locate
it from a previous conversation with them. I also can’t seem to put in a support request with IBM at the moment.
Does anyone know the reason for this difference, and whether there is a workaround to yield the same result as the MS products?
Calculations in SPSS Statistics are done in double precision floating point hardware. That allows for about 15-16 significant figures. The value of 14/25 is, in greater precision,

0.560000000000000053290705182007513940334320068359375

which has to be rounded or truncated to fit in a double precision floating point number, so if rounded, it will end in 1, while if truncated, it will end in 0.

Statistics uses the IEEE standard for floating point arithmetic. Excel might use truncation, or it might just be doing the calculation in a different order.

Note that a difference this tiny is utterly negligible except in the case where you are comparing two floating point numbers with fractional parts for exact equality. That is generally not a good thing to do, as even a difference like 14 * 100/25 vs 14/25 * 100 can produce an infinitesimal difference.
Reminder of what Jon is talking about: values of 1/2, 1/4, ...,
(powers of 2) are represented precisely in binary notation of
the usual computer arithmetic; whereas 1/10, 1/100, ..., are not.
You do run into less chance of a problem if you divide by 10 or 25, etc.,
/last/ in your computation. For integer compares of floating point
numbers to be secure (in some sense), you can test for a narrow range
or round or truncate.
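
To make that concrete, here is a minimal sketch in Statistics syntax of the "narrow range" and rounding approaches (the setup is just for illustration, and the 0.00000001 tolerance is an arbitrary choice):

DATA LIST FREE / junk.
BEGIN DATA
1
END DATA.
COMPUTE pct = 14/25*100.
* Exact comparison fails because of the tiny representation residue.
COMPUTE exact_eq = pct EQ 56.
* Narrow-range comparison: treat anything within the tolerance as equal.
COMPUTE fuzzy_eq = ABS(pct - 56) LT 0.00000001.
* Rounding to the nearest integer before comparing also works here.
COMPUTE round_eq = RND(pct) EQ 56.
FORMATS exact_eq fuzzy_eq round_eq (F1).
LIST pct exact_eq fuzzy_eq round_eq.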
--
Rich Ulrich
Rich's comments prompted me to try this:
NEW FILE.
DATASET CLOSE ALL.
DATA LIST FREE / junk(F1).
BEGIN DATA
1 1 1
END DATA.
COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
FORMATS flag1 flag2 (F1).
LIST.

Here is the output:

junk    x1    x2 flag1 flag2
   1 56.00 56.00     0     1
   1 56.00 56.00     0     1
   1 56.00 56.00     0     1
--
Bruce Weaver
If you do it with the syntax below, you will see, as I said in my earlier post, that x1 and x2 are not quite equal, and x1 is the rounded value of the higher-precision value. But the difference between x1 and x2 is infinitesimal.

COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
FORMATS flag1 flag2 (F1).
FORMAT x1 x2 (F30.20).
COMPUTE equal = x1 EQ x2.
LIST.
In reply to this post by Jon Peck
Thanks, Jon. This is helpful. For my purposes, these small differences do end up mattering, since we are dual-processing data and both processors need an exact match on our results.

I've applied the following code to the percentages I'm calculating to employ what our agency business rules call whole-number rounding:

Compute Rate2 = TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).

In the case below, the percentage (Rate2) ends up being 57 versus the 56 that the other person I'm processing with gets in SQL. Typically only a very few of our cases are mismatched, but those differences are important for our purposes.

I'm not quite sure how to resolve the issue, because it's difficult to say what each package is doing differently, but we need to ensure consistent, explainable outcomes for these percentages every time. I was asked several questions: Would using a different data type such as float or decimal in SPSS make a difference? Would setting the variable to ratio make a difference? Would lowering the number of decimals change anything? Can FuzzBits be used in the TRUNC command? I don't think any of these suggestions would work, and I'm unsure about the FuzzBits piece and its applicability in SPSS. One suggestion was to simply truncate the rate to enforce a match, which would work in that particular case but wouldn't for other cases: for instance, 195/201*100 yields 97.01492537313433, and truncating would yield 97 instead of the 98 that I want and that I get using the code above.

If anyone has any ideas for me, please let me know! Thanks so much!
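
For concreteness, here is a minimal, self-contained sketch of the behaviour described above (the data step and variable names are just for illustration):

DATA LIST FREE / num den.
BEGIN DATA
14 25
195 201
END DATA.
COMPUTE Rate1 = num/den*100.
* Whole-number (ceiling-style) rounding per the business rule.
COMPUTE Rate2 = TRUNC(Rate1) + (Rate1 > TRUNC(Rate1)).
FORMATS num den Rate2 (F4) / Rate1 (F20.14).
LIST num den Rate1 Rate2.
* Rate2 comes out as 57 for 14/25 (the tiny residue trips the ceiling) and 98 for 195/201.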
First, note that the result Statistics produces is mathematically correct as far as is possible with double precision floating point arithmetic. My earlier post showed the value to more precision (calculated with special code outside of Statistics that is thousands of times slower). The Statistics result is more accurate than the Excel result, but the difference is way too tiny for anyone to argue that there is a difference that matters. At least that's my opinion.

Statistics has only one numeric data type: all numbers are double precision floating point values. The number of decimals in the format is just a display issue: the displayed values would be the same in Statistics and Excel to more decimals than you would ever want.

I'm not sure what you want to do, but if you want to discuss further, let's take this off line ([hidden email]).
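
A minimal sketch of that last point, i.e., that the format changes only what is displayed, not the stored double (the setup is just for illustration):

DATA LIST FREE / junk.
BEGIN DATA
1
END DATA.
COMPUTE x = 14/25*100.
* With no decimals in the format, x now prints as 56.
FORMATS x (F8.0).
* But the stored value is still not exactly 56, so the comparison fails.
COMPUTE is56 = x EQ 56.
FORMATS is56 (F1).
LIST x is56.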
The comparison of SPSS and Excel made me curious about what would happen
using Stata (another package I have).

First, here is my revised SPSS code and output, with Jon's addition of the variable "equal". (I changed F30.20 to F24.20, but I don't think that will change anything important.)

COMPUTE x1 = 14/25*100.
COMPUTE x2 = 100*14/25.
COMPUTE flag1 = x1 EQ 56.
COMPUTE flag2 = x2 EQ 56.
COMPUTE equal = x1 EQ x2.
FORMATS flag1 flag2 equal (F1) / x1 x2 (F24.20).
LIST x1 to equal.

                  x1                   x2 flag1 flag2 equal
56.0000000000000100  56.0000000000000000     0     1     0
56.0000000000000100  56.0000000000000000     0     1     0
56.0000000000000100  56.0000000000000000     0     1     0

Now using Stata. One difference is that Stata has two IEEE 754-2008 floating-point types: float and double. "float" variables (the default for new numeric variables) are stored in 4 bytes, whereas "double" variables are stored in 8 bytes. (Source: https://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/)

* Example 1 using 'float' variables (the default)
clear
set obs 3
generate x1 = 14/25*100
generate x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

OUTPUT for Example 1:

. list, clean noobs
                     x1                        x2   flag1   flag2   equal
56.00000000000000000000   56.00000000000000000000       1       1       1
56.00000000000000000000   56.00000000000000000000       1       1       1
56.00000000000000000000   56.00000000000000000000       1       1       1

* Example 2 using 'double' variables
clear
set obs 3
generate double x1 = 14/25*100
generate double x2 = 100*14/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte equal = x1 == x2
format x1 x2 %24.20f
list, clean noobs

Output for Example 2:

. list, clean noobs
                     x1                        x2   flag1   flag2   equal
56.00000000000000710543   56.00000000000000710543       0       0       1
56.00000000000000710543   56.00000000000000710543       0       0       1
56.00000000000000710543   56.00000000000000710543       0       0       1

Unlike SPSS, Stata is not affected by the order of operations: 14/25*100 returns the same value as 100*14/25, regardless of which numeric type I use (float or double), so long as I use the same type for both x1 and x2.

The second thing that caught my eye is the differences between SPSS and Stata for the values that do have some non-zero decimals showing.

Stata: 56.00000000000000710543
SPSS:  56.0000000000000100

I think this note explains the difference:
https://www.ibm.com/support/pages/spss-does-not-display-more-16-digits-dataview

For anyone who wants more info about precision in Stata, this looks good:
https://www.stata-journal.com/sjpdf.html?articlenum=pr0025

HTH.
--
Bruce Weaver
I haven't read the article Bruce cites, but strictly speaking, I think the Stata result is wrong. As I said in my previous post, the correct value to the number of digits displayed is

56.0000000000000053290705182007513940334320068359375

not

56.00000000000000710543

And, since a double has only 53 bits for the precision, numbers should not be shown with more digits than that can hold. But I'm not going to lose any sleep over this.
In reply to this post by Bruce Weaver
I did read both the blog and the journal article. I'm unhappy.
I'm not sure why the "float" example shows equality, but I will
accept that "internal round-off" seems to work advantageously.
My bigger problem is the double-precision example.
Writing (100*14)/25 - to force the order of computation - should
give an exact, integer answer in double precision. Does it? You
show that 100*14/25 has trailing digits, which must be wrong
if multiplication were done first.
So, it looks to me like Stata is giving division a higher precedence
than multiplication, performing it first, instead of taking the formula
in order.
--
Rich Ulrich
Forcing the operation order with parentheses does give an exact integer-valued result, as expected.

As far as the order of operations in the absence of parenthesizing overrides is concerned, the CSR says:

• The order of execution is as follows: functions; exponentiation; multiplication, division, and unary –; and addition and subtraction.
• Operators at the same level are executed from left to right.
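
A minimal sketch of both points in Statistics syntax (the setup and names are just for illustration):

DATA LIST FREE / junk.
BEGIN DATA
1
END DATA.
* Left to right: the division happens first, so the tiny residue appears.
COMPUTE bare = 14/25*100.
* Parentheses force the multiplication first, giving an exact 56.
COMPUTE forced = (14*100)/25.
COMPUTE bare_eq = bare EQ 56.
COMPUTE forced_eq = forced EQ 56.
FORMATS bare_eq forced_eq (F1).
LIST bare forced bare_eq forced_eq.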
In reply to this post by Rich Ulrich
Right you are, Rich, division, then multiplication. Well-spotted. See
section 13.2.5 here: https://www.stata.com/manuals/u13.pdf

And you're also right about using parentheses to force the order of computation.

* Example 3 using 'double' variables
clear
set obs 3
generate double x1 = 14/25*100
generate double x2 = 100*14/25
generate double x3 = (100*14)/25
generate byte flag1 = x1 == 56
generate byte flag2 = x2 == 56
generate byte flag3 = x3 == 56
generate byte eq12 = x1 == x2
generate byte eq13 = x1 == x3
generate byte eq23 = x2 == x3
format x1-x3 %24.20f
list x1-x3, clean noobs
list flag1-flag3 eq12-eq23, clean noobs

OUTPUT:

. list x1-x3, clean noobs
                     x1                        x2                        x3
56.00000000000000710543   56.00000000000000710543   56.00000000000000000000
56.00000000000000710543   56.00000000000000710543   56.00000000000000000000
56.00000000000000710543   56.00000000000000710543   56.00000000000000000000

. list flag1-flag3 eq12-eq23, clean noobs
    flag1   flag2   flag3   eq12   eq13   eq23
        0       0       1      1      0      0
        0       0       1      1      0      0
        0       0       1      1      0      0
--
Bruce Weaver
Bruce, thanks for the confirmation.
I'm a bit surprised to see that Stata gives division and multiplication
(and, likewise, subtraction and addition) separate precedence levels,
instead of following the high-school algebra rules that I assumed were universal.
If you don't want to worry about it, use extra parentheses. I've
sometimes been generous with parentheses in COMPUTE statements,
just for clarity. That now seems like a good idea, for this other reason.
Giving each symbol its own order -- I wonder if they did that
intentionally (Why?), or if it was a side effect of "easier programming"?
The errors you might get from overflow for multiply/divide (done
in the wrong order) are more obvious than the add/subtract problems
illustrated in one of your references. But the latter (rarely) can matter, too.
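
A minimal sketch of the add/subtract side (not taken from those references): with doubles on IEEE hardware, adding 1 to 1E16 is lost to round-to-nearest, so the left-to-right order changes the result.

DATA LIST FREE / junk.
BEGIN DATA
1
END DATA.
* Left to right: (1E16 + 1) rounds back to 1E16, so subtracting 1E16 leaves 0.
COMPUTE a = 1E16 + 1 - 1E16.
* Left to right: (1E16 - 1E16) is 0, then adding 1 gives 1.
COMPUTE b = 1E16 - 1E16 + 1.
FORMATS a b (F3).
LIST a b.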
--
Rich Ulrich