|
I’ve run into a bit of odd behavior checking an API scoring engine against SPSS code and was looking for some help. The gist is that I have 60 items, 60 item weights, and a constant. I multiply each item by its weight, sum the products, and then add the constant. When I use the SUM function or the + sign in SPSS I get one answer. When I store each item*weight in a variable, then sum those variables and add the constant, I get a slightly different answer, but it matches what I get using Python. All item values are whole numbers, and all item weights are rounded to two decimal places, as is the constant. Below is a breakdown of the different methods I’ve tried. My guess is that this is due to imprecision of floats in SPSS, but I’m not sure, nor do I know whether there is a simple "fix." Any insight would be greatly appreciated.

Python
* Item * weight (create list of all results)
* Sum list and add constant
* Resulting value: 59.62

SPSS

Plus Sign
* Item1*weight + constant.
* Resulting value: 59.18

Parentheses
* (item1 * weight) + constant
* Resulting value: 59.18

Sum
* Sum(item1*weight, item2*weight) + constant
* Resulting value: 59.18

Sum Parentheses
* Sum((item*weight),(item2*weight)) + constant
* Resulting value: 59.18

Simulate Python in SPSS
* Item * weight (calculate for each item)
* Sum(item1 to item60) + constant
* Note: Comparison of item*weight values match with python
* Resulting value: 59.62
|
|
Administrator
|
That is a bit more of a difference than I would expect from floating point imprecision. After all, the fudge is in the 16th decimal. I would hardly expect it to make that much of a difference over 60 operands.

Maybe you should attach a sample data file and the relevant syntax? Restrict it to ONLY the obvious problem cases. If you are posting on Nabble then the Upload link is under the More tab. If you are not posting on Nabble then reply to this link and please supply the required data:
http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=reply&node=5730583
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Hey David,

Ah yes, the obligatory sample files... I tried to cheat and skip that step to avoid some work. I should have known better. I can't provide the actual weights/algorithms, so I need to generate some sample weights, create some new syntax, and replicate the issue. Given the number of scoring iterations, I need to rewrite a chunk of syntax, so hold tight...
|
In reply to this post by Craig Johnson
Floating point numbers in Statistics have
approximately 53 bits of precision. Precision loss cannot account
for this except in the case where the variation in the values is of very
extreme magnitude (maybe 15 orders of magnitude), and there are positive
and negative numbers arranged optimally to minimize precision. Even
then it would be implausible.
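To get a feel for the scale involved, here is a small Python sketch. Python floats are IEEE 754 doubles with a 53-bit significand, the same precision cited above. The items, weights, and constant below are made up for illustration and are not the poster's data. The sketch compares an ordinary float sum of 60 item*weight products with an exact rational sum built with `fractions.Fraction`.

```python
import random
from fractions import Fraction

random.seed(0)  # hypothetical data; fixed seed so the run is repeatable

# 60 whole-number items and 60 two-decimal weights (stored as hundredths).
items = [random.randint(0, 5) for _ in range(60)]
weight_cents = [random.randint(-200, 200) for _ in range(60)]
constant_cents = 1234

# Ordinary double-precision arithmetic, as in the score calculation.
float_score = (
    sum(i * (w / 100) for i, w in zip(items, weight_cents))
    + constant_cents / 100
)

# Exact rational arithmetic for reference.
exact_score = (
    sum(Fraction(i * w, 100) for i, w in zip(items, weight_cents))
    + Fraction(constant_cents, 100)
)

# Fraction(float_score) converts the float exactly, so this is the true error.
error = abs(Fraction(float_score) - exact_score)
print(float(error))
```

With data of this shape the accumulated error is many orders of magnitude smaller than the 0.44 gap between 59.18 and 59.62, which fits the argument that rounding alone cannot explain the difference.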
There have to be differences in the inputs. You say the values were rounded, but how was this done? A sav file that replicates this would be helpful in figuring out what is happening.

Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621
===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@...(not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
|
|
In reply to this post by Craig Johnson
I have been bedeviled by the same problem more than once, and I don't remember if there was ever a case where the answers did not become the same after I fixed some weighting coefficient that had been mistyped or somehow ignored or misused in some syntax.

Especially if I have run into a problem, I will *always* try to create one set of syntax by editing the other, preferably by use of editing commands that do the whole set at once (to avoid other typing errors).

You are using SPSS, and you report exactly one result that is wrong. Is there a data file? Are there several other lines where other values are wrong, by a multiple of the same difference? You can narrow down which item of the 60 is wrong by (say) setting half the items to zero: does the original SPSS answer now match the "Python simulation" answer, or does it show the same difference?

--
Rich Ulrich
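The halving idea above can be sketched in Python. This is only an illustration under stated assumptions: `find_bad_item`, `score_a`, and `score_b` are hypothetical names, the weights and items are invented, and the search assumes that exactly one item position is responsible for the disagreement and that its item value is nonzero.

```python
def find_bad_item(items, score_a, score_b, tol=1e-9):
    """Return the index of the single item on which the two scorers disagree.

    Assumes exactly one item position causes the difference.
    """
    lo, hi = 0, len(items)            # the culprit lies in items[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Keep only items[lo:mid]; set every other item to zero.
        probe = [v if lo <= k < mid else 0 for k, v in enumerate(items)]
        if abs(score_a(probe) - score_b(probe)) > tol:
            hi = mid                  # difference survives: culprit is in the kept half
        else:
            lo = mid                  # difference vanished: culprit was zeroed out
    return lo


weights = [0.25, -0.5, 1.0, 0.75, 0.1, -0.3, 0.6, 0.9]
typo = list(weights)
typo[5] = -0.8                        # hypothetical mistyped coefficient


def score_a(items):
    return sum(i * w for i, w in zip(items, weights)) + 2.0


def score_b(items):
    return sum(i * w for i, w in zip(items, typo)) + 2.0


items = [3, 1, 4, 1, 5, 2, 2, 6]
print(find_bad_item(items, score_a, score_b))  # 5
```

With 60 items this takes about six comparisons instead of sixty. Because both scorers add the same constant, it cancels out of every comparison.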
|
|
Administrator
|
My guess is that you left out an item from the first calculation ;-0
Easy enough to do. Can do nada without data and code.
|
Administrator
|
Presumably the items are contiguous (as are the weights)?

DO REPEAT item=item01 TO item60 / wt=weight01 TO weight60.
+ COMPUTE product=SUM(product, item * wt).
END REPEAT.
COMPUTE final=SUM(product,constant).
|
I haven't looked at this in a while, but isn't there a difference between how SUM(...) and x+y function--re missing values?
M
|
x+y+z returns missing if any of the variables has a missing value.
sum(x,y,z) returns missing only if all three variables have missing values.
sum.n(x, y, z) returns missing if < n variables have non-missing values.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]
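For readers coming from Python, the three rules above can be mimicked roughly as follows. This is a sketch, not SPSS itself: `None` stands in for a missing value, and `plus`, `spss_sum`, and `min_valid` are hypothetical names chosen for illustration.

```python
def plus(*values):
    """Mimic x + y + z: missing (None) if any operand is missing."""
    if any(v is None for v in values):
        return None
    return sum(values)


def spss_sum(*values, min_valid=1):
    """Mimic SUM(...) and SUM.n(...): add the valid values, or return
    missing (None) if fewer than min_valid operands are valid."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid)


print(plus(1, None, 3))                   # None
print(spss_sum(1, None, 3))               # 4
print(spss_sum(1, None, 3, min_valid=3))  # None
print(spss_sum(None, None, None))         # None
```

The default `min_valid=1` corresponds to plain SUM, which needs at least one valid operand. Passing `min_valid=n` plays the role of SUM.n.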
|
Administrator
|
In reply to this post by MLIves
Indeed! If any operand is missing from an expression using + then the entire thing resolves to SYSMIS. Using SUM disregards MISSING and returns the SUM unless the expression is QUALIFIED using the form SUM.n, where n denotes the minimum number of valid values which need be present for the function to return a valid value.

Note my 'trick':

COMPUTE var=SUM(var, expression).

This creates var ex nihilo (initially SYSMIS, but SUM takes the expression as the first valid operand), which eliminates the need for explicit initialization. Compare:

COMPUTE total=0.
DO REPEAT var=v1 TO v10.
COMPUTE total=total + var.
END REPEAT.

vs.

DO REPEAT var=v1 TO v10.
COMPUTE total=SUM(total, var).
END REPEAT.

There are 'clumsy' ways to get around the first:

COMPUTE total=total + NVALID(var)*var.
* Noting that SYSMIS * 0 returns 0 ;-)

Have fun, HTH
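The difference between the two loop patterns can be imitated in Python, again as a rough sketch with `None` standing in for SYSMIS. The helper `spss_sum` and the sample values are hypothetical.

```python
def spss_sum(*values):
    """Mimic SUM(...): add the valid values; missing (None) only if all are missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) if valid else None


values = [2, 5, None, 1]   # hypothetical v1..v4, with one missing value

# Pattern 1: explicit initialization, then "+".
# One missing value makes the running total missing from then on.
total_plus = 0
for v in values:
    total_plus = None if (total_plus is None or v is None) else total_plus + v

# Pattern 2: SUM(total, var).
# The total starts out missing and needs no initialization.
total_sum = None
for v in values:
    total_sum = spss_sum(total_sum, v)

print(total_plus)  # None
print(total_sum)   # 8
```

The second pattern skips the missing value and still needs no starting zero, which is the point of the 'trick'.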
|
1) No missing values, so that can't be the issue.
2) I wrote a program that simulated the issue last night and wasn't able to replicate the error. I'm backtracking to see if there is an error in an equation somewhere.

Thanks for all the feedback, and I'll report back what I find.
|
Digging deeper, it looks like I sliced a list one item too short, which was causing the issue. Thanks again for all the help. Learning the precision of the float was worth the 5 hr dive into the rabbit hole.
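The thread does not show the offending code, so the following Python sketch only illustrates how a slip of this kind can go unnoticed and how a length check would catch it. The weights, items, constant, and the `score` function are all hypothetical.

```python
weights = [0.5, 0.25, 0.75, 1.0]   # hypothetical weights
items = [2, 4, 1, 3]               # hypothetical item responses
constant = 1.0


def score(items, weights, constant):
    """Weighted sum plus constant; refuses mismatched lengths."""
    if len(items) != len(weights):
        raise ValueError(f"{len(items)} items but {len(weights)} weights")
    return sum(i * w for i, w in zip(items, weights)) + constant


full = score(items, weights, constant)

# zip() stops at the shorter input, so a slice that is one item too short
# silently drops the last product instead of raising an error.
short = sum(i * w for i, w in zip(items[:3], weights)) + constant

print(full, short)  # 6.75 3.75
```

Calling `score(items[:3], weights, constant)` raises a `ValueError` rather than returning a plausible but wrong number, which is the kind of guard that would have flagged the mismatch early.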
|
