Problem matching on non-integer key variables

Bruce Weaver
Administrator
This is a follow-up to Frank's "loop" thread that can be viewed here:  http://spssx-discussion.1045642.n5.nabble.com/loop-td5715276.html.  In that thread, I mentioned that I ran into a problem when I ran MATCH FILES: cases that appeared to have the same combinations of k and p were not being matched properly.  Here's what I tried (with some output inserted).  Note that I am running v20.0.0.1 under Windows 7.

new file.
dataset close all.

* Create small data set to mimic Frank's data file.
* Include a couple extra variables (v1 and v2).
DATA LIST FREE / k p v1 v2.
begin data
40 .1 5 9
50 .3 3 8
60 .5 3 3
69 .7 4 1
end data.
SORT CASES by k p. /* Not needed here, but maybe for your actual file.
DATASET NAME f1.

* Use Frank's INPUT PROGRAM to generate all desired combinations of k and p.
INPUT PROGRAM.
LOOP k = 40 to 69 by 1.
loop p =0.01 to 0.99 by 0.01.
if missing(k) k = lag(k).
end case.
end loop.
END LOOP.
end file.
END INPUT PROGRAM.
EXECUTE.
DATASET NAME f2.

* Now merge the two data sets via MATCH FILES.

MATCH FILES
 FILE = 'f1' / in = f1 /
 FILE = 'f2' / in = f2 /
 BY k p .
EXECUTE.
DATASET NAME f3.
dataset close all.

* Find the cases from the first data set and list them.
TEMPORARY.
SELECT IF
 (k EQ 40 and p EQ .1) or
 (k EQ 50 and p EQ .3) or
 (k EQ 60 and p EQ .5) or
 (k EQ 69 and p EQ .7)
.
LIST.

Output from LIST:

       k        p       v1       v2 f1 f2

   40.00      .10     5.00     9.00  1  0
   50.00      .30     3.00     8.00  1  0
   60.00      .50     3.00     3.00  1  0
   69.00      .70     4.00     1.00  1  0

Number of cases read:  4    Number of cases listed:  4

* Notice that the f2 flags are not set for these cases.
* Cases from f1 and f2 that ~appear~ to have the same
* combinations of k and p are not being matched up.
* Variable p is likely responsible for this.
* Broaden the range of p-values and list again.

TEMPORARY.
SELECT IF
 (k EQ 40 and range(p,.0995,.1005)) or
 (k EQ 50 and range(p,.2995,.3005)) or
 (k EQ 60 and range(p,.4995,.5005)) or
 (k EQ 69 and range(p,.6995,.7005))
.
LIST.

Output from LIST:

       k        p       v1       v2 f1 f2

   40.00      .10      .        .    0  1
   40.00      .10     5.00     9.00  1  0
   50.00      .30     3.00     8.00  1  0
   50.00      .30      .        .    0  1
   60.00      .50     3.00     3.00  1  0
   60.00      .50      .        .    0  1
   69.00      .70     4.00     1.00  1  0
   69.00      .70      .        .    0  1

Number of cases read:  8    Number of cases listed:  8

* All of these records APPEAR to have the same values
* for both k and p, but are not being matched by MATCH FILES.
* If I revise this code making p an integer ranging from 1 to 99,
* everything works as expected.

* SPSS version:  v 20.0.0.1 running under Windows 7.


David's matrix method does not have this same problem, by the way.

* David's code, but with values of k that fall in Frank's range, plus some junk vars.

NEW FILE.
DATASET CLOSE all.

DATA LIST FREE / k p v1 v2.
begin data
40 .1 5 9
50 .3 3 8
60 .5 3 3
69 .7 4 1
end data.
DATASET NAME f1.


MATRIX.
SAVE ({KRONEKER(T({40:69}),MAKE(99,1,1)),KRONEKER(MAKE(30,1,1), T({1:99}/100))}) / OUTFILE * / VARIABLES k p.
END MATRIX.
MATCH FILES / FILE f1 /IN=f1/ FILE * / IN = f2 / BY k p .
EXE.
dataset close all.
temp.
select if f1 and f2.
list.

Output from LIST:

       k        p       v1       v2 f1 f2

   40.00      .10     5.00     9.00  1  1
   50.00      .30     3.00     8.00  1  1
   60.00      .50     3.00     3.00  1  1
   69.00      .70     4.00     1.00  1  1

Number of cases read:  4    Number of cases listed:  4


SO, unless there is a problem specific to v20.0.0.1, this leads me to believe that one should be very careful about MATCHING on non-integer key variables.
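
One way to sidestep the problem is to match on an integer key derived from p rather than on p itself. The sketch below is in Python, not SPSS syntax, purely because it is easy to run; it assumes p never has more than two decimals, and the helper name `to_key` is made up for illustration. The inner loop is only an assumed model of how LOOP ... BY .01 builds its values.

```python
# Sketch: match on an integer key (hundredths) instead of the raw double.
# Assumes p has at most two decimals; to_key is a hypothetical helper.

def to_key(k, p):
    """Return (k, p expressed in hundredths) as a pair of integers."""
    return (int(round(k)), int(round(p * 100)))

# "f1": values typed in directly, as with DATA LIST.
f1 = {to_key(k, p): (v1, v2)
      for k, p, v1, v2 in [(40, .1, 5, 9), (50, .3, 3, 8),
                           (60, .5, 3, 3), (69, .7, 4, 1)]}

# "f2": the full grid, with p built by repeated addition of 0.01.
f2_keys = set()
for k in range(40, 70):
    p = 0.01
    for _ in range(99):
        f2_keys.add(to_key(k, p))
        p += 0.01

# Every f1 case finds its partner once the key is an integer.
matched = sorted(key for key in f1 if key in f2_keys)
print(matched)
```

In SPSS terms this would amount to computing something like RND(p*100) in both files and using that variable on the BY subcommand.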

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Problem matching on non-integer key variables

Maguin, Eugene
I don't know for sure, but I gotta wonder if something else (I don't have a candidate) is going on. First, SPSS doesn't have an integer variable type in the sense of (ancient) Fortran INTEGER. Every variable except for strings is, again in Fortran terms, double precision (REAL*8).

What do you get if you change the format from F8.2 to F16.13 or F16.14?

I'd kind of expect that there'd be some sort of threshold on the number of bits required to match for a number of different comparison operations, of which a match files could be viewed as an example, along with IF and Recode.
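
For reference, here is what a 64-bit double looks like from Python, whose floats are IEEE 754 doubles on common platforms. I am assuming SPSS numeric variables are stored the same way, which is consistent with the results in this thread but is not something this snippet can prove. It shows that equality on doubles is exact, with no built-in threshold on the number of matching bits.

```python
import struct
import sys

# A Python float is a C double: 8 bytes with a 53-bit significand.
print(struct.calcsize("d"))        # bytes per double
print(sys.float_info.mant_dig)     # significand bits
print(sys.float_info.dig)          # decimal digits that always survive a round trip

# Equality is exact: two values that differ in the last bit are not equal,
# even though both display as 0.30 with two decimals.
a = 0.3
b = 0.1 + 0.2
print(a == b, f"{a:.2f}", f"{b:.2f}")
```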

Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
Sent: Wednesday, September 26, 2012 4:22 PM
To: [hidden email]
Subject: Problem matching on non-integer key variables

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Problem-matching-on-non-integer-key-variables-tp5715296.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Re: Problem matching on non-integer key variables

Bruce Weaver
Administrator
GM: "What do you get if you change the format from F8.2 to F16.13 or F16.14?"

The value of p appears to be the same to 14 decimals, but the values are not the same.  

       k                p       v1       v2 f1 f2 KeqLagK PeqLagP

   40.00  .10000000000000      .        .    0  1    1       0
   40.00  .10000000000000     5.00     9.00  1  0    1       0

   50.00  .30000000000000     3.00     8.00  1  0    1       0
   50.00  .30000000000000      .        .    0  1    1       0

   60.00  .50000000000000     3.00     3.00  1  0    1       0
   60.00  .50000000000000      .        .    0  1    1       0

   69.00  .70000000000000     4.00     1.00  1  0    1       0
   69.00  .70000000000000      .        .    0  1    1       0

Number of cases read:  8    Number of cases listed:  8

Variables KeqLagK and PeqLagP were computed as follows:

COMPUTE KeqLagK = k EQ lag(k).
COMPUTE PeqLagP = p EQ lag(p).
FORMATS KeqLagK PeqLagP (f1).
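
The same effect is easy to reproduce outside SPSS. A minimal Python sketch (Python floats are doubles; the variable names are mine): two values agree when printed with 14 decimals, yet compare as unequal, and only a wider format tells them apart.

```python
# Two doubles that agree to 14 decimals but are not equal.
typed = 0.3                 # value as typed in (e.g. read by DATA LIST)
summed = 0.1 + 0.2          # value reached by addition

print(f"{typed:.14f}")      # fixed format, 14 decimals
print(f"{summed:.14f}")
print(typed == summed)

# 17 significant digits are enough to distinguish any two different doubles.
print(f"{typed:.17g}")
print(f"{summed:.17g}")
```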




Re: Problem matching on non-integer key variables

David Marso
Administrator
In reply to this post by Bruce Weaver
Even applying F20.16 format to p doesn't reveal all of the fudge.  
YES!  One had better be VERY careful when attempting to match on non 'integer' valued doubles.
However:

TEMPORARY.
SELECT IF
 (k EQ 40 and range(p,.0995,.1005)) or
 (k EQ 50 and range(p,.2995,.3005)) or
 (k EQ 60 and range(p,.4995,.5005)) or
 (k EQ 69 and range(p,.6995,.7005)).

COMPUTE P2=p*1E10.
FORMAT k (F8.0) p p2 (F20.16).
LIST k p p2 f1 f2.


       k                    p                   P2 f1 f2

      40    .1000000000000000  999999999.999999800  0  1
      40    .1000000000000000 1000000000.000000000  1  0
      50    .3000000000000000 3000000000.000000000  1  0
      50    .3000000000000001 3000000000.000001000  0  1
      60    .5000000000000000 5000000000.000000000  1  0
      60    .5000000000000002 5000000000.000002000  0  1
      69    .7000000000000000 7000000000.000000000  1  0
      69    .7000000000000004 7000000000.000004000  0  1


Number of cases read:  8    Number of cases listed:  8
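
For anyone who wants to see all of the fudge, Python can print the exact decimal expansion of a double. This is only a sketch: `build_by_addition` is a hypothetical stand-in for what LOOP ... BY .01 appears to do, and `math.ulp` needs Python 3.9 or later. The last column is the distance between the two values in units in the last place (ULPs).

```python
from decimal import Decimal
import math  # math.ulp requires Python 3.9+

def build_by_addition(steps, inc=0.01):
    """Start at inc and add inc (steps - 1) more times."""
    x = inc
    for _ in range(steps - 1):
        x += inc
    return x

for hundredths in (10, 30, 50, 70):
    typed = hundredths / 100              # nearest double to the decimal value
    looped = build_by_addition(hundredths)
    ulps = round((looped - typed) / math.ulp(typed))
    # Decimal(float) shows every digit the double actually holds.
    print(hundredths, Decimal(typed), Decimal(looped), ulps)
```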


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Problem matching on non-integer key variables

David Marso
Administrator
In reply to this post by Bruce Weaver
I'll bet my Utilikilt that the problem is here.
loop p =0.01 to 0.99 by 0.01.
If it was:
loop #p =1 to 99 .
compute p=#p/100 .
.......
etc.  The problem would likely disappear.
Note my code builds p as { 1:99 }/100.

EDITED:
YEP!!!!
No Utilikilt for you!!!

       k        p       v1       v2 f1 f2

   40.00      .10     5.00     9.00  1  1
   50.00      .30     3.00     8.00  1  1
   60.00      .50     3.00     3.00  1  1
   69.00      .70     4.00     1.00  1  1


Number of cases read:  4    Number of cases listed:  4
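
Why the integer counter works: dividing an exact integer by 100 is a single, correctly rounded operation, so it lands on the same double that you get when the value is typed in. A small Python check of that claim (Python floats are doubles; whether SPSS parses typed values identically is my assumption, though the output above suggests it does):

```python
# p built from an integer counter, like LOOP #p = 1 TO 99 / COMPUTE p = #p/100.
grid = [i / 100 for i in range(1, 100)]

# p as parsed from typed-in text "0.01" ... "0.99".
typed = [float(f"0.{i:02d}") for i in range(1, 100)]

# No accumulated error: every grid value equals its typed counterpart.
print(all(g == t for g, t in zip(grid, typed)))
```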

Re: Problem matching on non-integer key variables

Maguin, Eugene
In reply to this post by David Marso
David,
Ok. You found a difference in the 16th digit. But why would
Loop J = .10 to .90 by .01.
Or
Loop J = 10 to 90.
Compute jj=j/10.
Or
Loop J = 100 to 900 by 10.
Compute jj=j/100.

necessarily give different numbers in the far-off digits, in such a way that comparison operations yield different results?

Maybe more to the point (and I don't mean to put you on the spot for an answer, David, because you don't work for SPSS), how exactly does SPSS do comparison operations? More practically, when should we be concerned about possible comparison failures and when not?
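
I can't speak to SPSS internals, but the results in this thread are consistent with plain exact equality on doubles. As a rule of thumb, exact comparison is safe for integers (up to 2**53) and for values produced by the identical computation on both sides; it is unsafe once one side comes from accumulated arithmetic on fractions such as .01. A hedged Python sketch of two common workarounds follows; `same_key` is a made-up helper name.

```python
import math

def same_key(a, b, decimals=2):
    """Treat two doubles as the same key if they agree after rounding."""
    return round(a, decimals) == round(b, decimals)

looped = 0.05 + 0.01      # a single addition is already enough to drift
typed = 0.06

print(looped == typed)                              # exact comparison
print(math.isclose(looped, typed, rel_tol=1e-9))    # relative tolerance
print(same_key(looped, typed))                      # compare rounded keys
```

Rounding both sides to the known number of decimals is usually the better fit for a merge key, since a sort-and-match operation needs equality rather than "close enough".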

Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Wednesday, September 26, 2012 5:26 PM
To: [hidden email]
Subject: Re: Problem matching on non-integer key variables


Re: Problem matching on non-integer key variables

Bruce Weaver
Administrator
In reply to this post by David Marso
Yep, that's it.  The "loop p = .01 to .99 by .01" method produces "fudge" about 89% of the time (88 of 99 values) in this pared-down demo.

new file.
dataset close all.

INPUT PROGRAM.
loop p1 = .01 to 1.00 by .01.
end case.
END LOOP.
end file.
END INPUT PROGRAM.
dataset name f1.

INPUT PROGRAM.
loop #p = 1 to 99 by 1.
compute p2 = #p/100.
end case.
END LOOP.
end file.
END INPUT PROGRAM.
dataset name f2.

match files
 file = 'f1' /  
 file = 'f2' / .
execute.
dataset name f12.
dataset close all.
compute p1EQp2 = p1 EQ p2.
formats p1EQp2 (f1).
frequencies p1EQp2.
* Frequency table shows 88 cases where p1 NE p2 and 11 cases where p1 EQ p2.
* List the 11 cases where p1 EQ p2.
TEMPORARY.
select if p1EQp2.
LIST.

Output from LIST:

      p1       p2 p1EQp2

     .01      .01    1
     .02      .02    1
     .03      .03    1
     .04      .04    1
     .05      .05    1
     .07      .07    1
     .08      .08    1
     .09      .09    1
     .15      .15    1
     .16      .16    1
     .17      .17    1

Number of cases read:  11    Number of cases listed:  11
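
The same experiment can be run in Python as a cross-check. This is a sketch under the assumption that LOOP ... BY .01 simply keeps adding .01 to a double. If SPSS does the same arithmetic, the counts should agree with the FREQUENCIES table above, but treat that as something to verify rather than a given.

```python
# p1: start at 0.01 and keep adding 0.01 (assumed model of LOOP ... BY .01).
# p2: integer counter divided by 100.
matches, mismatches = [], []
p1 = 0.01
for i in range(1, 100):
    p2 = i / 100
    (matches if p1 == p2 else mismatches).append(i)
    p1 += 0.01

print(len(matches), len(mismatches))
print(matches)   # counter values (in hundredths) where p1 == p2
```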


Pity about missing out on the UtiliKilt.  I understand the Marso tartan is quite a nice one.  :-|


David Marso wrote
I'll bet my Utilikilt that the problem is here.
loop p =0.01 to 0.99 by 0.01.
If it was:
loop #p =1 to 99 .
compute p=#p/100 .
.......
etc.  The problem would likely disappear.
Note my code builds p as { 1:99 }/100.

EDITED:
YEP!!!!
No Utilikilt for you!!!

       k        p       v1       v2 f1 f2

   40.00      .10     5.00     9.00  1  1
   50.00      .30     3.00     8.00  1  1
   60.00      .50     3.00     3.00  1  1
   69.00      .70     4.00     1.00  1  1


Number of cases read:  4    Number of cases listed:  4
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Problem matching on non-integer key variables

David Marso
Administrator
This post was updated on .
0. Notice the matches are the first 17 values.  Why??
----------------------
1.  .01 (decimal) does not have an exact binary representation.
2.  BY increments the loop counter by addition.
3.  By the time it hits iteration 17+ there is too much fuzz in the accumulated sum, so everything goes to hell in a handbasket.
===
EDIT:  It actually entered the first circle at 10. (misses on 10..14)
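
A minimal sketch of points 1 to 3, assuming SPSS does ordinary double-precision addition for the running sum (the names pSum, pDiv, diff and same are made up for illustration). It builds the same value two ways, by repeated addition of .01 and by integer division, and flags the rows where the two differ:

```spss
* Sketch only: running sum of .01 versus #i/100.
NEW FILE.
INPUT PROGRAM.
* Start the accumulator at zero before the loop.
COMPUTE #acc = 0.
LOOP #i = 1 TO 20.
* Mimic what BY .01 is assumed to do: add the increment each pass.
COMPUTE #acc = #acc + .01.
COMPUTE pSum = #acc.
* Build the same nominal value from an integer counter.
COMPUTE pDiv = #i/100.
COMPUTE diff = pSum - pDiv.
COMPUTE same = (pSum EQ pDiv).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS pSum pDiv (F5.2) diff (E12.4) same (F1).
LIST.
```

If that assumption holds, the rows with same = 0 should line up with the mismatches Bruce reported, and diff shows how tiny the discrepancy is even though pSum and pDiv print identically at two decimals.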
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Problem matching on non-integer key variables

Bruce Weaver
Administrator
There's a mismatch on 6 too.



Re: Problem matching on non-integer key variables

Richard Ristow
In reply to this post by Bruce Weaver
At 04:22 PM 9/26/2012, Bruce Weaver wrote:

>This is a follow-up to Frank's "loop" thread that can be viewed here:
>http://spssx-discussion.1045642.n5.nabble.com/loop-td5715276.html.
>In that thread, I mentioned that I ran into a problem with cases
>that appeared to have the same combinations of k and p not being
>matched properly when I ran MATCH FILES.

To generalize what's been said:

If you have quantities you need to test for exact equality (as, for
example, MATCH FILES keys and loop boundaries are tested), don't use
non-integer values, or integer values longer than 15 digits; that's
because of limitations in the number representation that SPSS uses.
(The rule can be stretched a little, but only by making it much more
complicated.)

If you have a quantity that's inherently non-integer, and you need to
use it as a key, there are two standard solutions:

1.) Make a copy that IS an integer, by multiplying by 100, 1000, or
whatever's necessary, and use that as the key. In this case, always use

COMPUTE   IntegerKey = RND(100*RealKey).
not
COMPUTE   IntegerKey = 100*RealKey.

or your 'integer' may have a hidden fractional part.

2.) Make a copy that's a string, and use that as the key. If your key
has form "ii.ff", i.e. is less than 100 and has 2 decimals:

STRING    StringKey (A5).
COMPUTE   StringKey =STRING(RealKey,F5.2).
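
For the f1 and f2 datasets from the first post in this thread, solution 1 might look something like the sketch below. pKey, inF1 and inF2 are hypothetical names, and the factor of 100 assumes p never carries more than two decimals.

```spss
* Sketch only: build the same integer key in both datasets.
DATASET ACTIVATE f1.
COMPUTE pKey = RND(100*p).
SORT CASES BY k pKey.

DATASET ACTIVATE f2.
COMPUTE pKey = RND(100*p).
SORT CASES BY k pKey.

* Match on the integer key instead of the fractional p.
MATCH FILES
 FILE = f1 / IN = inF1 /
 FILE = f2 / IN = inF2 /
 BY k pKey.
EXECUTE.
```

RND is applied in both files so that any hidden fractional part left over from the loop arithmetic is removed before the keys are compared.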
