Dear All,
I want to match two dataset by a numerical variable named SerialNo. This variable has more than 10 digits (e.g. 2333008010, 2333008011 etc). Given SPSS default setting is F8.2. Therefore, I manually changed data format to F10.2. However, when I run the following Command, I get the error message as below. It seems that all serialNo have been truncate as 2.33E+08, so even the serialno has different last digit, it is still treated as the duplicate case. Can somebody help me out? I use version 12. Many thanks in advance. Nai get file="C:\TEMP\Issue.sav". sort cases by serialNo projectNo issue_no Int_Num trip_ID(A). save outfile="C:\TEMP\Issuetemp.sav". get file="C:\TEMP\Calls.sav". SORT CASES BY serialNo projectNo issue_no Int_Num trip_ID(A). save outfile="C:\TEMP\Callstemp.sav". MATCH FILES FILE="C:\TEMP\Callstemp.sav" /TABLE="C:\TEMP\Issuetemp.sav" /BY serialNo projectNo issue_no Int_Num trip_ID. SAVE OUTFILE="C:\temp\AnalysisCMS temp1.sav". File #2 KEY: 2.33E+08 2449 2 7083 3 >Error # 5131 >Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES >must be uniquely identified by the BY variables. >This command not executed. Nai Li Senior Researcher National Centre for Social Research 35 Northampton Square London EC1V 0AX Direct line: 020 7549 7036 Fax number: 020 7549 7050 Switchboard: 020 7250 1866 |
Nai
I think that you should check you data carefully as experience tells me that duplicate cases can get in despite best intentions. I suggest you use the 'Find Duplicate Cases' from the 'Data' menu on BOTH data sets Best Wishes John S. Lemon DIT - University of Aberdeen Edward Wright Building: Room G51 Tel: +44 1224 273350 Fax: +44 1224 273372 > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] > On Behalf Of Nai Li > Sent: Wednesday, April 11, 2007 9:47 AM > To: [hidden email] > Subject: Match merging data files by a variable with 10 digits > > Dear All, > I want to match two dataset by a numerical variable named > SerialNo. This > variable has more than 10 digits (e.g. 2333008010, 2333008011 > etc). Given > SPSS default setting is F8.2. Therefore, I manually changed > data format to > F10.2. However, when I run the following Command, I get the > error message > as below. It seems that all serialNo have been truncate as > 2.33E+08, so > even the serialno has different last digit, it is still treated as the > duplicate case. Can somebody help me out? I use version 12. > Many thanks in > advance. Nai > > get file="C:\TEMP\Issue.sav". > sort cases by serialNo projectNo issue_no Int_Num trip_ID(A). > save outfile="C:\TEMP\Issuetemp.sav". > > get file="C:\TEMP\Calls.sav". > SORT CASES BY serialNo projectNo issue_no Int_Num trip_ID(A). > save outfile="C:\TEMP\Callstemp.sav". > > MATCH FILES FILE="C:\TEMP\Callstemp.sav" > /TABLE="C:\TEMP\Issuetemp.sav" > /BY serialNo projectNo issue_no Int_Num trip_ID. > SAVE OUTFILE="C:\temp\AnalysisCMS temp1.sav". > File #2 > KEY: 2.33E+08 2449 2 7083 3 > >Error # 5131 > >Duplicate key on a TABLE file. Each case on a TABLE file > in MATCH FILES > >must be uniquely identified by the BY variables. > >This command not executed. > > > > Nai Li > Senior Researcher > > National Centre for Social Research > 35 Northampton Square > London > EC1V 0AX > > Direct line: 020 7549 7036 > Fax number: 020 7549 7050 > Switchboard: 020 7250 1866 > |
In reply to this post by Nai Li
a 10 digit number would be f10.
Art Kendall Social Research Consultants Nai Li wrote: > Dear All, > I want to match two dataset by a numerical variable named SerialNo. This > variable has more than 10 digits (e.g. 2333008010, 2333008011 etc). Given > SPSS default setting is F8.2. Therefore, I manually changed data > format to > F10.2. However, when I run the following Command, I get the error message > as below. It seems that all serialNo have been truncate as 2.33E+08, so > even the serialno has different last digit, it is still treated as the > duplicate case. Can somebody help me out? I use version 12. Many > thanks in > advance. Nai > > get file="C:\TEMP\Issue.sav". > sort cases by serialNo projectNo issue_no Int_Num trip_ID(A). > save outfile="C:\TEMP\Issuetemp.sav". > > get file="C:\TEMP\Calls.sav". > SORT CASES BY serialNo projectNo issue_no Int_Num trip_ID(A). > save outfile="C:\TEMP\Callstemp.sav". > > MATCH FILES FILE="C:\TEMP\Callstemp.sav" > /TABLE="C:\TEMP\Issuetemp.sav" > /BY serialNo projectNo issue_no Int_Num trip_ID. > SAVE OUTFILE="C:\temp\AnalysisCMS temp1.sav". > File #2 > KEY: 2.33E+08 2449 2 7083 3 > >Error # 5131 > >Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES > >must be uniquely identified by the BY variables. > >This command not executed. > > > > Nai Li > Senior Researcher > > National Centre for Social Research > 35 Northampton Square > London > EC1V 0AX > > Direct line: 020 7549 7036 > Fax number: 020 7549 7050 > Switchboard: 020 7250 1866 > > |
In reply to this post by Nai Li
Nai,
I'm confused by your message. I hope you recognize that your real problem is here. If so, I apologize for not recognizing that fact. KEY: 2.33E+08 2449 2 7083 3 >Error # 5131 >Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES >must be uniquely identified by the BY variables. >This command not executed. Assuming that is true, then I can understand your wanting to know how to see the problem value. In addition to what Art said, I think you will probably need to go back and change the default format for that variable (serialNo) in your component datasets and then resave each them. I say that because the match operation is completed before you change the format of serialno and I'm assuming that spss prints the value of the problem case in the defined format and not the default format. Since you already open, sort, and resave each that won't be an issue. Let me add that you now know you have a problem. You can run some syntax to check for duplicates in both files. I don't think 12 has the identify duplicates function (or whatever it is called) in the menus. But, this works. Sort cases by a b c d. Compute dups=0. If (a eq lag(a) and b eq lag(b) and c eq lag(c) and d eq lag(d)) dups=1. Temporary. Select if (dups eq 1) List a b c d. Gene Maguin Gene |
In reply to this post by Nai Li
Confirming, with comments, advice you've had from others -
At 04:47 AM 4/11/2007, Nai Li wrote: >I want to match two dataset by a numerical variable named SerialNo. >This variable has more than 10 digits (e.g. 2333008010, 2333008011 >etc). Therefore, I manually changed data format to F10.2. However, >when I run the following Command, I get the error message as below. It >seems that all serialNo have been truncate as 2.33E+08, so even the >serialno has different last digit, it is still treated as the >duplicate case. First, as Art Kendall said, F10.2 won't display a 10-digit integer; it will display at most a 7-digit integer. The '10' includes the full width of the field; 3 spaces go for the decimal point and two post-decimal digits. Use F10, as Art said - or, I might use COMMA13 myself. (Actually, I'd use F11 or COMMA14; I recommend allowing one more place than you think you need.) Or, if you could have post-decimal digits, F13.2 (or F14.2). Second, as John S. Lemon and Gene Maguin have said, you're seeing a real problem with duplicate keys. SPSS is *not* "truncating as 2.33E+08"; SPSS always matches using the full number, as it's stored internally. To repeat a often-made but important point: what's printed is a *display form* of the number. SPSS uses a floating-point binary representation with 53 bits of precision, and compares the numbers in that form. Third, maybe the duplicate problem isn't real, but it has nothing to do with the display format. If you've made a mistake *reading* the serial numbers, then SPSS's internal form may be wrong, or incomplete. You'll to set a format that displays the full number, and see whether you've done that. -Good luck, Richard |
And fourth, if you are trying to do a table-look up, rather than a file to file match, the file can have duplicate keys but the table cannot.
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow Sent: Wednesday, April 11, 2007 1:31 PM To: [hidden email] Subject: Re: Match merging data files by a variable with 10 digits Confirming, with comments, advice you've had from others - At 04:47 AM 4/11/2007, Nai Li wrote: >I want to match two dataset by a numerical variable named SerialNo. >This variable has more than 10 digits (e.g. 2333008010, 2333008011 >etc). Therefore, I manually changed data format to F10.2. However, >when I run the following Command, I get the error message as below. It >seems that all serialNo have been truncate as 2.33E+08, so even the >serialno has different last digit, it is still treated as the >duplicate case. First, as Art Kendall said, F10.2 won't display a 10-digit integer; it will display at most a 7-digit integer. The '10' includes the full width of the field; 3 spaces go for the decimal point and two post-decimal digits. Use F10, as Art said - or, I might use COMMA13 myself. (Actually, I'd use F11 or COMMA14; I recommend allowing one more place than you think you need.) Or, if you could have post-decimal digits, F13.2 (or F14.2). Second, as John S. Lemon and Gene Maguin have said, you're seeing a real problem with duplicate keys. SPSS is *not* "truncating as 2.33E+08"; SPSS always matches using the full number, as it's stored internally. To repeat a often-made but important point: what's printed is a *display form* of the number. SPSS uses a floating-point binary representation with 53 bits of precision, and compares the numbers in that form. Third, maybe the duplicate problem isn't real, but it has nothing to do with the display format. If you've made a mistake *reading* the serial numbers, then SPSS's internal form may be wrong, or incomplete. You'll to set a format that displays the full number, and see whether you've done that. -Good luck, Richard |
In reply to this post by Richard Ristow
Richard, Art , John, and Gene,
Thanks you all for your comments and advice. Now I have identified the problem , it is to do with duplicate keys. One duplicate case slipped somehow in one of the data set which causing a duplicate problem. Many thanks for your help. Nai At 19:31 11/04/2007, Richard Ristow wrote: >Confirming, with comments, advice you've had from others - > >At 04:47 AM 4/11/2007, Nai Li wrote: > >>I want to match two dataset by a numerical variable named SerialNo. This >>variable has more than 10 digits (e.g. 2333008010, 2333008011 >>etc). Therefore, I manually changed data format to F10.2. However, when >>I run the following Command, I get the error message as below. It seems >>that all serialNo have been truncate as 2.33E+08, so even the serialno >>has different last digit, it is still treated as the duplicate case. > >First, as Art Kendall said, F10.2 won't display a 10-digit integer; it >will display at most a 7-digit integer. The '10' includes the full width >of the field; 3 spaces go for the decimal point and two post-decimal >digits. Use F10, as Art said - or, I might use COMMA13 myself. (Actually, >I'd use F11 or COMMA14; I recommend allowing one more place than you think >you need.) Or, if you could have post-decimal digits, F13.2 (or F14.2). > >Second, as John S. Lemon and Gene Maguin have said, you're seeing a real >problem with duplicate keys. SPSS is *not* "truncating as 2.33E+08"; SPSS >always matches using the full number, as it's stored internally. To repeat >a often-made but important point: what's printed is a *display form* of >the number. SPSS uses a floating-point binary representation with 53 bits >of precision, and compares the numbers in that form. > >Third, maybe the duplicate problem isn't real, but it has nothing to do >with the display format. If you've made a mistake *reading* the serial >numbers, then SPSS's internal form may be wrong, or incomplete. You'll to >set a format that displays the full number, and see whether you've done that. > >-Good luck, > Richard |
Free forum by Nabble | Edit this page |