SPSSX Discussion

Match merging data files by a variable with 10 digits

Classic

List

Threaded

7 messages Options

Nai Li

Match merging data files by a variable with 10 digits

Dear All,
I want to match two dataset by a numerical variable named SerialNo. This
variable has more than 10 digits (e.g. 2333008010, 2333008011 etc). Given
SPSS default setting is F8.2. Therefore, I manually changed data format to
F10.2. However, when I run the following Command, I get the error message
as below. It seems that all serialNo have been truncate as 2.33E+08, so
even the serialno has different last digit, it is still treated as the
duplicate case. Can somebody help me out? I use version 12. Many thanks in
advance. Nai

get file="C:\TEMP\Issue.sav".
sort cases by serialNo projectNo issue_no Int_Num trip_ID(A).
save outfile="C:\TEMP\Issuetemp.sav".

get file="C:\TEMP\Calls.sav".
SORT CASES BY serialNo projectNo issue_no Int_Num trip_ID(A).
save outfile="C:\TEMP\Callstemp.sav".

MATCH FILES FILE="C:\TEMP\Callstemp.sav"
/TABLE="C:\TEMP\Issuetemp.sav"
/BY serialNo projectNo issue_no Int_Num trip_ID.
SAVE OUTFILE="C:\temp\AnalysisCMS temp1.sav".
File #2
KEY: 2.33E+08 2449 2 7083 3
>Error # 5131
>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES
>must be uniquely identified by the BY variables.
>This command not executed.

Nai Li
Senior Researcher

National Centre for Social Research
35 Northampton Square
London
EC1V 0AX

Direct line: 020 7549 7036
Fax number: 020 7549 7050
Switchboard: 020 7250 1866

Lemon, John S.

Re: Match merging data files by a variable with 10 digits

Nai

I think that you should check you data carefully as experience tells me
that duplicate cases can get in despite best intentions. I suggest you
use the 'Find Duplicate Cases' from the 'Data' menu on BOTH data sets

Best Wishes

John S. Lemon
DIT - University of Aberdeen
Edward Wright Building: Room G51
Tel: +44 1224 273350
Fax: +44 1224 273372

> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]]
> On Behalf Of Nai Li
> Sent: Wednesday, April 11, 2007 9:47 AM
> To: [hidden email]
> Subject: Match merging data files by a variable with 10 digits
>
> Dear All,
> I want to match two dataset by a numerical variable named
> SerialNo. This
> variable has more than 10 digits (e.g. 2333008010, 2333008011
> etc). Given
> SPSS default setting is F8.2. Therefore, I manually changed
> data format to
> F10.2. However, when I run the following Command, I get the
> error message
> as below. It seems that all serialNo have been truncate as
> 2.33E+08, so
> even the serialno has different last digit, it is still treated as the
> duplicate case. Can somebody help me out? I use version 12.
> Many thanks in
> advance. Nai
>
> get file="C:\TEMP\Issue.sav".
> sort cases by serialNo projectNo issue_no Int_Num trip_ID(A).
> save outfile="C:\TEMP\Issuetemp.sav".
>
> get file="C:\TEMP\Calls.sav".
> SORT CASES BY serialNo projectNo issue_no Int_Num trip_ID(A).
> save outfile="C:\TEMP\Callstemp.sav".
>
> MATCH FILES FILE="C:\TEMP\Callstemp.sav"
> /TABLE="C:\TEMP\Issuetemp.sav"
> /BY serialNo projectNo issue_no Int_Num trip_ID.
> SAVE OUTFILE="C:\temp\AnalysisCMS temp1.sav".
> File #2
> KEY: 2.33E+08 2449 2 7083 3
> >Error # 5131
> >Duplicate key on a TABLE file. Each case on a TABLE file
> in MATCH FILES
> >must be uniquely identified by the BY variables.
> >This command not executed.
>
>
>
> Nai Li
> Senior Researcher
>
> National Centre for Social Research
> 35 Northampton Square
> London
> EC1V 0AX
>
> Direct line: 020 7549 7036
> Fax number: 020 7549 7050
> Switchboard: 020 7250 1866
>

Art Kendall-2

Re: Match merging data files by a variable with 10 digits

In reply to this post by Nai Li

a 10 digit number would be f10.

Art Kendall
Social Research Consultants

Nai Li wrote:

> Dear All,
> I want to match two dataset by a numerical variable named SerialNo. This
> variable has more than 10 digits (e.g. 2333008010, 2333008011 etc). Given
> SPSS default setting is F8.2. Therefore, I manually changed data
> format to
> F10.2. However, when I run the following Command, I get the error message
> as below. It seems that all serialNo have been truncate as 2.33E+08, so
> even the serialno has different last digit, it is still treated as the
> duplicate case. Can somebody help me out? I use version 12. Many
> thanks in
> advance. Nai
>
> get file="C:\TEMP\Issue.sav".
> sort cases by serialNo projectNo issue_no Int_Num trip_ID(A).
> save outfile="C:\TEMP\Issuetemp.sav".
>
> get file="C:\TEMP\Calls.sav".
> SORT CASES BY serialNo projectNo issue_no Int_Num trip_ID(A).
> save outfile="C:\TEMP\Callstemp.sav".
>
> MATCH FILES FILE="C:\TEMP\Callstemp.sav"
> /TABLE="C:\TEMP\Issuetemp.sav"
> /BY serialNo projectNo issue_no Int_Num trip_ID.
> SAVE OUTFILE="C:\temp\AnalysisCMS temp1.sav".
> File #2
> KEY: 2.33E+08 2449 2 7083 3
> >Error # 5131
> >Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES
> >must be uniquely identified by the BY variables.
> >This command not executed.
>
>
>
> Nai Li
> Senior Researcher
>
> National Centre for Social Research
> 35 Northampton Square
> London
> EC1V 0AX
>
> Direct line: 020 7549 7036
> Fax number: 020 7549 7050
> Switchboard: 020 7250 1866
>
>

Maguin, Eugene

Re: Match merging data files by a variable with 10 digits

In reply to this post by Nai Li

Nai,

I'm confused by your message. I hope you recognize that your real problem is
here. If so, I apologize for not recognizing that fact.

KEY: 2.33E+08 2449 2 7083 3
>Error # 5131
>Duplicate key on a TABLE file. Each case on a TABLE file in MATCH FILES
>must be uniquely identified by the BY variables.
>This command not executed.

Assuming that is true, then I can understand your wanting to know how to see
the problem value. In addition to what Art said, I think you will probably
need to go back and change the default format for that variable (serialNo)
in your component datasets and then resave each them. I say that because the
match operation is completed before you change the format of serialno and
I'm assuming that spss prints the value of the problem case in the defined
format and not the default format. Since you already open, sort, and resave
each that won't be an issue.

Let me add that you now know you have a problem. You can run some syntax to
check for duplicates in both files. I don't think 12 has the identify
duplicates function (or whatever it is called) in the menus. But, this
works.

Sort cases by a b c d.
Compute dups=0.
If (a eq lag(a) and b eq lag(b) and c eq lag(c) and d eq lag(d)) dups=1.

Temporary.
Select if (dups eq 1)
List a b c d.

Gene Maguin

Gene

Richard Ristow

Re: Match merging data files by a variable with 10 digits

In reply to this post by Nai Li

Confirming, with comments, advice you've had from others -

At 04:47 AM 4/11/2007, Nai Li wrote:

>I want to match two dataset by a numerical variable named SerialNo.
>This variable has more than 10 digits (e.g. 2333008010, 2333008011
>etc). Therefore, I manually changed data format to F10.2. However,
>when I run the following Command, I get the error message as below. It
>seems that all serialNo have been truncate as 2.33E+08, so even the
>serialno has different last digit, it is still treated as the
>duplicate case.

First, as Art Kendall said, F10.2 won't display a 10-digit integer; it
will display at most a 7-digit integer. The '10' includes the full
width of the field; 3 spaces go for the decimal point and two
post-decimal digits. Use F10, as Art said - or, I might use COMMA13
myself. (Actually, I'd use F11 or COMMA14; I recommend allowing one
more place than you think you need.) Or, if you could have post-decimal
digits, F13.2 (or F14.2).

Second, as John S. Lemon and Gene Maguin have said, you're seeing a
real problem with duplicate keys. SPSS is *not* "truncating as
2.33E+08"; SPSS always matches using the full number, as it's stored
internally. To repeat a often-made but important point: what's printed
is a *display form* of the number. SPSS uses a floating-point binary
representation with 53 bits of precision, and compares the numbers in
that form.

Third, maybe the duplicate problem isn't real, but it has nothing to do
with the display format. If you've made a mistake *reading* the serial
numbers, then SPSS's internal form may be wrong, or incomplete. You'll
to set a format that displays the full number, and see whether you've
done that.

-Good luck,
Richard

Beadle, ViAnn

Re: Match merging data files by a variable with 10 digits

And fourth, if you are trying to do a table-look up, rather than a file to file match, the file can have duplicate keys but the table cannot.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Wednesday, April 11, 2007 1:31 PM
To: [hidden email]
Subject: Re: Match merging data files by a variable with 10 digits

Confirming, with comments, advice you've had from others -

At 04:47 AM 4/11/2007, Nai Li wrote:

>I want to match two dataset by a numerical variable named SerialNo.
>This variable has more than 10 digits (e.g. 2333008010, 2333008011
>etc). Therefore, I manually changed data format to F10.2. However,
>when I run the following Command, I get the error message as below. It
>seems that all serialNo have been truncate as 2.33E+08, so even the
>serialno has different last digit, it is still treated as the
>duplicate case.

First, as Art Kendall said, F10.2 won't display a 10-digit integer; it
will display at most a 7-digit integer. The '10' includes the full
width of the field; 3 spaces go for the decimal point and two
post-decimal digits. Use F10, as Art said - or, I might use COMMA13
myself. (Actually, I'd use F11 or COMMA14; I recommend allowing one
more place than you think you need.) Or, if you could have post-decimal
digits, F13.2 (or F14.2).

Second, as John S. Lemon and Gene Maguin have said, you're seeing a
real problem with duplicate keys. SPSS is *not* "truncating as
2.33E+08"; SPSS always matches using the full number, as it's stored
internally. To repeat a often-made but important point: what's printed
is a *display form* of the number. SPSS uses a floating-point binary
representation with 53 bits of precision, and compares the numbers in
that form.

Third, maybe the duplicate problem isn't real, but it has nothing to do
with the display format. If you've made a mistake *reading* the serial
numbers, then SPSS's internal form may be wrong, or incomplete. You'll
to set a format that displays the full number, and see whether you've
done that.

-Good luck,
Richard

Nai Li

Re: Match merging data files by a variable with 10 digits

In reply to this post by Richard Ristow

Richard, Art , John, and Gene,
Thanks you all for your comments and advice. Now I have identified the
problem , it is to do with duplicate keys. One duplicate case slipped
somehow in one of the data set which causing a duplicate problem.

Many thanks for your help.

Nai

At 19:31 11/04/2007, Richard Ristow wrote:

>Confirming, with comments, advice you've had from others -
>
>At 04:47 AM 4/11/2007, Nai Li wrote:
>
>>I want to match two dataset by a numerical variable named SerialNo. This
>>variable has more than 10 digits (e.g. 2333008010, 2333008011
>>etc). Therefore, I manually changed data format to F10.2. However, when
>>I run the following Command, I get the error message as below. It seems
>>that all serialNo have been truncate as 2.33E+08, so even the serialno
>>has different last digit, it is still treated as the duplicate case.
>
>First, as Art Kendall said, F10.2 won't display a 10-digit integer; it
>will display at most a 7-digit integer. The '10' includes the full width
>of the field; 3 spaces go for the decimal point and two post-decimal
>digits. Use F10, as Art said - or, I might use COMMA13 myself. (Actually,
>I'd use F11 or COMMA14; I recommend allowing one more place than you think
>you need.) Or, if you could have post-decimal digits, F13.2 (or F14.2).
>
>Second, as John S. Lemon and Gene Maguin have said, you're seeing a real
>problem with duplicate keys. SPSS is *not* "truncating as 2.33E+08"; SPSS
>always matches using the full number, as it's stored internally. To repeat
>a often-made but important point: what's printed is a *display form* of
>the number. SPSS uses a floating-point binary representation with 53 bits
>of precision, and compares the numbers in that form.
>
>Third, maybe the duplicate problem isn't real, but it has nothing to do
>with the display format. If you've made a mistake *reading* the serial
>numbers, then SPSS's internal form may be wrong, or incomplete. You'll to
>set a format that displays the full number, and see whether you've done that.
>
>-Good luck,
> Richard