Creating a numeric ID from name and surnames of cases

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Creating a numeric ID from name and surnames of cases

pmulonge
I would appreciate help with the following problem in SPSS syntax.
I have a dataset where the cases only have name and surname and no numeric
ID
e.g.
DATA LIST / name(a5) surname(a5).
BEGIN DATA
Johny Smith
Rachel Cohen
Ingrid Brown
END DATA.

I also have another dataset where presumably the same names and surname
appear  with the same spelling.
DATA LIST / name(a5) surname(a5).
BEGIN  DATA
JOHN SMITH
PAUL BLUE
RACHEL COHEN
TONY HILLS
INGRID BROWN
END DATA.

My question is then: is there a way to merge these datasets based on a
unique identifier? We have chosen name and surname.
My solution was to concatenate the name and surname into one string var
COMPUTE NOME = CONCAT(NAME,SURNAME).

I then proceed to try and use
RECODE NOME (CONVERT) NOME1.
ALTER TYPE NOME (F10).
COMPUTE NOME1 = NUMBER(NOME,F10)

But each time it just converts it to missing values, with no numbers.
The only one that worked was
AUTORECODE VAR = NOME /INTO NOME1.

The problem with AUTORECODE is that they are numbered alphabetically
according in the dataset, so INGRID BROWN will be a different number in the
first dataset from the number assigned in the second dataset. Therefore,
there is no unique ID given based on spelling of the string.
Is there a way around this?
Thanks

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Pancho Mulongeni,
Research Assistant
Pharmaccess Foundation,
Namibia
Reply | Threaded
Open this post in threaded view
|

Re: Creating a numeric ID from name and surnames of cases

David Marso
Administrator
Several problematic issues come to mind.
*1. the two files appear to have different case!
See UPCASE and LOWER functions to address this.
*2. reading the two strings as A5 will truncate them.  Maybe that's what you want, but I doubt it as far as data integrity is concerned.
OTOH: for matching purposes you will mismatch Johny and JOHN even after fixing case.
*3.  It is unlikely *ALL* names will be unique so I wouldn't count on just first and last name as sufficient keys *DO YOU HAVE OTHER* variables which appear in both files.
*4.  Why concat?  MATCH allows many key variables to be used concurrently.
...
*5. CONVERT on RECODE only applies to "*NUMERIC*" data values
RTFM: "CONVERT recodes the string representation of numbers to their numeric representation."
..... blah blah blah...
"If RECODE encounters a value that cannot be converted, it scans the remaining value
specifications. If there is no specific recode specification for that value, the target variable
will be system-missing for that case."
6. AUTORECODE has a SAVE TEMPLATE and APPLY TEMPLATE subcommand.  Maybe that will help (don't know, my old reliable version doesn't support it)
So?
------
DATA LIST LIST/ name(a10) surname(a10) othrdat1 (F8).
BEGIN DATA
Johny Smith 12345678
Rachel Cohen 23456789
Ingrid Brown 23451672
END DATA.
LIST.
STRING Merge1 Merge2 (A10).
COMPUTE Merge1=UPCASE(Name).
COMPUTE Merge2=UPCASE(Surname).
SORT CASES BY Merge1 Merge2.
SAVE OUTFILE "temp1.sav".
DATA LIST LIST/ name(a10) surname(a10) othrdat2 (F8).
BEGIN  DATA
JOHN SMITH 344353
PAUL BLUE 345256
RACHEL COHEN 234353
TONY HILLS 262737
INGRID BROWN 123344
END DATA.
STRING Merge1 Merge2 (A10).
COMPUTE Merge1=UPCASE(Name).
COMPUTE Merge2=UPCASE(Surname).
SORT CASES BY Merge1 Merge2.

MATCH FILES / FILE "temp1.sav" /IN=Flag1/ FILE * /IN=Flag2/ BY Merge1 Merge2 /MAP.
LIST.



NAME       SURNAME    OTHRDAT1 MERGE1     MERGE2     OTHRDAT2 FLAG1 FLAG2

Ingrid     Brown      23451672 INGRID     BROWN        123344   1     1
JOHN       SMITH             . JOHN       SMITH        344353   0     1
Johny      Smith      12345678 JOHNY      SMITH             .   1     0
PAUL       BLUE              . PAUL       BLUE         345256   0     1
Rachel     Cohen      23456789 RACHEL     COHEN        234353   1     1
TONY       HILLS             . TONY       HILLS        262737   0     1


Number of cases read:  6    Number of cases listed:  6
 
pmulonge wrote
I would appreciate help with the following problem in SPSS syntax.
I have a dataset where the cases only have name and surname and no numeric
ID
e.g.
DATA LIST / name(a5) surname(a5).
BEGIN DATA
Johny Smith
Rachel Cohen
Ingrid Brown
END DATA.

I also have another dataset where presumably the same names and surname
appear  with the same spelling.
DATA LIST / name(a5) surname(a5).
BEGIN  DATA
JOHN SMITH
PAUL BLUE
RACHEL COHEN
TONY HILLS
INGRID BROWN
END DATA.

My question is then: is there a way to merge these datasets based on a
unique identifier? We have chosen name and surname.
My solution was to concatenate the name and surname into one string var
COMPUTE NOME = CONCAT(NAME,SURNAME).

I then proceed to try and use
RECODE NOME (CONVERT) NOME1.
ALTER TYPE NOME (F10).
COMPUTE NOME1 = NUMBER(NOME,F10)

But each time it just converts it to missing values, with no numbers.
The only one that worked was
AUTORECODE VAR = NOME /INTO NOME1.

The problem with AUTORECODE is that they are numbered alphabetically
according in the dataset, so INGRID BROWN will be a different number in the
first dataset from the number assigned in the second dataset. Therefore,
there is no unique ID given based on spelling of the string.
Is there a way around this?
Thanks

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"