|
Hello Listers,
I have the following variables but no unique id, and I want to match the two files and create a new file with one additional variable. Here is the variable information in both datasets A and B, along with some other information in dataset B.

1) Mother's Last Name (A & B)
2) Mother's First Name (A & B)
3) Mother's DOB (A & B)
4) Child's Last Name (A & B)
5) Child's First Name (A & B)
6) Child's DOB (A & B)
7) CERTNUM (B)

I want the new datasets matched by mother's last name, first name, mother's dob, child's last name, first name, and child's dob. Any ideas?

KH.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
|
We need a better description of your problem.
Make a copy of your files as they stand for backup.

For both files, determine how frequently the 6 variables constitute unique identification of cases:
click <data>
click <find duplicate cases>
enter the 6 variables as the unique identifier set
paste the syntax
run the syntax

Is it possible you could accomplish what you want by aggregating each file so that there is a summary record for each unique combination of the six variables?
Art Kendall
Social Research Consultants |
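The uniqueness check described above can also be sketched outside SPSS. The following is a minimal Python illustration, not the Identify Duplicate Cases procedure itself: it counts how often each combination of the six identifying values occurs and reports the combinations seen more than once. The field names and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical field names standing in for the six identifying variables.
KEYS = ("mother_last", "mother_first", "mother_dob",
        "child_last", "child_first", "child_dob")


def row(*values):
    """Build one record from six values given in KEYS order."""
    return dict(zip(KEYS, values))


def duplicate_keys(records, keys=KEYS):
    """Return {key_tuple: count} for key combinations that occur more than once."""
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows for illustration only; the first two are deliberate duplicates.
file_a = [
    row("Doe", "Ann", "1980-07-04", "Doe", "Sam", "2007-01-02"),
    row("Doe", "Ann", "1980-07-04", "Doe", "Sam", "2007-01-02"),
    row("Roe", "Bea", "1979-03-15", "Roe", "Max", "2006-05-06"),
]

dups = duplicate_keys(file_a)
print(dups)
```

An empty result for both files would suggest that the six variables together can serve as a match key.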
|
I still do not understand what the problem is.
MATCH FILES can use several variables that all have to be the same to create a match. E.g.,

* Both files must already be sorted by the BY variables.
match files
 /file=seta /in=ina
 /file=setb /in=inb
 /by MLN MFN MDOB CLN CFN CDOB.
select if ina eq 1 and inb eq 1.
. . .

Why a _probabilistic_ match?
Are dates only approximate, e.g., one file has July 4, 1980 for the mother's birthdate and the other has July 5, 1980?
Are there hours, minutes, etc. in the date on one file and not the other?
Are there variations in the spelling of names for a particular mother or child? Or sometimes a nickname?

Can you create a small example with two files that shows what the problem is?

Khaleel Hussaini wrote:
> Thank you for your response. There are no duplicate cases. The problem
> essentially is how does one match files without unique ids? If there
> are unique ids in more than one database that correspond to an
> individual X, and each file stores information about the individual X,
> then matching the data files based on unique ids is easy. However, the
> problem arises when the information available is either a string or a
> date variable (or both) and one has to match X to X using criteria
> that will result in a probabilistic match. In my case the files for
> individual X would have to agree not only on the last name, first name,
> and dob, but also on the child's last name, child's first name, and
> date of birth. The purpose of doing this is that I have mother and
> child records in two databases, and one database has more information
> than the other. I want to build a new dataset that contains variables
> from both datasets, essentially matching and merging two datasets into
> one, but only for those cases that are common to both datasets. Best,
> KH
Art Kendall
Social Research Consultants |
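To make the logic of the exact match concrete, here is a minimal Python sketch of the same idea: keep only the records whose six key values agree in both files, and carry the extra variables across. It is an illustration rather than a substitute for MATCH FILES, it assumes each key combination occurs at most once in the second file, and the sample values (including the CERTNUM string) are invented.

```python
# Key variable names follow the abbreviations used in the syntax above.
KEYS = ("MLN", "MFN", "MDOB", "CLN", "CFN", "CDOB")


def exact_match(set_a, set_b, keys=KEYS):
    """Inner join: keep records whose key values agree exactly in both files."""
    # Assumes each key combination is unique within set_b.
    index_b = {tuple(rec[k] for k in keys): rec for rec in set_b}
    merged = []
    for rec_a in set_a:
        rec_b = index_b.get(tuple(rec_a[k] for k in keys))
        if rec_b is not None:
            # Variables from both files end up in one combined record.
            merged.append({**rec_a, **rec_b})
    return merged


# Invented sample data for illustration only.
seta = [
    {"MLN": "Doe", "MFN": "Ann", "MDOB": "1980-07-04",
     "CLN": "Doe", "CFN": "Sam", "CDOB": "2007-01-02"},
    {"MLN": "Roe", "MFN": "Bea", "MDOB": "1979-03-15",
     "CLN": "Roe", "CFN": "Max", "CDOB": "2006-05-06"},
]
setb = [
    {"MLN": "Doe", "MFN": "Ann", "MDOB": "1980-07-04",
     "CLN": "Doe", "CFN": "Sam", "CDOB": "2007-01-02", "CERTNUM": "B-001"},
]

merged = exact_match(seta, setb)
print(merged)
```

A single differing character in any of the six values is enough to drop a pair from the result, which is why spelling variants and date errors matter for this kind of match.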
|
At 08:57 AM 3/18/2008, Art Kendall wrote:
>Why a _probabilistic_ match?

"Fuzzy logic" models are, I suppose, based on the idea that some of the data may be wrong, and if two records match on, say, 5 out of 6 criteria (or, "a sufficiently large number of variables"), then the records must be associated with the same case, and the value for the sixth variable must be in error. I would suggest exercising considerable caution in this regard. For example, 5/6 might produce too many matches. Or it might produce a false match. All such "probabilistic" matches should be tagged for future reference so that they can easily be removed if challenged.

My own experience in this regard was with trying to find a combination of variables to serve as the key field (in the aggregate.) Strings of variables that I thought surely would be unique turned out not to be unique (i.e., identified more than one case.) But of course that's a different scenario.

Take care!

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814 |
|
I originally suggested using aggregate or "find duplicate cases" to see
if the 6 variables produced duplicates. The OP said they did not.
Art Kendall
Social Research Consultants |
|
There is a literature on matching with incomplete information using deterministic and/or probabilistic approaches outside of the SPSS Listserve. Art says there are no duplicates so this may be overkill here - it may be sufficient to run MATCH FILES using all the ID variables you have in the two files.

However, if there is missing or inconsistent data among these variables then you may need to consider one of the more sophisticated methods. CDC provides such a matching program for free for such a purpose.

Dennis Deck, PhD
RMC Research Corporation
111 SW Columbia Street, Suite 1200
Portland, Oregon 97201-5843
voice: 503-223-8248 x715
voice: 800-788-1887 x715
fax: 503-223-8248
[hidden email] |
|
Hi,
Other free probabilistic linkage programs include:
--Febrl (Python based, now has a GUI)
--The Link King (SAS based).

I am currently using Febrl. It's well-documented, and because it's open source you're free (and even encouraged) to change or improve the program. You can find it on sourceforge.net.

Another option would be to use "n-1 deterministic matching": allow one of the matching variables to be a non-match.

Cheers!!
Albert-Jan |
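The "n-1 deterministic matching" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Febrl or The Link King: it compares every record in one file with every record in the other, accepts a pair when all six key values agree or all but one agree, and tags each accepted pair so that the weaker matches can be reviewed or removed later, as cautioned earlier in the thread. The variable abbreviations come from the MATCH FILES example; the sample rows are invented.

```python
KEYS = ("MLN", "MFN", "MDOB", "CLN", "CFN", "CDOB")


def rec(*values):
    """Build one record from six values given in KEYS order."""
    return dict(zip(KEYS, values))


def agreement(rec_a, rec_b, keys=KEYS):
    """Count how many key variables have identical values in both records."""
    return sum(rec_a[k] == rec_b[k] for k in keys)


def n_minus_1_match(set_a, set_b, keys=KEYS):
    """Pair records that agree on all keys ('exact') or on all but one ('n-1')."""
    pairs = []
    for rec_a in set_a:
        for rec_b in set_b:  # all-pairs comparison: fine for a sketch, slow for big files
            score = agreement(rec_a, rec_b, keys)
            if score >= len(keys) - 1:
                tag = "exact" if score == len(keys) else "n-1"
                pairs.append((rec_a, rec_b, tag))
    return pairs


# Invented sample data: the first pair differs only in the child's first name.
set_a = [
    rec("Doe", "Ann", "1980-07-04", "Doe", "Samuel", "2007-01-02"),
    rec("Roe", "Bea", "1979-03-15", "Roe", "Max", "2006-05-06"),
]
set_b = [
    rec("Doe", "Ann", "1980-07-04", "Doe", "Sam", "2007-01-02"),
    rec("Roe", "Bea", "1979-03-15", "Roe", "Max", "2006-05-06"),
]

pairs = n_minus_1_match(set_a, set_b)
tags = [tag for _, _, tag in pairs]
print(tags)
```

Because one record can pair with several candidates under the relaxed rule, any "n-1" pair is best treated as a candidate for manual review rather than a confirmed link.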
|
How many cases do you have in each file?
Might you also have gender for the child?

I have a client coming in a few minutes and cannot fully develop and test this idea.

A start might be to write two new files with just the 6 variables, a flag variable to indicate which file the record came from, and the sequence number in its original file. Use this to test and debug a file of syntax that you can apply to the whole data set.

Then something like this:

temporary.
select if mln ne cln.
list.

Generate a set of specific data patch statements:

* OLDMLN and OLDCLN must be declared with STRING beforehand, as wide as MLN and CLN.
DO IF fileflag eq 1 and ANY(caseseq, 101, 222, 333, 1234).
COMPUTE OLDMLN = MLN.
COMPUTE MLN = CLN.
ELSE IF fileflag eq 1 and ANY(caseseq, 121, 212, 335, 2341).
COMPUTE OLDCLN = CLN.
COMPUTE CLN = MLN.
END IF.

Do the same with fileflag 2.

AUTORECODE VARIABLES= MLN CLN /INTO N_MLN N_CLN
 /BLANK= MISSING
 /SAVE TEMPLATE='filespec1'
 /GROUP
 /PRINT.

Using judgment, decide which last names are possibly the same by looking back at the other variables and relevant cases in the input.

recode n_MLN (17, 23=17) (24,28=28) (else=copy) into N2_MLN.
recode n_CLN (98,99=99)(101,222=101)(else=copy) into N2_CLN.

AUTORECODE VARIABLES= MFN CFN /INTO N_MFN N_CFN
 /BLANK= MISSING
 /SAVE TEMPLATE='filespec2'
 /PRINT.

Using judgment, decide which mother (child) first names are possibly the same by looking back at the other variables and relevant cases in the input.

recode n_mfn ... into n2_mfn.
recode n_cfn ... into n2_cfn.

Then open the original files and patch them, cannibalizing the patches above.
Autorecode them using the templates in filespec1 and filespec2.
Then match the files using n2_MFN n2_MLN etc.
Examine the results, go back to the beginning, and tweak the process.

Khaleel Hussaini wrote:
> Thanks for all your responses. The dataset example is given below. Now
> if we were to do an exact match this would not be matched in SPSS, I
> presume. How would we go about matching these datasets, as the only
> conditions that are equivalent are MDOB and CDOB? There are cases in
> the databases where the date of birth for either mother and/or child
> is incorrect. I know that the data in dataset 2 is most reliable. How
> do I then match the files? I am aware of the CDC software; however, due
> to data integrity and security issues I am unable to download it on my
> work computer.
> KH.
>
> Dataset1
>
> MFN     MLN      MDOB       CLN     CFN     CDOB
> Arroyo  Letici   8/13/1975  Arroyo  George  11/29/2007
>
> Dataset2
>
> MFN     MLN      MDOB       CLN     CFN     CDOB
> Arroya  Leticia  8/13/1975  Arroyo  Jorge   11/29/2007
>
> MLN = Mother's Last Name
> MFN = Mother's First Name
> MDOB = Mother's Date of Birth
> CLN = Child's last name
> CFN = Child's first name
> CDOB = Child's birthdate
>
> I th
Art Kendall
Social Research Consultants |
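The judgment-based recoding described above amounts to mapping spelling variants of a name onto one agreed value and then matching on the cleaned values. A minimal Python sketch of that idea follows; it is not a translation of the AUTORECODE and RECODE steps. The variant table is hypothetical and stands for decisions a reviewer would make after inspecting the records, and the sample rows reuse the quoted example on the assumption that Arroyo/Arroya is the surname and Letici/Leticia the first name.

```python
KEYS = ("MLN", "MFN", "MDOB", "CLN", "CFN", "CDOB")

# Hypothetical hand-built table: variant spelling -> agreed spelling.
# In practice a person fills this in after reviewing the listed cases.
CANONICAL = {
    "ARROYA": "ARROYO",
    "LETICI": "LETICIA",
    "GEORGE": "JORGE",
}


def normalize(value):
    """Trim and upper-case a value, then replace known variants."""
    text = value.strip().upper()
    return CANONICAL.get(text, text)


def match_key(record, keys=KEYS):
    """Cleaned key used only for matching; the original values stay untouched."""
    return tuple(normalize(record[k]) for k in keys)


dataset1 = [
    {"MLN": "Arroyo", "MFN": "Letici", "MDOB": "8/13/1975",
     "CLN": "Arroyo", "CFN": "George", "CDOB": "11/29/2007"},
]
dataset2 = [
    {"MLN": "Arroya", "MFN": "Leticia", "MDOB": "8/13/1975",
     "CLN": "Arroyo", "CFN": "Jorge", "CDOB": "11/29/2007"},
]

index2 = {match_key(r): r for r in dataset2}
matched = [(r, index2[match_key(r)]) for r in dataset1 if match_key(r) in index2]
print(len(matched))
```

Keeping the original spellings alongside the cleaned key mirrors the OLDMLN and OLDCLN variables in the syntax: a patch that later proves wrong can be undone.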
|
This is a repost because the listserv said it had already been posted.
However, it did not come back to my mail. The note said to add some sentences explaining why some might receive 2 copies. Another sentence should make this sufficiently different from previous posts.
Art Kendall
Social Research Consultants |
|
A list of publications consists of more than 3000 articles presented in html format. The first 2 entries are as follows:

1. Abbott, A. A. (2003). A confirmatory factor analysis of the Professional Opinion Scale: A values assessment instrument. Research on Social Work Practice, 13(5), 641-666.
2. Abbott, G. N., White, F. A., & Charles, M. A. (2005). Linking values and organizational commitment: A correlational and experimental investigation in two organizations. Journal of Occupational and Organizational Psychology, 78, 531-551.

I want to create an SPSS database for such data. My variables would be author, year of publication, title, and title of journal. Do you know how to do it using SPSS syntax?

Thank you.

Johnny |
|
Johnny,
Try this. For an explanation of the functions and of the behavior of scratch variables (those that start with #), see the SPSS Help documentation.

The assumptions in this code are:
1. The name of the variable containing the list of publications is A.
2. Each value starts with the author name right away (i.e., no record numbers preceding the author name, meaning the 1 preceding Abbott should not be included in variable A).

* Author is everything before the first "("; year is the 4 digits after it.
STRING AUTHOR (A200).
COMPUTE AUTHOR = SUBSTR(A,1,CHAR.INDEX(A,'(')-1).
COMPUTE YEAR = NUMBER(SUBSTR(A,CHAR.INDEX(A,'(')+1,4),F4.0).
EXE.

STRING TITLE (A300).
STRING JOURNAL1 (A300).
STRING JOURNAL2 (A300).
STRING #AAA (A300).
STRING #AA (A300).
* #AA holds the text after "(yyyy)."; LTRIM drops the leading blank.
COMPUTE #AA = SUBSTR(A,CHAR.INDEX(A,'(')+7).
COMPUTE TITLE = LTRIM(SUBSTR(#AA,1,CHAR.INDEX(#AA,'.')-1)).
* JOURNAL1 is journal, volume and pages; JOURNAL2 is the journal name only.
COMPUTE JOURNAL1 = SUBSTR(#AA,CHAR.INDEX(#AA,'.')+1).
COMPUTE #AAA = SUBSTR(#AA,CHAR.INDEX(#AA,'.')+1).
COMPUTE JOURNAL2 = LTRIM(SUBSTR(#AAA,1,CHAR.INDEX(#AAA,',')-1)).
EXE.

Note that the title is taken to end at the first period after the year, so a title that itself contains a period will be cut short.

Hope this helps.

Florio |
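For comparison, the same split can be sketched with a regular expression in Python. This is an alternative illustration, not a translation of the syntax above: unlike that syntax it tolerates a leading record number such as "1. ", and it shares the assumption that the title contains no period followed by a space. The function name parse_citation is made up for this example.

```python
import re

# Pattern for entries shaped like: Authors (YYYY). Title. Journal, volume, pages.
CITATION = re.compile(
    r"^(?:\d+\.\s+)?"              # optional record number, e.g. "1. "
    r"(?P<author>.+?)\s*"          # authors, up to the year in parentheses
    r"\((?P<year>\d{4})\)\.\s*"    # "(2003)."
    r"(?P<title>.+?)\.\s+"         # title, up to the first ". " after the year
    r"(?P<journal>[^,]+),"         # journal name, up to the first comma
)


def parse_citation(text):
    """Return author, year, title and journal as a dict, or None if no match."""
    found = CITATION.match(text.strip())
    if found is None:
        return None
    fields = found.groupdict()
    fields["year"] = int(fields["year"])
    return fields


entry1 = ("1. Abbott, A. A. (2003). A confirmatory factor analysis of the "
          "Professional Opinion Scale: A values assessment instrument. "
          "Research on Social Work Practice, 13(5), 641-666.")
entry2 = ("2. Abbott, G. N., White, F. A., & Charles, M. A. (2005). Linking "
          "values and organizational commitment: A correlational and "
          "experimental investigation in two organizations. Journal of "
          "Occupational and Organizational Psychology, 78, 531-551.")

parsed1 = parse_citation(entry1)
parsed2 = parse_citation(entry2)
print(parsed1)
print(parsed2)
```

Entries that return None are worth listing separately, since with more than 3000 records some are likely to deviate from this layout.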
