|
Hi All,
Not sure if this is a tricky question or if I'm not thinking about it properly, I have two datasets a 2006 and a 2007, all of the variables are called n_"something numeric" I need to merge cases from 2006 onto the 2007 dataset The problem is the client hasn't been to clever and has used a different order in this years, however the var labels are still the same, does anyone have any good potential solutions? To match up the correct n_ from 2006 with the correct one for 2007? My initial thoughts are using python to inspect the Var labels and then do something clever, but didn't know if someone had a nicer solution? Cheers for your time Mike |
|
So you want to add cases from two files based on matching up the variable labels instead of the names, if I understand you. The following little Python function does the job. Note that it assumes that the labels are unique. (I can email you the py file if the listserve mangles this too badly. Explanation below.
# match files based on variable labels import spss, spssaux def mergeByLabel(firstfile, secondfile): """Merge cases from two sav files matching up the variables by the variable labels instead of the names. The result is based on the names in firstfile. The labels are assumed to be unique so that the match is unambiguous.""" spssaux.OpenDataFile(secondfile) labeldict2 = dict([(item.VariableLabel, item.VariableName) for item in spssaux.VariableDict()]) spssaux.OpenDataFile(firstfile) labeldict1 = dict([(item.VariableLabel, item.VariableName) for item in spssaux.VariableDict()]) renamein = [] renameout = [] for (label, name) in labeldict1.items(): renamein.append(labeldict2[label]) renameout.append(name) renamesubcmd = "(" + " ".join(renamein) + "=" + " ".join(renameout) + ")" cmd = r"""ADD FILES /FILE=* /FILE='%(secondfile)s' /RENAME = %(renamesubcmd)s.""" % locals() spss.Submit([cmd, "EXECUTE."]) secondfile = "c:/temp/dataset2.sav" firstfile = "c:/temp/dataset1.sav" mergeByLabel(firstfile, secondfile) spss.Submit("save outfile = 'c:/temp/firstPlusSecond.sav'") This code, which would be enclosed in BEGIN/END PROGRAM first builds a dictionary of the second file where the keys are the variable labels and the values are the names. Then it goes through the first file and for each variable it looks up the label in the second file and retrieves the corresponding variable name. Finally it submits an ADD FILES command with a rename that changes the variable names in the second file so that they match up with the first one. The last four lines just illustrate the usage. It wouldn't be hard to generalized this so that variables with unmatched labels are ignored. HTH, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Michael Pearmain Sent: Tuesday, July 24, 2007 6:28 AM To: [hidden email] Subject: [SPSSX-L] Adding Cases Hi All, Not sure if this is a tricky question or if I'm not thinking about it properly, I have two datasets a 2006 and a 2007, all of the variables are called n_"something numeric" I need to merge cases from 2006 onto the 2007 dataset The problem is the client hasn't been to clever and has used a different order in this years, however the var labels are still the same, does anyone have any good potential solutions? To match up the correct n_ from 2006 with the correct one for 2007? My initial thoughts are using python to inspect the Var labels and then do something clever, but didn't know if someone had a nicer solution? Cheers for your time Mike |
|
In reply to this post by Mike P-5
Michael,
I'm confused by this part of your posting. >>The problem is the client hasn't been to clever and has used a different order in this years, however the var labels are still the same, does anyone have any good potential solutions? To match up the correct n_ from 2006 with the correct one for 2007? I think you are saying that, for example, the variable order in 2006 was n_1, n_2, n_3 while the order in 2007 is n_3, n_1, n_2. And, it is true that N_1 in the 2006 dataset is the same data element as n_1 in 2007 and both has the variable label. If all this so, I don't think there is problem. The add files command should sort things out correctly. Gene Maguin |
|
In reply to this post by Mike P-5
Hi Gene,
It was slightly trickier than that, e.g 2006 Var name Var LAb n_1 .. a1 n_2 .. b5 n_4 .. jkl9 and in 2007 Var name Var LAb n_1 .. b5 n_2 .. jk19 n_4 .. hje9 . . n_39 . a1 Does this make more sense? It seems like the only clever way to match n_1 in 2006 with n_39 in 2007 was through the variable label which stayed consistent they had rotated the question order, Mike -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin Sent: 24 July 2007 14:51 To: [hidden email] Subject: Re: Adding Cases Michael, I'm confused by this part of your posting. >>The problem is the client hasn't been to clever and has used a different order in this years, however the var labels are still the same, does anyone have any good potential solutions? To match up the correct n_ from 2006 with the correct one for 2007? I think you are saying that, for example, the variable order in 2006 was n_1, n_2, n_3 while the order in 2007 is n_3, n_1, n_2. And, it is true that N_1 in the 2006 dataset is the same data element as n_1 in 2007 and both has the variable label. If all this so, I don't think there is problem. The add files command should sort things out correctly. Gene Maguin ________________________________________________________________________ This e-mail has been scanned for all viruses by Star. The service is powered by MessageLabs. For more information on a proactive anti-virus service working around the clock, around the globe, visit: http://www.star.net.uk ________________________________________________________________________ |
|
In reply to this post by Peck, Jon
Jon, also something that might be helpful to me. I modified file names but
get an error as seen below. Thoughts? Will ========= Program ======== BEGIN PROGRAM. # match files based on variable labels import spss, spssaux def mergeByLabel(firstfile, secondfile): """Merge cases from two sav files matching up the variables by the variable labels instead of the names. The result is based on the names in firstfile. The labels are assumed to be unique so that the match is unambiguous.""" spssaux.OpenDataFile(secondfile) labeldict2 = dict([(item.VariableLabel, item.VariableName) for item in spssa ux.VariableDict()]) spssaux.OpenDataFile(firstfile) labeldict1 = dict([(item.VariableLabel, item.VariableName) for item in spssa ux.VariableDict()]) renamein = [] renameout = [] for (label, name) in labeldict1.items(): renamein.append(labeldict2[label]) renameout.append(name) renamesubcmd = "(" + " ".join(renamein) + "=" + " ".join(renameout) + ")" cmd = r"""ADD FILES /FILE=* /FILE='%(secondfile)s' /RENAME = %(renamesubcmd)s.""" % locals() spss.Submit([cmd, "EXECUTE."]) secondfile = "D:/Work Area/MergeData_2006.sav" firstfile = "D:/Work Area/MergeData_2007.sav" mergeByLabel(firstfile, secondfile) spss.Submit("save outfile ='D:/Work Area/MergeData_2006+2007.sav'") END PROGRAM. Traceback (most recent call last): File "<string>", line 29, in ? File "<string>", line 18, in mergeByLabel KeyError: 'Trolley' ========= END ====================== -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon Sent: Tuesday, July 24, 2007 9:01 AM To: [hidden email] Subject: Re: Adding Cases So you want to add cases from two files based on matching up the variable labels instead of the names, if I understand you. The following little Python function does the job. Note that it assumes that the labels are unique. (I can email you the py file if the listserve mangles this too badly. Explanation below. # match files based on variable labels import spss, spssaux def mergeByLabel(firstfile, secondfile): """Merge cases from two sav files matching up the variables by the variable labels instead of the names. The result is based on the names in firstfile. The labels are assumed to be unique so that the match is unambiguous.""" spssaux.OpenDataFile(secondfile) labeldict2 = dict([(item.VariableLabel, item.VariableName) for item in spssaux.VariableDict()]) spssaux.OpenDataFile(firstfile) labeldict1 = dict([(item.VariableLabel, item.VariableName) for item in spssaux.VariableDict()]) renamein = [] renameout = [] for (label, name) in labeldict1.items(): renamein.append(labeldict2[label]) renameout.append(name) renamesubcmd = "(" + " ".join(renamein) + "=" + " ".join(renameout) + ")" cmd = r"""ADD FILES /FILE=* /FILE='%(secondfile)s' /RENAME = %(renamesubcmd)s.""" % locals() spss.Submit([cmd, "EXECUTE."]) secondfile = "c:/temp/dataset2.sav" firstfile = "c:/temp/dataset1.sav" mergeByLabel(firstfile, secondfile) spss.Submit("save outfile = 'c:/temp/firstPlusSecond.sav'") This code, which would be enclosed in BEGIN/END PROGRAM first builds a dictionary of the second file where the keys are the variable labels and the values are the names. Then it goes through the first file and for each variable it looks up the label in the second file and retrieves the corresponding variable name. Finally it submits an ADD FILES command with a rename that changes the variable names in the second file so that they match up with the first one. The last four lines just illustrate the usage. It wouldn't be hard to generalized this so that variables with unmatched labels are ignored. HTH, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Michael Pearmain Sent: Tuesday, July 24, 2007 6:28 AM To: [hidden email] Subject: [SPSSX-L] Adding Cases Hi All, Not sure if this is a tricky question or if I'm not thinking about it properly, I have two datasets a 2006 and a 2007, all of the variables are called n_"something numeric" I need to merge cases from 2006 onto the 2007 dataset The problem is the client hasn't been to clever and has used a different order in this years, however the var labels are still the same, does anyone have any good potential solutions? To match up the correct n_ from 2006 with the correct one for 2007? My initial thoughts are using python to inspect the Var labels and then do something clever, but didn't know if someone had a nicer solution? Cheers for your time Mike
Will
Statistical Services ============ info.statman@earthlink.net http://home.earthlink.net/~z_statman/ ============ |
| Free forum by Nabble | Edit this page |
