Adding Cases

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Adding Cases

Mike P-5
Hi All,

Not sure if this is a tricky question or if I'm not thinking about it
properly,

I have two datasets a 2006 and a 2007, all of the variables are called
n_"something numeric"

I need to merge cases from 2006 onto the 2007 dataset

The problem is the client hasn't been to clever and has used a different
order in this years, however the var labels are still the same, does
anyone have any good potential solutions? To match up the correct n_
from 2006 with the correct one for 2007?

My initial thoughts are using python to inspect the Var labels and then
do something clever, but didn't know if someone had a nicer solution?

Cheers for your time
Mike
Reply | Threaded
Open this post in threaded view
|

Re: Adding Cases

Peck, Jon
So you want to add cases from two files based on matching up the variable labels instead of the names, if I understand you.  The following little Python function does the job.  Note that it assumes that the labels are unique.  (I can email you the py file if the listserve mangles this too badly.  Explanation below.


# match files based on variable labels
import spss, spssaux
def mergeByLabel(firstfile, secondfile):
    """Merge cases from two sav files matching up the variables by the variable labels
    instead of the names.  The result is based on the names in firstfile.

    The labels are assumed to be unique so that the match is unambiguous."""

    spssaux.OpenDataFile(secondfile)

    labeldict2 = dict([(item.VariableLabel, item.VariableName) for item in spssaux.VariableDict()])
    spssaux.OpenDataFile(firstfile)
    labeldict1 = dict([(item.VariableLabel, item.VariableName) for item in spssaux.VariableDict()])

    renamein = []
    renameout = []
    for (label, name) in labeldict1.items():
        renamein.append(labeldict2[label])
        renameout.append(name)
    renamesubcmd = "(" + " ".join(renamein) + "=" + " ".join(renameout) + ")"

    cmd = r"""ADD FILES /FILE=*
    /FILE='%(secondfile)s'
    /RENAME =  %(renamesubcmd)s.""" % locals()
    spss.Submit([cmd, "EXECUTE."])

secondfile = "c:/temp/dataset2.sav"
firstfile = "c:/temp/dataset1.sav"
mergeByLabel(firstfile, secondfile)
spss.Submit("save outfile = 'c:/temp/firstPlusSecond.sav'")

This code, which would be enclosed in BEGIN/END PROGRAM first builds a dictionary of the second file where the keys are the variable labels and the values are the names.

Then it goes through the first file and for each variable it looks up the label in the second file and retrieves the corresponding variable name.

Finally it submits an ADD FILES command with a rename that changes the variable names in the second file so that they match up with the first one.

The last four lines just illustrate the usage.

It wouldn't be hard to generalized this so that variables with unmatched labels are ignored.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Michael Pearmain
Sent: Tuesday, July 24, 2007 6:28 AM
To: [hidden email]
Subject: [SPSSX-L] Adding Cases

Hi All,

Not sure if this is a tricky question or if I'm not thinking about it
properly,

I have two datasets a 2006 and a 2007, all of the variables are called
n_"something numeric"

I need to merge cases from 2006 onto the 2007 dataset

The problem is the client hasn't been to clever and has used a different
order in this years, however the var labels are still the same, does
anyone have any good potential solutions? To match up the correct n_
from 2006 with the correct one for 2007?

My initial thoughts are using python to inspect the Var labels and then
do something clever, but didn't know if someone had a nicer solution?

Cheers for your time
Mike
Reply | Threaded
Open this post in threaded view
|

Re: Adding Cases

Maguin, Eugene
In reply to this post by Mike P-5
Michael,

I'm confused by this part of your posting.

>>The problem is the client hasn't been to clever and has used a different
order in this years, however the var labels are still the same, does
anyone have any good potential solutions? To match up the correct n_
from 2006 with the correct one for 2007?

I think you are saying that, for example, the variable order in 2006 was
n_1, n_2, n_3 while the order in 2007 is n_3, n_1, n_2. And, it is true that
N_1 in the 2006 dataset is the same data element as n_1 in 2007 and both has
the variable label.

If all this so, I don't think there is problem. The add files command should
sort things out correctly.

Gene Maguin
Reply | Threaded
Open this post in threaded view
|

Re: Adding Cases

Mike P-5
In reply to this post by Mike P-5
Hi Gene,
It was slightly trickier than that,

e.g 2006

Var name        Var LAb
n_1 ..  a1
n_2 ..  b5
n_4 ..  jkl9

and in 2007
Var name        Var LAb
n_1 ..  b5
n_2 ..  jk19
n_4 ..  hje9
.
.
n_39 .  a1

Does this make more sense?

It seems like the only clever way to match n_1 in 2006 with n_39 in 2007
was through the variable label which stayed consistent they had rotated
the question order,

Mike



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Gene Maguin
Sent: 24 July 2007 14:51
To: [hidden email]
Subject: Re: Adding Cases

Michael,

I'm confused by this part of your posting.

>>The problem is the client hasn't been to clever and has used a
different
order in this years, however the var labels are still the same, does
anyone have any good potential solutions? To match up the correct n_
from 2006 with the correct one for 2007?

I think you are saying that, for example, the variable order in 2006 was
n_1, n_2, n_3 while the order in 2007 is n_3, n_1, n_2. And, it is true
that
N_1 in the 2006 dataset is the same data element as n_1 in 2007 and both
has
the variable label.

If all this so, I don't think there is problem. The add files command
should
sort things out correctly.

Gene Maguin

________________________________________________________________________
This e-mail has been scanned for all viruses by Star. The
service is powered by MessageLabs. For more information on a proactive
anti-virus service working around the clock, around the globe, visit:
http://www.star.net.uk
________________________________________________________________________
Reply | Threaded
Open this post in threaded view
|

Re: Adding Cases

zstatman
In reply to this post by Peck, Jon
Jon, also something that might be helpful to me. I modified file names but
get an error as seen below.

Thoughts?
Will

========= Program ========
BEGIN PROGRAM.
# match files based on variable labels
import spss, spssaux
def mergeByLabel(firstfile, secondfile):
    """Merge cases from two sav files matching up the variables by the
variable
   labels
    instead of the names.  The result is based on the names in firstfile.

    The labels are assumed to be unique so that the match is unambiguous."""

    spssaux.OpenDataFile(secondfile)

    labeldict2 = dict([(item.VariableLabel, item.VariableName) for item in
spssa
   ux.VariableDict()])
    spssaux.OpenDataFile(firstfile)
    labeldict1 = dict([(item.VariableLabel, item.VariableName) for item in
spssa
   ux.VariableDict()])

    renamein = []
    renameout = []
    for (label, name) in labeldict1.items():
        renamein.append(labeldict2[label])
        renameout.append(name)
    renamesubcmd = "(" + " ".join(renamein) + "=" + " ".join(renameout) +
")"

    cmd = r"""ADD FILES /FILE=*
    /FILE='%(secondfile)s'
    /RENAME =  %(renamesubcmd)s.""" % locals()
    spss.Submit([cmd, "EXECUTE."])

secondfile = "D:/Work Area/MergeData_2006.sav"
firstfile = "D:/Work Area/MergeData_2007.sav"
mergeByLabel(firstfile, secondfile)
spss.Submit("save outfile ='D:/Work Area/MergeData_2006+2007.sav'")
END PROGRAM.

Traceback (most recent call last):
  File "<string>", line 29, in ?
  File "<string>", line 18, in mergeByLabel
KeyError: 'Trolley'

========= END ======================


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Peck, Jon
Sent: Tuesday, July 24, 2007 9:01 AM
To: [hidden email]
Subject: Re: Adding Cases

So you want to add cases from two files based on matching up the variable
labels instead of the names, if I understand you.  The following little
Python function does the job.  Note that it assumes that the labels are
unique.  (I can email you the py file if the listserve mangles this too
badly.  Explanation below.


# match files based on variable labels
import spss, spssaux
def mergeByLabel(firstfile, secondfile):
    """Merge cases from two sav files matching up the variables by the
variable labels
    instead of the names.  The result is based on the names in firstfile.

    The labels are assumed to be unique so that the match is unambiguous."""

    spssaux.OpenDataFile(secondfile)

    labeldict2 = dict([(item.VariableLabel, item.VariableName) for item in
spssaux.VariableDict()])
    spssaux.OpenDataFile(firstfile)
    labeldict1 = dict([(item.VariableLabel, item.VariableName) for item in
spssaux.VariableDict()])

    renamein = []
    renameout = []
    for (label, name) in labeldict1.items():
        renamein.append(labeldict2[label])
        renameout.append(name)
    renamesubcmd = "(" + " ".join(renamein) + "=" + " ".join(renameout) +
")"

    cmd = r"""ADD FILES /FILE=*
    /FILE='%(secondfile)s'
    /RENAME =  %(renamesubcmd)s.""" % locals()
    spss.Submit([cmd, "EXECUTE."])

secondfile = "c:/temp/dataset2.sav"
firstfile = "c:/temp/dataset1.sav"
mergeByLabel(firstfile, secondfile)
spss.Submit("save outfile = 'c:/temp/firstPlusSecond.sav'")

This code, which would be enclosed in BEGIN/END PROGRAM first builds a
dictionary of the second file where the keys are the variable labels and the
values are the names.

Then it goes through the first file and for each variable it looks up the
label in the second file and retrieves the corresponding variable name.

Finally it submits an ADD FILES command with a rename that changes the
variable names in the second file so that they match up with the first one.

The last four lines just illustrate the usage.

It wouldn't be hard to generalized this so that variables with unmatched
labels are ignored.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Michael Pearmain
Sent: Tuesday, July 24, 2007 6:28 AM
To: [hidden email]
Subject: [SPSSX-L] Adding Cases

Hi All,

Not sure if this is a tricky question or if I'm not thinking about it
properly,

I have two datasets a 2006 and a 2007, all of the variables are called
n_"something numeric"

I need to merge cases from 2006 onto the 2007 dataset

The problem is the client hasn't been to clever and has used a different
order in this years, however the var labels are still the same, does anyone
have any good potential solutions? To match up the correct n_ from 2006 with
the correct one for 2007?

My initial thoughts are using python to inspect the Var labels and then do
something clever, but didn't know if someone had a nicer solution?

Cheers for your time
Mike
Will
Statistical Services
 
============
info.statman@earthlink.net
http://home.earthlink.net/~z_statman/
============