Fw: Merging Files With Duplicates - SOLVED!

Nora Douglas
Hello all,

I sent this out the other day, but I think I did something wrong and it didn't get to the listserv.  I'm sending it again; I apologize if anyone gets it more than once.

Thanks,

Nora

-----Forwarded Message-----

>From: Nora Douglas <[hidden email]>
>Sent: Feb 26, 2008 10:33 AM
>To: [hidden email], [hidden email]
>Subject: Merging Files With Duplicates - SOLVED!
>
>Hi Richard,
>
>Thank you so much for the help.  You had two out of the three files described correctly (B and C).  The CASE file only had CASE_ID and CLNT in it.  I didn't have any file with all three variables.
>
>I did manage to get the files merged.  A colleague of mine from another agency came over and helped me - she has vastly more experience at merging files from multiple agencies.  There were three files originally.
>
>1) CLNT, CASE_ID, STAGE_ID (there were multiple CASE_IDs and STAGE_IDs)
>2) CLNT and other variables of interest
>3) VICTIM_ID, CASE_ID, and other variables of interest
>
>Files 1 and 3 both had duplicate CASE_IDs.
>
>We took file 1 and aggregated it to keep only CLNT and CASE_ID and then removed the duplicates (by deleting them entirely).  Then we merged file 1 into file 3.  That gave us VICTIM_ID, CASE_ID and CLNT all in one file.  From there we merged file 3 into the existing file.  Then we had everything in one place!  There were 13 duplicates that I will have to go into the file and manually identify.
>
>Thank you again everyone for all your help!!!
>
>Nora
>
>Date:         Thu, 21 Feb 2008 18:31:45 -0500
>Reply-To:     Richard Ristow <[hidden email]>
>Sender:       "SPSSX(r) Discussion" <[hidden email]>
>From:         Richard Ristow <[hidden email]>
>Subject:      Re: Merging Files with Duplicates
>Comments: To: Nora Douglas <[hidden email]>
>Comments: cc: Bob Schacht <[hidden email]>,
>          Gene Maguin <[hidden email]>
>
>At 10:49 AM 2/21/2008, Nora Douglas wrote (in response to Gene Maguin):
>
>>No, CASE_ID cannot have multiple CLNT or VICTIM_IDs.  [But] a CLNT
>>or VICTIM_ID can have multiple CASE_IDs and the CASE_ID can be a duplicate.
>
>That could make things simple: for each case (CASE_ID), you have one
>CLNT and one VICTIM_ID. See if this gets you any closer. If I'm
>missing the point, post again, with some examples of your data, if you can.
>
>Anyway, at 06:17 PM 2/20/2008, you wrote (I'm repeating):
>
>>I now have two files.  The first file has CLNT (an id variable for
>>clients in Agency A) and CASE_ID.1 through CASE_ID.22.  The second
>>file has VICTIM_ID (an id variable for clients in the Agency B) and
>>CASE_ID.1 through CASE_ID.22.  CLNT and VICTIM_ID are not the
>>same.  Only CASE_ID will match between the two files.  I had to
>>restructure the file to get rid of duplicates in the CASE_ID
>>variable.  That led to the multiple CASE_ID variables.
>
>If I've got this right, you started with three files, something like this:
>
>A. File CASES (I'll call it that) with variables
>CASE_ID CLNT VICTIM_ID <other variables about cases>
>Values of each of the variables CASE_ID, CLNT, and VICTIM_ID can occur
>multiple times in the file.
>
>B. File AGENCY_A with variables
>CLNT <other variables about clients of Agency A>
>Values of variable CLNT cannot be duplicated in this file. The file
>is sorted by CLNT. (Or, sort it this way, if it isn't.)
>
>C. File AGENCY_B with variables
>VICTIM_ID <other variables about clients of Agency B>
>Values of variable VICTIM_ID cannot be duplicated in this file. The
>file is sorted by VICTIM_ID. (Or, sort it this way, if it isn't.)
>
>Then, you want the variables about Agency A clients and Agency B clients
>merged into the case records.
>
>If so, something like this should work:
>1.) Load file CASES as the active file or (SPSS 14-16) active dataset
>2.) The following code (untested), or something similar:
>
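>* Attach the Agency A client variables to each case record, matching on CLNT.
>SORT CASES BY CLNT.
>MATCH FILES
>    /FILE=*
>    /TABLE=AGENCY_A
>    /BY CLNT.
>
>* Attach the Agency B client variables, matching on VICTIM_ID.
>SORT CASES BY VICTIM_ID.
>MATCH FILES
>    /FILE=*
>    /TABLE=AGENCY_B
>    /BY VICTIM_ID.
>EXECUTE.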
>
>Good luck! And let us know anything better we can do.
>Richard
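
For anyone following along, the merge described in the forwarded message (aggregate file 1 down to CLNT and CASE_ID, drop the duplicated CASE_IDs, attach CLNT to file 3, then bring in file 2) might look roughly like this in syntax. This is an untested sketch, not the commands that were actually run. The dataset names FILE1, FILE2 and FILE3 and the helper variables N_ROWS, IS_FIRST and IS_LAST are made up for illustration, and the last step assumes that "the existing file" means file 2, matched on CLNT.

```spss
* Untested sketch; FILE1, FILE2 and FILE3 are assumed dataset names.

* Sort the two files that will be matched against by their keys.
DATASET ACTIVATE FILE3.
SORT CASES BY CASE_ID.
DATASET ACTIVATE FILE2.
SORT CASES BY CLNT.

* Reduce file 1 to one row per CASE_ID and CLNT combination.
DATASET ACTIVATE FILE1.
AGGREGATE
    /OUTFILE=*
    /BREAK=CASE_ID CLNT
    /N_ROWS=N.

* Flag CASE_IDs that still occur more than once and drop all of their rows.
SORT CASES BY CASE_ID.
MATCH FILES
    /FILE=*
    /BY CASE_ID
    /FIRST=IS_FIRST
    /LAST=IS_LAST.
SELECT IF (IS_FIRST = 1 AND IS_LAST = 1).
EXECUTE.

* Attach CLNT to file 3, using the de-duplicated keys as a lookup table.
MATCH FILES
    /FILE=FILE3
    /TABLE=*
    /BY CASE_ID.

* Bring in the file 2 variables by CLNT (assumed to be the final step).
SORT CASES BY CLNT.
MATCH FILES
    /FILE=*
    /TABLE=FILE2
    /BY CLNT.
EXECUTE.

* Remove the helper variables carried along from the key file.
DELETE VARIABLES N_ROWS IS_FIRST IS_LAST.
```

A /TABLE file must have unique key values, which is why the duplicated CASE_IDs are removed before the first lookup. Rows in file 3 whose CASE_ID was dropped that way come through with CLNT missing and would need to be resolved by hand.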
