SPSSX Discussion

Re: how to merge/compare across multiple datasets with duplicate IDs

Melissa Ives

If you match many to many, SPSS pairs the records within each ID in
order: the 1st with the 1st, the 2nd with the 2nd, etc. When one file
runs out of records for an ID, the variables from that file will be
blank (system-missing for numeric variables) in the matched file.
Whether this will work for you depends on what you want and how the
files are sorted.

E.g.

File 1        File 2
A B C         A D E
1 2 3         1 4 5
1 2 2         2 4 3
2 2 3         2 3 4
2 2 1         3 5 5
2 3 3
3 1 1
3 2 2

When merged by the ID (col A), this will result in the merged file:

A B C D E
1 2 3 4 5
1 2 2 . .
2 2 3 4 3
2 2 1 3 4
2 3 3 . .
3 1 1 5 5
3 2 2 . .
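
The pairing rule above can also be sketched outside SPSS. The following is a minimal Python emulation of that behaviour, not SPSS syntax and not how SPSS implements it; the function name `sequential_match` and the `MISSING` placeholder are made up for illustration, and both inputs are assumed to be sorted by ID already. Run on the two example files, it should reproduce the merged file shown above.

```python
from itertools import groupby, zip_longest

MISSING = "."  # placeholder standing in for a system-missing value


def sequential_match(file1, file2):
    """Pair records within each ID in order (1st with 1st, 2nd with 2nd, ...).

    Each record is a tuple whose first field is the ID. Both inputs are
    assumed to be non-empty and already sorted by ID.
    """
    width1 = len(file1[0]) - 1  # number of non-ID variables in file 1
    width2 = len(file2[0]) - 1  # number of non-ID variables in file 2
    by_id1 = {k: list(g) for k, g in groupby(file1, key=lambda r: r[0])}
    by_id2 = {k: list(g) for k, g in groupby(file2, key=lambda r: r[0])}

    merged = []
    for key in sorted(set(by_id1) | set(by_id2)):
        # zip_longest pads the shorter group with None once it runs out
        for r1, r2 in zip_longest(by_id1.get(key, []), by_id2.get(key, [])):
            left = r1[1:] if r1 is not None else (MISSING,) * width1
            right = r2[1:] if r2 is not None else (MISSING,) * width2
            merged.append((key, *left, *right))
    return merged


# The two example files from this message: (A, B, C) and (A, D, E)
file1 = [(1, 2, 3), (1, 2, 2), (2, 2, 3), (2, 2, 1),
         (2, 3, 3), (3, 1, 1), (3, 2, 2)]
file2 = [(1, 4, 5), (2, 4, 3), (2, 3, 4), (3, 5, 5)]

merged = sequential_match(file1, file2)
for row in merged:
    print(*row)
```

Note that which records end up paired depends only on their order within each ID, which is why the sort order of the files matters so much here.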

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Nico Peruzzi
Sent: Thursday, October 05, 2006 1:22 PM
To: [hidden email]
Subject: Re: [SPSSX-L] how to merge/compare across multiple datasets
with duplicate IDs

Melissa,

There's no indicator of account type in dataset 3, just car information.

But match files looks like a good thing for me to explore.

But is it not possible to use it for a many-to-many match, as in
dataset 2 to 3?

Thanks, Nico

On 10/5/06, Melissa Ives <[hidden email]> wrote:

>
> Does dataset 3 have an indicator of which account type from dataset 2
> would go with which car?
> You could use MATCH FILES with a lookup table, e.g.
>
> MATCH FILES /FILE=table2 /TABLE=table1 /BY ID.
>
> That will attach the (retained) variables from table1 to every record
> in table2 with the same ID (both files need to be sorted by ID).
> However, matching 2 and 3 is more questionable since it is now a
> many-to-many relationship.
>
> Melissa
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf
> Of Nico Peruzzi
> Sent: Thursday, October 05, 2006 1:07 PM
> To: [hidden email]
> Subject: [SPSSX-L] how to merge/compare across multiple datasets with
> duplicate IDs
>
> Hi listers,
>
> I've got a pile of data that came in 3 datasets. They all have an ID
> variable, however here's the trick.
>
> dataset #1 has ID + some demographic variables (there is 1 case per
> ID)
>
> dataset #2 has ID + some variables related to someone's account
> (note that an ID can have more than 1 account) (there is 1 case per
> account #)
>
> dataset #3 has ID + vehicle variables (note that an ID can have more
> than 1 vehicle) (there is 1 case per vehicle ID)
>
> here's an example of one deceptively simple-sounding chart I need to
> create:
>
> Show frequencies of age ranges (comes from dataset #1) based on type
> of account (comes from dataset #2)
>
> here's another example:
>
> Show frequencies for each vehicle type (comes from dataset #3) based
> on account type (from #2) and ownership status (from #1)
>
> All would be great if I could just merge on ID, but as I mentioned
> above the only variable across all 3 datasets is ID, and there will be
> multiple occurrences of ID in datasets #2 and #3.
>
> Any thoughts on how to work through or around this?
>
> Thanks in advance, Nico
>
> --
> Nico Peruzzi, Ph.D.

--
Nico Peruzzi, Ph.D.