SPSSX Discussion

Merging cases


Kenny Shen

Merging cases

Hi,

I received a file in CSV format that needs some cleaning up. The file contains responses from a simple survey conducted among 30,000 people, sorted by email address. In the first 10,000 cases I see 'duplicate' entries, with the responses for one person 'spread' out among them. For example:

id | name | email | x1 | x2 | x3 | x4 ....
1 | abc | [hidden email] | Y |    |   |    |
2 | abc | [hidden email] |    | Y |   |    |
3 | abc | [hidden email] |    |    |   | N |

Is there any way I can quickly combine these entries so that I get:

1 | abc | [hidden email] | Y | Y |   | N |

Thanks much in advance!

Ken.

Daciuk, Tim

Re: Merging cases

Hi Ken:
Take a look at RESTRUCTURE… in the DATA menu. I think that should help in doing what you want to do.

Tim Daciuk

Director, Worldwide Demo Resources

SPSS Inc.

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Kenny Shen
Sent: Tuesday, June 23, 2009 2:27 AM
To: [hidden email]
Subject: Merging cases

Bruce Weaver

Re: Merging cases

Administrator

In reply to this post by Kenny Shen

How about using the FIRST function in AGGREGATE, with "email" as the break variable?

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."


David Marso

Re: Merging cases

Administrator

In reply to this post by Kenny Shen

On Tue, 23 Jun 2009 14:37:38 -0700, Bruce Weaver <[hidden email]> wrote:

>How about using the FIRST function in AGGREGATE, with "email" as the break
>variable?

Hi Ken and Bruce,
I would suggest using the MAX function in AGGREGATE, with email as the BREAK
variable as Bruce suggested. FIRST will likely give you blanks for string
variables, but will probably work for numerics. OTOH, if you have more than one
data value per email for a given field, you will lose that information either
way.
HTH, David
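
To make the MAX-versus-FIRST point concrete, here is a minimal Python sketch of the logic. It is not SPSS syntax and makes no claim about how AGGREGATE is implemented. The rows are modeled on the example in the original post, and the names `merge_by_email`, `first_value`, and `find_conflicts` are made up for illustration. It assumes that a blank string compares lower than any non-blank answer, so taking the maximum per email keeps the one non-blank value, and that the id and name of the first row in each group are the ones to keep.

```python
from collections import defaultdict

# Illustrative rows modeled on the example in the original post.
rows = [
    {"id": 1, "name": "abc", "email": "abc@abc.com", "x1": "Y", "x2": "", "x3": "", "x4": ""},
    {"id": 2, "name": "abc", "email": "abc@abc.com", "x1": "", "x2": "Y", "x3": "", "x4": ""},
    {"id": 3, "name": "abc", "email": "abc@abc.com", "x1": "", "x2": "", "x3": "", "x4": "N"},
]
ANSWER_COLS = ["x1", "x2", "x3", "x4"]


def merge_by_email(rows, pick):
    """Collapse rows sharing an email; `pick` reduces each answer column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["email"]].append(row)
    merged = []
    for email, group in groups.items():
        # Assumption: keep the id and name from the first row of the group.
        out = {"id": group[0]["id"], "name": group[0]["name"], "email": email}
        for col in ANSWER_COLS:
            out[col] = pick([r[col] for r in group])
        merged.append(out)
    return merged


def first_value(values):
    """Take the first value as-is, treating a blank as a valid value."""
    return values[0]


def find_conflicts(rows):
    """Map (email, column) to its distinct non-blank values when there is more than one."""
    seen = defaultdict(set)
    for row in rows:
        for col in ANSWER_COLS:
            if row[col] != "":
                seen[(row["email"], col)].add(row[col])
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


merged_max = merge_by_email(rows, max)            # x1='Y', x2='Y', x3='', x4='N'
merged_first = merge_by_email(rows, first_value)  # x1='Y', x2='', x3='', x4=''
```

Running `find_conflicts` before merging is one way to see whether any email has more than one distinct answer in the same field, which is the case where either function would silently discard information.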

Bruce Weaver

Re: Merging cases

Administrator

David, good catch on MAX versus FIRST. I was of course thinking about how it works for numeric variables.
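
The numeric-versus-string difference can be sketched as a "first non-missing value" rule. This is a Python illustration under stated assumptions, not SPSS behavior verified against the software: `None` stands in for a missing numeric value, a blank string counts as a valid value unless the caller says otherwise, and `first_nonmissing` is a hypothetical helper.

```python
def first_nonmissing(values, is_missing):
    """Return the first value that is not missing, or None if all are missing."""
    for value in values:
        if not is_missing(value):
            return value
    return None


numeric_x2 = [None, 1, None]  # None models a missing numeric response
string_x2 = ["", "Y", ""]     # blank strings, as in the survey file


def missing_if_none(value):
    """Only None counts as missing; a blank string is a valid value."""
    return value is None


def missing_if_blank(value):
    """Assumption for comparison: blanks are also treated as missing."""
    return value is None or value.strip() == ""


numeric_result = first_nonmissing(numeric_x2, missing_if_none)      # 1
string_result = first_nonmissing(string_x2, missing_if_none)        # '' (the blank is kept)
string_if_blank = first_nonmissing(string_x2, missing_if_blank)     # 'Y'
```

Under these assumptions, the rule skips the missing numeric values and finds the response, but returns the leading blank for the string column, which matches the behavior David describes.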

Cheers,
Bruce