Merging cases

Merging cases

Kenny Shen
Hi,

I received a file in CSV format that needs some cleaning up. The file contains responses from a simple survey conducted among 30,000 people, sorted by email address. In the first 10,000 cases I see 'duplicate' entries, with a respondent's answers 'spread' out across them. For example:

id | name | email | x1 | x2 | x3 | x4 ....
1 | abc | [hidden email] | Y |    |   |    |
2 | abc | [hidden email] |    | Y |   |    |
3 | abc | [hidden email] |    |    |   | N |

Is there any way I can quickly combine these entries so I get

1 | abc | [hidden email] | Y | Y |   | N |

Thanks much in advance!

Ken.

Re: Merging cases

Daciuk, Tim

Hi Ken:
Take a look at RESTRUCTURE… in the DATA menu.  I think that should help in doing what you want to do.

 

Tim Daciuk
Director, Worldwide Demo Resources
SPSS Inc.


From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Kenny Shen
Sent: Tuesday, June 23, 2009 2:27 AM
To: [hidden email]
Subject: Merging cases

 


Re: Merging cases

Bruce Weaver
Administrator
In reply to this post by Kenny Shen

How about using the FIRST function in AGGREGATE, with "email" as the break variable?

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Merging cases

David Marso
Administrator
In reply to this post by Kenny Shen
On Tue, 23 Jun 2009 14:37:38 -0700, Bruce Weaver <[hidden email]> wrote:

>How about using the FIRST function in AGGREGATE, with "email" as the break
>variable?

Hi Ken and Bruce,
I would suggest using the MAX function in AGGREGATE, with email as the BREAK
variable as Bruce suggested.  FIRST will likely give you blanks for string
variables, though it will probably work for numerics.  On the other hand, if
you have more than one data value per email for a given field, you will lose
that information either way.
HTH, David
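
To make the logic of "MAX with email as the break variable" concrete, here is a minimal sketch in Python rather than SPSS syntax. It is not David's code and not how AGGREGATE is implemented. The function name merge_by_max, the sample rows, and the choice to treat blanks as empty strings are all assumptions made for illustration. It also reports the case David warns about, where one email has more than one distinct nonblank value in a field.

```python
from collections import defaultdict

# Hypothetical rows mirroring the example in the original post.
# Assumption: blank responses are read in as empty strings.
rows = [
    {"id": 1, "name": "abc", "email": "abc@abc.com", "x1": "Y", "x2": "", "x3": "", "x4": ""},
    {"id": 2, "name": "abc", "email": "abc@abc.com", "x1": "", "x2": "Y", "x3": "", "x4": ""},
    {"id": 3, "name": "abc", "email": "abc@abc.com", "x1": "", "x2": "", "x3": "", "x4": "N"},
]
FIELDS = ["x1", "x2", "x3", "x4"]


def merge_by_max(rows, break_key, fields):
    """Collapse rows that share break_key, keeping the maximum of each field.

    Returns (merged_rows, lost), where lost maps (key, field) to the set of
    distinct nonblank values whenever a group holds more than one of them.
    """
    merged = {}
    seen = defaultdict(set)
    for row in rows:
        key = row[break_key]
        # The first row of each group seeds id, name and the break variable.
        out = merged.setdefault(key, dict(row))
        for field in fields:
            # Any nonblank string compares greater than the empty string.
            out[field] = max(out[field], row[field])
            if row[field] != "":
                seen[(key, field)].add(row[field])
    lost = {k: v for k, v in seen.items() if len(v) > 1}
    return list(merged.values()), lost


merged, lost = merge_by_max(rows, "email", FIELDS)
print(merged[0])  # x1="Y", x2="Y", x3="", x4="N"
print(lost)       # empty: no field has competing values in this example
```

If the lost dictionary is not empty, those emails need a manual look before the merged file is trusted, since taking the maximum silently keeps only one of the competing values.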

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Merging cases

Bruce Weaver
Administrator
David Marso wrote
--- snip ---

David, good catch on MAX versus FIRST.  I was of course thinking about how it works for numeric variables.

Cheers,
Bruce
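
The MAX versus FIRST point can be illustrated with a short Python sketch. It assumes, in line with David's remark, that a blank in a string variable counts as an ordinary value while a blank in a numeric variable is read as missing. The helper first_nonmissing and the sample lists are hypothetical, and None stands in for a missing numeric value.

```python
def first_nonmissing(values):
    """Return the first value that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)


# The x2 column from the original example, coded two ways (assumed codings).
numeric_x2 = [None, 1, None]  # numeric: blanks become missing
string_x2 = ["", "Y", ""]     # string: blanks are valid, just empty

# FIRST skips the missing numeric and finds the real response.
first_numeric = first_nonmissing(numeric_x2)  # 1

# FIRST stops at the leading blank string, because it is not missing.
first_string = first_nonmissing(string_x2)    # ""

# MAX works for both codings: "Y" sorts above the empty string.
max_string = max(string_x2)                                  # "Y"
max_numeric = max(v for v in numeric_x2 if v is not None)    # 1

print(repr(first_numeric), repr(first_string), repr(max_string), repr(max_numeric))
```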