Hi,

I received a file in CSV format that needs some cleaning up. The file contains responses from a simple survey conducted among 30,000 people, and they're sorted by their email address. What I see in the first 10,000 cases is that there are 'duplicate' entries with some responses 'spread' out among them. For example:

id | name | email          | x1 | x2 | x3 | x4 ....
1  | abc  | [hidden email] | Y  |    |    |    |
2  | abc  | [hidden email] |    | Y  |    |    |
3  | abc  | [hidden email] |    |    |    | N  |

Is there any way I can quickly combine these entries so I get

1  | abc  | [hidden email] | Y  | Y  |    | N  |

Thanks much in advance!

Ken.
How about using the FIRST function in AGGREGATE, with "email" as the break variable?

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
On Tue, 23 Jun 2009 14:37:38 -0700, Bruce Weaver <[hidden email]> wrote:
> How about using the FIRST function in AGGREGATE, with "email" as the break variable?

Hi Ken and Bruce,

I would suggest using the MAX function in AGGREGATE, with email as the BREAK variable as Bruce suggested. FIRST will likely give you blanks for string variables, though it will probably work for numerics. OTOH, if you have more than one data value per email for a given field, you will lose that information either way.

HTH,
David
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
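
For readers who want to see why the two suggestions behave differently, here is a minimal sketch of the logic in plain Python rather than SPSS syntax. It is not the AGGREGATE command itself. It assumes blank responses arrive as empty strings, uses a placeholder address (`abc@example.com`) in place of the hidden one, and the helper names `collapse` and `conflicts` are made up for illustration. Picking the first value per email keeps the blanks from the first row, while picking the maximum keeps any non-blank answer, because an empty string sorts below "Y" and "N".

```python
# Example rows from the thread; "" marks an unanswered item.
# The address is a placeholder because the original is hidden.
ROWS = [
    {"id": 1, "name": "abc", "email": "abc@example.com", "x1": "Y", "x2": "", "x3": "", "x4": ""},
    {"id": 2, "name": "abc", "email": "abc@example.com", "x1": "", "x2": "Y", "x3": "", "x4": ""},
    {"id": 3, "name": "abc", "email": "abc@example.com", "x1": "", "x2": "", "x3": "", "x4": "N"},
]
RESPONSES = ["x1", "x2", "x3", "x4"]


def collapse(rows, break_var, response_vars, pick):
    """Return one row per break_var value (hypothetical helper).

    Response variables are combined with pick; every other field keeps
    the value from the first row of the group.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[break_var], []).append(row)

    merged = []
    for group in groups.values():
        out = dict(group[0])  # id, name, email come from the first row
        for var in response_vars:
            out[var] = pick([r[var] for r in group])
        merged.append(out)
    return merged


def conflicts(rows, break_var, response_vars):
    """List (break value, variable) pairs with more than one distinct non-blank answer."""
    seen = {}
    for row in rows:
        for var in response_vars:
            if row[var] != "":
                seen.setdefault((row[break_var], var), set()).add(row[var])
    return sorted(key for key, values in seen.items() if len(values) > 1)


by_first = collapse(ROWS, "email", RESPONSES, lambda values: values[0])
by_max = collapse(ROWS, "email", RESPONSES, max)

print(by_first[0])  # first value per group: x2 and x4 stay blank
print(by_max[0])    # maximum per group: the non-blank answers survive
print(conflicts(ROWS, "email", RESPONSES))
```

Running `conflicts` before collapsing is one way to check for the case David warns about: if an email has both "Y" and "N" for the same item, the maximum keeps only one of them.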
David, good catch on MAX versus FIRST. I was of course thinking about how it works for numeric variables.

Cheers,
Bruce
