Hi,
I have a file with 87,000 cases. Each case (line) contains data about a car accident. the id number for each person is given. If the person had more than one car accident then he has several entries. I'd like to combine all these entries so that at the end I will have a line for each person with all the different variables (and no duplicate entries). How can I do this with spss? Thanks! Nomi |
Nomi,
I'll be very brief, perhaps too much so but I have other things to do. You want to go from multiple cases per id to one case per id. This type of rearrangement is going from 'long' to 'wide' or 'long-to-wide'. Go into the syntax reference and read up on the Casestovars command. That is all you need. Gene Maguin |
In reply to this post by nomi-6
Or use AGGREGATE if you prefer to summarize across accidents per person.
Melissa -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin Sent: Wednesday, October 11, 2006 9:47 AM To: [hidden email] Subject: Re: [SPSSX-L] combining multiple records with same id Nomi, I'll be very brief, perhaps too much so but I have other things to do. You want to go from multiple cases per id to one case per id. This type of rearrangement is going from 'long' to 'wide' or 'long-to-wide'. Go into the syntax reference and read up on the Casestovars command. That is all you need. Gene Maguin PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. |
In reply to this post by nomi-6
Hi Nomi
You could try writing out the first accident to one file, everyone's second accident to a second file, and so on and then string them all together, like this: * Sort by id, create a incremented count called flag, which will tell you if it's the first, second, etc record for that id number. sort cases by id. compute flag=1. if (lag(id)=id) flag=lag(flag)+1. save outfile='master.sav'. * Save out a file with the first record for each person only. select if (flag=1). save outfile='one.sav'. * Get your sorted file, rename all variables to something new (SPSS won't like duped var names) and save our file two with all second accidents. get file='master.sav'. rename variables (var1 var2 var3 = var1b var2b var3b). select if (flag=2). save outfile='two.sav'. * Repeat this process for every possible instance (so if a person can have a max of 5 accidents, you'll do this five times and have 5 files). * If the people in your file are particularly accident-prone and will have a higher max, you could probably put this in a macro or something to save time. * Finally, pull them all together onto one row. new file. match files /file='one.sav' /file='two.sav' /by id. execute. HTH Craig -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Nomi Sent: Wednesday, October 11, 2006 7:11 AM To: [hidden email] Subject: combining multiple records with same id Hi, I have a file with 87,000 cases. Each case (line) contains data about a car accident. the id number for each person is given. If the person had more than one car accident then he has several entries. I'd like to combine all these entries so that at the end I will have a line for each person with all the different variables (and no duplicate entries). How can I do this with spss? Thanks! Nomi |
In reply to this post by nomi-6
At 07:11 AM 10/11/2006, Nomi wrote:
>I have a file with 87,000 cases. Each case (line) contains data about >a car accident. If the person had more than one car accident then he >has several entries. I'd like to combine all these entries so that at >the end I will have a line for each person with all the different >variables (and no duplicate entries). At 10:47 AM 10/11/2006, Gene Maguin wrote: >You want to go from multiple cases per id to one case per id. This is >going from 'long' to 'wide' or 'long-to-wide'. In the syntax >reference, read up on the Casestovars command. That's correct. You may also find you can click up the command from the menus; see menu entry "Data>Restructure". HOWEVER, think at least twice, before you restructure long to wide. Go over what you plan to do with your data, and how best to do it. You may want to post, asking about that. Many, many operations are easier with data in 'long' form, one record per incident and multiple per person, particularly when there's an indefinite number of incidents per person. At 11:53 AM 10/11/2006, Melissa Ives wrote: >Use AGGREGATE if you prefer to summarize across accidents per person. That is quite right; you'll almost certainly need to do some of it; and it's one operation much easier with Long organization. |
Free forum by Nabble | Edit this page |