I have a dataset with information collected from households. Questionnaires were answered by both husband and wife; each was assigned a unique ID and linked to their spouse through a spouse ID. An additional variable tells whether the person has a spouse, since in some cases the spouse was not available to be interviewed. I would like to flag one and only one of the two members of each household; it does not particularly matter which one. I have been trying to use the "Identify Duplicate Cases" wizard, but it does not seem to be able to identify duplicate cases the way I want. Any advice would be greatly appreciated.

A sample of the data is below:

DATA LIST LIST /Rid(F8) SPid(F8) SPYN(F8).
BEGIN DATA
1011 1271 1
1045 0 1
1047 1049 1
1049 1047 1
1079 0 0
1088 0 0
1114 0 0
1142 1143 1
1143 1142 1
1271 1011 1
1300 0 1
1351 0 0
END DATA.
Is there any way to sort the data so that the spouse's data would always be the 'next record'? (For example, 1011 and 1271 are not adjacent.)

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Don Asay
Sent: Monday, April 02, 2007 11:55 AM
To: [hidden email]
Subject: [SPSSX-L] Flaging Spouses within dataset, Probably a syntax solution?

PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance.
Unfortunately the data is not structured like that. I saw syntax on the internet that searches the case before and the case after for duplicates, but the spouse could be absolutely anywhere in the dataset.

Don
In reply to this post by Björn Türoque
This is what I would try:
sort cases by Rid (a).
compute keepit = 0.
if (Rid = lag(SPid)) keepit = 1.
select if (keepit = 0).
fre var = keepit.

This puts the cases in order by respondent ID. The IF statement then looks back to see whether the Rid matches the SPid on the previous line and, if it does, marks the case with a 1. SELECT IF then drops the marked cases, leaving one case from each adjacent pair plus every case without a match. You may have to play around with this a bit to make sure it works on your data. Good luck!

meljr
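To see exactly what the lag step catches, here is a minimal Python sketch of the same logic run on the sample data. It is an illustration outside SPSS: the list of (Rid, SPid, SPYN) tuples and the function name `lag_keepit` are assumptions, not SPSS features. Like LAG(), it compares each case only with the one immediately before it after sorting by Rid.

```python
# Sample cases from the original post as (Rid, SPid, SPYN); SPid 0 = no spouse ID.
SAMPLE = [
    (1011, 1271, 1), (1045, 0, 1), (1047, 1049, 1), (1049, 1047, 1),
    (1079, 0, 0), (1088, 0, 0), (1114, 0, 0), (1142, 1143, 1),
    (1143, 1142, 1), (1271, 1011, 1), (1300, 0, 1), (1351, 0, 0),
]


def lag_keepit(cases):
    """Mimic SORT CASES BY Rid (A) plus IF (Rid = LAG(SPid)) keepit = 1.

    Returns (Rid, keepit) pairs in sorted order. The first case has no
    previous value, so keepit stays 0 there, as with LAG() in SPSS.
    """
    result = []
    prev_spid = None  # stands in for LAG(SPid) on the first case
    for rid, spid, _spyn in sorted(cases, key=lambda c: c[0]):
        keepit = 1 if rid == prev_spid else 0
        result.append((rid, keepit))
        prev_spid = spid
    return result


# Equivalent of SELECT IF (keepit = 0).
kept = [rid for rid, keepit in lag_keepit(SAMPLE) if keepit == 0]
print(kept)
```

On the sample this keeps 10 of the 12 cases: 1049 and 1143 are dropped because their spouses sort directly ahead of them, but 1011 and 1271 both survive because other cases sort between them.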
In reply to this post by Björn Türoque
That would work except for cases like 1011, where the spouse's ID is 1271 and would not be the lagged record. I haven't had time to check it, but I've been thinking of saving out the SPid values as Rid values and matching that file back into the original file to assign a family ID of some sort, or an 'in' variable. Can you lag within a loop to find the matching record? How many records do you have? Maybe this will trigger an idea that will work the way you want.

Melissa

The bubbling brook would lose its song if you removed the rocks.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of meljr
Sent: Monday, April 02, 2007 12:12 PM
To: [hidden email]
Subject: Re: [SPSSX-L] Flaging Spouses within dataset, Probably a syntax solution?
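Melissa's match-back idea can be sketched in Python as follows, assuming the saved-out file is simply the set of non-zero SPid values relabelled as Rid; the names `spouse_file` and `in_spouse_file` are hypothetical. Matching that back on Rid marks each case whose ID appears as somebody's spouse ID, which is what an 'in' variable from a file match would record.

```python
# Sample cases from the original post as (Rid, SPid, SPYN); SPid 0 = no spouse ID.
SAMPLE = [
    (1011, 1271, 1), (1045, 0, 1), (1047, 1049, 1), (1049, 1047, 1),
    (1079, 0, 0), (1088, 0, 0), (1114, 0, 0), (1142, 1143, 1),
    (1143, 1142, 1), (1271, 1011, 1), (1300, 0, 1), (1351, 0, 0),
]


def in_spouse_file(cases):
    """Return {Rid: 1 or 0}: 1 if the Rid appears among the SPid values.

    The set below plays the role of a file holding the SPid values saved
    out under the name Rid; 0 is skipped because it means "no spouse ID".
    """
    spouse_file = {spid for _rid, spid, _spyn in cases if spid != 0}
    return {rid: int(rid in spouse_file) for rid, _spid, _spyn in cases}


flags = in_spouse_file(SAMPLE)
print(sorted(rid for rid, flag in flags.items() if flag == 1))
```

Both members of each interviewed couple get a 1, while 1045 and 1300 (SPYN = 1 but SPid = 0) get a 0. So the flag identifies couples with both partners on file, but a family ID of some sort is still needed to pick one of the two.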
In reply to this post by meljr
Thank you for the responses. Unfortunately, the spouse's record is often not the case immediately before or immediately after the respondent's record, so looking at the case before or the case after does not work in this situation. Is there a way to have the computer search all spouse IDs instead of just the one before or the one after?

Don
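One way to search all spouse IDs rather than just the neighbouring case is to remember every ID already kept and check each case's SPid against that whole collection. This is a minimal Python sketch of that idea, not SPSS syntax; the function name is hypothetical, and it assumes spouse links are reciprocal, as they are in the sample. The first member of a couple encountered is kept and the second is flagged, wherever in the file it sits.

```python
# Sample cases from the original post as (Rid, SPid, SPYN); SPid 0 = no spouse ID.
SAMPLE = [
    (1011, 1271, 1), (1045, 0, 1), (1047, 1049, 1), (1049, 1047, 1),
    (1079, 0, 0), (1088, 0, 0), (1114, 0, 0), (1142, 1143, 1),
    (1143, 1142, 1), (1271, 1011, 1), (1300, 0, 1), (1351, 0, 0),
]


def flag_one_per_couple(cases):
    """Return {Rid: 1 (keep) or 0 (spouse already kept)}, scanning in file order."""
    kept_ids = set()  # every Rid kept so far, searched in full for each case
    flags = {}
    for rid, spid, _spyn in cases:
        if spid != 0 and spid in kept_ids:
            flags[rid] = 0  # the household is already represented by the spouse
        else:
            flags[rid] = 1
            kept_ids.add(rid)
    return flags


flags = flag_one_per_couple(SAMPLE)
print(sorted(rid for rid, flag in flags.items() if flag == 1))
# Reversing the file order keeps the other spouse, but still one per couple.
reversed_flags = flag_one_per_couple(SAMPLE[::-1])
print(sorted(rid for rid, flag in reversed_flags.items() if flag == 1))
```

File order only decides which spouse is kept, not how many cases are kept.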
In reply to this post by Melissa Ives
Don,
I agree with Melissa's idea. I haven't had a chance to work this through, even though I think I had to do something like this once. I hope you have realized that while you have an immediate problem (the one that prompted you to write), you also have a recurrent problem, because this problem will occur over and over and over. It does so because, in my opinion, a design error was made in the study execution. I think the only true way out is to construct a household or couple ID that is superordinate to the persons who responded.

I think I would work this problem in the following manner. (I am aiming for what I regard as the true fix, not a temporary one.) I'll assume you are conversant with syntax.

You have one copy of the data set. Call it RespDS, sort it by respondent ID, and create a new variable called Hhid that has the value of respondent ID.

Save another copy, call it SpouseDS, sort it by spouse ID, and create a new variable called Hhid that has the value of spouse ID.

Do an ADD FILES and sort by Hhid.

Finally, you should number the records within Hhid. This is simple. Just:

Compute Resp=1.
If (Hhid eq lag(Hhid)) resp=2.

Gene Maguin
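The stacking described above can be traced on the sample data with a short Python sketch. It is an illustration outside SPSS: RespDS and SpouseDS become lists, ADD FILES becomes list concatenation, and Python's stable `sorted` stands in for SORT CASES (an assumption about tie order). It builds both copies, stacks them, sorts by Hhid, and applies the Resp numbering rule exactly as written.

```python
# Sample cases from the original post as (Rid, SPid, SPYN); SPid 0 = no spouse ID.
SAMPLE = [
    (1011, 1271, 1), (1045, 0, 1), (1047, 1049, 1), (1049, 1047, 1),
    (1079, 0, 0), (1088, 0, 0), (1114, 0, 0), (1142, 1143, 1),
    (1143, 1142, 1), (1271, 1011, 1), (1300, 0, 1), (1351, 0, 0),
]


def stack_by_hhid(cases):
    """Return (Hhid, Rid, Resp) rows after stacking RespDS and SpouseDS."""
    # RespDS: sorted by respondent ID, Hhid = Rid.
    resp_ds = [(rid, rid) for rid, _spid, _spyn in sorted(cases, key=lambda c: c[0])]
    # SpouseDS: sorted by spouse ID, Hhid = SPid.
    spouse_ds = [(spid, rid) for rid, spid, _spyn in sorted(cases, key=lambda c: c[1])]
    # ADD FILES, then sort by Hhid (the stable sort keeps RespDS rows first on ties).
    stacked = sorted(resp_ds + spouse_ds, key=lambda row: row[0])
    rows = []
    prev_hhid = None
    for hhid, rid in stacked:
        # Compute Resp=1. If (Hhid eq lag(Hhid)) resp=2.
        resp = 2 if hhid == prev_hhid else 1
        rows.append((hhid, rid, resp))
        prev_hhid = hhid
    return rows


rows = stack_by_hhid(SAMPLE)
for row in rows:
    print(row)
```

On the sample this produces 24 rows. Hhid 0 collects the six cases without a spouse ID, and each interviewed couple appears under both members' IDs: Hhid 1011 holds 1011 then 1271, and Hhid 1271 holds 1271 then 1011.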
In reply to this post by Björn Türoque
Don, sorry, I should have noticed that. How about something like this:
To get a value that is the same for both spouses, multiply the Rid by the SPid. If the SPid = 0, recode it to 1 so you will not be multiplying by zero. Then sort by that product and use the lag function to remove duplicates. This is untried, but I would be very surprised if you got any non-unique values in Rid * SPid.

meljr
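The product-key idea can be sketched in Python as below (an illustration outside SPSS; the function name is hypothetical). Because Rid * SPid is the same for both spouses, sorting by it puts each couple together, and comparing each key with the previous one drops the second case. The key is not guaranteed unique: with hypothetical IDs, 1050 * 1200 and 1120 * 1125 both equal 1,260,000, so checking the keys on the real data is a sensible precaution.

```python
# Sample cases from the original post as (Rid, SPid, SPYN); SPid 0 = no spouse ID.
SAMPLE = [
    (1011, 1271, 1), (1045, 0, 1), (1047, 1049, 1), (1049, 1047, 1),
    (1079, 0, 0), (1088, 0, 0), (1114, 0, 0), (1142, 1143, 1),
    (1143, 1142, 1), (1271, 1011, 1), (1300, 0, 1), (1351, 0, 0),
]


def product_key_dedupe(cases):
    """Keep one case per key, where key = Rid * SPid with SPid 0 recoded to 1."""
    keyed = sorted((rid * (spid if spid != 0 else 1), rid) for rid, spid, _spyn in cases)
    kept = []
    prev_key = None  # plays the role of LAG(key)
    for key, rid in keyed:
        if key != prev_key:
            kept.append(rid)
        prev_key = key
    return kept


print(product_key_dedupe(SAMPLE))
```

Sorting on (key, Rid) means the lower-ID spouse of each couple is the one kept, so the result has nine cases.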
In reply to this post by Maguin, Eugene
Melissa's idea is actually a good one and works for the most part. Unfortunately, once I remove the second dataset, I am left with my original dataset plus a flag variable, and when I use the lag command I lose the people who were flagged when I removed the second dataset. When I use the duplicate cases wizard, it seems to flag both the husband and the wife as duplicates, so I have to go through by hand and pick the one to exclude.

Unfortunately, I have no control over how other people collect their data; it is sent to me and I need to help people make sense of it. There are some people you just want to shake, especially since they should know better. Unfortunately for me, I am going to have to perform this operation several times, as ; so anything quick and repeatable would be super helpful.

If you want to talk about messed-up data, I got one the other day that would make your head spin. The researcher asked a "please select all that apply" question with 8 possible answers, but coded it into SPSS as one variable, so the possible answers went something like this:

1 = choice 1
2 = choice 2
3 = choice 3
4 = choice 4
....
9 = choice 1 and choice 2
10 = 1+3
.....
83 = 3+6+8
84 = 3+7+8
and so on....

When I asked the researcher what scheme they had used, they told me they only coded a combination in if someone answered with that particular combination, and had a different value for someone answering 1+2+3 and someone who answered 3+1+2. The respondent could have picked as many as they wanted, and the order in which they picked them was not a factor. She had me ripping my hair out for hours trying to figure out a way to not have her re-enter the data. All she wanted to do was run a simple crosstab of each response choice. I finally told her she had to either do the math by hand or re-enter her data; I then showed her how to do it. Wow, was she pissed... Sorry, I am venting, but most people don't get how frustrating this is.

Don
In reply to this post by Björn Türoque
At 12:55 PM 4/2/2007, Don Asay wrote:
>I have questionnaires that have been answered by both husband and
>wife, each assigned a unique ID and linked to their spouse through a
>spouse id. There is an additional variable that tells if the person
>has a spouse. I would like to flag one and only one of the two
>members of the household.

It doesn't look like there's been a complete solution yet. Here's one, assigning a household ID, namely the lower of the two individual IDs associated with the household. In selecting, it takes the spouse with the lower ID number. (I'd usually recommend true random selection.)

SPSS 15 draft output:

Output Created: 02-APR-2007 16:50:52

   Rid  SPid  SPYN
  1011  1271     1
  1045     0     1
  1047  1049     1
  1049  1047     1
  1079     0     0
  1088     0     0
  1114     0     0
  1142  1143     1
  1143  1142     1
  1271  1011     1
  1300     0     1
  1351     0     0

Number of cases read: 12    Number of cases listed: 12

NUMERIC HHid (F6).
VAR LABEL HHid 'Household ID: Lower of two individual IDs'.
MISSING VAL SPid(0).
COMPUTE HHid = MIN(Rid,SPid).
SORT CASES BY HHid Rid.
LIST.

List
Output Created: 02-APR-2007 16:50:53

   Rid  SPid  SPYN  HHid
  1011  1271     1  1011
  1271  1011     1  1011
  1045     0     1  1045
  1047  1049     1  1047
  1049  1047     1  1047
  1079     0     0  1079
  1088     0     0  1088
  1114     0     0  1114
  1142  1143     1  1142
  1143  1142     1  1142
  1300     0     1  1300
  1351     0     0  1351

Number of cases read: 12    Number of cases listed: 12

SELECT IF VALUE(HHid) EQ VALUE(Rid).
LIST.

List
Output Created: 02-APR-2007 16:50:53

   Rid  SPid  SPYN  HHid
  1011  1271     1  1011
  1045     0     1  1045
  1047  1049     1  1047
  1079     0     0  1079
  1088     0     0  1088
  1114     0     0  1114
  1142  1143     1  1142
  1300     0     1  1300
  1351     0     0  1351

Number of cases read: 9    Number of cases listed: 9

=====================================
APPENDIX: Test data, from the posting
=====================================
* Test data, from posting: ........... .
DATA LIST LIST /Rid(F6) SPid(F6) SPYN(F2).
BEGIN DATA
1011 1271 1
1045 0 1
1047 1049 1
1049 1047 1
1079 0 0
1088 0 0
1114 0 0
1142 1143 1
1143 1142 1
1271 1011 1
1300 0 1
1351 0 0
END DATA.
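For comparison outside SPSS, here is a minimal Python sketch of the same household-ID rule, with an optional random pick per household along the lines of the random selection mentioned above. The function names and the `seed` parameter are assumptions for illustration. SPid 0 is handled the way MISSING VAL SPid(0) makes MIN() handle it: ignored, so a case without a spouse ID becomes its own household.

```python
import random

# Sample cases from the original post as (Rid, SPid, SPYN); SPid 0 = no spouse ID.
SAMPLE = [
    (1011, 1271, 1), (1045, 0, 1), (1047, 1049, 1), (1049, 1047, 1),
    (1079, 0, 0), (1088, 0, 0), (1114, 0, 0), (1142, 1143, 1),
    (1143, 1142, 1), (1271, 1011, 1), (1300, 0, 1), (1351, 0, 0),
]


def household_id(rid, spid):
    """Lower of the two IDs; a SPid of 0 is ignored, so HHid = Rid."""
    return rid if spid == 0 else min(rid, spid)


def select_lower_id(cases):
    """SORT CASES BY HHid Rid, then SELECT IF HHid EQ Rid."""
    ordered = sorted(cases, key=lambda c: (household_id(c[0], c[1]), c[0]))
    return [rid for rid, spid, _spyn in ordered if household_id(rid, spid) == rid]


def select_random(cases, seed=None):
    """Keep one randomly chosen member of each household instead."""
    rng = random.Random(seed)
    households = {}
    for rid, spid, _spyn in cases:
        households.setdefault(household_id(rid, spid), []).append(rid)
    # sorted(members) makes the pick reproducible for a given seed.
    return sorted(rng.choice(sorted(members)) for members in households.values())


print(select_lower_id(SAMPLE))
print(select_random(SAMPLE, seed=2007))
```

The lower-ID version reproduces the nine cases in the final listing. One consequence of keeping only cases where HHid equals Rid: a case whose SPid pointed to a lower ID that is not on file would be dropped entirely. In the sample, spouses who were not interviewed have SPid = 0, so this does not arise.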