Flagging Spouses within dataset, Probably a syntax solution?


Flagging Spouses within dataset, Probably a syntax solution?

Björn Türoque
I have a dataset with information collected from households.
Questionnaires were answered by both husband and wife; each was assigned
a unique ID and is linked to their spouse through a spouse ID. An
additional variable tells whether the person has a spouse, since in some
cases the spouse was not available to be interviewed. I would like to
flag one and only one of the two members of each household; it does not
particularly matter which one. I have been trying to use the "Identify
Duplicate Cases Wizard", but it does not seem to be able to identify
duplicate cases the way I want. Any advice would be greatly appreciated.

A sample of the data is below:

DATA LIST LIST /Rid(F8) SPid(F8) SPYN(F8).
BEGIN DATA.
1011 1271 1
1045 0 1
1047 1049 1
1049 1047 1
1079 0 0
1088 0 0
1114 0 0
1142 1143 1
1143 1142 1
1271 1011 1
1300 0 1
1351 0 0
END DATA.

Re: Flagging Spouses within dataset, Probably a syntax solution?

Melissa Ives
Is there any way to sort the data so that the spouse's data is always
the 'next record'? (e.g. 1011 and 1271 are not.)

Melissa

Re: Flagging Spouses within dataset, Probably a syntax solution?

Björn Türoque
Unfortunately the data is not structured like that. I saw syntax on the
internet that searches the case before and after for duplicates, but the
spouse could be absolutely anywhere in the dataset.

Don



Re: Flagging Spouses within dataset, Probably a syntax solution?

meljr
In reply to this post by Björn Türoque
This is what I would try:

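* Sort by respondent ID so spouses with adjacent IDs sit next to each other.
sort cases by Rid (a).
compute keepit = 0.
* Mark a case whose Rid equals the SPid of the case just before it.
if (Rid = lag(SPid)) keepit = 1.
* Keep only the unmarked cases, then check the result.
select if (keepit = 0).
fre var = keepit.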

This puts the cases in order by respondent ID.
Then the syntax looks back to see if the Rid matches the SPid in the previous line.
If it does, it marks the case with a 1.
All that is left is one of each pair, plus the cases that do not have a match.

You may have to play around with this a bit to make sure it works ok on your data.
Good luck!
meljr


Re: Flagging Spouses within dataset, Probably a syntax solution?

Melissa Ives
In reply to this post by Björn Türoque
That would work except for cases like 1011, where the spouse's ID is 1271
and would not be the lagged record.

I haven't had time to check it, but I've been thinking of saving out the
SPid values as Rid values and matching that file back into the original
file to assign a family ID of some sort or an 'in' variable.
Can you lag within a loop to find the matching record?  How many records
do you have?
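
One way the match-back idea could look in syntax. This is only a sketch, reasoned through on the 12 sample cases rather than run on the real file. It assumes SPid = 0 means no interviewed spouse, and the names orig, lookup, namedby, inlookup and flag are made up for the example.

```spss
* Sketch only. The names orig, lookup, namedby, inlookup and flag are hypothetical.
DATASET NAME orig.
SORT CASES BY Rid.
DATASET COPY lookup.
DATASET ACTIVATE lookup.
* Keep only people who name a spouse, then swap roles so Rid holds the named spouse.
SELECT IF (SPid > 0).
COMPUTE namedby = Rid.
COMPUTE Rid = SPid.
SORT CASES BY Rid.
DATASET ACTIVATE orig.
* inlookup is 1 when some other respondent names this person as their spouse.
MATCH FILES /FILE=* /TABLE=lookup /IN=inlookup /BY Rid.
* Flag everyone nobody names, plus the lower-ID member of each matched couple.
COMPUTE flag = (inlookup = 0 OR Rid < namedby).
EXECUTE.
```

Because the lookup is keyed on ID rather than position, it should not matter where in the file the spouse sits. It does assume each person is named as a spouse by at most one other respondent, since a table file needs unique keys.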

Maybe this will trigger an idea that will work the way you want.

Melissa

Re: Flagging Spouses within dataset, Probably a syntax solution?

Björn Türoque
In reply to this post by meljr
Thank you for the responses. Unfortunately the spouse ID is often not
the case immediately before or immediately after the respondent's ID, so
looking at the case before or the case after does not work in this
situation. Is there a way to have the computer search all spouse IDs
instead of just the one before or the one after?

Don



Re: Flagging Spouses within dataset, Probably a syntax solution?

Maguin, Eugene
In reply to this post by Melissa Ives
Don,

I agree with Melissa's idea. I haven't had a chance to work this through,
even though I think I had to do something like this once. I hope you have
realized that while you have an immediate problem--the one that prompted you
to write--you also have a recurrent one, because this will come up over
and over. It does so because, in my opinion, a design error
was made in the study execution. I think the only true way out is to
construct a household or couple id that is superordinate to the persons who
responded.

I think I would work this problem in the following manner. (I am aiming for
what I regard as the true fix and not a temporary one). I'll assume you are
conversant with syntax.

You have one copy of the data set. Call it RespDS and sort it by respondent
id and create a new variable called Hhid that has the value of respondent
id.

Save another copy, call it SpouseDS, sort it by spouse id, and create a new
variable called Hhid that has the value of spouse id.

Do an add files and sort by Hhid.

Finally, you should number the records within Hhid. This is simple. Just

Compute Resp=1.
If (Hhid eq lag(Hhid)) resp=2.

Gene Maguin

Re: Flagging Spouses within dataset, Probably a syntax solution?

meljr
In reply to this post by Björn Türoque
Don, sorry I should have noticed that. How about something like this:

To get a unique value, multiply the Rid by the SPid.
If the SPid = 0, recode it to 1 so you will not have to multiply by zero.
Then sort and use the lag function to remove duplicates.
This is untried, but I would be very surprised if you got any non-unique values in Rid x SPid.
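
A minimal sketch of that product-key idea in syntax, using the sample's variable names; sp, pairkey and flag are made-up names. It has only been reasoned through on the 12 sample cases. A product is also not guaranteed to be unique across couples (1000 x 1200 and 800 x 1500 give the same value), so it is worth checking for collisions on real data.

```spss
* Sketch only. The names sp, pairkey and flag are hypothetical.
* Use 1 in place of a zero spouse ID so the product is never zero.
COMPUTE sp = SPid.
IF (SPid = 0) sp = 1.
* Both members of a couple get the same product, so sorting puts them side by side.
COMPUTE pairkey = Rid * sp.
SORT CASES BY pairkey Rid.
* Flag the first case of each key and unflag a case that repeats the previous key.
COMPUTE flag = 1.
IF (pairkey = LAG(pairkey)) flag = 0.
EXECUTE.
```

Sorting on the key first is what lets LAG work here, since the spouse no longer has to be adjacent in the original file.
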
meljr


Re: Flagging Spouses within dataset, Probably a syntax solution?

Björn Türoque
In reply to this post by Maguin, Eugene
Melissa's idea is actually a good one and works for the most part. The
trouble is that once I remove the second dataset I am left with my original
dataset plus a flag variable, and when I use the lag command I lose the
people who are flagged. When I use the duplicate cases wizard it seems to
flag both the husband and the wife as duplicates, so I have to go through
by hand and pick the one not to include.

Unfortunately I have no control over how other people collect their data; it
is sent to me and I need to help people make sense of it. There are some
people you just want to shake, especially since they should know better.
I am going to have to perform this operation several times, so anything
quick and repeatable would be super helpful.

If you want to talk about messed up data, I got one the other day that would
make your head spin. The researcher asked a "please select all that apply"
question with 8 possible answers, but coded it into SPSS as one variable,
so the possible answers went something like this:

1 = choice 1
2 = choice 2
3 = choice 3
4 = choice 4
....
9 = choice 1 and choice 2
10 = 1+3
.....
83 = 3+6+8
84 = 3+7+8
and so on....

When I asked the researcher what scheme they used for this, they told me
they only coded a combination in if someone answered in that particular
combination, and had a different variable for someone answering 1+2+3 and
someone who answered 3+1+2. The respondent could have picked as many as they
wanted, and the order in which they picked them was not a factor. She had me
ripping my hair out for hours trying to figure out a way to not have her
re-enter the data. All she wanted to do was run a simple crosstab of each
response choice. I finally told her she had to either do the math by hand or
re-enter her data; I then showed her how to do it. Wow was she pissed...
sorry I am venting, but most people don't get how frustrating this is.

Don



Re: Flagging Spouses within dataset

Richard Ristow
In reply to this post by Björn Türoque
At 12:55 PM 4/2/2007, Don Asay wrote:

>I have questionnaires answered by both husband and
>wife,  each assigned a unique ID and linked to their spouse through a
>spouse id. There is an additional variable that tells if the person
>has a spouse.  I would like to flag one and only one of the two
>members of the household.

It doesn't look like there's been a complete solution yet. Here's one,
assigning a household ID, namely the lower of the two individual IDs
associated with the household. In selecting, it takes the spouse with
the lower ID number. (I'd usually recommend true random selection.)
SPSS 15 draft output:

|-----------------------------|---------------------------|
|Output Created               |02-APR-2007 16:50:52       |
|-----------------------------|---------------------------|
    Rid   SPid SPYN

   1011   1271   1
   1045      0   1
   1047   1049   1
   1049   1047   1
   1079      0   0
   1088      0   0
   1114      0   0
   1142   1143   1
   1143   1142   1
   1271   1011   1
   1300      0   1
   1351      0   0

Number of cases read:  12    Number of cases listed:  12


NUMERIC HHid (F6).
VAR LABEL HHid 'Household ID: Lower of two individual IDs'.

* Treat SPid = 0 as missing so MIN ignores it and returns Rid for singles.
MISSING VAL    SPid(0).
COMPUTE HHid = MIN(Rid,SPid).
SORT CASES BY  HHid Rid.
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |02-APR-2007 16:50:53       |
|-----------------------------|---------------------------|
    Rid   SPid SPYN   HHid

   1011   1271   1    1011
   1271   1011   1    1011
   1045      0   1    1045
   1047   1049   1    1047
   1049   1047   1    1047
   1079      0   0    1079
   1088      0   0    1088
   1114      0   0    1114
   1142   1143   1    1142
   1143   1142   1    1142
   1300      0   1    1300
   1351      0   0    1351

Number of cases read:  12    Number of cases listed:  12


* Keep the household member whose own ID is the household ID.
SELECT IF   VALUE(HHid)
          EQ VALUE(Rid).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |02-APR-2007 16:50:53       |
|-----------------------------|---------------------------|
    Rid   SPid SPYN   HHid

   1011   1271   1    1011
   1045      0   1    1045
   1047   1049   1    1047
   1079      0   0    1079
   1088      0   0    1088
   1114      0   0    1114
   1142   1143   1    1142
   1300      0   1    1300
   1351      0   0    1351

Number of cases read:  9    Number of cases listed:  9
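
Since the original question asked for a flag rather than dropping cases, a variant could run in place of the SELECT IF step, once HHid has been computed. This is a sketch under that assumption; the names flag, rnd and pick are made up. The second part shows one possible way to do the random selection mentioned above instead of always taking the lower ID.

```spss
* Sketch only, run instead of SELECT IF. The names flag, rnd and pick are hypothetical.
* Version 1: flag the member whose own ID is the household ID.
COMPUTE flag = (VALUE(HHid) EQ VALUE(Rid)).

* Version 2: pick one member per household at random.
* A random sort key within HHid puts a random member first.
COMPUTE rnd = RV.UNIFORM(0,1).
SORT CASES BY HHid rnd.
COMPUTE pick = 1.
IF (HHid = LAG(HHid)) pick = 0.
EXECUTE.
```

Either variable can then be used with a filter or SELECT IF later, leaving the full file intact.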

=====================================
APPENDIX: Test data, from the posting
=====================================
*  Test data, from posting:      ........... .
DATA LIST LIST /Rid(F6) SPid(F6) SPYN(F2).

BEGIN DATA
1011 1271 1
1045 0 1
1047 1049 1
1049 1047 1
1079 0 0
1088 0 0
1114 0 0
1142 1143 1
1143 1142 1
1271 1011 1
1300 0 1
1351 0 0
END DATA.