identifying extended family in a database

Nomi-2
Hi,
I'm working on a large database that contains data of a longitudinal study.
For each person I have, among other things, a personal identity code and
personal identity codes of parents.  I have managed to identify siblings,
creating a new variable that has the same "family number" for all siblings.
I now wish to identify cousins as well as uncles/aunts. Any suggestions?
Thank you!
Nomi

Re: identifying extended family in a database

Richard Ristow
At 12:22 PM 1/31/2007, Nomi wrote:

>For each person I have, among other things, a personal identity code
>and personal identity codes of parents.  I have managed to identify
>siblings, creating a new variable that has the same "family number"
>for all siblings.

For all siblings, and their parents? That's a nuclear family.

Or is it not a "family number" but a "siblings" number, with the
parents numbered into whatever sets of siblings THEY belong to? Of
course, each person record needs an identifier for the parents, or
you've no chance.

What do you do about half-siblings? Should you drop the 'family number'
or 'sibling group number', as you've defined it, in favor of giving the
person numbers for the parents? Then siblings would be easy to sort
out: they have the same pair of parents.
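
As a rough illustration of that idea, here is a minimal Python sketch. The record layout (person, father, mother), the sample ids, and the function names are all assumptions made up for the example, not anything from the actual study file. It groups full siblings by their pair of parents and lists half-siblings as pairs sharing exactly one known parent.

```python
from collections import defaultdict

# Hypothetical records: (person_id, father_id, mother_id); None = parent unknown.
people = [
    (1, None, None),
    (2, None, None),
    (3, 1, 2),
    (4, 1, 2),
    (5, 1, 9),   # same father as 3 and 4, different mother
]

def sibling_groups(records):
    """Group person ids by (father, mother) pair, skipping incomplete pairs."""
    groups = defaultdict(list)
    for person, father, mother in records:
        if father is not None and mother is not None:
            groups[(father, mother)].append(person)
    return dict(groups)

def half_siblings(records):
    """Return pairs of persons who share exactly one known parent."""
    pairs = set()
    for i, (p, pf, pm) in enumerate(records):
        for q, qf, qm in records[i + 1:]:
            same_father = pf is not None and pf == qf
            same_mother = pm is not None and pm == qm
            if same_father != same_mother:   # one parent shared, not both
                pairs.add((p, q))
    return pairs

full = sibling_groups(people)
half = half_siblings(people)
```

The pairwise loop in `half_siblings` is quadratic, so for a large file one would more likely index persons by father and by mother separately and compare within those groups.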

>I now wish to identify cousins as well as uncles/aunts. Any
>suggestions?

It's always an interesting problem. Back in the 1970s, the people I
knew who were working on it called it 'family reconstruction'; is that
term still current?

It's a special and complicated case of the problem called 'transitive
closure'. 'Transitive closure': implement the rule "if A is connected
to B, and B is connected to C, A is connected to C". Do that by
exploring the connections to whatever depth is necessary, and adding
connections as necessary; for example, in the above case, if "A is
connected to C" isn't already in the file, add it.
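
A minimal Python sketch of that rule, assuming the connections are held as a set of directed (from, to) pairs; the function name and the sample links are hypothetical. It simply reapplies the rule until no new connection turns up, which is easy to follow but not efficient for a large file.

```python
def transitive_closure(pairs):
    """Repeat the rule "A-B and B-C implies A-C" until no new pair appears."""
    closure = set(pairs)
    while True:
        # Every connection obtainable by chaining two existing connections.
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:      # nothing new: the closure is complete
            return closure
        closure |= new_pairs

links = {("A", "B"), ("B", "C"), ("C", "D")}
result = transitive_closure(links)
```

Here `result` contains the three original links plus ("A", "C"), ("B", "D") and ("A", "D").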

It's complicated, because "is related to" isn't enough. (If I'm related
to you and you're related to Joan, then I'm related to Joan; but if we
follow that chain far enough, both of us are probably related to
everybody.) You want degree and kind of relationships, not just their
existence.

Let's see. I think the 'primitive', or irreducible, relationships, are
'father/child', 'mother/child', and 'husband/wife', and every other is
definable by following those. (Gender-specific terms used advisedly.)
'Sibling', for example, is having the same father and mother.
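
To show how the derived relationships might follow from the primitive ones, here is a small Python sketch. The `parents` table, the ids in it, and the function names are assumptions invented for the example. It uses only the father/child and mother/child links, so it finds uncles and aunts by blood (siblings of a parent) and first cousins (their children); uncles and aunts by marriage would need the husband/wife links as well.

```python
# Hypothetical parent table: person_id -> (father_id, mother_id); None = unknown.
parents = {
    "gf": (None, None), "gm": (None, None),
    "ann": ("gf", "gm"), "bob": ("gf", "gm"),
    "x": (None, None), "y": (None, None),
    "cat": ("x", "ann"),   # Ann's child
    "dan": ("bob", "y"),   # Bob's child
}

def known_parents(person):
    """The set of a person's parents that are actually recorded."""
    return {p for p in parents.get(person, (None, None)) if p is not None}

def siblings(person):
    """Full siblings: others with the same known father and mother."""
    pair = parents.get(person, (None, None))
    if None in pair:
        return set()
    return {q for q, qp in parents.items() if qp == pair and q != person}

def aunts_uncles(person):
    """Siblings of either parent (blood relatives only, no spouses)."""
    return {s for p in known_parents(person) for s in siblings(p)}

def first_cousins(person):
    """Children of the person's aunts and uncles."""
    au = aunts_uncles(person)
    return {q for q in parents if known_parents(q) & au}

cat_aunts_uncles = aunts_uncles("cat")
cat_cousins = first_cousins("cat")
```

With this sample table, `cat_aunts_uncles` is `{"bob"}` and `cat_cousins` is `{"dan"}`.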

(Further complications: Our society is considering same-gender spouses;
if yours includes those, 'husband/wife' isn't the only marriage
relationship. And if a woman is in 'husband/wife' with two men, is that
widowhood, divorce, or polyandry? Do beginning and ending dates, and
ending reasons, of marriages, need to be included?)

I haven't worked on transitive closure in SPSS for quite a while. It
could be a chance to work up Python skills.

On the other hand, SPSS may not be the most suitable tool. I suggest:

. Look in the literature for 'family reconstruction.' I know people
have done this before - as I say, I knew people at Brown University
doing it, back in the 1970s. There must be computer methods; it's the
first thing one would think of. This particular wheel must already have
been invented.

. Your E-mail domain doesn't specify and you don't use a signature.
But, if you're at a university with a Computer Science department, this
might be about the level for a master's thesis, or maybe senior
project, in CS, if you could persuade somebody there that it's
interesting.

Fascinating problem. Good luck with the computational problem; even
more, good success with your study.

Richard Ristow