finding inconsistent within person information

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

finding inconsistent within person information

Art Kendall
Tomorrow  I have to work on a set of records which are supposed to have unique IDs for people.
There are multiple records per person. But a different number of records per person.



In the past I would do something like
aggregate outfile=aggfile /break = ID
/lo_SSN = min (SSN)
/lo_dob = min (dob)
/lo_race= min(race)
/lo_sex = min(sex)
/lo_zip = min(zipcode)
/hi_ssn = max (ssn)
/hi_dob = max(dob)
/hi_race = max(race)
/hi_sex= max(sex)
/hi_zip = max(zipcode).

do repeat
  redflag =  redflag1 to redflag6/
  xlo = lo_ssn to lo_zip
  xhi = hi_ssn to hi_zip.
compute redflag = xlo ne xhi.
end repeat.

temporary.
select if sum(redflag1 to redflag6 ne 0).
list variables = ID redflag1 to redflag6.



Is there an extension command to find ID's that  have changes across a series of records in a varlist?

Has anyone figured out a way to trick duplicate cases to do this?


-- 
Art Kendall
Social Research Consultants
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: finding inconsistent within person information

Andy W
Just a minor change to what you have already, aggregate the min (or max) value to the same data-file, then compare that aggregated value to the original values in long format. This will flag any discrepancies.

You do something with sort and lag to give you more info, but that is an easy start.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/
Reply | Threaded
Open this post in threaded view
|

Re: finding inconsistent within person information

Ruben Geert van den Berg
In reply to this post by Art Kendall
Dear Art,

Couldn't you use something like

aggregate
/outfile * mode = addvariables
/break = id
/check_v1=sd(v1)
/check_v2=sd(v2)
/check_v3=sd(v3)
/check_v4=sd(v4)
/check_v5=sd(v5).

If the varlist gets long, you can have Python spit out the necessary aggregate command for you.

Best,

Ruben
Reply | Threaded
Open this post in threaded view
|

Re: finding inconsistent within person information

Art Kendall
Thank you and Andy for replying
Yes using add variables is something that I have done and will do
.
using SD(var) would work for numeric variables.
Also, now that I think back I used FIRST and LAST sometimes.

The last time I thought about how to do this kind of task was when SPSS made "addvariables" available.
Every now and then I think to myself, "this is a task I have done many times over the decades.  I wonder if it has been made more automated?"

It looks like I'll keep on doing it this old way.
Art Kendall
Social Research Consultants
On 4/29/2013 5:05 AM, Ruben van den Berg [via SPSSX Discussion] wrote:
Dear Art,

Couldn't you use something like

aggregate
/outfile * mode = addvariables
/break = id
/check_v1=sd(v1)
/check_v2=sd(v2)
/check_v3=sd(v3)
/check_v4=sd(v4)
/check_v5=sd(v5).

If the varlist gets long, you can have Python spit out the necessary aggregate command for you.

Best,

Ruben


If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/finding-inconsistent-within-person-information-tp5719800p5719804.html
To start a new topic under SPSSX Discussion, email [hidden email]
To unsubscribe from SPSSX Discussion, click here.
NAML

Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Automatic reply: finding inconsistent within person information

Genevieve Odoom
Thank you for your email. Please note that I will be out of the office beginning Monday, April 29th till Wednesday May 1st with limited access to emails. I will respond to your email upon my return to the office on Thursday May 2.

Thank you.

Genevieve Odoom
Policy and Program Analyst
OANHSS
7050 Weston Rd.
Suite 700
Woodbridge, On
L4L 8G7
Tel: (905) 851-8821 x241 Fax: (905) 851-0744
[hidden email]
www.oanhss.org

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: finding inconsistent within person information

David Marso
Administrator
In reply to this post by Art Kendall
Why not
AGGREGATE OUTFILE * / BREAK ALL/ N=N.
COMPUTE RedFlag = ID EQ LAG(ID).
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: finding inconsistent within person information

Ruben Geert van den Berg
I'm not sure about the second command. If there are consistencies within a case, it will result in a case group with N > 1. The first case within this group will not be flagged while it should. Perhaps

AGGREGATE OUTFILE * / BREAK ALL/ N=N.
AGGREGATE OUTFILE * / BREAK id/ N=N.
COMPUTE RedFlag = n gt 1.

may get around this?
Reply | Threaded
Open this post in threaded view
|

Re: finding inconsistent within person information

Rich Ulrich
In reply to this post by Art Kendall
A different approach -
You could use Aggregate to Break on All, adding the Count
to the file; Break on ID, adding the ID-count to the file;
then Select and list all the cases where the two counts are not equal.

--
Rich Ulrich



Date: Sun, 28 Apr 2013 13:19:42 -0700
From: [hidden email]
Subject: finding inconsistent within person information
To: [hidden email]

Tomorrow  I have to work on a set of records which are supposed to have unique IDs for people.
There are multiple records per person. But a different number of records per person.



In the past I would do something like
aggregate outfile=aggfile /break = ID
/lo_SSN = min (SSN)
/lo_dob = min (dob)
/lo_race= min(race)
/lo_sex = min(sex)
/lo_zip = min(zipcode)
/hi_ssn = max (ssn)
/hi_dob = max(dob)
/hi_race = max(race)
/hi_sex= max(sex)
/hi_zip = max(zipcode).

do repeat
  redflag =  redflag1 to redflag6/
  xlo = lo_ssn to lo_zip
  xhi = hi_ssn to hi_zip.
compute redflag = xlo ne xhi.
end repeat.

temporary.
select if sum(redflag1 to redflag6 ne 0).
list variables = ID redflag1 to redflag6.



Is there an extension command to find ID's that  have changes across a series of records in a varlist?

Has anyone figured out a way to trick duplicate cases to do this?
...

Reply | Threaded
Open this post in threaded view
|

Re: finding inconsistent within person information

Art Kendall
In reply to this post by Ruben Geert van den Berg
this appears to work.
new file.
data list list /id (f2) year (f4) gender (a1) race (a1)yearborn(f4).
begin data
01 1990 M W 1950
01 1991 M W 1950
01 1992 M W 1950
01 1993 F W 1950
01 1994 M W 1950
01 1995 M W 1950
02 1991 F W 1960
02 1992 F W 1975
02 1993 F W 1960
02 1994 F W 1960
02 1995 F B 1960
02 1996 F W 1960
03 1992 F W 1972
03 1993 F W 1972
03 1994 F W 1972
03 1995 F W 1972
04 1990 M W 1968
04 1991 M W 1968
04 1992 M W 1968
04 1993 M W 1968
04 1994 M W 1968
04 1995 M W 1968
end data.
AGGREGATE OUTFILE * / BREAK ID gender Race yearborn/ Kount=N.
AGGREGATE OUTFILE * / BREAK id/ Kount=N.
COMPUTE RedFlag = Kount gt 1.
list.

Art Kendall
Social Research Consultants
On 4/29/2013 10:55 AM, Ruben van den Berg [via SPSSX Discussion] wrote:
I'm not sure about the second command. If there are consistencies within a case, it will result in a case group with N > 1. The first case within this group will not be flagged while it should. Perhaps

AGGREGATE OUTFILE * / BREAK ALL/ N=N.
AGGREGATE OUTFILE * / BREAK id/ N=N.
COMPUTE RedFlag = n gt 1.

may get around this?



If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/finding-inconsistent-within-person-information-tp5719800p5719815.html
To start a new topic under SPSSX Discussion, email [hidden email]
To unsubscribe from SPSSX Discussion, click here.
NAML

Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: finding inconsistent within person information

David Marso
Administrator
In reply to this post by Ruben Geert van den Berg
Maybe better to use MATCH create FIRST and LAST flags.  Both should be 1 (True) unless there are cases with same ID but diffs in the variable values.
Ruben van den Berg wrote
I'm not sure about the second command. If there are consistencies within a case, it will result in a case group with N > 1. The first case within this group will not be flagged while it should. Perhaps

AGGREGATE OUTFILE * / BREAK ALL/ N=N.
AGGREGATE OUTFILE * / BREAK id/ N=N.
COMPUTE RedFlag = n gt 1.

may get around this?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"