SPSSX Discussion

finding inconsistent within person information

Classic

List

Threaded

10 messages Options

Art Kendall

finding inconsistent within person information

Tomorrow I have to work on a set of records which are supposed to have unique IDs for people.
There are multiple records per person. But a different number of records per person.

In the past I would do something like
aggregate outfile=aggfile /break = ID/lo_SSN = min (SSN)/lo_dob = min (dob)/lo_race= min(race)/lo_sex = min(sex)
/lo_zip = min(zipcode)
/hi_ssn = max (ssn)/hi_dob = max(dob) /hi_race = max(race)/hi_sex= max(sex) /hi_zip = max(zipcode).
do repeat redflag= redflag1 to redflag6/ xlo = lo_ssn tolo_zip xhi = hi_ssn to hi_zip.compute redflag = xlo ne xhi.end repeat.
temporary. select if sum(redflag1 to redflag6 ne 0). list variables = ID redflag1 to redflag6.

Is there an extension command to find ID's that have changes across a series of records in a varlist?

Has anyone figured out a way to trick duplicate cases to do this?

-- 
Art Kendall
Social Research Consultants

Art Kendall
Social Research Consultants

Andy W

Re: finding inconsistent within person information

Just a minor change to what you have already, aggregate the min (or max) value to the same data-file, then compare that aggregated value to the original values in long format. This will flag any discrepancies.

You do something with sort and lag to give you more info, but that is an easy start.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Ruben Geert van den Berg

Re: finding inconsistent within person information

In reply to this post by Art Kendall

Dear Art,

Couldn't you use something like

aggregate
/outfile * mode = addvariables
/break = id
/check_v1=sd(v1)
/check_v2=sd(v2)
/check_v3=sd(v3)
/check_v4=sd(v4)
/check_v5=sd(v5).

If the varlist gets long, you can have Python spit out the necessary aggregate command for you.

Best,

Ruben

Art Kendall

Re: finding inconsistent within person information

Thank you and Andy for replying
Yes using add variables is something that I have done and will do.
using SD(var) would work for numeric variables.
Also, now that I think back I used FIRST and LAST sometimes.

The last time I thought about how to do this kind of task was when SPSS made "addvariables" available.
Every now and then I think to myself, "this is a task I have done many times over the decades. I wonder if it has been made more automated?"

It looks like I'll keep on doing it this old way.

Art Kendall
Social Research Consultants

On 4/29/2013 5:05 AM, Ruben van den Berg [via SPSSX Discussion] wrote:

Dear Art,

Couldn't you use something like

aggregate
/outfile * mode = addvariables
/break = id
/check_v1=sd(v1)
/check_v2=sd(v2)
/check_v3=sd(v3)
/check_v4=sd(v4)
/check_v5=sd(v5).

If the varlist gets long, you can have Python spit out the necessary aggregate command for you.

Best,

Ruben

If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/finding-inconsistent-within-person-information-tp5719800p5719804.html

To start a new topic under SPSSX Discussion, email [hidden email]
To unsubscribe from SPSSX Discussion, click here.
NAML

Art Kendall
Social Research Consultants

Genevieve Odoom

Automatic reply: finding inconsistent within person information

Thank you for your email. Please note that I will be out of the office beginning Monday, April 29th till Wednesday May 1st with limited access to emails. I will respond to your email upon my return to the office on Thursday May 2.

Thank you.

Genevieve Odoom
Policy and Program Analyst
OANHSS
7050 Weston Rd.
Suite 700
Woodbridge, On
L4L 8G7
Tel: (905) 851-8821 x241 Fax: (905) 851-0744
[hidden email]
www.oanhss.org

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

David Marso

Re: finding inconsistent within person information

Administrator

In reply to this post by Art Kendall

Why not
AGGREGATE OUTFILE * / BREAK ALL/ N=N.
COMPUTE RedFlag = ID EQ LAG(ID).

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Ruben Geert van den Berg

Re: finding inconsistent within person information

I'm not sure about the second command. If there are consistencies within a case, it will result in a case group with N > 1. The first case within this group will not be flagged while it should. Perhaps

AGGREGATE OUTFILE * / BREAK ALL/ N=N.
AGGREGATE OUTFILE * / BREAK id/ N=N.
COMPUTE RedFlag = n gt 1.

may get around this?

Rich Ulrich

Re: finding inconsistent within person information

In reply to this post by Art Kendall

A different approach -
You could use Aggregate to Break on All, adding the Count
to the file; Break on ID, adding the ID-count to the file;
then Select and list all the cases where the two counts are not equal.

--
Rich Ulrich

Date: Sun, 28 Apr 2013 13:19:42 -0700
From: [hidden email]
Subject: finding inconsistent within person information
To: [hidden email]

Tomorrow I have to work on a set of records which are supposed to have unique IDs for people.
There are multiple records per person. But a different number of records per person.

In the past I would do something like
aggregate outfile=aggfile /break = ID/lo_SSN = min (SSN)/lo_dob = min (dob)/lo_race= min(race)/lo_sex = min(sex)
/lo_zip = min(zipcode)
/hi_ssn = max (ssn)/hi_dob = max(dob) /hi_race = max(race)/hi_sex= max(sex) /hi_zip = max(zipcode).
do repeat redflag= redflag1 to redflag6/ xlo = lo_ssn tolo_zip xhi = hi_ssn to hi_zip.compute redflag = xlo ne xhi.end repeat.
temporary. select if sum(redflag1 to redflag6 ne 0). list variables = ID redflag1 to redflag6.

Is there an extension command to find ID's that have changes across a series of records in a varlist?

Has anyone figured out a way to trick duplicate cases to do this?
...

Art Kendall

Re: finding inconsistent within person information

In reply to this post by Ruben Geert van den Berg

this appears to work.
new file.data list list /id (f2) year (f4) gender (a1) race (a1)yearborn(f4).begin data01 1990 M W 195001 1991 M W 195001 1992 M W 195001 1993 F W 195001 1994 M W 195001 1995 M W 195002 1991 F W 196002 1992 F W 197502 1993 F W 196002 1994 F W 196002 1995 F B 196002 1996 F W 196003 1992 F W 197203 1993 F W 197203 1994 F W 197203 1995 F W 197204 1990 M W 196804 1991 M W 196804 1992 M W 196804 1993 M W 196804 1994 M W 196804 1995 M W 1968end data.AGGREGATE OUTFILE * / BREAK ID gender Race yearborn/ Kount=N.AGGREGATE OUTFILE * / BREAK id/ Kount=N.COMPUTE RedFlag = Kount gt 1.list.

Art Kendall
Social Research Consultants

On 4/29/2013 10:55 AM, Ruben van den Berg [via SPSSX Discussion] wrote:

I'm not sure about the second command. If there are consistencies within a case, it will result in a case group with N > 1. The first case within this group will not be flagged while it should. Perhaps

AGGREGATE OUTFILE * / BREAK ALL/ N=N.
AGGREGATE OUTFILE * / BREAK id/ N=N.
COMPUTE RedFlag = n gt 1.

may get around this?

If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/finding-inconsistent-within-person-information-tp5719800p5719815.html

To start a new topic under SPSSX Discussion, email [hidden email]
To unsubscribe from SPSSX Discussion, click here.
NAML

Art Kendall
Social Research Consultants

David Marso

Re: finding inconsistent within person information

Administrator

In reply to this post by Ruben Geert van den Berg

Maybe better to use MATCH create FIRST and LAST flags. Both should be 1 (True) unless there are cases with same ID but diffs in the variable values.

Ruben van den Berg wrote

I'm not sure about the second command. If there are consistencies within a case, it will result in a case group with N > 1. The first case within this group will not be flagged while it should. Perhaps

AGGREGATE OUTFILE * / BREAK ALL/ N=N.
AGGREGATE OUTFILE * / BREAK id/ N=N.
COMPUTE RedFlag = n gt 1.

may get around this?