Data Entry Error Detection

Libardo López Guzmán
Does anybody have syntax to detect data entry (typing) errors?
I have two variables to check: ID (numeric) and Name (string).
Example:
ID        Name           ErrorID  ErrorName

1445566   AAA            1
1445556   AAA            1
2222222   CCC            1
22222222  CCC            1
123       Peter                   1
123       Pefer                   1
4546      Floyd Dockter           1
4546      Floyd Fockter           1

There are probably two data entry errors in ID and two in Name.

The objective is to detect the errors and then correct them.

All help and ideas are welcome.

BR

Lee

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Data Entry Error Detection

Maguin, Eugene
Libardo,

Your problem description is not entirely clear, so I'm going to make some
assumptions, which you can confirm or reject.

I assume your incoming file actually looks like this:

ID        Name
1445566   AAA
1445556   AAA
2222222   CCC
22222222  CCC
123       Peter
123       Pefer
4546      Floyd Dockter
4546      Floyd Fockter

Most importantly, I assume you always have exactly two consecutive records
that need to be compared. Each record has two variables to check. As you
show, you want to know whether the ids match and whether the names match.

The simplest way to do this may be to use the built-in facility in SPSS. I
don't know when it was first included, but in version 15 it is called
Identify Duplicate Cases and is in the Data menu.
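
The same kind of check can also be sketched in syntax. This is only a minimal illustration, assuming the variables are named id and name as in the listing above; the flag names first_id, last_id and id_unmatched are made up for this example. It sorts by id and flags any record whose id occurs only once in the file, which is what a mistyped id would look like.

```spss
* Sketch only, flag records whose id has no matching partner.
SORT CASES BY id.
MATCH FILES /FILE=* /BY id /FIRST=first_id /LAST=last_id.
* A record that is both first and last in its id group has a unique id.
COMPUTE id_unmatched = (first_id AND last_id).
EXECUTE.
```

On the sample data this would flag the first four records, because none of those ids occurs twice, and leave 123 and 4546 unflagged. It checks id only and says nothing about name.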

If you don't have that facility, here is how to do it in syntax. I also
notice that you want both records in a pair to be marked. This is a bit
tricky because you don't have a variable that identifies which records
belong together, so you'll need to create one first. Note that I start by
assuming an error exists and then reset the marker variable if it does not.
That should catch missing values as well as nonmatching ones.

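* Give both records of each pair the same case number (1,1,3,3 and so on).
compute case=$casenum.
if (mod(case,2) eq 0) case=lag(case).
execute.

* Assume an error, then clear the flag on the second record of a matching pair.
compute errorid=1.
compute errorname=1.
if (case eq lag(case) and id eq lag(id)) errorid=0.
if (case eq lag(case) and name eq lag(name)) errorname=0.

* Copy a cleared flag to the other record of the same pair.
sort cases by case errorid.
if (case eq lag(case) and lag(errorid) eq 0) errorid=0.

sort cases by case errorname.
if (case eq lag(case) and lag(errorname) eq 0) errorname=0.
execute.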
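
To experiment with this, the sample data can be read inline. The sketch below does that and also shows one possible alternative for marking both records of a pair without sorting: AGGREGATE attaches each pair's first and last values to both records, and the flags are then simple comparisons. The names pair, id_first, id_last, name_first and name_last are assumptions of this sketch, the string width A20 is a guess, and it needs a version of SPSS that supports MODE=ADDVARIABLES. Unlike the syntax above, it will not flag a pair in which one of the two ids is missing, because FIRST and LAST skip missing values.

```spss
* Read the eight sample records (names containing blanks must be quoted).
DATA LIST LIST /id (F10.0) name (A20).
BEGIN DATA
1445566 "AAA"
1445556 "AAA"
2222222 "CCC"
22222222 "CCC"
123 "Peter"
123 "Pefer"
4546 "Floyd Dockter"
4546 "Floyd Fockter"
END DATA.

* Number the pairs 1,1,2,2 and so on, assuming records come in consecutive pairs.
COMPUTE pair = TRUNC(($CASENUM + 1) / 2).

* Add each pair's first and last id and name to both records of the pair.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=pair
  /id_first=FIRST(id)
  /id_last=LAST(id)
  /name_first=FIRST(name)
  /name_last=LAST(name).

* Flag is 1 when the two values in the pair differ and 0 when they match.
COMPUTE errorid = (id_first NE id_last).
COMPUTE errorname = (name_first NE name_last).
EXECUTE.
```

On the sample data this should set errorid to 1 for the first four records and errorname to 1 for the last four.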

Gene Maguin