double data entry

double data entry

Todd Kashdan
Hi all:

I was wondering how to check the reliability of data entry
when the same data were entered into two separate datasets.
Is there a technique for this in SPSS?

thanks,
Todd

**************************
Todd B. Kashdan, Ph.D.
Assistant Professor
Department of Psychology
George Mason University
Mail Stop 3F5
Fairfax, VA  22030
http://mason.gmu.edu/~tkashdan

Re: double data entry

Bauer, Craig
SPSS makes (or made, anyway; I used it years ago) a great Data Entry
package that not only had a facility for checking data entry accuracy
by retyping a percentage of cases, but could also enforce valid codes,
skip patterns in the survey, etc.  If you're going to be doing a lot of
this, it might be worth looking into.

But to answer your question: have your random sample (or the whole
thing) retyped into another dataset called sample.  To compare the two,
create a variable called "type" in each dataset, set to 1 for the main
set and 2 for the sample.  Merge them together, sort by id and type, and
compare each retyped case with the case before it using "lag".  You can
then visually scan for which variables were off.  Like this:

* Type stamp into sample file.
Get file='sample.sav'.
Compute type=2.
Save outfile='sample.sav'.

* Get main file, establish type.
Get file='main.sav'.
Compute type=1.

* Merge and sort by id and type to get duplicated cases next to each
other, in order.
Add files
        /file=*
        /file='sample.sav'.
Sort cases by id type.

* Check all vars; set bad=1 if a retyped case differs from the original
  case directly above it.  Keep type out of the variable range.
Compute bad=0.
Compute sameid=(id=lag(id)).
Do repeat
        x=FirstVariableInFile to LastVariableInFile.
+       if (x ne lag(x) and type=2 and sameid=1) bad=1.
End repeat.

* identify the cases.
Temporary.
Select if bad=1.
List vars=id.

* Select those for visual comparison (fill in the ids listed above).
Select if any(id,list of ids from procedure above).
Execute.

Now fix in the appropriate file & rerun to check your fixes.

Not totally scripted, but functional.  The only thing it wouldn't catch
is if ID is mistyped.  You'd probably want something more robust if the
cases contain lots of vars to help you identify which one(s) is/are off.
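
One way to get at "which one(s) is/are off" is to keep a separate flag per variable instead of a single "bad".  The following is only a sketch, not something from the original reply: q1 to q5 stand in for your own variables, and diff1 to diff5, sameid2 and ndiff are made-up names for new variables.  It assumes the merged file, already sorted by id and type, is the active dataset.  Note that a comparison involving a system-missing value comes out missing rather than 1, so a blank in one file against a number in the other is not flagged by this.

```spss
* Sketch only: q1 to q5 and diff1 to diff5 are hypothetical names.
* sameid2 is 1 when a case has the same id as the case just before it.
Compute sameid2 = (id = lag(id)).

* Each diff flag is 1 when a value differs from the case just before it.
* Flags are then reset to 0 unless this is a retyped case under its original.
Do repeat x = q1 to q5 / d = diff1 to diff5.
Compute d = (x ne lag(x)).
If (type ne 2 or sameid2 ne 1) d = 0.
End repeat.

* Count mismatching variables per case and list only the problem cases.
Count ndiff = diff1 to diff5 (1).
Temporary.
Select if (ndiff > 0).
List variables = id diff1 to diff5.
```

Each listed row is a retyped case, and the columns showing 1 are the variables where the two entries disagree.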


HTH

(apologies to the other listers cringing at my lack of proper
capitalization or other SPSS programming conventions)

- Craig

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Todd Kashdan
Sent: Monday, October 09, 2006 3:41 PM
To: [hidden email]
Subject: double data entry
