Double Checking Data and Recoding

Tim Main-2
Hi All,

I'm setting up a method to double-check all the data entered into our very
large databases for a childhood grief research project. We are using a
double-entry method: for example, all 22 questions from a survey are entered
into the database by one student and then immediately entered again by the
same student.

I have created a CHECK variable to make sure the two entries match each
other. Since we have only just started, the second entry still needs to be
done for much of our older data.

The CHECK variable will ideally be coded like this:
0 = missing double check entry
1 = match
2 = mismatch

Here is a very simplified version of the syntax I am running.

COMPUTE c1CHECK = 0.
VALUE LABELS c1CHECK 0 'missing' 1 'match' 2 'mismatch'.
EXECUTE.

If (((EGI1 + EGI2 + EGI3) - (EGI1CHCK + EGI2CHCK + EGI3CHCK)) = 0) c1CHECK = 1.

If (((EGI1 + EGI2 + EGI3) - (EGI1CHCK + EGI2CHCK + EGI3CHCK)) <> 0) c1CHECK = 2.

EXECUTE.

Here's my problem: my syntax does not treat the missing-value code (-9) as a
value, so any subject whose data correctly contain -9s is flagged as 0
('missing double check entry').

Can someone tell me how to recode the missing values to regular values and
then recode them back to missing?

Or does anyone have a better solution for double-checking data using the
double-entry method?

Thanks!

Tim

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Double Checking Data and Recoding

David Marso
Administrator
"Can someone tell me how to recode the missing values to regular values and
then recode them back to missing?"
No need to do so!
See the VALUE function: VALUE(var) returns the value of var while ignoring its user-missing definition.

"Or does anyone have a better solution for double checking data using the
double entry method?"
There is reputed to be some SPSS extension, "compare data sets", which supposedly does this.

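OTOH: your current syntax is DOOMED for other reasons!!!
e.g., let EGI1=1, EGI2=2, EGI3=3,
and let EGI1CHCK=3, EGI2CHCK=2, EGI3CHCK=1.
Both totals are 6, so the difference is 0 and the case is flagged as a match even though two of the three items disagree.
The same is true even if refactored:
COMPUTE c1Check=SUM(VALUE(EGI1), VALUE(EGI2), VALUE(EGI3)) EQ
                SUM(VALUE(EGI1CHCK), VALUE(EGI2CHCK), VALUE(EGI3CHCK)).
(1 if true, 0 if false.)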
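To see concretely why comparing totals fails, here is a small plain-Python sketch (not SPSS syntax) using the values from the example above. It also shows that summing the signed per-item differences is the same test in disguise, while an item-by-item comparison catches the error. The function names are illustrative only.

```python
def totals_match(first, second):
    """The original test: do the two entries have the same total?"""
    return sum(first) - sum(second) == 0


def signed_differences_match(first, second):
    """Summing signed item differences is algebraically the same test."""
    return sum(a - b for a, b in zip(first, second)) == 0


def items_match(first, second):
    """Item-by-item test: every position must agree."""
    if len(first) != len(second):
        raise ValueError("entries must have the same number of items")
    return all(a == b for a, b in zip(first, second))


first_entry = [1, 2, 3]   # EGI1, EGI2, EGI3
second_entry = [3, 2, 1]  # EGI1CHCK, EGI2CHCK, EGI3CHCK

print(totals_match(first_entry, second_entry))              # True: the two errors cancel
print(signed_differences_match(first_entry, second_entry))  # True: same cancellation
print(items_match(first_entry, second_entry))               # False: two items disagree
```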

It might be *MUCH BETTER* to compare item by item, taking absolute differences so that errors cannot cancel out:
COMPUTE c1Check=SUM(ABS(VALUE(EGI1)-VALUE(EGI1CHCK)),
                    ABS(VALUE(EGI2)-VALUE(EGI2CHCK)),
                    ABS(VALUE(EGI3)-VALUE(EGI3CHCK))) EQ 0.
*IF* you REALLY must have 1 and 2 as values, either recode or...
COMPUTE c1Check=(SUM(ABS(VALUE(EGI1)-VALUE(EGI1CHCK)),
                     ABS(VALUE(EGI2)-VALUE(EGI2CHCK)),
                     ABS(VALUE(EGI3)-VALUE(EGI3CHCK))) NE 0) + 1.
**AND** be careful of floating point numbers when doing comparisons to 0!!!
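A plain-Python sketch (not SPSS syntax) of the full per-case check with Tim's 0/1/2 coding. It assumes each entry is a list of item values in the same order, uses `None` for a cell that has not been keyed yet (standing in for system-missing), and keeps -9 as an ordinary number, which is what VALUE() achieves in SPSS. The function names and the `tol` tolerance (a nod to the floating-point warning) are illustrative choices, not part of any SPSS API.

```python
def _differs(a, b, tol):
    """True if two item values disagree; a blank on only one side counts as a mismatch."""
    if a is None or b is None:
        return a is not b
    return abs(a - b) > tol


def check_status(first, second, tol=1e-9):
    """0 = second entry not done yet, 1 = all items match, 2 = at least one mismatch."""
    if len(first) != len(second):
        raise ValueError("entries must have the same number of items")
    if all(b is None for b in second):
        return 0
    mismatches = sum(_differs(a, b, tol) for a, b in zip(first, second))
    return 1 if mismatches == 0 else 2


print(check_status([1, -9, 3], [1, -9, 3]))         # 1: matching -9 codes count as a match
print(check_status([1, 2, 3], [3, 2, 1]))           # 2: offsetting errors are still caught
print(check_status([1, 2, 3], [None, None, None]))  # 0: second entry not keyed yet
```

Comparing with a small tolerance instead of exact equality costs nothing for integer survey codes and guards against values like 0.1 + 0.2 that are not stored exactly.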
             
HTH, David

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Do not give what is holy to dogs, nor cast your pearls before swine, lest they trample them under their feet."
When the damned possessed the pigs, going to jump off the bloody cliff into the abyss?

Re: Double Checking Data and Recoding

Zdaniuk, Bozena-3
SPSS used to offer a Data Entry product (I am not sure whether they still offer it). It was quite buggy but very useful for comparing files. Any SPSS data file could be saved as a Data Entry file; I would then open one file in the Data Entry software and use its Compare function, which produced a very clear, easy-to-read report of the discrepancies between the two files.
Bozena
________________________________________
From: SPSSX(r) Discussion [[hidden email]] on behalf of David Marso [[hidden email]]
Sent: Wednesday, January 11, 2012 4:15 PM
To: [hidden email]
Subject: Re: Double Checking Data and Recoding

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Double-Checking-Data-and-Recoding-tp5138534p5138559.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

Re: Double Checking Data and Recoding

msherman
In reply to this post by David Marso
Here is a syntax program that I use (I probably got it on this listserv). You only need to change the two FILE HANDLE commands, the /BY subcommand and the WRITE OUTFILE command to suit the location of your files and the name of your ID variable; the INCLUDE path must match the WRITE OUTFILE path, and the ID name also appears in the /KEEP, LIST and GET FILE /DROP specifications. Also, IBM SPSS offers Identify Duplicate Cases on the Data menu. HTH. Martin Sherman
comment program to locate the IDs and variable values that
   are discrepant between two data files.
set printback=no mprint=no length=none.
file handle oldfile /name='g:\Double Entry data file 1x.sav'.
file handle newfile /name='g:\Double Entry data file 2x.sav'.
define checkvar (!pos=!tokens(1)).
match files
  file=oldfile /rename= (!1=!concat(!1,'_old'))/in=inold
  /file=newfile/rename= (!1=!concat(!1,'_new'))/in=innew
  /keep=!concat(!1,'_old') !concat(!1,'_new') ID
  /by ID.
sel if inold and innew. /* select cases in both files */
display labels var=!concat(!1,'_new').
sel if !concat(!1,'_old') ne !concat(!1,'_new').
list/id  !concat(!1,'_old') !concat(!1,'_new').
!enddefine.
/* drop ID and variables not in new file here           */
/* drop any other variables you don't want to compare   */
get file=oldfile  /drop= ID .
n of cases 1.
oms /destination viewer=no.
flip.
omsend.
write outfile="g:\temp\checkvar1.sps"/"checkvar " case_lbl ".".
exe.
include "g:\temp\checkvar1.sps".
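For readers outside SPSS, a rough plain-Python sketch of what the macro does: pair the two files on ID, keep only IDs found in both (the `sel if inold and innew` step), and list, variable by variable, the IDs whose two values differ. The dict-of-dicts layout, the function name and the toy values are illustrative assumptions, not part of the original syntax.

```python
def list_discrepancies(old, new, variables):
    """Yield (variable, case_id, old_value, new_value) for each disagreement.

    `old` and `new` map a case ID to a dict of variable name -> value.
    Only IDs present in both files are compared, and the output is
    grouped by variable, as in the macro's per-variable listing.
    """
    common_ids = sorted(old.keys() & new.keys())
    for var in variables:
        for case_id in common_ids:
            old_value = old[case_id].get(var)
            new_value = new[case_id].get(var)
            if old_value != new_value:
                yield var, case_id, old_value, new_value


names = ["v1", "v2", "v3", "v4"]
old = {
    1: dict(zip(names, [1, 2, 3, 4])),
    2: dict(zip(names, [5, 6, 7, 8])),
    3: dict(zip(names, [9, 0, 1, 2])),
    4: dict(zip(names, [3, 4, 5, 6])),
}
new = {
    1: dict(zip(names, [1, 2, 3, 4])),
    2: dict(zip(names, [5, 6, 7, 9])),
    3: dict(zip(names, [9, 3, 1, 2])),
    4: dict(zip(names, [3, 4, 5, 6])),
}

for var, case_id, old_value, new_value in list_discrepancies(old, new, names):
    print(f"{var}: ID {case_id} has {old_value} vs {new_value}")
```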


Re: Double Checking Data and Recoding

Jon K Peck
In reply to this post by Tim Main-2
I would suggest just using the SPSSINC COMPARE DATASETS extension command available from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral).  It can create a variable counting the mismatches for a set of variables.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Tim Main <[hidden email]>
To:        [hidden email]
Date:        01/11/2012 04:46 PM
Subject:        [SPSSX-L] Double Checking Data and Recoding
Sent by:        "SPSSX(r) Discussion" <[hidden email]>

Re: Double Checking Data and Recoding

David Marso
Administrator
OTOH: if you don't want to be bothered with Python (for whatever reason), here is a bare-bones, straight-for-the-jugular SPSS code fragment.
Note that you don't need to figure out how to reuse/abuse that obtuse, unnecessarily complicated MACRO which someone else reposted without attribution. There is really no sense in using MATCH FILES, renaming, etc. when you can just ADD FILES with a flag and use LAG. Seriously, KISS!!! One could certainly spice up the following, but the basic, simple ideas remain invariant.
data list free / id v1 to v4.
begin data
1 1 2 3 4
2 5 6 7 8
3 9 0 1 2
4 3 4 5 6
end data.
save outfile 'file1.sav'.
data list free / id v1 to v4.
begin data
1 1 2 3 4
2 5 6 7 9
3 9 3 1 2
4 3 4 5 6
end data.
save outfile 'file2.sav'.

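* Interleave the two files by ID; F2=1 marks rows that came from file2.
ADD FILES / FILE='file1.sav' / FILE='file2.sav' / IN=f2 / BY ID.
* Two constant marker variables that bracket the variables being compared.
COMPUTE @1=1.
COMPUTE @2=1.
* Reorder so the data variables sit between @1 and @2.
MATCH FILES / FILE * / KEEP ID f2 @1 ALL .
* On each file2 row, count the variables that differ from the preceding file1 row with the same ID.
DO REPEAT V=@1 TO @2.
+  DO IF ID=LAG(ID) AND F2.
+    COMPUTE @ERROR@=SUM(@ERROR@,V NE LAG(V)).
+  END IF.
END REPEAT.
LIST.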
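The same interleave-and-lag idea as a plain-Python sketch (not SPSS syntax), for anyone who wants to check the logic outside SPSS. It assumes each file is a list of (id, values) pairs; `interleave` plays the role of ADD FILES with /IN=f2 /BY ID, and `lag_errors` compares each file2 row with the row just before it when the IDs agree. `None` stands in for the system-missing @ERROR@ on rows that are not compared. Function names are hypothetical.

```python
def interleave(first, second):
    """Mimic ADD FILES /FILE=first /FILE=second /IN=f2 /BY id.

    Rows are ordered by id, with the row from `first` (f2=0)
    ahead of the row from `second` (f2=1).
    """
    tagged = [(case_id, 0, values) for case_id, values in first]
    tagged += [(case_id, 1, values) for case_id, values in second]
    return sorted(tagged, key=lambda row: (row[0], row[1]))


def lag_errors(rows):
    """Count mismatches on each f2 row against the previous row with the same id."""
    result = []
    previous = None
    for case_id, f2, values in rows:
        count = None  # like @ERROR@ left system-missing
        if f2 == 1 and previous is not None and previous[0] == case_id:
            count = sum(a != b for a, b in zip(values, previous[2]))
        result.append((case_id, f2, count))
        previous = (case_id, f2, values)
    return result


file1 = [(1, [1, 2, 3, 4]), (2, [5, 6, 7, 8]), (3, [9, 0, 1, 2]), (4, [3, 4, 5, 6])]
file2 = [(1, [1, 2, 3, 4]), (2, [5, 6, 7, 9]), (3, [9, 3, 1, 2]), (4, [3, 4, 5, 6])]

for row in lag_errors(interleave(file1, file2)):
    print(row)
```

With the sample data from the post, IDs 2 and 3 each show one mismatch on their file2 row, and IDs 1 and 4 show zero.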

