create a file detailing differences in data entry/coding that is useful for correcting

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

create a file detailing differences in data entry/coding that is useful for correcting

Art Kendall
Before I create syntax to do this I was wondering if anybody has already worked this task out.
If you have would you be willing to share the syntax?
Double data entry/coding has been around for decades so I was hoping someone had syntax.


In some pro bono work looking at procedures in court cases I had them code the court records twice.
The goal is to provide coding conflicts in such a way that they can resolve the conflicts and send the corrections back.

COMPARE DATASETS shows that there are conflicts in the codings.

In order to facilitate the data correction I would like to create a dataset and  send it to the client as an Excel spread sheet.  The spreadsheet would have 5 columns.
The "ShouldBe" column would be empty when sent out and filled when returned  so I can update the data.
CaseID  VariableName  Code1  Code2  ShouldBe
there would be 1 line for each variable for each case with a conflict.
An example might look like this
123456 ProperService    1 2 ___
345678 MethodService  3 5 ___
456123 CourtRoom     203 303 ____
456123 ProperService 3 2 ___

I have not worked out how to do this step but some VARSTOCASES might be the approach.

Then I would like to be able to update the main file analogous to list type entry in FORTRAN.

In the returned spreadsheet the CaseID and VariableName columns would be used as indices to enter   the corrected data.
An example might look like this
123456 ProperService    1 2 1
345678 MethodService  3 5 3
456123 CourtRoom     203 303 303
456123 ProperService 3 2 2

The kludge way to handle the returned data would be hundreds of if statements
creating by reading the returned Excel into WordPerfect, adding columns that were concatenated into IF statements.
An example might look like this
If CaseId eq '123456' ProperService =1.
If CaseID eq '345678' MethodService = 3.
If CaseID eq '456123' CourtRoom = 303.
If CaseID eq '456123' ProperService =2.

____
Summary.
supply details of coding conflicts so client can determine the correct values and the data can be corrected.

Suggestions?







Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: create a file detailing differences in data entry/coding that is useful for correcting

David Marso
Administrator
If the variables are all numeric then use VARSTOCASES on both files
MATCH by id and variable name.
Compare.
EXPORT the faulty cases.
---
On corrected data.
CASESTOVARS.
run UPDATE command BY id.



Art Kendall wrote
Before I create syntax to do this I was wondering if anybody has already worked this task out.
If you have would you be willing to share the syntax?
Double data entry/coding has been around for decades so I was hoping someone had syntax.


In some pro bono work looking at procedures in court cases I had them code the court records twice.
The goal is to provide coding conflicts in such a way that they can resolve the conflicts and send the corrections back.

COMPARE DATASETS shows that there are conflicts in the codings.

In order to facilitate the data correction I would like to create a dataset and  send it to the client as an Excel spread sheet.  The spreadsheet would have 5 columns.
The "ShouldBe" column would be empty when sent out and filled when returned  so I can update the data.
CaseID  VariableName  Code1  Code2  ShouldBe
there would be 1 line for each variable for each case with a conflict.
An example might look like this
123456 ProperService    1 2 ___
345678 MethodService  3 5 ___
456123 CourtRoom     203 303 ____
456123 ProperService 3 2 ___

I have not worked out how to do this step but some VARSTOCASES might be the approach.

Then I would like to be able to update the main file analogous to list type entry in FORTRAN.

In the returned spreadsheet the CaseID and VariableName columns would be used as indices to enter   the corrected data.
An example might look like this
123456 ProperService    1 2 1
345678 MethodService  3 5 3
456123 CourtRoom     203 303 303
456123 ProperService 3 2 2

The kludge way to handle the returned data would be hundreds of if statements
creating by reading the returned Excel into WordPerfect, adding columns that were concatenated into IF statements.
An example might look like this
If CaseId eq '123456' ProperService =1.
If CaseID eq '345678' MethodService = 3.
If CaseID eq '456123' CourtRoom = 303.
If CaseID eq '456123' ProperService =2.

____
Summary.
supply details of coding conflicts so client can determine the correct values and the data can be corrected.

Suggestions?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: create a file detailing differences in data entry/coding that is useful for correcting

David Marso
Administrator
/*simulate data sets*/.
DATASET DECLARE x.
DATASET DECLARE y.
MATRIX.
COMPUTE x={T(1:1000),TRUNC(UNIFORM(1000,20)*5+1)}.
COMPUTE y=x.
COMPUTE y(100,5)=0.
COMPUTE y(10,10)=0.
COMPUTE y(3,2)  =0.
SAVE x/OUTFILE x /VARIABLES id x01 TO x20.
SAVE Y/OUTFILE Y /VARIABLES id x01 TO x20.
END MATRIX.

/*Actual code */.
DATASET ACTIVATE x.
DATASET COPY newX.
VARSTOCASES /ID=id /MAKE data1 FROM ALL/INDEX=varname(data1).
DATASET ACTIVATE y.
VARSTOCASES /ID=id /MAKE data2 FROM ALL/INDEX=varname(data2).
MATCH FILES FILE x / FILE y /BY id varname.
NUMERIC fixed.
SELECT IF (data1-data2 NE 0).
DATASET NAME merged.

SAVE TRANSLATE OUTFILE='C:\Temp\merged.xls' /TYPE=XLS /FIELDNAMES.

/*Fix it              */.
/*  Open Excel file...*/.
DELETE VARIABLES data1 data2.
CASESTOVARS ID=id /INDEX=varname/AUTOFIX=NO.
UPDATE File=NewX / FILE merged / BY ID.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: create a file detailing differences in data entry/coding that is useful for correcting

Art Kendall
In reply to this post by David Marso
Thank you.

I built your idea into the syntax  I used.

(I had finished the task before I saw David's late post about translating into Excel, reading it back, and Updating.)

There are probably other ways to do this, but this approach although clumsy works.
This can be useful when all or some data is entered twice with a goal of reconciling conflicts
e.g., when some interviews are re-done by another interviewer for    quality assurance.
or when day is key twice for quality assurance
or when stimuli such as pictures, court records, etc are coded/rated/etc twice.

Paste the syntax below into a syntax window.  It may provide a model that others can use.
Run the syntax and eyeball the resulting file to the input file.
*completely artificial data do demonstrate comparing pairs of records
* that are supposed to have the same information - For example
* double entry of data pairs  etc.
data list list /county (a20) CourtCase(a6)Entry (f1) coder2(A8) rating1 rating2 rating3 (3f1).
begin data
"Alpha" "123456" 1 "Anne" 1 2 3
"Alpha" "123456" 2 "Alan" 2 4 3
"Beta"  "234567" 1 "Alan" 4 4 -1
"Beta"  "234567" 2 "Anne" 4 5 5
"Gamma" "345678" 1 "Bill" 5 2 1
"Gamma" "345678" 2 "Beth" 1 2 3
"Delta" "456789" 1 "Carrie" -1 3 4
"Delta" "456789" 2 "Carlos" -1 3 -2
end data.
dataset name rawData.
*you would already have the Value Labels in your actual data.
Value labels Rating1 to Rating3
-2 "cannot tell"
-1 "not applicable"
1 "Very negative"
2 "Negative"
3 "Neutral"
4 "Positive"
5 "Very Positive".


*Split the cases into 2 datasets and
*use COMPARE DATASETS to determine what variables to use in DO REPEAT below.
string Valname1 to valname3 (a20).
do repeat v= Rating1 to Rating3 /name= ValName1 to Valname3.
compute name=valuelabel(v).
end repeat.
*Thanks to David Marso for suggesting the VARTOCASES.
*make a single record for each variable for each entry.
VARSTOCASES
  /MAKE coding2 FROM rating1 rating2 rating3
  /INDEX=Myvar "Which variable"(coding2)
  /KEEP=county CourtCase Entry coder2 Valname1 Valname2 Valname3
  /NULL=KEEP.

*put new records to be compared one under the other.

sort cases by county CourtCase myvar entry.
string coder1(a8) GoValname1 GoValname2(a20).
compute coder1 = lag(coder2).
* KLUDGE but it works.
do repeat whichVar = "rating1" "rating2" "rating3"/
whichLabel = Valname1 Valname2 Valname3.
if myVar eq Whichvar GoValName1= lag(WhichLabel).
if myVar eq Whichvar GoValName2= WhichLabel.
end repeat.
compute coding1 = lag(coding2).
compute Keeper =  entry eq 2 and coding2 ne lag(coding2).
frequencies variables = keeper.
select if keeper.
execute.
match files /file=* /keep = county CourtCase myvar coding1 GoValName1 coding2 GoValName2
coder1 coder2.
sort cases by County CourtCase MyVar.








Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: create a file detailing differences in data entry/coding that is useful for correcting

David Marso
Administrator
I believe the following is a bit tidier ;-)
data list list /county (a20) CourtCase(a6)Entry (f1) coder2(A8) rating1 rating2 rating3 (3f2).
begin data
"Alpha" "123456" 1 "Anne" 1 2 3
"Alpha" "123456" 2 "Alan" 2 4 3
"Beta"  "234567" 1 "Alan" 4 4 -1
"Beta"  "234567" 2 "Anne" 4 5 5
"Gamma" "345678" 1 "Bill" 5 2 1
"Gamma" "345678" 2 "Beth" 1 2 3
"Delta" "456789" 1 "Carrie" -1 3 4
"Delta" "456789" 2 "Carlos" -1 3 -2
end data.
dataset name rawData.

CASESTOVARS ID=county courtcase /INDEX=entry/GROUPBY INDEX.
VARSTOCASES MAKE rate1 FROM rating1.1 TO rating3.1  /MAKE rate2 FROM rating1.2 TO rating3.2 /INDEX=varname(rate1).
COMPUTE varname=CHAR.SUBSTR(varname,1,LENGTH(varname)-2).
LIST.
county               CourtCase coder2.1 coder2.2 varname   rate1 rate2
 
Alpha                123456    Anne     Alan     rating1      1     2
Alpha                123456    Anne     Alan     rating2      2     4
Alpha                123456    Anne     Alan     rating3      3     3
Beta                 234567    Alan     Anne     rating1      4     4
Beta                 234567    Alan     Anne     rating2      4     5
Beta                 234567    Alan     Anne     rating3     -1     5
Gamma                345678    Bill     Beth     rating1      5     1
Gamma                345678    Bill     Beth     rating2      2     2
Gamma                345678    Bill     Beth     rating3      1     3
Delta                456789    Carrie   Carlos   rating1     -1    -1
Delta                456789    Carrie   Carlos   rating2      3     3
Delta                456789    Carrie   Carlos   rating3      4    -2
 
 
Number of cases read:  12    Number of cases listed:  12
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: create a file detailing differences in data entry/coding that is useful for correcting

Art Kendall
For my immediate project what I posted worked. I just needed to have simple numeric coding conflict errors selected.

Your approach IS very much tidier. Thank you.
except
-- it does not have value labels
-- it did not limit the things to be corrected to values that differed for the two entries.

----
In the long run it still would be  nice to have details of conflicts between pairs of "cases" arranged this  way for string variables, dates, currency, N format, etc.

<shields up>When we get this election season over, and I have finished  second retirement organizing such as finding a home for my extensive library related to applying social science to public policy and selling my DC area house and moving to Florida
I really have to learn Python in order to use the info in the Data Dictionary.
<shields down>

Alas I have still many things that were on my bucket list when I did my first retirement in 2001.
-- learning Python is one of them.
-- Comparing multilevel modelling and complex sampling. I know that they have very detailed similarities.  I have not had time to see if there are any differences other than originating form discipline.
-- writing more SPSS for the rest of the examples in Statistics for Experimenters by Box.  Unfortunately Wiley only has SAS examples for this book which is considered fundamental in many disciplines.
+ many other things.

I wonder will 2015 be my last retirement or will there be a third retirement farther in the future?
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: create a file detailing differences in data entry/coding that is useful for correcting

David Marso
Administrator
Simply wrap DISPLAY DICTIONARY around OMS tags and party on!

Art Kendall wrote
For my immediate project what I posted worked. I just needed to have simple numeric coding conflict errors selected.

Your approach IS very much tidier. Thank you.
except
-- it does not have value labels
-- it did not limit the things to be corrected to values that differed for the two entries.

----
In the long run it still would be  nice to have details of conflicts between pairs of "cases" arranged this  way for string variables, dates, currency, N format, etc.

<shields up>When we get this election season over, and I have finished  second retirement organizing such as finding a home for my extensive library related to applying social science to public policy and selling my DC area house and moving to Florida
I really have to learn Python in order to use the info in the Data Dictionary.
<shields down>

Alas I have still many things that were on my bucket list when I did my first retirement in 2001.
-- learning Python is one of them.
-- Comparing multilevel modelling and complex sampling. I know that they have very detailed similarities.  I have not had time to see if there are any differences other than originating form discipline.
-- writing more SPSS for the rest of the examples in Statistics for Experimenters by Box.  Unfortunately Wiley only has SAS examples for this book which is considered fundamental in many disciplines.
+ many other things.

I wonder will 2015 be my last retirement or will there be a third retirement farther in the future?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: create a file detailing differences in data entry/coding that is useful for correcting

Art Kendall
I have used OMS a few times.

I'll add looking at OMS again to my bucket list.
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: create a file detailing differences in data entry/coding that is useful for correcting

David Marso
Administrator
NEW FILE.
DATASET CLOSE all.
data list /id 1 v1 to v5 3-7 s1 to s5 8-12 (A) .
begin data
1 1234552435
2 3453341244
3 3453554553
4 1234255263
5 2234342343
6 2343553425
END DATA.
VALUE LABELS v1 TO v5 1 'cat1' 2 'cat2' 3 'cat3' 4 'cat4' 5 'cat5'.
VALUE LABELS s1 TO s5 '1' 'cat1' '2' 'cat2' '3' 'cat3' '4' 'cat4' '5' 'cat5'.

VARIABLE LABELS v1 'var1' v2 'var2' v3 'var3' v4 'var4' v5 'var5'
                s1 'var6' s2 'var7' s3 'var8' s4 'var9' s5 'var10'.
DATASET NAME @rawdata1@.

data list /id 1 v1 to v5 3-7 s1 to s5 8-12 (A) .
begin data
1 2234552435
2 3452341444
3 3453554553
4 1232255263
5 2232242343
6 2143553425
END DATA.

VALUE LABELS v1  1 'cat 1' 2 'cat2' 3 'cat3' 4 'cat4' 5 'cat5'.
VALUE LABELS v2  1 'cat1' 2 'cat 2' 3 'cat3' 4 'cat4' 5 'cat5'.
VALUE LABELS v3  1 'cat1' 2 'cat2' 3 'cat3' 4 'cat 4' 5 'cat5'.
VALUE LABELS v4  1 'cat1' 2 'cat2' 3 'cat3' 4 'cat 4' 5 'cat 5'.
VALUE LABELS v5  1 'cat1' 2 'cat2' 3 'cat3' 4 'cat 4' 5 'cat5'.
VALUE LABELS s1 TO s5 '1' 'cat1' '2' 'cat2' '3' 'cat3' '4' 'cat4' '5' 'cat5'.
ALTER TYPE s2 (A2).
VARIABLE LABELS v1 'var 1' v2 'var2' v3 'var3' v4 'var 4' v5 'var5'
                s1 'var6' s2 'var7' s3 'vat8' s4 'var9' s5 'var 10'.
DATASET NAME @rawdata2@.

DEFINE CompareFileMetaData (  !POS !CMDEND) .
!DO !D !IN (!1)
DATASET ACTIVATE !D.
!LET !Info=!CONCAT(!D,'_info')
!LET !labl=!CONCAT(!D,'_labl')
DATASET DECLARE !info.
DATASET DECLARE !labl.
OMS
  /SELECT TABLES
  /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
   OUTFILE=!info.
OMS
 /SELECT TABLES
  /IF COMMANDS=['File Information'] SUBTYPES=['Variable Values']
  /DESTINATION FORMAT=SAV NUMBERED=TableNumber_
   OUTFILE=!labl.
DISPLAY DICTIONARY.
OMSEND.
OMSEND.
DATASET ACTIVATE !info.
DELETE VARIABLES TableNumber_ Command_ Subtype_ Label_.
SORT CASES BY var1.
ALTER TYPE ALL (A8).
VARSTOCASES / MAKE !D FROM Position TO WriteFormat /INDEX=varname(!D).
SORT CASES BY var1 varname .
DATASET ACTIVATE !labl.
DELETE VARIABLES TableNumber_ Command_ Subtype_ Label_.
SORT CASES BY var1 var2.
RENAME VARIABLES label=!CONCAT('label_',!D).
!DOEND.
MATCH FILES
 /FILE !CONCAT(!HEAD(!1),'_info')
 /FILE !CONCAT(!SUBSTR(!TAIL(!1),2),'_info')
 /BY var1 varname .
COMPUTE different=!HEAD(!1) NE !TAIL(!1).
EXECUTE.
DATASET NAME combinedInfo.
MATCH FILES
 /FILE !CONCAT(!HEAD(!1),'_labl')
 /FILE !CONCAT(!SUBSTR(!TAIL(!1),2),'_labl')
 /BY var1 var2 .
COMPUTE different=!CONCAT('label_',!HEAD(!1)) NE !CONCAT('label_',!SUBSTR(!TAIL(!1),2)).
EXECUTE.
DATASET Name combinedLabels.

DATASET CLOSE !CONCAT(!HEAD(!1),'_info').
DATASET CLOSE !CONCAT(!SUBSTR(!TAIL(!1),2),'_info').
DATASET CLOSE !CONCAT(!HEAD(!1),'_labl').
DATASET CLOSE !CONCAT(!SUBSTR(!TAIL(!1),2),'_labl').
!ENDDEFINE.
SET MPRINT ON /PRINTBACK ON.
CompareFileMetaData @Rawdata1@ @rawdata2@.


Art Kendall wrote
I have used OMS a few times.

I'll add looking at OMS again to my bucket list.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"