Hello All
I have been browsing the list serve for some time now and I have a question that is probably redundant, but I would like to bring it up again, if possible, to get some new opinions.

I have been given two datasets of criminal history, 2004 and 2006. The 2004 dataset was collected and coded by a prior organization and the 2006 dataset is coded in house. My job is to develop a set of standards/programs to determine whether the 2006 dataset is similar to the 2004 dataset. I do not have the code that produced the 2004 dataset, but I do have the code for the 2006.

I figured that I could run a few basic statistics to see how these independent samples differ, but on many variables they differ a great deal, and I cannot determine whether the programming behind the 2006 dataset is correct or not. Are there any ideas/opinions that can help in this hunt? Are there some procedures that you use to verify the datasets before you begin your analysis?
I think some more detail will be necessary. What is coded?
For example, is it just the offences committed, or is it demographics and other variables as well? If it is offences you are concerned with, is there an ID number or some other way to determine whether an individual is included in both years? If individuals appear in both years and 2004 offences are also coded in the 2006 dataset, this might provide an opportunity to compare how the same offences were coded in each.

Regards,
Bob

At 02:30 AM 29/12/2010, JKRockStomper wrote:
>Hello All
>[...]
>--
>View this message in context:
>http://spssx-discussion.1045642.n5.nabble.com/Comparing-coded-datasets-tp3320418p3320418.html
>Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
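Bob's suggestion of matching individuals who appear in both years and comparing how the same offences were coded could be sketched roughly as follows. This is only an illustration in plain Python, not SPSS syntax, and it rests on assumptions the thread does not confirm: that both files share a person identifier and that an offence can be recognised across files by its date. The field names `person_id`, `offence_date` and `offence_code`, the helper `compare_overlap`, and the sample records are all hypothetical.

```python
from collections import Counter


def compare_overlap(old_rows, new_rows, key_fields, code_field):
    """Compare the code assigned to records that occur in both datasets.

    Records are matched on key_fields (hypothetical: person ID plus
    offence date). Only records present in both datasets are compared.
    """
    def index(rows):
        # Map each record's key tuple to its coded value.
        return {tuple(r[k] for k in key_fields): r[code_field] for r in rows}

    old, new = index(old_rows), index(new_rows)
    shared = sorted(old.keys() & new.keys())
    mismatches = [(k, old[k], new[k]) for k in shared if old[k] != new[k]]
    agreement = (len(shared) - len(mismatches)) / len(shared) if shared else None
    # Cross-tabulate old code against new code to spot systematic recodes.
    crosstab = Counter((old[k], new[k]) for k in shared)
    return {
        "n_shared": len(shared),
        "agreement": agreement,
        "mismatches": mismatches,
        "crosstab": crosstab,
    }


# Made-up records for illustration only.
coded_2004 = [
    {"person_id": 101, "offence_date": "2003-05-14", "offence_code": "THEFT"},
    {"person_id": 102, "offence_date": "2002-11-02", "offence_code": "ASSAULT"},
    {"person_id": 103, "offence_date": "2004-01-20", "offence_code": "BURGLARY"},
]
coded_2006 = [
    {"person_id": 101, "offence_date": "2003-05-14", "offence_code": "THEFT"},
    {"person_id": 102, "offence_date": "2002-11-02", "offence_code": "ROBBERY"},
    {"person_id": 104, "offence_date": "2005-07-09", "offence_code": "FRAUD"},
]

result = compare_overlap(
    coded_2004, coded_2006, ["person_id", "offence_date"], "offence_code"
)
print(result["n_shared"], result["agreement"])
for key, old_code, new_code in result["mismatches"]:
    print(key, old_code, "->", new_code)
```

A cross-tabulation of old code against new code on the overlapping records tends to be more informative than comparing overall frequencies, because it separates genuine change between the two samples from differences in how the same event was coded.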