Bottom line. MATCH FILES does not put the value in a TABLE into the master
(big) file if there is already a value in the master file for that variable. I do not recall seeing this kind of problem before. The people who had the data were raw beginners in various countries. The did not return corrections in the form of excel columns ID var1. They returned corrections in excel in the form of columns of BadValue GoodValue QA checks on a survey date variable identified some dates which were not possible. E.g., before or after people were in the field. 3 specific problems were found. For this one variable, a few IF statements would have worked. IF SurveyDate EQ something SurveyDate = something else. However, some other variables had many corrections to be made. The corrections were also in the form of columns BadValue Correctvalue The attached files are Example Dates.sav an example of a big or master file. SurveyDate Updates.sav the TABLE file with BadValue & Correctvalue columns. does not make correction.sps which did not work Workaround.sps which matcha the values to be corrected, creates a flag that the value is to be left alone and then copies the okay old dates into the date variable. Example_Dates.sav <http://spssx-discussion.1045642.n5.nabble.com/file/t47554/Example_Dates.sav> SurveyDate_Updates.sav <http://spssx-discussion.1045642.n5.nabble.com/file/t47554/SurveyDate_Updates.sav> does_not_make_correction.sps <http://spssx-discussion.1045642.n5.nabble.com/file/t47554/does_not_make_correction.sps> Workaround.sps <http://spssx-discussion.1045642.n5.nabble.com/file/t47554/Workaround.sps> ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
It looks to me like you have out-of-range values that you are
correcting. Without further information - I would use RECODE
to fix them in one pass. Depending on the circumstances, I might
follow this with some version of file-compare in order to document
which values were changed, FROM what TO what, for which IDs.
And beg for data with updates listed by ID, var, values (old and new).
I assume that you also prefer to update using UPDATE /file=
when you can get the data that way.
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Art Kendall <[hidden email]>
Sent: Saturday, July 4, 2020 2:02 PM To: [hidden email] <[hidden email]> Subject: Repairing data entry mistakes unexpected result MATCH FILES with TABLE Bottom line. MATCH FILES does not put the value in a TABLE into the master
(big) file if there is already a value in the master file for that variable. I do not recall seeing this kind of problem before. The people who had the data were raw beginners in various countries. The did not return corrections in the form of excel columns ID var1. They returned corrections in excel in the form of columns of BadValue GoodValue QA checks on a survey date variable identified some dates which were not possible. E.g., before or after people were in the field. 3 specific problems were found. For this one variable, a few IF statements would have worked. IF SurveyDate EQ something SurveyDate = something else. However, some other variables had many corrections to be made. The corrections were also in the form of columns BadValue Correctvalue The attached files are Example Dates.sav an example of a big or master file. SurveyDate Updates.sav the TABLE file with BadValue & Correctvalue columns. does not make correction.sps which did not work Workaround.sps which matcha the values to be corrected, creates a flag that the value is to be left alone and then copies the okay old dates into the date variable. Example_Dates.sav <http://spssx-discussion.1045642.n5.nabble.com/file/t47554/Example_Dates.sav> SurveyDate_Updates.sav <http://spssx-discussion.1045642.n5.nabble.com/file/t47554/SurveyDate_Updates.sav> does_not_make_correction.sps <http://spssx-discussion.1045642.n5.nabble.com/file/t47554/does_not_make_correction.sps> Workaround.sps <http://spssx-discussion.1045642.n5.nabble.com/file/t47554/Workaround.sps> ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Thanks for getting back to me. Happy 4th!
Some dates were out of range. One bunch was where a coder keyed a date as mm/dd/yyyy instead of dd/mm/yyyy. Fortunately, there were no people in the field for a particular date. The workaround I posted is a kludge that worked. With only a few values to correct some IFs would also work. I have never used RECODE on a date variable. I have never used a function in a recode argument. I'll try it. It would be possible to change Dates to Strings and then RECODE the strings. I'm back. This did not work. New file. data list list/ Old.Date(Date11) NeedsFix (f1). BEGIN DATA 19-Mar-1976 1 12-Oct-2010 0 12-Oct-2010 0 09-Dec-2010 0 09-Dec-2010 0 09-Dec-2010 0 09-Dec-2010 0 09-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 14-Dec-2010 0 14-Dec-2010 0 14-Dec-2010 0 14-Dec-2010 0 14-Dec-2010 0 02-Jun-2020 1 02-Jun-2020 1 02-Jun-2020 1 02-Jun-2020 1 30-Oct-2108 1 END DATA. VALUE LABELS NeedsFix (1) 'Yes' 0'No'. RECODE Old.Date (DATE.DMY(19,3,1976) = DATE.DMY(12,3,2017)) (DATE.DMY(2,6,2020) = DATE.DMY(6,2,2020)) (DATE.DMY(30,10,2108) = DATE.DMY(30,10,2018)) (ELSE = COPY) /INTO New.Date. FORMATS New.Date (Date11). LIST. ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
Administrator
|
Happy 4th to you Art, and to all the other American members.
Art, have you considered DO-REPEAT? E.g., COMPUTE New.Date = Old.Date. *Fix the ones that need fixing. DO REPEAT d1 = 19 2 30 / d2 = 12 6 30 / m1 = 3 6 10 / m2 = 3 2 30 / y1 = 1976 2020 2108 / y2 = 2017 2020 2018 . IF Old.Date EQ DATE.DMY(d1,m1,y1) New.Date = DATE.DMY(d2,m2,y2). END REPEAT. FORMATS New.Date (Date11). LIST. I think this does what you want to do. PS- I first tried altering type to string and recoding the string variable as you suggested, but it was not working for some reason, even after I added RTRIM(LTRIM()) etc. Then DO-REPEAT occurred to me. Art Kendall wrote > Thanks for getting back to me. Happy 4th! > > Some dates were out of range. > One bunch was where a coder keyed a date as mm/dd/yyyy instead of > dd/mm/yyyy. Fortunately, there were no people in the field for a > particular > date. > > The workaround I posted is a kludge that worked. > With only a few values to correct some IFs would also work. > > I have never used RECODE on a date variable. I have never used a > function > in a recode argument. I'll try it. > It would be possible to change Dates to Strings and then RECODE the > strings. > > I'm back. > This did not work. > > New file. > data list list/ Old.Date(Date11) NeedsFix (f1). > BEGIN DATA > 19-Mar-1976 1 > 12-Oct-2010 0 > 12-Oct-2010 0 > 09-Dec-2010 0 > 09-Dec-2010 0 > 09-Dec-2010 0 > 09-Dec-2010 0 > 09-Dec-2010 0 > 10-Dec-2010 0 > 10-Dec-2010 0 > 10-Dec-2010 0 > 10-Dec-2010 0 > 10-Dec-2010 0 > 10-Dec-2010 0 > 14-Dec-2010 0 > 14-Dec-2010 0 > 14-Dec-2010 0 > 14-Dec-2010 0 > 14-Dec-2010 0 > 02-Jun-2020 1 > 02-Jun-2020 1 > 02-Jun-2020 1 > 02-Jun-2020 1 > 30-Oct-2108 1 > END DATA. > VALUE LABELS NeedsFix (1) 'Yes' 0'No'. > RECODE Old.Date > (DATE.DMY(19,3,1976) = DATE.DMY(12,3,2017)) > (DATE.DMY(2,6,2020) = DATE.DMY(6,2,2020)) > (DATE.DMY(30,10,2108) = DATE.DMY(30,10,2018)) > (ELSE = COPY) > /INTO New.Date. > FORMATS New.Date (Date11). > LIST. > > > > > ----- > Art Kendall > Social Research Consultants > -- > Sent from: http://spssx-discussion.1045642.n5.nabble.com/ > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Now we have several ways to do this. One of the benefits of this list is how
many ways there are to do things. IMO one of the weaknesses of much math & stat education is that we teach "this is *THE* way" rather than "this is ONE way". I think some list members cannot get the attachments I am pasting below the kludge syntax that worked, the "TABLE" of corrections, and some example data. The kludge uses MATCH FILES with a TABLE to create a new variable that has *only* cases that need correction. *NOT the IN variable* is then used as a condition to copy the old date to the new variable. ------ syntax. *. * Correct entries in BigFile. GET FILE='somplace\Example Dates.sav'. DATASET NAME BigFile WINDOW=FRONT. RENAME VARIABLES Survey.Date = Old.Survey.Date. SORT CASES by Old.Survey.Date. GET FILE='someplace\SurveyDate Updates.sav'. DATASET NAME Corrections WINDOW=FRONT. DATASET ACTIVATE Corrections. SORT CASES by Old.Survey.Date. MATCH FILES file = Bigfile /TABLE = Corrections /IN = fixed /BY Old.Survey.Date /map. EXECUTE. /* MATCH is a transformation. DATASET NAME Repaired. DATASET ACTIVATE Repaired. LIST. IF NOT Fixed Survey.Date = Old.Survey.Date. /* < *** very important line. LIST. ------ Corrections Old date and what date should be 19-Mar-1976 12-Mar-2017 02-Jun-2020 06-Feb-2020 30-Oct-2108 30-Oct-2018 ------ Example data -- dates and a flag to demo which cases need to be corrected. 19-Mar-1976 1 12-Oct-2010 0 12-Oct-2010 0 09-Dec-2010 0 09-Dec-2010 0 09-Dec-2010 0 09-Dec-2010 0 09-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 10-Dec-2010 0 14-Dec-2010 0 14-Dec-2010 0 14-Dec-2010 0 14-Dec-2010 0 14-Dec-2010 0 02-Jun-2020 1 02-Jun-2020 1 02-Jun-2020 1 02-Jun-2020 1 30-Oct-2108 1 ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
To expand the plethora further, here is a Python example using the vlookup function in the extendedTransforms.py module installed with Statistics and the SPSSINC TRANS extension command. The vlookup function works like its eponym in Excel. This example is from comments in extendedTransforms.py. The INITIAL subcommand in SPSSINC TRANS causes the lookup table to be read into memory and creates a little function named func that sets up to use that file. Then the vlookup function is called for each case passing the key - the y variable - and returning the matching value. If there is no match, it returns sysmis. The result is stored in variable resultcode. No sorting is required. * The lookup table. data list free/ value(F8.0) akey(A1). begin data 10 'a' 20 'b' 100 'z' end data. dataset name lookup. * The main dataset. data list free/x(f8.0) y(A2). begin data 1 'a' 2 'b' 5 'a ' 10 '' 1 'b' end data. dataset name main. dataset activate main. spssinc trans result = resultcode /initial "extendedTransforms.vlookup('akey', 'value', 'lookup')" /formula func(y). On Sun, Jul 5, 2020 at 8:08 AM Art Kendall <[hidden email]> wrote: Now we have several ways to do this. One of the benefits of this list is how |
Should be a useful tool.
Is there a way to have it return a user missing value or would we just have to recode teh result? ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
That function does not know what the missing value codes are for the lookup variable. While that could be added, it would be easier just to map the sysmis or empty string to another value on the SPSS side. On Sun, Jul 5, 2020 at 1:56 PM Art Kendall <[hidden email]> wrote: Should be a useful tool. |
Administrator
|
In reply to this post by Bruce Weaver
Oops. Just noticed that I had m2 = 30 rather than 10 for the 3rd date. It
also occurred to me that the DO-REPEAT structure can be made more readable by ordering things differently and using vertical alignment. (To see the vertical alignment properly, view with a fixed font.) COMPUTE New.Date = Old.Date. *Fix the ones that need fixing. DO REPEAT d1 = 19 2 30 / m1 = 3 6 10 / y1 = 1976 2020 2108 / d2 = 12 6 30 / m2 = 3 2 10 / y2 = 2017 2020 2018 . IF Old.Date EQ DATE.DMY(d1,m1,y1) New.Date = DATE.DMY(d2,m2,y2). END REPEAT. FORMATS New.Date (Date11). LIST. Bruce Weaver wrote > Happy 4th to you Art, and to all the other American members. > > Art, have you considered DO-REPEAT? E.g., > > COMPUTE New.Date = Old.Date. > *Fix the ones that need fixing. > DO REPEAT > d1 = 19 2 30 / > d2 = 12 6 30 / > m1 = 3 6 10 / > m2 = 3 2 30 / > y1 = 1976 2020 2108 / > y2 = 2017 2020 2018 . > IF Old.Date EQ DATE.DMY(d1,m1,y1) New.Date = DATE.DMY(d2,m2,y2). > END REPEAT. > FORMATS New.Date (Date11). > LIST. > > --- snip --- ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Our old friend vertical alignment.
----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
My preferred command for updating data files is the command by the same name: SORT CASES BY ID . UPDATE FILE='Example_Dates.sav' /FILE='SurveyDate_Updates.sav' /BY ID /MAP. SAVE OUTFILE=’Updated File.SAV’. The correction is done for all variables listed in the SurveyDate_Updates.sav file marching cases by ID, as long as the variables have the same name and format as in the Master file. Both files should be sorted by ID. I always have an ID variable in my data sets. If a file comes with no ID, I create as follows: CD 'C:\Service\SPSSGroup'. GET FILE='Example_Dates.sav'. DATASET NAME MasterFile WINDOW=FRONT. COMPUTE ID=$CASENUM. FORMATS ID (F3). EXECUTE. The file "SurveyDate_Updates.sav" with corrections should look like (no need for OLD-dates) : ID Survey.Date 6 12-MAR-2017 25 06-FEB-2020 26 06-FEB-2020 27 06-FEB-2020 28 06-FEB-2020 29 30-OCT-2018 regards Forcheh On Sun, Jul 5, 2020 at 5:47 PM Art Kendall <[hidden email]> wrote: Our old friend vertical alignment. |
Update works well when the corrections are available in the ID and value
setup. It does not work with corrections in the format OldValue NewValue The aim was to do bulk corrections instead of a big set of IFs. if OldVar eq 'Cop' NewVar = 'Police Officer'. if OldVar eq 'Beat Cop' NewVar = 'Police Officer' The workaround was to use MATCH FILES and the TABLE option to create NewVar. Use the IN option to flag which cases had NewVar filled in. Copy values from OldVar to NewVar only when the flag was NOT set. ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
Fully agree with your comment about update Art. I am returning to this forum after several years off. I notice somethings never change -- hahahaha. One is that data management remains a serious problem for too many users and this shows in their questions. You will notice from the question at hand that the wrong dates were identified from the main file, by adding a column with values 1 if error and 0 otherwise. With an ID variable, they would simply have select those rows and imputed the correct dates against each ID (along with correct values for any other variable). When I work with people in face to face meetings, I have realized that getting to understand how the problem arose in the first place really helps not just with the immediate problem but other problems moving forward, and a better understanding of SPSS. Attempting to give an answer purely to the question asked, at times encourages some of the not so good habits and fear of SPSS. In one exchange, a client who used to think that if you are not a SAS expert you are not a statistician ended up getting an SPSS license when they saw how easy it was to use SPSS for survey data management and analysis compared to what he was doing with SAS. On Thu, Jul 9, 2020 at 10:49 AM Art Kendall <[hidden email]> wrote: Update works well when the corrections are available in the ID and value |
See my post About lesseons I have learned.
----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
Free forum by Nabble | Edit this page |