|
I'm trying to do my most complex data management operation of the year,
matching two files with more than 150 variables using the UPDATE command,
with version 12:
UPDATE FILE='C:\Documents and Settings\R. Schacht\My Documents\1 BASIC VR File Folder\12 Working 08 projects\VRISS\ALLQuExt2008-No_Dupes.sav'
  /IN=IN_Qtrs
  /FILE='C:\Documents and Settings\R. Schacht\My Documents\1 BASIC VR File Folder\ 12 Working 08 projects\RSA-911\RSA911Nov2008.sav'
  /IN=IN_RSA
  /BY SSN DATEAPPL
  /MAP.

I have sorted both files in order by SSN and then by DATEAPPL using the Data/Sort menu, listing SSN first and DATEAPPL second. Visual inspection of the two files appears to confirm that the files are, indeed, sorted by SSN and DATEAPPL. The MAP is generated, but then I get the following error:

File #2

Upon searching one of the files, I do find that the indicated key is out of place. So I re-sort the file, search it for the problem key, and verify that it is in proper sort order. I run the UPDATE again, and Voila! Same error message. So it would seem that somehow the Update operation is getting the file out of order.

What's going wrong here?

Bob Schacht
University of Hawaii
|
At 10:20 PM 4/3/2009, Bob Schacht wrote:
> I'm trying to [match] two files with more than 150 variables using the UPDATE command, with version 12:

Code reformatted:

UPDATE

I hope that means you've sorted on both variables in one SORT operation, not two successive ones.

> [Then] I get the following error:

First of all, I trust you aren't just confirming this visually. An automatic way to do it is,

ADD FILES

and the same for the second file. Do they pass these tests?

> I run the UPDATE again, and Voila! Same error message. So it would seem that somehow the Update operation is getting the file out of order.

It makes no sense, but there isn't enough to go on, either. Try testing with small subsets of the files, small enough that you could possibly send complete test data and code.

Wish I could do more, but here's for now....
Richard

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
|
At 05:47 PM 4/4/2009, Richard Ristow wrote:
> At 10:20 PM 4/3/2009, Bob Schacht wrote:
> UPDATE
> I hope that means you've sorted on both variables in one SORT operation, not two successive ones.

Yes

> [Then] I get the following error:

After getting the error notice, I highlighted the variable in Data View and conducted a "Search" for the key. I wanted to verify visually that the file was in order before the Update, but that it somehow got out of order after the update attempt.

> An automatic way to do it is,

I don't think I understand. Using ADD FILES is just another way to sort the files using syntax, isn't it? How is this an "automatic" way to examine the error? If I do this without the visual examination, how will I know if the file has become unsorted?

What I am trying to understand is how or why the file becomes unsorted during the UPDATE operation, because I verified (by visual inspection) that both files are in proper sort order (at least at the beginning of the file, and in the vicinity of the problem key) before doing the UPDATE. BTW, I tried the same operation using ver. 16 and the same thing happened.

> Do they pass these tests?

I'm not sure I understand how this constitutes a "test". And BTW, I do not want to use ADD FILES to accomplish the file merger. We had a discussion about this several years ago, and at that time I determined that using ADD FILES did not work the way I needed it to work, but that your suggestion (then) that I use UPDATE achieved the merger in a more desirable manner. I have used the UPDATE merger successfully for several years now, until this year.

> I run the UPDATE again, and Voila! Same error message. So it would seem that somehow the Update operation is getting the file out of order.

This would be difficult, because both files have about 150 variables, and thousands of cases. The variable sets overlap, but each has dozens of variables that the other file does not have. And for the variables that do overlap, I want the data from one file to be retained, in place of the data from the other file. So I'm not sure how I could extract a meaningful subset of this data that would assist in diagnosing the problem.

> Wish I could do more, but here's for now....

Thanks for trying!
Bob Schacht
|
|
At 01:18 AM 4/5/2009, Bob Schacht wrote:
> At 05:47 PM 4/4/2009, Richard Ristow wrote:
> An automatic way to [check whether a file is in order] is,

No, ADD FILES doesn't sort the data. ADD FILES with a "/BY" subcommand requires that the data be in ascending order, but it doesn't sort it. So, running a file through an ADD FILES is a good test for keys out of order; just look for error messages like the above.

It's also possible to check whether a file's in order by using LAG in a transformation program. That's more flexible and powerful, but considerably more complicated to write.

> And BTW, I do not want to use ADD FILES to accomplish the file merger.

Understood. I don't recommend it for that, but for checking the files.

> What I am trying to understand is how or why the file becomes unsorted during the UPDATE operation, because I verified (by visual inspection) that both files are in proper sort order (at least at the beginning of the file, and in the vicinity of the problem key) before doing the UPDATE.

Here's my problem: I don't think that UPDATE is doing that. I know of no other report that it might have, and the file-reading commands (GET FILE, ADD FILES, MATCH FILES, and UPDATE) are designed to treat their inputs as input only.

So I'm very interested in algorithmic testing that will show what is happening -- including, if I'm wrong and UPDATE is managing to change its input files. Algorithmic testing is important. It's very difficult to visually check whether a file is sorted, and get reliable results. I wouldn't trust myself to do it.

But, here is another test that should tell you a lot: Your input files are disk files, not datasets. So, once you've created and sorted them, but before you try the UPDATE, make them read-only so SPSS can't change them at all. Then, try either the ADD FILES check I've suggested, or your UPDATE run.

If the UPDATE fails because SPSS wants to change the input file and can't, there you go: what I thought couldn't happen, is happening.

If the UPDATE fails as you've been seeing, the next step is to test the sort order of the inputs algorithmically: Use ADD FILES; or, create a variable whose value is the current $CASENUM, sort again, and test for mismatches between that variable and the new $CASENUM; or, write a transformation program to test the sort order.

It'll also help to check your SPSS journal file, and post the exact syntax that sorted and saved the files that are now giving you trouble.

And, that's what I can think of for now. Best of luck to you,
Richard
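The LAG and $CASENUM checks described above could look roughly like the sketch below. It is not from the thread: the file path and the variable names OutOfOrder, OrigOrder, and Moved are hypothetical, and only the key variables SSN and DATEAPPL come from the original posts. Treat it as a starting point under those assumptions rather than tested production syntax.

```spss
* Placeholder path, not one of the files discussed in this thread.
GET FILE='C:\temp\transactions.sav'.

* Check 1 uses LAG to flag any case whose key sorts before the previous key.
COMPUTE OutOfOrder = 0.
IF (SSN < LAG(SSN)) OutOfOrder = 1.
IF (SSN = LAG(SSN) AND DATEAPPL < LAG(DATEAPPL)) OutOfOrder = 1.
FREQUENCIES VARIABLES=OutOfOrder.

* Check 2 records the current position, re-sorts, and counts cases that moved.
* OrigOrder is the last sort key so that tied keys keep their original order.
COMPUTE OrigOrder = $CASENUM.
SORT CASES BY SSN DATEAPPL OrigOrder.
COMPUTE Moved = (OrigOrder NE $CASENUM).
FREQUENCIES VARIABLES=Moved.
```

If the file read from disk is already in ascending key order, both frequency tables should show only zeros; any ones point to cases worth inspecting. Because the GET reads the saved file, these checks test what UPDATE would actually see on disk.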
|
At 02:57 PM 4/5/2009, Richard Ristow wrote:
> At 01:18 AM 4/5/2009, Bob Schacht wrote:
> [snip]
> An automatic way to [check whether a file is in order] is,

Well, this is interesting. I loaded the file with the problem key, sorted it using the menu, and then tried your ADD FILES check. Here's what happened:

GET

So my menu-directed sort

SORT CASES BY SSN (A) DATEAPPL (A) .

did not produce results that ADD FILES found acceptable. Does it make a difference that the Menu-driven Sort Cases By identified both variables as strings, but the ADD FILES command did not?

What is going on here?

Thanks,
Bob
|
At 03:46 PM 4/6/2009, Bob Schacht wrote:
> Well, this is interesting. I loaded the file with the problem key, sorted it using the menu, and then tried your ADD FILES check. Here's what happened:

Ah, HAH! SORT CASES sorts the working file ('active dataset', release 14 and later). That's not the file you loaded; it's a copy of it. (Which means you can load a file and do all sorts of selections and modifications without changing the original.)

You loaded the file, sorted it, and then read the old, still-unsorted file from disk. You need a SAVE. (Below, I'm also fixing line breaks so the code should run; line breaks in the posted version would 'break' it.)

GET FILE='C:\Documents and Settings\R. Schacht\My Documents'
       + '\1 BASIC VR File Folder'
       + '\12 Working 08 projects\RSA'
       + '-911\RSA911Nov2008.sav'.
SORT CASES BY SSN (A) DATEAPPL (A) .
SAVE OUTFILE='C:\Documents and Settings\R. Schacht\My Documents'
       + '\1 BASIC VR File Folder'
       + '\12 Working 08 projects\RSA'
       + '-911\RSA911Nov2008.sav'.
ADD FILES
  /FILE='C:\Documents and Settings\R. Schacht\My Documents'
       + '\1 BASIC VR File Folder\12 Working 08 projects'
       + '\RSA-911\RSA911Nov2008.sav'
  /BY SSN DATEAPPL.

By the way, you asked,

> Does it make a difference that the Menu-driven Sort Cases By identified both variables as strings?

It didn't. The (A) means 'sort in ascending order'; (D) is also accepted, for descending order. (A) is the default, so most people don't write it.
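Putting the fix together with the original UPDATE, the whole sequence might be sketched as follows. The paths are short placeholders rather than the real ones from this thread, and writing the sorted copies to new file names (instead of overwriting the inputs, as the code above does) is an assumption made here so the originals stay untouched. The /IN names and key variables are taken from the original post.

```spss
* Sort the master file and save the sorted copy to disk.
GET FILE='C:\temp\master.sav'.
SORT CASES BY SSN DATEAPPL.
SAVE OUTFILE='C:\temp\master_sorted.sav'.

* Sort the transaction file and save the sorted copy to disk.
GET FILE='C:\temp\transactions.sav'.
SORT CASES BY SSN DATEAPPL.
SAVE OUTFILE='C:\temp\transactions_sorted.sav'.

* UPDATE reads the saved files from disk, so it now sees sorted input.
UPDATE FILE='C:\temp\master_sorted.sav'
  /IN=IN_Qtrs
  /FILE='C:\temp\transactions_sorted.sav'
  /IN=IN_RSA
  /BY SSN DATEAPPL
  /MAP.
EXECUTE.
```

The point of the sketch is the SAVE after each SORT CASES: without it, the sort only affects the working copy, and any command that names the file on disk still reads the old order.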
|
At 10:25 AM 4/6/2009, Richard Ristow wrote:
> At 03:46 PM 4/6/2009, Bob Schacht wrote:

Bingo! Thanks, Richard! I've added the reminder to SAVE to my manual and merger syntax file. I guess this is one to file under "challenge the user's assumptions." I assumed that the sort I did got saved, and saved in the right place. Wrong!

Bob Schacht
