File merger problem

File merger problem

Bob Schacht-3
I'm trying to do my most complex data management operation of the year, matching two files with more than 150 variables using the UPDATE command, with version 12:
UPDATE FILE='C:\Documents and Settings\R. Schacht\My Documents\1 BASIC VR File Folder\12 Working 08 projects\VRISS\ALLQuExt2008-No_Dupes.sav' /IN=IN_Qtrs
/FILE='C:\Documents and Settings\R. Schacht\My Documents\1 BASIC VR File Folder\
   12 Working 08 projects\RSA-911\RSA911Nov2008.sav' /IN=IN_RSA
/BY SSN  DATEAPPL
/MAP.

I have sorted both files in order by SSN and then by DATEAPPL using the Data/Sort menu listing SSN first and DATEAPPL second.
Visual inspection of the two files appears to confirm that the files are, indeed, sorted by SSN and DATEAPPL. The MAP is generated,
but then I get the following error:

File #2
     KEY: 559929581       1.33E+10

>Error # 5141
>File out of order doing ADD FILES or UPDATE.  All files must be in
>non-descending order on the BY variables.  Use SORT CASES to sort the file
>in order.
>This command not executed.


>Note # 35
>There is no working file to be restored.  The state of the SPSS Processor
>is as if a NEW FILE command had been executed.

Upon searching one of the files, I do find that the indicated key is out of place. So I re-sort the file, search it for the problem key, and verify that it is in proper sort order. I run the UPDATE again, and Voila! Same error message. So it would seem that somehow the Update operation is getting the file out of order. What's going wrong here?

Bob Schacht
University of Hawaii

Re: File merger problem

Richard Ristow
At 10:20 PM 4/3/2009, Bob Schacht wrote:

I'm trying to [match] two files with more than 150 variables using the UPDATE command, with version 12:

Code reformatted:

UPDATE
   FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\VRISS\ALLQuExt2008-No_Dupes.sav'
      /IN=IN_Qtrs
  /FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\RSA-911\RSA911Nov2008.sav'
     /IN=IN_RSA
  /BY SSN  DATEAPPL
  /MAP.

I have sorted both files in order by SSN and then by DATEAPPL using the Data/Sort menu listing SSN first and DATEAPPL second.

I hope that means you've sorted on both variables in one SORT operation, not two successive ones.

[Then] I get the following error:

File #2
     KEY: 559929581       1.33E+10

>Error # 5141
>File out of order doing ADD FILES or UPDATE.  All files must be in
>non-descending order on the BY variables.  Use SORT CASES to sort the file
>in order.
>This command not executed.

Upon searching one of the files, I do find that the indicated key is out of place. So I re-sort the file, search it for the problem key, and verify that it is in proper sort order.

First of all, I trust you aren't just confirming this visually. An automatic way to do it is,

ADD FILES
   FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\VRISS\ALLQuExt2008-No_Dupes.sav'
  /BY SSN  DATEAPPL.

and the same for the second file. Do they pass these tests?

I run the UPDATE again, and Voila! Same error message. So it would seem that somehow the Update operation is getting the file out of order.

It makes no sense, but there isn't enough to go on, either. Try testing with small subsets of the files, small enough that you could possibly send complete test data and code.

Wish I could do more, but here's for now....

Richard
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: File merger problem

Bob Schacht-3
At 05:47 PM 4/4/2009, Richard Ristow wrote:
At 10:20 PM 4/3/2009, Bob Schacht wrote:

I'm trying to [match] two files with more than 150 variables using the UPDATE command, with version 12:

Code reformatted:

UPDATE
   FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\VRISS\ALLQuExt2008-No_Dupes.sav'
      /IN=IN_Qtrs
  /FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\RSA-911\RSA911Nov2008.sav'
     /IN=IN_RSA
  /BY SSN  DATEAPPL
  /MAP.

I have sorted both files in order by SSN and then by DATEAPPL using the Data/Sort menu listing SSN first and DATEAPPL second.

I hope that means you've sorted on both variables in one SORT operation, not two successive ones.

Yes


[Then] I get the following error:

File #2
     KEY: 559929581       1.33E+10

>Error # 5141
>File out of order doing ADD FILES or UPDATE.  All files must be in
>non-descending order on the BY variables.  Use SORT CASES to sort the file
>in order.
>This command not executed.

Upon searching one of the files, I do find that the indicated key is out of place. So I re-sort the file, search it for the problem key, and verify that it is in proper sort order.

First of all, I trust you aren't just confirming this visually.

After getting the error notice, I highlighted the variable in Data View and conducted a "Search" for the key. I wanted to verify visually that the file was in order before the UPDATE but had somehow gotten out of order after the UPDATE attempt.

An automatic way to do it is,

ADD FILES
   FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\VRISS\ALLQuExt2008-No_Dupes.sav'
  /BY SSN  DATEAPPL.

and the same for the second file.

I don't think I understand. Using ADD FILES is just another way to sort the files using syntax, isn't it? How is this an "automatic" way to examine the error? If I do this without the visual examination, how will I know if the file has become unsorted? What I am trying to understand is how or why the file becomes unsorted during the UPDATE operation, because I verified (by visual inspection) that both files are in proper sort order (at least at the beginning of the file, and in the vicinity of the problem key) before doing the UPDATE.

BTW, I tried the same operation using ver. 16 and the same thing happened.

Do they pass these tests?

I'm not sure I understand how this constitutes a "test".

And BTW, I do not want to use ADD FILES to accomplish the file merger. We had a discussion about this several years ago, and at that time I determined that using ADD FILES did not work the way I needed it to work, but that your suggestion (then) that I use UPDATE achieved the merger in a more desirable manner. I have used the UPDATE merger successfully for several years now, until this year.


I run the UPDATE again, and Voila! Same error message. So it would seem that somehow the Update operation is getting the file out of order.

It makes no sense, but there isn't enough to go on, either. Try testing with small subsets of the files, small enough that you could possibly send complete test data and code.

This would be difficult, because both files have about 150 variables, and thousands of cases. The variable sets overlap, but each has dozens of variables that the other file does not have. And for the variables that do overlap, I want the data from one file to be retained, in place of the data from the other file.  So I'm not sure how I could extract a meaningful subset of this data that would assist in diagnosing the problem.


Wish I could do more, but here's for now....

Thanks for trying!

Bob Schacht

Re: File merger problem

DataMaestro
Just a long shot, but why not try deleting that one record before any operation? It might have somehow gotten corrupted.
 
George


Re: File merger problem

Richard Ristow
In reply to this post by Bob Schacht-3
At 01:18 AM 4/5/2009, Bob Schacht wrote:
At 05:47 PM 4/4/2009, Richard Ristow wrote:

I get the following error:

File #2
     KEY: 559929581       1.33E+10

>Error # 5141
>File out of order doing ADD FILES or UPDATE.  All files must be in
>non-descending order on the BY variables.  Use SORT CASES to sort the file
>in order.
>This command not executed.

Upon searching one of the files, I do find that the indicated key is out of place. After getting the error notice, I highlighted the variable in Data View and conducted a "Search" for the key.




An automatic way to [check whether a file is in order] is,

ADD FILES
   FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\VRISS\ALLQuExt2008-No_Dupes.sav'
  /BY SSN  DATEAPPL.

and the same for the second file.

I don't think I understand. Using ADD FILES is just another way to sort the files using syntax, isn't it? How is this an "automatic" way to examine the error? If I do this without the visual examination, how will I know if the file has become unsorted?

No, ADD FILES doesn't sort the data. ADD FILES with a "/BY" subcommand requires that the data be in ascending order, but it doesn't sort it. So, running a file through an ADD FILES is a good test for keys out of order; just look for error messages like the above.

It's also possible to check whether a file's in order by using LAG in a transformation program. That's more flexible and powerful, but considerably more complicated to write.
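
A minimal sketch of the LAG approach, offered as an illustration rather than as tested code from this thread. It assumes the file to be checked is already the active dataset and that SSN and DATEAPPL are numeric. PrevSSN, PrevDate, and OutOfOrder are hypothetical variable names introduced here.

```spss
* Copy the keys of the previous case onto the current case.
COMPUTE PrevSSN  = LAG(SSN).
COMPUTE PrevDate = LAG(DATEAPPL).
* Flag any case whose keys are lower than those of the previous case.
COMPUTE OutOfOrder = 0.
IF (SSN < PrevSSN) OutOfOrder = 1.
IF (SSN = PrevSSN AND DATEAPPL < PrevDate) OutOfOrder = 1.
* A file in non-descending key order shows only the value 0 here.
FREQUENCIES VARIABLES=OutOfOrder.
```

On the first case LAG returns a missing value, so neither IF fires and the case is not flagged. Cases with missing keys are likewise never flagged by this sketch, so they would need a separate check.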

And BTW, I do not want to use ADD FILES to accomplish the file merger.

Understood. I don't recommend it for that, but for checking the files.

What I am trying to understand is how or why the file becomes unsorted during the UPDATE operation, because I verified (by visual inspection) that both files are in proper sort order (at least at the beginning of the file, and in the vicinity of the problem key) before doing the UPDATE.

Here's my problem: I don't think that UPDATE is doing that. I know of no other report that it might have, and the file-reading commands (GET FILE, ADD FILES, MATCH FILES, and UPDATE) are designed to treat their inputs as input only. So I'm very interested in algorithmic testing that will show what is happening, including whether I'm wrong and UPDATE is managing to change its input files.

Algorithmic testing is important. It's very difficult to visually check whether a file is sorted, and get reliable results. I wouldn't trust myself to do it.

But, here is another test that should tell you a lot:

Your input files are disk files, not datasets. So, once you've created and sorted them, but before you try the UPDATE, make them read-only so SPSS can't change them at all. Then, try either the ADD FILES check I've suggested, or your UPDATE run.

If the UPDATE fails because SPSS wants to change the input file and can't, there you go: what I thought couldn't happen, is happening.

If the UPDATE fails as you've been seeing, the next step is to test the sort order of the inputs algorithmically: Use ADD FILES; or, create a variable whose value is the current $CASENUM, sort again, and test for mismatches between that variable and the new $CASENUM; or, write a transformation program to test the sort order.
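
The $CASENUM variant can be sketched as follows, again assuming the file to be checked is the active dataset. OrigOrder and Moved are hypothetical names. Using OrigOrder as a final sort key is an addition to the suggestion above, made so that cases with identical keys keep their original relative order and do not show up as false mismatches.

```spss
* Record the position of each case before sorting.
COMPUTE OrigOrder = $CASENUM.
* Sort on the keys, breaking ties by original position.
SORT CASES BY SSN DATEAPPL OrigOrder.
* Any case that changed position means the file was not in key order.
COMPUTE Moved = (OrigOrder NE $CASENUM).
FREQUENCIES VARIABLES=Moved.
```

If every case has Moved equal to 0, the file was already in the order that SORT CASES produces for these keys.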

It'll also help to check your SPSS journal file, and post the exact syntax that sorted and saved the files that are now giving you trouble.

And, that's what I can think of for now. Best of luck to you,
Richard

Re: File merger problem

Bob Schacht-3
At 02:57 PM 4/5/2009, Richard Ristow wrote:
At 01:18 AM 4/5/2009, Bob Schacht wrote:
At 05:47 PM 4/4/2009, Richard Ristow wrote:

[snip]

An automatic way to [check whether a file is in order] is,

ADD FILES
   FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\VRISS\ALLQuExt2008-No_Dupes.sav'
  /BY SSN  DATEAPPL.

and the same for the second file.

I don't think I understand. Using ADD FILES is just another way to sort the files using syntax, isn't it? How is this an "automatic" way to examine the error? If I do this without the visual examination, how will I know if the file has become unsorted?

No, ADD FILES doesn't sort the data. ADD FILES with a "/BY" subcommand requires that the data be in ascending order, but it doesn't sort it. So, running a file through an ADD FILES is a good test for keys out of order; just look for error messages like the above.

Well, this is interesting. I loaded the file with the problem key, sorted it using the menu, and then tried your ADD FILES check. Here's what happened:

GET
  FILE='C:\Documents and Settings\R. Schacht\My Documents\1 BASIC VR File Folder
   \12 Working 08 projects\RSA'+
 '-911\RSA911Nov2008.sav'.
SORT CASES BY
  SSN (A) DATEAPPL (A) .
ADD FILES
  /FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\RSA-911\RSA911Nov2008.sav'
  /BY SSN  DATEAPPL.
EXECUTE.
File #1
     KEY: 559929581       1.33E+10

>Error # 5141
>File out of order doing ADD FILES or UPDATE.  All files must be in
>non-descending order on the BY variables.  Use SORT CASES to sort the file
>in order.
>This command not executed.

So my menu-directed sort
SORT CASES BY
  SSN (A) DATEAPPL (A) .
did not produce results that ADD FILES found acceptable.

Does it make a difference that the Menu-driven Sort Cases By identified both variables as strings, but the ADD FILES command did not?
What is going on here?

Thanks,
Bob

Re: File merger problem

Richard Ristow
At 03:46 PM 4/6/2009, Bob Schacht wrote:

Well, this is interesting. I loaded the file with the problem key, sorted it using the menu, and then tried your ADD FILES check. Here's what happened:

GET
  FILE='C:\Documents and Settings\R. Schacht\My Documents\1 BASIC VR File Folder
   \12 Working 08 projects\RSA'+
 '-911\RSA911Nov2008.sav'.
SORT CASES BY
  SSN (A) DATEAPPL (A) .
ADD FILES
  /FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\RSA-911\RSA911Nov2008.sav'
  /BY SSN  DATEAPPL.
EXECUTE.
File #1
     KEY: 559929581       1.33E+10

>Error # 5141
>File out of order doing ADD FILES or UPDATE. [...]

Ah, HAH!

SORT CASES sorts the working file ('active dataset', release 14 and later). That's not the file you loaded; it's a copy of it. (Which means you can load a file and do all sorts of selections and modifications without changing the original.)

You loaded the file, sorted it, and then read the old, unsorted file from disk. You need a SAVE. (Below, I'm also fixing line breaks so the code should run; line breaks in the posted version would 'break' it.)



GET
  FILE='C:\Documents and Settings\R. Schacht\My Documents' +
         '\1 BASIC VR File Folder' +
         '\12 Working 08 projects\RSA'+
         '-911\RSA911Nov2008.sav'.
SORT CASES BY
  SSN (A) DATEAPPL (A) .
SAVE
  OUTFILE='C:\Documents and Settings\R. Schacht\My Documents' +
         '\1 BASIC VR File Folder' +
         '\12 Working 08 projects\RSA'+
         '-911\RSA911Nov2008.sav'.
ADD FILES
  /FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\RSA-911\RSA911Nov2008.sav'
  /BY SSN  DATEAPPL.

By the way, you asked,

Does it make a difference that the Menu-driven Sort Cases By identified both variables as strings?

It didn't. The (A) means 'sort in ascending order'; (D) is also accepted, for descending order. (A) is the default, so most people don't write it.
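
For illustration only: the first two commands below request the same ascending sort, while the third sorts DATEAPPL in descending order within SSN. A file sorted the third way would generally not satisfy the non-descending order that /BY requires, as discussed earlier in the thread.

```spss
* These two commands are equivalent.
SORT CASES BY SSN DATEAPPL.
SORT CASES BY SSN (A) DATEAPPL (A).
* Descending DATEAPPL within ascending SSN.
SORT CASES BY SSN (A) DATEAPPL (D).
```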

Re: File merger problem

Bob Schacht-3
At 10:25 AM 4/6/2009, Richard Ristow wrote:
At 03:46 PM 4/6/2009, Bob Schacht wrote:

Well, this is interesting. I loaded the file with the problem key, sorted it using the menu, and then tried your ADD FILES check. Here's what happened:

GET
  FILE='C:\Documents and Settings\R. Schacht\My Documents\1 BASIC VR File Folder
   \12 Working 08 projects\RSA'+
 '-911\RSA911Nov2008.sav'.
SORT CASES BY
  SSN (A) DATEAPPL (A) .
ADD FILES
  /FILE='C:\Documents and Settings\R. Schacht\My Documents' +
          '\1 BASIC VR File Folder\12 Working 08 projects'  +
          '\RSA-911\RSA911Nov2008.sav'
  /BY SSN  DATEAPPL.
EXECUTE.
File #1
     KEY: 559929581       1.33E+10

>Error # 5141
>File out of order doing ADD FILES or UPDATE. [...]

Ah, HAH!

SORT CASES sorts the working file ('active dataset', release 14 and later). That's not the file you loaded; it's a copy of it. (Which means you can load a file and do all sorts of selections and modifications without changing the original.)

You loaded the file, sorted it, and then read the old, unsorted file from disk. You need a SAVE. (Below, I'm also fixing line breaks so the code should run; line breaks in the posted version would 'break' it.) . . .

Bingo! Thanks, Richard!
I've added the reminder to SAVE to my manual and merger syntax file.

I guess this is one to file under "challenge the user's assumptions."
I assumed that the sort I did got saved, and saved in the right place. Wrong!

Bob Schacht