Hi guys, I have a 550 GB database that I am trying to match with another database. This huge database was imported from SAS files. The database I received is supposed to be sorted on the ID value, and it does look that way to me, but when I try to match files with /BY ID I get the following error:
File #1 KEY: .
>Error # 5130
>File out of order. All the files in MATCH FILES must be in non-descending
>order on the BY variables. Use SORT CASES to sort the file.
>Execution of this command stops.

What does that "KEY" line mean? Do I have a special character somewhere? I have no idea why I am getting this error if the database looks to be sorted by the ID value, but it looks like I need to re-sort it...

Thanks! |
________________________________
> From: devoidx <[hidden email]>
> To: [hidden email]
> Sent: Monday, September 23, 2013 7:37 PM
> Subject: [SPSSX-L] Sorting and memory issues
>
> Hi guys, I have a 550gb database that I need to sort...but spss apparently
> needs the same amount of space as the database just to sort it and I don't
> have that type of space on the computer (albeit I do have it on an external
> drive)....what are your suggestions on how i can sort this massive
> database??
>
> Thanks!

IBM recommends a tempdir space to file size ratio of 4:1. You can change the SPSS tempdir via Edit >> Options >> File Locations and choose a tempdir with more space. 550 GB is still a lot, though. Perhaps doing at least some of the work in a database (SQL) would be better.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD |
Bear in mind also that, assuming you are
pulling from a database, you can specify the sort as part of the database
pull rather than doing it later.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Albert-Jan Roskam <[hidden email]>
To: [hidden email],
Date: 09/23/2013 12:20 PM
Subject: Re: [SPSSX-L] Sorting and memory issues
Sent by: "SPSSX(r) Discussion" <[hidden email]> |
In reply to this post by devoidx
Thank you, however I still don't understand the meaning of my error message, given that the database is already supposed to be sorted by the ID value.
That "KEY" line doesn't make sense to me. I suspect I have a special character somewhere that is causing the process to abort halfway, but I don't know how to find it. |
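For anyone trying to track down the offending case, here is a minimal Python sketch of the idea: compare each key with the one before it and report the first place the order breaks. It assumes the ID column can be read as a stream of numbers, with None (or NaN) standing in for system-missing, and it assumes missing keys sort before every real value. The function name is hypothetical, and this is not how MATCH FILES itself performs the check.

```python
import math
from typing import Iterable, Optional, Tuple


def first_out_of_order(
    keys: Iterable[Optional[float]],
) -> Optional[Tuple[int, Optional[float], Optional[float]]]:
    """Return (position, previous_key, key) for the first key that breaks
    non-descending order, or None if the whole sequence is in order.

    Positions are 0-based. Missing keys (None or NaN) are ASSUMED to sort
    before every real value, so a missing key that follows a real value
    counts as out of order.
    """

    def rank(k):
        # Map each key to a tuple so missing keys compare lower than real ones.
        missing = k is None or (isinstance(k, float) and math.isnan(k))
        return (0, 0.0) if missing else (1, k)

    previous = None
    have_previous = False
    for position, key in enumerate(keys):
        if have_previous and rank(key) < rank(previous):
            return position, previous, key
        previous = key
        have_previous = True
    return None
```

For example, `first_out_of_order([1, 2, None, 3])` returns `(2, 2, None)`: the missing key at position 2 follows the real value 2, so it is flagged.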
Administrator
|
Looks like one or more rows of one or more files has a SYSMIS value for the ID (Key) variable.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Is this a problem? Can there be no system-missing values in the ID variable for MATCH FILES to complete successfully?
|
Administrator
|
I suspect that nothing pleasant is likely to transpire from such a situation!
|
I mean, considering the MATCH FILES documentation says:
*Missing values for key variables are handled like any other values
I don't see my system-missing values as a reason for MATCH FILES to stop halfway through my database, especially since all my system-missing values are bunched together at the beginning of the database. |
That statement refers to user missing values,
where there is always a specific value. System missing values should
be thought of as having no specific value. IOW, $sysmis does
not equal $sysmis just as infinity does not equal infinity.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: devoidx <[hidden email]>
To: [hidden email],
Date: 09/23/2013 06:47 PM
Subject: Re: [SPSSX-L] Erorr while matching files
Sent by: "SPSSX(r) Discussion" <[hidden email]>

View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Erorr-while-matching-files-tp5722179p5722190.html |
That makes sense, so I went ahead and recoded all the system-missing values to 999 just to test it out, and now I get the exact same error, except the problematic key has changed from " ." to "999"... driving me crazy!
|
Depending on where the system missing values are located they might not be properly sorted when using 999 as the user missing value (even if you use ORDER BY on the SQL statement). Are the missing values at the top of the file or the bottom?
Can you recode the system missing values within the SQL statement and use ORDER BY? (Recode in SQL varies by flavor - e.g. CASE, SWITCH.) I presume with such a large database that may be a better option than doing the data manipulation within SPSS. If the values are at the top of the file (presuming they are sorted correctly on all the other MATCH variables), you might be able to use a recode value that you know to be below all other ID variables (e.g. a negative number). If they are at the bottom, use a user missing value that you know is higher than all other IDs. Again, I bet this will take less time to do within the SQL statement than within SPSS (just my guess though). |
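The recode-then-sort idea above can be sketched as follows. This uses Python's built-in sqlite3 module purely as a stand-in for whatever database is available; the table name `cases`, its columns, and the recode value -1 are hypothetical, and the exact recode syntax may differ in other SQL flavors.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO cases (id, score) VALUES (?, ?)",
    [(3, 0.5), (None, 1.5), (1, 2.5), (None, 3.5), (2, 4.5)],
)

# Recode NULL ids to -1 (ASSUMED to be lower than any real id) and sort on
# the recoded value, so the rows arrive already in non-descending key order.
rows = conn.execute(
    """
    SELECT CASE WHEN id IS NULL THEN -1 ELSE id END AS id_recoded, score
    FROM cases
    ORDER BY id_recoded
    """
).fetchall()
conn.close()

# Keep only the recoded key column to inspect the resulting order.
sorted_ids = [row[0] for row in rows]
```

The point of choosing a value below every real ID is that the recoded rows stay at the top of the file, where the missing values already were, instead of moving out of order the way 999 did.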
Thanks, yeah, my guess is that one of these system-missing values is not sorted properly... and the database is so large that I don't have the space to re-sort it. I am thinking of setting the tempdir to my external drive so I can run the sort... I don't have access to SQL Server =[
Can someone confirm that when SPSS runs a sort, it uses the temp directory once the data exceeds available memory? |
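On the temp-directory question: nothing in this thread confirms how SPSS sorts internally, but a common technique for data that does not fit in memory is an external merge sort, which spills sorted chunks to temporary files and then merges them. The Python sketch below illustrates that general technique only; the function name and parameters are hypothetical, and it handles plain integers rather than SPSS cases.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def external_sort(values: Iterable[int], chunk_size: int, tmp_dir: str) -> Iterator[int]:
    """Yield the integers in sorted order, holding at most chunk_size
    values in memory while reading the input.

    Sorted chunks ("runs") are written to files in tmp_dir and then merged,
    so tmp_dir needs room for roughly one full copy of the data.
    """
    run_paths: List[str] = []
    chunk: List[int] = []

    def flush() -> None:
        # Sort the in-memory chunk and spill it to its own temp file.
        chunk.sort()
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
        with os.fdopen(fd, "w") as handle:
            handle.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)
        chunk.clear()

    for value in values:
        chunk.append(value)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    handles = [open(path) for path in run_paths]
    try:
        # heapq.merge lazily merges the already-sorted runs.
        yield from heapq.merge(*((int(line) for line in h) for h in handles))
    finally:
        # Close and delete the run files once the merge is done.
        for h in handles:
            h.close()
        for path in run_paths:
            os.remove(path)
```

In this sketch, passing a directory on the external drive as `tmp_dir` is what moves the spilled runs off the main disk, which is the same reason for changing the tempdir setting mentioned earlier in the thread.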