Dear list members,

I keep daily backups of changes in a database. Every day a new .sav file is saved. Normally these files are 300kB, but I have a few that are 45MB, even though the content matches the 300kB files (two STRING(A8) variables and two dates spread over 11,000 cases). There is no apparent reason for the file growth. All files are compressed .sav files from SPSS v15, and there are no differences in the data dictionary from day to day.

And the mystery gets deeper: if I compute a new variable (dummy = 1), delete all other variables, and resave the file under a new name, the file size is still 45MB. Any suggestions for how I can get rid of these 45MB of garbage and keep the data?

Sincerely,
Eero Olli

________________________________________
Eero Olli
Senior Advisor
the Equality and Anti-discrimination Ombud
[hidden email]
+47 2315 7344
POB 8048 Dep, N-0031 Oslo, Norway
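In syntax, the shrink test described above might look something like this minimal sketch (the file path and dataset are placeholders, not the ones Eero actually used):

    COMPUTE dummy = 1.
    * Keep only the new variable and resave under a new name.
    SAVE OUTFILE='C:\backup\shrinktest.sav' /KEEP=dummy /COMPRESSED.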
Have you tried Save As, choosing the SPSS portable format (.por), and then reopening the .por file?
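In syntax, that round trip might look like the following sketch (file names are placeholders):

    * Write the active data to SPSS portable format.
    EXPORT OUTFILE='C:\backup\log.por'.
    * Read the portable file back and resave as a fresh, compressed .sav.
    IMPORT FILE='C:\backup\log.por'.
    SAVE OUTFILE='C:\backup\log_clean.sav' /COMPRESSED.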
Cheers,
Luca

Mr. Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
Hi,
That's odd indeed. You could try (1) adding /COMPRESSED to the SAVE command (though I doubt that will have the desired effect), or (2) saving into another format, e.g. .por or .csv. You'd have to use APPLY DICTIONARY to restore the dictionary if you go via csv.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
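A minimal sketch of both options (file names are placeholders, and the /TYPE=CSV subcommand assumes a release whose SAVE TRANSLATE supports it):

    * Option 1: resave with explicit compression.
    SAVE OUTFILE='C:\backup\log_resaved.sav' /COMPRESSED.
    * Option 2: dump the data to csv (the dictionary is lost) ...
    SAVE TRANSLATE OUTFILE='C:\backup\log.csv' /TYPE=CSV /FIELDNAMES /REPLACE.
    * ... and, after reading the csv back in, restore the dictionary
    * from the original file.
    APPLY DICTIONARY FROM='C:\backup\log_bloated.sav'.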
Dear list members,

Thanks for your suggestions. A little more searching revealed that this error was cumulative in nature. ADD FILES caused the error to be copied into a new data file, and since I compare today against the past through ADD FILES, the error was multiplied every day. I suspect there was some kind of bug in the .sav structure that was copied from file to file (it must be in the file structure, as deleting all variables did not reduce the file size). Weird, but the problem is now taken care of.

I solved the problem by:
· Exporting the log file to .csv (only the text is saved).
· Opening the .csv file.
· Creating a fix for the erroneous data structure (the data were not neatly in four columns, but in five: sometimes the two date columns were empty, and in these cases there was a new fifth column with a date, which I could copy back to where it belonged).
· Saving the fixed file as .sav.

Repeating this procedure for each log file brought every file back to its proper 300kB size.

Sincerely,
Eero Olli
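A condensed sketch of the read-back half of that repair loop (file names, variable names, and date formats are guesses; the manual fifth-column fix is done in the csv before this step):

    * Read the repaired csv back in.
    GET DATA /TYPE=TXT /FILE='C:\backup\log_fixed.csv'
      /ARRANGEMENT=DELIMITED /DELIMITERS="," /FIRSTCASE=2
      /VARIABLES=id A8 code A8 date1 EDATE10 date2 EDATE10.
    * Save a fresh .sav, back at its normal 300kB size.
    SAVE OUTFILE='C:\backup\log_clean.sav' /COMPRESSED.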
When you use ADD FILES, any documents in the files being added become part of the new .sav file. Over time this can add up, so this might be the reason for the growth you saw in the file size. You can use DROP DOCUMENTS to eliminate all of these and shrink the file down.
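In syntax, that check-and-clean step might look like this (the output path is a placeholder):

    * Inspect any accumulated document text, then drop it and resave.
    DISPLAY DOCUMENTS.
    DROP DOCUMENTS.
    SAVE OUTFILE='C:\backup\log_slim.sav' /COMPRESSED.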
HTH,
Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435