How to get rid of 45MB extra in a datafile

How to get rid of 45MB extra in a datafile

Eero Olli

Dear list members,

 

I keep daily backups of changes in a database.  Every day a new .sav file is saved.  Normally these files are about 300kB, but I have a few that are 45MB, even though the content matches the 300kB files (two STRING(A8) variables and two dates spread over 11,000 cases).  There is no apparent reason for the file growth.  All files are compressed .sav files from SPSS v15.

 

There are no differences in data dictionary from day to day.

 

And the mystery gets deeper: if I compute a new variable (dummy = 1), delete all the other variables, and resave the file under a new name, the file size is still 45MB.  Any suggestions for how I can get rid of these 45MB of garbage while keeping the data?
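In syntax form, the test I describe above looked roughly like this; the file names are only placeholders for the real backup files:

* Rough sketch of the test described above; file names are placeholders.
GET FILE='backup.sav'.
COMPUTE dummy = 1.
EXECUTE.
* Keep only the new variable, dropping the original four.
MATCH FILES /FILE=* /KEEP=dummy.
SAVE OUTFILE='backup_test.sav' /COMPRESSED.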

 

Sincerely,

 

Eero Olli

 

 

________________________________________

Eero Olli

Senior Advisor

the Equality and Anti-discrimination Ombud

[hidden email]                  +47 2315 7344

POB 8048 Dep,     N-0031 Oslo,     Norway

 

 


Re: How to get rid of 45MB extra in a datafile

lucameyer
Have you tried Save As, choosing the SPSS portable format (.por), and then reopening the .por file?
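In syntax the round trip would look roughly like this (the paths are only examples):

* Round trip through the portable format; the paths are examples only.
GET FILE='C:\backups\big_backup.sav'.
EXPORT OUTFILE='C:\backups\big_backup.por'.
IMPORT FILE='C:\backups\big_backup.por'.
SAVE OUTFILE='C:\backups\big_backup_clean.sav' /COMPRESSED.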

Cheers,
Luca

Mr. Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0


Re: How to get rid of 45MB extra in a datafile

Albert-Jan Roskam
In reply to this post by Eero Olli
Hi,
 
That's odd indeed. You could try (1) adding /COMPRESSED to the SAVE command (though I doubt that will have the desired effect), or (2) saving into another format, e.g. .por or .csv. You'd have to use APPLY DICTIONARY afterwards if you go via .csv, since a .csv file does not carry the dictionary.
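Roughly like this, with example paths; for option (2) you would read the .csv back in with GET DATA /TYPE=TXT or the Text Wizard before applying the dictionary:

* (1) Force compression explicitly; the path is just an example.
SAVE OUTFILE='C:\backups\big_backup.sav' /COMPRESSED.

* (2) Detour via .csv, then restore labels and formats from the old file.
SAVE TRANSLATE OUTFILE='C:\backups\big_backup.csv'
  /TYPE=CSV /FIELDNAMES /REPLACE.
* ... reopen the .csv here, then ...
APPLY DICTIONARY FROM='C:\backups\big_backup.sav'.
SAVE OUTFILE='C:\backups\big_backup_new.sav' /COMPRESSED.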
 
Cheers!!
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




Re: How to get rid of 45MB extra in a datafile

Eero Olli
In reply to this post by Eero Olli

Dear list members,

 

Thanks for your suggestions.  A little more searching revealed that this error was cumulative in nature.  ADD FILES caused the error to be copied into the new datafile, and since I compare today's data with the past through ADD FILES, the error was multiplied every day…  I suspect there was some kind of defect in the .sav structure that was copied from file to file (it must be in the file structure, as deleting all the variables did not reduce the file size).
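The daily comparison is done with something along these lines (the file names here are simplified):

* Daily comparison step; this is where the bloat was carried along.
ADD FILES
  /FILE='log_today.sav'
  /IN=from_today
  /FILE='log_yesterday.sav'.
EXECUTE.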

 

Weird, but the problem is now taken care of.  I solved it by:

 

·  Exporting the log file to .csv (only the text is saved).

·  Opening the .csv file.

·  Fixing the erroneous data structure (the data were not neatly in four columns but in five: sometimes the two date columns were empty, and in those cases there was a new fifth column holding a date, which I could copy back to where it belonged).

·  Saving the fixed file as .sav.

 

I repeated this procedure for each logfile, and each file is now back at its proper 300kB size; a rough sketch of the syntax is below.
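For one logfile the skeleton of the repair looked roughly like this; the file names are simplified, and the actual correction of the shifted date column was done by hand after reopening the .csv:

* Export the bloated logfile to plain text; only the data values survive.
GET FILE='logfile.sav'.
SAVE TRANSLATE OUTFILE='logfile.csv'
  /TYPE=CSV /FIELDNAMES /REPLACE.
* Reopen the .csv, move the stray dates from the fifth column back into
* the two empty date columns, then save a clean file.
SAVE OUTFILE='logfile_fixed.sav' /COMPRESSED.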

 

Sincerely,

Eero Olli

 

 

 


Re: How to get rid of 45MB extra in a datafile

Jon K Peck
When you use ADD FILES, any documents in the files being added become part of the new .sav file.  Over time this can add up, so it might be the reason for the growth in file size that you saw.  You can use DROP DOCUMENTS to eliminate all of them and shrink the file down.
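For example (the file name is only illustrative):

* Inspect the accumulated documents, drop them, and resave.
GET FILE='C:\backups\big_backup.sav'.
DISPLAY DOCUMENTS.
DROP DOCUMENTS.
SAVE OUTFILE='C:\backups\big_backup.sav' /COMPRESSED.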

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435



