to compress or z-compress?

JeremyT
V21 has z-compression as an option and the manual says it may offer advantages with large files under certain configurations. So I thought I would give it a go:

I run SPSS on an i5 machine under Windows XP with 4 GB RAM and a 7200 rpm drive: a fairly typical recent desktop PC in a corporate environment. SPSS is installed locally, along with the data and temporary file space.

I have five files with an average of 143.2 million cases each, comprising 17 numeric and string variables. As uncompressed .sav files they average 42.4 GB, under regular compression 31.3 GB, and with z-compression 13.6 GB.
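
To put those sizes side by side, here is a minimal Python sketch that works out each format's size relative to the uncompressed file and the average bytes per case. It uses only the figures quoted above; the function names are made up for illustration, and it assumes that a gigabyte here means 10**9 bytes.

```python
# Figures quoted in the post above; a gigabyte is assumed to mean 10**9 bytes.
CASES = 143.2e6
SIZES_GB = {"uncompressed": 42.4, "compressed": 31.3, "z-compressed": 13.6}


def relative_sizes(sizes_gb):
    """Return each format's size as a fraction of the uncompressed size."""
    base = sizes_gb["uncompressed"]
    return {name: size / base for name, size in sizes_gb.items()}


def bytes_per_case(size_gb, cases=CASES):
    """Average on-disk bytes per case for a file of the given size."""
    return size_gb * 1e9 / cases


if __name__ == "__main__":
    for name, fraction in relative_sizes(SIZES_GB).items():
        per_case = bytes_per_case(SIZES_GB[name])
        print(f"{name:>13}: {fraction:.0%} of uncompressed, ~{per_case:.0f} bytes per case")
```

On these figures, regular compression leaves roughly 74% of the uncompressed size and z-compression roughly 32%.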

To parse and run default descriptives on one numeric variable, the mean processor time went up from 1.1 minutes (uncompressed) to 1.6 minutes (compressed), and then to 4.5 minutes with the z-compressed files. That is not surprising, given the extra processing needed to unpack the heavily compressed z-files.

The mean elapsed time for the job went down from 12.9 minutes (uncompressed) to 7.8 minutes for compressed files, and back up to 9.0 minutes for the z-compressed files. So with my setup it looks as if z-compression, which gives pretty impressive storage savings, is not worth it given the time penalty it imposes overall. After all, storage is pretty cheap these days.
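
One way to read these timings is to split each elapsed time into processor time and everything else. The Python sketch below does that with the numbers quoted above. Treating the remainder as mostly disk wait is an assumption, not something measured in this thread, and the names are hypothetical.

```python
# (processor minutes, elapsed minutes) per format, as quoted in the post above.
TIMINGS_MIN = {
    "uncompressed": (1.1, 12.9),
    "compressed": (1.6, 7.8),
    "z-compressed": (4.5, 9.0),
}


def breakdown(timings):
    """Split elapsed time into CPU time and the remainder.

    Assumption: the remainder is dominated by waiting on the disk.
    """
    rows = {}
    for name, (cpu, wall) in timings.items():
        rows[name] = {
            "cpu": cpu,
            "other": wall - cpu,      # elapsed time not spent on the CPU
            "cpu_share": cpu / wall,  # fraction of elapsed time spent on the CPU
        }
    return rows


if __name__ == "__main__":
    for name, row in breakdown(TIMINGS_MIN).items():
        print(f"{name:>13}: cpu {row['cpu']:.1f} min, other {row['other']:.1f} min, "
              f"cpu share {row['cpu_share']:.0%}")
```

The non-CPU remainder falls from 11.8 to 6.2 to 4.5 minutes as the files get smaller, while processor time rises from 1.1 to 1.6 to 4.5 minutes. If the remainder really is disk wait, the z-compressed run is the only one where the processor accounts for half of the elapsed time.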

When my masters give me the workstation I deserve, I will revisit this to see if z-compression is of benefit then.

Re: to compress or z-compress?

Albert-Jan Roskam
Thanks for sharing this very interesting information! May I ask what CPU was in the computer you used?
I'd think that the penalty is less severe with, e.g., a quad-core CPU, as decompression may be done on a different core.
And the files were stored locally? On a slow network the results may look different.


This week I tested whether z-compressed files could be opened with R v2.12. Neither the memisc nor the Hmisc/foreign
packages could handle this new format. I believe these packages all use PSPP code under the hood. So that's, at least
temporarily, a disadvantage of zsav. I always have the feeling that PSPP development is rather slow (maintained by one person?),
so let's see how long it will take. It would also be useful if earlier versions of SPSS could read zsav by, e.g., installing some extra software,
akin to docx support in earlier versions of MS Word.
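
Until readers catch up, it can help to check which flavour a file is before handing it to a tool that may not support zsav. The Python sketch below relies on an assumption taken from the PSPP developer documentation of the system file format as I understand it: ordinary and bytecode-compressed files start with the ASCII tag `$FL2`, and ZLIB-compressed files with `$FL3`. The function names are hypothetical, and files in other encodings are not handled.

```python
def flavor_from_magic(header):
    """Classify a system file by its first four bytes.

    Assumption: '$FL2' marks a plain or bytecode-compressed .sav file and
    '$FL3' marks a ZLIB-compressed .zsav file, as described in the PSPP
    developer documentation.
    """
    magic = header[:4]
    if magic == b"$FL2":
        return "sav"
    if magic == b"$FL3":
        return "zsav"
    return "unknown"


def flavor_of_file(path):
    """Read the first four bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return flavor_from_magic(handle.read(4))
```

For example, a script could call `flavor_of_file("survey.zsav")` (a made-up file name) and, on getting `"zsav"`, re-save the file from SPSS in the older format before reading it elsewhere.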

Regards,
Albert-Jan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: to compress or z-compress?

Albert-Jan Roskam
In reply to this post by JeremyT
PS: standard compression only 'works' with small integers, so YMMV depending on your data characteristics.
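
To illustrate why the data matter, here is a rough Python estimator for numeric values under standard (bytecode) compression. It assumes the scheme described in the PSPP developer documentation as I understand it: an integer whose value plus a bias (normally 100) falls in 1 to 251 is stored as a single code byte, a system-missing value also takes one byte, and any other number takes one code byte plus the full 8-byte value. The function name is hypothetical, and the sketch ignores strings and the padding of code bytes into 8-byte blocks.

```python
BIAS = 100  # assumption: the usual compression bias stored in the file header


def estimated_compressed_bytes(values, bias=BIAS):
    """Estimate storage for numeric values under bytecode compression.

    Assumed rules: None (system-missing) costs 1 byte; an integer v with
    1 <= v + bias <= 251 costs 1 byte; anything else costs 1 + 8 bytes.
    """
    total = 0
    for value in values:
        if value is None:
            total += 1
        elif float(value).is_integer() and 1 <= value + bias <= 251:
            total += 1
        else:
            total += 1 + 8
    return total


if __name__ == "__main__":
    likert = [1, 2, 3, 4, 5] * 2        # small integers, 10 values
    measurements = [12.75, 1834.2] * 5  # non-integers, 10 values
    print("uncompressed:", 8 * 10, "bytes each")
    print("small integers:", estimated_compressed_bytes(likert), "bytes")
    print("non-integers:", estimated_compressed_bytes(measurements), "bytes")
```

Under these assumptions, a column of non-integer values ends up slightly larger than its uncompressed form (9 bytes per value instead of 8), while small integer codes shrink to one byte each.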

