V21 has z-compression as an option, and the manual says it may offer advantages with large files under certain configurations. So I thought I would give it a go:

I run SPSS on an i5 machine under Win XP with 4 GB RAM and a 7200 rpm drive: a fairly typical recent desktop PC in a corporate environment. SPSS is installed locally, along with the data and temporary file space.

I have five files with an average of 143.2 million cases each, comprising 17 numerical and string variables. As uncompressed .sav files they average 42.4 GB, under regular compression 31.3 GB, and with z-compression 13.6 GB.

To parse and run default descriptives on one numerical variable, the mean processor time went up from 1.1 minutes (uncompressed) to 1.6 minutes (compressed), and then to 4.5 minutes with the z-compressed files. Not surprising, given the extra processing needed to unpack the heavily compressed z-files.

The mean actual (elapsed) time to do the job went down from 12.9 mins (uncompressed) to 7.8 mins for compressed files, and back up to 9.0 mins for the z-compressed files. So with my setup it looks as if z-compression, which gives pretty impressive storage savings, is not worth it given the time penalty it imposes relative to regular compression. After all, storage is pretty cheap these days.

When my masters give me the workstation I deserve, I will revisit this to see if z-compression is of benefit then.
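For anyone who wants to redo the arithmetic on the figures above, here is a rough sketch in Python. It only restates the numbers from the post; the "non-CPU" column (elapsed minus processor time) is my own crude proxy for time spent waiting on the disk, and the throughput column assumes the job makes exactly one full pass over the file. Neither assumption was measured.

```python
# Figures quoted in the post: mean file size (GB), processor time and
# elapsed time (minutes) for each storage format.
runs = {
    "uncompressed": {"size_gb": 42.4, "cpu_min": 1.1, "wall_min": 12.9},
    "compressed": {"size_gb": 31.3, "cpu_min": 1.6, "wall_min": 7.8},
    "z-compressed": {"size_gb": 13.6, "cpu_min": 4.5, "wall_min": 9.0},
}


def summarize(runs, baseline="uncompressed"):
    """Derive size ratio, non-CPU minutes and apparent read rate per format."""
    base_size = runs[baseline]["size_gb"]
    rows = {}
    for name, r in runs.items():
        rows[name] = {
            # File size relative to the uncompressed file.
            "size_ratio": r["size_gb"] / base_size,
            # ASSUMPTION: elapsed minus CPU time approximates I/O wait.
            "non_cpu_min": r["wall_min"] - r["cpu_min"],
            # ASSUMPTION: one full pass over the file (1 GB = 1000 MB).
            "read_mb_per_s": r["size_gb"] * 1000 / (r["wall_min"] * 60),
        }
    return rows


summary = summarize(runs)
for name, row in summary.items():
    print(
        f"{name:13s} size x{row['size_ratio']:.2f}  "
        f"non-CPU {row['non_cpu_min']:.1f} min  "
        f"~{row['read_mb_per_s']:.0f} MB/s from disk"
    )
```

On these numbers the non-CPU share keeps shrinking as the files get smaller (roughly 11.8, 6.2 and 4.5 minutes), while the extra processor time for the z-compressed files eats up the gain over regular compression.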
Thanks for sharing this very interesting information! May I ask what CPU was in the computer you used? I'd think the penalty would be less severe with, e.g., a quad-core, as decompression may be done on a different core. And the files were stored locally? On a slow network the result may look different.

This week I tested whether z-compressed files could be opened with R v2.12. Neither the memisc nor the Hmisc/foreign packages could handle this new format. I believe these packages all use PSPP code under the hood. So that's, at least temporarily, a disadvantage of zsav.
I always have the feeling that PSPP development is rather slow (maintained by one person?), so let's see how long it will take. It would also be useful if earlier versions of SPSS could read zsav by, e.g., installing some extra software, much as earlier versions of MS Word could be made to open docx files.

Regards,
Albert-Jan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
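Until the readers catch up, it may help to check which flavour of file you have before handing it to R or anything else. A minimal sketch in Python, assuming (from my reading of the PSPP description of the system file format, not from anything IBM documents) that ordinary .sav files start with the bytes `$FL2` and z-compressed files with `$FL3`. The function names are made up for illustration.

```python
def flavor_from_magic(magic: bytes) -> str:
    """Classify an SPSS system file from its first four bytes.

    ASSUMPTION: "$FL2" marks a regular .sav file (uncompressed or
    standard-compressed) and "$FL3" marks a z-compressed .zsav file.
    """
    if magic == b"$FL2":
        return "sav"
    if magic == b"$FL3":
        return "zsav"
    return "unknown"


def sav_flavor(path: str) -> str:
    """Read the first four bytes of the file at `path` and classify it."""
    with open(path, "rb") as handle:
        return flavor_from_magic(handle.read(4))
```

A script could call `sav_flavor("mydata.sav")` (a hypothetical file name) and fall back to re-saving the file from SPSS with standard compression whenever the answer is `"zsav"`.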
In reply to this post by JeremyT
PS: standard compression only 'works' for small integer values, so YMMV depending on the characteristics of your data.
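To make the "small integers" point concrete, here is a rough size estimate in Python. It assumes the scheme as I understand it from the PSPP description of the file format: each numeric value gets a one-byte code, integers from -99 to 151 (with the default bias of 100) and system-missing values fit in that code alone, and anything else needs the full 8-byte value on top. It ignores strings and the padding of code blocks, so treat it as a back-of-the-envelope sketch rather than the actual algorithm.

```python
BIAS = 100  # ASSUMPTION: default compression bias of a .sav file


def fits_in_one_byte(value) -> bool:
    """True if the value needs only its one-byte code under the assumed scheme."""
    if value is None:  # None stands in for system-missing here
        return True
    if isinstance(value, float) and not value.is_integer():
        return False  # fractions, NaN and infinities are stored in full
    # Codes 1..251 are assumed to encode the integer (code - BIAS).
    return 1 <= int(value) + BIAS <= 251


def estimated_bytes(values) -> int:
    """Estimate compressed size: 1 code byte per value, plus 8 raw bytes if it does not fit."""
    return sum(1 if fits_in_one_byte(v) else 9 for v in values)


likert = [1, 2, 3, 4, 5] * 1000             # small integer codes
ids = list(range(100000, 105000))           # large integers
print(estimated_bytes(likert), "bytes vs", 8 * len(likert), "uncompressed")
print(estimated_bytes(ids), "bytes vs", 8 * len(ids), "uncompressed")
```

Under these assumptions a column of survey codes shrinks to about an eighth of its size, while a column of IDs or decimals actually grows slightly, which would explain why the saving differs so much from one dataset to the next.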