It occurred to me to look, while answering another question. The v.14 Command Syntax Reference, article on SAVE, says:

>COMPRESSED and UNCOMPRESSED Subcommands
>COMPRESSED saves the file in compressed form. UNCOMPRESSED saves the
>file in uncompressed form. In a compressed file, small integers
>(from 99 to 155) are stored in one byte instead of the eight bytes
>that are used in an uncompressed file.

Is that all that's done?

First off, I hope this applies to integers 0-98 as well; why the dickens not?

Second, and more important: Are string variables compressed? Often, a lot of space can be saved by compressing out trailing blanks on string values; and it's not hard to do.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
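The trailing-blank idea suggested above can be sketched in a few lines of Python. This is a hypothetical encoding, not what SPSS does (the replies below describe SPSS's actual scheme): each value of a fixed-width string variable is stored as a one-byte length followed by its characters with trailing blanks removed, assuming ASCII data and a declared width of at most 255. The function names are invented for illustration.

```python
def pack_trimmed(values, width):
    """Hypothetical trailing-blank compression: store each value as a 1-byte
    length followed by its characters, with trailing blanks dropped.
    Assumes ASCII text and 0 < width <= 255 so a length fits in one byte."""
    if not 0 < width <= 255:
        raise ValueError("width must be between 1 and 255 for this sketch")
    out = bytearray()
    for v in values:
        data = v.rstrip(" ").encode("ascii")
        if len(data) > width:
            raise ValueError("value longer than declared width")
        out.append(len(data))   # length prefix
        out += data             # trimmed characters only
    return bytes(out)


def unpack_trimmed(blob, width):
    """Reverse of pack_trimmed: re-pad each value with blanks to width."""
    values, i = [], 0
    while i < len(blob):
        n = blob[i]
        values.append(blob[i + 1:i + 1 + n].decode("ascii").ljust(width))
        i += 1 + n
    return values


names = ["Smith", "Li", "Alexandrova"]
packed = pack_trimmed(names, 40)
print(len(packed), "bytes instead of", 40 * len(names))   # 21 bytes instead of 120
```

Decoding re-pads each value to the declared width, so nothing is lost; the cost is that values no longer sit at fixed offsets, so finding the n-th value means scanning the ones before it.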
That was MINUS 99 to 155. Since so much SPSS data is represented by small integer values, this scheme can be quite effective. It won't do anything for typical continuous variables, though. Applying external compression utilities can further reduce the size, although that makes reading and writing take an extra step. Using a compressed file directory as I described in an earlier post is a convenient way to do that.
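Jon's point about continuous variables can be made concrete with a rough Python size estimate. This is a sketch under stated assumptions, not SPSS code: it applies the per-cell cost described in Jonathan Fry's reply later in the thread (1 byte for SYSMIS or an integer in [-99, 152], otherwise 8 data bytes plus a 1-byte marker), and the names `cell_bytes` and `column_sizes`, as well as the use of `None` for SYSMIS, are invented for illustration.

```python
SYSMIS = None  # hypothetical stand-in for SPSS system-missing in this sketch


def cell_bytes(value, lo=-99, hi=152):
    """Bytes one numeric cell takes in a compressed file under the cost model
    from the thread: 1 byte for SYSMIS or an integer in [lo, hi], otherwise
    8 data bytes plus a 1-byte marker."""
    if value is SYSMIS:
        return 1
    if float(value).is_integer() and lo <= value <= hi:
        return 1
    return 9


def column_sizes(values):
    """Return (uncompressed_bytes, compressed_bytes) for one numeric variable."""
    uncompressed = 8 * len(values)
    compressed = sum(cell_bytes(v) for v in values)
    return uncompressed, compressed


likert = [1, 2, 3, 4, 5, 3, 2, None]        # small integer codes, one missing
weights = [61.2, 75.8, 80.05, 92.4, 58.7]    # continuous measurements

print(column_sizes(likert))    # (64, 8)
print(column_sizes(weights))   # (40, 45)
```

Under that model, small integer codes shrink to an eighth of their uncompressed size, while non-integer values cost one extra byte each, so a purely continuous variable comes out slightly larger than uncompressed.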
Strings are not compressed, but the ALTER TYPE command introduced with version 16 can optimize the (fixed) size for string variables based on their contents.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Wednesday, October 08, 2008 10:51 AM
To: [hidden email]
Subject: [SPSSX-L] What is "COMPRESSED" when saving a file?
Strings are actually compressed, in a slightly complicated way.
The compressor looks at each case as an array of 8-byte cells. It compresses to one byte any cell that contains an integer value in [-99...152], SYSMIS, or eight blanks. All other cells actually take 9 bytes in the compressed file (the eight bytes of data, plus one byte indicating an uncompressed value).

Jonathan Fry

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Wednesday, October 08, 2008 12:34 PM
To: [hidden email]
Subject: Re: What is "COMPRESSED" when saving a file?
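Jonathan's description can be sketched in Python as a byte count for one case. Assumptions, all hypothetical: numeric values are Python numbers with `None` standing in for SYSMIS, each string variable is blank-padded to a whole number of 8-byte cells, and the function names are invented. The sketch only counts bytes; it does not reproduce the actual layout of a .sav file.

```python
def to_cells(case, widths):
    """Split one case into 8-byte cells.

    widths[i] is 0 for a numeric variable, or the declared width of a string
    variable. Numeric values are Python numbers (None stands in for SYSMIS);
    string values are assumed to be ASCII and no longer than their width.
    """
    cells = []
    for value, width in zip(case, widths):
        if width == 0:
            cells.append(value)                      # a numeric value is one cell
        else:
            padded = 8 * ((width + 7) // 8)          # round up to whole cells
            raw = value.encode("ascii").ljust(padded, b" ")
            cells.extend(raw[i:i + 8] for i in range(0, padded, 8))
    return cells


def cell_bytes(cell):
    """1 byte for SYSMIS, eight blanks, or an integer in [-99, 152]; else 9."""
    if cell is None:
        return 1
    if isinstance(cell, bytes):
        return 1 if cell == b" " * 8 else 9
    if float(cell).is_integer() and -99 <= cell <= 152:
        return 1
    return 9


def case_sizes(case, widths):
    """Return (uncompressed_bytes, compressed_bytes) for one case."""
    cells = to_cells(case, widths)
    return 8 * len(cells), sum(cell_bytes(c) for c in cells)


# Variables: id (numeric), name (A24), score (numeric), comment (A40)
widths = [0, 24, 0, 40]
print(case_sizes([17, "Smith", 3.5, ""], widths))   # (80, 26)
```

Here the A24 name "Smith" costs 11 bytes (one partly filled cell at 9 bytes plus two all-blank cells at 1 byte each) and the empty A40 comment costs 5, which is why trailing blanks are compressed only in whole 8-byte chunks.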
Thank you, Jonathan!
At 02:29 PM 10/8/2008, Fry, Jonathan B. wrote:
>Strings are actually compressed, in a slightly complicated way.
>
>The compressor looks at each case as an array of 8-byte cells. It
>compresses to one byte any cell that contains an integer value in
>[-99...152], SYSMIS, or eight blanks.

Ah, thank you! I was surprised when Jon wrote [-99...155], since that range would take 255 of the 256 possible values of a byte. Range [-99...152] leaves 4 values free; I guess three of those represent 'system-missing', 'eight blanks', and 'not compressed; eight bytes of data follow.'

Richard Oliver, I'm copying you to note the correction, for the Command Syntax Reference and elsewhere. In the articles on SAVE and XSAVE, section "COMPRESSED and UNCOMPRESSED Subcommands", change

"small integers (from 99 to 155) are stored in one byte"

to

"small integers (from -99 to 152) are stored in one byte, as are 8-byte sections of string variables that are entirely blank".

(The XSAVE article already has '-99', not '99'.) Or should compressed files be discussed in section "Universals > Files"?

Jon Peck suggested,
>the ALTER TYPE command introduced with version 16 can optimize the
>(fixed) size for string variables based on their contents.

True, but doubtful practice, for two reasons.

First, blank space in strings tends to result not from over-defining them, but from widely varying lengths of data values for the same variable.

Second, using ALTER TYPE to match a variable's length to its longest value in a particular file can easily give the same variable different lengths in files holding different portions of the same data. Then they'll be incompatible for ADD FILES, per the well-known SPSS glitch.
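The second reason can be illustrated with a small, hypothetical Python sketch: it shrinks a string variable to the longest value found in each file (roughly what optimizing the width from the contents amounts to), shows two files holding different portions of the same data ending up with different widths, and picks one common width instead. The function names and city data are invented; this is not how SPSS implements ALTER TYPE.

```python
def min_width(values):
    """Smallest width that holds every value once trailing blanks are ignored.

    Never less than 1, since an SPSS string variable is at least A1 wide.
    """
    return max([len(v.rstrip(" ")) for v in values] + [1])


def common_width(*files):
    """One width for the variable across several files: the largest per-file
    minimum, so every file can declare the variable identically."""
    return max(min_width(values) for values in files)


# Two hypothetical files holding different portions of the same 'city' variable.
file_a = ["Boston", "Austin", "Reno"]
file_b = ["San Francisco", "Denver"]

print(min_width(file_a), min_width(file_b))   # 6 13
print(common_width(file_a, file_b))           # 13
```

Computing the width over all the files, rather than file by file, keeps the variable's length the same everywhere, which sidesteps the ADD FILES mismatch.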
