What is "COMPRESSED" when saving a file?

What is "COMPRESSED" when saving a file?

Richard Ristow
It occurred to me to look, while answering another question. The v.14
Command Syntax Reference, article on SAVE, says:

>COMPRESSED and UNCOMPRESSED Subcommands
>COMPRESSED saves the file in compressed form. UNCOMPRESSED saves the
>file in uncompressed form. In a compressed file, small integers
>(from 99 to 155) are stored in one byte instead of the eight bytes
>that are used in an uncompressed file.

Is that all that's done?

First off, I hope this applies to integers 0-98 as well; why the dickens not?

Second, and more important: Are string variables compressed? Often, a
lot of space can be saved by compressing out trailing blanks on
string values; and it's not hard to do.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: What is "COMPRESSED" when saving a file?

Peck, Jon
That should read MINUS 99 to 155. Since so much SPSS data is represented by small integer values, this scheme can be quite effective. It won't do anything for typical continuous variables, though. Applying external compression utilities can further reduce the size, although that makes reading and writing take an extra step. Using a compressed file directory, as I described in an earlier post, is a convenient way to do that.

Strings are not compressed, but the ALTER TYPE command introduced with version 16 can optimize the (fixed) size for string variables based on their contents.

HTH,
Jon Peck

Re: What is "COMPRESSED" when saving a file?

Fry, Jonathan B.
Strings are actually compressed, in a slightly complicated way.

The compressor looks at each case as an array of 8-byte cells.  It compresses to one byte any cell that contains an integer value in [-99...152], SYSMIS, or eight blanks.  All other cells actually take 9 bytes in the compressed file (the eight bytes of data, plus one byte indicating an uncompressed value).
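
The rule above can be sketched in a few lines of Python. This is only an illustration of the byte counting as described in this post, not SPSS's actual implementation: the range bounds are copied from the post, and the names `cell_cost`, `case_cost`, `string_cells` and the `SYSMIS` stand-in are hypothetical.

```python
import math

# Assumptions taken from the post above, not from a format specification:
# integers in [-99, 152], SYSMIS, and eight blanks each take one byte;
# any other cell takes 9 bytes (1 marker byte + the 8 data bytes).
INT_MIN, INT_MAX = -99, 152
CELL = 8
BLANK_CELL = b" " * CELL
SYSMIS = None  # stand-in for system-missing in this sketch


def string_cells(value, width):
    """Pad a string to its declared width and split it into 8-byte cells."""
    n_cells = math.ceil(width / CELL)
    padded = value.encode("ascii")[:width].ljust(n_cells * CELL)
    return [padded[i:i + CELL] for i in range(0, len(padded), CELL)]


def cell_cost(cell):
    """Bytes one cell occupies in the compressed file, per the rule above."""
    if cell is SYSMIS:
        return 1
    if isinstance(cell, bytes):
        return 1 if cell == BLANK_CELL else 1 + CELL
    # Numeric cell: only whole numbers inside the range compress.
    if float(cell).is_integer() and INT_MIN <= cell <= INT_MAX:
        return 1
    return 1 + CELL


def case_cost(cells):
    """Total compressed size of one case, in bytes."""
    return sum(cell_cost(c) for c in cells)


# A hypothetical case: a small integer, a continuous value, SYSMIS,
# and a 24-byte string variable holding "Smith".
case = [5, 3.14159, SYSMIS] + string_cells("Smith", 24)
print(case_cost(case), "bytes compressed vs", CELL * len(case), "uncompressed")
```

For this made-up case the count is 22 bytes compressed against 48 uncompressed. The continuous value and the non-blank string cell each grow from 8 to 9 bytes, so a file made up mostly of such cells would come out slightly larger when compressed.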

Jonathan Fry

Re: What is "COMPRESSED" when saving a file?

Richard Ristow
Thank you, Jonathan!

At 02:29 PM 10/8/2008, Fry, Jonathan B. wrote:

>Strings are actually compressed, in a slightly complicated way.
>
>The compressor looks at each case as an array of 8-byte cells.  It
>compresses to one byte any cell that contains an integer value in
>[-99...152], SYSMIS, or eight blanks.

Ah, thank you! I was surprised when Jon wrote [-99...155], since that
would take 255 of the 256 possible values of a byte. Range [-99...152]
takes 252 and leaves 4 free; I guess three of those represent
'system-missing', 'eight blanks', and 'not compressed; eight bytes of
data follow.'
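
As a sanity check on the byte budget, here is a quick count of how many one-byte codes each range would consume. The helper name `codes_used` is made up for this sketch, and it assumes one code per integer in the range.

```python
BYTE_VALUES = 256  # distinct values a single byte can hold


def codes_used(lo, hi):
    """Number of integers in the inclusive range [lo, hi]."""
    return hi - lo + 1


for lo, hi in [(-99, 155), (-99, 152)]:
    used = codes_used(lo, hi)
    print(f"[{lo}...{hi}]: {used} codes used, {BYTE_VALUES - used} left over")
```

Only the narrower range leaves enough spare codes for the special markers (system-missing, eight blanks, and "uncompressed data follows").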

Richard Oliver, I'm copying you to note the correction, for the
Command Syntax Reference and elsewhere: in the articles on SAVE and
XSAVE, section "COMPRESSED and UNCOMPRESSED Subcommands", change
"small integers (from 99 to 155) are stored in one byte" to "small
integers (from -99 to 152) are stored in one byte, as are 8-byte
sections of string variables that are entirely blank". (The XSAVE
article already has '-99', not '99'.) Or, should compressed files be
discussed in section "Universals> Files"?

Jon Peck suggested,

>the ALTER TYPE command introduced with version 16 can optimize the
>(fixed) size for string variables based on their contents.

True, but doubtful practice for two reasons:

First, blank space in strings tends not to result from over-defining
them, but from widely varying lengths of data values for the same variable;

Second, using 'ALTER TYPE' to match a variable's length to its
longest value in a particular file will easily give the same
variable different lengths in files holding different portions of the
same data. Then they'll be incompatible for ADD FILES, per the
well-known SPSS glitch.
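
To illustrate the second point, here is a small Python sketch. It does not call SPSS; `minimal_width` is a hypothetical helper that mimics sizing a string variable to the longest value present, and the two "files" are invented data with single-byte characters assumed.

```python
def minimal_width(values):
    """Smallest width that holds every value once trailing blanks are dropped."""
    longest = max((len(v.rstrip(" ")) for v in values), default=0)
    return longest or 1  # assume a string variable is at least 1 wide


# Hypothetical: the same 'city' variable in two extracts of the same data.
file_a = ["Rome", "Oslo", "Lima"]
file_b = ["Rome", "San Francisco"]

width_a = minimal_width(file_a)
width_b = minimal_width(file_b)
print(width_a, width_b)  # 4 13
```

The same variable ends up with width 4 in one extract and 13 in the other, which is the mismatch that gets in the way when the files are later combined.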
