The SAVE command includes the subcommand /COMPRESSED.
Does AGGREGATE support a similar option when writing to a new file? I am seeing file size grow by a factor of 2 (30 MB to 65 MB).

For processing speed, I would prefer not to conduct a data pass to aggregate and a second data pass to save. For disk space I would prefer to have the smaller file size.

--jim

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
At 11:17 AM 10/8/2008, Marks, Jim wrote:
>The SAVE command includes the subcommand /COMPRESSED.
>Does AGGREGATE support a similar option when writing to a new file?

It doesn't look like there is one, nor on other commands (besides SAVE and XSAVE) that take an OUTFILE specification. It's an interesting omission. I thought there might be a COMPRESSED option for FILE HANDLE, which could solve this problem; but there doesn't seem to be.

I'd thought that AGGREGATE would write a COMPRESSED file if that's the system default (as it usually is), but no such luck: I just tried an aggregation on my system (using v.14), and the output is not compressed. (To check, use command SYSFILE INFO on the saved file; from the menus, use File> Display Data File Information> External File...)

On the other hand, you write,

>I am seeing file size grow by a factor of 2 (30 MB to 65 MB).

That does surprise me, since aggregated files usually have both fewer variables and fewer cases than the original file. Can you say what your file structure and aggregation logic are?
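The check mentioned in this reply can be written as one line of syntax. This is a minimal sketch rather than a tested job: the path 'c:\tmp.sav' is taken from the thread's subject line and is assumed to be the file that AGGREGATE wrote.

```spss
* Display dictionary information for the saved file.
* The path is an assumption; substitute the file written by AGGREGATE.
SYSFILE INFO FILE='c:\tmp.sav'.
```

Per the reply above, the resulting output is where to look to see whether the file was stored compressed.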
|
Richard:
The file has about 100,000 records, of which 10,000 are grouped into pairs that need to be combined -- 90% of the file is unchanged.

The aggregate uses multiple break variables, but only two define groups -- id and date. The remaining break variables are constant across id.

The aggregating function is
   /vara varb ... =SUM(vara varb ...)

-- the aggregated file has the same var list, but about 5,000 fewer cases.

When I saw the file size doubled, I was afraid of an error, but the data appears to be correct.

--jim

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Wednesday, October 08, 2008 12:15 PM
To: [hidden email]
Subject: Re: File Size using AGGREGATE OUTFILE = 'c:\tmp.sav'
|
At 01:41 PM 10/8/2008, Marks, Jim wrote:
>The aggregate uses multiple break variables, but only two define
>groups-- id and date. The remaining break variables are constant across id .

Curious: Then, why do you have them on the 'break' list?

>The file has about 100,000 records, of which 10,000 are grouped into
>pairs that need to be combined-- 90% of the file is unchanged.

And, 90 variables or thereabouts?

>The aggregating function is
>   /vara varb ... =SUM(vara varb ...)
>
>-- the aggregated file has the same var list, but about 5,000 fewer cases.
>
>When I saw the file size doubled, I was afraid of an error, but the
>data appears to be correct.

From information in the concurrent thread 'What is "COMPRESSED" when saving a file?', what you're seeing could well arise if many of your variables are numeric with small integer values.

But, earlier, you wrote,

>For processing speed, I would prefer not to conduct a data pass to
>aggregate and a second data pass to save.

See whether

   AGGREGATE OUTFILE='c:\tmp.sav'
      /BREAK=...
      /...

is actually much faster than

   AGGREGATE OUTFILE=*
      /BREAK=...
      /...
   SAVE OUTFILE='c:\tmp.sav'/COMPRESSED.

I don't think the second form forces an additional data pass, at least in modern releases of SPSS (since, I think, about 12.5).
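For a concrete version of the second form, a minimal sketch follows. The break variables id and date and the summed variables vara and varb are the ones named earlier in the thread; the remaining break variables and summed variables are elided in the original posts and are left out here, so this is an illustration under those assumptions and not the poster's actual job.

```spss
* Aggregate in place, replacing the active file.
* Only the two group-defining break variables from the thread are listed.
AGGREGATE OUTFILE=*
  /BREAK=id date
  /vara varb = SUM(vara varb).

* Write the aggregated data to disk in compressed form.
SAVE OUTFILE='c:\tmp.sav' /COMPRESSED.
```

Running this and the single-command form on the same input, and comparing both elapsed time and the size of the resulting file, would show whether the extra SAVE costs anything noticeable.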
