Hi all,
I have been unable to get SPSS to complete an AGGREGATE command on a reasonably large data file. The file comprises ~70 million cases/records for ~350,000 individuals, with about 12 variables. When I run the AGGREGATE command I get the following error message:

There is memory for only 139313 cases in the aggregated file.

>Error # 10963
>There is not enough memory for all the cases in the aggregated file. The
>aggregated file is missing some cases. Rerun with more memory.
>This command not executed.

I have sorted the file to be aggregated by the break variable. In addition, I have included the SET WORKSPACE=2097151 command in my syntax to boost the memory allocation of SPSS (I am using SPSS v14.0). I have tried running the command both on my computer (P4 3.8GHz processor, 300GB hard drive with ~160GB of free space, and 3GB of RAM) and from our departmental server. I have also changed the swap settings on my computer to allow all free hard drive space to be used as extra memory, and I do not run any other applications while SPSS is processing. However, after many hours of processing the analysis terminates with the above error message.

It may be worth noting that all other SPSS commands work without any problem on this large data file; it is only the AGGREGATE command that is proving problematic.

Does anyone know whether anything can be done to circumvent this problem, or is SPSS simply not capable of running AGGREGATE commands on data files of this size? At a pinch, I can segment the file into a number of much smaller data files, or alternatively transfer the file to SAS and use PROC SUMMARY, which seems to work OK. However, this has proved to be a bit of a hassle, and I was hoping there was a more efficient way of overcoming the problem.

If anyone has information on this issue it would be greatly appreciated. Thanks.

Kind regards,
David
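P.S. For reference, a stripped-down sketch of the sort of syntax I'm running (person_id, var1, and the file name are placeholders rather than my actual names):

   * Raise the workspace allocation (the value is in kilobytes).
   SET WORKSPACE=2097151.

   * The file is sorted by the break variable beforehand.
   SORT CASES BY person_id.

   * One output case per individual.
   AGGREGATE OUTFILE='person_level.sav'
     /BREAK=person_id
     /n_records=N
     /mean_var1=MEAN(var1).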
At 12:23 AM 3/1/2007, David Preen wrote:
>I have been unable to get SPSS to complete an aggregate command on a
>reasonably large data file. The data file comprises ~70 million
>cases/records for ~350,000 individuals, with about 12 variables. When
>I run the aggregate command I get the following error message:
>
>There is memory for only 139313 cases in the aggregated file.
>
>>Error # 10963
>>There is not enough memory for all the cases in the aggregated file.
>>The aggregated file is missing some cases. Rerun with more memory.
>>This command not executed.
>
>I have sorted the file to be aggregated by the break variable.

That's it. However, you must also specify the /PRESORTED subcommand for AGGREGATE.

I was surprised as the dickens when I learned this, but AGGREGATE, by default, builds all the output cases in memory. I understand from SPSS, Inc., that letting it do that is significantly faster than sorting and then using /PRESORTED. However, if the cases are in memory, they take up memory. If you have a great many break groups (the number of input cases doesn't matter), they can fill available memory and give you exactly what you're seeing.

You got "there is memory for only 139,313 cases". That surprises me. I once ran with over a million (output) cases, because I'd meant to specify /PRESORTED but had forgotten. On a good machine, but less capable than yours (1GB main memory), it ran painfully slowly, but it did run. It partly depends on how many variables are in the OUTPUT records, since those are what are built in memory.

And a few AGGREGATE functions - MEDIAN, anyway - keep information from all *input* cases in memory, and may take much more. /PRESORTED will help with those, as well.

-Good luck,
Richard
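P.S. In concrete terms, something like this - person_id, var1, and the file name are placeholders, not David's actual names:

   * Sort by the break variable(s), then tell AGGREGATE about it.
   SORT CASES BY person_id.

   AGGREGATE OUTFILE='aggregated.sav'
     /PRESORTED
     /BREAK=person_id
     /n_cases=N
     /mean_var1=MEAN(var1).

As far as I recall, /PRESORTED must come before /BREAK, as above.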
In reply to this post by David Preen
How many variables are you aggregating by (break vars)? I have run into
a similar situation with large datasets, which resolved when I pared the break variables down to the bare minimum (my machine is a 3.2GHz Pentium, 160GB HDD, 2GB RAM) and let the program do the sorting, etc. If you absolutely have to have some of those variables, you can possibly merge them back in after aggregation, depending upon their characteristics.

HTH
Mike
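P.S. For example, along these lines - person_id, sex, and region are made-up names, and this assumes the merged-back variables are constant within each individual:

   * Aggregate with only the essential break variable.
   AGGREGATE OUTFILE='agg.sav'
     /BREAK=person_id
     /n_cases=N.

   * Build a one-row-per-person lookup of the dropped variables.
   AGGREGATE OUTFILE='lookup.sav'
     /BREAK=person_id
     /sex=FIRST(sex)
     /region=FIRST(region).

   * Merge them back in (AGGREGATE output is already sorted by the
     break variable, which MATCH FILES requires).
   GET FILE='agg.sav'.
   MATCH FILES /FILE=*
     /TABLE='lookup.sav'
     /BY person_id.
   EXECUTE.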
In reply to this post by David Preen
I have a similar problem: sometimes I get the same error message, and other times I get an output file where some of the aggregated data, after a few million records have been processed, are simply not written out, so I get partial output.

Is there a way around this? Is the solution simply throwing more memory at the problem? Is there a way to make SPSS write the output cases to disk instead of memory?
Besides memory, one useful trick is having the file sorted according
to the aggregating variables, and using the /PRESORTED subcommand in the AGGREGATE command. This speeds things up considerably.

Hector
In reply to this post by Egon Kraan
At 11:35 AM 3/1/2007, Egon Kraan wrote:
>Sometimes I get a similar error message, and other times I get an
>output file where some of the aggregated data, after a few million
>records have been processed, are simply not written out, so I get
>partial output.
>
>Is there a way around this? Is the solution simply throwing more
>memory at the problem? Is there a way to make SPSS write the output
>cases to disk instead of memory....

Yes, there is: sort the cases by the break variables, and specify subcommand /PRESORTED on AGGREGATE. I think I'd said that, as Hector Maletta also just did.

Can you visualize how AGGREGATE is going to operate, with and without /PRESORTED?

By the way, I believe the only recommended use of /PRESORTED is when you have very many break categories - hundreds of thousands.
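To make the contrast concrete, a sketch with a placeholder break variable (person_id), not anyone's actual syntax:

   * Without /PRESORTED: AGGREGATE builds every output case in
     memory, so memory use grows with the number of break groups.
   AGGREGATE OUTFILE='agg.sav'
     /BREAK=person_id
     /n=N.

   * With /PRESORTED: the cases must already be grouped; each output
     case goes to disk as soon as its break group ends, so memory
     use stays roughly flat however many groups there are.
   SORT CASES BY person_id.
   AGGREGATE OUTFILE='agg.sav'
     /PRESORTED
     /BREAK=person_id
     /n=N.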