Re: SPSS memory problems with aggregate command


Re: SPSS memory problems with aggregate command

Egon Kraan
I am still running into problems despite using /presorted

About two hours into running the syntax, I get a Windows message that
virtual memory is running low and Windows will increase its size.

I have 1 GB of RAM, and the swap file maximum is set to 2 GB.


Is there a way I can figure out just how much memory I may need for a
specific AGGREGATE procedure?  That is, is there a specific amount of memory
needed for a cell? For example, how much memory would I need to aggregate
800 cases with 400,000 variables?

Any help is appreciated.

Thanks

-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Thursday, March 01, 2007 2:45 PM
To: Egon Kraan; [hidden email]
Subject: Re: SPSS memory problems with aggregate command


At 11:35 AM 3/1/2007, Egon Kraan wrote:

>Sometimes I get a similar error message, and other times I get an
>output file where some of the aggregated data, after a few million
>records have been processed, are simply not written out, so I get
>partial output.
>
>Is there a way around this?  Is the solution simply throwing more
>memory at the problem?  Is there a way to make SPSS write the output
>cases to disk instead of memory....

Yes, there is: sort the cases by the break variables, and specify
subcommand /PRESORTED on AGGREGATE. I think I'd said that, as Hector
Maletta also just did. Can you visualize how AGGREGATE is going to
operate, with and without /PRESORTED?

By the way, I believe the only recommended use of /PRESORTED is when
you have very many break categories - hundreds of thousands.
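
For illustration, a minimal sketch of that pattern (the file name and
variable names here are hypothetical):

SORT CASES BY region store.
AGGREGATE OUTFILE='C:\temp\agg.sav'
  /PRESORTED
  /BREAK=region store
  /total_sales=SUM(sales).

With /PRESORTED, AGGREGATE can write each break group's output case as
soon as that group ends, instead of holding a cell for every break
group in memory until the end of the data.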

Re: SPSS memory problems with aggregate command

Simon Phillip Freidin
Hi Egon

400,000 keys will be an issue. (Let us know if you meant 400,000
cases on 800 vars, which would be the more usual situation.)
There is probably another way to restructure your data other than
AGGREGATE, or some other software solution.

Please post a description or example of your data now (i.e., data for 5
cases of the first ten of the 400,000 vars and a description of the
other 399,990; or, conversely, the first ten of the 800 vars and a
description of the other 790), and how you want it to look after the
'aggregate'. Post the AGGREGATE command you tried as well.

cheers
Simon

On 09/03/2007, at 8:09 AM, Egon Kraan wrote:

> I am still running into problems despite using /presorted
> [...]
> For example, how much memory would I need to
> aggregate 800 cases with 400,000 variables?

Re: SPSS memory problems with aggregate command

Richard Ristow
In reply to this post by Egon Kraan
At 04:09 PM 3/8/2007, Egon Kraan wrote:

>I am still running into problems despite using /presorted
>
>About two hours into running the syntax, I get a Windows message that
>virtual memory is running low and Windows will increase its size.
>
>I have 1 GB of RAM, and the swap file maximum is set to 2 GB.
>
>Is there a way I can figure out just how much memory I may need for a
>specific AGGREGATE procedure?  That is, is there a specific amount of
>memory needed for a cell? For example, how much memory would I need to
>aggregate 800 cases with 400,000 variables?

Uh, oh. The number of *cases* has nothing to do with it. With
/PRESORTED, the number of break groups doesn't have anything to do with
it, either. But, 400,000 variables?

You last wrote, "after a few million records have been processed...".
This sounds like something very different.

To answer briefly, very briefly, if you're using /PRESORTED, you need
RAM for the data dictionary, plus one complete case, plus the stored
AGGREGATE specifications, plus one complete AGGREGATE cell. (Plus, of
course, SPSS itself and odds and ends I haven't thought of.) What the
latter two will add up to per variable, I can't say, and it'll depend
on what aggregation functions you're using; but say, minimally, one or
two 64-bit numbers for data storage, plus at least one more to hold the
specifications, per variable. Say, triple the size of one case's worth
of data; and that's a low estimate.
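
As a rough worked example on that basis (assuming every value is stored
as an 8-byte number): 400,000 variables x 8 bytes is about 3.2 MB for
one case's worth of data, so triple that is on the order of 10 MB for
AGGREGATE's working storage itself. That alone shouldn't exhaust a 1 GB
machine, which is one more reason to suspect the dictionary for 400,000
variables, or something else entirely, is what's eating the memory.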

But rather than juggling such estimates, could you post your syntax? I
think we're missing something about your problem.

Among other things, AGGREGATE doesn't use the TO convention very well;
how can you even write an AGGREGATE statement for 400,000 variables?

And, I'm afraid - generally, a file with 400,000 variables is either
very special (like a file made 'wide' for a many-to-many merge), or a
mistake. Especially with only 800 cases. There almost has to be a more
tractable organization, with many more cases and fewer variables.
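
For instance, if the 400,000 variables are really repeated measures of
the same few quantities, a long layout is usually more tractable. A
sketch only (the names id, v1 TO v400000, value, and varnum are all
hypothetical):

VARSTOCASES
  /MAKE value FROM v1 TO v400000
  /INDEX=varnum
  /KEEP=id.

That turns 800 cases by 400,000 variables into 320 million long
records; that's a lot of records, but SPSS streams cases from disk, and
an AGGREGATE over the long file holds only a few variables per cell.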