Dear Listers,
Does anyone have any advice about memory requirements for running time-dependent Cox regression analyses with large datasets?

We have a dataset of 184,000 cases with a quarterly exposure variable observed for up to twenty years (therefore 80 time segments).

In SPSS v.18 on recently purchased desktop PCs we keep getting 'Insufficient memory to process the command'. Annoyingly, this only seems to happen after the time program has been running for many minutes.

I would appreciate hearing about others' experiences of time-dependent Cox regression and any solutions they found to overcome memory problems.

Kind regards,

Chris.

--
Dr Chris Poole
Senior Lecturer in the Evaluation of Medicines
Department of Primary Care & Public Health, School of Medicine
Cardiff University
Cardiff MediCentre, Heath Park, Cardiff, CF14 4UJ
+44 (0)29 2068 2102
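For readers who have not run this kind of job, the syntax involved looks roughly like the sketch below. It is illustrative only: the variable names (survtime, died, dose_early, dose_late, age, sex) are placeholders rather than anything from the study described above, and the 80 quarterly exposure segments are collapsed to just two periods to keep the sketch short.

TIME PROGRAM.
* T_ is the time variable COXREG supplies while it builds each risk set;
* the follow-up time is assumed here to be recorded in days.
COMPUTE T_COV_ = (T_ LT 365)*dose_early + (T_ GE 365)*dose_late.
COXREG survtime WITH T_COV_ age sex
  /STATUS = died(1)
  /METHOD = ENTER T_COV_ age sex.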
In reply to this post by Dr Chris Poole
See if

SET WORKSPACE=some large number

helps. The default is 6144 (kbytes).

I thought most procedures just grabbed what is required, but that may be a false assumption; it is hard to keep track of what does what. I ran some *HUGE* MATRIX problems recently and my 4G machine happily allowed SET WORKSPACE=200000 (i.e. about 200 MB), and my 5000x5000 matrix inversion test ran in about 30 minutes.

HTH, David
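Spelled out as syntax, that suggestion amounts to something like the following; the SHOW line is only there to confirm the current value, and 200000 is simply the figure from the MATRIX test above, not a recommendation.

* Check the current workspace allocation (the default is 6144 kilobytes).
SHOW WORKSPACE.
* Raise it: 200000 KB is roughly 200 MB, the value used in the MATRIX test.
SET WORKSPACE=200000.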
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Thanks for the prompt reply David,
I initially thought the same as you, but didn't try such a big value for SET WORKSPACE.

There is a thread on the IBM SPSS forum explaining how the number of computed values increases nearly exponentially when the TIME program is used. In our case, almost 20 billion values are produced before the regression is computed.

We are also going to try running SPSS on a very large Amazon EC2 instance to improve performance.

Will let you know how we get on.

KR,

Chris.

--
Chris Poole
E: [hidden email] | P: +44 (0)7733004258
In reply to this post by Chris Poole
You can actually set it even much larger.
I'm running SPSS 11.5 on a 4G Win32 Vista machine:

set workspace 8300000.

is accepted, while

set workspace 8400000.

fails with:

>Warning # 882 in column 15. Text: 8400000
>The parameter of the WORKSPACE subcommand of the SET command must be a
>positive integer not larger than the computer's process memory limit.

Now I'd better set it back to something reasonable before I forget ;-)
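For the record, putting it back is just a matter of re-issuing the default value quoted earlier in the thread:

* Return the workspace allocation to the shipped default.
SET WORKSPACE=6144.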
In reply to this post by Chris Poole
Two points:
First, the 64-bit version can generally handle a lot more memory (and it is covered by the same license).

Second, do not leave the workspace setting at a big number, because that will starve procedures that do not use the workspace, which is most of them. (I don't know for sure that this procedure actually uses the workspace, but the points above still apply.)

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621
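One way to act on that advice is to bracket the large job with PRESERVE and RESTORE, so that an enlarged WORKSPACE setting cannot outlive it. The sketch below is only indicative: the 2000000 figure is arbitrary, and whether COXREG benefits from a larger workspace at all is, as noted above, uncertain.

PRESERVE.
* Enlarge the workspace only for this job (value in kilobytes; example only).
SET WORKSPACE=2000000.
* ... the TIME PROGRAM / COXREG job goes here ...
RESTORE.
* All SET options, including WORKSPACE, are now back to their previous values.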
In reply to this post by Dr Chris Poole
Have you tried STATA?
I ran Cox regression on a dataset of 600,000 cases with about 10 variables. Stata works well.

ZHANG Lun
It is recommended that Complex Samples be used to extract appropriate samples for analysis from very large datasets. It is quite a relief not to have to analyze the whole dataset.

Max.
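If the full Complex Samples machinery is more than is needed, a plain random subsample for a trial run takes only a few lines of ordinary syntax. The sketch below is illustrative: the variable name subsample, the seed and the 10% fraction are all arbitrary choices.

* Make the draw reproducible.
SET SEED=20120220.
* Flag roughly 10% of cases at random and keep only the flagged ones.
COMPUTE subsample = (UNIFORM(1) LT 0.10).
FILTER BY subsample.
* ... run the analysis on the subset ...
FILTER OFF.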
In reply to this post by Dr Chris Poole
I have computed Cox regressions with several million cases, about 20 predictor variables and several observations per case, which is more than your study of 184,000 cases x (80 observations + some predictors), and I had no problem with those analyses apart from the (sometimes annoyingly long) time SPSS needs to perform the iterations. I never got any message about insufficient memory.

I did get that message sometimes with categorical principal components (CATPCA) and other procedures that require the whole dataset to be held in RAM, but not in the case of Cox. Thus there is a possibility that something other than the size of the dataset (cases x (measures + predictors)) is the cause of the message. Perhaps, for instance, you simply have insufficient memory in general.

Hector