laptop suggestions?

Zdaniuk, Bozena-3
Hello SPSS list, my laptop just died, so I will be purchasing a new one. Any suggestions on what I should look for in terms of RAM, processors, etc. (I don't know much about it) in the latest models in order to run SPSS 23 efficiently?
Thanks so much in advance,
Bozena
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: laptop suggestions?

Marta Garcia-Granero
I have seen SPSS 22 run very fast (complex syntax, with bootstrapping, a
large dataset with split file on...) on this laptop:

i7 processor (8 cores)
16 GB RAM
256 GB SSD + 2 TB SATA HDD
Windows 10

It also had a 4K screen and dual graphics cards (4 GB NVidia + 1 GB
Intel), but I suppose that's not really necessary for SPSS
performance.

HTH,

Marta GG

On 04/02/2016 at 19:41, Zdaniuk, Bozena wrote:

> Hello SPSS list, my laptop just died, so I will be purchasing a new one. Any suggestions on what I should look for in terms of RAM, processors, etc. (I don't know much about it) in the latest models in order to run SPSS 23 efficiently?
> Thanks so much in advance,
> Bozena

Re: laptop suggestions?

John F Hall
But what make and model was it?

John F Hall (Mr)
[Retired academic survey researcher]

Email:   [hidden email]  
Website: www.surveyresearch.weebly.com  
SPSS start page:  www.surveyresearch.weebly.com/1-survey-analysis-workshop




-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Marta Garcia-Granero
Sent: 05 February 2016 18:25
To: [hidden email]
Subject: Re: laptop suggestions?

I have seen SPSS 22 run very fast (complex syntax, with bootstrapping, a large
dataset with split file on...) on this laptop:

i7 processor (8 cores)
16 GB RAM
256 GB SSD + 2 TB SATA HDD
Windows 10

It also had a 4K screen and dual graphics cards (4 GB NVidia + 1 GB Intel), but
I suppose that's not really necessary for SPSS performance.

HTH,

Marta GG


Re: laptop suggestions?

Marta Garcia-Granero
Acer Aspire V 15 Nitro - Black Edition. RAM can be expanded to 32 GB.

Regards,
Marta

On 05/02/2016 at 18:37, John F Hall wrote:

> But what make and model was it?
>
> John F Hall (Mr)
> [Retired academic survey researcher]
>
> Email:   [hidden email]
> Website: www.surveyresearch.weebly.com
> SPSS start page:  www.surveyresearch.weebly.com/1-survey-analysis-workshop

Re: laptop suggestions?

Jon Peck
In reply to this post by Zdaniuk, Bozena-3
The appropriate hardware will depend on what you need to do, but I think 4 cores plus 8GB is a sweet spot.  One other thing that can help a lot is an SSD, since Statistics is heavily disk based.  The laptop I bought recently has 8 cores, 16GB of RAM, and a 512GB SSD.  I also have an external monitor, which helps with all those windows, although I often have an extra set of windows, because I am running Statistics with an IDE.

And although Statistics is not especially well suited for a touch screen, given the way Windows and many apps are moving into touch, I would consider a touchscreen.

On Thu, Feb 4, 2016 at 11:41 AM, Zdaniuk, Bozena <[hidden email]> wrote:
Hello SPSS list, my laptop just died, so I will be purchasing a new one. Any suggestions on what I should look for in terms of RAM, processors, etc. (I don't know much about it) in the latest models in order to run SPSS 23 efficiently?
Thanks so much in advance,
Bozena



--
Jon K Peck
[hidden email]


Re: laptop suggestions?

Maurice Vergeer
In reply to this post by Marta Garcia-Granero
In my experience 16 GB is more than enough, depending on the size of the dataset and whether you want to bootstrap.
I am now using a three-year-old Lenovo Y500 with 16 GB and an i7 processor. Memory use has never exceeded 9 GB. It cost me about 1200 euros. I think you can go significantly cheaper by choosing 8 GB and an i5. It goes without saying that you should go for a 64-bit processor.
Hope this helps

On Sun, Feb 7, 2016 at 5:58 PM, Marta Garcia-Granero <[hidden email]> wrote:
Acer Aspire V 15 Nitro - Black Edition. RAM can be expanded to 32 GB.

Regards,
Marta





--
________________________________________________
Maurice Vergeer
To contact me, see http://mauricevergeer.nl/node/5
To see my publications, see http://mauricevergeer.nl/node/1
________________________________________________

Re: laptop suggestions?

Kirill Orlov
In reply to this post by Jon Peck
Jon,
Could you please tell us a bit more about how, and how much, Statistics (and its speed) depends on an SSD?

Would you suggest replacing the 128 GB SSD in my new computer with a larger one?

Thank you!


07.02.2016 20:16, Jon Peck wrote:
The appropriate hardware will depend on what you need to do, but I think 4 cores plus 8GB is a sweet spot.  One other thing that can help a lot is an SSD, since Statistics is heavily disk based.  The laptop I bought recently has 8 cores, 16GB of RAM, and a 512GB SSD.  I also have an external monitor, which helps with all those windows, although I often have an extra set of windows, because I am running Statistics with an IDE.

And although Statistics is not especially well suited for a touch screen, given the way Windows and many apps are moving into touch, I would consider a touchscreen.



Re: laptop suggestions?

Jon Peck
I can't precisely quantify this against a traditional disk, since there are many factors involved, but I believe the speed improvement can be substantial.  Both the loading of Statistics modules and processing of the dataset are mainly disk based, and SSDs are dramatically faster than traditional disks.  The OS complicates this, because it may use some available memory to cache data and modules.  

On my SSD-based system, Statistics 23 launches in 7 seconds.  Running DESCRIPTIVES on a 7-million-case .sav file with 29 variables that has not been previously loaded completes in about 5 seconds.  A 4-variable regression on the same data finishes in about the same amount of time.  A year or two ago, I heard from a user with an SSD that a 60-million-case regression with many variables completed in less than one minute.
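
As a rough back-of-envelope check on the DESCRIPTIVES figures above, the sketch below estimates the read throughput they imply. It assumes every variable is numeric and stored as an uncompressed 8-byte double, which the post does not state; string variables or a compressed .sav file would change the result. The function name is made up for illustration.

```python
BYTES_PER_NUMERIC_VALUE = 8  # assumption: each value is an uncompressed 8-byte double


def implied_throughput_mb_per_s(cases, variables, seconds):
    """Return the read rate in MB/s (1 MB = 10**6 bytes) implied by scanning
    a cases x variables table of doubles once in `seconds`."""
    total_bytes = cases * variables * BYTES_PER_NUMERIC_VALUE
    return total_bytes / seconds / 1e6


# Figures quoted in the post: 7 million cases, 29 variables, about 5 seconds.
rate = implied_throughput_mb_per_s(7_000_000, 29, 5)
print(f"{rate:.0f} MB/s")  # about 325 MB/s under the stated assumptions
```

Under these assumptions a single pass over the file needs roughly 325 MB/s of sustained reads, which is typically beyond a 7200 RPM hard disk but within reach of a SATA SSD.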

On Sun, Feb 7, 2016 at 11:33 AM, Kirill Orlov <[hidden email]> wrote:
Jon,
Could you please tell us a bit more about how, and how much, Statistics (and its speed) depends on an SSD?

Would you suggest replacing the 128 GB SSD in my new computer with a larger one?

Thank you!







--
Jon K Peck
[hidden email]


Re: laptop suggestions?

Kirill Orlov
Did I understand correctly that your Statistics is installed on an SSD?

Is it possible to have Statistics installed on an HDD in the traditional way and still get a performance gain (mainly in computation speed) from also having an SSD? For example, the program would load from the HDD but do its internal temporary reads and writes on the SSD.

Another question: does MATRIX use disks (HDD/SSD) during its session, or does it work entirely in memory? I mean a MATRIX session without any statements that explicitly read or save data. For example, does it load each of its functions one at a time when called, or does it load all of its functions at once when the MATRIX session starts?


07.02.2016 22:48, Jon Peck wrote:
I can't precisely quantify this against a traditional disk, since there are many factors involved, but I believe the speed improvement can be substantial.  Both the loading of Statistics modules and processing of the dataset are mainly disk based, and SSDs are dramatically faster than traditional disks.  The OS complicates this, because it may use some available memory to cache data and modules.  

On my SSD-based system, Statistics 23 launches in 7 seconds.  Running DESCRIPTIVES on a 7-million-case .sav file with 29 variables that has not been previously loaded completes in about 5 seconds.  A 4-variable regression on the same data finishes in about the same amount of time.  A year or two ago, I heard from a user with an SSD that a 60-million-case regression with many variables completed in less than one minute.



Re: laptop suggestions?

Jon Peck
I have only an SSD on my system, but you can certainly have Statistics installed on a hard disk and have the data and/or the temporary directory on an SSD in order to pass the data faster.

MATRIX would read and write data files wherever they are stored, whether SSD or traditional disk, but all of the computations will be done with the data in memory.  The MATRIX procedure code itself would be paged into memory on demand by the OS.

On Sun, Feb 7, 2016 at 1:15 PM, Kirill Orlov <[hidden email]> wrote:
Did I understand correctly that your Statistics is installed on an SSD?

Is it possible to have Statistics installed on an HDD in the traditional way and still get a performance gain (mainly in computation speed) from also having an SSD? For example, the program would load from the HDD but do its internal temporary reads and writes on the SSD.

Another question: does MATRIX use disks (HDD/SSD) during its session, or does it work entirely in memory? I mean a MATRIX session without any statements that explicitly read or save data. For example, does it load each of its functions one at a time when called, or does it load all of its functions at once when the MATRIX session starts?







--
Jon K Peck
[hidden email]


Re: laptop suggestions?

Rudobeck, Emil (LLU)
This is a little late, but I have noticed that, at least for linear mixed models (LMM; MIXED), SPSS requires a lot of RAM. It's to the point that I think there is something wrong with the program. As an example, if you run an LMM in a 4x2 design and try to fit a repeated measure with 72 points as a covariate (only ~900 cells), you're looking at almost two days of calculation time for the AD1 covariance structure (without any random effects). If you try to fit UN, you get an insufficient-memory error. And all this on the latest i7 desktop with 32GB RAM. During the entire calculation barely 12% of the CPU is utilized (only 1-2 cores at 40-80% each). I don't know if Jon has any insight about this, but SPSS support wasn't of much help to me. Past SPSS users who now use Stata say that SPSS is extremely slow with calculations. I'm seriously contemplating transitioning to Stata, especially after several statisticians recommended it highly over SPSS.
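
To give a sense of why UN is so much heavier than AD1 in the 72-point example above, the sketch below counts the covariance parameters each structure implies. It relies on the standard definitions (unstructured: one parameter per distinct variance and covariance; first-order ante-dependence: one variance per time point plus one correlation per adjacent pair). The function names are made up for illustration, and parameter counts alone do not fully explain the memory use or run time of MIXED.

```python
def un_parameters(t):
    """Unstructured (UN): t variances plus t*(t-1)/2 covariances."""
    return t * (t + 1) // 2


def ad1_parameters(t):
    """First-order ante-dependence (AD1): t variances plus t-1 adjacent correlations."""
    return 2 * t - 1


t = 72  # repeated measurements per subject, as in the post
print("AD1:", ad1_parameters(t))  # 143
print("UN: ", un_parameters(t))   # 2628
```

With 72 time points, UN asks the optimizer to estimate 2628 covariance parameters against 143 for AD1, so a much larger memory footprint for UN is at least plausible.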

So the bottom line is that the hardware depends on what analysis you will be running, but chances are that RAM might be the most important component. An SSD might make the program open and close faster, but I am not sure it helps much with the calculations. Incidentally, for serious statistical analysis I would opt for a desktop, especially because of the inherent differences between mobile and desktop CPUs. You can set up a VPN and use SPSS remotely if mobility is important.

From: SPSSX(r) Discussion [[hidden email]] on behalf of Jon Peck [[hidden email]]
Sent: Sunday, February 07, 2016 3:59 PM
To: [hidden email]
Subject: Re: laptop suggestions?

I have only an SSD on my system, but you can certainly have Statistics installed on a hard disk and have the data and/or the temporary directory on an SSD in order to pass the data faster.

MATRIX would read and write data files wherever they are stored, whether SSD or traditional disk, but all of the computations will be done with the data in memory.  The MATRIX procedure code itself would be paged into memory on demand by the OS.


Re: laptop suggestions?

Jon Peck
Some mixed models require a surprisingly large amount of memory and time.  I have seen huge differences reported between similar models run in R vs SAS.  There are sometimes subtle computational differences that have a huge effect on resource requirements.  I don't know the exact details.

However, for most procedures, Statistics run time depends much more on I/O time than CPU time, which is why an SSD can speed things up a lot.  Also note that there is a limit on the number of cores that the Statistics client will use.  The limit may depend on the version.  Since the Statistics backend and frontend run in different processes and each has multiple threads, I would insist on at least 2 CPUs.

CPU specs used to be simple, but even processor speed/clock rate is now complicated, because clock rates now vary dynamically and CPUs rarely run at the rated maximum.  There are websites that have all the amazingly gory details for large numbers of CPU models.  In general, though, laptops are very competitive with desktops, although they probably cost a bit more for comparable performance.  I opted for a high-end laptop for my recent personal purchase.

As I discussed earlier in this thread, lots of memory can help in various ways that depend on how the OS manages memory, including I/O caching, but for most procedures the main concern is I/O speed.

On Monday, February 22, 2016, Rudobeck, Emil (LLU) <[hidden email]> wrote:
This is a little late, but I have noticed that, at least for linear mixed models (LMM; MIXED), SPSS requires a lot of RAM. It's to the point that I think there is something wrong with the program. As an example, if you run an LMM in a 4x2 design and try to fit a repeated measure with 72 points as a covariate (only ~900 cells), you're looking at almost two days of calculation time for the AD1 covariance structure (without any random effects). If you try to fit UN, you get an insufficient-memory error. And all this on the latest i7 desktop with 32GB RAM. During the entire calculation barely 12% of the CPU is utilized (only 1-2 cores at 40-80% each). I don't know if Jon has any insight about this, but SPSS support wasn't of much help to me. Past SPSS users who now use Stata say that SPSS is extremely slow with calculations. I'm seriously contemplating transitioning to Stata, especially after several statisticians recommended it highly over SPSS.

So the bottom line is that the hardware depends on what analysis you will be running, but chances are that RAM might be the most important component. An SSD might make the program open and close faster, but I am not sure it helps much with the calculations. Incidentally, for serious statistical analysis I would opt for a desktop, especially because of the inherent differences between mobile and desktop CPUs. You can set up a VPN and use SPSS remotely if mobility is important.



--
Jon K Peck
[hidden email]



Re: laptop suggestions?

Rudobeck, Emil (LLU)
I do have a computer with an SSD, and I used SPSS extensively on a 7200 RPM hard disk before. At least for my analyses, I do not see a noticeable improvement.

I don't recall the specs now (they should be easy to find online), but when you compare a mobile CPU with a desktop CPU, the mobile CPUs are usually 30-50% slower. This is understandable because laptop and tablet CPUs are built to save power. I was in the same boat, buying a laptop over a year ago, and my research showed that even if you spend $3000+ on a laptop, it still won't be as fast as a ~$1000 PC. I build my own PCs though, which comes out a bit cheaper than pre-built.

I still hope SPSS will address the calculation inefficiencies. Stata does charge for calculations on extra cores, but I'm not aware of any such option in SPSS, though one would hardly expect to be charged even more, on top of the price of the SPSS premium version, for multicore functionality.



From: Jon Peck [[hidden email]]
Sent: Monday, February 22, 2016 3:42 PM
To: Rudobeck, Emil (LLU)
Cc: [hidden email]
Subject: Re: [SPSSX-L] laptop suggestions?

Some mixed models require a surprisingly large amount of memory and time.  I have seen huge differences reported between similar models run in R vs SAS.  There are sometimes subtle computational differences that have a huge effect on resource requirements.  I don't know the exact details.

However, for most procedures, Statistics run times depend much more on I/O time than CPU time, which is why an SSD can speed things up a lot.  Also note that there is a limit on the number of cores that the Statistics client will use.  The limit may depend on the version.  Since the Statistics backend and frontend run in different processes and each has multiple threads, I would insist on at least 2 CPUs.

CPU specs used to be simple, but even processor speed/clock rate is now complicated, because clock rates now vary dynamically and CPUs rarely run at the rated maximum.  There are websites that have all the amazingly gory details for large numbers of cpu models.  In general, though, laptops are very competitive with desktops although they probably cost a bit more for comparable performance.  I opted for a high end laptop for my recent personal purchase.

As I discussed earlier in this thread, lots of memory can help in various ways that depend on how the OS manages memory, including I/O caching, but for most procedures the main concern is I/O speed.

On Monday, February 22, 2016, Rudobeck, Emil (LLU) <erudobeck@...> wrote:
This is a little late, but I have noticed that at least for linear mixed models (LMM; MIXED) SPSS requires a lot of RAM. It's to the point that I think there is something wrong with the program. As an example, if you run LMM in a 4x2 design and try to fit a repeated measure with 72 points as a covariate (only ~900 cells), you're looking at almost two days of calculation time for the AD1 covariance structure (without any random effects). If you try to fit UN, you get an insufficient-memory error. And all this on the latest i7 desktop with 32GB RAM. During the entire calculation barely 12% of the CPU is utilized (only 1-2 cores at 40-80% each). I don't know if Jon has any insight about this, but SPSS support wasn't of much help to me. Past SPSS users who now use Stata say that SPSS is extremely slow with calculations. I'm seriously contemplating transitioning to Stata, especially after several statisticians recommended it highly over SPSS.

So the bottom line is that the hardware depends on what analysis you will be running, but chances are that RAM might be the most important component. SSD might make the program open and close faster, but I am not sure if it helps with the calculations much. Incidentally, for serious statistical analysis I would opt for a desktop, especially due to the inherent differences in the CPUs of mobile and desktop versions. You can establish a VPN and use SPSS remotely, if mobility is important.

From: SPSSX(r) Discussion [SPSSX-L@...] on behalf of Jon Peck [jkpeck@...]
Sent: Sunday, February 07, 2016 3:59 PM
To: SPSSX-L@...
Subject: Re: laptop suggestions?

I have only an SSD on my system, but you can certainly have Statistics installed on a hard disk and have the data and/or the temporary directory on an SSD in order to pass the data faster.

MATRIX would be reading and writing data files from wherever they are stored, SSD or traditional disk, but all of the computations will be done with the data in memory.  The MATRIX procedure code itself would be paged into memory on demand by the OS.

On Sun, Feb 7, 2016 at 1:15 PM, Kirill Orlov <kior@...> wrote:
Did I get you right that your Statistics is installed on SSD?

Is it possible to have Statistics installed traditionally on an HDD but at the same time get a performance gain (mainly in speed of computations) from having an SSD too? For example, the program loads from the HDD but does some internal temporary saving/reading on the SSD.

Another question. Does MATRIX use the disks (HDD/SSD) during its session, or does it work entirely in memory? I mean MATRIX without any statements that explicitly read or save data. For example, does it load each of its functions one at a time, when called, or does it load all its functions at once when the MATRIX session starts?


07.02.2016 22:48, Jon Peck wrote:
I can't precisely quantify this against a traditional disk, since there are many factors involved, but I believe the speed improvement can be substantial.  Both the loading of Statistics modules and processing of the dataset are mainly disk based, and SSDs are dramatically faster than traditional disks.  The OS complicates this, because it may use some available memory to cache data and modules.  

On my SSD-based system, Statistics 23 launches in 7 seconds.  Running DESCRIPTIVES on a 7-million case sav file with 29 variables that has not been previously loaded completes in about 5 seconds.  A 4-variable regression on the same data finished in the same amount of time.  A year or two ago, I heard from a user with an SSD that running a 60-million case regression with many variables completes in less than one minute.





--
Jon K Peck
jkpeck@...


Re: laptop suggestions?

Jon Peck
I decided to do some benchmarking on my configuration to explore Statistics performance further.  For my tests I used a job that opens a large data file and runs a five-variable regression.  The configuration is Statistics 23 running on Win10, which means that the backend process can use up to four CPUs.  On this machine that means a maximum of 50% CPU (plus a tiny bit for the frontend output).  My system has only an SSD for the C drive, but I have a fairly fast NAS with WD Red drives connected via USB3.  The CPU is an Intel i7-6700HQ at 2.6GHz, although the Task Manager shows the clock speed varying a lot as the job runs, including a lot of time at 3.20GHz but some at 1GHz.  I have 16GB of memory.

The input data is a variation of the famous airline data.  There are 35,114,685 cases with 29 variables, most of which are numeric.  The file is in zsav format, which gives better compression than sav, although reading might be slightly slower.  On disk the file size is 1,389,258,699 bytes.
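A rough back-of-the-envelope check on those file numbers, as a minimal Python sketch. It assumes every variable takes 8 bytes per case when uncompressed, which is only approximately true here since some of the 29 variables are not numeric; the function and constant names are made up for illustration.

```python
# Figures quoted in the post above.
CASES = 35_114_685
VARIABLES = 29
ZSAV_BYTES = 1_389_258_699

# Assumption for this sketch: 8 bytes per variable per case when uncompressed.
BYTES_PER_VALUE = 8


def bytes_per_case_on_disk(file_bytes, cases):
    """Average number of bytes the compressed file spends on each case."""
    return file_bytes / cases


def estimated_uncompressed_bytes(cases, variables, width=BYTES_PER_VALUE):
    """Estimated size of the data if every value took `width` bytes."""
    return cases * variables * width


on_disk = bytes_per_case_on_disk(ZSAV_BYTES, CASES)
raw = estimated_uncompressed_bytes(CASES, VARIABLES)

print(f"on disk: {on_disk:.1f} bytes per case")
print(f"estimated uncompressed size: {raw / 1e9:.2f} GB")
print(f"approximate compression ratio: {raw / ZSAV_BYTES:.1f}x")
```

Whatever the exact ratio, the compressed file of about 1.4 GB is small next to the 16GB of memory in this system.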

The jobs I ran read the data alternately from the remote NAS drive and from my local SSD.  I further varied the jobs to include or exclude a CACHE statement.  Each job ran the regression three times.

The absolute times are, of course, very dependent on my particular hardware, but there are some interesting patterns.  

Running with local input, no cache, the three regression times were essentially constant at about 35 seconds.

Running with remote input, no cache, the first pass took 56.1 seconds and passes 2 and 3 also ran at 35 seconds.

Running with remote input using CACHE, the first pass took 79.7 seconds and passes 2 and 3 took 20 seconds.

Running with local input using CACHE, the first pass took 58 seconds and passes 2 and 3 took 20 seconds.

Here is a summary table (times in seconds):

              no cache                cache
          pass 1   passes 2-3    pass 1   passes 2-3
local      36.8       35.2        58.0       20.0
remote     56.1       35.2        79.7       20.0

You can see that the local SSD, compared with the remote NAS disks, made a big difference on the first pass but no difference on later passes, and that the cache, which was written to the SSD in both cases, made a big difference on later passes but carried a penalty on the first pass.
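To make the CACHE trade-off concrete, here is a small Python sketch that works out how many data passes it takes before the first-pass penalty is repaid. It uses only the local-SSD timings from the table and assumes every later pass costs the same as passes 2 and 3 did; the helper names are hypothetical.

```python
# Timings in seconds for the local-SSD runs, copied from the summary table.
NO_CACHE = {"first": 36.8, "later": 35.2}
CACHE = {"first": 58.0, "later": 20.0}


def total_seconds(timing, passes):
    """Total time for `passes` data passes: one first pass plus later passes."""
    if passes < 1:
        raise ValueError("passes must be at least 1")
    return timing["first"] + timing["later"] * (passes - 1)


def break_even_passes(no_cache, cache, limit=100):
    """Smallest number of passes at which caching is faster overall, or None."""
    for passes in range(1, limit + 1):
        if total_seconds(cache, passes) < total_seconds(no_cache, passes):
            return passes
    return None


for n in (1, 2, 3, 5):
    without = total_seconds(NO_CACHE, n)
    with_cache = total_seconds(CACHE, n)
    print(f"{n} passes: no cache {without:.1f} s, cache {with_cache:.1f} s")

print("CACHE is faster overall from pass", break_even_passes(NO_CACHE, CACHE))
```

On these numbers the cache costs about 21 extra seconds up front and saves about 15 seconds on each later pass, so it only pays off for jobs that make at least three passes over the data.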

The reason, I believe, that the local/remote source made no difference on passes 2 and 3 is that the OS cached the I/O buffers in RAM and did not read from either the SSD or the remote drive after pass 1, because there was plenty of surplus memory in this system.  My guess is that the cache written on pass 1 was also kept in memory, but there are some additional efficiencies Statistics uses on the cached files that made that even faster.

CPU utilization varied during these runs and was typically around 25% out of a possible maximum of 50% due to the four-CPU limitation, so even with an SSD or cached buffers and a multithreaded procedure such as REGRESSION, the CPU cannot be driven to its maximum.  I have never actually been able to get close to 100% with any application.
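The 50% ceiling is simple arithmetic, sketched below in Python. The count of eight logical CPUs is an assumption about how Task Manager counts this machine's CPUs (the i7-6700HQ has four cores with Hyper-Threading), and both function names are invented for the example.

```python
def cpu_ceiling_percent(usable_cpus, logical_cpus):
    """Highest whole-machine CPU percentage a process limited to `usable_cpus` can show."""
    if logical_cpus <= 0 or usable_cpus < 0:
        raise ValueError("CPU counts must be positive")
    return 100.0 * min(usable_cpus, logical_cpus) / logical_cpus


def share_of_ceiling(observed_percent, ceiling_percent):
    """Fraction of the achievable ceiling that was actually used."""
    return observed_percent / ceiling_percent


# Assumed: 8 logical CPUs, of which the backend may use 4.
ceiling = cpu_ceiling_percent(usable_cpus=4, logical_cpus=8)
print(f"ceiling: {ceiling:.0f}%")
print(f"25% observed is {share_of_ceiling(25.0, ceiling):.0%} of that ceiling")
```

Read this way, the 25% seen in Task Manager means the backend was using about half of the CPU capacity it was allowed to use.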

The most important question is how this generalizes.  Clearly the OS plays an important, although not easily visible, role, so OS variations will affect this.  The nature of the data will affect it, as will the type of source (sav, csv, database, etc.).  Available memory plays a very important role if there is a lot of data.  In these tests, there were no data transformations or new variables created.  Those would trigger the VAF (virtual active file) management in Statistics, and transformations would put an additional burden on CPU utilization, maybe increasing it.  Transformations normally do not require their own data pass but do require formula evaluation for every case on the data pass where they are run.

And, finally, the particular procedures being used will vary in how effectively they can use the multiple threads available in a modern system.  A list of the multithreaded procedures can be found in the CSR (Command Syntax Reference) under SET THREADS.  The frontend code and other things running on the system will, of course, also have an effect here.

I did all my benchmarking just using the Windows Task Manager and a tad of Python code, but there is an extension command, STATS BENCHMRK, that can be installed via the Utilities menu that can measure all sorts of resource utilization very precisely.  It requires the Python Win32 utilities in addition to the command itself.
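The Python code used for these measurements is not shown in the thread. As a rough idea of what a per-pass timing harness can look like, here is a minimal sketch; it is not the code used above, and the workload simply reads a small temporary file as a stand-in for a job that would run a procedure over a real data file.

```python
import os
import tempfile
import time


def time_passes(job, passes=3):
    """Run `job` repeatedly and return the wall-clock seconds of each pass."""
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        job()
        timings.append(time.perf_counter() - start)
    return timings


def read_whole_file(path, chunk_bytes=1024 * 1024):
    """Stand-in workload: read a file from start to end, return bytes read."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    return total


# Demo on a small temporary file (a hypothetical stand-in for real data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    demo_path = tmp.name

try:
    timings = time_passes(lambda: read_whole_file(demo_path), passes=3)
    for number, seconds in enumerate(timings, start=1):
        print(f"pass {number}: {seconds:.4f} s")
finally:
    os.remove(demo_path)
```

Because the demo file has just been written, the OS will most likely serve every pass from memory, so this toy run will not show a slower first pass. Seeing that effect needs a large file that has not been read recently.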



--
Jon K Peck
[hidden email]


Re: laptop suggestions?

Rudobeck, Emil (LLU)
Thanks. It does seem that SPSS relies on RAM very heavily. One thing I haven't tried is using a flash drive to extend Windows RAM (ReadyBoost) to see if I can overcome the insufficient-memory error for the mixed models. Given the fast SSDs these days, one would hope that SPSS will be optimized to use not only RAM, but also free space on the SSD itself as a backup if RAM runs out. The ability to pause and resume a calculation would also be a rather helpful feature for complex analyses that take days to complete.


Re: laptop suggestions?

Jon Peck
For MIXED in particular, you may want to contact Technical Support to see whether there are details in your specification that are generating the need for so much memory and time.  However, have you tried using SET WORKSPACE to increase the memory available to the procedure?

From the Command Syntax Reference (CSR) under the SET command:

WORKSPACE allocates more memory for some procedures when you receive a message indicating that the available memory has been used up or indicating that only a given number of variables can be processed.
- WORKSPACE allocates workspace memory in kilobytes for some procedures that allocate only one block of memory. The default and minimum value is 24576.
- Do not increase the workspace memory allocation unless the program issues a message that there is not enough memory to complete a procedure.
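
Because WORKSPACE is given in kilobytes, it is easy to get the units wrong. The following is only a minimal Python sketch of the conversion; the helper name workspace_command and the 512 MB figure are hypothetical illustrations, while the SET WORKSPACE command and the 24576 minimum come from the CSR text above.

```python
MIN_WORKSPACE_KB = 24576  # default and minimum per the CSR text quoted above


def workspace_command(megabytes):
    """Return a SET WORKSPACE command string for a size given in MB.

    Hypothetical helper: WORKSPACE is specified in kilobytes, so the
    value is converted and checked against the documented minimum.
    """
    kilobytes = int(megabytes * 1024)
    if kilobytes < MIN_WORKSPACE_KB:
        raise ValueError(
            f"{kilobytes} KB is below the minimum of {MIN_WORKSPACE_KB} KB"
        )
    return f"SET WORKSPACE={kilobytes}."


print(workspace_command(24))   # the default allocation, 24 MB
print(workspace_command(512))  # an illustrative larger allocation
```

The resulting line would be run before the procedure that reported the memory shortage, and only if such a message was actually issued.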

On Wed, Feb 24, 2016 at 8:42 AM, Rudobeck, Emil (LLU) <[hidden email]> wrote:
Thanks. It does seem that SPSS relies on RAM very heavily. One thing I haven't tried is using a flash drive to extend Windows RAM (ReadyBoost) to see if I can overcome the insufficient-memory error for the mixed models. Given the fast SSDs these days, one would hope that SPSS will be optimized to use not only RAM, but also free space on the SSD itself as a backup if RAM runs out. The ability to pause and resume a calculation would also be a rather helpful feature for complex analyses that take days to complete.

From: Jon Peck [[hidden email]]
Sent: Tuesday, February 23, 2016 10:04 AM

To: Rudobeck, Emil (LLU)
Cc: [hidden email]
Subject: Re: [SPSSX-L] laptop suggestions?

I decided to do some benchmarking on my configuration to explore Statistics performance further.  I used a job that opens a large data file and runs a five variable regression for my tests.  The configuration is Statistics 23 running on Win10, which means that the backend process can use up to four cpus.  That means a maximum of 50% cpu (plus a tiny bit for the frontend output).  My system has only an SSD for the C drive, but I have a fairly fast NAS connected via USB3 using WD Red drives.  The CPU is an Intel i7-6700HQ at 2.6GHz, although the Task Manager shows the clock speed varying a lot as the job runs, including a lot of time at 3.20GHz but some at 1GHz.  I have 16GB of memory.

The input data is a variation of the famous airline data.  There are 35,114,685 cases with 29 variables most of which are numeric.  The file is in zsav format, which gives better compression than sav but reading might be slightly slower.  On disk the file size is 1,389,258,699 bytes.

The jobs I ran read the data alternately from the remote NAS drive and from my local SSD.  I further varied the jobs to include or exclude a CACHE statement.  Each job ran the regression three times.

The absolute times are, of course, very dependent on my particular hardware, but there are some interesting patterns.  

Running with local input, no cache, the three regression times were essentially constant at about 35 seconds.

Running with remote input, no cache, the first pass took 56.1 seconds and passes 2 and 3 also ran at 35 seconds.

Running with remote input using CACHE, the first pass took 79.7 seconds and passes 2 and 3 took 20 seconds.

Running with local input using CACHE, the first pass took 58 seconds and passes 2 and 3 took 20 seconds.

Here is a summary table (times in seconds):

          no cache               cache
          pass 1   passes 2-3    pass 1   passes 2-3
local     36.8     35.2          58.0     20.0
remote    56.1     35.2          79.7     20.0

You can see that the SSD, compared to the physical disk, made a big difference on the first pass but no difference on later passes. The cache, which was written to the SSD in both cases, made a big difference on later passes but paid a penalty on the first pass.
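
To make the trade-off concrete, here is a small Python sketch that works out how many passes are needed before CACHE pays for itself. It is only arithmetic on the timings reported above; the function names are hypothetical, and it assumes every pass after the first costs the same as the measured passes 2 and 3.

```python
# Timings in seconds, copied from the benchmark figures above:
# (first pass, each later pass)
TIMINGS = {
    ("local", "no cache"): (36.8, 35.2),
    ("local", "cache"): (58.0, 20.0),
    ("remote", "no cache"): (56.1, 35.2),
    ("remote", "cache"): (79.7, 20.0),
}


def total_time(source, mode, passes):
    """Total seconds for `passes` data passes.

    Assumption: every pass after the first costs the same as the
    measured passes 2 and 3.
    """
    first, later = TIMINGS[(source, mode)]
    return first + later * (passes - 1)


def break_even_passes(source):
    """Smallest number of passes at which CACHE is no slower than no cache."""
    passes = 1
    while total_time(source, "cache", passes) > total_time(source, "no cache", passes):
        passes += 1
    return passes


for source in ("local", "remote"):
    print(source, break_even_passes(source))
```

On these numbers the cached job is ahead from the third pass onward for both the local and the remote source; a job that reads the data only once or twice would be slower with CACHE.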

The reason, I believe, that the local/remote source made no difference on passes 2 and 3 is that the OS cached the I/O buffers in RAM and did not read from either the SSD or the remote drive after pass 1, because there was plenty of surplus memory in this system.  My guess is that the cache written on pass 1 was also kept in memory, but there are some additional efficiencies Statistics uses on the cached files that made that even faster.

CPU utilization varied during these runs and was typically around 25% out of a possible maximum of 50% due to the four-cpu limitation, so even with an SSD or cached buffers and a multithreaded procedure such as REGRESSION, the cpu cannot be driven to its maximum.  I have never actually been able to get close to 100% with any application.
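
The percentages above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes the Task Manager figure is a share of all logical processors and that this CPU exposes 8 of them (4 cores with hyperthreading), and the function names are hypothetical.

```python
def cpu_ceiling_percent(usable_cpus, logical_cpus):
    """Highest total CPU percentage a process limited to `usable_cpus` can show."""
    return 100.0 * usable_cpus / logical_cpus


def share_of_ceiling(observed_percent, usable_cpus, logical_cpus):
    """Observed utilization as a percentage of what the process could use."""
    return 100.0 * observed_percent / cpu_ceiling_percent(usable_cpus, logical_cpus)


# Assumed machine: 8 logical processors, backend limited to 4 of them.
print(cpu_ceiling_percent(4, 8))     # ceiling as a share of the whole machine
print(share_of_ceiling(25.0, 4, 8))  # how much of that ceiling 25% represents
```

In other words, the typical 25% reading means the backend was using about half of the processing capacity it was allowed to use.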

The most important question is how does this generalize?  Clearly the OS plays an important although not easily visible role, so OS variations will affect this.  The nature of the data will affect it, as will the type of source (sav, csv, database, etc.).  Available memory plays a very important role if there is a lot of data.  In these tests, there were no data transformations or new variables created.  Those would trigger the VAF (virtual active file) management in Statistics, and transformations would put an additional burden on the CPU, maybe increasing its utilization.  Transformations normally do not require their own data pass but do require formula evaluation for every case on the data pass where they are run.

And, finally, the particular procedures being used will vary in how effectively they can use the multiple threads available in modern systems.  A list of the multithreaded procedures can be found in the CSR under SET THREADS.  The frontend code and other things running on the system will, of course, also have an effect here.

I did all my benchmarking just using the Windows Task Manager and a tad of Python code, but there is an extension command, STATS BENCHMRK, that can be installed via the Utilities menu that can measure all sorts of resource utilization very precisely.  It requires the Python Win32 utilities in addition to the command itself.
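
For readers who want to try something similar, the following is a generic wall-clock timing sketch in Python. It is not the code used for the figures in this post; time_passes and placeholder_job are hypothetical names, and the placeholder workload would be replaced by whatever actually runs the Statistics job.

```python
import time


def time_passes(job, passes=3):
    """Run job() `passes` times and return each pass's wall-clock seconds."""
    durations = []
    for _ in range(passes):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return durations


def placeholder_job():
    # Stand-in workload (hypothetical); a real benchmark would run the
    # procedure being measured here instead of this arithmetic loop.
    sum(i * i for i in range(100_000))


for n, seconds in enumerate(time_passes(placeholder_job), start=1):
    print(f"pass {n}: {seconds:.3f} s")
```

Timing each pass separately, rather than the job as a whole, is what exposes the difference between the first pass and the later ones.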



On Mon, Feb 22, 2016 at 7:34 PM, Rudobeck, Emil (LLU) <erudobeck@...> wrote:
I do have a computer with SSD and used SPSS extensively on 7200RPM HD before. At least for my analyses, I do not see an improvement that's noticeable.

I don't recall the specs now (should be found online easily) but when you compare a mobile CPU with desktop CPU, the mobile CPUs are usually 30-50% slower. This is understandable because laptop and tablet CPUs are built to save power. I was also in the same boat of buying a laptop over a year ago and my research showed that even if you spend $3000+ on a laptop, it still won't be as fast as a ~$1000 PC. I build my own PCs though, which comes out a bit cheaper than pre-built.

I still hope SPSS will address the calculation inefficiencies. Stata does charge for calculations on extra cores, but I'm not aware of any such options with SPSS. Though one would hardly expect to get charged even more on top of the SPSS premium version charge to get multicore functionality.



From: Jon Peck [jkpeck@...]
Sent: Monday, February 22, 2016 3:42 PM
To: Rudobeck, Emil (LLU)
Cc: SPSSX-L@...
Subject: Re: [SPSSX-L] laptop suggestions?

Some mixed models require a surprisingly large amount of memory and time.  I have seen huge differences reported between similar models run in R vs SAS.  There are sometimes subtle computational differences that have a huge effect on resource requirements.  I don't know the exact details.

However, for most procedures, Statistics run times depend much more on I/O time than CPU time, which is why an SSD can speed things up a lot.  Also note that there is a limit on the number of cores that the Statistics client will use.  The limit may depend on the version.  Since the Statistics backend and frontend run in different processes and each has multiple threads, I would insist on at least 2 CPUs.

CPU specs used to be simple, but even processor speed/clock rate is now complicated, because clock rates now vary dynamically and CPUs rarely run at the rated maximum.  There are websites that have all the amazingly gory details for large numbers of cpu models.  In general, though, laptops are very competitive with desktops although they probably cost a bit more for comparable performance.  I opted for a high end laptop for my recent personal purchase.

As I discussed earlier in this thread, lots of memory can help in various ways that depend on how the OS manages memory, including I/O caching, but for most procedures the main concern is I/O speed.

On Monday, February 22, 2016, Rudobeck, Emil (LLU) <erudobeck@...> wrote:
This is a little late, but I have noticed that at least for linear mixed models (LMM; MIXED) SPSS requires a lot of RAM. It's to the point that I think there is something wrong with the program. As an example, if you run LMM in a 4x2 design and try to fit a repeated measure with 72 points as a covariate (only ~900 cells), you're looking at almost two days of calculation time for the AD1 covariance structure (without any random effects). If you try to fit UN, there is an error of insufficient memory. And all this on the latest i7 desktop with 32GB RAM. During the entire calculation barely 12% of the CPU is utilized (only 1-2 cores at 40-80% each). I don't know if Jon has any insight about this, but SPSS support wasn't of much help to me. Past SPSS users who now use Stata say that SPSS is extremely slow with calculations. I'm seriously contemplating transitioning to Stata, especially after several statisticians recommended it highly over SPSS.

So the bottom line is that the hardware depends on what analysis you will be running, but chances are that RAM might be the most important component. SSD might make the program open and close faster, but I am not sure if it helps with the calculations much. Incidentally, for serious statistical analysis I would opt for a desktop, especially due to the inherent differences in the CPUs of mobile and desktop versions. You can establish a VPN and use SPSS remotely, if mobility is important.

From: SPSSX(r) Discussion [SPSSX-L@...] on behalf of Jon Peck [jkpeck@...]
Sent: Sunday, February 07, 2016 3:59 PM
To: SPSSX-L@...
Subject: Re: laptop suggestions?

I have only an SSD on my system, but you can certainly have Statistics installed on a hard disk and have the data and/or the temporary directory on an SSD in order to pass the data faster.

Matrix would be reading and writing data files from wherever it is stored - SSD or traditional disk, but all of the computations will be done with the data in memory.  The MATRIX procedure code itself would be paged into memory on demand by the OS.

On Sun, Feb 7, 2016 at 1:15 PM, Kirill Orlov <kior@...> wrote:
Did I get you right that your Statistics is installed on SSD?

Is it possible to have Statistics installed traditionally on an HDD but at the same time get a performance gain (mainly in speed of computations) from having an SSD too? For example, the program loads from the HDD but does some internal temporary writes and reads on the SSD.

Another question: does MATRIX use disks (HDD/SSD) during its session, or does it work entirely in memory? I mean a MATRIX session without any statements that explicitly read or save data. For example, does it load each function when it is called, or does it load all of its functions at once when the MATRIX session starts?


On 07.02.2016 at 22:48, Jon Peck wrote:
I can't precisely quantify this against a traditional disk, since there are many factors involved, but I believe the speed improvement can be substantial.  Both the loading of Statistics modules and processing of the dataset are mainly disk based, and SSDs are dramatically faster than traditional disks.  The OS complicates this, because it may use some available memory to cache data and modules.  

On my SSD-based system, Statistics 23 launches in 7 seconds.  Running DESCRIPTIVES on a 7-million case sav file with 29 variables that has not been previously loaded completes in about 5 seconds.  A 4-variable regression on the same data finished in the same amount of time.  A year or two ago, I heard from a user with an SSD that running a 60-million case regression with many variables completes in less than one minute.





--
Jon K Peck
[hidden email]
