SPSSX Discussion

SPSS and GPUs


Craig Johnson

SPSS and GPUs

I've started running into performance issues with larger datasets (500k+ cases). I've been talking with our IT team, and they wondered whether we could use some of the more modern hardware configurations that leverage GPUs for heavy processing loads. Is it possible to have SPSS run its calculations on GPUs? If anyone is doing this, what has your experience been?

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Richard Ristow

Re: SPSS and GPUs

Craig J wrote on Aug 25, 2016 3:52 PM,

I've started running into performance issues for larger datasets (500k + cases). I've been talking with our IT team and they've wondered if it's possible to use some of the more modern hardware configurations that are leveraging GPUs for heavy load processing.

Others will address your question about using GPUs. However, when you have SPSS performance problems, it's often helpful to look for more efficient ways to do what you're doing, within the normal parameters of SPSS. You don't say anything about what you are doing, so I can only mention a few points that often arise:

. For many operations, SPSS is limited by I/O rather than processing speed. If you're using a smallish subset of the variables or cases in your file, it helps to make a smaller file for analysis, including only the variables and cases you need.

. For the same reason, minimize the number of times your data must be read. In particular, removing unneeded EXECUTEs from transformation code may give a dramatic improvement.

. Some operations slow down more than proportionally as the number of cases grows. If you're doing AGGREGATEs, 500k cases is getting into the range where sorting first and using the PRESORTED option may improve performance. Also, the AUTOFIX option on CASESTOVARS (on by default) can drastically slow performance.

. Some transformation code really does use excessive processing time. One case that has come up on this list: when a string of IF statements can be replaced by a single RECODE, there's a very large gain.

. Time for some procedures grows only a little more than linearly with the number of cases; but some, especially those that require holding all the data in memory, slow down much more rapidly. You should post what procedures you are using, for suggestions. For some procedures, increasing available memory is much more important than is a faster processor.
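As a rough illustration of the points above, here is a minimal SPSS syntax sketch. The file names, the variable names (id, visit, region, age, income), and the age cut points are all hypothetical, and whether each change actually helps depends on your data and the procedures you run.

```spss
* All file and variable names below are hypothetical.

* 1. Read only the variables and cases the analysis needs.
GET FILE='big_survey.sav'
  /KEEP=id visit region age income.
SELECT IF (age >= 18).
SAVE OUTFILE='analysis_subset.sav'.

* 2. Transformations are held until the next command that reads the data.
* No EXECUTE after each COMPUTE, so all of them share one data pass.
COMPUTE log_income = LN(income).
COMPUTE age_sq = age**2.

* 3. One RECODE in place of a string of IF statements.
RECODE age (18 THRU 29=1) (30 THRU 49=2) (50 THRU HI=3) INTO age_group.
FREQUENCIES VARIABLES=age_group.

* 4. Sort by the break variable, then tell AGGREGATE the data are sorted.
SORT CASES BY region.
AGGREGATE
  /OUTFILE='region_means.sav'
  /PRESORTED
  /BREAK=region
  /mean_income=MEAN(income).

* 5. Turn off AUTOFIX in CASESTOVARS and name the fixed variables yourself.
SORT CASES BY id visit.
CASESTOVARS
  /ID=id
  /INDEX=visit
  /AUTOFIX=NO
  /FIXED=region.
```

With AUTOFIX=NO, any variable that is constant within an id but not listed on FIXED is restructured like the others, so check the resulting variable list before relying on it.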

-Best success to you,

Richard Ristow

Jon Peck

Re: SPSS and GPUs

Studying the resource usage of the SPSS processes in Task Manager can help pin down which resources SPSS is using and which might be in short supply. Look especially at the spssengine process but also the stats.exe process and, if using programmability, the startx process(es).

There is an extension command, STATS BENCHMRK, that can report various resource usage measures for alternative formulations of a job. It requires some setup, however, beyond just the regular Statistics installation.

One thing that can make a big improvement, depending on the data source, is the CACHE command. Check it out in the Command Syntax Reference (CSR).
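A minimal sketch of how CACHE might be used, assuming the data come from a database through ODBC. The DSN, table, and column names are hypothetical.

```spss
* Hypothetical ODBC source; the DSN, table and columns are placeholders.
GET DATA
  /TYPE=ODBC
  /CONNECT='DSN=warehouse'
  /SQL='SELECT id, region, income FROM survey'.

* Request a cached copy of the data; it is built on the next data pass.
CACHE.
EXECUTE.

* Later procedures can read the cached copy instead of re-querying the source.
DESCRIPTIVES VARIABLES=income.
```

Here the EXECUTE is deliberate: it forces the single pass that builds the cache.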

On Thu, Aug 25, 2016 at 6:39 PM, Richard Ristow <[hidden email]> wrote:


Jon K Peck
[hidden email]