Python vs. macro language performance

Python vs. macro language performance

Bryan Tec
Hi all,

Is there any difference in performance (specifically in terms of
speed) between using the Python plugin and SPSS macro language?  I have a
macro that bogs down when the dataset is too large and I am wondering if the
Python facility may allow me to get some more mileage out of it with larger
datasets.  I haven't used the python plugin before, but I'm curious if
moving the code to Python would improve the performance.

Thanks,
Bryan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Python vs. macro language performance

Peck, Jon
As the saying goes, your mileage may vary, but here are some general guidelines.

Sometimes a macro has to do things in a very roundabout way when a Python program could do it directly.  That would offer an opportunity for a speedup using the plugin.

If most of the time is spent in relevant computation in SPSS, whether transformations or statistical procedures, using the plugin is unlikely to be faster.  The overhead of macro expansion is unlikely to be a significant fraction of the run time required.

But if you can use the external (xd) mode of the plugin, where no SPSS user interface is present (no dialogs, no menus, no Data Editor, no Viewer), there can be a huge performance gain because the updating and synchronization between the SPSS Processor and all the user interface elements is avoided.  Less memory is used, too.

We have had reports of speedups of 4x to 10x from this approach.  The limitation, of course, is that with no Viewer, your human-readable output is limited to the formats supported by OMS - XML, HTML, Text, and a few others.  But if you are mainly doing computation and producing new data files or if simple text summaries will do, this can be a big win.

If you use external mode, you can even keep your macro as it is and simply run it from the xd interface.  It is trivial to take an existing syntax job and turn it into a programmability job.  There is an article on SPSS Developer Central and material in the downloadable Data Management book on this topic.
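
As a rough illustration of running an existing syntax job from Python, here is a minimal sketch. It assumes the existing job (macro definitions and calls) is saved in a syntax file and pulls it in with INSERT, optionally wrapping it in OMS so that tables go to an HTML file, since there is no Viewer in external mode. The function names build_job and run_job and the file names are hypothetical; spss.Submit is the plugin call that sends commands to the SPSS Processor.

```python
def build_job(syntax_file, html_out=None):
    """Return the SPSS commands that run an existing syntax file.

    syntax_file and html_out are hypothetical paths; no escaping of
    quote characters inside the paths is attempted in this sketch.
    """
    commands = ["INSERT FILE='%s'." % syntax_file]
    if html_out is not None:
        # No Viewer in external mode, so route tables to a file via OMS.
        commands = (
            ["OMS /SELECT TABLES /DESTINATION FORMAT=HTML OUTFILE='%s'." % html_out]
            + commands
            + ["OMSEND."]
        )
    return commands


def run_job(commands):
    """Send the commands to SPSS; requires the Python plugin."""
    # Imported here so build_job can be used where SPSS is not installed.
    import spss
    spss.Submit(commands)
```

From an ordinary Python interpreter with the plugin installed, something like run_job(build_job('macro_job.sps', 'results.htm')) would then run the whole job without the SPSS user interface.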

You may be able to divide up a job into a portion that can be run with the external mode and then a second part run in the normal way in order to get ordinary Viewer output.

HTH,
Jon Peck


Re: Python vs. macro language performance

Richard Ristow
In reply to this post by Bryan Tec
At 10:19 AM 5/23/2008, Bryan Tec wrote:

>Is there any difference in performance (specifically in terms of
>speed) between using the Python plugin and SPSS macro language?  I
>have a macro that bogs down when the dataset is too large and I am
>wondering if the Python facility may allow me to get some more
>mileage out of it with larger datasets.

As Jon Peck wrote (Fri, 23 May 2008 09:44:05 -0500)

>If most of the time is spent in relevant computation in SPSS, using
>the plugin is unlikely to be faster.

That is, using Python to generate the same SPSS syntax -- using
Python as you'd use a macro processor -- is very unlikely to be
either better or worse. Quoting Jon again, "The overhead of macro
expansion [or Python execution] is unlikely to be a significant
fraction of the run time required."

You have "a macro that bogs down when the dataset is too large".
That's a sure sign the problem's in the SPSS computation. The macro
processing time doesn't depend on the file size.

Run the macro, on any size file, with SET MPRINT ON, and look for
inefficiencies in the code generated. Among other things, this is the
time to look for unnecessary EXECUTE statements. Most *are*
unnecessary, and they can slow processing a lot.
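
If the expanded code is long, a small script can help with the hunt for EXECUTE statements. The following is only a sketch: it assumes you have copied the expanded syntax into a text string, and that each line may carry an optional line number and "M>" prefix (an assumption about how the MPRINT log looks). The function name find_execute_lines is hypothetical. It only locates EXECUTE commands; whether a given one is actually unnecessary still has to be judged by reading the code around it.

```python
import re

# Optional log line number, optional "M>" marker, then a one-word command.
_ONE_WORD_COMMAND = re.compile(r"^\s*(?:\d+\s+)?(?:M>\s*)?([A-Za-z]+)\s*\.\s*$")


def find_execute_lines(syntax_text):
    """Return the 1-based line numbers that hold an EXECUTE command.

    Abbreviations of at least three letters (EXE, EXEC, ...) are counted too.
    """
    hits = []
    for lineno, line in enumerate(syntax_text.splitlines(), start=1):
        match = _ONE_WORD_COMMAND.match(line)
        if match:
            word = match.group(1).upper()
            if len(word) >= 3 and "EXECUTE".startswith(word):
                hits.append(lineno)
    return hits
```

Several hits inside a block of transformations that is followed by a procedure anyway would be the first thing to examine.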

You'll have to post your code before we can suggest anything more.

-Best of luck,
  Richard
