Profiling

Jon K Peck
For Python code in particular, the cProfile module, which is part of the
Python standard library, is quite easy to use.

Yup - RunSnakeRun uses profile and cProfile.  But, Jon, do you know if it
provides valuable information when trying to profile Python-wrapped SPSS
code?  I've put more thought into this, and I don't think the profiler
would introspect into SPSS's scope enough to know which loops/functions
were eating resources... to truly profile, I'd imagine you need to have
each SPSS function isolated in its own Python call.  Thoughts?

-j

>>>The Python profiler cannot see inside Statistics.  It will attribute
all the time spent on the SPSS side to the API that is calling SPSS.
Depending on the situation, that could be by far the largest part of the
time.  But for complex code such as the SPSSINC TURF procedure, which does
almost all its work on the Python side and contains a lot of code, the
profiler showed me some big surprises in where the time was being spent.
It led me to optimize code that I never thought would be significant but
turned out to be a major time consumer.  We also made some surprising
discoveries about cycle consumption in the spss Python module that led to
a number of speedups there.
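
To make the attribution point concrete, here is a minimal sketch using only
the standard library.  fake_spss_call is a hypothetical stand-in for an API
call into Statistics (it just sleeps), and python_side_work stands in for
procedure code written in Python; neither name comes from the spss module.
The report should charge the whole wait to the stand-in call as one opaque
entry, while the Python-side work is broken down function by function.

```python
import cProfile
import io
import pstats
import time


def fake_spss_call():
    # Hypothetical stand-in for a call into Statistics.
    # The profiler sees only the total time spent inside this call.
    time.sleep(0.05)


def python_side_work(n=20000):
    # Pure-Python work, which the profiler can break down by function.
    return sum(i * i for i in range(n))


def run_job():
    fake_spss_call()
    return python_side_work()


def profile_job():
    """Profile run_job and return the report text, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_job()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_job())
```

Sorting by cumulative time is one reasonable choice here, since it puts the
call that wraps the outside work near the top of the report.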

The biggest impact on Python performance, though, is case passing.  It is
always significantly slower to pass the case data to the Python module
(and back) than to let Statistics process the cases itself.  If you can
build SPSS syntax using Python logic and Submit it to process cases, that
will always be substantially faster.  If you dig into the SPSSINC RAKE
module, you can see several examples of that.
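
As a rough illustration of the "build syntax and Submit it" pattern (this is
not taken from SPSSINC RAKE), the sketch below generates COMPUTE commands in
Python and hands them to Statistics in one go.  The helper names, the
variable name, and the mean and standard deviation values are all
hypothetical.  It also assumes spss.Submit accepts a list of command
strings, and it only calls Submit when the spss module can be imported.

```python
def build_zscore_syntax(stats_by_var):
    """Return SPSS syntax lines that standardize each variable.

    stats_by_var maps a variable name to a (mean, sd) pair.  The values
    are assumed to be already known, e.g. from an earlier procedure.
    """
    lines = []
    for name, (mean, sd) in stats_by_var.items():
        lines.append(
            "COMPUTE z_{0} = ({0} - {1}) / {2}.".format(name, mean, sd)
        )
    lines.append("EXECUTE.")
    return lines


def submit_if_available(lines):
    """Submit the syntax when running inside Statistics; otherwise do nothing."""
    try:
        import spss  # only importable where the Statistics plug-in is installed
    except ImportError:
        return False
    # Assumption: Submit takes a list of command strings.
    spss.Submit(lines)
    return True


if __name__ == "__main__":
    syntax = build_zscore_syntax({"age": (40.0, 10.0)})
    print("\n".join(syntax))
    submit_if_available(syntax)
```

The cases never cross into Python here: Python only decides what the syntax
should say, and Statistics does the pass over the data.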

Regards,
Jon


----


J. R. Carroll
Independent Researcher through Hurtz Labs
Research Methods, Test Development, and Statistics
www.jrcresearch.net
www.ontvp.com


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
phone: 720-342-5621