optimizing code and speedups

optimizing code and speedups

Mumdzhiev
Hello everyone,

Is anyone aware of performance-tuning papers for the current SPSS-Python-R-Java system (via the plug-ins), testing speed and memory for different strategies and approaches, for example by comparing loops across the systems and other issues?
De-looping processes or calling C from R is well known, but:
is there a tool for finding the slow parts of code in SPSS syntax?
are there good papers on compilation, or strategy papers from IBM on what to expect in the future?
what are best practices for program structure and string manipulation with respect to speed?
I simply don't want to leave the SPSS platform, but I really do wait for days for nested loops to finish...
I'd be interested in testing parts of the code and comparing approaches myself, if there is no other option...

I'll appreciate any comment or help.

Milko
Re: optimizing code and speedups

J. R. Carroll-3
is there a tool for finding the slow parts of code in SPSS syntax?

You might be able to wrap your SPSS code in Python and then run standard Python profiling tools.  One that I use often is RunSnakeRun (http://www.vrplumber.com/programming/runsnakerun/), but that's on pure Python projects - I am uncertain how valid such a profiling tool is when wrapped around SPSS code.  My guess is that the overhead of the wrapping is a minuscule contributor to the overall time your SPSS syntax takes to run, so the relative percentage of time each part takes should still be representative of the true profile.  RunSnakeRun takes the output of your profile capture and gives you a nice visual layout that shows which functions/methods are eating what amount of resources (a great way to check whether you have a memory leak or a loop that iterates more often than you intended).

As far as timing/profiling facilities in SPSS itself go, I am uncertain whether they exist (Jon Peck or David Marso will fill you in for sure).  If you decide to profile your SPSS code using cProfile in Python and want to use RunSnakeRun and get stuck, let me know - I'd be happy to help.
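
Something like the sketch below is what I have in mind (it assumes the SPSS Python integration plug-in, i.e. the spss module, is installed; the file name and syntax are placeholders, not anything from your job):

import cProfile
import spss

def run_job():
    # Placeholder syntax - substitute your own commands here.
    spss.Submit('GET FILE="mydata.sav".')
    spss.Submit('FREQUENCIES VARIABLES=server_lfe.')

# Dump the profile to a file that RunSnakeRun (or pstats) can open.
cProfile.run('run_job()', 'spss_job.prof')

The profile will mostly show time spent inside spss.Submit, so the finer the granularity of your Python wrapper functions, the more useful the breakdown.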


What are best practices for program structure and string manipulation
with respect to speed?
 
Are you talking strictly about SPSS, or about Python?
As a general response to "where can I optimize my code?", if we are talking about Python, I'd recommend looking into list comprehensions and generators, especially where loops are concerned (these are not always compatible with the kinds of string manipulations one might want to do in SPSS, but it's a place to start looking).
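
A toy illustration (made-up data, nothing SPSS-specific) of the same string clean-up written three ways:

# 1. Explicit loop.
values = [" alpha ", " Beta ", " gamma "] * 100000
cleaned = []
for v in values:
    cleaned.append(v.strip().lower())

# 2. List comprehension: same result, usually faster and shorter.
cleaned = [v.strip().lower() for v in values]

# 3. Generator expression: no intermediate list is built, which helps
#    when the data is large and you only pass over it once.
total_chars = sum(len(v.strip()) for v in values)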

-j

----


J. R. Carroll
Independent Researcher through Hurtz Labs
Research Methods, Test Development, and Statistics
Cell:  (650) 776-6613
          [hidden email]
          [hidden email]




Re: optimizing code and speedups

David Marso
Administrator
In reply to this post by Mumdzhiev
I'm curious about "wait for days for nested loops to finish"...
Maybe the structure of your code is inefficient?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: optimizing code and speedups

Albert-Jan Roskam
In reply to this post by J. R. Carroll-3
I used this before; it's the best I could come up with in SPSS:

* Write a timestamp to a log file before and after each step, via OMS.
file handle logfile /name = "%temp%/times.txt".
* Macro that echoes the current Windows time as a host command.
define !stamp () host command = ['echo %time%'] !enddefine.
* Route the text output of the HOST commands to the log file.
oms /select texts /if commands = ["Host"] labels = ["Host Output"]
  /destination format = text outfile = logfile.
!stamp.
frequencies server_lfe.
!stamp.
frequencies server_lfe.
omsend.

or, without the macro:

host command = ['echo step #1: %time%'].
fre...
host command = ['echo step #2: %time%'].
fre...
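
If the Python plug-in is installed, a rough Python equivalent is sketched below (the spss module is assumed, and the FREQUENCIES line is just a placeholder):

begin program.
import time
import spss

def timed_submit(label, syntax):
    # Run one block of SPSS syntax and report how long it took.
    start = time.time()
    spss.Submit(syntax)
    print("%s: %.2f seconds" % (label, time.time() - start))

timed_submit("step #1", "FREQUENCIES VARIABLES=server_lfe.")
timed_submit("step #2", "FREQUENCIES VARIABLES=server_lfe.")
end program.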
 
Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

Re: optimizing code and speedups

Rich Ulrich
In reply to this post by David Marso
On July 3, Jon posted a link to information on SPSS-specific
efficiency, such as reading files over networks.  File I/O is
the usual bottleneck for the largest applications.
Jon's cite:
  On the Books and Articles page of the SPSS Community website
  (www.ibm.com/developerworks/spssdevcentral) there is a link to a
  Best Practices article on performance.

That document did not try to encompass ordinary programming
practices.

I'm curious, too, about waiting "days."  It makes me think that
there is possibly something extremely inefficient going on, the
sort of thing that any experienced programmer would look at
and say, Don't do THAT.

--
Rich Ulrich


Re: optimizing code and speedups

J. R. Carroll-3
Just as a point of potential misunderstanding (the OP's clarification notwithstanding), seeing as several people have cued into it:

I simply don't want to leave the SPSS platform, but I really do wait for days for
nested loops to finish...

I read it in the non-literal sense.  It could just be an esoteric-hipster way of saying "I have to wait a long time"; although I live on the East Coast now, my West Coast life used the word "days" as hyperbole when talking with friends; I'm not sure whether it is literally days.

As in: "ahhh duuuuude... *takes a bong hit* waiting for this burrito at Taco Bell takes days"

And yes, I'm sure it's been used in this way long before I was born in the early '80s.

I wonder what job could take dayS (in the plural) on a modern machine.  If it's truly Big Data I wouldn't suspect SPSS is being used... err, can SPSS be used for Big Data applications, or is it limited in how much it can analyze?


For Python code in particular, the cProfile module, which is part of the Python standard library, is quite easy to use. 

Yup - RunSnakeRun uses Profile and cProfile.  But, Jon, do you know whether it provides valuable information when trying to profile Python-wrapped SPSS code?  I've put more thought into this, and I don't think the profiler would introspect into SPSS's scope deeply enough to know which loops/functions were eating resources... to truly profile, I'd imagine you would need each SPSS function isolated in its own Python call.  Thoughts?
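
Roughly what I mean, as a hypothetical sketch (I have not tested how deep the profiler can see into the plug-in; the syntax below is made up purely for illustration):

import cProfile
import pstats
import spss

# One small Python function per SPSS step, so the profiler has a named
# call to attribute the time to.
def sort_cases():
    spss.Submit("SORT CASES BY id.")

def aggregate_counts():
    spss.Submit("AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=id /n_id=N.")

def run_all():
    sort_cases()
    aggregate_counts()

cProfile.run("run_all()", "steps.prof")
# Cumulative time per wrapper function = time spent in that SPSS step.
pstats.Stats("steps.prof").sort_stats("cumulative").print_stats("sort_cases|aggregate_counts")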

-j


----


J. R. Carroll
Independent Researcher through Hurtz Labs
Research Methods, Test Development, and Statistics
Cell:  (650) 776-6613
          [hidden email]
          [hidden email]




Re: optimizing code and speedups

Mumdzhiev
In reply to this post by Mumdzhiev
Thank you everybody for all the help provided...
I think I can use all of the pieces you mentioned.

There are a lot of jobs that require time and memory to finish; one of
them is TURF analysis, which is done in Python, I believe - Jon argued here
for Python's capability, but of course that alone cannot solve TURF and
similar problems.
Maybe it is not always a question of buying bigger hardware or more modern
machines to get jobs finished; it is the kind of study or job you do.
We are doing more and more simulation work, and we make recommendations that
depend on licensing, time, and often simply on comparisons. You're joking, but
sometimes (really not very often) I am in trouble when explaining why SPSS
with all of its plug-ins and extensions is more than a GUI for R and Python,
when I am asked how long it takes, whether it is possible, and how
much... of course I'm not a salesman ;)

Milko







Re: optimizing code and speedups

Mumdzhiev
Thank you Jon,

I already got a lot of help from all of you. I am going to add speed tests for the different approaches, as I now know how to do that nicely for SPSS Base or Python, because I need the fastest approach for a fixed set of problems - one of them is TURF, but there are others as well. Because these problems are time-consuming by their very nature, I need the fastest loops, efficient structures, and so on (I am not really working on better algorithms); as I already know how to boost R, I am simply going to test SPSS and Python in addition, to compare and measure.
This kind of issue is often covered in books on programming, so I looked it up for SPSS and did not find much - some IBM tests for SPSS Server tools - but there is more literature on R for this sort of question.
Comparing papers is the first part of the next piece of work; the second part is empirical, of course, and neither is a big story.
Thank you all for the comments and the help!!

Milko




Jon Peck wrote:

I don't entirely understand your post, but if you are using TURF and finding that it takes a very long time, remember that it is in the nature of that procedure that it has to evaluate all possible combinations, and the number of combinations grows explosively with the number of variables.  You might want to raise the screening threshold or otherwise eliminate variables that are not likely to contribute much.

If you have a large number of cases and variables, it is possible that the algorithm is starved for memory.  You might want to look at the Task Manager statistics for the startx process, and if you are not running a 64-bit version of Statistics, moving to that would allow the code to use more memory.

Regards,


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
peck@us.ibm.com
phone: 720-342-5621