We are trying to use the multiprocessing module from the Python library in SPSS, so far unsuccessfully:

begin program.
import multiprocessing, spss

def doManager():
    print "debug #1"
    jobs = []
    # populate jobs - a list of tuples
    pool = multiprocessing.Pool(processes=4)
    resultsAll = pool.map(doWorker, jobs)
    print "debug #2"
    pool.close()
    pool.join()

def doWorker(jobs):
    # do something with every tuple from "jobs"
    pass

doManager()
end program.

If I remove the definition of the doWorker function, print "debug #1" works, but an error is thrown that the doWorker function is not defined. If doWorker is defined, neither "debug" print appears. The SPSS engine keeps running; it does not throw an error, but it also does not produce any result.

My question: is it possible to use the Python multiprocessing module from SPSS? As a workaround: how can SPSS production mode be managed from the command line? That way I would be able to run a few SPSS instances from a Python multiprocessing Pool.

Thanks,
Eugenia
The answer probably lies in the code in the doWorker job. The spssengine and stats processes (backend and frontend) wait for completion of the program block, but the SPSS APIs would respond. However, they are not multithreaded and eventually call into C code, so the Python GIL might interfere.

Try a test where any SPSS API calls are made in the top-level process and the workers just do things that don't require the APIs. You can, alternatively, use external mode, where your Python code drives Statistics - just import the spss module in your Python program. For information on command-line production mode, search the help for "command line".

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
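A minimal sketch of the kind of test suggested above: the manager runs in the top-level process, and the workers do pure computation with no spss calls at all (written with the print() function so it runs under both Python 2 and 3):

```python
import multiprocessing

def worker(args):
    # Pure computation only -- no spss API calls in the child processes.
    n, word = args
    return "%s_%d" % (word, n)

def manager():
    jobs = [(n, "hallo") for n in range(1, 6)]
    pool = multiprocessing.Pool(processes=4)
    try:
        results = pool.map(worker, jobs)
    finally:
        pool.close()
        pool.join()
    return results

if __name__ == '__main__':
    # Any spss API calls would go here, in the parent process only.
    print(manager())
```

If this hangs inside a program block too, the problem is the multiprocessing machinery itself rather than contention over the SPSS APIs.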
I have kept on experimenting with the Python multiprocessing module from SPSS, so far unsuccessfully:
A Python script, let's call it "example1.py", works OK:

import os
import multiprocessing

def manager():
    args = []
    for n in xrange(1, 6):
        args.append((n, "hallo"))
    pool = multiprocessing.Pool(processes=4)
    pool.map(worker, args)
    pool.close()
    pool.join()

def worker(args):
    os.chdir("c:\\Temp1")
    print args
    f = open("text_" + str(args[0]) + ".txt", "wb+")
    f.write(str(args))
    f.close()

def main():
    manager()

if __name__ == '__main__':
    main()

The same Python code wrapped in SPSS does not work:

begin program.
import os, multiprocessing, spss

def manager():
    args = []
    for n in xrange(1, 6):
        args.append((n, "hallo"))
    pool = multiprocessing.Pool(processes=4)
    pool.map(worker, args)
    pool.close()
    pool.join()

def worker(args):
    os.chdir("c:\\Temp1")
    print args
    f = open("text_" + str(args[0]) + ".txt", "wb+")
    f.write(str(args))
    f.close()

manager()
end program.

Placing "example1.py" in "C:\Python27\Lib" and then importing it from SPSS and calling the manager() function also does not work:

begin program.
import os, example1, multiprocessing, spss

example1.manager()
end program.

The problem definitely does not lie with this very simplistic Python code.

Could anyone offer an explanation of why my multiprocessing Python example is unsuccessful in SPSS?

Thanks,
Eugenia
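One plausible explanation (an assumption, not something confirmed by IBM documentation): on Windows, multiprocessing starts each child as a fresh interpreter that must re-import the module where the worker function is defined. Code inside a BEGIN PROGRAM block runs in SPSS's embedded interpreter, where there is no importable script behind __main__, so the children may never be able to locate worker, and the pool stalls without raising an error in the parent. The small sketch below illustrates the mechanism: multiprocessing ships a worker to a child by reference (module name plus function name), not by copying its code.

```python
import pickle

def worker(args):
    return args

# multiprocessing does not send the function's code to the child; it
# pickles a reference (the defining module's name plus the function's
# name), and the child re-imports that module to resolve the function.
payload = pickle.dumps(worker)
restored = pickle.loads(payload)
assert restored is worker  # resolved by import lookup, not copied
```

If the defining module cannot be re-imported in the child - as with a function defined only inside a program block - that lookup has nothing to resolve against, which would also explain why placing example1.py on the path alone is not enough when the pool is still started from inside the block.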
- What are you trying to accomplish?
- What error do you get? (full traceback)
- Does it work if you do not use os.chdir, but os.path.join instead?

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
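A sketch of the os.path.join variant suggested above: the worker builds a full output path per call instead of calling os.chdir, which mutates process-global state and is risky inside a pool worker (tempfile.gettempdir() stands in here for "c:\\Temp1"):

```python
import os
import tempfile

def worker(args):
    # Build an absolute path for this call instead of changing the
    # working directory for the whole process.
    n, word = args
    out_dir = tempfile.gettempdir()  # stand-in for "c:\\Temp1"
    path = os.path.join(out_dir, "text_%d.txt" % n)
    with open(path, "w") as f:
        f.write(str((n, word)))
    return path
```

For example, worker((1, "hallo")) writes text_1.txt into the output directory and returns its full path.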
- There's no traceback: only "Running BEGIN PROGRAM...", no error as such, and no result.
- Using os.path.join does not change anything (we run everything on Windows).
- I am trying to implement multiprocessing for querying multiple DBs (a SELECT statement over a long timeframe). A few simultaneous processes access different DBs at the same time, and each SELECT statement is limited to a shorter timeframe. My simple example is just a first try at the Python multiprocessing module from SPSS. Another way to do it: run all the simultaneous queries from Python (it works), saving the result sets as SAV files - I still have to try savReaderWriter :-)

Regards,
Eugenia
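A rough sketch of that chunked-query idea, run entirely from Python rather than from inside SPSS. split_timeframe and run_query are hypothetical names, and the SQL string is only a placeholder: a real worker would open its own DB connection and execute the statement for its chunk.

```python
import multiprocessing
from datetime import date, timedelta

def split_timeframe(start, end, days):
    """Split [start, end) into consecutive chunks of at most `days` days."""
    chunks = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        chunks.append((cur, nxt))
        cur = nxt
    return chunks

def run_query(chunk):
    # Placeholder for a real SELECT limited to one chunk of the
    # timeframe; each pool worker would use its own DB connection.
    start, end = chunk
    return "SELECT ... WHERE ts >= '%s' AND ts < '%s'" % (start, end)

if __name__ == '__main__':
    chunks = split_timeframe(date(2014, 1, 1), date(2014, 3, 1), 30)
    pool = multiprocessing.Pool(processes=4)
    try:
        statements = pool.map(run_query, chunks)
    finally:
        pool.close()
        pool.join()
```

Each worker could then save its result set to its own SAV file, so no two processes ever write to the same output.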