We are trying to use the multiprocessing module from the Python library in SPSS, so far unsuccessfully:

begin program.
import multiprocessing, spss

def doManager():
    print "debug #1"
    jobs = []
    # populate jobs - a list of tuples
    pool = multiprocessing.Pool(processes=4)
    resultsAll = pool.map(doWorker, jobs)
    print "debug #2"
    pool.close()
    pool.join()

def doWorker(jobs):
    # do something with every tuple from "jobs"
    pass

doManager()
end program.

If I remove the definition of the doWorker function, print "debug #1" works, but an error is thrown that the doWorker function is not defined. If doWorker is defined, neither "debug" print appears. The SPSS engine keeps running; it does not throw an error, but it also does not produce any result.

My question: is it possible to use the Python multiprocessing module from SPSS? As a workaround: how can SPSS production mode be managed from the command line? That way I would be able to run a few SPSS instances from a Python multiprocessing Pool.

Thanks,
Eugenia
The answer probably lies in the code in the doWorker job. The spssengine and stats processes (backend and frontend) wait for completion of the program block, but the SPSS APIs would respond. However, they are not multithreaded and eventually call into C code, so the Python GIL might interfere.

Try a test where any SPSS API calls are made in the top-level process and the workers just do things that don't require the APIs. You can, alternatively, use external mode, where your Python code drives Statistics - just import the spss module in your Python program. For information on command-line production mode, search the help for "command line".

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
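A minimal sketch of the kind of test suggested above: the manager runs in the top-level process, and the workers do pure computation with no spss calls at all (written with the print() function so it runs under both Python 2 and 3):

```python
import multiprocessing

def worker(args):
    # Pure computation only -- no spss API calls in the child processes.
    n, word = args
    return "%s_%d" % (word, n)

def manager():
    jobs = [(n, "hallo") for n in range(1, 6)]
    pool = multiprocessing.Pool(processes=4)
    try:
        results = pool.map(worker, jobs)
    finally:
        pool.close()
        pool.join()
    return results

if __name__ == '__main__':
    # Any spss API calls would go here, in the parent process only.
    print(manager())
```

If this hangs inside a program block too, the problem is the multiprocessing machinery itself rather than contention over the SPSS APIs.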
I have kept on experimenting with the Python multiprocessing module from SPSS, so far unsuccessfully:
A Python script, let's call it "example1.py", works OK:

import os
import multiprocessing

def manager():
    args = []
    for n in xrange(1, 6):
        args.append((n, "hallo"))
    pool = multiprocessing.Pool(processes=4)
    pool.map(worker, args)
    pool.close()
    pool.join()

def worker(args):
    os.chdir("c:\\Temp1")
    print args
    f = open("text_" + str(args[0]) + ".txt", "wb+")
    f.write(str(args))
    f.close()

def main():
    manager()

if __name__ == '__main__':
    main()

The same Python code wrapped in SPSS does not work:

begin program.
import os, multiprocessing, spss

def manager():
    args = []
    for n in xrange(1, 6):
        args.append((n, "hallo"))
    pool = multiprocessing.Pool(processes=4)
    pool.map(worker, args)
    pool.close()
    pool.join()

def worker(args):
    os.chdir("c:\\Temp1")
    print args
    f = open("text_" + str(args[0]) + ".txt", "wb+")
    f.write(str(args))
    f.close()

manager()
end program.

Placing "example1.py" in "C:\Python27\Lib" and then importing it from SPSS and calling the manager() function also does not work:

begin program.
import os, example1, multiprocessing, spss

example1.manager()
end program.

The problem definitely does not lie with this very simplistic Python code.

Could anyone offer an explanation of why my multiprocessing Python example is unsuccessful in SPSS?

Thanks,
Eugenia
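One plausible explanation (an assumption, not something confirmed by IBM documentation): on Windows, multiprocessing starts each child as a fresh interpreter that must re-import the module where the worker function is defined. Code inside a BEGIN PROGRAM block runs in SPSS's embedded interpreter, where there is no importable script behind __main__, so the children may never be able to locate worker, and the pool stalls without raising an error in the parent. The small sketch below illustrates the mechanism: multiprocessing ships a worker to a child by reference (module name plus function name), not by copying its code.

```python
import pickle

def worker(args):
    return args

# multiprocessing does not send the function's code to the child; it
# pickles a reference (the defining module's name plus the function's
# name), and the child re-imports that module to resolve the function.
payload = pickle.dumps(worker)
restored = pickle.loads(payload)
assert restored is worker  # resolved by import lookup, not copied
```

If the defining module cannot be re-imported in the child - as with a function defined only inside a program block - that lookup has nothing to resolve against, which would also explain why placing example1.py on the path alone is not enough when the pool is still started from inside the block.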
- What are you trying to accomplish?
- What error do you get? (full traceback)
- Does it work if you do not use os.chdir, but os.path.join instead?

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
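A sketch of the os.path.join variant suggested above: the worker builds a full output path per call instead of calling os.chdir, which mutates process-global state and is risky inside a pool worker (tempfile.gettempdir() stands in here for "c:\\Temp1"):

```python
import os
import tempfile

def worker(args):
    # Build an absolute path for this call instead of changing the
    # working directory for the whole process.
    n, word = args
    out_dir = tempfile.gettempdir()  # stand-in for "c:\\Temp1"
    path = os.path.join(out_dir, "text_%d.txt" % n)
    with open(path, "w") as f:
        f.write(str((n, word)))
    return path
```

For example, worker((1, "hallo")) writes text_1.txt into the output directory and returns its full path.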
- There's no traceback: only "Running BEGIN PROGRAM...", no error as such, and no result.
- Using os.path.join does not change anything (we run everything on Windows).
- I am trying to implement multiprocessing for querying multiple DBs (a SELECT statement over a long timeframe). A few simultaneous processes access different DBs at the same time, and each SELECT statement is limited to a shorter timeframe. My simple example is just a first try at the Python multiprocessing module from SPSS. Another way to do it: run all the simultaneous queries from Python (it works), saving the result sets as SAV files - I still have to try savReaderWriter :-)

Regards,
Eugenia
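A rough sketch of that chunked-query idea, run entirely from Python rather than from inside SPSS. split_timeframe and run_query are hypothetical names, and the SQL string is only a placeholder: a real worker would open its own DB connection and execute the statement for its chunk.

```python
import multiprocessing
from datetime import date, timedelta

def split_timeframe(start, end, days):
    """Split [start, end) into consecutive chunks of at most `days` days."""
    chunks = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        chunks.append((cur, nxt))
        cur = nxt
    return chunks

def run_query(chunk):
    # Placeholder for a real SELECT limited to one chunk of the
    # timeframe; each pool worker would use its own DB connection.
    start, end = chunk
    return "SELECT ... WHERE ts >= '%s' AND ts < '%s'" % (start, end)

if __name__ == '__main__':
    chunks = split_timeframe(date(2014, 1, 1), date(2014, 3, 1), 30)
    pool = multiprocessing.Pool(processes=4)
    try:
        statements = pool.map(run_query, chunks)
    finally:
        pool.close()
        pool.join()
```

Each worker could then save its result set to its own SAV file, so no two processes ever write to the same output.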