Two issues:
1) I need to add several hundred (1500+) SPSS .sav files that have exactly the same variable structure. I wrote a Python script that works fine, but, not surprisingly, the deeper it gets into the loop (as the existing file becomes larger), the more it slows down. Is there a quicker way to aggregate files?

Relevant current Python script:

    #Need variable to hold all sav filenames within 'outpath' directory
    SpssFilesT = os.listdir(outpath)
    SpssFiles = sorted(SpssFilesT)
    #print SpssFiles

    try:
        Submit(r"""
        GET FILE = '%s'.
        EXECUTE.
        """ %(outpath + SpssFiles[0]))
    except:
        pass

    try:
        #print SpssFiles[1:]
        for addfle in SpssFiles[1:]:
            Submit(r"""
            ADD FILES FILE= *
            /FILE= '%s'.
            EXECUTE.
            """ %(outpath + addfle))
    except:
        pass

2) Working with time: with normal numeric data, it's nice that the integer/fractional component is rounded for display when you alter the format. With time values (hr, mi, se.xx), however, I have noticed that if I compute a new variable with less precision (i.e., TIME11.2 >> TIME8), the fractional pieces are truncated rather than rounded. Aside from whether I need to do this, is there in fact an easy way to do it?

Thanks!
Tim

****************************
Notice: This e-mail and any attachments may contain confidential and privileged information. If you are not the intended recipient, please notify the sender immediately by return e-mail, do not use the information, delete this e-mail and destroy any copies. Any dissemination or use of this information by a person other than the intended recipient is unauthorized and may be illegal. Email transmissions cannot be guaranteed to be secure or error free. The sender therefore does not accept any liability for errors or omissions in the contents of this message that arise as a result of email transmissions.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
|
Here is an example that adds all files in a directory.
You can add up to 50 files at once, so for 1500+ files you would need to do this in batches of 50, but this approach takes advantage of combining many files in a single command.

BEGIN PROGRAM.
import spss, glob
savlist = glob.glob("c:/temp/parts/*.sav")
if savlist:
    # One ADD FILES command with a /FILE subcommand per file found;
    # more than 50 files will exceed the ADD FILES limit.
    cmd = "ADD FILES " + \
          "\n".join(["/FILE='" + fn + "'" for fn in savlist]) + "."
    spss.Submit(cmd)
    print("\nFiles merged:\n" + "\n".join(savlist))
else:
    print("No files found to merge")
END PROGRAM.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tim Hennigar
Sent: Thursday, June 26, 2008 10:28 AM
To: [hidden email]
Subject: [SPSSX-L] Python Adding Files & adjusting time

2) Working with time: with normal numeric data, it's nice that the integer/fractional component is rounded for display when you alter the format. With time values (hr, mi, se.xx), however, I have noticed that if I compute a new variable with less precision (i.e., TIME11.2 >> TIME8), the fractional pieces are truncated rather than rounded. Aside from whether I need to do this, is there in fact an easy way to do it?

[>>>Peck, Jon] This is strictly a display issue. Internally the values are always kept at full double precision, down to small fractions of a second.
If you want to round the actual values, you can do so with COMPUTE, using the RND function with an appropriate multiplier (or, equivalently, by adding half a unit and truncating with TRUNC).

HTH,
Jon Peck
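To make the rounding-versus-display distinction concrete, here is a minimal Python sketch. It assumes time values are held as a number of seconds (which is how SPSS stores them) and uses hypothetical helpers: `round_time` adds half a unit and truncates, `truncate_time` shows the truncation Tim observed, and `rounding_syntax` only builds the text of a COMPUTE that scales by a power of ten before RND. The variable names are placeholders, and the generated syntax has not been run in SPSS here.

```python
import math


def round_time(seconds, unit=1.0):
    """Round a time in seconds to the nearest multiple of `unit`.

    Adds half a unit and truncates (floor), so halves go up;
    intended for non-negative times.
    """
    return math.floor(seconds / unit + 0.5) * unit


def truncate_time(seconds, unit=1.0):
    """Drop everything below `unit`, like the truncated display."""
    return math.floor(seconds / unit) * unit


def rounding_syntax(src, dest, places=0):
    """Build hypothetical SPSS syntax rounding `src` to `places` decimals of a second."""
    if places == 0:
        expr = "RND(%s)" % src
        fmt = "TIME8"  # hh:mm:ss
    else:
        mult = 10 ** places
        expr = "RND(%s * %d) / %d" % (src, mult, mult)
        fmt = "TIME%d.%d" % (9 + places, places)  # e.g. TIME11.2
    return "COMPUTE %s = %s.\nFORMATS %s (%s)." % (dest, expr, dest, fmt)


if __name__ == "__main__":
    t = 3725.678  # 1:02:05.678
    print(round_time(t), truncate_time(t))
    print(rounding_syntax("t", "t2"))
```

The `if __name__` block prints 3726.0 and 3725.0 for the sample value, i.e. 1:02:06 when rounded versus 1:02:05 when truncated.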
|
Adding a CACHE command to the sequence between the ADD FILES command and the EXECUTE should keep the process from slowing down.
Jonathan Fry
SPSS Inc.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Thursday, June 26, 2008 11:44 AM
To: [hidden email]
Subject: Re: Python Adding Files & adjusting time
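A minimal sketch of where the CACHE command would go, assuming the one-file-at-a-time approach of the original script is kept. The hypothetical helper `incremental_add_syntax` only builds the syntax blocks that the loop would pass to Submit, so the sequence can be inspected outside SPSS; the paths are placeholders.

```python
def incremental_add_syntax(paths):
    """Return the syntax blocks of the one-file-at-a-time loop, with CACHE added.

    The first file is opened with GET FILE; every later file gets its own
    ADD FILES, followed by CACHE and then EXECUTE.
    """
    if not paths:
        return []
    blocks = ["GET FILE='%s'." % paths[0]]
    for path in paths[1:]:
        blocks.append(
            "ADD FILES /FILE=*\n"
            "  /FILE='%s'.\n"
            "CACHE.\n"   # placed between ADD FILES and EXECUTE
            "EXECUTE." % path
        )
    return blocks


if __name__ == "__main__":
    for block in incremental_add_syntax(["c:/temp/parts/a.sav", "c:/temp/parts/b.sav"]):
        print(block)
```

Inside BEGIN PROGRAM, each block could be handed to `spss.Submit` in turn (with the list built from `os.listdir(outpath)` as in the original script). The sketch shows only the ordering; it does not measure whether CACHE removes the slowdown.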
|
At 12:27 PM 6/26/2008, Tim Hennigar wrote:
>1) I need to add several hundred (1500+) spss 'sav' files of exactly
>the same variable structure - I wrote a python script that works
>fine - but as the existing file becomes larger - it slows
>considerably - is there a quicker way to aggregate files?
>
>Relevant Current Python Script
>
>#Need variable to hold all sav filenames within 'outpath' directory
>SpssFilesT = os.listdir(outpath)
>SpssFiles = sorted(SpssFilesT)
>#print SpssFiles
>
>try:
>    Submit(r"""
>    GET FILE = '%s'.
>    EXECUTE.
>    """ %(outpath + SpssFiles[0]))
>except:
>    pass
>
>try:
>    #print SpssFiles[1:]
>    for addfle in SpssFiles[1:]:
>        Submit(r"""
>        ADD FILES FILE= *
>        /FILE= '%s'.
>        EXECUTE.
>        """ %(outpath + addfle))
>except:
>    pass

First, try leaving out the EXECUTE statements. I think that, in itself, will help noticeably.

Second, as you've surmised, ADD FILES does take longer as the files being added get larger. The solution is to do fewer ADD FILES commands, and do more with each one.

Here's pseudo-code. It'll translate to Python code that's more complicated than what you have, because the loop is broken into an inner and an outer loop, but it should approach 49 times as fast. (You can't do it with a single ADD FILES, because there's a limit of 50 files on an ADD FILES, and the active file, *, counts as one of them.)

First,
   GET FILE <first_file_on_list>
Then,
   Files_still_to_process = DROP_01_OF(Long_list_of_files)
   UNTIL NUMBER_OF(Files_still_to_process) = 0.
      Files_to_process_now   = FIRST_49_OF(Files_still_to_process)
      Files_still_to_process = DROP_49_OF(Files_still_to_process)
      Command = "ADD FILES /FILE=*"
      UNTIL NUMBER_OF(Files_to_process_now) = 0.
         One_File             = FIRST_01_OF(Files_to_process_now).
         Files_to_process_now = DROP_01_OF(Files_to_process_now).
         Command = concat(Command,<newline>," /FILE = '",One_File,"'")
      END UNTIL
      Command = concat(Command,".")
      spss.Submit(Command)
   END UNTIL
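The pseudo-code above can be sketched in Python roughly as follows. `batched_add_files` is a hypothetical helper that only builds syntax strings (it does not call SPSS). It assumes the 50-file limit counts the active file, so each ADD FILES carries `/FILE=*` plus at most 49 other files. File names containing an apostrophe would need their quotes doubled, which this sketch does not do.

```python
def batched_add_files(paths, batch=49):
    """Build GET FILE plus batched ADD FILES commands for a list of .sav paths.

    Each ADD FILES names the active file (*) and at most `batch` others,
    keeping every command within a 50-file limit.
    """
    if not 1 <= batch <= 49:
        raise ValueError("batch must be between 1 and 49")
    if not paths:
        return []
    commands = ["GET FILE='%s'." % paths[0]]
    remaining = list(paths[1:])            # the first file is already open
    while remaining:                        # outer loop: one ADD FILES per batch
        now, remaining = remaining[:batch], remaining[batch:]
        lines = ["ADD FILES /FILE=*"]
        for one_file in now:                # inner loop: one /FILE per file
            lines.append("  /FILE='%s'" % one_file)
        commands.append("\n".join(lines) + ".")
    return commands


if __name__ == "__main__":
    demo = ["c:/temp/parts/part%04d.sav" % i for i in range(1500)]
    cmds = batched_add_files(demo)
    print(len(cmds), "commands")
    print(cmds[0])
```

For 1,500 files this yields one GET FILE and 31 ADD FILES commands (the 1,499 remaining files in groups of 49). Inside BEGIN PROGRAM, something like `for cmd in batched_add_files(sorted(glob.glob("c:/temp/parts/*.sav"))): spss.Submit(cmd)` would submit them in order.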
