Python Adding Files & adjusting time

Python Adding Files & adjusting time

Tim AT Home
Two issues:

1) I need to add several hundred (1500+) SPSS 'sav' files of exactly the same variable structure. I wrote a Python script that works fine, but, not surprisingly, the deeper into the loop it gets (as the existing file becomes larger), the more it slows down. Is there a quicker way to aggregate files?

Relevant Current Python Script

import os
from spss import Submit

# Hold all sav filenames within the 'outpath' directory
# (outpath is set earlier and is assumed to end with a path separator)
SpssFiles = sorted(os.listdir(outpath))
#print SpssFiles

# Open the first file as the active dataset
try:
    Submit(r"""
    GET FILE = '%s'.
    EXECUTE.
    """ % (outpath + SpssFiles[0]))
except:
    pass

# Append every remaining file, one ADD FILES (plus EXECUTE) per file
try:
    #print SpssFiles[1:]
    for addfle in SpssFiles[1:]:
        Submit(r"""
        ADD FILES FILE= *
                 /FILE= '%s'.
        EXECUTE.
        """ % (outpath + addfle))
except:
    pass



2) Working with time: it's nice when working with normal numeric data that the integer/fractional component is rounded for display when altering the format. I have noticed with time (hr,mi,se.xx) that if I compute a new variable with 'less precision', i.e. TIME11.2 >> TIME8, the fractional pieces are truncated rather than rounded up. Aside from whether I need to do this, is there in fact an easy way to do it?



Thanks!

Tim

****************************

Notice: This e-mail and any attachments may contain confidential and
privileged information.  If you are not the intended recipient, please
notify the sender immediately by return e-mail, do not use the information,
delete this e-mail and destroy any copies.  Any dissemination or use of this
information by a person other than the intended recipient is unauthorized
and may be illegal.  Email transmissions cannot be guaranteed to be secure
or error free. The sender therefore does not accept any liability for errors
or omissions in the contents of this message that arise as a result of email
transmissions.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: Python Adding Files & adjusting time

Peck, Jon
Here is an example that adds all files in a directory.

ADD FILES accepts up to 50 files at once, so with 1500+ files you would need to do this in batches of 50, but it still takes advantage of combining many files per command.

BEGIN PROGRAM.
import spss, glob

# Collect every .sav file in the directory
savlist = glob.glob("c:/temp/parts/*.sav")
if savlist:
    # One ADD FILES command with a /FILE subcommand per file
    cmd = "ADD FILES " + \
        "\n".join(["/FILE='" + fn + "'" for fn in savlist]) + "."
    spss.Submit(cmd)
    print("\nFiles merged:\n" + "\n".join(savlist))
else:
    print("No files found to merge")
END PROGRAM.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tim Hennigar
Sent: Thursday, June 26, 2008 10:28 AM
To: [hidden email]
Subject: [SPSSX-L] Python Adding Files & adjusting time

2) Working with time: it's nice when working with normal numeric data that the integer/fractional component is rounded for display when altering the format. I have noticed with time (hr,mi,se.xx) that if I compute a new variable with 'less precision', i.e. TIME11.2 >> TIME8, the fractional pieces are truncated rather than rounded up. Aside from whether I need to do this, is there in fact an easy way to do it?
[>>>Peck, Jon]
This is strictly a display issue. Internally the values are always kept at full double precision, down to small fractions of a second.

If you want to round the actual values, you can do computes using the rnd function with an appropriate multiplier and addition of a half value.
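As a rough sketch of that rounding, assuming the time variable holds seconds (which is how SPSS stores time values): the Python below shows the "multiplier plus a half" arithmetic on a plain number, and builds a small piece of syntax that rounds a variable to whole seconds before giving it a TIME8 format. The function names, the variable names t and t_rounded, and the default format are hypothetical, chosen only for illustration, and the arithmetic helper assumes non-negative times.

```python
import math


def round_seconds(value, unit=1.0):
    """Round a time in seconds to the nearest multiple of `unit`.

    Scale by the unit, add a half, drop the fraction, scale back.
    Assumes non-negative values.
    """
    return math.floor(value / unit + 0.5) * unit


def rounding_syntax(source, target, time_format="TIME8"):
    """Build SPSS syntax that stores RND(source) in target and formats it."""
    return "\n".join([
        "COMPUTE %s = RND(%s)." % (target, source),
        "FORMATS %s (%s)." % (target, time_format),
        "EXECUTE.",
    ])


# 01:01:01.56 is 3661.56 seconds; rounded to whole seconds it is 01:01:02.
seconds = round_seconds(3661.56)
syntax = rounding_syntax("t", "t_rounded")
```

Inside a BEGIN PROGRAM block the generated text could be passed to spss.Submit(syntax); the stored value itself is then rounded, so the shorter display format no longer hides a fraction.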

HTH,
Jon Peck



Re: Python Adding Files & adjusting time

Fry, Jonathan B.
Adding a CACHE command to the sequence between the ADD FILES command and the EXECUTE should keep the process from slowing down.
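A minimal sketch of where the CACHE command would sit in the per-file command text of the original loop. The helper name add_file_with_cache and the example path are hypothetical; only the ordering ADD FILES, CACHE, EXECUTE comes from the suggestion above.

```python
def add_file_with_cache(sav_path):
    """Build the ADD FILES / CACHE / EXECUTE sequence for one file."""
    return "\n".join([
        "ADD FILES FILE=*",
        "  /FILE='%s'." % sav_path,
        "CACHE.",      # placed between ADD FILES and EXECUTE
        "EXECUTE.",
    ])


command = add_file_with_cache("c:/temp/parts/a.sav")
```

In the original script this string would replace the text passed to Submit inside the loop.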

Jonathan Fry
SPSS Inc.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Thursday, June 26, 2008 11:44 AM
To: [hidden email]
Subject: Re: Python Adding Files & adjusting time

Re: Python Adding Files & adjusting time

Richard Ristow
In reply to this post by Tim AT Home
At 12:27 PM 6/26/2008, Tim Hennigar wrote:

>1) I need to add several hundred (1500+) spss 'sav' files of exactly
>the same variable structure - I wrote a python script that works
>fine - but as the existing file becomes larger - it slows
>considerably - is there a quicker way to aggregate files?
>
>Relevant Current Python Script
>
>#Need variable to hold all sav filenames within 'outpath' directory
>SpssFilesT = os.listdir(outpath)
>SpssFiles = sorted(SpssFilesT)
>#print SpssFiles
>
>try:
>      Submit(r"""
>      GET FILE = '%s'.
>      EXECUTE.
>      """ %(outpath + SpssFiles[0]))
>except:
>     pass
>
>try:
>      #print SpssFiles[1:]
>      for addfle in SpssFiles[1:]:
>           Submit(r"""
>           ADD FILES FILE= *
>                    /FILE= '%s'.
>           EXECUTE.
>           """ %(outpath + addfle))
>except:
>     pass

First, try leaving out the EXECUTE statements. I think that, in
itself, will help noticeably.

Second, as you've surmised, ADD FILES does take longer as the files
being added get larger. The solution is to do fewer ADD FILES
commands, and do more with each one.

Here's pseudo-code. It'll translate to Python code that's more
complicated than what you have, because of breaking up the loop into
an inner and an outer loop, but it should approach 49 times as fast.
(You can't do it with a single ADD FILES, because there's a limit of
50 files on an ADD FILES.)

First,
GET FILE <first_file_on_list>

Then,
Files_still_to_process = ALL_BUT_FIRST_OF(Long_list_of_files)

UNTIL NUMBER_OF(Files_still_to_process) = 0.
    Files_to_process_now   = FIRST_49_OF(Files_still_to_process)
    Files_still_to_process =  DROP_49_OF(Files_still_to_process)
    Command = "ADD FILES /FILE=*"
    UNTIL NUMBER_OF(Files_to_process_now) = 0.
       One_File =             FIRST_01_OF(Files_to_process_now).
       Files_to_process_now =  DROP_01_OF(Files_to_process_now).
       Command = concat(Command,<newline>,"  /FILE = '",One_File,"'")
    END UNTIL
    Command = concat(Command,".")
    SPSS.Submit(Command)
END UNTIL
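
One possible Python translation of the pseudo-code, offered as a sketch rather than tested production code. It assumes the active dataset (*) counts toward the 50-file limit, so each batch adds 49 new files; the function names and the submit parameter are hypothetical. Building the command text separately from submitting it means the batching can be checked outside SPSS. File names containing an apostrophe would need extra quoting that is not handled here.

```python
def build_add_files_commands(sav_files, batch_size=49):
    """Return SPSS commands: one GET FILE, then one ADD FILES per batch."""
    if not sav_files:
        return []
    commands = ["GET FILE='%s'." % sav_files[0]]
    remaining = sav_files[1:]
    for start in range(0, len(remaining), batch_size):
        batch = remaining[start:start + batch_size]
        # The active dataset (*) plus up to batch_size files per command
        lines = ["ADD FILES /FILE=*"]
        lines.extend("  /FILE='%s'" % name for name in batch)
        commands.append("\n".join(lines) + ".")
    return commands


def merge_sav_files(sav_files, submit, batch_size=49):
    """Pass each generated command to `submit` (e.g. spss.Submit)."""
    for command in build_add_files_commands(sav_files, batch_size):
        submit(command)


# Dry run with made-up file names, collecting commands instead of running them
example_files = ["f%d.sav" % i for i in range(120)]
collected = []
merge_sav_files(example_files, collected.append)
```

Inside a BEGIN PROGRAM block the call would look something like merge_sav_files(sorted(glob.glob("c:/temp/parts/*.sav")), spss.Submit).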
