zip - rar files


zip - rar files

drfg2008
Can SPSS open zip or rar files (that is, work directly on zip or rar files)?
Dr. Frank Gaeth


Re: zip - rar files

Paul Cook
I would use the HOST command to execute my favorite zip utility.
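
For plain .zip archives, an alternative to calling an external utility through HOST is Python's standard zipfile module, assuming the Python plug-in for SPSS is installed. The sketch below only extracts the archive; the function name extract_zip and the "unzipped_" folder prefix are made up for illustration, and .rar files are not covered because the standard library has no rar support.

```python
import os
import tempfile
import zipfile


def extract_zip(archive_path, target_dir=None):
    """Extract every member of a .zip archive and return the extracted paths."""
    if target_dir is None:
        # Hypothetical default: a fresh temporary folder per archive.
        target_dir = tempfile.mkdtemp(prefix="unzipped_")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        names = archive.namelist()
    return [os.path.join(target_dir, name) for name in names]
```

The returned paths could then be read with the usual SPSS commands (for example GET DATA for text files).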

Kind regards,

Paul Cook



On 11 July 2013 08:55, drfg2008 <[hidden email]> wrote:
can SPSS open zip or rar files (working directly on zip or rar files) ?



-----
Dr. Frank Gaeth
FU-Berlin

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/zip-rar-files-tp5721098.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: zip - rar files

Ruben Geert van den Berg
In reply to this post by drfg2008
Dear Frank,

A while ago, I had data (many .gz files) that didn't fit on my hard disk when unzipped. I solved it like so:

-Loop over the files with Python
-Use gzip.open(...) to unzip a single file
-Read the file into SPSS and reduce its size by aggregating/filtering
-Save as .sav
-Delete the unzipped .txt file
-Jump to next .gz file
-And so on...

Finally, I ran another loop over the .sav files for merging them with ADD FILES (see http://pythonforspss.org/merge-many-data-files.html).

I'm sure there are (much) more efficient ways to do this but this process ran smoothly in a reasonable amount of time.
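
The unzip, process, delete loop described above might look roughly like the sketch below. It is not the original script: the function name process_gz_files and the callback handle_text_file are assumptions, and the callback stands in for the SPSS part (reading the text file, aggregating or filtering, saving as .sav).

```python
import glob
import gzip
import os
import shutil


def process_gz_files(folder, handle_text_file):
    """Unzip each .gz file in turn, hand it to a callback, then delete it."""
    processed = []
    for gz_path in sorted(glob.glob(os.path.join(folder, "*.gz"))):
        txt_path = gz_path[:-3]  # strip the ".gz" suffix
        with gzip.open(gz_path, "rb") as src, open(txt_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream in chunks to keep memory use small
        try:
            handle_text_file(txt_path)  # e.g. read into SPSS, reduce, save as .sav
        finally:
            os.remove(txt_path)  # free the disk space before the next file
        processed.append(gz_path)
    return processed
```

Because each unzipped file is removed before the next one is opened, at most one uncompressed file is on disk at any time.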

Best,

Ruben

Re: zip - rar files

Albert-Jan Roskam
----- Original Message -----

> From: Ruben Geert van den Berg <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Thursday, July 11, 2013 8:55 AM
> Subject: Re: [SPSSX-L] zip - rar files
>
> Dear Frank,
>
> A while ago, I had data (many .gz files) that didn't fit my harddisk if
> unzipped. I solved it like so:
>
> -Loop over the files with Python
> -Use gzip.open(...) to unzip a single file
> -Read the file into SPSS and reduce its size by aggregating/filtering
> -Save as .sav
> -Delete the unzipped .txt file
> -Jump to next .gz file
> -And so on...
>
> Finally, I ran another loop over the .sav files for merging them with ADD
> FILES (see http://pythonforspss.org/merge-many-data-files.html).

Ruben, how about this version? It seems that your version ignores files with extensions other than all-lowercase .sav.

import re
import os
import spss
def superAddFiles(p, resultfile, resultvar="origin"):
    """Use SPSS ADD FILES on all the .sav or .zsav files in path <p> and write
    to <resultfile>. There may be more files than the ADD FILES limit (50). If
    <resultvar> is not None (and a valid variable name), a variable containing
    the source file names is created."""
    savs = [sav for sav in os.listdir(p) if re.match(r".*\.z?sav$", sav, re.I)]
    cmds = "  /file=%(sav)r /in=%(sav)s" if resultvar else "  /file=%(sav)r"
    cmds = [cmds % locals() for n, sav in enumerate(savs)]
    split = ".\nexecute.\nadd files /file=*"
    crap = [cmds.insert(i, split) for i in range(n) if i and i % 50 == 0]
    adds = "add files\n%s%s" % ("\n".join(cmds), "" if n % 50 == 0 else ".")
    label, save = "", "save outfile = %r" % resultfile
    if resultvar:
        label = ("if(%(sav)s)%(resultvar)s=%(i)s.\n" +
                 "add value labels %(resultvar)s %(i)s %(sav)r.\n")
        label = "\n".join([label % locals() for i, sav in enumerate(savs)])
        save += "/drop = %s." % "\n  ".join(savs)
    spss.Submit(["cd %r." % p , adds, label, save])
superAddFiles(os.getenv("temp"), "result.sav")



Re: zip - rar files

Ruben Geert van den Berg
I'm afraid it didn't run to completion when I tested it:

Traceback (most recent call last):
  File "<string>", line 23, in <module> 
  File "<string>", line 14, in superAddFiles
UnboundLocalError: local variable 'n' referenced before assignment

What does "%r" mean?

Best,

Ruben

Re: zip - rar files

Albert-Jan Roskam
Python 2.7.3 (default, Apr 10 2013, 05:46:21) [GCC 4.6.3] on linux2
>>> print "it works just %r over here, I just checked it" % "fine"
it works just 'fine' over here, I just checked it
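
To spell out the difference: %s inserts str(value), while %r inserts repr(value), which for a string includes the surrounding quotes. That is presumably why the function uses %r to build quoted file specifications. A small sketch with a made-up file name:

```python
name = "my file.sav"  # hypothetical file name, for illustration only

with_s = "/file=%s" % name  # %s inserts str(name): no quotes
with_r = "/file=%r" % name  # %r inserts repr(name): a quoted string

print(with_s)  # prints: /file=my file.sav
print(with_r)  # prints: /file='my file.sav'
```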

The only thing that makes the code break is file names with spaces in them (or names that would lead to invalid variable names). But file%20names%20with%20spaces%20suck anyway, e.g. in HTML.
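
One way around the file name problem would be to derive a safe variable name from each file name before using it in /in=. The sketch below is deliberately conservative and is not a full implementation of the SPSS naming rules: to_variable_name is a hypothetical helper, the "v" prefix is an arbitrary choice, and reserved keywords and name collisions are not handled.

```python
import os
import re


def to_variable_name(filename):
    """Turn a file name into a conservative variable name (a simplification)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    name = re.sub(r"[^A-Za-z0-9_]", "_", stem)  # spaces, dots, dashes -> "_"
    if not re.match(r"[A-Za-z]", name):
        name = "v" + name  # make sure the name starts with a letter
    return name[:64]  # stay within the variable name length limit
```

For example, to_variable_name("my file.sav") gives "my_file".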

'n' should be available:
cmds = [cmds % locals() for n, sav in enumerate(savs)]
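
A possible explanation for the traceback, not confirmed in this thread: in Python 2 the loop variable of a list comprehension leaks into the enclosing function, but only if the comprehension body runs at least once. If the folder contains no matching files, savs is empty, n is never assigned, and the later range(n) raises UnboundLocalError. In Python 3, comprehension variables do not leak at all. A minimal sketch that avoids relying on the leak (build_file_clauses is a hypothetical name):

```python
def build_file_clauses(savs):
    """Return the /file= subcommands and the index of the last file."""
    if not savs:
        # With an empty list the comprehension body never runs, so nothing
        # would ever be assigned to a loop variable such as n.
        raise ValueError("no .sav or .zsav files found")
    clauses = ["  /file=%r" % sav for sav in savs]
    n = len(savs) - 1  # same value the leaked enumerate() index would have had
    return clauses, n
```

Computing n explicitly makes the empty-folder case fail with a readable message instead of an UnboundLocalError.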
 
Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 




Re: zip - rar files

Ruben Geert van den Berg
I was a bit surprised by the error regarding "n" as well... Perhaps it is indeed a space in a path that causes the trouble. I always avoid them, but Windows doesn't:

os.getenv("temp")

returns

C:\DOCUME~1\Work\LOCALS~1\Temp

Fully written out, it would start with "c:\DOCUMENTS AND SETTINGS..." (note the spaces).