Can SPSS open zip or rar files (working directly on zip or rar files)?
Dr. Frank Gaeth
|
I would use the HOST command to execute my favorite zip utility.
On 11 July 2013 08:55, drfg2008 <[hidden email]> wrote:
> can SPSS open zip or rar files (working directly on zip or rar files) ?
|
In reply to this post by drfg2008
Dear Frank,
A while ago, I had data (many .gz files) that didn't fit on my hard disk if unzipped. I solved it like so:

-Loop over the files with Python
-Use gzip.open(...) to unzip a single file
-Read the file into SPSS and reduce its size by aggregating/filtering
-Save as .sav
-Delete the unzipped .txt file
-Jump to next .gz file
-And so on...

Finally, I ran another loop over the .sav files for merging them with ADD FILES (see http://pythonforspss.org/merge-many-data-files.html). I'm sure there are (much) more efficient ways to do this but this process ran smoothly in a reasonable amount of time.

Best,
Ruben
|
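The unzip, process, delete loop described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions, not Ruben's actual script: the function name process_gz_files and the callback handle_text_file are hypothetical, and the callback merely stands in for the SPSS steps (read the text file, aggregate/filter, save as .sav). It also assumes file names of the form something.txt.gz.

```python
import gzip
import os
import shutil


def process_gz_files(folder, handle_text_file):
    """Unzip each .gz file in <folder> in turn, process it, then delete the unzipped copy.

    handle_text_file is a hypothetical callback standing in for the SPSS steps;
    it receives the path of the unzipped file.
    """
    processed = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".gz"):
            continue
        gz_path = os.path.join(folder, name)
        txt_path = gz_path[:-3]  # strip ".gz"; assumes names like data.txt.gz
        # Copy in chunks so the whole file never has to fit in memory.
        with gzip.open(gz_path, "rb") as src, open(txt_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        try:
            handle_text_file(txt_path)
        finally:
            os.remove(txt_path)  # free the disk space before the next file
        processed.append(name)
    return processed
```

Because the unzipped copy is removed in the finally clause, only one unzipped file exists on disk at any time, even if the processing step fails.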
----- Original Message -----
> From: Ruben Geert van den Berg <[hidden email]>
> To: [hidden email]
> Sent: Thursday, July 11, 2013 8:55 AM
> Subject: Re: [SPSSX-L] zip - rar files
>
> Finally, I ran another loop over the .sav files for merging them with ADD
> FILES (see http://pythonforspss.org/merge-many-data-files.html).

Ruben, how about this version? It seems that your version ignores files with extensions other than all-lowercase .sav.

import re
import os
import spss

def superAddFiles(p, resultfile, resultvar="origin"):
    """Use Spss ADD FILES on all the .sav or .zsav files in path <p> and
    write to <resultfile>. This may be more than the ADD FILES limit (50).
    If <resultvar> is not None (and a valid variable name), a variable
    containing the source file names is created.
    """
    # match .sav and .zsav, ignoring case
    savs = [sav for sav in os.listdir(p) if re.match(r".*\.z?sav$", sav, re.I)]
    cmds = " /file=%(sav)r /in=%(sav)s" if resultvar else " /file=%(sav)r"
    cmds = [cmds % locals() for n, sav in enumerate(savs)]
    # start a new ADD FILES command after every 50 subcommands
    split = ".\nexecute.\nadd files /file=*"
    crap = [cmds.insert(i, split) for i in range(n) if i and i % 50 == 0]
    adds = "add files\n%s%s" % ("\n".join(cmds), "" if n % 50 == 0 else ".")
    label, save = "", "save outfile = %r" % resultfile
    if resultvar:
        label = ("if(%(sav)s)%(resultvar)s=%(i)s.\n" +
                 "add value labels %(resultvar)s %(i)s %(sav)r.\n")
        label = "\n".join([label % locals() for i, sav in enumerate(savs)])
        save += "/drop = %s." % "\n ".join(savs)
    spss.Submit(["cd %r." % p, adds, label, save])

superAddFiles(os.getenv("temp"), "result.sav")
|
I'm afraid it didn't make the finish upon testing:
Traceback (most recent call last):
  File "<string>", line 23, in <module>
  File "<string>", line 14, in superAddFiles
UnboundLocalError: local variable 'n' referenced before assignment

What does "%r" mean?

Best,
Ruben
|
Python 2.7.3 (default, Apr 10 2013, 05:46:21) [GCC 4.6.3] on linux2
>>> print "it works just %r over here, I just checked it" % "fine"
it works just 'fine' over here, I just checked it

The only thing that makes the code break is file names with spaces in them (or that would lead to invalid variable names). But file%20names%20with%20spaces%20suck in e.g. html.

'n' should be available:
cmds = [cmds % locals() for n, sav in enumerate(savs)]

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
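On the "%r" question: in Python's %-style formatting, %s inserts str(value) and %r inserts repr(value), which for a string includes the surrounding quotes. That is why the interpreter session printed 'fine' with quotes, and why the posted code can write /file=%(sav)r without adding quotes itself. A small sketch (the variable names are only illustrative, and it uses print() as a function so that it also runs on Python 3, unlike the Python 2 session in the thread):

```python
filename = "result.sav"

# %s inserts str(value): the bare text, no quotes.
plain = "save outfile = %s" % filename

# %r inserts repr(value): for a string, that includes the quotes.
quoted = "save outfile = %r" % filename

print(plain)   # save outfile = result.sav
print(quoted)  # save outfile = 'result.sav'
```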
I was a bit surprised as well about the error regarding "n"... Perhaps it is a space in a path that causes the trouble indeed. I always avoid them but Windows doesn't:
os.getenv("temp") returns C:\DOCUME~1\Work\LOCALS~1\Temp
Fully written out, it would start with "c:\DOCUMENTS AND SETTINGS..." (note the spaces)
|
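One other possibility, which the thread does not confirm, is that the error has nothing to do with spaces: a loop variable such as 'n' is only assigned if the loop runs at least once. In Python 2 the variable of a list comprehension leaks into the enclosing function (in Python 3 it does not), so if the folder contained no .sav or .zsav files at all, 'n' would never be bound and range(n) would raise exactly this UnboundLocalError. The sketch below reproduces the effect with a plain for loop so that it behaves the same on Python 2 and 3; the function names are hypothetical and not part of the posted code.

```python
def last_index(names):
    # Same pattern as in superAddFiles: 'n' is only assigned inside the loop.
    for n, name in enumerate(names):
        pass
    return n  # unbound if 'names' was empty


def last_index_safe(names):
    n = -1  # default, so 'n' exists even when nothing matches
    for n, name in enumerate(names):
        pass
    return n


print(last_index(["a.sav", "b.sav"]))  # 1

try:
    last_index([])
except UnboundLocalError as err:
    print("empty list: %s" % err)

print(last_index_safe([]))  # -1
```

Giving 'n' a default, or returning early when the list of matching files is empty, would avoid the error in that situation.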