dear all
This is the problem: I have many (approx. 600) CSV files that I need to import and finally merge into a single SPSS system file.
- the name of each CSV file needs to become a value in a variable
- the date/time stamp of each file needs to become a value of a time variable

Some time ago (two years) I asked for assistance with an almost identical problem, for which Albert-Jan Roskam provided me with the Python code below, which back then worked like clockwork. However, not anymore.
I also understand there is a new package, SPSSAUX, which should do the trick. However, going through the programming PDF and looking at examples was not very helpful (mind you, I am not a real programmer). So any hints on where to look, or some generic example of SPSSAUX in the context of this problem, would be greatly appreciated.

best regards
Maurice

*** merge all separate files in one single text file.
BEGIN PROGRAM.
import glob, spss, csv, os
fs = sorted(glob.glob("V:/19082012/*.csv"))
merged = "d:/temp/merged.txt"
m = open(merged, "wb")
writer = csv.writer(m, delimiter="\t")
for fno, f in enumerate(fs):
    if fno % 50 == 0:
        print "--> Processed file %s\n" % fno
    reader = csv.reader(open(f, "rU"), delimiter=",")
    if fno > 0:
        # keep the header row of the first file only
        skipheader = reader.next()
    for lino, line in enumerate(reader):
        if fno == 0 and lino == 0:
            # "bestand" (Dutch for "file") holds the source file name
            header = writer.writerow(line + ["bestand"])
        else:
            writer.writerow(line + [os.path.basename(f)])
m.close()
cmd = r"""
new file.
get data /type = txt /file = '%s' /delcase = line /delimiters = "\t"
 /arrangement = delimited /firstcase = 2 /importcase = all /variables =
 x f18.2
 source f1.0
 bestand a40
 .
cache.
fre source.
"""
print cmd % (merged)
spss.Submit(cmd % (merged))
spss.Submit("save outfile = '%s.sav'." % (merged[:-4]))
END PROGRAM.
--
___________________________________________________________________
Maurice Vergeer
To contact me, see http://mauricevergeer.nl/node/5
To see my publications, see http://mauricevergeer.nl/node/1
___________________________________________________________________

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
Hi Maurice,
Wonder why my code no longer works ;-) What errors are you getting?
Below are two methods to calculate a datestamp. A time stamp can be made by using e.g. "%H:%M:%S". In this case, the file modification date is used. The "time" method may be slightly faster, but the datetime method is really neat when you'd like to do arithmetic with dates/times.
>>> import time, datetime, os
>>> f = "d:/temp/somefile.csv"
>>> iso_mdate = time.strftime("%Y-%m-%d", time.localtime(os.path.getmtime(f)))
>>> mtime = datetime.datetime.fromtimestamp(os.path.getmtime(f))
>>> peilmoment = datetime.datetime(2012, 8, 14, 15, 0, 0, 0)
>>> print (mtime - peilmoment).days

Regards,
Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
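
For anyone reading this on a newer Python: the two approaches above can be wrapped in one small function. This is only a sketch in Python 3 syntax (the session above is Python 2), and the function name `file_datestamps` and its `reference` argument are made up for illustration, not part of any SPSS module.

```python
import datetime
import os
import time


def file_datestamps(path, reference):
    """Return (ISO date, clock time, whole days since `reference`)
    based on the modification time of the file at `path`."""
    mtime_seconds = os.path.getmtime(path)  # seconds since the epoch

    # Method 1: the time module, formatting the local time directly.
    local = time.localtime(mtime_seconds)
    iso_mdate = time.strftime("%Y-%m-%d", local)
    clock_time = time.strftime("%H:%M:%S", local)

    # Method 2: a datetime object, which supports date arithmetic.
    mtime = datetime.datetime.fromtimestamp(mtime_seconds)
    days_since = (mtime - reference).days

    return iso_mdate, clock_time, days_since
```

Called as `file_datestamps("d:/temp/somefile.csv", datetime.datetime(2012, 8, 14, 15, 0, 0))`, it gives the date stamp, the time stamp, and the number of whole days between the reference moment and the file's modification time.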
|
Hi again,
The following method does not need spss (http://code.activestate.com/recipes/577811-python-reader-writer-for-spss-sav-files-linux-mac-/)
Note that it sometimes gives errors with e.g. Chinese variable names, so YMMV. The code below is untested.
import os, sys, time, glob, csv

# this is where the next .py file + spssio32.dll live.
sys.path.append(r"file://server/share/folder/subfolder")
# http://code.activestate.com/recipes/577811-python-reader-writer-for-spss-sav-files-linux-mac-/
from SavReaderWriter import *

tempdir = os.getenv("temp")
header = ["x", "iso_mdate", "filename"]
# 0 = numeric variable; n > 0 = string variable of width n
varTypes = {'x': 0, 'iso_mdate': 10, 'filename': 200}
savFileName = os.path.join(tempdir, "combined.sav")
with SavWriter(savFileName, header, varTypes) as sav:
    for n, csvfile in enumerate(sorted(glob.glob(os.path.join(tempdir, "*.csv")))):
        with open(csvfile, "rb") as f:
            reader = csv.reader(f, delimiter=";")
            skipheader = reader.next()  # drop each file's header row
            iso_mdate = time.strftime("%Y-%m-%d", time.localtime(os.path.getmtime(f.name)))
            for line in reader:
                sav.writerow(line + [iso_mdate, f.name])

Regards,
Albert-Jan
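
If the SavWriter route is not an option, the same bookkeeping (file name plus modification date appended to every row) can be done with the standard library alone, writing one tab-delimited text file that GET DATA can read afterwards. The sketch below uses Python 3 syntax; the function name `merge_csv_files`, the appended column names, and the UTF-8 encoding are assumptions for illustration, so adjust them to the actual files.

```python
import csv
import glob
import os
import time


def merge_csv_files(pattern, merged_path, delimiter=","):
    """Merge all CSV files matching `pattern` into one tab-delimited file.

    Two columns are appended to every row: the base name of the source
    file and its modification date in ISO format. Returns the number of
    data rows written (the header row is not counted).
    """
    rows_written = 0
    header_written = False
    # UTF-8 is an assumption; use the encoding the files really have.
    with open(merged_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        for path in sorted(glob.glob(pattern)):
            name = os.path.basename(path)
            iso_mdate = time.strftime(
                "%Y-%m-%d", time.localtime(os.path.getmtime(path)))
            with open(path, newline="", encoding="utf-8") as source:
                reader = csv.reader(source, delimiter=delimiter)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to copy
                if not header_written:
                    # Keep the header of the first non-empty file only.
                    writer.writerow(header + ["filename", "iso_mdate"])
                    header_written = True
                for row in reader:
                    writer.writerow(row + [name, iso_mdate])
                    rows_written += 1
    return rows_written
```

A call such as `merge_csv_files("V:/19082012/*.csv", "d:/temp/merged.txt")` would mirror the paths used earlier in this thread. Keep the merged file outside the set of files matched by the pattern, otherwise it would be picked up as input on the next run.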
|
Hi Albert-Jan,
Thanks. I tinkered with your initial code from two years back; it didn't work because the CSV files are structured differently now. For instance, the standard delimiter is a comma, but text fields are also enclosed in double quotes ("). I do not yet see how to implement this. The text field contains commas as well, so the " is needed as a text qualifier. I will try to test your code.

One thing though: I already have trouble finding the suggested spssio32.dll file on the IBM site. Also, the 32 suggests this is a 32-bit DLL? I run 64-bit Windows, so would I need a 64-bit version?

I'll get back when I have more.

Thanks
Maurice

On Wed, Aug 22, 2012 at 1:24 PM, Albert-Jan Roskam <[hidden email]> wrote:
> Hi again,
> [...]
>
> From: Maurice Vergeer <[hidden email]>
> To: [hidden email]
> Sent: Tuesday, August 21, 2012 8:57 PM
> Subject: [SPSSX-L] file name and time stamp as variables. help on SPSSAUX (?) requested
> [...]
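
On the quoted text fields: Python's csv module treats the double quote as the quote character by default, so a comma inside a quoted field does not split the field. A minimal sketch in Python 3 syntax follows; the sample data and the helper name `parse_quoted_csv` are invented for illustration.

```python
import csv
import io

# Invented sample: the second column is quoted and contains commas,
# and one value contains a doubled quote ("") as an escaped quote.
sample = (
    'id,comment,score\n'
    '1,"fast, cheap, and reliable",8\n'
    '2,"she said ""fine""",6\n'
)


def parse_quoted_csv(text):
    """Parse comma-delimited text whose text fields are wrapped in double quotes."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=",", quotechar='"')
    return list(reader)


rows = parse_quoted_csv(sample)
for row in rows:
    print(row)
```

Each row comes back with three fields, and the quotes themselves are stripped from the values. On the SPSS side, GET DATA has a QUALIFIER subcommand for the same purpose when reading delimited text directly; whether it fits these particular files is something to try out.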