I am new to the group, so I hope I post this correctly. I am processing lots of files with student assessment data. For one project I have to merge cases from ~120 different files. Each file is for a different grade, test language, and administration (e.g., Math3Eng1_2011, Math3Sp1_2011, Math4Eng1_2011, Math5-1_2011, Math6_2011). Each school year there are 26-34 different files. I want to merge the cases from all files for a particular school year easily and efficiently. To make matters more complicated, the file naming structure changes from year to year. I don’t want to merge files one at a time: there are over 5M records in total, and 25+ data passes take forever. I don’t know how to merge multiple files in one data pass unless I hard-code the number of files to merge, and that number varies. I could take a directory listing via the Windows command prompt, then use an editor to add the prefixes and suffixes to each line so it can be pasted into a merge statement, but I wanted a more automated process. Does anybody have any good suggestions?
In reply to this post by Jim Van Overschelde
Easy to do with a few lines of code using
Python programmability. You would need to install the Python Essentials
from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral)
if you haven't already done that.
Then run this from a syntax window. Change the filespec line below to select the files, e.g., filespec = r"c:/data/*2011.sav". I've assumed that they are all in the same directory.

begin program.
import spss, glob
# Pattern for the files to stack; edit it to match your directory and file names.
filespec = r"c:/temp/parts/e*.sav"
# Sort so the cases are stacked in a predictable (alphabetical) order.
files = sorted(glob.glob(filespec))
if not files:
    raise ValueError("No files match " + filespec)
# One /FILE subcommand per file, so ADD FILES stacks them all in a single data pass.
file_args = " ".join(['/FILE="%s"' % f for f in files])
spss.Submit("ADD FILES " + file_args + ".")
end program.
dataset name merged.
exec.

HTH,
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM

From: Jim Van Overschelde
Date: 07/25/2012 01:33 PM
Subject: [SPSSX-L] Merging numerous files via macro
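The program above builds its ADD FILES command inline. The question also asks for one merge per school year when the naming changes between years. The sketch below is pure Python and runs outside SPSS. It groups file names by year and builds one ADD FILES command per group. The helper names group_by_year and build_add_files, the year-matching regex, and the sample file names are hypothetical. They are not part of Jon's code. The sketch assumes that each file name ends in a four-digit year just before ".sav", as the example names in the question do.

```python
import os
import re
from collections import defaultdict

# Hypothetical pattern: assumes each name ends in a 4-digit year just before
# ".sav" (e.g. "Math3Eng1_2011.sav"). Adjust it if a year's naming differs.
YEAR_PATTERN = re.compile(r"(\d{4})\.sav$", re.IGNORECASE)


def group_by_year(paths):
    """Return {year: sorted list of paths}, skipping names without a year suffix."""
    groups = defaultdict(list)
    for path in paths:
        match = YEAR_PATTERN.search(os.path.basename(path))
        if match:
            groups[match.group(1)].append(path)
    return {year: sorted(files) for year, files in groups.items()}


def build_add_files(paths):
    """Build one ADD FILES command with a /FILE subcommand per path."""
    if not paths:
        raise ValueError("no files matched; ADD FILES needs at least one /FILE")
    file_args = " ".join('/FILE="%s"' % p for p in paths)
    return "ADD FILES " + file_args + "."


if __name__ == "__main__":
    # Hypothetical sample names; "notes.txt" is skipped because it has no year suffix.
    names = ["Math3Eng1_2011.sav", "Math3Sp1_2011.sav", "Math6_2012.sav", "notes.txt"]
    by_year = group_by_year(names)
    print(build_add_files(by_year["2011"]))
```

Inside a BEGIN PROGRAM block, you could pass the result to spss.Submit, for example spss.Submit(build_add_files(group_by_year(glob.glob(r"c:/data/*.sav"))["2011"])), and follow it with DATASET NAME as in Jon's example. Each generated command still stacks all of one year's files in a single data pass. If a year's files do not end in the year, the regex would need to change.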
I installed Python Essentials and ran the code, but got no errors, no output, and no action. I removed Python Essentials, reinstalled it as administrator, and it worked great!!! Thanks, Jon.