Merging numerous files via macro


Merging numerous files via macro

Jim Van Overschelde

I am new to the group so I hope I post this correctly. 

 

I am processing lots of files with student assessment data.  For one project I have to merge cases from ~120 different files.

Each file is for a different grade, language of test, and administration (e.g., Math3Eng1_2011, Math3Sp1_2011, Math4Eng1_2011, Math5-1_2011, Math6_2011).  Each school year there are 26-34 different files.  I want to be able to easily and efficiently merge the cases from all files for a particular school year.  To make matters more complicated, the file naming structure changes from year to year.  I don’t want to merge the files one at a time because there are over 5M records in total and it takes forever to do 25+ data passes.  I don’t know how to merge multiple files in one data pass unless I hard-code the number of files to merge (and this number varies).

 

I could take a directory listing from the Windows command prompt and then use an editor to add the prefix and suffix to each line so the list can be pasted into a merge statement, but I want a more automated process.

 

Does anybody have any good suggestions?

 

Re: Merging numerous files via macro

Jon K Peck
In reply to this post by Jim Van Overschelde
Easy to do with a few lines of code using Python programmability.  You would need to install the Python Essentials from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral) if you haven't already done that.

Then run this from a syntax window.  Change the filespec line below to select the files, e.g.,
filespec = r"c:/data/*2011.sav"
I've assumed that they are all in the same directory.

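begin program.
import spss, glob

# Wildcard pattern selecting the files to merge (all in one directory).
filespec = r"c:/temp/parts/e*.sav"

# Expand the pattern; sort so the merge order is predictable.
files = sorted(glob.glob(filespec))

# Build one ADD FILES command with a /FILE subcommand per file,
# so that all files are combined in a single data pass.
file_clauses = " ".join('/FILE="%s"' % f for f in files)
spss.Submit("ADD FILES " + file_clauses + ".")
end program.

dataset name merged.
exec.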
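
If the program appears to do nothing, it can help to build the command as a plain string and look at it before submitting. Below is a minimal sketch of that idea in ordinary Python, runnable outside SPSS. The names build_add_files_command and MAX_FILES are hypothetical and not part of the spss module. The cap of 50 files per ADD FILES command is an assumption based on the command documentation, so check it against your version.

```python
MAX_FILES = 50  # assumed ADD FILES limit per command; verify for your SPSS version


def build_add_files_command(paths):
    """Return a single ADD FILES command with one /FILE subcommand per path."""
    paths = sorted(paths)  # predictable merge order
    if not paths:
        # An empty match list usually means the wildcard pattern is wrong.
        raise ValueError("no files matched; check the filespec pattern")
    if len(paths) > MAX_FILES:
        raise ValueError(
            "%d files exceed the assumed limit of %d per command"
            % (len(paths), MAX_FILES)
        )
    clauses = " ".join('/FILE="%s"' % p for p in paths)
    return "ADD FILES " + clauses + "."


# Dry run with made-up file names: print the command instead of submitting it.
print(build_add_files_command(["Math4Eng1_2011.sav", "Math3Eng1_2011.sav"]))
```

Inside a begin program block, the returned string could be passed to spss.Submit once it looks right, with glob.glob(filespec) supplying the paths.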

HTH,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: Merging numerous files via macro

Jim Van Overschelde

I installed the Python Essentials and ran the code, but got no errors, no output, and no action.

I then removed the Python Essentials, reinstalled them as administrator, and it worked great!

Thanks Jon.

 
