Read in multiple Excel files while assigning a value for which file

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Read in multiple Excel files while assigning a value for which file

MLIves

Hi SPSS team,

 

I know about reading files from Excel into SPSS and I know that it is possible to combine multiple files using Add Files with /IN=.

 

I want to learn the best way to read in multiple (>40) identically formatted Excel files into a single SPSS file while assigning a value for a single variable that identifies each source file (e.g. SiteID).

 

I suspect there is an easier way than to read in and save each file, then add them with /IN=, then combine the 40+ IN variables into a single variable, but am not finding the best terms to use to search in help or in the SPSSx-L archives.

 

Thoughts?

 

Thanks,

Melissa

 

 




This correspondence contains proprietary information some or all of which may be legally privileged; it is for the intended recipient only. If you are not the intended recipient you must not use, disclose, distribute, copy, print, or rely on this correspondence and completely dispose of the correspondence immediately. Please notify the sender if you have received this email in error. NOTE: Messages to or from the State of Connecticut domain may be subject to the Freedom of Information statutes and regulations.

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Read in multiple Excel files while assigning a value for which file

Maguin, Eugene

I’m assuming that the all the 40 files are exactly the same in terms of variable names and number and variable order. I did something very much, maybe exactly, like you describe in May 2018 and I used something called SPSSINC PROCESS FILES and which Jon was generous in telling me about, helping me with, and answering specific questions about the file that reads the data files. It may be that the magic command function is now called Process Data Files in the Utilities dropdown. I have no idea where or if it is documented.

If SPSSINC PROCESS FILES is what you need I can send you examples.

Gene Maguin

 

From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Ives, Melissa L
Sent: Monday, November 23, 2020 1:42 PM
To: [hidden email]
Subject: Read in multiple Excel files while assigning a value for which file

 

Hi SPSS team,

 

I know about reading files from Excel into SPSS and I know that it is possible to combine multiple files using Add Files with /IN=.

 

I want to learn the best way to read in multiple (>40) identically formatted Excel files into a single SPSS file while assigning a value for a single variable that identifies each source file (e.g. SiteID).

 

I suspect there is an easier way than to read in and save each file, then add them with /IN=, then combine the 40+ IN variables into a single variable, but am not finding the best terms to use to search in help or in the SPSSx-L archives.

 

Thoughts?

 

Thanks,

Melissa

 

 

 



This correspondence contains proprietary information some or all of which may be legally privileged; it is for the intended recipient only. If you are not the intended recipient you must not use, disclose, distribute, copy, print, or rely on this correspondence and completely dispose of the correspondence immediately. Please notify the sender if you have received this email in error. NOTE: Messages to or from the State of Connecticut domain may be subject to the Freedom of Information statutes and regulations.

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Read in multiple Excel files while assigning a value for which file

Jon Peck
In reply to this post by MLIves
The first issue that often arises in this scenario is that a string variable might come in with different  widths from different sheets, since Excel doesn't have a concept of width.  If that happens, you  can't merge the files.  (STAR JOIN is more permissive, but I wouldn't recommend it.)  I had a client once who had 300 Excel files to merge with this problem.

The solution for that is to read in these files one by one and synchronize the widths.  That would be a royal pain.  Fortunately, there is an extension command, STATS ADJUST WIDTHS that can do this for you.  You  can give it a wildcard list of files, e.g. files="c:\toconvert\*.sav" to process all of them easily and resave.

But this only  works with sav files or open datasets.  So you need first to convert the Excel files to sav files.  For that, there is another extension command, STATS PROCESS FILES, that applies a syntax file to each file.  It also accepts wildcards for the files to process.  The syntax file to apply would just have a GET DATA command and a SAVE OUTFILE command to apply to each input.  You would use a macro in that file to construct the SAV name from the Excel name - there is an example in the syntax help.  You could put a compute in there to identify the file based on the file name or a sequence number or some variable in the input.

Then apply ADJUST WIDTHS if needed.

Finally you  need to merge all these files with ADD CASES.  Rather than listing them all manually, a few lines of Python code, which I can supply, would construct the ADD CASES command from a wildcard specification, or you could cobble together a list by copying a directory listing.  If you are building a repeatable process, I wouldn't recommend that.





On Mon, Nov 23, 2020 at 11:41 AM Ives, Melissa L <[hidden email]> wrote:

Hi SPSS team,

 

I know about reading files from Excel into SPSS and I know that it is possible to combine multiple files using Add Files with /IN=.

 

I want to learn the best way to read in multiple (>40) identically formatted Excel files into a single SPSS file while assigning a value for a single variable that identifies each source file (e.g. SiteID).

 

I suspect there is an easier way than to read in and save each file, then add them with /IN=, then combine the 40+ IN variables into a single variable, but am not finding the best terms to use to search in help or in the SPSSx-L archives.

 

Thoughts?

 

Thanks,

Melissa

 

 




This correspondence contains proprietary information some or all of which may be legally privileged; it is for the intended recipient only. If you are not the intended recipient you must not use, disclose, distribute, copy, print, or rely on this correspondence and completely dispose of the correspondence immediately. Please notify the sender if you have received this email in error. NOTE: Messages to or from the State of Connecticut domain may be subject to the Freedom of Information statutes and regulations.

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD