Hi SPSS team, I know about reading files from Excel into SPSS and I know that it is possible to combine multiple files using Add Files with /IN=. I want to learn the best way to read in multiple (>40) identically formatted Excel files into a single SPSS file while assigning a value for a single variable that identifies each source file (e.g. SiteID). I suspect there is an easier way than to read in and save each file, then add them with /IN=, then combine the 40+ IN variables into a single variable, but am not finding the best terms to use to search in help or in the SPSSx-L archives. Thoughts? Thanks, Melissa This correspondence contains proprietary information some or all of which may be legally privileged; it is for the intended recipient only. If you are not the intended recipient you must not use, disclose, distribute, copy, print, or rely on this correspondence and completely dispose of the correspondence immediately. Please notify the sender if you have received this email in error. NOTE: Messages to or from the State of Connecticut domain may be subject to the Freedom of Information statutes and regulations. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
I’m assuming that the all the 40 files are exactly the same in terms of variable names and number and variable order. I did something very much, maybe exactly, like you describe in May 2018 and I used something called SPSSINC PROCESS FILES
and which Jon was generous in telling me about, helping me with, and answering specific questions about the file that reads the data files. It may be that the magic command function is now called Process Data Files in the Utilities dropdown. I have no idea
where or if it is documented. If SPSSINC PROCESS FILES is what you need I can send you examples.
Gene Maguin From: SPSSX(r) Discussion <[hidden email]>
On Behalf Of Ives, Melissa L Hi SPSS team, I know about reading files from Excel into SPSS and I know that it is possible to combine multiple files using Add Files with /IN=. I want to learn the best way to read in multiple (>40) identically formatted Excel files into a single SPSS file while assigning a value for a single variable that identifies each source file (e.g. SiteID). I suspect there is an easier way than to read in and save each file, then add them with /IN=, then combine the 40+ IN variables into a single variable, but am not finding the best terms to use to search in help or in the SPSSx-L archives. Thoughts? Thanks, Melissa
===================== To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
|
In reply to this post by MLIves
The first issue that often arises in this scenario is that a string variable might come in with different widths from different sheets, since Excel doesn't have a concept of width. If that happens, you can't merge the files. (STAR JOIN is more permissive, but I wouldn't recommend it.) I had a client once who had 300 Excel files to merge with this problem. The solution for that is to read in these files one by one and synchronize the widths. That would be a royal pain. Fortunately, there is an extension command, STATS ADJUST WIDTHS that can do this for you. You can give it a wildcard list of files, e.g. files="c:\toconvert\*.sav" to process all of them easily and resave. But this only works with sav files or open datasets. So you need first to convert the Excel files to sav files. For that, there is another extension command, STATS PROCESS FILES, that applies a syntax file to each file. It also accepts wildcards for the files to process. The syntax file to apply would just have a GET DATA command and a SAVE OUTFILE command to apply to each input. You would use a macro in that file to construct the SAV name from the Excel name - there is an example in the syntax help. You could put a compute in there to identify the file based on the file name or a sequence number or some variable in the input. Then apply ADJUST WIDTHS if needed. Finally you need to merge all these files with ADD CASES. Rather than listing them all manually, a few lines of Python code, which I can supply, would construct the ADD CASES command from a wildcard specification, or you could cobble together a list by copying a directory listing. If you are building a repeatable process, I wouldn't recommend that. On Mon, Nov 23, 2020 at 11:41 AM Ives, Melissa L <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |