|
Hello SPSS experts,
I have about 10 different data sets that I need to merge into one file. This merge is adding records and not variables, so each data file has the exact same variable names. The problem is that most of the variables are string format and for some reason there are different widths for these variables in the corresponding data sets. As you know, to "add cases" the widths of the string variables have to match when doing the merge. Since there are about 30 variables in each file, it gets to be quite a pain in the derriere to have to manually change each variable to match a "master" format.

Is there a way that I can set a particular file as the standard file and it will automatically adjust the variable lengths of the other files when I do the merge? Is there some other way that you know to add cases when the variable lengths don't match?

Any help would be very appreciated!

Matt

====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
|
|
Matthew,
Opening each file, adjusting the format in the Variable View window, and resaving is the default method. Somebody else may have a better solution, but doing it in syntax is not much easier: the drill is basically the same, just a bit more involved.

Get file='xxx'.
Rename variables (x1 to x30=s1 to s30).
String x1(a10) x2(a5) etc.
Do repeat x=x1 to x30/s=s1 to s30.
+ compute x=s.
End repeat.
Save outfile='xxx'/drop=s1 to s30.

You could make this into a macro, but to me it's spuds or potatoes.

Gene Maguin
|
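Gene's rename-and-copy recipe can also be generated rather than typed by hand. The following is only a minimal sketch in Python that builds the syntax text. The function name widen_syntax, the "_old" suffix for the temporary names, and the file paths are assumptions made for illustration, and the generated syntax has not been run against a real SPSS installation here.

```python
def widen_syntax(sav_path, out_path, widths):
    """Return SPSS syntax that re-creates each string variable at a new width.

    widths maps variable name -> target width, e.g. {"x1": 10}.
    All names and paths are illustrative placeholders (assumptions).
    """
    names = list(widths)
    # Temporary names for the original variables (hypothetical suffix).
    temps = [name + "_old" for name in names]
    lines = [
        "GET FILE='%s'." % sav_path,
        "RENAME VARIABLES (%s = %s)." % (" ".join(names), " ".join(temps)),
    ]
    for name, temp in zip(names, temps):
        # Declare the variable at the target width, then copy the old value.
        lines.append("STRING %s (A%d)." % (name, widths[name]))
        lines.append("COMPUTE %s = %s." % (name, temp))
    # Drop the renamed originals when saving.
    lines.append("SAVE OUTFILE='%s' /DROP=%s." % (out_path, " ".join(temps)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(widen_syntax("file_a.sav", "file_a_fixed.sav", {"x1": 10, "x2": 5}))
```

As in the hand-written version, the re-created variables end up at the end of the file, and a target width narrower than the data will truncate values.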
|
In reply to this post by Hoover, Matthew
First, is there some reason why the string lengths are different? That sounds suspicious to me, and I would do some basic frequencies on these variables from the different files to see what's going on here.

Second, is there some primary source from which these files are being extracted? If so, I'd step back and look at the extraction process to see whether the problem can be solved upstream.

Third, is there some reason why the original names must be retained? Why not just create new variables from the existing variables in the same manner and save the whole renaming process when the files are combined? That's a fairly easy piece of syntax to write and execute against each file. It also has the side effect of preserving the original data: unless you go with the maximum string length across all the files, you run the risk of losing data through truncation.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hoover, Matthew
Sent: Tuesday, December 18, 2007 8:45 AM
To: [hidden email]
Subject: adjusting string width?
|
|
In reply to this post by Hoover, Matthew
Shalom

A simple way to do what you need is to export all your files and reread them. One way of doing it is to use the WRITE command. If all your files have the same variable names, you can write all of them using the same WRITE command. Because the variables are strings, you should use the widest string width in the WRITE command, and consider using trim to ensure that every string begins in the first column. Here is a general example using a macro.

set mprint yes.
define write_str (!positional !tokens(1)) .
get file=!concat(!eval(!1), '.sav') .
write outfile=!concat(!eval(!1),'_n.sav') /str1 to str30(30a22) .
execute .
!enddefine .
write_str file_a .
. . .
write_str file_z .
add files file= file_a_n.sav /
. . .
file= file_z_n.sav /

Hillel Vard BGU
|
|
In reply to this post by ViAnn Beadle
If you paste this Python script and follow the instructions for setting the dir and length variables, you can set the length of all string variables to be equal for all SPSS files in a given directory. That should help your merge:

BEGIN PROGRAM.
import sys, os, spss, spssaux
"""This script sets the length of all string variables to the value of
"length" for all files in the specified directory. It then saves each
file with the same name, but with "_rfmt" appended to it. This was done
to preserve the original files. The only difference between the original
and the modified file is that all string variables will be at the end,
rather than the beginning, of the dataset.

You have to set the following variables:
dir - directory where your files reside
length - number of characters for all string variables

To run this from the syntax editor, wrap it in a BEGIN PROGRAM/END PROGRAM"""
dir = "c:/"
length = "150"
filelist = os.listdir(dir)
for file in filelist:
    if file[-3:] == 'sav':
        spss.Submit("get file '%s%s'." % (dir, file))
        # Collect the names of all string variables (VariableType > 0).
        strvarlist = []
        varDict = spssaux.VariableDict()
        for e in varDict:
            if e.VariableType > 0:
                strvarlist.append(e.VariableName)
        # Re-create each string variable at the new length under its old name.
        for e in strvarlist:
            spss.Submit("""String %s_rfmt (A%s).
compute %s_rfmt = %s.
exe.
delete variables %s.
rename variables (%s_rfmt = %s).""" % (e, length, e, e, e, e, e))
        spss.Submit("save outfile '" + dir + file[0:-4] + "_rfmt.sav'.")
END PROGRAM.

-David
|
|
In reply to this post by ViAnn Beadle
Of course, to run the Python script, you must have the Python integration plug-in installed and have downloaded the spss and spssaux modules.
-David
|
|
In reply to this post by Hoover, Matthew
SPSS 16 has a new ALTER TYPE command that makes it much easier to change string widths than it used to be. And with multiple dataset capabilities, you can just open the other dataset, run ALTER TYPE, and do the merge with that open dataset.
You could also write a Python program to open the dataset, adjust the widths, and issue the merge syntax.

Regards,
Jon Peck
|
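For SPSS 16 or later, the ALTER TYPE route Jon describes might look roughly like the sketch below. The Python helper only assembles the syntax text. The function name, the dataset names master and other, the file paths and the widths are all hypothetical, and the generated commands have not been run against a real installation here.

```python
def alter_type_merge_syntax(master_path, other_path, widths):
    """Build SPSS 16+ syntax: open both files, widen strings, add cases.

    widths maps variable name -> width used in the master file.
    Dataset names "master" and "other" are placeholders (assumptions).
    """
    lines = [
        "GET FILE='%s'." % master_path,
        "DATASET NAME master.",
        "GET FILE='%s'." % other_path,
        "DATASET NAME other.",
    ]
    # One ALTER TYPE command per variable keeps the generated syntax simple.
    for name, width in widths.items():
        lines.append("ALTER TYPE %s (A%d)." % (name, width))
    # Merge the two open datasets by adding cases.
    lines.append("ADD FILES /FILE=master /FILE=other.")
    lines.append("EXECUTE.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(alter_type_merge_syntax("file_a.sav", "file_b.sav",
                                  {"str1": 22, "str2": 30}))
```

The widths passed in should be at least as wide as the data in the second file; altering a string to a narrower width would truncate values.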
|
In reply to this post by Hoover, Matthew
At 10:44 AM 12/18/2007, Hoover, Matthew wrote:
>I have different data sets that I need to merge into one file.
>This merge is adding records and not variables, so each data file
>has the same variable names. The problem is that most of the
>variables are string format and there are different widths for these
>variables in the corresponding data sets. As you know, to "add
>cases" the widths of the string variables have to match when doing the merge.

Well, you've had a number of solutions. They're workable; but, I'd say, the best are clumsy.

So, to repeat an oft-repeated complaint: I think that making different-length strings incompatible for ADD FILES, et al, was a mistake, and I'm sorry it's never been corrected. A natural alternative is to give the result variable the length of the longest input. If there's worry this will be confusing (I think it rarely will be), retain the present behavior by default, and add a subcommand to make different-length strings compatible as described.

And it comes up pretty often.

At 11:24 AM 12/18/2007, ViAnn Beadle wrote:

>First, is there some reason why the string lengths are different?
>That sounds suspicious to me and I would do some basic frequencies
>on these variables from different files to see what's going on here.

Well, I'd check that myself, unless I knew all the data sources pretty well. But I wouldn't be terribly suspicious. Many data-entry programs and transmission routes (Excel, for one) create SPSS string variables with the longest length observed in the data. Very often that differs between batches of the data. It's driven me bats.

Sometimes, in Excel for one, you can insert a 'template' line with dummy values having the longest length expected, but it's a pain: you're only too likely to guess too small for one or more of the variables; and then, if you have to change the template, you have to change it exactly the same way in every input spreadsheet.

Sigh.

This has been an unpaid non-profit rant. We now return you to your regularly scheduled list traffic.
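The "length of the longest input" rule proposed above is easy to state in code. This is only a sketch of the rule itself, not of anything SPSS does: the function names and the sample file and variable names are made up, and the widths would in practice have to be read from each file's dictionary.

```python
def longest_widths(files):
    """files maps file name -> {variable name: string width}.

    Returns the widest width seen for each variable across all files.
    """
    target = {}
    for widths in files.values():
        for name, width in widths.items():
            target[name] = max(width, target.get(name, 0))
    return target


def needs_widening(files):
    """For each file, list the variables narrower than the longest input."""
    target = longest_widths(files)
    return {
        fname: {n: target[n] for n, w in widths.items() if w < target[n]}
        for fname, widths in files.items()
    }


if __name__ == "__main__":
    # Hypothetical widths for two input files.
    sample = {
        "a.sav": {"name": 10, "city": 22},
        "b.sav": {"name": 15, "city": 8},
    }
    print(longest_widths(sample))
    print(needs_widening(sample))
```

Because every variable is only ever widened to the maximum, no value is truncated, which is the property that makes this rule a natural default.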
|
