In the SPSS help files for DO REPEAT, it states, "The following commands can be used within a DO REPEAT-END REPEAT structure:... Data definition: DATA LIST..."

I have two raw data files that I am trying to import into SPSS, then save to separate files. Both have the same structure and the same variable locations. I would like to be able to use a DO REPEAT command to pull these data into SPSS for analysis. I am currently using the GET DATA command, but have tried it with the DATA LIST command as well and cannot get it to work.

Here is a portion of my code (note that there are MANY more variables I am pulling in; I shortened the list for sanity):

define !aegen()
!let !q=3 /* change this line to reflect the current quarter (closeout=5).
!let !i=2006 /* change this line to reflect the current year.
!if (!q=5) !then
!let !dir=!i.
!else
!let !dir=!concat(!i,'q',!q).
!ifend.

*Coal operator data.
get data /type=txt
  /file=!quote(!concat('c:\msha\raw\',!dir,'\cade',!i,'.',!q))
  /arrangement=fixed /firstcase=2
  /variables=mineid 0-6 a7 contract 7-13 a7 inspoff 17-20 a4 state 21-22 a2.
save outfile='c:\temp\coalop.sav'.

*Metal-nonmetal-stone-s&g operator data.
get data /type=txt
  /file=!quote(!concat('c:\msha\raw\',!dir,'\made',!i,'.',!q))
  /arrangement=fixed /firstcase=2
  /variables=mineid 0-6 a7 contract 7-13 a7 inspoff 17-20 a4 state 21-22 a2.
save outfile='c:\temp\metlop.sav'.
!enddefine.

!aegen.

---------------------

Thank you in advance for any help you can provide. As it stands, my program works--I am just trying to eliminate some redundancy and make my file smaller (read: more manageable--already at 14 printed pages).
Chad,
I looked through the syntax documentation and I do not find what you describe with respect to data list. As I read the documentation, you can use do repeat inside of an input program segment but not as part of a data list. There is an example to this effect.

You say you are reading your data correctly with a get data command. I'm curious: other than the challenge of it, why do you want to use data list? What do you expect to gain?

Gene Maguin
Gene,
Thanks for your quick reply. From the SPSS Help files: "The GET DATA command provides functionality comparable to DATA LIST without creating an entire copy of the data file in temporary disk space." and "GET DATA /TYPE=TXT is similar to DATA LIST but does not create a temporary copy of the data file, significantly reducing temporary file space requirements for large data files."

Since the data file is huge, I reduce temporary file space by using GET DATA. (In reality, I am recoding a file written decades ago. The old syntax used DATA LIST, so I can revert to that if I can use DO REPEAT with it.) The SPSS Help file said DO REPEAT could be used with DATA LIST, but mentioned nothing about being used with GET DATA. I saw the example using DO REPEAT inside the input program, but I would like to use it outside. That is where I am having difficulties. It seems like it could be done; I am just not having luck making it work.

As a side note, my complete first part of the file reads:

!let !q=3
!let !j=2005
!let !i=2006

It would be nice to be able to just have the user change the !i value, since the !j value will always be one less, but I can't figure that out either :-(
Chad,
I have to confess that I can't see how in the world you could use a do repeat in an operation that reads data, whether a get data or a data list. I guess I also don't understand at all what you are trying to do. You say that you are reading two files and then saving each of them to its own file. Data list and get data can do that. You point out an advantage to get data because the incoming files are very large. I can't comment on that, as I have no experience with it.

You have set up a macro to do the reading, which suggests to me that you have a number of these files to read and save, and that the incoming file names are structured in some systematic way that makes a macro useful. (As a side issue, given what you say about reading and saving, why aren't you controlling the save file name structure in some manner as you do the input file name?)

So, what are you trying to do? Let's say you have just run your macro to read in two files and save them as coalop.sav and metalop.sav, respectively. Now what? What comes next?

Gene Maguin
In reply to this post by Chad T. Lower
At 01:31 PM 12/21/2006, Chad T. Lower wrote:
>I have two raw data files that I am trying to import into SPSS, then
>save to separate files. Both have the same structure and the same
>variable locations. I would like to be able to use a DO REPEAT
>command to pull these data into SPSS for analysis.

I don't think you want DO REPEAT. DO REPEAT works within a transformation program or input program; reading two inputs, and saving to separate files, would be separate transformation programs. You'd run the two separately from a macro, as you did in your example.

>Here is a portion of my code (note that there are MANY more variables
>I am pulling in; I shortened the list for sanity):

May I step back from the DO REPEAT, and ask what you'd like to do?

You might want to read the two files without having to have two copies of your complicated list of variables. That would be a two-pass macro loop, or a loop in Python. (But if I were doing it, I'd probably just have two copies of the variable list, creating the second by copying and pasting from the first.)

Or do you want to generate the long variable lists within a loop of some kind, so you don't have to write every variable individually?

Or has a night/day of too little sleep and too much airplane addled my brain completely, and those are both way off?

In any case, DO REPEAT is a means, not an end, for you. Let's tackle it from the other direction: describe the end, and find the appropriate means.

>My program works--I am just trying to eliminate some redundancy and
>make my file smaller (read more manageable--already at 14 printed
>pages).

Good; that's the place to start. What's repetitive, in those 14 pages?

-Onward, and good luck,
Richard
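A minimal sketch of the "loop in Python" idea mentioned above, assuming the paths and variable positions from the original post. It only builds the command text, so the variable list is written once; `build_get_data`, `SOURCES`, and the `quarter_dir` argument are hypothetical names chosen for illustration, not part of any SPSS API.

```python
# Sketch only: this builds SPSS command text and reads no data itself.
# Paths and column positions are copied from the original post;
# build_get_data, SOURCES and quarter_dir are hypothetical names.

# The shared variable specification, written exactly once.
VARIABLES = "mineid 0-6 a7 contract 7-13 a7 inspoff 17-20 a4 state 21-22 a2"

# (raw-file prefix, output .sav file) for each input shown in the post.
SOURCES = [
    ("cade", r"c:\temp\coalop.sav"),
    ("made", r"c:\temp\metlop.sav"),
]


def build_get_data(prefix, outfile, year, quarter, quarter_dir):
    """Return the GET DATA and SAVE commands for one raw file as text."""
    raw = "\\".join(["c:", "msha", "raw", quarter_dir, f"{prefix}{year}.{quarter}"])
    return (
        "get data /type=txt\n"
        f"  /file='{raw}'\n"
        "  /arrangement=fixed /firstcase=2\n"
        f"  /variables={VARIABLES}.\n"
        f"save outfile='{outfile}'.\n"
    )


# One pass per input file; only the prefix and output name change.
commands = [build_get_data(p, out, 2006, 3, "2006q3") for p, out in SOURCES]
print("\n".join(commands))
```

Inside SPSS 14 or later with the Python plug-in installed, each generated string could presumably be handed to `spss.Submit`; that step is left out here so the sketch runs without SPSS.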
In reply to this post by Maguin, Eugene
Gene,
To answer your question, "As a side issue, given what what you say about reading and saving, why aren't you controlling the save file name structure in some manner as you do the input file name?)"

Again, you are seeing a very small portion of a very large program. I am actually taking 4 raw data files, converting them into 4 (more or less) temporary files, then combining the four into one large file and doing data analysis on that file. (Quick note: I am controlling the file name structure of that larger file.)

Gene and Richard,

The reason I use four temp files is so that I can create a new variable (using the add command) to tell me which raw file each case came from.

What bothers me is that, of the four raw data files I use, the first two are set up identically, as are the second two. The only difference between the first two and the second two files is that the initial two variables are transposed. The remaining 60-some-odd variables are exactly the same in all four raw data files. What I have done currently is create one GET DATA for the first file, then copy and paste it for the 2nd, 3rd, and 4th files. My thought is, if I can do it with copy and paste, hopefully I can do it with a DO REPEAT and save the space.

Other methods I have tried: creating a second macro of just the variable inputs, then using my primary macro to call the data, then calling the second macro to separate the data into the appropriate variables... but I couldn't get that to work either. I tried creating a second syntax file and using INSERT to call that second file to do the same thing... but I couldn't get that to work either. It seems like once you type GET DATA, you cannot do anything else (calling a macro or the INSERT command) until the entire raw data file is pulled and sorted into variables.

As for the repetitiveness in the pages, what I have described is most of it. I cleaned up a few other COMPUTE commands with DO REPEAT, and several consecutive IF statements with either DO IF; ELSE or RECODE.

I will say a lot of the code is just variable definitions (for example, defining that state '01' is Alabama or whatever). Without that, the meat is probably only 6 or 7 pages, but 3 of those pages are pulling the raw data into these coalop.sav, metalop.sav... files. That doesn't include another 7-page syntax file that I bring in with INSERT, since I use it for another program...
One situation in which a macro is still a good idea, even with programmability available, is to create a shorthand way to refer to a long variable list. So why not define a macro holding the 58 variables that are the same in each file, and just use that macro in your syntax as needed? It seems that that would be pretty simple.
HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chad T. Lower
Sent: Thursday, December 21, 2006 5:07 PM
To: [hidden email]
Subject: Re: [SPSSX-L] Using DO REPEAT with DATA LIST
In reply to this post by Chad T. Lower
Chad,
>>Gene and Richard The reason I use four temp files is so that I can create
>>a new variable (using the add command) to tell me which raw file it came from.

Look at the add files command and see if the In keyword will give you what you need.

>>What bothers me is that of the four raw data files I use, the first two are
>>set up identically, as are the second two. The only difference in the first
>>two and second two files are that the initial two variables are transposed.
>>The remaining 60 some odd variables are the exact same in all four raw data
>>files. What I have done currently is created one GET DATA for the first file,
>>then copied and pasted for the 2nd, 3rd, and 4th files. My thought is, if I
>>can do it with copy and paste, hopefully I can do it with a DO REPEAT and
>>save the space. Other methods I have tried are creating a second macro of
>>just the variable inputs, then using my primary macro to call the data, then
>>call the second macro to separate the data into the appropriate variables...
>>but I couldn't get that to work either. I tried creating a second syntax file
>>and using INSERT to call that second file to do the same thing... but I
>>couldn't get that to work either.

I find myself wondering if there is even more to this story than that you apparently just have a few or many sets of four files that you wish to read in and add together, while also keeping track of where cases in the 'added together' file came from. (I'm deliberately ignoring whatever may happen after the four files are merged, because that wasn't what presented a problem for you.) Within a set of four files, the sequence of variables in each of the files is the same except that, and this is the problem, in two of the files the order of the first two variables is reversed from that of the other two files.

Assuming all this is true, then it seems to me that you have a pretty compact chunk of code to read, save, merge and save each set of four files. Other people have a lot more experience with macros than I do. That said, if you wanted to make that section of code even smaller, I wonder if you couldn't write an outer macro that calls each of two inner macros twice. Each inner macro contains a get data command and a save outfile command to read and save a file. The difference between the two inner macros is that one is for files having one variable sequence and the other is for the other sequence. In addition to calling and passing file names to the inner macros, the outer macro merges the four temp files. It'd be clever to do that but, on the other hand, why bother.

Gene Maguin
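The two-layout idea and the In keyword can be sketched in Python as text generation. This is an illustration under stated assumptions, not the poster's actual layout: the "swapped" variant assumes the second pair of files uses the same column widths with only the first two names exchanged, and `variable_spec`, `add_files_command`, `src1`, and `src2` are hypothetical names.

```python
# Sketch only: hypothetical helper names. Column positions come from the
# original post; the "swapped" layout is an ASSUMPTION about the second
# pair of files (same widths, first two variable names exchanged).

COMMON_TAIL = "inspoff 17-20 a4 state 21-22 a2"  # identical in every file


def variable_spec(swapped=False):
    """Return the /VARIABLES text for either of the two layouts."""
    first, second = ("contract", "mineid") if swapped else ("mineid", "contract")
    return f"{first} 0-6 a7 {second} 7-13 a7 {COMMON_TAIL}"


def add_files_command(sav_files):
    """Return an ADD FILES command with one /IN flag variable per input."""
    parts = ["add files"]
    for number, path in enumerate(sav_files, start=1):
        # /IN names a 0/1 variable marking cases that came from this file.
        parts.append(f"  /file='{path}' /in=src{number}")
    return "\n".join(parts) + "."


print(variable_spec())
print(variable_spec(swapped=True))
print(add_files_command([r"c:\temp\coalop.sav", r"c:\temp\metlop.sav"]))
```

Only the two output files named in the thread are listed; the remaining two would be appended to the same list. Keeping the common tail in one constant is what removes the copy-and-paste duplication, whichever tool ends up generating the commands.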
In reply to this post by Peck, Jon
On 12/21/06, Peck, Jon wrote:
> ...is to create a shorthand way to refer to a long variable list.

This is probably where I am having problems: I do not know how to do the shorthand way... That is what I am trying to make.

On 12/21/06, Gene Maguin wrote:
> Look at the add files command and see if the In keyword will give you what
> you need.

That is what I am currently doing. I guess I didn't say it clearly enough.

> That said and if you wanted to make that section of code even smaller, i
> wonder if you couldn't write an outer macro that calls each of two inner
> macros twice.

I guess that is what I will have to try next.

> It'd be clever to do that but, on the other hand, why bother.

I guess you could say that I am old enough to remember the days when memory was in short supply and expensive (both RAM and ROM). Code was written to be "clever" and succinct, with no wasted space. As I don't like waste of any kind (food, refuse, etc.), my preference is to make the code as succinct as possible.

Again, the program I have is fully functional and gives me what I need. I am just trying to make it more succinct. As I have only been working with SPSS for about 2 months, I am learning about new methods and techniques daily. Unfortunately, this problem has stumped me, although I have been working on it for several weeks already.

As I mentioned in a previous post, the other "idea" I would like to implement that has stumped me is this. My complete first part of the file reads:

!let !q=3
!let !j=2005
!let !i=2006

It would be nice to be able to just have the user change the !i value, since the !j value will always be one less, but I can't figure that out either :-(
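For the side question about deriving !j from !i, a hedged sketch in Python rather than macro language: ordinary integer arithmetic gives the previous year, and the closeout rule from the original macro (quarter 5 uses the bare year as the directory name) fits in the same place. `run_settings` is a hypothetical name; the dictionary keys simply mirror the macro variables.

```python
# Sketch only: run_settings is a hypothetical name. The closeout rule
# (quarter 5 -> directory is the bare year) is taken from the macro in
# the original post; the keys mirror the macro variables !i, !j, !q, !dir.


def run_settings(year, quarter):
    """Derive every value the macro currently sets by hand."""
    directory = str(year) if quarter == 5 else f"{year}q{quarter}"
    return {"i": year, "j": year - 1, "q": quarter, "dir": directory}


settings = run_settings(2006, 3)
print(settings)  # {'i': 2006, 'j': 2005, 'q': 3, 'dir': '2006q3'}
```

With this arrangement the user edits only the year and quarter; the previous year and the directory name follow from them.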
There is probably something more complicated in your situation, but here is what I meant.
define !varlist()
origin accel weight
!enddefine.

data list free /x y !varlist.
desc !varlist.

!varlist is a shorthand, so you define it once and then use it whenever that list is needed. The data list would list the varying part followed by the macro for the common part.

As an aside, you might want to consider an alternative approach to all of this using the programmability features introduced in SPSS 14. Once you get the hang of Python, you will find it almost always easier to do things that way than to use macro, and it is much more powerful than macro.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chad T. Lower
Sent: Friday, December 22, 2006 10:04 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Using DO REPEAT with DATA LIST