Using DO REPEAT with DATA LIST

Chad T. Lower
In the SPSS help files for DO REPEAT, it states, "The following commands
can be used within a DO REPEAT—END REPEAT structure:...

Data definition: DATA LIST..."

I have two raw data files that I am trying to import into SPSS, then save
to separate files.  Both have the same structure and the same variable
locations.  I would like to be able to use a DO REPEAT command to pull
these data into SPSS for analysis.  I am currently using the GET DATA
command, but have tried it with the DATA LIST command as well and cannot
get it to work.  Here is a portion of my code (note that there are MANY
more variables I am pulling in; I shortened the list for sanity):

define !aegen()

!let !q=3 /* change this line to reflect the current quarter (closeout=5).
!let !i=2006 /* change this line to reflect the current year.

!if (!q=5) !then !let !dir=!i.
  !else !let !dir=!concat(!i,'q',!q).
!ifend.

*Coal operator data.
get data /type=txt
 /file=!quote(!concat('c:\msha\raw\',!dir,'\cade',!i,'.',!q))
 /arrangement=fixed
 /firstcase=2
 /variables=mineid 0-6 a7 contract 7-13 a7 inspoff 17-20 a4 state 21-22 a2.
save outfile='c:\temp\coalop.sav'.

*Metal-nonmetal-stone-s&g operator data.
get data /type=txt
 /file=!quote(!concat('c:\msha\raw\',!dir,'\made',!i,'.',!q))
 /arrangement=fixed
 /firstcase=2
 /variables=mineid 0-6 a7 contract 7-13 a7 inspoff 17-20 a4 state 21-22 a2.
save outfile='c:\temp\metlop.sav'.

!enddefine.
!aegen.
---------------------
Thank you in advance for any help you can provide.  As it stands, my
program works--I am just trying to eliminate some redundancy and make my
file smaller (read more manageable--already at 14 printed pages).

Re: Using DO REPEAT with DATA LIST

Maguin, Eugene
Chad,

I looked through the syntax documentation and I do not find what you
describe with respect to data list. As I read the documentation you can use
do repeat inside of an input program segment but not as part of a data list.
There is an example to this effect.

You say you are reading your data correctly with a get data command. I'm
curious: other than the challenge of it, why do you want to use data list?
What do you expect to gain?

Gene Maguin

Re: Using DO REPEAT with DATA LIST

Chad T. Lower
Gene,

Thanks for your quick reply.  From the SPSS Help files:
"The GET DATA command provides functionality comparable to DATA LIST without
creating an entire copy of the data file in temporary disk space." and "GET
DATA /TYPE=TXT is similar to DATA LIST but does not create a temporary copy
of the data file, significantly reducing temporary file space requirements
for large data files."

Since the data file is huge, I reduce temporary file space by using GET DATA.
(In reality, I am recoding a file written decades ago.  The old syntax used
DATA LIST, so I can revert to that if I can use DO REPEAT with it.)
The SPSS Help file said DO REPEAT could be used with DATA LIST, but
mentioned nothing about its use with GET DATA.

I saw the example using DO REPEAT inside the input program, but I would
like to use it outside.  That is where I am having difficulties.  It seems
like it could be done; I am just not having luck making it work.

As a side note, my complete first part of the file reads:

!let !q=3
!let !j=2005
!let !i=2006

It would be nice to be able to just have the user change the !i value since
the !j value will always be one less, but I can't figure that out either
:-(
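
On the !j question: the macro language has no arithmetic operators, but a
commonly used workaround counts characters instead. The sketch below is
untested here and assumes the macro facility accepts a blank string of that
length; the macro name !years is made up for illustration. !BLANKS builds a
string of !i blanks, !SUBSTR drops the first one, and !LENGTH counts what is
left, which is intended to leave !j one less than !i.

```spss
* Hypothetical macro: derive the previous year from the current year.
define !years()
!let !i = 2006
!let !j = !length(!substr(!blanks(!i), 2))
echo !quote(!concat('Current year: ', !i)).
echo !quote(!concat('Previous year: ', !j)).
!enddefine.

!years.
```

With this in place only the !let !i line would need editing each year.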

Re: Using DO REPEAT with DATA LIST

Maguin, Eugene
Chad,
 
I have to confess that I can't see how in the world you could use a do
repeat in an operation that reads data, whether a get data or a data list. I
guess I also don't understand at all what you are trying to do. You say that
you are reading two files and then saving each of them to its own file.
Data list and get data can do that. You point out an advantage to get data
because the incoming files are very large. I can't comment on that as I have
no experience with it. You have set up a macro to do the reading, which
suggests to me that you have a number of these files to read and save and
that the incoming file names are structured in some systematic way
that makes a macro useful. (As a side issue, given what you say about
reading and saving, why aren't you controlling the save file name structure
in some manner as you do the input file name?)
 
So, what are you trying to do? Let's say you have just run your macro to
read in two files and save them as coalop.sav and metalop.sav, respectively.
Now what? What comes next?
 
Gene Maguin

Re: Using DO REPEAT with DATA LIST

Richard Ristow
In reply to this post by Chad T. Lower
At 01:31 PM 12/21/2006, Chad T. Lower wrote:

>I have two raw data files that I am trying to import into SPSS, then
>save to separate files.  Both have the same structure and the same
>variable locations.  I would like to be able to use a DO REPEAT
>command to pull these data into SPSS for analysis.

I don't think you want DO REPEAT. DO REPEAT works within a
transformation program or input program; reading two inputs, and saving
to separate files, would be separate transformation programs. You'd run
the two separately from a macro, as you did in your example.

>Here is a portion of my code (note that there are MANY more variables
>I am pulling in; I shortened the list for sanity):

May I step back from the DO REPEAT, and ask what you'd like to do?

You might want to read the two files without having to have two copies
of your complicated list of variables. That would be a two-pass macro
loop, or loop in Python. (But if I were doing it, I'd probably just
have two copies of the variable list, creating the second by copying
and pasting from the first.)
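
A two-pass macro loop of that kind might look like the sketch below. It is
untested and rests on assumptions: the macro name !aegen2 is made up, the
closeout case (quarter 5) is left out, and the output files are named after
the input prefix (cade.sav, made.sav) rather than coalop.sav and metlop.sav.
The variable list is the shortened one from the original post and appears
only once.

```spss
* Hypothetical macro: read both same-layout files with one variable list.
define !aegen2()
!let !q=3
!let !i=2006
!let !dir=!concat(!i,'q',!q)
!do !pre !in (cade made)
get data /type=txt
 /file=!quote(!concat('c:\msha\raw\',!dir,'\',!pre,!i,'.',!q))
 /arrangement=fixed
 /firstcase=2
 /variables=mineid 0-6 a7 contract 7-13 a7 inspoff 17-20 a4 state 21-22 a2.
save outfile=!quote(!concat('c:\temp\',!pre,'.sav')).
!doend
!enddefine.

!aegen2.
```

Each pass of !DO !IN substitutes the next prefix for !pre, so the macro
emits one GET DATA and one SAVE per file.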

Or do you want to generate the long variable lists within a loop of some
kind, so you don't have to write every variable individually?

Or has a night/day of too little sleep and too much airplane, addled my
brain completely, and those are both way off?

In any case, DO REPEAT is a means, not an end, for you. Let's tackle
from the other direction: describe the end, and find the appropriate
means.

>My program works--I am just trying to eliminate some redundancy and
>make my file smaller (read more manageable--already at 14 printed
>pages).

Good; that's the place to start. What's repetitive, in those 14 pages?

-Onward, and good luck,
  Richard

Re: Using DO REPEAT with DATA LIST

Chad T. Lower
In reply to this post by Maguin, Eugene
Gene,
To answer your question, "(As a side issue, given what you say about
reading and saving, why aren't you controlling the save file name structure
in some manner as you do the input file name?)":  again, you are seeing a
very small portion of a very large program.  I am actually taking 4 raw data
files, converting them into 4 (more or less) temporary files, then combining
the four into one large file and doing data analysis on that file.  (Quick
note, I am controlling the file name structure of that larger file.)

Gene and Richard: the reason I use four temp files is so that I can create a
new variable (using the add command) to tell me which raw file each case came
from.

What bothers me is that of the four raw data files I use, the first two are
set up identically, as are the second two.  The only difference between the
first two and the second two files is that the initial two variables are
transposed.  The remaining 60-some-odd variables are exactly the same in all
four raw data files.

What I have done currently is created one GET DATA for the first file, then
copied and pasted for the 2nd, 3rd, and 4th files.  My thought is, if I can
do it with copy and paste, hopefully I can do it with a DO REPEAT and save
the space.  Other methods I have tried are creating a second macro of just
the variable inputs, then using my primary macro to call the data, then call
the second macro to separate the data into the appropriate variables... but
I couldn't get that to work either.  I tried creating a second syntax file
and using INSERT to call that second file to do the same thing... but I
couldn't get that to work either.

It seems like once you type GET DATA, you cannot do anything else (calling a
macro or the INSERT command) until the entire raw data file is pulled and
sorted into variables.

As for the repetitiveness in the pages, what I have described is most of it.
I cleaned up a few other COMPUTE commands with DO REPEAT, and several
consecutive IF statements with either DO IF/ELSE or RECODE.  I will say a
lot of the code is just defining variable definitions (for example, defining
that state '01' is Alabama or whatever).  Without that, the meat is probably
only 6 or 7 pages, but 3 of those pages are pulling the raw data into these
coalop.sav, metalop.sav... files.  That doesn't include another 7 page
syntax file that I have as an INSERT since I use it for another program...

Re: Using DO REPEAT with DATA LIST

Peck, Jon
One situation in which a macro is still a good idea, even with programmability available, is creating a shorthand way to refer to a long variable list.  So why not define a macro holding the 58 variables that are the same in each file and just use that macro in your syntax as needed?  It seems that would be pretty simple.

HTH,
Jon Peck


Re: Using DO REPEAT with DATA LIST

Maguin, Eugene
In reply to this post by Chad T. Lower
Chad,
 
> Gene and Richard The reason I use four temp files is so that I can create
> a new variable (using the add command) to tell me which raw file it came
> from.
 
Look at the add files command and see if the In keyword will give you what
you need.
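
For reference, a minimal sketch of the IN keyword, using the two file names
from the original post. The flag names incoal and inmetl and the variable
source are made up for illustration. Each IN variable is 1 for cases that
came from its file and 0 otherwise.

```spss
* Stack the saved files and flag where each case came from.
add files
  /file='c:\temp\coalop.sav' /in=incoal
  /file='c:\temp\metlop.sav' /in=inmetl.

* Optional: fold the flags into one source code (1 = coal, 2 = metal).
compute source = incoal + 2*inmetl.
execute.
```

With the flags created during the merge, the temp files would not need a
source variable of their own.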
 
> What bothers me is that of the four raw data files I use, the first two are
> set up identically, as are the second two.  The only difference between the
> first two and second two files is that the initial two variables are
> transposed.  The remaining 60 some odd variables are the exact same in all
> four raw data files.
>
> What I have done currently is created one GET DATA for the first file, then
> copied and pasted for the 2nd, 3rd, and 4th files.  My thought is, if I can
> do it with copy and paste, hopefully I can do it with a DO REPEAT and save
> the space.  Other methods I have tried are creating a second macro of just
> the variable inputs, then using my primary macro to call the data, then call
> the second macro to separate the data into the appropriate variables... but
> I couldn't get that to work either.  I tried creating a second syntax file
> and using INSERT to call that second file to do the same thing... but I
> couldn't get that to work either.
 
I find myself wondering if there is even more to this story than that you
apparently have a few or many sets of four files that you wish to read
in and add together while keeping track of where cases in the 'added
together' file came from. (I'm deliberately ignoring whatever may happen
after the four files are merged because that wasn't what presented a problem
for you.) Within a set of four files, the sequence of variables in each of
the files is the same except that, and this is the problem, in two of the
files the order of the first two variables is reversed from that of the
other two files. Assuming all this is true, then it seems to me that you
have a pretty compact chunk of code to read, save, merge and save each set
of four files.

Other people have a lot more experience with macros than I do. That said,
if you wanted to make that section of code even smaller, I wonder if you
couldn't write an outer macro that calls each of two inner macros twice.
Each inner macro contains a get data command and a save outfile command to
read and save a file. The difference between the two inner macros is that
one is for files having one variable sequence and the other is for the other
sequence. In addition to calling and passing file names to the inner macro,
the outer macro merges the four temp files. It'd be clever to do that but,
on the other hand, why bother.
 
Gene Maguin
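
A variation on that idea, sketched below and untested: instead of two inner
macros, one reader macro takes the differing lead variables as an argument
and pulls the shared columns from a second macro. The names !common and
!readone and the argument names are made up for illustration. The columns
come from the shortened list in the original post, and the input path is the
one the original macro would build for 2006, quarter 3.

```spss
* Shared part of the layout, defined once.
define !common()
inspoff 17-20 a4 state 21-22 a2
!enddefine.

* One reader: file names and the differing lead columns are arguments.
define !readone (infile = !tokens(1)
                /outfile = !tokens(1)
                /lead = !enclose('(',')'))
get data /type=txt
 /file=!infile
 /arrangement=fixed
 /firstcase=2
 /variables=!lead !common.
save outfile=!outfile.
!enddefine.

!readone infile='c:\msha\raw\2006q3\cade2006.3'
  outfile='c:\temp\coalop.sav'
  lead=(mineid 0-6 a7 contract 7-13 a7).
```

The files with the transposed layout would get their own call, with lead
listing the first two variables in the order and columns those files use.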

Re: Using DO REPEAT with DATA LIST

Chad T. Lower
In reply to this post by Peck, Jon
On 12/21/06, Peck, Jon wrote:
>
> ...is to create a shorthand way to refer to a long variable list.


This is probably where I am having problems and I do not know how to do the
shorthand way...  That is what I am trying to make.

On 12/21/06, Gene Maguin wrote:
>
> Look at the add files command and see if the In keyword will give you what
> you need.
>

That is what I am currently doing.  I guess I didn't say it clearly enough.


> That said and if you wanted to make that section of code even smaller, i
> wonder if you couldn't write an outer macro that calls each of two inner
> macros twice.
>

I guess that is what I will have to try next.

> It'd be clever to do that but, on the other hand, why bother.
>

I guess you could say that I am old enough to remember the days when memory
was in short supply and expensive (both RAM and ROM).  Code was written
"clever" and succinct with no wasted space that wasn't necessary.  As I
don't like waste of any kind (food, refuse, etc.), my preference is to make
the code as succinct as possible.  Again, the program I have is fully
functional and works to give me what I need.  I am just trying to make it
more succinct in its programming.  As I have only been working with SPSS
for about 2 months, I am learning about new methods and techniques daily.
Unfortunately, this problem has stumped me although I have been working on
it for several weeks already.  As I mentioned in a previous post, the other
"idea" I would like to do that has stumped me is:

As a side note, my complete first part of the file reads:

!let !q=3
!let !j=2005
!let !i=2006

It would be nice to be able to just have the user change the !i value since
the !j value will always be one less, but I can't figure that out either
:-(

Re: Using DO REPEAT with DATA LIST

Peck, Jon
There is probably something more complicated in your situation, but here is what I meant.

define !varlist()
origin accel weight
!enddefine.

data list free /x y !varlist.
desc !varlist.

!varlist is a shorthand, so you define it once and then use it whenever that list is needed.
The data list would list the varying part followed by the macro for the common part.

As an aside, you might want to consider an alternative approach to all of this using the programmability features introduced in SPSS 14.  Once you get the hang of Python, you will find it almost always easier to do things that way than to use macros, and it is much more powerful than the macro language.

HTH,
Jon Peck
