Help using vectors and loops

Vincent LOUIS-2
Hi Everyone,
I have 2 different data files, one for 2002 and the other for 2004.
Both files have the same number of cases and identical unique
identifying numbers for each case.

Each file has 314 different variables (including the unique id number).

I am trying to write SPSS syntax that does the following:

*filter each file by each of the 314 variables, one at a time;
*aggregate the file on a separate break variable;
*then save the resulting file under the name of the filter variable.

An example of the data is the following:

uniqueid  mcdid  v1  v2  v3
1         2      1   0   0
2         3      0   1   0
3         3      0   1   0
4         4      0   0   1
5         2      1   0   0
6         3      0   1   0

This is what I've written (but it does not work).

get file = 'M:\VINCENT\panjq32002.sav' .

vector vname = v1 to v3 .

loop #i = 1 to 1 .
filter by (vname) .

aggregate outfile = ''M:\VINCENT\((vname).sav)'  .
end loop .
exe .

Can anyone help me figure out how to make the corrections to the syntax?
I want to repeat the procedure on the 2004 file. Then I want to merge
the 2 files.

Thank you all in advance.

Vincent

Re: Help using vectors and loops

Maguin, Eugene
Vincent,

The short answer is that procedures such as AGGREGATE, FREQUENCIES, etc., do
not work within a LOOP structure unless you use a macro loop. The other
thing that will cause trouble is that your AGGREGATE statement is
incorrectly written: it does not specify the output and input variables
or the function to be used, and it doesn't specify the break variable. That
said, I am completely confused by what you are up to. It would help if you
would work through an example for one pass through the data. Suppose you
used v1: what would the resulting file look like, given your example input
data?

Gene Maguin

Re: Help using vectors and loops

Vincent LOUIS-2
On 10/25/06, Gene Maguin <[hidden email]> wrote:

> Vincent,
>
> The short answer is that procedures such as aggregate, frequencies, etc do
> not work within a loop structure, unless you use a macro loop. The other
> thing that will cause trouble is that your aggregate statement is
> incorrectly written as it does not specify the output and input variables
> and function to be used and it doesn't specify the break variable. That
> said, I am completely confused by what you are up to. It would help if you
> would work through an example for one pass through the data. Suppose you
> used v1, what would the resulting file look like given your example input
> data?
>
> Gene Maguin
>

Hi Gene,
Even though I included just 3 variables (v1 through v3), I have 314 in
my actual data set.

What I would like to do:

(1) filter each file by each of the 314 variables, one at a time;

(2) aggregate the file on a separate break variable;

(3) then save the resulting file under the name of the filter variable.

Using the sample data I presented, the final output for v1 (for example) would be

a file "titled" v1.sav with the following information:

mcdid  v1  v2  v3
2      2   0   0
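The one-pass example can be checked against the sample data. This is a minimal Python sketch, not SPSS syntax, assuming that FILTER BY keeps cases whose filter value is nonzero (SPSS drops cases where it is 0 or missing) and that the wanted aggregate function is SUM of v1 to v3 within mcdid. The name filter_and_sum is hypothetical.

```python
# Sample data from the first post: (uniqueid, mcdid, v1, v2, v3)
ROWS = [
    (1, 2, 1, 0, 0),
    (2, 3, 0, 1, 0),
    (3, 3, 0, 1, 0),
    (4, 4, 0, 0, 1),
    (5, 2, 1, 0, 0),
    (6, 3, 0, 1, 0),
]
VARS = ["v1", "v2", "v3"]


def filter_and_sum(rows, filter_var):
    """Keep cases whose filter_var is nonzero (as FILTER BY does), then sum
    v1..v3 within each mcdid (as AGGREGATE /BREAK=mcdid with SUM would)."""
    col = 2 + VARS.index(filter_var)  # column holding the filter variable
    totals = {}
    for row in rows:
        if not row[col]:
            continue  # case is filtered out
        sums = totals.setdefault(row[1], [0] * len(VARS))  # keyed by mcdid
        for i, value in enumerate(row[2:]):
            sums[i] += value
    return {mcdid: tuple(s) for mcdid, s in sorted(totals.items())}


if __name__ == "__main__":
    for var in VARS:
        print(var, filter_and_sum(ROWS, var))
```

For v1 this gives a single row, mcdid 2 with sums (2, 0, 0), which matches the table; v2 and v3 each give one row as well, for mcdid 3 and mcdid 4.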

Let me know what you think.

Vincent

Re: Help using vectors and loops

Maguin, Eugene
In reply to this post by Vincent LOUIS-2
Vincent,

Your reply leaves a lot for people to fill in. Here's my guess.

1) filter each file by each of the 314 variables, one at a time;

2) aggregate the filtered file on a separate break variable (mcdid),
keeping the uniqueid variable and counting the number of records in the
resulting file for each value of the break variable.

3) then save the resulting file out as the name of the filter variable. The
result being 314 files for the 2002 and 314 for 2004.

I am not clear whether you just wrote a shorthand form for the correct
aggregate syntax or you thought that was the correct syntax. This is the
correct syntax for the aggregate.

Aggregate outfile='v1.sav' mode=replace/break=mcdid/
   uniqueid=first(uniqueid)/v1=nu.

I cannot help you with the macro loop. Other people know far more about
this than I do.

Gene Maguin

Re: Help using vectors and loops

Richard Ristow
In reply to this post by Vincent LOUIS-2
At 01:39 PM 10/25/2006, Vincent Louis wrote:

>I have 2 different data files. Each file has 314 different variables
>(including the unique id number).
>
>I am trying to write SPSS syntax that does the following:
>
>*filter each file by each of the 314 variables, one at a time;
>*aggregate the file on a separate break variable;
>*then save the resulting file under the name of the filter variable.

Gene Maguin's observations are very sound. To add to them,

- Before making your code work 314 times, make it work once. Skipping
the looping, what is the syntax to do what you want for one of your
variables?

- As Gene said, this is 'macro' logic in the general sense: it needs to
generate and execute 314 different sets of SPSS code. There are three
ways (that I have at my fingers' ends):
. If you have SPSS 14 or 15, a loop with spss.Submit in Python
. A loop in the macro facility (DEFINE ... !ENDDEFINE)
. Get the data dictionary into an SPSS file (see my posting on the
subject); write a transformation program that generates the code for
each variable; and INCLUDE or INSERT the resulting file.

Python, if you have it, is probably best. Otherwise, I might consider
generating the code from SPSS and INCLUDEing it.
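Both the Python route and the generate-and-INCLUDE route come down to producing the same block of commands once per filter variable. Here is a minimal Python sketch under stated assumptions: each output file is named after its filter variable, SUM is the aggregate function (as in Vincent's later syntax), and build_syntax is a hypothetical helper name. Inside SPSS 14/15 each block could be passed to spss.Submit; otherwise the printed text could be saved as a .sps file and run with INCLUDE or INSERT.

```python
def build_syntax(infile, outdir, filter_vars, break_var, sum_vars):
    """Return one block of SPSS syntax per filter variable.

    Each block reopens the data (which also clears any earlier FILTER),
    filters on one variable, and aggregates sums by break_var into a file
    named after that filter variable.
    """
    sums = "\n".join(f"  /{v} = SUM({v})" for v in sum_vars)
    blocks = []
    for var in filter_vars:
        blocks.append(
            f"GET FILE = '{infile}'.\n"
            f"FILTER BY {var}.\n"
            "AGGREGATE\n"
            f"  /OUTFILE = '{outdir}\\{var}.sav'\n"
            f"  /BREAK = {break_var}\n"
            f"{sums}.\n"
        )
    return blocks


if __name__ == "__main__":
    blocks = build_syntax(
        infile=r"M:\VINCENT\panjq32002.sav",
        outdir=r"M:\VINCENT\NAICS 6",
        filter_vars=["v1", "v2", "v3"],
        break_var="mcdid",
        sum_vars=["v1", "v2", "v3"],
    )
    # Inside SPSS with the Python plug-in, one could instead run:
    #     import spss
    #     for block in blocks:
    #         spss.Submit(block)
    # Without Python, save this text as a .sps file and INCLUDE it.
    print("\n".join(blocks))
```

With 314 filter variables the same call simply gets a longer filter_vars list, which would in practice come from the data dictionary rather than being typed in.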

- To criticize based on little knowledge, if this were my project, I'd
look at whether it was a good idea at all. This really is "little
knowledge", since you haven't posted what your AGGREGATE does, but some
things look odd: for example, at some point you're going to
.  FILTER BY <unique ID variable>.
which doesn't look like it makes sense.

Then you'll have the 314 files and need to do something with them, and
you'll probably have to automate that, too. Anyhow, I'd look hard at
the end, and at whether this is the best means.

Re: Help using vectors and loops

Richard Ristow
At 07:23 PM 10/25/2006, Vincent Louis wrote off-list (I think Marta has
a point, making all replies on-list):

>On 10/25/06, Richard Ristow <[hidden email]> wrote,
>responding to question:
>
>>>I am trying to
>>>*filter each file by each of the 314 variables, one at a time;
>>>*aggregate the file on a separate break variable;
>>>* then save the resulting file out as the name of the filter
>>>variable.
>>
>>- Before making your code work 314 times, make it work once. Skipping
>>the looping, what is the syntax to do what you want for one of your
>>variables?
>
>I have information on the number of employees, and number of
>businesses [for two periods]. I want to see whether the average number
>of employees and average number of businesses increased or decreased.
>
>[This syntax] examines change for one industry
>(ophthgdsmfgq3q4_2002_2004).
>
>I would like to repeat this for 313 other categories of manufacturing
>activities in 2002 and in 2004. Is it possible to streamline repeating
>the syntax below 313 times?
>
>****QUARTER 3_2002 .
>
>get file = 'M:\VINCENT\' +
>     'working files \q32002.sav' .
>
>filter by ophthgdsmfgq302 .
>
>AGGREGATE
>  /OUTFILE = 'M:\VINCENT\NAICS 6\ophthgdsmfgq302_agg.sav'
>  /BREAK = mcdid
>  / julempq302ophthgdsmfg = sum(julempq302)
>  / augempq302ophthgdsmfg = sum(augempq302)
>  / sepempq302ophthgdsmfg = sum(sepempq302)
>  / quarterwage302ophthgdsmfg = sum(quarterwage302) .
>
>get file = 'M:\VINCENT\NAICS 6\ophthgdsmfgq302_agg.sav' .
>
>compute testq302 = 1 .
>exe .
>
>save outfile = 'M:\VINCENT\NAICS 6\ophthgdsmfgq302_agg.sav' .

I've reformatted some, for readability.

Little things, to warm up: On your AGGREGATE, 'OUTFILE=*' would be
easier than writing to an external file and then reading it back. And
the 'exe.' after the 'compute' is not necessary, nor helpful.

Here's the big stuff: It looks like your file doesn't have 314
variables, but 314 *groups* of variables, each for one industry; this
is the 302nd group and applies to industry 'ophthgdsmfgq302'
("Ophthalmic Goods Manufacturing", from text I'm not quoting.) In other
words, you have a very, very 'wide' file.

If I've got this right, you will have SO much easier a time if you
restructure it to 'long', with a separate record for each industry's
data. It will be a huge pain in the neck generating the VARSTOCASES
syntax (Python, here we come); but then, so much less tangled.
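Generating the VARSTOCASES command is mostly string assembly. This is a minimal Python sketch that assumes, hypothetically, that the wide file names each variable <measure>_<industry>; the real names will differ, and the measure and industry lists would come from the full variable list. The function name varstocases_syntax and the second industry name are made up for illustration.

```python
def varstocases_syntax(measures, industries, index_name="industry"):
    """Build a VARSTOCASES command that turns one group of variables per
    industry into one record per case per industry.

    Assumes a hypothetical naming scheme <measure>_<industry>.
    The index variable numbers industries 1..n in the order listed.
    """
    lines = ["VARSTOCASES"]
    for measure in measures:
        sources = " ".join(f"{measure}_{ind}" for ind in industries)
        lines.append(f"  /MAKE {measure} FROM {sources}")
    lines.append(f"  /INDEX = {index_name}.")
    return "\n".join(lines)


if __name__ == "__main__":
    # 'anothermfg' is a made-up second industry for illustration.
    print(varstocases_syntax(["julemp", "augemp"], ["ophthgdsmfg", "anothermfg"]))
```

With 314 industries each MAKE line becomes very long, so a real generator would probably wrap the FROM list across several lines.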

Do you know how to use DISPLAY LABELS to get a complete variable list?
If so, do it, and post the list. In your syntax above, I see 5
variables for this industry alone, so I suppose there are more than
5*314=1570 variables in the whole file.

Ah, onward, ever onward,
Richard

*Use* of dictionary in v11 and earlier

Albert-Jan Roskam
In reply to this post by Richard Ristow
Hi list!

I saw a post about using DISPLAY DICTIONARY with OMS
in a way similar to PROC CONTENTS in SAS. For
V11-and-earlier users, like me, that's not an option.
How about the following solution? The variable
'firstvar' is the variable that physically appears
first in the data set. It is a self-writing syntax
that dumps the variable names, together with
DEFINE-!ENDDEFINE, into an ASCII file and uses this
to compile a macro.

get file='d:\temp\myfile.sav'.
n of cases 1.
string temp enddefin (a20).
compute temp = firstvar.
rename variables (firstvar=define)(temp=firstvar).
compute firstvar = enddefin.
flip.
string myvars (a20).
compute myvars = case_lbl.
recode myvars ("firstvar"="firstvar.") ("DEFINE"=
"DEFINE allvars ()") ("ENDDEFIN"="!ENDDEFINE.").
compute myvars = lowcase(myvars).
print outfile = 'd:\temp\varlist.sps' / myvars.
exe.
include file = 'd:\temp\varlist.sps' .
exe.
get file='d:\temp\somefile.sav'.
display macros.
fre allvars.
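The file the FLIP trick writes is meant to hold a macro definition with one variable name per line. Python is not available in v11, so the following is only an illustration of that target text, not a replacement for the technique; the function name macro_definition and the sample variable names are hypothetical.

```python
def macro_definition(varnames, macro_name="allvars"):
    """Return DEFINE ... !ENDDEFINE text whose body lists the variables,
    one per line, so a command such as FREQUENCIES can use the macro name."""
    lines = [f"DEFINE {macro_name} ()"]
    lines.extend(varnames)
    lines.append("!ENDDEFINE.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(macro_definition(["firstvar", "v1", "v2"]), end="")
```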

Cheeeeers!
Albert-Jan

Re: Help using vectors and loops

Beadle, ViAnn
In reply to this post by Richard Ristow
Seems to me that the means procedure can do this so much simpler with industry used as a grouping variable.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Wednesday, October 25, 2006 7:59 PM
To: [hidden email]
Subject: Re: Help using vectors and loops

Re: Help using vectors and loops

Richard Ristow
At 08:56 AM 10/26/2006, Beadle, ViAnn wrote:

>Seems to me that the means procedure can do this so much simpler with
>industry used as a grouping variable.

'Twould exactly, though it looks like the sums are desired for later
calculation, which suggests AGGREGATE rather than MEANS.

The problem, as I see it now, is that "industry" isn't a grouping
variable, nor a variable at all. It looks like the data organization is
'wide', with each industry a set of variables within the record. (I
don't know what the records represent. If they represent firms, it may
well be that, usually, all but one set of "industry" variables is
zero.)

Going to 'long' organization, by the way, may be easier with LOOP/XSAVE
than with VARSTOCASES.
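The LOOP/XSAVE idea (write one output record per industry group as the loop runs, then aggregate with industry as a break variable, which is also what makes ViAnn's grouping-variable approach possible) can be sketched in Python. The assumptions are all hypothetical: each wide case is a dict holding mcdid plus variables named <measure>_<industry>, and all-zero industry groups are optionally dropped, following Richard's guess that most groups are zero for a given firm.

```python
def to_long(cases, industries, measures, skip_zero=True):
    """Yield one (mcdid, industry, values) record per case per industry,
    much as XSAVE inside a LOOP writes one record per pass."""
    for case in cases:
        for ind in industries:
            values = tuple(case[f"{m}_{ind}"] for m in measures)
            if skip_zero and not any(values):
                continue  # optionally drop industry groups that are all zero
            yield case["mcdid"], ind, values


def aggregate_sums(records):
    """Sum the measures within each (mcdid, industry) pair, as
    AGGREGATE /BREAK=mcdid industry with SUM functions would."""
    totals = {}
    for mcdid, ind, values in records:
        key = (mcdid, ind)
        if key in totals:
            totals[key] = tuple(a + b for a, b in zip(totals[key], values))
        else:
            totals[key] = values
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Made-up wide records: two firms in mcdid 2 and one in mcdid 3.
    cases = [
        {"mcdid": 2, "julemp_ophth": 5, "augemp_ophth": 6, "julemp_other": 0, "augemp_other": 0},
        {"mcdid": 2, "julemp_ophth": 3, "augemp_ophth": 3, "julemp_other": 0, "augemp_other": 0},
        {"mcdid": 3, "julemp_ophth": 0, "augemp_ophth": 0, "julemp_other": 4, "augemp_other": 4},
    ]
    long_records = list(to_long(cases, ["ophth", "other"], ["julemp", "augemp"]))
    print(aggregate_sums(long_records))
```

With skip_zero left on, the result has one summary row per mcdid and industry actually present: (2, 'ophth') with sums (8, 9) and (3, 'other') with sums (4, 4).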

-Cheers, and onward,
  Richard

Re: *Use* of dictionary in v11 and earlier

Richard Ristow
In reply to this post by Albert-Jan Roskam
At 07:44 AM 10/26/2006, Albert-jan Roskam wrote:

>I saw some post about using DISPLAY DICTIONARY in a similar way as
>Proc Contents using OMS. For V11-and-earlier users, like me, that's
>not an option. How about the following solution?

Yes. Pre-OMS, command "FLIP" appears to be the best solution, or the
best one known. Raynald's site has examples that generate and use the
dictionary that way (e.g., "Add (or replace) a character at the
beginning of each var names.SPS",
http://spsstools.net/Syntax/LabelsAndVariableNames/ChangeCharacterAtBeginningOfEachVarNames.txt),
though I don't see one that just generates and saves the dictionary.

Ah, SPSS, the simple things that you make hard, sometimes

Re: Help using vectors and loops

Beadle, ViAnn
In reply to this post by Richard Ristow
The MEANS procedure and the SUMMARIZE procedure are essentially the same thing under the cover and will do everything that AGGREGATE does, except that they produce a nice table.

-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Thursday, October 26, 2006 10:36 AM
To: Beadle, ViAnn; [hidden email]
Cc: Vincent Louis
Subject: Re: Help using vectors and loops

Re: Help using vectors and loops

Maguin, Eugene
Viann,

I want to comment from the sidelines. One element of Vincent's original
message was the need to do this operation to two files and then match the
resulting files. How can you do that when using means or summarize?

Gene Maguin

Norms and fit to distributions

DEMUNA Chorrillos-Peru
Dear colleagues.
I need to create T or other scaled scores (type CI), but using the shape,
location and scale information from the target distributions of
scores. How can I do this? How can I fit my empirical distribution
to some theoretical distribution (Pearson type III, for example), and get
scaled scores using this information?

Thank you very much in advance.

Cesar Merino
Peru


Re: Help using vectors and loops

Richard Ristow
In reply to this post by Beadle, ViAnn
At 04:05 PM 10/26/2006, Beadle, ViAnn wrote:

>The MEANS procedure and the SUMMARIZE procedure are essentially the
>same thing under the cover and will do everything that AGGREGATE does
>except it produces a nice table.

I may be totally thick here. I love MEANS, especially the rich suite of
cell statistics. But I don't see, from the documentation, how to
capture the cell statistics to an SPSS file, which I think is desired.
Were you thinking of OMSing the output to a file? Or, what obvious
thing am I missing?

Re: Help using vectors and loops

Beadle, ViAnn
In reply to this post by Maguin, Eugene
I kinda lost track of all this, but I think that the separate-files issue is an intermediate tactic rather than the final goal.

IMHO, transformations are too often used to get summarized results that reporting procedures could produce more easily.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Thursday, October 26, 2006 3:19 PM
To: [hidden email]
Subject: Re: Help using vectors and loops

Re: Help using vectors and loops

Richard Ristow
In reply to this post by Maguin, Eugene
Hi, Gene!

Vincent, are you starting to feel left out of your own problem?

At 04:18 PM 10/26/2006, Gene Maguin wrote:

>I want to comment from the sidelines.

Hey, and here am I, so I must be on the sidelines
of the sidelines. I suppose, that means in the stands?

>One element of Vincent's original message was
>the need to do this operation to two files and
>then match the resulting files. How can you do
>that when using means or summarize?

Right. There's the question of capturing the
results. AND the question, the biggest question,
of getting the files into 'long' form before you
do anything. (There may be a decent way to do the
job with 'wide' files, but I don't see it.
Writing the code to generate the code to produce
314 different output files doesn't appeal to me,
and using the 314 once you've got them would likely be worse.)

But as for multiple files, if you had them in
'long' form, I'd probably catenate them first
(ADD FILES), using /IN= variables to mark which
records come from which inputs; AGGREGATE using
the file identifier as the high-order BY
variable; sort cases by summary category, and
file within category; and voilà, a 'long' file,
in which the summary records for the input
categories, for all files, are grouped.

OK, you see my style, but I think it's sound:
Keep numbers of files and variables low; to
attain that, multiply records (cases) freely.
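The catenate-then-aggregate recipe can be sketched the same way. This is a minimal Python sketch, assuming both files are already 'long' lists of (mcdid, industry, values) records; the file labels stand in for the /IN= flags, and combine_files is a hypothetical name. It sums within (file, mcdid, industry), with the file label as the high-order break, then sorts by category and by file within category, so the 2002 and 2004 summaries for each category land next to each other.

```python
def combine_files(files):
    """files maps a file label (e.g. '2002') to a list of long records
    (mcdid, industry, values).

    Mirrors ADD FILES with /IN= flags, then AGGREGATE with the file label as
    the high-order break variable, then SORT CASES by category and file.
    """
    totals = {}
    for label, records in files.items():  # ADD FILES: remember each record's source
        for mcdid, ind, values in records:
            key = (label, mcdid, ind)  # file label is the high-order break
            if key in totals:
                totals[key] = tuple(a + b for a, b in zip(totals[key], values))
            else:
                totals[key] = values
    # SORT CASES BY mcdid industry file: categories grouped, files within each.
    return sorted((mcdid, ind, label, values)
                  for (label, mcdid, ind), values in totals.items())


if __name__ == "__main__":
    files = {
        "2002": [(2, "ophth", (8, 9)), (3, "other", (4, 4))],
        "2004": [(2, "ophth", (7, 7)), (2, "ophth", (1, 1))],
    }
    for row in combine_files(files):
        print(row)
```

Comparing periods then only needs adjacent records within each category, rather than a match across 628 separate files.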

Re: Help using vectors and loops

Beadle, ViAnn
In reply to this post by Richard Ristow
I am not sure why Vincent wants to capture this as an SPSS data file. Is that a goal or a strategy?

________________________________

From: Richard Ristow [mailto:[hidden email]]
Sent: Thu 10/26/2006 3:38 PM
To: Beadle, ViAnn; [hidden email]
Subject: Re: Help using vectors and loops

Re: Help using vectors and loops

Albert-Jan Roskam
In reply to this post by Beadle, ViAnn
But as was said, AGGREGATE lets you run
subsequent analyses on your data. But maybe OMS +
SUMMARIZE will be able to do this as well.

AJ

--- "Beadle, ViAnn" <[hidden email]> wrote:

> The MEANS procedure and the SUMMARIZE procedure are
> essentially the same thing under the cover and will
> do everything that AGGREGATE does except it produces
> a nice table.

data list within a simple macro won't work

Albert-Jan Roskam
In reply to this post by Beadle, ViAnn
Hi list,

Sometimes I create a mini data set just to unload the
current data set so I can rename it, change the
attributes, move it, etc. But why does the syntax
below not work? SPSS keeps giving the message 'waiting
for more inline data'. Indenting, or putting
everything on one line does not help. Any ideas?

Albert-Jan
(that guy so nice they named him twice ;-))

define qwerty ()
data list / empty 1-5 (a).
begin data
empty
end data.
!enddefine.

qwerty.

Re: data list within a simple macro won't work

Marta García-Granero
Hi Albert-Jan

(You shouldn't complain, my parents christened me using 4 names,
fortunately, only one shows in my documents and ID card)

Now, seriously:

AjR> Sometimes I create a mini data set just to unload the
AjR> current data set so I can rename it, change the
AjR> attributes, move it, etc. But why does the syntax
AjR> below not work? SPSS keeps giving the message 'waiting
AjR> for more inline data'. Indenting, or putting
AjR> everything on one line does not help. Any ideas?

AjR> Albert-Jan
AjR> (that guy so nice they named him twice ;-))

After a bit of T&E (Trial and Error, the best method to learn SPSS
syntax sometimes - wry smile), I've found out this:

The BEGIN DATA... END DATA part should be outside the macro, I'm
afraid. I couldn't find any reference to that in the syntax guide (but
that doesn't necessarily mean it isn't there; I've failed to notice
some items in the past).

This works OK, but is of little use:

DEFINE !qwerty().
INPUT PROGRAM.
DATA LIST FIXED/NoData 1-5 (A).
END INPUT PROGRAM.
EXECUTE.
!ENDDEFINE.

!qwerty.
BEGIN DATA
empty
END DATA.

I have tried to put the BEGIN DATA... END DATA inside a second macro
to call them consecutively, but it doesn't work either.

The only thing I can concoct in my brain is to use WRITE OUTFILE to save
to disk a file with the necessary commands, and then INCLUDE or INSERT
it.
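The workaround amounts to putting the inline-data commands in a separate syntax file and pulling that file in, for example with INCLUDE, from inside the macro. The sketch below only builds and writes that file's text, in Python rather than with WRITE OUTFILE. Whether INCLUDE of such a file behaves as hoped from inside a macro has not been tested here, and the file name minidata.sps and both function names are hypothetical.

```python
import os
import tempfile


def inline_data_syntax(varname="empty", width=5, value="empty"):
    """Return DATA LIST / BEGIN DATA / END DATA commands for a one-case,
    one-string-variable data set, as in Albert-Jan's example."""
    return (
        f"DATA LIST FIXED / {varname} 1-{width} (A).\n"
        "BEGIN DATA\n"
        f"{value}\n"
        "END DATA.\n"
    )


def write_syntax_file(text, directory):
    """Write the commands to a .sps file and return its path, so a macro
    could refer to it with INCLUDE FILE='...'."""
    path = os.path.join(directory, "minidata.sps")
    with open(path, "w") as f:
        f.write(text)
    return path


if __name__ == "__main__":
    path = write_syntax_file(inline_data_syntax(), tempfile.gettempdir())
    print(f"INCLUDE FILE = '{path}'.")
```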

Don't blame the messenger...


--
Regards,
Dr. Marta García-Granero, PhD           mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abramson will not mind)