Hi Everyone,
I have two data files, one for 2002 and the other for 2004. Both files have the same number of cases and an identical unique identifying number for each case. Each file has 314 variables (including the unique id number). I am trying to write SPSS syntax that does the following:

* filters each file by each of the 314 variables, one at a time;
* aggregates the file on a separate break variable;
* then saves the resulting file under the name of the filter variable.

An example of the data:

uniqueid  mcdid  v1  v2  v3
1         2      1   0   0
2         3      0   1   0
3         3      0   1   0
4         4      0   0   1
5         2      1   0   0
6         3      0   1   0

This is what I've written (but it does not work):

get file = 'M:\VINCENT\panjq32002.sav' .
vector vname = v1 to v3 .
loop #i = 1 to 1 .
filter by (vname) .
aggregate outfile = ''M:\VINCENT\((vname).sav)' .
end loop .
exe .

Can anyone help me figure out how to correct the syntax? I want to repeat the procedure on the 2004 file, and then merge the two files.

Thank you all in advance.

Vincent
Vincent,
The short answer is that procedures such as aggregate, frequencies, etc. do not work within a loop structure, unless you use a macro loop. The other thing that will cause trouble is that your aggregate statement is incorrectly written: it does not specify the output and input variables and the function to be used, and it doesn't specify the break variable.

That said, I am completely confused by what you are up to. It would help if you would work through an example for one pass through the data. Suppose you used v1, what would the resulting file look like given your example input data?

Gene Maguin
On 10/25/06, Gene Maguin <[hidden email]> wrote:

> It would help if you would work through an example for one pass
> through the data. Suppose you used v1, what would the resulting file
> look like given your example input data?

Hi Gene,

Even though I included just 3 variables (v1 thru v3), I have 314 in my actual data set. What I would like to do:

(1) filter each file by each of the 314 variables, one at a time;
(2) aggregate the file on a separate break variable;
(3) then save the resulting file under the name of the filter variable.

Using the sample data I presented, the final output for v1 (for example) would be a file named v1.sav with the following information:

mcdid  v1  v2  v3
2      2   0   0

Let me know what you think.

Vincent
In reply to this post by Vincent LOUIS-2
Vincent,
Your reply leaves a lot for people to fill in. Here's my guess:

1) filter each file by each of the 314 variables, one at a time;
2) aggregate the filtered file on a separate break variable (mcdid), keeping the uniqueid variable and counting the number of records in the resulting file for each value of the break variable;
3) then save the resulting file under the name of the filter variable.

The result would be 314 files for 2002 and 314 for 2004.

I am not clear whether you just wrote a shorthand form of the aggregate syntax or thought that was the correct syntax. This is the correct syntax for the aggregate:

Aggregate outfile='v1.sav' mode=replace/break=mcdid/
  uniqueid=first(uniqueid)/v1=nu.

I cannot help you with the macro loop. Other people know far more about this than I do.

Gene Maguin
In reply to this post by Vincent LOUIS-2
At 01:39 PM 10/25/2006, Vincent Louis wrote:
>I have 2 different data files. Each file has 314 different variables
>(including the unique id number).
>
>I am trying to write an spss syntax that do the following:
>
>*filters each file by each of the 314 variables, one at a time;
>*aggregate the file on the a separate break variable;
>* then save the resulting file out as the name of the filter variable.

Gene Maguin's observations are very sound. To add to them,

- Before making your code work 314 times, make it work once. Skipping the looping, what is the syntax to do what you want for one of your variables?

- As Gene said, this is 'macro' logic in the general sense: it needs to generate and execute 314 different sets of SPSS code. There are three ways (that I have at my fingers' ends):
  . If you have SPSS 14 or 15, a loop with spss.Submit in Python
  . A loop in the macro facility (DEFINE ... !ENDDEFINE)
  . Get the data dictionary into an SPSS file (see my posting on the subject); write a transformation program that generates the code for each variable; and INCLUDE or INSERT the resulting file.
  Python, if you have it, is probably best. Otherwise, I might consider generating the code from SPSS and INCLUDEing it.

- To criticize based on little knowledge, if this were my project, I'd look at whether it was a good idea at all. This really is "little knowledge", since you haven't posted what your AGGREGATE does, but some things look odd: for example, sometime you're going to
  . FILTER BY <unique ID variable>.
  which doesn't look like it makes sense. Then you'll have the 314 files and need to do something with them, and you'll probably have to automate that, too. Anyhow, I'd look hard at the end, and at whether this is the best means.
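The "make it work once, then generate it 314 times" idea can be sketched in plain Python. This is only an illustration under assumptions: the function names are made up for this example, the summary (a case count per mcdid, following Gene's guess) may not be what is actually wanted, and the paths are the ones from the original post. The code only builds syntax text; running it is left to spss.Submit inside SPSS.

```python
def syntax_for_variable(var, in_file, out_dir, break_var="mcdid"):
    """Build SPSS syntax for ONE filter variable (one pass, no looping).

    Hypothetical helper: the aggregate function (N, a case count) is an
    assumption, not something the thread confirms.
    """
    return "\n".join([
        f"GET FILE='{in_file}'.",
        f"FILTER BY {var}.",
        "AGGREGATE",
        f"  /OUTFILE='{out_dir}\\{var}.sav'",
        f"  /BREAK={break_var}",
        f"  /{var}=N.",
    ])


def syntax_for_all(variables, in_file, out_dir):
    """Return one block of syntax per variable, in the order given."""
    return [syntax_for_variable(v, in_file, out_dir) for v in variables]


blocks = syntax_for_all(["v1", "v2", "v3"],
                        r"M:\VINCENT\panjq32002.sav", r"M:\VINCENT")
print(blocks[0])

# Inside SPSS 14 or later with the Python plug-in, each block could then
# be run with:
#   import spss
#   spss.Submit(block)
```

Printing the first block before submitting anything is a cheap way to check that the generated syntax is what you would have typed by hand.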
At 07:23 PM 10/25/2006, Vincent Louis wrote off-list (I think Marta has a point, making all replies on-list):

>On 10/25/06, Richard Ristow <[hidden email]> wrote,
>responding to question:
>
>>>I am trying to
>>>*filter each file by each of the 314 variables, one at a time;
>>>*aggregate the file on the a separate break variable;
>>>* then save the resulting file out as the name of the filter
>>>variable.
>>
>>- Before making your code work 314 times, make it work once. Skipping
>>the looping, what is the syntax to do what you want for one of your
>>variables?
>
>I have information on the number of employees, and number of
>businesses [for two periods]. I want to see whether the average number
>of employees and average number of businesses increased or decreased.
>
>[This syntax] examines change for one industry
>(ophthgdsmfgq3q4_2002_2004).
>
>I would like to repeat this for 313 other category of manufacturing
>activities in 2002 and in 2004. Is it possible to streamline repeating
>the syntax below 313 times?
>
>****QUARTER 3_2002 .
>
>get file = 'M:\VINCENT\' +
>           'working files \q32002.sav' .
>
>filter by ophthgdsmfgq302 .
>
>AGGREGATE
>  /OUTFILE = 'M:\VINCENT\NAICS 6\ophthgdsmfgq302_agg.sav'
>  /BREAK = mcdid
>  / julempq302ophthgdsmfg = sum(julempq302)
>  / augempq302ophthgdsmfg = sum(augempq302)
>  / sepempq302ophthgdsmfg = sum(sepempq302)
>  / quarterwage302ophthgdsmfg = sum(quarterwage302) .
>
>get file = 'M:\VINCENT\NAICS 6\ophthgdsmfgq302_agg.sav' .
>
>compute testq302 = 1 .
>exe .
>
>save outfile = 'M:\VINCENT\NAICS 6\ophthgdsmfgq302_agg.sav' .

I've reformatted some, for readability.

Little things, to warm up: On your AGGREGATE, 'OUTFILE=*' would be easier than writing to an external file and then reading it back. And the 'exe.' after the 'compute' is not necessary, nor helpful.
Here's the big stuff: It looks like your file doesn't have 314 variables, but 314 *groups* of variables, each for one industry; this is the 302nd group and applies to industry 'ophthgdsmfgq302' ("Ophthalmic Goods Manufacturing", from text I'm not quoting.) In other words, you have a very, very 'wide' file.

If I've got this right, you will have SO much easier a time if you restructure it to 'long', with a separate record for each industry's data. It will be a huge pain in the neck generating the VARSTOCASES syntax (Python, here we come); but then, so much less tangled.

Do you know how to use DISPLAY LABELS to get a complete variable list? If so, do it, and post the list. In your syntax above, I see 5 variables for this industry alone, so I suppose there are more than 5*314=1570 variables in the whole file.

Ah, onward, ever onward,
Richard
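As a rough sketch of what "generating the VARSTOCASES syntax" from Python might look like: the variable naming scheme below (measure_industry, e.g. emp_ophth) is purely hypothetical, since the real variable list has not been posted, and the function name is invented for this example. It only assembles the syntax text; check it against the VARSTOCASES documentation before running it.

```python
def varstocases_syntax(industries, measures, keep=("mcdid",)):
    """Build VARSTOCASES syntax for a wide file.

    Assumes (hypothetically) one variable per measure per industry,
    named <measure>_<industry>. Each /MAKE gathers one measure across
    all industries; /INDEX numbers the industries in the order given.
    """
    lines = ["VARSTOCASES"]
    for m in measures:
        sources = " ".join(f"{m}_{ind}" for ind in industries)
        lines.append(f"  /MAKE {m} FROM {sources}")
    lines.append(f"  /INDEX=industry({len(industries)})")
    lines.append(f"  /KEEP={' '.join(keep)}.")
    return "\n".join(lines)


# Two made-up industries and two made-up measures, for illustration only.
syntax = varstocases_syntax(["ophth", "dental"], ["emp", "wage"])
print(syntax)
```

With 314 industries the industry list would come from the data dictionary rather than being typed in, which is where the DISPLAY LABELS output would help.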
In reply to this post by Richard Ristow
Hi list!
I saw some posts about using DISPLAY DICTIONARY in a similar way as Proc Contents using OMS. For V11-and-earlier users, like me, that's not an option. How about the following solution? The variable 'firstvar' is the variable that physically appears first in the data set. It is self-writing syntax that dumps the variable names, together with define-!enddefine, into an ASCII file and uses this to compile a macro.

get file='d:\temp\myfile.sav'.
n of cases 1.
string temp enddefin (a20).
compute temp = firstvar.
rename variables (firstvar=define)(temp=firstvar).
compute firstvar = enddefin.
flip.
string myvars (a20).
compute myvars = case_lbl.
recode myvars ("firstvar"="firstvar.") ("DEFINE"= "DEFINE allvars ()") ("ENDDEFIN"="!ENDDEFINE.").
compute myvars = lowcase(myvars).
print outfile = 'd:\temp\varlist.sps' / myvars.
exe.
include file = 'd:\temp\varlist.sps' .
exe.
get file='d:\temp\somefile.sav'.
display macros.
fre allvars.

Cheeeeers!
Albert-Jan
In reply to this post by Richard Ristow
Seems to me that the means procedure can do this so much simpler with industry used as a grouping variable.
At 08:56 AM 10/26/2006, Beadle, ViAnn wrote:
>Seems to me that the means procedure can do this so much simpler with
>industry used as a grouping variable.

'Twould exactly, though it looks like the sums are desired for later calculation, which suggests AGGREGATE rather than MEANS.

The problem, as I see it now, is that "industry" isn't a grouping variable, nor a variable at all. It looks like the data organization is 'wide', with each industry a set of variables within the record. (I don't know what the records represent. If they represent firms, it may well be that, usually, all but one set of "industry" variables is zero.)

Going to 'long' organization, by the way, may be easier with LOOP/XSAVE than with VARSTOCASES.

-Cheers, and onward,
Richard
In reply to this post by Albert-Jan Roskam
At 07:44 AM 10/26/2006, Albert-jan Roskam wrote:
>I saw some post about using DISPLAY DICTIONARY in a similar way as
>Proc Contents using OMS. For V11-and-earlier users, like me, that's
>not an option. How about the following solution?

Yes. Pre-OMS, command "FLIP" appears to be the best solution, or the best one known. Raynald's site has examples that generate and use the dictionary that way (e.g., "Add (or replace) a character at the beginning of each var names.SPS", http://spsstools.net/Syntax/LabelsAndVariableNames/ChangeCharacterAtBeginningOfEachVarNames.txt), though I don't see one that just generates and saves the dictionary.

Ah, SPSS, the simple things that you make hard, sometimes
In reply to this post by Richard Ristow
The MEANS procedure and the SUMMARIZE procedure are essentially the same thing under the cover and will do everything that AGGREGATE does except it produces a nice table.
Viann,
I want to comment from the sidelines. One element of Vincent's original message was the need to do this operation to two files and then match the resulting files. How can you do that when using means or summarize?

Gene Maguin
Dear colleagues.
I need to create T or other scaled scores (type CI), but using the shape, location and scale information from the target distributions of scores. How can I do this? How can I fit my empirical distribution to some theoretical distribution (Pearson type III, for example), and get scaled scores using this information?

Thank you very much in advance.

Cesar Merino
Peru
In reply to this post by Beadle, ViAnn
At 04:05 PM 10/26/2006, Beadle, ViAnn wrote:
>The MEANS procedure and the SUMMARIZE procedure are essentially the
>same thing under the cover and will do everything that AGGREGATE does
>except it produces a nice table.

I may be totally thick here. I love MEANS, especially the rich suite of cell statistics. But I don't see, from the documentation, how to capture the cell statistics to an SPSS file, which I think is desired. Were you thinking of OMSing the output to a file? Or, what obvious thing am I missing?
In reply to this post by Maguin, Eugene
I kinda lost track of all this, but I think the separate-files issue is an intermediate tactic rather than the final goal.
IMHO, too often transformations are done to get summarized results that reporting procedures would produce more easily.
In reply to this post by Maguin, Eugene
Hi, Gene!
Vincent, are you starting to feel left out of your own problem?

At 04:18 PM 10/26/2006, Gene Maguin wrote:

>I want to comment from the sidelines.

Hey, and here am I, so I must be on the sidelines of the sidelines. I suppose that means in the stands?

>One element of Vincent's original message was
>the need to do this operation to two files and
>then match the resulting files. How can you do
>that when using means or summarize?

Right. There's the question of capturing the results. AND the question, the biggest question, of getting the files into 'long' form before you do anything. (There may be a decent way to do the job with 'wide' files, but I don't see it. Writing the code to generate the code to produce 314 different output files doesn't appeal to me, and using the 314 once you've got them would likely be worse.)

But as for multiple files, if you had them in 'long' form, I'd probably catenate them first (ADD FILES), using /IN= variables to mark which records come from which inputs; AGGREGATE using the file identifier as the high-order BY variable; sort cases by summary category, and file within category; and voilà - a 'long' file, in which the summary records for the input categories, for all files, are grouped.

OK, you see my style, but I think it's sound: Keep numbers of files and variables low; to attain that, multiply records (cases) freely.
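The catenate-then-aggregate sequence described above might be written out along these lines. This is a sketch only: the long-form file names, the flag variables in2002/in2004, and the variables industry and emp are all assumptions made up for the illustration, and the Python merely assembles the syntax text rather than running it.

```python
def stack_and_aggregate_syntax(files, break_vars, sums):
    """Build ADD FILES + AGGREGATE + SORT CASES syntax.

    files:      dict mapping an /IN= flag variable to a long-form .sav path
    break_vars: summary categories, e.g. industry and mcdid (assumed names)
    sums:       variables to total within each break group (must be non-empty)
    """
    lines = ["ADD FILES"]
    for flag, path in files.items():
        lines.append(f"  /FILE='{path}' /IN={flag}")
    lines[-1] += "."                      # terminate ADD FILES

    # File identifiers come first, i.e. as the high-order break variables.
    by = " ".join(list(files) + list(break_vars))
    lines += ["AGGREGATE", "  /OUTFILE=*", f"  /BREAK={by}"]
    lines += [f"  /{v}_sum=SUM({v})" for v in sums]
    lines[-1] += "."                      # terminate AGGREGATE

    # Sort by summary category, and file within category.
    lines.append(f"SORT CASES BY {' '.join(break_vars)} {' '.join(files)}.")
    return "\n".join(lines)


syntax = stack_and_aggregate_syntax(
    {"in2002": "long2002.sav", "in2004": "long2004.sav"},  # hypothetical files
    ["industry", "mcdid"],
    ["emp"],
)
print(syntax)
```

The result is one active file rather than 628 saved ones, which is the point of keeping the number of files low and multiplying cases instead.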
In reply to this post by Richard Ristow
I am not sure why Vincent wants to capture this as an SPSS data file? Is that a goal or a strategy?
In reply to this post by Beadle, ViAnn
But as said, AGGREGATE offers the possibility to run subsequent analyses on your data. But maybe OMS + SUMMARIZE will be able to do this as well.

AJ
In reply to this post by Beadle, ViAnn
Hi list,
Sometimes I create a mini data set just to unload the current data set so I can rename it, change the attributes, move it, etc. But why does the syntax below not work? SPSS keeps giving the message 'waiting for more inline data'. Indenting, or putting everything on one line, does not help. Any ideas?

Albert-Jan
(that guy so nice they named him twice ;-))

define qwerty ()
data list / empty 1-5 (a).
begin data
empty
end data.
!enddefine.
qwerty.
Hi Albert-Jan
(You shouldn't complain, my parents christened me using 4 names; fortunately, only one shows in my documents and ID card)

Now, seriously:

AjR> Sometimes I create a mini data set just to unload the
AjR> current data set so I can rename it, change the
AjR> attributes, move it, etc. But why does the syntax
AjR> below not work? SPSS keeps giving the message 'waiting
AjR> for more inline data'. Indenting, or putting
AjR> everything on one line does not help. Any ideas?

AjR> Albert-Jan
AjR> (that guy so nice they named him twice ;-))

After a bit of T&E (Trial and Error, the best method to learn SPSS syntax sometimes - wry smile), I've found out this:

The BEGIN DATA... END DATA part should be outside the macro, I'm afraid. I couldn't find any reference to that in the syntax guide (but that doesn't necessarily mean it isn't there, I've failed to notice some items in the past).

This works OK, but is of little use:

DEFINE !qwerty().
INPUT PROGRAM.
DATA LIST FIXED/NoData 1-5 (A).
END INPUT PROGRAM.
EXECUTE.
!ENDDEFINE.
!qwerty.
BEGIN DATA
empty
END DATA.

I have tried to put the BEGIN DATA... END DATA inside a second macro to call them consecutively, but it doesn't work either.

The only thing I can concoct in my brain is to use WRITE OUTFILE to save to disk a file with the necessary commands, and then INCLUDE or INSERT it.

Don't blame the messenger...

--
Regards,
Dr. Marta García-Granero,PhD mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does not understand. SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results".
(Adapted from WinPepi manual - I'm sure Joe Abrahmson will not mind)
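The write-a-file-then-INSERT workaround can be illustrated from outside SPSS as well. The sketch below swaps plain Python file writing in for WRITE OUTFILE, so it is not the method described above, only the same idea; the file name and function name are hypothetical, and whether INSERT accepts the resulting file should be checked on your own version.

```python
import os
import tempfile


def write_empty_dataset_syntax(path):
    """Write a tiny syntax file that creates a one-case 'empty' data set.

    BEGIN DATA ... END DATA sits in the file itself, not inside a macro,
    which is the point of the workaround. Returns the INSERT command
    that would pull the file into an SPSS session.
    """
    lines = [
        "DATA LIST / empty 1-5 (A).",
        "BEGIN DATA",
        "empty",
        "END DATA.",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return f"INSERT FILE='{path}'."


# Hypothetical location; any writable folder would do.
sps_path = os.path.join(tempfile.gettempdir(), "empty_dataset.sps")
insert_command = write_empty_dataset_syntax(sps_path)
print(insert_command)
```

A macro could then contain just the INSERT command, since that line carries no inline data.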