Hi guys, I am trying to use AGGREGATE on my dataset. My goal is to add the aggregated variables to my dataset and save the dataset (which includes all the original variables plus the aggregated variables). I am not seeing this capability in the OUTFILE subcommand of AGGREGATE: it either just writes the aggregated variables to the dataset without saving (so I then have to save manually after the AGGREGATE procedure is done), or it just saves a new dataset containing only the aggregated variables. Am I missing something?
I know I can let it add the aggregated variables to my dataset and then manually save it after it is done, but since my database has about 1 billion cases, that would be unnecessarily inefficient and very time consuming. So I was wondering if there is a way to have SPSS save everything directly into a new dataset while it is aggregating. Thanks! |
In your syntax file, put your mouse on the word AGGREGATE.
Hit the F1 key. Look for the words "mode=addvariables".
Art Kendall
Social Research Consultants |
I am aware of mode=addvariables but that just adds the aggregated variables to the active dataset...it does not save the dataset.
|
If they're in the working file, why not just do:
File > Save As... and enter a new name?

John F Hall (Mr)
[Retired academic survey researcher]
Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/spss-without-tears.html

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD |
Because, as I mentioned, I am working with a huge database containing 1 billion cases that takes hours upon hours to do anything with, so I am trying to do things in one step. Having it write the aggregated variables into the active dataset and then saving it after it's done means it has to go through the 1 billion cases twice. I was hoping that I could use the OUTFILE subcommand of AGGREGATE to aggregate and save in one step.
My hope would have been to do outfile='E:\savefile.sav' mode=addvariables, but that is not supported by AGGREGATE. I was hoping for a workaround. |
Administrator
|
From the FM!!!
"MODE and OVERWRITE can be used only with OUTFILE=*; they are invalid with OUTFILE=’file specification’." Perhaps you need to fundamentally rethink your process? Over your head?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
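For readers following along, here is a minimal sketch of what the manual's restriction means in practice. The variable names (region, sales, mean_sales, n_cases) are made up for illustration and are not from the thread; only the path E:\savefile.sav comes from the post above.

```spss
* Hypothetical names: region is the break variable, sales is the variable being summarized.
* Supported form: MODE=ADDVARIABLES together with OUTFILE=* adds the results to the active dataset.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=region
  /mean_sales = MEAN(sales)
  /n_cases = N.

* Unsupported form, per the manual quote above: MODE combined with a file specification.
* AGGREGATE /OUTFILE='E:\savefile.sav' MODE=ADDVARIABLES ... is the combination the manual rules out.
```

So the one-step "aggregate, add, and save to a new file" request cannot be written as a single AGGREGATE command; any workaround has to combine AGGREGATE with a separate save or merge step.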
Not sure why you pasted that; I already mentioned that it is not supported by AGGREGATE, which is why I was hoping for a suggestion for a workaround. Unfortunately, fundamentally rethinking my 'process' isn't going to shrink the size of my dataset.
I might be in over my head, but I am dealing with 4 datasets totaling more than 4.5 billion cases, a magnitude that I doubt anyone here has come even halfway to; heck, people are freaked out by their tiny 100 million case datasets ;) So I am trying to ask for help on things that can add to the efficiency of what I need to do, and I am also at times amazed at the lack of spss optimization for large datasets compared to SAS. Otherwise, I am slowly but surely taking care of the analyses that I need to take care of with the help of users of this forum =) |
In reply to this post by devoidx
Could you *not* save the file? Add the variables to the working data file, and process the working data file? |
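A rough sketch of what that suggestion could look like, assuming the analysis can run in the same session. All variable names here (region, sales, mean_sales, sales_dev) are hypothetical and only illustrate the idea.

```spss
* Hypothetical names: region (break variable), sales, mean_sales, sales_dev.
* Add the group means to the active dataset instead of writing a new file.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=region
  /mean_sales = MEAN(sales).

* Use the new variable straight away, without saving the combined data as a file first.
COMPUTE sales_dev = sales - mean_sales.
DESCRIPTIVES VARIABLES=sales_dev.
```

This skips the explicit save, but the aggregated variables exist only for the current session, so it helps only if nothing downstream needs the combined file on disk.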
Administrator
|
In reply to this post by devoidx
"amazed at the lack of spss optimization for large datasets compared to SAS."
Well then perhaps you should be using SAS and get out of our hair ;-) OR gain some basic experience with SPSS before tangling with dragons! |
I am not sure why you have an issue with me asking questions here, considering this is the purpose of this forum, lol. I am not tangling with the dragons by choice or for leisure; I have to do what I have to do.
|
Administrator
|
In reply to this post by David Marso
David, that's a bit harsh, IMO. (Perhaps the wink at the end of line 2 below was meant to convey lightness of tone, but that sort of thing does not always come through clearly in written communication.)
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by devoidx
At 10:43 AM 10/6/2013, devoidx wrote:
>I am trying to use aggregate on my dataset, my goal is to add the
>aggregated variables to my dataset and save the dataset (which
>includes all the original variables plus the aggregated variables)
>..I am not seeing this capability in the outfile subcommand of
>aggregate...it just writes the aggregated variables to the new
>dataset without saving ( which then I have to save manually after
>aggregate procedure is done) or it just saves a new dataset
>containing only the aggregated variables
>
>I know I can let it add the aggregated variables to my dataset and
>then manually save it after it is done but since my database has
>about 1 billion cases, it would be unnecessarily inefficient and
>very time consuming

I don't think you can do this without reading your database twice. You'd like

   AGGREGATE OUTFILE=<new file> MODE=ADDVARIABLES...

and you're right, AGGREGATE doesn't support that.

The most efficient solution I can think of would be to create a separate file with the aggregated variables, MATCH with the original file, and save.

The following assumes that your input is sorted in ascending order on the break variable or variables. You've posted about difficulties getting it sorted that way. Here, I'm assuming that's been solved; if it hasn't, we'll address it separately.

Code would look like the following. I've used the "names" <in_file>, <agg_var>, <agg_file>, and <out_file> as placeholders in the code, as follows:

.  <in_file> is your very large input. The code assumes it's an SPSS .SAV file; if it's not, the code will have to be longer (and probably slower).
.  <agg_var> is the variable or set of variables on which your input is sorted, and by which you wish to aggregate.
.  <agg_file> holds the break variables and aggregated values, only. It can be either an SPSS dataset (previously declared) or a disk file. Making it a dataset is easiest, but making it a file gives you more control. The run will be more efficient if it's a disk file on a disk different from the one (or ones) holding <in_file> and <out_file>.
.  <out_file> is the file you desire, as an SPSS .SAV file. It must be a disk file.

   GET FILE=<in_file>.
   AGGREGATE OUTFILE=<agg_file>
      /PRESORTED
      /BREAK=<agg_var>
      /... aggregation clauses as you need ... .
   MATCH FILES
      /FILE= <in_file>
      /TABLE=<agg_file>
      /BY <agg_var>.
   XSAVE OUTFILE=<out_file>.
   EXECUTE.

Be sure that none of the variables created by AGGREGATE have the same name as any variable in <in_file>. If any does, the aggregate variable will be lost.

And notice XSAVE followed by EXECUTE, in place of SAVE. That *may* save SPSS some disk reading and writing.

Finally, the approach you don't want to take may work as well or better. The key is PRESORTED on AGGREGATE. Note that MODE=ADDVARIABLES is only valid with OUTFILE=*, so the aggregated variables go into the active dataset, which is then saved:

   GET FILE=<in_file>.
   AGGREGATE OUTFILE=* MODE=ADDVARIABLES
      /PRESORTED
      /BREAK=<agg_var>
      /... aggregation clauses as you need ... .
   SAVE OUTFILE=<out_file>. |
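To make the placeholders concrete, here is a filled-in sketch of the MATCH FILES approach. Every file name, drive letter, and variable name in it (big.sav, agg.sav, big_plus_agg.sav, region, sales, mean_sales) is invented for illustration; substitute your own. It assumes, as above, that the input is already sorted in ascending order on the break variable.

```spss
* Hypothetical paths and names throughout; none of them come from the original posts.
* Assumes E:\big.sav is already sorted in ascending order on region.
GET FILE='E:\big.sav'.

* First pass: write one row per region to a small file, ideally on a different disk.
AGGREGATE
  /OUTFILE='F:\agg.sav'
  /PRESORTED
  /BREAK=region
  /mean_sales = MEAN(sales).

* Second pass: look up each case's region in the small table and write the combined file.
MATCH FILES
  /FILE='E:\big.sav'
  /TABLE='F:\agg.sav'
  /BY region.
XSAVE OUTFILE='E:\big_plus_agg.sav'.
EXECUTE.
```

In this sketch mean_sales must not already exist in big.sav; otherwise, as noted above, the aggregated value would be lost in the merge.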