SPSSX Discussion

Aggregate Question

devoidx

Aggregate Question

Hi guys, I am trying to use AGGREGATE on my dataset. My goal is to add the aggregated variables to my dataset and save the result (all the original variables plus the aggregated variables). I am not seeing this capability in the OUTFILE subcommand of AGGREGATE: it either adds the aggregated variables to the active dataset without saving (so I then have to save manually after the AGGREGATE procedure is done), or it saves a new dataset containing only the aggregated variables. Am I missing something?

I know I can let it add the aggregated variables to my dataset and then save manually afterwards, but since my database has about 1 billion cases, that would be inefficient and very time-consuming. So I was wondering if there is a way to have SPSS save everything directly into a new dataset while it is aggregating.

Thanks!

Art Kendall

Re: Aggregate Question

In your syntax file, put your mouse on the word AGGREGATE.
Hit the F1 key.
Look for the words "mode=addvariables".

Art Kendall
Social Research Consultants


devoidx

Re: Aggregate Question

I am aware of MODE=ADDVARIABLES, but that just adds the aggregated variables to the active dataset. It does not save the dataset.

John F Hall

Re: Aggregate Question

If they're in the working file, why not just do:

File > Save as

. . and enter a new name?

John F Hall (Mr)
[Retired academic survey researcher]

Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/spss-without-tears.html

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
devoidx
Sent: 06 October 2013 17:05
To: [hidden email]
Subject: Re: Aggregate Question

I am aware of mode=addvariables but that just adds the aggregated variables
to the active dataset...it does not save the dataset.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command SIGNOFF SPSSX-L For a list of
commands to manage subscriptions, send the command INFO REFCARD


devoidx

Re: Aggregate Question

Because, as I mentioned, I am working with a huge database of 1 billion cases, and anything I do with it takes hours upon hours, so I am trying to do things in one step. Having AGGREGATE write the aggregated variables into the active dataset and then saving it afterwards means going through the 1 billion cases twice. I was hoping I could use the OUTFILE subcommand of AGGREGATE to aggregate and save in one step.

My hope was to do outfile='E:\savefile.sav' mode=addvariables, but that is not supported by AGGREGATE, so I was hoping for a workaround.

David Marso

Re: Aggregate Question

Administrator

From the FM!!!
"MODE and OVERWRITE can be used only with OUTFILE=*; they are invalid with OUTFILE=’file specification’."
Perhaps you need to fundamentally rethink your process?
Over your head?

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

devoidx

Re: Aggregate Question

Not sure why you pasted that. I already mentioned that it is not supported by AGGREGATE, which is why I was hoping for a suggested workaround. Unfortunately, fundamentally rethinking my 'process' isn't going to shrink the size of my dataset.

I might be in over my head, but I am dealing with 4 datasets totaling more than 4.5 billion cases, a magnitude I doubt anyone here has come within half of. Heck, people are freaked out by their tiny 100 million case datasets ;) So I am asking for help with things that can make what I need to do more efficient, and I am at times amazed at the lack of spss optimization for large datasets compared to SAS.

Otherwise, I am slowly but surely taking care of the analyses I need to do, with the help of users of this forum =)

Paul Cook

Re: Aggregate Question

In reply to this post by devoidx

Could you *not* save the file? Add the variables to the working data file, and process the working data file?

Kind regards,

Paul Cook

Dangerous Enterprises Ltd

On 7 October 2013 04:51, devoidx <[hidden email]> wrote:

Because as i mentioned, I am working with a huge database containing 1
billion cases that takes hours upon hours to do anything with it so I am
trying to do things in 1 step...having it write the aggregated variables
into the active dataset and the saving it after its done means it has to go
through the 1 billion database twice...I was hoping that i could do anything
with the outfile subcommand of the aggregate to aggregate and save in one
step

my hope would have been to do outfile='E:\savefile.sav'
mode=addvariables...but that is not supported by aggregate...was hoping for
a workaround

David Marso

Re: Aggregate Question

Administrator

In reply to this post by devoidx

"amazed at the lack of spss optimization for large datasets compared to SAS."
Well then perhaps you should be using SAS and get out of our hair ;-)
OR gain some basic experience with SPSS before tangling with dragons!
devoidx

Re: Aggregate Question

I am not sure why you have an issue with me asking questions here, considering that is the purpose of this forum, lol. I am not tangling with dragons by choice or for leisure; I have to do what I have to do.

Bruce Weaver

Re: Aggregate Question

Administrator

In reply to this post by David Marso

David, that's a bit harsh, IMO. (Perhaps the wink at the end of line 2 below was meant to convey lightness of tone, but that sort of thing does not always come through clearly in written communication.)

David Marso wrote

"amazed at the lack of spss optimization for large datasets compared to SAS."
Well then perhaps you should be using SAS and get out of our hair ;-)
OR gain some basic experience with SPSS before tangling with dragons!
--

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Richard Ristow

Re: Aggregate Question

In reply to this post by devoidx

At 10:43 AM 10/6/2013, devoidx wrote:

>I am trying to use aggregate on my dataset, my goal is to add the
>aggregated variables to my dataset and save the dataset (which
>includes all the original variables plus the aggregated variables)
>..I am not seeing this capability in the outfile subcommand of
>aggregate...it just writes the aggregated variables to the new
>dataset without saving ( which then I have to save manually after
>aggregate procedure is done) or it just saves a new dataset
>containing only the aggregated variables
>
>I know I can let it add the aggregated variables to my dataset and
>then manually save it after it is done but since my database has
>about 1 billion cases, it would be unnecessarily inefficient and
>very time consuming

I don't think you can do this without reading your database twice. You'd like

AGGREGATE OUTFILE=<new file> MODE=ADDVARIABLES...

and you're right, AGGREGATE doesn't support that.

The most efficient solution I can think of would be to create a
separate file with the aggregated variables, MATCH with the original
file, and save. The following assumes that your input is sorted in
ascending order on the break variable or variables. You've posted
about difficulties getting it sorted that way. Here, I'm assuming
that's been solved; if it hasn't, we'll address it separately.

Code would look like the following. I've used the "names" <in_file>,
<agg_var>, <agg_file>, and <out_file>, as placeholders in the code,
as follows:

. <in_file> is your very large input. The code assumes it's an SPSS
.SAV file; if it's not, the code will have to be longer (and probably slower).

. <agg_var> is the variable or set of variables on which your input
is sorted, and by which you wish to aggregate.

. <agg_file> holds the break variables and aggregated values, only.
It can be either an SPSS dataset (previously declared) or a disk
file. Making it a dataset is easiest, but making it a file gives you
more control. The run will be more efficient if it's a disk file on a
disk different from the one (or ones) holding <in_file> and <out_file>.

. <out_file> is the file you desire, as an SPSS .SAV file. It must be
a disk file.

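* Pass 1: write one case per break group to the aggregate file.
GET FILE=<in_file>.
AGGREGATE OUTFILE=<agg_file>
   /PRESORTED
   /BREAK=<agg_var>
   /... aggregation clauses as you need ... .
* Pass 2: look up the aggregates for each input case and save.
MATCH FILES
   /FILE= <in_file>
   /TABLE=<agg_file>
   /BY <agg_var>.
XSAVE OUTFILE=<out_file>.
EXECUTE.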

Be sure that none of the variables created by AGGREGATE have the same
name as any variable in <in_file>. If any does, the aggregate
variable will be lost.
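
To make the naming point concrete, here is a minimal sketch with made-up names: region as the break variable, sales as a variable already in the input, and placeholder file paths. None of these come from the original post. Giving each aggregate a name that cannot occur in the input keeps MATCH FILES from discarding it in favor of the same-named input variable.

```spss
* Hypothetical names: region (break variable), sales (existing variable).
* File paths are placeholders, not taken from the original post.
GET FILE='E:\in_file.sav'.
AGGREGATE OUTFILE='F:\agg_file.sav'
  /PRESORTED
  /BREAK=region
  /sales_mean_by_region = MEAN(sales)
  /n_by_region = N.
* Attach the per-region values to every input case as a table lookup.
MATCH FILES
  /FILE=*
  /TABLE='F:\agg_file.sav'
  /BY region.
XSAVE OUTFILE='E:\out_file.sav'.
EXECUTE.
```

Here /FILE=* refers to the active dataset, which is still the input file after AGGREGATE has written its results to a separate file.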

And notice XSAVE followed by EXECUTE, in place of SAVE. That
*may* save SPSS some disk reading and writing.

Finally, the approach you don't want to take may work as well or
better. The key is PRESORTED on AGGREGATE:

GET FILE=<in_file>.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /PRESORTED
   /BREAK=<agg_var>
   /... aggregation clauses as you need ... .
SAVE OUTFILE=<out_file>.
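
A filled-in version of this second approach might look like the sketch below. The variable names (region, sales) and the file paths are hypothetical. MODE=ADDVARIABLES is paired with OUTFILE=* because, per the manual passage quoted earlier in the thread, MODE is valid only with that form.

```spss
* Hypothetical names and paths, for illustration only.
GET FILE='E:\in_file.sav'.
* Add the aggregate to every case of the active dataset.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /PRESORTED
  /BREAK=region
  /sales_mean_by_region = MEAN(sales).
* Write the original variables plus the new aggregate to disk.
SAVE OUTFILE='E:\out_file.sav'.
```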
