AGGREGATE OUTFILE=*, with datasets

AGGREGATE OUTFILE=*, with datasets

Richard Ristow
This is behavior, possibly a design choice, that I find anomalous; I
got bit by it while writing the solution to another posting. I got by
all right; there are several work-arounds. But it still seems strange.

Background: If the active dataset has a name, at least some
operations that accept FILE=* (including ADD FILES and MATCH FILES)
retain the name for the revised active dataset. See the example at the
end of this posting, using ADD FILES.

But AGGREGATE OUTFILE=* seems to create a new, unnamed active
dataset, leaving the formerly active dataset named and inactive. I
find this counter-intuitive. Comments?

ILLUSTRATION: AGGREGATE OUTFILE=* creates a new, unnamed dataset.
(SPSS 14 draft output, not saved separately.)

DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2007 12:41:26       |
|-----------------------------|---------------------------|
Datasets
|-----------|
|TestData   |
|-----------|
|Long       |
|-----------|
|PreOrder(a)|
|-----------|
a Active dataset


AGGREGATE OUTFILE=*
   /BREAK = id
   /Order = LAST(Order).

DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2007 12:41:26       |
|-----------------------------|---------------------------|
Datasets
|------------|
|TestData    |
|------------|
|Long        |
|------------|
|PreOrder    |
|------------|
|(unnamed)(a)|
|------------|
a Active dataset
................
ILLUSTRATION:  ADD FILES/FILE=* retains the dataset name. (SPSS 14
draft output, not saved separately.)

DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2007 12:54:20       |
|-----------------------------|---------------------------|
Datasets
|--------|
|TestData|
|Long(a) |
|PreOrder|
|Order   |
|--------|
a Active dataset

LIST.
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2007 12:54:20       |
|-----------------------------|---------------------------|
[Long]

  id ill NEpisode Episode    startdt
   1   1      3        1  02/05/2004
   1   1      3        2  03/12/2004
   1   1      3        3  03/31/2004
   3   1      2        2  02/11/2004
[truncated]
  Number of cases read:  13    Number of cases listed:  13


ADD FILES
    /FILE  = *
    /BY id
    /FIRST = NewGuy.

DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2007 12:54:20       |
|-----------------------------|---------------------------|
Datasets
|--------|
|TestData|
|Long(a) |
|PreOrder|
|Order   |
|--------|
a Active dataset

LIST.
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2007 12:54:20       |
|-----------------------------|---------------------------|
[Long]
  id ill NEpisode Episode    startdt NewGuy

   1   1      3        1  02/05/2004    1
   1   1      3        2  03/12/2004    0
   1   1      3        3  03/31/2004    0
   3   1      2        2  02/11/2004    1
[truncated]
Number of cases read:  13    Number of cases listed:  13

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: AGGREGATE OUTFILE=*, with datasets

Marks, Jim
I noticed the same issue. I've had to put

dataset name new.
aggregate outfile = new.
dataset activate new.

dataset close old.

in my code when I want to change the active dataset.
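
A sketch of one way to write this workaround, assuming the target is reserved with DATASET DECLARE (rather than DATASET NAME, which would rename the active dataset) and that the active dataset is already named old. The names old and new and the variables id and Order are placeholders for illustration, not Jim's actual code.

```spss
* Assumes the active dataset is named old and contains id and Order.
* Reserve the (hypothetical) name new as a target for the output.
DATASET DECLARE new.
AGGREGATE OUTFILE=new
   /BREAK = id
   /Order = LAST(Order).
* Switch to the aggregated data, then drop the source dataset.
DATASET ACTIVATE new.
DATASET CLOSE old.
```

Running DATASET DISPLAY afterwards is a quick way to check which datasets remain open and which one is active.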

--jim




Re: AGGREGATE OUTFILE=*, with datasets

Peck, Jon
In reply to this post by Richard Ristow
This is all quite logical (he said, ducking).  Aggregate creates a new dataset, but it does not presume that you want that one to be active immediately.  Think of the output as a branch.  You might have a stream of syntax you are running and will use the aggregated dataset later.  So automatic activation would disrupt the main stream of processing, and you would have to remember to change it back if that wasn’t the behavior you wanted.

As for naming, again, no SPSS commands assign dataset names except the DATASET commands themselves.  But you can declare a dataset name and make that the target of your aggregate command, and you will have a named dataset.  For example,

DATASET DECLARE fred.
AGGREGATE
  /OUTFILE='fred'
  /BREAK=jobcat
  /salbegin_mean = MEAN(salbegin).

IOW, if your target is a dataset name, you will have a named dataset.  If it is a file, you will have a file.  And you can name it afterwards.
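
The "name it afterwards" route might look like the sketch below. It assumes an active dataset that contains jobcat and salbegin, as in the example above; the dataset name fred_later is made up for illustration.

```spss
* Aggregate in place: the result becomes the active dataset.
AGGREGATE OUTFILE=*
  /BREAK=jobcat
  /salbegin_mean = MEAN(salbegin).
* The active dataset holds the aggregated cases; give it a name.
DATASET NAME fred_later.
DATASET DISPLAY.
```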

HTH,
Jon Peck


Re: AGGREGATE OUTFILE=*, with datasets

John Fiedler
In reply to this post by Richard Ristow
Anomalous behavior, indeed! As a (very) long-time user of SPSS, I find having
multiple open datasets a hindrance rather than an asset. Is there any way to
restrict later versions of SPSS to a single dataset?
Thanks,
JOHN



Re: AGGREGATE OUTFILE=*, with datasets

Peck, Jon
For syntax users, there is no difference from the old behavior unless you ask for it.  If you don't assign a name to a dataset (via the DATASET commands), one goes away when you open another.

For GUI users, dataset names are assigned automatically when a file is opened, so the default is for a dataset to stay around until closed.  SPSS 16 has a preference setting, "Open only one dataset at a time", available on Edit/Options/General.

This does not prevent you from opening multiple datasets, but it means that when you open one in the GUI, a dataset opened the same way previously will automatically close (with a prompt to save changes if needed).

HTH,
Jon Peck


Multiple datasets (was, re: AGGREGATE OUTFILE=*,...)

Richard Ristow
In reply to this post by John Fiedler
At 06:10 PM 11/9/2007, John Fiedler wrote:

>As (very) long time user of SPSS, I find having multiple open
>datasets a hindrance rather than an asset. Is there any way to
>restrict later versions of SPSS to a single dataset?

Two things:

a. In syntax: Syntax that contains no DATASET commands will never
create more than one open dataset. Behind the scenes, certain
commands create new datasets; but they replace the active dataset,
which (because it has no name) is then dropped. Behavior should be
indistinguishable from that in pre-14 versions with the single
'working file' instead of multiple datasets.

b. In SPSS 16, there's a menu option to not have more than one
dataset open. (I don't recall where it is on the Options panels.)
    The option affects the menuing system, not the back-end
processing that I consider "SPSS proper," which runs syntax for data
management and statistics. (Remember, the menuing system works by
issuing syntax commands.) I believe that the option is implemented by issuing
DATASET NAME $DataSet WINDOW=FRONT.
after every command GENERATED BY THE MENUS that creates a new dataset.
    The behavior of syntax that creates multiple datasets doesn't
change if this option is set; you can still do it in syntax, as much
or as little as you want to.
    And if multiple datasets are open, you can still activate them
individually from the menus.

Between them, these sort of approximate what you want, anyway.

-Cheers, of the data, by the data, and for the data,
  Richard


Re: AGGREGATE OUTFILE=*, with datasets

Richard Ristow
In reply to this post by Peck, Jon
At 05:26 PM 11/9/2007, Peck, Jon wrote:

>This is all quite logical (he said, ducking).

I see your points, but don't view them QUITE the same way ...

>Aggregate creates a new dataset, but it does not presume that you
>want that one to be active immediately. You might have a stream of
>syntax you are running and will use the aggregated dataset
>later.  So automatic activation would disrupt the main stream of
>processing, and you would have to remember to change it back if that
>wasn't the behavior you wanted.

But AGGREGATE gives specific control whether to make the result active, or not:
If you specify OUTFILE=*, the result is active immediately;
if you specify OUTFILE=<destination>, it isn't.
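
The two cases can be put side by side in a short sketch. The dataset names Original and Summary and the variable MeanX are made up for illustration. The named-destination case runs first, because once OUTFILE=* has run, the active dataset no longer contains X.

```spss
DATA LIST FREE / Group X.
BEGIN DATA
1 1 1 2 1 3 2 4 2 5
END DATA.
DATASET NAME Original.

* Named destination: the result goes to Summary and is not activated.
DATASET DECLARE Summary.
AGGREGATE OUTFILE=Summary
    /BREAK=Group
    /MeanX = MEAN(X).
DATASET DISPLAY.

* OUTFILE=*: the result becomes the active dataset immediately.
AGGREGATE OUTFILE=*
    /BREAK=Group
    /MeanX = MEAN(X).
DATASET DISPLAY.
```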

>As for naming, again, no SPSS commands assign dataset names except
>the DATASET commands themselves.  But you can declare a dataset name
>and make that the target of your aggregate command, and you will
>have a named dataset. If your target is a dataset name, you will
>have a named dataset.

Also true. But none of that is the question. The oddity occurs when
(a) The active dataset has a name
(b) The AGGREGATE command specifies OUTFILE=*.

If the active dataset has a name, then when you run a transformation
program, or run ADD FILES or MATCH FILES with /FILE=*, SPSS considers
that you're modifying the active dataset, not creating a new one. The
modified active dataset still has the same name, and the
pre-modification version is not kept. (There are syntactic tools to
get different behaviors, of course.)
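
One such tool is DATASET COPY, which takes a snapshot of the active dataset before it is modified. The sketch below assumes the active dataset is named Long and is sorted by id, as in the ADD FILES illustration in the first posting; the copy's name, LongBefore, is made up for illustration.

```spss
* Keep the pre-modification version under a separate name.
DATASET COPY LongBefore.
* Long is still the active dataset; ADD FILES modifies it in place.
ADD FILES
    /FILE  = *
    /BY id
    /FIRST = NewGuy.
* Both Long and LongBefore should now be listed, with Long active.
DATASET DISPLAY.
```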

AGGREGATE OUTFILE=* seems on a par with the above two: a modification
to the active dataset, not creation of a new one. I'd expect the same
behavior: keep the old dataset name on the new, active AGGREGATE
output; and there's no copy of the old active dataset. Instead, the
AGGREGATE output becomes a new, active, unnamed dataset; the
pre-AGGREGATE version keeps the name, and is inactivated. (I got bit,
assuming that the AGGREGATE output was named.)

To complicate things further, if you specify MODE=ADDVARIABLES, the
modified active dataset *does* keep its name.

I'd love to have been at the design conference ...
===========================================================
DEMO: AGGREGATE OUTFILE=*, without & with MODE=ADDVARIABLES
===========================================================

DATA LIST FREE / Group X .
BEGIN DATA
1 1 1 2 1 3 2 4 2 5
END DATA.
FORMATS GROUP X (F2).
DATASET NAME     Original.


DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Datasets
|-----------|
|Original(a)|
|-----------|
a Active dataset


LIST.

List
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
[Original]

Group  X

    1   1
    1   2
    1   3
    2   4
    2   5

Number of cases read:  5    Number of cases listed:  5


*  AGGREGATE with OUTFILE=* creates a new dataset: .

AGGREGATE OUTFILE=*
    /BREAK=GROUP
    /MEAN = MEAN(X).

DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Datasets
|------------|
|Original    |
|------------|
|(unnamed)(a)|
|------------|
a Active dataset

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Group     MEAN

    1      2.00
    2      4.50

Number of cases read:  2    Number of cases listed:  2


*  AGGREGATE with OUTFILE=* MODE=ADDVARIABLES      .
*  does not create a new dataset: .                .

DATASET ACTIVATE  Original.
DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Datasets
|-----------|
|Original(a)|
|-----------|
a Active dataset


AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK=GROUP
    /MEAN = MEAN(X).


DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Datasets
|-----------|
|Original(a)|
|-----------|
a Active dataset


LIST.

List
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
[Original]

Group  X     MEAN

    1   1     2.00
    1   2     2.00
    1   3     2.00
    2   4     4.50
    2   5     4.50

Number of cases read:  5    Number of cases listed:  5


Re: AGGREGATE OUTFILE=*, with datasets

Oliver, Richard
While I don't disagree that the behavior can be confusing and is perhaps somewhat inconsistent...
 
OUTFILE=* "modifies" the original dataset in the same way that bulldozing a house and building a new one on the same site "modifies" the original house, whereas MODE=ADDVARIABLES is similar to putting an addition on the original house.
 
If the original dataset is unnamed, OUTFILE=* will still replace the active dataset, just like it did prior to introducing the ability to have multiple open datasets. If, however, the original dataset is named, then SPSS takes the less draconian route of creating a new dataset for the aggregated results.
 
*unnamed original dataset will be replaced.
dataset close all.
data list free /scalevar groupvar.
begin data
1 1 2 1 3 1 4 2 5 2 6 2
end data.
aggregate outfile=*
  /break=groupvar /aggvar=sum(scalevar).
*named original dataset will not be replaced.
dataset close all.
data list free /scalevar groupvar.
begin data
1 1 2 1 3 1 4 2 5 2 6 2
end data.
dataset name original.
aggregate outfile=*
  /break=groupvar /aggvar=sum(scalevar).
 
Match Files and Add Files don't exhibit similar behavior because they have no way to do anything other than overwrite the original dataset (no equivalent to the OUTFILE subcommand).

________________________________

From: SPSSX(r) Discussion on behalf of Richard Ristow
Sent: Sun 11/11/2007 12:12 AM
To: [hidden email]
Subject: Re: AGGREGATE OUTFILE=*, with datasets



At 05:26 PM 11/9/2007, Peck, Jon wrote:

>This is all quite logical (he said, ducking).

I see your points, but don't view them QUITE the same way ...

>Aggregate creates a new dataset, but it does not presume that you
>want that one to be active immediately. You might have a stream of
>syntax you are running and will use the aggregated dataset
>later.  So automatic activation would disrupt the main stream of
>processing, and you would have to remember to change it back if that
>wasn't the behavior you wanted.

But AGGREGATE gives specific control whether to make the result active, or not:
If you specify OUTFILE=*, the result is active immediately;
if you specify OUTFILE=<destination>, it isn't.

>As for naming, again, no SPSS commands assign dataset names except
>the DATASET commands themselves.  But you can declare a dataset name
>and make that the target of your aggregate command, and you will
>have a named dataset. If your target is a dataset name, you will
>have a named dataset.

Also true. But none of that is the question. The oddity occurs when
(a) The active dataset has a name
(b) The AGGREGATE command specifies OUTFILE=*.

If the active dataset has a name, then if you run a transformation
program; or run ADD FILES or MATCH FILES with /FILE=*; SPSS considers
you're modifying the active dataset, not creating a new one. The
modified active dataset still has the same name; and the
pre-modification version is not kept. (There are syntactic tools to
get different behaviors, of course.)

AGGREGATE OUTFILE=* seems on a par with the above two: a modification
to the active dataset, not creation of a new one. I'd expect the same
behavior: keep the old dataset name on the new, active AGGREGATE
output; and there's no copy of the old active dataset. Instead, the
AGGREGATE output becomes a new, active, unnamed dataset; the
pre-AGGREGATE version keeps the name, and is inactivated. (I got bit,
assuming that the AGGREGATE output was named.)

To complicate things further, if you specify MODE=ADDVARIABLES, the
modifie active dataset *does* keep its name.

I'd love to have been at the design conference ...
===========================================================
DEMO: AGGREGATE OUTFILE=*, without & with MODE=ADDVARIABLES
===========================================================

DATA LIST FREE / Group X .
BEGIN DATA
1 1 1 2 1 3 2 4 2 5
END DATA.
FORMATS GROUP X (F2).
DATASET NAME     Original.


DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Datasets
|-----------|
|Original(a)|
|-----------|
a Active dataset


LIST.

List
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
[Original]

Group  X

    1   1
    1   2
    1   3
    2   4
    2   5

Number of cases read:  5    Number of cases listed:  5


*  AGGREGATE with OUTFILE=* creates a new dataset: .

AGGREGATE OUTFILE=*
    /BREAK=GROUP
    /MEAN = MEAN(X).

DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Datasets
|------------|
|Original    |
|------------|
|(unnamed)(a)|
|------------|
a Active dataset

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Group     MEAN

    1      2.00
    2      4.50

Number of cases read:  2    Number of cases listed:  2


*  AGGREGATE with OUTFILE=* MODE=ADDVARIABLES      .
*  does not create a new dataset: .                .

DATASET ACTIVATE  Original.
DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Datasets
|-----------|
|Original(a)|
|-----------|
a Active dataset


AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK=GROUP
    /MEAN = MEAN(X).


DATASET DISPLAY.

Dataset Display
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
Datasets
|-----------|
|Original(a)|
|-----------|
a Active dataset


LIST.

List
|-----------------------------|---------------------------|
|Output Created               |11-NOV-2007 01:01:13       |
|-----------------------------|---------------------------|
[Original]

Group  X     MEAN

    1   1     2.00
    1   2     2.00
    1   3     2.00
    2   4     4.50
    2   5     4.50

Number of cases read:  5    Number of cases listed:  5

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: AGGREGATE OUTFILE=*, with datasets

Richard Ristow
At 11:28 AM 11/11/2007, Oliver, Richard wrote:

>While I don't disagree that the behavior can be confusing and is
>perhaps somewhat inconsistent...
>
>OUTFILE=* "modifies" the original dataset in the same way that
>bulldozing a house and building a new one on the same site
>"modifies" the original house, whereas MODE=ADDVARIABLES is similar
>to putting an addition on the original house.

Yes, that makes sense. I can well see reaching the decision that was
reached, on this basis.

>If the original dataset is unnamed, OUTFILE=* will still replace the
>active dataset, just like it did prior to introducing the ability to
>have multiple open datasets. If, however, the original dataset is
>named, then SPSS takes the less draconian route of creating a new
>dataset for the aggregated results.

To make a guess at implementation, perhaps it creates a new active
dataset, and deactivates the existing one, in *both* cases. Since
deactivating an unnamed dataset loses it, while deactivating a named
dataset leaves it open under its name, that gives the effects you describe.
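
To spell that guess out, here is a toy model in Python (not SPSS
syntax, and certainly not SPSS's actual code). The Session class and
its method names are invented for illustration; the only rule built
in is the one guessed at above: AGGREGATE OUTFILE=* always makes a
new, unnamed active dataset, and whatever was active before is merely
deactivated.

```python
# Toy model of the guess above. All names are hypothetical; this is
# not SPSS code and makes no claim about SPSS internals.

class Session:
    def __init__(self):
        self.named = {}          # dataset name -> contents, for named open datasets
        self.active_name = None  # None means the active dataset is unnamed
        self.active_data = None

    def load(self, data, name=None):
        """Make `data` the active dataset, optionally under a name."""
        self._deactivate()
        self.active_name, self.active_data = name, data
        if name is not None:
            self.named[name] = data

    def _deactivate(self):
        # A named dataset is still reachable through self.named, so it
        # stays open. An unnamed one has no other reference and is lost.
        self.active_name, self.active_data = None, None

    def aggregate_outfile_star(self, aggregated):
        """Guessed rule: the result is always a new, unnamed active dataset."""
        self._deactivate()
        self.active_data = aggregated

    def display(self):
        """Rough analogue of DATASET DISPLAY; the active dataset gets '(a)'."""
        rows = [n + "(a)" if n == self.active_name else n for n in self.named]
        if self.active_name is None and self.active_data is not None:
            rows.append("(unnamed)(a)")
        return rows


# Named case: the original survives, inactive, under its name.
s = Session()
s.load([1, 2, 3, 4, 5], name="Original")
s.aggregate_outfile_star([2.0, 4.5])
print(s.display())

# Unnamed case: the original is simply gone, as in pre-dataset SPSS.
t = Session()
t.load([1, 2, 3, 4, 5])
t.aggregate_outfile_star([2.0, 4.5])
print(t.display())
```

Under this one rule, the named case ends with two open datasets and
the unnamed case with one, which is the pair of effects described in
the quoted paragraph.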

>Match Files and Add Files don't exhibit similar behavior because
>they have no way to do anything other than overwrite the original
>dataset (no equivalent to the OUTFILE subcommand).

Fair, though an even more direct argument for this choice might be as
you wrote about MODE=ADDVARIABLES: "[ADD FILES or MATCH FILES, with
/FILE=*,] are similar to putting an addition on the original house."

MATCH FILES and ADD FILES, by the way, also have an interesting
implementation quirk. It looks a little odd at first, but I can very
much see the point, and it can be very useful. If the active dataset
is named MyData and there's an inactive one named YourData, then running

MATCH FILES [or ADD FILES]
    /FILE=*
    /FILE=YourData
    /BY [etc.]

gives the results of the MATCH/ADD as the active dataset, still named MyData.

Running

MATCH FILES [or ADD FILES]
    /FILE=MyData
    /FILE=YourData
    /BY [etc.]

gives an active dataset with exactly the same contents. But in this
case, the active dataset has no name, and MyData remains as it was:
inactive, but still open with its original contents.
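
The difference between the two forms can be put as a small toy model,
again in Python rather than SPSS syntax. The function combine and its
arguments are invented for illustration, and it simply concatenates
cases the way ADD FILES would; it encodes only the naming behavior
described above, not anything known about how SPSS does it.

```python
# Toy model only: hypothetical names, not SPSS syntax or SPSS internals.

def combine(named, active_name, active_data, inputs):
    """Model the naming side of ADD FILES.

    `named` maps dataset names to contents, `inputs` is a list whose
    items are "*" (the active dataset) or a dataset name.
    Returns (named, active_name, active_data) after the command.
    """
    named = dict(named)  # work on a copy; the caller's dict is untouched
    rows = []
    for ref in inputs:
        rows += active_data if ref == "*" else named[ref]
    if "*" in inputs:
        # The active dataset was used as "*": the result replaces it
        # and keeps its name.
        new_name = active_name
        if new_name is not None:
            named[new_name] = rows
    else:
        # The active dataset was referred to only by name: it stays
        # open unchanged, and the result is a new, unnamed dataset.
        new_name = None
    return named, new_name, rows


datasets = {"MyData": [1, 2], "YourData": [3]}

# /FILE=* /FILE=YourData
print(combine(datasets, "MyData", datasets["MyData"], ["*", "YourData"]))

# /FILE=MyData /FILE=YourData
print(combine(datasets, "MyData", datasets["MyData"], ["MyData", "YourData"]))
```

Both calls produce the same combined cases; they differ only in
whether the result is called MyData or is unnamed, and in whether the
original MyData is still around afterward.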

Thank you very much, Richard!
Richard
