FAQ: Avoid using EXECUTE

Richard Ristow
I haven't posted this for a long time, but
several recent postings have EXECUTE or exe.
statements in example code. None of those
recently posted are needed, and it's important to
know this; unnecessary EXECUTEs can slow processing badly.

(For a recent EXECUTE that is needed, see my "Re:
Question: print or list in if condition", Wed, 11 Jun 2008.)


FAQ: Avoid using EXECUTE

An occasional reminder: there are very few occasions when EXECUTE is needed.

EXECUTE is not needed after a transformation, or
several transformations; the transformations are
carried out when they are needed, when the next procedure or SAVE is executed.

It's confusing that you don't *see*
transformation results in the Data Editor, unless
you run EXECUTE, or click "Run Pending
Transformations" (which is the same thing). It's
often worth doing that, just to see what you've
done. But if you don't, the next procedure or
save will still get the results of the transformations.

EXECUTE is treated very well in section "Use
EXECUTE Sparingly" in any edition of Raynald
Levesque's book: Levesque, Raynald, "SPSS®
Programming and Data Management, A Guide for
SPSS® and SAS® Users". SPSS, Inc., Chicago, IL, 2005.
(Downloadable free from the SPSS, Inc., Web site.)

And EXECUTE isn't harmless. EXECUTE makes SPSS
read the whole data file; multiple EXECUTEs can
badly slow processing of big files.

.....................
The logic of EXECUTE:

In the transformations,

COMPUTE C = A + B.
EXECUTE.
COMPUTE D = E/C.
EXECUTE.

At the first EXECUTE, the file is read; the value
of C is computed for every case; and the
resulting file (with all variables) is saved, as
a scratch file. At the second EXECUTE, the file
is read again; D is computed for every case,
using the computed value of C; and the file is
saved again. Four passes through the data:
reading twice, writing twice. (Recent versions of
SPSS do optimizations that will save some of this.)

If you write, instead

COMPUTE C = A + B.
COMPUTE D = E/C.

and then whatever procedure or SAVE is desired,
the computations are done when the file is read
for the procedure or SAVE, needing no additional data
passes for the computation. In this logic, SPSS computes
the value of C for one case, then computes the
value of D for the same case, and then proceeds to the next case.
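
As a minimal sketch of the single-pass form (assuming A, B, and E are
numeric variables in the active file; DESCRIPTIVES is only a stand-in
for whatever procedure comes next), the two transformations stay
pending until the procedure reads the data:

```spss
* Both COMPUTEs are pending here and no data pass has happened yet.
COMPUTE C = A + B.
COMPUTE D = E/C.
* The procedure reads the file once, computing C and D case by case.
DESCRIPTIVES VARIABLES=C D.
```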

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: FAQ: Avoid using EXECUTE

Christian Ganser
Funny discussion imho, which arises from time to time. Why is there a
command one shouldn't use? Why aren't transformations carried out
immediately as in other software packages? And: Is EXECUTE really a major
reason for slow behaviour of SPSS? This depends on the size of the
dataset, and the new interface definitely wastes much more time than
some executes on some 1000 cases. So to me it seems the behaviour of
SPSS should be improved in this respect, not the behaviour of the users.

Re: FAQ: Avoid using EXECUTE

Albert-Jan Roskam
Hi,

I have one additional question about the use of EXECUTE. Consider the following syntax.
select if var1 = 1.
execute.
compute var2 = 1 / x.
compute var3 = 1 / y.

Given that the number of records is very large, and that var1 = 1 is (quite) rare, would it be more efficient to use EXECUTE before running any other computations? After all, it'd drastically decrease the size of the scratch file.

Cheers!!
Albert-Jan




Re: FAQ: Avoid using EXECUTE

Art Kendall
NO.

However, the difference would be small, because the size of the scratch
file would be small.
The "hidden steps" that are eliminated by dropping the EXECUTE are marked
with an asterisk (*).

open a scratch file (I don't know if SPSS still has a "look ahead" to
skip the whole scratch file process if it is not needed)
read each case
test for condition
*write selected cases (# writes is # selected)
*"rewind" the scratch file
*open the scratch file
*read the scratch file (# reads is # selected)
do the transformations.
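
For illustration, the same job without the EXECUTE might look like the
sketch below. The variable names are those from the question;
DESCRIPTIVES is only an assumed stand-in for whatever procedure follows.

```spss
* The selection and both COMPUTEs are applied together, case by case.
SELECT IF var1 = 1.
COMPUTE var2 = 1 / x.
COMPUTE var3 = 1 / y.
* The single data pass happens when this procedure runs.
DESCRIPTIVES VARIABLES=var2 var3.
```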

Art Kendall
Social Research Consultants


Re: FAQ: Avoid using EXECUTE

Art Kendall
In reply to this post by Christian Ganser
There are some programming practices that don't do a lot of harm in
small exercise-type applications, but that can be harmful on many real
problems.  Overuse of EXECUTE is one that we see quite often in posts
on lists.
It is helpful for beginners to be aware of things that people with
decades of experience like Richard Ristow have seen to be problematic
for themselves and clients.
Unfortunately, it is often true that telling people where some of the
pitfalls are is an exercise in futility. Teachers and consultants often
feel like Cassandra must have felt.

Spreadsheets are optimized for small scale problems. Aside: There are
reasons why spreadsheets are not accepted by ISO for accounting
purposes.  They are great for what they do well and I use them all the
time to send clients things output from SPSS and small problems with
small amounts of data transformation.

A major part of optimization for software designed for small problems is
to put all of the data in memory.
Also the amount of transformation done is typically limited compared
to what is done with a stat package.  Statistical applications in
spreadsheets, especially those involving matrix inversion and
probability functions, are well known to have numerical analysis problems
resulting in unrealistic results.

Packages like SPSS are optimized to handle a wide variety of file
sizes.  Part of SPSS's efficiency is that for most procedures it keeps
only one case at a time in memory and can therefore use memory for
summary information.

If you have only a few cases (e.g., 1000) and a very up-to-date machine,
the time saved by dropping execute commands will be small.  That is one
reason it is a common practice to use a test data set with something
like 1000 cases while developing the application.

Transformation syntax can end up being very long. For example, if one
has a mid-sized set of syntax of 300 lines, eliminating 50 or 60
executes (and therefore a read pass, a write pass, then another read
pass for each execute) can be very time-saving.  Although it is true
that this is no longer in terms of days or even many hours, it can be a
substantial savings.

During the development and debugging phase, it is often more useful to
check the logic of transformations by looking at intermediate results
with additional transformation tests, descriptive stats, etc., than to
eyeball the results.
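
A small sketch of that kind of check, reusing the variables A, B, C, D,
and E from Richard Ristow's example (they are assumed to exist in the
active file): a procedure both runs the pending transformations and
summarizes the results, so no EXECUTE is needed.

```spss
COMPUTE C = A + B.
COMPUTE D = E/C.
* Summary statistics expose out-of-range values in the new variables.
DESCRIPTIVES VARIABLES=C D /STATISTICS=MEAN STDDEV MIN MAX.
* Listing a few cases shows inputs and results side by side.
LIST VARIABLES=A B E C D /CASES=FROM 1 TO 10.
```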

Art Kendall
Social Research Consultants



Re: FAQ: Avoid using EXECUTE

Oliver, Richard
In reply to this post by Christian Ganser
The command is sometimes needed and often useful.

It's needed, for example, if you use commands such as WRITE or XSAVE to write out data files and you need those files closed before you can continue but the next step in your job doesn't contain any procedure commands that read the data.
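
A rough sketch of that situation, with a hypothetical file name and
selection variable (neither comes from this thread): XSAVE is queued
like any other transformation, and the GET that follows is not a
procedure that reads the active data, so something has to force the
data pass first.

```spss
* XSAVE is only queued here and nothing has been written yet.
DO IF (var1 = 1).
XSAVE OUTFILE='subset.sav'.
END IF.
* EXECUTE forces the data pass that writes and closes subset.sav.
EXECUTE.
* The file is now complete and can be opened.
GET FILE='subset.sav'.
```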

It's useful, for example, if you want to check the intermediate results of transformations when testing/debugging complicated jobs.

By default, the GUI automatically generates EXECUTE syntax after each transformation because many users (like you) expect to see the results immediately. (BTW, you can turn this off in Edit>Options>Data.) In the absence of the EXECUTE command, transformations are not executed immediately because it can save a great deal of processing time for large datasets if each block of transformations is processed in a single step.

Re: FAQ: Avoid using EXECUTE

Christian Ganser
In reply to this post by Art Kendall
Well, I didn't want to question Richard Ristow's experience. What I
thought of weren't spreadsheets but rather Stata as an example, which as
far as I know has no similar command, and where I don't eyeball the
results, just as I don't in SPSS. It just seems to me that in many cases
there are other things that would save more time than dropping some executes.
But I might be thinking too much of my own problems with up to 100,000
cases, not more, mostly less.

Art Kendall wrote:

> There are some programming practices that don't do a lot of harm on
> small exercise type application, but that can be harmful on many real
> problems.  Over use of EXECUTE is one that we see quite often is posts
> on lists.
> It is helpful for beginners to be aware of things that people with
> decades of experience like Richard Ristow have seen to be problematic
> for themselves and clients.
> Unfortunately,  is often true that people telling us where some of the
> pitfalls are is an exercise in futility. Teachers and consultants often
> feel like Cassandra must have felt.
>
> Spreadsheets are optimized for small scale problems. Aside: There are
> reasons why spreadsheets are not accepted by ISO for accounting
> purposes.  They are great for what they do well and I use them all the
> time to send clients things output from SPSS and small problems with
> small amounts of data transformation.
>
> A major part of optimization for software designed for small problems is
> to put all of the data in memory.
> Also the amount of transformation done is typically  limited  compared
> to what is done with a stat package.  Statistical applications in
> spreadsheets, especially those involving matrix inversion, and
> probability functions are well known to have numerical analysis problems
> resulting in unrealistic results.
>
> Packages like SPSS are optimized to handle  a wide variety  of file
> sizes.   Part of SPSS's efficiency is that for most procedures it keeps
> only one case at a time in memory and can therefore use memory for
> summary information.
>
> If you have only a few cases (e.g, 1000)  and a very up to date machine,
> the time saving by dropping execute commands will be small.  That is one
> reason it is a common practice to use a test data set with something
> like 1000 cases  while developing the application.
>
> Transformation syntax can end up being very long. For example, if one
> has a mid-sized set of syntax of 300 lines, eliminating 50 or 60
> executes (and therefore a read pass , a write pass, then another read
> pass for each execute) can be very time-saving.  Although it is true
> that this is no longer in terms of days or even many hours, it can be a
> substantial savings.
>
> During the development and debugging phase, it often more useful to look
> at intermediate results to check the logic of transformations, by using
> additional transformation test, doing descriptive stats, etc., than to
> eyeball the results.
>
> Art Kendall
> Social Research Consultants

Re: FAQ: Avoid using EXECUTE

Oliver, Richard
If memory serves, Stata holds all the data in memory, so the size of a dataset is limited by available memory. SPSS doesn't have this limitation. An SPSS dataset can contain billions of cases.


Re: FAQ: Avoid using EXECUTE

Art Kendall
In reply to this post by Christian Ganser
There are *additional* things that save time. The FAQ posted is not
the only thing that could be a FAQ.
See Raynald's book that is free from SPSS.

What else would save time depends on the nature of your data, the
nature of your transformations, and the nature of your analysis.
For example, it is a rare occasion that you will not want to refine your
syntax in some way. You can save time IN THE LONG RUN by being sure
that you can go back and redo any or all of your work, e.g., by never
saving with the same filename, never writing over variables you
transform, etc.

It is also a rare occasion that you will do the whole project without
interruptions such as meals, bathroom breaks, etc. You
can save time if you try to write in such a way that you yourself, or
someone else at another time, will know what you were thinking.

Art


Art Kendall
Social Research Consultants

Re: FAQ: Avoid using EXECUTE

Albert-Jan Roskam
I agree: usually it's not the hardware (computer) or the software (SPSS) that is the bottleneck; rather, it is the humble researcher. My syntaxes always have a header (title, purpose, context/project, author, date, version, note) and they also have lots of comments. That sounds like a lot of work, but in the end it's not. It streamlines and focuses one's thoughts and makes code readable for my co-workers. When I get back from vacation, I sometimes really need my own comments! I am generally reluctant to save lots of intermediate files, but when it saves time-consuming (e.g. pre-processing) computations I think it's a good idea. Even so, after debugging, I prefer to run everything again from A to Z.

Cheers!
Albert-Jan



Re: FAQ: Avoid using EXECUTE

Richard Ristow
In reply to this post by Christian Ganser
At 02:56 AM 6/20/2008, Christian Ganser wrote:

>Why aren't transformations carried out
>immediately as in other software-packages?

As Art Kendall wrote, it's an optimization that's
very effective for large files. Main memory sizes
('RAM') seem to have grown faster than file
sizes, so the optimization isn't as important as it was.

>Why is there a command one shouldn't use?

Sometimes EXECUTE can be useful, even necessary;
see revised FAQ, below. (But the large majority
of the EXECUTEs in postings to the list are neither.)

>Is EXECUTE really a major reason for slow
>behaviour of SPSS? This depends on the size of the dataset, ...

Right. The cost is very size-dependent, though
it's also proportional to the number of EXECUTEs.
If you use an EXECUTE after every transform
command (and some users come close), you can slow
performance even on a medium-size file.

By the way, EXECUTE does change behavior, and may
not be harmless. For example, it'll break the
code if you put it in any construct such as DO
IF, DO REPEAT, or LOOP; and it'll break most code that uses scratch variables.
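
To illustrate the scratch-variable point, here is a minimal sketch. It assumes numeric variables A and B in the active file; #TOTAL, D1 and D2 are hypothetical names chosen for the example. As I understand it, scratch variables do not survive a data pass, so the second version should fail, or at least not do what was intended.

```spss
* Works as intended. #TOTAL is still defined when D1 is computed.
COMPUTE #TOTAL = A + B.
COMPUTE D1 = #TOTAL * 2.
FREQUENCIES VARIABLES=D1.

* Breaks. The EXECUTE forces a data pass and #TOTAL is gone afterwards.
COMPUTE #TOTAL = A + B.
EXECUTE.
COMPUTE D2 = #TOTAL * 2.
FREQUENCIES VARIABLES=D2.
```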

>The new interface definitely wastes much more
>time than some executes on some 1000 cases.

I don't doubt *that* for a moment, since the
interface has been much complained of, and 1,000 cases is very small.

>So to me it seems the behaviour of SPSS should
>be improved in this respect, not the behaviour of the users.

For that, on to the SPSS suggestion line. But
take care in designing what you think the
behavior should be, taking into account the warnings I've given.

FAQ: Avoid using EXECUTE

There are very few occasions when EXECUTE is
needed, and it can be harmful. For an excellent
treatment, see section "Use EXECUTE Sparingly" in
any edition of: Levesque, Raynald, "SPSS®
Programming and Data Management, A Guide for
SPSS® and SAS® Users". SPSS, Inc., Chicago, IL
(various dates). It is downloadable free from the SPSS, Inc., Web site.

Below, see
A. On not using EXECUTE
B. What EXECUTE does
C. When EXECUTE is useful

A. On not using EXECUTE
EXECUTE is not needed after a transformation, or
several transformations; the transformations are
carried out when they are needed, when the next procedure or SAVE is executed.

It's confusing that you don't *see*
transformation results in the Data Editor.
Sometimes, you run EXECUTE, or click "Run Pending
Transformations" (the same thing) just for that.
But if you don't, the next procedure or SAVE will
receive the results of the transformations.

And EXECUTE isn't harmless. EXECUTE makes SPSS
read the whole data file; multiple EXECUTEs can
badly slow processing of big files. (EXECUTE also
changes the behavior of code, and can break it.
Don't insert EXECUTEs in complicated code you don't fully understand.)

.....................
B. What EXECUTE does

If you write the transformation code,

COMPUTE C = A + B.
EXECUTE.
COMPUTE D = E/C.
EXECUTE.

at the first EXECUTE, SPSS reads the file; it computes
the value of C for every case; and saves the
resulting file (with all variables), as the
active file. At the second EXECUTE, it reads the
file again; computes D for every case, using the
computed value of C; and saves the file again.
That is two reads and two writes; with the read
for the next procedure or SAVE, five passes
through the data. (Optimizations in recent
versions of SPSS will save some of this.)

If you write, instead

COMPUTE C = A + B.
COMPUTE D = E/C.

followed by a procedure or SAVE, SPSS does the
computations while reading the file to pass data
to the procedure or SAVE, and the computation
itself takes no data pass. In this code, when
SPSS reads a case, it computes the value of C for
that case; then computes the value of D for the
same case; and then passes the transformed values where they're needed.
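
As a sketch of the single-pass form described above: DESCRIPTIVES is used here only as an example, and any procedure or SAVE would trigger the pending transformations the same way. Variables A, B and E are assumed to exist, as in the code above.

```spss
* One data pass. C and D are computed while DESCRIPTIVES reads the file.
COMPUTE C = A + B.
COMPUTE D = E/C.
DESCRIPTIVES VARIABLES=C D.
```

No EXECUTE is needed anywhere in this sequence.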

.........................
C. When EXECUTE is useful

Again, see "Use EXECUTE Sparingly" in Raynald Levesque's book.

To summarize, use EXECUTE when,

-> Explicit output: You have a transformation
program whose purpose is to write data explicitly
(using XSAVE, WRITE, or PRINT), rather than to
transform the working file. (This is the most common case.)

-> Selection depending on multiple cases: for
example, "discard the cases after ones where
Y=1"; "keep every fifth case". These may need a
KEEP_IT variable to mark which cases are to be
kept; then EXECUTE, to calculate that variable; and then
SELECT IF (KEEP_IT=1).
See "Using $CASENUM to Select Cases" in Raynald's
book, and http://www.spsstools.net/spsstips.htm on his Web site.

-> LAG and transformations: You transform a
variable, but use the LAG of the pre-transformed
value (see "Lag Functions"). (It's best, though,
not to transform a variable that you are also
LAGging. There are generally other ways to the same result.)

-> MISSING VALUES: Your case assumes two
different sets of MISSING VALUES for one
variable. (Only one set can be in effect during
one data pass. See "MISSING VALUES Command".)
However, you can usually avoid the need, using
tools such as functions MISSING and VALUE.

Finally,

-> To see the results: You're working
interactively, and want to see the results of
your transformations. (Clicking "Run Pending
Transformations" issues an EXECUTE.) This does
increase the computer time you use; but computers
exist to give us results when we need them. (And
for modest-sized files, you likely won't notice.)

HOWEVER, this may give the impression that
EXECUTE is necessary to make transformations take
effect. As said above, it isn't; SPSS carries out
all transformations as soon as their results are needed.
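
Two of the cases in section C, "Explicit output" and "Selection depending on multiple cases", can be sketched as follows. This is illustration, not tested production code: the variables ID and Y and the file name 'y_is_1.sav' are assumptions made for the example, and KEEP_IT is the marker variable named above.

```spss
* Explicit output. XSAVE is a transformation, so nothing is written until a data pass.
DO IF (Y = 1).
XSAVE OUTFILE='y_is_1.sav' /KEEP=ID Y.
END IF.
EXECUTE.

* Keep every fifth case. KEEP_IT must be calculated in its own pass before selecting.
COMPUTE KEEP_IT = (MOD($CASENUM, 5) = 0).
EXECUTE.
SELECT IF (KEEP_IT = 1).
```

In the second sketch, the EXECUTE matters because $CASENUM is affected by case selection within the same pass; see "Using $CASENUM to Select Cases" in Raynald's book for the details.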
