|
I haven't posted this for a long time, but several recent postings have EXECUTE or exe. statements in example code. None of those recently posted are needed, and it's important to know this; unnecessary EXECUTEs can slow processing badly.

(For a recent EXECUTE that is needed, see my "Re: Question: print or list in if condition", Wed, 11 Jun 2008.)

FAQ: Avoid using EXECUTE

An occasional reminder: there are very few occasions when EXECUTE is needed.

EXECUTE is not needed after a transformation, or several transformations; the transformations are carried out when they are needed, that is, when the next procedure or SAVE is executed.

It's confusing that you don't *see* transformation results in the Data Editor unless you run EXECUTE or click "Run Pending Transformations" (which is the same thing). It's often worth doing that, just to see what you've done. But if you don't, the next procedure or SAVE will still get the results of the transformations.

EXECUTE is treated very well in the section "Use EXECUTE Sparingly" in any edition of Raynald Levesque's book:

Levesque, Raynald, "SPSS® Programming and Data Management, A Guide for SPSS® and SAS® Users". SPSS, Inc., Chicago, IL, 2005. (Downloadable free from the SPSS, Inc., Web site.)

And EXECUTE isn't harmless. EXECUTE makes SPSS read the whole data file; multiple EXECUTEs can badly slow processing of big files.

.....................
The logic of EXECUTE:

In the transformations,

    COMPUTE C = A + B.
    EXECUTE.
    COMPUTE D = E/C.
    EXECUTE.

At the first EXECUTE, the file is read; the value of C is computed for every case; and the resulting file (with all variables) is saved, as a scratch file. At the second EXECUTE, the file is read again; D is computed for every case, using the computed value of C; and the file is saved again. Four passes through the data: reading twice, writing twice. (Recent versions of SPSS do optimizations that will save some of this.)

If you write, instead,

    COMPUTE C = A + B.
    COMPUTE D = E/C.

and then whatever procedure or SAVE is desired, the computations are done when the file is read for the procedure or SAVE, needing no additional data passes for the computation. In this logic, SPSS computes the value of C for one case, then computes the value of D for the same case, and then proceeds to the next case.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
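A minimal sketch of the second variant from the post above, made runnable with a few lines of inline data. The data values and the choice of LIST as the procedure are assumptions for illustration; any procedure or SAVE should trigger the same single data pass.

```spss
* Illustrative inline data with made-up values.
DATA LIST FREE / A B E.
BEGIN DATA
1 2 6
3 4 14
END DATA.

* No EXECUTE between or after the transformations.
COMPUTE C = A + B.
COMPUTE D = E / C.

* LIST is a procedure, so it reads the data and the pending COMPUTEs run in that pass.
LIST.
```

With these two cases, C works out to 3 and 7, and D to 2 in both, all produced in the one pass that LIST makes.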
|
Funny discussion imho which arises from time to time. Why is there a
command one shouldn't use? Why aren't transformations carried out immediately, as in other software packages? And: is EXECUTE really a major reason for slow behaviour of SPSS? That depends on the size of the dataset, and the new interface definitely wastes much more time than a few EXECUTEs on some 1000 cases. So to me it seems the behaviour of SPSS should be improved in this respect, not the behaviour of the users.

Richard Ristow wrote:
> [...] |
|
Hi,
I have one additional question about the use of EXECUTE. Consider the following syntax.

    select if var1 = 1.
    execute.
    compute var2 = 1 / x.
    compute var3 = 1 / y.

Given that the number of records is very large, and that var1 = 1 is (quite) rare, would it be more efficient to use EXECUTE before running any other computations? After all, it'd drastically decrease the size of the scratch file.

Cheers!!
Albert-Jan

--- On Fri, 6/20/08, Christian Ganser <[hidden email]> wrote:
> [...] |
|
NO.
However, the difference would be small, because the size of the scratch file would be small. The "hidden steps" that are eliminated by leaving out the EXECUTE are marked with an asterisk (*).

 open a scratch file (I don't know if SPSS still has a "look ahead" to skip the whole scratch file process if it is not needed)
 read each case
 test for condition
*write selected cases (# writes is # selected)
*"rewind" the scratch file
*open the scratch file
*read the scratch file (# reads is # selected)
 do the transformations

Albert-jan Roskam wrote:
> [...]
Art Kendall
Social Research Consultants |
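The answer above can be sketched as syntax: the commands from the question with the EXECUTE left out and a procedure added at the end. The inline data and the use of DESCRIPTIVES are assumptions for illustration, not part of the original question.

```spss
* Illustrative inline data with made-up values.
DATA LIST FREE / var1 x y.
BEGIN DATA
1 2 4
0 5 10
1 8 16
END DATA.

* Selection and computations are left pending, with no EXECUTE in between.
SELECT IF (var1 = 1).
COMPUTE var2 = 1 / x.
COMPUTE var3 = 1 / y.

* The first procedure triggers one data pass that selects and computes together.
DESCRIPTIVES VARIABLES = var2 var3.
```

In that single pass each case is read, tested against the condition, and transformed only if it is selected, so the extra write and re-read of the selected cases is avoided.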
|
In reply to this post by Christian Ganser
There are some programming practices that don't do a lot of harm on
small exercise-type applications, but that can be harmful on many real problems. Overuse of EXECUTE is one that we see quite often in posts on lists.

It is helpful for beginners to be aware of things that people with decades of experience, like Richard Ristow, have seen to be problematic for themselves and clients. Unfortunately, it is often an exercise in futility for people to tell us where some of the pitfalls are. Teachers and consultants often feel like Cassandra must have felt.

Spreadsheets are optimized for small-scale problems. Aside: There are reasons why spreadsheets are not accepted by ISO for accounting purposes. They are great for what they do well, and I use them all the time to send clients output from SPSS and for small problems with small amounts of data transformation.

A major part of optimization for software designed for small problems is to put all of the data in memory. Also, the amount of transformation done is typically limited compared to what is done with a stat package. Statistical applications in spreadsheets, especially those involving matrix inversion and probability functions, are well known to have numerical analysis problems resulting in unrealistic results.

Packages like SPSS are optimized to handle a wide variety of file sizes. Part of SPSS's efficiency is that for most procedures it keeps only one case at a time in memory and can therefore use memory for summary information.

If you have only a few cases (e.g., 1000) and a very up-to-date machine, the time saved by dropping EXECUTE commands will be small. That is one reason it is a common practice to use a test data set with something like 1000 cases while developing the application.

Transformation syntax can end up being very long. For example, if one has a mid-sized set of syntax of 300 lines, eliminating 50 or 60 EXECUTEs (and therefore a read pass, a write pass, then another read pass for each EXECUTE) can be very time-saving. Although the saving is no longer measured in days or even many hours, it can be substantial.

During the development and debugging phase, it is often more useful to look at intermediate results to check the logic of transformations by using additional transformation tests, doing descriptive stats, etc., than to eyeball the results.

Christian Ganser wrote:
> [...]
Art Kendall
Social Research Consultants |
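One way to read the last suggestion above, as a hedged sketch: check intermediate results with a test transformation and descriptive statistics rather than running EXECUTE and eyeballing the Data Editor. The inline data, the flag variable d_missing, and the choice of FREQUENCIES and DESCRIPTIVES are assumptions for illustration.

```spss
* Small made-up test data set, where the second case gives C = 0.
DATA LIST FREE / A B E.
BEGIN DATA
1 2 6
3 -3 14
2 2 8
END DATA.

COMPUTE C = A + B.
COMPUTE D = E / C.

* Test transformation that flags cases where D could not be computed.
COMPUTE d_missing = MISSING(D).

* Each procedure below reads the data once and shows the intermediate results.
FREQUENCIES VARIABLES = d_missing.
DESCRIPTIVES VARIABLES = C D.
```

Here the flag should be 1 for the second case only, since dividing by C = 0 leaves D system-missing. Once the logic is checked, the test lines can be dropped from the production job.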
|
In reply to this post by Christian Ganser
The command is sometimes needed and often useful.
It's needed, for example, if you use commands such as WRITE or XSAVE to write out data files and you need those files closed before you can continue, but the next step in your job doesn't contain any procedure commands that read the data.

It's useful, for example, if you want to check the intermediate results of transformations when testing/debugging complicated jobs.

By default, the GUI automatically generates EXECUTE syntax after each transformation because many users (like you) expect to see the results immediately. (BTW, you can turn this off in Edit>Options>Data.)

In the absence of the EXECUTE command, transformations are not executed immediately because it can save a great deal of processing time for large datasets if each block of transformations is processed in a single step.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Christian Ganser
Sent: Friday, June 20, 2008 1:56 AM
To: [hidden email]
Subject: Re: FAQ: Avoid using EXECUTE

[...] |
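A minimal sketch of the XSAVE case described above. The variable names, the data values and the file name with_total.sav are hypothetical. As I understand it, XSAVE is a transformation, so the output file is written only when the data are next read; here no procedure follows before the file is needed, so EXECUTE forces that pass.

```spss
* Illustrative inline data with made-up values.
DATA LIST FREE / id q1 q2.
BEGIN DATA
1 3 4
2 5 1
END DATA.

COMPUTE total = q1 + q2.
XSAVE OUTFILE='with_total.sav' /KEEP=id total.

* No procedure follows, so EXECUTE is needed to write and close the output file.
EXECUTE.

* Only now is it safe to open the file that XSAVE wrote.
GET FILE='with_total.sav'.
LIST.
```

If a procedure such as FREQUENCIES had followed the XSAVE anyway, that procedure's data pass would have written the file and the EXECUTE would have been redundant.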
|
In reply to this post by Art Kendall
Well, I didn't want to question Richard Ristow's experience. What I
thought of weren't spreadsheets but rather Stata, as an example, which as far as I know has no similar command, and where I don't eyeball the results, just as I don't in SPSS. It just seems to me that in many cases there are other things that would save more time than dropping some EXECUTEs. But I might be thinking too much of my own problems, with up to 100,000 cases, not more, mostly less.

Art Kendall wrote:
> [...] |
|
If memory serves, Stata holds all the data in memory, so the size of a dataset is limited by available memory. SPSS doesn't have this limitation. An SPSS dataset can contain billions of cases.
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Christian Ganser
Sent: Friday, June 20, 2008 9:28 AM
To: [hidden email]
Subject: Re: FAQ: Avoid using EXECUTE

Well, I didn't want to question Richard Ristow's experience. What I thought of weren't spreadsheets but rather Stata as an example, which I don't know to have a similar command and where I don't eyeball the results, as I don't in SPSS - just seems to me that in many cases there are other things that would save more time than dropping some executes. But I might be thinking too much of my own problems with up to 100,000 cases, not more, mostly less.
|
In reply to this post by Christian Ganser
There are *additional* things that save time. The FAQ posted is not the only thing that could be a FAQ. See Raynald's book, which is free from SPSS.

What else would save time depends on the nature of your data, the nature of your transformations, and the nature of your analysis. For example, it is a rare occasion that you will not want to refine your syntax in some way. You can save time IN THE LONG RUN by being sure that you can go back and redo any or all of your work, e.g., by never saving with the same filename, never writing over variables you transform, etc.

It is also a rare occasion that you will do the whole project with no interruptions such as meals, bathroom breaks, etc. You can save time if you try to write in such a way that you yourself, or someone else at another time, will know what you were thinking.
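The "never save with the same filename, never write over variables" advice could look something like the following in syntax. This is only a sketch: the file names (survey_v01.sav, survey_v02.sav) and variable names (income, income_log) are hypothetical and do not come from the thread.

```spss
* Hypothetical file and variable names, for illustration only.
GET FILE = 'survey_v01.sav'.
* Put the transformed values in a new variable and leave income as it was.
COMPUTE income_log = LN(income).
* Save under a new name so that survey_v01.sav stays untouched.
SAVE OUTFILE = 'survey_v02.sav'.
```

Note that no EXECUTE is needed here: the SAVE reads the data and carries out the pending COMPUTE in the same pass.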
Art Kendall
Social Research Consultants |
|
I agree: usually it's not the hardware (computer) or the software (SPSS) that is the bottleneck; rather, it is the humble researcher. My syntax files always have a header (title, purpose, context/project, author, date, version, note), and they also have lots of comments. That sounds like a lot of work, but in the end it's not. It streamlines and focuses one's thoughts and makes code readable for my co-workers. When I get back from vacation, I sometimes really need my own comments! I am generally reluctant to save lots of intermediate files, but when it saves time-consuming computations (e.g. pre-processing), I think it's a good idea. Even so, after debugging, I prefer to run everything again from A to Z.
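A header of the kind described might be sketched as a block of comment lines at the top of the syntax file. The field names follow the list above; every value shown is a placeholder, not taken from any real project.

```spss
* Title:    <short title of this syntax file>.
* Purpose:  <what the file does and why>.
* Project:  <context or project name>.
* Author:   <author name>.
* Date:     <date written or last changed>.
* Version:  <version number>.
* Note:     <anything a later reader needs to know>.
```

Each line starts with an asterisk and ends with a period, so each one is a complete comment on its own.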
Cheers!
Albert-Jan

--- On Fri, 6/20/08, Art Kendall <[hidden email]> wrote:
> From: Art Kendall <[hidden email]>
> Subject: Re: FAQ: Avoid using EXECUTE
> To: [hidden email]
> Date: Friday, June 20, 2008, 4:53 PM
>
> You can save time if you try to write in such a way that you
> yourself, or someone else at another time, will know what you were
> thinking.
|
In reply to this post by Christian Ganser
At 02:56 AM 6/20/2008, Christian Ganser wrote:
>Why aren't transformations carried out
>immediately as in other software-packages?

As Art Kendall wrote, it's an optimization that's very effective for large files. Main memory sizes ('RAM') seem to have grown faster than file sizes, so the optimization isn't as important as it was.

>Why is there a command one shouldn't use?

Sometimes EXECUTE can be useful, even necessary; see the revised FAQ, below. (But the large majority of the EXECUTEs in postings to the list are not.)

>Is EXECUTE really a major reason for slow
>behaviour of SPSS? This depends on the size of the dataset, ...

Right. The cost depends heavily on file size, and it is also proportional to the number of EXECUTEs. If you use an EXECUTE after every transformation command (and some users come close), you can slow performance even on a medium-size file.

By the way, EXECUTE does change behavior, and may not be harmless. For example, it'll break the code if you put it inside a construct such as DO IF, DO REPEAT, or LOOP; and it'll break most code that uses scratch variables.

>The new interface definitely wastes much more
>time than some executes on some 1000 cases.

I don't doubt *that* for a moment, since the interface has been much complained of, and 1,000 cases is very small.

>So to me it seems the behaviour of SPSS should
>be improved in this respect, not the behaviour of the users.

For that, on to the SPSS suggestion line. But take care in designing what you think the behavior should be, taking into account the warnings I've given.

FAQ: Avoid using EXECUTE

There are very few occasions when EXECUTE is needed, and it can be harmful.

For an excellent treatment, see section "Use EXECUTE Sparingly" in any edition of:

Levesque, Raynald, "SPSS® Programming and Data Management, A Guide for SPSS® and SAS® Users". SPSS, Inc., Chicago, IL (various dates).

It is downloadable free from the SPSS, Inc., Web site.

Below, see
A. On not using EXECUTE
B. What EXECUTE does
C. When EXECUTE is useful

A. On not using EXECUTE

EXECUTE is not needed after a transformation, or several transformations; the transformations are carried out when they are needed, when the next procedure or SAVE is executed.

It's confusing that you don't *see* transformation results in the Data Editor. Sometimes, you run EXECUTE, or click "Run Pending Transformations" (the same thing) just for that. But if you don't, the next procedure or SAVE will receive the results of the transformations.

And EXECUTE isn't harmless. EXECUTE makes SPSS read the whole data file; multiple EXECUTEs can badly slow processing of big files. (EXECUTE also changes the behavior of code, and can break it. Don't insert EXECUTEs in complicated code you don't fully understand.)

.....................
B. What EXECUTE does

If you write the transformation code,

    COMPUTE C = A + B.
    EXECUTE.
    COMPUTE D = E/C.
    EXECUTE.

at the first EXECUTE, SPSS reads the file; computes the value of C for every case; and saves the resulting file (with all variables) as the active file. At the second EXECUTE, it reads the file again; computes D for every case, using the computed value of C; and saves the file again. That is four passes through the data: reading twice, writing twice. (Optimizations in recent versions of SPSS will save some of this.)

If you write, instead

    COMPUTE C = A + B.
    COMPUTE D = E/C.

followed by a procedure or SAVE, SPSS does the computations while reading the file to pass data to the procedure or SAVE, and the computation itself takes no extra data pass. In this code, when SPSS reads a case, it computes the value of C for that case; then computes the value of D for the same case; and then passes the transformed values where they're needed.

.........................
C. When EXECUTE is useful

Again, see "Use EXECUTE Sparingly" in Raynald Levesque's book.
To summarize, use EXECUTE when:

-> Explicit output: You have a transformation program whose purpose is to write data explicitly (using XSAVE, WRITE, or PRINT), rather than to transform the working file. (This is the most common case.)

-> Selection depending on multiple cases: for example, "discard the cases after ones where Y=1"; "keep every fifth case". These may need a KEEP_IT variable to mark which cases are to be kept; then EXECUTE, to calculate that variable; and then SELECT IF (KEEP_IT=1). See "Using $CASENUM to Select Cases" in Raynald's book, and http://www.spsstools.net/spsstips.htm on his Web site.

-> LAG and transformations: You transform a variable, but use the LAG of the pre-transformed value (see "Lag Functions"). (It's best, though, not to transform a variable that you are also LAGging. There are generally other ways to the same result.)

-> MISSING VALUES: Your code assumes two different sets of MISSING VALUES for one variable. (Only one set can be in effect during one data pass. See "MISSING VALUES Command".) However, you can usually avoid the need, using tools such as functions MISSING and VALUE.

Finally,

-> To see the results: You're working interactively, and want to see the results of your transformations. (Clicking "Run Pending Transformations" issues an EXECUTE.) This does increase the computer time you use; but computers exist to give us results when we need them. (And for modest-sized files, you likely won't notice.)

HOWEVER, this may give the impression that EXECUTE is necessary to make transformations take effect. As said above, it isn't; SPSS carries out all transformations as soon as their results are needed.
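The "keep every fifth case" selection mentioned in the list can be sketched as follows. This is an illustration, not code from the original post: KEEP_IT is the marker variable named in the FAQ, the use of MOD on $CASENUM is an assumption about how one might mark every fifth case, and the pattern (COMPUTE, then EXECUTE, then SELECT IF) follows the one described for "Using $CASENUM to Select Cases".

```spss
* Mark every fifth case of the active file with KEEP_IT = 1.
COMPUTE KEEP_IT = (MOD($CASENUM, 5) = 0).
* This EXECUTE is one of the needed ones: it computes KEEP_IT for all.
* cases before any case is dropped.
EXECUTE.
* Now select on the stored marker rather than on $CASENUM itself.
SELECT IF (KEEP_IT = 1).
```

The EXECUTE matters here because $CASENUM counts the cases passed on so far; selecting in the same data pass changes that count as cases are dropped. No second EXECUTE is needed after SELECT IF, since the selection takes effect when the next procedure or SAVE reads the data.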
