Use of EXECUTE

Use of EXECUTE

Richard Ristow
I'm starting a new thread, because the topic has shifted some.

At 06:04 AM 4/3/2014, Moon Kid wrote, in thread "time-diff in minutes":

>On 2014-04-02 19:22 Richard Ristow <[hidden email]> wrote:
>
>>. EXECUTE isn't necessary. I remark, because if those who are list
>>guri post unnecessary EXECUTEs, we'll never teach everybody else not to.
>
>Can you specify that? In my understanding and observation COMPUTE
>has no effect without an EXECUTE.

That's easy to get confused about. As Jon Peck wrote recently(*),

>Statistics does lazy evaluation of transformations.  That means that
>they are executed the next time the data has to be passed such as
>with SAVE or a statistical procedure.  This saves a usually
>unnecessary data pass just for the transformations.

That is, when you 'run' a transformation command like

COMPUTE v0v = CTIME.MINUTES(v04bis + 86400 - v04von).

it doesn't perform the computations; it just adds the command to the
transformation program that's being built, which will be run the next
time you make a pass through the data. When you run a COMPUTE
interactively, indeed you won't see the results. That doesn't mean it
hasn't worked; it means it hasn't been run yet. It will be run, and
the results made available, when it is needed.

"EXECUTE" is a null procedure -- it makes a pass through the data,
but does nothing with it. If you write,

COMPUTE vov = v04bis - v04von.
EXECUTE.
IF      vov LT 0
         vov = vov + TIME.HMS(24).
EXECUTE.
COMPUTE vov = CTIME.MINUTES(vov).
EXECUTE.
DESCRIPTIVES VARIABLES=vov.

then SPSS makes a complete pass through the data to compute the first
value of 'vov'; then, another pass to execute the IF statement and
correct for crossing a midnight boundary; then a third, to convert to
minutes; and, finally, a fourth to run the DESCRIPTIVES. You've
forced the transformations to be run in variable-order.

If, instead, you write,

COMPUTE vov = v04bis - v04von.
IF      vov LT 0
         vov = vov + TIME.HMS(24).
COMPUTE vov = CTIME.MINUTES(vov).
DESCRIPTIVES VARIABLES=vov.

then, SPSS makes a single pass through the data to run the
DESCRIPTIVES; but, as part of that pass, all the transformation
commands are executed, in order by cases (or records). That can be a
drastic time saving, on a big file.
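
The difference in data passes can be sketched with a toy model. This is plain Python, not SPSS; the ToyDataset class, the helper names, and the sample times (in seconds) are all made up for illustration, and it assumes only what is described above: transformations are queued, and each EXECUTE or procedure makes one pass that runs whatever is queued.

```python
class ToyDataset:
    """Toy stand-in for an active dataset; illustrative only, not SPSS."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]
        self.pending = []   # transformations waiting for the next data pass
        self.passes = 0     # how many times the data has been read

    def transform(self, fn):
        # Like COMPUTE or IF: only queue the work.
        self.pending.append(fn)

    def data_pass(self):
        # Like EXECUTE or a procedure: read every case once, applying all
        # queued transformations to each case in command order.
        self.passes += 1
        for row in self.rows:
            for fn in self.pending:
                fn(row)
        self.pending.clear()


def difference(row):
    row["vov"] = row["bis"] - row["von"]


def wrap_midnight(row):
    if row["vov"] < 0:
        row["vov"] += 24 * 3600


def to_minutes(row):
    row["vov"] = row["vov"] / 60


def run(execute_after_each):
    # Hypothetical sample data: 23:30 -> 00:15 and 01:00 -> 02:00, in seconds.
    ds = ToyDataset([{"von": 84600, "bis": 900}, {"von": 3600, "bis": 7200}])
    for step in (difference, wrap_midnight, to_minutes):
        ds.transform(step)
        if execute_after_each:
            ds.data_pass()      # the EXECUTE after each transformation
    ds.data_pass()              # the pass made by the procedure itself
    return [row["vov"] for row in ds.rows], ds.passes
```

In this model run(True) and run(False) return the same values; only the pass count differs, 4 against 1.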

You'll want to read Levesque, Raynald and IBM Corp., *Programming and
Data Management for IBM SPSS Statistics 20: A Guide for IBM SPSS
Statistics and SAS Users*, IBM Corporation, 2011. It's available for
free download; there's a link on page
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/We70df3195ec8_4f95_9773_42e448fa9029/page/Books%20and%20Articles.
(And, although it refers to release 20, it seems to be the latest
edition available.) There's a section "Use EXECUTE sparingly" in "2.
Best Practices and Efficiency Tips".

And you also wrote,
>Thx, I really like clean and beautiful code.

Thank you! Appreciation is most appreciated.
===========================================
(*) See posting
Date:     Sat, 29 Mar 2014 06:44:24 -0600
From:     Jon K Peck <[hidden email]>
Subject:  Re: understand transformations
Comments: To: [hidden email]
To:       [hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: time-diff in minutes

David Marso
Administrator
Moved this post from thread "time-diff in minutes" to here.

I provide the following without comment for rumination.
--
DATA LIST FREE/ a .
BEGIN DATA
1 2 3 4 5 6
END DATA.
LIST.
SELECT IF $CASENUM GT 1.
LIST.

DATA LIST FREE/ a .
BEGIN DATA
1 2 3 4 5 6
END DATA.
SELECT IF a*LAG(a) NE 0.
LIST.

DATA LIST FREE/ a .
BEGIN DATA
1 2 3 4 5 6
END DATA.
COMPUTE b=a*LAG(a) NE 0.
EXECUTE.
SELECT IF b.
LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: Use of EXECUTE

David Marso
Administrator
In reply to this post by Richard Ristow
I would ask:  
"Why EVER do an EXECUTE rather than running an informative procedure?".

AND.  
People had best watch out for the LAG function if they don't know about certain counterintuitive properties.

From the FM Universals section:
"Note: In a series of transformation commands without any intervening EXECUTE commands or other commands that read the data, lag functions are calculated after all other transformations, regardless of command order. ...."

Also consider SELECT IF very carefully.
Again from the FM:
"System variable $CASENUM is the sequence number of a case in the active dataset. Although it is syntactically correct to use $CASENUM on SELECT IF, it does not produce the expected results. To select a set of cases based on their sequence in a file, create your own sequence variable with the transformation language prior to making the selection".

The fine folks in the publications department could have been a little more specific.
SELECT IF ($CASENUM GT n) is impossible (for n >=1).
SELECT IF ($CASENUM LT n) is quite reasonable and will NOT produce any unexpected results.
Also, simply creating one's own sequence variable does not allow one to get away with case 1 without an intervening data pass (preferably one which provides information).


Re: Use of EXECUTE

Rick Oliver-3
The value of $CASENUM represents the current position of the case in file order.  SELECT IF $CASENUM <= n is functionally equivalent to N OF CASES = n, so I would suggest using N OF CASES if you just want the first n cases.

SELECT IF permanently deletes cases (if you save the file after SELECT IF, those cases are gone). I would recommend using FILTER instead, unless you really want to permanently delete cases.

SELECT IF $CASENUM > [positive integer] deletes all cases, because the value of $CASENUM changes dynamically as SELECT IF is processed. So when the first case is deleted because it doesn't meet the condition, the second case becomes case #1 and is consequently deleted since it doesn't meet the condition, etc.
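
A toy model of that renumbering, in plain Python rather than SPSS, may make it concrete. The function name select_if_casenum is made up, and the model assumes only what is described above: a case's number is its position among the cases retained so far.

```python
def select_if_casenum(cases, condition):
    """Toy model of SELECT IF on $CASENUM; illustrative only, not SPSS."""
    kept = []
    for case in cases:
        # Assumption from the description above: a dropped case does not
        # use up a number, so the next case is tested with the same one.
        casenum = len(kept) + 1
        if condition(casenum):
            kept.append(case)
    return kept


cases = [1, 2, 3, 4, 5, 6]
first_three = select_if_casenum(cases, lambda n: n <= 3)   # like $CASENUM LE 3
after_first = select_if_casenum(cases, lambda n: n > 1)    # like $CASENUM GT 1
```

Under this model first_three holds the first three cases, while after_first is empty: every case is tested as case 1 and dropped.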

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]




Re: Use of EXECUTE

Richard Ristow
In reply to this post by David Marso
At 03:24 PM 4/3/2014, David Marso wrote:

>Why EVER do an EXECUTE rather than running an informative procedure?

Well, I use EXECUTE to run transformation programs whose purpose is
to create an output .SAV file (with XSAVE)(1) or external file (with
WRITE)(2). Tastes vary; my own is not to put in a procedure I don't
really want just to make use of the data pass.

In that connection, I occasionally add an EXECUTE before an
informative procedure, if the transformation program is likely to
produce a number of warning messages. That 'wastes' a data pass, but
makes the output a lot clearer -- transformation-program warnings are
separated from the procedure output.

(1) XSAVE example:
Date:    Mon, 3 Feb 2014 16:59:05 -0500
From:    Richard Ristow <[hidden email]>
Subject: Re: Determining number of days used per month with a beginning &
          ending date
To:      [hidden email]
(2) WRITE example:
Date:    Mon, 17 Feb 2014 14:06:41 -0500
From:    Richard Ristow <[hidden email]>
Subject: Re: Automating Adding Value Labels
To:      [hidden email]

Re: Use of EXECUTE

Albert-Jan Roskam
In reply to this post by Rick Oliver-3
<snip>

>AND.
>People best watch out for the LAG function if you don't know about certain
>counterintuitive properties.

In addition, beware of MISSING VALUES and other commands that take effect immediately. The Data Management book has a good example of it.
COMPUTE somevar = 999.

IF (MISSING(somevar)) othervar = 1.
*EXECUTE.

MISSING VALUES somevar (999).
FREQUENCIES othervar.

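IIRC, othervar would come out as 1 for every case, because MISSING VALUES takes effect before the pending COMPUTE and IF are run, unless an EXECUTE or a procedure comes before the MISSING VALUES command.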
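
The ordering issue can be sketched with a toy model in plain Python, not SPSS. It assumes the behavior described in this post (not re-checked against SPSS): the missing-value declaration changes the dictionary at once, while COMPUTE and IF wait for the next data pass. All names are hypothetical.

```python
def run(execute_before_missing_values):
    """Toy model of immediate vs. deferred commands; illustrative only."""
    rows = [{"somevar": None, "othervar": None} for _ in range(3)]
    user_missing = set()    # dictionary setting, changed immediately
    pending = []            # transformations waiting for a data pass

    def data_pass():
        for row in rows:
            for fn in pending:
                fn(row)
        pending.clear()

    def compute_somevar(row):       # COMPUTE somevar = 999.
        row["somevar"] = 999

    def flag_missing(row):          # IF (MISSING(somevar)) othervar = 1.
        if row["somevar"] is None or row["somevar"] in user_missing:
            row["othervar"] = 1

    pending.append(compute_somevar)
    pending.append(flag_missing)
    if execute_before_missing_values:
        data_pass()                 # EXECUTE.
    user_missing.add(999)           # MISSING VALUES somevar (999).
    data_pass()                     # the pass made by FREQUENCIES
    return [row["othervar"] for row in rows]
```

In this model run(False) flags every case with 1, while run(True) leaves othervar unset (None stands in for system-missing).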
