Re: Lag Function and Some Potentially Useful Syntax for a Vertically Organized File


Gregory Hildebrandt
SPSS List Members:

One issue that caused me difficulty, until I received help from a database programmer, is how to lag a variable in a vertically organized file. I have provided some syntax below that some may find helpful.

For example, suppose the "Restructure Data" command has been used to convert variables to cases so that the linear mixed model can be employed. This creates groups of cases, and it is not appropriate to apply the lag command directly when computing the lagged value for the first member of a group. If this is done, the value from the last member of the prior group becomes the computed lagged value, which is not correct. For the first member of a group, "system missing" is the appropriate lagged value.
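To make the group-boundary problem concrete, here is a small Python sketch (not SPSS syntax). The rows, the tuple layout and the function names are all made up for illustration, and None stands in for system-missing. It only contrasts a plain one-case lag with a lag that respects the school and pupil grouping.

```python
# Hypothetical long-format rows, already sorted: (school, pupil, period, x).
rows = [
    ("A", 1, 1, 10.0),
    ("A", 1, 2, 11.0),
    ("A", 2, 1, 20.0),
    ("A", 2, 2, 21.0),
]


def naive_lag(rows):
    """Lag x by one case, ignoring group boundaries."""
    xs = [r[3] for r in rows]
    return [None] + xs[:-1]


def grouped_lag(rows):
    """Lag x by one case, but only within the same (school, pupil) group."""
    out = []
    for i, (s, p, _t, _x) in enumerate(rows):
        prev = rows[i - 1] if i > 0 else None
        same_group = prev is not None and (prev[0], prev[1]) == (s, p)
        out.append(prev[3] if same_group else None)
    return out


print(naive_lag(rows))    # [None, 10.0, 11.0, 20.0]
print(grouped_lag(rows))  # [None, 10.0, None, 20.0]
```

In the naive version the first case for pupil 2 picks up 11.0, which belongs to pupil 1. The grouped version leaves it missing.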

For purposes of a slight generalization of the method, assume that time period, T, is a string variable.  Restructure data will also properly sort the key hierarchical variables, but I will add the sort command for completeness.

In this hierarchical, cross-section and time series file, the variables School (S), Pupil (P), and the remaining variables are vertically organized. If X represents a particular variable in the file that one wants to lag, each data point for this variable can be written as Xspt, where s is a particular school, p is a particular student within school s, and t is the time period in which measurement occurs for that school and student combination. (Perhaps the notation could be sharper.) The objective is to lag Xspt to obtain a series Xspt_l1 that is system-missing for the first case of each S, P group, rather than picking up the value from the last case of the previous group.

Here's the syntax:

* Step 1: open the data file.
GET FILE='....sav'.

* Step 2: create a numeric copy, T1, of the string variable T.
COMPUTE T1 = NUMBER(T, F4.0).
EXECUTE.

* Step 3: sort the data by S, P, and T1.
SORT CASES BY S P T1.

* Step 4: compute the lag, Xspt_l1, using the numeric T1.
DO IF S = LAG(S,1) AND P = LAG(P,1) AND T1 = (LAG(T1,1) + 1).
  COMPUTE Xspt_l1 = LAG(Xspt).
END IF.
EXECUTE.

Having used this syntax many times, I know it works.  However, I still don't fully understand the meaning of  T = (LAG(T,1) + 1), as it relates to this procedure, and interpretations would be appreciated. 

After several years of SPSS use, I've never gotten beyond point, click, paste and edit, and with the exception of a few specialized syntax commands, such as the above, this approach has met my requirements. Point, click and paste brings in the correct syntax, whereas writing syntax from scratch inevitably introduces a few typographical errors (even if one knows how to write the appropriate syntax) that can be hard to find. And it is straightforward to see how to edit the pasted syntax if only the model's variables change.

A number of years ago I took SPSS Tables from a world class SPSS Tables constructor/instructor who used syntax throughout the course to develop "Basic Tables."  I was lost a good part of the time.  However, when I took the course several years later, from the same instructor, who was then teaching "Custom Tables," even he was using point and click.  He said he was not yet comfortable with the new syntax, and the course became quite easy to follow.
 
Greg H



On Wed, Oct 5, 2011 at 2:24 AM, John F Hall <[hidden email]> wrote:

Here's what it says on pp. 111-112 of the syntax reference guide:

John F Hall
[hidden email]
www.surveyresearch.weebly.com

LAG function

LAG. LAG(variable[, n]). Numeric or string. The value of variable in the previous case or n cases before. The optional second argument, n, must be a positive integer; the default is 1. For example, prev4=LAG(gnp,4) returns the value of gnp for the fourth case before the current one. The first four cases have system-missing values for prev4.

- The result is of the same type (numeric or string) as the variable specified as the first argument.
- The first n cases for string variables are set to blanks. For example, if PREV2=LAG(LNAME,2) is specified, blanks will be assigned to the first two cases for PREV2.
- When LAG is used with commands that select cases (for example, SELECT IF and SAMPLE), LAG counts cases after case selection, even if specified before these commands. For more information, see the topic Command Order on p. 41.

Note: In a series of transformation commands without any intervening EXECUTE commands or other commands that read the data, lag functions are calculated after all other transformations, regardless of command order. For example,

COMPUTE lagvar=LAG(var1).
COMPUTE var1=var1*2.

and

COMPUTE lagvar=LAG(var1).
EXECUTE.
COMPUTE var1=var1*2.

yield very different results for the value of lagvar, since the former uses the transformed value of var1 while the latter uses the original value.
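
For readers who think more easily in a general-purpose language, the basic definition quoted above (the value n cases back, with the first n cases missing or blank) can be sketched in Python. This is only an illustration of that definition: the function name lag and the use of None for system-missing are my own assumptions, and it does not reproduce the command-order behaviour described in the Note.

```python
def lag(values, n=1):
    """Return, for each case, the value from n cases earlier.

    The first n cases have no earlier case: None stands in for
    system-missing (numeric) and "" for blanks (string).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    is_string = bool(values) and isinstance(values[0], str)
    filler = "" if is_string else None
    return [filler] * min(n, len(values)) + list(values[:-n])


gnp = [100, 110, 125, 130, 150, 160]
print(lag(gnp, 4))                  # [None, None, None, None, 100, 110]
print(lag(["Ann", "Bo", "Cy"], 2))  # ['', '', 'Ann']
```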

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of joan casellas
Sent: 05 October 2011 10:13
To: [hidden email]
Subject: Lag Function

 

Hi Everyone,

Could someone explain to me what the LAG function does exactly?

Joan
Media Research Analyst
Phone: +44 20 7593 1585

 



Re: Lag Function and Some Potentially Useless Syntax for a Vertically Organized File

David Marso
Administrator
Step 1: Remove EXECUTE from your syntax!
Step 2: Examine your data.
Step 3: Don't use strings to represent numeric data.
Step 4: Get rid of unnecessary subscript on LAG (1 is default, hence not required).
Step 5: DO IF is only required if multiple transformations are required for a given conditional.

----
" However, I still don't
fully understand the meaning of  T = (LAG(T,1) + 1)"

Because you have not done step 2.  This is placing rather stringent requirements on any success of "your" method.  It is assuming that T is followed by value T+1 in the file.
Revised: Assuming that T is NUMERIC as it *should* be and you already have the sorted file active.
----
IF S EQ LAG(S) AND  P EQ LAG(P)  Xspt_LAG_1 =   LAG(Xspt).
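
A rough Python sketch (not SPSS) of what the extra T condition changes, assuming the file is sorted by school, pupil and period. The rows, the dictionary keys and the function name lag_x are hypothetical, and None stands in for system-missing.

```python
def lag_x(rows, require_next_period):
    """Lag x by one case within each (school, pupil) group.

    rows must already be sorted by school, pupil, period.
    If require_next_period is True, the previous case must also be
    exactly one period earlier (the T = LAG(T) + 1 condition).
    """
    lagged = []
    for i, row in enumerate(rows):
        value = None  # stands in for system-missing
        if i > 0:
            prev = rows[i - 1]
            same_pupil = (prev["s"], prev["p"]) == (row["s"], row["p"])
            next_period = row["t"] == prev["t"] + 1
            if same_pupil and (next_period or not require_next_period):
                value = prev["x"]
        lagged.append(value)
    return lagged


rows = [
    {"s": "A", "p": 1, "t": 1, "x": 10.0},
    {"s": "A", "p": 1, "t": 2, "x": 11.0},
    {"s": "A", "p": 1, "t": 4, "x": 13.0},  # period 3 is absent
    {"s": "A", "p": 2, "t": 1, "x": 20.0},
]

strict = lag_x(rows, require_next_period=True)
loose = lag_x(rows, require_next_period=False)
print(strict)  # [None, 10.0, None, None]
print(loose)   # [None, 10.0, 11.0, None]
```

With the T condition, the case at period 4 gets no lagged value because period 3 is not in the file. Without it, that case takes the value from period 2, which may or may not be what you want.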
----
My 2 cents re PCP and syntax:
Point/Click/Paste is fine for generating templates, however general reliance on such promises to result in a new generation of barely competent analysts!
---
WTF:  How are you to document what the hell you did to achieve a given result?
I could go on, but that would turn into a f'ing RANT!
--


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Lag Function and Some Potentially Useful Syntax for a Vertically Organized File

Art Kendall
In reply to this post by Gregory Hildebrandt
If you put
ELSE.
COMPUTE Xspt_l1 = -9999999.
in your DO IF structure, you can avoid the SYSMIS. You then know why the value is missing.

MISSING VALUES Xspt_l1 (LO THRU -9999999).
VALUE LABELS Xspt_l1
  -9999999 'undefined first time in a case'.

Just use some missing value that is outside the legitimate range of the variable.
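
As a loose analogy in Python (not SPSS), the idea is that a declared missing code carries a reason, while analyses still skip it. The constant USER_MISSING and both function names are hypothetical, and the sketch assumes every legitimate value lies above the code.

```python
USER_MISSING = -9999999.0  # hypothetical code, assumed to be below every real value


def is_missing(value, high=USER_MISSING):
    """Mimic MISSING VALUES (LO THRU high): anything at or below high is missing."""
    return value <= high


def valid_mean(values):
    """Mean of the non-missing values, or None if there are none."""
    valid = [v for v in values if not is_missing(v)]
    return sum(valid) / len(valid) if valid else None


# Lagged series in which the first case of each group carries the code.
lagged = [USER_MISSING, 10.0, USER_MISSING, 20.0]
print(valid_mean(lagged))  # 15.0
```

If the code were not declared as missing, it would be averaged in with the real values and wreck the mean, which is why the MISSING VALUES line matters.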

If I am reading it correctly, the syntax checks whether the current case has the next sequential value for time. It would be a more concise version of

COMPUTE sameschool = s EQ LAG(s).
COMPUTE samepupil = p EQ LAG(p).
COMPUTE nextseqtime = t EQ (LAG(t) + 1).
DO IF sameschool AND samepupil AND nextseqtime.
  COMPUTE Xspt_l1 = LAG(Xspt).
ELSE.
  COMPUTE Xspt_l1 = -9999999.
END IF.
MISSING VALUES Xspt_l1 (LO THRU -9999999).
VALUE LABELS Xspt_l1
  -9999999 'undefined because first'.

Art Kendall
Social Research Consultants

