SPSSX Discussion

Computing variables based on multiple rows in a tall-format file

Classic

List

Threaded

24 messages Options

David Marso

Re: Computing variables based on multiple rows in a tall-format file

Administrator

"Another somewhat mysterious aspect of David's method is that LAG(SEQ) can be used on the right side of the COMPUTE statement that is bringing SEQ into existence as a variable. That this can be done may not be intuitively obvious! ;-) "

MAGIC!!! Obvious is BORING! I really don't expect novices to get it and could really care less if they do!

Bruce Weaver wrote

I mostly agree with Art on this one, partly because there are some aspects of David's one-liner that are a bit mysterious, especially to novices, I expect. E.g., on Row 1, LAG(SEQ) and LAG(ID) both return SYSMIS. The reason why David's code works is that those SYSMIS values are appearing as arguments to the SUM function, and SUM returns a valid result if at least one argument has a valid value--see the demo below.

Another somewhat mysterious aspect of David's method is that LAG(SEQ) can be used on the right side of the COMPUTE statement that is bringing SEQ into existence as a variable. That this can be done may not be intuitively obvious! ;-)

Re Richard's DO-IF, I prefer to have both conditions that result in SEQ = 1 on a single conditional statement with an OR. And perhaps two separate IF statements would be even easier for novice users to understand. See below.

NEW FILE.
DATASET CLOSE all.
DATA LIST FREE / ID.
BEGIN DATA
1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5
END DATA.

* David's one-liner.

COMPUTE SEQ1=SUM(1,LAG(SEQ1)*(LAG(ID) EQ ID)).

* Variation on Richard's DO-IF.
* The two conditions for which SEQ2 = 1 are combined with OR.

DO IF ($CASENUM EQ 1) OR (ID NE LAG(ID)) .
. COMPUTE SEQ2 = 1.
ELSE.
. COMPUTE SEQ2 = LAG(SEQ2) + 1.
END IF.

* Two IF statements.
* This is somewhat less efficient in machine time than DO-IF,
* but possibly easier for the novice user to understand.

IF ($CASENUM EQ 1) OR (ID NE LAG(ID)) SEQ3 = 1.
IF MISSING(SEQ3) SEQ3 = LAG(SEQ3) + 1.

FORMATS ID SEQ1 to SEQ3(f5.0).
LIST.

* The slightly mysterious thing about David's one-liner
* is that the result is not SYSMIS on Row 1, because
* on Row 1, LAG(SEQ1) and LAG(ID) both return SYSMIS.

COMPUTE LagID = LAG(ID).
COMPUTE LagSEQ1 = LAG(SEQ1).
FORMATS LagID LagSEQ1(F5.0).
LIST.

Output:
ID SEQ1 SEQ2 SEQ3 LagID LagSEQ1

1 1 1 1 . .
1 2 2 2 1 1
1 3 3 3 1 2
1 4 4 4 1 3
etc.

* The reason David's code does not return SYSMIS as a
* result is that those missing values appear within
* the SUM function: SUM will return a valid result
* if at lest one of the arguments is valid. If you
* compute a sum using plus signs, on the other hand,
* all variables must be valid.

NEW FILE.
DATASET CLOSE all.
DATA LIST LIST / V1 to V3 (3f1).
BEGIN DATA
1 2 3
1 2 .
1 . .
. . .
END DATA.

COMPUTE SumViaSUM = SUM(V1 to V3).
COMPUTE SumViaPlus = V1 + V2 + V3.
FORMATS SumViaSUM SumViaPlus (F5.0).
LIST.

Output:
V1 V2 V3 SumViaSUM SumViaPlus

1 2 3 6 6
1 2 . 3 .
1 . . 1 .
. . . . .

Number of cases read: 4 Number of cases listed: 4

David Marso wrote

Assumptions of my one liner.
Something is equal to something else or it isn't (true=1 false=0).
Multiplication of X by 0 = 0, by 1 = X
0+1 = 1, X+1 = X+1.
Anybody having a problem with this might ponder their choice of careers or majors?

Art Kendall wrote

<Don flame shields!>
Your solution does require fewer lines and characters in
the syntax, and most likely fewer internal operations.
However, I deny your assertion that the one line of code is
"reasonably intuitive".
It would help beginners to see and understand Rich's solution
and then see that it can be expressed more compactly.

Your solution is "reasonably intuitive" only for people with at
least a moderate amount of experience at some computer
languages.

However, Rich's do repeat syntax is much easier for people to
read.
It is my impression that many posts from this list are from
beginners.
It is also my impression that people searching the archives are
beginners.

The easier it is for people to read and understand the syntax
the easier it is to communicate the process to other people,
e.g., classmates working as peer reviewers, on the job QA
reviewers, triers-of-fact, archive users, process maintainers
and up-daters.

Soapbox: Efficiency in terms of cognitive load trumps saving
storage space for syntax and often trumps saving small amounts
of processing time. Marginal labor cost is a greater
consideration than marginal cost of machine resources.

A rhetorical question: How much computer time would it take to
do David's solution compared to the computer time to do Rich's
solution? How many times of running the same code would it take
to actually get a measurable difference in computer time, e.g.,
10 seconds?

<remove flame shield.>

Art Kendall
Social Research Consultants
On 12/5/2013 6:54 AM, David Marso [via SPSSX Discussion] wrote:

I don't see why people use that ponderous DO IF
$CASENUM = 1 .... ELSE blah blah blah (7 lines more or less)
approach when a counter can be build with !!!!!

COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).

DATA LIST FREE / ID.

BEGIN DATA

1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5

END DATA.

COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).

LIST.

ID SEQ

1.00 1.00
1.00 2.00
1.00 3.00
1.00 4.00
1.00 5.00
1.00 6.00
2.00 1.00
2.00 2.00
2.00 3.00
3.00 1.00
3.00 2.00
3.00 3.00
3.00 4.00
3.00 5.00
3.00 6.00
3.00 7.00
4.00 1.00
4.00 2.00
4.00 3.00
5.00 1.00
5.00 2.00
5.00 3.00
5.00 4.00

Number of cases read: 23 Number of cases listed: 23

Michael
Cohn wrote
I think this
solves all my problems! Many thanks to Andy and Richard for
their help. I wasn't familiar with the LAG and AGGREGATE
functions in SPSS but now I know what to start learning
about.

- Michael

----------------------------------

Michael A. Cohn, PhD

[hidden
email]
Osher Center for Integrative Medicine

University of California, San Francisco

From: "Richard Ristow [via SPSSX Discussion]" < [hidden
email] <mailto: [hidden
email] >>

Date: Wednesday, December 4, 2013 at 14:56

To: Michael Cohn < [hidden
email] <mailto: [hidden
email] >>

Subject: Re: Computing variables based on multiple rows in a
tall-format file

At 03:23 PM 12/4/2013, Michael Cohn wrote:

>My data file is currently in "tall" format (one row per
measurement

>per participant). Is there a way to generate variables
in each

>record that are based on information in that user's
other records?

You'll get many answers; the fact is, all of these are quite
easy.

The code I'm posting (not tested) assumes variables

PcptID -- Participant identifier

Date -- Date stamp

Outcome -- Outcome value

neither of the first two are ever missing; and your file is
sorted in

ascending order on the first two.

>* A sequential index variable based on the record's
datestamp (i.e.,

>number a participant's responses 1, 2, 3... in
chronological order).

Various ways; here's a simple one, using transformation
language:

NUMERIC VisitSeq (F4).

DO IF $CASENUM EQ 1.

. COMPUTE VisitSeq = 1.

ELSE IF PcptID NE LAG(PcptID).

. COMPUTE VisitSeq = 1.

ELSE.

. COMPUTE VisitSeq = LAG(VisitSeq) + 1.

END IF.

>* The length of time between the record and the earliest
record for that

>participant

>* The difference between the outcome variable in the
record and the minimum

>value ever recorded for that participant.

In both cases, start by putting the minimum value for the
participant

in every record for that participant, and then it's easy:

AGGREGATE OUTFILE=* MODE=ADDVARIABLES

/BREAK=PcptID

/Earliest 'Date of earliest record for participant' =
MIN(Date)

/MinOut 'Lowest outcome value for participant' =
MIN(Outcome).

>It's easy to do these using a spreadsheet or a python
script ...

Actually, I think it's probably easier in long ('tall') form
in

native SPSS than in either of those two.

-Best of luck,

Richard

=====================

To manage your subscription to SPSSX-L, send a message to

[hidden
email]</user/SendEmail.jtp?type=node&node=5723438&i=0>
(not to SPSSX-L), with no body text except the

command. To leave the list, send the command

SIGNOFF SPSSX-L

For a list of commands to manage subscriptions, send the
command

INFO REFCARD

________________________________

If you reply to this email, your message will be added to
the discussion below:

http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-based-on-multiple-rows-in-a-tall-format-file-tp5723431p5723438.html
To unsubscribe from Computing variables based on multiple
rows in a tall-format file, click here< http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=5723431&code=Y29obm1Ab2NpbS51Y3NmLmVkdXw1NzIzNDMxfC03OTA5NjYwNDY=> .

NAML< http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>

Please
reply to the list and not to my personal email.

Those desiring my consulting or training services please feel
free to email me.

---

"Nolite dare sanctum canibus neque mittatis margaritas vestras
ante porcos ne forte conculcent eas pedibus suis."

Cum es damnatorum possederunt porcos iens ut salire off
sanguinum cliff in abyssum?"

If you reply to this email, your
message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-based-on-multiple-rows-in-a-tall-format-file-tp5723431p5723446.html

To start a new topic under SPSSX Discussion, email
[hidden email]
To unsubscribe from SPSSX Discussion, click
here .
NAML

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Maguin, Eugene

Re: Computing variables based on multiple rows in a tall-format file

One of the things that David's solutions use is what I'd call "nesting" and "recursion". I'm curious if spss has limits on either of those elements. For nesting, I think the question is how many sets of parentheses can be accommodated and for recursion how many times can the same function be called.
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Saturday, December 07, 2013 5:14 PM
To: [hidden email]
Subject: Re: Computing variables based on multiple rows in a tall-format file

"Another somewhat mysterious aspect of David's method is that LAG(SEQ) can be used on the right side of the COMPUTE statement that is bringing SEQ into existence as a variable. That this can be done may not be intuitively obvious! ;-) "

MAGIC!!! Obvious is BORING! I really don't expect novices to get it and could really care less if they do!

Bruce Weaver wrote
> I
/
> mostly
/

> agree with Art on this one, partly because there are some aspects of
> David's one-liner that are a bit mysterious, especially to novices, I
> expect. E.g., on Row 1, LAG(SEQ) and LAG(ID) both return SYSMIS. The
> reason why David's code works is that those SYSMIS values are
> appearing as arguments to the SUM function, and SUM returns a valid
> result if at least one argument has a valid value--see the demo below.
>
> Another somewhat mysterious aspect of David's method is that LAG(SEQ)
> can be used on the right side of the COMPUTE statement that is
> bringing SEQ into existence as a variable. That this can be done may
> not be intuitively obvious! ;-)
>
> Re Richard's DO-IF, I prefer to have both conditions that result in
> SEQ =
> 1 on a single conditional statement with an OR. And perhaps two
> separate IF statements would be even easier for novice users to
> understand. See below.
>
>
> NEW FILE.
> DATASET CLOSE all.
> DATA LIST FREE / ID.
> BEGIN DATA
> 1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5 END DATA.
>
> * David's one-liner.
>
> COMPUTE SEQ1=SUM(1,LAG(SEQ1)*(LAG(ID) EQ ID)).
>
> * Variation on Richard's DO-IF.
> * The two conditions for which SEQ2 = 1 are combined with OR.
>
> DO IF ($CASENUM EQ 1) OR (ID NE LAG(ID)) .
> . COMPUTE SEQ2 = 1.
> ELSE.
> . COMPUTE SEQ2 = LAG(SEQ2) + 1.
> END IF.
>
> * Two IF statements.
> * This is somewhat less efficient in machine time than DO-IF,
> * but possibly easier for the novice user to understand.
>
> IF ($CASENUM EQ 1) OR (ID NE LAG(ID)) SEQ3 = 1.
> IF MISSING(SEQ3) SEQ3 = LAG(SEQ3) + 1.
>
> FORMATS ID SEQ1 to SEQ3(f5.0).
> LIST.
>
> * The slightly mysterious thing about David's one-liner
> * is that the result is not SYSMIS on Row 1, because
> * on Row 1, LAG(SEQ1) and LAG(ID) both return SYSMIS.
>
> COMPUTE LagID = LAG(ID).
> COMPUTE LagSEQ1 = LAG(SEQ1).
> FORMATS LagID LagSEQ1(F5.0).
> LIST.
>
> Output:
> ID SEQ1 SEQ2 SEQ3 LagID LagSEQ1
>
> 1 1 1 1 . .
> 1 2 2 2 1 1
> 1 3 3 3 1 2
> 1 4 4 4 1 3
> etc.
>
> * The reason David's code does not return SYSMIS as a
> * result is that those missing values appear within
> * the SUM function: SUM will return a valid result
> * if at lest one of the arguments is valid. If you
> * compute a sum using plus signs, on the other hand,
> * all variables must be valid.
>
> NEW FILE.
> DATASET CLOSE all.
> DATA LIST LIST / V1 to V3 (3f1).
> BEGIN DATA
> 1 2 3
> 1 2 .
> 1 . .
> . . .
> END DATA.
>
> COMPUTE SumViaSUM = SUM(V1 to V3).
> COMPUTE SumViaPlus = V1 + V2 + V3.
> FORMATS SumViaSUM SumViaPlus (F5.0).
> LIST.
>
> Output:
> V1 V2 V3 SumViaSUM SumViaPlus
>
> 1 2 3 6 6
> 1 2 . 3 .
> 1 . . 1 .
> . . . . .
>
> Number of cases read: 4 Number of cases listed: 4
>
> David Marso wrote
>> Assumptions of my one liner.
>> Something is equal to something else or it isn't (true=1 false=0).
>> Multiplication of X by 0 = 0, by 1 = X
>> 0+1 = 1, X+1 = X+1.
>> Anybody having a problem with this might ponder their choice of
>> careers or majors?
>>
>> Art Kendall wrote
>>> <Don flame shields!>
>>> Your solution does require fewer lines and characters in
>>> the syntax, and most likely fewer internal operations.
>>> However, I deny your assertion that the one line of code is
>>> "reasonably intuitive".
>>> It would help beginners to see and understand Rich's solution
>>> and then see that it can be expressed more compactly. 
>>>
>>> Your solution is "reasonably intuitive" only for people with at
>>> least a moderate amount of experience at some computer
>>> languages.
>>>
>>> However, Rich's do repeat syntax is much easier for people to
>>> read.
>>> It is my impression that many posts from this list are from
>>> beginners.
>>> It is also my impression that people searching the archives are
>>> beginners.
>>>
>>>
>>> The easier it is for people to read and understand the syntax
>>> the easier it is to communicate the process to other people,
>>> e.g., classmates working as peer reviewers, on the job QA
>>> reviewers, triers-of-fact, archive users, process maintainers
>>> and up-daters.
>>>
>>> Soapbox: Efficiency in terms of cognitive load trumps saving
>>> storage space for syntax and often trumps saving small amounts
>>> of processing time.  Marginal labor cost is a greater
>>> consideration than marginal cost of machine resources.
>>>
>>> A rhetorical question:  How much computer time would it
>>> take to
>>> do David's solution compared to the computer time to do Rich's
>>> solution?  How many times of running the same code
>>> would it take
>>> to actually get a measurable difference in computer time, e.g.,
>>> 10 seconds?
>>>
>>> <remove flame shield.>
>>>
>>>
>>> Art Kendall
>>> Social Research Consultants
>>> On 12/5/2013 6:54 AM, David Marso [via SPSSX Discussion] wrote:
>>>
>>> I don't see why people use that ponderous DO IF
>>> $CASENUM = 1 .... ELSE blah blah blah (7 lines more or less)
>>> approach when a counter can be build with !!!!!
>>>
>>>
>>> COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).
>>>
>>>
>>> DATA LIST FREE / ID.
>>>
>>> BEGIN DATA
>>>
>>> 1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5
>>>
>>> END DATA.
>>>
>>> COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).
>>>
>>> LIST.
>>>
>>>
>>>       ID      SEQ
>>>  
>>>
>>>     1.00     1.00
>>>     1.00     2.00
>>>     1.00     3.00
>>>     1.00     4.00
>>>     1.00     5.00
>>>     1.00     6.00
>>>     2.00     1.00
>>>     2.00     2.00
>>>     2.00     3.00
>>>     3.00     1.00
>>>     3.00     2.00
>>>     3.00     3.00
>>>     3.00     4.00
>>>     3.00     5.00
>>>     3.00     6.00
>>>     3.00     7.00
>>>     4.00     1.00
>>>     4.00     2.00
>>>     4.00     3.00
>>>     5.00     1.00
>>>     5.00     2.00
>>>     5.00     3.00
>>>     5.00     4.00
>>>  
>>>
>>>  
>>>
>>> Number of cases read:  23    Number of cases
>>> listed:  23
>>>
>>>
>>>
>>> Michael
>>> Cohn wrote
>>> I think this
>>> solves all my problems! Many thanks to Andy and Richard for
>>> their help. I wasn't familiar with the LAG and AGGREGATE
>>> functions in SPSS but now I know what to start learning
>>> about.
>>>
>>>
>>> - Michael
>>>
>>>
>>> ----------------------------------
>>>
>>> Michael A. Cohn, PhD
>>>
>>> [hidden
>>> email]
>>> Osher Center for Integrative Medicine
>>>
>>> University of California, San Francisco
>>>
>>>
>>> From: "Richard Ristow [via SPSSX Discussion]" < [hidden
>>> email] <mailto: [hidden
>>> email] >>
>>>
>>> Date: Wednesday, December 4, 2013 at 14:56
>>>
>>> To: Michael Cohn < [hidden
>>> email] <mailto: [hidden
>>> email] >>
>>>
>>> Subject: Re: Computing variables based on multiple rows in a
>>> tall-format file
>>>
>>>
>>> At 03:23 PM 12/4/2013, Michael Cohn wrote:
>>>
>>>
>>> >My data file is currently in "tall" format (one row per
>>> measurement
>>>
>>> >per participant). Is there a way to generate variables
>>> in each
>>>
>>> >record that are based on information in that user's
>>> other records?
>>>
>>>
>>> You'll get many answers; the fact is, all of these are quite
>>> easy.
>>>
>>> The code I'm posting (not tested) assumes variables
>>>
>>>
>>> PcptID  -- Participant identifier
>>>
>>> Date    -- Date stamp
>>>
>>> Outcome -- Outcome value
>>>
>>>
>>> neither of the first two are ever missing; and your file is
>>> sorted in
>>>
>>> ascending order on the first two.
>>>
>>>
>>> >* A sequential index variable based on the record's
>>> datestamp (i.e.,
>>>
>>> >number a participant's responses 1, 2, 3... in
>>> chronological order).
>>>
>>>
>>> Various ways; here's a simple one, using transformation
>>> language:
>>>
>>>
>>> NUMERIC VisitSeq (F4).
>>>
>>>
>>> DO IF    $CASENUM EQ 1.
>>>
>>> .  COMPUTE VisitSeq = 1.
>>>
>>> ELSE IF  PcptID NE LAG(PcptID).
>>>
>>> .  COMPUTE VisitSeq = 1.
>>>
>>> ELSE.
>>>
>>> .  COMPUTE VisitSeq = LAG(VisitSeq) + 1.
>>>
>>> END IF.
>>>
>>>
>>>
>>> >* The length of time between the record and the earliest
>>> record for that
>>>
>>> >participant
>>>
>>> >* The difference between the outcome variable in the
>>> record and the minimum
>>>
>>> >value ever recorded for that participant.
>>>
>>>
>>> In both cases, start by putting the minimum value for the
>>> participant
>>>
>>> in every record for that participant, and then it's easy:
>>>
>>>
>>> AGGREGATE OUTFILE=* MODE=ADDVARIABLES
>>>
>>>     /BREAK=PcptID
>>>
>>>     /Earliest 'Date of earliest record for
>>> participant' =
>>> MIN(Date)
>>>
>>>     /MinOut   'Lowest outcome value for
>>> participant'    =
>>> MIN(Outcome).
>>>
>>>
>>> >It's easy to do these using a spreadsheet or a python
>>> script ...
>>>
>>>
>>> Actually, I think it's probably easier in long ('tall') form
>>> in
>>>
>>> native SPSS than in either of those two.
>>>
>>>
>>> -Best of luck,
>>>
>>>   Richard
>>>
>>>
>>> =====================
>>>
>>> To manage your subscription to SPSSX-L, send a message
>>> to
>>>
>>> [hidden
>>>
>>> email]</user/SendEmail.jtp?type=node&node=5723438&i=0>
>>> (not to SPSSX-L), with no body text except the
>>>
>>> command. To leave the list, send the command
>>>
>>> SIGNOFF SPSSX-L
>>>
>>> For a list of commands to manage subscriptions, send the
>>> command
>>>
>>> INFO REFCARD
>>>
>>>
>>>
>>> ________________________________
>>>
>>> If you reply to this email, your message will be added to
>>> the discussion below:
>>>
>>>
>>> http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-based-on-multiple-rows-in-a-tall-format-file-tp5723431p5723438.html
>>> To unsubscribe from Computing variables based on multiple
>>> rows in a tall-format file, click here<
>>> http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.j
>>> tp?macro=unsubscribe_by_code&node=5723431&code=Y29obm1Ab2Npb
>>> S51Y3NmLmVkdXw1NzIzNDMxfC03OTA5NjYwNDY=>
>>> .
>>>
>>> NAML<
>>> http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.j
>>> tp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&
>>> base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.
>>> NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrum
>>> bs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%
>>> 3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>>
>>>
>>> Please
>>> reply to the list and not to my personal email.
>>>
>>> Those desiring my consulting or training services please feel
>>> free to email me.
>>>
>>> ---
>>>
>>> "Nolite dare sanctum canibus neque mittatis margaritas vestras
>>> ante porcos ne forte conculcent eas pedibus suis."
>>>
>>> Cum es damnatorum possederunt porcos iens ut salire off
>>> sanguinum cliff in abyssum?"
>>>
>>>
>>>
>>>
>>> If you reply to this email, your
>>> message will be added to the discussion below:
>>>
>>> http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-ba
>>> sed-on-multiple-rows-in-a-tall-format-file-tp5723431p5723446.html
>>>
>>>
>>> To start a new topic under SPSSX Discussion, email
>>>

>>> ml-node+s1045642n1068821h68@.nabble

>>>
>>> To unsubscribe from SPSSX Discussion, click
>>> here .
>>> NAML

-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-based-on-multiple-rows-in-a-tall-format-file-tp5723431p5723500.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

David Marso

Re: Computing variables based on multiple rows in a tall-format file

Administrator

I'm pretty sure the notion of recursion doesn't apply.
I can't think of where recursion would fit into standard SPSS syntax since aside from python SPSS doesn't really have functions except in certain uses of macro (but then I think of recursion as a primarily mathematical concept and MACRO doesn't know anything about math). I would be very interested in seeing a recursive method implemented in SPSS particularly in a macro.
An example of recursion is say the Fibonacci sequence 1 1 2 3 5 8 13 21 ..
such that Fib(n)=Fib(n-1)+Fib(n-2)
Fib(1)=Fib(2)=1
or Factorial(N)=Factorial(N-1)*N

I do however use nested functions to a great degree in my coding.
I have yet to encounter any sort of misbehavior in standard syntax.
MATRIX does have a nesting limit of 6 .

If you blow past that you are rewarded with the following useless directive (useless because technical support will likely not be of any help whatsoever since the message is in all likelihood on the money ;-)
>Error # 12540
>Parser stack overflow. The possible reason: nesting level is more than 6. A
>program error has occurred: Please note the circumstances under which this
>error occurred, attempting to replicate it if possible, and then notify
>Technical Support.
>Execution of this command stops.

The MATRIX statement skipped.
This resulted from the following monstrosity which I doubt I would ever push
MATRIX.
COMPUTE X=T({1,1,1,1,1;1,2,3,4,5;6,7,8,9,11}).
COMPUTE Boogie1=DET(DET(INV(DET(INV(T(X)*X))*T(X)*X))*T(X)*X).
PRINT Boogie1.
COMPUTE Boogie2=DET(DET(DET(INV(DET(INV(T(X)*X))*T(X)*X))*T(X)*X)*T(X)*X).
PRINT Boogie2.
END MATRIX.

Maguin, Eugene wrote

One of the things that David's solutions use is what I'd call "nesting" and "recursion". I'm curious if spss has limits on either of those elements. For nesting, I think the question is how many sets of parentheses can be accommodated and for recursion how many times can the same function be called.
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Saturday, December 07, 2013 5:14 PM
To: [hidden email]
Subject: Re: Computing variables based on multiple rows in a tall-format file

"Another somewhat mysterious aspect of David's method is that LAG(SEQ) can be used on the right side of the COMPUTE statement that is bringing SEQ into existence as a variable. That this can be done may not be intuitively obvious! ;-) "

MAGIC!!! Obvious is BORING! I really don't expect novices to get it and could really care less if they do!

Bruce Weaver wrote
> I
/
> mostly
/
> agree with Art on this one, partly because there are some aspects of
> David's one-liner that are a bit mysterious, especially to novices, I
> expect. E.g., on Row 1, LAG(SEQ) and LAG(ID) both return SYSMIS. The
> reason why David's code works is that those SYSMIS values are
> appearing as arguments to the SUM function, and SUM returns a valid
> result if at least one argument has a valid value--see the demo below.
>
> Another somewhat mysterious aspect of David's method is that LAG(SEQ)
> can be used on the right side of the COMPUTE statement that is
> bringing SEQ into existence as a variable. That this can be done may
> not be intuitively obvious! ;-)
>
> Re Richard's DO-IF, I prefer to have both conditions that result in
> SEQ =
> 1 on a single conditional statement with an OR. And perhaps two
> separate IF statements would be even easier for novice users to
> understand. See below.
>
>
> NEW FILE.
> DATASET CLOSE all.
> DATA LIST FREE / ID.
> BEGIN DATA
> 1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5 END DATA.
>
> * David's one-liner.
>
> COMPUTE SEQ1=SUM(1,LAG(SEQ1)*(LAG(ID) EQ ID)).
>
> * Variation on Richard's DO-IF.
> * The two conditions for which SEQ2 = 1 are combined with OR.
>
> DO IF ($CASENUM EQ 1) OR (ID NE LAG(ID)) .
> . COMPUTE SEQ2 = 1.
> ELSE.
> . COMPUTE SEQ2 = LAG(SEQ2) + 1.
> END IF.
>
> * Two IF statements.
> * This is somewhat less efficient in machine time than DO-IF,
> * but possibly easier for the novice user to understand.
>
> IF ($CASENUM EQ 1) OR (ID NE LAG(ID)) SEQ3 = 1.
> IF MISSING(SEQ3) SEQ3 = LAG(SEQ3) + 1.
>
> FORMATS ID SEQ1 to SEQ3(f5.0).
> LIST.
>
> * The slightly mysterious thing about David's one-liner
> * is that the result is not SYSMIS on Row 1, because
> * on Row 1, LAG(SEQ1) and LAG(ID) both return SYSMIS.
>
> COMPUTE LagID = LAG(ID).
> COMPUTE LagSEQ1 = LAG(SEQ1).
> FORMATS LagID LagSEQ1(F5.0).
> LIST.
>
> Output:
> ID SEQ1 SEQ2 SEQ3 LagID LagSEQ1
>
> 1 1 1 1 . .
> 1 2 2 2 1 1
> 1 3 3 3 1 2
> 1 4 4 4 1 3
> etc.
>
> * The reason David's code does not return SYSMIS as a
> * result is that those missing values appear within
> * the SUM function: SUM will return a valid result
> * if at lest one of the arguments is valid. If you
> * compute a sum using plus signs, on the other hand,
> * all variables must be valid.
>
> NEW FILE.
> DATASET CLOSE all.
> DATA LIST LIST / V1 to V3 (3f1).
> BEGIN DATA
> 1 2 3
> 1 2 .
> 1 . .
> . . .
> END DATA.
>
> COMPUTE SumViaSUM = SUM(V1 to V3).
> COMPUTE SumViaPlus = V1 + V2 + V3.
> FORMATS SumViaSUM SumViaPlus (F5.0).
> LIST.
>
> Output:
> V1 V2 V3 SumViaSUM SumViaPlus
>
> 1 2 3 6 6
> 1 2 . 3 .
> 1 . . 1 .
> . . . . .
>
> Number of cases read: 4 Number of cases listed: 4
>
> David Marso wrote
>> Assumptions of my one liner.
>> Something is equal to something else or it isn't (true=1 false=0).
>> Multiplication of X by 0 = 0, by 1 = X
>> 0+1 = 1, X+1 = X+1.
>> Anybody having a problem with this might ponder their choice of
>> careers or majors?
>>
>> Art Kendall wrote
>>> <Don flame shields!>
>>> Your solution does require fewer lines and characters in
>>> the syntax, and most likely fewer internal operations.
>>> However, I deny your assertion that the one line of code is
>>> "reasonably intuitive".
>>> It would help beginners to see and understand Rich's solution
>>> and then see that it can be expressed more compactly.
>>>
>>> Your solution is "reasonably intuitive" only for people with at
>>> least a moderate amount of experience at some computer
>>> languages.
>>>
>>> However, Rich's do repeat syntax is much easier for people to
>>> read.
>>> It is my impression that many posts from this list are from
>>> beginners.
>>> It is also my impression that people searching the archives are
>>> beginners.
>>>
>>>
>>> The easier it is for people to read and understand the syntax
>>> the easier it is to communicate the process to other people,
>>> e.g., classmates working as peer reviewers, on the job QA
>>> reviewers, triers-of-fact, archive users, process maintainers
>>> and up-daters.
>>>
>>> Soapbox: Efficiency in terms of cognitive load trumps saving
>>> storage space for syntax and often trumps saving small amounts
>>> of processing time. Marginal labor cost is a greater
>>> consideration than marginal cost of machine resources.
>>>
>>> A rhetorical question: How much computer time would it
>>> take to
>>> do David's solution compared to the computer time to do Rich's
>>> solution? How many times of running the same code
>>> would it take
>>> to actually get a measurable difference in computer time, e.g.,
>>> 10 seconds?
>>>
>>> <remove flame shield.>
>>>
>>>
>>> Art Kendall
>>> Social Research Consultants
>>> On 12/5/2013 6:54 AM, David Marso [via SPSSX Discussion] wrote:
>>>
>>> I don't see why people use that ponderous DO IF
>>> $CASENUM = 1 .... ELSE blah blah blah (7 lines more or less)
>>> approach when a counter can be build with !!!!!
>>>
>>>
>>> COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).
>>>
>>>
>>> DATA LIST FREE / ID.
>>>
>>> BEGIN DATA
>>>
>>> 1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5
>>>
>>> END DATA.
>>>
>>> COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).
>>>
>>> LIST.
>>>
>>>
>>> ID SEQ
>>>
>>>
>>> 1.00 1.00
>>> 1.00 2.00
>>> 1.00 3.00
>>> 1.00 4.00
>>> 1.00 5.00
>>> 1.00 6.00
>>> 2.00 1.00
>>> 2.00 2.00
>>> 2.00 3.00
>>> 3.00 1.00
>>> 3.00 2.00
>>> 3.00 3.00
>>> 3.00 4.00
>>> 3.00 5.00
>>> 3.00 6.00
>>> 3.00 7.00
>>> 4.00 1.00
>>> 4.00 2.00
>>> 4.00 3.00
>>> 5.00 1.00
>>> 5.00 2.00
>>> 5.00 3.00
>>> 5.00 4.00
>>>
>>>
>>>
>>>
>>> Number of cases read: 23 Number of cases
>>> listed: 23
>>>
>>>
>>>
>>> Michael
>>> Cohn wrote
>>> I think this
>>> solves all my problems! Many thanks to Andy and Richard for
>>> their help. I wasn't familiar with the LAG and AGGREGATE
>>> functions in SPSS but now I know what to start learning
>>> about.
>>>
>>>
>>> - Michael
>>>
>>>
>>> ----------------------------------
>>>
>>> Michael A. Cohn, PhD
>>>
>>> [hidden
>>> email]
>>> Osher Center for Integrative Medicine
>>>
>>> University of California, San Francisco
>>>
>>>
>>> From: "Richard Ristow [via SPSSX Discussion]" < [hidden
>>> email] <mailto: [hidden
>>> email] >>
>>>
>>> Date: Wednesday, December 4, 2013 at 14:56
>>>
>>> To: Michael Cohn < [hidden
>>> email] <mailto: [hidden
>>> email] >>
>>>
>>> Subject: Re: Computing variables based on multiple rows in a
>>> tall-format file
>>>
>>>
>>> At 03:23 PM 12/4/2013, Michael Cohn wrote:
>>>
>>>
>>> >My data file is currently in "tall" format (one row per
>>> measurement
>>>
>>> >per participant). Is there a way to generate variables
>>> in each
>>>
>>> >record that are based on information in that user's
>>> other records?
>>>
>>>
>>> You'll get many answers; the fact is, all of these are quite
>>> easy.
>>>
>>> The code I'm posting (not tested) assumes variables
>>>
>>>
>>> PcptID -- Participant identifier
>>>
>>> Date -- Date stamp
>>>
>>> Outcome -- Outcome value
>>>
>>>
>>> neither of the first two are ever missing; and your file is
>>> sorted in
>>>
>>> ascending order on the first two.
>>>
>>>
>>> >* A sequential index variable based on the record's
>>> datestamp (i.e.,
>>>
>>> >number a participant's responses 1, 2, 3... in
>>> chronological order).
>>>
>>>
>>> Various ways; here's a simple one, using transformation
>>> language:
>>>
>>>
>>> NUMERIC VisitSeq (F4).
>>>
>>>
>>> DO IF $CASENUM EQ 1.
>>>
>>> . COMPUTE VisitSeq = 1.
>>>
>>> ELSE IF PcptID NE LAG(PcptID).
>>>
>>> . COMPUTE VisitSeq = 1.
>>>
>>> ELSE.
>>>
>>> . COMPUTE VisitSeq = LAG(VisitSeq) + 1.
>>>
>>> END IF.
>>>
>>>
>>>
>>> >* The length of time between the record and the earliest
>>> record for that
>>>
>>> >participant
>>>
>>> >* The difference between the outcome variable in the
>>> record and the minimum
>>>
>>> >value ever recorded for that participant.
>>>
>>>
>>> In both cases, start by putting the minimum value for the
>>> participant
>>>
>>> in every record for that participant, and then it's easy:
>>>
>>>
>>> AGGREGATE OUTFILE=* MODE=ADDVARIABLES
>>>
>>> /BREAK=PcptID
>>>
>>> /Earliest 'Date of earliest record for
>>> participant' =
>>> MIN(Date)
>>>
>>> /MinOut 'Lowest outcome value for
>>> participant' =
>>> MIN(Outcome).
>>>
>>>
>>> >It's easy to do these using a spreadsheet or a python
>>> script ...
>>>
>>>
>>> Actually, I think it's probably easier in long ('tall') form
>>> in
>>>
>>> native SPSS than in either of those two.
>>>
>>>
>>> -Best of luck,
>>>
>>> Richard
>>>
>>>
>>> =====================
>>>
>>> To manage your subscription to SPSSX-L, send a message
>>> to
>>>
>>> [hidden
>>>
>>> email]</user/SendEmail.jtp?type=node&node=5723438&i=0>
>>> (not to SPSSX-L), with no body text except the
>>>
>>> command. To leave the list, send the command
>>>
>>> SIGNOFF SPSSX-L
>>>
>>> For a list of commands to manage subscriptions, send the
>>> command
>>>
>>> INFO REFCARD
>>>
>>>
>>>
>>> ________________________________
>>>
>>> If you reply to this email, your message will be added to
>>> the discussion below:
>>>
>>>
>>> http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-based-on-multiple-rows-in-a-tall-format-file-tp5723431p5723438.html
>>> To unsubscribe from Computing variables based on multiple
>>> rows in a tall-format file, click here<
>>> http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.j
>>> tp?macro=unsubscribe_by_code&node=5723431&code=Y29obm1Ab2Npb
>>> S51Y3NmLmVkdXw1NzIzNDMxfC03OTA5NjYwNDY=>
>>> .
>>>
>>> NAML<
>>> http://spssx-discussion.1045642.n5.nabble.com/template/NamlServlet.j
>>> tp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&
>>> base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.
>>> NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrum
>>> bs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%
>>> 3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>>
>>>
>>> Please
>>> reply to the list and not to my personal email.
>>>
>>> Those desiring my consulting or training services please feel
>>> free to email me.
>>>
>>> ---
>>>
>>> "Nolite dare sanctum canibus neque mittatis margaritas vestras
>>> ante porcos ne forte conculcent eas pedibus suis."
>>>
>>> Cum es damnatorum possederunt porcos iens ut salire off
>>> sanguinum cliff in abyssum?"
>>>
>>>
>>>
>>>
>>> If you reply to this email, your
>>> message will be added to the discussion below:
>>>
>>> http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-ba
>>> sed-on-multiple-rows-in-a-tall-format-file-tp5723431p5723446.html
>>>
>>>
>>> To start a new topic under SPSSX Discussion, email
>>>

>>> ml-node+s1045642n1068821h68@.nabble

>>>
>>> To unsubscribe from SPSSX Discussion, click
>>> here .
>>> NAML

-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Computing-variables-based-on-multiple-rows-in-a-tall-format-file-tp5723431p5723500.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Richard Ristow

Re: Recursion with Macro ;-)

In reply to this post by Maguin, Eugene

Dear Gene,

At 10:06 AM 12/9/2013, you wrote (in thread "Computing variables
based on multiple rows in a tall-format file"):

>One of the things that David's solutions use is what I'd call
>"nesting" and "recursion".

You did spark quite a discussion, with some real illumination on the
working of macros -- I hope you've been enjoying it.

I would never have imagined that the macro system's memory management
was sophisticated enough to allow true recursive macros. Nor that
they'd be as useful as it looks like they are.

Happy New Year!

-Best wishes,
Richard

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD