Computing variables based on multiple rows in a tall-format file


Computing variables based on multiple rows in a tall-format file

Michael Cohn
I'm analyzing a repeated measures dataset in which each participant was measured between 1 and 8 times, at essentially random intervals. My main analysis is a linear mixed model, so my data file is currently in "tall" format (one row per measurement per participant).

Is there a way to generate variables in each record that are based on information in that user's other records? For example:

* A sequential index variable based on the record's datestamp (i.e., number a participant's responses 1, 2, 3... in chronological order).

* The length of time between the record and the earliest record for that participant

* The difference between the outcome variable in the record and the minimum value ever recorded for that participant.

It's easy to do these using a spreadsheet or a python script, or by switching to a wide-format file and back. But those are cumbersome and error-prone, and I'd have to do it repeatedly, since we periodically add new records to the dataset. Can SPSS do this kind of thing natively, in the current file format?
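(For comparison, here is roughly what those three derived variables look like in a plain Python script; the tuples and column positions are toy placeholders, not my actual file:)

```python
from datetime import date

# Toy long-format records: (participant id, datestamp, outcome).
rows = [
    (1, date(2013, 1, 5),  5.0),
    (1, date(2013, 1, 1),  3.0),
    (1, date(2013, 2, 1),  4.0),
    (2, date(2013, 3, 1),  7.0),
    (2, date(2013, 3, 10), 6.0),
]
rows.sort(key=lambda r: (r[0], r[1]))  # sort by id, then date

# First pass: earliest date and minimum outcome per participant.
first_date, min_out = {}, {}
for pid, d, out in rows:
    first_date[pid] = min(first_date.get(pid, d), d)
    min_out[pid] = min(min_out.get(pid, out), out)

# Second pass: the three derived variables for each record.
seq_counter, derived = {}, []
for pid, d, out in rows:
    seq_counter[pid] = seq_counter.get(pid, 0) + 1  # 1, 2, 3... per participant
    derived.append((
        seq_counter[pid],                # sequential index in chronological order
        (d - first_date[pid]).days,      # days since participant's earliest record
        out - min_out[pid],              # outcome minus participant's minimum
    ))

print(derived)
# -> [(1, 0, 0.0), (2, 4, 2.0), (3, 31, 1.0), (1, 0, 1.0), (2, 9, 0.0)]
```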

Many thanks,

- Michael

Re: Computing variables based on multiple rows in a tall-format file

Andy W
Sequential index: this can be done using lags, with something like the code below (where "Id" is a variable that uniquely identifies each participant and "Time" is the variable that orders each participant's observations in time).

SORT CASES by Id Time.
DO IF $casenum = 1 OR Id <> LAG(Id).
  COMPUTE SeqInd = 1.
ELSE.
  COMPUTE SeqInd = LAG(SeqInd).
END IF.

(see http://andrewpwheeler.wordpress.com/2013/02/18/using-sequential-case-processing-for-data-management-in-spss/ for a related write-up)

Length of time since the earliest record & difference from the minimum value: see the AGGREGATE command; for both, you would calculate the MIN using Id as the break group. (Then just compute a second variable for the differences.) Taking group MEAN differences is a common procedure as well.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Computing variables based on multiple rows in a tall-format file

Andy W
Whoops, the sequential index should be as below (I forgot the plus 1):

SORT CASES by Id Time.
DO IF $casenum = 1 OR Id <> LAG(Id).
  COMPUTE SeqInd = 1.
ELSE.
  COMPUTE SeqInd = LAG(SeqInd) + 1.
END IF.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Computing variables based on multiple rows in a tall-format file

Richard Ristow
In reply to this post by Michael Cohn
At 03:23 PM 12/4/2013, Michael Cohn wrote:

>My data file is currently in "tall" format (one row per measurement
>per participant). Is there a way to generate variables in each
>record that are based on information in that user's other records?

You'll get many answers; the fact is, all of these are quite easy.
The code I'm posting (not tested) assumes variables

PcptID  -- Participant identifier
Date    -- Date stamp
Outcome -- Outcome value

and that neither of the first two is ever missing, and that your file is
sorted in ascending order on the first two.

>* A sequential index variable based on the record's datestamp (i.e.,
>number a participant's responses 1, 2, 3... in chronological order).

Various ways; here's a simple one, using transformation language:

NUMERIC VisitSeq (F4).

DO IF    $CASENUM EQ 1.
.  COMPUTE VisitSeq = 1.
ELSE IF  PcptID NE LAG(PcptID).
.  COMPUTE VisitSeq = 1.
ELSE.
.  COMPUTE VisitSeq = LAG(VisitSeq) + 1.
END IF.


>* The length of time between the record and the earliest record for that
>participant
>* The difference between the outcome variable in the record and the minimum
>value ever recorded for that participant.

In both cases, start by putting the minimum value for the participant
in every record for that participant, and then it's easy:

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK=PcptID
    /Earliest 'Date of earliest record for participant' = MIN(Date)
    /MinOut   'Lowest outcome value for participant'    = MIN(Outcome).

>It's easy to do these using a spreadsheet or a python script ...

Actually, I think it's probably easier in long ('tall') form in
native SPSS than in either of those two.

-Best of luck,
  Richard

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Computing variables based on multiple rows in a tall-format file

Michael Cohn
I think this solves all my problems! Many thanks to Andy and Richard for their help. I wasn't familiar with the LAG and AGGREGATE functions in SPSS but now I know what to start learning about. 

- Michael

----------------------------------
Michael A. Cohn, PhD
Osher Center for Integrative Medicine
University of California, San Francisco


Re: Computing variables based on multiple rows in a tall-format file

David Marso
Administrator
I don't see why people use that ponderous DO IF $CASENUM = 1 .... ELSE blah blah blah (7 lines more or less) approach when a counter can be built with ONE LINE OF REASONABLY INTUITIVE CODE!!!!!

COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).

DATA LIST FREE / ID.
BEGIN DATA
1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5
END DATA.
COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).
LIST.

      ID      SEQ
 
    1.00     1.00
    1.00     2.00
    1.00     3.00
    1.00     4.00
    1.00     5.00
    1.00     6.00
    2.00     1.00
    2.00     2.00
    2.00     3.00
    3.00     1.00
    3.00     2.00
    3.00     3.00
    3.00     4.00
    3.00     5.00
    3.00     6.00
    3.00     7.00
    4.00     1.00
    4.00     2.00
    4.00     3.00
    5.00     1.00
    5.00     2.00
    5.00     3.00
    5.00     4.00
 
 
Number of cases read:  23    Number of cases listed:  23
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Computing variables based on multiple rows in a tall-format file

Kirill Orlov
David, bravo! A sweet little line.
I too can't resist the beauty of concise coding.
However, we must remember that fewer lines isn't always = faster code.



Re: Computing variables based on multiple rows in a tall-format file

Art Kendall
In reply to this post by David Marso
<Don flame shields!>
Your solution does require fewer lines and characters in the syntax, and most likely fewer internal operations.
However, I deny your assertion that the one line of code is "reasonably intuitive".
It would help beginners to see and understand Rich's solution and then see that it can be expressed more compactly. 

Your solution is "reasonably intuitive" only for people with at least a moderate amount of experience at some computer languages.

However, Rich's DO IF syntax is much easier for people to read.
It is my impression that many posts from this list are from beginners.
It is also my impression that people searching the archives are beginners.


The easier it is for people to read and understand the syntax, the easier it is to communicate the process to other people, e.g., classmates working as peer reviewers, on-the-job QA reviewers, triers-of-fact, archive users, process maintainers and updaters.

Soapbox: Efficiency in terms of cognitive load trumps saving storage space for syntax and often trumps saving small amounts of processing time.  Marginal labor cost is a greater consideration than marginal cost of machine resources.

A rhetorical question: how much computer time would David's solution take compared to Rich's solution? How many runs of the same code would it take to get a measurable difference in computer time, e.g., 10 seconds?

<remove flame shield.>

Art Kendall
Social Research Consultants

Re: Computing variables based on multiple rows in a tall-format file

David Marso
Administrator
Assumptions of my one-liner:
Something is equal to something else or it isn't (true = 1, false = 0).
Multiplication of X by 0 = 0; by 1 = X.
0 + 1 = 1; X + 1 = X + 1.
Anybody having a problem with this might ponder their choice of careers or majors?
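Spelled out in plain Python (a sketch with made-up IDs; None stands in for SYSMIS on the first row), the logic traces like this:

```python
# David's one-liner relies on EQ yielding 1/0 and SUM ignoring missing
# arguments. A literal translation, treating the LAGs of row 1 as missing:
ids = [1, 1, 1, 2, 2, 2, 2, 3]

seq = []
for i, cur_id in enumerate(ids):
    lag_seq = seq[i - 1] if i > 0 else None   # LAG(SEQ)
    lag_id  = ids[i - 1] if i > 0 else None   # LAG(ID)
    same = int(lag_id == cur_id) if lag_id is not None else 0  # (LAG(ID) EQ ID)
    # SUM(1, missing) = 1; otherwise 1 + LAG(SEQ)*same.
    seq.append(1 if lag_seq is None else 1 + lag_seq * same)

print(seq)  # -> [1, 2, 3, 1, 2, 3, 4, 1]
```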



Re: Computing variables based on multiple rows in a tall-format file

Bruce Weaver
Administrator
I mostly agree with Art on this one, partly because some aspects of David's one-liner are a bit mysterious, especially to novices, I expect.  E.g., on Row 1, LAG(SEQ) and LAG(ID) both return SYSMIS.  The reason David's code works is that those SYSMIS values appear as arguments to the SUM function, and SUM returns a valid result if at least one argument has a valid value--see the demo below.

Another somewhat mysterious aspect of David's method is that LAG(SEQ) can be used on the right side of the COMPUTE statement that is bringing SEQ into existence as a variable.  That this can be done may not be intuitively obvious!  ;-)

Re Richard's DO-IF, I prefer to have both conditions that result in SEQ = 1 on a single conditional statement with an OR.  And perhaps two separate IF statements would be even easier for novice users to understand.  See below.  


NEW FILE.
DATASET CLOSE all.
DATA LIST FREE / ID.
BEGIN DATA
1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5
END DATA.

* David's one-liner.

COMPUTE SEQ1=SUM(1,LAG(SEQ1)*(LAG(ID) EQ ID)).

* Variation on Richard's DO-IF.
* The two conditions for which SEQ2 = 1 are combined with OR.

DO IF  ($CASENUM EQ 1) OR (ID NE LAG(ID)) .
.  COMPUTE SEQ2 = 1.
ELSE.
.  COMPUTE SEQ2 = LAG(SEQ2) + 1.
END IF.

* Two IF statements.
* This is somewhat less efficient in machine time than DO-IF,
* but possibly easier for the novice user to understand.

IF  ($CASENUM EQ 1) OR (ID NE LAG(ID)) SEQ3 = 1.
IF MISSING(SEQ3) SEQ3 = LAG(SEQ3) + 1.

FORMATS ID SEQ1 to SEQ3(f5.0).
LIST.

* The slightly mysterious thing about David's one-liner
* is that the result is not SYSMIS on Row 1, even though
* on Row 1, LAG(SEQ1) and LAG(ID) both return SYSMIS.

COMPUTE LagID = LAG(ID).
COMPUTE LagSEQ1 = LAG(SEQ1).
FORMATS LagID LagSEQ1(F5.0).
LIST.

Output:
   ID  SEQ1  SEQ2  SEQ3 LagID LagSEQ1
 
    1     1     1     1     .      .
    1     2     2     2     1      1
    1     3     3     3     1      2
    1     4     4     4     1      3
etc.

* The reason David's code does not return SYSMIS as a
* result is that those missing values appear within
* the SUM function:  SUM will return a valid result
* if at least one of the arguments is valid.  If you
* compute a sum using plus signs, on the other hand,
* all variables must be valid.

NEW FILE.
DATASET CLOSE all.
DATA LIST LIST / V1 to V3 (3f1).
BEGIN DATA
1 2 3
1 2 .
1 . .
. . .
END DATA.

COMPUTE SumViaSUM = SUM(V1 to V3).
COMPUTE SumViaPlus = V1 + V2 + V3.
FORMATS SumViaSUM SumViaPlus (F5.0).
LIST.

Output:
V1 V2 V3 SumViaSUM SumViaPlus
 
 1  2  3       6          6
 1  2  .       3          .
 1  .  .       1          .
 .  .  .       .          .
 
Number of cases read:  4    Number of cases listed:  4
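For anyone reproducing this check in Python instead (which the original poster mentioned as an alternative), pandas draws the same SUM-versus-plus distinction. This is a minimal sketch by way of comparison, not code from the thread; note that `min_count=1` is needed because, unlike SPSS SUM, pandas' `sum` otherwise returns 0 for an all-missing row:

```python
import numpy as np
import pandas as pd

# Same demo data as the SPSS job above: three variables with
# progressively more missing values per row.
df = pd.DataFrame({"V1": [1, 1, 1, np.nan],
                   "V2": [2, 2, np.nan, np.nan],
                   "V3": [3, np.nan, np.nan, np.nan]})

# Like SPSS SUM(V1 TO V3): valid if at least one argument is valid.
df["SumViaSUM"] = df[["V1", "V2", "V3"]].sum(axis=1, min_count=1)

# Like V1 + V2 + V3: any missing argument makes the result missing.
df["SumViaPlus"] = df["V1"] + df["V2"] + df["V3"]

print(df)
```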


David Marso wrote
Assumptions of my one liner.
Something is equal to something else or it isn't (true=1 false=0).
Multiplication of X by 0 = 0, by 1 = X
0+1 = 1, X+1 = X+1.
Anybody having a problem with this might ponder their choice of careers or majors?


Art Kendall wrote
<Don flame shields!>
Your solution does require fewer lines and characters in the syntax, and most likely fewer internal operations. However, I deny your assertion that the one line of code is "reasonably intuitive". It would help beginners to see and understand Rich's solution and then see that it can be expressed more compactly.

Your solution is "reasonably intuitive" only for people with at least a moderate amount of experience with some computer languages.

However, Rich's DO IF syntax is much easier for people to read. It is my impression that many posts to this list are from beginners, and that people searching the archives are beginners as well.

The easier it is for people to read and understand the syntax, the easier it is to communicate the process to other people, e.g., classmates working as peer reviewers, on-the-job QA reviewers, triers-of-fact, archive users, and process maintainers and updaters.

Soapbox: Efficiency in terms of cognitive load trumps saving storage space for syntax, and often trumps saving small amounts of processing time. Marginal labor cost is a greater consideration than marginal cost of machine resources.

A rhetorical question: How much computer time would it take to run David's solution compared to Rich's? How many runs of the same code would it take to get a measurable difference in computer time, e.g., 10 seconds?

<remove flame shield.>

Art Kendall
Social Research Consultants
On 12/5/2013 6:54 AM, David Marso [via SPSSX Discussion] wrote:

I don't see why people use that ponderous DO IF $CASENUM = 1 .... ELSE blah blah blah (7 lines more or less) approach when a counter can be built with ONE LINE OF REASONABLY INTUITIVE CODE!!!!!

COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).

DATA LIST FREE / ID.
BEGIN DATA
1 1 1 1 1 1 2 2 2 3 3 3 3 3 3 3 4 4 4 5 5 5 5
END DATA.
COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).
LIST.

    ID      SEQ
  1.00     1.00
  1.00     2.00
  1.00     3.00
  1.00     4.00
  1.00     5.00
  1.00     6.00
  2.00     1.00
  2.00     2.00
  2.00     3.00
  3.00     1.00
  3.00     2.00
  3.00     3.00
  3.00     4.00
  3.00     5.00
  3.00     6.00
  3.00     7.00
  4.00     1.00
  4.00     2.00
  4.00     3.00
  5.00     1.00
  5.00     2.00
  5.00     3.00
  5.00     4.00

Number of cases read:  23    Number of cases listed:  23
Michael Cohn wrote
I think this solves all my problems! Many thanks to Andy and Richard for their help. I wasn't familiar with the LAG and AGGREGATE functions in SPSS but now I know what to start learning about.

- Michael

----------------------------------
Michael A. Cohn, PhD
[hidden email]
Osher Center for Integrative Medicine
University of California, San Francisco

From: "Richard Ristow [via SPSSX Discussion]" <[hidden email]>
Date: Wednesday, December 4, 2013 at 14:56
To: Michael Cohn <[hidden email]>
Subject: Re: Computing variables based on multiple rows in a tall-format file
           
At 03:23 PM 12/4/2013, Michael Cohn wrote:

>My data file is currently in "tall" format (one row per measurement
>per participant). Is there a way to generate variables in each
>record that are based on information in that user's other records?

You'll get many answers; the fact is, all of these are quite easy.
The code I'm posting (not tested) assumes variables

PcptID  -- Participant identifier
Date    -- Date stamp
Outcome -- Outcome value

neither of the first two is ever missing; and your file is sorted in
ascending order on the first two.

>* A sequential index variable based on the record's datestamp (i.e.,
>number a participant's responses 1, 2, 3... in chronological order).

Various ways; here's a simple one, using transformation language:

NUMERIC VisitSeq (F4).
DO IF    $CASENUM EQ 1.
.  COMPUTE VisitSeq = 1.
ELSE IF  PcptID NE LAG(PcptID).
.  COMPUTE VisitSeq = 1.
ELSE.
.  COMPUTE VisitSeq = LAG(VisitSeq) + 1.
END IF.

>* The length of time between the record and the earliest record for
>that participant
>* The difference between the outcome variable in the record and the
>minimum value ever recorded for that participant.

In both cases, start by putting the minimum value for the participant
in every record for that participant, and then it's easy:

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK=PcptID
    /Earliest 'Date of earliest record for participant' = MIN(Date)
    /MinOut   'Lowest outcome value for participant'    = MIN(Outcome).

>It's easy to do these using a spreadsheet or a python script ...

Actually, I think it's probably easier in long ('tall') form in
native SPSS than in either of those two.
           
           
            -Best of luck,
           
              Richard
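Richard's two building blocks translate almost line for line to pandas, for anyone who does go the Python-script route. A sketch (not from the thread) using his assumed variable names and made-up toy data:

```python
import pandas as pd

# Toy data: two participants measured at irregular dates.
df = pd.DataFrame({
    "PcptID":  [1, 1, 1, 2, 2],
    "Date":    pd.to_datetime(["2013-01-05", "2013-01-01", "2013-02-01",
                               "2013-03-10", "2013-03-01"]),
    "Outcome": [5.0, 7.0, 4.0, 2.0, 6.0],
})
df = df.sort_values(["PcptID", "Date"]).reset_index(drop=True)

# The DO IF / LAG counter: sequential index within participant.
df["VisitSeq"] = df.groupby("PcptID").cumcount() + 1

# AGGREGATE MODE=ADDVARIABLES: group minimum broadcast back onto
# every row of the participant.
df["Earliest"] = df.groupby("PcptID")["Date"].transform("min")
df["MinOut"]   = df.groupby("PcptID")["Outcome"].transform("min")

# The two derived quantities the original poster asked for.
df["DaysSinceFirst"] = (df["Date"] - df["Earliest"]).dt.days
df["OutcomeVsMin"]   = df["Outcome"] - df["MinOut"]

print(df)
```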
           
           
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
           
           
           
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Computing variables based on multiple rows in a tall-format file

Richard Ristow
In reply to this post by David Marso
Let me weigh in, since this began with a comment on my style.

At 06:54 AM 12/5/2013, David Marso wrote:
>I don't see why people use that ponderous DO IF $CASENUM = 1 ....
>ELSE blah blah blah (7 lines more or less) approach

I freely admit that my code is far from compact; possibly, I use more
lines of code for a solution than would any other regular poster.

Here and elsewhere, I write code to be readable, before almost any
other consideration. That's important for a list posting, when the
main purpose of the code is to *be* read; but I've settled on the
practice for production code, as well.

Practically all code is read sometime, by its author if by no one
else; and it is essential that the reader have a clear sense what the
*program* does. It distracts from this if the reader needs to pause
even a moment to make out what a *line* does, just as an obscure
sentence in English can distract from comprehension of the text.

David Marso wrote,

>a counter can be build with ONE LINE OF REASONABLY INTUITIVE CODE!!!!!
>
>COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).

For me, describing a coding technique as "reasonably intuitive" is an
immediate flag that the reader must pause to understand it; as, in
fact, I would have, if I encountered that line in code I was reading.
(And that would very much apply even if I'd written the code. Passage
of time would leave the code, but lose what I was thinking when I wrote it.)

At 08:51 AM 12/5/2013, David Marso wrote:
>Assumptions of my one liner.
>Something is equal to something else or it isn't (true=1 false=0).
>Multiplication of X by 0 = 0, by 1 = X
>0+1 = 1, X+1 = X+1.
>Anybody having a problem with this might ponder their choice of
>careers or majors?

I'm afraid I'm not sympathetic to this. It's the "emperor's new
clothes" argument: "if you don't see it, that just proves you're
stupid." If there's any rule for clear writing, it's that you don't
get to blame the reader for not understanding you.

There's real programming satisfaction in writing solutions like
David's, what I describe as 'cute' solutions. I admire people who can
write them, like David; or like Mel, the Real Programmer
(http://www.catb.org/jargon/html/story-of-mel.html). But I've come to
resist them, for the reasons I've given.

Finally: But, don't speed and compactness matter? Compactness matters
*occasionally*, like when you're putting a spacecraft guidance system
into a rugged but very small computer. As for speed, I think
clearly-written code is rarely much slower; and if it needs speed
improvements, having the logic clear guides you to which high-use
portions need to be optimized.


Re: Computing variables based on multiple rows in a tall-format file

David Marso
Administrator
We could also go with:

SPLIT FILE BY ID.
COMPUTE x=1.
CREATE cum = CSUM(x).

Is that to the point enough?

I must confess to an inclination towards the somewhat 'occult' aspects of SPSS:
SUM(something,nothing)=something, etc.
I hope that my 'cute' solutions inspire more RTFM and exploration by whoever might read them.
I still abhor the DO IF blah blah blah END IF pattern!!!
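The same trick has a direct pandas analogue, for readers keeping score in Python: a cumulative sum of a constant 1 within each ID group is a within-group counter. A sketch (not from the thread) with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 3]})

# SPLIT FILE BY ID + CREATE cum = CSUM(x), with x a column of ones:
# cumulative-sum the ones within each ID group.
df["cum"] = df.assign(x=1).groupby("ID")["x"].cumsum()

print(df["cum"].tolist())  # [1, 2, 3, 1, 2, 1]
```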


Re: Computing variables based on multiple rows in a tall-format file

Bruce Weaver
Administrator
And I will confess that some of David's solutions have indeed sent me to the FM!  

One more method, while we're at it.  

COMPUTE Case = $casenum.
RANK VARIABLES=Case (A) BY ID
  /RANK
  /PRINT=YES
  /TIES=MEAN.

Output (using the same data as before):

 Case    ID RCase
 
    1     1     1
    2     1     2
    3     1     3
    4     1     4
    5     1     5
    6     1     6

    7     2     1
    8     2     2
    9     2     3

   10     3     1
   11     3     2
   12     3     3
   13     3     4
   14     3     5
   15     3     6
   16     3     7

   17     4     1
   18     4     2
   19     4     3

   20     5     1
   21     5     2
   22     5     3
   23     5     4
 
Number of cases read:  23    Number of cases listed:  23
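For completeness, the RANK-based approach also maps onto pandas: rank the running case number in ascending order within each ID. A sketch (again not from the thread), using the same ID pattern as the demo data:

```python
import pandas as pd

ids = [1] * 6 + [2] * 3 + [3] * 7 + [4] * 3 + [5] * 4
df = pd.DataFrame({"ID": ids})
df["Case"] = range(1, len(df) + 1)   # analogue of $casenum

# RANK VARIABLES=Case (A) BY ID /TIES=MEAN: ascending average rank
# of Case within each ID group (no ties here, so ranks are integers).
df["RCase"] = df.groupby("ID")["Case"].rank(method="average").astype(int)

print(df.head(9))
```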


Re: Computing variables based on multiple rows in a tall-format file

David Marso
Administrator
Note that the RANK command can be stated as simply:

RANK SEQ BY ID .

(Ascending (A), RANK, PRINT=YES, and TIES=MEAN are ALL defaults.)
My tendency is to specify only the non-defaults.

Curious Bruce, which of my many 'cute' postings over the decades have driven you to the manual?

FWIW: In all of my production code there are copious comments (in a recent project of about 4000 lines, about 800-900 are comments).
I deliberately leave out any "WTF is going on" type comments in NG postings, in the hope that the consumer, after scratching his/her head for a few minutes, will make an effort to self-educate (the FM, the help system, etc.). TINSTAAFL!!!

Re: Computing variables based on multiple rows in a tall-format file

Bruce Weaver
Administrator
"Curious Bruce, which of my many 'cute' postings over the decades have driven you to the manual?"

I suppose I'm thinking back several years to a time when comp.soft-sys.stat.spss was still an active group without much SPAM (yes, that long ago!).  Back then, I didn't know much about syntax generally, and I knew next to nothing about the macro language.  (Looking at some of the syntax I wrote back then would no doubt have me doing the palm-to-forehead maneuver.)  I can't point to any specific posts right now, but I do remember that some things you (or Neila Nessa) posted were impenetrable gibberish to me at that point.  Not even STRONG COFFEE would have helped.  But I chalk that up to my relative ignorance at the time.  I don't think it is necessary (or desirable) to have every post in a forum such as this be understandable by complete novices.  If that were a requirement, intermediate and advanced users would not have much opportunity to learn anything.  I hope we continue seeing a mix of posts to this list that includes some things that make my head spin!

By the way, I don't think any of the other posters in this thread were suggesting that every post needs to be understandable by complete novices.  (Just thought I'd throw that in before someone corrects me.)  ;-)


David Marso wrote
Note that the RANK command can be stated as simply:

RANK SEQ BY ID .

(Ascending (A), RANK , PRINT=YES,TIES=MEAN are ALL  default).
My tendency is to specify only the non default .

Curious Bruce, which of my many 'cute' postings over the decades have driven you to the manual?

FWIW:  In all of my production code there are copious comments
(in a recent project of about 4000 lines about 800-900 are comments).  
I deliberately leave out any WTF is going on type comments in NG postings in the hope that the
consumer after scratching his/her head for a few minutes will make an effort to self-educate (FM, help system etc...). TINSTAAFL!!!
---
Bruce Weaver wrote
And I will confess that some of David's solutions have indeed sent me to the FM!  

One more method, while we're at it.  

COMPUTE Case = $casenum.
RANK VARIABLES=Case (A) BY ID
  /RANK
  /PRINT=YES
  /TIES=MEAN.

Output (using the same data as before):

 Case    ID RCase
 
    1     1     1
    2     1     2
    3     1     3
    4     1     4
    5     1     5
    6     1     6

    7     2     1
    8     2     2
    9     2     3

   10     3     1
   11     3     2
   12     3     3
   13     3     4
   14     3     5
   15     3     6
   16     3     7

   17     4     1
   18     4     2
   19     4     3

   20     5     1
   21     5     2
   22     5     3
   23     5     4
 
Number of cases read:  23    Number of cases listed:  23

David Marso wrote
We could also go with:

SPLIT FILE BY ID.
COMPUTE x=1.
CREATE cum = CSUM(x).

Is that to the point enough?

I must confess to  an inclination towards the somewhat  'occult' aspects of SPSS.
SUM(something,nothing)=something etc...
I hope that my 'cute' solutions inspire more RTFM and exploration by whoever might read them.
I still abhor the DO IF blah blah blah END IF pattern!!!

Richard Ristow wrote
Let me weigh in, since this began with a comment on my style.

At 06:54 AM 12/5/2013, David Marso wrote:
>I don't see why people use that ponderous DO IF $CASENUM = 1 ....
>ELSE blah blah blah (7 lines more or less) approach

I freely admit that my code is far from compact; possibly, I use more
lines of code for a solution than would any other regular poster.

Here and elsewhere, I write code to be readable, before almost any
other consideration. That's important for a list posting, when the
main purpose of the code is to *be* read; but I've settled on the
practice for production code, as well.

Practically all code is read sometime, by its author if by no one
else; and it is essential that the reader have a clear sense what the
*program* does. It distracts from this if the reader needs to pause
even a moment to make out what a *line* does, just as an obscure
sentence in English can distract from comprehension of the text.

David Marso wrote,

>a counter can be built with ONE LINE OF REASONABLY INTUITIVE CODE!!!!!
>
>COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)).

For me, describing a coding technique as "reasonably intuitive" is an
immediate flag that the reader must pause to understand it; as, in
fact, I would have, if I encountered that line in code I was reading.
(And that would very much apply even if I'd written the code. Passage
of time would leave the code, but lose what I was thinking when I wrote it.)

At 08:51 AM 12/5/2013, David Marso wrote:
>Assumptions of my one liner.
>Something is equal to something else or it isn't (true=1 false=0).
>Multiplication of X by 0 = 0, by 1 = X
>0+1 = 1, X+1 = X+1.
>Anybody having a problem with this might ponder their choice of
>careers or majors?

I'm afraid I'm not sympathetic to this. It's the "emperor's new
clothes" argument: "if you don't see it, that just proves you're
stupid." If there's any rule for clear writing, it's that you don't
get to blame the reader for not understanding you.

There's real programming satisfaction in writing solutions like
David's, what I describe as 'cute' solutions. I admire people who can
write them, like David; or like Mel, the Real Programmer
(http://www.catb.org/jargon/html/story-of-mel.html). But I've come to
resist them, for the reasons I've given.

Finally: But, don't speed and compactness matter? Compactness matters
*occasionally*, like when you're putting a spacecraft guidance system
into a rugged but very small computer. As for speed, I think
clearly-written code is rarely much slower; and if it needs speed
improvements, having the logic clear guides you to which high-use
portions need to be optimized.
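For what it's worth, the arithmetic in David's one-liner can be traced mechanically. Here is a rough Python rendering (the translation and names are mine; it leans on SPSS's documented behavior that LAG is missing on the first case and that SUM ignores missing arguments):

```python
# COMPUTE SEQ=SUM(1,LAG(SEQ)*(LAG(ID) EQ ID)) rendered in Python.
# Within an ID, (LAG(ID) EQ ID) is 1, so SEQ = previous SEQ + 1;
# at an ID change it is 0, so SEQ resets to 1. On the first case the
# lagged term is missing, and SUM(1, missing) is simply 1.
def seq_one_liner(ids):
    seqs = []
    for i, cur in enumerate(ids):
        if i == 0:
            carried = 0  # SUM drops the missing LAG term
        else:
            same = 1 if ids[i - 1] == cur else 0
            carried = seqs[i - 1] * same
        seqs.append(1 + carried)
    return seqs

print(seq_one_liner([1, 1, 2, 2, 2, 3]))  # [1, 2, 1, 2, 3, 1]
```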

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Computing variables based on multiple rows in a tall-format file

David Marso
Administrator
Here is an example of a simple problem with 5 different solutions.
I suspect any of them might be chosen by any particular coder for various reasons (experience, readability, explicitness, etc.).
I tend to avoid versions 1 and 2 and go more for versions 3-5.
Version 3 comes in particularly handy in the MATRIX language, as the conditions and outcomes can be built as arrays and the function is simply a vector operation. Sets of such can be set up as matrices.
Let's see what I can dig out of my head spinner collection for you Bruce ;-)

DATA LIST FREE / A B.
BEGIN DATA
1 1 1 2 2 1 2 2
END DATA.

DO IF A=1 AND B=1.
+  COMPUTE C=1.
ELSE IF A=1 AND B=2.
+  COMPUTE C=2.
ELSE IF A=2 AND B=1.
+  COMPUTE C=3.
ELSE IF A=2 AND B=2.
+  COMPUTE C=4.
END IF.

IF A=1 AND B=1 C1=1.
IF A=1 AND B=2 C1=2.
IF A=2 AND B=1 C1=3.
IF A=2 AND B=2 C1=4.

DO IF A=1.
+  RECODE B (1=1)(2=2) INTO C2.
ELSE IF A=2.
+  RECODE B (1=3)(2=4) INTO C2.
END IF .


COMPUTE C3=SUM((A=1 AND B=1)*1,
               (A=1 AND B=2)*2,
               (A=2 AND B=1)*3,
               (A=2 AND B=2)*4).

COMPUTE C4=SUM((A=1)*B,
               (A=2 AND B=1)*3,
               (A=2 AND B=2)*4).

COMPUTE C5=(A-1)*2 + B.

FORMATS ALL (F1.0).
LIST.

 
A B C C1 C2 C3 C4 C5
 
1 1 1  1  1  1  1  1
1 2 2  2  2  2  2  2
2 1 3  3  3  3  3  3
2 2 4  4  4  4  4  4
 
 
Number of cases read:  4    Number of cases listed:  4
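As a quick cross-check outside SPSS, the boolean-sum version (C3) and the arithmetic version (C5) can be compared in a few lines of Python (in Python, as in SPSS, a true comparison multiplies as 1 and a false one as 0; the function names are mine):

```python
# C3-style: sum of (condition)*value terms; exactly one condition is true.
def c3(a, b):
    return ((a == 1 and b == 1) * 1 +
            (a == 1 and b == 2) * 2 +
            (a == 2 and b == 1) * 3 +
            (a == 2 and b == 2) * 4)

# C5-style: positional arithmetic on the category codes.
def c5(a, b):
    return (a - 1) * 2 + b

pairs = [(1, 1), (1, 2), (2, 1), (2, 2)]
print([c3(a, b) for a, b in pairs])                 # [1, 2, 3, 4]
print(all(c3(a, b) == c5(a, b) for a, b in pairs))  # True
```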

Bruce Weaver wrote
"Curious Bruce, which of my many 'cute' postings over the decades have driven you to the manual?"

I suppose I'm thinking back several years to a time when comp.soft-sys.stat.spss was still an active group without much SPAM (yes, that long ago!).  Back then, I didn't know much about syntax generally, and I knew next to nothing about the macro language.  (Looking at some of the syntax I wrote back then would no doubt have me doing the palm-to-forehead maneuver.)  I can't point to any specific posts right now, but I do remember that some things you (or Neila Nessa) posted were impenetrable gibberish to me at that point.  Not even STRONG COFFEE would have helped.  But I chalk that up to my relative ignorance at the time.  I don't think it is necessary (or desirable) to have every post in a forum such as this be understandable by complete novices.  If that was a requirement, intermediate and advanced users would not have much opportunity to learn anything.  I hope we continue seeing a mix of posts to this list that includes some things that make my head spin!

By the way, I don't think any of the other posters in this thread were suggesting that every post needs to be understandable by complete novices.  (Just thought I'd throw that in before someone corrects me.)  ;-)


David Marso wrote
Note that the RANK command can be stated as simply:

RANK SEQ BY ID .

(Ascending (A), RANK, PRINT=YES, and TIES=MEAN are ALL defaults.)
My tendency is to specify only the non-defaults.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Computing variables based on multiple rows in a tall-format file

Bruce Weaver
Administrator
I like that C3 computation, but might line things up this way to make it a bit more readable:

COMPUTE C3=SUM(
  (A=1 AND B=1)*1,
  (A=1 AND B=2)*2,
  (A=2 AND B=1)*3,
  (A=2 AND B=2)*4).

Usually, I'd follow Art's advice and use EQ rather than =; but in this case, I think it's actually quite a bit easier to read with =. Here is the EQ version for comparison:

COMPUTE C3=SUM(
  (A EQ 1 AND B EQ 1)*1,
  (A EQ 1 AND B EQ 2)*2,
  (A EQ 2 AND B EQ 1)*3,
  (A EQ 2 AND B EQ 2)*4).


I have also used that C5 computation before.  It's not nearly as transparent, but it is very scalable.  I'd be most inclined to use it when the number of combinations of A and B is large, because the other methods result in too much syntax in that case.  E.g., if A and B both range from 1 to 5:

COMPUTE C5=(A-1)*5 + B.
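A quick way to convince yourself that the scaled formula stays one-to-one (a Python check of mine; the 5 is just the number of levels of B):

```python
# (A-1)*5 + B maps the 25 combinations of A, B in 1..5 onto 1..25
# with no collisions; it is a base-5 style positional encoding.
codes = [(a - 1) * 5 + b for a in range(1, 6) for b in range(1, 6)]
print(sorted(codes) == list(range(1, 26)))  # True
```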




Re: Computing variables based on multiple rows in a tall-format file

Art Kendall
In reply to this post by Bruce Weaver
That was not what I meant to imply. The use of 'flame shield' was intended as humor.

However, I try to guess from the question posed how much of a beginner the OP is. I also keep in mind building the archives.

The mix of approaches across several posts helps to get the message across that there can be many ways of doing the same thing.

I should, but do not always, post more than one solution. However, members of this list do often reply with a redraft of syntax.

That being said, I try to push for
-- readability, as it helps all of the people concerned.
-- considering any analysis a process that goes through drafts, like any other writing.
-- teamwork in learning and doing analysis.
-- completing metadata (variable view) and checking it with the team for readability and shared understanding before any processing.
-- review for QA and learning processes.
-- scientists sharing their data and analysis.
Art Kendall
Social Research Consultants
On 12/6/2013 7:06 PM, Bruce Weaver [via SPSSX Discussion] wrote:
By the way, I don't think any of the other posters in this thread were suggesting that every post needs to be understandable by complete novices.  (Just thought I'd throw that in before someone corrects me.)  ;-)


Re: Computing variables based on multiple rows in a tall-format file

Art Kendall
In reply to this post by Bruce Weaver
I have been using SPSS in consulting and evaluations for the Congress for many years.  I still learn from things that David, Jon, Bruce, Rich, and Andy post to this list.
Art Kendall
Social Research Consultants

Re: Computing variables based on multiple rows in a tall-format file

Art Kendall
In reply to this post by David Marso
Of course, more drafts can be produced by using EQ and by using parentheses to clarify the logic:
DO IF A EQ 1 AND B EQ 1.
DO IF (A=1 AND B=1).
DO IF (A EQ 1 AND B EQ 1).
Art Kendall
Social Research Consultants

Art Kendall
Social Research Consultants