how to calculate sum of rows


Pegah Nejat
Hi everyone,
 
Could you please tell me how I can calculate the sum of an arbitrary number of rows in SPSS?
In my study, I have 83 subjects, and for each subject I had 30 recordings, each of which occupies one row in SPSS. Now I want to be able to calculate the sum of these 30 recordings for each subject and do the rest of the statistical analyses on these new data. What should I do?
 
Thanks a lot in advance.
Pegah Nejat


Re: how to calculate sum of rows

John F Hall
File...
    New...
        Syntax
 
Then write in box...
 
compute newvar = sum.30 (v1 to v30).
 
or whatever you've called your variables. 
 
Run 
 
 
----- Original Message -----
Sent: Saturday, March 20, 2010 10:30 AM
Subject: how to calculate sum of rows



Re: how to calculate sum of rows

Art Kendall
In reply to this post by Pegah Nejat
Look up AGGREGATE in <help>.

Art Kendall
Social Research Consultants


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: how to calculate sum of rows

David Marso
Administrator
In reply to this post by Pegah Nejat
John,
Did you even bother to read the question before replying with a blatantly
incorrect answer?  Clearly AGGREGATE is the appropriate approach here, or
perhaps you misunderstand the purpose of compute?
David

On Sat, 20 Mar 2010 11:59:44 +0100, John F Hall <[hidden email]> wrote:

>compute newvar = sum.30 (v1 to v30).

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: how to calculate sum of rows

John F Hall
Yes, I did bother to read the question: several times.  When I first read it, like you, I thought it suitable for aggregate, but on more careful reading I decided he meant the sum of 30 readings within each subject.  You may well be right, so why don't you offer him your solution instead of bickering?  I've been using and teaching SPSS since 1972 so I think I know exactly what compute is for.
----- Original Message -----
Sent: Sunday, March 21, 2010 9:30 AM
Subject: Re: how to calculate sum of rows



Re: how to calculate sum of rows

Pegah Nejat
Hi again,
 
I thank both of you for your kind offer of help. Yes, I used AGGREGATE and it worked perfectly. Those 30 recordings all belonged to the same column. In other words, I have only one variable which has been measured 30 times for each subject. John's solution works for 30 variables which was not the case here. Perhaps I should have made it more clear what I meant by 30 'recordings'.
 
Best Regards,
Pegah Nejat
 

--- On Sun, 3/21/10, John F Hall <[hidden email]> wrote:

From: John F Hall <[hidden email]>
Subject: Re: how to calculate sum of rows
To: [hidden email]
Date: Sunday, March 21, 2010, 12:48 PM


Re: how to calculate sum of rows

Mike
I think that there is a pervasive problem in requests for assistance that
most people are aware of (perhaps only dimly) but do not really appreciate
how it affects (a) how they describe what their data is like and (b) the
nature of the problem that they are trying to solve.  The problem is
not being familiar with the conventions for describing the structure of
their data and how SPSS handles that structure.
 
Consider the problem that Pegah Nejat posed:
 
How to get a sum of rows?
 
First, there is the issue of what "rows" mean, both in terms of Nejat's
understanding of the data and SPSS conventions.  Correct me if I am
wrong in my points:
 
SPSS has historically assumed that its dataset is a rectangular file with
rows representing cases and columns representing measurements on those
cases.  I refer to this kind of structure as a "repeated measures" design;
in terms of experimental design, it allows the specification of within-subject
factors or independent variables.  This is the format that SPSS has traditionally
assumed for the procedures MANOVA and GLM for repeated measures
ANOVA.
 
With this in mind, if Nejat wanted a sum of 30 variables or repeated measures,
then
 
compute repmeas_sum=sum(var1 to var30).
 
would be the appropriate specification to get the sum.  John Hall appears to
be making traditional SPSS assumptions and I believe that this is why he suggested
it as a solution.
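
As a rough sketch of this wide-format case (one row per subject, the measurements in separate columns), something like the following should work. The names subj and var1 to var3 are made up for illustration, and three variables stand in for the thirty. As far as I know, SUM skips missing values, whereas SUM.n returns system-missing unless at least n of the arguments are valid, which is the practical difference between this and the sum.30 version.

```spss
* Hypothetical wide-format data: one row per subject, three measures.
DATA LIST FREE / subj var1 var2 var3.
BEGIN DATA
1 2 4 6
2 1 3 5
END DATA.

* Sum across the columns within each row.
COMPUTE repmeas_sum = SUM(var1 TO var3).
* Same sum, but system-missing unless all three values are valid.
COMPUTE repmeas_sum3 = SUM.3(var1 TO var3).
EXECUTE.
LIST.
```

Note that var1 TO var3 relies on the variables being adjacent in the file, which they are here because DATA LIST creates them in that order.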
 
Second, "rows" can be interpreted in another way, for example, a case may be
repeatedly measured but when the measurement(s) is recorded electronically, it
is specified as a "row" in a file (i.e., "rows" are repeated measures within a subject
but a hierarchical organization is being used -- it seems that one ultimately wants
"subject" as the case or unit of analysis instead of time of measurement which
SPSS interprets as a "case").  Thus, contrary to traditional SPSS assumptions,
an individual case or "subject/participant/respondent" is defined not by a
single vector of variables that identify all measurements made on the individual
case but by a hierarchy of vectors (rows) that are grouped together by a case ID and
perhaps other grouping variables.  I refer to this type of data format as a
"regression format" because in within-subject experimental design, one can
calculate within-case correlations and regression equations (aggregating across
the repeated measures, as Nejat appears to want to do, reduces the multiple
measures to a single score which might serve as a replacement of the multiple
measurements, that is, a subject is now synonymous with an SPSS case).
 
From an SPSS perspective, this is a somewhat peculiar data arrangement because
SPSS has historically assumed that each case can be represented by a single
vector of variables.  This reflects, in part, its origins for analyzing survey or
questionnaire data in which each case or respondent is thought of as providing
a vector of responses.
 
However, in experimental design terms, there is no reason why within-subject
factors cannot be represented by a specific value on a grouping variable and
a case is defined by grouping variables that are identified either as between-subject
or within-subject factors plus the dependent variables that are supposed to be
analyzed.  I believe that traditionally, SAS assumed this type of data format for
its GLM procedure which could give rise to confusion because the use of
grouping variables might suggest to the naive analyst that the grouping variables
all reflect between-subjects factors which would lead one to use a single error
term for the ANOVA Fs which would be incorrect.  For F tests involving within-subject
factors, one had to eliminate the systematic effect of subject differences and use
the interaction of subjects with the factor being tested; that is, each within-subject
F value has a specific error term in contrast to a single error term.  I was on the
dissertation committee of a student who used SAS in the early 1990s, who actually
committed this error in SAS and had to re-run all of the analyses.  I believe that
SAS later added the "repeated" keyword/command specification to make it easier
to specify the within-subject factors so that the correct error terms are used.
SPSS' MIXED procedure operates in the same way as the old SAS GLM procedure,
that is, requiring grouping variables to represent within-subject factors.
However, the current implementation of the MIXED procedure makes it very
difficult to do the appropriate repeated measures ANOVA or, in other words,
replicate the repeated measures ANOVA results that one obtains in SPSS' GLM.
 
Getting to the point, Nejat appears to be using the latter format where each
of the 83 subjects has 30 "cases" as defined by SPSS.  That is, instead of
N=83, SPSS thinks it is dealing with 83x30=2490 cases.  I assume that Nejat
was trying to reduce the N=2490 cases into N=83 subjects, which may be the
units of actual concern.  In this case, aggregate would be the appropriate procedure
to use because one wants to reduce the number of cases per subject from
30 to 1.
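
For this long format (one row per recording), a minimal AGGREGATE sketch might look like the following. The variable names subject and recording are assumptions, since the original post does not give them, and two subjects with three rows each stand in for 83 subjects with 30 rows each.

```spss
* Hypothetical long-format data: one row per recording.
DATA LIST FREE / subject recording.
BEGIN DATA
1 2
1 4
1 6
2 1
2 3
2 5
END DATA.

* Collapse to one case per subject holding the sum of its recordings.
AGGREGATE
  /OUTFILE=*
  /BREAK=subject
  /rec_sum = SUM(recording)
  /n_rec = N(recording).
LIST.
```

OUTFILE=* replaces the active dataset with the aggregated file, so six rows become two here (and 2490 would become 83). If the original rows are still needed, adding MODE=ADDVARIABLES after OUTFILE=* should append the sums to every row instead, and n_rec is only there as a check that each subject contributed the expected number of rows.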
 
The real problem is how to communicate the structure of one's data accurately
so that its structure is clear.  When I first saw Nejat's request, it seemed to me
that he was ambiguous in the description of his data.  If he was using standard
SPSS assumptions, then Nejat had 30 repeated measures or variables in a
single vector.  But Nejat is clearly asking for the sum of rows, suggesting that
repeated measures might be nested within some grouping variables, which
means that it violates traditional SPSS conventions.  I asked myself why
his data would be structured this way or if Nejat was aware of these distinctions
because what he wanted to do is relatively trivial (either use a compute statement
or aggregate) but it is not if one does not understand the structure and nature
of the data one is working with.  Now this is a bigger concern for me than
which solution to suggest.
 
-Mike Palij
New York University
 
 
 
----- Original Message -----
Sent: Sunday, March 21, 2010 8:08 AM
Subject: Re: how to calculate sum of rows


Re: how to calculate sum of rows

msherman

Mike:  Let me add one thing.  Back when SPSS was on mainframes, standard data (between-subject designs) was input using records. Thus each subject might have 10 rows of data identified by an ID variable and then a record variable which would indicate how many rows of data were connected to one ID.  The data list might look something like this.

Data list records=10/ 1   id 1-3  v1 to v30 4-33  / 2  v31 to v40 1-10   / 3   v41 to v50  1-10     / 4   etc.

 

 

 

                                                                                                                                                                             
Martin F. Sherman, Ph.D.

Professor of Psychology

Director of Masters Education: Thesis Track
Loyola College of Arts and Sciences

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Palij
Sent: Sunday, March 21, 2010 10:02 AM
To: [hidden email]
Subject: Re: how to calculate sum of rows

 


Re: how to calculate sum of rows

Mike
I actually have text files that consist of SPSS syntax and inline data
with multiple records per case, going back to my mainframe days
when I ran SPSS on Sperry Univac 1100 and various IBM systems
(from CMS to SuperWylbur).  I still create initial SPSS syntax and
inline data with multiple records per case today but with small datasets
(this provides very good documentation about the data as well as
the data itself which can then be sent to anyone who can examine
it with a text editor -- one does not need SPSS to read the data and its
documentation and one can even use SAS or other statistical programs
to read the data after the SPSS syntax has been stripped out).
 
However, multiple records per case get interpreted into a single
data vector for each case.  So, even though one might have 5 or 100
records per case, from SPSS' perspective these are only card images
which get restructured into a single case when an SPSS data file
is created.
 
Within this context, asking for a sum across "records" makes no sense
because from SPSS' perspective, the records form a single case.
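
To illustrate, here is a small sketch of two records per case being read into a single case. The variable names, column positions, and data values are all made up; it is only meant to show the idea, not anyone's actual file layout.

```spss
* Hypothetical fixed-format data with two records (lines) per case.
DATA LIST FIXED RECORDS=2
  /1 id 1-3 v1 TO v3 4-6
  /2 v4 TO v6 1-3.
BEGIN DATA
001123
456
002789
012
END DATA.

* Each pair of input lines becomes one case with id and v1 to v6.
LIST.
```

If this runs as expected, LIST should show two cases rather than four rows, so a sum over the measures would again be a COMPUTE across variables and not an AGGREGATE across cases.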
 
This is very different from the situation where a case represents a repeated measure
on a single subject or a nested/hierarchical data structure, which is
what I believe the original poster's data was.
 
-Mike Palij
New York University
 
 
----- Original Message -----
Sent: Sunday, March 21, 2010 12:04 PM
Subject: Re: how to calculate sum of rows

Mike:  Let me add one thing.  Back when SPSS was on main frames standard data (between subject designs) was inputted using records. Thus each subject might have 10 rows of data identified by an ID variable and then a record variable which would indicate how many rows of data were connected to one ID.  The data list might look something like this.

Data list records=10/ 1   id 1-3  v1 to v30 4-33  / 2  v31 to v40 1-10   / 3   v41 to v50  1-10     / 4   etc.

 

 

 

                                                                                                                                                                             
Martin F. Sherman, Ph.D.

Professor of Psychology

Director of Masters Education: Thesis Track
Loyola College of Arts and Sciences

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Palij
Sent: Sunday, March 21, 2010 10:02 AM
To: [hidden email]
Subject: Re: how to calculate sum of rows

 

I think that there is a pervasive problem in requests for assistance that

most people are aware of (perhaps only dimly) but do not really appreciate

how it affects (a) how they describe what their data is like and (b) the

nature of the problem that they are trying to solve.  The problem is

not being familiar with the conventions for describing the structure of

their data and what how SPSS handles that structure.

 

Consider the problem that Pegah Nejat posed:

 

How to get a sum of rows?

 

First, there is the issue of what do "rows" mean, both in terms of Nejat's

understanding of the data and SPSS conventions.  Correct me if I am

wrong in my points:

 

SPSS has historically assumed that its dataset is a rectangular file with

row representing cases and columns representing measurements on those

cases.  I refer to this kind of structure as a "repeated measures" design and

in terms of experimental design, allows the specification of within-subject

factors or independent variables.  This is the format the SPSS has traditionally

assumed for the procedures MANOVA and GLM for repeated measures

ANOVA.

 

With this in mind, if Nejat wanted a sum of 30 variables or repeated measures,

then

 

comptue repmeas_sum=sum(var1 to var30).

 

would be the appropriate specification to get the sum.  John Hall appears to

be making traditional SPSS assumptions and I believe that this is why he suggested

it as a solution.

 

Second, "rows" can be interpreted in another way, for example, a case may be

repeatedly measured but when the measurement(s) is recorded electronically, it

is specified a "row" in a file (i.e., "rows" are repeated measures within a subject

but a heirarchial organization is being used -- it seems that one ultimately wants

"subject" as the case or unite of analysis instead of time of measurement which

SPSS interprets as a "case").  Thus, contrary to traditional SPSS assumptions,

an individual case or "subject/participant/respondent" is defined not by a

single vector of variables that identify all measurements made on the individual

case but a hierarchy of vectors(rows) that are grouped together by a case ID and

perhaps other grouping variables.  I refer to this type of data format as a

"regression format" because in within-subject experimental design, one can

calculate within-case correlations and regression equations (aggregating across

the repeated measures, as Nejat appears to want to do,  reduces the multiple

measures to a single score which might serve as a replacement of the multiple

measurement, that is, a subject is now synonymous with an SPSS case).

 

From an SPSS perspective, this is a somewhat peculiar data arrangement because

SPSS has historically assumed that each case can be represented by a single

vector of variables.  This reflects, in part, its origins for analyzing survey or

questionnaire data in which each case or respondent is thought of as providing

a vector of responses.

 

However, in experimental design terms, there is no reason why within-subject

factors cannot be represented by a specific value on a grouping variable and

a case is defined by grouping variables that are identified either as between-subject

or within-subject factors plus the dependent variables that are supposed to be

analyzed.  I believe that traditionally, SAS assumed this type of data format for

its GLM procedure which could give rise to confusion because the use of

grouping variables might suggest to the naive analyst that the grouping variables

all reflect between-subjects factors which would lead one to use a single error

term for the ANOVA Fs which would be incorrect.  For F tests involving within-subject

factors, one had to eliminate the systematic effect of subject differences and use

the interaction of subjects with the factor being tested; that is, each within-subject

F value has a specific error term in contrast to a single error term.  I was on the

dissertation committee of a student who, in the early 1990s, actually

committed this error in SAS and had to re-run all of the analyses.  I believe that

SAS later added the "repeated" keyword/command specification to make it easier

to specify the within-subject factors so that the correct error terms are used.

SPSS' MIXED procedure operates in the same way as the old SAS GLM procedure,

that is, requiring grouping variables to represent within-subject factors.

However, the current implementation of the MIXED procedure makes it very

difficult to do the appropriate repeated measures ANOVA or, in other words,

replicate the repeated measures ANOVA results that one obtains in SPSS' GLM.

 

Getting to the point, Nejat appears to be using the latter format where each

of the 83 subjects has 30 "cases" as defined by SPSS.  That is, instead of

N=83, SPSS thinks it is dealing with 83x30=2490 cases.  I assume that Nejat

was trying to reduce the N=2490 cases to N=83 subjects, which may be the

actual concern.  In this case, AGGREGATE would be the appropriate procedure

to use because one wants to reduce the number of cases per subject from

30 to 1.
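
A minimal sketch of that reduction follows, using made-up data in which each subject has three rows rather than 30. The variable names (subject, recording, rec_sum) are assumptions for illustration and are not taken from Nejat's file:

```spss
* Hypothetical long-format data with one row per recording.
DATA LIST FREE / subject recording.
BEGIN DATA
1 4
1 5
1 6
2 7
2 8
2 9
END DATA.

* Collapse to one row per subject and replace the active dataset.
AGGREGATE
  /OUTFILE=*
  /BREAK=subject
  /rec_sum=SUM(recording).
LIST.
```

With OUTFILE=* the aggregated file replaces the active dataset, so subsequent procedures treat each subject as a single case.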

 

The real problem is how to communicate the structure of one's data accurately

so that its structure is clear.  When I first saw Nejat's request, it seemed to me

that he was ambiguous in the description of his data.  If he was using standard

SPSS assumptions, then Nejat had 30 repeated measures or variables in a

single vector.  But Nejat is clearly asking for the sum of rows, suggesting that

repeated measures might be nested within some grouping variables, which

means that it violates traditional SPSS conventions.  I asked myself why

his data would be structured this way and whether Nejat was aware of these distinctions,

because what he wanted to do is relatively trivial (either use a compute statement

or aggregate) but it is not if one does not understand the structure and nature

of the data one is working with.  Now this is a bigger concern for me than

which solution to suggest.

 

-Mike Palij

New York University

 

 

 

----- Original Message -----

Sent: Sunday, March 21, 2010 8:08 AM

Subject: Re: how to calculate sum of rows

 

Hi again,

 

I thank both of you for your kind offer of help. Yes, I used AGGREGATE and it worked perfectly. Those 30 recordings all belonged to the same column. In other words, I have only one variable which has been measured 30 times for each subject. John's solution works for 30 variables which was not the case here. Perhaps I should have made it more clear what I meant by 30 'recordings'.

 

Best Regards,

Pegah Nejat

 


--- On Sun, 3/21/10, John F Hall <[hidden email]> wrote:


From: John F Hall <[hidden email]>
Subject: Re: how to calculate sum of rows
To: [hidden email]
Date: Sunday, March 21, 2010, 12:48 PM

Yes, I did bother to read the question: several times.  When I first read it, like you, I thought it suitable for aggregate, but on more careful reading I decided he meant the sum of 30 readings within each subject.  You may well be right, so why don't you offer him your solution instead of bickering?  I've been using and teaching SPSS since 1972 so I think I know exactly what compute is for.

----- Original Message -----

From: David Marso

Sent: Sunday, March 21, 2010 9:30 AM

Subject: Re: how to calculate sum of rows

 


John,
Did you even bother to read the question before replying with a blatantly
incorrect answer?  Clearly AGGREGATE is the appropriate approach here, or
perhaps you misunderstand the purpose of compute?
David

On Sat, 20 Mar 2010 11:59:44 +0100, John F Hall <johnfhall@...> wrote:


>File...
>    New...
>        Syntax
>
>Then write in box...
>
>compute newvar = sum.30 (v1 to v30).
>
>or whatever you've called your variables.
>
>Run
>
>