SPSSX Discussion

Summation

Classic

List

Threaded

6 messages Options

Tom

Summation

Hello

Problably a rather simple problem...

I got 1190 individual cases along with informations about the population (number) in the community and the belonging to a certain region (string with 4 categories). There are different numbers of cases in each community and region.

I’d like to summarize now the population in each of the 4 regions, but of course counting a community only once.

My following syntax doesn’t work, because it takes the sum of population for all the cases by region, not only each community once:

SUMMARIZE

/TABLES=population BY region

/FORMAT=NOLIST TOTAL

/MISSING=VARIABLE

/CELLS=COUNT SUM .

Have I to aggregate with community first?

Thanks for any help.

Tom

Maguin, Eugene

Re: Summation

Thomas,

I picture your data as being either persons within communities within
regions or just communities within regions. Not sure which but I think the
first case. So then the value for community population is repeated across
cases in each community. Given this, it seems you need to pick a single case
in each community so that the sum of communities is the region population.
If you agree, then I don't think there is a way to do this within the
summarize command although I'm sure others are better with this command than
I am. I'd do either of these initial steps. Either

1) sort cases by region and community and number cases within
community-region groups, select (temporary, select if would be fine) the
first case, and then run the summarize command.

2) Aggregate the file breaking on region and community and keep either the
first, last or mean of the population variable.

On balance, I'd prefer 1) over 2) because the working dataset is not
altered.

Gene Maguin

>>I got 1190 individual cases along with informations about the population
(number) in the community and the belonging to a certain region (string with
4 categories). There are different numbers of cases in each community and
region.

I'd like to summarize now the population in each of the 4 regions, but of
course counting a community only once.

My following syntax doesn't work, because it takes the sum of population for
all the cases by region, not only each community once:

SUMMARIZE
/TABLES=population BY region
/FORMAT=NOLIST TOTAL
/MISSING=VARIABLE
/CELLS=COUNT SUM .

Have I to aggregate with community first?

Thanks for any help.

Tom

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Tom

AW: Summation

Thanks, Gene

I think I'v got a rough idea about it. You're right with your assumptions, it's actually persons in communities within regions.

In order to follow your first suggestion to pick up the first case in each community, I have to number the cases for each community. I've done it for all the cases with this syntax...

Compute casenr= $casenum.
Execute.

...which should be changed with the condition to do so just within the cases from one community (or: in all these cases, where "Community" is equal).
That seems to my a sort of a loop to perform - I've never done that and no idea and honestly, I do not understand the Syntaxreference guide:
"System variable $CASENUM is the sequence number of a case in the active dataset. Although it is syntactically correct to use $CASENUM on SELECT IF, it does not produce the expected results. To select a set of cases based on their sequence in a file, create your own sequence variable with the transformation language prior to making the selection (see the Examples on
p. 1827)" .

-----Ursprüngliche Nachricht-----
Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von Gene Maguin
Gesendet: Montag, 24. Januar 2011 15:35
An: [hidden email]
Betreff: Re: Summation

Thomas,

I picture your data as being either persons within communities within
regions or just communities within regions. Not sure which but I think the
first case. So then the value for community population is repeated across
cases in each community. Given this, it seems you need to pick a single case
in each community so that the sum of communities is the region population.
If you agree, then I don't think there is a way to do this within the
summarize command although I'm sure others are better with this command than
I am. I'd do either of these initial steps. Either

1) sort cases by region and community and number cases within
community-region groups, select (temporary, select if would be fine) the
first case, and then run the summarize command.

2) Aggregate the file breaking on region and community and keep either the
first, last or mean of the population variable.

On balance, I'd prefer 1) over 2) because the working dataset is not
altered.

Gene Maguin

>>I got 1190 individual cases along with informations about the population
(number) in the community and the belonging to a certain region (string with
4 categories). There are different numbers of cases in each community and
region.

I'd like to summarize now the population in each of the 4 regions, but of
course counting a community only once.

My following syntax doesn't work, because it takes the sum of population for
all the cases by region, not only each community once:

SUMMARIZE
/TABLES=population BY region
/FORMAT=NOLIST TOTAL
/MISSING=VARIABLE
/CELLS=COUNT SUM .

Have I to aggregate with community first?

Thanks for any help.

Tom

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Summation

Ok,

To number records within groups, where community defines the group.

Compute rec=1.
If (community eq lag(community)) rec=lag(rec)+1.

Another snippet is

Do if (community ne lag(community) or $casenum eq 1).
+ compute rec=1.
Else.
+ compute rec=lag(rec)+1.
End if.

Generally, these two snippets do the same thing but not always. The issue is
the record 0 ($casenum=0). If community is a string variable, the value of
lag(community) is blank. If community is numeric, the value of
lag(community) is sysmis. Either way the value of lag(rec) is sysmis. Now,
if community is string and it's value for $casenum=1 is blank, the first
snippet falls part because sysmis+1=sysmis. If community is numeric and it's
value for $casenum=1 is sysmis, the first snippet also fall apart. If you
know your data well, you can use the shorter one; if not, or to be more
general, used the longer one. After having gotten snagged by this problem
more than once, this is the functional explanation I have. I imagine it is
the code explanation also but maybe it's not.

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Tuesday, January 25, 2011 7:54 AM
To: [hidden email]
Subject: AW: Summation

Thanks, Gene

I think I'v got a rough idea about it. You're right with your assumptions,
it's actually persons in communities within regions.

In order to follow your first suggestion to pick up the first case in each
community, I have to number the cases for each community. I've done it for
all the cases with this syntax...

Compute casenr= $casenum.
Execute.

...which should be changed with the condition to do so just within the cases
from one community (or: in all these cases, where "Community" is equal).
That seems to my a sort of a loop to perform - I've never done that and no
idea and honestly, I do not understand the Syntaxreference guide:
"System variable $CASENUM is the sequence number of a case in the active
dataset. Although it is syntactically correct to use $CASENUM on SELECT IF,
it does not produce the expected results. To select a set of cases based on
their sequence in a file, create your own sequence variable with the
transformation language prior to making the selection (see the Examples on
p. 1827)" .

-----Ursprüngliche Nachricht-----
Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von
Gene Maguin
Gesendet: Montag, 24. Januar 2011 15:35
An: [hidden email]
Betreff: Re: Summation

Thomas,

I picture your data as being either persons within communities within
regions or just communities within regions. Not sure which but I think the
first case. So then the value for community population is repeated across
cases in each community. Given this, it seems you need to pick a single case
in each community so that the sum of communities is the region population.
If you agree, then I don't think there is a way to do this within the
summarize command although I'm sure others are better with this command than
I am. I'd do either of these initial steps. Either

1) sort cases by region and community and number cases within
community-region groups, select (temporary, select if would be fine) the
first case, and then run the summarize command.

2) Aggregate the file breaking on region and community and keep either the
first, last or mean of the population variable.

On balance, I'd prefer 1) over 2) because the working dataset is not
altered.

Gene Maguin

>>I got 1190 individual cases along with informations about the population
(number) in the community and the belonging to a certain region (string with
4 categories). There are different numbers of cases in each community and
region.

I'd like to summarize now the population in each of the 4 regions, but of
course counting a community only once.

My following syntax doesn't work, because it takes the sum of population for
all the cases by region, not only each community once:

SUMMARIZE
/TABLES=population BY region
/FORMAT=NOLIST TOTAL
/MISSING=VARIABLE
/CELLS=COUNT SUM .

Have I to aggregate with community first?

Thanks for any help.

Tom

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Garry Gelade

Re: Summation

Gene/Thomas

- this may be a simpler way to sequentially number within community; it
gets round the problems with the pesky first case.

data list list /community (A).
begin data
A
A
A
B
B
B
B
end data.
COMPUTE temp = 1.
SORT CASES by community.
SPLIT FILE by community.
CREATE Casenum =CSUM(Temp).
SPLIT FILE OFF.
EXECUTE.

Garry Gelade
Business Analytic Ltd.
============================================================================
===================
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Gene Maguin
Sent: 25 January 2011 14:11
To: [hidden email]
Subject: Re: Summation

Ok,

To number records within groups, where community defines the group.

Compute rec=1.
If (community eq lag(community)) rec=lag(rec)+1.

Another snippet is

Do if (community ne lag(community) or $casenum eq 1).
+ compute rec=1.
Else.
+ compute rec=lag(rec)+1.
End if.

Generally, these two snippets do the same thing but not always. The issue is
the record 0 ($casenum=0). If community is a string variable, the value of
lag(community) is blank. If community is numeric, the value of
lag(community) is sysmis. Either way the value of lag(rec) is sysmis. Now,
if community is string and it's value for $casenum=1 is blank, the first
snippet falls part because sysmis+1=sysmis. If community is numeric and it's
value for $casenum=1 is sysmis, the first snippet also fall apart. If you
know your data well, you can use the shorter one; if not, or to be more
general, used the longer one. After having gotten snagged by this problem
more than once, this is the functional explanation I have. I imagine it is
the code explanation also but maybe it's not.

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Tuesday, January 25, 2011 7:54 AM
To: [hidden email]
Subject: AW: Summation

Thanks, Gene

I think I'v got a rough idea about it. You're right with your assumptions,
it's actually persons in communities within regions.

In order to follow your first suggestion to pick up the first case in each
community, I have to number the cases for each community. I've done it for
all the cases with this syntax...

Compute casenr= $casenum.
Execute.

...which should be changed with the condition to do so just within the cases
from one community (or: in all these cases, where "Community" is equal).
That seems to my a sort of a loop to perform - I've never done that and no
idea and honestly, I do not understand the Syntaxreference guide:
"System variable $CASENUM is the sequence number of a case in the active
dataset. Although it is syntactically correct to use $CASENUM on SELECT IF,
it does not produce the expected results. To select a set of cases based on
their sequence in a file, create your own sequence variable with the
transformation language prior to making the selection (see the Examples on
p. 1827)" .

-----Ursprüngliche Nachricht-----
Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von
Gene Maguin
Gesendet: Montag, 24. Januar 2011 15:35
An: [hidden email]
Betreff: Re: Summation

Thomas,

I picture your data as being either persons within communities within
regions or just communities within regions. Not sure which but I think the
first case. So then the value for community population is repeated across
cases in each community. Given this, it seems you need to pick a single case
in each community so that the sum of communities is the region population.
If you agree, then I don't think there is a way to do this within the
summarize command although I'm sure others are better with this command than
I am. I'd do either of these initial steps. Either

1) sort cases by region and community and number cases within
community-region groups, select (temporary, select if would be fine) the
first case, and then run the summarize command.

2) Aggregate the file breaking on region and community and keep either the
first, last or mean of the population variable.

On balance, I'd prefer 1) over 2) because the working dataset is not
altered.

Gene Maguin

>>I got 1190 individual cases along with informations about the population
(number) in the community and the belonging to a certain region (string with
4 categories). There are different numbers of cases in each community and
region.

I'd like to summarize now the population in each of the 4 regions, but of
course counting a community only once.

My following syntax doesn't work, because it takes the sum of population for
all the cases by region, not only each community once:

SUMMARIZE
/TABLES=population BY region
/FORMAT=NOLIST TOTAL
/MISSING=VARIABLE
/CELLS=COUNT SUM .

Have I to aggregate with community first?

Thanks for any help.

Tom

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Tom

AW: Summation

Thanks for your contributions, Garry, Glen

My final syntax works fine:

COMPUTE temp = 1.
SORT CASES by community.
SPLIT FILE by community.
CREATE CaseinCommunity =CSUM(Temp).
SPLIT FILE OFF.
EXECUTE.

*Then the selection and summation.

TEMPORARY.
SELECT IF (Region EQ 1 AND CaseinCommunity EQ 1).
FREQUENCIES VARIABLES=Population
/FORMAT=NOTABLE
/STATISTICS=SUM
/ORDER=ANALYSIS.
EXECUTE.

I had to do this last part of the syntax 4 times for each region; I didn't found a smarter way to do it.

Thomas

An: [hidden email]; Balmer Thomas
Cc: 'Gene Maguin'
Betreff: RE: Summation

Gene/Thomas

- this may be a simpler way to sequentially number within community; it
gets round the problems with the pesky first case.

data list list /community (A).
begin data
A
A
A
B
B
B
B
end data.
COMPUTE temp = 1.
SORT CASES by community.
SPLIT FILE by community.
CREATE Casenum =CSUM(Temp).
SPLIT FILE OFF.
EXECUTE.

Garry Gelade
Business Analytic Ltd.
============================================================================
===================
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Gene Maguin
Sent: 25 January 2011 14:11
To: [hidden email]
Subject: Re: Summation

Ok,

To number records within groups, where community defines the group.

Compute rec=1.
If (community eq lag(community)) rec=lag(rec)+1.

Another snippet is

Do if (community ne lag(community) or $casenum eq 1).
+ compute rec=1.
Else.
+ compute rec=lag(rec)+1.
End if.

Generally, these two snippets do the same thing but not always. The issue is
the record 0 ($casenum=0). If community is a string variable, the value of
lag(community) is blank. If community is numeric, the value of
lag(community) is sysmis. Either way the value of lag(rec) is sysmis. Now,
if community is string and it's value for $casenum=1 is blank, the first
snippet falls part because sysmis+1=sysmis. If community is numeric and it's
value for $casenum=1 is sysmis, the first snippet also fall apart. If you
know your data well, you can use the shorter one; if not, or to be more
general, used the longer one. After having gotten snagged by this problem
more than once, this is the functional explanation I have. I imagine it is
the code explanation also but maybe it's not.

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Tuesday, January 25, 2011 7:54 AM
To: [hidden email]
Subject: AW: Summation

Thanks, Gene

I think I'v got a rough idea about it. You're right with your assumptions,
it's actually persons in communities within regions.

In order to follow your first suggestion to pick up the first case in each
community, I have to number the cases for each community. I've done it for
all the cases with this syntax...

Compute casenr= $casenum.
Execute.

...which should be changed with the condition to do so just within the cases
from one community (or: in all these cases, where "Community" is equal).
That seems to my a sort of a loop to perform - I've never done that and no
idea and honestly, I do not understand the Syntaxreference guide:
"System variable $CASENUM is the sequence number of a case in the active
dataset. Although it is syntactically correct to use $CASENUM on SELECT IF,
it does not produce the expected results. To select a set of cases based on
their sequence in a file, create your own sequence variable with the
transformation language prior to making the selection (see the Examples on
p. 1827)" .

-----Ursprüngliche Nachricht-----
Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von
Gene Maguin
Gesendet: Montag, 24. Januar 2011 15:35
An: [hidden email]
Betreff: Re: Summation

Thomas,

I picture your data as being either persons within communities within
regions or just communities within regions. Not sure which but I think the
first case. So then the value for community population is repeated across
cases in each community. Given this, it seems you need to pick a single case
in each community so that the sum of communities is the region population.
If you agree, then I don't think there is a way to do this within the
summarize command although I'm sure others are better with this command than
I am. I'd do either of these initial steps. Either

1) sort cases by region and community and number cases within
community-region groups, select (temporary, select if would be fine) the
first case, and then run the summarize command.

2) Aggregate the file breaking on region and community and keep either the
first, last or mean of the population variable.

On balance, I'd prefer 1) over 2) because the working dataset is not
altered.

Gene Maguin

>>I got 1190 individual cases along with informations about the population
(number) in the community and the belonging to a certain region (string with
4 categories). There are different numbers of cases in each community and
region.

I'd like to summarize now the population in each of the 4 regions, but of
course counting a community only once.

My following syntax doesn't work, because it takes the sum of population for
all the cases by region, not only each community once:

SUMMARIZE
/TABLES=population BY region
/FORMAT=NOLIST TOTAL
/MISSING=VARIABLE
/CELLS=COUNT SUM .

Have I to aggregate with community first?

Thanks for any help.

Tom

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD