Hello Problably a rather simple problem... I got 1190 individual cases along with informations
about the population (number) in the community and the belonging to a certain
region (string with 4 categories). There are different numbers of cases in each
community and region. I’d like to summarize now the population in
each of the 4 regions, but of course counting a community only once. My following syntax doesn’t work, because it takes
the sum of population for all the cases by region, not only each community
once: SUMMARIZE /TABLES=population BY region /FORMAT=NOLIST TOTAL /MISSING=VARIABLE /CELLS=COUNT SUM . Have I to aggregate with community first? Thanks for any help. Tom |
Thomas,
I picture your data as being either persons within communities within regions or just communities within regions. Not sure which but I think the first case. So then the value for community population is repeated across cases in each community. Given this, it seems you need to pick a single case in each community so that the sum of communities is the region population. If you agree, then I don't think there is a way to do this within the summarize command although I'm sure others are better with this command than I am. I'd do either of these initial steps. Either 1) sort cases by region and community and number cases within community-region groups, select (temporary, select if would be fine) the first case, and then run the summarize command. 2) Aggregate the file breaking on region and community and keep either the first, last or mean of the population variable. On balance, I'd prefer 1) over 2) because the working dataset is not altered. Gene Maguin >>I got 1190 individual cases along with informations about the population (number) in the community and the belonging to a certain region (string with 4 categories). There are different numbers of cases in each community and region. I'd like to summarize now the population in each of the 4 regions, but of course counting a community only once. My following syntax doesn't work, because it takes the sum of population for all the cases by region, not only each community once: SUMMARIZE /TABLES=population BY region /FORMAT=NOLIST TOTAL /MISSING=VARIABLE /CELLS=COUNT SUM . Have I to aggregate with community first? Thanks for any help. Tom ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Thanks, Gene
I think I'v got a rough idea about it. You're right with your assumptions, it's actually persons in communities within regions. In order to follow your first suggestion to pick up the first case in each community, I have to number the cases for each community. I've done it for all the cases with this syntax... Compute casenr= $casenum. Execute. ...which should be changed with the condition to do so just within the cases from one community (or: in all these cases, where "Community" is equal). That seems to my a sort of a loop to perform - I've never done that and no idea and honestly, I do not understand the Syntaxreference guide: "System variable $CASENUM is the sequence number of a case in the active dataset. Although it is syntactically correct to use $CASENUM on SELECT IF, it does not produce the expected results. To select a set of cases based on their sequence in a file, create your own sequence variable with the transformation language prior to making the selection (see the Examples on p. 1827)" . -----Ursprüngliche Nachricht----- Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von Gene Maguin Gesendet: Montag, 24. Januar 2011 15:35 An: [hidden email] Betreff: Re: Summation Thomas, I picture your data as being either persons within communities within regions or just communities within regions. Not sure which but I think the first case. So then the value for community population is repeated across cases in each community. Given this, it seems you need to pick a single case in each community so that the sum of communities is the region population. If you agree, then I don't think there is a way to do this within the summarize command although I'm sure others are better with this command than I am. I'd do either of these initial steps. Either 1) sort cases by region and community and number cases within community-region groups, select (temporary, select if would be fine) the first case, and then run the summarize command. 2) Aggregate the file breaking on region and community and keep either the first, last or mean of the population variable. On balance, I'd prefer 1) over 2) because the working dataset is not altered. Gene Maguin >>I got 1190 individual cases along with informations about the population (number) in the community and the belonging to a certain region (string with 4 categories). There are different numbers of cases in each community and region. I'd like to summarize now the population in each of the 4 regions, but of course counting a community only once. My following syntax doesn't work, because it takes the sum of population for all the cases by region, not only each community once: SUMMARIZE /TABLES=population BY region /FORMAT=NOLIST TOTAL /MISSING=VARIABLE /CELLS=COUNT SUM . Have I to aggregate with community first? Thanks for any help. Tom ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Ok,
To number records within groups, where community defines the group. Compute rec=1. If (community eq lag(community)) rec=lag(rec)+1. Another snippet is Do if (community ne lag(community) or $casenum eq 1). + compute rec=1. Else. + compute rec=lag(rec)+1. End if. Generally, these two snippets do the same thing but not always. The issue is the record 0 ($casenum=0). If community is a string variable, the value of lag(community) is blank. If community is numeric, the value of lag(community) is sysmis. Either way the value of lag(rec) is sysmis. Now, if community is string and it's value for $casenum=1 is blank, the first snippet falls part because sysmis+1=sysmis. If community is numeric and it's value for $casenum=1 is sysmis, the first snippet also fall apart. If you know your data well, you can use the shorter one; if not, or to be more general, used the longer one. After having gotten snagged by this problem more than once, this is the functional explanation I have. I imagine it is the code explanation also but maybe it's not. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Tuesday, January 25, 2011 7:54 AM To: [hidden email] Subject: AW: Summation Thanks, Gene I think I'v got a rough idea about it. You're right with your assumptions, it's actually persons in communities within regions. In order to follow your first suggestion to pick up the first case in each community, I have to number the cases for each community. I've done it for all the cases with this syntax... Compute casenr= $casenum. Execute. ...which should be changed with the condition to do so just within the cases from one community (or: in all these cases, where "Community" is equal). That seems to my a sort of a loop to perform - I've never done that and no idea and honestly, I do not understand the Syntaxreference guide: "System variable $CASENUM is the sequence number of a case in the active dataset. Although it is syntactically correct to use $CASENUM on SELECT IF, it does not produce the expected results. To select a set of cases based on their sequence in a file, create your own sequence variable with the transformation language prior to making the selection (see the Examples on p. 1827)" . -----Ursprüngliche Nachricht----- Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von Gene Maguin Gesendet: Montag, 24. Januar 2011 15:35 An: [hidden email] Betreff: Re: Summation Thomas, I picture your data as being either persons within communities within regions or just communities within regions. Not sure which but I think the first case. So then the value for community population is repeated across cases in each community. Given this, it seems you need to pick a single case in each community so that the sum of communities is the region population. If you agree, then I don't think there is a way to do this within the summarize command although I'm sure others are better with this command than I am. I'd do either of these initial steps. Either 1) sort cases by region and community and number cases within community-region groups, select (temporary, select if would be fine) the first case, and then run the summarize command. 2) Aggregate the file breaking on region and community and keep either the first, last or mean of the population variable. On balance, I'd prefer 1) over 2) because the working dataset is not altered. Gene Maguin >>I got 1190 individual cases along with informations about the population (number) in the community and the belonging to a certain region (string with 4 categories). There are different numbers of cases in each community and region. I'd like to summarize now the population in each of the 4 regions, but of course counting a community only once. My following syntax doesn't work, because it takes the sum of population for all the cases by region, not only each community once: SUMMARIZE /TABLES=population BY region /FORMAT=NOLIST TOTAL /MISSING=VARIABLE /CELLS=COUNT SUM . Have I to aggregate with community first? Thanks for any help. Tom ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Gene/Thomas
- this may be a simpler way to sequentially number within community; it gets round the problems with the pesky first case. data list list /community (A). begin data A A A B B B B end data. COMPUTE temp = 1. SORT CASES by community. SPLIT FILE by community. CREATE Casenum =CSUM(Temp). SPLIT FILE OFF. EXECUTE. Garry Gelade Business Analytic Ltd. ============================================================================ =================== -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin Sent: 25 January 2011 14:11 To: [hidden email] Subject: Re: Summation Ok, To number records within groups, where community defines the group. Compute rec=1. If (community eq lag(community)) rec=lag(rec)+1. Another snippet is Do if (community ne lag(community) or $casenum eq 1). + compute rec=1. Else. + compute rec=lag(rec)+1. End if. Generally, these two snippets do the same thing but not always. The issue is the record 0 ($casenum=0). If community is a string variable, the value of lag(community) is blank. If community is numeric, the value of lag(community) is sysmis. Either way the value of lag(rec) is sysmis. Now, if community is string and it's value for $casenum=1 is blank, the first snippet falls part because sysmis+1=sysmis. If community is numeric and it's value for $casenum=1 is sysmis, the first snippet also fall apart. If you know your data well, you can use the shorter one; if not, or to be more general, used the longer one. After having gotten snagged by this problem more than once, this is the functional explanation I have. I imagine it is the code explanation also but maybe it's not. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Tuesday, January 25, 2011 7:54 AM To: [hidden email] Subject: AW: Summation Thanks, Gene I think I'v got a rough idea about it. You're right with your assumptions, it's actually persons in communities within regions. In order to follow your first suggestion to pick up the first case in each community, I have to number the cases for each community. I've done it for all the cases with this syntax... Compute casenr= $casenum. Execute. ...which should be changed with the condition to do so just within the cases from one community (or: in all these cases, where "Community" is equal). That seems to my a sort of a loop to perform - I've never done that and no idea and honestly, I do not understand the Syntaxreference guide: "System variable $CASENUM is the sequence number of a case in the active dataset. Although it is syntactically correct to use $CASENUM on SELECT IF, it does not produce the expected results. To select a set of cases based on their sequence in a file, create your own sequence variable with the transformation language prior to making the selection (see the Examples on p. 1827)" . -----Ursprüngliche Nachricht----- Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von Gene Maguin Gesendet: Montag, 24. Januar 2011 15:35 An: [hidden email] Betreff: Re: Summation Thomas, I picture your data as being either persons within communities within regions or just communities within regions. Not sure which but I think the first case. So then the value for community population is repeated across cases in each community. Given this, it seems you need to pick a single case in each community so that the sum of communities is the region population. If you agree, then I don't think there is a way to do this within the summarize command although I'm sure others are better with this command than I am. I'd do either of these initial steps. Either 1) sort cases by region and community and number cases within community-region groups, select (temporary, select if would be fine) the first case, and then run the summarize command. 2) Aggregate the file breaking on region and community and keep either the first, last or mean of the population variable. On balance, I'd prefer 1) over 2) because the working dataset is not altered. Gene Maguin >>I got 1190 individual cases along with informations about the population (number) in the community and the belonging to a certain region (string with 4 categories). There are different numbers of cases in each community and region. I'd like to summarize now the population in each of the 4 regions, but of course counting a community only once. My following syntax doesn't work, because it takes the sum of population for all the cases by region, not only each community once: SUMMARIZE /TABLES=population BY region /FORMAT=NOLIST TOTAL /MISSING=VARIABLE /CELLS=COUNT SUM . Have I to aggregate with community first? Thanks for any help. Tom ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Thanks for your contributions, Garry, Glen
My final syntax works fine: COMPUTE temp = 1. SORT CASES by community. SPLIT FILE by community. CREATE CaseinCommunity =CSUM(Temp). SPLIT FILE OFF. EXECUTE. *Then the selection and summation. TEMPORARY. SELECT IF (Region EQ 1 AND CaseinCommunity EQ 1). FREQUENCIES VARIABLES=Population /FORMAT=NOTABLE /STATISTICS=SUM /ORDER=ANALYSIS. EXECUTE. I had to do this last part of the syntax 4 times for each region; I didn't found a smarter way to do it. Thomas An: [hidden email]; Balmer Thomas Cc: 'Gene Maguin' Betreff: RE: Summation Gene/Thomas - this may be a simpler way to sequentially number within community; it gets round the problems with the pesky first case. data list list /community (A). begin data A A A B B B B end data. COMPUTE temp = 1. SORT CASES by community. SPLIT FILE by community. CREATE Casenum =CSUM(Temp). SPLIT FILE OFF. EXECUTE. Garry Gelade Business Analytic Ltd. ============================================================================ =================== -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin Sent: 25 January 2011 14:11 To: [hidden email] Subject: Re: Summation Ok, To number records within groups, where community defines the group. Compute rec=1. If (community eq lag(community)) rec=lag(rec)+1. Another snippet is Do if (community ne lag(community) or $casenum eq 1). + compute rec=1. Else. + compute rec=lag(rec)+1. End if. Generally, these two snippets do the same thing but not always. The issue is the record 0 ($casenum=0). If community is a string variable, the value of lag(community) is blank. If community is numeric, the value of lag(community) is sysmis. Either way the value of lag(rec) is sysmis. Now, if community is string and it's value for $casenum=1 is blank, the first snippet falls part because sysmis+1=sysmis. If community is numeric and it's value for $casenum=1 is sysmis, the first snippet also fall apart. If you know your data well, you can use the shorter one; if not, or to be more general, used the longer one. After having gotten snagged by this problem more than once, this is the functional explanation I have. I imagine it is the code explanation also but maybe it's not. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Tuesday, January 25, 2011 7:54 AM To: [hidden email] Subject: AW: Summation Thanks, Gene I think I'v got a rough idea about it. You're right with your assumptions, it's actually persons in communities within regions. In order to follow your first suggestion to pick up the first case in each community, I have to number the cases for each community. I've done it for all the cases with this syntax... Compute casenr= $casenum. Execute. ...which should be changed with the condition to do so just within the cases from one community (or: in all these cases, where "Community" is equal). That seems to my a sort of a loop to perform - I've never done that and no idea and honestly, I do not understand the Syntaxreference guide: "System variable $CASENUM is the sequence number of a case in the active dataset. Although it is syntactically correct to use $CASENUM on SELECT IF, it does not produce the expected results. To select a set of cases based on their sequence in a file, create your own sequence variable with the transformation language prior to making the selection (see the Examples on p. 1827)" . -----Ursprüngliche Nachricht----- Von: SPSSX(r) Discussion [mailto:[hidden email]] Im Auftrag von Gene Maguin Gesendet: Montag, 24. Januar 2011 15:35 An: [hidden email] Betreff: Re: Summation Thomas, I picture your data as being either persons within communities within regions or just communities within regions. Not sure which but I think the first case. So then the value for community population is repeated across cases in each community. Given this, it seems you need to pick a single case in each community so that the sum of communities is the region population. If you agree, then I don't think there is a way to do this within the summarize command although I'm sure others are better with this command than I am. I'd do either of these initial steps. Either 1) sort cases by region and community and number cases within community-region groups, select (temporary, select if would be fine) the first case, and then run the summarize command. 2) Aggregate the file breaking on region and community and keep either the first, last or mean of the population variable. On balance, I'd prefer 1) over 2) because the working dataset is not altered. Gene Maguin >>I got 1190 individual cases along with informations about the population (number) in the community and the belonging to a certain region (string with 4 categories). There are different numbers of cases in each community and region. I'd like to summarize now the population in each of the 4 regions, but of course counting a community only once. My following syntax doesn't work, because it takes the sum of population for all the cases by region, not only each community once: SUMMARIZE /TABLES=population BY region /FORMAT=NOLIST TOTAL /MISSING=VARIABLE /CELLS=COUNT SUM . Have I to aggregate with community first? Thanks for any help. Tom ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |