|
Hi there!
I've got a file with informations about communities (like socio-economic variables) and the 'postcode'. In order to be able to match with another file (individual datas) with the key variable 'postcode' I've got to keep just one case per value in the variable 'postcode' in the community-file. Before I'm dropping the other double cases (which are already found and have the value 0 in the variable 'PrimaryFirst') I want to summarize the variable 'population' of all the cases with the same postcode. The sum of the population should be the new value in the variable 'population' in the cases (with doublettes) I want to keep (and which have the value 1 in the variable 'PrimaryFirst'). My problem: I don't know how to get this sum of just the cases with the same postcode. I guess that's something with the Command DO IF which I have to do? But what would be the logical expression? I appreciate any hints! Thank you very much Thomas ====================To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Thomas,
Let me first say that I'm not sure I fully understand the background information to your problem. >>My problem: I don't know how to get this sum of just the cases with the same postcode. Taking just the statement of your problem, I'd like to suggest several methods. 1) sort cases by postcode, split the file by postcode, and run descriptives with 'sum' being the only statistic requested. 2) use the aggregrate command with the break variable being postcode and use the sum function to request the sum of a variable over the break set. I don't know that either of these methods will help you but both directly address your problem statement. Gene Maguin ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Tom
Hi Thomas,
I think you are asking how to first summarize population by postcode, and then append the summarized population data to your file of individual records by matching on 'postcode' -- correct? Try the AGGREGATE command; it is used to summarize variables based on one or more grouping variables (called BREAK variables). I've created a file of individual records (File) and a population file (File2) aggregated by 'postcode'. I'm then matching File1 with File2 using MATCH FILES; the aggregated data is a TABLE file (one record per postcode) versus a FILE, which can have multiple records per postcode. Note that File1 must be sorted on postcode before matching process, otherwise an error will result. DATASET CLOSE ALL. DATA LIST FREE /id population postcode. BEGIN DATA 101,7125,67865 102,9981,77654 103,10211,98998 104,8988,67865 105,6711,67865 106,5674,98998 107,3126,77654 108,7801,98998 109,1211,77654 110,9099,98998 END DATA. * Sort on break variable for MATCH FILES. SORT CASES BY postcode. * Saves individual records. DATASET NAME File1 . * Sums 'postcode' into 'sumpop'. AGGREGATE OUTFILE=* / BREAK postcode / sumpop=SUM(population). * Saves the summarized population data by postcode. DATASET NAME File2 . * Appends summarized data to individual records. MATCH FILES FILE=File1 / TABLE=File2 / BY postcode. * Forces data to be read. EXECUTE. Regards, Bob Walker Surveys & Forecasts, LLC -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Friday, March 07, 2008 6:06 AM To: [hidden email] Subject: Double Cases and Matching Hi there! I've got a file with informations about communities (like socio-economic variables) and the 'postcode'. In order to be able to match with another file (individual datas) with the key variable 'postcode' I've got to keep just one case per value in the variable 'postcode' in the community-file. Before I'm dropping the other double cases (which are already found and have the value 0 in the variable 'PrimaryFirst') I want to summarize the variable 'population' of all the cases with the same postcode. The sum of the population should be the new value in the variable 'population' in the cases (with doublettes) I want to keep (and which have the value 1 in the variable 'PrimaryFirst'). My problem: I don't know how to get this sum of just the cases with the same postcode. I guess that's something with the Command DO IF which I have to do? But what would be the logical expression? I appreciate any hints! Thank you very much Thomas ======= To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Hi Bob
Thank you very much for your answer! I made some remarks/ questions in your answer and added more Data because it didn't work, yet. -----Ursprüngliche Nachricht----- Von: Bob Walker [mailto:[hidden email]] Gesendet: Freitag, 7. März 2008 15:56 An: [hidden email] Cc: Balmer Thomas Betreff: RE: Double Cases and Matching I think you are asking how to first summarize population by postcode, and then append the summarized population data to your file of individual records by matching on 'postcode' -- correct? Yes, but I'd like to append also other variables to my file of individuals from my community file (e.g. typologie). Try the AGGREGATE command; it is used to summarize variables based on one or more grouping variables (called BREAK variables). I've created a file of individual records (File) and a population file (File2) aggregated by 'postcode'. To summarize works fine with the AGGREGATE command. OUTFILE seems to be the command to create a new file with these (new) aggregated variables. But I'd like to add 'sumpop' in my existing community file, which I then want to match as you described. But just leaving out OUTFILE doesn't work. DATASET CLOSE ALL. * My community file. DATA LIST FREE /id population postcode. BEGIN DATA 101,7125,67865 102,9981,77654 103,10211,98998 104,8988,67865 105,6711,67865 106,5674,98998 107,3126,77654 108,7801,98998 109,1211,77654 110,9099,98998 111,9000,12345 112,10000,33227 113,18763,22222 114,388,44444 END DATA. *My individual file with the above postcodes and more. 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 * Sort on break variable for MATCH FILES (both files). SORT CASES BY postcode. * Saves individual records. DATASET NAME File1 . * Saves community file too. DATASET NAME File2 . *But now?? * Sums 'postcode' into 'sumpop'. AGGREGATE OUTFILE=* / BREAK postcode / sumpop=SUM(population). * Saves the summarized population data by postcode. DATASET NAME File2 . * Appends summarized data to individual records. MATCH FILES FILE=File1 / TABLE=File2 / BY postcode. * Forces data to be read. EXECUTE. Regards, Bob Walker Surveys & Forecasts, LLC -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Friday, March 07, 2008 6:06 AM To: [hidden email] Subject: Double Cases and Matching Hi there! I've got a file with informations about communities (like socio-economic variables) and the 'postcode'. In order to be able to match with another file (individual datas) with the key variable 'postcode' I've got to keep just one case per value in the variable 'postcode' in the community-file. Before I'm dropping the other double cases (which are already found and have the value 0 in the variable 'PrimaryFirst') I want to summarize the variable 'population' of all the cases with the same postcode. The sum of the population should be the new value in the variable 'population' in the cases (with doublettes) I want to keep (and which have the value 1 in the variable 'PrimaryFirst'). My problem: I don't know how to get this sum of just the cases with the same postcode. I guess that's something with the Command DO IF which I have to do? But what would be the logical expression? I appreciate any hints! Thank you very much Thomas ======= To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Hi Thomas,
I hope that I am understanding you correctly. Let's assume that your community file has an age variable along with a population figure. And I'm assuming that in your community file, records with same postcode can have different ages and populations -- but this may not be true. My example works in either case. DATASET CLOSE ALL. * Community file, now with an age variable added. DATA LIST FREE /id age population postcode. BEGIN DATA 101,22,7125,67865 102,35,9981,77654 103,32,10211,98998 104,45,8988,67865 105,33,6711,67865 106,27,5674,98998 107,65,3126,77654 108,27,7801,98998 109,21,1211,77654 110,27,9099,98998 111,32,9000,12345 112,44,10000,33227 113,44,18763,22222 114,53,388,44444 END DATA. DATASET NAME Community. * Creates File1 with 'sumpop', the sum of 'population', for each 'postcode'. AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population). DATASET NAME File1. * Alternatively, creates File2 with 'sumpop' and mean age 'avgage' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population) /avgage=MEAN(age). DATASET NAME File2. * Creates your individual file (File3). DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * First option: appends 'sumpop' only. MATCH FILES /FILE=File3 / TABLE=File1 / BY postcode. DATASET NAME Result1. EXECUTE. * Second option: append 'sumpop' and 'avgage'. MATCH FILES /FILE=File3 / TABLE=File2 / BY postcode. DATASET NAME Result2. EXECUTE. Note that in both Result1 and Result2, you'll see records with no summary data, because they don't exist in your individual file. Regards, Bob Walker Surveys & Forecasts, LLC -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Friday, March 07, 2008 10:53 AM To: [hidden email] Subject: AW: Double Cases and Matching Hi Bob Thank you very much for your answer! I made some remarks/ questions in your answer and added more Data because it didn't work, yet. -----Ursprüngliche Nachricht----- Von: Bob Walker [mailto:[hidden email]] Gesendet: Freitag, 7. März 2008 15:56 An: [hidden email] Cc: Balmer Thomas Betreff: RE: Double Cases and Matching I think you are asking how to first summarize population by postcode, and then append the summarized population data to your file of individual records by matching on 'postcode' -- correct? Yes, but I'd like to append also other variables to my file of individuals from my community file (e.g. typologie). Try the AGGREGATE command; it is used to summarize variables based on one or more grouping variables (called BREAK variables). I've created a file of individual records (File) and a population file (File2) aggregated by 'postcode'. To summarize works fine with the AGGREGATE command. OUTFILE seems to be the command to create a new file with these (new) aggregated variables. But I'd like to add 'sumpop' in my existing community file, which I then want to match as you described. But just leaving out OUTFILE doesn't work. DATASET CLOSE ALL. * My community file. DATA LIST FREE /id population postcode. BEGIN DATA 101,7125,67865 102,9981,77654 103,10211,98998 104,8988,67865 105,6711,67865 106,5674,98998 107,3126,77654 108,7801,98998 109,1211,77654 110,9099,98998 111,9000,12345 112,10000,33227 113,18763,22222 114,388,44444 END DATA. *My individual file with the above postcodes and more. 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 * Sort on break variable for MATCH FILES (both files). SORT CASES BY postcode. * Saves individual records. DATASET NAME File1 . * Saves community file too. DATASET NAME File2 . *But now?? * Sums 'postcode' into 'sumpop'. AGGREGATE OUTFILE=* / BREAK postcode / sumpop=SUM(population). * Saves the summarized population data by postcode. DATASET NAME File2 . * Appends summarized data to individual records. MATCH FILES FILE=File1 / TABLE=File2 / BY postcode. * Forces data to be read. EXECUTE. Regards, Bob Walker Surveys & Forecasts, LLC -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Friday, March 07, 2008 6:06 AM To: [hidden email] Subject: Double Cases and Matching Hi there! I've got a file with informations about communities (like socio-economic variables) and the 'postcode'. In order to be able to match with another file (individual datas) with the key variable 'postcode' I've got to keep just one case per value in the variable 'postcode' in the community-file. Before I'm dropping the other double cases (which are already found and have the value 0 in the variable 'PrimaryFirst') I want to summarize the variable 'population' of all the cases with the same postcode. The sum of the population should be the new value in the variable 'population' in the cases (with doublettes) I want to keep (and which have the value 1 in the variable 'PrimaryFirst'). My problem: I don't know how to get this sum of just the cases with the same postcode. I guess that's something with the Command DO IF which I have to do? But what would be the logical expression? I appreciate any hints! Thank you very much Thomas ======= To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Hi Bob
Thanks again. It's not exactly what I need to do (see below). DATASET CLOSE ALL. * Community file, now with an age variable added. DATA LIST FREE /cid age population postcode. BEGIN DATA 101,22,7125,67865 102,35,9981,77654 103,32,10211,98998 104,45,8988,67865 105,33,6711,67865 106,27,5674,98998 107,65,3126,77654 108,27,7801,98998 109,21,1211,77654 110,27,9099,98998 111,32,9000,12345 112,44,10000,33227 113,44,18763,22222 114,53,388,44444 END DATA. DATASET NAME Community. * Creates File1 with 'sumpop', the sum of 'population', for each 'postcode'. AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population). DATASET NAME File1. * Alternatively, creates File2 with 'sumpop' and mean age 'avgage' for each 'postcode'. **Thats not exactly what I want to do, I'd like to have the 'sumpop' for each 'postcode' (as you describe above) and then want to copy 'age' only from this case within those with the same postcode with the highest population, say for instance age 32 from those cases with postcode 98998. **So there should be a step choosing those cases within the community file, which have the same postcode and within those copy the value of 'age' of this community/case with the highest population with 'sumpop' in File1. How can I do this? * Creates your individual file (File3). DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * First option: appends 'sumpop' only. **Additionaly I want to append the age of the community with the highest population MATCH FILES /FILE=File3 / TABLE=File1 / BY postcode. DATASET NAME Result1. EXECUTE. Note that in both Result1 and Result2, you'll see records with no summary data, because they don't exist in your individual file. Thats ok. REgards Thomas ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Hi Thomas,
Just AGGREGATE by postcode in two passes, then append both of these files to your respondent file. For maximum age, use the MAX function... DATASET CLOSE ALL. * Community file, now with an age variable. DATA LIST FREE /cid age population postcode. BEGIN DATA 101,22,7125,67865 102,35,9981,77654 103,32,10211,98998 104,45,8988,67865 105,33,6711,67865 106,27,5674,98998 107,65,3126,77654 108,27,7801,98998 109,21,1211,77654 110,27,9099,98998 111,32,9000,12345 112,44,10000,33227 113,44,18763,22222 114,53,388,44444 END DATA. DATASET NAME Community. * Creates 'sumpop' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population). DATASET NAME File1. * Finds maximum age 'maxage' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age). DATASET NAME File2. *Individual file with postcodes. DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * Append sumpop and maxage. MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode. DATASET NAME Result WINDOW=FRONT. EXECUTE. HTH, Bob Walker Surveys & Forecasts, LLC -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Sunday, March 09, 2008 10:41 AM To: [hidden email] Subject: AW: Double Cases and Matching Hi Bob Thanks again. It's not exactly what I need to do (see below). DATASET CLOSE ALL. * Community file, now with an age variable added. DATA LIST FREE /cid age population postcode. BEGIN DATA 101,22,7125,67865 102,35,9981,77654 103,32,10211,98998 104,45,8988,67865 105,33,6711,67865 106,27,5674,98998 107,65,3126,77654 108,27,7801,98998 109,21,1211,77654 110,27,9099,98998 111,32,9000,12345 112,44,10000,33227 113,44,18763,22222 114,53,388,44444 END DATA. DATASET NAME Community. * Creates File1 with 'sumpop', the sum of 'population', for each 'postcode'. AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population). DATASET NAME File1. * Alternatively, creates File2 with 'sumpop' and mean age 'avgage' for each 'postcode'. **Thats not exactly what I want to do, I'd like to have the 'sumpop' for each 'postcode' (as you describe above) and then want to copy 'age' only from this case within those with the same postcode with the highest population, say for instance age 32 from those cases with postcode 98998. **So there should be a step choosing those cases within the community file, which have the same postcode and within those copy the value of 'age' of this community/case with the highest population with 'sumpop' in File1. How can I do this? * Creates your individual file (File3). DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * First option: appends 'sumpop' only. **Additionaly I want to append the age of the community with the highest population MATCH FILES /FILE=File3 / TABLE=File1 / BY postcode. DATASET NAME Result1. EXECUTE. Note that in both Result1 and Result2, you'll see records with no summary data, because they don't exist in your individual file. Thats ok. REgards Thomas ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Tom
Hi Bob
Works fine so far, thanks a lot! I thought to use Select if or something like that, your solution is very elegant. Now, I have other variables in the community file (like 'type', actually 8 other variables), which should be matched at the end with the file of my respondents. But, it should be those values of these other variables from the cases with the max age. How can I take the values of type from those cases with the max age in the result file? As I understood the command DOCUMENT, I tried it in order to keep type in file 1 - but it doesn't work - oh my bad english... Or DATASET CLOSE ALL. * Community file, now with an age variable. DATA LIST FREE /cid age population postcode type. BEGIN DATA 101,22,7125,67865,4 102,35,9981,77654,3 103,32,10211,98998,2 104,45,8988,67865,4 105,33,6711,67865,4 106,27,5674,98998,2 107,65,3126,77654,3 108,27,7801,98998,2 109,21,1211,77654,3 110,27,9099,98998,2 111,32,9000,12345,1 112,44,10000,33227,2 113,44,18763,22222,4 114,53,388,44444,1 END DATA. DATASET NAME Community. * Creates 'sumpop' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /DOCUMENT/BREAK postcode /sumpop=SUM(population). DATASET NAME File1. * Finds maximum age 'maxage' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age). DATASET NAME File2. *Individual file with postcodes. DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * Append sumpop and maxage. MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode. DATASET NAME Result WINDOW=FRONT. EXECUTE. SET CLOSE ALL. ====================To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Hi Thomas,
Sorry for the silence - been a busy day! So now you're saying you want to append not only the total population and maximum age, but also the variables of the record with the maximum age, say variables 'type1', 'type2', etc. My only concern is: will there only be ONE maximum age for a postcode? If so, your table file may not be accurate. Anyway, here's one approach (hope you agree, Richard): DATASET CLOSE ALL. * Community file, now with an age variable. DATA LIST FREE /cid age population postcode type1 type2 type3. BEGIN DATA 101,22,7125,67865,4,2,2 102,35,9981,77654,3,1,3 103,32,10211,98998,2,5,1 104,45,8988,67865,4,3,2 105,33,6711,67865,4,7,1 106,27,5674,98998,2,4,2 107,65,3126,77654,3,6,3 108,42,7801,98998,2,8,3 109,21,1211,77654,3,3,1 110,65,9099,98998,2,4,2 111,32,9000,12345,1,7,1 112,44,10000,33227,2,3,2 113,44,18763,22222,4,1,3 114,53,388,44444,1,1,1 END DATA. SORT CASES BY postcode age. DATASET NAME Community. * Calculate 'sumpop', 'maxage' per 'postcode'. AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population) /age=MAX(age). COMPUTE FLAG=1. EXECUTE. DATASET NAME File1. * Match the summary measures to the community file. DATASET ACTIVATE Community. MATCH FILES /FILE=Community /TABLE=File1 /BY postcode age. SELECT IF FLAG=1. MATCH FILES /FILE=* RENAME (age=maxage)/ KEEP postcode maxage sumpop type1 type2 type3. SORT CASES BY postcode maxage. DATASET NAME File2. *Individual file with the above postcodes and more. DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * Append sumpop, maxage, and type1-type3 variables. MATCH FILES /FILE=File3 / TABLE=File2 / BY postcode. EXECUTE. DATASET NAME Result WINDOW=FRONT. Regards, Bob Walker Surveys & Forecasts, LLC ________________________________________ From: Balmer Thomas [mailto:[hidden email]] Sent: Monday, March 10, 2008 1:33 PM To: SPSSX(r) Discussion Cc: Bob Walker Subject: RE: Double Cases and Matching Hi Bob Works fine so far, thanks a lot! I thought to use Select if or something like that, your solution is very elegant. Now, I have other variables in the community file (like 'type', actually 8 other variables), which should be matched at the end with the file of my respondents. But, it should be those values of these other variables from the cases with the max age. How can I take the values of type from those cases with the max age in the result file? As I understood the command DOCUMENT, I tried it in order to keep type in file 1 - but it doesn't work - oh my bad english... Or DATASET CLOSE ALL. * Community file, now with an age variable. DATA LIST FREE /cid age population postcode type. BEGIN DATA 101,22,7125,67865,4 102,35,9981,77654,3 103,32,10211,98998,2 104,45,8988,67865,4 105,33,6711,67865,4 106,27,5674,98998,2 107,65,3126,77654,3 108,27,7801,98998,2 109,21,1211,77654,3 110,27,9099,98998,2 111,32,9000,12345,1 112,44,10000,33227,2 113,44,18763,22222,4 114,53,388,44444,1 END DATA. DATASET NAME Community. * Creates 'sumpop' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /DOCUMENT/BREAK postcode /sumpop=SUM(population). DATASET NAME File1. * Finds maximum age 'maxage' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age). DATASET NAME File2. *Individual file with postcodes. DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * Append sumpop and maxage. MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode. DATASET NAME Result WINDOW=FRONT. EXECUTE. SET CLOSE ALL. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Hi Bob
Thank you very much, it's exactly what i needed! Just one question in order to understand the syntax: The FLAG Variable is a sort of Dummy to these cases in the community file with the max population, isn't it? Regards Thomas -----Ursprüngliche Nachricht----- Von: Bob Walker [mailto:[hidden email]] Gesendet: Mittwoch, 12. März 2008 04:09 An: 'SPSSX(r) Discussion' Cc: Balmer Thomas Betreff: RE: Double Cases and Matching Hi Thomas, Sorry for the silence - been a busy day! So now you're saying you want to append not only the total population and maximum age, but also the variables of the record with the maximum age, say variables 'type1', 'type2', etc. My only concern is: will there only be ONE maximum age for a postcode? If so, your table file may not be accurate. Anyway, here's one approach (hope you agree, Richard): DATASET CLOSE ALL. * Community file, now with an age variable. DATA LIST FREE /cid age population postcode type1 type2 type3. BEGIN DATA 101,22,7125,67865,4,2,2 102,35,9981,77654,3,1,3 103,32,10211,98998,2,5,1 104,45,8988,67865,4,3,2 105,33,6711,67865,4,7,1 106,27,5674,98998,2,4,2 107,65,3126,77654,3,6,3 108,42,7801,98998,2,8,3 109,21,1211,77654,3,3,1 110,65,9099,98998,2,4,2 111,32,9000,12345,1,7,1 112,44,10000,33227,2,3,2 113,44,18763,22222,4,1,3 114,53,388,44444,1,1,1 END DATA. SORT CASES BY postcode age. DATASET NAME Community. * Calculate 'sumpop', 'maxage' per 'postcode'. AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population) /age=MAX(age). COMPUTE FLAG=1. EXECUTE. DATASET NAME File1. * Match the summary measures to the community file. DATASET ACTIVATE Community. MATCH FILES /FILE=Community /TABLE=File1 /BY postcode age. SELECT IF FLAG=1. MATCH FILES /FILE=* RENAME (age=maxage)/ KEEP postcode maxage sumpop type1 type2 type3. SORT CASES BY postcode maxage. DATASET NAME File2. *Individual file with the above postcodes and more. DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * Append sumpop, maxage, and type1-type3 variables. MATCH FILES /FILE=File3 / TABLE=File2 / BY postcode. EXECUTE. DATASET NAME Result WINDOW=FRONT. Regards, Bob Walker Surveys & Forecasts, LLC ________________________________________ From: Balmer Thomas [mailto:[hidden email]] Sent: Monday, March 10, 2008 1:33 PM To: SPSSX(r) Discussion Cc: Bob Walker Subject: RE: Double Cases and Matching Hi Bob Works fine so far, thanks a lot! I thought to use Select if or something like that, your solution is very elegant. Now, I have other variables in the community file (like 'type', actually 8 other variables), which should be matched at the end with the file of my respondents. But, it should be those values of these other variables from the cases with the max age. How can I take the values of type from those cases with the max age in the result file? As I understood the command DOCUMENT, I tried it in order to keep type in file 1 - but it doesn't work - oh my bad english... Or DATASET CLOSE ALL. * Community file, now with an age variable. DATA LIST FREE /cid age population postcode type. BEGIN DATA 101,22,7125,67865,4 102,35,9981,77654,3 103,32,10211,98998,2 104,45,8988,67865,4 105,33,6711,67865,4 106,27,5674,98998,2 107,65,3126,77654,3 108,27,7801,98998,2 109,21,1211,77654,3 110,27,9099,98998,2 111,32,9000,12345,1 112,44,10000,33227,2 113,44,18763,22222,4 114,53,388,44444,1 END DATA. DATASET NAME Community. * Creates 'sumpop' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /DOCUMENT/BREAK postcode /sumpop=SUM(population). DATASET NAME File1. * Finds maximum age 'maxage' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age). DATASET NAME File2. *Individual file with postcodes. DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * Append sumpop and maxage. MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode. DATASET NAME Result WINDOW=FRONT. EXECUTE. SET CLOSE ALL. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Hi Thomas,
Yes, FLAG identifies just those cases in your community file with the maximum age. We can then append the entire record to the individual file (File3). I meant to ask in my earlier post: I'm assuming that it's OK to have more than one maxage record with the same postcode in the Community file? The AGGREGATE command will not care, but I didnt know if it was an issue for you. Regards, Bob Walker Surveys & Forecasts, LLC -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Balmer Thomas Sent: Wednesday, March 12, 2008 7:05 AM To: [hidden email] Subject: AW: Double Cases and Matching Hi Bob Thank you very much, it's exactly what i needed! Just one question in order to understand the syntax: The FLAG Variable is a sort of Dummy to these cases in the community file with the max population, isn't it? Regards Thomas -----Ursprüngliche Nachricht----- Von: Bob Walker [mailto:[hidden email]] Gesendet: Mittwoch, 12. März 2008 04:09 An: 'SPSSX(r) Discussion' Cc: Balmer Thomas Betreff: RE: Double Cases and Matching Hi Thomas, Sorry for the silence - been a busy day! So now you're saying you want to append not only the total population and maximum age, but also the variables of the record with the maximum age, say variables 'type1', 'type2', etc. My only concern is: will there only be ONE maximum age for a postcode? If so, your table file may not be accurate. Anyway, here's one approach (hope you agree, Richard): DATASET CLOSE ALL. * Community file, now with an age variable. DATA LIST FREE /cid age population postcode type1 type2 type3. BEGIN DATA 101,22,7125,67865,4,2,2 102,35,9981,77654,3,1,3 103,32,10211,98998,2,5,1 104,45,8988,67865,4,3,2 105,33,6711,67865,4,7,1 106,27,5674,98998,2,4,2 107,65,3126,77654,3,6,3 108,42,7801,98998,2,8,3 109,21,1211,77654,3,3,1 110,65,9099,98998,2,4,2 111,32,9000,12345,1,7,1 112,44,10000,33227,2,3,2 113,44,18763,22222,4,1,3 114,53,388,44444,1,1,1 END DATA. SORT CASES BY postcode age. DATASET NAME Community. * Calculate 'sumpop', 'maxage' per 'postcode'. AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population) /age=MAX(age). COMPUTE FLAG=1. EXECUTE. DATASET NAME File1. * Match the summary measures to the community file. DATASET ACTIVATE Community. MATCH FILES /FILE=Community /TABLE=File1 /BY postcode age. SELECT IF FLAG=1. MATCH FILES /FILE=* RENAME (age=maxage)/ KEEP postcode maxage sumpop type1 type2 type3. SORT CASES BY postcode maxage. DATASET NAME File2. *Individual file with the above postcodes and more. DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * Append sumpop, maxage, and type1-type3 variables. MATCH FILES /FILE=File3 / TABLE=File2 / BY postcode. EXECUTE. DATASET NAME Result WINDOW=FRONT. Regards, Bob Walker Surveys & Forecasts, LLC ________________________________________ From: Balmer Thomas [mailto:[hidden email]] Sent: Monday, March 10, 2008 1:33 PM To: SPSSX(r) Discussion Cc: Bob Walker Subject: RE: Double Cases and Matching Hi Bob Works fine so far, thanks a lot! I thought to use Select if or something like that, your solution is very elegant. Now, I have other variables in the community file (like 'type', actually 8 other variables), which should be matched at the end with the file of my respondents. But, it should be those values of these other variables from the cases with the max age. How can I take the values of type from those cases with the max age in the result file? As I understood the command DOCUMENT, I tried it in order to keep type in file 1 - but it doesn't work - oh my bad english... Or DATASET CLOSE ALL. * Community file, now with an age variable. DATA LIST FREE /cid age population postcode type. BEGIN DATA 101,22,7125,67865,4 102,35,9981,77654,3 103,32,10211,98998,2 104,45,8988,67865,4 105,33,6711,67865,4 106,27,5674,98998,2 107,65,3126,77654,3 108,27,7801,98998,2 109,21,1211,77654,3 110,27,9099,98998,2 111,32,9000,12345,1 112,44,10000,33227,2 113,44,18763,22222,4 114,53,388,44444,1 END DATA. DATASET NAME Community. * Creates 'sumpop' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /DOCUMENT/BREAK postcode /sumpop=SUM(population). DATASET NAME File1. * Finds maximum age 'maxage' for each 'postcode'. DATASET ACTIVATE Community. AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age). DATASET NAME File2. *Individual file with postcodes. DATA LIST FREE /id var1 var2 postcode. BEGIN DATA 1110, 1, 3, 67865 1111, 3, 3, 77654 1112, 2, 4, 98998 1113, 1, 3, 67865 1114, 5, 5, 67865 1115, 3, 2, 98998 1116, 4, 3, 77654 1117, 2, 4, 98998 1118, 3, 1, 77654 1119, 2, 1, 12345 1120, 4, 4, 33227 1121, 2, 4, 22222 1122, 4, 3, 12122 1123, 2, 2, 44444 1124, 3, 1, 34344 1125, 2, 2, 11112 1126, 5, 2, 23232 1127, 5, 4, 43211 1128, 5, 4, 98998 END DATA. SORT CASES BY postcode. DATASET NAME File3. * Append sumpop and maxage. MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode. DATASET NAME Result WINDOW=FRONT. EXECUTE. SET CLOSE ALL. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
