SPSSX Discussion

Double Cases and Matching

Classic

List

Threaded

11 messages Options

Tom

Double Cases and Matching

Hi there!

I've got a file with informations about communities (like socio-economic
variables) and the 'postcode'.

In order to be able to match with another file (individual datas) with
the key variable 'postcode' I've got to keep just one case per value in
the variable 'postcode' in the community-file. Before I'm dropping the
other double cases (which are already found and have the value 0 in the
variable 'PrimaryFirst') I want to summarize the variable 'population'
of all the cases with the same postcode. The sum of the population
should be the new value in the variable 'population' in the cases (with
doublettes) I want to keep (and which have the value 1 in the variable
'PrimaryFirst').

My problem: I don't know how to get this sum of just the cases with the
same postcode.

I guess that's something with the Command DO IF which I have to do? But
what would be the logical expression?

I appreciate any hints!

Thank you very much

Thomas

====================To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Double Cases and Matching

Thomas,

Let me first say that I'm not sure I fully understand the background
information to your problem.

>>My problem: I don't know how to get this sum of just the cases with the
same postcode.

Taking just the statement of your problem, I'd like to suggest several
methods.

1) sort cases by postcode, split the file by postcode, and run descriptives
with 'sum' being the only statistic requested.

2) use the aggregrate command with the break variable being postcode and use
the sum function to request the sum of a variable over the break set.

I don't know that either of these methods will help you but both directly
address your problem statement.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bob Walker-2

Re: Double Cases and Matching

In reply to this post by Tom

Hi Thomas,

I think you are asking how to first summarize population by postcode, and
then append the summarized population data to your file of individual
records by matching on 'postcode' -- correct?

Try the AGGREGATE command; it is used to summarize variables based on one or
more grouping variables (called BREAK variables). I've created a file of
individual records (File) and a population file (File2) aggregated by
'postcode'. I'm then matching File1 with File2 using MATCH FILES; the
aggregated data is a TABLE file (one record per postcode) versus a FILE,
which can have multiple records per postcode. Note that File1 must be sorted
on postcode before matching process, otherwise an error will result.

DATASET CLOSE ALL.
DATA LIST FREE /id population postcode.
BEGIN DATA
101,7125,67865
102,9981,77654
103,10211,98998
104,8988,67865
105,6711,67865
106,5674,98998
107,3126,77654
108,7801,98998
109,1211,77654
110,9099,98998
END DATA.

* Sort on break variable for MATCH FILES.
SORT CASES BY postcode.
* Saves individual records.
DATASET NAME File1 .
* Sums 'postcode' into 'sumpop'.
AGGREGATE OUTFILE=* / BREAK postcode / sumpop=SUM(population).
* Saves the summarized population data by postcode.
DATASET NAME File2 .
* Appends summarized data to individual records.
MATCH FILES FILE=File1 / TABLE=File2 / BY postcode.
* Forces data to be read.
EXECUTE.

Regards,

Bob Walker
Surveys & Forecasts, LLC

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Friday, March 07, 2008 6:06 AM
To: [hidden email]
Subject: Double Cases and Matching

Hi there!

I've got a file with informations about communities (like socio-economic
variables) and the 'postcode'.

In order to be able to match with another file (individual datas) with
the key variable 'postcode' I've got to keep just one case per value in
the variable 'postcode' in the community-file. Before I'm dropping the
other double cases (which are already found and have the value 0 in the
variable 'PrimaryFirst') I want to summarize the variable 'population'
of all the cases with the same postcode. The sum of the population
should be the new value in the variable 'population' in the cases (with
doublettes) I want to keep (and which have the value 1 in the variable
'PrimaryFirst').

My problem: I don't know how to get this sum of just the cases with the
same postcode.

I guess that's something with the Command DO IF which I have to do? But
what would be the logical expression?

I appreciate any hints!

Thank you very much

Thomas

=======
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Tom

AW: Double Cases and Matching

Hi Bob

Thank you very much for your answer! I made some remarks/ questions in your answer and added more Data because it didn't work, yet.

-----Ursprüngliche Nachricht-----
Von: Bob Walker [mailto:[hidden email]]
Gesendet: Freitag, 7. März 2008 15:56
An: [hidden email]
Cc: Balmer Thomas
Betreff: RE: Double Cases and Matching

I think you are asking how to first summarize population by postcode, and
then append the summarized population data to your file of individual
records by matching on 'postcode' -- correct?

Yes, but I'd like to append also other variables to my file of individuals from my community file (e.g. typologie).

Try the AGGREGATE command; it is used to summarize variables based on one or
more grouping variables (called BREAK variables). I've created a file of
individual records (File) and a population file (File2) aggregated by
'postcode'.

To summarize works fine with the AGGREGATE command. OUTFILE seems to be the command to create a new file with these (new) aggregated variables. But I'd like to add 'sumpop' in my existing community file, which I then want to match as you described. But just leaving out OUTFILE doesn't work.

DATASET CLOSE ALL.
* My community file.
DATA LIST FREE /id population postcode.
BEGIN DATA
101,7125,67865
102,9981,77654
103,10211,98998
104,8988,67865
105,6711,67865
106,5674,98998
107,3126,77654
108,7801,98998
109,1211,77654
110,9099,98998
111,9000,12345
112,10000,33227
113,18763,22222
114,388,44444
END DATA.

*My individual file with the above postcodes and more.
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998

* Sort on break variable for MATCH FILES (both files).
SORT CASES BY postcode.
* Saves individual records.
DATASET NAME File1 .
* Saves community file too.
DATASET NAME File2 .

*But now??
* Sums 'postcode' into 'sumpop'.
AGGREGATE OUTFILE=* / BREAK postcode / sumpop=SUM(population).
* Saves the summarized population data by postcode.
DATASET NAME File2 .
* Appends summarized data to individual records.
MATCH FILES FILE=File1 / TABLE=File2 / BY postcode.
* Forces data to be read.
EXECUTE.

Regards,

Bob Walker
Surveys & Forecasts, LLC

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Friday, March 07, 2008 6:06 AM
To: [hidden email]
Subject: Double Cases and Matching

Hi there!

I've got a file with informations about communities (like socio-economic
variables) and the 'postcode'.

In order to be able to match with another file (individual datas) with
the key variable 'postcode' I've got to keep just one case per value in
the variable 'postcode' in the community-file. Before I'm dropping the
other double cases (which are already found and have the value 0 in the
variable 'PrimaryFirst') I want to summarize the variable 'population'
of all the cases with the same postcode. The sum of the population
should be the new value in the variable 'population' in the cases (with
doublettes) I want to keep (and which have the value 1 in the variable
'PrimaryFirst').

My problem: I don't know how to get this sum of just the cases with the
same postcode.

I guess that's something with the Command DO IF which I have to do? But
what would be the logical expression?

I appreciate any hints!

Thank you very much

Thomas

=======
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bob Walker-2

Re: Double Cases and Matching

Hi Thomas,

I hope that I am understanding you correctly. Let's assume that your
community file has an age variable along with a population figure. And I'm
assuming that in your community file, records with same postcode can have
different ages and populations -- but this may not be true. My example works
in either case.

DATASET CLOSE ALL.

* Community file, now with an age variable added.
DATA LIST FREE /id age population postcode.
BEGIN DATA
101,22,7125,67865
102,35,9981,77654
103,32,10211,98998
104,45,8988,67865
105,33,6711,67865
106,27,5674,98998
107,65,3126,77654
108,27,7801,98998
109,21,1211,77654
110,27,9099,98998
111,32,9000,12345
112,44,10000,33227
113,44,18763,22222
114,53,388,44444
END DATA.
DATASET NAME Community.

* Creates File1 with 'sumpop', the sum of 'population', for each 'postcode'.
AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population).
DATASET NAME File1.

* Alternatively, creates File2 with 'sumpop' and mean age 'avgage' for each
'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population)
/avgage=MEAN(age).
DATASET NAME File2.

* Creates your individual file (File3).
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* First option: appends 'sumpop' only.
MATCH FILES /FILE=File3 / TABLE=File1 / BY postcode.
DATASET NAME Result1.
EXECUTE.

* Second option: append 'sumpop' and 'avgage'.
MATCH FILES /FILE=File3 / TABLE=File2 / BY postcode.
DATASET NAME Result2.
EXECUTE.

Note that in both Result1 and Result2, you'll see records with no summary
data, because they don't exist in your individual file.

Regards,

Bob Walker
Surveys & Forecasts, LLC

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Friday, March 07, 2008 10:53 AM
To: [hidden email]
Subject: AW: Double Cases and Matching

Hi Bob

Thank you very much for your answer! I made some remarks/ questions in your
answer and added more Data because it didn't work, yet.

-----Ursprüngliche Nachricht-----
Von: Bob Walker [mailto:[hidden email]]
Gesendet: Freitag, 7. März 2008 15:56
An: [hidden email]
Cc: Balmer Thomas
Betreff: RE: Double Cases and Matching

I think you are asking how to first summarize population by postcode, and
then append the summarized population data to your file of individual
records by matching on 'postcode' -- correct?

Yes, but I'd like to append also other variables to my file of
individuals from my community file (e.g. typologie).

Try the AGGREGATE command; it is used to summarize variables based on one or
more grouping variables (called BREAK variables). I've created a file of
individual records (File) and a population file (File2) aggregated by
'postcode'.

To summarize works fine with the AGGREGATE command. OUTFILE seems to
be the command to create a new file with these (new) aggregated variables.
But I'd like to add 'sumpop' in my existing community file, which I then
want to match as you described. But just leaving out OUTFILE doesn't work.

DATASET CLOSE ALL.
* My community file.
DATA LIST FREE /id population postcode.
BEGIN DATA
101,7125,67865
102,9981,77654
103,10211,98998
104,8988,67865
105,6711,67865
106,5674,98998
107,3126,77654
108,7801,98998
109,1211,77654
110,9099,98998
111,9000,12345
112,10000,33227
113,18763,22222
114,388,44444
END DATA.

*My individual file with the above postcodes and more.
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998

* Sort on break variable for MATCH FILES (both files).
SORT CASES BY postcode.
* Saves individual records.
DATASET NAME File1 .
* Saves community file too.
DATASET NAME File2 .

*But now??
* Sums 'postcode' into 'sumpop'.
AGGREGATE OUTFILE=* / BREAK postcode / sumpop=SUM(population).
* Saves the summarized population data by postcode.
DATASET NAME File2 .
* Appends summarized data to individual records.
MATCH FILES FILE=File1 / TABLE=File2 / BY postcode.
* Forces data to be read.
EXECUTE.

Regards,

Bob Walker
Surveys & Forecasts, LLC

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Friday, March 07, 2008 6:06 AM
To: [hidden email]
Subject: Double Cases and Matching

Hi there!

I've got a file with informations about communities (like socio-economic
variables) and the 'postcode'.

In order to be able to match with another file (individual datas) with
the key variable 'postcode' I've got to keep just one case per value in
the variable 'postcode' in the community-file. Before I'm dropping the
other double cases (which are already found and have the value 0 in the
variable 'PrimaryFirst') I want to summarize the variable 'population'
of all the cases with the same postcode. The sum of the population
should be the new value in the variable 'population' in the cases (with
doublettes) I want to keep (and which have the value 1 in the variable
'PrimaryFirst').

My problem: I don't know how to get this sum of just the cases with the
same postcode.

I guess that's something with the Command DO IF which I have to do? But
what would be the logical expression?

I appreciate any hints!

Thank you very much

Thomas

=======
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Tom

AW: Double Cases and Matching

Hi Bob

Thanks again.

It's not exactly what I need to do (see below).

DATASET CLOSE ALL.

* Community file, now with an age variable added.
DATA LIST FREE /cid age population postcode.
BEGIN DATA
101,22,7125,67865
102,35,9981,77654
103,32,10211,98998
104,45,8988,67865
105,33,6711,67865
106,27,5674,98998
107,65,3126,77654
108,27,7801,98998
109,21,1211,77654
110,27,9099,98998
111,32,9000,12345
112,44,10000,33227
113,44,18763,22222
114,53,388,44444
END DATA.
DATASET NAME Community.

* Creates File1 with 'sumpop', the sum of 'population', for each
'postcode'.
AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population).
DATASET NAME File1.

* Alternatively, creates File2 with 'sumpop' and mean age 'avgage' for
each 'postcode'.
**Thats not exactly what I want to do, I'd like to have the 'sumpop' for
each 'postcode' (as you describe above) and then want to copy 'age' only
from this case within those with the same postcode with the highest
population, say for instance age 32 from those cases with postcode
98998.
**So there should be a step choosing those cases within the community
file, which have the same postcode and within those copy the value of
'age' of this community/case with the highest population with 'sumpop'
in File1. How can I do this?

* Creates your individual file (File3).
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* First option: appends 'sumpop' only.
**Additionaly I want to append the age of the community with the highest
population
MATCH FILES /FILE=File3 / TABLE=File1 / BY postcode.
DATASET NAME Result1.
EXECUTE.

Note that in both Result1 and Result2, you'll see records with no
summary data, because they don't exist in your individual file.

Thats ok.

REgards
Thomas

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bob Walker-2

Re: Double Cases and Matching

Hi Thomas,

Just AGGREGATE by postcode in two passes, then append both of these files to
your respondent file. For maximum age, use the MAX function...

DATASET CLOSE ALL.

* Community file, now with an age variable.
DATA LIST FREE /cid age population postcode.
BEGIN DATA
101,22,7125,67865
102,35,9981,77654
103,32,10211,98998
104,45,8988,67865
105,33,6711,67865
106,27,5674,98998
107,65,3126,77654
108,27,7801,98998
109,21,1211,77654
110,27,9099,98998
111,32,9000,12345
112,44,10000,33227
113,44,18763,22222
114,53,388,44444
END DATA.
DATASET NAME Community.

* Creates 'sumpop' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population).
DATASET NAME File1.

* Finds maximum age 'maxage' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age).
DATASET NAME File2.

*Individual file with postcodes.
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* Append sumpop and maxage.
MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode.
DATASET NAME Result WINDOW=FRONT.
EXECUTE.

HTH,

Bob Walker
Surveys & Forecasts, LLC

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Sunday, March 09, 2008 10:41 AM
To: [hidden email]
Subject: AW: Double Cases and Matching

Hi Bob

Thanks again.

It's not exactly what I need to do (see below).

DATASET CLOSE ALL.

* Community file, now with an age variable added.
DATA LIST FREE /cid age population postcode.
BEGIN DATA
101,22,7125,67865
102,35,9981,77654
103,32,10211,98998
104,45,8988,67865
105,33,6711,67865
106,27,5674,98998
107,65,3126,77654
108,27,7801,98998
109,21,1211,77654
110,27,9099,98998
111,32,9000,12345
112,44,10000,33227
113,44,18763,22222
114,53,388,44444
END DATA.
DATASET NAME Community.

* Creates File1 with 'sumpop', the sum of 'population', for each
'postcode'.
AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population).
DATASET NAME File1.

* Alternatively, creates File2 with 'sumpop' and mean age 'avgage' for
each 'postcode'.
**Thats not exactly what I want to do, I'd like to have the 'sumpop' for
each 'postcode' (as you describe above) and then want to copy 'age' only
from this case within those with the same postcode with the highest
population, say for instance age 32 from those cases with postcode
98998.
**So there should be a step choosing those cases within the community
file, which have the same postcode and within those copy the value of
'age' of this community/case with the highest population with 'sumpop'
in File1. How can I do this?

* Creates your individual file (File3).
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* First option: appends 'sumpop' only.
**Additionaly I want to append the age of the community with the highest
population
MATCH FILES /FILE=File3 / TABLE=File1 / BY postcode.
DATASET NAME Result1.
EXECUTE.

Note that in both Result1 and Result2, you'll see records with no
summary data, because they don't exist in your individual file.

Thats ok.

REgards
Thomas

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Tom

Re: Double Cases and Matching

In reply to this post by Tom

Hi Bob

Works fine so far, thanks a lot! I thought to use Select if or something like that, your solution is very elegant.

Now, I have other variables in the community file (like 'type', actually 8 other variables), which should be matched at the end with the file of my respondents. But, it should be those values of these other variables from the cases with the max age.

How can I take the values of type from those cases with the max age in the result file?
As I understood the command DOCUMENT, I tried it in order to keep type in file 1 - but it doesn't work - oh my bad english...
Or

DATASET CLOSE ALL.

* Community file, now with an age variable.
DATA LIST FREE /cid age population postcode type.
BEGIN DATA
101,22,7125,67865,4
102,35,9981,77654,3
103,32,10211,98998,2
104,45,8988,67865,4
105,33,6711,67865,4
106,27,5674,98998,2
107,65,3126,77654,3
108,27,7801,98998,2
109,21,1211,77654,3
110,27,9099,98998,2
111,32,9000,12345,1
112,44,10000,33227,2
113,44,18763,22222,4
114,53,388,44444,1
END DATA.
DATASET NAME Community.

* Creates 'sumpop' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /DOCUMENT/BREAK postcode /sumpop=SUM(population).
DATASET NAME File1.

* Finds maximum age 'maxage' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age).
DATASET NAME File2.

*Individual file with postcodes.
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* Append sumpop and maxage.
MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode.
DATASET NAME Result WINDOW=FRONT.
EXECUTE.
SET CLOSE ALL.

====================To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bob Walker-2

Re: Double Cases and Matching

Hi Thomas,

Sorry for the silence - been a busy day! So now you're saying you want to
append not only the total population and maximum age, but also the variables
of the record with the maximum age, say variables 'type1', 'type2', etc. My
only concern is: will there only be ONE maximum age for a postcode? If so,
your table file may not be accurate. Anyway, here's one approach (hope you
agree, Richard):

DATASET CLOSE ALL.

* Community file, now with an age variable.
DATA LIST FREE /cid age population postcode type1 type2 type3.
BEGIN DATA
101,22,7125,67865,4,2,2
102,35,9981,77654,3,1,3
103,32,10211,98998,2,5,1
104,45,8988,67865,4,3,2
105,33,6711,67865,4,7,1
106,27,5674,98998,2,4,2
107,65,3126,77654,3,6,3
108,42,7801,98998,2,8,3
109,21,1211,77654,3,3,1
110,65,9099,98998,2,4,2
111,32,9000,12345,1,7,1
112,44,10000,33227,2,3,2
113,44,18763,22222,4,1,3
114,53,388,44444,1,1,1
END DATA.
SORT CASES BY postcode age.
DATASET NAME Community.

* Calculate 'sumpop', 'maxage' per 'postcode'.
AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population) /age=MAX(age).
COMPUTE FLAG=1.
EXECUTE.
DATASET NAME File1.

* Match the summary measures to the community file.
DATASET ACTIVATE Community.
MATCH FILES /FILE=Community /TABLE=File1 /BY postcode age.
SELECT IF FLAG=1.
MATCH FILES /FILE=* RENAME (age=maxage)/ KEEP postcode maxage sumpop type1
type2 type3.
SORT CASES BY postcode maxage.
DATASET NAME File2.

*Individual file with the above postcodes and more.
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* Append sumpop, maxage, and type1-type3 variables.
MATCH FILES /FILE=File3 / TABLE=File2 / BY postcode.
EXECUTE.
DATASET NAME Result WINDOW=FRONT.

Regards,

Bob Walker
Surveys & Forecasts, LLC

________________________________________
From: Balmer Thomas [mailto:[hidden email]]
Sent: Monday, March 10, 2008 1:33 PM
To: SPSSX(r) Discussion
Cc: Bob Walker
Subject: RE: Double Cases and Matching

Hi Bob

Works fine so far, thanks a lot! I thought to use Select if or something
like that, your solution is very elegant.

Now, I have other variables in the community file (like 'type', actually 8
other variables), which should be matched at the end with the file of my
respondents. But, it should be those values of these other variables from
the cases with the max age.

How can I take the values of type from those cases with the max age in the
result file?
As I understood the command DOCUMENT, I tried it in order to keep type in
file 1 - but it doesn't work - oh my bad english...
Or

DATASET CLOSE ALL.

* Community file, now with an age variable.
DATA LIST FREE /cid age population postcode type.
BEGIN DATA
101,22,7125,67865,4
102,35,9981,77654,3
103,32,10211,98998,2
104,45,8988,67865,4
105,33,6711,67865,4
106,27,5674,98998,2
107,65,3126,77654,3
108,27,7801,98998,2
109,21,1211,77654,3
110,27,9099,98998,2
111,32,9000,12345,1
112,44,10000,33227,2
113,44,18763,22222,4
114,53,388,44444,1
END DATA.
DATASET NAME Community.

* Creates 'sumpop' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /DOCUMENT/BREAK postcode /sumpop=SUM(population).
DATASET NAME File1.

* Finds maximum age 'maxage' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age).
DATASET NAME File2.

*Individual file with postcodes.
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* Append sumpop and maxage.
MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode.
DATASET NAME Result WINDOW=FRONT.
EXECUTE.
SET CLOSE ALL.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Tom

AW: Double Cases and Matching

Hi Bob

Thank you very much, it's exactly what i needed! Just one question in order to understand the syntax: The FLAG Variable is a sort of Dummy to these cases in the community file with the max population, isn't it?

Regards
Thomas

-----Ursprüngliche Nachricht-----
Von: Bob Walker [mailto:[hidden email]]
Gesendet: Mittwoch, 12. März 2008 04:09
An: 'SPSSX(r) Discussion'
Cc: Balmer Thomas
Betreff: RE: Double Cases and Matching

Hi Thomas,

Sorry for the silence - been a busy day! So now you're saying you want to
append not only the total population and maximum age, but also the variables
of the record with the maximum age, say variables 'type1', 'type2', etc. My
only concern is: will there only be ONE maximum age for a postcode? If so,
your table file may not be accurate. Anyway, here's one approach (hope you
agree, Richard):

DATASET CLOSE ALL.

* Community file, now with an age variable.
DATA LIST FREE /cid age population postcode type1 type2 type3.
BEGIN DATA
101,22,7125,67865,4,2,2
102,35,9981,77654,3,1,3
103,32,10211,98998,2,5,1
104,45,8988,67865,4,3,2
105,33,6711,67865,4,7,1
106,27,5674,98998,2,4,2
107,65,3126,77654,3,6,3
108,42,7801,98998,2,8,3
109,21,1211,77654,3,3,1
110,65,9099,98998,2,4,2
111,32,9000,12345,1,7,1
112,44,10000,33227,2,3,2
113,44,18763,22222,4,1,3
114,53,388,44444,1,1,1
END DATA.
SORT CASES BY postcode age.
DATASET NAME Community.

* Calculate 'sumpop', 'maxage' per 'postcode'.
AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population) /age=MAX(age).
COMPUTE FLAG=1.
EXECUTE.
DATASET NAME File1.

* Match the summary measures to the community file.
DATASET ACTIVATE Community.
MATCH FILES /FILE=Community /TABLE=File1 /BY postcode age.
SELECT IF FLAG=1.
MATCH FILES /FILE=* RENAME (age=maxage)/ KEEP postcode maxage sumpop type1
type2 type3.
SORT CASES BY postcode maxage.
DATASET NAME File2.

*Individual file with the above postcodes and more.
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* Append sumpop, maxage, and type1-type3 variables.
MATCH FILES /FILE=File3 / TABLE=File2 / BY postcode.
EXECUTE.
DATASET NAME Result WINDOW=FRONT.

Regards,

Bob Walker
Surveys & Forecasts, LLC

________________________________________
From: Balmer Thomas [mailto:[hidden email]]
Sent: Monday, March 10, 2008 1:33 PM
To: SPSSX(r) Discussion
Cc: Bob Walker
Subject: RE: Double Cases and Matching

Hi Bob

Works fine so far, thanks a lot! I thought to use Select if or something
like that, your solution is very elegant.

Now, I have other variables in the community file (like 'type', actually 8
other variables), which should be matched at the end with the file of my
respondents. But, it should be those values of these other variables from
the cases with the max age.

How can I take the values of type from those cases with the max age in the
result file?
As I understood the command DOCUMENT, I tried it in order to keep type in
file 1 - but it doesn't work - oh my bad english...
Or

DATASET CLOSE ALL.

* Community file, now with an age variable.
DATA LIST FREE /cid age population postcode type.
BEGIN DATA
101,22,7125,67865,4
102,35,9981,77654,3
103,32,10211,98998,2
104,45,8988,67865,4
105,33,6711,67865,4
106,27,5674,98998,2
107,65,3126,77654,3
108,27,7801,98998,2
109,21,1211,77654,3
110,27,9099,98998,2
111,32,9000,12345,1
112,44,10000,33227,2
113,44,18763,22222,4
114,53,388,44444,1
END DATA.
DATASET NAME Community.

* Creates 'sumpop' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /DOCUMENT/BREAK postcode /sumpop=SUM(population).
DATASET NAME File1.

* Finds maximum age 'maxage' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age).
DATASET NAME File2.

*Individual file with postcodes.
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* Append sumpop and maxage.
MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode.
DATASET NAME Result WINDOW=FRONT.
EXECUTE.
SET CLOSE ALL.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bob Walker-2

Re: Double Cases and Matching

Hi Thomas,

Yes, FLAG identifies just those cases in your community file with the
maximum age. We can then append the entire record to the individual file
(File3). I meant to ask in my earlier post: I'm assuming that it's OK to
have more than one maxage record with the same postcode in the Community
file? The AGGREGATE command will not care, but I didnt know if it was an
issue for you.

Regards,

Bob Walker
Surveys & Forecasts, LLC

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Balmer Thomas
Sent: Wednesday, March 12, 2008 7:05 AM
To: [hidden email]
Subject: AW: Double Cases and Matching

Hi Bob

Thank you very much, it's exactly what i needed! Just one question in order
to understand the syntax: The FLAG Variable is a sort of Dummy to these
cases in the community file with the max population, isn't it?

Regards
Thomas

-----Ursprüngliche Nachricht-----
Von: Bob Walker [mailto:[hidden email]]
Gesendet: Mittwoch, 12. März 2008 04:09
An: 'SPSSX(r) Discussion'
Cc: Balmer Thomas
Betreff: RE: Double Cases and Matching

Hi Thomas,

Sorry for the silence - been a busy day! So now you're saying you want to
append not only the total population and maximum age, but also the variables
of the record with the maximum age, say variables 'type1', 'type2', etc. My
only concern is: will there only be ONE maximum age for a postcode? If so,
your table file may not be accurate. Anyway, here's one approach (hope you
agree, Richard):

DATASET CLOSE ALL.

* Community file, now with an age variable.
DATA LIST FREE /cid age population postcode type1 type2 type3.
BEGIN DATA
101,22,7125,67865,4,2,2
102,35,9981,77654,3,1,3
103,32,10211,98998,2,5,1
104,45,8988,67865,4,3,2
105,33,6711,67865,4,7,1
106,27,5674,98998,2,4,2
107,65,3126,77654,3,6,3
108,42,7801,98998,2,8,3
109,21,1211,77654,3,3,1
110,65,9099,98998,2,4,2
111,32,9000,12345,1,7,1
112,44,10000,33227,2,3,2
113,44,18763,22222,4,1,3
114,53,388,44444,1,1,1
END DATA.
SORT CASES BY postcode age.
DATASET NAME Community.

* Calculate 'sumpop', 'maxage' per 'postcode'.
AGGREGATE OUTFILE=* /BREAK postcode /sumpop=SUM(population) /age=MAX(age).
COMPUTE FLAG=1.
EXECUTE.
DATASET NAME File1.

* Match the summary measures to the community file.
DATASET ACTIVATE Community.
MATCH FILES /FILE=Community /TABLE=File1 /BY postcode age.
SELECT IF FLAG=1.
MATCH FILES /FILE=* RENAME (age=maxage)/ KEEP postcode maxage sumpop type1
type2 type3.
SORT CASES BY postcode maxage.
DATASET NAME File2.

*Individual file with the above postcodes and more.
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* Append sumpop, maxage, and type1-type3 variables.
MATCH FILES /FILE=File3 / TABLE=File2 / BY postcode.
EXECUTE.
DATASET NAME Result WINDOW=FRONT.

Regards,

Bob Walker
Surveys & Forecasts, LLC

________________________________________
From: Balmer Thomas [mailto:[hidden email]]
Sent: Monday, March 10, 2008 1:33 PM
To: SPSSX(r) Discussion
Cc: Bob Walker
Subject: RE: Double Cases and Matching

Hi Bob

Works fine so far, thanks a lot! I thought to use Select if or something
like that, your solution is very elegant.

Now, I have other variables in the community file (like 'type', actually 8
other variables), which should be matched at the end with the file of my
respondents. But, it should be those values of these other variables from
the cases with the max age.

How can I take the values of type from those cases with the max age in the
result file?
As I understood the command DOCUMENT, I tried it in order to keep type in
file 1 - but it doesn't work - oh my bad english...
Or

DATASET CLOSE ALL.

* Community file, now with an age variable.
DATA LIST FREE /cid age population postcode type.
BEGIN DATA
101,22,7125,67865,4
102,35,9981,77654,3
103,32,10211,98998,2
104,45,8988,67865,4
105,33,6711,67865,4
106,27,5674,98998,2
107,65,3126,77654,3
108,27,7801,98998,2
109,21,1211,77654,3
110,27,9099,98998,2
111,32,9000,12345,1
112,44,10000,33227,2
113,44,18763,22222,4
114,53,388,44444,1
END DATA.
DATASET NAME Community.

* Creates 'sumpop' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /DOCUMENT/BREAK postcode /sumpop=SUM(population).
DATASET NAME File1.

* Finds maximum age 'maxage' for each 'postcode'.
DATASET ACTIVATE Community.
AGGREGATE OUTFILE=* /BREAK postcode /maxage=MAX(age).
DATASET NAME File2.

*Individual file with postcodes.
DATA LIST FREE /id var1 var2 postcode.
BEGIN DATA
1110, 1, 3, 67865
1111, 3, 3, 77654
1112, 2, 4, 98998
1113, 1, 3, 67865
1114, 5, 5, 67865
1115, 3, 2, 98998
1116, 4, 3, 77654
1117, 2, 4, 98998
1118, 3, 1, 77654
1119, 2, 1, 12345
1120, 4, 4, 33227
1121, 2, 4, 22222
1122, 4, 3, 12122
1123, 2, 2, 44444
1124, 3, 1, 34344
1125, 2, 2, 11112
1126, 5, 2, 23232
1127, 5, 4, 43211
1128, 5, 4, 98998
END DATA.
SORT CASES BY postcode.
DATASET NAME File3.

* Append sumpop and maxage.
MATCH FILES /FILE=File3 / TABLE=File1 / TABLE=File2 / BY postcode.
DATASET NAME Result WINDOW=FRONT.
EXECUTE.
SET CLOSE ALL.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD