SPSSX Discussion

Macro : Age categories


Yves_Therriault

Macro : Age categories

Dear SPSSers,

First of all, I'd like to express my gratitude to all of those who've
answered my question about how to do Poisson regression with SPSS.

I would very much appreciate it if someone could answer this question about
creating age categories with a macro.

I have a file that contains estimates and projections of the population of
the province of Quebec from 1981 to 2026. The file has one variable for each
age: MoinsUnan (under 1 year), p_1, p_2, p_3, ..., p_100.

What I'm looking for is a macro that would allow me to define any number
of age categories.

Let's say that I'd like to create a file with 20 variables, one for each of
20 age categories:
first var = MoinsUnAn
second var = p_1a4
third var = p_5a9
fourth var = p_10a14
...
20th var = 90yearsandover

Now let's assume that I would like to create another file with 5 variables
corresponding to 5 age categories:
first var = 0a14
second var = 15a29
third var = 30a44
fourth var = 45a64
fifth var = 65andover.
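A minimal sketch of the grouping step in Python (not SPSS macro syntax), assuming each record is a dict keyed by the column names from the post (MoinsUnan for age 0, then p_1 to p_100, with p_100 covering 100 and over) and that each category is described only by its lower age bound. The helper names and the output names (p0a14, p65plus, ...) are hypothetical:

```python
def source_column(age):
    """Name of the single-year column for an age (p_100 means 100 and over)."""
    return "MoinsUnan" if age == 0 else f"p_{age}"


def group_ages(record, lower_bounds, max_age=100):
    """Sum single-year counts into categories starting at each lower bound.

    lower_bounds must start at 0 and be strictly increasing; the last
    category is open-ended and runs up to the max_age (100 and over) column.
    """
    if not lower_bounds or lower_bounds[0] != 0:
        raise ValueError("categories must start at age 0")
    if any(b <= a for a, b in zip(lower_bounds, lower_bounds[1:])):
        raise ValueError("lower bounds must be strictly increasing")
    if lower_bounds[-1] > max_age:
        raise ValueError("a category starts beyond the last age column")

    upper_bounds = [b - 1 for b in lower_bounds[1:]] + [max_age]
    result = {}
    for lo, hi in zip(lower_bounds, upper_bounds):
        total = sum(record[source_column(age)] for age in range(lo, hi + 1))
        if hi == max_age:
            name = f"p{lo}plus"      # open-ended last category, e.g. p65plus
        elif lo == hi:
            name = f"p{lo}"          # single year, e.g. p0 (under 1)
        else:
            name = f"p{lo}a{hi}"     # closed range, e.g. p0a14
        result[name] = total
    return result


# Toy record with one person at every age, just to show the grouping.
toy = {source_column(age): 1 for age in range(101)}
print(group_ages(toy, [0, 15, 30, 45, 65]))
# {'p0a14': 15, 'p15a29': 15, 'p30a44': 15, 'p45a64': 20, 'p65plus': 36}
```

For the 20-category layout, the lower bounds would be [0, 1] + list(range(5, 91, 5)), giving p0, p1a4, p5a9, ..., p85a89 and p90plus. Describing categories by lower bounds alone guarantees that they neither overlap nor leave gaps.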

I've been struggling with this problem for more than a day now, but I can't
see the solution.

Thanks in advance to anyone who could give me some help.

Yves Therriault, Ph. D.
Canada

emaguin

Re: Macro : Age categories

Yves,

I'm a bit confused by your question. At first, I thought you wanted to do a
'variable condensation', where you condense the data in 10 variables, age1
to age10, into one new variable. But given what you are working with
('...estimates and projections of Quebec province population from 1981 to
2026'), I think each variable must have data in it, i.e., age1 is the number
of one-year-olds, etc. So now I think you just want to rename the variables
so that the new names have a meaningful structure. If so, I'd do it with a
RENAME VARIABLES command. Perhaps you've already thought of this and believe
a macro would give you greater flexibility. If so, perhaps a macro expert
can help.

Gene Maguin

Yves_Therriault

Re: Macro : Age categories

In reply to this post by Yves_Therriault

On Fri, 28 Jul 2006 09:50:44 -0400, [hidden email] wrote:

Hi Gene and all SPSSers,

I'm really sorry for not having posted a clear question. English isn't my
native tongue, so I'm not as fluent as I would like.

First of all, let me explain why I need a macro to build age categories "on
the fly".

I've already written many sets of macros to compute statistics such as
age-adjusted death rates, age-adjusted YPLL (years of potential life lost)
rates, and so forth. Each set has 3 macros specially designed for my region.

For example, in the case of age-adjusted death rates, the first macro
computes, for each sub-territory of the North Shore region and for the
province of Quebec as a whole, the total number of deaths observed over a
period of x years, the mean number of deaths per year, and the annual crude
death rate for each age category. The death data are taken from a file
provided by the Quebec government. To compute the death rate, the macro
calls a population file that gives the number of people in each age
category for the North Shore population, its sub-territories, and the
province of Quebec at the midpoint of the period chosen for the analysis.

The second macro computes the age-standardized death rates for each
territory. Among other things, it calls the standard population (for
instance, the 2001 Quebec population). The third macro computes the
confidence intervals of the age-standardized death rates and tests whether
there is any significant difference between the age-standardized death rate
of the province of Quebec and those of the North Shore and its
sub-territories.
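The post does not give the formulas the macros use, so the following Python sketch assumes the textbook direct method: an age-specific crude rate is the mean annual number of deaths divided by the mid-period population, and the age-standardized rate weights those rates by the standard population's age distribution. The rate scale (per 100,000) and the function names are illustrative, and the confidence-interval step of the third macro is not modelled:

```python
def crude_rates(total_deaths, years, population, per=100_000):
    """Age-specific crude rates: (deaths / years) / mid-period population * per."""
    return [d * per / (years * p) for d, p in zip(total_deaths, population)]


def age_standardized_rate(rates, standard_population):
    """Direct standardization: weight each rate by the standard population."""
    total = sum(standard_population)
    return sum(r * w for r, w in zip(rates, standard_population)) / total


# Toy figures for three age categories (not real Quebec data).
rates = crude_rates(total_deaths=[10, 50, 250], years=5,
                    population=[1_000, 2_000, 1_000])
print(rates)                                                # [200.0, 500.0, 5000.0]
print(age_standardized_rate(rates, [3_000, 4_000, 3_000]))  # 1760.0
```

Because the weights come from one fixed standard population, two territories with different age structures are compared as if they had the same age distribution.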

I usually work with 20 age categories: under 1 year, 1 to 4, 5 to 9, 10 to
14, ..., 90 and over. I've already created many SPSS files (one per year for
each gender and for the whole population). In those files, I have one
variable for each age category: p0, p1à4, p5à9, p10à14, ..., p90.

The problem I've been struggling with for more than 2 days is that I'm
trying to build a macro that would allow the creation of any number of age
categories, since someone might wish to work with age categories other than
the ones I use. Then, instead of working with the population files I
mentioned above, I would call a file that contains estimates and projections
of the population of the province of Quebec from 1981 to 2026. As I wrote in
my earlier post, this file has one variable per age: MoinsUnan, p_1, p_2,
p_3, ..., p_100. The variable p_100 covers people who are 100 years old and
over.

Let's say that someone would like to work with 5 age categories :

first cat=p0a14
second cat=p15a29
third cat=p30a44
fourth cat=p45a64
fifth cat=p65_ans_et_plus.

When he or she calls the macro, he or she would have to specify 5 arguments
regarding age:

!agecat première_cat = 0à14 / seconde_cat = 15à29 / troisième_cat =
30à44 / quatrième_cat = 45à64 / cinquième_cat = 65ans_et_plus.

Hence, for the argument 0à14, the macro would create a variable called
p0a14, which would be the sum of the variables MoinsUnan to p_14; for the
argument 15à29, the macro would sum the variables p_15 to p_29 and create
the variable p15a29; ... and for the fifth argument, 65ans_et_plus, the
macro would create a variable p65_ans_et_plus for people who are 65 years
old and over.

Should someone want to use 20 age categories, he or she would have to use 20
arguments, like:

!agecat première_cat = 0 / seconde_cat = 1à4 / troisième_cat = 5à9 / ..
20ième_cat = 90_et_plus.

In that case, the macro would create 20 variables, one for each age
category: p0, p1a4, p5a9, ..., p90.
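One way to get "any number of categories" without hand-writing each COMPUTE is to generate the syntax from the category labels. This is a Python sketch, not the SPSS macro facility. It assumes the projections file stores MoinsUnan, p_1, ..., p_100 adjacently and in age order (so the TO keyword inside SUM covers exactly the intended columns), accepts labels written like the post's arguments, and uses a hypothetical naming rule (an open-ended label such as 65ans_et_plus becomes p65_et_plus rather than p65_ans_et_plus):

```python
import re

# Accepts labels like "0", "1à4", "0à14" (or "0a14"), "65ans_et_plus", "90_et_plus".
LABEL = re.compile(r"^(\d+)(?:[àa](\d+))?(.*)$")


def source_column(age):
    """Single-year column in the projections file (p_100 = 100 and over)."""
    return "MoinsUnan" if age == 0 else f"p_{age}"


def parse_label(label, max_age=100):
    """Return (first age, last age, new variable name) for one category label."""
    m = LABEL.match(label.strip())
    if m is None:
        raise ValueError(f"cannot parse age category {label!r}")
    lo, hi_text, rest = int(m.group(1)), m.group(2), m.group(3)
    if hi_text is not None:                  # closed range, e.g. 15à29
        hi = int(hi_text)
        name = f"p{lo}a{hi}"
    elif "plus" in rest:                     # open-ended, e.g. 65ans_et_plus
        hi = max_age
        name = f"p{lo}_et_plus"
    elif rest == "":                         # single year, e.g. 0
        hi = lo
        name = f"p{lo}"
    else:
        raise ValueError(f"cannot parse age category {label!r}")
    if not 0 <= lo <= hi <= max_age:
        raise ValueError(f"age range out of order or out of bounds: {label!r}")
    return lo, hi, name


def compute_syntax(labels):
    """Build one SPSS COMPUTE command per category label."""
    commands = []
    for label in labels:
        lo, hi, name = parse_label(label)
        if lo == hi:
            expr = source_column(lo)
        else:
            # TO assumes the single-year columns are adjacent and in age order.
            expr = f"SUM({source_column(lo)} TO {source_column(hi)})"
        commands.append(f"COMPUTE {name} = {expr}.")
    return "\n".join(commands)


print(compute_syntax(["0à14", "15à29", "30à44", "45à64", "65ans_et_plus"]))
```

Run as shown, it prints five commands, from COMPUTE p0a14 = SUM(MoinsUnan TO p_14). to COMPUTE p65_et_plus = SUM(p_65 TO p_100). The same function handles the 20-category call, because the number of categories is simply the length of the label list.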

I hope I've succeeded in clarifying my earlier post.

I would appreciate it if someone could show me the light :-).

P.S. Sorry for the bad English.

Yves Therriault, Ph.D.
Research officer,
Direction de la santé publique
Agence de la santé et des services sociaux de la Côte-Nord

Maguin, Eugene

Recode question

All,

I must be doing something wrong or misunderstanding the documentation, but...

These are the frequencies for the variable going into the recode (apologies
if tabs are not preserved).

P1RAAC1 A1 Par: Alcohol Abuse Count
                  Frequency  Percent  Valid Percent  Cumulative Percent
Valid    .00             89     13.2           13.6                13.6
         1.00           102     15.1           15.6                29.2
         2.00            58      8.6            8.9                38.1
         ...
         12.00           24      3.6            3.7                98.3
         13.00           11      1.6            1.7               100.0
         Total          654     97.0          100.0
Missing  99.00           19      2.8
         System           1       .1
         Total           20      3.0
Total                   674    100.0

This is the recode statement:

RECODE P1RAAC1(0 1=0)(2=2)(3 THRU 13=3)(MISSING=9).

And this is the result:

P1RAAC1 A1 Par: Alcohol Abuse Count
                               Frequency  Percent  Valid Percent  Cumulative Percent
Valid    .00    0-1 NonAlc           191     28.3           28.4                28.4
         2.00   2 Maybe               58      8.6            8.6                37.0
         3.00   3+ Definite          405     60.1           60.2                97.2
         99.00                        19      2.8            2.8               100.0
         Total                       673     99.9          100.0
Missing  9.00                          1       .1
Total                                674    100.0

As I read the documentation for version 13, the MISSING keyword causes
user-missing and system-missing values to be coded to sysmis on output. It
doesn't look like this is happening here. Comments?

Thanks, Gene Maguin

Maguin, Eugene

Recode question (correction)

In reply to this post by Yves_Therriault

All,

My concluding sentences were in error. I said:

As I read the documentation for version 13, the MISSING keyword causes
user-missing and system-missing values to be coded to sysmis on output. It
doesn't look like this is happening here. Comments?

I should have said:

As I read the documentation for version 13, the MISSING keyword causes both
user-missing and system-missing values to be coded to the specified value on
output. It doesn't look like the user-missing value of 99 was recoded to 9,
but it should have been. Comments?

Gene Maguin

Oliver, Richard

Re: Recode question

In reply to this post by Maguin, Eugene

The documentation is unfortunately vague. What the MISSING keyword actually does is recode any user- and system-missing values into the specified value. It does not make that value user-missing (or system-missing). If you want user-missing values to be system-missing, use MISSING=SYSMIS.
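Richard's point, modelled in a few lines of Python (not SPSS code). None stands in for system-missing and the variable's current user-missing values are passed as a set; only the MISSING clause of RECODE is modelled:

```python
SYSMIS = None  # stand-in for SPSS system-missing in this sketch


def recode_missing(values, user_missing, target):
    """Model only the (MISSING=target) clause of RECODE; other values pass through."""
    return [target if (v is SYSMIS or v in user_missing) else v for v in values]


data = [0, 2, 99, SYSMIS]
print(recode_missing(data, user_missing={99}, target=9))       # [0, 2, 9, 9]
print(recode_missing(data, user_missing={99}, target=SYSMIS))  # [0, 2, None, None]
```

The output list carries no missing-value flags: the 9s are ordinary values until a MISSING VALUES command declares 9 user-missing.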

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of emaguin
Sent: Monday, July 31, 2006 12:46 PM
To: [hidden email]
Subject: Recode question

Maguin, Eugene

Re: Recode question

In reply to this post by Yves_Therriault

All,

I think something has been missed in the replies to my question.

Going into the recode, 99 was declared user-missing, as the frequency
listing shows. The recode statement ... (missing=9) should have recoded any
user- or system-missing values to 9. When you look at what comes out, you
see that one case has a value of 9, which is what you would expect from a
sysmis value going in, and 19 cases have a value of 99. The sysmis part
worked as documented. My contention is that those 19 cases, because they
were user-missing going in, should have been changed to 9 coming out, but
they weren't. It is these 19 cases that I am interested in.

I also realize I did a poor job of making clear what I did, because I left
out a MISSING VALUES statement in which I declared 9 to be user-missing; it
was positioned after the RECODE statement and before the FREQUENCIES
statement.

Gene Maguin

Peck, Jon

Re: Recode question

Quoting the CSR (Command Syntax Reference):

Value specifications are scanned left to right.
A value is recoded only once per RECODE command.

So the first specification that matches determines the recode. Overlaps are OK, and, in fact, if you have interval recodes, you control how the endpoints are treated by the order.
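The two CSR rules can be sketched in Python (not SPSS code). Specifications are modelled as (low, high, new) triples, so a single value is a range with low == high; this covers only value and THRU specifications for a variable recoded into itself:

```python
def recode(value, specs):
    """Scan (low, high, new) specs left to right; the first match wins.

    Returning at the first match also means a value is recoded only once:
    the new value is never compared against later specifications.
    """
    for low, high, new in specs:
        if low <= value <= high:
            return new
    return value  # no specification matched: value left unchanged


# (1 THRU 5=1)(5 THRU 10=2): the shared endpoint 5 goes to whichever range comes first.
print(recode(5, [(1, 5, 1), (5, 10, 2)]))  # 1
print(recode(5, [(5, 10, 2), (1, 5, 1)]))  # 2
# (1=2)(2=3): a 1 becomes 2 and is not recoded again to 3.
print(recode(1, [(1, 1, 2), (2, 2, 3)]))   # 2
```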

HTH

Jon Peck
SPSS

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of emaguin
Sent: Montag, 31. Juli 2006 14:01
To: [hidden email]
Subject: Re: [SPSSX-L] Recode question

Oliver, Richard

Re: Recode question

In reply to this post by Maguin, Eugene

I think the problem is the sometimes confusing behavior of MISSING VALUES, which always takes effect immediately, whereas RECODE doesn't execute until the next command that reads the data. This is one of those rare instances where you may need an EXECUTE command.

If your code is currently something like:

missing values somevar (99).
recode somevar (value=value)...(missing=9).
missing values somevar (9).
frequencies variables=somevar.

That is really no different than:

missing values somevar (9, 99).
recode...
frequencies...

because the MISSING VALUES command that sets 9 to missing takes effect before the preceding RECODE is executed, and I think the RECODE specification of MISSING=9 may be problematic if 9 is already considered user-missing.

Try putting an EXECUTE between the RECODE and the second MISSING VALUES command, as in:

missing values somevar (99).
recode somevar (value=value)...(missing=9).
execute.
missing values somevar (9).
frequencies variables=somevar.
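A toy Python model of the timing Richard describes (not SPSS internals). It assumes that a MISSING VALUES command replaces the variable's missing-value list as soon as it is read, while a RECODE waits until the next command that reads the data (FREQUENCIES or EXECUTE) and then uses whatever list is in force at that moment. The class and method names are hypothetical:

```python
SYSMIS = None  # stand-in for system-missing in this toy model


class Session:
    """Dictionary changes apply at once; transformations wait for a data pass."""

    def __init__(self, values):
        self.values = list(values)
        self.user_missing = set()
        self.pending = []  # queued RECODE ... (MISSING=target) targets

    def missing_values(self, *codes):
        # Modelled as replacing the whole list immediately.
        self.user_missing = set(codes)

    def recode_missing_to(self, target):
        # Queued, not run: nothing happens to the data yet.
        self.pending.append(target)

    def read_data(self):
        # EXECUTE or a procedure such as FREQUENCIES: run the queued recodes
        # against the missing-values list as it stands right now.
        for target in self.pending:
            self.values = [
                target if (v is SYSMIS or v in self.user_missing) else v
                for v in self.values
            ]
        self.pending.clear()


def run(with_execute):
    s = Session([0, 99, SYSMIS])
    s.missing_values(99)          # missing values somevar (99).
    s.recode_missing_to(9)        # recode somevar ...(missing=9).
    if with_execute:
        s.read_data()             # execute.
    s.missing_values(9)           # missing values somevar (9).
    s.read_data()                 # frequencies variables=somevar.
    return s.values


print(run(with_execute=False))  # [0, 99, 9]
print(run(with_execute=True))   # [0, 9, 9]
```

Without the EXECUTE, the queued recode sees only 9 as user-missing, so 99 passes through unchanged, which is the pattern in Gene's second frequency table; with the EXECUTE, 99 is still missing when the recode runs.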

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of emaguin
Sent: Monday, July 31, 2006 2:01 PM
To: [hidden email]
Subject: Re: Recode question

Linda Case

Custom Tables Question: Use of Chi-Square for Independence vs. Z-Test for Equality of Column Proportions

In reply to this post by Peck, Jon

Greetings:

I have a new set of data for a repeated-measures study with three treatment
groups. We have a number of demographic factors that we would like to test
for equal frequency distribution among the three groups. I have created
tables with columns for the treatment groups and rows for the demographic
factor, and I include the chi-square test for independence to test for equal
distribution. However, for some of these factors (for example, occupation,
which includes a number of categories with fewer than 5 respondents), the
chi-square test for independence cannot be used because the data do not
meet the assumption of a minimum expected cell frequency of 5.

Is it acceptable to use the "compare column proportions" z-test, which is
also included in Tables, for these comparisons? I know these are a series of
pairwise comparisons for each pair of treatments, but they are
Bonferroni-protected and appear to give me the answers that I need. I am
just not certain that this is a correct approach, as I would typically use
chi-square for this type of testing.
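The expected cell frequencies that the minimum-of-5 assumption refers to come straight from the table margins: row total times column total divided by the grand total. A short Python sketch for checking a table before choosing a test; the toy counts are made up for illustration:

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_cells(table, threshold=5):
    """Return (cells below threshold, total cells, smallest expected count)."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells), len(cells), min(cells)


# Toy table: rows are occupation categories, columns are the three treatment groups.
toy = [[10, 12, 11],
       [2, 1, 3],
       [8, 9, 7]]
print(small_expected_cells(toy))
```

For the toy table, 3 of the 9 cells have expected counts below 5 (the smallest is 6 * 20 / 63, about 1.9), all of them in the sparse second row.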

Thanks for your help!

Linda Case

Linda P. Case
AutumnGold Consulting
www.autumngoldconsulting.com
(217) 586-4864
[hidden email]

David Greenberg

Re: Custom Tables Question: Use of Chi-Square for Independence vs. Z-Test for Equality of Column Proportions

Why not use Fisher's exact test? That way you won't have to worry about
having cells with few respondents. David Greenberg, Sociology
Department, New York University
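Fisher's exact test computes the p-value directly from the hypergeometric distribution of tables with the observed margins, so it does not rely on large expected counts. A minimal Python sketch for the 2 x 2 case only; Linda's tables (three groups by several categories) need the r x c generalization (Freeman-Halton), which this sketch does not implement:

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum all tables with these margins that are no more probable than the
    # observed one; the 1e-7 relative tolerance absorbs floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )


print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```

For the table [[3, 1], [1, 3]] the function returns 34/70: of the five possible tables with these margins, only the most probable one (top-left cell 2) is excluded.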

----- Original Message -----
From: Linda Case <[hidden email]>
Date: Monday, July 31, 2006 5:15 pm
Subject: Custom Tables Question: Use of Chi-Square for Independence vs. Z-Test for Equality of Column Proportions

Richard Ristow

Re: Recode question

In reply to this post by Maguin, Eugene

There have been several answers to this, but I don't think any of them
have quite hit the point. Apologies if it's solved, and I missed the
solution.

At 03:45 PM 7/31/2006, emaguin wrote:

>This is the frequencies for the variable going in to the recode.
>
>P1RAAC1 A1 Par: Alcohol Abuse Count
>                  Frequency  Percent  Valid Percent  Cumulative Percent
>Valid    .00             89     13.2           13.6                13.6
>         1.00           102     15.1           15.6                29.2
>         2.00            58      8.6            8.9                38.1
>         ...
>         12.00           24      3.6            3.7                98.3
>         13.00           11      1.6            1.7               100.0
>         Total          654     97.0          100.0
>Missing  99.00           19      2.8
>         System           1       .1
>         Total           20      3.0
>Total                   674    100.0
>
>This the recode statement.
>
>RECODE P1RAAC1(0 1=0)(2=2)(3 THRU 13=3)(MISSING=9).
>
>And this is the result
>
>P1RAAC1 A1 Par: Alcohol Abuse Count
>                               Frequency  Percent  Valid Percent  Cumulative Percent
>Valid    .00    0-1 NonAlc           191     28.3           28.4                28.4
>         2.00   2 Maybe               58      8.6            8.6                37.0
>         3.00   3+ Definite          405     60.1           60.2                97.2
>         99.00                        19      2.8            2.8               100.0
>         Total                       673     99.9          100.0
>Missing  9.00                          1       .1
>Total                                674    100.0

As Richard Oliver's second posting suggests, it looks like the problem is a
MISSING VALUES statement.

The first FREQUENCIES are what you'd get if 99 is a user-missing value.
The second are what you'd get if, for the RECODE and FREQUENCIES, 9 is
a user-missing value but 99 is not. You probably used a statement
. MISSING VALUES P1RAAC1(9).
Replace it with
. MISSING VALUES P1RAAC1(9,99).
and you should get what you want.

Be cautious: RECODEing a source variable into itself is usually a bad
idea. In this case, it loses information: you can't tell 0 from 1
anymore, or 3 through 13 apart from each other. Much better to RECODE
INTO a new variable, like P1RAAC1A.
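Both parts of Richard Ristow's advice, modelled in Python (not SPSS code): the recode runs against the missing-values list in force when the data are read, so 99 must be on that list, and recoding INTO a new variable leaves the source untouched. The sketch assumes that, as with a newly created SPSS INTO target, values matched by no specification become system-missing (None here); the values are toy data:

```python
SYSMIS = None  # stand-in for system-missing in this sketch


def recode_into(values, specs, user_missing, missing_target):
    """Model RECODE src (specs)(MISSING=target) INTO a new variable.

    specs are (low, high, new) triples scanned left to right, first match wins;
    MISSING comes last, as in Gene's command. Values matched by nothing become
    system-missing in the new variable. The source list is never modified.
    """
    result = []
    for v in values:
        for low, high, new in specs:
            if v is not SYSMIS and low <= v <= high:
                result.append(new)
                break
        else:
            if v is SYSMIS or v in user_missing:
                result.append(missing_target)
            else:
                result.append(SYSMIS)
    return result


# (0 1=0)(2=2)(3 THRU 13=3); "0 1" is modelled as the range 0..1,
# which is equivalent for whole-number counts.
specs = [(0, 1, 0), (2, 2, 2), (3, 13, 3)]
p1raac1 = [0, 1, 2, 7, 13, 99, SYSMIS]

print(recode_into(p1raac1, specs, user_missing={9}, missing_target=9))
# [0, 0, 2, 3, 3, None, 9]  (99 is not missing, so nothing matches it)
print(recode_into(p1raac1, specs, user_missing={9, 99}, missing_target=9))
# [0, 0, 2, 3, 3, 9, 9]
print(p1raac1)  # source unchanged: [0, 1, 2, 7, 13, 99, None]
```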

Kornbrot, Diana

Comparing correlation matrices

A very simple question:
What is the best and simplest model-free method for comparing several
correlation matrices? They are not large: 5 variables, with a minimum of 30
observations per matrix.

Any help appreciated
Best

Diana

Professor Diana Kornbrot
Evaluation Co-ordinator, Blended Learning Unit
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
Blended Learning Unit
voice +44 (0) 170 728 1315
fax +44 (0) 170 728 1320
Psychology
voice +44 (0) 170 728 4626
fax +44 (0) 170 728 5073
email: [hidden email]
http://www.psy.herts.ac.uk/pub/D.E.Kornbrot/hmpage.html