SPSSX Discussion

Non-Normal age variable, what test to use?

Charlotte-9

Non-Normal age variable, what test to use?

Dear all,

I am investigating uptake of a screening test by age. However, my age
distribution is not Normal and appears to look more like a step function
with fairly constant frequency between ages 50 and 60 and a constant,
lower, frequency between 60 and 70. Ages are restricted to the range 50 -
70. Can anyone comment on this and perhaps suggest an appropriate test
for comparing the age of people who undertook screening with those who
didn't? Should I be looking to use a non-parametric test in this case?

Thanks in advance,

Lou

Hector Maletta

Re: Non-Normal age variable, what test to use?

No. What is supposed to be normally distributed is the sampling distribution of the difference between the sample means, not the distribution of your variable (e.g. age) within a sample (nor in the population). If your sample is a random sample, apply the usual tests to check whether your two samples come from the same population or from two different ones. Apply non-parametric tests if your samples are too small (n<30 or so) or non-random, for in those cases the difference between means may not follow a normal distribution (it is supposed to approximate a normal distribution as the sample size tends to infinity, and only if the samples are randomly drawn).
Hector
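
For anyone who wants to see this point in action, here is a small simulation sketch. It is not part of Hector's reply, and the age distribution in it is an assumption made up to mimic the step shape Lou describes (ages 50-59 twice as frequent as ages 60-69). It draws many pairs of random samples and checks that the difference between their mean ages behaves roughly like a normal variable, even though age itself does not.

```python
import random
import statistics

random.seed(0)

def draw_age():
    # Hypothetical step-shaped age distribution: ages 50-59 are twice as
    # likely as ages 60-69 (the 2:1 ratio is an illustrative assumption).
    if random.random() < 2 / 3:
        return random.randint(50, 59)
    return random.randint(60, 69)

def mean_difference(n):
    # Difference between the mean ages of two independent samples of size n.
    a = [draw_age() for _ in range(n)]
    b = [draw_age() for _ in range(n)]
    return statistics.fmean(a) - statistics.fmean(b)

# Repeat the experiment many times to approximate the sampling distribution.
diffs = [mean_difference(200) for _ in range(2000)]
centre = statistics.fmean(diffs)
spread = statistics.stdev(diffs)

# For a normal distribution, about 95% of values fall within
# 1.96 standard deviations of the mean.
share_within = sum(abs(d - centre) <= 1.96 * spread for d in diffs) / len(diffs)
print(round(centre, 3), round(spread, 3), round(share_within, 3))
```

If the share printed last is close to 0.95, the differences are behaving as a normal approximation would predict, which is the property the usual tests rely on.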

-----Original message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Lou
Sent: 26 May 2007 08:59
To: [hidden email]
Subject: Non-Normal age variable, what test to use?

Dear all,

I am investigating uptake of a screening test by age. However, my age
distribution is not Normal and appears to look more like a step function
with fairly constant frequency between ages 50 and 60 and a constant,
lower, frequency between 60 and 70. Ages are restricted to the range 50 -
70. Can anyone comment on this and perhaps suggest an appropriate test
for comparing the age of people who undertook screening with those who
didn't? Should I be looking to use a non-parametric test in this case?

Thanks in advance,

Lou

Charlotte-9

Re: Non-Normal age variable, what test to use?

In reply to this post by Charlotte-9

Hi Hector,

Ah, okay. I sought advice on something similar some time back and it led
to a discussion about the distribution of the age variable, so I guess I
got caught up on thinking of this as being the important thing here.

Just to clarify what you are saying then, I will take my two groups:
people who were screened and people who were not screened and apply a
t-test to see if these two groups stem from the same population? Am I
basically comparing the mean ages from the groups? My sample is huge
(around 100,000 people).

Thanks,

Lou
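
As an illustration of the comparison Lou describes (mean age of screened versus unscreened people), here is a minimal sketch of Welch's two-sample t statistic written out by hand. The data are simulated and purely hypothetical, and the p-value uses a normal approximation, which is only reasonable because the samples are large. The function names are invented for this example.

```python
import math
import random
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na  # squared standard error, group A
    vb = statistics.variance(sample_b) / nb  # squared standard error, group B
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def two_sided_p_normal(t):
    # Normal approximation to the t distribution; adequate only for large df.
    return math.erfc(abs(t) / math.sqrt(2))

random.seed(1)
# Hypothetical ages: the screened group is drawn from a slightly older range.
screened = [random.randint(50, 70) for _ in range(5000)]
not_screened = [random.randint(50, 68) for _ in range(5000)]

t, df = welch_t(screened, not_screened)
p = two_sided_p_normal(t)
print(round(t, 2), round(df), p)
```

The test answers exactly the question Lou asks: whether the two group means differ by more than sampling error would explain.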

On Sat, 26 May 2007 10:03:01 -0300, Hector Maletta
<[hidden email]> wrote:

> No. What is supposed to be normally distributed is the sampling
> distribution of the difference between the sample means, not the
> distribution of your variable (e.g. age) within a sample (nor in the
> population). If your sample is a random sample, apply the usual tests to
> check whether your two samples come from the same population or from two
> different ones. Apply non-parametric tests if your samples are too small
> (n<30 or so) or non-random, for in those cases the difference between
> means may not follow a normal distribution (it is supposed to approximate
> a normal distribution as the sample size tends to infinity, and only if
> the samples are randomly drawn).
>
> Hector

Hector Maletta

Re: Non-Normal age variable, what test to use?

In reply to this post by Charlotte-9

Lou,
My response referred only to your question about the distribution of ages not being normal, which I think is immaterial to the choice between a parametric and a nonparametric test. By the way, age distributions in free-living populations are seldom normal: people are born at age 0 and then keep dying, so age distributions have a mode at the left and decrease to the right.
Now, the kind of analysis you are doing is not clear. You say you are analysing the uptake of a screening test by age. That does not sound as if your hypothesis is that mean age differs according to uptake. Your wording suggests you are trying to predict uptake using age as a predictor, in a linear or logistic regression, or to assess whether uptake varies with age. If that is the case, it may turn out, for instance, that uptake decreases (or increases) with age, and then what is relevant is the significance of the relationship. With the huge number of cases you have, I guess it would be statistically significant, i.e. likely to be non-zero in the population, even if it turns out to be small. The test used to probe the significance of the relationship should in principle be parametric, assuming your 100,000 cases can be considered a random sample from the relevant population.

Hector
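
The remark about sample size can be made concrete with a short sketch. The counts below are invented for illustration: the same 2-point difference in uptake (52% versus 50%) is tested with 500 people per group and then with 50,000 per group, using a pooled two-proportion z test with a normal p-value.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and two-sided normal p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: identical uptake rates, different sample sizes.
z_small, p_small = two_proportion_z(260, 500, 250, 500)
z_large, p_large = two_proportion_z(26000, 50000, 25000, 50000)

print("n=500 per group:   z = %.2f, p = %.3g" % (z_small, p_small))
print("n=50000 per group: z = %.2f, p = %.3g" % (z_large, p_large))
```

The effect is the same size in both runs; only the standard error shrinks. That is why, with around 100,000 cases, statistical significance says little about whether the relationship is large enough to matter.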

-----Original message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Lou
Sent: 26 May 2007 10:15
To: [hidden email]
Subject: Re: Non-Normal age variable, what test to use?

Hi Hector,

Ah, okay. I sought advice on something similar some time back and it led
to a discussion about the distribution of the age variable, so I guess I
got caught up on thinking of this as being the important thing here.

Just to clarify what you are saying then, I will take my two groups:
people who were screened and people who were not screened and apply a
t-test to see if these two groups stem from the same population? Am I
basically comparing the mean ages from the groups? My sample is huge
(around 100,000 people).

Thanks,

Lou

Hector Maletta

Re: Non-Normal age variable, what test to use?

In reply to this post by Charlotte-9

_____

From: Hector Maletta [mailto:[hidden email]]
Sent: 26 May 2007 17:46
To: 'Charlotte Bean'
CC: '[hidden email]'
Subject: RE: Non-Normal age variable, what test to use?

Lou,

As your dependent variable is binary (screened, not screened), the most appropriate analysis tool seems to be logistic regression, using age (and perhaps other relevant variables) as predictors. Age can be used as a continuous variable or in groups, though, as you correctly point out, the latter involves loss of information and (I would add) it also involves arbitrary cut-off points between groups. If two age groups are (for instance) 50-59 and 60-69, two persons aged 51 and 59 are regarded as being of the same “age” (group) in spite of the 8-year difference between them, whilst two persons barely one year apart (59 and 60) are regarded as being of different “age”.

Even if you are interested in age, there might be other variables you want to control for in the equation, say gender, education, location or medical history, some of which may even have an interaction with age.

Hector
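
A minimal sketch of the logistic regression suggested above, with age as a single continuous predictor. Everything in it is an assumption for illustration: the data are simulated with a known slope, age is centred at 60 for numerical convenience, and the model is fitted with a hand-written Newton-Raphson loop rather than any particular package's routine.

```python
import math
import random

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # information matrix (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of the slope from the inverse information matrix
    # (evaluated at the last iterate, which is fine once converged).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

random.seed(42)
TRUE_B0, TRUE_B1 = -0.2, 0.05   # hypothetical values used to simulate data
ages = [random.randint(50, 70) for _ in range(5000)]
xs = [age - 60 for age in ages]  # centre age at 60
ys = [1 if random.random() < 1 / (1 + math.exp(-(TRUE_B0 + TRUE_B1 * x))) else 0
      for x in xs]

b0, b1, se_b1 = fit_logistic(xs, ys)
print("slope per year of age: %.4f (SE %.4f)" % (b1, se_b1))
print("odds ratio per year: %.3f" % math.exp(b1))
```

Keeping age continuous gives one slope (and one odds ratio per year) instead of a set of arbitrary group contrasts. Other predictors such as gender or education would add further columns to the same model.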

_____

From: Charlotte Bean [mailto:[hidden email]]
Sent: 26 May 2007 13:18
To: [hidden email]
Subject: RE: Non-Normal age variable, what test to use?

Hector,

Thanks for your response. Yes, I am looking at whether screening uptake varies with age. A basic plot of my data suggests this to be the case, with uptake increasing as age increases. I have actually done some analysis of this already with the ages categorised into 5 groups. I then performed a chi-square test for trend. However, I did this primarily because my boss wanted me to keep the groupings, whereas the questions I've been asking here have been primarily for my own benefit. Given that I'm not a statistician, I just wanted to explore the variable in its original form to see what I could do without the loss of information from grouping the data. As such, I was just trying to get a feel for what sort of analysis I should carry out. Any input on this has been/is much appreciated.

Thanks,

Lou
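
For readers unfamiliar with the chi-square test for trend mentioned here, the following is a rough sketch of one common form of it, the Cochran-Armitage test. The five groups, their scores (1 to 5) and all counts are hypothetical; Lou's actual groupings and numbers are not given in the thread.

```python
import math

def trend_chi_square(events, totals, scores):
    """Cochran-Armitage chi-square for linear trend in proportions (1 df)."""
    n_all = sum(totals)
    p_bar = sum(events) / n_all
    # Score-weighted sum of observed minus expected events.
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    var_t = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)
    chi2 = t * t / var_t
    # Upper tail of a chi-square with 1 df equals a two-sided normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: five age groups, 1000 people each, uptake rising with age.
screened = [400, 420, 450, 470, 500]
group_size = [1000, 1000, 1000, 1000, 1000]
group_score = [1, 2, 3, 4, 5]

chi2, p_value = trend_chi_square(screened, group_size, group_score)
print("chi-square for trend = %.2f, p = %.3g" % (chi2, p_value))
```

The statistic depends on the scores assigned to the groups, which is one more place where grouping forces an arbitrary choice that a model with continuous age avoids.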

