SPSSX Discussion

CATPCA for nominal data


rsling

CATPCA for nominal data

I am now doing CATPCA on nominal data (with only 1 & 2 for correct and incorrect responses). Can anyone show me how to perform the CATPCA so that I can achieve the aim of data reduction?
My data set:
9 domains with 24 items in each resulting in 216 items
response to each item: 1 - correct, 2 - incorrect
sample size: around 300

Thanks

Hector Maletta

Re: CATPCA for nominal data

CATPCA would work perfectly well, but in fact with dichotomous variables,
i.e. with just two values per variable, CATPCA renders the same results as
ordinary PCA (FACTOR command in SPSS). CATPCA is needed for polytomous
variables (ordinal or nominal) in order to estimate numerical values for the
various categories.
Now, either with FACTOR or CATPCA, your aim is to reduce a large number of
dichotomous questions to a small number of quantitative scales, on the
assumption that observed responses to the binary questions are explained by
one or more underlying (unobserved) constructs.
If each domain touches upon different conceptual problems, you may try
principal component analysis on each of the 9 sets of questions separately.
Thus each run will deal with only one set of 24 binary variables. If they
actually measure the same underlying variable, then the first component
would explain a large share of total variance, and you may use the first
component's scores as a scale for that underlying unobserved variable.
Procedure RELIABILITY and Cronbach's alpha coefficient would tell you
whether each set of 24 questions behaves adequately for that purpose.
Beware, however, that the total number of cases in your sample (300) is
quite small for such a large number of variables (24 per set), even if you
treat them one domain at a time. It is not undoable, but barely. The
traditional absolute minimum recommended for any procedure linked to linear
regression (as PCA is) is to have at least 10 cases per variable, i.e. 240
in your case, but with 300 cases you would be just a little bit above that
absolute minimum. Error might be large, and results may be statistically not
significant, especially in case the first component does not explain such a
large portion of total variance.
Hector
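
The per-domain check described above can be sketched outside SPSS. The following is a minimal illustration in Python with NumPy, not the FACTOR or RELIABILITY procedures themselves: the data are simulated (300 respondents, 24 binary items driven by one latent trait), and the function names `first_component_share` and `cronbach_alpha` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def first_component_share(items):
    """Share of total variance carried by the first principal component
    of the item correlation matrix (rows = respondents, columns = items)."""
    corr = np.corrcoef(items, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # returned in ascending order
    return eigenvalues[-1] / eigenvalues.sum()

def cronbach_alpha(items):
    """Cronbach's alpha from the item variances and the variance of the sum score."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Simulated stand-in for one domain (an assumption, not the poster's data):
# 300 respondents, 24 items, one underlying ability per respondent.
rng = np.random.default_rng(0)
ability = rng.normal(size=(300, 1))
difficulty = rng.normal(size=(1, 24))
p_correct = 1 / (1 + np.exp(-(ability - difficulty)))
items = (rng.random((300, 24)) < p_correct).astype(int)  # 1 = correct, 0 = incorrect

print("first component share:", first_component_share(items))
print("Cronbach's alpha:", cronbach_alpha(items))
```

A large first-component share together with a high alpha would support using a single scale per domain; with real data both numbers should be read with the sample-size caveat above in mind.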

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
rsling
Sent: 30 August 2009 10:04
To: [hidden email]
Subject: CATPCA for nominal data

--
View this message in context:
http://www.nabble.com/CATPCA-for-nominal-data-tp25212313p25212313.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

rsling

Re: CATPCA for nominal data

Thanks for your suggestion, can you comment on the following steps then:

Analyze ... Data Reduction ... Optimal Scaling
Some variable(s) not multiple nominal
One set

Define
Choose the 24 items on domain 1
Choose Nominal?

Output
Component loading
Variance accounted for
Choose to produce category quantification for the 24 items?

Object
I always come across a problem saying: "No variables were specified where a list of variables was required. This command not executed. The above Text was found on the PLOT subcommand. Multiple LOADING keywords encountered in the PLOT subcommand."

Category
Joint Category Plots for the 24 items?

Then run the above for the rest 8 domains?

Would it be possible if I run it once with all the 216 items and see if they fall on a certain number of "Factor"?

Also, can anyone teach me how to obtain screeplots in the CATPCA?

Thanks a lot! Information urgently needed.


Hector Maletta

Re: CATPCA for nominal data

In reply to this post by rsling

Binary variables can alternatively be seen as nominal, ordinal or interval, without any consequence. It is advisable that the "Yes" response always corresponds to the same "direction": for example, if your items try to measure some underlying trait such as patriotism, the "Yes" should always correspond to the most "patriotic" response. Or the reverse ("No" for more patriotic), but the essential issue is that all items be coded consistently. You may choose to consider them as interval variables if you like.
Producing category quantifications is useless with binary variables: whatever numerical codes you choose (0 and 1, or 1 and 2, or 125-207 or whatever) would produce exactly the same results. I normally use 0 and 1. Quantification of categories (optimal scaling) is required for nominal or ordinal responses with more than 2 categories, such as ethnicity or religious affiliation. The quantification would give you the optimal 'spacing' of these categories.
If each set of items measures a different underlying construct, you should run the procedure on each domain separately. Besides, this is better given the limited number of cases in your sample.
Hector
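
The claim that the numerical codes of a binary item do not matter can be checked with a small sketch. This is an illustration in Python with NumPy under stated assumptions, not SPSS output: the data are random, and `recode` and `eigenvalues` are hypothetical helper names. It relies on the fact that a correlation matrix is unchanged when each variable is rescaled by an increasing linear transformation, so a PCA of the correlation matrix gives the same eigenvalues for any pair of codes in which the second code is larger than the first.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 300 respondents, 5 binary items coded 1/2 as in the original post
items_12 = rng.integers(1, 3, size=(300, 5))

def recode(items, low, high):
    """Map code 1 -> low and code 2 -> high."""
    return np.where(items == 1, low, high)

def eigenvalues(items):
    """Eigenvalues of the item correlation matrix, largest first."""
    return np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]

ev_12 = eigenvalues(items_12)
ev_01 = eigenvalues(recode(items_12, 0, 1))
ev_odd = eigenvalues(recode(items_12, 125, 207))

# Both comparisons should hold up to floating-point error
print(np.allclose(ev_12, ev_01), np.allclose(ev_12, ev_odd))
```

The eigenvalues sorted from largest to smallest are also the values a scree plot displays.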
-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
Sent: 01 September 2009 09:37
To: [hidden email]
Subject: Re: CATPCA for nominal data


Ruben Geert van den Berg

Confidence interval for extremely skewed metric variable

In reply to this post by Hector Maletta

Dear all,

I want to estimate a confidence interval for the mean of a metric variable that's extremely skewed to the right. As I (hopefully!) understood, the central limit theorem will make sure that the sampling distribution of the mean will follow a Gaussian distribution (assuming enough observations). However, the skewed distribution causes the standard deviation to be very large compared to the mean value, rendering a very wide confidence interval that's not too informative.

Is there any way (e.g. by a transformation or something) to obtain a smaller interval?

TIA!

Ruben van den Berg


Joost van Ginkel

Re: Confidence interval for extremely skewed metric variable

Dear Ruben,

That depends on the amount of skewness. If the mode is located at the beginning or the end of the scale, there's no transformation that can possibly accommodate that. In that case, I would suggest dichotomizing the variable and performing a different analysis. If not, I think you can perform some kind of log transformation, construct the confidence interval for this transformed variable, and transform the lower and upper bounds back. Good luck!

Best regards,

Joost van Ginkel

Joost R. Van Ginkel, PhD
Leiden University
Faculty of Social and Behavioural Sciences
Data Theory Group
PO Box 9555
2300 RB Leiden
The Netherlands
Tel: +31-(0)71-527 3620
Fax: +31-(0)71-527 1721

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Ruben van den Berg
Sent: 04 September 2009 11:20
To: [hidden email]
Subject: Confidence interval for extremely skewed metric variable


Spousta Jan

Re: Confidence interval for extremely skewed metric variable

In reply to this post by Ruben Geert van den Berg

Hi Ruben,

> "the central limit theorem will make sure that the sampling distribution of the mean will follow a Gaussian distribution"

Unfortunately not. One of the assumptions of the CLT is a finite variance of the variable(s) - and extreme skewness can theoretically mean unlimited variance. Check it first if possible.

> "to estimate a confidence interval for the mean"

I would consider a bootstrapping technique in such a case.

http://en.wikipedia.org/wiki/Bootstrapping_(statistics)

Beware: you should have _a lot_ of cases (examples) in extremely skewed distributions to be able to estimate confidence intervals at all reliably.

Best regards,

Jan
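
As an illustration of the bootstrap idea, here is a minimal sketch of a percentile bootstrap interval for the arithmetic mean, written in Python with NumPy. It is an assumption-laden example rather than a recommended procedure: the data are simulated from a lognormal distribution, the function name `bootstrap_mean_ci` is hypothetical, and case weights or stratification are not handled.

```python
import numpy as np

def bootstrap_mean_ci(values, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the arithmetic mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    # Resample cases with replacement and record the mean of each resample
    idx = rng.integers(0, n, size=(n_resamples, n))
    means = values[idx].mean(axis=1)
    alpha = 1 - level
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Simulated right-skewed data (an assumption standing in for the real variable)
rng = np.random.default_rng(42)
sample = rng.lognormal(mean=3.0, sigma=1.2, size=500)
low, high = bootstrap_mean_ci(sample)
print("mean:", sample.mean(), "95% CI:", low, high)
```

The percentile interval is typically asymmetric around the sample mean for skewed data, which a normal-theory interval cannot reproduce.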


Marta Garcia-Granero

Re: Confidence interval for extremely skewed metric variable

Hi Ruben:

As Jan and Joost suggest, I would use:

1) bootstrap (check my webpage at http://gjyp.nl/marta/; there's code
for bootstrapping the mean).
2) If the variable is positively skewed, consider log-transforming the
data, check if log-values are more symmetric, compute the 95%CI for the
log-values mean and then back transform the limits. But bear in mind
that you will get a 95%CI for the GEOMETRIC mean, not the arithmetic mean.

HTH,
Marta GG
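
The log-transform route in point 2 can be sketched as follows. This is a minimal Python illustration using only the standard library; the function name `geometric_mean_ci` and the toy data are hypothetical, and the fixed multiplier 1.96 is a normal approximation (a t quantile would be more appropriate for a sample this small).

```python
import math
import statistics

def geometric_mean_ci(values, z=1.96):
    """CI for the geometric mean: build a normal-theory CI for the mean
    of the logs, then exponentiate the estimate and both limits."""
    logs = [math.log(v) for v in values]  # requires strictly positive values
    n = len(logs)
    mean_log = statistics.fmean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(n)
    return (math.exp(mean_log - z * se_log),
            math.exp(mean_log),
            math.exp(mean_log + z * se_log))

values = [1, 2, 4, 8, 16, 32, 64]  # toy right-skewed data
low, gmean, high = geometric_mean_ci(values)
print("geometric mean and CI:", low, gmean, high)
print("arithmetic mean:", statistics.fmean(values))
```

For these toy values the back-transformed centre is the geometric mean (8), well below the arithmetic mean (about 18.1), which is the caveat raised above.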

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/


Ruben Geert van den Berg

Re: Confidence interval for extremely skewed metric variable

In reply to this post by Spousta Jan

Dear Jan, Joost, Diana and others,

Thanks a lot for your suggestions!

>"One of the assumptions of CLT is a finite variance of the variable(s) - and extreme skewness can theoretically mean unlimited variance. Check it first if possible."

In real-life research, no variable whatsoever will ever have unlimited variance or am I missing something now? The problem I posted bothered me twice (so far): kilometers travelled by train and car and the number of telephone contracts that companies have.

>"I would consider a bootstrapping technique in such a case. "

This was also mentioned in Hector Maletta's wonderful paper (http://www.spsstools.net/Tutorials/WEIGHTING.pdf). My situation is further complicated by mixed weights, which are needed for a number of different reasons (including disproportional stratification and nonresponse). I'm afraid of the theoretical approach becoming overly complicated and breaking down, so an empirical one may indeed be better. Is there any reason I should NOT try to bootstrap?

Kind regards,

Ruben van den Berg

Subject: RE: Confidence interval for extremely skewed metric variable
Date: Fri, 4 Sep 2009 12:12:44 +0200
From: [hidden email]
To: [hidden email]; [hidden email]


Bruce Weaver

Re: Confidence interval for extremely skewed metric variable

Administrator

In reply to this post by Joost van Ginkel


I too was thinking about transforming the data (possibly log), computing the confidence interval on the transformed scores, and then back-transforming.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Peck, Jon

Re: Confidence interval for extremely skewed metric variable

In reply to this post by Ruben Geert van den Berg


-----Original Message-----
From: SPSSX(r) Discussion on behalf of Ruben van den Berg
Sent: Fri 9/4/2009 5:48 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Confidence interval for extremely skewed metric variable

Dear Jan, Joost, Diana and others,

Thanks a lot for your suggestions!

>"One of the assumptions of CLT is a finite variance of the variable(s) - and extreme skewness can theoretically mean unlimited variance. Check it first if possible."

In real-life research, no variable whatsoever will ever have unlimited variance or am I missing something now? The problem I posted bothered me twice (so far): kilometers travelled by train and car and the number of telephone contracts that companies have.

Not so. A number of real-world distributions have nonfinite variance, e.g. Pareto with a shape parameter of 2 or less.

If you really have a two-stage sort of model, e.g., travel is zero for many cases and follows some other distribution if > 0, you might consider TOBIT regression, available in the SPSSINC TOBIT REGR extension command.

Regards,
Jon Peck


Spousta Jan

Re: Confidence interval for extremely skewed metric variable

In reply to this post by Ruben Geert van den Berg

Hi Ruben,

> In real-life research, no variable whatsoever will ever have unlimited variance or am I missing something now? The problem I posted bothered me twice (so far): kilometers travelled by train and car and the number of telephone contracts that companies have.

In my data, I encounter variables with unlimited variances relatively often. Imagine the following thought experiment: you assign your trains velocities from the distribution UNIFORM(0; 100), that is, each velocity between zero and 100 km/h has the same "chance to appear". Then measure how much time it takes the trains to reach the next train stop at a distance of 1 km. The distribution of these times has no finite mean or variance. This is often the case with so-called ratio distributions.

http://en.wikipedia.org/wiki/Ratio_distribution

I work with financial data, and there are lots of ratio measures and indicators, and many of these have very skewed distributions with tails which are approximately Cauchy-distributed.

Best regards,

Jan


Maguin, Eugene

Re: Confidence interval for extremely skewed metric variable

I need some help to understand the statements that a distribution can have
no mean and no variance. I'll use the example Jan offered.


In this example, does the statement 'The distribution of these times has a
distribution without mean and variance' depend on the fact that some trains
will have a velocity of 0.0 and, therefore, an infinite arrival time?
If so, and the lower limit of the uniform distribution were
set at .001, would the mean and variance be finite?

Thanks, Gene Maguin


Spousta Jan

Re: Confidence interval for extremely skewed metric variable

Hi Gene,

In my example, no single train will have the velocity exactly 0 (because this has probability 0 to occur). But for each epsilon you will find too many trains moving slower than epsilon.

And you are right, if you take UNIFORM(0.001, 1) the mean and variance will be finite.

Best regards

Jan
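
The effect of the lower limit can be made concrete with the closed-form moments of the travel time T = d / V when V is uniform on (v_min, v_max) with v_min > 0. The sketch below is a Python illustration using only the standard library; the function name `travel_time_moments` is hypothetical, and the formulas follow from integrating d/v and d²/v² against the uniform density.

```python
import math

def travel_time_moments(v_min, v_max, distance=1.0):
    """Mean and variance of T = distance / V for V ~ Uniform(v_min, v_max), v_min > 0.

    E[T]   = d * ln(v_max / v_min) / (v_max - v_min)
    E[T^2] = d^2 / (v_min * v_max)
    """
    mean = distance * math.log(v_max / v_min) / (v_max - v_min)
    second_moment = distance ** 2 / (v_min * v_max)
    return mean, second_moment - mean ** 2

# Both moments are finite for any positive lower limit,
# but they grow without bound as the lower limit approaches zero.
for v_min in (1.0, 0.001, 1e-6, 1e-9):
    mean, variance = travel_time_moments(v_min, 100.0)
    print(v_min, mean, variance)
```

The mean grows only logarithmically as v_min shrinks, while the variance grows like 1 / v_min, so the variance becomes unmanageable long before the mean does.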

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Friday, September 04, 2009 3:45 PM
To: [hidden email]
Subject: Re: Confidence interval for extremely skewed metric variable


Ruben Geert van den Berg

Re: Confidence interval for extremely skewed metric variable

In reply to this post by Marta Garcia-Granero

Dear all,

Marta wrote: "you will get a 95%CI for the GEOMETRIC mean".

Does this hold for ALL 'unskewing' transformations? I definitely need the ARITHMETIC mean so if the answer is 'yes' then I know where I should NOT look for a solution.

TIA!

Ruben van den Berg

> Date: Fri, 4 Sep 2009 12:36:46 +0200
> From: [hidden email]
> Subject: Re: Confidence interval for extremely skewed metric variable
> To: [hidden email]
>
> Hi Ruben:
>
> As Jan and Joost suggest, I would use:
>
> 1) bootstrap (check my webpage at http://gjyp.nl/marta/ there's code
> for bootstrapping the mean).
> 2) If the variable is positively skewed, consider log-transforming the
> data, check if log-values are more symmetric, compute the 95%CI for the
> log-values mean and then back transform the limits. But bear in mind
> that you will get a 95%CI for the GEOMETRIC mean, not the arithmetic mean.
>
> HTH,
> Marta GG
> >
> > ------------------------------------------------------------------------
> > *From:* SPSSX(r) Discussion [mailto:[hidden email]] *On
> > Behalf Of *Ruben van den Berg
> > *Sent:* Friday, September 04, 2009 11:20 AM
> > *To:* [hidden email]
> > *Subject:* Confidence interval for extremely skewed metric variable
> >
> > Dear all,
> >
> > I want to estimate a confidence interval for the mean of a metric
> > variable that's extremely skewed to the right. As I (hopefully!)
> > understood, the central limit theorem will make sure that the sampling
> > distribution of the mean will follow a Gaussian distribution (assuming
> > enough observations). However, the skewed distribution causes the
> > standard deviation to be very large compared to the mean value,
> > rendering a very wide confidence interval that's not too informative.
> >
> > Is there any way (e.g. by a transformation or something) to obtain a
> > smaller interval?
> >
> > TIA!
> >
> > Ruben van den Berg
>
>
> --
> For miscellaneous SPSS related statistical stuff, visit:
> http://gjyp.nl/marta/


Marta Garcia-Granero

Re: Confidence interval for extremely skewed metric variable

Ruben van den Berg wrote:

> Marta wrote: "you will get a 95%CI for the GEOMETRIC mean".
>
> Does this hold for ALL 'unskewing' transformations? I definitely need
> the ARITHMETIC mean so if the answer is 'yes' then I know where I
> should NOT look for a solution.
This is true for the logarithmic transformation. Bear in mind that when
you back-transform the mean computed from transformed data, you RARELY
(if ever) get the arithmetic mean again, but a different (and sometimes
uninterpretable) statistic. Some transformations can't be undone (see
Bland & Altman, /The use of transformation when comparing two means/, BMJ
1996;312:1153). If you need a 95%CI for the arithmetic mean, then you must
use bootstrapping.
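
To make the two options concrete, here is a minimal Python sketch (not the SPSS code from the webpage mentioned in this thread). It assumes a percentile bootstrap, which is only one of several bootstrap intervals, and the sample, the seeds and the name `bootstrap_mean_ci` are hypothetical, chosen for illustration only. It also shows that back-transforming the mean of the logs returns the geometric mean, not the arithmetic mean.

```python
import math
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=5000, level=0.95, seed=12345):
    """Percentile bootstrap confidence interval for the arithmetic mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Cut the same number of resampled means off each tail.
    cut = int((1.0 - level) / 2.0 * n_resamples)
    return means[cut], means[n_resamples - 1 - cut]


# Hypothetical right-skewed sample (lognormal), for illustration only.
rng = random.Random(2009)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

arithmetic_mean = statistics.fmean(sample)
# Mean of the logs, back-transformed: this is the geometric mean.
back_transformed = math.exp(statistics.fmean(math.log(x) for x in sample))
geometric_mean = statistics.geometric_mean(sample)

ci_low, ci_high = bootstrap_mean_ci(sample)
print(f"arithmetic mean       : {arithmetic_mean:.3f}")
print(f"back-transformed mean : {back_transformed:.3f}")
print(f"geometric mean        : {geometric_mean:.3f}")
print(f"bootstrap 95% CI      : ({ci_low:.3f}, {ci_high:.3f})")
```

The bootstrap interval is built from resampled arithmetic means, so it stays on the original scale and does not need any back-transformation.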

HTH,
Marta GG

--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/


Bruce Weaver

Re: Confidence interval for extremely skewed metric variable

This is reminding me of a discussion that took place a few years ago in a usenet group. I liked Don Burrill's post on "flavours of mean", so I saved it here:

www.angelfire.com/wv/bwhomedir/notes/flavours_of_mean.pdf

Granaas, Michael

Re: Confidence interval for extremely skewed metric variable

In reply to this post by Marta Garcia-Granero

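When the transformed data are truly symmetric, the mean and median are the same. When you back-transform, the mean of the transformed data becomes the median of the original, provided the transformation is monotone (as the log is), because a monotone transformation preserves the order of the values and hence the median.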

As long as your transformed data are reasonably symmetric it is probably safe to interpret any tests or CIs of the mean of the transformed data as tests/CIs of the median of the original data set.
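
A small Python sketch of this point, using made-up values whose logarithms are exactly symmetric around zero (the data are hypothetical and chosen only so that the equality is exact):

```python
import math
import statistics

# Hypothetical data: the log-values are exactly symmetric around 0.
log_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
original = [math.exp(v) for v in log_values]

# Mean of the transformed data, then back-transformed to the original scale.
mean_of_logs = statistics.fmean(log_values)
back_transformed_mean = math.exp(mean_of_logs)

# Median and arithmetic mean on the original (skewed) scale.
median_original = statistics.median(original)
arithmetic_mean_original = statistics.fmean(original)

print(f"back-transformed mean of logs: {back_transformed_mean:.3f}")
print(f"median of original data      : {median_original:.3f}")
print(f"arithmetic mean of original  : {arithmetic_mean_original:.3f}")
```

The back-transformed mean coincides with the median of the original values, while the arithmetic mean of the original values is pulled upward by the long right tail.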

Michael.

****************************************************
Michael Granaas [hidden email]
Assoc. Prof. Phone: 605 677 5295
Dept. of Psychology FAX: 605 677 3195
University of South Dakota
414 E. Clark St.
Vermillion, SD 57069
*****************************************************

Richard Ristow

Re: Confidence interval for extremely skewed metric variable

In reply to this post by Bruce Weaver

At 08:36 AM 9/7/2009, Bruce Weaver wrote:

This is reminding me of a discussion that took place a few years ago in a
usenet group. I liked Don Burrill's post on "flavours of mean", so I saved it here:

www.angelfire.com/wv/bwhomedir/notes/flavours_of_mean.pdf

Going farther: back in the 60s, the (then) US National Bureau of Standards published a wonderful reference, the Handbook of Mathematical Functions (Applied Mathematics Series no. 55). From that Handbook(*), "Generalized Mean", 3.1.14 - 3.1.20, p. 10:

Below, the data is the n quantities a(1) to a(n); the arithmetic mean is A, the geometric mean is G, and the harmonic mean is H; and sums are to be taken over k=1,n. Some formulae require that all a(k)>0.

Generalized mean M(t):

3.1.14 M(t) = (Sum(a(k)**t)/n)**(1/t)

3.1.18-20: A=M(1); H=M(-1);

        G=lim(t->0) M(t)

So this definition encompasses all the usual means. In addition,

3.1.16:          lim(t->+inf) M(t) = max(a(k))
3.1.17:          lim(t->-inf) M(t) = min(a(k))

(I've used "t->+inf" for t going to +infinity, and "t->-inf" for t going to -infinity.)

That's a warning against transforming your data to make the mean look good: you can get any value in the observed range by such transformations.
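
As a rough illustration of the generalized mean above, here is a short Python sketch. The function name `generalized_mean` and the data are hypothetical. The case t = 0 is handled as the limit, i.e. the geometric mean, and large |t| values stand in for the limits at +infinity and -infinity.

```python
import math


def generalized_mean(values, t):
    """Power mean M(t) = (sum(a_k**t) / n) ** (1/t) for positive values.

    t == 0 is treated as the limit t -> 0, which is the geometric mean.
    """
    if any(a <= 0 for a in values):
        raise ValueError("all values must be positive")
    n = len(values)
    if t == 0:
        return math.exp(sum(math.log(a) for a in values) / n)
    return (sum(a ** t for a in values) / n) ** (1.0 / t)


data = [1.0, 2.0, 4.0, 8.0]  # hypothetical data, all a(k) > 0

arithmetic = generalized_mean(data, 1)    # A = M(1)
harmonic = generalized_mean(data, -1)     # H = M(-1)
geometric = generalized_mean(data, 0)     # G = lim(t->0) M(t)
near_max = generalized_mean(data, 200)    # approaches max(a(k)) as t grows
near_min = generalized_mean(data, -200)   # approaches min(a(k)) as t falls

for label, value in [("M(-200)", near_min), ("H = M(-1)", harmonic),
                     ("G = M(0)", geometric), ("A = M(1)", arithmetic),
                     ("M(200)", near_max)]:
    print(f"{label:10s} {value:.4f}")
```

Varying t sweeps M(t) between the minimum and the maximum of the data, which is the sense in which a suitably chosen power transformation can produce a "mean" anywhere in the observed range.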

---------------------------------------
(*) Abramowitz, Milton, "Elementary Analytical Methods" (pp. 9-63).

In Abramowitz, Milton, and Irene A. Stegun, eds., Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, National Bureau of Standards Applied Mathematics Series #55. Washington, D.C.: US Government Printing Office, 1964.