Statistical significance Vs Meaningfulness


Statistical significance Vs Meaningfulness

Humphrey Paulie
Dear colleagues,
  I was wondering about the differences between “statistical significance” and “meaningfulness”. In a student paper, the author says things like “the correlation is significant but not meaningful (r=0.33)” or “the difference between the means of the two groups is significant and meaningful”!
  How does one distinguish between the two?
  What does meaningfulness mean? And how does one know if a correlation or mean difference is meaningful or not?
  Regards
Humphrey

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Statistical significance Vs Meaningfulness

Theodora B. Consolacion
"Meaningfulness" can have a couple of different meanings. With
t-tests/ANOVAs, a "meaningful" statistically significant finding
generally refers to an effect size that the author computed. With
clinical studies, the author could be referring to a reliable change
index they computed, which can inform whether a difference found
between two or more groups was "clinically significant". With
something like a correlation, I can only assume that the author is
making the judgment him/herself that the relationship found was
spurious (i.e. some other third factor is confounding the
relationship).
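For the t-test/ANOVA case, one common effect size is Cohen's d (the standardized mean difference). A minimal sketch in Python, with made-up data:

```python
import math

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Two hypothetical groups: a "significant" difference can still be modest.
a = [10, 11, 12, 13, 14]
b = [11, 12, 13, 14, 15]
print(round(cohens_d(a, b), 2))  # -0.63: a "medium" effect by Cohen's rough benchmarks
```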

Best,
teddy

On Thu, May 22, 2008 at 8:52 AM, Humphrey Paulie
<[hidden email]> wrote:



Re: Statistical significance Vs Meaningfulness

Richmond Austria
In reply to this post by Humphrey Paulie
To Humphrey,

  I think what you are referring to is the difference between statistical significance and effect size. A lot of papers have overused statistical significance, which only indicates whether the result happened by mere chance or not. If the result is statistically significant, it means that this did not happen by chance. Now, effect size is different since it indicates an index or a measurement of association between and among variables. It is usually interpreted against a range. The value of r=0.33 means that there is low association.

  In your case, the p-value is equal to or lower than the alpha value, but the correlation coefficient of 0.33 is within the range of low association, depending on the literature this is based on. For instance, 0.33 is within 0.3 to 0.5, which is equivalent to low association (Hinkle et al., 1979). For Davis (1971), 0.33 is within 0.3 to 0.49: moderate association.

  It is best to report both values, especially if there is statistical significance, since the effect size gives you much more information on the association between variables beyond the bare fact that the relationship did not happen by chance (statistical significance).

Humphrey Paulie <[hidden email]> wrote:


Re: Statistical significance Vs Meaningfulness

Melissa Ives
In reply to this post by Humphrey Paulie
In the case of many stats (e.g. chi-square), significance can be related to the N of cases. In these cases in particular, the fact of significance may not be of value. One way to consider meaningfulness is to look at effect sizes.

There are several tools out there to help calculate effect sizes
 (http://www.danielsoper.com/statcalc/default.aspx#c06 or http://www.chestnut.org/LI/downloads/ESWK.xls)
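That point about N can be illustrated directly: with identical proportions, ten times the cases gives ten times the chi-square statistic, while an effect size such as phi stays the same. A sketch (the tables are made up):

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def phi(chi2, n):
    """Phi coefficient: an effect size for 2x2 tables that does not grow with N."""
    return (chi2 / n) ** 0.5

small = [[30, 20], [20, 30]]
big = [[300, 200], [200, 300]]  # same proportions, ten times the cases
for t in (small, big):
    n = sum(t[0]) + sum(t[1])
    chi2 = chi_square_2x2(t)
    print(round(chi2, 1), round(phi(chi2, n), 2))
# chi-square jumps from 4.0 to 40.0, but phi stays 0.2 in both cases
```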

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Humphrey Paulie
Sent: Thursday, May 22, 2008 10:53 AM
To: [hidden email]
Subject: [SPSSX-L] Statistical significance Vs Meaningfulness



Re: Statistical significance Vs Meaningfulness

Handel, Richard W.
In reply to this post by Richmond Austria
My understanding is that the interpretation of "chance" findings below is not entirely accurate. Ronald Carver, in a 1978 article published in the Harvard Educational Review, calls this the "odds-against-chance" fantasy. From the article: "The first of the three fantasies can be called the "odds-against-chance" fantasy. It is the interpretation of the p value as the probability that the research results were due to chance, or caused by chance." (p. 383). And further: "...the p value is the probability of getting the research results when it is first assumed that it is actually true that chance caused the results." (p. 383).

Best regards,
Rick


Richard W. Handel, Ph.D.
Associate Professor
Eastern Virginia Medical School
Department of Psychiatry and Behavioral Sciences
Hofheimer Hall, 825 Fairfax Avenue
Norfolk, VA 23507
Phone: (757)-446-5888
Fax: (757)-446-5918

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richmond Austria
Sent: Thursday, May 22, 2008 12:39 PM
To: [hidden email]
Subject: Re: Statistical significance Vs Meaningfulness

To Humphrey,

  I think what you are referring to is the difference between statistical significance and effect size. A lot of papers have overused statistical significance, which only indicates whether the result happened by mere chance or not. If the result is statistically significant, it means that this did not happen by chance. Now, effect size is different since it indicates an index or a measurement of association between and among variables. It is usually interpreted against a range. The value of r=0.33 means that there is low association.


Re: Statistical significance Vs Meaningfulness

Bob Schacht-3
In reply to this post by Humphrey Paulie
At 05:52 AM 5/22/2008, Humphrey Paulie wrote:


Statistical significance always refers to the actual null hypothesis being
tested-- that is all.

What lies embedded in your question is the larger issue of "significance"
-- and that is that significance depends on context.
"Meaningfulness" suggests a context for significance different from the
bare bones of the null hypothesis.

Let me illustrate by means of a story. When I was in grad school, our
instructor in analytical methods gave us a database of a dozen measurements
and other observations on a set of ceramic cups. Our job was to determine
if the cups were all essentially the same, or if we could distinguish
statistically significant types. Well, we crunched our numbers using the
best available statistical methods, and lo and behold! We discerned several
distinct types that were statistically significant. At which point our
teacher laughed and said that the measurements were taken on a set of cups
made by a local potter, to whom all of the cups were as alike as coke
bottles (remember coke bottles?). He was mass-producing them in batches. In
other words, there may have been statistically significant types, but the
types were not meaningfully different-- at least, not to the maker.

Take a look at the coffee cups in your cupboard at home. Could you create a
set of measurements and other observations that would enable you to sort
out your cups into statistically significant types? Perhaps. Would those
types be meaningful to you? Maybe, maybe not.

Bob Schacht


Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814


Re: Statistical significance Vs Meaningfulness

Hector Maletta
In reply to this post by Handel, Richard W.
Humphrey:
This matter has arisen several times in this mailing list. The general issue is the difference between statistical hypotheses and substantive hypotheses.
Statistical hypotheses refer to the relationship between your particular dataset and the corresponding population or universe. A statistically significant result [such as a correlation of r=0.33] is one that allows you to reject the "null" hypothesis that the actual correlation at the population level is r=0 and that your sample result of r=0.33 was obtained merely by chance.
A substantive hypothesis refers to relationships between the various variables in the population, such as "performance in Math among US primary schoolchildren depends on parental education level, even after controlling for parental and child IQ and parental income."
Now, a statistical hypothesis is analyzed based on sampling theory. For random samples this is governed by the Central Limit Theorem, according to which a measurement (such as a mean or average score) taken in many samples of size N from the same population tends toward a normal (Gauss) distribution as sample size N grows, with a mean equal to the population value and a standard error equal to the population standard deviation divided by the square root of N. This means that the results of many samples tend to group more closely around the population mean (i.e. with a smaller error) the larger N is.
The null hypothesis that the population mean or value or correlation coefficient or whatever is zero is more likely when your sample result is relatively close to zero, and your sample size is small. Thus r=0.33 may be enough to reject the null hypothesis with N=10,000 but not so with N=35. With a small N, you will have a relatively high probability of drawing a sample with r=0.33 even if the population has r=0. The odds of such an unfortunate event will diminish with larger and larger samples.
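The N=35 versus N=10,000 contrast can be checked with the usual t test of a correlation against zero, t = r*sqrt((N-2)/(1-r^2)). A sketch (the critical values are the two-sided .05 cutoffs for each df, taken from standard t tables):

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: the population correlation is zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.33
# Two-sided .05 critical values of t for df = n - 2, from standard tables
for n, crit in ((35, 2.035), (10000, 1.960)):
    t = t_for_r(r, n)
    print(n, round(t, 2), "reject H0" if t > crit else "cannot reject H0")
# The same r = 0.33 falls just short at N = 35 but is decisive at N = 10,000.
```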
Now, suppose your result passes the significance test. So you are reasonably certain (within reason, i.e. with 95% probability) that the actual or population r>0. You do not know whether it is 0.33, 0.20 or 0.50. You have at most a confidence interval telling you that for samples of size N the actual coefficient will be within the interval (say, from 0.13 to 0.53) for 95% of the possible samples (with 2.5% probability of having a population r<0.13 and 2.5% probability of having a population r>0.53).
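A confidence interval of this kind is commonly built with the Fisher z transform; a sketch (the sample size of 90 is an illustrative assumption, chosen to give an interval in the neighborhood of the one above — the exact bounds depend on N):

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via the Fisher z transform."""
    z = math.atanh(r)                  # Fisher transform of r
    se = 1 / math.sqrt(n - 3)          # standard error on the z scale
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

lo, hi = r_confidence_interval(0.33, 90)
print(round(lo, 2), round(hi, 2))  # roughly (0.13, 0.50) for this illustrative n
```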
So much for significance. Now, is that result meaningful? It depends on what you want to do with it. Suppose you are seeking the causes of math performance. An r=0.33 implies R2=0.1089, indicating that less than 11% of the residual variance in math performance (once you control for other factors) is explained by parental education. Nearly 90% of that residual is yet to be explained by other factors. Perhaps you feel unhappy with this, or perhaps you feel you have progressed a small inch towards explaining away the whole remainder of variance that was bothering you. Suppose now you'd like to use parental education as a predictor of math performance. With r=0.33 this seems a bit presumptuous: parental education explains only 11% of the remaining differences in math performance and leaves 89% unexplained, so you'd be likely to err more often than not. You'd be more comfortable in your role of forecaster with r=0.90 or something like that. So the question of meaningfulness is entirely dependent on the nature of the problem, the exactness of the theory behind your analysis, and other similar issues.
Another consideration is the theoretical background for the correlation observed. In the example above there is such a background: one can easily imagine parental education level influencing child school performance in many ways. But suppose the correlation is between two less obviously related things, such as math performance and blood group. Is THIS meaningful? No, unless you provide some theoretical background. For instance, certain blood groups are more abundant in certain ethnic groups, and certain ethnic groups have different parental education levels, so it might be that blood group is acting as a proxy for parental education level or ethnic group (this should have to be investigated, not merely thought of). If you are not able to come out with some theory explaining the correlation, then you may scratch your head, put the result aside as a curiosum, and leave it alone for the time being: it is not meaningful, not to you at this time at any rate.
Hope this helps.

Hector

Humphrey Paulie <[hidden email]> wrote:


Re: Statistical significance Vs Meaningfulness

Richard Ristow
In reply to this post by Handel, Richard W.
I'm following up this posting because it's relevant to questions
raised before, specifically in thread 'Tests of "significance"'. Bob
Schacht began that thread by asking,

>>The statistician says, "At the .05 level, the Null Hypothesis is
>>rejected." To the man on the street, this is just pedantic
>>mumbo-jumbo. How can I translate that into plain English that the
>>proverbial man on the street can understand?

To the statement that "If the result is statistically significant, it
means that this [probably] did not happen by chance," Handel, Richard
W. responded (01:13 PM 5/22/2008) "My understanding is that this
interpretation of 'chance' findings below is not entirely accurate."

Which it isn't. We'd like to say "this [probably] did not happen by
chance", but the significant result doesn't prove, or even assert,
any such thing. "Probably did not happen by chance" is a Bayesian
statement. In Fisherian or frequentist statistics (that's what
significance testing is), it can't even be addressed, let alone proven.

In Bayesian circumstances (where there's an *a priori* probability
that an effect exists), statistics can address whether a result
"probably happened by chance." But that probability can easily be
much larger than the p-value from the test.

Which gets back to,

>How can I translate "At the .05 level, the Null Hypothesis is
>rejected." into plain English?

I don't think you can. It is not a plain statement, and it's not a
satisfactory one. Nobody cares about "probability of getting the
research results when it is first assumed that it is actually true
that chance caused the results" (as quoted by Richard Handel). They
want to know whether THIS result in THIS instance may be trusted.

We make many decisions, for example in clinical trials, based on
significance tests. (Or on confidence intervals, the same logic
applied to measurements instead of just "is there a difference?".)
That is drawing a Bayesian conclusion from a Fisherian measurement.
But to use statistics for decision-making at all, I see no alternative.

Isn't it nice to know that our discipline rests on quicksand?


Re: Statistical significance Vs Meaningfulness

Bob Schacht-3
At 07:31 AM 5/23/2008, Richard Ristow wrote:
>I'm following up this posting because it's relevant to questions raised
>before, specifically in thread 'Tests of "significance"'.

Richard,
Thanks for following up on this.

>Bob Schacht began that thread by asking,
>
>>>The statistician says, "At the .05 level, the Null Hypothesis is
>>>rejected." To the man on the street, this is just pedantic mumbo-jumbo.
>>>How can I translate that into plain English that the proverbial man on
>>>the street can understand?
>
>To the statement that "If the result is statistically significant, it
>means that this [probably] did not happen by chance," Handel, Richard W.
>responded (01:13 PM 5/22/2008) "My understanding is that this
>interpretation of 'chance' findings below is not entirely accurate."
>
>Which it isn't. We'd like to say "this [probably] did not happen by
>chance", but the significant result doesn't prove, or even assert, any
>such thing. . . .

On the previous thread, what I recall is that no one came up with an answer
that was entirely satisfactory. I attempted a solution on 4/8 as follows:

>>Statistically significant results.
>
>"For this question, there is at least a 95% chance that participant
>satisfaction and employment outcome are correlated."

I was attempting to use the word "correlated" here in the general popular
sense, not the precise statistical sense.
But Hector objected later the same day anyway, pointing out that
>1. The original question was not about correlation but about chi square,
>which concerns the difference between observed frequencies and those
>expected in case of randomness or independence.
>2. Even in the case of evaluating the significance of a correlation, the
>question of significance is not about the existence of correlation, but
>whether you (based on the correlation observed in a sample of a certain
>size) can infer --with a given degree of confidence-- that some nonzero
>correlation exists in the population.

So I am not surprised that your answer to my question seems to be,

>I don't think you can. It is not a plain statement, and it's not a
>satisfactory one. Nobody cares about "probability of getting the research
>results when it is first assumed that it is actually true that chance
>caused the results" (as quoted by Richard Handel). They want to know
>whether THIS result in THIS instance may be trusted.

I am willing to grant your assessment that no "plain English" translation
is adequate. Nevertheless, my question is not so easily dismissed, because
there is a frequent demonstrated need for such translations. So, let me
re-frame my question.
What plain English translation is least bad?

Suppose that I changed my "plain English" translation to the following
(substituting "associated" for "correlated"):

>Statistically significant results.
>
>"For this question, there is at least a 95% chance that participant
>satisfaction and employment outcome are associated."

or maybe
"For this question, there is at least a 95% chance that participant
satisfaction and employment outcome are related in some way, directly or
indirectly."


Can you, or anyone else, propose a summary statement in plain English that
is less bad than this?

The basic question will not go away just because statisticians find it
difficult to deal with.

Thanks,
Bob Schacht


Re: Statistical significance Vs Meaningfulness

Richard Ristow
Bob Schacht had asked,

>>The statistician says, "At the .05 level, the Null Hypothesis is
>>rejected." How can I translate that into plain English that the
>>proverbial man on the street can understand?

I wrote,
>I don't think you can. It is not a plain [concept].

At 07:23 PM 5/23/2008, Bob responded:
>I am willing to grant your assessment that no "plain English"
>translation is adequate. Nevertheless, there is a frequent
>demonstrated need for such translations. The question will not go
>away just because statisticians find it difficult.

No, it won't go away. Unfortunately, it also has no answer. The
problem isn't language, that I can see. "Statistical significance" is
a difficult, counter-intuitive concept, not difficult language.

The plainest statement I can think of is made *before* an experiment or study:

"If there is actually no relationship, there is no more than a 5%
probability that the results will be statistically significant at the
.05 level."

(I write "there is a relationship" for "the null hypothesis is false."
That, or something similar, is often a legitimate synonym.)

That's the statement. Unfortunately, it's not what's wanted. The
question is, what do you conclude *after* you've observed statistical
significance at the .05 level?

Now, Bob wrote that "I attempted a solution on 4/8 as follows":

>"For this question, there is at least a 95% chance that there is a
>relationship."

That is plain English, stating something plain and useful.
Unfortunately, it's also wrong.
First, as I said, it's a Bayesian statement. In Pearson statistics,
with hypothesis testing, it isn't even wrong; it's meaningless. (I
wrote "Fisherian statistics" the other day, but I checked origins and
it looks like "Pearson statistics" is closer.)
Second, if you do Bayesian thinking with Pearson methods (which
isn't wrong), the 'probability that the null hypothesis is false'
depends on two additional parameters: the prior probability that the
null hypothesis is true; and the 'effect size', as measured by the
probability of significance at the .05 level if there is a
relationship (the null hypothesis is false).

Take an example: Run an artificial 'experiment', with randomly
generated data, for two groups that have the same mean. Then, you
have a 1/20 chance of observing significance at the .05 level; but in
that case, the probability that there's a real difference in means is still 0.
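That artificial 'experiment' is easy to simulate: draw two groups from the same distribution many times and count how often the t statistic clears the .05 cutoff. A sketch (group size, replication count, and seed are arbitrary choices):

```python
import math
import random

random.seed(1)

def t_stat(a, b):
    """Two-sample t statistic (equal-variance form)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

reps, hits = 4000, 0
for _ in range(reps):
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]  # same mean: the null is true
    if abs(t_stat(a, b)) > 2.024:                # two-sided .05 critical value, df = 38
        hits += 1
print(hits / reps)  # close to 0.05, even though the true mean difference is exactly 0
```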
=========================
So the question has no satisfactory answer.

But there's another question that really won't go away:  What do you
conclude *after* the study or experiment shows statistical
significance at the .05 (or other) level?

I've argued something very harsh: Pearson statistical methods, in
pure form, give no guidance at all.

But we do use Pearson results to make decisions; we have no choice.
Run a clinical trial, and at the end you must adopt the new
treatment, or stay with the old.

I think we make such decisions by back-door Bayesian thinking. That's
not wholly unreasonable. With rough (usually unstated) estimates of
the prior probabilities, plus power analysis, plus the Pearson
results, you can draw such conclusions as Bob (and the rest of us) would like.

For example, writing "there is no effect" for "the null
hypothesis is true", and "there is an effect" for "the null
hypothesis is false":

If the prior probability that there is an effect is 0.5; and the
statistical power (probability of significance, when there is an
effect) is 0.5, then if the study shows significance at the .05
level, there's a 91% (not 95%) chance that there is an effect.
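The 91% figure is just Bayes' rule applied to those numbers; a sketch of the arithmetic:

```python
def posterior_prob_effect(prior, power, alpha):
    """P(effect | significant result), by Bayes' rule."""
    p_sig = prior * power + (1 - prior) * alpha  # total probability of significance
    return prior * power / p_sig

print(round(posterior_prob_effect(prior=0.5, power=0.5, alpha=0.05), 2))  # 0.91
```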

And I think that, without saying so, that's what we're really doing.

(Did I say "quicksand"?)


Re: Statistical significance Vs Meaningfulness

Sean McKenzie
I tried to respond to this the other day, but my internet went down and I lost the email.  Let me try again.
 
Every few months someone asks your question in this forum.  It is one of "the" questions.
 
The name Tukey frequently gets mentioned here:
 
From the wiki on Tukey:
 
- A D Gordon offered the following summary of Tukey's principles for statistical practice:
"... the usefulness and limitation of mathematical statistics; the importance of having methods of statistical analysis that are robust to violations of the assumptions underlying their use; the need to amass experience of the behaviour of specific methods of analysis in order to provide guidance on their use; the importance of allowing the possibility of data's influencing the choice of method by which they are analysed; the need for statisticians to reject the role of 'guardian of proven truth', and to resist attempts to provide once-for-all solutions and tidy over-unifications of the subject; the iterative nature of data analysis; implications of the increasing power, availability and cheapness of computing facilities; the training of statisticians."
- "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." J. W. Tukey (1962, page 13), "The future of data analysis". Annals of Mathematical Statistics 33(1), pp. 1-67.
- "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." J. W. Tukey (1986), "Sunset salvo". The American Statistician 40(1). Online at http://www.jstor.org/pss/2683137
 
In particular, "... the usefulness and limitation of mathematical statistics".
 
Unfortunately, much of what we say in statistics/mathematics means something completely different from what other people think it means.
 
Take Real and Imaginary numbers: both are abstract concepts.  I am an Economist by background, and there are Real versus Nominal variables.  Even something basic, like Positive/Negative, is arbitrary in mathematics, but people think positive is good and negative is bad; so in the end, when we are promoting something, we set up our equations to make the results positive, so that our audience is more receptive to what we say, and yet, at least supposedly, we know better than that.
 
In the end, even we ourselves are confused.  I don't know how many times I've had the discussion that if we use the term budget balance, then we represent deficits as negative; but if we use the term deficit, it already means something negative, and so we should represent a deficit as positive when we say deficit and there is a deficit... I don't really understand that myself.
 
Despite all the advances that have been made since the rise of the computer, a fundamental issue in forecasting economic variables remains the same: "most econometric models do well at forecasting in-sample data, but they don't do so well forecasting out-of-sample data".  First of all, the above statement ignores the phenomenon of "publication bias", although my overwhelming expectation has been that both the people who said it and the people who heard it understood that.
 
When we estimate a model based on historical data, we arrive at assorted coefficients, and point and interval estimates.  And so we are 99% sure that GDP will grow 3.5% +/- 0.2%, because using this particular model on this particular set of data, that's what the results were.  However, apart from the fact that the data in the future may not behave the same way as the data in the past, there are many different models and many different data sets.  If you use the coefficients you generated for the US on French data, you may get very bad results, and even if you re-estimate the model for France you could still get very bad results.  Notice I am using the word bad, which is really inappropriate.
 
"We are 99% sure that GDP will grow 3.5% +/- 0.2%" is a statement that can't really be made if "99% sure" is to be interpreted in layman's terms.  We don't really know, in any sense, that the probability of GDP growth being 3.5% +/- 0.2% is 99%.  It is an arbitrary and stylised representation of something, and has a very limited meaning.  Just like positive versus negative, or real versus imaginary, it has a very specific meaning.
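Read in strict frequentist terms, that 99% is a statement about long-run coverage under the assumed model, not about any one published interval.  A toy simulation (my own illustration, not from the thread; all numbers are arbitrary):

```python
import random
import statistics

# Toy illustration (mine, not from the thread): the "99%" attached to an
# interval estimate is a long-run frequency under the assumed model, not the
# probability that one particular published interval is right.  Here the
# model IS exactly right by construction, so coverage comes out near nominal.
random.seed(1)
true_mean, sigma, n = 3.5, 0.5, 100   # hypothetical "GDP growth" process
z99 = 2.576                           # approximate 99% normal critical value
trials = 2000
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    half = z99 * sigma / n ** 0.5     # half-width of the 99% interval
    covered += (m - half) <= true_mean <= (m + half)
print(covered / trials)
```

With real economic data the model is never exactly right, which is the whole point: the nominal 99% need not be the actual coverage.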
 
When you work with artificially generated data, with 10,000 columns each 10,000 rows long, and the random process that generated the data is known, there is an overwhelming tendency for our reported statistical results to hold true; but the real world is much more complicated.  We frequently have little data, and if the true process by which the data was generated were already known, we wouldn't have jobs.
 
These things work because you in fact know the correct model to apply and you use it.  In the real world we do not actually know the correct model, and if we did, we wouldn't need statistics/econometrics.
 
That's actually my second attempt at answering your question, after having seen other people's responses.
 
My first reaction was to mention something I had mentioned before in answering similar questions.
 
Frequently, in Economics at least, we say: there's significance, and then there is significance.  Sometimes three times: there's significance, and then there is significance, and then there is significance.
 
The example I usually use is wealth effects on GDP growth, i.e., if the stock market rises 20%, does this induce people to consume more and raise GDP growth?  This was a hot topic in the 1990s.
 
Typical results that people were reporting (I almost said "finding", once again forgetting publication bias) were like this:
 
1) The coefficient on wealth is significant.  That is, the t-stat indicates it is statistically significant, which in Economics typically means different from zero.  So in this sense, significant/meaningful.
 
2) The coefficient on wealth is "small".  A $1 trillion (10^12) rise in stock market capitalization raises GDP by $1 or 2 billion (10^9), for a variable whose magnitude is $10 trillion or so (10^13).  A pittance: 0.01% growth in a variable that typically grows 2% to 3%.  And so: significant but not meaningful.
 
3) The likelihood ratio test for the model with wealth versus the model without indicates the two models are indistinguishable.  Most typically this is some type of R^2-style calculation; in this case, since we were just adding an extra variable to existing models, a likelihood ratio test.  Once again, not significant/meaningful in this category.
 
The above was all of course based on the United States.
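The magnitudes in point 2) can be spelled out explicitly (an illustration of my own, using the round numbers from the post):

```python
# The round numbers from point 2), spelled out (illustrative only):
wealth_rise = 1e12    # $1 trillion rise in stock market capitalization
gdp_response = 1e9    # ~$1 billion induced rise in GDP
gdp_level = 1e13      # GDP on the order of $10 trillion
extra_growth = gdp_response / gdp_level
print(f"{extra_growth:.2%}")  # 0.01%
```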
 
In the more recent past I have read papers where they say the wealth effect is significant in categories 2) and 3).  One conjecture as to why results 10 years ago were different is that we simply appended wealth onto existing models, models which were "good" and typically already had high values of the relevant statistics, rather than developing models to showcase wealth effects; although of course the latter models are then biased, in that they were deliberately designed to showcase wealth effects.
 
All permutations of significant/not significant by categories 1), 2) and 3) are possible.
 
1) and 3) are typically somehow "objective" in some sense, in that they are statistics spewed out by our programs.
 
2) has a tendency to be application specific and "subjective".  In some other application a 0.01% change could be considered "large".  But the interpretations of 1) and 3) also vary from topic to topic, and across time.  If you were forecasting GDP in 1955 and generated R^2's of 0.5 that might be worth reporting, but if you get that in 2008 that would be uninteresting, unless of course you were using some variable/model which had rarely been used before, or had previously been believed to do nothing etc...
I'll say most of the times your question has been asked, its been answered the second way.
 
(It has and it was are contracted to its in Brooklyn.  I meant it has above.  Sorry but I have been away from my native land for so long I question my ability to speak my native language anymore.  I was tickled pink to see that I had done that.)
 

Converting string variables into numeric

Deepa Bhat
In reply to this post by Richard Ristow
Hi everyone,

I just transferred an Excel file to SPSS.
All the variables were transferred as string.
I would like to convert it to numeric.
For example, right now there are YES or NO in the column.

I would like to convert YES=1 and NO= 2.

If I first change the Variable Type from String to Numeric, I lose the
values.

How do I do all this using syntax? I tried the recode command and it
didn't change.

RECODE
  q3_1  (YES='1')  .
EXECUTE .

Thanks,
Deepa


Re: Converting string variables into numeric

Hector Maletta
You should remember that strings have a specified length, and should be
enclosed in quotes. I do not know the length of the string variable in your
example, but suppose it is three characters long. The sequence should be
something like this:
RECODE Q3_1 ('YES' = '  1') (' NO' = '  2').
RECODE Q3_1 (CONVERT) INTO Q3_1_NUM.
The first command recodes the character string 'YES' (all uppercase, not
including 'Yes' or 'yes') into the character string '[space][space]1', and
the character string '[SPACE]NO' (not including NO[SPACE] or any other
variants with lowercase letters) into the character string
'[space][space]2'. You should check that all existing variants of the
strings YES and NO are covered by the RECODE command. The second command
recodes these resulting strings into the numeric values 1 or 2. You could
also include several variables in the same RECODE command, as long as all
have the same strings that should be recoded into the same new strings.
Likewise, RECODE with the CONVERT keyword may include a variable list with
character strings involving only number symbols, to be converted into numeric variables.
Surviving strings that do not represent numbers would not be converted into
figures.
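For anyone following the logic outside SPSS, the same recode can be sketched in plain Python (my own illustration; the values are hypothetical, and unlike the strict RECODE above, this version normalizes case and whitespace before matching):

```python
# Hypothetical illustration of the recode logic outside SPSS: map the
# cleaned-up string values to numeric codes; unmatched values stay missing.
raw = ['YES', 'NO ', ' yes', 'NO', 'maybe']
mapping = {'YES': 1, 'NO': 2}
recoded = [mapping.get(v.strip().upper()) for v in raw]
print(recoded)  # [1, 2, 1, 2, None]
```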
Hector



Re: Converting string variables into numeric

Mahbub Khandoker
In reply to this post by Deepa Bhat
Hi Deepa,
Your syntax changes the original variable, which is a bit risky since you could lose data.
However, it works as:

RECODE q3_1 ('YES'='1') ('NO'='2').
EXECUTE.
Then you still have to change the variable type from string to numeric.

On the other hand, you can recode into a different variable as follows:

RECODE q3_1 (CONVERT) ('YES'=1) ('NO'=2) INTO q3_1_num.
EXECUTE.

You can also use AUTORECODE (note that by default it assigns codes in ascending sort order, so here NO would become 1 and YES 2):

AUTORECODE VARIABLES=q3_1
  /INTO q3_1_new
  /PRINT.

Cheers!
Mahbub

Mahbub Khandoker
Decision Support
Tel: 416 535 8501 Ex 6534


Re: Data Wizard problem

Mark A Davenport MADAVENP
In reply to this post by Sean McKenzie
All,
 
Sorry.  Found the problem.  Although the Wizard allows an empty column at the end of a dataset, it gives you a GET DATA error when you include one.  Why this is a GET DATA error I have no idea.
 

***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more than an exact answer to an approximate question.' --a paraphrase of J. W. Tukey (1962)