statistical validity of instrument

Juanito Talili
Dear Listers,

  We (as a team) are constructing a self-administered instrument (say, an instrument to measure the cognitive skills of athletes). Here are the steps in the development of the instrument.

  1. The indicators were identified based on the related literature plus practical decisions of the team.  After that, the suggestions and comments of experts (an independent group) were incorporated to improve the instrument.  We finally came up with 7 indicators at this stage.
  2. Then, pilot testing was done with 146 respondents.
  3. We conducted an exploratory factor analysis (EFA). The EFA extracted only one factor, which explained about 93% of the variability (a minimal syntax sketch appears after this list).
  4. A confirmatory factor analysis (CFA) was done using AMOS, and the data fit the one-factor model well (e.g., RMSEA < 0.05, the chi-square has a nonsignificant p-value, and AGFI and CFI are close to 1.0).  In addition, all 7 indicator loadings are statistically significant (p < .001).
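
  A minimal SPSS FACTOR sketch of this kind of one-factor EFA (ind1 to
  ind7 are placeholder names for the 7 indicators, not the actual
  variable names):

  FACTOR
    /VARIABLES ind1 ind2 ind3 ind4 ind5 ind6 ind7
    /MISSING LISTWISE
    /PRINT INITIAL EXTRACTION
    /CRITERIA MINEIGEN(1)
    /EXTRACTION PAF
    /ROTATION NOROTATE.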

  The question here is: Were we on the right track in testing the statistical validity of our instrument? Based on the statistical results, is it correct to say that the instrument is statistically valid?  Your thoughts are very important.

  Thank you.
  Juanito




Re: statistical validity of instrument

Art Kendall
Validity is the degree to which a measure corresponds to the construct.
Face validity is a matter of expert opinion.  Two ways to look at
validity are 1) to see how well the measure correlates with established
measures of the same construct, and 2) to see whether it correlates as
predicted with things it should, such as standardized tests of
achievement.
Google ["test validity" definition] and ["test validity" "cognitive skills"].

Reliability is the degree to which a measure consistently measures
whatever it purports to measure.
Use the RELIABILITY procedure to look at this and other aspects of item
analysis. Alpha is KR-20 when items are dichotomous.
Google ["test reliability" definition] .

Of course a measure can be no more valid than it is reliable.  In other
words, reliability is the upper limit of validity.

- - - -
In a nutshell:
If this is a study limited to your tiny sample, i.e., if you intend it
to be used in one or a few studies on a very similar population, you
are probably OK.

If you intend to publish this as a standardized test, you have to do a
*lot* more work.

----
Some things to think about:
How does the pilot study group relate to the population you are
interested in?  Is it a sample?  A convenience group?
The term "athletes" can mean many things.  If you intend to use the
measure in the long run to talk about a population of, say, Olympic
javelin throwers, then 146 is substantial *if* it is a sample.
If you mean participants in middle school intramurals, then 146 is
close to useless.

When you say "indicators," do you mean dichotomous items, or do you
mean summative scales, each made up of a set of dichotomous items?


Why aren't existing measures of cognitive skills adequate for your work?
If you are talking about children from middle school up to grad school
applicants, there is a major industry of standardized tests.
If you are talking about I/O applications, there is another wide array
of existing tests.

Is this an exercise to familiarize you with ideas about test
construction?  Do you intend this measure to be used only in general
research or do you intend it as input to decisions about selection?

Art Kendall
Social Research Consultants



Re: statistical validity of instrument

Swank, Paul R
I have a somewhat different take. Face validity is when the test appears
valid to the examinee. Content validity is based on expert opinion. But
most importantly, validity is the accumulation of evidence that supports
the validity of a scale. There is no one piece of evidence which
guarantees a test is valid. It may establish the validity of a scale for
a specific purpose in a specific population. In your case, I would scrap
the EFA. If you expect a single factor, then CFA is the way to go.
However, factorial validity is relatively weak as evidence compared,
say, to criterion-related validity. That is especially so in your case,
given that you have poor power to distinguish an RMSEA of .05 from .08
(power = .27) or to distinguish .05 from .02 (power = .15). See
MacCallum, R., Browne, M., & Sugawara, H. (1996). Power
analysis and determination of sample size for covariance structure
modeling. Psychological Methods, 1, 130-149. Thus, I agree with Art that
you should be cautious in your statements about validity. You have a
start but it will take more evidence than this to convince many people
that your scale has the necessary validity for use.
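
A rough SPSS sketch of this kind of RMSEA power calculation, along the
lines of MacCallum et al. (assuming df = 14 for a one-factor model with
7 indicators, N = 146, alpha = .05, and the .05 vs. .08 comparison; all
of these are placeholder inputs, not the exact values in the paper):

DATA LIST FREE / dummy.
BEGIN DATA
1
END DATA.
* Placeholder inputs; adjust df, N, RMSEA values, and alpha as needed.
COMPUTE df    = 14.
COMPUTE nobs  = 146.
COMPUTE e0    = 0.05.
COMPUTE e1    = 0.08.
COMPUTE alpha = 0.05.
COMPUTE nc0   = (nobs - 1) * df * e0**2.
COMPUTE nc1   = (nobs - 1) * df * e1**2.
* Bisection for the critical value c with P(chi2 > c | df, nc0) = alpha.
COMPUTE lo = 0.
COMPUTE hi = 1000.
LOOP #i = 1 TO 60.
  COMPUTE mid = (lo + hi) / 2.
  DO IF NCDF.CHISQ(mid, df, nc0) < 1 - alpha.
    COMPUTE lo = mid.
  ELSE.
    COMPUTE hi = mid.
  END IF.
END LOOP.
COMPUTE crit  = (lo + hi) / 2.
COMPUTE power = 1 - NCDF.CHISQ(crit, df, nc1).
EXECUTE.
LIST VARIABLES=crit power.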

Paul R. Swank, Ph.D.
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center - Houston

