Appropriate statistical analyses

Appropriate statistical analyses

Mehdi Riazi
Dear all,
Wishing you all a healthy 2012. I would appreciate your comments on appropriate analyses for the following data set.
270 written scripts by test takers from three L1 backgrounds (A, B, and C) were rated by trained and experienced raters at levels 5, 6, and 7, as presented in the following table.

 

         L1
Level    A     B     C
5        30    30    30
6        30    30    30
7        30    30    30

To check the validity of the ratings, we've analysed the scripts using some textual features such as Word Frequency (WF), Syntactic Complexity (SC), and Cohesion. We now want to find out:
1) Is there a significant difference between the scripts belonging to levels 5, 6, and 7 in terms of any of the three text features (WF, SC, and Cohesion)?
2) Is there a significant difference between the scripts at the same level (e.g., 5) across the three L1s (A, B, and C) in terms of the three text features?
3) Is there a significant difference between the scripts at the three levels (5, 6, 7) for each L1 (A, B, and C)?
 
Any suggestions for appropriate statistical analyses to answer these RQs are much appreciated.
Regards,
Mehdi

--
Mehdi Riazi, PhD
Associate Professor & Director HDR
Department of Linguistics
Faculty of Human Sciences
Macquarie University NSW 2109

Room: C5A 575
Phone: +61(02) 98507951
Fax: +61(02) 98509199
www.ling.mq.edu.au


Re: Appropriate statistical analyses

statisticsdoc

Mehdi,

 

Assuming that your dependent variables are multivariate normal, you may want to carry out a Multivariate Analysis of Variance (MANOVA) to test the significance of Level, L1, and the two-way Level by L1 interaction. If the multivariate effects are significant, you could then follow up with univariate ANOVAs for each dependent variable, and then with post-hoc tests to determine which levels of the independent variables differ significantly. There are a couple of caveats to this advice:

 

1. I am a little puzzled as to how Word Frequency is measured as a single rating. If you have multiple ratings for specific words, or classes of words, then the assumption of normality might not apply to your raw data. You may need to transform frequency count data (e.g., by a log transform). If you are maintaining frequency counts of multiple words or classes of words, then you might have multiple Word Frequency variables, and hence too many dependent variables to carry out a MANOVA with adequate statistical power.

2. It has been argued that it is not strictly necessary to obtain a significant MANOVA in order to carry out the univariate ANOVAs, although this practice is still quite common. A critical issue in running ANOVAs with multiple dependent variables is the risk of spuriously rejecting the null hypothesis in the process of carrying out multiple tests. The more appropriate way to address this issue is to use a more stringent alpha level on the univariate ANOVAs, rather than requiring a significant MANOVA.
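
As a rough illustration of the stricter-alpha idea in caveat 2, here is a minimal Python sketch (plain Python rather than SPSS syntax, so it can be run anywhere). It assumes a Bonferroni correction, which is only one way of choosing a more stringent alpha, and it assumes three univariate ANOVAs, one per dependent variable. The function names and the p-values are hypothetical placeholders, not results from any data.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha that keeps the familywise error rate at or below family_alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def significant_after_bonferroni(p_values, family_alpha=0.05):
    """Map each test name to True if its p-value is below the adjusted alpha."""
    per_test = bonferroni_alpha(family_alpha, len(p_values))
    return {name: p < per_test for name, p in p_values.items()}


# Placeholder p-values, invented purely for illustration (not from any data).
example_p = {"WF": 0.030, "SC": 0.004, "Cohesion": 0.200}

per_test_alpha = bonferroni_alpha(0.05, len(example_p))
decisions = significant_after_bonferroni(example_p)

print(f"per-test alpha = {per_test_alpha:.4f}")
print(decisions)
```

With three dependent variables, a familywise alpha of .05 becomes .05 / 3 per test, so an effect with p = .03 would no longer count as significant under this rule.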

 

Perhaps it would be helpful to provide more information on the Word Frequency variable.

 

Best Regards,

 

Stephen Brand, Ph.D. 

 www.StatisticsDoc.com

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mehdi Riazi
Sent: Saturday, December 31, 2011 3:05 AM
To: [hidden email]
Subject: Appropriate statistical analyses

 


 


Fwd: Appropriate statistical analyses

Mehdi Riazi
I thought I would resend my response to Stephen to the whole list.
Cheers,
Mehdi

---------- Forwarded message ----------
From: Mehdi Riazi <[hidden email]>
Date: Sun, Jan 1, 2012 at 12:06 PM
Subject: Re: Appropriate statistical analyses
To: StatisticsDoc <[hidden email]>


Dear Stephen,
 
Thank you for your useful comments. Here are some more elaborations:
 
1) Given the proximity of the three levels (5, 6, and 7), I'm afraid the dependent variables (WF, SC, and Cohesion) are limited in range and perhaps lack normality. The skewness and kurtosis indices are 0.223 and -0.178 for WF, 0.955 and 1.53 for SC, and 0.267 and -0.214 for Cohesion.
Q: With these skewness and kurtosis indices, can we use MANOVA? If so, do we include all three dependent variables in the model at the same time?
 
2) You're right. We've used a log transform to calculate the word frequency for each text.
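
One rough way to look at the skewness and kurtosis indices in point 1 is to divide each by its standard error. The Python sketch below uses the commonly used standard-error formulas for sample skewness and excess kurtosis. It assumes the indices were computed on all 270 scripts pooled together (the post does not say), and the function names are made up for illustration. Note that MANOVA's normality assumption concerns the residuals within each L1 by Level cell, so pooled indices are only a first look, and with n = 270 even modest departures from normality can exceed the usual 1.96 cut-off.

```python
import math


def skewness_se(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def kurtosis_se(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2.0 * skewness_se(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


# (skewness, kurtosis) pairs as quoted in the post.
indices = {"WF": (0.223, -0.178), "SC": (0.955, 1.53), "Cohesion": (0.267, -0.214)}

n = 270  # assumption: indices were computed on the pooled sample of 270 scripts
se_skew = skewness_se(n)
se_kurt = kurtosis_se(n)

# Ratio of each index to its standard error, per variable.
z_ratios = {
    name: (skew / se_skew, kurt / se_kurt)
    for name, (skew, kurt) in indices.items()
}

for name, (z_skew, z_kurt) in z_ratios.items():
    print(f"{name:<9} z(skewness) = {z_skew:6.2f}   z(kurtosis) = {z_kurt:6.2f}")
```

Under these assumptions, only SC stands out on both indices; WF and Cohesion stay below the 1.96 cut-off.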
 
Kind regards,
Mehdi

On Sun, Jan 1, 2012 at 1:12 AM, StatisticsDoc <[hidden email]> wrote:


 








Re: Appropriate statistical analyses

Maguin, Eugene

Mehdi,

 

I think we need more information about the design because I’m not sure whether L1 and Level are between or within factors. The questions would be: 1) Was there one rater or multiple raters? 2) If multiple raters, how were raters assigned to scripts (or scripts to raters)? For instance, did all raters rate all scripts, or did each rater rate non-overlapping sets of scripts? 3) If multiple raters rated each script, what happened to the multiple ratings? Were they averaged, for instance? 4) Were there 9 sets of 30 test takers such that each set of 30 test takers corresponds to a different combination of L1 and Level?

 

Turning to the DVs: are Word Frequency, Syntactic Complexity, and Cohesion continuous enough to be treated as continuous, or are they ordinal categorical?

 

Gene Maguin

 


 


Re: Appropriate statistical analyses

Mehdi Riazi
Dear Gene,
The raters are experienced trained raters who do the ratings of the scripts as a professional job. They are from different L1 backgrounds, but have been trained to use the same rating scale. Their consistency of rating is checked regularly through workshops. The scripts of A (L1) test takers are therefore marked by raters from A background and so forth for B and C L1s. Each script is rated by one rater only and the mark is final, and each rater marks multiple scripts. Test takers' scripts are therefore marked in terms of 3 levels of performance: High (3), Med. (2), and Low (1).
 
There's a claim that this system of rating is consistent and high indices of intra- and inter-rater reliability are reported.
Now, we want to check this claim by using other criteria. The criteria we are using are WF, SC, and Cohesion which are obtained by doing text analysis using a computer program (Coh-Metrix). They are continuous though with limited range.
 
From each L1 background (A, B, C) and from each level (3, 2, 1) we've collected 30 scripts, and we want to check whether:
1) scripts rated at different levels (3, 2, 1) differ in terms of textual features (WF, SC, Cohesion);
2) scripts rated at each level (e.g., 1) are the same across the three L1 backgrounds in terms of the DVs (WF, SC, Cohesion);
3) scripts rated at different levels (3, 2, 1) differ in terms of the DVs for each L1.
 
I've been thinking of discriminant analysis, but would like to know if list members can agree on this or suggest alternative analyses.
 
I hope this provides enough information, but please let me know if further elaboration is needed.
Best,
Mehdi


On Wed, Jan 4, 2012 at 3:02 AM, Gene Maguin <[hidden email]> wrote:



Re: Appropriate statistical analyses

Maguin, Eugene

Mehdi,

 

I know you are waiting for answers but I still don’t quite understand some elements. I’d like to go back to what you wrote in your first message.

270 written scripts by test takers across three L1 backgrounds: A, B, & C are rated by trained and experienced raters at levels of 5, 6 & 7 as presented in the following table.

 

         L1
Level    A     B     C
5        30    30    30
6        30    30    30
7        30    30    30

 

Based on your last message, we know, I hope, that you have three sets of raters (A, B, C), with each set being from a different background but all well trained on the same rating scale, etc. Here’s what I’m confused about:

1) Are the test takers from three different backgrounds, or are the raters from three different backgrounds, or are ‘test takers’ and ‘raters’ the same people?

2) The original message referred to levels 5, 6, and 7, but your reply referred to levels 1, 2, and 3. Could these be the same thing?

3) How many group A, group B, and group C raters are there? I get the impression that there are multiple people in each group, not just one.

 

I wonder if the problem could be stated correctly this way. 270 people each wrote a script, and by an unspecified procedure the 270 scripts were divided into three sets of 90. Each set of 90 scripts was given to one of three groups of raters, with each group having an unspecified number of member-raters. Each script was rated by one rater, who assigned it a score of 1, 2, or 3 (or maybe 5, 6, or 7). So you have these other measures (WF, SC, and Cohesion) and you want to see whether the differently rated scripts, or scripts rated by different sets of raters, differ on these other measures. I know there are people on the list who are better at ANOVA than I am but, right now, it seems to me that you have a design with two between-subjects factors (rater-group by script-score). You have multiple measures, so why not use a MANOVA design?

GLM WF, SC, Cohesion by rater level/ . . .
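
To make the two-between-factor layout concrete, here is a minimal Python sketch (plain Python rather than SPSS syntax, so it can be run anywhere) of the balanced two-way decomposition for a single dependent variable. It is only the univariate building block, not the multivariate test itself, and it stops at F ratios because the Python standard library has no F distribution for p-values. The scores are randomly generated placeholders, and the function name two_way_anova and the factor labels are assumptions made for illustration.

```python
import random
from itertools import product
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-way between-subjects ANOVA for one dependent variable.

    cells maps (l1, level) -> list of scores, with the same n in every cell.
    Returns {effect: (sum of squares, df, F)}; F is None for the error row.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))

    grand = fmean(x for scores in cells.values() for x in scores)
    a_mean = {a: fmean(x for b in b_levels for x in cells[a, b]) for a in a_levels}
    b_mean = {b: fmean(x for a in a_levels for x in cells[a, b]) for b in b_levels}
    cell_mean = {key: fmean(scores) for key, scores in cells.items()}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[a, b] - a_mean[a] - b_mean[b] + grand) ** 2
        for a, b in product(a_levels, b_levels)
    )
    ss_err = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "L1": (ss_a, df_a, ss_a / df_a / ms_err),
        "Level": (ss_b, df_b, ss_b / df_b / ms_err),
        "L1*Level": (ss_ab, df_ab, ss_ab / df_ab / ms_err),
        "Error": (ss_err, df_err, None),
    }


# Synthetic placeholder scores: 3 L1 groups x 3 levels x 30 scripts per cell.
rng = random.Random(0)
cells = {
    (l1, level): [rng.gauss(level, 1.0) for _ in range(30)]
    for l1, level in product("ABC", (5, 6, 7))
}

table = two_way_anova(cells)
for effect, (ss, df, f) in table.items():
    f_text = "" if f is None else f"  F = {f:.2f}"
    print(f"{effect:<9} SS = {ss:9.2f}  df = {df:3d}{f_text}")
```

With 30 scripts in each of the 9 cells, the degrees of freedom are 2 for each main effect, 4 for the interaction, and 261 for error. The interaction term is the one that speaks to whether level differences depend on the L1 or rater group.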

 

If you agree that my restatement is correct, then I apologize for getting lost in the details.

 

Gene Maguin

 

 

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mehdi Riazi
Sent: Tuesday, January 03, 2012 7:16 PM
To: [hidden email]
Subject: Re: Appropriate statistical analyses

 
