Key Drivers-Regression vs Random Forest


Mark Webb-5
I'm interested in ranking 9 domains/attributes using derived-importance methods.
I'm using the standardised beta from linear regression [SPSS] and the 2 importance measures offered by Random Forest [R].
I'm experimenting with Random Forest because I am getting wary of the high inter-domain correlations violating linear regression assumptions. The average inter-domain correlation is about 0.75.
My data consists of mean ratings [0-100%].
I used the same data and dependent variable for both methods.
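
For a rough sense of scale, here is a minimal sketch of the variance inflation factor (VIF) implied by that average correlation. It assumes, purely as an idealisation, that every pair of the 9 domains is correlated at exactly 0.75; the real correlation matrix is not shown in this post, so actual VIFs will differ by domain. The function name `equicorrelated_vif` is made up for illustration.

```python
# Sketch only: VIF under an equicorrelation assumption (all pairwise r equal).
# Closed form from inverting R = (1 - r) * I + r * J for p predictors.

def equicorrelated_vif(r: float, p: int) -> float:
    """VIF shared by every predictor when all p predictors correlate at r."""
    if p < 2:
        raise ValueError("need at least two predictors")
    if not (-1 / (p - 1) < r < 1):
        raise ValueError("r is outside the range of a valid correlation matrix")
    return (1 + (p - 2) * r) / ((1 - r) * (1 + (p - 1) * r))


vif = equicorrelated_vif(0.75, 9)   # 9 domains, average r of about 0.75
r_squared = 1 - 1 / vif             # R^2 of one domain regressed on the other 8
print(round(vif, 2), round(r_squared, 2))   # prints: 3.57 0.72
```

Under that assumption each domain would have a VIF of about 3.6, which is below the commonly quoted rule-of-thumb cut-offs of 5 or 10. Because 0.75 is only an average, some domains may sit well above it, so the collinearity diagnostics on the real data are the thing to check.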

I get different rankings from each method. I would expect this, but I would like comments on the following:
# Would it make any sense to average the 3 rankings and use this as a "best" estimate?
# If averaging is not acceptable, which one is preferred?
# Are correlations of 0.75 high in this context, and should I be worried about regression assumptions?
# Of the 2 RF importance measures, which is the most commonly used / preferred?
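
As background to the last question: in R's randomForest package, %IncMSE is a permutation-based measure (how much prediction error rises when one predictor's values are shuffled), while IncNodePurity is the total reduction in residual sum of squares from splits on that predictor. The sketch below illustrates only the permutation idea, in plain Python. It is a simplification: the package works per tree on out-of-bag cases and can scale the result, none of which is reproduced here. The data and the `predict` function are made up for illustration and have nothing to do with the 9 domains.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error of predictions against observed values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, n_repeats=20, seed=0):
    """Mean rise in MSE after shuffling one column of X, model held fixed."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        X_perm = [row[:column] + [v] + row[column + 1:]
                  for row, v in zip(X, shuffled)]
        increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
    return sum(increases) / n_repeats


# Toy, made-up data: y depends strongly on column 0 and weakly on column 1.
toy_rng = random.Random(1)
X = [[toy_rng.random(), toy_rng.random()] for _ in range(200)]
y = [3 * a + 0.5 * b for a, b in X]


def predict(row):
    """Hypothetical stand-in for an already fitted model."""
    return 3 * row[0] + 0.5 * row[1]


imp0 = permutation_importance(predict, X, y, column=0)
imp1 = permutation_importance(predict, X, y, column=1)
print(imp0 > imp1)   # the strongly used column loses more accuracy when shuffled
```

With highly correlated predictors, shuffling one column breaks its link to the others as well as to the outcome, which is one reason permutation-based and impurity-based rankings can disagree.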

 

Domains                  Rank: Regression   Rank: RF-%IncMSE   Rank: RF-IncNodePurity
Dom1  Integrity                 5                  2                     5
Dom2  Service                   9                  6                     7
Dom3  SelfDevelopment           3                  9                     8
Dom4  SelfExpression            2                  1                     1
Dom5  Expertise                 8                  7                     9
Dom6  Teamwork                  6                  3                     3
Dom7  Enthusiasm                4                  8                     4
Dom8  Action                    1                  4                     2
Dom9  Stability                 7                  5                     6
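
To make the averaging question concrete, here is a minimal sketch that takes the ranks from the table above, measures how well each pair of methods agrees (Spearman's rho for untied ranks), and computes the mean rank per domain. Whether a simple mean is a sensible "best" estimate is exactly the open question; note that it gives the two RF measures twice the weight of the regression ranking. The helper names are made up for illustration.

```python
# Ranks copied from the table above, in the order
# (Regression, RF-%IncMSE, RF-IncNodePurity).
RANKS = {
    "Dom1": (5, 2, 5),
    "Dom2": (9, 6, 7),
    "Dom3": (3, 9, 8),
    "Dom4": (2, 1, 1),
    "Dom5": (8, 7, 9),
    "Dom6": (6, 3, 3),
    "Dom7": (4, 8, 4),
    "Dom8": (1, 4, 2),
    "Dom9": (7, 5, 6),
}
METHODS = ("Regression", "RF-%IncMSE", "RF-IncNodePurity")


def spearman_rho(a, b):
    """Spearman correlation between two untied rankings of the same items."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def mean_ranks(ranks):
    """Simple average of the ranks each domain received across methods."""
    return {dom: sum(r) / len(r) for dom, r in ranks.items()}


# One tuple of 9 ranks per method.
columns = list(zip(*RANKS.values()))

agreement = {
    (METHODS[i], METHODS[j]): spearman_rho(columns[i], columns[j])
    for i in range(len(METHODS))
    for j in range(i + 1, len(METHODS))
}
averaged = mean_ranks(RANKS)

for pair, rho in agreement.items():
    print(pair, round(rho, 2))
for dom, avg in sorted(averaged.items(), key=lambda kv: kv[1]):
    print(dom, round(avg, 2))
```

On these ranks the regression ordering agrees only weakly with %IncMSE (rho about 0.22) and moderately with IncNodePurity (about 0.65), while the two RF measures agree with each other at about 0.70. The mean rank puts Dom4 first and Dom8 second, with Dom1 and Dom6 tied behind them.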


--
Mark Webb

Line +27 (21) 786 4379
Cell +27 (72) 199 1000
Fax to email +27 (86) 5513075
Skype  webbmark
Email  [hidden email]