Hello,
I'm reaching out to this community because I'm trying to identify which customers have the best chance of acquiring a mortgage.
I initially used stepwise OLS regression in SPSS version 19 to select a model, scored the dataset with the regression equation, sorted the data in descending order of score, and grouped it into equal deciles. I then used these deciles to build a Gain Chart like the one shown below. However, the regression equation explained only about 14% (adjusted R², the coefficient of determination) of the variance in the dependent variable (i.e., First Time Mortgagees, Non Mortgagees). My goal is a predictive model that shows which variables best predict the outcomes of mortgage applications, for use within a Marketing Department for a credit union.
My former boss, who has 40 years of experience in Direct Marketing, told me that the amount of variance explained isn't important; what I should look for is a top decile that is more than 10 times the bottom decile.
My basic question is whether this is a viable approach, given how I was told to develop the model using OLS regression. The dependent variable was Mortgage Balance (currency). Would it be better to use Logistic Regression rather than Linear Regression, given the research question (which variables predict obtaining a mortgage versus being denied)?
Any insights are highly appreciated, because I'm new to Gain Charting and to ignoring R² when using regression analysis to develop models.
Thank you, Quentin Zavala SchoolsFirst Federal Credit Union 714-258-4000 ext 8601 qzavala[hidden email]
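For readers who have not built one, the decile procedure described in the post (score, sort descending, cut into ten equal groups, compare top and bottom) can be sketched roughly as follows. This is a plain-Python illustration rather than the SPSS workflow actually used; the function names and the synthetic data are made up for the example, and it assumes a 0/1 outcome, at least ten cases, and at least one responder.

```python
import random


def decile_gain_table(scores, outcomes):
    """Rank cases by score (highest first), cut them into 10 equal groups,
    and report the response rate and cumulative gain for each decile.
    Assumes at least 10 cases and at least one responder."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_responders = sum(y for _, y in ranked)
    table = []
    cumulative = 0
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        responders = sum(y for _, y in group)
        cumulative += responders
        table.append({
            "decile": d + 1,
            "n": len(group),
            "response_rate": responders / len(group),
            # Share of all responders captured in deciles 1..d+1.
            "cum_gain": cumulative / total_responders,
        })
    return table


def top_to_bottom_ratio(table):
    """Top-decile response rate divided by bottom-decile response rate."""
    top = table[0]["response_rate"]
    bottom = table[-1]["response_rate"]
    return float("inf") if bottom == 0 else top / bottom


if __name__ == "__main__":
    # Synthetic data only: the chance of a response rises with the score.
    rng = random.Random(42)
    scores = [rng.random() for _ in range(5000)]
    outcomes = [1 if rng.random() < 0.02 + 0.3 * s else 0 for s in scores]
    table = decile_gain_table(scores, outcomes)
    for row in table:
        print(row)
    print("top/bottom ratio:", top_to_bottom_ratio(table))
```

The ratio returned by `top_to_bottom_ratio` is the quantity the "10 times" rule of thumb refers to; the cumulative gain column is what a Gain Chart plots against decile.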
|
I'm new to Gain Charting, too, but I have a lot of experience with OLS regressions, both doing them and describing them. Here are some of my insights.
About the apparent results. An R^2 of 0.14 is pretty fair for a dichotomous outcome, though that sort of statement always depends on What is Possible or What is Useful. Using the top 10% versus the bottom 10% to judge the usefulness seems like a good approach -- especially if that is how it is going to be used. And it has long been my opinion that screening applications of this sort should probably focus on the extremes -- especially to exclude the "worst" before considering other criteria. It sounds like there ought to be quite a few examples available elsewhere against which to judge these results.
About the methodology. I always flinch when I see "stepwise" because of the problems inherent in those approaches. See http://www.stata.com/support/faqs/stat/stepwise.html
Your N of 200 000 eliminates the questions about the F-tests being invalid, but it does nothing about the questions of biases and collinearity. It is a good idea to use sub-samples in order to create replications, to show the validity. You might do cross-validation by repeating your methodology with 10 random sub-samples, each 1/10th of the original, and checking how each equation fits *outside* the sample it was derived from. That would be conventional and fairly convincing. However, in order to further reduce the chance of irrelevant biases, it could be wise to do some *non*-random sub-sampling.
- Does a formula created from one region of the country (say) replicate when applied to data from another region? ... and so on.
The Gain Chart looks useful, but your description does leave me wondering what you were regressing. It *seems* to me that you say you are trying to predict whether a mortgage was granted, but that you are using some other, continuous variable (amount) as the DV in a regression. That doesn't seem to be a problem if the Gain Chart is useful and intelligible, except that the eventual write-up should be clearer on what was done.
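The replication idea above (derive the equation on one tenth of the data, then see how it holds up on the cases outside that tenth) could look roughly like this. It is only a sketch under loud assumptions: a single predictor with a closed-form OLS fit stands in for the stepwise-selected model, the outcome is coded 0/1, and every function name here is hypothetical.

```python
import random


def fit_simple_ols(x, y):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def split_into_subsamples(n_cases, k=10, seed=0):
    """Shuffle case indices and deal them into k disjoint sub-samples."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def top_bottom_decile_rates(scores, outcomes):
    """Response rate in the highest-scored and lowest-scored tenth."""
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda p: p[0], reverse=True)]
    size = len(ranked) // 10
    return sum(ranked[:size]) / size, sum(ranked[-size:]) / size


def replicate(x, y, k=10, seed=0):
    """For each sub-sample: fit there, then score and evaluate only the
    cases outside it. Returns one (intercept, slope, top, bottom) per fit."""
    results = []
    for sub in split_into_subsamples(len(x), k, seed):
        inside = set(sub)
        intercept, slope = fit_simple_ols([x[i] for i in sub],
                                          [y[i] for i in sub])
        outside = [i for i in range(len(x)) if i not in inside]
        scores = [intercept + slope * x[i] for i in outside]
        top, bottom = top_bottom_decile_rates(scores, [y[i] for i in outside])
        results.append((intercept, slope, top, bottom))
    return results
```

If the ten fitted equations and their out-of-sample top and bottom decile rates look alike, that is some evidence the result replicates. For the non-random variant, the shuffled sub-samples would be replaced by groups defined by something like region, which is not shown here.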
-- Rich Ulrich
Date: Tue, 22 May 2012 21:54:30 +0000
From: [hidden email]
Subject: Re: Using SPSS-Linear Regression to develop Mortgage Models for a financial institution
To: [hidden email]