
Re: SPSS OLS Regression to develop Mortgage Model for a financial institution

Posted by Rich Ulrich on May 25, 2012; 6:47am
URL: http://spssx-discussion.165.s1.nabble.com/Re-SPSS-OLS-Regression-to-develop-Mortgage-Model-for-a-financial-institution-tp5713357p5713362.html

One thing you did right was to create your Gain chart
as (mostly) scores projected from a different subsample,
so you aren't showing merely a "fit."
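A gain chart of this kind is just a cumulative table. Sort the scored cases from highest to lowest predicted value, cut them into equal-sized groups, and track what share of all actual mortgage holders has been reached so far. The sketch below uses Python/NumPy on simulated data; it is not an SPSS procedure. `gains_table` and the data are hypothetical, and the scores stand in for predictions made on cases that were not used to fit the equation.

```python
import numpy as np


def gains_table(scores, outcome, n_bins=10):
    """Cumulative gains: rank cases by score (highest first), split them into
    n_bins near-equal groups, and return the cumulative share of cases and
    the cumulative share of all positives (outcome == 1) reached so far."""
    scores = np.asarray(scores, dtype=float)
    outcome = np.asarray(outcome, dtype=int)
    order = np.argsort(-scores, kind="stable")       # highest score first
    groups = np.array_split(outcome[order], n_bins)  # near-equal-size bins
    cum_cases = np.cumsum([len(g) for g in groups]) / len(outcome)
    cum_positives = np.cumsum([g.sum() for g in groups]) / outcome.sum()
    return cum_cases, cum_positives


# Hypothetical holdout data: scores from an equation fit on other cases,
# and the observed 0/1 outcome for those same holdout cases.
rng = np.random.default_rng(0)
n = 5000
holdout_scores = rng.normal(size=n)
holdout_y = (rng.random(n) < 1 / (1 + np.exp(-(holdout_scores - 3)))).astype(int)

cum_cases, cum_positives = gains_table(holdout_scores, holdout_y)
for c, g in zip(cum_cases, cum_positives):
    print(f"top {c:5.0%} of scores -> {g:5.1%} of positives")
```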

Okay, you don't have states -- that was just a "for instance". 
You do have Counties.  You do have months or years of data.
When you select subsets this way, or any other way, there are
two questions to consider.  DO you consistently get similar Gain
charts?  - If any single selection fails, that is a warning that
other results are leaning on biases and artifacts; and that this
set of Predictions, though it may seem to fit your data today, is
apt to become obsolete pretty damn fast.  (Now, I am reminded
of the studies that show how quickly the best computerized Wall
Street stock-fund strategies deteriorate.  A fine new stock-picking
strategy works for a year or so.)
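One way to run this consistency check is to compute the same gain statistic separately inside each subset (county, or month/year of data) and compare the results. A minimal sketch, assuming simulated data with four hypothetical counties labelled A to D, where the outcome in county D is deliberately unrelated to the score. `top_decile_capture` and `capture_by_group` are illustrative names, and top-decile capture is just one convenient one-number summary of a gain chart.

```python
import numpy as np


def top_decile_capture(scores, outcome):
    """Share of all positives found among the top 10% of scores."""
    scores = np.asarray(scores, dtype=float)
    outcome = np.asarray(outcome, dtype=int)
    k = max(1, int(round(0.10 * len(scores))))
    top = np.argsort(-scores, kind="stable")[:k]
    return outcome[top].sum() / outcome.sum()


def capture_by_group(scores, outcome, group):
    """Top-decile capture computed separately inside each subset."""
    scores = np.asarray(scores, dtype=float)
    outcome = np.asarray(outcome, dtype=int)
    group = np.asarray(group)
    result = {}
    for g in np.unique(group):
        mask = group == g
        if outcome[mask].sum() == 0:
            result[str(g)] = float("nan")  # no positives: capture undefined
        else:
            result[str(g)] = top_decile_capture(scores[mask], outcome[mask])
    return result


rng = np.random.default_rng(1)
n = 20000
county = rng.choice(np.array(["A", "B", "C", "D"]), size=n)  # hypothetical counties
x = rng.normal(size=n)  # the model score
z = rng.normal(size=n)  # unrelated noise
# In hypothetical county "D" the outcome ignores the score entirely.
driver = np.where(county == "D", z, x)
y = (rng.random(n) < 1 / (1 + np.exp(-(2 * driver - 3)))).astype(int)

caps = capture_by_group(x, y, county)
for name, cap in caps.items():
    print(f"county {name}: top-decile capture {cap:.1%}")
```

A subset whose capture falls well below the others is the kind of single failed selection that warns the overall equation is leaning on artifacts.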

 - DO you get the same stepwise-selected variables?  - Probably not.
Forward or backward?  The best generality I've read is that if you
intend to keep most of the variables, start with them all in.  If you will
keep only a few, start with them out.
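The instability of stepwise selection is easy to see: rerun the same selection on random halves of the data and count how often each variable enters. This sketch implements plain forward selection with a partial-F entry criterion (F-to-enter of 3.84, roughly p = .05 in large samples). It is an assumption-laden stand-in for SPSS's Stepwise method, which also removes variables and uses its own entry and removal settings. The data are simulated: two strong predictors, one weak one, and five pure-noise columns.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_select(X, y, f_to_enter=3.84):
    """Greedy forward selection: at each step add the candidate with the largest
    partial F; stop when no candidate reaches f_to_enter. No removal step."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    current_rss = rss(X[:, np.array(chosen, dtype=int)], y)  # intercept-only model
    while remaining:
        best_j, best_f, best_rss = None, -np.inf, None
        for j in remaining:
            cols = np.array(chosen + [j], dtype=int)
            new_rss = rss(X[:, cols], y)
            df_resid = n - len(cols) - 1
            f = (current_rss - new_rss) / (new_rss / df_resid)
            if f > best_f:
                best_j, best_f, best_rss = j, f, new_rss
        if best_f < f_to_enter:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        current_rss = best_rss
    return chosen


rng = np.random.default_rng(2)
n, p = 2000, 8
X = rng.normal(size=(n, p))
# Hypothetical truth: columns 0 and 1 matter, column 2 barely, columns 3-7 not at all.
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(size=n)

counts = np.zeros(p, dtype=int)
for _ in range(20):
    half = rng.choice(n, size=n // 2, replace=False)  # a fresh random half each time
    for j in forward_select(X[half], y[half]):
        counts[j] += 1
print("times selected out of 20, by column:", counts)
```

The strong predictors enter every time; the weak and noise variables drift in and out from one half to the next.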

For my purposes, I've never wanted an equation that uses an
explicit "suppressor" variable -- If the variables are confounding
each other, I want to untangle the confounding.  That is: Find
the two (or more) variables that are collinear, and compute a
new composite score that is the difference or ratio (or whatever)
that plays a sensible role; and use that new variable as a predictor.
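The composite-score idea can be illustrated with two hypothetical measures that share most of their variance, where only their difference relates to the outcome. Entered together, one gets a positive weight and the other a negative one, the classic suppressor pattern. Replacing the pair with their difference gives a single predictor with one interpretable coefficient. Simulated data and hypothetical variable names only; whether a difference, ratio, or something else is "sensible" depends on the real variables.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients (intercept first) via least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


rng = np.random.default_rng(3)
n = 3000
common = rng.normal(size=n)
a = common + 0.2 * rng.normal(size=n)   # hypothetical measure A
b = common + 0.2 * rng.normal(size=n)   # hypothetical measure B, collinear with A
y = (a - b) + 0.5 * rng.normal(size=n)  # outcome driven by the contrast only

corr_ab = np.corrcoef(a, b)[0, 1]
beta_both = ols(np.column_stack([a, b]), y)  # a gets +, b gets -: suppressor pattern

diff = a - b                                 # the untangled composite
corr_a_diff = np.corrcoef(a, diff)[0, 1]
beta_comp = ols(diff, y)

print(f"corr(a, b) = {corr_ab:.3f}")
print("both entered (intercept, a, b):", np.round(beta_both, 3))
print(f"corr(a, a - b) = {corr_a_diff:.3f}")
print("composite (intercept, a - b):", np.round(beta_comp, 3))
```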

Sometimes what invites a suppressor variable is a problem in
scaling, where the top or bottom of a measurement needs to
be compressed or expanded in order to have linear prediction,
and the suppressor is trying to correct that fit.
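For the scaling point, a minimal sketch: a right-skewed hypothetical measurement whose relationship to the outcome is linear in its logarithm. Fitting the raw value forces a straight line through a curved relationship; taking the log compresses the top of the scale and restores linear prediction. This shows only the rescaling step on simulated data and does not model the suppressor itself.

```python
import numpy as np


def r_squared(x, y):
    """R-squared of a simple OLS fit of y on x (with intercept)."""
    A = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


rng = np.random.default_rng(4)
n = 2000
x = np.exp(rng.normal(size=n))             # hypothetical right-skewed measurement
y = np.log(x) + 0.5 * rng.normal(size=n)   # outcome that is linear in log(x)

r2_raw = r_squared(x, y)
r2_log = r_squared(np.log(x), y)           # log compresses the top of the scale
print(f"R^2 raw: {r2_raw:.3f}   R^2 log: {r2_log:.3f}")
```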


I'm still at a loss to figure out what you are attempting to
achieve.  As I read it, you are using other variables to "predict"
which customers presently have a loan.  So, why?  You *know*
that already.  Let's see: people not yet retired, living in the
suburbs, who are buying a house.  I can't imagine what goals
you have in mind for fitting to that criterion, so I can't guess
whether using the Balance (House versus Car?  Current versus
original?) would help anything.
But if you want to see how it works, why not take 15 minutes to
try it?  For an estimate that shows some difference, I would try
a regression using just the people with a positive balance;
using the log of balance would create a predictor with better
variance characteristics.
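The suggested 15-minute trial might look like this: keep only records with a positive balance and regress the log of balance on the other variables. This sketch reads log balance as the variable being predicted within that subset, which is one interpretation of the suggestion. The data, the FICO-like score, the age range, and the coefficients are all simulated and hypothetical; only the 3.3% holding rate is taken from the thread.

```python
import numpy as np


def skewness(v):
    """Sample skewness (third standardized moment)."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)


rng = np.random.default_rng(5)
n = 20000
fico = rng.normal(700, 50, size=n)   # hypothetical FICO-like score
age = rng.uniform(25, 75, size=n)    # hypothetical age in years
has_loan = rng.random(n) < 0.033     # about 3.3% hold a balance
log_bal = 11.5 + 0.004 * (fico - 700) + rng.normal(0, 0.8, size=n)
balance = np.where(has_loan, np.exp(log_bal), 0.0)

pos = balance > 0                    # keep only positive balances
X = np.column_stack([np.ones(pos.sum()), fico[pos], age[pos]])
beta, *_ = np.linalg.lstsq(X, np.log(balance[pos]), rcond=None)

skew_raw = skewness(balance[pos])
skew_log = skewness(np.log(balance[pos]))
print(f"positive balances: {pos.sum()}")
print(f"skewness raw: {skew_raw:.2f}   skewness log: {skew_log:.2f}")
print("coefficients (intercept, fico, age):", np.round(beta, 4))
```

The log scale pulls in the long right tail of balances, so a few very large loans no longer dominate the least-squares fit.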

--
Rich Ulrich



Date: Thu, 24 May 2012 19:03:25 +0000
Subject: Re: SPSS OLS Regression to develop Mortgage Model for a financial institution

Hello Rich,

Thank you for your response.  It gave me many good points to think about… it has been seven years since I completed an MS thesis in which I used logistic and linear regressions without a second thought.  The regional aspect of the formula, for example, doesn't apply to all regions: our branches are located in 4 separate counties in California, so 99% of mortgages are in California within those 4 counties, and this distribution is skewed.

Let me declare that I'm not a statistician, although my boss is and has 40 years of Direct Marketing experience.  He was tasked with developing a Mortgage Model so we can determine which variables best predict what type of members may apply for a mortgage.  He told me to use the stepwise method, but I'll have to investigate this method as you recommended.

To highlight the nature of the IVs regressed on the dichotomously coded Mortgage variable: I used a binary (0, 1), or (0 = No, 1 = Yes), to represent whether a customer had a current mortgage, instead of the mortgage balance, which ranges from 0 to 1,536,757 and is nonzero on 3.3% of the cases (12,307).  The independent variables consisted of ordinal and interval/ratio data.  Basic demographic variables were dummy coded; for example, male and female were separated out into their own variables (0 = no, 1 = yes).  Some of the continuous variables were FICO score, age (continuous), and other products.  The products were dummy coded as well (0 = no participation, 1 = participation).
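When male and female are each coded as separate 0/1 variables, they always sum to 1, which is exactly the intercept column. A model containing the intercept and both dummies is therefore perfectly collinear, and one of the two has to be dropped (or the fit refused). A minimal NumPy check on a few hypothetical records, comparing the matrix rank with and without the redundant dummy:

```python
import numpy as np

gender = np.array(["M", "F", "F", "M", "F", "M"])  # hypothetical records
male = (gender == "M").astype(float)
female = (gender == "F").astype(float)
intercept = np.ones(len(gender))

both = np.column_stack([intercept, male, female])  # male + female == intercept
one = np.column_stack([intercept, female])         # male is the reference group

rank_both = np.linalg.matrix_rank(both)
rank_one = np.linalg.matrix_rank(one)
print(f"intercept + male + female: rank {rank_both} of {both.shape[1]} columns")
print(f"intercept + female:        rank {rank_one} of {one.shape[1]} columns")
```

With one dummy left out, its group becomes the baseline that the intercept describes, and the remaining dummy's coefficient is the difference from that baseline.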

I first used the entire sample, or universe, of cases.  Then I fit the regression on a 27% random sample (drawn with SPSS's probability algorithm in v. 19), applied the resulting equation to the full sample, and used those scores to create the gain chart.
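The split-fit-score workflow can be sketched as follows: draw a random ~27% fitting sample, fit OLS to the 0/1 mortgage flag (a linear probability model), then score every case with the fitted equation. Because the fitting cases are also among the scored ones, the sketch reports top-decile capture on the remaining ~73% separately from the all-case figure. Everything is simulated, and `top_decile_capture` is a hypothetical helper, not an SPSS function.

```python
import numpy as np


def top_decile_capture(s, out):
    """Share of all positives found among the top 10% of scores."""
    k = max(1, int(round(0.10 * len(s))))
    top = np.argsort(-s, kind="stable")[:k]
    return out[top].sum() / out.sum()


rng = np.random.default_rng(7)
n, p = 20000, 5
X = rng.normal(size=(n, p))  # hypothetical predictors
logit = -3.5 + X @ np.array([0.8, 0.5, 0.0, 0.0, 0.3])
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)  # 0/1 mortgage flag

train = rng.random(n) < 0.27  # ~27% random fitting sample
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)  # OLS on the 0/1 flag
scores = A @ beta                                            # score every case

holdout = ~train
cap_holdout = top_decile_capture(scores[holdout], y[holdout])
cap_all = top_decile_capture(scores, y)
print(f"fitting sample: {train.mean():.1%} of cases")
print(f"holdout top-decile capture:  {cap_holdout:.1%}")
print(f"all-case top-decile capture: {cap_all:.1%}")
```

Reporting the holdout figure on its own removes the "(mostly)" caveat: none of the evaluated cases helped estimate the equation.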

What would you recommend in terms of which method of regression (Enter, Stepwise, Remove, Forward, Backward) to use?  Or should Mortgage Balance be used instead of the binary variable, given that I'm using linear regression models?

Thank you in advance for any further comments,

Quentin Zavala, MS

Quentin Zavala

SchoolsFirst Federal Credit Union
Business Analyst, Research and Analytics

714-258-4000  ext 8601
